Data Science Dojo
Copyright (c) 2019 - 2020


Level: Advanced
Recommended Use: Classification Models
Domain: Business/Finance

Default of Credit Card Clients Data Set

Estimate the probability of Default


This advanced-level data set has 30,000 rows and 24 columns. It can be used to estimate the probability that a credit card client defaults on payment. The data set is recommended for learning and practicing exploratory data analysis, data visualization, and classification modelling techniques. Feel free to explore it with multiple supervised and unsupervised learning techniques. The following data dictionary gives more details on the data set:
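
Before any analysis, it can help to confirm that a local copy matches the shape stated above. The following is a minimal sketch using only the Python standard library. The helper name `csv_shape` is hypothetical, and it assumes the data is stored as a CSV file with exactly one header row, which this description does not state.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV stream with one header row."""
    reader = csv.reader(handle)
    header = next(reader)             # first line holds the column names
    n_rows = sum(1 for _ in reader)   # count the remaining data rows
    return n_rows, len(header)


# Tiny in-memory stand-in so the sketch runs without the real file.
sample = io.StringIO("LIMIT_BAL,SEX,AGE\n50000,1,37\n320000,2,29\n")
print(csv_shape(sample))  # prints (2, 3)
```

On the real file, opening it with `open(path, newline="")` (where `path` is wherever the file was saved) and passing the handle to `csv_shape` should give (30000, 24) if the copy matches this description.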


Data Dictionary

Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios
1 | X1: LIMIT_BAL | Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit | Quantitative | 50000, 320000, 40000 | 0
2 | X2: SEX | Gender (1 = male; 2 = female) | Quantitative | 1, 2 | 0
3 | X3: EDUCATION | Education (1 = graduate school; 2 = university; 3 = high school; 4 = others) | Quantitative | 1, 2, 3 | 0
4 | X4: MARRIAGE | Marital status (1 = married; 2 = single; 3 = others) | Quantitative | 1, 2, 3 | 0
5 | X5: AGE | Age (year) | Quantitative | 37, 29, 43 | 0
6 | X6: PAY_0 | History of past payment. The repayment status in September, 2005* | Quantitative | 0, 1, -1 | 0
7 | X7: PAY_2 | History of past payment. The repayment status in August, 2005* | Quantitative | 0, 2, -2 | 0
8 | X8: PAY_3 | History of past payment. The repayment status in July, 2005* | Quantitative | 0, -2, -1 | 0
9 | X9: PAY_4 | History of past payment. The repayment status in June, 2005* | Quantitative | 0, 2, 1 | 0
10 | X10: PAY_5 | History of past payment. The repayment status in May, 2005* | Quantitative | 1, -2, 1 | 0
11 | X11: PAY_6 | History of past payment. The repayment status in April, 2005* | Quantitative | 0, 1, -1 | 0
12 | X12: BILL_AMT1 | Amount of bill statement in September, 2005 (NT dollar) | Quantitative | 46990, 58267, 38257 | 0
13 | X13: BILL_AMT2 | Amount of bill statement in August, 2005 (NT dollar) | Quantitative | 48233, 59246, 38901 | 0
14 | X14: BILL_AMT3 | Amount of bill statement in July, 2005 (NT dollar) | Quantitative | 49291, 60184, 38103 | 0
15 | X15: BILL_AMT4 | Amount of bill statement in June, 2005 (NT dollar) | Quantitative | 28314, 58622, 36207 | 0
16 | X16: BILL_AMT5 | Amount of bill statement in May, 2005 (NT dollar) | Quantitative | 28959, 62307, 33138 | 0
17 | X17: BILL_AMT6 | Amount of bill statement in April, 2005 (NT dollar) | Quantitative | 29547, 63526, 31339 | 0
18 | X18: PAY_AMT1 | Amount of previous payment. Paid in September, 2005 (NT dollar) | Quantitative | 2000, 2500, 1700 | 0
19 | X19: PAY_AMT2 | Amount of previous payment. Paid in August, 2005 (NT dollar) | Quantitative | 2019, 2500, 1504 | 0
20 | X20: PAY_AMT3 | Amount of previous payment. Paid in July, 2005 (NT dollar) | Quantitative | 1200, 0, 1200 | 0
21 | X21: PAY_AMT4 | Amount of previous payment. Paid in June, 2005 (NT dollar) | Quantitative | 1100, 4800, 1500 | 0
22 | X22: PAY_AMT5 | Amount of previous payment. Paid in May, 2005 (NT dollar) | Quantitative | 1069, 2400, 1500 | 0
23 | X23: PAY_AMT6 | Amount of previous payment. Paid in April, 2005 (NT dollar) | Quantitative | 1000, 1600, 1000 | 0
24 | Y: Default Payment Next Month | Whether the client defaulted on payment the next month (1 = yes; 0 = no) | Quantitative | 1, 0 | 0
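
The attribute names in the data dictionary follow a regular pattern, so they can be generated and compared against a file header. The sketch below is illustrative only: the function names `expected_columns` and `missing_columns` are hypothetical, and the spelling of the target column in an actual file may differ from the name shown in the dictionary.

```python
def expected_columns():
    """Build the 24 attribute names from the data dictionary, in order."""
    demographic = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
    # Repayment status columns jump from PAY_0 to PAY_2; there is no PAY_1.
    pay_status = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]
    bill_amounts = [f"BILL_AMT{i}" for i in range(1, 7)]
    pay_amounts = [f"PAY_AMT{i}" for i in range(1, 7)]
    # Assumption: the target is named exactly as in the dictionary.
    target = ["Default Payment Next Month"]
    return demographic + pay_status + bill_amounts + pay_amounts + target


def missing_columns(header):
    """Return the expected names that are absent from a header (exact match)."""
    present = set(header)
    return [name for name in expected_columns() if name not in present]


print(len(expected_columns()))  # prints 24
```

Calling `missing_columns` on the header row of a local copy returns an empty list when every name matches exactly. A non-empty result usually points to a renamed or differently cased column rather than missing data.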

*The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above. The example values in the data dictionary also include 0 and -2, which this scale does not define.
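
The repayment status scale can be turned into readable labels for plots and summaries. This is a minimal sketch, not part of the original data set documentation: the function name `describe_repayment_status` is hypothetical, and any code outside the stated scale (such as 0 or -2) is deliberately reported as undefined rather than guessed.

```python
def describe_repayment_status(code):
    """Translate a PAY_* repayment status code into the label from the scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for 9 months and above"
    # The stated scale gives no meaning for other values, so do not invent one.
    return "undefined in the stated scale"


for code in (-1, 1, 2, 9, 0):
    print(code, "->", describe_repayment_status(code))
```

Keeping undefined codes as their own label makes it easy to count how often they occur before deciding how to treat them in a model.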

Acknowledgement

This data set was sourced from the University of California, Irvine (UCI) Machine Learning Repository, where it is listed as the Default of Credit Card Clients Data Set. The UCI page cites the following publication as the original source of the data set:

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.