6.3 Credit Cards

We have a dataset with information on ten thousand customers. The objective is to predict which customers will stop paying their credit card debt.


Source: James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer-Verlag, New York. www.StatLearning.com

Probability of Default: Data
obs     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
default No No No No No No No No No No No No No No No No No No No No
balance 729.53 817.18 1073.55 529.25 785.66 919.59 825.51 808.67 1161.06 0.00 0.00 1220.58 237.05 606.74 1112.97 286.23 0.00 527.54 485.94 1095.07

obs     137 174 202 207 210 242 244 264 342 346 350 358 407 440 441 488 541 546 577 582
default Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
balance 1487.00 2205.80 1774.69 1889.60 1899.39 1572.86 1964.48 1530.35 1642.82 1991.65 1550.45 1328.89 1700.60 1118.70 1119.10 1981.45 1717.07 1465.21 1763.58 1770.97

obs     9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
default No No No No No No No No No No No No No No No No No No No No
balance 770.02 739.42 623.53 506.63 875.24 842.95 401.33 1092.91 0.00 999.28 372.38 658.80 1111.65 938.84 172.41 711.56 757.96 845.41 1569.01 200.92



Probability of Default: Data Summary

No  Variable   Stats / Values                        Freqs (% of Valid)     Missing
1   default    Min: 0                                0: 9667 (96.7%)        0 (0.0%)
    [numeric]  Mean: 0.03                            1:  333 ( 3.3%)
               Max: 1
2   balance    Mean (sd): 835.4 (483.7)              9227 distinct values   0 (0.0%)
    [numeric]  min < med < max: 0 < 823.6 < 2654.3
               IQR (CV): 684.6 (0.6)



Using simple Linear Regression
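
A straight line is not confined to the interval [0, 1], which is the usual objection to modelling a Yes/No outcome with linear regression. The following is a minimal sketch in Python (NumPy assumed), not the analysis from the slides: it fits ordinary least squares to the first ten "No" rows and the first ten "Yes" rows of the data listing. That 20-row subset is chosen purely for illustration and heavily over-represents defaulters, so the fitted coefficients are not those of the full 10,000-customer dataset.

```python
import numpy as np

# Balances copied from the data listing (illustrative subset, NOT the full data):
# first ten "No" rows and first ten "Yes" rows.
balance_no = np.array([729.53, 817.18, 1073.55, 529.25, 785.66,
                       919.59, 825.51, 808.67, 1161.06, 0.00])
balance_yes = np.array([1487.00, 2205.80, 1774.69, 1889.60, 1899.39,
                        1572.86, 1964.48, 1530.35, 1642.82, 1991.65])

balance = np.concatenate([balance_no, balance_yes])
default = np.concatenate([np.zeros(10), np.ones(10)])  # 0 = No, 1 = Yes

# Ordinary least squares: default ~ intercept + slope * balance
X = np.column_stack([np.ones_like(balance), balance])
(intercept, slope), *_ = np.linalg.lstsq(X, default, rcond=None)


def linear_prediction(b):
    """Fitted value of the straight line at balance b."""
    return intercept + slope * b


# Evaluate the line at the smallest and largest balances in the data summary.
for b in (0.0, 2654.3):
    print(f"balance={b:8.1f}  fitted value={linear_prediction(b):.3f}")
```

On this subset the fitted value falls below 0 at a balance of 0 and above 1 at the maximum balance, so it cannot be read as a probability at either end.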


Using Logistic Regression


\[ \log\left[ \frac{ \widehat{P}(\operatorname{default} = 1) }{ 1 - \widehat{P}(\operatorname{default} = 1) } \right] = -10.651 + 0.005\,(\operatorname{balance}) \]
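
Solving the log-odds equation for the probability gives P(default = 1) = 1 / (1 + exp(-(b0 + b1 · balance))). The sketch below, in Python, applies that inverse-logit transformation using the estimates printed in the coefficient table (-10.651 and 0.005). The slope is rounded to three decimals in that table, so the probabilities are approximate; the function names are illustrative, not part of the original material.

```python
import math

# Estimates as printed in the coefficient table (rounded to three decimals).
INTERCEPT = -10.651
SLOPE = 0.005


def log_odds(balance):
    """Linear predictor: estimated log-odds of default."""
    return INTERCEPT + SLOPE * balance


def default_probability(balance):
    """Inverse logit of the linear predictor; always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds(balance)))


for b in (0, 1000, 2000, 2654.3):
    print(f"balance={b:>7}: P(default) ~ {default_probability(b):.4f}")

# Each additional 100 units of balance multiplies the odds of default
# by exp(SLOPE * 100), whatever the starting balance.
odds_ratio_per_100 = math.exp(SLOPE * 100)
print(f"odds ratio per 100 units of balance ~ {odds_ratio_per_100:.2f}")
```

Unlike a straight line, the resulting curve stays strictly between 0 and 1 for any balance.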
Observations                10000
Dependent variable          default
Type                        Generalized linear model
Family                      binomial
Link                        logit

χ²(1)                       1324.198
Pseudo-R² (Cragg-Uhler)     0.490
Pseudo-R² (McFadden)        0.453
AIC                         1600.451
BIC                         1614.872

               Est.      S.E.    z val.    p
(Intercept)    -10.651   0.361   -29.492   0.000
balance          0.005   0.000    24.953   0.000

Standard errors: MLE
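
The fit statistics in the summary are linked through the residual deviance D, which the output does not print. Assuming the standard definitions (AIC = D + 2k, BIC = D + k·ln n, χ² = D_null - D, McFadden = 1 - D/D_null, and Cragg-Uhler as the Nagelkerke-rescaled Cox-Snell measure), with k = 2 estimated parameters and n = 10000, the following Python sketch recovers D from the AIC and checks the other figures against it. The variable names are illustrative.

```python
import math

# Values copied from the model summary.
n, k = 10_000, 2            # observations, estimated parameters
aic, bic = 1600.451, 1614.872
chi_sq = 1324.198           # likelihood-ratio statistic, 1 degree of freedom

# Recover the deviances under the assumed standard definitions.
residual_deviance = aic - 2 * k
null_deviance = residual_deviance + chi_sq

# Recompute the remaining statistics from the deviances.
bic_check = residual_deviance + k * math.log(n)
mcfadden = 1 - residual_deviance / null_deviance
cox_snell = 1 - math.exp(-chi_sq / n)
cragg_uhler = cox_snell / (1 - math.exp(-null_deviance / n))

print(f"residual deviance: {residual_deviance:.3f}")
print(f"null deviance:     {null_deviance:.3f}")
print(f"BIC (recomputed):  {bic_check:.3f}")
print(f"McFadden R2:       {mcfadden:.3f}")
print(f"Cragg-Uhler R2:    {cragg_uhler:.3f}")
```

If the assumed definitions match those used by the software, the recomputed values should agree with the reported ones up to rounding.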



Exercise

  • Interpret the estimated coefficients
  • Is this a good model?