#### 1) Usage
`python scripts/stats.py <input CSV containing the dataframe> <output CSV for the result dataframe> <feature variable to analyze>`
#### 2) Example
python scripts/stats.py 'd:\age.csv' 'd:\output.csv' 'age'
#### 3) Parameters
Input dataframe

| | UserId | age |
|---|---|---|
| 0 | 27226838 | 22 |
| 0 | 17126415 | 34 |
| 0 | 47526334 | 56 |
| ... | | |
Output dataframe

| age | cnt_rec | cnt_target | %target | %cnt_rec | %cnt_target | %cum_cnt_rec | %cum_cnt_target | cnt_nontarget | %cum_nontarget | %cum_target-%cum_nontarget |
|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 110 | 9.0 | 8.18% | 0.36% | 0.53% | 0.36% | 0.53% | 101.0 | 0.35% | 0.18% |
| ... | | | | | | | | | | |
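
The columns of the output dataframe can be derived from per-value counts. The following is a minimal pandas sketch of that derivation, not the code in `scripts/stats.py`. It assumes the data carries a binary column marking target records; the column name `target`, the function name `feature_stats`, and the toy data are hypothetical.

```python
import pandas as pd


def feature_stats(df, feature, target="target"):
    """Per-value counts and cumulative shares for one feature.

    `target` is a hypothetical 0/1 column; it is an assumption of this sketch.
    """
    # Records and target records per distinct feature value (sorted by value).
    out = df.groupby(feature)[target].agg(cnt_rec="count", cnt_target="sum")
    out["%target"] = out["cnt_target"] / out["cnt_rec"]
    out["%cnt_rec"] = out["cnt_rec"] / out["cnt_rec"].sum()
    out["%cnt_target"] = out["cnt_target"] / out["cnt_target"].sum()
    out["%cum_cnt_rec"] = out["%cnt_rec"].cumsum()
    out["%cum_cnt_target"] = out["%cnt_target"].cumsum()
    out["cnt_nontarget"] = out["cnt_rec"] - out["cnt_target"]
    out["%cum_nontarget"] = (
        out["cnt_nontarget"].cumsum() / out["cnt_nontarget"].sum()
    )
    out["%cum_target-%cum_nontarget"] = (
        out["%cum_cnt_target"] - out["%cum_nontarget"]
    )
    return out.reset_index()


# Toy data, made up for illustration only.
demo = pd.DataFrame({"age": [18, 18, 19, 19, 19], "target": [1, 0, 0, 1, 1]})
stats = feature_stats(demo, "age")
print(stats)
```

The sketch keeps shares as fractions rather than formatted percentage strings. The largest absolute value in the last column is the Kolmogorov-Smirnov statistic of the feature with respect to the target.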
#### 1) Usage
`python scripts/woe.py <input CSV containing the dataframe> <feature variable to analyze> <bin breakpoints, comma-separated> <y variable>`
#### 2) Example
python scripts/woe.py "age.csv" "age" "20,30,45" "is_dft"
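
As an illustration of how a breakpoint string such as `"20,30,45"` could map to bins, here is a minimal standard-library sketch. It assumes right-closed intervals, a lower bound of 0, and an `NA` bin for missing values, all inferred from the class labels in the example output; the helper names are hypothetical and the script itself may work differently.

```python
from bisect import bisect_left


def parse_breakpoints(expr):
    """Turn a string like "20,30,45" into [20.0, 30.0, 45.0]."""
    return [float(part) for part in expr.split(",")]


def make_labels(breakpoints):
    """Build right-closed interval labels plus an open-ended last bin."""
    edges = [0.0] + breakpoints  # lower bound of 0 is an assumption
    labels = [f"({lo:g},{hi:g}]" for lo, hi in zip(edges, edges[1:])]
    labels.append(f"({edges[-1]:g}...)")
    return labels


def assign_bin(value, breakpoints, labels):
    """Return the label of the bin that contains `value`."""
    if value is None:
        return "NA"  # missing values get their own bin (assumption)
    # bisect_left counts the breakpoints strictly below `value`,
    # so a value equal to a breakpoint stays in the lower bin.
    return labels[bisect_left(breakpoints, value)]


breakpoints = parse_breakpoints("20,30,45")
labels = make_labels(breakpoints)
binned = [assign_bin(v, breakpoints, labels) for v in (20, 22, 45, 56, None)]
print(labels)
print(binned)
```

Values at or below 0 are not treated specially in this sketch; they fall into the first bin.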
#### 3) Parameters
Input dataframe

| | UserId | age | is_dft |
|---|---|---|---|
| 0 | 27226838 | 22 | 1 |
| 0 | 27242238 | 21 | 0 |
| ... | | | |
Output

| | class | good | bad | %good | %bad | all | woe | iv |
|---|---|---|---|---|---|---|---|---|
| 0 | (0,20.0] | 76 | 1519 | 4.76% | 95.24% | 1595 | -6.34584 | 0.048765 |
| 1 | (20,30] | 895 | 17129 | 4.97% | 95.03% | 18024 | -8.75549 | 0.561679 |
| 2 | (30,45] | 673 | 10021 | 6.29% | 93.71% | 10694 | 7.31007 | 0.372628 |
| 3 | (45...) | 75 | 869 | 7.94% | 92.06% | 944 | 2.57974 | 0.036832 |
| 4 | NA | 0 | 0 | nan% | nan% | 0 | NaN | 0.000000 |
| | | 1688 | 28819 | 5.53% | 94.47% | 30507 | | 1.019905 |
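
For readers unfamiliar with the `woe` and `iv` columns, the sketch below computes them from per-bin counts using one common textbook definition: WOE is the natural log of a bin's share of all good records divided by its share of all bad records, and a bin's IV is the difference of those two shares times its WOE. This is an assumption, not the formula in `scripts/woe.py`, which is not shown here and may use a different sign convention or scaling. The function name `woe_iv` and the toy counts are hypothetical.

```python
import math


def woe_iv(counts):
    """Compute WOE and IV per bin from (label, good, bad) tuples.

    Assumes the common definition WOE = ln(share of good / share of bad).
    """
    total_good = sum(good for _, good, _ in counts)
    total_bad = sum(bad for _, _, bad in counts)
    rows = []
    for label, good, bad in counts:
        if good == 0 or bad == 0:
            # WOE is undefined when either count is zero:
            # report NaN and let the bin contribute nothing to the total IV.
            rows.append((label, float("nan"), 0.0))
            continue
        share_good = good / total_good
        share_bad = bad / total_bad
        woe = math.log(share_good / share_bad)
        rows.append((label, woe, (share_good - share_bad) * woe))
    total_iv = sum(iv for _, _, iv in rows)
    return rows, total_iv


# Toy counts, made up for illustration only.
demo_counts = [("(0,20]", 10, 40), ("(20,30]", 30, 40), ("NA", 0, 0)]
rows, total_iv = woe_iv(demo_counts)
for label, woe, iv in rows:
    print(label, woe, iv)
print("total iv:", total_iv)
```

Under this definition every bin's IV is non-negative, and the total IV is the sum over bins, which corresponds to the last row of the output table.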