Can a model be used to fit or predict debt collection? - Statistics board - Weiming Archive (未名存档)


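c*******7 (posts: 2506)	1
I recently interviewed with a debt collection company and will probably not get the job. During the interview they mentioned that one of their existing models is a logistic model that predicts whether or not a debtor repays, and that they now want to extend it to predict the repayment amount. On the spot I improvised an idea for a longitudinal model that uses the collection calls as a time variable, but I suspect the amount is not normally distributed... Could the experts here tell me whether this kind of problem calls for a particular model?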
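
s*****l (posts: 321)	2
Just linear regression, conditional on repayment. At most, apply some transformations to the variables.
[Quoting c*******7:]
: I recently interviewed with a debt collection company and will probably not get the job. During the interview they mentioned that one of their existing models is a logistic model that predicts whether or not a debtor repays, and that they now want to extend it to predict the repayment amount. On the spot I improvised an idea for a longitudinal model that uses the collection calls as a time variable, but I suspect the amount is not normally distributed... Could the experts here tell me whether this kind of problem calls for a particular model?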
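
One way to read "linear regression conditional on repayment" is as the second stage of a two-part model: the existing logistic model supplies P(pay), and a regression fitted on payers only supplies the amount given payment. The sketch below is only an illustration under stated assumptions: the log transform, the single predictor `balance`, the toy numbers, and the placeholder `p_pay` are all hypothetical and do not come from the thread.

```python
import math

def fit_simple_ols(x, y):
    """Closed-form least squares for y = a + b * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_amount_given_payment(balance, paid_amount):
    """Stage 2: regress log(amount) on log(balance), using payers only."""
    payers = [(b, p) for b, p in zip(balance, paid_amount) if p > 0]
    log_balance = [math.log(b) for b, _ in payers]
    log_paid = [math.log(p) for _, p in payers]
    return fit_simple_ols(log_balance, log_paid)

def expected_payment(p_pay, balance, coef):
    """Combine both stages: P(pay) * predicted amount given payment.

    Under log-normal errors, exp() of a log-scale prediction estimates a
    median rather than a mean; a real model would add a retransformation
    correction, which is omitted in this sketch.
    """
    intercept, slope = coef
    return p_pay * math.exp(intercept + slope * math.log(balance))

# Hypothetical toy accounts: outstanding balance and amount eventually paid.
balance = [1000.0, 2000.0, 4000.0, 8000.0, 1500.0, 3000.0]
paid_amount = [0.0, 1000.0, 2000.0, 4000.0, 0.0, 1500.0]

coef = fit_amount_given_payment(balance, paid_amount)
# p_pay=0.4 is a placeholder for the output of the existing logistic model.
print(expected_payment(0.4, 5000.0, coef))
```

Fitting the amount model on payers only keeps the large mass of zero payments out of the regression, which is the main reason to condition on repayment in the first place.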
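
A*******s (posts: 3942)	3
Linear regression first. Then see whether anything violates the assumptions, such as heterodascity [sic] of residuals, bounded or truncated outcomes, etc...
[Quoting c*******7:]
: I recently interviewed with a debt collection company and will probably not get the job. During the interview they mentioned that one of their existing models is a logistic model that predicts whether or not a debtor repays, and that they now want to extend it to predict the repayment amount. On the spot I improvised an idea for a longitudinal model that uses the collection calls as a time variable, but I suspect the amount is not normally distributed... Could the experts here tell me whether this kind of problem calls for a particular model?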
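
A rough sketch of one possible check for non-constant residual variance after the linear fit: the studentized (Koenker) form of the Breusch-Pagan test, where squared residuals are regressed on the predictor and LM = n * R^2 is compared with a chi-square distribution (1 degree of freedom here, 5% critical value about 3.84). The post does not name a specific test, so this choice, the single-predictor setup, and the toy data are assumptions.

```python
def ols(x, y):
    """Closed-form least squares for y = a + b * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """Share of the variance of y explained by a straight line in x."""
    intercept, slope = ols(x, y)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return 0.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing squared OLS residuals on x."""
    intercept, slope = ols(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)

# Hypothetical toy data: the error spread grows with x in the first series
# and stays constant in the second.
x = [float(i) for i in range(1, 21)]
y_spread_grows = [2 * xi + (-1) ** i * 0.5 * xi for i, xi in enumerate(x, start=1)]
y_spread_flat = [2 * xi + (-1) ** i * 1.0 for i, xi in enumerate(x, start=1)]

print(breusch_pagan_lm(x, y_spread_grows))  # large values suggest heteroskedasticity
print(breusch_pagan_lm(x, y_spread_flat))
```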

D******n (posts: 2836)	4
This is the hardest word to pronounce in all of statistics:
hetero-skedasticity
[Quoting A*******s:]
: linear regression first. then see if there is anything violating the
: assumption, such as heterodascity of residuals, bounded or truncated
: outcomes, etc...
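
A*******s (posts: 3942)	5
Ha, you caught my mistake... Every time, I type "hetero" into the Google search bar and pick the first suggestion in the dropdown, and this time Google let me down...
Also, I can never remember the names of the two people behind KS either...
[Quoting D******n:]
: This is the hardest word to pronounce in all of statistics:
: hetero-skedasticity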

c*******7 (posts: 2506)	6
I see... Then what about a bounded outcome? For example, if the percentage of the amount repaid (0-100%) is used as the outcome, how should that be handled?
[Quoting A*******s:]
: linear regression first. then see if there is anything violating the
: assumption, such as heterodascity of residuals, bounded or truncated
: outcomes, etc...
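
A*******s (posts: 3942)	7
If data points near 0 and 100 are very sparse, I think using linear regression directly should be mostly fine. Otherwise, in the residual vs. Yhat plot you will see the points confined at both ends by 0 <= yhat + r <= 100. There are quite a few remedies, but I am not sure which ones are common in industry, so forgive me if I get something wrong:
1. Beta regression, for an outcome continuous in (0, 1)
2. Take a look at PROC QLIM, which has a bunch of models developed by econometricians
3. Plus the various zero-inflated/truncated mixtures
[Quoting c*******7:]
: I see... Then what about a bounded outcome? For example, if the percentage of the amount repaid (0-100%) is used as the outcome,
: how should that be handled?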
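
For reference, beta regression (item 1) is usually written in a mean-precision parameterization like the one below. The logit link is a common choice and is assumed here; the post does not specify one. A percentage outcome would first be divided by 100, and exact values of 0 or 1 fall outside the support, which is one motivation for the mixtures in item 3.

```latex
\begin{align}
y_i &\sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1-\mu_i)\phi\bigr), \qquad 0 < y_i < 1, \\
\operatorname{logit}(\mu_i) &= \log\frac{\mu_i}{1-\mu_i} = x_i^\top \beta, \\
\mathbb{E}[y_i] &= \mu_i, \qquad \operatorname{Var}(y_i) = \frac{\mu_i (1-\mu_i)}{1+\phi}.
\end{align}
```

The variance shrinks automatically as the mean approaches 0 or 1, which is the behavior a plain linear regression cannot reproduce near the bounds.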
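
y*****y (posts: 98)	8
If it is a [0,1] bounded outcome score, the simplest approach is to transform it directly to [-infty, infty] and then fit a linear regression. More complex but better approaches include ordinal regression (McCullagh, 1980) and the binomial-logit-normal, coarsened data model (Lesaffre, 2007).
[Quoting c*******7:]
: I see... Then what about a bounded outcome? For example, if the percentage of the amount repaid (0-100%) is used as the outcome,
: how should that be handled?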
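
The "transform, then fit linear regression" route can be sketched as follows, assuming the logit as the transform (the post does not say which one to use). Exact 0s and 1s would map to infinity, so the sketch clips them by a small `eps`, which is an arbitrary assumption. The predictor `calls` and the toy data are hypothetical; the data are generated from a known curve so the fit can be checked.

```python
import math

def logit(p):
    """Map a fraction in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))

def expit(z):
    """Inverse of logit: map a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit_linear(x, y, eps=1e-3):
    """OLS of logit(y) on x, after clipping y into [eps, 1 - eps]."""
    z = [logit(min(max(yi, eps), 1.0 - eps)) for yi in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope

def predict_fraction(x_new, coef):
    """Back-transform the linear prediction to the fraction scale."""
    intercept, slope = coef
    return expit(intercept + slope * x_new)

# Hypothetical toy data: number of collection calls vs. fraction repaid,
# generated from logit(fraction) = -2 + 1 * calls.
calls = [0.0, 1.0, 2.0, 3.0, 4.0]
fraction_repaid = [expit(-2.0 + c) for c in calls]

coef = fit_logit_linear(calls, fraction_repaid)
print(coef)
print(predict_fraction(10.0, coef))
```

Back-transformed predictions can never leave the unit interval, unlike predictions from a linear regression fitted to the raw fraction.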
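
D******n (posts: 2836)	9
KS = Kolmogorov-Smirnov? (Tell me if I am wrong.)
That one is pretty easy to remember. What puzzles me is why industry is so fond of it. Nobody in academia uses it.
[Quoting A*******s:]
: Ha, you caught my mistake... Every time, I type "hetero" into the Google search bar and pick the first suggestion in the dropdown,
: and this time Google let me down...
: Also, I can never remember the names of the two people behind KS either...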
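
v****0 (posts: 1887)	10
In this case, reliable data is more important than the choice of model.
[Quoting c*******7:]
: I recently interviewed with a debt collection company and will probably not get the job. During the interview they mentioned that one of their existing models is a logistic model that predicts whether or not a debtor repays, and that they now want to extend it to predict the repayment amount. On the spot I improvised an idea for a longitudinal model that uses the collection calls as a time variable, but I suspect the amount is not normally distributed... Could the experts here tell me whether this kind of problem calls for a particular model?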
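
A*******s (posts: 3942)	11
Why does it puzzle you? Is there a better metric? My understanding is that the overall AUC and the (max) KS have little practical meaning by themselves; only the partial AUC and the KS at a particular score bin do.
[Quoting D******n:]
: KS = Kolmogorov-Smirnov? (Tell me if I am wrong.)
: That one is pretty easy to remember. What puzzles me is why industry is so fond of it. Nobody in academia uses it.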
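
To make the metrics in this exchange concrete, here is a small sketch of how the KS statistic and the AUC are commonly computed for a binary scoring model: KS as the largest gap between the empirical score CDFs of the two classes, and AUC as the probability that a random positive outscores a random negative. The function names, the toy scores, and the labels are made up for illustration.

```python
def cdf_gap_at(scores, labels, threshold):
    """Gap between the empirical score CDFs of negatives and positives at one cutoff."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    cdf_pos = sum(1 for s in pos if s <= threshold) / len(pos)
    cdf_neg = sum(1 for s in neg if s <= threshold) / len(neg)
    return cdf_neg - cdf_pos

def ks_statistic(scores, labels):
    """(Max) KS: the largest absolute CDF gap over all observed cutoffs."""
    return max(abs(cdf_gap_at(scores, labels, t)) for t in set(scores))

def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores (higher = more likely to repay) and true outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(ks_statistic(scores, labels))     # 0.75
print(auc(scores, labels))              # 0.9375
print(cdf_gap_at(scores, labels, 0.4))  # 0.5, the gap at one specific cutoff
```

`cdf_gap_at` evaluates the separation at a single cutoff, which is closer to the per-score-bin view mentioned in the post than the single maximum is.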
