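c****s posts: 63 | 1 I have finished fitting a logistic regression model on the model-building data set. How can I
use the validation data to validate that my results are good? My concern is that the predictions from logistic regression
are all 1 or 0.
Any guidance would be much appreciated. Thank you very much. |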
s*********e posts: 1051 | 2 Statisticians in different industries look at different measures.
For risk modeling, the standard measures include, but are not limited to, the KS
statistic, ROC, the Gini coefficient, and divergence. For credit scoring, PDO
is also a measure of predictiveness.
For marketing, it is a different story: they look at the lift in the top
decile. |
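The top-decile lift mentioned above can be sketched as follows. This is a minimal illustration, not anyone's production method: the function name `top_decile_lift` and the data in `y_val` and `p_val` are made up for the example, and the "top decile" is taken to be the highest-scoring 10% of cases.

```python
def top_decile_lift(y_true, scores):
    """Event rate in the top 10% of scores divided by the overall event rate."""
    if len(y_true) != len(scores) or len(y_true) == 0:
        raise ValueError("y_true and scores must be non-empty and of equal length")
    overall_rate = sum(y_true) / len(y_true)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no events")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical validation data: 20 cases, 4 events.
y_val = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p_val = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45,
         0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
lift = top_decile_lift(y_val, p_val)
print(f"top-decile lift = {lift:.2f}")
```

A lift of 1 would mean the top decile contains events at the same rate as the whole validation set; larger values mean the model concentrates events among its highest scores.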
l***a posts: 12410 | 3 re
[In reply to s*********e, post 2]
|
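c****s posts: 63 | 4 Sorry, I didn't mention that.
In this model, Y is death (1/0) and the Xs are a large number of dummy variables
(diagnoses).
So basically, it belongs to biostatistics.
I'm still a bit confused and would appreciate any pointers.
I think I should apply the fitted model to the validation dataset and build a 2 by 2 table to calculate
the accuracy, that is, check whether the model also performs reasonably well on the validation dataset. If
the accuracy is above roughly 70% or 80%, the model should be decent.
What does everyone think? Thanks!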
|
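A note on the 2 by 2 table: a fitted logistic regression produces predicted probabilities, and the 1/0 predictions only appear after a cutoff is applied. The sketch below builds the table on a validation set under that assumption. The data, the 0.5 cutoff, and the helper name `confusion_table` are hypothetical.

```python
def confusion_table(y_true, probs, cutoff=0.5):
    """Return (tp, fp, fn, tn) after classifying probs >= cutoff as 1."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical validation outcomes (1 = death) and predicted probabilities.
y_val = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p_val = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.2, 0.1, 0.05]

tp, fp, fn, tn = confusion_table(y_val, p_val, cutoff=0.5)
accuracy = (tp + tn) / len(y_val)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
# Accuracy of always predicting the more common class, for comparison.
baseline = max(sum(y_val), len(y_val) - sum(y_val)) / len(y_val)
print(tp, fp, fn, tn, accuracy, sensitivity, specificity, baseline)
```

When the outcome is rare, the baseline itself can already sit at 70% or more, so accuracy is easier to judge next to the baseline, sensitivity, and specificity than on its own.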
S******y posts: 1123 | 5 An ROC curve would be a good tool, since it plots TPR against FPR at each level
of probability used as the cut point.
The gains chart is also widely used. |
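To make the "TPR against FPR at each cut point" idea concrete, here is a small pure-Python sketch that traces the ROC points and computes the area under the curve with the trapezoid rule. The data and the function names `roc_points` and `auc_trapezoid` are illustrative assumptions, and the code assumes both classes are present in the validation set.

```python
def roc_points(y_true, probs):
    """Return [(fpr, tpr), ...] using each distinct probability as a cutoff."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lower the cutoff step by step; each step classifies more cases as 1.
    for cutoff in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 1)
        fp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical validation outcomes and predicted probabilities.
y_val = [1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_val, p_val)
auc = auc_trapezoid(points)
print(points)
print(f"AUC = {auc:.4f}")
```

The loop recomputes the counts at every cutoff for readability, which is fine for a small validation set but quadratic in the number of cases.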
D******n posts: 2836 | 6 Nice... In school I always used ROC, but at my company they use the KS score... never
heard of it before....
[In reply to s*********e, post 2]
|
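For readers who have also not met the KS score: in scoring work it is usually taken to be the largest gap between the cumulative score distributions of the two outcome groups, which is the same as the largest gap between TPR and FPR over all cutoffs. A minimal sketch under that assumption, with made-up data and a hypothetical helper name `ks_statistic`:

```python
def ks_statistic(y_true, probs):
    """Largest gap between the empirical score CDFs of events and non-events."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for cutoff in sorted(set(probs)):
        # Share of each class with a score at or below the cutoff.
        cdf_pos = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= cutoff) / n_pos
        cdf_neg = sum(1 for y, p in zip(y_true, probs) if y == 0 and p <= cutoff) / n_neg
        best = max(best, abs(cdf_neg - cdf_pos))
    return best


# Hypothetical validation outcomes and predicted probabilities.
y_val = [1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

ks = ks_statistic(y_val, p_val)
print(f"KS = {ks:.4f}")
```

A KS of 0 means the two groups have the same score distribution, and a KS of 1 means the scores separate them completely.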
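c****s posts: 63 | 7 I will certainly compute the ROC; my result is 0.76. I just feel that an ROC alone saying my model is acceptable
is not convincing enough, especially since my ROC result is not particularly good. So I think there should be other ways
to validate it. |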
D******n posts: 2836 | 8 I guess it depends on how the model will be used,
namely your loss function.
[In reply to c****s, post 7]
|
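One way to read the loss-function remark: if the two kinds of error have different costs, the cutoff can be chosen to minimize the average cost on the validation set. The sketch below assumes purely hypothetical costs (a missed death counted as 5 times a false alarm), made-up data, and an illustrative helper name `expected_cost`.

```python
def expected_cost(y_true, probs, cutoff, cost_fp, cost_fn):
    """Average cost per case when probs >= cutoff is classified as 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 0:
            total += cost_fp  # false alarm
        elif pred == 0 and y == 1:
            total += cost_fn  # missed event
    return total / len(y_true)


# Hypothetical validation outcomes (1 = death) and predicted probabilities.
y_val = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p_val = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.2, 0.1, 0.05]

# Hypothetical costs: a false negative is 5 times as costly as a false positive.
cutoffs = [0.1, 0.3, 0.5, 0.7]
costs = {c: expected_cost(y_val, p_val, c, cost_fp=1.0, cost_fn=5.0) for c in cutoffs}
best_cutoff = min(costs, key=costs.get)
print(costs)
print("best cutoff:", best_cutoff)
```

With a different cost ratio the preferred cutoff moves, so the same model can look acceptable for one use and poor for another.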
b*******r posts: 152 | 9 Compare it to other benchmarks...
A single value of ROC = 0.76 rarely tells you anything. It could be very good
or very poor in reality. |
b*******r posts: 152 | 10 What is divergence here? The separation between good and bad? Thanks!
[In reply to s*********e, post 2]
|
c****s posts: 63 | 11 I don't know what divergence is either. Hope somebody can answer that.
Also, could someone tell me whether the KS statistic, the Gini coefficient, or
divergence can be used with a logistic regression model, or more generally a dichotomous
outcome model? |
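On the question above: these measures are all computed from a predicted score and a binary outcome, so they can be calculated for any model that outputs a score, logistic regression included. The thread itself does not define divergence; the sketch below assumes the definition commonly used in credit scoring, the squared difference of the group mean scores divided by the average of the group variances, and uses the relation Gini = 2 * AUC - 1. The data and function names are hypothetical.

```python
from statistics import mean, variance


def auc_pairs(y_true, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def divergence(y_true, scores):
    """Assumed definition: (mean_1 - mean_0)^2 / ((var_1 + var_0) / 2), sample variances."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return (mean(pos) - mean(neg)) ** 2 / ((variance(pos) + variance(neg)) / 2)


# Hypothetical validation outcomes and predicted probabilities.
y_val = [1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

gini = 2 * auc_pairs(y_val, p_val) - 1
div = divergence(y_val, p_val)
print(f"Gini = {gini:.4f}, divergence = {div:.4f}")
```

Under this definition divergence is large when the two groups' scores are far apart relative to their spread, which matches the "separation between good and bad" reading suggested earlier in the thread.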
j*****e posts: 182 | 12 This approach is correct, but you would need a relatively large validation
data set.
The KS statistic is not constructed to examine the goodness of fit of a logistic
regression, no matter what people do in the real world.
[In reply to c****s, post 4]
|
T*******I posts: 5138 | 13 According to your first statement, you have built a logistic model with a
model-building dataset, and you then mentioned that you have a validation
dataset. What you want to know is whether the model is good when it is
validated against the validation dataset. So, let me ask you several questions:
(1) What is the model-building dataset?
(2) What is the validation dataset?
(3) What is the relationship between the two datasets?
(4) Do the two datasets come from the same population?
(5) Is the v
[In reply to c****s, post 1]
|
T*******I posts: 5138 | 14 One more opinion:
The so-called validation just distinguishes two situations:
1) if the model is good enough for the validation dataset, then the validation dataset comes from the same population as the model-building
dataset, and the model is therefore good for the
population from which both the model-building dataset and the
validation dataset come;
2) or it is not, but this situation does not mean that the model is not good for
the population from which the model-building dataset comes. It may tell us
that the validation dataset ma
[In reply to T*******I, post 13]
|
x*******i posts: 1791 | 15 ROC,
and also ACFF and ACFFA (that's how I remember them being written, haha) |