Logistic regression: a validation question - Statistics board

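This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts less than a week old show at most 50 characters; posts older than a week show at most 500 characters.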


c****s
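Posts: 63

I have finished fitting a logistic regression model on the model-building
data set. How can I use the validation data to check that my results are
good? My difficulty is that the predictions of a logistic regression are
all 1 or 0.
I would appreciate any advice. Many thanks.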
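On the "all 1 or 0" point: a fitted logistic regression produces a probability between 0 and 1, and the 1/0 label only appears once a cutoff is applied to that probability. A minimal sketch in plain Python, where the intercept, coefficients, and the 0.5 cutoff are made-up values for illustration and not taken from the poster's model:

```python
import math

def predict_probability(intercept, coefs, x):
    """Return P(Y = 1 | x) from a logistic model's linear predictor."""
    linear = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-linear))

def classify(prob, cutoff=0.5):
    """Turn a probability into a 1/0 label; the cutoff is a choice, not a given."""
    return 1 if prob >= cutoff else 0

# Hypothetical model with two dummy (0/1) predictors.
intercept, coefs = -2.0, [1.5, 0.8]
for x in ([0, 0], [1, 0], [1, 1]):
    p = predict_probability(intercept, coefs, x)
    print(x, round(p, 3), classify(p))
```

Most of the validation measures discussed in this thread work on the probabilities themselves rather than on the 1/0 labels.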

s*********e
Posts: 1051

Statisticians in different industries look at different measures.
For risk modeling, the standard measures include, but are not limited to,
the KS statistic, ROC, the Gini coefficient, and divergence. For credit
scoring, PDO is also a measure of predictiveness.
For marketing, it is a different story: they look at the lift in the top
decile.
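Three of the measures listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `y_valid` and `p_valid` are made-up validation outcomes and predicted probabilities, the function names are hypothetical, the Gini coefficient is taken as 2 * AUC - 1, and the KS statistic is taken as the largest gap between the empirical score distributions of events and non-events. Both outcome classes must be present.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(y_true, scores):
    """Gini coefficient, assuming the usual relation Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0

def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of events and non-events."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

def top_decile_lift(y_true, scores):
    """Event rate among the top 10% of scores divided by the overall event rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / (sum(y_true) / len(y_true))

# Made-up validation data: observed outcomes and predicted probabilities.
y_valid = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_valid = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
print(auc(y_valid, p_valid), gini(y_valid, p_valid))
print(ks_statistic(y_valid, p_valid), top_decile_lift(y_valid, p_valid))
```

All three need only a binary outcome and a score per observation, so they can be computed from the validation set's predicted probabilities.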

l***a
Posts: 12410

re

[Quoting s*********e:]

: Statisticians in different industries look at different measures.
: For risk modeling, the standard measures include, but are not limited to,
: the KS statistic, ROC, the Gini coefficient, and divergence. For credit
: scoring, PDO is also a measure of predictiveness.
: For marketing, it is a different story: they look at the lift in the top
: decile.

c****s
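Posts: 63

Sorry I didn't mention that.
In this model, Y is death (1/0) and the Xs are a lot of dummy variables
(diagnoses).
So basically, it belongs to biostatistics.
I am still a bit confused and hope someone can point me in the right
direction.
I think I should apply the model I obtained to the validation dataset,
build a 2-by-2 table, and calculate the accuracy. In other words, I would
check whether the model also does reasonably well on the validation
dataset; if the accuracy exceeds roughly 70% or 80%, the model should be
fine.
What do you all think? Thanks!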
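The 2-by-2 table approach described above could look like the sketch below. The data, the 0.5 cutoff, and the function names are assumptions for illustration; sensitivity and specificity are added next to accuracy because they come from the same table.

```python
def two_by_two(y_true, probs, cutoff=0.5):
    """Cross-tabulate observed outcomes against predictions at a given cutoff."""
    table = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            table["tp"] += 1
        elif pred == 1 and y == 0:
            table["fp"] += 1
        elif pred == 0 and y == 1:
            table["fn"] += 1
        else:
            table["tn"] += 1
    return table

def summarize(table):
    """Accuracy, sensitivity and specificity (assumes both classes are present)."""
    total = sum(table.values())
    return {
        "accuracy": (table["tp"] + table["tn"]) / total,
        "sensitivity": table["tp"] / (table["tp"] + table["fn"]),
        "specificity": table["tn"] / (table["tn"] + table["fp"]),
    }

# Made-up validation data: observed deaths (1/0) and predicted probabilities.
y_valid = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_valid = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
table = two_by_two(y_valid, p_valid, cutoff=0.5)
print(table)
print(summarize(table))
```

One caution when reading the accuracy: in this toy sample 7 of the 10 outcomes are 0, so a rule that always predicts 0 would already reach 70% accuracy. With a rare outcome such as death, a 70% or 80% threshold is only meaningful relative to that baseline.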

S******y
Posts: 1123

An ROC curve would be a good tool, since it plots TPR against FPR with each
probability level used as the cutoff.
The gains chart is also widely used.
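As a rough sketch of both tools in plain Python: the first function walks through every distinct predicted probability as a cutoff and records (FPR, TPR), and the last one computes the cumulative share of events captured in the top score groups, which is what a gains chart plots. The data, the function names, and the choice of five groups are assumptions for illustration.

```python
def roc_points(y_true, probs):
    """(FPR, TPR) pairs, using each distinct predicted probability as the cutoff."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 1)
        fp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def cumulative_gains(y_true, probs, n_bins=5):
    """Share of all events captured in the top 1..n_bins score groups."""
    ranked = [y for _, y in sorted(zip(probs, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    total = sum(ranked)
    size = len(ranked) / n_bins
    return [sum(ranked[: round(size * (i + 1))]) / total for i in range(n_bins)]

# Made-up validation data: observed outcomes and predicted probabilities.
y_valid = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_valid = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
points = roc_points(y_valid, p_valid)
print(points)
print(trapezoid_auc(points))
print(cumulative_gains(y_valid, p_valid))
```

The list of points can be passed to any plotting tool; the area under it is the single number usually reported as "the ROC".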

D******n
Posts: 2836

Kudos... In school I always used ROC, but at my company they use the KS
score... I had never heard of it before...

[Quoting s*********e:]

c****s
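Posts: 63

I will certainly compute the ROC; my result is 0.76. I just feel that a
single ROC value is not convincing enough to say that my model is
acceptable, especially since my ROC result is not particularly good. So I
think there should be other ways to validate it.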

D******n
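Posts: 2836

I guess it depends on how the model will be used,
namely on your loss function.

[Quoting c****s:]

: I will certainly compute the ROC; my result is 0.76. I just feel that a
: single ROC value is not convincing enough to say that my model is
: acceptable, especially since my ROC result is not particularly good. So I
: think there should be other ways to validate it.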
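One way to read the loss-function remark: if a false positive and a false negative carry different costs, the "best" cutoff, and therefore the model's usefulness, changes with those costs. The sketch below is an illustration only; the data, the cost values, and the function names are made up.

```python
def average_cost(y_true, probs, cutoff, cost_fp, cost_fn):
    """Mean misclassification cost on the validation data at a given cutoff."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost / len(y_true)

def best_cutoff(y_true, probs, cost_fp, cost_fn):
    """Cutoff with the lowest average cost among the observed probabilities."""
    # 1.1 lies above every probability, i.e. the rule "always predict 0".
    candidates = sorted(set(probs)) + [1.1]
    return min(candidates,
               key=lambda c: average_cost(y_true, probs, c, cost_fp, cost_fn))

# Made-up validation data: observed outcomes and predicted probabilities.
y_valid = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_valid = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
# Missing a death is assumed five times as costly as a false alarm, then the reverse.
print(best_cutoff(y_valid, p_valid, cost_fp=1, cost_fn=5))
print(best_cutoff(y_valid, p_valid, cost_fp=5, cost_fn=1))
```

The same model ends up with a lower cutoff when missed events are expensive and a higher one when false alarms are expensive, which is why a single summary number cannot settle whether the model is good enough.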

b*******r
Posts: 152

Compare it to other benchmarks...
A single value of ROC = 0.76 rarely tells you anything. It could be very
good or very poor in reality.

b*******r
Posts: 152

What's divergence here? The separation between good and bad? Thanks!

[Quoting s*********e:]


c****s
Posts: 63

I don't know what divergence is either. I hope somebody can answer that.
Also, could someone tell me whether the KS statistic, the Gini coefficient,
or divergence can be used with a logistic regression model, or, say, any
dichotomous-outcome model?
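The thread does not define divergence. One definition commonly used in credit scoring, offered here as an assumption rather than as what the earlier poster necessarily meant, is the squared difference between the mean scores of the two outcome groups divided by the average of their variances. A minimal sketch with made-up scores and a hypothetical function name:

```python
from statistics import mean, variance

def divergence(y_true, scores):
    """(mean_1 - mean_0)^2 / ((var_1 + var_0) / 2), using sample variances.

    Assumes each outcome group has at least two observations.
    """
    group_1 = [s for y, s in zip(y_true, scores) if y == 1]
    group_0 = [s for y, s in zip(y_true, scores) if y == 0]
    pooled = (variance(group_1) + variance(group_0)) / 2.0
    return (mean(group_1) - mean(group_0)) ** 2 / pooled

# Made-up scores (for example, the model's linear predictor) and outcomes.
y_valid = [1, 1, 0, 0]
s_valid = [2.0, 4.0, 0.0, 2.0]
print(divergence(y_valid, s_valid))
```

Like the KS statistic and the Gini coefficient, this needs only a binary outcome and a score per observation, so it can be computed mechanically for a logistic regression. Whether it is an appropriate measure for the problem is a separate question.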

j*****e
Posts: 182

This approach is correct, but you would need a relatively large validation
data set.
The KS statistic is not constructed to examine the goodness of fit of a
logistic regression, no matter what people do in the real world.

[Quoting c****s:]

: Sorry I didn't mention that.
: In this model, Y is death (1/0) and the Xs are a lot of dummy variables
: (diagnoses).
: So basically, it belongs to biostatistics.
: I am still a bit confused and hope someone can point me in the right
: direction.
: I think I should apply the model I obtained to the validation dataset,
: build a 2-by-2 table, and calculate the accuracy. In other words, I would
: check whether the model also does reasonably well on the validation
: dataset; if the accuracy exceeds roughly 70% or 80%, the model should be
: fine.
: What do you all think? Thanks!
:
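To illustrate why the size of the validation set matters: the accuracy read off a 2-by-2 table is a proportion, so its precision depends on the number of validation cases. The sketch below uses the normal-approximation interval for a proportion; the 80% accuracy and the sample sizes are made-up inputs and the function name is hypothetical.

```python
import math

def accuracy_interval(accuracy, n, z=1.96):
    """Approximate 95% interval for an accuracy estimated from n validation cases."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

# The same observed accuracy is much less certain on a small validation set.
for n in (50, 100, 1000, 10000):
    low, high = accuracy_interval(0.80, n)
    print(n, round(low, 3), round(high, 3))
```

With only 100 validation cases, an observed 80% is compatible with a true accuracy of roughly 72% to 88% under this approximation, which is too wide to distinguish a 70% model from an 80% one.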

T*******I
Posts: 5138

According to your first statement, you have built a logistic model with a
model-building dataset, and you then mentioned that you have a validation
dataset. What you want to know is whether the model is good when it is
validated against the validation dataset. So let me ask you several
questions:
(1) What is the model-building dataset?
(2) What is the validation dataset?
(3) What is the relationship between the two datasets?
(4) Do the two datasets come from the same population?
(5) Is the v

[Quoting c****s:]

: I have finished fitting a logistic regression model on the model-building
: data set. How can I use the validation data to check that my results are
: good? My difficulty is that the predictions of a logistic regression are
: all 1 or 0.
: I would appreciate any advice. Many thanks.

T*******I
Posts: 5138

One more opinion:
So-called validation can only lead to one of two situations:
1) the model is good enough for the validation dataset, which suggests that
the validation dataset comes from the same population as the model-building
dataset, and that the model is therefore good for the population from which
both datasets come;
2) or it is not, but this situation does not mean that the model is not
good for the population from which the model-building dataset comes. It may
tell us that the validation dataset ma

[Quoting T*******I:]

: According to your first statement, you have built a logistic model with a
: model-building dataset, and you then mentioned that you have a validation
: dataset. What you want to know is whether the model is good when it is
: validated against the validation dataset. So let me ask you several
: questions:
: (1) What is the model-building dataset?
: (2) What is the validation dataset?
: (3) What is the relationship between the two datasets?
: (4) Do the two datasets come from the same population?
: (5) Is the v

x*******i
Posts: 1791

ROC,
and also that ACFF and ACFFA (that is how I remember them being spelled,
haha)
