S******y Posts: 1123 | 1 I am using logistic regression to model a rare event, i.e.,
y=0 98.5%
y=1 1.5%
N= 11 million
I am thinking of over-sampling "y=1" observations to increase their
percentage from 1.5% to 10%. Then I will perform logistic regression.
Is this method valid? Will my estimates be biased?
Thanks. | A*******s Posts: 3942 | 2 See the discussion at
http://www.mitbbs.com/article_t/Statistics/31211743.html
| j*****e Posts: 182 | 3 You can certainly oversample the rare event. This is the same design as a
case-control study.
However, only the slope estimates are meaningful; the intercept estimate is not,
because it is shifted by the sampling ratio. Unless you know the marginal
probability of the rare event in the population, you won't be able to correct
the intercept and predict the binary outcome.
There is a brief discussion of this in Agresti's book. You can also
read Hosmer and Lemeshow's Applied Logistic Regression for more. | j*m Posts: 190 | 4 In my limited experience, the predictions might not be valid (that is, they may be biased)
because the incidence rate is low. |
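For anyone who wants to see the intercept issue from post 3 in numbers, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration rather than something from the thread: the simulated population, the "true" coefficients, the hypothetical helper fit_logistic (a plain Newton-Raphson fit written with NumPy), and the choice to undersample controls instead of duplicating cases. The intercept shift is the log of the ratio of sample odds to population odds under either scheme, which for the 1.5% to 10% case in post 1 comes to about 1.99. Duplicating cases would also understate the standard errors, which this sketch does not address.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Hypothetical helper: logistic regression by Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)

# Simulated population (assumed values, chosen so the event rate is near 1.5%).
n = 400_000
true_b0, true_b1 = -4.7, 1.0
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * x)))
y = rng.binomial(1, p_true)
tau = y.mean()  # population event rate, assumed known

# Keep every case and subsample controls so that cases make up 10%.
cases = np.flatnonzero(y == 1)
controls = np.flatnonzero(y == 0)
kept_controls = rng.choice(controls, size=9 * len(cases), replace=False)
idx = np.concatenate([cases, kept_controls])

Xs = np.column_stack([np.ones(len(idx)), x[idx]])
ys = y[idx].astype(float)
b0_sample, b1_sample = fit_logistic(Xs, ys)

# Intercept correction: subtract log(sample odds / population odds).
ybar = ys.mean()
offset = np.log((ybar / (1.0 - ybar)) / (tau / (1.0 - tau)))
b0_corrected = b0_sample - offset

# The shift implied by the numbers in post 1 (1.5% raised to 10%).
shift_post1 = np.log((0.10 / 0.90) / (0.015 / 0.985))

print("population rate:", tau, "sample rate:", ybar)
print("slope (sample fit):", b1_sample, "true:", true_b1)
print("intercept (sample fit):", b0_sample)
print("intercept (corrected):", b0_corrected, "true:", true_b0)
print("shift for 1.5% -> 10%:", shift_post1)
```

In this setup the slope from the resampled data should land close to the true slope, while the raw intercept is off by roughly the offset until it is corrected with the known population rate.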