S******y (post count: 1123), #1

I am using logistic regression to model a rare event, i.e.,

y=0: 98.5%
y=1: 1.5%
N = 11 million

I am thinking of over-sampling the "y=1" observations to increase their percentage from 1.5% to 10%. Then I will perform logistic regression. Is this method valid? Will my estimates be biased? Thanks.

A*******s (post count: 3942), #2

See the discussion at http://www.mitbbs.com/article_t/Statistics/31211743.html

j*****e (post count: 182), #3

You can certainly oversample the rare event. This is known as a case-control study. But only the slope estimates are meaningful; the intercept estimate is not. Unless you know the marginal probability of the rare event, you won't be able to predict the binary outcome. There is a bit of discussion in Agresti's book. You can also read Hosmer and Lemeshow's Applied Logistic Regression for more.

j*m (post count: 190), #4

In my limited experience, the predictions might not be valid (I mean, biased...) since the event is rare.
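A short derivation of why only the intercept is affected, under the standard case-control setup. The notation is introduced here for illustration: $p(x)$ is the population probability $P(y=1 \mid x)$, $s_1$ and $s_0$ are the fractions of cases and controls kept in the sample (sampling depends on $y$ only), $\tau$ is the population prevalence, and $\bar y$ is the prevalence in the sample. By Bayes' rule the odds in the sample are the population odds multiplied by $s_1/s_0$:

$$
\frac{P(y=1 \mid x, \text{sampled})}{P(y=0 \mid x, \text{sampled})} = \frac{s_1\, p(x)}{s_0\,(1-p(x))}
\quad\Longrightarrow\quad
\operatorname{logit} P(y=1 \mid x, \text{sampled}) = \beta_0 + \ln\frac{s_1}{s_0} + \beta_1^\top x .
$$

So the slopes $\beta_1$ are unchanged and only the intercept shifts by $\ln(s_1/s_0)$. Since $\dfrac{\bar y}{1-\bar y} = \dfrac{s_1\,\tau}{s_0\,(1-\tau)}$, the shift can be removed when $\tau$ is known:

$$
\hat\beta_0^{\text{corrected}} = \hat\beta_0 - \ln\!\left[\frac{\bar y}{1-\bar y}\cdot\frac{1-\tau}{\tau}\right].
$$

This is why the marginal probability of the rare event is needed before the fitted model can be used to predict $P(y=1 \mid x)$ on the original scale.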

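A minimal simulation sketch of the same point, in standard-library Python. Everything in it is an assumption made for illustration, not anything from the thread: the true coefficients b0 = -4.5 and b1 = 1.0 (giving a prevalence of roughly 1.5 to 2%), a single standard-normal predictor, a 200,000-row "population", and the hypothetical helpers `fit_logistic` and `simulate`. It keeps every case, subsamples controls so that cases make up 10% of the fitted sample, fits the model by Newton-Raphson, and then subtracts the log of the case/control sampling-rate ratio from the intercept, treating the population prevalence as known.

```python
import math
import random


def fit_logistic(xs, ys, iters=50):
    """Fit logit P(y=1|x) = b0 + b1*x by Newton-Raphson (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and negative Hessian [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r, w = y - p, p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (negative Hessian)^{-1} @ score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def simulate(n_pop=200_000, b0=-4.5, b1=1.0, target=0.10, seed=1):
    """Case-control sample from a simulated population, then prior-correct the intercept."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_pop)]
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) else 0
          for x in xs]
    tau = sum(ys) / n_pop  # population prevalence, assumed known in practice

    cases = [x for x, y in zip(xs, ys) if y == 1]
    controls = [x for x, y in zip(xs, ys) if y == 0]
    # Keep all cases; draw just enough controls that cases are `target` of the sample.
    n_ctrl = round(len(cases) * (1 - target) / target)
    sx = cases + rng.sample(controls, n_ctrl)
    sy = [1] * len(cases) + [0] * n_ctrl
    ybar = len(cases) / len(sy)

    a0, a1 = fit_logistic(sx, sy)
    # ln(s1/s0) expressed through the sample and population prevalences.
    offset = math.log((ybar / (1 - ybar)) * ((1 - tau) / tau))
    return {"tau": tau, "ybar": ybar, "slope": a1,
            "raw_intercept": a0, "corrected_intercept": a0 - offset}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

With these settings the slope should come out close to 1.0 and the corrected intercept close to -4.5, while the raw intercept sits noticeably higher, by about ln(1/s0) where s0 is the fraction of controls kept. This sketch subsamples controls rather than duplicating cases; duplicating cases would typically shift the intercept in the same way, but the usual standard errors would then be too small because the copied rows are not independent observations.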