f**n posts: 401 | 1 Consider multiple regression with independent variables x1, ..., xn and
dependent variable y.
Suppose I have 12,000 observations. I randomly split the data into training
(70%) and testing (30%).
n, the total number of candidate variables, is around 50. That is, in the
worst case I can fit the full model:
model: y = x1, ..., xn
In this case my adjusted R-square is around 60%.
Based on business rules, I can segment the data into smaller pieces, e.g.,
each segment has 500, 200, or even 100 observations. If | d*******o posts: 493 | 2 How about doing a lift curve and having a look? | f**n posts: 401 | 3 I do not know how a lift curve can be done in my case: my problem is a
multiple regression, not a logistic one.
If I understand you correctly, I should really use hold-out validation data
to measure how well the model performs.
【quoting d*******o】: How about doing a lift curve and having a look?
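(Editor's note: a lift/gains chart is not limited to logistic models. For a continuous y you can sort the holdout observations by predicted y, split into deciles, and plot the cumulative share of actual y captured. A minimal sketch with synthetic data — the actual/pred arrays are made up for illustration, not the poster's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 30% holdout set (~3,600 of 12,000 obs);
# both series are fabricated purely for illustration.
actual = rng.normal(100, 20, size=3600)        # observed y on the holdout
pred = actual + rng.normal(0, 10, size=3600)   # imperfect model predictions

# Sort descending by prediction and split into deciles
order = np.argsort(-pred)
deciles = np.array_split(actual[order], 10)

# Cumulative share of total actual y captured vs. share of population
cum_share = np.cumsum([d.sum() for d in deciles]) / actual.sum()
baseline = np.arange(1, 11) / 10
lift = cum_share / baseline
print(np.round(lift, 2))  # top deciles above 1.0 indicate useful ranking
```

If the model ranks well, the top deciles show lift above 1.0 and the curve decays toward 1.0 at the last decile; a flat curve near 1.0 everywhere suggests the model adds little over random ordering.)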
| l***a posts: 12410 | 4 I think a power analysis needs to be done first to decide the minimum sample
size; I am sure you know it :) Then, if you pay real attention to
multicollinearity and the number of selected predictors, you will have a very
good chance of avoiding overfitting. But remember the rule of thumb that, on
average, each predictor should have at least 10 observations. Although I don't
keep this rule all the time in practice, it's still good to keep it in mind.
training
【quoting f**n】 : Consider multiple regression with independent variables x1...xn and : dependent variable y. : Suppose I have 12,000 observations. I randomly split the data into training : (70%) and testing(30%). : N, the total number of candidate variables is around 50. That is, I can in : the worst case fit my model to be: : model: y = x1, ... xn : In this case my adjusted R-square is around 60% : Based on business rules, I can segment the data into smaller pieces, e.g., : each segment has 500, 200 or even 100 observations. If
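(Editor's note: both checks above can be scripted. The sketch below uses synthetic data to illustrate the 10-observations-per-predictor rule and a variance inflation factor (VIF) check for multicollinearity, computed with plain NumPy least squares; the segment size and predictor count are assumptions matching the numbers in the thread, not the poster's real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_pred = 500, 50   # one small business segment, worst-case predictor count

# Rule of thumb from the thread: at least ~10 observations per predictor
print(f"obs per predictor: {n_obs / n_pred:.0f}")  # 10 -- borderline

# Synthetic design matrix with one deliberately near-duplicate column
X = rng.normal(size=(n_obs, n_pred))
X[:, 1] = X[:, 0] + rng.normal(0, 0.1, size=n_obs)  # strong collinearity

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing x_j on the other columns."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

print(f"VIF of the collinear column: {vif(X, 1):.1f}")
```

A VIF far above the common 5-10 cutoff flags a predictor that is nearly a linear combination of the others and is a candidate for removal before fitting segment-level models.)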