Statistics board - sample size vs. number of regressors
f**n
Posts: 401
1
Consider a multiple regression with independent variables x1, ..., xn and
dependent variable y.
Suppose I have 12,000 observations. I randomly split the data into training
(70%) and testing (30%) sets.
The total number of candidate variables, n, is around 50. That is, in the
worst case I can fit the full model:
model: y = x1, ..., xn
In this case my adjusted R-squared is around 60%.
Based on business rules, I can segment the data into smaller pieces, e.g.,
each segment has 500, 200 or even 100 observations. If
d*******o
Posts: 493
2
How about plotting a lift curve and having a look?
f**n
Posts: 401
3
I do not see how a lift curve can be built in my case: my problem is a
multiple regression, not a logistic one.
If I understand you correctly, I should use hold-out validation data
to measure how well the model works.
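
A minimal sketch of the hold-out check mentioned above, using only the Python standard library. The data are synthetic, and a single predictor stands in for the roughly 50 candidate variables to keep the example short; the helper names (`split_indices`, `r_squared`, `fit_simple_ols`) are made up for illustration and are not from the thread.

```python
import random

def split_indices(n_obs, train_frac=0.7, seed=0):
    """Randomly split row indices 0..n_obs-1 into training and hold-out sets."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    cut = int(round(n_obs * train_frac))
    return idx[:cut], idx[cut:]

def r_squared(y_true, y_pred):
    """R-squared = 1 - SSE/SST, with SST taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    sst = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - sse / sst

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor (toy stand-in)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic data (assumption): y = 2 + 0.5*x + noise, 12,000 rows.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(12000)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

train_idx, test_idx = split_indices(len(x), train_frac=0.7, seed=42)

# Fit on the training rows only.
intercept, slope = fit_simple_ols([x[i] for i in train_idx],
                                  [y[i] for i in train_idx])

def predict(rows):
    return [intercept + slope * x[i] for i in rows]

# Score on both sets; a large drop from training to hold-out suggests overfitting.
r2_train = r_squared([y[i] for i in train_idx], predict(train_idx))
r2_test = r_squared([y[i] for i in test_idx], predict(test_idx))
print(f"training R^2 = {r2_train:.3f}, hold-out R^2 = {r2_test:.3f}")
```

With a real multiple regression the same comparison applies: fit on the 70% only, then compute R-squared (or RMSE) on the 30% that the model never saw.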

[Quoting d*******o's post]
: How about plotting a lift curve and having a look?
l***a
Posts: 12410
4
I think a power analysis should be done first to decide the minimum sample
size; I am sure you know that :) Then, if you pay close attention to
multicollinearity and to the number of selected predictors, you have a
very good chance of avoiding overfitting. But remember the rule of thumb
that there should be, on average, at least 10 observations per predictor.
Although I don't always follow this rule in practice, it's still good
to keep in mind.
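
As a rough illustration of the 10-observations-per-predictor rule of thumb, the arithmetic for the sizes mentioned in the thread can be sketched as follows. The function names are hypothetical, and the segment sizes are assumed to be the number of rows actually used for fitting (8,400 is the 70% training share of 12,000).

```python
def obs_per_predictor(n_obs, n_predictors):
    """Average number of observations available per predictor."""
    return n_obs / n_predictors

def max_predictors(n_obs, min_obs_per_predictor=10):
    """Largest predictor count that still satisfies the rule of thumb."""
    return n_obs // min_obs_per_predictor

N_CANDIDATES = 50  # "around 50" candidate variables in the original post

# 8400 = 70% training split of 12,000; the rest are the segment sizes mentioned.
for n_obs in (8400, 500, 200, 100):
    ratio = obs_per_predictor(n_obs, N_CANDIDATES)
    limit = max_predictors(n_obs)
    status = "ok" if ratio >= 10 else "too few"
    print(f"n={n_obs:>5}: {ratio:6.1f} obs/predictor ({status}), "
          f"rule allows up to {limit} predictors")
```

Under these assumptions a 500-row segment sits right at the threshold with all 50 candidates, while 200-row and 100-row segments would need the model cut to about 20 and 10 predictors respectively.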

[Quoting f**n's post]
: Consider a multiple regression with independent variables x1, ..., xn and
: dependent variable y.
: Suppose I have 12,000 observations. I randomly split the data into training
: (70%) and testing (30%) sets.
: The total number of candidate variables, n, is around 50. That is, in the
: worst case I can fit the full model:
: model: y = x1, ..., xn
: In this case my adjusted R-squared is around 60%.
: Based on business rules, I can segment the data into smaller pieces, e.g.,
: each segment has 500, 200 or even 100 observations. If
