This page is an excerpt and archive of the corresponding thread on 未名空间.
Statistics board - regression prediction question
l******n
Posts: 9344
1
In the regression training data set, one categorical variable has only 3 levels.
The data to be predicted include an observation whose value for this categorical
variable is not among those 3 levels. How should I make the prediction?
Thanks
h***x
Posts: 586
2
Two ways:
1) The safest way is not to use this categorical variable at all. :-)
2) The other way is to build the model on the training dataset as it is and,
if the variable (its indicators) is significant, include it. When you apply the
model to the new observation you mentioned, its indicators are all 0, so the
variable will not affect the predicted result for that observation.
Just my 2 cents.
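To make option 2 concrete: under the usual dummy (treatment) coding, a level that never appeared in training matches none of the indicators, so it is encoded as all zeros. Below is a minimal Python sketch of that behavior. The level names (`A`, `B`, `C`, `D`) and the coefficients are made up for illustration and do not come from any fitted model.

```python
# Hypothetical training levels; "A" plays the role of the reference level.
TRAIN_LEVELS = ["A", "B", "C"]

def dummy_code(level):
    """Return indicators for the non-reference levels (B, C).

    A level never seen in training matches no indicator, so it is
    encoded as all zeros, exactly like the reference level.
    """
    return [1.0 if level == lv else 0.0 for lv in TRAIN_LEVELS[1:]]

def predict(x, level, intercept, slope, level_effects):
    """Linear predictor: intercept + slope * x + sum(effect_k * indicator_k)."""
    indicators = dummy_code(level)
    return intercept + slope * x + sum(
        e * d for e, d in zip(level_effects, indicators)
    )

# Made-up coefficients, for illustration only.
INTERCEPT, SLOPE, LEVEL_EFFECTS = 1.0, 0.5, [2.0, -1.0]

pred_reference = predict(4.0, "A", INTERCEPT, SLOPE, LEVEL_EFFECTS)
pred_unseen = predict(4.0, "D", INTERCEPT, SLOPE, LEVEL_EFFECTS)
pred_b = predict(4.0, "B", INTERCEPT, SLOPE, LEVEL_EFFECTS)
```

In this sketch the unseen level `D` receives the same prediction as the reference level `A`. That is the sense in which the indicators "do not affect" the result: the new observation is scored as if it belonged to the reference level, which depends on which level the coding happens to treat as the reference.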


l******n
Posts: 9344
3
Thanks, huxxx.
Both methods make sense.


A*******s
Posts: 3942
4
Is there a parameterization problem with the second one? I think it treats the
unknown category as the reference category. I am not sure it is valid if there
is no intercept in the model.
Not sure whether this would work: as a first step, fit the model with the
categorical variable and the other covariates; as a second step, fit the model
without the categorical variable, holding the coefficients of the other
covariates fixed at their first-step values, in order to get an intercept
estimate.
Or, treat the categorical variable as a random effect. The two methods should
give very close results if the sample size in each category is large.
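The two-step idea can be sketched in plain Python for the simplest case of one continuous covariate `x` plus the categorical variable. This is only an illustration under assumptions: the data are made up, ordinary least squares is assumed, and the names (`train`, `level_intercepts`, `pooled_intercept`, `predict`) are hypothetical. With a single covariate and one intercept per level, the least-squares slope is the within-level estimator, which keeps the example free of matrix code.

```python
from collections import defaultdict

# Made-up training data: (x, level, y). Values are illustrative only.
train = [
    (1.0, "A", 3.1), (2.0, "A", 5.0), (3.0, "A", 6.9),
    (1.0, "B", 5.0), (2.0, "B", 7.1), (4.0, "B", 11.0),
    (2.0, "C", 2.0), (3.0, "C", 4.1), (5.0, "C", 7.9),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Step 1: least squares of y on x with one intercept per level.
# The slope is estimated from deviations around each level's own means.
groups = defaultdict(list)
for x, level, y in train:
    groups[level].append((x, y))

sxy = 0.0
sxx = 0.0
for rows in groups.values():
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    sxy += sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx += sum((x - x_bar) ** 2 for x, _ in rows)
slope = sxy / sxx

level_intercepts = {
    level: mean(y for _, y in rows) - slope * mean(x for x, _ in rows)
    for level, rows in groups.items()
}

# Step 2: drop the categorical variable, hold the slope fixed, and
# re-estimate a single intercept (least squares on y - slope * x).
pooled_intercept = mean(y - slope * x for x, _, y in train)

def predict(x, level):
    """Use the level's own intercept when known, else the pooled one."""
    return level_intercepts.get(level, pooled_intercept) + slope * x
```

Under these assumptions the step-2 intercept is the sample-size-weighted average of the per-level intercepts, so an unseen level is scored at an "average" level instead of at whichever level happens to be the reference.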


h***x
Posts: 586
5
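Actually, I think the best approach is to first check the distribution of this
variable in both datasets and see whether the mismatch comes from a coding
error. If the levels really are different, then this variable should not be
used, and there is no need to test the difference between including and
excluding it.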
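A minimal sketch of such a check in Python, assuming the variable is available as plain lists of values. The function name `level_report`, the example values, and the normalization rule (strip whitespace, lowercase) are illustrative assumptions, not a general recipe for detecting coding errors.

```python
from collections import Counter

def _normalize(value):
    """Naive normalization used to spot likely coding differences."""
    return str(value).strip().lower()

def level_report(train_values, new_values):
    """Compare the levels of a categorical variable across two datasets."""
    train_counts = Counter(train_values)
    new_counts = Counter(new_values)

    # Levels present in the new data but absent from the training data.
    unseen = sorted(set(new_counts) - set(train_counts))

    # Unseen levels that match a training level after normalization are
    # probably the same category coded differently (case, stray spaces).
    normalized_train = {_normalize(v): v for v in train_counts}
    likely_recodes = {
        v: normalized_train[_normalize(v)]
        for v in unseen
        if _normalize(v) in normalized_train
    }
    truly_new = [v for v in unseen if v not in likely_recodes]

    return {
        "train_counts": dict(train_counts),
        "new_counts": dict(new_counts),
        "unseen": unseen,
        "likely_recodes": likely_recodes,
        "truly_new": truly_new,
    }

# Made-up example values.
report = level_report(
    ["low", "mid", "high", "low", "mid"],
    ["Low ", "mid", "unknown"],
)
```

In this example `"Low "` is flagged as a probable recoding of `"low"`, while `"unknown"` is left as a genuinely new level that the training data cannot inform.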
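As for the second method, it is the optimal solution based on the training
dataset. At model deployment time we do not know what the data distribution
will be, but our assumption is that the data to be predicted are distributed
similarly to the training data. In this specific example the model scores will
change a little, but the score ranking will hardly change, so the final scoring
results should be the same.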
I think you are right, we can treat the categorical variable as a random
effect ...

