How do you do ordinal regression when the dependent variable is severely skewed? - Statistics board

This page is an excerpt and archive of the corresponding thread on 未名空间.


p******r
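Posts: 1279

I have a question for everyone. My dependent variable is a categorical
variable with 10 levels, 0, 1, 2, 3, ..., 9, and I need to fit an ordinal
regression to it.
The problem is that this dependent variable is severely skewed: most of the
1,200-plus observations fall in 0, 1, and 2, and level 9 has just a single
observation. Can I still run an ordinal regression in this situation? If not,
what should I do instead?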

A*******s
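Posts: 3942

I ran into similar count data before, and I used zero-inflated models.

[p******r wrote:]

: I have a question for everyone. My dependent variable is a categorical
: variable with 10 levels, 0, 1, 2, 3, ..., 9, and I need to fit an ordinal
: regression to it.
: The problem is that this dependent variable is severely skewed: most of the
: 1,200-plus observations fall in 0, 1, and 2, and level 9 has just a single
: observation. Can I still run an ordinal regression in this situation? If not,
: what should I do instead?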

e****t
Posts: 766

Yeah, I would use Poisson regression or negative binomial regression with
zero inflation.

[A*******s wrote:]

: I ran into similar count data before, and I used zero-inflated models.
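For readers unfamiliar with the zero-inflated models suggested above, here is a minimal sketch of the zero-inflated Poisson probability mass function. The function name `zip_pmf` and the parameter values `lam=1.5` and `pi=0.3` are assumptions chosen for illustration, not values from the thread. Note that a count distribution puts some probability above 9, which a 0 to 9 score cannot take.

```python
import math

def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from an ordinary Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Illustrative (assumed) parameters: 30% structural zeros, Poisson mean 1.5.
probs = [zip_pmf(k, lam=1.5, pi=0.3) for k in range(10)]
for k, p in enumerate(probs):
    print(k, round(p, 4))
```

The extra mass at zero comes from the `pi` term; setting `pi=0` recovers the plain Poisson distribution.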

p******r
Posts: 1279

Is the zero-inflated model only valid for the Poisson family?
What I'm doing is ordinal regression. Is the zero-inflated model still OK
for that? Thanks!

[A*******s wrote:]

: I ran into similar count data before, and I used zero-inflated models.

A*******s
Posts: 3942

I think the Poisson distribution is by nature about counts or time lengths.
I'm not sure whether other types of ordinal data fit this approach, but it is
worth a try as long as the model offers a good fit.

[p******r wrote:]

: Is the zero-inflated model only valid for the Poisson family?
: What I'm doing is ordinal regression. Is the zero-inflated model still OK
: for that? Thanks!

p******r
Posts: 1279

Well, my problem is to predict the "depression score" from "some answers to
some designed questions". The depression score runs from 0 to 9, where 0 is
"not depressed at all" and 9 is "very very depressed".
One problem is that I can't seem to find a good independent variable to
associate with "depression score" in the zero-inflated model. What do you
think, actuaries?

l***a
Posts: 12410

For very skewed data, you can probably try oversampling when building the
model.

[p******r wrote:]

: Well, my problem is to predict the "depression score" from "some answers to
: some designed questions". The depression score runs from 0 to 9, where 0 is
: "not depressed at all" and 9 is "very very depressed".
: One problem is that I can't seem to find a good independent variable to
: associate with "depression score" in the zero-inflated model. What do you
: think, actuaries?

A*******s
Posts: 3942

You mean there is no significant linear correlation between Y and X? Then
changing the link function of Y wouldn't help much. Try transforming X, say
by introducing curvature, or even find new X variables.
Or dichotomize Y into depressed and non-depressed using different
thresholds. Sometimes it helps.

[p******r wrote:]
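The dichotomizing idea above can be sketched as follows. The helper name `dichotomize`, the toy `scores` list, and the candidate thresholds are all assumptions for illustration; the real data and the clinically sensible cut-off are not given in the thread.

```python
from collections import Counter

def dichotomize(scores, threshold):
    """Return 1 where score >= threshold (treated as 'depressed'), else 0."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical scores, skewed toward 0 like the data described in the thread.
scores = [0] * 6 + [1] * 4 + [2] * 3 + [3, 4, 5, 9]

# Compare how the two classes balance under different thresholds.
for threshold in (1, 2, 3):
    labels = dichotomize(scores, threshold)
    print(threshold, dict(Counter(labels)))
```

Each binary outcome could then go into an ordinary logistic regression, and comparing fits across thresholds shows how sensitive the conclusions are to the cut-off.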

p******r
Posts: 1279

No, I mean that when you try to run a zero-inflated negative binomial model,
the SAS code looks like this:
proc countreg data = depression method = qn;
   model depscore = x1 x2 x3 / dist = zinegbin;
   zeromodel depscore ~ x?;
run;
In addition to specifying the "model", you have to specify the "zeromodel" as
well.
But which independent variable should I choose to put into "x?"?
Should I just try x1, x2, or x3 arbitrarily, one by one, to see which one
gives a good fit?

[A*******s wrote:]

: You mean there is no significant linear correlation between Y and X? Then
: changing the link function of Y wouldn't help much. Try transforming X, say
: by introducing curvature, or even find new X variables.
: Or dichotomize Y into depressed and non-depressed using different
: thresholds. Sometimes it helps.

A*******s
Posts: 3942

Hmm... no idea. If you have no prior knowledge, just try all subsets, I
think.

[p******r wrote:]

: No, I mean that when you try to run a zero-inflated negative binomial model,
: the SAS code looks like this:
: proc countreg data = depression method = qn;
:    model depscore = x1 x2 x3 / dist = zinegbin;
:    zeromodel depscore ~ x?;
: run;
: In addition to specifying the "model", you have to specify the "zeromodel" as
: well.
: But which independent variable should I choose to put into "x?"?
: Should I just try x1, x2, or x3 arbitrarily, one by one, to see which one
: gives a good fit?
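A minimal sketch of the "try all subsets" idea, assuming the candidates for the zero model are just x1, x2, and x3. The `fit_aic` argument is a hypothetical stand-in for whatever refits the model with a given zero-model regressor set and returns its AIC (for example, one PROC COUNTREG run per subset); the toy scoring function below exists only to make the sketch runnable and is not a real fit.

```python
from itertools import combinations

def all_subsets(variables):
    """Yield every non-empty subset of the candidate variables as a tuple."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

def best_subset(variables, fit_aic):
    """Return (aic, subset) with the smallest AIC.

    fit_aic is a user-supplied callable: subset -> AIC of the refitted model.
    """
    return min((fit_aic(subset), subset) for subset in all_subsets(variables))

candidates = ["x1", "x2", "x3"]
subsets = list(all_subsets(candidates))
print(len(subsets), subsets)

# Toy stand-in for a real model fit (assumption, for illustration only).
def toy_aic(subset):
    return abs(len(subset) - 2) + (0 if "x1" in subset else 1)

print(best_subset(candidates, toy_aic))
```

With p candidates there are 2^p - 1 non-empty subsets, so this stays cheap for three variables but grows quickly, and picking the best of many fits can overstate how good the winner is.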


p******r
Posts: 1279

OK, thanks for replying!

p******r
Posts: 1279

Well, I fitted the zero-inflated negative binomial model.
Everything in the zero model is significant, but the AIC and the
-2 log-likelihood are over 2,600... Isn't that way too large?
What should I do next?

A*******s
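Posts: 3942

The magnitude of the -2 log-likelihood grows with the number of observations,
so a large value may not be bad if you have a large dataset. And AIC, BIC,
and -2 log L are only useful for comparing models fitted to the same data.
Try other models to see whose AIC or validation SSE is smaller.

[p******r wrote:]

: Well, I fitted the zero-inflated negative binomial model.
: Everything in the zero model is significant, but the AIC and the
: -2 log-likelihood are over 2,600... Isn't that way too large?
: What should I do next?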
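To make the point about AIC concrete, here is a small sketch using the standard definition AIC = 2k - 2 log L. The log-likelihoods and parameter counts below are made-up numbers for two hypothetical fits to the same 1,200 observations; only the rough scale (a -2 log L around 2,600) echoes the thread.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits to the SAME 1200 observations; all numbers are made up.
n_obs = 1200
fits = {
    "zinb": {"loglik": -1300.0, "k": 7},
    "ordinal_logit": {"loglik": -1285.0, "k": 12},
}

for name, fit in fits.items():
    value = aic(fit["loglik"], fit["k"])
    per_obs = -2 * fit["loglik"] / n_obs  # scale-free view of -2 log L
    print(name, value, round(per_obs, 3))
```

Dividing -2 log L by the number of observations shows why a raw value in the thousands is unremarkable for a dataset of this size; the comparison between candidate models on the same data is what carries information.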

s*********e
Posts: 1051

I can't believe people are suggesting a ZIP model for a depression score.
If you don't know the answer, you don't have to give a misleading one just
to show you are a smart ass.
- 'Well, my problem is to predict the "depression score" from "some answers
to some designed questions". The depression score runs from 0 to 9, where 0
is "not depressed at all" and 9 is "very very depressed".'

A*******s
Posts: 3942

Like I said before,
"I think the Poisson distribution is by nature about counts or time lengths.
I'm not sure whether other types of ordinal data fit this approach."
I would like to hear Niu Ren's opinion. Show what kind of ass you are.
By the way, google "poisson regression depression score"; some results come
up.

[s*********e wrote:]

: I can't believe people are suggesting a ZIP model for a depression score.
: If you don't know the answer, you don't have to give a misleading one just
: to show you are a smart ass.
: - 'Well, my problem is to predict the "depression score" from "some answers
: to some designed questions". The depression score runs from 0 to 9, where 0
: is "not depressed at all" and 9 is "very very depressed".'

p******r
Posts: 1279

So what is your suggestion?
If you know something, please help!
I appreciate every kind of suggestion, and holding something back is just
another kind of "look what a smart ass I have".

[s*********e wrote:]

D*********2
Posts: 535

I guess one could try a continuation ratio model, which is a discrete version
of the Cox model.
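One common way to fit a continuation ratio model is to expand each subject into binary records for P(Y = j | Y >= j) and then run a binary regression on the expanded data. The sketch below shows only that expansion step; the function name, the `level` and `stop` field names, and the example covariate are assumptions for illustration, and the top score is assumed to be 9 as in the thread.

```python
def expand_continuation_ratio(y: int, x: dict, top: int = 9) -> list:
    """Expand one subject into binary 'stopped at level j, given reached j' rows.

    A subject with score y contributes one row for each level j = 0..min(y, top-1).
    stop is 1 only on the row where j == y. The top level needs no row because
    P(Y = top | Y >= top) is 1 by definition.
    """
    rows = []
    for level in range(min(y, top - 1) + 1):
        rows.append({**x, "level": level, "stop": int(y == level)})
    return rows

# A hypothetical subject with score 2 and one covariate.
for row in expand_continuation_ratio(2, {"x1": 0.5}):
    print(row)
```

Fitting a binary model with a separate intercept per `level` on the stacked rows gives the continuation ratio model; with a complementary log-log link this corresponds to a grouped-time proportional hazards model, which is the sense in which it resembles the Cox model.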

D*********2
Posts: 535

No offense, but I do not like your post this time.
I really appreciate a Stat Bull like you being active on this board. I mean,
I personally have learned a lot here, since my thesis has almost nothing to
do with modeling. However, I think of a BBS as more of a group discussion:
we might not know the right answer, but we try to figure it out. Sometimes
we are lucky enough to get help from experts like you, and we do appreciate
that. But please do not be contemptuous. That is discouraging. Thanks.

[s*********e wrote:]

G*****m
Posts: 222

In my opinion:
1. OLS does not restrict the distribution of y.
2. Your measurement may have systematic errors. People are not likely to
admit that they are highly depressed.

p******r
Posts: 1279

I'm a bit confused. Isn't ordinal regression based on MLE?
Or do you mean I should simply treat the response variable as continuous and
use OLS regression?
And if people are unwilling to admit that they are highly depressed, which
model would be better? Thanks!

[G*****m wrote:]

: In my opinion:
: 1. OLS does not restrict the distribution of y.
: 2. Your measurement may have systematic errors. People are not likely to
: admit that they are highly depressed.


p******r
Posts: 1279

So in general, if an x variable is severely right-skewed, what can I do
besides taking a log? Something like x + x^2?

[A*******s wrote:]

A*******s
Posts: 3942

My understanding is that you transform X to introduce nonlinearity. Does it
matter that the X variables are skewed?

[p******r wrote:]

: So in general, if an x variable is severely right-skewed, what can I do
: besides taking a log? Something like x + x^2?

p******r
Posts: 1279

Well, I don't think the x variables need to be normally distributed; it is
enough for the residuals to be normally distributed.
But my y and one of the x variables are both very skewed. I have tried all
sorts of log transformations on them, and the residuals still don't look
normal, so people may question the significance of my coefficient estimates,
because the t-test is not valid...

[A*******s wrote:]

: My understanding is that you transform X to introduce nonlinearity. Does it
: matter that the X variables are skewed?

G*****m
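Posts: 222

1. OLS = MLE?
"Normality. It is sometimes additionally assumed that the errors have normal
distribution conditional on the regressors."
See:
http://en.wikipedia.org/wiki/Ordinary_least_squares
2. A fix for people being unwilling to admit that they are highly depressed:
run OLS and get the error term. Plot the errors against the score, and
regress the errors on the score to check whether they are correlated. As for
how to correct it, I'm not sure either... it depends on your X and on the
literature. But it seems bootstrapping works for everything?

[p******r wrote:]

: I'm a bit confused. Isn't ordinal regression based on MLE?
: Or do you mean I should simply treat the response variable as continuous and
: use OLS regression?
: And if people are unwilling to admit that they are highly depressed, which
: model would be better? Thanks!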
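As a rough illustration of the bootstrapping idea mentioned above, here is a minimal sketch of a case-resampling percentile bootstrap for an OLS slope with a single predictor. The data are simulated with skewed (exponential) noise and are purely hypothetical; the function names, the 2,000 resamples, and the 95% level are assumptions. This addresses non-normal residuals only; it does not correct for systematic measurement error in the score.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # skip degenerate resamples with a single x value
            continue
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    slopes.sort()
    lo = slopes[int((alpha / 2) * len(slopes))]
    hi = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lo, hi

# Simulated data (assumption): true slope 0.5 plus centered, right-skewed noise.
noise_rng = random.Random(1)
xs = [float(i) for i in range(40)]
ys = [0.5 * x + (noise_rng.expovariate(1.0) - 1.0) for x in xs]

print("slope:", ols_slope(xs, ys))
print("95% bootstrap interval:", bootstrap_slope_ci(xs, ys))
```

Resampling whole (x, y) pairs avoids relying on the normal-theory t-test, which is the concern raised earlier in the thread about skewed residuals.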
