About stepwise programming - Statistics board

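This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts from the past week are truncated to 50 characters and older posts to 500 characters.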

f****7
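Posts: 398

I'd like to ask a question about stepwise programming. Thank you all for your time.
I have a dataset with 101 variables and want to find, step by step, which of the first
100 variables are correlated with the last variable. The procedure is:
1. Find var1, the variable with the highest correlation with the last variable. Add var1
to each of the remaining 99 of the first 100 variables to get a new set of variables.
2. From these 99 new variables, find the one with the highest correlation, var2. This var2
is really var1+var2 in the original dataset. Then add var1 and var2 to each of the
remaining 98 of the original 100 variables to get a new set of variables.
3. From these 98 new variables, find the one with the highest correlation, var3 = (var1+var2+var3).
4. Repeat the steps above until the highest correlation in a round no longer increases
compared with the previous round.
I have no experience with this kind of programming and am still undecided between SAS
and R. Any advice would be greatly appreciated.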
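
The four steps above can be sketched in a few lines of Python, a language the thread itself does not use (it only mentions SAS and R). This is a minimal sketch under stated assumptions: the function names `pearson` and `greedy_sum_selection`, the synthetic data, and the use of signed rather than absolute correlation are illustrative choices, not something the original poster specified.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_sum_selection(columns, target):
    """Greedy forward selection on the running sum of the chosen columns.

    columns: dict mapping a variable name to its list of values (the candidates)
    target:  list of values of the last variable
    Returns (names in the order they were selected, correlation after each step).
    """
    running = [0.0] * len(target)
    remaining = dict(columns)
    selected, history = [], []
    best_so_far = -math.inf
    while remaining:
        # Steps 1-3: add each remaining column to the running sum and score it.
        scores = {
            name: pearson([r + v for r, v in zip(running, col)], target)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] <= best_so_far:
            break  # Step 4: the best correlation no longer increases.
        best_so_far = scores[name]
        running = [r + v for r, v in zip(running, remaining.pop(name))]
        selected.append(name)
        history.append(best_so_far)
    return selected, history


# Hypothetical data: 5 candidates, target = x1 + x2 + small noise.
random.seed(0)
n = 200
columns = {f"x{i}": [random.gauss(0, 1) for _ in range(n)] for i in range(1, 6)}
target = [a + b + 0.1 * random.gauss(0, 1)
          for a, b in zip(columns["x1"], columns["x2"])]
selected, history = greedy_sum_selection(columns, target)
print(selected, [round(h, 3) for h in history])
```

On this synthetic data the loop should pick x1 and x2 (in either order) and then stop, because adding any pure-noise column to x1 + x2 lowers the correlation with the target.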

l***a
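Posts: 12410

this algorithm sounds interesting... what's it called?

[In reply to f****7:]

: I'd like to ask a question about stepwise programming. Thank you all for your time.
: I have a dataset with 101 variables and want to find, step by step, which of the first
: 100 variables are correlated with the last variable. The procedure is:
: 1. Find var1, the variable with the highest correlation with the last variable. Add var1
: to each of the remaining 99 of the first 100 variables to get a new set of variables.
: 2. From these 99 new variables, find the one with the highest correlation, var2. This var2
: is really var1+var2 in the original dataset. Then add var1 and var2 to each of the
: remaining 98 of the original 100 variables to get a new set of variables.
: 3. From these 98 new variables, find the one with the highest correlation, var3 = (var1+var2+var3).
: 4. Repeat the steps above until the highest correlation in a round no longer increases
: compared with the previous round.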

f****7
Posts: 398

stepwise selection
Do you have any ideas about this? Thanks!

[In reply to l***a:]

: this algorithm sounds interesting... what's it called?

m****r
Posts: 237

The stepwise selection of variables in SAS PROC REG seems to follow a similar idea. You
could look into that. If that doesn't work, ask the SAS group.

[In reply to f****7:]

f****7
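Posts: 398

Thanks, minner! I just read up on PROC REG and it looks very helpful. One issue: SAS PROC REG
selects predictors based on R-square, whereas I would prefer to select based on correlation.
Is there a way to connect the two?

[In reply to m****r:]

: The stepwise selection of variables in SAS PROC REG seems to follow a similar idea. You
: could look into that. If that doesn't work, ask the SAS group.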

s*r
Posts: 2757

proc reg won't do the addition that you have described.
Didn't you realize that libra was being a little sarcastic?

[In reply to f****7:]

: Thanks, minner! I just read up on PROC REG and it looks very helpful. One issue: SAS PROC REG
: selects predictors based on R-square, whereas I would prefer to select based on correlation.
: Is there a way to connect the two?

f****7
Posts: 398

Not really, but I don't care that much either. If someone can help, I'd really
appreciate it. If not, that's OK.

l***a
Posts: 12410

actually I was not being sarcastic...

[In reply to s*r:]

: proc reg won't do the addition that you have described.
: Didn't you realize that libra was being a little sarcastic?

b******1
Posts: 367

The square root of R-square is the correlation.
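
For a single predictor, the link asked about earlier in the thread is exact: the R-square of the least-squares fit y ~ a + b*x equals the squared Pearson correlation between x and y. The sketch below checks this numerically in Python. The helper names and the synthetic data are assumptions made for illustration. With several predictors, R-square is instead the squared correlation between y and the fitted values, so the two criteria rank candidates identically only when variables are considered one at a time.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_r_squared(x, y):
    """R-square of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: y depends linearly on x plus noise.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(100)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]

r = pearson(x, y)
r2 = simple_r_squared(x, y)
print(r * r, r2)  # the two numbers agree up to floating-point rounding
```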

f****7
Posts: 398

Thanks, berry321, but it seems R itself cannot be used as the
statistic for stepwise selection.

[In reply to b******1:]

: The square root of R-square is the correlation.

f****7
Posts: 398

That is what I thought. And thank you for replying to my post.

[In reply to l***a:]

: actually I was not being sarcastic...

D******n
Posts: 2836

I really don't know what you want to do with this algorithm. What is the purpose?
What's the rationale behind it?
But anyway, you can do this easily in R, definitely not in SAS.

[In reply to f****7:]

f****7
Posts: 398

Sorry if I confused you. In the end, I want to find a group of variables (
ideally <20) that best represents the correlations between the first 100
variables and the last variable.

[In reply to D******n:]

: I really don't know what you want to do with this algorithm. What is the purpose?
: What's the rationale behind it?
: But anyway, you can do this easily in R, definitely not in SAS.

c**d
Posts: 104

If you are not very familiar with R, SAS PROC GLMSELECT is a good, easy-to-use option.

[In reply to f****7:]

g********r
Posts: 8017

Is there really such an algorithm? It makes far less sense than stepwise regression. And stepwise
itself is long outdated. These days you'd be embarrassed to show your face without at least running
a lasso. Why are you still doing this?

[In reply to f****7:]

: Sorry if I confused you. In the end, I want to find a group of variables (
: ideally <20) that best represents the correlations between the first 100
: variables and the last variable.

D******n
Posts: 2836

why summation?

[In reply to f****7:]

: Sorry if I confused you. In the end, I want to find a group of variables (
: ideally <20) that best represents the correlations between the first 100
: variables and the last variable.

f****7
Posts: 398

Learned something new! Thank you very much!

[In reply to c**d:]

: If you are not very familiar with R, SAS PROC GLMSELECT is a good, easy-to-use option.

f****7
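Posts: 398

This algorithm is actually required because of the nature of the data: we don't want to use a
regression model to select predictors. It's a headache. I have too little experience and never
manage to handle things like this well.

[In reply to g********r:]

: Is there really such an algorithm? It makes far less sense than stepwise regression. And stepwise
: itself is long outdated. These days you'd be embarrassed to show your face without at least running
: a lasso. Why are you still doing this?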

o****o
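Posts: 8077

It looks like you want to screen for the main factors. Why not use a widely accepted method, such as lasso?

[In reply to f****7:]

: This algorithm is actually required because of the nature of the data: we don't want to use a
: regression model to select predictors. It's a headache. I have too little experience and never
: manage to handle things like this well.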

f****7
Posts: 398

Summation was used to calculate "conditional" correlations based on the
selected variables. Without it, we
cannot tell when the correlations stop increasing.

[In reply to D******n:]

: why summation?

g********r
Posts: 8017

What advantage does this summation have over using residuals?
The disadvantage is obvious. For example, y=x1-x2,

[In reply to f****7:]

: Summation was used to calculate "conditional" correlations based on the
: selected variables. Without it, we
: cannot tell when the correlations stop increasing.
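
The post above is cut off after y=x1-x2, so what follows is one reading of the objection, not necessarily the poster's full argument: if the target is a difference of two variables, summing them destroys the signal. A minimal Python sketch with synthetic data (the helper name `pearson` and the data are illustrative assumptions):

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: two independent predictors, target is their difference.
random.seed(2)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [a - b for a, b in zip(x1, x2)]

corr_x1 = pearson(x1, y)                                 # theoretical value 1/sqrt(2)
corr_sum = pearson([a + b for a, b in zip(x1, x2)], y)   # theoretical value 0
corr_diff = pearson([a - b for a, b in zip(x1, x2)], y)  # exactly y, so 1 up to rounding

print(round(corr_x1, 3), round(corr_sum, 3), round(corr_diff, 3))
```

Under the summation rule the search would stop after x1, because x1 + x2 is nearly uncorrelated with y, whereas a regression on x1 and x2 (or a second step run on the residuals) would recover the relationship exactly.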

l***a
Posts: 12410

how does sas do lasso?

[In reply to o****o:]

: It looks like you want to screen for the main factors. Why not use a widely accepted method, such as lasso?

s*r
Posts: 2757

I was trying to say something obvious:
when you see a '+' in a model statement, it does not always mean mathematical
addition.

[In reply to f****7:]

: Summation was used to calculate "conditional" correlations based on the
: selected variables. Without it, we
: cannot tell when the correlations stop increasing.

s*r
Posts: 2757

So lasso has become the main method.
I am so outdated.
Do you have an introductory document?

[In reply to o****o:]

: It looks like you want to screen for the main factors. Why not use a widely accepted method, such as lasso?

l***a
Posts: 12410

co-

[In reply to s*r:]

: So lasso has become the main method.
: I am so outdated.
: Do you have an introductory document?

o****o
Posts: 8077

Tweak PROC GLMSELECT.
You can also build your own SAS implementation.

[In reply to s*r:]

: So lasso has become the main method.
: I am so outdated.
: Do you have an introductory document?
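
For readers wondering what building your own lasso involves, here is a minimal coordinate-descent sketch, written in Python rather than SAS. It is not a reproduction of PROC GLMSELECT. The function names, the penalty value `lam = 0.3`, the fixed iteration count, and the synthetic data are all assumptions for illustration. It minimizes (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 on standardized columns.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit variance (assumes it is not constant)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on already standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual of the intercept-only model
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own effect.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical data: only x1 and x2 drive y; x3..x5 are pure noise.
random.seed(3)
n = 200
raw = {f"x{i}": [random.gauss(0, 1) for _ in range(n)] for i in range(1, 6)}
y = [2.0 * a - 1.5 * b + 0.1 * random.gauss(0, 1)
     for a, b in zip(raw["x1"], raw["x2"])]

names = list(raw)
beta = lasso_cd([standardize(raw[k]) for k in names], y, lam=0.3)
selected = [k for k, b in zip(names, beta) if b != 0.0]
print(selected, [round(b, 3) for b in beta])
```

The variables with nonzero coefficients are the selected set. In practice the penalty would be chosen by cross-validation or an information criterion rather than fixed by hand.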

f****7
Posts: 398

That's right, thanks for pointing that out.

[In reply to s*r:]

: I was trying to say something obvious:
: when you see a '+' in a model statement, it does not always mean mathematical
: addition.
