Page 6 - Roundup of discussions on regression - 话题女王

All topics - Topic: regression

l******i
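Posts: 1404

From thread: Quant board - Besides R-square, what else can be used to judge a regression model?

If the estimates come out with the wrong sign, the model itself has a problem.
Choosing predictors: if there are many predictors (watch out for multicollinearity), I suggest
stepwise model selection; if there are few, use all-subsets search.
Among all the models of interest, judge using adjusted R-square together with C(p) and PRESS(p). You can also bring in
AIC, BIC, SIC. The more criteria, the better.
But the only really good fix: get much more data, need sample size to be large enough.
Then all the tests based on asymptotic theory become meaningful, and the problem you describe of R-square being large one moment and small the next goes away.
If the results are still far off, drop regression and use a multifactor ANOVA model.
If that still fails, use a nonparametric approach, e.g. bootstrapping.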
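
A minimal sketch of how several of the criteria named above can be computed side by side, assuming you already have the residual and total sums of squares from a least-squares fit. The function name `fit_criteria` and its arguments are made up for illustration; AIC and BIC are written in their Gaussian-likelihood form with constants common to all models dropped, so only differences between models fitted to the same data are meaningful.

```python
import math

def fit_criteria(rss, tss, n, k):
    """Summary criteria for a least-squares fit (illustrative helper).

    rss: residual sum of squares
    tss: total sum of squares about the mean of y
    n:   sample size
    k:   number of fitted coefficients, intercept included
    """
    r2 = 1.0 - rss / tss
    # Adjusted R-square penalises extra coefficients
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Gaussian-likelihood forms, up to an additive constant
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}

# Same fit quality, one extra (useless) predictor: adjusted R-square drops
base = fit_criteria(rss=20.0, tss=100.0, n=50, k=3)
bigger = fit_criteria(rss=20.0, tss=100.0, n=50, k=4)
print(base["adj_r2"], bigger["adj_r2"])
```

Lower AIC/BIC and higher adjusted R-square point the same way here, but on real data they can disagree, which is one reason to look at more than one criterion.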

s********a
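Posts: 1100

From thread: Quant board - Besides R-square, what else can be used to judge a regression model?

mark
Reply to the OP:
Not every dataset is suitable for regression.
You might try a nonparametric approach instead.
The poster above put it very well.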

i*****r
Posts: 1302

From thread: Quant board - Besides R-square, what else can be used to judge a regression model?

The question is: is there an approach to detect early whether a model is working or
breaking down?
Regression is just an example.

l******i
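Posts: 1404

From thread: Quant board - Besides R-square, what else can be used to judge a regression model?

get much more data, need sample size to be large enough.
If you ignore time, i.e. all your data are independent, then regression or not,
all your models and tests rest on asymptotic theory. If your sample size is large enough and
you have done enough model checking, your model should stay valid, and you should not see it
suddenly break down.
If you do account for correlation over time, you need to keep updating the model's predictors and volatility w.r.t.
time. Try financial econometric models such as ARMA(1,1) and GARCH(1,1);
there are many more complex time series models that let many parts of your model depend on time.
What to do next depends on the specifics.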
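
For readers who have not seen GARCH(1,1), here is a rough sketch of its conditional-variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The parameters are taken as given purely for illustration; in practice they would be estimated (typically by maximum likelihood), which this sketch does not attempt. The function name and arguments are assumptions, not anything from the thread.

```python
def garch11_variance(returns, omega, alpha, beta, init_var):
    """Conditional variance path of a GARCH(1,1) model with known parameters.

    sigma2[t] uses only information up to t-1:
        sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    init_var is the assumed starting variance sigma2[0].
    """
    if not returns:
        return []
    sigma2 = [init_var]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# A large return at t=1 raises the variance forecast for t=2
path = garch11_variance([0.0, 1.0, 0.5], omega=0.1, alpha=0.2, beta=0.7, init_var=1.0)
print(path)
```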

z****g
Posts: 1978

From thread: Quant board - Where can I download a ready-made sparse regression package (C++, MATLAB)?

Sparse regression? If you mean that X is a sparse matrix, just find a sparse matrix
library.

t*******y
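Posts: 637

From thread: Quant board - A regression question

Least squares linear regression, assuming y=Xb+e
minimize e'e=(y-Xb)'(y-Xb)=y'y-y'Xb-b'X'y+b'X'Xb
How do I differentiate the last few terms with respect to b? I'm not clear on differentiating with respect to a vector.
Many thanks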
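
A worked version of the derivative asked about above, using the same prime notation for transposes and the standard vector-calculus rules (for a constant vector a and constant symmetric matrix A: the derivative of a'b with respect to b is a, and the derivative of b'Ab is 2Ab). It assumes X'X is invertible for the last step.

```latex
\begin{aligned}
S(b) &= y'y - y'Xb - b'X'y + b'X'Xb
      = y'y - 2\,b'X'y + b'X'Xb
      && \text{(since } y'Xb = b'X'y \text{ is a scalar)} \\
\frac{\partial S}{\partial b}
     &= -2\,X'y + 2\,X'Xb \\
\frac{\partial S}{\partial b} = 0
     &\;\Longrightarrow\; X'Xb = X'y
      \;\Longrightarrow\; \hat{b} = (X'X)^{-1}X'y
\end{aligned}
```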

p********1
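Posts: 1011

From thread: Quant board - What method should be used for Model Accuracy of logistic regression? Baozi as thanks

Presumably it isn't a matter of looking at pseudo R-square or Nagelkerke R-square?
What method should be used to tell whether this logistic regression model is actually any good?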

m**********4
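Posts: 774

From thread: Quant board - An interview question about ridge regression, plus my interview experience

Could an expert explain why old-school stepwise regression is wrong? I always thought such
greedy methods were quite practical in the real world.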

k***g
Posts: 7244

From thread: Quant board - data prediction by regression or better ways (repost)

Oh, I mixed up your k and b. Same thing: you can apply the same transformation to b. This is a
common trick in constrained linear regression.

b*****d
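Posts: 7166

From thread: Quant board - What are the applications of regression analysis in financial modelling?

I have an interview that says I'll need to do financial modeling in Excel using regression.
I don't know what exactly that refers to; pointers welcome.
Also, can Excel do nonlinear regression, and is nonlinear regression used in financial models?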

b*****d
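Posts: 7166

From thread: Quant board - A regression question: how to handle bad data (repost)

[The following is reposted from the Statistics board]
From: biokold (kold), Board: Statistics
Subject: A regression question: how to handle bad data
Posted at: BBS 未名空间站 (Fri May 16 02:40:37 2014, US Eastern)
I need to run a linear regression analysis. The data are stock prices recorded every 5 minutes over 10 years.
My questions:
1. How do I tell whether a data point is wrong (e.g. wildly off, negative, etc.)? Is there a general method?
2. How do I handle wrong data? Just drop it? Since I'm running a regression with, say, the past day's values as regressors,
I can't simply drop it. Should I replace the wrong value with a guess?
3. Is there a general way to introduce weights so that recent data count more? E.g. an exponential or a polynomial
function, which is more reasonable?
Thanks!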
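
One possible starting point for questions 1 and 3, offered as a sketch rather than an answer. It assumes exponential-decay weights parameterised by a half-life, and a robust screen that flags non-positive prices and log returns far from the median in MAD (median absolute deviation) units. The function names, the half-life parameterisation, and the threshold `k=10` are all illustrative choices, not recommendations from the thread.

```python
import math
from statistics import median

def exp_weights(n, half_life):
    """Weights for n observations, oldest first; the newest gets weight 1
    and the weight halves every `half_life` observations back in time."""
    decay = math.log(2) / half_life
    return [math.exp(-decay * (n - 1 - i)) for i in range(n)]

def flag_suspect_prices(prices, k=10.0):
    """Flag non-positive prices, and points whose log return from the
    previous price is more than k MADs from the median log return.
    A one-off spike is flagged twice: on the jump and on the reversal."""
    flags = [p <= 0 for p in prices]
    rets = {}
    for i in range(1, len(prices)):
        if prices[i] > 0 and prices[i - 1] > 0:
            rets[i] = math.log(prices[i] / prices[i - 1])
    if rets:
        med = median(rets.values())
        mad = median(abs(r - med) for r in rets.values())
        for i, r in rets.items():
            if mad > 0 and abs(r - med) > k * mad:
                flags[i] = True
    return flags

print(exp_weights(3, half_life=1))
print(flag_suspect_prices([100, 101, 100, 102, 500, 101, 100, -1, 100]))
```

Flagged points still need a decision (drop, interpolate, or carry the last good value forward); the sketch only identifies candidates.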

P*S
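Posts: 381

From thread: Quant board - Question on choosing risk factors in multivariate linear regression (repost)

The t-statistic matters more; the absolute size of a coefficient depends on the variable's units and is not the deciding factor.
If the t-stat is significant, in theory you should use them all.
The R square of the coefficient? Or the r-square of a separate regression?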

w**********y
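Posts: 1691

From thread: Quant board - Question on choosing risk factors in multivariate linear regression (repost)

Your question is too basic and explaining it would be a hassle; I'd probably have to take you through a lot of 101 and 102 material.
In short, the people who gave you this sound like they know what they're doing, and their description makes a lot of sense. Whatever you don't understand, just ask them to teach
you.
forward selection, stepwise
are the oldest ways of doing regression, and they have been shown to be wrong, or at best only an approximation.
Going into detail would take too much background. If you use these often, go take a couple of relevant courses in a statistics department.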

r*******g
Posts: 453

From thread: Quant board - Question on choosing risk factors in multivariate linear regression (repost)

May I ask: if not forward or stepwise, what should one use for regression?

w********h
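Posts: 17

From thread: Science board - Identity, Probit and Complementary log-log in Binary Regression

In fact, binary data regression has other links as well. The most commonly used are
identity: p=\beta_0+\beta_1 X
Probit: Probit(p)=\beta_0+\beta_1 X
and
Complementary log-log: log(-log(1-p))=\beta_0+\beta_1 X.
The interpretation of \beta_1 differs across links, and so do their uses, so take care to distinguish them. Identity is clearly
not a good link. Why? (Ronna, this is the first homework assignment from your senior in the PKU math department,
to keep you from flooding the PKU class board :)) The probit link is often used in bioassay data analysis, while
C-log-log is often used in the analysis of infectious disease data.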
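
A small sketch of the links mentioned above, written with Python's standard library only. The coefficient values in the last lines are invented for illustration; they show one well-known difficulty with the identity link, namely that the linear predictor is not constrained to [0, 1].

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def probit(p):
    # Inverse CDF of the standard normal distribution
    return NormalDist().inv_cdf(p)

def cloglog(p):
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

# logit and probit are symmetric around p = 0.5; cloglog is not
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3), round(cloglog(p), 3))

# Identity link with made-up coefficients: the "probability" leaves [0, 1]
beta0, beta1 = 0.2, 0.3
print(beta0 + beta1 * 5.0)
```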

q****k
Posts: 1023

From thread: Sociology board - Similar "freq count" statement in SPSS logistic regression?

I have no problem using SAS Proc Logistic for input data with an aggregate
"count" variable.
But in SPSS, for the same input data with a "count" variable, how do I get something
similar to the "freq count" statement for SPSS Logistic Regression?
Thanks!
Please refer to
http://support.sas.com/rnd/app/da/cat/samples/chapter8.html
data coronary;
input sex ecg ca count @@;
datalines;
0 0 0 11 0 0 1 4
0 1 0 10 0 1 1 8
1 0 0 9 1 0 1 9
1 1 0 6 1 1 1 21
;
run;
proc logistic des...

r********t
Posts: 41

From thread: Statistics board - convergence of linear regression with diverging number of predictors

Is there any result on the convergence of linear regression (or
likelihood-based methods) with a diverging number of predictors?
Say there are n observations yi and xi.
The dimension p of xi diverges with n, say p=p(n).
However, the response may depend only on the first 10 predictors.
Many thanks!

p********a
Posts: 5352

From thread: Statistics board - [Collection] A linear regression question

☆─────────────────────────────────────☆
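himalaya (Tea) wrote on (Fri May 16 22:20:31 2008):
I have a set of simulated data. It has 1000 rows, and each row has 10 columns: ID, X1, X2, X3,
I1, I2, I3, I4, I5, Y. X1, X2, X3 and Y are continuous; I1-I5 are indicator variables. The task is
to fit a linear regression for Y. The model I fitted performs poorly: R-square<0.1.
Could anyone offer some suggestions? Many thanks!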
☆─────────────────────────────────────☆
himalaya (Tea) wrote on (Fri May 16 22:24:20 2008):
A Box-Cox transformation of Y didn't help either.
☆─────────────────────────────────────☆
gysonny (Mushroom) wrote on (Fri May 16 23:26:14 200

s******e
Posts: 841

From thread: Statistics board - any regression model with high prediction accuracy?

Thank you for replying.
I am not a statistics major. It is an engineering problem. First of all I
want to reduce the prediction error as much as possible. That's why I wanted
to try the regression tree method. But it failed. My question is: can I find a
method that gives me a small prediction error? It does not matter if the
result is hard to interpret.

g*********n
Posts: 119

From thread: Statistics board - Questions about Logit Regression and Deviance.

The deviance is a G^2, compared with the saturated model, so a logistic
regression with only an intercept will have the largest deviance (>0), instead
of 0.
The deviance of a reduced model must be >= the deviance of the full model, so I
believe what you think is not true.

s***i
Posts: 49

From thread: Statistics board - Questions about Logit Regression and Deviance.

Can I use the "null deviance" in R as the deviance with 0 explanatory
variable? But how do I model a logit regression with 0 explanatory variable
in the first place, using GLM?
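
For what it's worth, in R an intercept-only logit can be fitted with `glm(y ~ 1, family = binomial)`. The sketch below shows the same quantity by hand for ungrouped 0/1 data, assuming the saturated model has log-likelihood 0 (each observation fitted perfectly), so the deviance is just -2 times the intercept-only log-likelihood. The function name is made up for illustration.

```python
import math

def intercept_only_logit(y):
    """Intercept-only logistic fit for ungrouped 0/1 data.

    Returns (b0, deviance). The MLE of P(Y=1) is the sample mean, and
    b0 is its logit. Assumes y contains both 0s and 1s.
    """
    n = len(y)
    p = sum(y) / n
    b0 = math.log(p / (1.0 - p))
    loglik = sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)
    # Saturated log-likelihood is 0 for ungrouped binary data
    return b0, -2.0 * loglik

b0, dev = intercept_only_logit([1, 0, 1, 1])
print(b0, dev)
```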

t******u
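Posts: 1

From thread: Statistics board - Which SAS command can run multivariate regression???

I've heard orthoreg can handle linear regression on data that are correlated, but the dependent variable probably isn't a binary variable...
Stepwise in an ordinary regression procedure such as reg seems to work too, but I'm not sure whether that is the multivariate
regression you mentioned...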

c**********e
Posts: 2007

From thread: Statistics board - How to the macro regression with if?

Suppose that I have a data set data1, with numerical variables
x, y, z. I would like to do regression y=x if a macro
variable is "a" and do z=x y if the macro variable is not
"a". The following does not work well. How to do it?
%macro regre(var);
%if "&var."="a" %then %do;
proc reg data=data1;
model y=x;
run;
%end;
%else %do;
proc reg data=data1;
model z=x y;
run;
%end;
%mend;
%regre(a);
%regre(b);

m*****8
Posts: 27

From thread: Statistics board - A question about the error assumption in linear regression

For simple linear regression, y(i)=E(Y|X=x(i))+e(i).
One of the assumptions concerning the errors is
E(e(i)|x(i))=0, so if we draw a scatterplot of the e(i) versus x(i), we would
have a null scatterplot, with no patterns.
My question is why we make such an assumption. Does E(e(i)|x(i))=0 mean that e(i) and x(i) have no correlation?
If so, how is that derived? Thanks!
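
A short worked derivation for the question above, treating x as random and using the law of iterated expectations. It shows that the conditional-mean assumption implies zero covariance; the converse does not hold, and the assumption still allows other forms of dependence (for example, Var(e|x) may vary with x).

```latex
\begin{aligned}
E[e] &= E\big[\,E(e \mid x)\,\big] = E[0] = 0, \\
E[x\,e] &= E\big[\,x\,E(e \mid x)\,\big] = E[x \cdot 0] = 0, \\
\operatorname{Cov}(x, e) &= E[x\,e] - E[x]\,E[e] = 0 - E[x]\cdot 0 = 0 .
\end{aligned}
```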

w********e
Posts: 944

From thread: Statistics board - A question about the error assumption in linear regression

In simple linear regression, the predictor X is considered a constant.
For any level of X, X(i), the response variable Y(i) is a random variable
with a mean that depends on X(i).

h***i
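Posts: 3844

From thread: Statistics board - A question about the error assumption in linear regression

The OLS estimator is not necessarily the MLE; it just uses the method of moments. It is not a likelihood-based
method.
It uses no distribution assumption at all.
Of course, under the normal assumption it is certainly the MLE.
If you still have doubts, look at Applied Linear Regression, 3rd edition, by Sanford
Weisberg,
Chapter 2, section 2.4, page 27, 3rd paragraph.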

s********s
Posts: 8

From thread: Statistics board - A question about the error assumption in linear regression

This is the weakest assumption for linear regression, which
basically says the estimator should be an unbiased estimator
for E(Y|X=x(i)), the conditional mean.
For your second question, no, this assumption doesn't say
there's no between e(i) and x(i). We can have a variance matrix
for e(i)'s which is related to x(i)'s.
Hope the above helps. Not necessarily this is the only answer and correct.

s*****n
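Posts: 2174

From thread: Statistics board - How to verify whether known logistic regression models predict one's own dataset well

What you are asking is, in essence, model diagnostics for the
logistic regression model. That is itself a hard problem.
What you should think about is grouping X, not grouping Y.
There are two cases.
(1) Logistic model with replication
Under the same X there are multiple observations of Y (0/1). This case is easier.
You can compare
Y hat = Model(X)
Y empirical = #1/(#1+#0) under X
and then compute the correlation or something similar.
(2) Logistic model without replication
Under the same X there is only one observation of Y, either 0 or 1. In this
case you have to rely on an extra assumption. The usual one is continuity of the
model, i.e. similar X implies similar P(Y=1).
You then cluster all the observations by X,
treat each cluster as replication, and compute the empirical rate and
predicted rate as in the first case.
But if the dimension of X is fairly high,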
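
Case (1) above can be sketched roughly as follows, assuming a single discrete X with replication. `predict` stands in for whatever fitted model is being checked; it is a hypothetical callable, not something from the thread, and the coefficients in the usage lines are invented.

```python
import math
from collections import defaultdict

def empirical_vs_predicted(xs, ys, predict):
    """Group 0/1 outcomes by X and compare the empirical rate
    #1/(#1+#0) in each group with the model's predicted probability.

    Returns rows of (x, group size, empirical rate, predicted rate).
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    rows = []
    for x in sorted(groups):
        obs = groups[x]
        rows.append((x, len(obs), sum(obs) / len(obs), predict(x)))
    return rows

# Hypothetical fitted model with made-up coefficients
model = lambda x: 1.0 / (1.0 + math.exp(-(-0.2 + 0.9 * x)))
for row in empirical_vs_predicted([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], model):
    print(row)
```

For case (2), the same function could be applied after replacing each X by a cluster label, under the continuity assumption described in the post.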

s*******9
Posts: 35

From thread: Statistics board - How to verify whether known logistic regression models predict one's own dataset well

I think Stata has very simple commands to help you get the ROC and the
cutoff point.
After running your logistic regression model, simply run 'lroc' and you will
get a nice ROC in Stata.
After that you can run 'roctab youroutcome p, detail' to get a series of
cutoff points.

u******3
Posts: 11

From thread: Statistics board - A question related to regression analysis

I have one dependent variable and several independent variables, and I developed a
regression model for them. The problem is that some of the independent
variables might be correlated with each other. Hence, I think it is necessary to isolate
the effect of this correlation, or I will overstate the impact of these
independent variables on the dependent variable. What is the statistical
tool to isolate or test the correlation among independent variables? Sorry,
but statistics is not really my area.

y****2
Posts: 34

From thread: Statistics board - A question related to regression analysis

You may try to use principal component regression, but remember that the
interpretation could be fairly complicated. Good luck.
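
Before reaching for principal components, a common first check is the pairwise correlation between predictors and the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1/(1 - r^2); with more predictors, r^2 is replaced by the R-square from regressing each predictor on all the others. Function names are illustrative.

```python
import math

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Nearly collinear predictors give a large VIF; uncorrelated ones give 1
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]))
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))
```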

c*********t
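Posts: 340

From thread: Statistics board - The difference between lrm and glm for logistic regression in R

Thanks to the poster above for the help :)
I know it's there in summary,
but I want to write the P value from each run into a file.
I need to run more than six thousand logistic regressions
and don't want to look at them one by one with SUMMARY....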

S******y
Posts: 1123

From thread: Statistics board - Stochastic Gradient Ascent for logistic regression in R -- Convergence problem !

Hi. guys,
I am trying to write my own Stochastic Gradient Ascent for logistic
regression in R. But it seems that I am having convergence problem.
Am I doing anything wrong, or just the data is off?
Here is my code in R -
lbw <-
read.table("http://www.biostat.jhsph.edu/~ririzarr/Teaching/754/lbw.dat"
, header=TRUE)
attach(lbw)
lbw[1:2,]
low age lwt race smoke ptl ht ui ftv bwt
1 0 19 182 2 0 0 0 1 0 2523
2 0 33 155 3 0 0 0 0 3 2551
#-----R implementation of l
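
The poster's R implementation is cut off above, so what follows is not a fix for it. It is an independent minimal sketch in Python of stochastic gradient ascent on the logistic log-likelihood, with a decaying step size. One assumption worth flagging: columns on very different scales (such as `lwt` and `bwt` in the data shown) are a frequent cause of convergence trouble with plain SGA, so standardising features first is usually worth trying. The toy data below are made up.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sga_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient ascent for logistic regression.

    X: list of feature rows (no intercept column); y: list of 0/1 labels.
    Returns weights [intercept, w1, w2, ...].
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    idx = list(range(len(X)))
    for epoch in range(epochs):
        rng.shuffle(idx)
        step = lr / (1.0 + 0.01 * epoch)  # slowly decaying step size
        for i in idx:
            xi = [1.0] + list(X[i])
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            # Per-observation gradient of the log-likelihood: (y - p) * x
            for j in range(len(w)):
                w[j] += step * (y[i] - p) * xi[j]
    return w

# Toy, non-separable data: larger x tends to go with y = 1
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
print(sga_logistic(X, y))
```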

l*******l
Posts: 204

From thread: Statistics board - help on spline regression

Is anyone familiar with spline regression? I know nothing about it. I tried
Google and did not find any introductory material. Many thanks if you can
point me to some. The question I am trying to answer is how to find
the non-linear relationship between x and y and how this relationship
differs between two groups.

h******e
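Posts: 1791

From thread: Statistics board - A rather basic regression question, pointers please.

I'm analyzing a simple dataset, using linear regression to study food consumption in Europe, with meat
as the response and the other food categories as predictors. In the data, Eastern Europe's meat consumption is clearly lower than Western
Europe's. If I take Eastern Europe as the baseline and enter Western Europe into the model as the dummy variable of a categorical variable,
the parameter for Western Europe is negative; if I take Western Europe as the baseline and enter Eastern Europe the same way,
the parameter for Eastern Europe is positive. How should I interpret this model? Thanks.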

z*********o
Posts: 541

From thread: Statistics board - A rather basic regression question, pointers please.

logistic regression ?

a*****8
Posts: 110

From thread: Statistics board - Applied Logistic Regression

Does anyone have Hosmer & Lemeshow's Applied Logistic Regression?
Many thanks!

y*****u
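Posts: 224

From thread: Statistics board - A regression question

From a set of samples (x,y,z), I fit the plane z = ax+by+c by regression;
we know the estimates ~a (of a) and ~b (of b) are normally distributed.
What is the distribution of (~a/~b)? What are its parameters? Which paper or book analyzes this in detail?
Thanks in advance!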
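
A partial pointer rather than a full answer. The exact distribution of a ratio of two jointly normal variables is heavy-tailed and in general has no finite mean or variance (for two independent zero-mean normals it is a Cauchy distribution). When b is many standard errors away from zero, a first-order delta-method approximation is often used instead. In the sketch below, the variances and covariance of the two estimates are written as sigma_a^2, sigma_b^2 and sigma_ab; this notation is introduced here and is not from the post.

```latex
\frac{\tilde a}{\tilde b} \;\approx\;
N\!\left(\frac{a}{b},\;
\left(\frac{a}{b}\right)^{2}
\left[\frac{\sigma_a^{2}}{a^{2}} + \frac{\sigma_b^{2}}{b^{2}}
      - \frac{2\,\sigma_{ab}}{a\,b}\right]\right)
\qquad \text{when } |b| \gg \sigma_b .
```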

c*********t
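Posts: 592

From thread: Statistics board - [Newbie needs help] How do I output logistic regression results?

In SAS,
for an ordinary logistic regression analysis,
I want to export the point estimate and 95% confidence interval from the output into a file or
table.
How should I do that?
Can I just use output out=...?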

o******6
Posts: 538

From thread: Statistics board - [Collection] passing-bablok and deming regression

☆─────────────────────────────────────☆
jjoobbb (天上掉馅饼) wrote on (Tue Feb 5 18:09:32 2008):
Does anyone know passing-bablok and deming regressions, and where I can find related papers? I found
two articles in clinical chemistry and biomedical journals but can't access either. Could some kind soul send a couple of
papers to my mailbox? Thanks.
☆─────────────────────────────────────☆
cinsug (Lucky Lu) wrote on (Tue Feb 5 22:28:07 2008):
I did PBR in SAS for equivalence tests of chemical assays. There are 3 papers
written by them. If you really need them I can send you a copy by mail.

y******0
Posts: 401

From thread: Statistics board - A few questions about logistic regression models

1. Overfitting.
2. For the missing values in A or B: check the missing ratio. If the ratio
is more than 50%, maybe you should drop the variable. Otherwise, create indicators
for the missing values and use the indicators in the model as input
variables as well. Impute the missing values using mean, median, regression,
or multiple imputation methods, depending on the data structure.
It is hard to find the 'best' imputation method, but you have to try.

s****l
Posts: 129

From thread: Statistics board - Looking for book recommendations on Logistic Regression

Which book on logistic regression is the classic?
Thanks!

d**n
Posts: 23

From thread: Statistics board - Looking for book recommendations on Logistic Regression

http://www.amazon.com/Applied-logistic-regression-probability-statistics/dp/0471356328

h**h
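Posts: 488

From thread: Statistics board - Please recommend a regression book with SAS examples.

The regression I studied before used a fairly theoretical textbook. Now I'd like one with SAS examples that still explains the theory.
Recommendations, please.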

h**h
Posts: 488

From thread: Statistics board - Please recommend a regression book with SAS examples.

Thanks. Open for comments.
Some text books are classic, really enjoy reading them. However my
regression book was not.

a*****n
Posts: 5158

From thread: Statistics board - How to compute the probability of fit for a linear regression

I have a set of data:
X(i),error_X(i), Y(i),error_Y(i), coefficient of error_X(i) and error_Y(i)
I want to do a least squares linear regression, assuming the errors are normally distributed.
How do I compute the probability of fit?

f******r
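Posts: 124

From thread: Statistics board - Looking for how to run multilevel logistic regression in SPSS. Thanks.

I need to work out an association problem. The dependent variable is 0 and 1. Independent variables
are continuous and categorical variables. Intraclass correlation is present.
The literature says to use multilevel hierarchical logistic regression. Which option is that
in SPSS? I've struggled all afternoon with no result. Could the statistics experts on the board please help? Thanks.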

g******7
Posts: 19

From thread: Statistics board - Looking for how to run multilevel logistic regression in SPSS. Thanks.

I really don't know whether SPSS can now do multilevel logistic regression, at least from
the "True" multilevel perspective. If no random effects need to be specified,
you may try the procedures under "generalized linear models", in which you
should pay attention to how to specify the "nested" nature of the data. The
"mixed" procedure in SPSS can only deal with a continuous DV.
Please share if you figure out a way to do it in SPSS.
Good luck.

x*******u
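Posts: 500

From thread: Statistics board - Linear regression model question

The linear regression model has a normality requirement on y. Does it have any requirement on x? If x is
right-skewed, must it be transformed or turned into a categorical variable? Experts, please advise.
Thanks.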
