s******e posts: 841 | 1 Can anybody recommend some good methods for classifying high-
dimensional data that might give relatively smaller prediction errors
compared with logistic regression?
Thanks |
A*****s posts: 13748 | 2 try PCA first to reduce the dimension
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
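The PCA-then-classify idea suggested above can be sketched as follows. This is a minimal illustration on synthetic data (not the poster's data), assuming scikit-learn is available; the component count 20 is an arbitrary choice for the example:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data: 200 features, only 10 informative
X, y = make_classification(n_samples=600, n_features=200,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Project onto the top 20 principal components, then fit logistic regression
model = make_pipeline(PCA(n_components=20),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

Whether this helps depends entirely on whether the class-separating directions carry enough variance to survive the projection, which is exactly what the thread goes on to argue about.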
h***i posts: 3844 | 3 Recommend something more creative;
this one is too classic.
【Quoting A*****s】 : try PCA first to reduce the dimension
|
s******e posts: 841 | 4 Actually, I tried that; it does not help at all.
【Quoting A*****s】 : try PCA first to reduce the dimension
|
A*****s posts: 13748 | 5 then Neural Networks...
【Quoting s******e】 : Actually, I tried that; it does not help at all.
|
A*****s posts: 13748 | 6 do you want creativity or a solution?
【Quoting h***i】 : Recommend something more creative; : this one is too classic.
|
c**********2 posts: 144 | |
c*******n posts: 718 | 8 Distance Weighted Discrimination
http://scholar.google.com/scholar?rlz=1C1GGLS_enUS291US304&sourceid=chrome&q=distance%20weighted%20discrimination&um=1&ie=UTF-8&sa=N&tab=ws
Software: http://www.unc.edu/~marron/marron_software.html
btw, the old-fashioned way is SVM.
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
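The DWD software linked above is MATLAB; as a sketch of the "old-fashioned" SVM baseline mentioned in the same post, assuming scikit-learn and using synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=100,
                           n_informative=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A linear kernel is the usual starting point when the number of
# features is large relative to the number of samples
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"linear SVM test accuracy: {acc:.3f}")
```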
c*******n posts: 718 | 9 PC analysis has nothing to do with classification.
Actually, I can give an example where classification screws up after PCA removes the more
important dimensions (even though they have smaller variation).
【Quoting s******e】 : Actually, I tried that; it does not help at all.
|
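The failure mode described above is easy to construct: put all the class information in a low-variance direction and all the variance in an uninformative one. A NumPy sketch (synthetic, illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)

# Coordinate 1: huge variance, no class information
noise = rng.normal(0.0, 10.0, n)
# Coordinate 2: small variance, but it is what separates the classes
signal = rng.normal(0.0, 0.5, n) + 2.0 * y
X = np.column_stack([noise, signal])

# First principal component of the centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# PC1 is dominated by the uninformative high-variance coordinate,
# so projecting onto PC1 alone throws away the class-separating direction
print("PC1 loadings:", pc1)
```

Here keeping only PC1 discards essentially all of the label information, exactly the scenario the post warns about.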
o****o posts: 8077 | 10 you are right
try some methods based on canonical correlation measurements, such as PLS
for very high-dimensional classification, try some algorithms from the Text Mining
literature
but I am not aware of any method that can beat logistic regression in error
rate universally; it is really case dependent
【Quoting c*******n】 : PC analysis has nothing to do with classification. : Actually, I can give an example where classification screws up after PCA removes the more : important dimensions (even though they have smaller variation).
|
s******e posts: 841 | 11 I also tried this one; it could give a better fit to the data, but the
prediction performance is much worse.
【Quoting A*****s】 : then Neural Networks...
|
o****o posts: 8077 | 12 why is logistic regression not allowed?
have you tried k-NN?
【Quoting s******e】 : I also tried this one; it could give a better fit to the data, but the : prediction performance is much worse.
|
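For reference, the k-NN suggestion is a one-liner with scikit-learn; scaling matters for any distance-based method, and k is the main tuning knob. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=50,
                           n_informative=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Standardize first: raw Euclidean distance is dominated by
# whichever features happen to have the largest scale
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
knn.fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
print(f"k-NN test accuracy: {acc:.3f}")
```

The original poster's concern is legitimate: in very high dimensions, distances concentrate and nearest neighbors become less meaningful, so k-NN typically needs dimension reduction first.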
s******e posts: 841 | 13 the misclassification rate of logistic regression is a little bit high; I
want to see if any other method can give a better result.
I have not tried k-NN because of the high dimension.
【Quoting o****o】 : why is logistic regression not allowed? : have you tried k-NN?
|
o****o posts: 8077 | 14 a well-built logistic regression can have an error rate as low as any other
popular method, if not lower. I once compared Neural Networks vs. logistic
regression in different scenarios, and a well-built logistic regression stood
out on error rate in both the training sample and the validation sample.
try some boosting methods, too. If your data is not very noisy, boosting
helps to overcome nonlinearity not handled by your primary model.
for very high-dimensional data, methods such as Boltzmann Machines and B
【Quoting s******e】 : the misclassification rate of logistic regression is a little bit high; I : want to see if any other method can give a better result. : I have not tried k-NN because of the high dimension.
|
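The boosting suggestion above, using trees as base learners to pick up nonlinearities a linear logistic model misses, can be sketched with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=100,
                           n_informative=10, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Shallow trees (max_depth=2) as weak learners; the number of stages
# and the learning rate trade off against each other
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=2, random_state=4)
gbm.fit(X_tr, y_tr)
acc = gbm.score(X_te, y_te)
print(f"boosting test accuracy: {acc:.3f}")
```

Consistent with the post's caveat, boosting can overfit noisy data, so the number of stages is usually chosen on a validation set.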
l********s posts: 430 | 15 you could develop some new method relating to the lasso ^_^ |
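Short of developing a new method, an off-the-shelf lasso-style approach is L1-penalized logistic regression, which drives most coefficients exactly to zero and so does variable selection and fitting in one step. A sketch assuming scikit-learn (synthetic data; C=0.1 is an arbitrary penalty strength for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=200,
                           n_informative=10, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# L1 penalty zeroes out most coefficients; smaller C = stronger penalty
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X_tr, y_tr)
n_used = int(np.count_nonzero(lasso_lr.coef_))
acc = lasso_lr.score(X_te, y_te)
print(f"{n_used} of {X.shape[1]} features kept; accuracy {acc:.3f}")
```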
c*******n posts: 718 | 16 it is data dependent
I am thinking that you should first draw a scatter plot along the first and
second PC directions and see how the two classes are distributed
【Quoting s******e】 : the misclassification rate of logistic regression is a little bit high; I : want to see if any other method can give a better result. : I have not tried k-NN because of the high dimension.
|
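The diagnostic suggested above only needs the 2-D projection; a sketch assuming scikit-learn (hand the two columns of Z, colored by class label, to any plotting tool):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=300, n_features=80,
                           n_informative=8, random_state=6)

# Project onto the first two PCs; scatter Z[:, 0] vs Z[:, 1],
# colored by y, to eyeball how the two classes are distributed
Z = PCA(n_components=2).fit_transform(X)
print("projection shape:", Z.shape)
```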
A*****s posts: 13748 | 17 if the first two PCs really count, the reduction will make the classification
effective. Maybe the first PCs count very little...
【Quoting c*******n】 : it is data dependent : I am thinking that you should first draw a scatter plot along the first and : second PC directions and see how the two classes are distributed
|
g********r posts: 8017 | 18 How high is the dimension? If really high, like >1000, some reduction/
feature selection must be done. Could be PCA, could be feature selection by
first/second-order testing.
If moderately high, 50~1000, you might be able to use LASSO-based methods to
simultaneously reduce dimension and fit the model.
If not really high, ~50, you could try SVM/boosting/random forests/... directly.
Look at the Hastie/Tibshirani book.
The above is hearsay.
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
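The first branch above (">1000 features: filter first") is commonly done with univariate testing. A sketch assuming scikit-learn, keeping the features with the strongest per-feature F-test before fitting logistic regression (k=50 is an arbitrary choice for the example):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=600, n_features=2000,
                           n_informative=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# Univariate F-test screening, then fit the model on the survivors;
# doing both inside one pipeline keeps the selection out of the test set
model = make_pipeline(SelectKBest(f_classif, k=50),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"filter + logistic regression test accuracy: {acc:.3f}")
```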