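This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS). Posts less than a week old show at most 50 characters; posts older than a week show up to 500 characters.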
Statistics board - Good classification methods for high-dimensional data
s******e
Posts: 841
1
Can anybody recommend some good classification methods for high-dimensional
data that might give smaller prediction errors than logistic regression?
Thanks
A*****s
Posts: 13748
2
Try PCA first to reduce the dimension.

[Quoting s******e's post]
: Can anybody recommend some good methods to do classification for high
: dimension data, which might have relatively smaller prediction errors
: compared with logistic regression
: Thanks
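A minimal sketch of the PCA suggestion, assuming scikit-learn and a synthetic dataset from `make_classification` standing in for the poster's data (which is not given). It compares the cross-validated misclassification rate of plain logistic regression with PCA followed by logistic regression; keeping 10 components is an arbitrary assumption that would need tuning.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 500 features (an assumption).
X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           random_state=0)

def cv_error(model, X, y, folds=5):
    """Cross-validated misclassification rate (1 - mean accuracy)."""
    return 1.0 - cross_val_score(model, X, y, cv=folds).mean()

# Baseline: standardize, then logistic regression on all features.
plain = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
# PCA sits inside the pipeline, so each CV fold learns its own components.
pca_lr = make_pipeline(StandardScaler(), PCA(n_components=10),
                       LogisticRegression(max_iter=5000))

err_plain = cv_error(plain, X, y)
err_pca = cv_error(pca_lr, X, y)
print(f"plain logistic: {err_plain:.3f}, PCA(10) + logistic: {err_pca:.3f}")
```

Putting PCA inside the pipeline matters: fitting it on the full data before cross-validation would let the validation folds influence the components.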

h***i
Posts: 3844
3
Please recommend something more creative;
this one is too classic.

[Quoting A*****s's post]
: try PCA first to reduce dimension
s******e
Posts: 841
4
Actually, I tried it; it does not help at all.

[Quoting A*****s's post]
: try PCA first to reduce dimension
A*****s
Posts: 13748
5
Then try a neural network...

[Quoting s******e's post]
: actually, I tried, it does not help at all
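A hedged sketch of a small neural network classifier, assuming scikit-learn's `MLPClassifier` and synthetic data. The strong L2 penalty (`alpha`) and `early_stopping` are included because high-dimensional data with few samples overfit easily; the layer size and `alpha` value are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=200, n_informative=10,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1,
                                          stratify=y)

net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,),  # one small hidden layer
                  alpha=1.0,                 # strong L2 penalty on the weights
                  early_stopping=True,       # stop when an internal 10% validation split stops improving
                  max_iter=2000,
                  random_state=1),
)
net.fit(X_tr, y_tr)
test_error = 1.0 - net.score(X_te, y_te)
print(f"held-out misclassification rate: {test_error:.3f}")
```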
A*****s
Posts: 13748
6
Do you want creativity or a solution?

[Quoting h***i's post]
: Please recommend something more creative;
: this one is too classic.

c**********2
Posts: 144
7
Decision trees.
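A single tree is quick to try. This sketch assumes scikit-learn and synthetic data, and picks `max_depth` by cross-validated grid search rather than growing a full tree (the depth grid is an assumption). Single trees tend to have high variance, so their error can move a lot between folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           random_state=2)

# Choose the tree depth by 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=2),
                      param_grid={"max_depth": [2, 3, 5, 8, None]},
                      cv=5)
search.fit(X, y)
best_depth = search.best_params_["max_depth"]
tree_cv_error = 1.0 - search.best_score_
print(f"best max_depth={best_depth}, CV misclassification rate={tree_cv_error:.3f}")
```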
c*******n
Posts: 718
8
Distance Weighted Discrimination
http://scholar.google.com/scholar?rlz=1C1GGLS_enUS291US304&sourceid=chrome&q=distance%20weighted%20discrimination&um=1&ie=UTF-8&sa=N&tab=ws
Software: http://www.unc.edu/~marron/marron_software.html
By the way, the old-fashioned way is SVM.

[Quoting s******e's post]
: Can anybody recommend some good methods to do classification for high
: dimension data, which might have relatively smaller prediction errors
: compared with logistic regression
: Thanks
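DWD itself is available from the linked software page. As a sketch of the "old-fashioned" SVM alternative only, here is a linear SVM in scikit-learn with the cost parameter `C` tuned by cross-validation; the data are synthetic and the grid values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# More features than samples, the setting the thread is about.
X, y = make_classification(n_samples=200, n_features=1000, n_informative=10,
                           random_state=3)

# dual=True is the natural choice when n_features > n_samples.
svm = make_pipeline(StandardScaler(), LinearSVC(dual=True, max_iter=20000))
# make_pipeline names steps after the lowercased class name, hence "linearsvc__C".
search = GridSearchCV(svm, {"linearsvc__C": [0.001, 0.01, 0.1, 1.0]}, cv=5)
search.fit(X, y)
svm_cv_error = 1.0 - search.best_score_
print(search.best_params_, f"CV misclassification rate={svm_cv_error:.3f}")
```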

c*******n
Posts: 718
9
PCA has nothing to do with classification as such.
I can actually give an example where it breaks down because PCA removes the
more important dimensions (even though they have smaller variance).

[Quoting s******e's post]
: actually, I tried, it does not help at all
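A numpy-only toy illustration of this point. It is our own construction, not the poster's example, which is not given. One feature has large variance but carries no class information; the other has small variance but separates the classes almost perfectly. Keeping only the first principal component keeps the wrong one. The classifier here is a simple nearest-class-mean rule on a 1-D score, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)                  # class labels 0/1
noise = rng.normal(0.0, 10.0, size=n)           # high variance, no class information
signal = np.where(y == 1, 1.0, -1.0) + rng.normal(0.0, 0.3, size=n)  # low variance, separates classes
X = np.column_stack([noise, signal])

# PCA via SVD of the centered data; rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                                # scores on the first PC only

def nearest_mean_accuracy(z, y):
    """Assign each 1-D score to the closer class mean; return training accuracy."""
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    pred = (np.abs(z - m1) < np.abs(z - m0)).astype(int)
    return (pred == y).mean()

acc_pc1 = nearest_mean_accuracy(pc1, y)
acc_signal = nearest_mean_accuracy(Xc[:, 1], y)
print(f"first PC only: {acc_pc1:.2f}; low-variance feature: {acc_signal:.2f}")
```

The first PC aligns with the noisy high-variance feature, so its accuracy sits near chance, while the discarded low-variance direction classifies almost perfectly.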
o****o
Posts: 8077
10
You are right.
Try some method based on canonical correlation measures, such as PLS.
For very high-dimensional classification, try some algorithms from the text
mining literature.
But I am not aware of any method that universally beats logistic regression in
error rate; it is really case dependent.

[Quoting c*******n's post]
: PC analysis had nothing to do with classification.
: Actually I can give an example where it screws up after PCA removes the more
: important dimensions (even though with smaller variation)

s******e
Posts: 841
11
I also tried this one. It gives a better fit to the training data, but its
predictive performance is much worse.

[Quoting A*****s's post]
: then Neural Network...
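"Better fit, worse prediction" is the classic sign of overfitting. A sketch of a quick check, assuming scikit-learn: compare training error with cross-validated error for a weakly and a strongly penalized network. The helper name, the data, and both `alpha` values are illustrative assumptions.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=150, n_features=300, n_informative=5,
                           random_state=5)

def train_vs_cv_error(model, X, y, folds=5):
    """Return (training error, cross-validated error); a large gap signals overfitting."""
    train_err = 1.0 - clone(model).fit(X, y).score(X, y)
    cv_err = 1.0 - cross_val_score(model, X, y, cv=folds).mean()
    return train_err, cv_err

results = {}
for alpha in (1e-5, 10.0):  # weak vs strong L2 penalty (arbitrary values)
    net = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(64,), alpha=alpha,
                                      max_iter=1000, random_state=5))
    results[alpha] = train_vs_cv_error(net, X, y)
    tr, cv = results[alpha]
    print(f"alpha={alpha:g}: train error {tr:.3f}, CV error {cv:.3f}")
```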
o****o
发帖数: 8077
12
Why is logistic regression not acceptable?
Have you tried k-NN?

[Quoting s******e's post]
: I also tried this one, it could give a better fit to the data, but the
: prediction property is much worse.

s******e
Posts: 841
13
The misclassification rate of logistic regression is a little high, so I
want to see whether any other method can do better.
I have not tried k-NN because of the high dimension.

[Quoting o****o's post]
: why logistic regression is not allowed?
: have u tried k-NN?
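The worry about k-NN in high dimensions is reasonable: pairwise distances tend to concentrate as the dimension grows, so k-NN on raw features often does poorly, and reducing or selecting features first usually helps. If you do try it, a minimal sketch assuming scikit-learn: standardize first (k-NN is distance-based) and pick `k` by cross-validation. The data and the `k` grid are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                           random_state=6)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(knn, {"kneighborsclassifier__n_neighbors": [1, 5, 15, 31]},
                      cv=5)
search.fit(X, y)
best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
knn_cv_error = 1.0 - search.best_score_
print(f"best k={best_k}, CV misclassification rate={knn_cv_error:.3f}")
```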

o****o
Posts: 8077
14
A well-built logistic regression can have an error rate as low as any other
popular method, if not lower. I once compared neural networks and logistic
regression in different scenarios, and a well-built logistic regression stood
out on error rate in both the training sample and the validation sample.
Try some boosting methods, too. If your data are not very noisy, boosting
helps to capture nonlinearity not handled by your primary model.
For very high-dimensional data, methods such as Boltzmann Machines and B

[Quoting s******e's post]
: the misclassification rate of logistic regression is a little bit high, I
: want to see if any other method can give a good result.
: I have not tried k-NN, because of the high dimension
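A sketch of the comparison suggested here, assuming scikit-learn and synthetic data. A cross-validated L2-penalized logistic regression (`LogisticRegressionCV`) plays the "well-built" baseline, against gradient boosting with shallow trees. The hyperparameters are untuned assumptions, and which model wins depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           random_state=7)

models = {
    # The inner 5-fold CV picks the L2 penalty strength from 10 candidates.
    "tuned logistic": make_pipeline(StandardScaler(),
                                    LogisticRegressionCV(Cs=10, cv=5, max_iter=5000)),
    # Shallow trees with a small learning rate.
    "gradient boosting": GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                                    max_depth=2, random_state=7),
}
errors = {name: 1.0 - cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, err in errors.items():
    print(f"{name}: CV misclassification rate {err:.3f}")
```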

l********s
Posts: 430
15
Develop some new method related to the lasso ^_^
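For reference, the standard lasso-type classifier is L1-penalized logistic regression, which fits the model and selects features at the same time. A sketch assuming scikit-learn's `LogisticRegressionCV` with `penalty="l1"` and the `liblinear` solver (`saga` also supports L1); the data and grid size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           n_redundant=0, random_state=8)
X = StandardScaler().fit_transform(X)  # the L1 penalty is scale-sensitive

# Cross-validate the penalty strength over 20 candidate C values.
lasso_lr = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                                max_iter=5000)
lasso_lr.fit(X, y)
selected = np.flatnonzero(lasso_lr.coef_[0])  # indices of features with nonzero weight
print(f"chosen C={lasso_lr.C_[0]:.4g}, {selected.size} of {X.shape[1]} features kept")
```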
c*******n
Posts: 718
16
It is data dependent.
I think you should first draw a scatter plot along the first and second PC
directions and see how the two classes are distributed.

[Quoting s******e's post]
: the misclassification rate of logistic regression is a little bit high, I
: want to see if any other method can give a good result.
: I have not tried k-NN, because of the high dimension
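A numpy-only sketch of that first look, assuming a feature matrix `X` and 0/1 labels `y` (simulated here). It computes each sample's scores on the first two principal components. Passing `scores[:, 0]` and `scores[:, 1]` to any scatter-plot function, colored by `y`, gives the plot described; the class centroids in PC space are printed as a quick numeric summary.

```python
import numpy as np

def pc_scores(X, k=2):
    """Scores of the rows of X on the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                     # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                        # shape (n_samples, k)

# Simulated data: 2 classes, 100 features; classes differ in the first 5 features.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 100))
X[y == 1, :5] += 2.0

scores = pc_scores(X)
for label in (0, 1):
    centroid = scores[y == label].mean(axis=0)
    print(f"class {label}: mean PC1 = {centroid[0]:.2f}, mean PC2 = {centroid[1]:.2f}")
```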

A*****s
Posts: 13748
17
If the first two PCs really matter, the reduction will make the classification
effective. Maybe the first PCs contribute very little...

[Quoting c*******n's post]
: is it data dependent
: I am thinking that you should first draw a scatter plot along the first and
: second PC directions and see how the two classes are distributed
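One way to check whether the leading PCs "count" for classification, sketched in numpy with a statistic of our own choosing: for every principal component, compute the absolute difference of class means of the scores divided by the pooled within-class SD, then see where the most separating component falls in the variance ordering. The simulation deliberately places the class signal in the lowest-variance feature.

```python
import numpy as np

def pc_class_separation(X, y):
    """Per-PC |mean difference| / pooled SD of the scores, in variance order."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T                               # scores on all PCs
    z0, z1 = Z[y == 0], Z[y == 1]
    pooled_sd = np.sqrt((z0.var(axis=0, ddof=1) + z1.var(axis=0, ddof=1)) / 2)
    return np.abs(z1.mean(axis=0) - z0.mean(axis=0)) / pooled_sd

rng = np.random.default_rng(2)
n, p = 200, 10
y = np.repeat([0, 1], n // 2)
sds = np.append(np.linspace(10.0, 4.0, p - 1), 1.0)  # last feature has the smallest SD
X = rng.normal(size=(n, p)) * sds
X[:, -1] += np.where(y == 1, 1.5, -1.5)              # class signal only in that feature

sep = pc_class_separation(X, y)
best_pc = int(np.argmax(sep))                        # 0-based index of the most separating PC
print(f"separation per PC: {np.round(sep, 2)}")
print(f"most separating PC: #{best_pc + 1} of {p}")
```

Here the most separating component is the last one, so a classifier built on the first few PCs would miss the signal.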

g********r
Posts: 8017
18
How high is the dimension? If it is really high, like >1000, some reduction or
feature selection must be done. That could be PCA, or feature selection by
first-/second-order testing.
If it is moderately high, 50 to 1000, you might be able to use LASSO-based
methods to reduce the dimension and fit the model simultaneously.
If it is not really high, ~50, you could try SVM/boosting/random forest/etc.
directly. See the Hastie/Tibshirani book.
All of the above is hearsay.

[Quoting s******e's post]
: Can anybody recommend some good methods to do classification for high
: dimension data, which might have relatively smaller prediction errors
: compared with logistic regression
: Thanks
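A rough sketch of this rule of thumb as code, assuming scikit-learn. The helper `choose_classifier` is hypothetical, and the cutoffs (1,000 and 50) come from the post. The specific estimators are our stand-ins: univariate F-test screening to 100 features for "feature selection by first-order testing", L1-penalized logistic regression for "LASSO-based methods", and a random forest for the low-dimensional case.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def choose_classifier(n_features):
    """Hypothetical helper mapping the post's dimension tiers to a model."""
    if n_features > 1000:
        # Screen with a univariate F-test inside the pipeline (CV-safe), then fit.
        return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=100),
                             LogisticRegression(max_iter=5000))
    if n_features >= 50:
        # Lasso-type: the L1 penalty selects features while fitting.
        return make_pipeline(StandardScaler(),
                             LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
    # Low dimension: fit a flexible model directly.
    return RandomForestClassifier(n_estimators=300, random_state=0)

for p in (20, 300, 2000):
    X, y = make_classification(n_samples=200, n_features=p, n_informative=10,
                               random_state=9)
    model = choose_classifier(p)
    err = 1.0 - cross_val_score(model, X, y, cv=5).mean()
    print(f"p={p}: {type(model).__name__}, CV misclassification rate {err:.3f}")
```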
