s******e posts: 841 | 1 Can anybody recommend some good methods for classifying high-
dimensional data that might give relatively smaller prediction errors
compared with logistic regression?
Thanks |
A*****s posts: 13748 | 2 try PCA first to reduce the dimension
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
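The PCA-then-classify idea suggested above can be sketched as follows. This is a minimal illustration on synthetic data (not the poster's data), assuming scikit-learn is available; the component count 20 is an arbitrary choice for the example:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data: 200 features, only 10 informative
X, y = make_classification(n_samples=600, n_features=200,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Project onto the top 20 principal components, then fit logistic regression
model = make_pipeline(PCA(n_components=20),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

Whether this helps depends entirely on whether the class-separating directions carry enough variance to survive the projection, which is exactly what the thread goes on to argue about.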
h***i posts: 3844 | 3 Recommend something more creative;
this one is too classic.
【Quoting A*****s】 : try PCA first to reduce the dimension
|
s******e posts: 841 | 4 Actually, I tried that; it does not help at all.
【Quoting A*****s】 : try PCA first to reduce the dimension
|
A*****s posts: 13748 | 5 then Neural Networks...
【Quoting s******e】 : Actually, I tried that; it does not help at all.
|
A*****s posts: 13748 | 6 do you want creativity or a solution?
【Quoting h***i】 : Recommend something more creative; : this one is too classic.
|
c**********2 posts: 144 | |
c*******n posts: 718 | 8 Distance Weighted Discrimination
http://scholar.google.com/scholar?rlz=1C1GGLS_enUS291US304&sourceid=chrome&q=distance%20weighted%20discrimination&um=1&ie=UTF-8&sa=N&tab=ws
Software: http://www.unc.edu/~marron/marron_software.html
btw, the old-fashioned way is SVM.
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
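The DWD software linked above is MATLAB; as a sketch of the "old-fashioned" SVM baseline mentioned in the same post, assuming scikit-learn and using synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=100,
                           n_informative=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A linear kernel is the usual starting point when the number of
# features is large relative to the number of samples
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"linear SVM test accuracy: {acc:.3f}")
```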
c*******n posts: 718 | 9 PC analysis has nothing to do with classification.
Actually, I can give an example where classification screws up after PCA removes the more
important dimensions (even though they have smaller variation).
【Quoting s******e】 : Actually, I tried that; it does not help at all.
|
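The failure mode described above is easy to construct: put all the class information in a low-variance direction and all the variance in an uninformative one. A NumPy sketch (synthetic, illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)

# Coordinate 1: huge variance, no class information
noise = rng.normal(0.0, 10.0, n)
# Coordinate 2: small variance, but it is what separates the classes
signal = rng.normal(0.0, 0.5, n) + 2.0 * y
X = np.column_stack([noise, signal])

# First principal component of the centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# PC1 is dominated by the uninformative high-variance coordinate,
# so projecting onto PC1 alone throws away the class-separating direction
print("PC1 loadings:", pc1)
```

Here keeping only PC1 discards essentially all of the label information, exactly the scenario the post warns about.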
o****o posts: 8077 | 10 you are right
try some methods based on canonical correlation measurements, such as PLS
for very high-dimensional classification, try some algorithms from the Text Mining
literature
but I am not aware of any method that can beat logistic regression in error
rate universally; it is really case dependent
【Quoting c*******n】 : PC analysis has nothing to do with classification. : Actually, I can give an example where classification screws up after PCA removes the more : important dimensions (even though they have smaller variation).
|
s******e posts: 841 | 11 I also tried this one; it could give a better fit to the data, but the
prediction performance is much worse.
【Quoting A*****s】 : then Neural Networks...
|
o****o posts: 8077 | 12 why is logistic regression not allowed?
have you tried k-NN?
【Quoting s******e】 : I also tried this one; it could give a better fit to the data, but the : prediction performance is much worse.
|
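For reference, the k-NN suggestion is a one-liner with scikit-learn; scaling matters for any distance-based method, and k is the main tuning knob. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=50,
                           n_informative=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Standardize first: raw Euclidean distance is dominated by
# whichever features happen to have the largest scale
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
knn.fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
print(f"k-NN test accuracy: {acc:.3f}")
```

The original poster's concern is legitimate: in very high dimensions, distances concentrate and nearest neighbors become less meaningful, so k-NN typically needs dimension reduction first.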
s******e posts: 841 | 13 the misclassification rate of logistic regression is a little bit high; I
want to see if any other method can give a better result.
I have not tried k-NN because of the high dimension.
【Quoting o****o】 : why is logistic regression not allowed? : have you tried k-NN?
|
o****o posts: 8077 | 14 a well-built logistic regression can have an error rate as low as any other
popular method, if not lower. I once compared Neural Networks vs. logistic
regression in different scenarios, and a well-built logistic regression stood
out on error rate in both the training sample and the validation sample.
try some boosting methods, too. If your data is not very noisy, boosting
helps to overcome nonlinearity not handled by your primary model.
for very high-dimensional data, methods such as Boltzmann Machines and B
【Quoting s******e】 : the misclassification rate of logistic regression is a little bit high; I : want to see if any other method can give a better result. : I have not tried k-NN because of the high dimension.
|
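The boosting suggestion above, using trees as base learners to pick up nonlinearities a linear logistic model misses, can be sketched with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=100,
                           n_informative=10, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Shallow trees (max_depth=2) as weak learners; the number of stages
# and the learning rate trade off against each other
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=2, random_state=4)
gbm.fit(X_tr, y_tr)
acc = gbm.score(X_te, y_te)
print(f"boosting test accuracy: {acc:.3f}")
```

Consistent with the post's caveat, boosting can overfit noisy data, so the number of stages is usually chosen on a validation set.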
l********s posts: 430 | 15 you could develop some new method relating to the lasso ^_^ |
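Short of developing a new method, an off-the-shelf lasso-style approach is L1-penalized logistic regression, which drives most coefficients exactly to zero and so does variable selection and fitting in one step. A sketch assuming scikit-learn (synthetic data; C=0.1 is an arbitrary penalty strength for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=200,
                           n_informative=10, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# L1 penalty zeroes out most coefficients; smaller C = stronger penalty
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X_tr, y_tr)
n_used = int(np.count_nonzero(lasso_lr.coef_))
acc = lasso_lr.score(X_te, y_te)
print(f"{n_used} of {X.shape[1]} features kept; accuracy {acc:.3f}")
```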
c*******n posts: 718 | 16 it is data dependent
I am thinking that you should first draw a scatter plot along the first and
second PC directions and see how the two classes are distributed
【Quoting s******e】 : the misclassification rate of logistic regression is a little bit high; I : want to see if any other method can give a better result. : I have not tried k-NN because of the high dimension.
|
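The diagnostic suggested above only needs the 2-D projection; a sketch assuming scikit-learn (hand the two columns of Z, colored by class label, to any plotting tool):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=300, n_features=80,
                           n_informative=8, random_state=6)

# Project onto the first two PCs; scatter Z[:, 0] vs Z[:, 1],
# colored by y, to eyeball how the two classes are distributed
Z = PCA(n_components=2).fit_transform(X)
print("projection shape:", Z.shape)
```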
A*****s posts: 13748 | 17 if the first two PCs really count, the reduction will make the classification
effective. Maybe the first PCs count very little...
【Quoting c*******n】 : it is data dependent : I am thinking that you should first draw a scatter plot along the first and : second PC directions and see how the two classes are distributed
|
g********r posts: 8017 | 18 How high is the dimension? If really high, like >1000, some reduction/
feature selection must be done. Could be PCA, could be feature selection by
first/second-order testing.
If moderately high, 50~1000, you might be able to use LASSO-based methods to
simultaneously reduce dimension and fit the model.
If not really high, ~50, you could try SVM/boosting/random forests/... directly.
Look at the Hastie/Tibshirani book.
The above is hearsay.
【Quoting s******e】 : Can anybody recommend some good methods for classifying high-dimensional data that might give relatively smaller prediction errors : compared with logistic regression? : Thanks
|
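The first branch above (">1000 features: filter first") is commonly done with univariate testing. A sketch assuming scikit-learn, keeping the features with the strongest per-feature F-test before fitting logistic regression (k=50 is an arbitrary choice for the example):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=600, n_features=2000,
                           n_informative=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# Univariate F-test screening, then fit the model on the survivors;
# doing both inside one pipeline keeps the selection out of the test set
model = make_pipeline(SelectKBest(f_classif, k=50),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"filter + logistic regression test accuracy: {acc:.3f}")
```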