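This page is an excerpt and archive of the corresponding 未名空间 (mitbbs) thread; posts less than a week old show at most 50 characters, older posts at most 500.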
DataSciences board - model selection problem
c********1
Posts: 60
#1
I just got a project: 11 features, 200 observations. The response variable
(ordinal and categorical) takes only three possible values. The goal is to
learn whether some common characteristics help predict the class of the
response variable.
I apply filter-based feature selection first: I run pairwise statistical
tests between the response and each predictor. I keep the significant
features, run VIF tests to remove multicollinearity, and fit an ordered
logistic regression model on the remaining significant features.
Unfortunately, almost all features turn out to be insignificant
(p-value > 0.05), so ordered logistic regression might not be a good choice.
Because the ultimate goal is to find the features that significantly affect
the response, and ideally to give the magnitude of their impact, random
forest and SVM are not suitable. I'm considering a decision tree. Does
anyone on the board have a better suggestion?
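A minimal sketch of the VIF step described above, written in Python with NumPy (an assumption; the poster works in R). Each feature is regressed on all the other features plus an intercept, and VIF_j = 1 / (1 - R_j^2). The function name `vif` and the synthetic data are illustrative only; a common rule of thumb flags VIF above 5 or 10.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (shape n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all
    other columns plus an intercept. Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1
    x3 = rng.normal(size=200)             # unrelated feature
    print(vif(np.column_stack([x1, x2, x3])))  # x1, x2 large; x3 close to 1
```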
E**********e
Posts: 1736
#2
With 3 response levels, you can use multinomial logistic regression and
follow standard model variable selection.
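A sketch of the baseline-category multinomial logit suggested above, fitted by plain gradient descent in NumPy. This is an illustrative stand-in, not a full statistical package (in R, `nnet::multinom` fits this model); function names, `lr`, and `n_iter` are hypothetical, and no standard errors or p-values are computed. Class 0 is the reference, so `np.exp(B[j, k])` is the factor by which the odds of class k versus class 0 change per one-unit increase in feature j.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_multinomial_logit(X, y, n_classes, lr=0.5, n_iter=5000):
    """Baseline-category multinomial logit fitted by plain gradient descent.

    y holds integer labels 0..n_classes-1. Class 0 is the reference: its
    intercept and slopes stay fixed at zero, so B[j, k] is the change in the
    log-odds of class k versus class 0 per one-unit increase in feature j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Y = np.eye(n_classes)[y]          # one-hot targets, shape (n, K)
    b0 = np.zeros(n_classes)
    B = np.zeros((p, n_classes))
    for _ in range(n_iter):
        P = softmax_rows(b0 + X @ B)
        G = (P - Y) / n               # gradient of mean cross-entropy w.r.t. logits
        grad_B = X.T @ G
        grad_b0 = G.sum(axis=0)
        grad_B[:, 0] = 0.0            # keep the reference class pinned at zero
        grad_b0[0] = 0.0
        B -= lr * grad_B
        b0 -= lr * grad_b0
    return b0, B


def predict_proba(b0, B, X):
    """Class probabilities for each row of X."""
    return softmax_rows(b0 + np.asarray(X, dtype=float) @ B)
```

With 11 features and three classes this model estimates two slopes per feature (22 in all, plus two intercepts) on 200 observations, versus 11 slopes in the ordered model.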

【Quoting c********1's post】

E*********g
Posts: 185
#3
Why won't random forest work?
random forest -> important features
It outputs the influence probability of each feature rather than a category.
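For reference, R's `randomForest` reports two importance measures for classification: mean decrease in accuracy (from permuting each feature in the out-of-bag samples) and mean decrease in Gini impurity; neither is a probability. The permutation idea can be sketched model-agnostically in NumPy as below. `nearest_centroid` is a hypothetical stand-in classifier used only so the example runs without a random forest library, and `permutation_importance` here is an illustrative helper, not a library API.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column of X is shuffled.

    `predict` is any function mapping an (n, p) array to n class labels.
    A large drop means the model relies on that feature; a drop near zero
    means shuffling it barely changes accuracy.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-label link
            drops[j] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats


def nearest_centroid(X_train, y_train):
    """Hypothetical stand-in classifier: assign each row to the closest class mean."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])

    def predict(X):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[dist.argmin(axis=1)]

    return predict


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))
    y = np.digitize(X[:, 0], [-0.5, 0.5])  # 3 ordered classes driven by feature 0 only
    model = nearest_centroid(X, y)
    print(permutation_importance(model, X, y))  # feature 0 dominates
```

The output is an accuracy drop per feature, which says how much the model leans on a feature but not in which direction it pushes the class.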

【Quoting c********1's post】

c********1
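Posts: 60
#4
I considered feature importance too. But the project's client knows
basically nothing about statistics or ML, and the feature importance output
is hard to explain to them: I can only say roughly which features are
important, and how important they are is genuinely hard to explain. That is
unlike linear regression, where you can say how much a one-unit change in an
independent variable changes the dependent variable, which is intuitive and
easy to understand.
Also, I'm using R, and I couldn't make sense of the help documentation's
explanation of the importance output, but I'm certain it is not an influence
probability.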
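Logistic-type models do have a one-unit-change reading, on the odds scale. A short derivation for the proportional-odds (ordered logistic) model used in the original post, assuming the parameterization of R's `MASS::polr`, logit P(Y ≤ j) = α_j − xᵀβ (other software may flip the sign of β). Here e_m is the m-th unit vector, so x + e_m raises feature m by one unit with everything else fixed.

```latex
\begin{align*}
\operatorname{logit} P(Y \le j \mid x) &= \alpha_j - x^\top\beta, \qquad j = 1, 2 \\
\operatorname{odds}(Y > j \mid x) = \frac{P(Y > j \mid x)}{P(Y \le j \mid x)} &= e^{\,x^\top\beta - \alpha_j} \\
\frac{\operatorname{odds}(Y > j \mid x + e_m)}{\operatorname{odds}(Y > j \mid x)}
  &= \frac{e^{\,x^\top\beta + \beta_m - \alpha_j}}{e^{\,x^\top\beta - \alpha_j}} = e^{\beta_m}
\end{align*}
```

For example, β_m = 0.4 gives e^0.4 ≈ 1.49: each one-unit increase in x_m multiplies the odds of being in a higher category by about 1.49, at both cutpoints. The multinomial model reads the same way, per class relative to the reference class.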

【Quoting E*********g's post】
: Why won't random forest work?
: random forest -> important features
: It outputs the influence probability of each feature rather than a category.

m*******u
Posts: 13
#5
Lasso?
c********1
Posts: 60
#6
Good idea! Thanks!

【Quoting m*******u's post】
: Lasso?
E**********e
Posts: 1736
#7
Isn't lasso for linear regression? Your response variable is categorical,
so I'm afraid it won't work.

【Quoting c********1's post】
: Good idea! Thanks!
e********9
Posts: 444
#8
Lasso can be applied to logistic regression too...
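A minimal sketch of lasso-penalized logistic regression, for the binary case only, fitted by proximal gradient descent (ISTA) in NumPy. The three-class case works the same way with a softmax loss; in R, `glmnet` with `family = "multinomial"` (and `cv.glmnet` to choose the penalty) is a standard route. The function names, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, lr=0.5, n_iter=3000):
    """L1-penalized binary logistic regression via proximal gradient (ISTA).

    Minimizes mean log-loss + lam * sum_j |beta_j|; the intercept is not
    penalized. Columns of X should be standardized so one lam treats them alike.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)      # labels in {0, 1}
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        z = np.clip(b0 + X @ beta, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-z))
        resid = prob - y                # d(log-loss)/d(linear predictor)
        b0 -= lr * resid.mean()         # plain gradient step for the intercept
        beta = soft_threshold(beta - lr * (X.T @ resid) / n, lr * lam)
    return b0, beta
```

Features whose coefficients end at exactly zero are dropped by the penalty, and a larger `lam` zeros out more of them. The surviving coefficients are shrunk toward zero, so they understate the magnitude of impact.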
E**********e
Posts: 1736
#9
Oh, yes. I just learned about regularized logistic regression.

: Lasso can be applied to logistic regression too...
h*********d
Posts: 109
#10

【Quoting c********1's post】
