All topics - Topic: categorical
c********1
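Posts: 60
1
Hi Wangsl,
If you plan to do IM PGY-1 first and then switch to radiology for PGY-2, first check whether the rotation schedule of your IM PGY-1 meets the radiology program's requirements. Rotation schedules vary among IM programs, and requirements vary among radiology programs, so you should look into this first. Then consider whether the IM PD will agree to write you a recommendation letter.
I looked up discussions on other forums about applying to both categorical and prelim at the same school. Some say it's a bad idea because it makes you look unfocused; others say it's fine: if you really get interviews for both categorical and prelim, just say you like this school very much and will accept whatever offer it gives. My personal feeling is that it's not a big problem for applicants applying only to IM, but for those applying to two specialties, if the program knows you also applied to another advanced program, they will immediately know you may leave after PGY-1, so even if they interview you, they probably won't rank you very high. Some programs even stress that their prelim positions are meant only for applicants applying to two specialties. My impression is that applying to both prelim and categorical reduces your chance of matching into prelim. But prelim is said to be more competitive than categorical, so...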
f*********8
Posts: 165
2
Question: in logistic regression, when an independent variable is categorical, does it have to be an ordinal categorical variable?
If it is just an ordinary (nominal) categorical variable, what does the coefficient mean (if it means anything at all)?
Many thanks.
I found the example below online.
http://nlp.stanford.edu/~manning/courses/ling289/logistic.pdf
The independent variables there seem to be ordinary categorical variables.
For example, the variable "cat" takes the values d, m, n, v; another categorical variable, "follows", takes the values P, V. How should the coefficients be interpreted?
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31827    0.12221 -10.787  < 2e-16
catd        -0.16931    0.10032  -1.688  0.0...
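The predictor does not have to be ordinal. With a nominal factor, output like this (it looks like R's `glm`) uses treatment coding: one level is the baseline and gets no row of its own, and each coefficient such as `catd` is the difference in log-odds between that level and the baseline, holding the other predictors fixed. So exp(-0.16931) ≈ 0.84 is an odds ratio versus the baseline level. A minimal sketch, assuming a model whose only predictor is the categorical variable (in that case the fitted coefficients equal the empirical log-odds differences); the counts and the function name are made up for illustration, not taken from the linked PDF:

```python
import math

def treatment_coded_logit(counts, baseline):
    """counts maps level -> (n_success, n_failure).
    Returns (intercept, {level: coefficient}) for a logistic model whose only
    predictor is this categorical variable, using treatment (dummy) coding."""
    def log_odds(level):
        succ, fail = counts[level]
        return math.log(succ / fail)

    intercept = log_odds(baseline)  # log-odds of the baseline level
    coefs = {lvl: log_odds(lvl) - intercept for lvl in counts if lvl != baseline}
    return intercept, coefs

# Hypothetical counts, not the data behind the output above.
counts = {"d": (20, 80), "m": (30, 70), "n": (25, 75), "v": (40, 60)}
b0, b = treatment_coded_logit(counts, baseline="d")
print("intercept:", round(b0, 4))
for lvl, coef in b.items():
    # exp(coefficient) = odds of this level / odds of the baseline level
    print(lvl, round(coef, 4), "odds ratio vs d:", round(math.exp(coef), 4))
```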
m***c
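Posts: 118
3
Sorry, LS (the poster above).
My question is simple. In short: once I have the coefficients, how do I score new data? For a numeric variable it's easy: just plug the value in. A categorical variable is a bit more complicated: (1) if all levels of a categorical variable are significant, it's also easy, just plug in the values; (2) but if some level(s) of a categorical variable are not significant, what value should those non-significant levels take when scoring a new record?
E.g. x1 is a categorical variable with 4 levels a/b/c/d; below are the results after fitting the model:
var  level  estimate  p-value
x1   a      0.1       0.010
x1   b      0.7       0.034
x1   c      0.3       0.870
x1   d      0
x2          1.3       0.001
x4          0.08      0.002
....
Now I want to score a new record:
obs x1 x...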
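A level's p-value only tests whether its coefficient differs from the reference level (here d, whose estimate is fixed at 0); it does not change how the fitted model scores new data, so a record with x1 = c still receives 0.3. If non-significant levels should share the reference level's effect, the usual route is to merge them and refit rather than to zero them out at scoring time. A minimal sketch that scores one record from the estimates in the table; the intercept is not shown in the post, so `INTERCEPT` is a hypothetical placeholder, the x2/x4 values are made up, and the logistic link is an assumption (the post does not say which model was fitted):

```python
import math

# Estimates from the table above; d is the reference level (coefficient 0).
X1_EFFECT = {"a": 0.1, "b": 0.7, "c": 0.3, "d": 0.0}
BETA_X2 = 1.3
BETA_X4 = 0.08
INTERCEPT = -2.0  # hypothetical: the post does not show the intercept

def linear_predictor(x1, x2, x4):
    # Each level's estimate is used as fitted, whatever its p-value.
    return INTERCEPT + X1_EFFECT[x1] + BETA_X2 * x2 + BETA_X4 * x4

def score(x1, x2, x4):
    # Assumes a logistic model; for a linear model use linear_predictor directly.
    eta = linear_predictor(x1, x2, x4)
    return 1.0 / (1.0 + math.exp(-eta))

print(linear_predictor("c", 1.0, 10.0))
print(score("c", 1.0, 10.0))
```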

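Posts: 1
4
"As for CATEGORICAL variables, you can compute distances on them directly; there is no need to convert them into DUMMY variables. Then use the nearest-neighbor algorithm"
Surely the categorical variables can't be pulled out and run through the nearest-neighbor algorithm on their own? The whole dataset has both continuous and categorical variables.
I plan to train models with logistic regression and SVM. May I ask: when you handle a mixed dataset, you don't need to convert the categorical variables into dummy variables at all?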
E*******9
Posts: 152
5
From thread: MedicalCareer board - About preliminary and categorical IM
So, I am thinking of prelim and categorical IM. I know it is not wise to
apply both to the same program. However I am not sure how I should decide
which programs for prelim and which ones for categorical?
Here are my 2 questions:
1. My first instinct is:
community hospitals for prelim and university or university-affiliated
hospitals for categorical?
Any comments on that?
2. Do I need to apply for prelim now or should I wait to apply till I have
info from the advanced programs I'm applying?
Man
h*******n
Posts: 95
6
From thread: MedicalCareer board - Choosing prelim or categorical
Saw this on usmleforum and got even more confused.
http://www.utsouthwestern.edu/utsw/cda/dept26481/files/147349.html
"We offer both categorical (3 year) and preliminary (1 year) Training
Programs. I urge graduates of international medical schools who are
interested in our categorical program to also apply to the preliminary
program and to rank both. In the event that an applicant interested in the
categorical program matches with the preliminary program and performs well
during internship, he or she is given special
F**********1
Posts: 96
7
I am going to focus on the applications to Internal Medicine Categorical,
but will also apply to some PMR programs. But PMR programs are advanced and
require Internal Medicine Preliminary. So can I apply to both categorical
and preliminary in the same program? Will the PD think I am not committed
to Internal Medicine?
Another question is whether ERAS will charge me as TWO applications if I
check both Preliminary and Categorical?
Thanks a lot in advance!
p******r
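Posts: 1279
8
When doing regression with a categorical variable among the independent variables, e.g. in salary = experience + edu + error, edu is categorical with values 1, 2, 3: 1 means high school, 2 means college, and 3 means graduate school.
If I treat it as the numbers 1, 2, 3 and run the regression directly, I get a single beta value;
if I turn it into several dummy variables and run a one-way ANOVA, I get several fixed-effect coefficients.
What is the essential difference between these two approaches? Apart from the mechanics, I don't see a big difference, e.g. for prediction or for measuring the effect of edu on salary.
Also, when coding in SAS, if edu is declared as categorical from the start, does proc glm then not require creating dummy variables beforehand?
Please advise!!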
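Entering edu as the numbers 1, 2, 3 forces the effect of moving from high school to college to equal the effect of moving from college to graduate school (one slope), whereas dummy coding lets each level have its own mean. (In SAS, listing edu in the CLASS statement of PROC GLM builds that dummy coding internally.) A minimal sketch with made-up salaries and only edu in the model (experience is omitted to keep the arithmetic simple): the numeric fit puts the three levels on a straight line, while the dummy fit reproduces each group's mean.

```python
from statistics import mean

# Hypothetical data: (edu level, salary in $1000s); experience omitted for brevity.
data = [(1, 30), (1, 32), (2, 50), (2, 52), (3, 55), (3, 57)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Numeric coding: ordinary least squares with a single slope for edu.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
numeric_fit = {lvl: intercept + slope * lvl for lvl in (1, 2, 3)}

# Dummy coding (edu as a factor, no other predictors): fitted values are group means.
dummy_fit = {lvl: mean(y for x, y in data if x == lvl) for lvl in (1, 2, 3)}

print("numeric:", numeric_fit)
print("dummy:  ", dummy_fit)
```

Here the numeric fit predicts 33.5, 46, 58.5 while the group means are 31, 51, 56: the equal-spacing assumption misses the large jump from high school to college.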
s**********l
Posts: 395
9
Sorry, I should have mentioned GENMOD rather than GLM in SAS.
I think in a generalized linear model, if the dependent variable follows a
gamma distribution, the predictors cannot be categorical variables, can
they?
There are 120 categorical variables in total, and each of them has 20 levels.
Therefore, I tried to convert these categorical variables into continuous
variables so that I can build the model; however, I did not know how to do
so.
Who knows? Thanks.
r********n
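Posts: 6979
10
ANCOVA might work,
but if I have several categorical variables,
each with many levels,
some levels may not have enough data...
The second question was just an example.
Likewise,
if I have a large number of categorical variables,
each with many levels,
and each categorical variable becomes K-1 dummy variables,
is there an overfitting problem?
And how should ordinal categorical variables be handled? Code them as 0, 1, 2, 3, 4, ...?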
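The parameter count does grow quickly: a variable with K levels contributes K-1 columns, so, for example, 120 variables with 20 levels each would add 120 × 19 = 2280 coefficients, and sparse levels get noisy estimates. One common mitigation is to pool levels seen fewer than some minimum number of times into a single "OTHER" level before dummy coding. A minimal sketch; the threshold of 30 and the function names are illustrative assumptions. For ordinal variables, integer scores such as 0, 1, 2, ... are one option, but they impose equal spacing between adjacent levels; dummy coding or polynomial contrasts (R's default for ordered factors) avoid that.

```python
from collections import Counter

def collapse_rare_levels(values, min_count, other="OTHER"):
    """Replace levels that occur fewer than min_count times with a pooled level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def n_dummy_columns(values):
    """Treatment coding uses K - 1 indicator columns for K distinct levels."""
    return len(set(values)) - 1

# Hypothetical column: two common levels and a long tail of rare ones.
col = ["A"] * 50 + ["B"] * 40 + ["C"] * 3 + ["D"] * 2 + ["E"]
collapsed = collapse_rare_levels(col, min_count=30)  # threshold is an assumption
print(n_dummy_columns(col), "->", n_dummy_columns(collapsed))
print(120 * (20 - 1))  # dummy columns from 120 variables with 20 levels each
```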

I*****a
Posts: 5425
11
hi guys, I have a question about clustering analysis with both numerical
variables and categorical (nominal) variables. I am not very familiar with
clustering analysis. Any feedback will be appreciated. I can only type Chinese
on my phone, which is too much of a pain... sorry.
1) What are the standard ways to deal with categorical variables? Do we
simply transform them into a lot of dummy variables? In my particular problem,
I have a pretty large dataset, where some variables may have hundreds of
thousands ...
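One standard option for mixed numeric/nominal data is Gower's dissimilarity: each numeric variable contributes |x_i - y_i| / range_i, each nominal variable contributes 0 if the values match and 1 otherwise, and the contributions are averaged. The resulting distance matrix can be fed to a distance-based method such as PAM or hierarchical clustering (in R, `cluster::daisy(metric = "gower")` computes it). A minimal stdlib sketch, assuming equal weights and no missing values; the records and function names are illustrative:

```python
def column_ranges(rows, is_numeric):
    """Range (max - min) of each numeric column; None for nominal columns."""
    return [max(r[i] for r in rows) - min(r[i] for r in rows) if num else None
            for i, num in enumerate(is_numeric)]

def gower_distance(a, b, is_numeric, ranges):
    """Gower dissimilarity between two records (equal weights, no missing values)."""
    total = 0.0
    for i, numeric in enumerate(is_numeric):
        if numeric:
            # Range-normalized absolute difference; a constant column contributes 0.
            total += abs(a[i] - b[i]) / ranges[i] if ranges[i] else 0.0
        else:
            # Simple matching for nominal columns.
            total += 0.0 if a[i] == b[i] else 1.0
    return total / len(is_numeric)

# Hypothetical records: (age, income, color).
rows = [(25, 50000, "red"), (40, 82000, "blue"), (33, 61000, "red")]
is_numeric = [True, True, False]
ranges = column_ranges(rows, is_numeric)
print(gower_distance(rows[0], rows[2], is_numeric, ranges))
```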
E**********e
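Posts: 1736
12
Speaking of categorical variables, they are usually already numerically coded (1, 2, 3, ...). Textbooks rarely seem to cover mixtures of continuous and categorical variables, yet in practice modeling very often involves such mixtures. In this situation, isn't there a problem with using PCA to reduce dimensionality or to find the significant variables?
Of course, one could try correspondence analysis: bin the continuous variables into groups and then use contingency tables to find associations. But it is rarely described as a way to select significant variables either.
I'm asking because I got this question in an interview. So using PCA to select variables doesn't seem very reliable. Lasso may be a better approach.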

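Posts: 1
13
It's a machine learning course project.
Our data are unbalanced; most columns are continuous and only a few are categorical. Because we use logistic regression and SVM, we converted all categorical variables into dummy variables, but after running SMOTE these dummy variables became values between 0 and 1.
Two questions now:
1) The original class composition was 10700 class-1 and 1450 class-0; after SMOTE, the data became 5500 class-1 and 4100 class-0. The categorical columns can't be taken out and handled separately, can they? SMOTE itself clusters these points using point-to-point distances;
2) For now, as a last resort, we're thinking of mapping values < 0.5 to 0 and values >= 0.5 to 1.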
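Plain SMOTE interpolates every feature, x_new = x + gap × (neighbor - x) with gap in [0, 1], which is why one-hot columns come out fractional. SMOTE-NC, described in the original SMOTE paper, instead interpolates only the continuous features and gives each nominal feature the value that occurs most often among the k nearest neighbors, so no 0.5 threshold is needed. A minimal sketch of that synthesis step only; the neighbor search is left out (neighbors are passed in), `gap` is passed explicitly rather than drawn at random so the result is reproducible, ties in the vote are broken by first occurrence (an assumption), and the function name is illustrative. In Python, the imbalanced-learn package provides a `SMOTENC` class.

```python
from collections import Counter

def smote_nc_sample(base, neighbors, chosen, gap, is_nominal):
    """Build one synthetic record in the style of SMOTE-NC.
    base: a minority-class record; neighbors: its k nearest minority neighbors;
    chosen: index of the neighbor to interpolate toward; gap: number in [0, 1]."""
    target = neighbors[chosen]
    new = []
    for i, nominal in enumerate(is_nominal):
        if nominal:
            # Most frequent value among the k neighbors (first seen wins ties).
            new.append(Counter(n[i] for n in neighbors).most_common(1)[0][0])
        else:
            # Continuous feature: interpolate between base and the chosen neighbor.
            new.append(base[i] + gap * (target[i] - base[i]))
    return new

base = [1.0, 10.0, "NY"]
neighbors = [[2.0, 14.0, "NJ"], [1.5, 12.0, "NJ"], [3.0, 11.0, "NY"]]
print(smote_nc_sample(base, neighbors, chosen=0, gap=0.25, is_nominal=[False, False, True]))
```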

t*******m
Posts: 1893
14
https://www.bostonglobe.com/metro/2018/02/06/proposed-bill-categorizing-asian-americans-stirs-debate/PVBK59Sw9crFFPbkVlwgfO/story.html
A State House effort to categorize Asian-Americans into specific ethnic
groups is clashing with a vocal and well-organized opposition that has
likened the effort to racial profiling.
A bill by state Representative Tackey Chan urges “all state agencies,
quasi-state agencies, entities created by state statute, and sub-divisions of
state agencies” to identify Asia...
J****i
Posts: 470
15
There is simply no trust between the two sides. Each talks past the other; no matter how much one side says, the other side just thinks it's BS.
[Quoting the post by taprogram (我不烦,我不怕麻烦):]
:https://www.bostonglobe.com/metro/2018/02/06/proposed-bill-categorizing-asian-americans-stirs-debate/PVBK59Sw9crFFPbkVlwgfO/story.html
:A State House effort to categorize Asian-Americans into specific ethnic
:groups is clashing with a vocal and well-organized opposition that has
:likened the effort to racial profiling.
:A bill by state Representative Tackey Chan urges “all state agencies,
:quasi-state agencies, entities created by s...
g*******u
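Posts: 3948
16
From thread: Programming board - encode high cardinality categorical features
Planning to use lightgbm or xgboost.
A few categorical features have 5000 distinct values each. How should these be encoded?
Thanks
Also, how many distinct values generally counts as "high" for high-cardinality categorical features?
thx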
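Two common options for ~5000 levels with gradient-boosted trees are to let the library handle the column natively (LightGBM can take integer-coded categorical columns through its categorical-feature option) or to map the levels into a fixed number of buckets with the hashing trick. A minimal sketch of the hashing trick; `n_buckets = 64`, the level names, and the function names are illustrative assumptions, and MD5 from `hashlib` is used as a stable hash because Python's built-in `hash()` for strings is randomized per process. Collisions are expected by design: different levels can share a bucket.

```python
import hashlib

def hash_bucket(level, n_buckets):
    """Map a category string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(level.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def hashed_one_hot(level, n_buckets):
    """Vector of length n_buckets with a single 1 in the hashed slot."""
    vec = [0] * n_buckets
    vec[hash_bucket(level, n_buckets)] = 1
    return vec

levels = [f"merchant_{i}" for i in range(5000)]  # hypothetical 5000-level feature
n_buckets = 64
buckets = [hash_bucket(lvl, n_buckets) for lvl in levels]
print(len(set(buckets)), "buckets used for", len(levels), "levels")
```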
M******g
Posts: 152
17
Keywords: protein categorization (gene ontology (GO) analysis)
Posted from: BBS 未名空间站 (Wed Apr 6 13:49:28 2016, US Eastern)
Can anybody recommend good software or a server for protein categorization
(gene ontology (GO) analysis)?
Thank you.
h****g
Posts: 125
18
Bump, same question.
Some places just say categorical, with no AP/CP option.
Some places say pathology (categorical).
What is the difference between these?
h****a
Posts: 234
19
From thread: MedicalCareer board - anesthesia categorical vs advanced, question.
Thank you, Eric,
Just to make sure, a categorical program means I don't need to worry about
finding a preliminary PGY-1 myself, right?
Does an interview for categorical anesthesia require meeting with faculty
from the preliminary specialty (like IM/Surg)?

have to rotate through other departments for the broad training.
e*****a
Posts: 1334
20
From thread: MedicalCareer board - anesthesia categorical vs advanced, question.

finding a preliminary PGY-1 myself, right?
- Yes
faculties from preliminary specialty (like IM/Surg)?
- It depends on the categorical program's setup. If the anesthesiology
department handles the PGY-1 year, there is most likely no need to meet the
prelim faculty from IM/Surgery. If another department handles the prelim
training for the categorical program, that department could initiate an
interview.
x*******9
Posts: 200
21
From thread: MedicalCareer board - categorical and preliminary?
Recently, while looking for programs, I found that some programs list
categorical but no preliminary in my ERAS, but when I look at AMA, their
general information says they offer preliminary. I am confused about it. I want to
choose both categorical and preliminary in my ERAS. What should I do? Thank you
guys.
c******e
Posts: 76
22
Since I need preliminary IM for an advanced program, and prelim is
competitive, I will apply to categorical IM as a backup. Can I check both the
preliminary and categorical IM boxes for a program, and can I use only one PS in
that case? Or do I apply separately, choosing a different PS and paying twice
to the same program?
Is anyone in the same situation, or does anyone have experience or ideas? Thanks a lot.
a********a
Posts: 57
23
From thread: MedicalCareer board - Choosing prelim or categorical
IMHO, the safest way for you to go is to apply only to all the categorical
neurology programs and all the categorical IM programs whose minimum
requirements you meet. It will be too late to apply to preliminary IM
programs by the time you get neuro interviews.
e*****a
Posts: 1334
24
From thread: MedicalCareer board - Choosing prelim or categorical
This makes sense.
Since residency selection is done by humans, there are always general
rules and exceptions. Generalizations based on a few exceptions can be
misleading.
It's a good idea to check each program's requirements and suggestions. If a
PD believes that it's OK to apply for both categorical and preliminary IM by
the same candidate, go for it. But this is not a general case. Several PDs
told us many programs don't like that. Programs may think the candidate uses
categorical IM as a b
E*******9
Posts: 152
25
From thread: MedicalCareer board - Choosing prelim or categorical
If I apply to both prelim and categorical IM at the same program, is it OK to use the same (IM) PS?
I just wonder whether, if I do so, the IM program will NOT think we are using
categorical IM as a backup?
But I'm not sure whether this is reasonable.
Criticism welcome!
a********a
Posts: 57
26
From thread: MedicalCareer board - Choosing prelim or categorical
Can we apply for both prelim and categorical, then
indicate our PGY-2 Specialty interest(s) as IM?
-This does not make any sense; it is much, much tougher to get preliminary
year IM interviews than categorical IM interviews at the same institution. Your
chances of getting an interview are not going to be increased by applying to
the preliminary year.
c*********r
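Posts: 541
27
From thread: MedicalCareer board - Choosing prelim or categorical
You misunderstood what I meant.
I'm saying that community IM categorical is generally relatively easy to get. If you use those applications for prelim instead, what will you use as your safety net?
Don't forget: even community IM prelim positions mostly go to AMGs, whereas community IM
categorical positions have basically no AMGs.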
l*****t
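Posts: 190
28
I feel I applied to too few categorical IM programs and want to switch a few from prelim IM to categorical IM. Not sure whether that's allowed. I see that many programs have already downloaded my application; if I change it back and forth, will they be able to tell that I changed it? That would be bad. Or do they only look at the latest updated application? That would be great. Thanks!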
e*****a
Posts: 1334
29

but will also apply to some PMR programs. But PMR programs are advanced and
require Internal Medicine Preliminary. So can I apply to both categorical
and preliminary in the same program? Will the PD think I am not committed
to Internal Medicine?
- It's somewhat hard to convince the PDs.
check both Preliminary and Categorical?
- It charges once if both are under the same program.
w**s
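Posts: 367
30
Eric's earlier post said it's fine as long as you can convince the PD that your preliminary interest is consistent with the categorical one. So should I also fill in the PGY2 specialty interest with the same field as the categorical (e.g. IM)? Would that avoid any negative impact and even increase my chances?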
B*****5
Posts: 39
31
Found the answer:
A "categorical" position is one which offers full residency training
required for board certification in that specialty. A "preliminary" position,
in contrast, is a position offering only 1-2 years of training generally
prior to entry into advanced specialty programs. Many internal medicine and
surgery training programs offer preliminary positions in addition to
categorical positions. Transitional year programs are also considered
preliminary year training programs.

i**********a
Posts: 98
32
Prelim prepares residents for other advanced programs such as neurology
and radiology. So if you apply to both categorical and prelim, that means you
are not committed to 3 years of categorical training. Make sense?
h**********1
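Posts: 123
33
The hospitals where I interviewed offer both advanced and categorical programs. My top choice is an advanced program, but I have very few prelim interviews. How should I rank so that the advanced and prelim positions either both match, or I drop down to a categorical program ranked lower, without ending up matched with an advanced program only? Many thanks.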
c********1
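Posts: 60
34
This is my first time applying, and I plan to apply to both neurology and internal medicine, with neurology as my main focus. I have a question for everyone. When applying for prelim, is it better to apply only to prelim and indicate that I want to do neurology in the second year, or to apply to both prelim and categorical IM to increase my chance of matching? Applying only to prelim and applying to both prelim and categorical IM should cost the same. Any suggestions? Thanks in advance!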
o******e
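Posts: 1001
35
From thread: Statistics board - categorical data in linear regression
When using categorical data in linear regression, I found that some levels of a categorical variable are significant while others are not. In this case, do I have to merge the non-significant levels into one category and refit the linear model? Thanks!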
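Under treatment coding, each level's coefficient (and its p-value) measures the difference from the reference level, so which levels look significant depends on which level happens to be the reference. Merging levels is one option, but a joint test of the whole variable (an F test comparing the model with and without it) is usually a better guide to whether the variable matters. A minimal sketch for a model whose only predictor is this categorical variable, where each coefficient is simply a group mean minus the reference group's mean; the data and the function name are made up for illustration:

```python
from statistics import mean

def treatment_coefficients(groups, reference):
    """Coefficients of y ~ factor (one categorical predictor, treatment coding):
    intercept = reference-group mean; other coefficients = group mean - reference mean."""
    ref_mean = mean(groups[reference])
    return ref_mean, {g: mean(v) - ref_mean for g, v in groups.items() if g != reference}

# Hypothetical responses by level.
groups = {"A": [10, 12, 11], "B": [11, 13, 12], "C": [20, 22, 21]}
print(treatment_coefficients(groups, reference="A"))  # B looks close to A, C far away
print(treatment_coefficients(groups, reference="C"))  # now both A and B differ a lot
```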
b*****o
Posts: 482
36
If your categorical variable is coded 1, 2, 3, ..., that assumes it is ordinal. If it is not ordinal, you should replace the categorical variable with dummy variables: e.g., x1 = 0 vs 1, x2 = 0 vs 1, ...
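The dummy coding described above can be sketched as follows: a variable with K levels becomes K-1 indicator columns, and the omitted reference level is represented by all zeros (keeping all K columns alongside an intercept would make the design matrix collinear). Using the first level in sorted order as the default reference is an assumption here that mirrors R's default for character data; the function name is illustrative.

```python
def dummy_encode(values, reference=None):
    """Treatment (dummy) coding: K levels -> K-1 indicator columns."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first sorted level is the reference
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

cols, X = dummy_encode(["HS", "College", "Grad", "College"])
print(cols)  # reference level "College" has no column of its own
for row in X:
    print(row)
```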
s**********l
Posts: 395
37
Now each categorical variable has 20 levels, 1-20.
In order to use GLM, I need to convert more than 100 categorical variables
into continuous variables.
Does anyone know how to do this in SAS? Thanks.
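Converting may not be necessary: PROC GENMOD accepts a CLASS statement for categorical predictors, including with DIST=GAMMA. If a single numeric column per variable is still wanted (e.g. to keep 100+ variables with 20 levels manageable), one option is target (mean) encoding: replace each level with a smoothed mean of the response within that level. A minimal sketch in Python rather than SAS; the smoothing weight `m`, the data, and the function name are illustrative assumptions, and the encoding should be computed on training data only (ideally out-of-fold) so the response does not leak into the predictors.

```python
from collections import defaultdict

def mean_encode(levels, y, m=10.0):
    """Map each level to a smoothed mean of y:
    (n_k * mean_k + m * global_mean) / (n_k + m)."""
    global_mean = sum(y) / len(y)
    sums, counts = defaultdict(float), defaultdict(int)
    for lvl, target in zip(levels, y):
        sums[lvl] += target
        counts[lvl] += 1
    # sums[lvl] equals n_k * mean_k; small levels are pulled toward the global mean.
    return {lvl: (sums[lvl] + m * global_mean) / (counts[lvl] + m) for lvl in counts}

levels = [1, 1, 2, 2, 2, 3]                 # hypothetical levels (coded 1-20 in the post)
y = [2.0, 4.0, 10.0, 12.0, 14.0, 6.0]       # hypothetical positive (gamma-like) response
encoding = mean_encode(levels, y, m=2.0)
encoded_column = [encoding[lvl] for lvl in levels]
print(encoding)
```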
a**j
Posts: 60
38
Looking for the solution manual for "Categorical Data Analysis" (Alan Agresti), or worked exercises
Looking for the solution manual for "Nonparametric Statistical Inference" (Gibbons)
Will pay in baozi (forum reward points), via PayPal, or by trading textbook PDFs
I have: SAS prep guide (base & adv)
Linear Models (Rencher)
Applied Linear Statistical Models (Hill) + solution manual
Categorical Data Analysis (Alan Agresti) 2nd and 3rd editions
Statistical Inference (Casella & Berger) + solution manual
etc.
h**t
Posts: 1678
39
From thread: Statistics board - Clustering algorithm for categorical data
Does anyone know which algorithms can be used to cluster categorical variables? R
packages? Which is the best?
If the data have both numeric and categorical variables, what is the best algorithm
and R package to use?
Thank you!
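For purely categorical data, k-modes is a common choice: it replaces k-means' Euclidean distance with simple matching (the number of attributes on which two records differ) and replaces the mean with the per-attribute mode. In R, `klaR::kmodes` implements it; for mixed data, k-prototypes (`clustMixType`) or PAM on a Gower distance (`cluster`) are options. A minimal stdlib sketch, assuming fixed initial modes and a fixed number of iterations (real implementations choose initial modes more carefully and stop at convergence); the data are made up:

```python
from collections import Counter

def matching_distance(a, b):
    """Number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, initial_modes, n_iter=10):
    modes = [list(m) for m in initial_modes]
    labels = []
    for _ in range(n_iter):
        # Assign each record to its nearest mode (lowest index wins ties).
        labels = [min(range(len(modes)), key=lambda k: matching_distance(r, modes[k]))
                  for r in rows]
        # Update each mode attribute by attribute to the most frequent value.
        for k in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                modes[k] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes

rows = [("red", "S", "cotton"), ("red", "M", "cotton"), ("blue", "L", "wool"),
        ("blue", "L", "silk"), ("red", "S", "wool")]
labels, modes = k_modes(rows, initial_modes=[rows[0], rows[2]])
print(labels)
print(modes)
```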
k*******a
Posts: 772
40
You mean pairwise categorical vs. categorical association? Then a chi-square test will do.
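The Pearson chi-square statistic compares observed counts with the counts expected under independence, E_ij = row_i × col_j / n, with (r-1)(c-1) degrees of freedom. A minimal stdlib sketch with a made-up 2×2 table; for df = 1 the p-value has the closed form erfc(sqrt(x/2)), while for general tables a library routine such as `scipy.stats.chi2_contingency` is simpler (note that it applies Yates' continuity correction to 2×2 tables by default, so its statistic will differ from the uncorrected one here).

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[30, 10],   # hypothetical counts: rows = levels of variable A,
         [20, 40]]   # columns = levels of variable B
stat, df = chi_square_independence(table)
p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None  # closed form only for df = 1
print(stat, df, p_value)
```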
m**********4
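Posts: 774
41
I really like his book Intro to Categorical Data Analysis; it's one of the best statistics books I've read. Unfortunately, it's mostly intuition without many proofs. I heard his Categorical Data Analysis goes deeper; is that true? Is there a big difference between the two books? I really like his books and his famous paper. Are there other books of his you'd recommend? Many thanks!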
E**********e
Posts: 1736
42
[The following text is reposted from the Statistics board]
Posted by: ExpressoLove (MoneyForNothing), Board: Statistics
Title: Can PCA be used on a mixture of continuous and categorical variables
Posted from: BBS 未名空间站 (Sun May 17 18:03:19 2015, US Eastern)
As far as I know, PCA makes more sense for continuous variables. How would you compute a covariance matrix for categorical ones?
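One way to see the problem: if a nominal variable is stored as integer codes, its covariance with other variables depends on which integer each level happens to receive, so PCA on those codes depends on an arbitrary labeling. Methods designed for this case include multiple correspondence analysis for nominal data and factor analysis of mixed data (FAMD). A minimal sketch with made-up data: relabeling the same three regions changes the covariance with income, even its sign.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data.
income = [30.0, 35.0, 60.0, 65.0, 45.0, 50.0]
region = ["north", "north", "south", "south", "west", "west"]

coding_1 = {"north": 1, "south": 2, "west": 3}
coding_2 = {"north": 3, "south": 1, "west": 2}  # same levels, different integer labels

cov_1 = covariance([coding_1[r] for r in region], income)
cov_2 = covariance([coding_2[r] for r in region], income)
print(cov_1, cov_2)
```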
s*********a
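Posts: 2623
43
Covariance for categorical variables? Shouldn't you use ANOVA instead? If you have both continuous and categorical variables, use GLM in SAS?
Are you doing the PCA in MATLAB?
I'm not really an expert either. Corrections welcome if I'm wrong.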
E**********e
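Posts: 1736
44
I've done this. You need to know how to compute distances for categorical variables. I think there's a paper, the original author's article introducing SMOTE, that discusses how to handle categorical variables.
But SMOTE may not work well on real-world problems. In academia, playing with it might get you a couple of papers.

:SMOTE can only handle continuous data; for categorical data you need SMOTE-NC. I googled a lot but couldn't find any usable code. I hope someone kind can share some.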
E**********e
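Posts: 1736
45
SMOTE isn't complicated. Read the original author's paper and you can code it yourself. Of course, the author's final version includes optimizations.
I coded it in both R and PYTHON, but in the end it performed poorly on a risk model in practice. Honestly, I'm not optimistic about this method either.
As for CATEGORICAL variables, you can compute distances on them directly; there is no need to convert them into DUMMY variables. Then use the nearest-neighbor algorithm and take a majority vote.

:Without converting the categorical data into dummy variables, WEKA can generate the specific state. But WEKA and RStudio give different results. WEKA just doubles the minority class and keeps the majority class, whereas RStudio produces two classes of roughly 45% and 55%. I don't know which is right.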
E**********e
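Posts: 1736
46
To understand SMOTE, you have to read the original author's paper. The author uses the value distance metric to compute distances for noncontinuous variables, then combines them with the continuous variables to compute the overall distance. This distance is used to select the nearest neighbors of a given sample; then the synthetic sample is computed, i.e. each variable gets a new value, and then a majority vote decides whether this synthetic sample is 1 or 0.
You don't have to convert categorical variables into dummy variables. For example, with 50 states, would you use 49 dummy variables? Isn't that a hassle? You can group some of them, and then order them by log odds, if possible. If they can't be ordered but you still want to include them, then dummies are the only option. But if the new groups can't be ordered, then the variable probably isn't important or has no predictive power.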
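The value difference metric mentioned above compares two values v1, v2 of a nominal feature through their class distributions: δ(v1, v2) = Σ_c |P(c | v1) - P(c | v2)|^k, summed over the classes, commonly with k = 1. Two states with similar default rates are therefore close even though their labels differ. A minimal sketch with made-up data; the function names are illustrative:

```python
from collections import Counter, defaultdict

def vdm_table(values, labels):
    """Class proportions P(class | value) for one nominal feature."""
    by_value = defaultdict(Counter)
    for v, c in zip(values, labels):
        by_value[v][c] += 1
    classes = sorted(set(labels))
    # Counter returns 0 for classes never seen with a value.
    return {v: {c: cnt[c] / sum(cnt.values()) for c in classes}
            for v, cnt in by_value.items()}

def vdm(v1, v2, table, k=1):
    """Value difference metric between two values of the same nominal feature."""
    p1, p2 = table[v1], table[v2]
    return sum(abs(p1[c] - p2[c]) ** k for c in p1)

# Hypothetical data: state of each record and whether it defaulted.
states = ["NY", "NY", "NY", "NY", "NJ", "NJ", "TX", "TX", "TX", "TX"]
default = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
table = vdm_table(states, default)
print(vdm("NY", "NJ", table), vdm("NY", "TX", table))
```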
l*i
Posts: 50
47
A State House effort to categorize Asian-Americans into specific ethnic
groups is clashing with a vocal and well-organized opposition that has
likened the effort to racial profiling.
A bill by state Representative Tackey Chan urges “all state agencies,
quasi-state agencies, entities created by state statute, and sub-divisions of
state agencies” to identify Asian-Americans and Pacific Islanders, as
defined in the US census, in all data they collect and report.
But critics say Chan’s efforts woul...

Posts: 1
48
Say I have a relatively big dataset which has a categorical variable with many
possible values/levels, for example, country.
If I do one-hot encoding as suggested by scikit-learn, I get an "out of memory"
error. But when I load the data into R, treat the variable as a
normal factor, and call some R machine learning library or H2O, everything
works fine: at least there is no error message, and the results are acceptable. So I'm
wondering how R or H2O treats it differently and what's the correct w...
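A dense one-hot matrix needs n_rows × n_levels cells, almost all of them zero, while storing only the column index of the single 1 in each row needs n_rows integers. That is the idea behind sparse matrices (scikit-learn's `OneHotEncoder` returns a SciPy sparse matrix by default) and behind keeping a factor as integer codes instead of expanding it. A minimal stdlib sketch of the integer-code representation and of computing a linear predictor from it without building the dense matrix; the country values, coefficients, sizes, and function names are made up for illustration:

```python
def factor_codes(values):
    """Represent a categorical column as a sorted level list plus integer codes."""
    levels = sorted(set(values))
    index = {lvl: i for i, lvl in enumerate(levels)}
    return levels, [index[v] for v in values]

def linear_predictor_sparse(codes, level_coef, intercept):
    """Same result as dense_one_hot @ level_coef + intercept, with one lookup per row."""
    return [intercept + level_coef[c] for c in codes]

countries = ["US", "CN", "US", "DE", "CN"]
levels, codes = factor_codes(countries)
coef = [0.2, -0.1, 0.5]  # hypothetical per-level effects for CN, DE, US
print(levels, codes)
print(linear_predictor_sparse(codes, coef, intercept=1.0))

n_rows, n_levels = 10_000_000, 200  # hypothetical sizes
print(n_rows * n_levels, "dense cells vs", n_rows, "stored codes")
```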
m******r
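Posts: 1033
49
This is really the old question of how to handle categorical variables. I used to think I understood it, but recently I've gotten confused again (especially since I started learning R), so I'm asking here.
Say a product is sold in all 54 US states. When building the model, how should the state variable be handled? The simplest approach, of course, is to merge some states based on experience (or using WOE (weight of evidence)), e.g. New York with New Jersey, Virginia with DC, or several central states such as Missouri, Iowa, and Arkansas. But this approach relies entirely on experience and isn't scientific.
I don't think one_hot_encoding (i.e. dummy variables) is scientific either. With 54 states, whether you generate 54 variables with one_hot_encoding or 53 with dummy coding, when the software selects variables it still picks one state out of the 54. A crude approach, in my view, would be to try all possible groupings, e.g.:
choose 1 state out of 54
choose 2 states out of 54
...
choose 27 states out of 54
That gives a total of 51+1275+20825...+2.9592E+14 = 1.60345E15 combinations.
Of course, that's an astronomical number.
A compromise is to use experience to merge the 54 states into 10 large regions,...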
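Two notes on the count above. The terms 51, 1275, 20825 equal C(51,1), C(51,2), C(51,3), i.e. they count subsets of 51 items rather than 54. Also, choosing k states and choosing the other n-k states describe the same two-group split, so the number of distinct ways to split n states into two non-empty groups is 2^(n-1) - 1, and the number of ways to partition them into any number of groups is the Bell number B_n, which is vastly larger. A minimal sketch that computes these counts (function names are illustrative):

```python
from math import comb

def two_group_splits(n):
    """Ways to split n items into two non-empty, unlabeled groups."""
    return 2 ** (n - 1) - 1

def bell_number(n):
    """Ways to partition n items into any number of groups (Bell triangle)."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]          # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

print([comb(51, k) for k in (1, 2, 3)])   # the first terms of the sum in the post
print(two_group_splits(54))
print(bell_number(54))
```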

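Posts: 1
50
From thread: Programming board - encode high cardinality categorical features
Binary encoding is worth a try. A similar option is the hashing trick.
Besides that, google "supervised ratio" and "weight of evidence", which turn categorical variables into numerical ones.
Also look at the level distribution: if there are a few major levels and a large number of minor levels, you can also consider merging the minor levels whose counts fall below some threshold, e.g. something like ten times the total number of features.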
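Weight of evidence replaces each level with ln(share of non-events in the level / share of events in the level), so levels with similar risk get similar values and the column can be used as one numeric feature or binned further. Conventions differ on which class goes in the numerator; the additive smoothing constant (0.5 here) is an assumption that avoids division by zero for levels with no events or no non-events. Like target encoding, it should be computed on training data only. A minimal sketch with made-up data; the function name is illustrative:

```python
import math
from collections import Counter

def weight_of_evidence(levels, y, smoothing=0.5):
    """WOE per level: ln(% of non-events in level / % of events in level).
    y is 1 for an event (e.g. default) and 0 otherwise; counts are smoothed."""
    events = Counter(lvl for lvl, t in zip(levels, y) if t == 1)
    non_events = Counter(lvl for lvl, t in zip(levels, y) if t == 0)
    k = len(set(levels))
    # Add the smoothing constant to every level, and k times to the totals.
    total_events = sum(events.values()) + smoothing * k
    total_non = sum(non_events.values()) + smoothing * k
    return {lvl: math.log(((non_events[lvl] + smoothing) / total_non)
                          / ((events[lvl] + smoothing) / total_events))
            for lvl in set(levels)}

states = ["NY", "NY", "NJ", "NJ", "TX", "TX", "TX", "CA"]
default = [0, 0, 0, 1, 1, 1, 0, 0]
woe = weight_of_evidence(states, default)
print(woe)
```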