Statistics board - [Data Science Project Case] Fuzzy matching on names (reposted)
c***z
Posts: 6348
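[ The following text is reposted from the DataSciences board ]
From: chaoz (面朝大海,吃碗凉皮, "Facing the sea, eating a bowl of cold noodles"), Board: DataSciences
Subject: [Data Science Project Case] Fuzzy matching on names
Site: BBS 未名空间站 (MITBBS), Fri Apr 4 13:04:18 2014, US Eastern Time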
We have two data sets, one for product views and one for actual purchases. We don't have complete shopping cart information and need to infer the missing carts.
To build a training set we need to join the two data sets, and the cart ID and item name are the only available keys. The problem is that an item can appear under many different names in both sets; e.g., 'Dell 17" XPS' and 'Dell XPS Laptop 17 inch' refer to the same item.
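
Because the cart ID is an exact key, one way to set up this join is to use it as a blocking key and only fuzzy-compare item names that fall in the same cart. Below is a minimal Python sketch under stated assumptions: the unit-normalization regex, the stop-word list, Jaccard similarity on token sets, and the 0.5 threshold are all hypothetical choices made for illustration, not part of the original post.

```python
import re
from collections import defaultdict

# Hypothetical normalization rule: 17", 17 inch, 17 inches, 17 in -> "17in".
UNIT_RE = re.compile(r'(\d+)\s*(?:"|inch(?:es)?|in\b)')
# Hypothetical stop list: generic words that do not identify a product.
STOP_WORDS = {"laptop", "the", "a"}


def normalize(name):
    """Lowercase, unify size units, and return the set of identifying tokens."""
    s = UNIT_RE.sub(r"\1in", name.lower())
    return frozenset(t for t in re.findall(r"[a-z0-9]+", s) if t not in STOP_WORDS)


def jaccard(a, b):
    """Share of tokens two names have in common (1.0 = same token set)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_within_carts(views, purchases, threshold=0.5):
    """views, purchases: lists of (cart_id, item_name) pairs.

    Only names in the same cart are compared (the cart ID acts as a
    blocking key), so the fuzzy comparison stays small."""
    by_cart = defaultdict(list)
    for cart_id, name in purchases:
        by_cart[cart_id].append((name, normalize(name)))

    matches = []
    for cart_id, name in views:
        view_tokens = normalize(name)
        best_name, best_score = None, 0.0
        for p_name, p_tokens in by_cart.get(cart_id, []):
            score = jaccard(view_tokens, p_tokens)
            if score > best_score:
                best_name, best_score = p_name, score
        if best_name is not None and best_score >= threshold:
            matches.append((cart_id, name, best_name, best_score))
    return matches


views = [(1, 'Dell 17" XPS'), (1, "Logitech M325 Mouse")]
purchases = [(1, "Dell XPS Laptop 17 inch")]
print(match_within_carts(views, purchases))
```

Here both Dell names normalize to the same token set, while the mouse has no sufficiently similar purchase in its cart and is left unmatched. A hand-written normalization list like this tends to grow with the catalogue, so it is probably best treated as a first pass before a learned or weighted similarity.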
I am considering two approaches: using tf-idf to identify the first three words of the item names, or clustering the names by edit distance.
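
Both ideas can be sketched side by side in plain Python. One reading of the tf-idf idea (an assumption, since the post does not spell it out) is to weight each name's tokens by tf-idf, so rare, identifying tokens such as "xps" count more than common ones such as "dell", and then compare names by cosine similarity. The edit-distance idea is shown as single-linkage clustering with union-find. The unit regex, the stop list, the token sorting, and the 0.3 threshold are hypothetical choices for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical normalization: 17", 17 inch, 17 inches, 17 in -> "17in".
UNIT_RE = re.compile(r'(\d+)\s*(?:"|inch(?:es)?|in\b)')
# Hypothetical stop list, used only for the edit-distance keys.
STOP_WORDS = {"laptop"}


def tokens(name):
    return re.findall(r"[a-z0-9]+", UNIT_RE.sub(r"\1in", name.lower()))


# --- Idea 1: tf-idf weighted cosine similarity between names ---
def tfidf_vectors(names):
    """One {token: tf * idf} dict per name, with idf = ln(N / df)."""
    docs = [Counter(tokens(n)) for n in names]
    df = Counter(t for d in docs for t in d)
    n_docs = len(docs)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in d.items()} for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# --- Idea 2: single-linkage clustering on edit distance ---
def levenshtein(a, b):
    """Dynamic-programming edit distance; insert, delete, substitute cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cluster_by_edit_distance(names, max_dist=0.3):
    """Link names whose length-normalized edit distance is <= max_dist and
    return the connected components (union-find). Tokens are sorted first so
    that word order does not inflate the distance."""
    keys = [" ".join(sorted(t for t in tokens(n) if t not in STOP_WORDS)) for n in names]
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            longest = max(len(keys[i]), len(keys[j])) or 1
            if levenshtein(keys[i], keys[j]) / longest <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ['Dell 17" XPS', "Dell XPS Laptop 17 inch", "Dell Inspiron 15 Laptop", "Logitech M325 Mouse"]
vecs = tfidf_vectors(names)
print(round(cosine(vecs[0], vecs[1]), 2), round(cosine(vecs[0], vecs[2]), 2))
print(cluster_by_edit_distance(names))
```

Raw character edit distance is sensitive to word order and extra words, which is why the keys here are token-sorted and stripped of a stop word; it is also lenient with numbers (e.g. "17" and "13" differ by a single character), so model numbers may need an exact-match check. Comparing all pairs is quadratic in the number of names, so a large catalogue would likely need blocking, for example by cart ID or brand.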
This will be my first text analysis project, so I am wondering whether I need a lot of data or whether a smaller sample is enough, and what the best approach and tools would be. I am familiar with R, Matlab, Pig and some Scala, and I am willing to learn other languages as well.
Thanks a lot!