big data analysis in Revolution R - Statistics board - MITBBS archive



x**g, posts: 807	1 Has anybody used Revolution R? If so, do you think it is a solution for overcoming the limited memory problem in R? Thank you.
d*l, posts: 400	2 Well, I listened to their talk about RHadoop at a conference last year, and then tried to use RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) a couple of months ago. My impression is that this thing is at a very early stage of development and not quite useful at the moment. It is not transparent to the programmer, and I guess it would take many people many years to port a good-sized pool of R packages to Map/Reduce; it is not there yet. Basically, it is just an abstraction layer on top of Hadoop streaming, but I find that using plain Hadoop streaming with R is actually easier, more straightforward, and more flexible. I am interested in hearing others' opinions. [In reply to x**g]
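For readers unfamiliar with the plain Hadoop streaming model mentioned above: the mapper reads raw lines and emits tab-separated key/value lines, the framework sorts them by key, and the reducer aggregates each run of identical keys. Below is a minimal sketch of that contract, written in Python for illustration (the same shape works as an R script reading stdin). The function names, the `group,value` input format, and the local `run_locally` driver are assumptions made up for this example and are not part of RHadoop or Hadoop itself; on a real cluster the mapper and reducer would be separate scripts reading standard input.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'key<TAB>value' line per 'group,value' input record."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 2:
            continue  # skip malformed records
        group, value = fields
        yield f"{group}\t{value}"


def reducer(sorted_lines):
    """Consume key-sorted 'key<TAB>value' lines and emit the mean per key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for key, rows in groupby(parsed, key=lambda kv: kv[0]):
        total, count = 0.0, 0
        for _, value in rows:
            total += float(value)
            count += 1
        yield f"{key}\t{total / count}"


def run_locally(lines):
    """Simulate the streaming pipeline in-process: map, sort by key, reduce."""
    mapped = sorted(mapper(lines), key=lambda s: s.split("\t")[0])
    return list(reducer(mapped))


if __name__ == "__main__":
    for out in run_locally(["a,1", "b,10", "a,3", "b,20"]):
        print(out)
```

The reducer only ever holds one running total per key, which is why this pattern scales past memory; the cost is that every analysis has to be recast into this map/sort/reduce shape by hand.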
s*********e, posts: 1051	3 Agreed; sooner or later they will face a copyright lawsuit. [In reply to d*l]
S******y, posts: 1123	4 Interesting topic :-)

Many people expect that something will come along that lets a user simply plug in R or SAS and run all existing functions/packages/procedures on Hadoop-scale data, "solving" the data size problem once and for all. Unfortunately, there is no such thing. To achieve that, somebody would have to rewrite virtually every R package or SAS/STAT procedure, since most of their underlying code/algorithms are simply not map-reduce compatible. That is industry-scale development work.

What Revolution R has achieved is a small piece of that endeavor: they rewrote a few R packages with an implementation drastically different from free R. The goal is to avoid loading everything into memory upfront and instead process data in chunks, while resembling the current R user interface as much as possible.

I have used SAS and R for many years in the financial/pharmaceutical/insurance industries, but recent years have witnessed explosive growth of data. In one of my recent projects for an e-commerce company, we have 10GB of data coming in every day. Hadoop has become the de facto platform for us.

I have been using RevoR since 2009 and Mahout since 2011. I like the ease and simplicity of RevoR. But I also believe that Mahout is very promising (maybe the best shot so far) for solving the analytical side of the big data problem.

Just my 2 cents. Happy Holidays to everyone!
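To make the "process data in chunks" idea concrete, here is a minimal sketch of a simple linear regression that never holds more than one chunk in memory: it accumulates a handful of running sums and solves for the slope and intercept at the end. This is only an illustration of the general technique, written in Python; it is not Revolution R's actual implementation, and the names `chunks` and `fit_line_in_chunks` are hypothetical.

```python
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def fit_line_in_chunks(rows, chunk_size=1000):
    """Least-squares fit of y = slope * x + intercept over (x, y) rows.

    Only running sums are kept between chunks, so memory use is
    bounded by chunk_size rather than by the size of the data set.
    """
    n = sx = sy = sxx = sxy = 0.0
    for block in chunks(rows, chunk_size):
        for x, y in block:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # A generator stands in for a file too large to load at once.
    data = ((x, 2 * x + 1) for x in range(10))
    print(fit_line_in_chunks(data, chunk_size=3))
```

The fit works in chunks only because it reduces to sums that can be accumulated piece by piece. Procedures that need the whole data set at once (an exact median, for example) do not decompose this way, which is the rewriting effort described in the post above.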
n*****3, posts: 1584	5 Nice, thanks for sharing with us. May I ask: what if you want some other algorithms that are not part of Mahout? Do you write the algorithm from scratch? Would that be easy in the Mahout environment? [In reply to S******y]
S******y, posts: 1123	6 It probably requires quite a bit of work. If you can come up with something like that, you can probably contribute it to Mahout and also publish your work in academic/industry journal(s) :-) [In reply to n*****3]
