Statistics board - big data analysis in Revolution R
x**g
Posts: 807
1
Has anybody used Revolution R? If so, do you think it is a solution for
overcoming the limited memory problem in R?
Thank you.
d*l
Posts: 400
2
Well, I listened to their talk about RHadoop at a conference last year, and
then tried RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki)
a couple of months ago.
My impression is that it is at a very early stage of development and not
very useful at the moment. It is not transparent to the programmer, and I
guess it would take many people many years to port a good-sized pool of R
packages to Map/Reduce; it is not there yet. Basically, it is just an
abstraction layer on top of Hadoop streaming, and I find that using plain
Hadoop streaming with R is actually easier, more straightforward, and more
flexible.
I am interested in knowing others' opinions.
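
For readers unfamiliar with the plain Hadoop streaming approach mentioned above: a streaming job is essentially a mapper and a reducer that read text lines and write tab-separated key/value lines, with the framework sorting by key in between. The sketch below is in Python rather than R, and the `key,value` record format, the function names, and the local `run_locally` driver are all assumptions made for illustration; none of it comes from RHadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'key<TAB>value' line per well-formed 'key,value' record."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 2:
            continue  # skip malformed records
        key, value = fields
        yield f"{key}\t{value}"


def reducer(sorted_lines):
    """Sum the values for each key; the input must already be sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Hypothetical local driver: sorted() stands in for Hadoop's shuffle/sort."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_locally(["a,1", "b,2", "a,3", "bad line", "b,0.5"]):
        print(out)
```

In a real streaming job the same two functions would read from standard input and write to standard output, and the scripts would be handed to Hadoop streaming as the mapper and reducer commands.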

[x**g wrote:]
: Has anybody used Revolution R? If so, do you think it is a solution for
: overcoming the limited memory problem in R?
: Thank you.

s*********e
Posts: 1051
3
Agreed. Sooner or later they will face a copyright lawsuit.

[d*l wrote:]
: Well, I listened to their talk about RHadoop at a conference last year, and
: then tried RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki)
: a couple of months ago.
: My impression is that it is at a very early stage of development and not
: very useful at the moment. It is not transparent to the programmer, and I
: guess it would take many people many years to port a good-sized pool of R
: packages to Map/Reduce; it is not there yet. Basically, it is just an
: abstraction layer on top of Hadoop streaming, and I find that using plain
: Hadoop streaming with R is actually easier, more straightforward, and more
: flexible.

S******y
Posts: 1123
4
Interesting topic :-)
Many people expect that something will come along that lets users simply
plug in R or SAS and make all existing functions/packages/procedures run on
Hadoop-scale data, "solving" the ultimate data size problem.
Unfortunately, there is no such thing. To achieve that, somebody would have
to rewrite virtually every R package or SAS/STAT procedure, since most of
their underlying code/algorithms are simply not map-reduce compatible.
That is an industry-scale development effort.
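
A small illustration of what "not map-reduce compatible" means in practice, written in Python as a sketch with made-up numbers and hypothetical function names: a mean can be rebuilt exactly from small per-chunk summaries, while a median cannot be rebuilt from per-chunk medians, so an algorithm that relies on it has to be redesigned rather than simply redistributed.

```python
from statistics import median


def partial_mean_state(chunk):
    """Map step: reduce a chunk to a small (sum, count) summary."""
    return (sum(chunk), len(chunk))


def combine_mean_states(states):
    """Reduce step: (sum, count) pairs merge exactly, in any order."""
    total = sum(s for s, _ in states)
    count = sum(n for _, n in states)
    return total / count


# Illustrative data split across two "nodes" (values are arbitrary).
chunks = [[1, 2, 3], [4, 5, 6, 7, 100]]
all_values = [x for chunk in chunks for x in chunk]

# The mean decomposes: combining partial summaries gives the exact answer.
mean_from_parts = combine_mean_states([partial_mean_state(c) for c in chunks])
exact_mean = sum(all_values) / len(all_values)

# The median does not: the median of per-chunk medians is a different number.
median_of_medians = median([median(c) for c in chunks])
exact_median = median(all_values)

print(mean_from_parts == exact_mean)
print(median_of_medians, exact_median)
```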
What Revolution R has achieved is a small piece of that endeavor: they
rewrote a few R packages with an implementation drastically different from
free R's. The goal is to avoid loading everything into memory up front and
instead process data in chunks, while resembling the current R user
interface as much as possible.
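
The general idea of processing data in chunks instead of loading it all into memory can be sketched as follows. This is a generic Python illustration, not Revolution R's actual implementation: it fits a one-variable least-squares line from running sums, and `synthetic_rows` is a hypothetical generator standing in for a file too large to load at once. The textbook formula used here is numerically naive and is only meant to show the pattern.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows without materializing all rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def fit_line_in_chunks(rows, chunk_size=1000):
    """Fit y = intercept + slope * x from running sums, one chunk at a time."""
    n = sum_x = sum_y = sum_xx = sum_xy = 0.0
    for chunk in iter_chunks(rows, chunk_size):
        # Only this chunk and five running totals are held in memory.
        for x, y in chunk:
            n += 1
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def synthetic_rows(count):
    """Hypothetical data source: yields (x, y) pairs lying on y = 3 + 2x."""
    for i in range(count):
        yield (float(i), 3.0 + 2.0 * i)


intercept, slope = fit_line_in_chunks(synthetic_rows(10_000), chunk_size=256)
print(intercept, slope)
```

The point of the design is that the memory footprint depends on the chunk size, not on the number of rows, which is the property the post attributes to the rewritten packages.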
I have used SAS and R for many years in the financial, pharmaceutical, and
insurance industries, but recent years have witnessed explosive growth in
data. In one of my recent projects for an e-commerce company, we have 10 GB
of data coming in every day. Hadoop has become the de facto platform for us.
I have been using RevoR since 2009 and Mahout since 2011. I like the ease and
simplicity of RevoR, but I also believe that Mahout is very promising (maybe
the best shot so far) for solving the analytical side of the big data problem.
Just my 2 cents.
Happy holidays to everyone!
n*****3
Posts: 1584
5
Nice, thanks for sharing with us.
May I ask: what if you want other algorithms that are not part of Mahout?
Do you write the algorithm from scratch? Would that be easy in the Mahout
environment?

[S******y wrote:]
: Interesting topic :-)
: Many people expect that something will come along that lets users simply
: plug in R or SAS and make all existing functions/packages/procedures run on
: Hadoop-scale data, "solving" the ultimate data size problem.
: Unfortunately, there is no such thing. To achieve that, somebody would have
: to rewrite virtually every R package or SAS/STAT procedure, since most of
: their underlying code/algorithms are simply not map-reduce compatible.
: That is an industry-scale development effort.
: What Revolution R has achieved is a small piece of that endeavor: they
: rewrote a few R packages with an implementation drastically different from free

S******y
Posts: 1123
6
It would probably require quite a bit of work.
If you can come up with something like that, you can probably contribute it
to Mahout and also publish your work in academic/industry journal(s) :-)

[n*****3 wrote:]
: Nice, thanks for sharing with us.
: May I ask: what if you want other algorithms that are not part of Mahout?
: Do you write the algorithm from scratch? Would that be easy in the Mahout
: environment?
