l******9 Posts: 579 | 1 I have two CSV files (800 million rows, 3 columns), each about 30 GB.
My computer has 8 GB of RAM.
How can I load and process/analyze files like this in R?
Thanks |
g*****o Posts: 812 | 2 Why not load them into a database first, then pull just the relevant data with R or Python...
【Quoting l******9】 : I have two CSV files (800 million rows, 3 columns), each about 30 GB. : My computer has 8 GB of RAM. : How can I load and process/analyze files like this in R? : Thanks
|
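The "load it into a database first" suggestion can be sketched in Python with only the standard library. This is a minimal sketch, not the poster's actual workflow: the file name, table name, and batch size are assumptions, and the point is that the CSV is streamed in fixed-size batches, so memory use stays flat no matter how large the file is.

```python
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path, table="data", batch=100_000):
    """Stream a CSV into a SQLite table in batches so memory use stays flat."""
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        rows = []
        for row in reader:
            rows.append(row)
            if len(rows) >= batch:  # flush a full batch, then forget it
                con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
                rows.clear()
        if rows:  # flush the final partial batch
            con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    con.commit()
    con.close()
```

Once the data is in SQLite (or any other database), R can query it through DBI/RSQLite instead of reading the raw CSV.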
l******9 Posts: 579 | 3 You mean I should load the files into a database and then analyze the data with SQL from R or
Python?
【Quoting g*****o】 : Why not load them into a database first, then pull just the relevant data with R or Python...
|
g*****o Posts: 812 | 4 Exactly — that's what databases are for.
【Quoting l******9】 : You mean I should load the files into a database and then analyze the data with SQL from R or : Python?
|
h***i Posts: 3844 | 5 Agreed. Analyzing data this large directly in R is asking too much of it.
【Quoting g*****o】 : Exactly — that's what databases are for.
|
n*****3 Posts: 1584 | 6 If you stick with R,
adding more memory is the cheapest option.
【Quoting l******9】 : I have two CSV files (800 million rows, 3 columns), each about 30 GB. : My computer has 8 GB of RAM. : How can I load and process/analyze files like this in R? : Thanks
|
w****r Posts: 28 | 7 After loading, the data may be much smaller, because CSV stores numbers as text.
For example, 3.33333333 takes several bytes in a CSV, but in R it's a single double.
I once had a 100 MB CSV that was only 2 MB once saved from R. |
G**Y Posts: 33224 | 8 A database is still the more reliable route;
R's random access is poor.
【Quoting w****r】 : After loading, the data may be much smaller, because CSV stores numbers as text. : For example, 3.33333333 takes several bytes in a CSV, but in R it's a single double. : I once had a 100 MB CSV that was only 2 MB once saved from R.
|
S******y Posts: 1123 | 9 If you prefer using R for modeling, you may need to perform data reduction
first (you can do it in Python if you like).
-----------------------------------------------------------------
Check out my hands-on Python/R/Hadoop crash course (+ industry coding examples)
http://plus.google.com/+statsGuyMITBBS/about
----------------------------------------------------------------- |
b*********n Posts: 2284 | 10 Try the bigmemory and ff packages in R first. If they don't work, I suggest you
use Perl or another scripting language to chop the files into small pieces, then use
R to analyze them.
【Quoting l******9】 : I have two CSV files (800 million rows, 3 columns), each about 30 GB. : My computer has 8 GB of RAM. : How can I load and process/analyze files like this in R? : Thanks
|
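The "chop the files into small pieces" step doesn't have to be Perl — here is one possible sketch in Python. The input path, part prefix, and part size are placeholders; each part repeats the header row so that every piece is a valid standalone CSV for R to read.

```python
def split_csv(path, lines_per_part, prefix="part"):
    """Split a large CSV into numbered parts, each repeating the header row."""
    with open(path) as f:
        header = f.readline()
        part, out, count = 0, None, 0
        for line in f:
            if out is None:  # start a new part lazily
                out = open(f"{prefix}_{part:04d}.csv", "w")
                out.write(header)
            out.write(line)
            count += 1
            if count >= lines_per_part:  # part is full: close it, move on
                out.close()
                out, count = None, 0
                part += 1
        if out is not None:  # close the final, possibly short, part
            out.close()
```

Each part can then be read into R one at a time (e.g. in a loop over `read.csv`), with partial results accumulated instead of the full data.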
l******9 Posts: 579 | 11 Thanks!
Although I can load the data into a database, I still need to get it into a data.
frame in R so that I can do the analysis.
So do I have to install R on the database server?
What if I have to use R on my laptop?
Also, R has some limits on the size of data.frames and vectors.
For example, vector length is capped at 2^31 − 1 elements (if I remember correctly).
Any help would be appreciated.
【Quoting b*********n】 : Try the bigmemory and ff packages in R first. If they don't work, I suggest you : use Perl or another scripting language to chop the files into small pieces, then use : R to analyze them.
|
e**********y Posts: 49 | 12 The only reason you would need to load all the data into R is if the analysis is
based on all of it and cannot be divided.
Not necessarily — you can select just the data you need from the database into R, which
means only a subset of your data ever gets loaded.
【Quoting l******9】 : Thanks! : Although I can load the data into a database, I still need to get it into a data.frame in R so that I can do the analysis. : So do I have to install R on the database server? : What if I have to use R on my laptop? : Also, R has some limits on the size of data.frames and vectors. : For example, vector length is capped at 2^31 − 1 elements (if I remember correctly). : Any help would be appreciated.
|
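The subsetting idea above can be sketched like this: push the filter into SQL so only matching rows ever reach the analysis environment. The table and column names are hypothetical; from R, the same query would go through DBI/RSQLite, but the principle is identical.

```python
import sqlite3

def load_subset(db_path, lo, hi):
    """Pull only the rows the analysis needs; the filtering runs inside SQL."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT a, b, c FROM data WHERE a BETWEEN ? AND ?",  # filter in the DB
        (lo, hi),
    ).fetchall()
    con.close()
    return rows
```

Because the WHERE clause runs in the database, a 30 GB table can be reduced to a few thousand rows before anything is materialized in R or Python — and no R installation is needed on the database server.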
h**t Posts: 1678 | 13 Before loading the data you may have to increase the memory R is allowed to use with
memory.limit(size=...), where size is in MB (note this applies to R on Windows).
【Quoting l******9】 : Thanks! : Although I can load the data into a database, I still need to get it into a data.frame in R so that I can do the analysis. : So do I have to install R on the database server? : What if I have to use R on my laptop? : Also, R has some limits on the size of data.frames and vectors. : For example, vector length is capped at 2^31 − 1 elements (if I remember correctly). : Any help would be appreciated.
|