Learning Pig Latin - DataSciences Board - Weiming Archive

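This page is an excerpt and archive of the corresponding thread on Weiming Space. Posts less than a week old show at most 50 characters; older posts show up to 500 characters. See the original thread for the full text.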

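r*****d (posts: 346)  #1
Hi all, what are good ways and good resources for learning Pig Latin? I'm a complete beginner with Pig Latin. I looked on Amazon, and none of the books seem to have especially good reviews. Thanks!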
c***z (posts: 6348)  #2
You need the following:
1. An editor. I use Sublime Text 2; the Cloudera package uses gedit.
2. A cluster with Pig installed on the edge nodes. You can use the VM in the Cloudera package.
3. A file-transfer tool to move Pig code from your local drive (if you edit locally) to the edge node. I use WinSCP; the Cloudera package uses Hue.
4. A way to run the code on the edge node. I use PuTTY; the Cloudera package uses Hue.
My workflow: write Pig code locally in Sublime Text 2, upload it to the edge node with WinSCP, and run it on the edge node through PuTTY.
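For readers who prefer a terminal to WinSCP and PuTTY, the upload-then-run workflow above can be sketched with the OpenSSH command-line tools. This is a minimal sketch under stated assumptions: `scp` and `ssh` are on your PATH, the edge node runs a script file with `pig -f <script>`, and the host name `edge-node.example.com`, the remote directory, and the helper names `build_pig_workflow` and `run_workflow` are hypothetical. By default it only prints the commands (a dry run).

```python
import shlex
import subprocess
from pathlib import Path, PurePosixPath


def build_pig_workflow(local_script: str, host: str, remote_dir: str) -> list[list[str]]:
    """Return the upload and run commands for one Pig script (hypothetical helper)."""
    script_name = Path(local_script).name              # file name on the local machine
    remote_path = str(PurePosixPath(remote_dir) / script_name)  # edge node is assumed to be Linux
    upload = ["scp", local_script, f"{host}:{remote_path}"]
    # ssh hands the remote command to the remote shell as one string, so quote the path.
    run = ["ssh", host, f"pig -f {shlex.quote(remote_path)}"]
    return [upload, run]


def run_workflow(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    cmds = build_pig_workflow("wordcount.pig", "edge-node.example.com", "/home/me/pig")
    run_workflow(cmds)  # pass dry_run=False to actually upload and run
```

Keeping the command construction separate from execution makes it easy to inspect (or save to a .sh file) exactly what will run before touching the cluster.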
c***z (posts: 6348)  #3
Also, here is my workflow using Scalding (i.e., Scala on Hadoop):
1. Edit and compile in IntelliJ or another IDE, or edit in any text editor and compile in a terminal (CMD on Windows). Setting up IntelliJ is complicated and beyond the scope of this post.
2. Upload the jar file to the edge node; I use WinSCP. Optionally, push the code to GitHub for version control.
3. Run the jar on the edge node through PuTTY, with the input path, output path, and any other arguments. I save the command to a .sh file for reuse (you could also keep it in a .txt file and copy and paste it). The command looks something like:
   hadoop jar myjar.jar packagename.functionname --input "myinputpath/part" --output "myoutputpath" --hdfs
4. To get the output into a local text file, use something like:
   hadoop fs -cat "myoutputpath/part" > myresult.tsv
I prefer .tsv over .csv because commas can appear inside numbers such as 133,010 and break the column splitting.
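The reason for preferring .tsv can be shown in a few lines. This is a minimal sketch; the record `["user42", "133,010", "NYC"]` and the helper names are hypothetical. A naive comma split turns one number into two fields, a tab split keeps it intact, and a quoting-aware CSV writer/reader pair also works, but only if every downstream consumer honors the quotes. (Tab is also the default delimiter of Pig's PigStorage.)

```python
import csv
import io


def split_naive_csv(line: str) -> list[str]:
    """Split on every comma, as a quick-and-dirty parser would."""
    return line.split(",")


def split_tsv(line: str) -> list[str]:
    """Split on tabs only."""
    return line.split("\t")


row = ["user42", "133,010", "NYC"]  # hypothetical record with a thousands separator

naive_line = ",".join(row)  # no quoting: the embedded comma looks like a delimiter
tsv_line = "\t".join(row)   # the comma stays inside its field

# Quoted CSV round-trips, but only through a reader that understands the quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted_line = buf.getvalue().rstrip("\r\n")
quoted_fields = next(csv.reader([quoted_line]))

print(split_naive_csv(naive_line))  # ['user42', '133', '010', 'NYC']  (4 fields, wrong)
print(split_tsv(tsv_line))          # ['user42', '133,010', 'NYC']     (3 fields, right)
print(quoted_line)                  # user42,"133,010",NYC
print(quoted_fields)                # ['user42', '133,010', 'NYC']
```

TSV avoids the problem without relying on every tool in the pipeline (shell utilities, spreadsheets, loaders) to implement CSV quoting the same way.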
g*******t (posts: 7704)  #4
Pig is so, so slow.
s*********e (posts: 1051)  #5
For big-data processing, Pig is not slow. Time it yourself and compare it against R or Python.
[Quoting g*******t:]
: Pig is so, so slow.
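The suggestion above to time Pig yourself against R or Python can be done with a small harness. This is a minimal sketch, assuming each job is launched as a command; the commented-out `wordcount.pig` entry and the helper name `time_command` are hypothetical, and the Python entry is only a runnable placeholder. In MapReduce mode each Pig script launches one or more Hadoop jobs with a fixed startup cost, which tends to dominate on small inputs, so the comparison is only meaningful on the same input at a realistic size.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 3) -> float:
    """Run cmd `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Replace these with your own jobs, all reading the SAME input data.
    candidates = {
        "python": [sys.executable, "-c", "print(sum(range(10**6)))"],
        # "pig": ["pig", "-f", "wordcount.pig"],  # hypothetical script name
    }
    for name, cmd in candidates.items():
        print(f"{name}: {time_command(cmd):.2f} s")
```

Taking the median of several runs reduces the influence of one-off effects such as a cold file cache or a busy cluster.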
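l*******m (posts: 1096)  #6
Online tutorials are good enough to get started if you know SQL and a bit of MapReduce.
[Quoting r*****d:]
: Hi all, what are good ways and good resources for learning Pig Latin? I'm a complete beginner with Pig Latin. I looked on
: Amazon, and none of the books seem to have especially good reviews.
: Thanks!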
r*****d (posts: 346)  #7
Thanks for all the replies! Update: I found the book "Programming Pig" very helpful, especially the first six chapters. Picking up Pig as yet another tool is not demanding. :)
