Next meetup (11th Aug)


Makoto

Jul 28, 2010, 2:56:22 PM
to nosql-summer-london
Hello.

Thank you very much for attending today's meetup.

For the next meetup, we will follow Dan's suggestion of a "Query" theme.

Here are the related papers Dan suggested.

* MapReduce,
http://nosqlsummer.org/paper/google-mapreduce
* The 1995 SQL Reunion: People, Projects, and Politics,
http://nosqlsummer.org/paper/1995-sql-reunion
* Yahoo's Pig,
http://www.cs.cmu.edu/~olston/publications/sigmod08.pdf

During the meetup, people suggested looking into various query
languages, such as the following.

http://labs.google.com/papers/sawzall.html
http://code.google.com/apis/bigquery
http://wiki.apache.org/hadoop/Hive

If you know of any other interesting query languages, please suggest them.

Thanks.

Makoto

Dan

Jul 28, 2010, 6:14:15 PM
to nosql-sum...@googlegroups.com
Hey,

Thanks for picking this, everyone!

Look forward to discussing these in a few weeks.

Dan

Makoto

Aug 7, 2010, 4:57:49 AM
to nosql-summer-london
Hello.
Has anyone tried installing Pig?
I just installed Hadoop and Pig on my MacBook.
Hadoop DFS seems to be working fine, but Pig does not recognize it (it
connects to file:/// instead of HDFS).

$ hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2010-08-07 08:51 /user/hadoop/mv_dir

$ pig
10/08/07 09:55:24 INFO pig.Main: Logging error messages to: /Users/hadoop/src/pig-0.7.0/conf/pig_1281171324871.log
2010-08-07 09:55:25,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///


Here are my environment variables. Let me know if anyone has encountered
the same problem.

$ env
HADOOP_HOME=/Users/hadoop/src/hadoop-0.20.2
PIGDIR=/Users/hadoop/src/pig-0.7.0
PIG_HADOOP_VERSION=20
USER=hadoop
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/git/bin:/usr/X11/bin:/home/hadoop/hadoop-0.20.2/bin:/Users/hadoop/src/hadoop-0.20.2/bin:/Users/hadoop/src/hadoop-0.20.2/bin:/Users/hadoop/src/pig-0.7.0/bin:/Users/hadoop/src/hadoop-0.20.2/bin:/Users/hadoop/src/pig-0.7.0/bin
JAVA_HOME=/Library/Java/Home
PIG_CLASSPATH=/Users/hadoop/src/hadoop-0.20.2

Thanks.

Makoto

Dan

Aug 7, 2010, 6:27:36 AM
to nosql-sum...@googlegroups.com
I guess Pig can't find the Hadoop settings. I think you have to add the Hadoop conf path to PIG_CLASSPATH. Here's my PIG_CLASSPATH:

PIG_CLASSPATH=/etc/hadoop/conf:/usr/share/java/zookeeper.jar:/usr/share/java/hbase.jar:/usr/share/java/hbase-test.jar

Also double-check that Pig is set to use HDFS in its config file; I think it is by default.

Hope that helps,

Makoto Inoue

Aug 7, 2010, 4:07:21 PM
to nosql-sum...@googlegroups.com
Hi, Dan.
Thank you for your advice. When I include the conf dir, Pig does not even start up ;-(

However, I can at least start it in local mode, where I can run various Pig commands, so that's good enough.

One question about Pig, though.
It looks like Pig does not have any time-related types. Is time-based log analysis a good fit for Pig?

As an example, I wanted to analyse some of our log data, like this:

+-----+--------+----------+---------------------+
| id  | app_id | messages | created_at          |
+-----+--------+----------+---------------------+
|   1 |      4 |        0 | 2010-05-10 10:55:30 |
|   2 |      1 |       81 | 2010-05-10 10:55:30 |
|   3 |      4 |       11 | 2010-05-10 10:55:38 |
|   4 |      1 |       25 | 2010-05-10 10:55:38 |
|   5 |      4 |        0 | 2010-05-10 10:55:43 |
|   6 |      1 |        2 | 2010-05-10 10:55:43 |
|   7 |      4 |        0 | 2010-05-10 10:55:48 |
|   8 |      1 |        7 | 2010-05-10 10:55:48 |
|   9 |      4 |        0 | 2010-05-10 10:55:53 |

I wanted to group these by app_id and by various time ranges (1 min, 5 min, 1 hr, 1 day, 1 week, 1 month) and see the avg, min, max, sd, etc. for each range.

There are several ways to handle this using SQL (http://stackoverflow.com/questions/1607143/mysql-group-by-intervals-in-a-date-range), but none of them seem straightforward.
Do you know whether this kind of use is a good fit for map/reduce or Pig, or is it not much different from SQL?
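
To make the question concrete, here is a minimal sketch of the grouping in plain Python rather than Pig or SQL. It treats each timestamp as seconds since 1970-01-01 and rounds it down to the start of a fixed-width bucket (the "map" key), then computes the statistics per (app_id, bucket) group (the "reduce" step). The names ROWS, bucket_start, and summarize are hypothetical and exist only for this illustration; the rows are copied from the table above, and "sd" is assumed to mean the population standard deviation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

# Sample rows (id, app_id, messages, created_at) copied from the table above.
ROWS = [
    (1, 4, 0, "2010-05-10 10:55:30"),
    (2, 1, 81, "2010-05-10 10:55:30"),
    (3, 4, 11, "2010-05-10 10:55:38"),
    (4, 1, 25, "2010-05-10 10:55:38"),
    (5, 4, 0, "2010-05-10 10:55:43"),
    (6, 1, 2, "2010-05-10 10:55:43"),
    (7, 4, 0, "2010-05-10 10:55:48"),
    (8, 1, 7, "2010-05-10 10:55:48"),
    (9, 4, 0, "2010-05-10 10:55:53"),
]

EPOCH = datetime(1970, 1, 1)


def bucket_start(created_at, width_seconds):
    """Round a 'YYYY-MM-DD HH:MM:SS' string down to the start of its bucket."""
    ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
    seconds = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % width_seconds)


def summarize(rows, width_seconds):
    """Group message counts by (app_id, bucket start) and compute statistics."""
    groups = defaultdict(list)
    for _id, app_id, messages, created_at in rows:
        key = (app_id, bucket_start(created_at, width_seconds))
        groups[key].append(messages)
    return {
        key: {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "sd": pstdev(values),  # population standard deviation (assumption)
        }
        for key, values in groups.items()
    }


if __name__ == "__main__":
    # One-minute buckets; pass 300 for 5 min, 3600 for 1 hr, 86400 for 1 day.
    for key, stats in sorted(summarize(ROWS, 60).items()):
        print(key, stats)
```

Fixed-width buckets cover the 1 min, 5 min, 1 hr, and 1 day cases directly. Weeks computed this way start on a Thursday (1970-01-01 was a Thursday), and months are not a fixed width, so those two ranges would need a calendar-based key instead, for example (year, month).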

BTW, I also had a quick look at Hive. Its queries really do look like SQL; it's hard to even find the difference.


Cheers.

Makoto

Dan

Aug 11, 2010, 1:41:57 PM
to nosql-sum...@googlegroups.com
Hey,

Not sure if you'll get this in time. Unfortunately, due to things at work, I can't come again; startups may be good fun, but they do suck up time :-(

So sorry I suggested this and can't contribute! Hope the discussion goes well.

Thanks,

Makoto Inoue

Aug 11, 2010, 3:08:49 PM
to nosql-sum...@googlegroups.com
No worries Dan.

There were fewer people, but I certainly enjoyed the discussion.

I just posted the next meeting schedule.

Cheers.

Makoto