Hi, Dan.
Thank you for your advice. When I include the conf dir, Pig does not even start up ;-(.
However, I can at least start it in local mode, where I can run various Pig commands, so that's good enough.
One question about Pig, though.
It looks like Pig does not have any time-related types. Is time-based log analysis a good fit for Pig?
As an example, I want to analyse some of our log data, which looks like this.
+----+--------+----------+---------------------+
| id | app_id | messages | created_at          |
+----+--------+----------+---------------------+
|  1 |      4 |        0 | 2010-05-10 10:55:30 |
|  2 |      1 |       81 | 2010-05-10 10:55:30 |
|  3 |      4 |       11 | 2010-05-10 10:55:38 |
|  4 |      1 |       25 | 2010-05-10 10:55:38 |
|  5 |      4 |        0 | 2010-05-10 10:55:43 |
|  6 |      1 |        2 | 2010-05-10 10:55:43 |
|  7 |      4 |        0 | 2010-05-10 10:55:48 |
|  8 |      1 |        7 | 2010-05-10 10:55:48 |
|  9 |      4 |        0 | 2010-05-10 10:55:53 |
+----+--------+----------+---------------------+
I want to group these rows by app_id and by various time ranges (1 min, 5 min, 1 hr, 1 day, 1 week, 1 month) and see the avg, min, max, standard deviation, etc. of messages for each time range.
Do you know whether this kind of use is a good fit for map/reduce or Pig, or whether it is not much different from doing it in SQL?
BTW, I also had a quick look at Hive. It really does look like SQL. It's even hard to spot the difference.
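To make the grouping concrete, here is a rough sketch in plain Python (not Pig) of the computation I have in mind, using the sample rows above. The names bucket_start and summarise are made up for illustration, and it only handles fixed-length windows given in seconds; calendar months would need a different bucketing rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

# Sample rows copied from the table above: (id, app_id, messages, created_at)
ROWS = [
    (1, 4, 0, "2010-05-10 10:55:30"),
    (2, 1, 81, "2010-05-10 10:55:30"),
    (3, 4, 11, "2010-05-10 10:55:38"),
    (4, 1, 25, "2010-05-10 10:55:38"),
    (5, 4, 0, "2010-05-10 10:55:43"),
    (6, 1, 2, "2010-05-10 10:55:43"),
    (7, 4, 0, "2010-05-10 10:55:48"),
    (8, 1, 7, "2010-05-10 10:55:48"),
    (9, 4, 0, "2010-05-10 10:55:53"),
]

# Fixed reference point used only to floor timestamps to window boundaries.
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts, window_seconds):
    """Floor a timestamp to the start of its fixed-length window."""
    elapsed = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=elapsed - elapsed % window_seconds)


def summarise(rows, window_seconds):
    """Group messages by (app_id, window start) and compute basic stats."""
    groups = defaultdict(list)
    for _id, app_id, messages, created_at in rows:
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        groups[(app_id, bucket_start(ts, window_seconds))].append(messages)
    return {
        key: {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "sd": pstdev(values),  # population standard deviation
        }
        for key, values in groups.items()
    }


if __name__ == "__main__":
    # 1-minute windows; pass 300 for 5 min, 3600 for 1 hr, and so on.
    for key, stats in sorted(summarise(ROWS, 60).items()):
        print(key, stats)
```

In Pig I assume the same idea would mean deriving the window start from created_at (stored as a string or number) and then grouping on (app_id, window start).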
Cheers.
Makoto