Pail issues

vmar...@gmail.com

May 5, 2013, 5:02:02 PM

to cascal...@googlegroups.com

Hi,

I am following the examples from Marz's book and playing with Pail and the graph Thrift schema.

I noticed the following issues; please correct me if I'm wrong:

1. There is no way to specify a Hadoop Configuration for Pail's utility MR jobs (for example, snapshot()). I peeked into the code, and I can see that the JobConf is created without taking a Configuration parameter from anywhere. The only way I could set the JobTracker address and other Hadoop properties was to place core-site.xml and mapred-site.xml in the root of my classpath, because that is the default location that is checked automatically.
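
As a quick diagnostic for the classpath workaround in point 1, the sketch below checks whether the two site files are visible at the classpath root. It uses only the JDK; the class name ClasspathCheck and the method isOnClasspath are illustrative and are not part of Pail or Hadoop. It assumes, as described above, that the files are picked up as classpath resources.

```java
import java.net.URL;

public class ClasspathCheck {
    // Returns true if the named resource can be found at the classpath root.
    public static boolean isOnClasspath(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ClasspathCheck.class.getClassLoader();
        }
        URL url = (cl != null)
                ? cl.getResource(resourceName)
                : ClassLoader.getSystemResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // The two file names come from the post; everything else is illustrative.
        for (String name : new String[] {"core-site.xml", "mapred-site.xml"}) {
            System.out.println(name + " on classpath: " + isOnClasspath(name));
        }
    }
}
```

If either line reports false when run with the same classpath as the job, the utility jobs would presumably fall back to Hadoop's built-in defaults.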

2. A source PailTap should read stored records from the specified Pail using the registered structure (in pail.meta), so the path to an existing Pail alone should be sufficient. Even the EAP version of the Big Data book says so, giving an example of a source PailTap along these lines:

Tap sourceTap = new PailTap("/path/to/source/pail");

Unfortunately, the constructor that takes only a String path doesn't work: it does not use the registered PailStructure while reading, so errors are thrown when I try to sink the sourced data into a target PailTap, such as:

Caused by: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to com.marcinko.bigdatalab.domain.Data

(Just a note: both the sink and source PailTap use my data object.)

After switching to the other constructor for my source PailTap, the one that takes PailTap.PailTapOptions as its second argument, things worked:

public static PailTap constructSourcePailTap(Pail pail) {
    PailTap.PailTapOptions opts = new PailTap.PailTapOptions();
    opts.spec = pail.getSpec();
    return new PailTap(pail.getRoot(), opts);
    // return new PailTap(pail.getRoot()); // this doesn't work!!!
}
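
The ClassCastException from point 2 can be reproduced in miniature without Hadoop. The sketch below is a plain-JDK analogy, not Pail's actual code: Structure, RawStructure, DataStructure, Data, and read are all hypothetical stand-ins. It assumes that a reader with no structure configured hands back raw bytes, while a reader given the registered structure deserializes them into the domain type.

```java
import java.nio.charset.StandardCharsets;

public class StructureDemo {
    // Hypothetical stand-in for a domain object such as the Thrift Data class.
    static final class Data {
        final String payload;
        Data(String payload) { this.payload = payload; }
    }

    // Hypothetical stand-in for a record structure: turns stored bytes into an object.
    interface Structure {
        Object deserialize(byte[] stored);
    }

    // Default behaviour in this analogy: return the stored bytes untouched.
    static final class RawStructure implements Structure {
        public Object deserialize(byte[] stored) { return stored; }
    }

    // Structure-aware behaviour: decode the bytes into a Data instance.
    static final class DataStructure implements Structure {
        public Object deserialize(byte[] stored) {
            return new Data(new String(stored, StandardCharsets.UTF_8));
        }
    }

    // Reads one record for a consumer that expects Data.
    static Data read(Structure structure, byte[] stored) {
        // The cast fails at runtime if the structure returned raw bytes.
        return (Data) structure.deserialize(stored);
    }

    public static void main(String[] args) {
        byte[] stored = "edge:1->2".getBytes(StandardCharsets.UTF_8);
        System.out.println(read(new DataStructure(), stored).payload);
        try {
            read(new RawStructure(), stored);
        } catch (ClassCastException e) {
            System.out.println("raw read failed: ClassCastException");
        }
    }
}
```

In this analogy, passing the spec through the options object plays the role of choosing DataStructure over RawStructure.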

Regards,

Vjeran

Sam Ritchie

May 5, 2013, 11:02:33 PM

to cascal...@googlegroups.com

Nathan, it would be great if you could release a bit more information about Pail -- it seems that many, many people have questions, so I'm not sure that the documentation in "Big Data" is really up to date anymore. (The package root has changed from "backtype" to "com.backtype", for example.)

--
You received this message because you are subscribed to the Google Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascalog-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie

David Kincaid

May 7, 2013, 9:33:54 AM

to cascal...@googlegroups.com

I've been spending a fair amount of time in the source code and working with Pail for a project we just started that will launch later this year. I'd be willing to help out on some documentation.

Dave

Sam Ritchie

May 7, 2013, 10:40:26 AM

to cascal...@googlegroups.com

David, that would be fantastic. Earlier this year I merged the dfs-datastores and dfs-datastores-cascading projects into the main GitHub repo:

https://github.com/nathanmarz/dfs-datastores/wiki

This is the right place for documentation, I think.