Hi,
I am following the examples from Marz's book and playing with Pail and the graph Thrift schema.
I noticed the following issues; please correct me if I'm wrong:
1. There is no way to specify a Hadoop Configuration for Pail's utility MR jobs (for example, snapshot()). I looked into the code, and the JobConf is created without taking a Configuration parameter from anywhere, so the only way to set the JobTracker address and other Hadoop properties was to place core-site.xml and mapred-site.xml in the root of my classpath, because that is the default location Hadoop checks automatically.
2. A source PailTap should read stored records from the specified Pail using the registered structure (in pail.meta), so the path to an existing Pail alone should be sufficient. Even the EAP version of the Big Data book says so, giving an example of a source PailTap, something like:
Tap sourceTap = new PailTap("/path/to/source/pail");
Unfortunately, this constructor, which takes only a String path, doesn't work: it does not use the registered PailStructure while reading, so it throws errors when I try to sink that sourced data into a target PailTap, such as:
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to com.marcinko.bigdatalab.domain.Data
(Just a note: both the sink and source PailTap use my Data object.)
After switching my source PailTap to the other constructor, the one that takes PailTap.PailTapOptions as its second argument, things worked:
public static PailTap constructSourcePailTap(Pail pail) {
    PailTap.PailTapOptions opts = new PailTap.PailTapOptions();
    opts.spec = pail.getSpec();
    return new PailTap(pail.getRoot(), opts);
    // return new PailTap(pail.getRoot()); // this doesn't work!!!
}
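Regarding point 1, the classpath workaround only helps if the two files are really visible at the classpath root of the JVM that launches the job. A minimal sketch of a check for that is below. ClasspathCheck is a hypothetical helper, not part of Pail or Hadoop; it uses only the JDK and simply asks the class loader for the same resource names that Hadoop's Configuration and JobConf look up by default.

```java
import java.net.URL;

// Hypothetical helper (not part of Pail or Hadoop): reports whether a
// resource is visible at the root of the classpath.
final class ClasspathCheck {

    // Returns the URL of the named resource, or null if it is not visible.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    static boolean isOnClasspath(String resourceName) {
        return locate(resourceName) != null;
    }

    public static void main(String[] args) {
        // The two site files mentioned in point 1.
        for (String name : new String[] {"core-site.xml", "mapred-site.xml"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url));
        }
    }
}
```

Running main with the same classpath as the job prints where each file was found, or NOT FOUND if the workaround is not in effect.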
Regards,
Vjeran