I am trying to perform integration testing of the entire job during the build process. Currently, all the intermediate flows of the Cascade read and write Parquet files. I have set up the framework to toggle between the LocalFlowConnector and the HadoopFlowConnector based on the environment, but Parquet files do not work with the LocalFlowConnector because Parquet Cascading contains a hard-coded cast to "Hfs":
private MessageType readSchema(FlowProcess<JobConf> flowProcess, Tap tap) {
  try {
    Hfs e;
    if (tap instanceof CompositeTap) {
      e = (Hfs) ((CompositeTap) tap).getChildTaps().next();
    } else {
      e = (Hfs) tap;
    }
    List footers = this.getFooters(flowProcess, e);
    if (footers.isEmpty()) {
      throw new TapException("Could not read Parquet metadata at " + e.getPath());
    } else {
      return ((Footer) footers.get(0)).getParquetMetadata().getFileMetaData().getSchema();
    }
  } catch (IOException var5) {
    throw new TapException(var5);
  }
}
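To make the failure concrete, here is a minimal, self-contained sketch of why that cast blows up for a local tap. The `Tap`, `Hfs`, and `FileTap` types below are stand-ins I defined for illustration, not the real Cascading classes:

```java
// Stand-in types mirroring Cascading's tap hierarchy (illustrative only).
interface Tap {}
class Hfs implements Tap {}       // Hadoop-mode tap
class FileTap implements Tap {}   // local-mode tap

public class CastDemo {
    // Mimics parquet-cascading's readSchema: unconditionally casts to Hfs.
    static String describe(Tap tap) {
        try {
            Hfs hfs = (Hfs) tap;  // the hard-coded cast
            return "ok: " + hfs.getClass().getSimpleName();
        } catch (ClassCastException e) {
            return "ClassCastException: " + tap.getClass().getSimpleName() + " is not an Hfs";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Hfs()));      // ok: Hfs
        System.out.println(describe(new FileTap()));  // ClassCastException: FileTap is not an Hfs
    }
}
```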
Even if I pass a FileTap with a Parquet scheme, it is cast to an Hfs tap, which fails when running in local mode. Is there a workaround that does not also require swapping between Parquet and delimited schemes based on the environment?
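One workaround I have been considering (a sketch under assumptions, not something the Parquet Cascading docs prescribe): instead of switching to the LocalFlowConnector for tests, keep the HadoopFlowConnector but configure Hadoop's local job runner and local filesystem. Hfs taps can read and write `file:///` paths under those settings, so the hard-coded cast never fails, and everything still runs in-process during the build. The property keys are the classic Hadoop 1.x names and `IntegrationTest` is a hypothetical test class; this assumes the Cascading, Hadoop, and parquet-cascading jars are on the test classpath:

```java
import java.util.Properties;
import cascading.flow.FlowConnector;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.property.AppProps;

public class IntegrationTest {
    static FlowConnector localModeConnector() {
        Properties props = new Properties();
        props.setProperty("mapred.job.tracker", "local"); // local job runner, no cluster
        props.setProperty("fs.default.name", "file:///"); // local filesystem
        AppProps.setApplicationJarClass(props, IntegrationTest.class);
        return new HadoopFlowConnector(props);
    }
    // With this connector, taps can stay Hfs even in tests, e.g. (assumed scheme constructor):
    // Tap source = new Hfs(new ParquetTupleScheme(new Fields("a", "b")), "target/it/in");
}
```

The trade-off is that tests go through the Hadoop planner rather than the faster local planner, but the tap and scheme wiring stays identical between test and production.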