Using Cascade Parquet Scheme with LocalFlowConnector to run integration tests

Kunal Lahiri

Jun 19, 2016, 7:53:00 PM
to cascading-user
I am trying to run integration tests of the entire job during the build process. Currently all the intermediate flows of the Cascade read and write Parquet files. I have set up the framework to toggle between the local flow connector and the Hadoop flow connector based on environment. But Parquet files don't work with LocalFlowConnector because of the hard-coded reference to "Hfs" in Parquet Cascading:

private MessageType readSchema(FlowProcess<JobConf> flowProcess, Tap tap) {
    try {
        Hfs e;
        if(tap instanceof CompositeTap) {
            e = (Hfs)((CompositeTap)tap).getChildTaps().next();
        } else {
            e = (Hfs)tap;
        }

        List footers = this.getFooters(flowProcess, e);
        if(footers.isEmpty()) {
            throw new TapException("Could not read Parquet metadata at " + e.getPath());
        } else {
            return ((Footer)footers.get(0)).getParquetMetadata().getFileMetaData().getSchema();
        }
    } catch (IOException var5) {
        throw new TapException(var5);
    }
}

Even if I pass a FileTap with a Parquet Scheme, it gets cast to an Hfs tap, and that fails when I am running in local mode. Is there any workaround that doesn't also require swapping between Parquet and Delimited based on environment?
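
To make the failure mode concrete, here is a minimal, self-contained sketch. It uses stand-in classes named after the Cascading types; they are hypothetical placeholders, not the real Cascading or Parquet classes, and the path is made up. It only shows why an unconditional cast to Hfs breaks for any other tap type:

```java
final class CastDemo {
    // Stand-in types for illustration only; NOT the real Cascading classes.
    static class Tap {}
    static class Hfs extends Tap {
        String getPath() { return "/tmp/example.parquet"; } // hypothetical path
    }
    static class FileTap extends Tap {}

    // Mirrors the unconditional cast in readSchema: no instanceof check for Hfs.
    static String pathOf(Tap tap) {
        Hfs hfs = (Hfs) tap; // throws ClassCastException for any non-Hfs tap
        return hfs.getPath();
    }

    // Returns true when the cast fails for the given tap.
    static boolean failsOn(Tap tap) {
        try {
            pathOf(tap);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Hfs fails: " + failsOn(new Hfs()));
        System.out.println("FileTap fails: " + failsOn(new FileTap()));
    }
}
```

In this sketch the Hfs stand-in passes through the cast and the FileTap stand-in does not, which is the same shape of failure as passing a local FileTap to the scheme.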

Ken Krugler

Jun 19, 2016, 8:19:58 PM
to cascadi...@googlegroups.com
I don’t believe that currently there’s a way to use a Parquet scheme in local mode.

— Ken


--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/a19050f5-0fae-4514-8d86-9020aaab9633%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



Andre Kelpe

Jun 20, 2016, 12:11:57 PM
to cascading-user
You will have to use Hadoop stand-alone mode for your tests, as
Parquet is only supported on Hadoop.

- Andre
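
A rough sketch of what stand-alone mode could look like in a test setup, assuming Hadoop 2.x property names (`fs.defaultFS`, `mapreduce.framework.name`); older Hadoop versions use different keys, and a default Hadoop configuration may already behave this way. The class name `StandaloneHadoopProps` is hypothetical. The idea is to keep the Hadoop flow connector and Hfs taps in tests, but point them at the local file system instead of a cluster:

```java
import java.util.Properties;

final class StandaloneHadoopProps {
    // Builds properties that keep Hadoop in local (stand-alone) mode.
    // Assumption: Hadoop 2.x key names.
    static Properties create() {
        Properties props = new Properties();
        props.setProperty("fs.defaultFS", "file:///");          // use the local file system
        props.setProperty("mapreduce.framework.name", "local"); // run jobs in-process
        return props;
    }

    public static void main(String[] args) {
        Properties props = create();
        // In an integration test these would be handed to the Hadoop connector,
        // e.g. new HadoopFlowConnector(props), while the flows keep using
        // Hfs taps with the Parquet scheme, pointed at local paths.
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With this approach the environment toggle would switch only the properties, not the connector or tap types, so the Parquet scheme always sees an Hfs tap.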



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com