parquet-cascading: cannot be used as a sink for Parquet

pavan kumar hegde

Jul 28, 2015, 11:39:13 AM
to cascading-user

We are trying to convert a text file into a Parquet file in an Hfs location, but creating the sink fails with the exception below. Please assist.

CODE SNIPPET:

public static final Fields INPUT_FIELDS = new Fields("sample_int", "sample_str", "sample_date", "sample_deci", "par_key");

Scheme sinkScheme = new ParquetTupleScheme(INPUT_FIELDS);
Tap sink = new Hfs(sinkScheme, parqOutputPath);

Pipe assembly = new Pipe("namecp");
assembly = new Each(assembly, new UnpackTupleFunction());

Flow flow = new Hadoop2MR1FlowConnector().connect("namecp", inData, sink, assembly);

EXCEPTION:

Exception in thread "main" cascading.flow.planner.PlannerException: tap named: 'namecp', cannot be used as a sink: Hfs["ParquetTupleScheme[['sample_int', 'sample_str', 'sample_date', 'sample_deci', 'par_key']->[ALL]]"]["/user/cloudera/parquet_hive_cascade"]
    at cascading.flow.planner.FlowPlanner.verifyTaps(FlowPlanner.java:379)
    at cascading.flow.planner.FlowPlanner.verifyAllTaps(FlowPlanner.java:266)
    at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:169)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:445)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:421)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:270)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:215)

============================================================================

SAMPLE DATA:

1|abc-xy|14-12-25|12.34|20150101

2|fbcxy|14-12-05|2.4|20150201

3|fbscxy|14-11-05|0.422|20150301
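
For reference, here is a rough sketch of how each pipe-delimited sample line is expected to map onto the five names in INPUT_FIELDS. It is plain Java with no Cascading or Parquet involved. The class name SampleLineParser and the choice to keep every value as a string are assumptions made for illustration only, not part of the actual flow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of the Cascading flow.
class SampleLineParser {
    // Field names copied from INPUT_FIELDS in the code snippet.
    static final String[] FIELD_NAMES = {
        "sample_int", "sample_str", "sample_date", "sample_deci", "par_key"
    };

    // Splits one pipe-delimited line and pairs each value with its field name.
    static Map<String, String> parse(String line) {
        // The pipe is a regex metacharacter, so it must be escaped;
        // the -1 limit keeps trailing empty values instead of dropping them.
        String[] values = line.split("\\|", -1);
        if (values.length != FIELD_NAMES.length) {
            throw new IllegalArgumentException(
                "expected " + FIELD_NAMES.length + " values but got " + values.length);
        }
        // LinkedHashMap preserves the field order of the input line.
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            record.put(FIELD_NAMES[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(parse("1|abc-xy|14-12-25|12.34|20150101"));
    }
}
```

All values stay as strings here; any typing of sample_int, sample_date or sample_deci would have to happen separately.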

Andre Kelpe

Jul 28, 2015, 11:45:19 AM
to cascadi...@googlegroups.com
From what I can see, you have to use ParquetTBaseScheme if you want to use Parquet as a sink: https://github.com/apache/parquet-mr/blob/master/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetTBaseScheme.java

The parquet community should have more information for you: http://parquet.apache.org/community/

- André
