BigDecimal & Date Datatype - How to use with parquet-cascading?

Bhavesh Shah

Jun 26, 2015, 3:35:53 AM6/26/15
to cascadi...@googlegroups.com
Hi,
I am trying to use the BigDecimal and Date datatypes with parquet-cascading. I have created a sample job, but it throws an exception whenever I use BigDecimal or Date in the "message type". When I write the "message type" string for ParquetTupleScheme, I am not able to use BigDecimal or Date. Below is the sample code:

public class ReadWriteParquet {

    static String textInputPath = "inputOutput/input/in1.txt";
    static String parquetOutputPath = "inputOutput/output/parquet-out";
    static String textOutputPath = "inputOutput/output/text-out";

    public static void main(String[] args) throws Exception {
        ReadWriteParquet.write();
        ReadWriteParquet.read();
    }

    private static void read() {
        Scheme parquetinput = new ParquetTupleScheme(new Fields("Name",
                "College", "Branch", "Age", "Doj", "BigDeci"));
        Scheme textoutput = new TextDelimited(true, ",");

        Tap source = new Hfs(parquetinput, parquetOutputPath);
        Tap sink = new Hfs(textoutput, textOutputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Read Parquet");
        pipe = new GroupBy(pipe, new Fields("Branch"));

        Properties hadoopProps = new Properties();
        AppProps.setApplicationJarClass(hadoopProps, ReadWriteParquet.class);
        TupleSerializationProps.addSerialization(hadoopProps,
                BigDecimalSerialization.class.getName());

        FlowDef flowdef = FlowDef.flowDef().addSource(pipe, source)
                .addTailSink(pipe, sink);
        HadoopFlowConnector hd = new HadoopFlowConnector(hadoopProps);
        hd.connect(flowdef).complete();
    }

    private static void write() {
        DateType dateType = new DateType("dd/MM/yyyy");
        Fields fields = new Fields("Name", "College", "Branch", "Age", "Doj",
                "BigDeci").applyTypes(String.class, String.class, String.class,
                Integer.class, dateType, BigDecimal.class);

        Scheme input = new TextDelimited(fields, true, ",");

        Scheme parquetout = new ParquetTupleScheme(fields, fields,
                "message ReadWriteParquet {required Binary Name; required Binary College; "
              + "required Binary Branch; optional int64 Age; required int64 Doj; required Double BigDeci; }");

        Tap source = new Hfs(input, textInputPath);
        Tap sink = new Hfs(parquetout, parquetOutputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Write Parquet");

        FlowDef flowdef = FlowDef.flowDef().addSource(pipe, source)
                .addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(flowdef).complete();
    }
}


In the above code you can see that in the "message type" I have used int64 for the date, since there is no provision for a date datatype, and Double for BigDecimal. With int64 the dates are written as plain long values, but I want the date to be written in a particular format. The same goes for BigDecimal: to get the job to run I have mapped it to Double, but I really want to map it to BigDecimal.

Is there any way to deal with the Date and BigDecimal datatypes in parquet-cascading while writing the data? Please let me know if there is any workaround.
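
One workaround I am thinking about (not tested yet, so treat it only as a sketch) is to coerce the Doj and BigDeci fields to Strings inside the pipe with cascading.pipe.assembly.Coerce and declare those two columns as binary in the message type, so the formatted date and the exact decimal value are stored as UTF-8 text. The writeAsStrings name below is just for illustration; it reuses the static paths and field names from my write() method:

    private static void writeAsStrings() {
        DateType dateType = new DateType("dd/MM/yyyy");
        Fields fields = new Fields("Name", "College", "Branch", "Age", "Doj",
                "BigDeci").applyTypes(String.class, String.class, String.class,
                Integer.class, dateType, BigDecimal.class);

        Scheme input = new TextDelimited(fields, true, ",");

        // Untyped fields for the sink; Doj and BigDeci are declared as binary
        // (UTF-8 strings) instead of int64/double in the Parquet schema.
        Fields parquetFields = new Fields("Name", "College", "Branch", "Age", "Doj", "BigDeci");
        Scheme parquetout = new ParquetTupleScheme(parquetFields, parquetFields,
                "message ReadWriteParquet { required binary Name; required binary College; "
              + "required binary Branch; optional int64 Age; required binary Doj; required binary BigDeci; }");

        Tap source = new Hfs(input, textInputPath);
        Tap sink = new Hfs(parquetout, parquetOutputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Write Parquet");
        // Coerce Doj and BigDeci to Strings; the declared DateType should format
        // Doj using the "dd/MM/yyyy" pattern and BigDeci via BigDecimal.toString()
        // before the tuples reach the Parquet sink.
        pipe = new Coerce(pipe, new Fields("Doj", "BigDeci"), String.class, String.class);

        FlowDef flowdef = FlowDef.flowDef().addSource(pipe, source)
                .addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(flowdef).complete();
    }

On the read side those two columns would then come back as Strings, which could be parsed again with the same DateType pattern and new BigDecimal(value) if needed. But I would prefer a proper date/decimal mapping if parquet-cascading supports one.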


Thanks,
Bhavesh

Andre Kelpe

Jun 29, 2015, 1:55:36 AM6/29/15
to cascadi...@googlegroups.com
I think you might be better off asking on the parquet-user list, since parquet-cascading is part of Apache Parquet: https://parquet.apache.org/community/

- André

shree

Aug 5, 2015, 3:35:41 PM8/5/15
to cascading-user
Hi Bhavesh,

Sorry to pull you into my issue. I am also working on ParquetReadWrite functionality. It works fine in my local IDE, but when I try to run it on a Hadoop cluster as a Hadoop jar I get java.lang.ClassNotFoundException: cascading.scheme.Scheme.
Can you please refer to my post for more details and share your thoughts on it?




Thank You.