Code :
package com.parquet.TimestampTest;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class GenerateTimeStampParquetFile {

    static String inputPath = "target/input/timestampInputFile1";
    static String outputPath = "target/parquetOutput/TimestampOutput";

    public static void main(String[] args) {
        write();
    }

    private static void write() {
        // Source: a text file with one timestamp string per line.
        Fields field = new Fields("timestampField").applyTypes(String.class);
        Scheme sourceSch = new TextDelimited(field, false, "\n");

        // Sink: a Parquet file whose schema declares the field as optional binary.
        Fields outputField = new Fields("timestampField");
        Scheme sinkSch = new ParquetTupleScheme(field, outputField,
                "message TimeStampTest { optional binary timestampField; }");

        Tap source = new Hfs(sourceSch, inputPath);
        Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);

        // Copy the tuples straight from source to sink.
        Pipe pipe = new Pipe("Hive timestamp");
        FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(fd).complete();
    }
}
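A note on the schema above: as far as I know, Hive stores its timestamp type in Parquet as INT96 rather than plain binary, so I suspect declaring the field as binary is part of my problem. One direction I experimented with is declaring the field as int96 in the message type and encoding each value into the 12-byte layout myself. Below is an untested sketch of such an encoder; I am assuming the layout is 8 bytes of little-endian nanoseconds-of-day followed by a 4-byte little-endian Julian day, it ignores time zones and pre-1970 dates, and I have not verified that ParquetTupleScheme will pass the raw bytes through.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.sql.Timestamp;

public class Int96TimestampEncoder {

    // Julian day number of the Unix epoch, 1970-01-01.
    private static final int JULIAN_DAY_OF_EPOCH = 2440588;
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Encodes a value such as "1988-05-25 15:15:15.254" into the 12-byte INT96
    // layout: 8 bytes little-endian nanos-of-day, 4 bytes little-endian Julian day.
    public static byte[] toInt96(String value) {
        Timestamp ts = Timestamp.valueOf(value);
        long millis = ts.getTime();
        long days = millis / MILLIS_PER_DAY; // days since the Unix epoch
        long millisOfDay = millis - days * MILLIS_PER_DAY;
        // getNanos() repeats the millisecond part, so add only the sub-millisecond nanos.
        long nanosOfDay = millisOfDay * 1_000_000L + ts.getNanos() % 1_000_000L;
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(nanosOfDay);
        buf.putInt((int) (days + JULIAN_DAY_OF_EPOCH));
        return buf.array();
    }
}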
Input file (timestampInputFile1):
timestampField
1988-05-25 15:15:15.254
1987-05-06 14:14:25.362
After running the code, the following files are generated under the output path.
Output :
1. part-00000-m-00000.parquet
2. _SUCCESS
3. _metadata
4. _common_metadata
I then created a table in Hive and loaded the part-00000-m-00000.parquet file into it, using the following queries.
Query :
hive> create table test3(timestampField timestamp) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
hive> select * from test3;
After running the above commands, I got the following output.
Output :
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
So the select fails with the ClassCastException shown above. Is there any way to store timestamp data in the Parquet file format so that, after loading it into Hive, it can be read successfully?
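In the meantime, the only workaround I can think of is to declare the column as string in Hive and cast at query time (test3_str is just a scratch table name for illustration):

hive> create table test3_str(timestampField string) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3_str;
hive> select cast(timestampField as timestamp) from test3_str;

I believe this should work because Hive maps string columns to Parquet binary, but I have not verified it, and I would prefer to store a real timestamp.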
Currently I am using:
Hive 1.1.0-cdh5.4.2
Cascading 2.5.1
parquet-format-2.2.0
Thanks
Santlal J. Gupta