Unable to read decimal and timestamp column values from a parquet file

khedkarn...@gmail.com

Oct 24, 2018, 5:42:28 AM
to cascading-user
Hello, 

I am facing an issue reading decimal and timestamp values from a Parquet file. The file has the following schema:

 message hive_schema {
  optional int64 prodid;
  optional int32 regionid;
  optional fixed_len_byte_array(5) stdprice (DECIMAL(10,0));
  optional fixed_len_byte_array(5) minprice (DECIMAL(10,0));
  optional int96 startdate;
  optional int96 enddate;
}


When I read the file with this schema, I get raw binary output for the fixed_len_byte_array(5) (DECIMAL(10,0)) columns stdprice and minprice, and for the int96 columns startdate and enddate. This is the code I use:

       Fields srcprices_parquetFields = new Fields("prodid","regionid","stdprice","minprice","startdate","enddate");
       Scheme src_scheme0 = new ParquetTupleScheme(srcprices_parquetFields);
       Tap srcprices_parquetTap = new Hfs(src_scheme0, "hdfs://nameservicetor/user/hive/warehouse/prices_parquet");


The above code produces the following output:

requestedFields ::: 'prodid', 'regionid', 'stdprice', 'minprice', 'startdate', 'enddate'
		 data :: 100860	101	�	|	 ��^�4k%	 ��^��l%
		 data :: 100860	102	�	~	 ��^�4k%	����
7m%
		 data :: 100860	103	�	�	����
8m%	null
		 data :: 100861	105	�	�	 ��^�4k%	 ��^��l%
		 data :: 100861	107	�	�	 ��^��l%	����
7m%
		 data :: 100861	106	�	�	����
8m%	null
		 data :: 100870	104	{	z	 ��^��l%	null
		 data :: 100871	101	�	�	 ��^�4k%	 ��^��l%
		 data :: 100890	108	��	��	����
8m%	����
tm%
		 data :: 100890	105	��	��	 ��^�4k%	null
		 data :: 101863	102	b	c	 ��^��k%	 ��^��l%
		 data :: 102130	103	�	�	����
�k%	����
�m%

How do I get readable values for the 'stdprice', 'minprice', 'startdate', and 'enddate' columns with the schema above?

Thank You.

Regards,
Nikhil
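
For reference, the binary values in the output above look like the raw Parquet encodings of those columns. A DECIMAL stored as fixed_len_byte_array is the unscaled value as a big-endian two's-complement integer, and an int96 timestamp as written by Hive is usually 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian. The following is only a minimal sketch under those assumptions. The class and method names (ParquetRawDecoders, decodeDecimal, decodeInt96Timestamp) are hypothetical, the sample bytes are made up, and it assumes the raw bytes of each field are already available as a byte[]; how ParquetTupleScheme exposes them is not covered here. It also ignores any time zone adjustment the writer may have applied.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper class, not part of Cascading or Parquet.
public class ParquetRawDecoders {
    // Julian day number of 1970-01-01 (the Unix epoch).
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a Parquet DECIMAL stored as fixed_len_byte_array:
    // the bytes hold the unscaled value, big-endian two's complement.
    // For DECIMAL(10,0) the scale is 0.
    public static BigDecimal decodeDecimal(byte[] raw, int scale) {
        return new BigDecimal(new BigInteger(raw), scale);
    }

    // Decodes an int96 timestamp, assuming the Hive/Impala layout:
    // 8 bytes nanoseconds of day, then 4 bytes Julian day, little-endian.
    public static Instant decodeInt96Timestamp(byte[] raw) {
        if (raw.length != 12) {
            throw new IllegalArgumentException("int96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // Made-up 5-byte value whose last byte is 0x7B ('{' when printed as text).
        byte[] price = {0, 0, 0, 0, 0x7B};
        System.out.println(decodeDecimal(price, 0));

        // Made-up timestamp: Julian day of the epoch, zero nanoseconds.
        byte[] ts = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L)
                .putInt((int) JULIAN_DAY_OF_EPOCH)
                .array();
        System.out.println(decodeInt96Timestamp(ts));
    }
}
```

If that layout holds for this file, applying decodeDecimal to stdprice and minprice with scale 0, and decodeInt96Timestamp to startdate and enddate, should turn the bytes into readable values.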
