Apache Beam Dataflow job to write GenericRecords to Parquet

kanhaiya yadav

Jan 21, 2020, 1:19:43 PM1/21/20
to Google Cloud Developers
Hi,


In an Apache Beam step I have a PCollection of KV<String, Iterable<KV<Long, GenericRecord>>>.
I want to write all the records in each Iterable to the same Parquet file. My code snippet is given below:


p.apply(ParDo.of(new MapWithAvroSchemaAndConvertToGenericRecord())) // PCollection<GenericRecord>
.apply(ParDo.of(new MapKafkaGenericRecordValue(formatter, options.getFileNameDelimiter()))) // PCollection<KV<String, KV<Long, GenericRecord>>>
.apply(GroupByKey.create()) // PCollection<KV<String, Iterable<KV<Long, GenericRecord>>>>

Now I want to write all the records in each Iterable to the same Parquet file (deriving the file name from the key of the KV).
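For anyone landing here: one way to get "one Parquet file per key" in Beam is FileIO.writeDynamic() combined with ParquetIO.sink(), which groups elements by destination internally, so the explicit GroupByKey is not needed. A minimal sketch, assuming an Avro `schema` variable and an `options.getOutputDir()` option exist in the surrounding code (both are placeholders here, not from the original post):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// keyed is the output of MapKafkaGenericRecordValue:
// PCollection<KV<String, KV<Long, GenericRecord>>> keyed = ...;
keyed.apply(FileIO.<String, KV<String, KV<Long, GenericRecord>>>writeDynamic()
    .by(kv -> kv.getKey())                               // destination = the String key
    .withDestinationCoder(StringUtf8Coder.of())
    .via(Contextful.fn(kv -> kv.getValue().getValue()),  // unwrap to GenericRecord
         ParquetIO.sink(schema))                         // Avro schema of the records
    .to(options.getOutputDir())                          // base output directory
    .withNaming(key -> FileIO.Write.defaultNaming(key, ".parquet"))
    .withNumShards(1));                                  // one file per key per window
```

withNumShards(1) forces a single file per destination, at the cost of reduced write parallelism; drop it if multiple shards per key are acceptable.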


Abdel (Cloud Platform Support)

Feb 7, 2020, 1:07:27 PM2/7/20
to Google Cloud Developers
It looks like you already found a solution and posted it on Stack Overflow. Great!