Google Cloud Dataflow streaming inserts from Cloud Storage

Christopher Russell

Jun 18, 2019, 8:03:50 AM
to Google Cloud Developers
Hello,

We are looking for guidance on best practices for streaming data into BigQuery. We want to read XML files from a Cloud Storage bucket with Dataflow and write them to BigQuery using streaming inserts. Each file is an XML representation of an object, and the first stage of the pipeline unmarshals the XML into a Java object, so the pipeline has to read each file in its entirety. We previously had Dataflow reading the data from Cloud Pub/Sub, but we want to make Cloud Storage the first point of delivery when data arrives in the cloud platform and have Dataflow read from the bucket. We have tried the example template implementations as well as TextIO and FileIO to ingest the files, but we get consistent failures under load. Are there commonly used design patterns that suit this use case?

Thanks,
Chris

Nicolas (Google Cloud Platform Support)

Jun 20, 2019, 4:36:22 PM
to Google Cloud Developers

Hi Christopher,

Thanks for reporting this.

This discussion group is oriented more toward general opinions, trends, and issues of a general nature concerning developer tools.

For coding and Cloud Dataflow architecture questions, you may be better served by dedicated forums such as Stack Overflow, where experienced programmers are within reach and ready to help.