Dataproc for Lift and Shift?

Pooja Choudhary

Dec 18, 2018, 12:12:11 AM

to Google Cloud Dataproc Discussions

Is Dataproc good for a lift and shift of an existing MapReduce application from on-prem, or do we first need to deploy the existing application in containers and then run those containers in the cloud? Our requirement is minimal changes to our existing code and as little migration overhead as possible.

Thanks

Dustin Smith

Dec 18, 2018, 1:00:35 AM

to Google Cloud Dataproc Discussions

Hi Pooja,

We have lots of customers who choose to move their existing MapReduce workloads from on-prem into Cloud Dataproc. How much code needs to change really depends on the specifics of your use case. While many customers choose to adopt containers as part of their move to Google Cloud, it's definitely not a requirement. Do you have more details you could share about what you already have deployed?
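As a rough sketch of what running an existing MapReduce jar without containers can look like, the Python helper below assembles a `gcloud dataproc jobs submit hadoop` command line. The cluster name, region, bucket, and jar path are hypothetical placeholders, not details from this thread, and the sketch assumes the job takes its input and output locations as arguments so that `gs://` paths can be passed in without code changes.

```python
import shlex


def build_submit_command(cluster, region, jar_uri, job_args):
    """Return the argv that submits an existing jar as a Dataproc Hadoop job."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "hadoop",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--jar={jar_uri}",
        "--",  # arguments after this separator are passed to the job itself
        *job_args,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_submit_command(
        cluster="example-cluster",
        region="us-central1",
        jar_uri="gs://example-bucket/jars/wordcount.jar",
        job_args=["gs://example-bucket/input/", "gs://example-bucket/output/"],
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; whether a given jar really works unchanged depends on the specifics Dustin asks about.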

If you'd be more comfortable discussing this in a non-public setting, I'd be happy to connect you with someone here at Google who can discuss the specifics of your project and offer some recommendations on how to move forward. I'm the product marketing manager for Cloud Dataproc; please feel free to email me directly if I can help (dustin...@google.com).

Best,

Dustin

mich.ta...@gmail.com

Dec 28, 2018, 5:23:18 PM

to Google Cloud Dataproc Discussions

Hi,

Can you be a bit more specific about the MapReduce applications? Are these primarily applications that use Hive tables on-prem? As I understand it, you build clusters with a chosen number of nodes (a master plus n workers). Then you have to migrate your HDFS storage (Hive tables or other data) to buckets in the cloud (which is not plain sailing, in my experience) and then use Spark on that storage or on BigQuery tables. That gives you ephemeral clusters of your choice to perform your work, again predominantly with Spark as the execution engine.
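The two steps described above (creating a cluster with n workers and copying HDFS data into a bucket) can be sketched roughly as follows. This is an illustration only: the NameNode address, bucket, and cluster names are hypothetical, the target is assumed to be a Cloud Storage `gs://` bucket, and `hadoop distcp` is assumed to run on a cluster that already has the Cloud Storage connector available.

```python
from urllib.parse import urlparse


def hdfs_to_gcs(hdfs_uri, bucket, prefix=""):
    """Map an hdfs:// URI to a gs:// URI in `bucket`, keeping the path."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {hdfs_uri}")
    # Drop the NameNode host and port; keep only the directory path.
    parts = [p for p in (prefix.strip("/"), parsed.path.strip("/")) if p]
    return f"gs://{bucket}/" + "/".join(parts)


def build_distcp_command(hdfs_uri, bucket, prefix=""):
    """Return the argv for copying one HDFS path into the bucket."""
    return ["hadoop", "distcp", hdfs_uri, hdfs_to_gcs(hdfs_uri, bucket, prefix)]


def build_create_cluster_command(name, region, num_workers):
    """Return the argv for creating a cluster with `num_workers` workers."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


if __name__ == "__main__":
    # Hypothetical values, printed for review rather than executed.
    print(" ".join(build_create_cluster_command("example-cluster", "us-central1", 4)))
    print(" ".join(build_distcp_command(
        "hdfs://namenode:8020/user/hive/warehouse/sales", "example-bucket")))
```

Keeping the original directory layout under the bucket means existing jobs and Hive table locations only need their `hdfs://` prefix swapped, though the copy itself is where the "not plain sailing" part tends to show up.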

So it boils down to whether you already use Spark as the execution engine on-prem, and to your migration plan to the cloud. Also, from my experience you will have to maintain a hybrid model for some time; in other words, your on-prem environment will have to co-exist with the cloud for a while.

So the question is: what issues are you facing that make you want to consider Dataproc as an option?

HTH,

Mich

Karthik Palaniappan

Dec 28, 2018, 5:48:49 PM

to Google Cloud Dataproc Discussions

Aside: just as a more permanent reference, you can take a look at this doc and the docs it links to: https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-overview. There are plenty of videos from conferences as well, such as https://www.youtube.com/watch?v=FNNCXBmSYfU.

Thanks @Mich for jumping in to help!
