Options to enable MapReduce jobs on Datastore data


Hadoop on Google Cloud Platform Team

Feb 17, 2015, 10:55:14 PM
to gcp-had...@google.com, gcp-hadoo...@googlegroups.com

Greetings,


Back in April 2014, we released the Beta version of the Datastore connector for Hadoop, a Java library that enables Hadoop to read data from and write data to Cloud Datastore programmatically. Based on user feedback, usage characteristics, and continued analysis of other options available for various use cases, we have decided to stop further development of the Cloud Datastore connector for Hadoop.

Google Cloud Platform supports several options for processing Datastore data:

  1. App Engine MapReduce: an open-source MapReduce library that runs within App Engine and operates on Datastore data using Task Queues.

  2. Datastore Backups to GCS: back up Datastore data to Google Cloud Storage (GCS) using the standard backup mechanisms. Once the data is in GCS, you can use the Hadoop GCS connector to access it from Hadoop clusters running on Google Compute Engine (GCE). We have also integrated deployment of the Hadoop GCS connector into bdutil, the Google Cloud Platform Hadoop deployment toolset.

  3. Analyze Datastore data via BigQuery: create a data processing pipeline that exports data from Datastore and loads it into BigQuery.


We are working hard to improve the tools that simplify deploying Hadoop clusters on Google Cloud Platform, including enhanced connectivity to Google Cloud Storage and Google BigQuery. We recently released bdutil 1.1.0, which streamlines deployment and adds support for Hortonworks HDP 2.2, support for the BigQuery connector with Hadoop 2.x, and other enhancements.


As always, please send any feedback to gcp-hadoop-contact@google.com.


Best Regards,


Ram Ramanathan, on behalf of the Google Cloud Platform team
