Greetings,
Back in April 2014, we released the Beta version of the Datastore connector for Hadoop, a Java library that enables Hadoop to programmatically read data from and write data to Cloud Datastore. Based on user feedback, usage characteristics, and continued analysis of other options available to meet the needs of various use cases, we have decided to stop further development of the Cloud Datastore connector for Hadoop.
Google Cloud Platform supports several options for processing Datastore data:
App Engine MapReduce: an open-source MapReduce library that runs within App Engine and uses Datastore data and Task Queues.
Datastore Backups to GCS: back up Datastore data to Google Cloud Storage (GCS) using the standard backup mechanisms. Once the data is in GCS, the Hadoop GCS connector can make it available to Hadoop clusters running on Google Compute Engine (GCE). We have also integrated deployment of the Hadoop GCS connector with bdutil, the Google Cloud Platform Hadoop deployment toolset.
Analyze Datastore data via BigQuery: create a data processing pipeline that exports data from Datastore and loads it into BigQuery.
We are working hard on enhancing the tools that simplify deployment of Hadoop clusters on Google Cloud Platform, including improved connectivity to Google Cloud Storage and Google BigQuery. We recently released bdutil 1.1.0 to streamline deployment; it includes support for Hortonworks HDP 2.2, support for the BigQuery connector with Hadoop 2.x, and additional enhancements.
As always, please send any feedback to gcp-hadoop-contact@google.com.
Best Regards,
Ram Ramanathan, on behalf of the Google Cloud Platform team