backup of datastore to local machine

Rohith D Vallam

Aug 21, 2014, 1:54:14 PM
to google-a...@googlegroups.com
Hello,

I have some data stored in the Google Datastore, spread across two entities. I was wondering if there is a way to periodically back up these Datastore entries to my local machine? I searched the internet and found some answers involving task queues, cron jobs, etc., but none of them offered a complete, working solution to this problem. It would be great if someone could share their ideas on how to periodically back up the Datastore to a local machine.

Thanks and Regards,
Rohith 

Shawn Lee

Aug 22, 2014, 10:34:37 AM
to google-a...@googlegroups.com
Hi Rohith,

We have an experimental way of doing periodic backups in our app. It combines several GAE APIs. These are the steps we use:

1) Identify the entities that need to be backed up, since downloading all entities would incur unnecessary cost.
To do this, we write identifying log messages on creation/modification/deletion of the entities in our app. This lets us look through the logs and retrieve only the entities that were recently modified.

2) Create a cron job on the local machine that scans the App Engine logs for, say, the past 5 minutes. The logs can be retrieved remotely using the GAE Logs API.

3) Once the entities are identified, download them via App Engine's Remote API, which gives you remote access to the datastore so you can fetch the individual entities.

4) Save the data of these downloaded entities in CSV or any other format you require.

5) The CSV file can also be uploaded to a local development server so that you can view the data in a local application. This can be done with the bulkloader in the Python SDK, which can also be used with Java applications.
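As a rough illustration, the steps above can be sketched in Python. This is only a sketch under stated assumptions: `fetch_recent_logs` and `download_entity` are hypothetical stand-ins for the GAE Logs API and Remote API calls (which need the SDK and credentials); only the log-filtering and CSV-writing flow of steps 2-4 is shown, using the standard library.

```python
import csv
from datetime import datetime, timedelta

# Hypothetical stand-in for the GAE Logs API call (step 2). Each line
# follows the identifying log format from step 1: timestamp, operation,
# and the entity's kind:id.
def fetch_recent_logs():
    return [
        "2014-08-22T10:30:01 ENTITY_MODIFIED Customer:1001",
        "2014-08-22T10:31:15 ENTITY_CREATED Order:2002",
        "2014-08-22T09:00:00 ENTITY_MODIFIED Customer:1003",  # too old
    ]

# Hypothetical stand-in for an App Engine Remote API fetch (step 3).
def download_entity(kind, key_id):
    return {"kind": kind, "id": key_id, "payload": "..."}

def backup_recent_entities(out_path, now, window=timedelta(minutes=5)):
    """Scan logs for recent changes and save the entities as CSV (steps 2-4)."""
    rows = []
    for line in fetch_recent_logs():
        stamp, _op, key = line.split()
        if datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S") < now - window:
            continue  # outside the backup window
        kind, key_id = key.split(":")
        rows.append(download_entity(kind, key_id))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["kind", "id", "payload"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Running `backup_recent_entities("backup.csv", datetime(2014, 8, 22, 10, 32))` keeps the two entries inside the 5-minute window and skips the older one. Scheduling a script like this under a local cron entry gives the periodic behaviour from step 2.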

Hope this is of some help. We are still experimenting with this idea, but we are able to retrieve the entities at 5-minute intervals and upload them to a local machine. We can then view the data on the local server as though we were on the live server.

Best regards,
Shawn

Rohith D Vallam

Aug 27, 2014, 9:13:09 AM
to google-a...@googlegroups.com
Hello Shawn,

Thanks a lot for your valuable suggestions, and sorry for the delay in replying. Could you tell me some more details about the GAE Logs and Remote APIs? Is there a link with more details about using these APIs?

I came across another link for this purpose: http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/ . I guess you are referring to the first approach mentioned in this link, right?
I tried out the new approach mentioned in the link. We basically have to schedule a backup job that copies the datastore to Cloud Storage, and from there we can use tools like gsutil to download it. I was wondering if there is an automatic way to back up the datastore to Cloud Storage? Do you know of any such method? I know we can use task queues, cron jobs, etc. I tried setting up a cron job, but it simply didn't work. Do let me know if you can figure out a way to automatically schedule a cron job or a task queue to back up the datastore to Google Cloud Storage.

Thanks and Regards,
Rohith  

Shawn Lee

Aug 27, 2014, 9:49:46 AM
to google-a...@googlegroups.com
Hi Rohith,

For the GAE Logs API, the official documentation can be found here: https://developers.google.com/appengine/docs/java/logs/
An example of the use of the Logs API can be found here: http://learntogoogleit.com/post/59602939088/logging-api-example
I have personally used the function provided in the above link, and it works wonders.

As for the Remote API, the official documentation can be found here: https://developers.google.com/appengine/docs/java/tools/remoteapi

With reference to the link http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/ , the first approach differs slightly from the idea I mentioned previously; rather, it plays only a small part in it. In the link, the first approach simply uses the bulkloader to download all the data from the datastore. As the article highlights, this takes a long time and might fail. In our approach, the data is downloaded via the Remote API instead, and the bulkloader is used only to upload the data when required.

Lastly, for an automated way to back up the datastore to Google Cloud Storage, you can refer to the following link: https://developers.google.com/appengine/articles/scheduled_backups . This method creates a cron job that automatically backs up the datastore at the specified intervals. Hope you find these useful!
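For reference, the cron definition from that article looks roughly like the fragment below (the Python SDK's cron.yaml; the Java SDK uses an equivalent cron.xml). The kind names and bucket here are placeholders to replace with your own:

```yaml
cron:
- description: daily automated datastore backup
  url: /_ah/datastore_admin/backup.create?name=ScheduledBackup&kind=MyKind&kind=MyOtherKind&filesystem=gs&gs_bucket_name=my-backup-bucket
  schedule: every 24 hours
  target: ah-builtin-python-bundle
```

The URL invokes the Datastore Admin backup handler, so the Datastore Admin feature must be enabled for the app; once the backup files land in the bucket, gsutil can pull them down to the local machine.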

Best regards,
Shawn

Rohith D Vallam

Aug 27, 2014, 10:05:18 AM
to google-a...@googlegroups.com
Hello Shawn,

Thanks for the links. Most of the links you sent are for Python or Java. Is there any support for PHP, as I am currently using PHP for development?

Thanks and Regards,
Rohith 


--
You received this message because you are subscribed to a topic in the Google Groups "Google App Engine" group.


Shawn Lee

Aug 27, 2014, 10:24:51 AM
to google-a...@googlegroups.com
Hi Rohith,

Unfortunately, I have only used the Python and Java APIs for GAE, so I can't really comment on PHP support, but similar APIs for PHP likely exist.

Best regards,
Shawn