How to best handle data stored in different locations in Google BigQuery?


Philipp Singer

Sep 25, 2015, 12:51:39 PM
to Google App Engine

My current workflow in BigQuery is as follows:


(1) query data in a public repository (stored in the US), (2) write the result to a table in my own dataset, (3) export a CSV to a Cloud Storage bucket, (4) download the CSV to the Ubuntu server I work on, and (5) work with it there.
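
For concreteness, steps (2) to (4) look roughly like this with the Python BigQuery and Cloud Storage clients (just a sketch; the project, dataset, table and bucket names below are placeholders or examples, not my actual resources):

from google.cloud import bigquery, storage

bq_client = bigquery.Client(project="my-project")

# (2) run the query against the public dataset and write the result
#     to a table in my own dataset
job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.reddit_comments"
)
bq_client.query(
    "SELECT body, score FROM `fh-bigquery.reddit_comments.2015_05`",
    job_config=job_config,
).result()

# (3) export that table as CSV files into a Cloud Storage bucket
bq_client.extract_table(
    "my-project.my_dataset.reddit_comments",
    "gs://my-bucket/reddit_comments-*.csv",
).result()

# (4) download the exported files onto the server
gcs_client = storage.Client(project="my-project")
for blob in gcs_client.list_blobs("my-bucket", prefix="reddit_comments-"):
    blob.download_to_filename(blob.name)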


The problem I have now is that the server I work on is located in the EU, so I pay substantial fees for transferring data between my US bucket and my EU server. I could locate my bucket in the EU instead, but then I would still be transferring data from the US (BigQuery) to the EU (bucket). I could also set my BigQuery dataset to be located in the EU, but then I can't run any queries any longer, because the data in the public repository is located in the US, and queries across different locations are not allowed.


I am querying the public Reddit comment data. The Ubuntu server I work on is located in the EU, and I can't change that.


Does anyone have an idea of how to approach this?

Patrice (Cloud Platform Support)

Sep 29, 2015, 10:12:59 AM
to Google App Engine
Hi Philipp,

I've been waiting to see if someone from the community here has a brilliant idea for your conundrum. It seems not.

Honestly, with your use case, I don't see a good way to cut costs. The best I can think of is to compress the data in your bucket before sending it to the EU. It's not much, but it will reduce the amount of data you have to send over to the EU. I don't see much else that can help you, unfortunately.
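
For example, a compressed export could look roughly like this (just a sketch, with placeholder project, dataset and bucket names):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# export the table as gzipped CSV so less data has to leave the US bucket
job_config = bigquery.ExtractJobConfig(
    destination_format="CSV",
    compression="GZIP",
)
client.extract_table(
    "my-project.my_dataset.reddit_comments",
    "gs://my-us-bucket/reddit_comments-*.csv.gz",
    job_config=job_config,
).result()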

Cheers!

Philipp Singer

Sep 29, 2015, 1:16:19 PM
to Google App Engine
Patrice - Thanks for your response.

If anyone comes up with an idea of how to approach this, please let me know :)

Cheers, Philipp

Philipp Singer

Sep 29, 2015, 6:11:24 PM
to Google App Engine
One more thing - I think there should be an easy way to copy tables between datasets in different locations; that would simplify the whole process. Currently, I always need to export the table to a bucket in the same location, copy the export to a bucket in the other location, and then import it into BigQuery again.
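
For reference, the manual round trip currently looks roughly like this (a sketch, all names are placeholders): export in the source region, copy across buckets, then load in the destination region.

from google.cloud import bigquery, storage

bq_client = bigquery.Client(project="my-project")
gcs_client = storage.Client(project="my-project")

# 1. export the US table to a bucket in the same (US) location
bq_client.extract_table(
    "my-project.us_dataset.my_table",
    "gs://my-us-bucket/my_table-*.csv",
).result()

# 2. copy the exported files into an EU bucket
us_bucket = gcs_client.bucket("my-us-bucket")
eu_bucket = gcs_client.bucket("my-eu-bucket")
for blob in gcs_client.list_blobs("my-us-bucket", prefix="my_table-"):
    us_bucket.copy_blob(blob, eu_bucket, blob.name)

# 3. load the files back into a table in the EU dataset
load_config = bigquery.LoadJobConfig(source_format="CSV", autodetect=True)
bq_client.load_table_from_uri(
    "gs://my-eu-bucket/my_table-*.csv",
    "my-project.eu_dataset.my_table",
    job_config=load_config,
).result()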

Patrice (Cloud Platform Support)

Sep 30, 2015, 9:26:06 AM
to Google App Engine
Hi Philipp,

For that, I would suggest posting your idea to the BigQuery Issue Tracker as a feature request. I don't think it will necessarily save you costs (in the end, the same amount of data moves from one location to the other), but I can definitely see the benefit of having this handled through the console as an automated process.

Cheers!

Philipp Singer

Oct 3, 2015, 1:51:16 PM
to Google App Engine
I have now moved all my data to an EU dataset in BigQuery and only export to, and then download from, an EU bucket. Still, I pay $0.12 per GB downloaded from the bucket. I don't quite understand that. Is there any way to get my data out of BigQuery at lower rates?

Patrice (Cloud Platform Support)

Oct 5, 2015, 10:43:44 AM
to Google App Engine
Hi Philipp,

That cost is simply the network egress cost for downloading anything from Google Cloud Storage. There's not much more you can do here to minimize it. You'll need to minimize what you put in your bucket if you want to pay less, but you will always pay the same rate per GB.

Cheers