GAE & BigQuery


Richard Watson

Oct 15, 2012, 6:31:46 AM
to google-a...@googlegroups.com
One of the issues with the datastore is the lack of SQL-like aggregates, and of easy ad-hoc queries in general. This adds a lot of cost to reporting, dashboards, etc. Is BigQuery now the default choice for this type of work? For those who have used it, what kinds of things should we watch out for? What types of results do you cache back in the datastore, or do you just run everything from BigQuery?

alex

Oct 15, 2012, 8:58:12 AM
to google-a...@googlegroups.com
I don't personally believe there is a "default" for this kind of
work without knowing the specifics, but you should definitely try it. If
you haven't seen it, here's a recent post related to your questions.
You might find some answers there:
http://googleappengine.blogspot.it/2012/10/streak-brings-crm-to-inbox-with-google.html

The only feature that's missing, and that I'd love to see
implemented, is updating existing rows. Currently you can only
append to an existing table. Other than that, BigQuery is really awesome
for all sorts of ad-hoc data analysis, dashboards, charts, etc.
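
A common way to live with an append-only table is to append a new version of a row instead of updating it, and then keep only the newest version per key when reading. The sketch below shows that idea in plain Python; the field names `id` and `updated_at` and the helper `latest_versions` are made up for illustration and are not part of any BigQuery API.

```python
from typing import Dict, Iterable, List


def latest_versions(rows: Iterable[dict],
                    key: str = "id",
                    version: str = "updated_at") -> List[dict]:
    """Keep only the newest row per key from an append-only list of rows."""
    newest: Dict[object, dict] = {}
    for row in rows:
        current = newest.get(row[key])
        # A later version of the same key replaces the earlier one.
        if current is None or row[version] > current[version]:
            newest[row[key]] = row
    return sorted(newest.values(), key=lambda r: r[key])


# Hypothetical data: entity 1 was "updated" by appending a second row.
rows = [
    {"id": 1, "updated_at": 1, "status": "open"},
    {"id": 2, "updated_at": 1, "status": "open"},
    {"id": 1, "updated_at": 2, "status": "closed"},
]
print(latest_versions(rows))
```

The same filtering can be expressed in a query, at the price of scanning the older versions as well.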

-- alex

Richard Watson

Oct 15, 2012, 9:48:59 AM
to google-a...@googlegroups.com
Thanks, Alex. I saw that article and I've taken a look at their source. It's good to see that people are using BQ; I'm just hoping to get some first-hand pros and cons from GAE devs.

Richard Watson

Oct 18, 2012, 7:57:17 AM
to google-a...@googlegroups.com
Just a note in case anyone is thinking about similar questions. I'm trying out CloudSQL for this, in part because of Alex's issue of no updates, but also because of all the nice known-entity toys that SQL gives us. I'll likely keep all my main data in the datastore, but the no-joins and no-aggregates and no ad-hoc queries stories have soaked up enough of my time.

I imagine a good approach is:
Datastore for base data
Copy oft-updated entities to SQL for intra-day queries
When data becomes fixed (e.g. closed-off months), move it to BigQuery for multi-month/year querying. This I'll likely not do until I feel there's a need for it.
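
The split above can be sketched as a small routing rule: data from a closed-off period goes to BigQuery, anything still changing goes to SQL, and the datastore stays the source of truth. This is only an illustration; `pick_store` and `closed_through` are hypothetical names, and "closed-off" is assumed to mean "on or before a cutoff date".

```python
from datetime import date


def pick_store(record_day: date, closed_through: date) -> str:
    """Return the store that should answer queries for data from record_day.

    closed_through is the last day of the most recent closed-off period.
    """
    if record_day <= closed_through:
        return "bigquery"  # fixed data: multi-month/year querying
    return "sql"           # still changing: intra-day queries


# Assumed example: everything up to the end of September 2012 is closed off.
closed_through = date(2012, 9, 30)
print(pick_store(date(2012, 8, 15), closed_through))   # bigquery
print(pick_store(date(2012, 10, 18), closed_through))  # sql
```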

If you have alternate approaches that work well, I'd appreciate a reply.

Aleem Mawani

Oct 18, 2012, 1:27:30 PM
to google-a...@googlegroups.com
I wrote the article mentioned above. Our approach has been to export our entire datastore to BQ multiple times a day. It actually isn't that expensive. We then run our dashboards off of that. 

We wrote Mache to do the export automatically: https://github.com/StreakYC/mache
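
For anyone wondering what a periodic full export involves at the data level, here is a minimal sketch. It is not taken from Mache. It assumes entities have already been read out as plain dicts, writes them as newline-delimited JSON (a format BigQuery load jobs accept), and gives each export its own timestamped table name, since tables are append-only. `snapshot_table_name` and `to_ndjson` are hypothetical helpers.

```python
import json
from datetime import datetime
from typing import Iterable


def snapshot_table_name(kind: str, when: datetime) -> str:
    """Build a per-export table name, e.g. Contact_20121018_1327."""
    return f"{kind}_{when:%Y%m%d_%H%M}"


def to_ndjson(entities: Iterable[dict]) -> str:
    """Serialize entities as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in entities)


# Hypothetical entities of kind "Contact".
entities = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
print(snapshot_table_name("Contact", datetime(2012, 10, 18, 13, 27)))
print(to_ndjson(entities), end="")
```

Dashboards then point at the most recent snapshot table, and older snapshots can be dropped once they are no longer needed.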

Jon Stevens

Oct 18, 2012, 9:05:01 PM
to google-a...@googlegroups.com
It depends on what you are doing, but to throw another service out there: I've found mixpanel.com quite useful as a place to send data. I then use their website to get an idea of what sort of data I want to query (and how to query it). Then I wire up some Google Charts to render the data.

jon