Stephen, your workflow seems pretty good; I need to try it out. We have a good workflow for web services, websites, and services, but honestly, for database updates we do everything manually today.
How do people handle "default" data for their databases? I understand the concept of adding the schema information to source control so it can be re-created. What about the basic "getting standard" data; are you checking that in as well? I am not talking about the runtime data (generated per user, group, etc.) but the "default" data in the DB even before user #1 (whatever that means for your service) signs up.
For example, at Buddy we have about 50 GB of "default" data that we need in any new DB instance before it has any users. We don't have that scripted because, in our experience, the tools always time out trying to import that much data from a script or file-based resource.
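To make the timeout problem concrete, here is a minimal sketch of one common workaround: loading seed data in small batches and committing after each one, so no single statement or transaction runs long. This is not what Buddy does today. The `places` table, its columns, the CSV layout, and the use of SQLite as a stand-in database are all assumptions made for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice


def parse_rows(reader):
    """Convert CSV text fields into typed tuples (id, name, lat, lon)."""
    for place_id, name, lat, lon in reader:
        yield int(place_id), name, float(lat), float(lon)


def load_in_batches(conn, rows, batch_size=10_000):
    """Insert rows in fixed-size batches, committing after each batch.

    Returns the total number of rows inserted.
    """
    total = 0
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO places (id, name, lat, lon) VALUES (?, ?, ?, ?)",
            batch,
        )
        conn.commit()  # short transactions: a failure loses at most one batch
        total += len(batch)
    return total


# Tiny in-memory demo (hypothetical data); real seed data would be
# streamed from a file rather than held in a string.
SEED_CSV = "1,Cafe A,47.61,-122.33\n2,Cafe B,47.62,-122.35\n3,Cafe C,40.71,-74.00\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL)"
)
reader = csv.reader(io.StringIO(SEED_CSV))
inserted = load_in_batches(conn, parse_rows(reader), batch_size=2)
```

Because the rows are consumed lazily, memory use is bounded by `batch_size` rather than by the size of the seed file. Whether this avoids the timeouts at 50 GB depends on the database and tooling involved.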
Jeff MacDuff
CTO & Co-Founder, Buddy
Email: je...@buddy.com
Azure scale mobile services -- http://msdn.microsoft.com/en-us/magazine/hh781021.aspx
One of the scenarios Buddy supports is geolocation, and to support geolocation scenarios we have a white-label database of places. We have approximately 50 million+ locations/places that are "default" in our database, and we are adding more all the time.
The best solution we have today for "large" default data is to import directly from a live DB into a new instance. Trying to do this from a file set hasn’t been very successful.
The location data needs to be relational because of the complex queries we need to run. For example, a mobile app can query Buddy for "give me the top 10 coffee shops within 100 yards of a friend anywhere in the world, where I have rated it 4+ stars, and it has metadata = foo". That’s hitting users + friends + locations + a metadata string compare all in the same query… and everything is based on location / distance away.
Doing these types of queries without relations can (in our experience) be very inefficient. We looked at different database solutions, and most of them worked great at ~10 million locations, but at ~50 million locations (with the complex queries we run) they fell apart.
Of course, once you have 50 million locations + 10 million users + 5 million friend connections, that query gets expensive :)
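For readers who want to see the shape of that query, here is a rough sketch of the "coffee shops near a friend" example as a single relational join. It uses SQLite with a Python-registered distance function. The schema, table names, column names, and sample rows are all hypothetical and are not Buddy's actual design.

```python
import math
import sqlite3


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters."""
    r = 6_371_000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
conn.create_function("distance_m", 4, distance_m)  # callable from SQL
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
CREATE TABLE places  (id INTEGER PRIMARY KEY, name TEXT, category TEXT,
                      lat REAL, lon REAL, meta TEXT);
CREATE TABLE ratings (user_id INTEGER, place_id INTEGER, stars INTEGER);

INSERT INTO users VALUES (1, 47.6100, -122.3300), (2, 40.7100, -74.0000);
INSERT INTO friends VALUES (1, 2);
INSERT INTO places VALUES
  (10, 'Near Friend Coffee', 'coffee', 40.7103, -74.0002, 'foo'),
  (11, 'Far Coffee',         'coffee', 40.8000, -74.0000, 'foo'),
  (12, 'Near Friend Bar',    'bar',    40.7101, -74.0001, 'foo'),
  (13, 'Unrated Coffee',     'coffee', 40.7102, -74.0001, 'foo');
INSERT INTO ratings VALUES (1, 10, 5), (1, 11, 5), (1, 12, 4);
""")

# Users + friends + places + ratings + metadata compare in one query.
QUERY = """
SELECT DISTINCT p.name, r.stars
FROM friends f
JOIN users   fu ON fu.id = f.friend_id
JOIN places  p  ON p.category = 'coffee' AND p.meta = :meta
JOIN ratings r  ON r.place_id = p.id AND r.user_id = f.user_id
WHERE f.user_id = :me
  AND r.stars >= 4
  AND distance_m(fu.lat, fu.lon, p.lat, p.lon) <= :radius_m
ORDER BY r.stars DESC, p.name
LIMIT 10
"""

# 100 yards is 91.44 meters.
results = conn.execute(QUERY, {"me": 1, "meta": "foo", "radius_m": 91.44}).fetchall()
```

As written, the distance function runs against every candidate place for every friend, which is exactly the kind of cost that grows painfully at tens of millions of rows.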
-Jeff
Hey Scott, hopefully we can get a copy of your slides soon.
Yes, we have thought about exposing the location data through a service; however, the queries we then need to run become nearly impossible from a performance perspective.
For example, consider a simple query where you have 50 million places and you want to find all the places where your friends are (you being a user account in an app, and your friends being other user accounts linked to yours). If you access the location database through a service, you end up with a “world search” rather than a localized search. The idea of serving the location data from a service quickly breaks down if you try to write a performant query in this model. I would like to hear any ideas about how to solve that problem. From a performance perspective, I don’t have a solution except for a direct connection.
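To illustrate the difference between a "world search" and a localized search, here is a small sketch of a bounding-box pre-filter around one friend's position. It is only an illustration of the distinction, not a fix for the service problem described above. The place names, coordinates, and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def bounding_box(lat, lon, radius_m):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle.

    Approximation: ignores the poles and the antimeridian.
    """
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def in_box(place, box):
    """True if a (name, lat, lon) place falls inside the box."""
    _, lat, lon = place
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


places = [
    ("near", 40.7103, -74.0002),
    ("same city", 40.8000, -74.0000),
    ("other continent", 48.8566, 2.3522),
]
friend = (40.7100, -74.0000)

# Localized search: only places inside the friend's box are candidates.
box = bounding_box(*friend, radius_m=91.44)  # 100 yards
candidates = [p[0] for p in places if in_box(p, box)]
```

In a database, the same box would typically become range predicates on indexed latitude and longitude columns, so only nearby rows are examined. That works when the friend positions are available to the same query. When places sit behind a separate service, the box has to be sent per friend, which is where the model starts to strain.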
Very happy to have a separate, smaller meeting to talk about these types of location queries. We have put tons of resources into optimizing our system, including looking at alternatives.