Database migration


John Lim

Feb 24, 2016, 2:58:35 AM
to Firebase Google Group
Hi,

May I know if Firebase provides any tools to aid in database migration? Quoting this answer from http://programmers.stackexchange.com/questions/109312/are-database-schema-migrations-a-problem-in-production-environments:

Just because your NoSQL database doesn't have a schema in the traditional sense doesn't mean there isn't a logical schema you need to deal with as it changes. In the case of a typical app using MongoDB, most likely your code expects certain fields of the JSON object to behave in certain ways. If you change the behavior, it follows that you might want to update the already existing data in the database. Now, with a traditional RDBMS this was a largely solved problem: you just had to ALTER the underlying tables. But with these newfangled NoSQL databases, you have a decision. Do you write a script to munge and update all your objects? Or do you add code to convert between versions on the fly? If so, how long do you support v1 objects? Forever? Until v3?



I'll add that the example used in the MongoDB blog post is a bit simplistic and a very easy case to handle if you've got a decent update process, no matter what the RDBMS is; adding a field rarely hurts. It is when you decide to split your Name field into FirstName and LastName that things get exciting.


How can I update my Firebase `schema` when, for example, I've decided to split the Name field into FirstName and LastName? Thanks.

Kato Richardson

Feb 24, 2016, 9:43:34 AM
to Firebase Google Group
John,

I'm not really sure what this means. Since your Firebase data is JSON and easily manipulated, it's generally not a problem to import or export. Firebase also offers private backups, which are an elegant and performant way to pull down big data sets for export. Small imports can be done with any tool, such as the REST API. Big data sets (tens of MBs or more) are generally handled with something like firebase-streaming-import.
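
For small data, that REST round-trip is only a few lines. A minimal sketch in Python (the database URL, secret, and `users` path below are placeholders; `format=export` asks the server to preserve metadata such as priorities):

```python
import json
import requests

DB = "https://your-app.firebaseio.com"   # placeholder database URL
SECRET = "YOUR_DATABASE_SECRET"          # placeholder legacy secret

# Export a subtree; format=export keeps priorities and other metadata.
resp = requests.get(DB + "/users.json",
                    params={"format": "export", "auth": SECRET})
resp.raise_for_status()
users = resp.json()

# Re-import: PUT replaces the target path wholesale.
requests.put(DB + "/users_backup.json", params={"auth": SECRET},
             data=json.dumps(users)).raise_for_status()
```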

If you're talking about synchronizing two databases on a regular interval (daily, hourly, etc.), that is always extremely challenging, but certainly possible to some extent with careful planning.

☼, Kato


Tom Larkworthy

Feb 24, 2016, 11:59:54 AM
to Firebase Google Group

I think John was hoping for in-place schema migration tools, but we don't have anything like that. We don't have an explicit representation of schema at the storage level.

John Lim

Feb 25, 2016, 2:44:07 AM
to Firebase Google Group
Hi Kato,

Tom is right. I'm approaching this from the perspective of schema migration. Even though NoSQL databases are supposed to be 'schema-less', they do still have some kind of implicit schema. This talk by Martin Fowler also covers database migration for NoSQL databases: https://www.youtube.com/watch?v=TgdFA72crHM&feature=youtu.be&t=21m45s


In a basic scenario, suppose I wanted to rename a node in Firebase. How would I do it in sync with the Firebase app, assuming I have also updated the app to use the new node name? In other words, when I run a `firebase deploy`, ideally the node would be programmatically renamed and then the Firebase app would get deployed.
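
A sketch of the kind of pre-deploy step I have in mind, in Python against the REST API (the URL, secret, and node names are made up; a single PATCH at the root carries both the copy and the delete, so readers never see the data at both paths at once):

```python
import json
import requests

DB = "https://your-app.firebaseio.com"   # placeholder database URL
SECRET = "YOUR_DATABASE_SECRET"          # placeholder legacy secret

def rename_node(old_path, new_path):
    """Copy old_path to new_path and delete old_path in a single write."""
    data = requests.get(DB + "/" + old_path + ".json",
                        params={"auth": SECRET}).json()
    # One PATCH request applies both changes together; writing null
    # to a path deletes it.
    requests.patch(DB + "/.json", params={"auth": SECRET},
                   data=json.dumps({new_path: data, old_path: None})
                   ).raise_for_status()

rename_node("profiles", "userProfiles")  # made-up node names
```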


Or, simply put, how can I refactor the Firebase DB? Is there a way I can programmatically pull down all the data, run my own scripts to update the JSON, and then programmatically overwrite the database with the updated JSON? Is this even a recommended approach?


Thanks. :)

Tom Larkworthy

Feb 25, 2016, 3:30:51 AM
to Firebase Google Group
If you have existing clients running logic that assumes a V1 schema, you can't really migrate to a V2 schema by deleting old fields. You could create an adapter that logically presents the V2 schema irrespective of whether it is backed by the V1 or V2 storage model. It strongly depends on whether older clients are interacting with your newer schemas. Generally, it's much easier to over-replicate fields, i.e. have name AND firstName AND lastName, and utilize the V2 advantages only where possible.
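
A sketch of such an adapter, using John's name-split example (plain Python to show the logic; the field names are illustrative):

```python
def read_user_v2(raw):
    """Present a stored user record in the V2 shape (firstName/lastName),
    whether it was written by a V1 or a V2 client."""
    if "firstName" in raw:  # already V2
        return {"firstName": raw["firstName"],
                "lastName": raw.get("lastName", "")}
    # V1 fallback: derive the V2 fields from the single 'name' field.
    first, _, last = raw.get("name", "").partition(" ")
    return {"firstName": first, "lastName": last}

def write_user(first, last):
    """Over-replicate on write: keep the V1 'name' field in sync so
    older clients reading it keep working."""
    return {"firstName": first, "lastName": last,
            "name": (first + " " + last).strip()}
```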
Mobile applications tend to have less control over the deployment process than traditional 3-tier architectures, making many migration strategies moot. You don't want to break apps that are in the wild, so you generally can't make fundamental changes to the schemas (beyond adding fields). Try really hard to get it right the first time :) Version your users if you think you are going to change schemas a lot, so your older clients can just ignore data coming from newer schemas.

We have daily backups to Google Cloud Storage, so you could do a map-reduce to bulk-migrate everything in one go. That would totally break existing apps, though, so it's only a suitable strategy if the newly connecting clients are refreshed (e.g. a web-only application). It's a big topic; I don't think there is a single golden solution... it depends.

I am sure that is not massively helpful, but if your migrations are additive then it's OK (e.g. splitting name into two fields). If you want to make a huge topological change to your database while keeping older clients running, then it's impossible in general. With some creative thinking, some changes are probably possible.

John Lim

Feb 26, 2016, 1:47:58 AM
to Firebase Google Group
Hi Tom,

Thank you for taking the time to answer my questions. Really appreciate it.

`If you have existing clients running logic that assumes a V1 schema, you can't really migrate to a V2 schema by deleting old fields.`
- I could if I had control over the clients (which I do, as it's a web app) and updated the client code to use the V2 schema, while ensuring that I deployed the clients and the V2 schema at the same time.

`You could create an adapter that logically presents the V2 schema irrespective of whether backed by the V1 or V2 storage model.`
- I could, but the code gets hard to maintain after a certain point, especially given that the schema is constantly evolving, as it's a greenfield project.

`We have daily backups to Google Cloud Storage, so you could do a map reduce to bulk migrate everything in one go`
- Could you please point me to resources where I could learn how to do this?

Thanks once again for your time. :)

Tom Larkworthy

Feb 26, 2016, 2:26:39 AM
to Firebase Google Group
If you have a huge dataset and are intolerant of downtime, we can back up data to GCS or AWS for Bonfire plans: https://www.firebase.com/blog/2015-03-05-private-backups-for-firebase-data.html. This gives you a big JSON file of all your data. It's done daily, so it might be a bit stale, but it at least gives you the bulk of the paths that need migrating. If you deploy a client that can deal with both the V1 and V2 schemas but writes only the V2 schema, then in the backup following that deploy all new writes should be V2. You can use the backup to guide a crawl over your remaining V1 legacy data and rewrite everything as V2. Double-check that the legacy data has not already been migrated by a live client; otherwise you risk corrupting your data. This would be quite a slow process, because you would have to check all the paths against the backup, but it could scale to quite a big deployment.
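
A sketch of that crawl in Python, again with a placeholder URL/secret and assuming John's name split under a `users` node; re-reading each record live before writing is the "double check" step:

```python
import json
import requests

DB = "https://your-app.firebaseio.com"   # placeholder database URL
SECRET = "YOUR_DATABASE_SECRET"          # placeholder legacy secret

# The daily backup only guides the crawl; it may be a day stale,
# so re-read every record live before touching it.
with open("backup.json") as f:
    backup = json.load(f)

for uid in backup.get("users", {}):
    url = DB + "/users/" + uid + ".json"
    live = requests.get(url, params={"auth": SECRET}).json()
    if live is None or "firstName" in live:
        continue  # deleted, or already migrated by a live client
    first, _, last = live.get("name", "").partition(" ")
    # PATCH adds only the V2 fields; leaving 'name' in place keeps
    # V1 readers working during the rollout.
    requests.patch(url, params={"auth": SECRET},
                   data=json.dumps({"firstName": first, "lastName": last})
                   ).raise_for_status()
```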

If you have a really big dataset but can handle shutting off the app for a day, it would be much simpler to pause your app for maintenance while you do the migration (or tolerate that a day's worth of user data might be lost): you could then just do the migration on the backup itself, upload it to AWS or GCS, and restore it, blowing away any changes since the initial backup.

Manipulating backups might be a bit heavyweight, though, if your dataset is not that big yet. You could just download all your data from the REST API and re-upload it (using the export flag; see https://www.firebase.com/docs/rest/api/#section-param-format). Big reads and writes do pause the namespace for the length of the operation (which backups don't suffer from). However, we can export data through the REST API very fast if you have the bandwidth, so to minimize pausing you could try downloading to a Google Cloud VM, which is near the servers and also has huge bandwidth. I have not tested this, but I would expect transfers in the region of 100 MB per second this way. So if your dataset were 1 GB, it would interrupt operations for 10 seconds to get the legacy data and another 10 seconds to upload it. If you really tuned your migration code, you might manage a one-minute turnaround during which you had degraded service.

You could totally turn off your app by setting your security rules to "write": false and doing the upload using an admin account (bypassing security rules), which would totally eliminate the possibility of a live client doing something between the download and upload steps. Note that security rules can be uploaded via the REST API too (https://www.firebase.com/docs/rest/api/#section-security-rules).
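
Putting those pieces together, a sketch of the whole lock/download/transform/upload cycle (placeholder URL and secret; the legacy secret acts as the admin credential that bypasses the locked rules, and the transform is John's name split):

```python
import json
import requests

DB = "https://your-app.firebaseio.com"   # placeholder database URL
SECRET = "YOUR_DATABASE_SECRET"          # placeholder legacy secret

def put_rules(rules):
    # Security rules live at /.settings/rules.json in the REST API.
    requests.put(DB + "/.settings/rules.json", params={"auth": SECRET},
                 data=json.dumps(rules)).raise_for_status()

# 1. Lock the database: reads still work, but no client can write.
put_rules({"rules": {".read": True, ".write": False}})

# 2. Download everything, preserving server metadata.
data = requests.get(DB + "/.json",
                    params={"format": "export", "auth": SECRET}).json()

# 3. Transform in memory, e.g. the name split.
for user in data.get("users", {}).values():
    if "name" in user and "firstName" not in user:
        first, _, last = user["name"].partition(" ")
        user["firstName"], user["lastName"] = first, last

# 4. Upload the migrated tree; the secret bypasses the locked rules.
requests.put(DB + "/.json", params={"auth": SECRET},
             data=json.dumps(data)).raise_for_status()

# 5. Restore your real rules to bring the app back.
put_rules({"rules": {".read": True, ".write": True}})
```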


Tom



John Lim

Feb 28, 2016, 8:56:53 PM
to Firebase Google Group
Hi Tom,

Thank you for the detailed reply. The last solution you proposed seems to be a viable approach for me. As I'm not familiar with Google Cloud VMs and all of Firebase's REST capabilities, I'll follow up on this post once I have had the chance to experiment with the recommended approach :)

Thanks once again for your kind help on this.

Regards,
John