App migrations (level up from Database Migrations): opinions, tools, strategies?


Bert Vanpeteghem

Oct 3, 2014, 8:44:53 AM
to continuou...@googlegroups.com
Hi,

I realize this is a very vague question, but even some pointers for further reading would make me very happy.


Database migrations are well known, but I'm wondering whether there is such a concept as application migrations, where each migration performs part of an application upgrade. This could be a database migration, but also file system changes, search index changes, ...

We are considering building such a framework ourselves, but we are aware of the many pitfalls it might involve: transactionality, rollback, ...
So, we are wondering: is this discouraged, what are the alternatives, and do established strategies exist?
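
A minimal sketch of what such an "application migration" runner could look like, assuming each step is a named Python callable and the set of already-applied step names is persisted somewhere (here it is just an in-memory set). The names `Migration` and `run_pending` and the example steps are hypothetical, and transactions and rollback are deliberately left out:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Migration:
    """One step of an application upgrade (hypothetical structure)."""
    name: str                  # unique, sortable identifier
    apply: Callable[[], None]  # the actual work: SQL, file moves, index updates, ...


def run_pending(migrations: Iterable[Migration], applied: Set[str]) -> List[str]:
    """Run every migration not yet recorded in `applied`, in name order.

    A step is recorded only after it succeeds, so a failed run can be
    retried from the failing step. Returns the names that were run.
    """
    ran = []
    for migration in sorted(migrations, key=lambda m: m.name):
        if migration.name in applied:
            continue
        migration.apply()
        applied.add(migration.name)
        ran.append(migration.name)
    return ran


log: List[str] = []
steps = [
    Migration("002_index_email", lambda: log.append("search index updated")),
    Migration("001_add_email_column", lambda: log.append("database altered")),
]
# Pretend step 001 already ran during an earlier deploy.
applied_steps: Set[str] = {"001_add_email_column"}
ran_now = run_pending(steps, applied_steps)
print(ran_now)
```

For a multitenant setup, the same idea could keep one `applied` set per tenant and call `run_pending` once for each tenant.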

Bert Vanpeteghem

Oct 3, 2014, 8:59:04 AM
to continuou...@googlegroups.com
Thinking it over: a lot of this could, for instance, happen in the deploy scripts, perhaps by executing a custom deploy script for that particular deploy.

However, this is a multitenant web application, and things might need to happen for each tenant. So when run from the deploy script, this should preferably run in a context where the tenants are known.
 

Aviran Mordo

Oct 4, 2014, 9:42:12 AM
to continuou...@googlegroups.com
I'm not aware of the term "Application migration"; this seems to me like refactoring. Basically, you need to make your application forward and backward compatible, so that old methods can handle data from new methods and new methods can handle data from old ones. You may need an intermediate state in which you handle the compatibility issues.
You should control the flows using feature toggles.
You can find a good example of how to do this at http://www.aviransplace.com/2013/03/27/continuous-delivery-part-3-feature-toggles/, which shows how to gradually switch from one database to another.
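
As a rough sketch of the intermediate state described above, assume two stores behind one repository class and two hypothetical toggles, `write_new` and `read_new`. Writes go to both stores while the first toggle is on, and reads are switched separately, falling back to the old store for records that have not been copied yet. This is an illustration only, not the code from the linked article:

```python
class ClientRepository:
    """Reads and writes client records while migrating between two stores."""

    def __init__(self, old_store: dict, new_store: dict, toggles: dict):
        self.old_store = old_store
        self.new_store = new_store
        self.toggles = toggles  # hypothetical flags: "write_new", "read_new"

    def save(self, client_id: str, record: dict) -> None:
        # Keep the old store up to date so the toggles can be turned off safely.
        self.old_store[client_id] = record
        if self.toggles.get("write_new", False):
            self.new_store[client_id] = record

    def load(self, client_id: str) -> dict:
        if self.toggles.get("read_new", False) and client_id in self.new_store:
            return self.new_store[client_id]
        # Fall back to the old store for records not yet copied over.
        return self.old_store[client_id]


old, new = {}, {}
toggles = {"write_new": False, "read_new": False}
repo = ClientRepository(old, new, toggles)

repo.save("c1", {"name": "Ann"})   # phase 1: old store only
toggles["write_new"] = True
repo.save("c2", {"name": "Bob"})   # phase 2: dual write
toggles["read_new"] = True         # phase 3: read from new, fall back to old
print(repo.load("c1"), repo.load("c2"))
```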

Kief Morris

Oct 6, 2014, 10:00:19 AM
to continuou...@googlegroups.com
On Fri, Oct 3, 2014 at 1:59 PM, Bert Vanpeteghem <vanpeteg...@gmail.com> wrote:
> Thinking it over: a lot of this could, for instance, happen in the deploy scripts, perhaps by executing a custom deploy script for that particular deploy.
>
> However, this is a multitenant web application, and things might need to happen for each tenant. So when run from the deploy script, this should preferably run in a context where the tenants are known.

Without knowing more about your situation, this smells like something that should be addressed in the application design. Are you saying the database changes will vary by deployed instance of the application?

Generally speaking, you should decouple your data from releases. Database migration scripts can alter the structure of the database, and perhaps reference data or default values, but should preserve instance-specific data.

If you actually need to change the database structures per-instance, look to redesign your database and software so that you don't.

If I'm misreading your situation, please share more details.

Kief



Bert Vanpeteghem

Oct 8, 2014, 2:16:17 AM
to continuou...@googlegroups.com, kiefm...@kief.com
Hi,

Yes, maybe my original question was a little too high level.

Let's have a simple use case.

We have a Database Migration that upgrades the database to add a column Email Address, and a column Notes to the Client table. Both will be populated with data for all records during the migration.
The Email Address also needs to be indexed in ElasticSearch/Lucene/Solr/..., but the Notes column does not need to be indexed.
Since there is existing data, the indexing step needs to happen in conjunction with the migration.

We cannot simply reindex completely after each migration or group of migrations. But, we do need to instruct the system to reindex, because of this migration.
So, there is another "migration" step, which does not affect the database, but another part of the application.


Would there be a usual approach for such a specific scenario?

Kief Morris

Oct 8, 2014, 4:03:07 AM
to continuou...@googlegroups.com
 
> We have a Database Migration that upgrades the database to add a column Email Address, and a column Notes to the Client table. Both will be populated with data for all records during the migration.

What data? Where does it come from? How is it different from the existing data you mention below?
 
> The Email Address also needs to be indexed in ElasticSearch/Lucene/Solr/..., but the Notes column does not need to be indexed.
> Since there is existing data, the indexing step needs to happen in conjunction with the migration.
>
> We cannot simply reindex completely after each migration or group of migrations. But, we do need to instruct the system to reindex, because of this migration.
> So, there is another "migration" step, which does not affect the database, but another part of the application.

Is this just triggering the application to rebuild its indices? That sounds like a normal thing to have in your deployment cycle. For example, for a fairly basic deployment process (which assumes the app needs to be stopped during upgrade):

1) Distribute new app binaries to servers
2) Put up holding page / disable monitoring
3) Stop app instances
4) Swap new app binaries into place
5) Run database restructuring script
6) Start app instances
7) Smoke test app instances
8) Trigger reindexing
9) Remove holding page / enable monitoring
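
The steps above can be sketched as an ordered list of actions that stops at the first failure. Every action here is a placeholder (hypothetical) standing in for whatever your tooling actually does at that step:

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_deployment(steps: List[Step]) -> List[str]:
    """Run the steps in order; report and re-raise on the first failure."""
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception:
            print(f"deployment failed at: {name}")
            raise
        completed.append(name)
    return completed


def placeholder() -> None:
    """Stands in for the real work of a step."""


deployment_steps: List[Step] = [
    ("distribute new app binaries to servers", placeholder),
    ("put up holding page / disable monitoring", placeholder),
    ("stop app instances", placeholder),
    ("swap new app binaries into place", placeholder),
    ("run database restructuring script", placeholder),
    ("start app instances", placeholder),
    ("smoke test app instances", placeholder),
    ("trigger reindexing", placeholder),
    ("remove holding page / enable monitoring", placeholder),
]
completed_steps = run_deployment(deployment_steps)
print(len(completed_steps), "steps completed")
```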

So where does your process need to differ from this?

Kief


Bert Vanpeteghem

Oct 8, 2014, 4:22:01 AM
to continuou...@googlegroups.com, kiefm...@kief.com


On Wednesday, October 8, 2014 10:03:07 AM UTC+2, Kief Morris wrote:
 
>> We have a Database Migration that upgrades the database to add a column Email Address, and a column Notes to the Client table. Both will be populated with data for all records during the migration.

> What data? Where does it come from? How is it different from the existing data you mention below?

Maybe transformed data from an existing data store, or a CSV file? Does it matter? It is different because it is not already in the same datastore. So, the migration is: we add the field and want it populated.
 
 
>> The Email Address also needs to be indexed in ElasticSearch/Lucene/Solr/..., but the Notes column does not need to be indexed.
>> Since there is existing data, the indexing step needs to happen in conjunction with the migration.
>>
>> We cannot simply reindex completely after each migration or group of migrations. But, we do need to instruct the system to reindex, because of this migration.
>> So, there is another "migration" step, which does not affect the database, but another part of the application.

> Is this just triggering the application to rebuild its indices? That sounds like a normal thing to have in your deployment cycle. For example, for a fairly basic deployment process (which assumes the app needs to be stopped during upgrade):

OK, I was not aware that doing a full reindex upon deploy is a usual approach. What if a full reindex is discouraged? How would you approach the problem in that case?
 

> 1) Distribute new app binaries to servers
> 2) Put up holding page / disable monitoring
> 3) Stop app instances
> 4) Swap new app binaries into place
> 5) Run database restructuring script
> 6) Start app instances
> 7) Smoke test app instances
> 8) Trigger reindexing
> 9) Remove holding page / enable monitoring
>
> So where does your process need to differ from this?

Indeed, very similar. But, as stated above, assuming we cannot trigger full reindexing, where does it fit in? It feels awkward to do a full reindex just because we added a single field.


> Kief


Kief Morris

Oct 8, 2014, 5:03:50 AM
to continuou...@googlegroups.com
> On Wednesday, October 8, 2014 10:03:07 AM UTC+2, Kief Morris wrote:

>>> We have a Database Migration that upgrades the database to add a column Email Address, and a column Notes to the Client table. Both will be populated with data for all records during the migration.

>> What data? Where does it come from? How is it different from the existing data you mention below?

> Maybe transformed data from an existing data store, or a CSV file? Does it matter? It is different because it is not already in the same datastore. So, the migration is: we add the field and want it populated.

Ah, I see why I'm confused - you're *actually* migrating data from one db instance to another, rather than using "migration" in the corrupted RoR sense. I read "database migration" and wrongly think "db schema transformation". http://en.wikipedia.org/wiki/Schema_migration

Maybe have a parameter passed into the deployment process that is an ID for the client instance. Then use that in the db migration phase to fetch the data to import. This way the deployment scripts can be generic: you won't need to change anything if you add a new client, other than making sure the data is made available somewhere that it can be retrieved using that ID.

For example, you might have data dumped to a standard location in a file server, e.g. /var/datadumps/$CLIENTID/dbdump-$CLIENTID-YYYYMMDDHHMMSS.tgz. The deployment would look for the latest matching version.
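
A small sketch of that lookup, assuming the directory layout and file naming shown above. The function name `latest_dump` is made up for illustration, and the demo uses a temporary directory in place of /var/datadumps:

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_dump(base_dir: Path, client_id: str) -> Optional[Path]:
    """Return the newest dump file for `client_id`, or None if there is none.

    The timestamp in the file name is fixed-width (YYYYMMDDHHMMSS), so
    comparing the timestamp strings also orders them chronologically.
    """
    client_dir = base_dir / client_id
    if not client_dir.is_dir():
        return None
    pattern = re.compile(rf"dbdump-{re.escape(client_id)}-(\d{{14}})\.tgz")
    candidates = []
    for path in client_dir.glob("dbdump-*.tgz"):
        match = pattern.fullmatch(path.name)
        if match:
            candidates.append((match.group(1), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# Demo with empty placeholder files for a hypothetical client "acme".
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "acme").mkdir()
    for stamp in ("20141001120000", "20141008093000", "20140930235959"):
        (base / "acme" / f"dbdump-acme-{stamp}.tgz").touch()
    newest = latest_dump(base, "acme")
    missing = latest_dump(base, "unknown-client")

print(newest.name, missing)
```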

Does that make sense? Is it relevant to your situation?


>>> We cannot simply reindex completely after each migration or group of migrations. But, we do need to instruct the system to reindex, because of this migration.
>>> So, there is another "migration" step, which does not affect the database, but another part of the application.

>> Is this just triggering the application to rebuild its indices? That sounds like a normal thing to have in your deployment cycle. For example, for a fairly basic deployment process (which assumes the app needs to be stopped during upgrade):

> OK, I was not aware that doing a full reindex upon deploy is a usual approach. What if a full reindex is discouraged? How would you approach the problem in that case?

Well, I was using this as an example of a post-install step, which could do different things. But you're right, it's not useful on its own if the reindex is conditional.

Assuming a simple conditional would work at that post-install stage ("if $full_index == true then reindex"), you could set the flag during the deployment process, provided there's a way to programmatically work out whether it's needed. Otherwise, it could be set somewhere in the deployment package.

So when you commit a code change that requires a full reindex, you also commit something that gets bundled into the deployment package to flag that the reindex is needed. You'd want a way to ensure it's only run once for that particular cause.

There's probably a simpler way, but one approach using the dbdeploy pattern comes to mind. Have a table in the db, "reindex_needed". When you commit a code change that needs full reindexing, also add a schema transformation script that adds a row to this table, with a unique ID, if it doesn't exist already, and a "true" value. This should be applied along with any db schema changes. When the post-install phase happens, check this table for any rows with a "true" value. If one is found, run the reindexing and flip the value to false.
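
The table-based approach above could be sketched like this, using an in-memory SQLite database as a stand-in for the real one. The column names (`change_id`, `needed`) and both function names are assumptions made for the example, and `needed` is stored as 1/0 rather than true/false:

```python
import sqlite3


def flag_reindex(conn: sqlite3.Connection, change_id: str) -> None:
    """Migration-script side: record that `change_id` requires a reindex.

    INSERT OR IGNORE leaves an existing row alone, so the script is safe
    to apply more than once and never re-flags a change already handled.
    """
    conn.execute(
        "INSERT OR IGNORE INTO reindex_needed (change_id, needed) VALUES (?, 1)",
        (change_id,),
    )
    conn.commit()


def post_install(conn: sqlite3.Connection, reindex) -> bool:
    """Post-install side: run one reindex if any row is still flagged."""
    pending = conn.execute(
        "SELECT change_id FROM reindex_needed WHERE needed = 1"
    ).fetchall()
    if not pending:
        return False
    reindex()
    conn.execute("UPDATE reindex_needed SET needed = 0 WHERE needed = 1")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reindex_needed (change_id TEXT PRIMARY KEY, needed INTEGER NOT NULL)"
)

reindex_runs = []
flag_reindex(conn, "add-client-email")
flag_reindex(conn, "add-client-email")  # applying the same script again adds nothing
first = post_install(conn, lambda: reindex_runs.append("full reindex"))
second = post_install(conn, lambda: reindex_runs.append("full reindex"))
print(first, second, len(reindex_runs))
```

The rows are kept after the flag is flipped, so the table doubles as a history of which changes triggered a reindex on this instance.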

This could probably be done with files or other approaches, but the key is to track each commit that requires a reindex, and keep a history of which of these events has been acted on. This way, you can apply an upgrade to any instance, whether development or production, and be sure that reindexing only happens when needed.

I hope this makes some kind of sense and is helpful. Someone else might have a simpler/cleaner suggestion.

Kief

