Source control versioning and migrations

98 views
Skip to first unread message

Antonio Salazar

unread,
Nov 27, 2015, 12:52:55 PM11/27/15
to web2py-users
What are the best practices (or personal experiences) when dealing with database migrations and source control versioning?

I think database migration is one of the strong points of web2py, but what about migrations caused by checking out different branches?

Example:
  • Version A is in production
  • Version B creates a new field in an existing table, in development
User reports problem in version A. When switching from version C to version A, the new field is dropped from the database and its content is lost.

Richard Vézina

unread,
Nov 27, 2015, 1:53:57 PM11/27/15
to web2py-users
Hello Antonio, 

This is a vast subject... And all suggestions couldn't apply or you will not agree with all... But here a couple of them :

1) Web2py app versioning tool only allow tracking of a single branch... So, you could try to not branch your app development and only commit change that should hit trunck (stable) improvement. It really not a best practice, but it could avoid one of the problem you mention, having different db schema related to different branch. It not a best pratice since with git branching is encorage a lot... And a feature "should" requires a branch since you don't really know when you will deliver it, so you don't want to be stock with changes to be commit that are not ready for trunk... 
2) One of the great force of web2py as you mention is the DAL-model and migration that can allow db schema versionning which is rather a difficult things to achieve for a single developper or a little team... So if you keep you data out of the scope here... You can version you db schema if every change is apply to web2py dal models in the first place and you use web2py internal migration feature. But you have to keep migrating you data from one schema to another... So you hand up to coding migration script or and you should do this instead you develop ETL migration routine for your data. So your data can be see as a "flux" that you manage with ETL. That way you can put your data everywhere anytime in addtion to the versionning of your db schema.
3) About db schema inconsistencies that may arrive depending of the nature of the change apply to the model, you have to know that web2py migration will never delete anything and only force type change when there is no risk to do it. What it means is that there will be time where you db schema will not be in sync with you DAL model which also means that you will have to get down from the dal abstraction level and go to the backend and do what requires to be done (delete, rename, table or field, etc.). 

If you have a good ETL manipulating routine defined, you can create new DB instead of modifiying the same DB... So you can name your DB with the version of you app so you know that these DB schema is the same of the app dal model. Then you adapt you ETL migration routine and you insert your data into the refactored (or new) DB schema... If you only have testing data set to manage, you can also, have these data created by your app with a fixutre of this type :

if db(db.district_state_province).count() == 0:
     db.district_state_province.insert(district_state_province_name_en='Alabama',
         district_state_province_name_fr='Alabama',
         state_province_iso_code='AL',
         country_id=db(db.country.iso_3166_1_alpha_2_code=='US').select(db.country.id).first().id)
    db.commit()

Below your define table declaration... So, if table is empty, data get recreate on first request... This can be slow if there a lot of data to be creating though...

Hope these advices can help you...

I am not doing all of them, but if I were restarting a new project (or I take time to improve my actual practices), I will adopt this practice. Particuarly the ETL part...

Finally, about branching, thinking more to it, you could maybe name you DB to also include branch name, so you create a new DB each time your change hit the model definition...

The main idea is really to split schema and data in order to have the greatest flexibility in schema change and not have to write big data migration script when there is schema change. Having already a base ETL routine would make the refactoring of the routine simpler... It is an assomption... But it raises another problem, which is what the structure of the container of the data that the ETL routine... And I guess that it is the structure of the previous db schema...

NOTE : ETL = Extract - Transform = Load

Thanks

Richard

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Antonio Salazar

unread,
Nov 27, 2015, 2:35:44 PM11/27/15
to web2py-users
My projects are small enough that I can consider switching from migrated to legacy databases. I feared I was overlooking something, but it really seems like migrations, in some cases, will just shift the workload elsewhere.

Thanks for your insightful reply.

Dave S

unread,
Nov 30, 2015, 2:18:56 PM11/30/15
to web2py-users
My notes below:


On Friday, November 27, 2015 at 11:35:44 AM UTC-8, Antonio Salazar wrote:
My projects are small enough that I can consider switching from migrated to legacy databases. I feared I was overlooking something, but it really seems like migrations, in some cases, will just shift the workload elsewhere.

 [...]


On Friday, November 27, 2015 at 12:53:57 PM UTC-6, Richard wrote:
Hello Antonio, 

This is a vast subject... And all suggestions couldn't apply or you will not agree with all... But here a couple of them :
[...]
[If you have a good ETL manipulating routine defined, you can create new DB instead of modifiying the same DB... So you can name your DB with the version of you app so you know that these DB schema is the same of the app dal model. Then you adapt you ETL migration routine and you insert your data into the refactored (or new) DB schema... If you only have testing data set to manage, you can also, have these data created by your app with a fixutre of this type :

[...]
NOTE : ETL = Extract - Transform = Load


<URL:https://en.wikipedia.org/wiki/Extract,_transform,_load>

For many of us, Extract and Load can be handled either in the DB console or even (within web2py) by CSV export/import.  Especially for a dev database.  Since none of my dev DBs need to track anything like live data, just deleting the old DB and importing the new one gives me sufficient  ETL to play around.  This is also how I back up my databases.

Another thing to consider when working on old code after working on new code is to have a dev environment on a virtual machine (Box, or the home user version of VMware would cover my needs).  With Mercurial or Git (or even SVN) there is no particular problem with your working dirs being on different machines, so the VM shouldn't interfere with your code version control.

If you have a lot of tables, though, or a lot of data, an ETL vendor might be helpful.  I don't know how big or how complicated you need to be to reach that level.  I'm nowhere close, but perhaps Niphlod or Anthony have some insights.


/dps

 
Reply all
Reply to author
Forward
0 new messages