[Mifos-developer] Proposal for "seamless database upgrades" when deploying a new release in production. VERSION 3

Vivek Singh
Nov 26, 2010, 5:28:24 AM
to Mifos software development
[This proposal addresses deployment delays caused by the database upgrade.]
Please rate it between 1 and 10, and add your views on what it would take to make it a 10.

terms
(I know we might be using some of the terms differently, but for reading this email please use the following. Thanks)
schema change: adding/changing/deleting any of the database elements (table, column, index, constraint, view)
base data: data required by Mifos in its tables for it to function correctly
business data: data created by the use and running of Mifos in production
database code: stored procedures, functions and triggers
data migration: transforming business data from one form to another
database upgrade: schema upgrade + base data change + database code upgrade + data migration

goal
near zero downtime for Mifos release deployment
simpler rollback of a release when required

where we are today
  1. We need to stop the Mifos web server and batch jobs, upgrade the database, deploy the new version of the code, and then restart the web servers and batch jobs. The database upgrade can be slow depending on the volume of data and the nature of the upgrade.
  2. The database upgrade is triggered by starting the web server. This is a rather complex approach and couples the database upgrade to web server startup. If a cluster of servers is accidentally started together, it can corrupt the database as well.
  3. Some of the data migration/base data changes are carried out in the context of the Tomcat runtime, using the production domain entities and services. This can go wrong when the release upgrade is not sequential. For example, suppose release 1.7 carried out some data migration/base data change. This works when upgrading from 1.6 to 1.7, but upgrading from 1.6 to 1.8 directly might fail, because the code used to perform the upgrade would be the 1.8 code, in which these services and entities might have changed. Such things can easily fall into the blind spot of testers as well. This can also be significantly slower than a plain SQL based approach.
  4. Mifos is maintaining its own tool for managing database changes.
  5. Web servers at startup check the revision number of the database against the code's revision.
  6. We consolidate the change scripts into categorized scripts for our own understanding, for new deployments, and for development environments.
proposal
  1. Use liquibase instead of maintaining our own tool. liquibase provides features like change sets and change set includes to organize database changes and control their execution. liquibase also allows viewing the SQL scripts that would be executed on a run. (It would have been simpler if it just used SQL instead of adding a language layer.) This does reverse some of the things in this story http://mifosforge.jira.com/browse/MIFOS-2896; the reasons are provided at different places in the proposal.
  2. Change sets from different releases can be organized using the include mechanism. This also allows controlling the order of execution across change logs by ordering them; the change logs can be arranged in the order in which they go to production. Change set identifiers need to be unique only within a change log, not across all change logs, so conflicts would not happen. This means there would be two change logs for one release (expansion and contraction). When multiple branches are working on them, they would be merged like a code merge. If there are errors because of dependencies between change sets in a change log, they would get caught when the database changes are applied by the build. For example, by release 2.0 it might look like: 1.8-expansion.xml, 1.8-contraction.xml, 1.9-expansion.xml, 1.9-contraction.xml, trunk-expansion.xml and trunk-contraction.xml. There would also be expansion.xml and contraction.xml, which would include the expansion and contraction change logs for all releases (see the sketch after this list). [I am not sure whether liquibase would allow two databaseChangeLogs for one database. In that case we can explore generating the SQL and running it ourselves.]
  3. Liquibase supports setting up test data. Since this can cause significant bloat because of being in XML, we should stick with SQL for it, as we do now.
  4. It is a good practice to have an undo script for every do script, because if the production build has to be rolled back, cooking up an undo script at that point might take time. The reason a simple restore of the database from a backup taken before the release does not suffice is that there might be a time lag: a release done on 19/10/10 and rolled back on 20/10/10 should not erase the transactional changes made in that one day. So we should start writing undo scripts (explained here) for every upgrade except data migration and database code. As far as writing them is concerned, liquibase can provide automatic rollback scripts for some or a lot of them, depending on the change types.
  5. Continuously test undo scripts. After running acceptance tests, the build should run the undo scripts to see whether they are syntactically correct, and should fail if they are not.
  6. Manually test undo scripts before the release to check whether they are not only syntactically but also logically correct.
  7. Do not use Java code to do data upgrades, for performance reasons and because of point 3 in the section above. Even at the cost of duplication we should try to avoid it; in fact, it is generally more of a reuse question than a duplication one. If it cannot be avoided, then version control the old runnable version of the migration. One way to do this is to not depend on the Tomcat context and create a plain Java executable; Spring support in liquibase can be of use here. Once the release is over, the concerned jar files should be committed to a well-known location in source control. This can be painful, and that is the idea, so that the SQL option is explored first. In some cases it might mean we have to resort to stored procedures to do this efficiently.
  8. Continue doing 5&6 above.
  9. Categorize our change sets as expansion and contraction, as explained here (http://exortech.com/blog/2009/02/01/weekly-release-blog-11-zero-downtime-database-deployment/). This can be automatically ensured by doing a keyword search in the expansion/contraction change logs.
  10. Change the high level release steps to: a) review the database upgrade (both do and undo) scripts; b) run the expansion scripts and data migration scripts while the system is in use, preferably in off-hours; c) deploy the code and the database code (only); d) wait for the release to become stable; e) run the contraction scripts (if stable) or the undo of expansion (if unstable). [Please note that database code (stored procedures) is treated in the same way as source code and is not driven using liquibase.]
  11. When we get into situations where database operations are less performant and impact the running system, we can take a case by case call. There are some good tips here and here.
  12. It is possible that there might be some issues in the application when it runs against the expansion scripts, for example if the SQL fired at the database depends on column order (unlikely, mostly because of Hibernate). I cannot think of any other scenario, but if there are some, we can run the unit/integration/functional tests against the expansion scripts as well as the complete upgrade scripts.
  13. There are the following two events to which we have to respond on a regular basis (some of this can be simplified if we know the branch name in advance):
    • Just before release branch creation: On trunk, rename trunk-expansion.xml to release.number-expansion.xml (same for contraction). Also change the names in expansion.xml and contraction.xml.
    • Immediately after branch creation: On trunk, create new trunk-expansion.xml (and contraction) file. Also add the names in expansion.xml and contraction.xml.
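To make points 2 and 4 concrete, here is a minimal sketch of what the change logs might look like. It assumes liquibase's XML change log format; the file names match the example in point 2, but the changeset id, table and column are made up for illustration and are not actual Mifos schema.

  <!-- expansion.xml: master change log including each release's
       expansion log, in the order the releases go to production -->
  <databaseChangeLog
      xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
    <include file="1.8-expansion.xml"/>
    <include file="1.9-expansion.xml"/>
    <include file="trunk-expansion.xml"/>
  </databaseChangeLog>

  <!-- trunk-expansion.xml: one expansion change set together with its
       undo; the id only needs to be unique within this change log -->
  <databaseChangeLog
      xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
    <changeSet id="client-add-external-ref" author="developer">
      <addColumn tableName="client">
        <column name="external_ref" type="VARCHAR(50)"/>
      </addColumn>
      <!-- the undo script; for simple changes like addColumn,
           liquibase can also generate this rollback automatically -->
      <rollback>
        <dropColumn tableName="client" columnName="external_ref"/>
      </rollback>
    </changeSet>
  </databaseChangeLog>

The matching contraction change log would drop whatever the previous release stopped using, typically with no rollback, since contraction runs only after the release has been declared stable.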
After we finish the perfection game, I will create the stories. I will try to keep them fine grained so that we can prioritize them. I will send an email to the group so that we can perhaps vote on which stories we should implement and which we shouldn't, depending on the estimates and cost of implementation concerns.

Jakub Sławiński
Nov 29, 2010, 6:46:46 AM
to mifos-d...@lists.sourceforge.net

Hi Vivek,

I rate the proposal a 7 out of 10.

What I liked about it:
 - It describes where we are today in great detail
 - It stresses the fact that we should avoid Java upgrades
 - It introduces different types of upgrades

What I think will make it 10:
 - I do not like the idea of maintaining upgrades between non-sequential releases (for example from 1.6 to 1.8) without a serious business need. In my opinion it might be very costly without any visible gain. Can you add some arguments for why we need to support such upgrades?
 - I do not like the idea of maintaining undo scripts without a serious business need. It will really slow down development and introduce a lot of new work for the QA team.
 - Can you describe in more detail what is wrong with having our own tool for managing database upgrades? As far as I know, the current system has worked out during the current release development (several independent teams working on many branches) and we didn't have to change anything in it.
 - Can you extend point 11? I think the current system works quite well when all upgrades are simple and fast. There are only problems with bigger upgrades, and you wrote that such upgrades should be considered on a case by case basis. This means that introducing new tools will not solve our real problem: what to do with upgrades that take a very long time (several hours, for example) and cannot be performed while the system is running (because the upgrade modifies data in important tables, for example).


Regards,
  Jakub.

Keith Woodlock
Nov 29, 2010, 11:56:25 AM
to Mifos software development
Vivek,

I rate the proposal a 7 out of 10.

What I liked about it:

- It provides a good summary of where we are today.
- It stresses the fact that we should avoid Java upgrades
- It stresses the fact that we should decouple the application from the
database upgrade process.
- The separation of 'contraction' and 'expansion' may be useful

What I think will make it 10:

Indicate more clearly how we can get from where we are today to the
proposed goal of 'seamless database upgrades'. What I mean is from the
13 points listed, which do you think we can start doing today? Any
quick wins? What points would we need to get to before reaching our
end goal?

I would like to understand how Liquibase is now the tool of choice. I
understand there has been talk of it for a while now, but I don't know
if anyone on the team has experience with the tool or has performed a
spike and shared any learnings with the team.

I would like to understand how important it is to have near zero
downtime (Is this mostly driven by cloud customers?)

Undo scripts: are they needed for 'rolling back' to a previous state?
Are there other feasible approaches?

I also feel this proposal is getting quite big. Is there any value/way
in splitting it up a bit?

Keith.

Vivek Singh
Nov 30, 2010, 1:52:02 AM
to Mifos software development
Thanks for your response.
>> I do not like the idea of maintaining upgrades between non-sequential releases (for example from 1.6 to 1.8) without a serious business need. In my opinion it might be very costly without any visible gain. Can you add some arguments for why we need to support such upgrades?
We have some deployments (e.g. GK) which are more than one release behind.

>> Can you describe in more detail what is wrong with having our own tool for managing database upgrades? As far as I know, the current system has worked out during the current release development (several independent teams working on many branches) and we didn't have to change anything in it.
>> Can you extend point 11? I think the current system works quite well when all upgrades are simple and fast. There are only problems with bigger upgrades, and you wrote that such upgrades should be considered on a case by case basis. This means that introducing new tools will not solve our real problem: what to do with upgrades that take a very long time (several hours, for example) and cannot be performed while the system is running (because the upgrade modifies data in important tables, for example).

I do not have anything new to add apart from the reasons I have already given.

2010/11/29 Jakub Sławiński <jslaw...@soldevelo.com>

Vivek Singh
Nov 30, 2010, 1:56:42 AM
to Mifos software development
Thanks for your comments.

>> Indicate more clearly how we can get from where we are today to the
proposed goal of 'seamless database upgrades'. What I mean is from the
13 points listed, which do you think we can start doing today? Any
quick wins? What points would we need to get to before reaching our
end goal?
We discussed this in the dev call yesterday, and it is mentioned in the proposal. I will create stories with rough estimates, and then we will be able to decide which ones to do and when.


>> I would like to understand how Liquibase is now the tool of choice. I
understand there has been talk of it for a while now, but I don't know
if anyone on the team has experience with the tool or has performed a
spike and shared any learnings with the team.
I have never used Liquibase, but I have been told in TW that it is a good tool. Do you have any specific concerns?

>> I would like to understand how important it is to have near zero
downtime (Is this mostly driven by cloud customers?)
Adam Feuer might be able to answer it better.

Jakub Sławiński
Nov 30, 2010, 1:07:00 PM
to mifos-d...@lists.sourceforge.net

Hi Vivek,

can you explain why GK cannot update their instance in several steps (including all intermediate releases)? In your example it would be only two upgrades (from 1.6 to 1.7 and from 1.7 to 1.8).

Moreover, can you elaborate a little bit more about why we need to create undo scripts? This looks like a completely new requirement and I wonder if we really need this.


I am not against the idea of using liquibase instead of the current solution, but I would like to be sure that this additional effort will give us some noticeable benefits.

I can point out several issues related to database development that we have today:
 - we have a lot of dbunit datasets that need to be upgraded whenever we add a new database upgrade
 - Java upgrades become broken if we change anything in the database schema in future SQL upgrades (look at the issues/commits related to the following upgrades: 1286195484, 1288013750 and 1290720085)
 - some database upgrades on large databases simply take a lot of time (look at MIFOS-4239)


Does liquibase solve any of the above issues?


Regards,
 Jakub.

Vivek Singh
Dec 1, 2010, 6:14:55 AM
to Mifos software development
>> we have a lot of dbunit datasets that need to be upgraded whenever we add new database upgrade
Yes, I think it is a big issue, and I am working on a spike to find a solution to it.


>> Moreover, can you elaborate a little bit more about why we need to create undo scripts? This looks like a completely new requirement and I wonder if we really need this.
<from-proposal>It is a good practice to have an undo script for every do script, because if the production build has to be rolled back, cooking up an undo script at that point might take time. The reason a simple restore of the database from a backup taken before the release does not suffice is that there might be a time lag: a release done on 19/10/10 and rolled back on 20/10/10 should not erase the transactional changes made in that one day. So we should start writing undo scripts (explained here) for every upgrade except data migration and database code. As far as writing them is concerned, liquibase can provide automatic rollback scripts for some or a lot of them, depending on the change types.</from-proposal>

>> - Java upgrades become broken if we change anything in the database schema in future SQL upgrades (look at the issues/commits related to the following upgrades: 1286195484, 1288013750 and 1290720085)
>> - some database upgrades on large databases simply take a lot of time (look at MIFOS-4239)
>> Does liquibase solve any of the above issues?
Liquibase is not meant to solve those; it's not a silver bullet.

2010/11/30 Jakub Sławiński <jslaw...@soldevelo.com>

Adam Feuer
Dec 1, 2010, 10:46:23 AM
to Mifos software development
Jakub said:
>>> Moreover, can you elaborate a little bit more about why we need to create
>>> undo scripts? This looks like a completely new requirement and I wonder if
>>> we really need this.

Note Vivek's previous mail answering my question about this - the undo
scripts only undo the results of the previous expansion, so they will be
relatively easy - deleting columns or tables. Once a contraction has been
performed, the undo won't work.

This is done so we can move to a new version, run for some time (a few
weeks, say) and then roll back to the previous version without much
problem. But you can't roll back through multiple versions.

-adam
--
Adam Feuer <adamf at pobox dot com>

kgam...@gmail.com
Dec 5, 2010, 1:18:17 AM
to Mifos software development
Hello Vivek,
I give this interesting proposal an 8.

What I like about it
1. It aims at decoupling database upgrades from application upgrades.
2. It will support our goal of frequent releases.
3. Liquibase supports branch-based development.

What it will take to make it a 10
1. Show that we can do undo operations without incurring data loss. If this is not feasible, then we should not add the extra overhead of writing and testing undo scripts. Writing undo scripts, however, seems cheaper than I had imagined.

Other thoughts
1. Parallel run - have you considered ways in which we could run multiple versions of Mifos in parallel?
2. Which is cheaper? Given that we are increasing our QA efforts, we are definitely reducing the likelihood of severe bugs in releases. Wouldn't it be cheaper to fix any issues that come up in production versus rolling back?

Regards,
Kojo

Adam Feuer
Dec 5, 2010, 3:03:06 PM
to Mifos software development
On Sat, Dec 4, 2010 at 10:18 PM, <kgam...@gmail.com> wrote:
> 1. show that we can do undo operations without incurring data loss. If this
> is not feasible, then we should not add the extra overhead of writing and
> testing undo scripts.

Kojo,

Have you checked out the earlier link Vivek posted?

http://exortech.com/blog/2009/02/01/weekly-release-blog-11-zero-downtime-database-deployment/

The expand scripts will only add data - columns, tables, etc. To undo
this, one can simply delete whatever was added. Furthermore, as Vivek
said, Liquibase automatically generates rollbacks:

http://www.liquibase.org/manual/rollback

These can be used for the "expand-undo" scripts.
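As a rough illustration (the change sets and table below are invented, not actual Mifos schema): liquibase can derive the rollback by itself for structural changes like createTable or addColumn, while raw SQL changes need an explicit rollback block.

  <!-- liquibase knows how to roll back createTable on its own:
       rolling back simply drops the table again -->
  <changeSet id="create-audit-log" author="developer">
    <createTable tableName="audit_log">
      <column name="id" type="BIGINT" autoIncrement="true">
        <constraints primaryKey="true"/>
      </column>
      <column name="message" type="VARCHAR(255)"/>
    </createTable>
  </changeSet>

  <!-- raw SQL has no automatic rollback, so one is spelled out -->
  <changeSet id="backfill-audit-log" author="developer">
    <sql>INSERT INTO audit_log (message) SELECT note FROM old_notes</sql>
    <rollback>
      <sql>DELETE FROM audit_log</sql>
    </rollback>
  </changeSet>

Running something like "liquibase rollbackCount 2" would then undo both change sets in reverse order.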

However, once a contract operation has been performed, rollback cannot
be done without custom work to generate data. The idea here is to run
for some time in the expanded state to verify everything is well -
only then will the contract be performed.

> 2. Which is cheaper? Given that we are increasing our QA efforts we are
> definitely reducing the likelihood of severe bugs in releases. Wouldn't it
> be cheaper fixing any issues that come up in production versus rolling back?

It is very very difficult to fix some issues that are discovered in
production. Furthermore, many issues are discovered by MFIs only after
one or more weekly cycles have happened. This means that to roll back
using the current software, we would have to do one of two costly
things:

1. restore an old backup from a week or more in the past; then pay for
the MFI to re-enter all their data. Painful and slow for the MFI.

or

2. write data migration scripts that will undo the upgrade, while
preserving the week or more of data the MFI has entered. This would be
stressful and painful for us, and slow for the MFI.

We have experienced first hand the pain of both solutions, and do not
want to undergo that pain again.

While it would be great to have no bugs - and we are striving for that
- it is better to have backup systems that make bad bugs non-fatal to
us, and non-fatal to customers. This is especially important the more
customers we have.

Does that make sense?

cheers


adam
--
Adam Feuer <adamf at pobox dot com>


kgam...@gmail.com
Dec 6, 2010, 12:58:09 PM
to Mifos software development
Hi Adam,
I think I have a better understanding of the expand-contract mechanism
after re-reading the link in the proposal. Correct me if I am wrong: the
database after expansion is compatible with two versions of the
application, and after contraction is only compatible with the latest
application version.

If my understanding is correct then this approach should not create data
loss issues so long as the scripts are well tested.
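
To check my understanding with a made-up example (the table and columns below are illustrative, not actual Mifos schema, and the SQL is MySQL syntax), suppose version N stores a client's name in one column and version N+1 splits it in two:

  <!-- expansion, run before deploying version N+1: add the new columns
       as nullable, so version N, which never touches them, keeps working -->
  <changeSet id="client-split-name-expand" author="developer">
    <addColumn tableName="client">
      <column name="first_name" type="VARCHAR(100)"/>
      <column name="last_name" type="VARCHAR(100)"/>
    </addColumn>
    <sql>UPDATE client
         SET first_name = SUBSTRING_INDEX(full_name, ' ', 1),
             last_name  = SUBSTRING_INDEX(full_name, ' ', -1)</sql>
  </changeSet>

  <!-- contraction, run only once version N+1 is declared stable:
       drop the old column; after this, version N can no longer run -->
  <changeSet id="client-split-name-contract" author="developer">
    <dropColumn tableName="client" columnName="full_name"/>
  </changeSet>

Between expansion and contraction, both application versions can run against the same database.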

It gets a ten (10) from me in that case.

Kojo

Adam Monsen
Dec 9, 2010, 7:52:18 PM
to mifos-d...@lists.sf.net
Vivek wrote:
> Please rate it between 1-10 and also add your views what would take
> it to be 10.

I give it a 9 out of 10.

I like
- that it will help Mifos sysadmins be much more confident when
upgrading Mifos.
- that it shows you understand what it will take to actually implement
seamless database upgrades.
- that it stresses the use of off-the-shelf tools (like liquibase)
rather than more custom code.

To make it a 10,
- add: "A Mifos sysadmin will clearly be able initiate and monitor the
upgrade (contract/expand) process", and "a Mifos developer will be able
to maintain the seamless upgrade system more easily and reliably than
the current upgrade system". This will stress simplicity and usability
from both developer and user perspective.
- add "the new (seamless) upgrade mechanism (minus the novel
expand/contract feature or UI) will require less custom code than the
current ("non-sequential database upgrades") mechanism".

Other comments:
- Look carefully at the current Java-based upgrades when estimating the
"seamless database upgrades" stories. Changing to SQL-only will require
a good deal of refactoring. There are a bunch of "upgrades" that do
stuff like conditionally fix problems in data. Also, this may require
refactoring of broken i18n code and custom labels (which needs to happen
anyway). IIRC Mifos permission changes are also done in Java-based
upgrades, but this might give us a chance to use more out-of-the-box
Spring Security. I just wanted to bring these up since they'll take
time. Saying "duplicate code" or "use stored procedures" is fine, but
there's a devil in those details.
- Great work!


Adam Feuer
Dec 9, 2010, 11:23:01 PM
to Mifos software development
On Mon, Dec 6, 2010 at 9:58 AM, <kgam...@gmail.com> wrote:
> I think I have a better understanding of the expand - contract mechanism
> after re-reading the link in the proposal. Correct me if I am wrong; The
> database after expansion is compatible with two versions of the
> application and after contraction, is only compatible with the latest
> application version.

Yes, this is my understanding too.

> If my understanding is correct then this approach should not create data
> loss issues so long as the scripts are well tested.

Cool! That is our idea!

-adam
