Re: [Mifos-developer] Proposal for near zero downtime when deploying new release in production. VERSION 2


Vivek Singh

Nov 19, 2010, 1:29:15 AM
to Mifos software development
Thanks Udai and Artur for the feedback. I have improved the proposal based on your concerns/recommendations. It is marked in blue. Please let me know if I need to explain anything a bit more or if you disagree with something.
 
[This is to solve delays caused from the database perspective]
Please rate it between 1 and 10, and also add your views on what it would take for it to be a 10.

terms
(I know we might be using some of the terms differently, but for reading this email please use the following. Thanks)
schema change: adding/changing/deleting any of the database elements (table, column, index, constraint, view)
base data: data required by Mifos in its tables for it to function correctly
business data: data created by the use and running of Mifos in production
database code: stored procedures, functions and triggers
data migration: transforming business data from one form to another
database upgrade: schema upgrade + base data change + database code upgrade + data migration

goal
near zero downtime for Mifos release deployment
simpler rollback of release when required

where we are today
  1. We need to stop the Mifos web server and batch jobs. The database is upgraded. The new version of the code is deployed. The web servers and batch jobs are started. The database upgrade can be slow depending on the volume of data and the nature of the upgrade.
  2. The database upgrade is triggered by starting the web server. This is a rather complex approach and couples the database upgrade to web server start. If a cluster of servers is accidentally started together, it can corrupt the database as well.
  3. Some of the DataMigration/BaseData work is carried out in the context of the Tomcat runtime, using the production domain entities and services. This can go wrong when the release upgrade is not sequential. For example: if in release 1.7 we carried out some DataMigration/BaseData change, this would work when upgrading from 1.6 to 1.7, but upgrading from 1.6 directly to 1.8 might fail, because the code used to perform the upgrade would be the 1.8 code, in which these services and entities might have changed. Such things can easily fall into testers' blind spots as well. This can also be significantly slower than a plain SQL based approach.
  4. Mifos is maintaining its own tool for managing database changes.
  5. Web servers at startup check the revision number of the database against the code's revision.
  6. We defragment the change scripts into categorized scripts for our understanding, new deployments and development environments.
proposal
  1. Use dbdeploy instead of maintaining our own tool. (dbdeploy works on the basis of numbers, which means one needs to do a bit more to manage script numbers across branches. As an implementation detail I have avoided it; if you feel concerned about it, please ask for the detail. It is a mature tool, used by the majority of Java projects in ThoughtWorks for the last 3-4 years, and it won the Jolt productivity award in 2007. It is not extremely active, though. I think they have moved it to Google Code, because of which older releases cannot be seen. Other tool recommendations are welcome.) Liquibase could also be an option, but I have one big reservation about it: using it would introduce another (XML based) language, which would take us one step away from knowing exactly what SQL would run against the database. This seems unnecessary, without any value added over SQL. This also reverses some of the things in this story: http://mifosforge.jira.com/browse/MIFOS-2896. The reasons have been provided at different places in the proposal. Regarding branch based schema changes, we would allocate number ranges to different releases so that upgrades can be run in a specific order, e.g. 100 to 110 for 1.7, 110 to 120 for 1.8, etc.; of course we would need bigger ranges. This is a standard pattern. 2896 mentions that Liquibase solves this, but I couldn't find where that is documented.
  2. Start writing undo scripts (explained here) for every upgrade, except for data migration and database code (see the first sketch after this list).
  3. Do not use Java code to do data upgrades, for performance reasons. Even at the cost of duplication we should try to avoid it. In fact, it is more a question of reuse than duplication. If it cannot be avoided, then version-control the old runnable version of the migration. One way to do this is to not depend on the Tomcat context and to create a plain Java executable. Once the release is over, the jar files concerned should be committed to a well known location in source control. This can be painful, and that is the idea: so that the SQL option is explored first.
  4. Continue doing 5 & 6 from 'where we are today' above.
  5. Categorize our database scripts as expansion and contraction scripts, as explained here (http://exortech.com/blog/2009/02/01/weekly-release-blog-11-zero-downtime-database-deployment/) and shown in the second sketch after this list.
  6. Change the high level release steps to: a) review the database upgrade (both do and undo) scripts b) run the expansion scripts and data migration scripts while the system is in use, preferably in off-hours c) deploy the code and the database code (only) d) wait for the release to become stable e) run the contraction scripts (if stable) or the undo of the expansion (if unstable). [Please note that database code is treated in the same way as source code.]
  7. When we get into situations where database operations are less performant and impact the running system, we can take a call case by case. There are some good tips here and here.
  8. It is possible that there might be some issues in the application when it runs against the expansion scripts; for example, if the SQL fired at the database depends on column order (unlikely, mostly because of Hibernate). I cannot think of any other scenario. If there are some, then we can run the unit/integration/functional tests against the expansion scripts as well as the complete upgrade scripts.
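
To make points 1 and 2 concrete, here is a minimal sketch of a dbdeploy-managed script with an undo section. The script number, table and column are hypothetical, and I am going from memory of dbdeploy's convention that everything after the --//@UNDO marker runs when the script is reverted:

    -- 101_add_phone_number_to_client.sql
    -- (101 falls in the 100-110 slot reserved for release 1.7; the table
    --  and column are made up, purely for illustration)
    ALTER TABLE client ADD COLUMN phone_number VARCHAR(20) NULL;

    --//@UNDO
    ALTER TABLE client DROP COLUMN phone_number;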
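
And a sketch of the expansion/contraction split from points 5 and 6, using the 'widen a decimal column' case; the table and columns are again hypothetical. The expansion is purely additive and can run against the live system (step b), while the destructive contraction waits until the release is declared stable (step e):

    -- expansion script: runs while the old release is still live (step b)
    ALTER TABLE loan_account ADD COLUMN amount_new DECIMAL(21,4) NULL;
    UPDATE loan_account SET amount_new = amount;  -- in-place data migration

    -- (the new release reads and writes amount_new only)

    -- contraction script: runs once the release is stable (step e)
    ALTER TABLE loan_account DROP COLUMN amount;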
Vivek Singh | +91 98452 32929 | http://sites.google.com/site/petmongrels

Udai Gupta

Nov 19, 2010, 3:15:29 AM
to Mifos software development
Hi Vivek,

I really like all the points that you have mentioned in the proposal.
I just have a little resistance in my mind. I will put down some short
notes so that this mail doesn't get very long.

NSDU (non-sequential database upgrades) might be a misleading term for
the current upgrade mechanism in Mifos. It uses a Unix timestamp, which
is an incremental number (like 100-110, but with a larger range). The
upgrades are applied in increasing order of the Unix timestamp numbers
assigned to them. It is not easy to determine the position of an
upgrade from its timestamp, but if you know all the numbers for the
upgrades it is easily visible. It gives the flexibility of merging two
branches and eliminates the worry about upgrade number collisions
during a merge. The order of the upgrades we merge might still matter,
so a formal review process might be able to help with that.

Before NSDU, we were using numbers like 100-110 for 1.7 etc.; the
benefit that gives is probably that you know there are 10 upgrades
between 100 and 110.

I don't see the benefits in moving from NSDU to dbdeploy or Liquibase.
Would you list some of the benefits that you think might help reduce
downtime? Except that both frameworks:
- provide external execution capability.
- will reduce the amount of code Mifos has to maintain.

In summary, I would still stick with NSDU in Mifos if there is not
much to gain in the context of reducing downtime.

The improvements that you suggested should be done anyway:
- Before release upgrade review
- Don't use Java based upgrades as much as possible
- Categorisation of upgrades and pre/post deploy scheme
- Create undo scripts


I wonder if there is any software which has already solved the problem
of zero downtime during upgrades for all of its users (with large
databases, like 100 GB). I think that we can reduce the downtime
through best practices, but it won't be near zero for large databases
unless we go for more tedious approaches like online schema changes,
versioning the database, or archiving data to reduce its size.

Thanks,
Udai


Vivek Singh

Nov 23, 2010, 1:21:55 AM
to Mifos software development
>> It gives the flexibility of merging two branches and eliminates the worry about upgrade number collisions during a merge. The order of the upgrades we merge might still matter, so a formal review process might be able to help with that.
There is a simpler option than using review to figure out what goes first. The same technique used for avoiding number collisions can be used here. The idea is to use slots: with NSDU it would be like 100+timestamp, 200+timestamp for different releases in chronological order. Always apply from lower to higher (as dbdeploy does). We can achieve this via an mvn task for creating an empty sql file, as done by the rails migrations rake task (see the sketch below).
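
To make the slot idea concrete, here is one reading of 100+timestamp, sketched as the empty file such an mvn task could generate (the goal name and the numbers are made up for illustration):

    -- 200_1290496915_add_loan_purpose.sql
    -- generated by a (hypothetical) "mvn mifos:create-migration" goal:
    -- prefix 200 = slot for release 1.8, 1290496915 = Unix timestamp at
    -- creation time, so every 1.8 script sorts after every 100-prefixed
    -- 1.7 script and dbdeploy applies them lower to higher

    --//@UNDO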

>> I don't see the benefits in moving from NSDU to dbdeploy or Liquibase. Would you list some of the benefits that you think might help reduce downtime? Except that both frameworks:
- provide external execution capability.
- will reduce the amount of code Mifos has to maintain.


Apart from the upgrade not executing from within the Tomcat context, and us not maintaining a codebase for NSDU, dbdeploy also provides a mechanism for rollbacks and sequential upgrades (simple to understand), and it is widely used.
Since the tool is simple, it would not take very long for anyone to write something like it. So if we want to continue with NSDU, we should implement these features and also delink it from the Mifos codebase, more like another open source project. Would you do that?

>> I think that we can reduce the downtime through best practices, but it won't be near zero for large databases unless we go for more tedious approaches like online schema changes, versioning the database, or archiving data to reduce its size.

For the first few releases let's work closely (deployment and development teams together) to keep the focus not only on upgrading but also on keeping the downtime low. Once we have real-life examples we will learn new techniques.
--

Vivek Singh

Nov 23, 2010, 2:30:47 AM
to Mifos software development
>> I think that we can reduce the downtime through best practices, but it won't be near zero for large databases unless we go for more tedious approaches like online schema changes, versioning the database, or archiving data to reduce its size.

I should mention that since the expansion scripts would be run on an online system (maybe at off-hours if possible), there would be no downtime whatsoever in this phase. There might be a slowdown, depending on how MySQL handles it and on the kind of expansion being run (see the sketch below). So the only downtime is incurred by deploying the non-database stuff. In a clustered environment near zero downtime can be achieved.
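
For example (a sketch; account_trxn is a real table but the column is made up): as I understand MySQL 5.0/5.1 behaviour, an ALTER TABLE like the one below copies the whole table and blocks writes while still allowing reads, so on a huge table it shows up as a slowdown for writers rather than an outage:

    -- expansion on a live system: MySQL rebuilds the table, so writers
    -- stall for the duration but readers carry on
    ALTER TABLE account_trxn ADD COLUMN external_ref VARCHAR(50) NULL;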

Adam Feuer

Nov 24, 2010, 10:19:33 PM
to Mifos software development
On Mon, Nov 22, 2010 at 10:21 PM, Vivek Singh <vsi...@thoughtworks.com> wrote:
> There is a simpler option than using review to figure out what goes first.
> The same technique used for avoiding number collisions can be used here.
> The idea is to use slots: with NSDU it would be like 100+timestamp,
> 200+timestamp for different releases in chronological order. Always apply
> from lower to higher (as dbdeploy does).

Vivek,

This numbering scheme seems cumbersome. Liquibase is an open source
project that does something similar to dbdeploy, but has the concept
of database "changesets" rather than "versions." This is similar to
NSDU's list of upgrades.

http://www.liquibase.org/

This matches better with our branching strategy - since any branch can
have any collection of changesets, rather than having to remember and
maintain a "chart of accounts" within the version numbers.

The other advantages you mention of using an open source project are
great - not maintaining the code, and having specialized software made
for the purpose of database refactoring.

Liquibase interoperates with Spring:

http://www.liquibase.org/manual/spring

So it could be used as part of the proposal you made, to make the
Mifos software run against multiple database versions and do
expand-and-contract database updates or refactorings.

-adam
--
Adam Feuer <adamf at pobox dot com>


Adam Feuer

Nov 24, 2010, 10:26:21 PM
to Mifos software development
On Thu, Nov 18, 2010 at 10:29 PM, Vivek Singh <vsi...@thoughtworks.com> wrote:
>> 2896 mentions that Liquibase solves this, but I couldn't
>> find where that is documented.

Vivek,

Liquibase solves this by using changesets instead of version numbers, I think:

http://www.liquibase.org/manual/changeset

Each changeset specifies a list of transformations to run on the database.
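
On Vivek's XML reservation, it may also be worth noting that, if I remember correctly, the just-released Liquibase 2.0 accepts "formatted SQL" changelogs too, where a changeset is plain SQL plus marker comments - a sketch with hypothetical names:

    --liquibase formatted sql

    --changeset vivek:add-client-phone-number
    ALTER TABLE client ADD COLUMN phone_number VARCHAR(20) NULL;
    --rollback ALTER TABLE client DROP COLUMN phone_number;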

Adam Feuer

Nov 24, 2010, 10:38:13 PM
to Mifos software development
On Thu, Nov 18, 2010 at 10:29 PM, Vivek Singh <vsi...@thoughtworks.com> wrote:
>> Please rate it between 1 and 10, and also add your views on what it would
>> take for it to be a 10.

Vivek,

I rate your "seamless database upgrades" proposal a 7. I liked:

* That it proposes an actionable plan to get seamless upgrades.
* The idea of online upgrades - that upgrades can be done while Mifos
is running.
* The idea of expansion and contraction transformations - that Mifos
can run against multiple db versions, and destructive modifications
(contraction) are done only when the additive (expansion)
transformations are done and tested.
* That an external open source tool (dbdeploy or Liquibase) would be
used to manage the upgrades.

To be a 10:

* Explain in detail how the version number namespace will be used to
manage multiple branches - make it match the Mifos branching strategy.
We have multiple branches being maintained at any one time, 1.6, 1.7,
etc. - many of them. We also encourage experimental branches that may
be short or long lived, now that we use git. Ideally we'd have a
database upgrade system that made branching really easy too.
* Explain why "undo" scripts need to be generated and tested - I think
this is because rollbacks only occur on running systems that have had
data entered into them, so they need to be migrated back to their
starting state if there is a failure. But this wasn't clear.
* Say more about why Liquibase's XML configuration language is bad -
Spring uses one, and many tools have their own DSLs. The overhead
doesn't seem high to me.

Van Mittal-Henkle

Nov 25, 2010, 1:59:38 AM
to Mifos software development

Vivek,

 

My feedback is based on the version of this proposal below.

 

I rate this proposal a 6 out of 10.

 

What I liked about it:

* it eliminates database schema & data evolution code from Mifos.  A major goal for the project as we move forward is to have Mifos code focus on microfinance as much as possible and leverage existing open source tools for infrastructure work.  This would move us in that direction.

* database upgrades can be run externally.  Whether we move to another toolkit or not, we should support this.

* it points out a current flaw in Java based upgrades (the need for a snapshot in time of the classes for a given upgrade)

* it introduces the idea of expansion and contraction scripts which seems useful.

 

To make it a 10:

* make clear that eliminating Java upgrades when using dbdeploy will mean introducing stored procedures for cases with complex business logic, and that these stored procedures would potentially need to duplicate application business logic (see the first sketch after this list).

* outline how dbdeploy could be extended to include non-sequential upgrades, or lay out how sequential upgrades could be effectively managed (see the second sketch after this list).  In the past we found that the requirement of sequential upgrades was a limiting factor.  For example, 2 teams would be working on disparate areas of Mifos, each requiring an upgrade.  It would be unclear which team would finish first.  Both teams would use the same upgrade number.  The second team to finish would then need to change their upgrade number before committing.  Another scenario is that an upgrade needs to be introduced on a release branch and then merged into the head (which already has additional upgrades) - how would this be done effectively with sequential upgrades?

* elaborate on the downgrade feature.  Initially we supported this in the current Mifos database upgrade mechanism, but it was abandoned because the cost of writing downgrades was high.  For simple upgrades, the downgrade is also simple.  However, we found that for complex upgrades the downgrade can be significantly more complex than the upgrade.  Are the use cases for downgrades compelling enough to justify reintroducing them now?  Is one use case allowing downgrades after new data has been entered into Mifos post-upgrade?  If so, comment on how we would build and maintain automated tests that exercise this.

* expansion and contraction scripts sound good.  My concern is how to enforce their semantics in an automated way.  If we rely only on convention, sooner or later it will be broken as contributors come and go on the project, and it is not certain that breaking the convention would be detected.  Would there be an effective automated way to enforce these conventions?

* the ideas mentioned in the proposal sound good, and it sounds like they will make the database upgrade mechanism more industrial-strength and bulletproof.  These are good things.  I like them.  However, in total the proposal gives me the feeling of something that will slow down development that involves database upgrades.  Can you comment on the cost of implementing these ideas in terms of development speed?  If you think it won't slow us down, why is that?  If it will slow us down, is it worth it, and is now the right time to incur that cost?
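
On the stored procedure point, a first sketch of the kind of duplication I mean; the table, columns and formula are hypothetical stand-ins for logic that today lives in a Java service:

    -- hypothetical migration that re-derives a value normally computed in Java
    CREATE PROCEDURE migrate_total_interest()
      UPDATE loan_account
         SET total_interest = principal * interest_rate / 100;

    CALL migrate_total_interest();
    DROP PROCEDURE migrate_total_interest;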
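
And a second sketch of the sequential-numbering collision, with made-up script names, alongside the slot variant Vivek proposed:

    -- trunk is at script 57; two teams both take number 58:
    --   team A: 58_add_savings_flag.sql     (commits first, keeps 58)
    --   team B: 58_widen_loan_amount.sql    (must renumber to 59 before commit)
    -- with per-release slots the ranges at least cannot collide across branches:
    --   1.7 branch uses 100-110, head/1.8 uses 200-210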

 

--Van

 

----

Van Mittal-Henkle

Mifos Software Developer

Grameen Foundation

va...@grameenfoundation.org

John Woodlock

Nov 25, 2010, 6:41:41 PM
to Mifos software development
Vivek,

I rate the proposal a 9 out of 10.

What I liked about it:
Echoing many of the previous comments, I like the idea of decoupling the upgrade code from the Mifos code base, and of not having to run the whole lot prior to upgrading to a new app release.

I've never liked the fact that the upgrade logic runs on start-up, but I note that it does have the benefit that it does it all for you and you don't have to think about running separate upgrade steps :)  Many apps seem to do it like this as well.  It's only a pain if it is slow (subjective though that is).

To make it a 10:
As said before, I liked the ideas and the possibility of using something like dbdeploy, but then I got scared of how much the implementation will cost.  This fear is based on ignorance :) but implementations can always go awry, and the ideas involve quite a bit of change even outside of dbdeploy/liquibase.  I acknowledge that people who have worked this way in the past wouldn't have this fear, and I don't particularly want to get in their way.

I wonder, can we get 80% of the benefit with 20% of the implementation? I.e. leave Mifos pretty much as it is, but give MFIs (hosted or not) the option of doing any big time-wasters (like indexes) prior to the application upgrade.  So, we'd provide something which allowed them to do the big 'expansion' pieces of work if they wished, and a Mifos app upgrade would be enhanced to be able to identify whether that has been done or not.

So, if we end up doing upgrades that 'only take a minute' even for GK, then I reckon no-one would bother running them in advance; but if we want to add indexes or columns or other big things, they might jump at it.  I remember a recent release had a step to make a large number of decimal fields bigger, and it ran for day(s) at GK.  I think they did do a fancy workaround involving replication, but it would have been nice to allow them to do these one or two at a time well before the upgrade.

Possibly similar for contraction work, e.g. a script which back-fixes data... but that is only relevant once the db is upgraded... there might be an option where Mifos doesn't make this mandatory but allows the user to run it at their leisure if they wish.  So the user takes on extra maintenance responsibility in exchange for much reduced downtime.

I have to admit I wanted to add/remove an index or two on the account_trxn table (which can be huge), but I bottled out of putting it in 1.7 because 1.7 seemed 'big enough' already (see the sketch below).
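
For instance (a sketch; the column choice is hypothetical), this is the kind of statement an MFI could run at its leisure, well ahead of the upgrade, as a pre-done expansion step:

    -- run against account_trxn well before the application upgrade
    CREATE INDEX idx_account_trxn_action_date ON account_trxn (action_date);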

John
