How to handle zero downtime deployments with vert.x?


Daniel Lindberg

Sep 10, 2015, 10:43:05 AM
to vert.x
Perhaps this question is not really related specifically to vert.x, but here goes. 

Let's say that I want an architecture that gives me the flexibility to do zero downtime deployments.
One possibility (I think) would be to deploy my updated verticle next to my existing verticle, and then undeploy the old version of the verticle.
If both of them are consuming messages (point to point) from the event bus, I would have two verticles receiving messages in round-robin fashion for the short time that both run in parallel. The behavior would differ slightly depending on which verticle receives a given message, but the service would at least not be interrupted.
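
To make the overlap concrete, here is a minimal sketch in plain Java (no Vert.x on the classpath). `RoundRobinAddress` is a hypothetical stand-in for a point-to-point event bus address, not the real event bus API; it only assumes that each message goes to exactly one registered consumer, in rotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a point-to-point address:
// each message is delivered to exactly one consumer, in rotation.
class RoundRobinAddress {
    private final List<Function<String, String>> consumers = new ArrayList<>();
    private int next = 0;

    synchronized void register(Function<String, String> consumer) { consumers.add(consumer); }
    synchronized void unregister(Function<String, String> consumer) { consumers.remove(consumer); }

    synchronized String send(String message) {
        if (consumers.isEmpty()) throw new IllegalStateException("no consumers");
        Function<String, String> target = consumers.get(next % consumers.size());
        next++;
        return target.apply(message);
    }
}

class RoundRobinDemo {
    static List<String> run() {
        RoundRobinAddress address = new RoundRobinAddress();
        Function<String, String> v1 = m -> "v1:" + m;
        Function<String, String> v2 = m -> "v2:" + m;
        List<String> replies = new ArrayList<>();

        address.register(v1);
        replies.add(address.send("a"));  // only the old version is deployed
        address.register(v2);            // new version deployed next to the old one
        replies.add(address.send("b"));  // both versions take turns
        replies.add(address.send("c"));
        address.unregister(v1);          // old version undeployed
        replies.add(address.send("d"));  // only the new version is left
        return replies;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [v1:a, v2:b, v1:c, v2:d]
    }
}
```

At no point is the address without a consumer, which is the property I am after.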

Now let's say that I have a REST service verticle that serves requests on port 8080. I wouldn't be able to just deploy another version of this verticle next to the existing one, as this would cause a bind exception on port 8080. Since I don't want to interrupt the service, I cannot simply undeploy the existing verticle before deploying the new one, as this would cause an interruption, although a short one.
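
The conflict I mean can be reproduced with two independent listeners on plain `java.net` sockets, without Vert.x. This is only a sketch: port 0 asks the OS for any free port and stands in for 8080, and the class name is made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class PortConflictDemo {
    // Returns true if a second listener on an already bound port is rejected.
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; it stands in for 8080 here.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // not expected while 'first' is still listening
            } catch (BindException e) {
                return true;  // typically "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```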

What would be my options here? Can I replace my existing verticle without interrupting the service? Any best practices?

bytor99999

Sep 10, 2015, 12:54:15 PM
to vert.x
You might want to read near the bottom of this section of the docs.

It talks about how Vert.x does magic for sharing port numbers.

Mark

Daniel Lindberg

Sep 10, 2015, 3:10:12 PM
to vert.x
I'm talking about a scenario where I already have one vert.x instance running with one verticle, and then deploy a new vert.x instance with an updated verticle in it. My understanding of the "magic sharing" is that it only works if I deploy more than one verticle instance inside the same vert.x instance, i.e. the same JVM. If I tried to deploy another vert.x instance, it would run in a separate JVM and would thus throw a BindException anyway.

At least that is my takeaway from the discussion in this thread, https://groups.google.com/d/msg/vertx/x3U2H24T518/tVUDAytMdoIJ (link to an answer from Tim Fox)

bytor99999

Sep 11, 2015, 1:48:58 PM
to vert.x


On Thursday, September 10, 2015 at 12:10:12 PM UTC-7, Daniel Lindberg wrote:
I'm talking about a scenario where I already have one vert.x instance running with one verticle, and then deploy a new vert.x instance with an updated verticle in it. My understanding of the "magic sharing" is that it only works if I deploy more than one verticle instance inside the same vert.x instance, i.e. the same JVM. If I tried to deploy another vert.x instance, it would run in a separate JVM and would thus throw a BindException anyway.

That is true. It sounded like you were asking about the instance count of a deployed verticle, not about Vert.x instances.

Mark

Krzysztof Kowalczyk

Sep 16, 2015, 9:08:38 AM
to vert.x
Hi Daniel,

What follows is just a (random) list of problems that I can think of. Most likely you know most of them already, and I don't think it is complete, but I hope it will be of some use.
The general idea is that you go through all instances one by one: deploy the new one, register it in the load balancer, unregister the old one from the load balancer, stop its internal processing, wait until it stops, and kill the old instance. Then do the same with the next one.

Load balancer
You need a cluster and a front-end load balancer of some sort. It can be hardware, it can be HAProxy or another Vert.x verticle, whatever works best for you. It can be many machines as well. This way you can bring new instances up on any port and register them in the load balancer. Then you unregister the old version, wait for it to finish all requests (this is easy with HTTP, as there will be no new requests, but for internal communication and internal processing you need a stop switch) and kill it. If you have a nice homogeneous cluster, you can stop a machine first and then deploy the new version on it, so you don't have to double the ports or keep a spare machine for the rolling update. Be aware that some services need warm-up time and should be registered in the load balancer only after they are warm, so they do not slow down the cluster.
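
The register / unregister / drain sequence can be sketched in plain Java as follows. `RollingBalancer` is a hypothetical in-memory model, not HAProxy or any real load balancer API, and the backend names are invented; the point is only the order of the steps and the "kill only when drained" check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory load balancer: tracks which backends receive new
// requests and how many requests each backend is still processing.
class RollingBalancer {
    private final List<String> registered = new ArrayList<>();
    private final Map<String, Integer> inFlight = new LinkedHashMap<>();

    void register(String backend) { registered.add(backend); inFlight.putIfAbsent(backend, 0); }
    void unregister(String backend) { registered.remove(backend); } // no new requests from now on
    List<String> registered() { return new ArrayList<>(registered); }

    void requestStarted(String backend) { inFlight.merge(backend, 1, Integer::sum); }
    void requestFinished(String backend) { inFlight.merge(backend, -1, Integer::sum); }
    boolean drained(String backend) { return inFlight.getOrDefault(backend, 0) == 0; }
}

class RollingUpdate {
    // Swaps one old instance for a new one.
    // Returns true if the old instance can be killed right away.
    static boolean replace(RollingBalancer lb, String oldBackend, String newBackend) {
        lb.register(newBackend);       // 1. new instance is up (and warm): send it traffic
        lb.unregister(oldBackend);     // 2. old instance receives no new requests
        return lb.drained(oldBackend); // 3. kill only when nothing is in flight
    }

    public static void main(String[] args) {
        RollingBalancer lb = new RollingBalancer();
        lb.register("node1-old:8080");
        lb.requestStarted("node1-old:8080"); // one request is still running
        System.out.println(replace(lb, "node1-old:8080", "node1-new:8081")); // false: keep waiting
        lb.requestFinished("node1-old:8080");
        System.out.println(lb.drained("node1-old:8080")); // true: safe to kill
    }
}
```

In a real setup the drain check would also have to cover the internal processing mentioned above, not just HTTP requests.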


Versioning of api
Are you enriching an existing API or creating a new one? A new one is simple: just deploy it on a different path, with the version in the path.
But if you are adding new features to an existing API, you have two verticles serving different things on exactly the same path, and you have limited control over who gets what. The old verticle can start handling a request, the new verticle can serve part of the flow, and it can finally end up on the old verticle again. Thus, if you are not versioning the external API (and are not able to route based on version), the changes need to be backward and forward compatible.
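
For the simple case (a new API under its own path), routing on the version segment could look like the sketch below. `VersionedRouter` and the `/v1`, `/v2` paths are made up for illustration; a real HTTP router would do the same job.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical path router: the first path segment selects the API version.
class VersionedRouter {
    private final Map<String, Function<String, String>> handlers = new TreeMap<>();

    void mount(String version, Function<String, String> handler) { handlers.put(version, handler); }
    void unmount(String version) { handlers.remove(version); }

    // 'path' looks like "/v2/orders/42"; returns the handler's reply or "404".
    String handle(String path) {
        String[] parts = path.split("/", 3); // e.g. ["", "v2", "orders/42"]
        if (parts.length < 3) return "404";
        Function<String, String> handler = handlers.get(parts[1]);
        return handler == null ? "404" : handler.apply("/" + parts[2]);
    }

    public static void main(String[] args) {
        VersionedRouter router = new VersionedRouter();
        router.mount("v1", p -> "v1 handled " + p);
        router.mount("v2", p -> "v2 handled " + p); // new API lives next to the old one
        System.out.println(router.handle("/v1/orders/42")); // v1 handled /orders/42
        System.out.println(router.handle("/v2/orders/42")); // v2 handled /orders/42
        router.unmount("v1");                               // retire the old API later
        System.out.println(router.handle("/v1/orders/42")); // 404
    }
}
```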
In practice you will most likely get all 3 cases eventually:
1) bugfix updates - no change of API, no new fields, just different internal logic
2) small additions - new data that should be ignored but not lost by the old version
3) big changes - new versions

The hardest case IMHO is no. 2.
You need to test API compatibility, always, before deploying. Will this addition break our users? What will happen if half of the flow runs on the new version and half on the old one?
Fortunately, when using schemaless stuff, life is easier here.


Versioning of internal messaging
In most cases you cannot force people to use a different version of your API for each change, but you can version the internal API much more finely. Then the new version (let's say 2.1.1234) makes all its calls to new-version endpoints (internal REST calls, websockets, event bus, :8080/v2.1/service) and the old version runs on the old paths, so neither conflicts with the other.
In some cases (especially when we have persistence ;) - JMS, database, etc.) you might need the new version to also handle the old data / communication. Thus the new version also listens on v2.0 and modernizes stuff once it finds it.
So we might need to support both new and old logic in a single version of the app, and then you need a way to handle that in your code.
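
A rough sketch of "the new version also listens on v2.0" in plain Java. Everything here is invented for illustration: `VersionedBus` is a toy registry rather than the Vert.x event bus, the addresses `orders.v2.0` / `orders.v2.1` are made up, and the assumed change is that v2.1 added a `currency` field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of versioned internal addresses, e.g. "orders.v2.0".
class VersionedBus {
    private final Map<String, Function<Map<String, String>, String>> consumers = new HashMap<>();

    void consumer(String address, Function<Map<String, String>, String> handler) {
        consumers.put(address, handler);
    }

    String send(String address, Map<String, String> message) {
        return consumers.get(address).apply(message);
    }
}

class OrdersV21 {
    // Assumed change for this sketch: v2.1 added a "currency" field.
    static Map<String, String> modernize(Map<String, String> v20Message) {
        Map<String, String> upgraded = new HashMap<>(v20Message);
        upgraded.putIfAbsent("currency", "EUR"); // default picked only for the example
        return upgraded;
    }

    static String handle(Map<String, String> message) {
        return message.get("amount") + " " + message.get("currency");
    }

    // The new version listens on its own address and on the old one.
    static void deploy(VersionedBus bus) {
        bus.consumer("orders.v2.1", OrdersV21::handle);
        bus.consumer("orders.v2.0", m -> handle(modernize(m)));
    }

    public static void main(String[] args) {
        VersionedBus bus = new VersionedBus();
        deploy(bus);
        Map<String, String> oldMessage = new HashMap<>();
        oldMessage.put("amount", "10");
        System.out.println(bus.send("orders.v2.0", oldMessage)); // 10 EUR
    }
}
```

The old version keeps its own consumer on the old address until it is killed; only then does the new version become the sole owner of both.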

Format of data / shared cluster data / hazelcast etc.
As said, case 2 (small additions) can be the most complex. If we are talking about a small change that should not "change" the major version of the API, we need to be sure that new / different data that we send over REST, JMS, event bus, memory grid or database will not kill the old client. And if both versions use the same data, no data may be lost (when the old version reads new data and saves it back). In such a case you need schemaless / non-validated files, or schemas that allow all the tags/fields they don't know.
There are two main issues here:
1) you try to parse/validate the data and your client / old version of the app throws an exception
2) you try to read and write a new entity with the old version and you lose data.

1) is simple to solve: just allow unknown fields in your input data. If you have completely different formats, you need to version them.
2) is harder, and is worst if you have an in-memory data grid like Hazelcast, which will spread all the data across all the machines in your cluster, and you don't want to lose data when it is replicated or read/written on an old machine. There are ways to support that. Some are more sophisticated, e.g. http://download.oracle.com/otn_hosted_doc/coherence/342/com/tangosol/io/Evolvable.html, and some are simpler. The idea is that you use the object in its native form if you can (JSON, XML), or, if you need to parse / reformat it (i.e. map it to a Java class), keep the stuff that you do not understand in a byte array (Protobuf, Kryo, Hazelcast, Coherence). This might be supported by your mapping technology, e.g. https://blogs.oracle.com/felcey/entry/coherence_rolling_upgrades
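
The "keep what you do not understand" idea, as a minimal plain-Java sketch. `CustomerV1` and its fields are hypothetical, and a `Map` stands in for whatever wire format you use (JSON, a grid entry, ...); the unknown part is kept in a map here instead of a byte array to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Old-version entity: it understands only "id" and "name",
// but carries every other field along untouched.
class CustomerV1 {
    final String id;
    String name;
    private final Map<String, Object> unknown = new LinkedHashMap<>();

    private CustomerV1(String id, String name) {
        this.id = id;
        this.name = name;
    }

    static CustomerV1 read(Map<String, Object> data) {
        CustomerV1 c = new CustomerV1((String) data.get("id"), (String) data.get("name"));
        for (Map.Entry<String, Object> e : data.entrySet()) {
            if (!e.getKey().equals("id") && !e.getKey().equals("name")) {
                c.unknown.put(e.getKey(), e.getValue()); // not understood, but not dropped
            }
        }
        return c;
    }

    Map<String, Object> write() {
        Map<String, Object> out = new LinkedHashMap<>(unknown); // unknown fields go back out
        out.put("id", id);
        out.put("name", name);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "7");
        stored.put("name", "Ann");
        stored.put("email", "ann@example.org"); // field added by a newer version
        CustomerV1 c = CustomerV1.read(stored);
        c.name = "Anna";                         // old version edits what it knows
        System.out.println(c.write().get("email")); // ann@example.org
    }
}
```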



When is the new version deployed?
To finish the deployment you need to have all old nodes killed and any data migration finished. In most cases you want to run the data migration after the last old instance has been killed, not sooner.
So here are some questions:
- do you serve the new logic as soon as the new version has one (2, 3) instances, or do you think the old version will break something?
- do you need to wait for both conditions (no old version running and data migration finished) before switching from old logic to new logic?
- if you have to modernize data, can you do that lazily instead of in a batch, or both? (then you may need to keep the migration logic "forever")

You may want to do more than one deployment to roll out a new version, i.e. start with a version that supports old and new data, migrate, clean up (remove columns in the database), then deploy a simpler version that does not have the old code.

Regards,
Krzysztof Kowalczyk

javadevmtl

Sep 16, 2015, 3:11:07 PM
to vert.x
I did something similar, which worked quite well.

I used one instance of Vert.x to create an HTTP server. Inside the HTTP verticle instance I had a routing mechanism with a list of registered event bus addresses, one for each of my verticles that represented a "service/business logic" module. The router would receive a request from the HTTP server, check whether the address was marked "active" in its list, and if so send the request to that verticle. This allowed me to mark addresses in the router as active or inactive. Once an address was marked inactive, the router would no longer send to that verticle. Then I could check whether that verticle had finished all operations, undeploy it, deploy a new one, go back to the router and mark the address as active again, and voila.
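
Roughly, the router looked like the sketch below. This is a plain-Java reconstruction of the idea, not my original code and not the Vert.x API: `ServiceRouter`, the `"orders"` address and the `"503"` reply are all made up, and the sketch simply rejects requests while an address is inactive (queueing them would be another option).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical router: forwards requests only to addresses marked active
// and counts the requests that are still being processed.
class ServiceRouter {
    private static final class Route {
        volatile boolean active = true;
        volatile Function<String, String> handler;
        final AtomicInteger pending = new AtomicInteger();
        Route(Function<String, String> handler) { this.handler = handler; }
    }

    private final Map<String, Route> routes = new ConcurrentHashMap<>();

    void register(String address, Function<String, String> handler) {
        routes.put(address, new Route(handler));
    }

    void setActive(String address, boolean active) { routes.get(address).active = active; }

    boolean idle(String address) { return routes.get(address).pending.get() == 0; }

    // "Undeploy old, deploy new": only allowed once the address is inactive and idle.
    void swap(String address, Function<String, String> newHandler) {
        Route route = routes.get(address);
        if (route.active || route.pending.get() != 0) {
            throw new IllegalStateException("mark inactive and drain first");
        }
        route.handler = newHandler;
    }

    String route(String address, String request) {
        Route route = routes.get(address);
        if (route == null || !route.active) return "503"; // rejected in this sketch
        route.pending.incrementAndGet();
        try {
            return route.handler.apply(request);
        } finally {
            route.pending.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        ServiceRouter router = new ServiceRouter();
        router.register("orders", req -> "v1:" + req);
        router.setActive("orders", false);           // stop sending to the old verticle
        if (router.idle("orders")) {                 // it has finished all operations
            router.swap("orders", req -> "v2:" + req);
        }
        router.setActive("orders", true);            // and voila
        System.out.println(router.route("orders", "x")); // v2:x
    }
}
```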

Julien Viet

Sep 17, 2015, 2:35:38 AM
to ve...@googlegroups.com, javadevmtl
It would be good to capture all these "best practices" in the wiki. Perhaps we can create a page in https://github.com/vert-x3/wiki/wiki for this purpose?

-- 
Julien Viet
www.julienviet.com
