Deployments & migrations

Mark Côté

May 29, 2014, 11:31:01 PM
In the coming weeks, we plan on rolling out a couple of breaking changes to
improve Pulse overall. I want to start a discussion now as to the needs
of consumers to determine the best way forward.

The first big change is to the exchange and queue names. Currently
exchanges are generally of the form<service>, and
there is no naming convention for queues at all. As discussed in the
thread "Proposed Pulse security model"[1] and recorded in the "Security
Model" section of the wiki page[2], we'll be moving to the form
exchange/<username>[/<service>] for exchanges and
queue/<username>/<queuename> for queues.
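
To make the new naming forms concrete, here is a minimal sketch of helpers that build names in the two formats given above. The function names (exchange_name, queue_name, owner_of) are made up for illustration and are not part of mozillapulse; the only thing taken from the proposal is the format of the resulting strings.

```python
from typing import Optional


def exchange_name(username: str, service: Optional[str] = None) -> str:
    """Build a name of the form exchange/<username>[/<service>]."""
    parts = ["exchange", username]
    if service:
        # The service component is optional in the proposed form.
        parts.append(service)
    return "/".join(parts)


def queue_name(username: str, queuename: str) -> str:
    """Build a name of the form queue/<username>/<queuename>."""
    return "/".join(["queue", username, queuename])


def owner_of(name: str) -> str:
    """Return the <username> component of a new-style exchange or queue name."""
    kind, _, rest = name.partition("/")
    if kind not in ("exchange", "queue") or not rest:
        raise ValueError(f"not a new-style Pulse name: {name!r}")
    return rest.split("/", 1)[0]
```

Because the username is always the second path component, a check such as owner_of(name) == username is enough to tell whether a given user owns an exchange or queue, which is presumably what makes the convention useful for a per-user permission scheme.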

The second big change will be the launch of PulseGuardian[3], and, more
importantly, the deprecation and eventual removal of the "public" user
in favour of PulseGuardian-created users.

Since the first change affects existing producers, and both changes
affect existing consumers, it probably makes sense to roll them out
together to minimize disruptions. Producers will have to be restarted
to start publishing to the new exchanges. Consumers will have to be
restarted to create new queues, following the new queue-name convention
and bound to the new exchanges, and to switch to the new
PulseGuardian-created users.

(For convenience, the mozillapulse package will be updated ahead of
time, so all that will be required of applications is to upgrade the
package and restart.)

Restarting the producers with the new configs may cause a few messages
to not be published. Restarting the consumers with the new configs may
cause a few messages to not be received. If necessary, there are a
variety of more complicated solutions we can pursue to mitigate this
situation, such as publishing to both the old and new exchanges for a
time, and leaving existing queues up for a time after creating new
queues (and somehow handling duplicate messages when transitioning from
the old queue to the new).
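
As a rough idea of what "handling duplicate messages" could look like on the consumer side, here is a minimal sketch of a bounded duplicate filter. It assumes each message carries some unique identifier that is the same on the old and new exchanges; that is an assumption for illustration, not something Pulse messages are guaranteed to provide, and the names RecentlySeen and handle_once are hypothetical.

```python
from collections import OrderedDict


class RecentlySeen:
    """Remember the last `capacity` message IDs and flag repeats."""

    def __init__(self, capacity: int = 10000) -> None:
        self.capacity = capacity
        self._ids = OrderedDict()  # insertion-ordered set of IDs

    def is_duplicate(self, message_id: str) -> bool:
        if message_id in self._ids:
            return True
        self._ids[message_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the oldest ID
        return False


def handle_once(seen, message_id, payload, handler):
    """Call handler(payload) unless message_id was already processed.

    Assumes message_id is identical for the copies of a message that
    arrive via the old queue and via the new queue.
    """
    if seen.is_duplicate(message_id):
        return False
    handler(payload)
    return True
```

The filter only needs to remember IDs for as long as the old and new queues overlap, so a small fixed capacity would likely be enough; this is exactly the sort of per-app work the single coordinated restart would avoid.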

I would prefer to avoid all this extra work if at all possible. My
first question is whether it is acceptable for existing consumers to
miss a few messages (we are largely talking about BuildBot here, but
also Bugzfeed[4]). If so, my second question is whether we can agree
upon a time to restart all producers and consumers in concert, since,
after restarting, producers will break existing consumers until the
latter are restarted with the new configs.

PulseGuardian is currently going through security review, so we have a
bit of time to answer these questions before we're ready to flip the switch.



Mark Côté

Sep 5, 2014, 5:31:29 PM
We are finally getting close to deployment of PulseGuardian and the new
security model. PulseGuardian has been updated to create users with the
proper permissions, and the mozillapulse library has been updated with
the new exchange- and queue-name formats (though the new code has not
been released to pypi yet, since it is a breaking change). The
remaining steps are as follows:

1. Deploy PulseGuardian (bug 1015037). Note that PulseGuardian will
replace the static content currently on; some
info will be missing until bug 1017957 is fixed. This isn't worth
blocking deployment though.

2. Existing consumers need to create users via PulseGuardian. As noted,
at some point after the switchover, the public user will be deleted.
Note that these users won't be useful until the publishers are updated
to use the new mozillapulse library.

3. Release new mozillapulse version.

4. Existing publishers need to upgrade their mozillapulse packages and
be restarted. This will cause them to start writing to the new
exchanges (e.g. exchange/build instead of

5. Existing consumers will also need to upgrade and restart in order to
start consuming from the new exchanges. This needs to be done after the
publishers are restarted since publishers create the exchanges and
consumers will error out if the target exchange is not present.
Ideally, however, this step should be done as closely following the
previous step as possible.
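
The ordering constraint between steps 4 and 5 could be softened on the consumer side by polling for the exchange before binding, rather than erroring out immediately. The following is only a sketch: exchange_exists is a hypothetical callable that the application would have to supply (for example, one wrapping a passive exchange declaration in its AMQP client), and nothing like wait_for_exchange exists in mozillapulse.

```python
import time


def wait_for_exchange(exchange_exists, name, attempts=5, delay=1.0,
                      sleep=time.sleep):
    """Poll until exchange_exists(name) is true.

    exchange_exists is a caller-supplied (hypothetical) check.
    Returns True as soon as the exchange is found, or False if it is
    still missing after `attempts` checks.
    """
    for attempt in range(attempts):
        if exchange_exists(name):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next check
    return False
```

A consumer would call this once at startup with the new exchange name and only create and bind its queue when it returns True, so restarting a consumer slightly before its publisher would delay it rather than crash it.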

As mentioned in the original message in this thread, steps 4 and 5 will
potentially cause some messages to be lost. Depending on how the
publishers are written, they may miss whatever events they are listening
for while being restarted. When consumers are restarted, they may miss
messages since they will have to recreate any durable queues to bind to
the new exchanges. This could be worked around by modifying consumer
apps to listen to both the original queue (with the public user) and the
new queue (with the new PulseGuardian-created user), since no messages
will be duplicated between queues, but I imagine this will be more work
than it's worth for most, if not all, apps.

As I also mentioned, my preference is just to pick a single time to
restart everything, if possible and if missing a couple of messages is
acceptable to all consumer apps. If you run an app with a Pulse
consumer, please reply and let me know if this is acceptable. I'd like
to pick a day and time to flip the switch very soon (like in a week or two).
