Changes to replication semantics on hg.mozilla.org

Gregory Szorc

Jul 13, 2018, 4:37:13 PM
to firefox-ci, dev-version-control
tl;dr the way https://hg.mozilla.org/ exposes data after a push has
changed, and this may impact service consumers. This should be a
positive change for most. Read on for details.

5+ years ago, requests hitting https://hg.mozilla.org/ were served off an
NFS volume. As soon as a push was made to hg.mozilla.org, it was available
via the HTTP endpoint.

For various reasons, we changed this several years ago so that pushes to
hg.mozilla.org would asynchronously replicate independently to local
storage on N servers behind a load balancer fronting https://hg.mozilla.org.
Because HTTP requests are round-robined to N servers and the local storage
was independent, there were race conditions after a push where the state of
a repository could vary between servers. For example, server A would have
the new push and server B would not. This could result in consumers seeing
inconsistent repository state at any given instant in time.
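To make the race concrete, here is a minimal, hypothetical model (not hg.mozilla.org code): each server's local copy is reduced to the highest push ID it has replicated, and a round-robin balancer sends successive requests to successive servers. The server names and push IDs are invented for illustration.

```python
from itertools import cycle

# Hypothetical snapshot just after push 42: server-b has not replicated it yet.
replicated = {"server-a": 42, "server-b": 41, "server-c": 42}

# A round-robin load balancer hands each request to the next server in turn.
balancer = cycle(replicated)


def latest_push_seen() -> int:
    """Serve one request: report the newest push ID visible on the chosen server."""
    server = next(balancer)
    return replicated[server]


# Three consecutive requests can disagree about the repository's state.
observations = [latest_push_seen() for _ in range(3)]
print(observations)  # [42, 41, 42]
```

A client that reads push 42 from one server and then asks another server for it can get a "not found", which is the class of intermittent failure described in this thread.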

In reality, this wasn't a major issue because the servers were all
homogeneous and operating in a well-defined environment. So the
"inconsistency window" after a push was small - typically no more than a
few hundred milliseconds. Short enough that you probably wouldn't ever
notice.

But the migration of hg.mozilla.org to a different data center required us
to make servers non-homogeneous for a period of time. This made the
"inconsistency window" up to several seconds and caused disruptive
intermittent failures in CI (bug 1462323). Upcoming plans to host
Mercurial endpoints in AWS would suffer the same fate, since performance in
EC2 is highly variable.

With the help of glob, sheehan, and fubar, we've rolled out a change (bug
1470606) that should practically eliminate the inconsistency window after
pushes and make https://hg.mozilla.org/ expose atomic state at any given
instant in time.

Essentially, instead of an individual server exposing data once it has been
replicated, we wait until all servers behind the load balancer have
replicated the data. At that point, the new data is exposed by all servers.
i.e. new pushes are exposed once the slowest server has replicated them.
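The gating rule can be sketched as "expose the minimum of what every server has replicated." The `ReplicationGate` class below is a hypothetical illustration of that rule only. It is not the actual hg.mozilla.org implementation, which is described in the replication architecture documentation linked below.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationGate:
    """Hypothetical coordinator: tracks the newest push each server has
    replicated and exposes only pushes that every server already has."""

    servers: list[str]
    replicated: dict[str, int] = field(default_factory=dict)

    def record(self, server: str, push_id: int) -> None:
        # Called when `server` finishes replicating `push_id` to local storage.
        self.replicated[server] = max(self.replicated.get(server, 0), push_id)

    def exposed_push(self) -> int:
        # New data becomes visible only once the slowest server has it,
        # so every server behind the balancer reports the same state.
        return min(self.replicated.get(s, 0) for s in self.servers)


gate = ReplicationGate(["server-a", "server-b", "server-c"])
gate.record("server-a", 42)
gate.record("server-c", 42)
gate.record("server-b", 41)
print(gate.exposed_push())  # 41: server-b is still behind
gate.record("server-b", 42)
print(gate.exposed_push())  # 42: all servers have push 42
```

The trade-off is that visibility latency becomes that of the slowest server, in exchange for every server answering identically.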

I know many consumers of hg.mozilla.org have been affected by the
inconsistency window in the past. People have had to add things like sleeps
and excessive retries to work around the issue. And developers and sheriffs
have been annoyed by failures that fall through the cracks (especially
those in the past few weeks). Hopefully now that the inconsistency window
is practically non-existent, these workarounds can be removed and we can
all enjoy a more reliable service.

We're still tracking down some minor fallout from the change. But for the
most part it appears things "just work." If you see anything wonky or want
to track work as a result of this change, please chain things up to bug
1470606.

I'd like to thank glob, sheehan, and fubar for helping with the design,
implementation, and rollout of this significant change.

If you are interested in learning more about how the replication works, it
is described at
https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/replication.html#architecture
.

Gregory

Dustin Mitchell

Jul 13, 2018, 4:41:30 PM
to Gregory Szorc, dev-version-control, firefox-ci
Thanks to gps, glob, sheehan, and fubar for the hard work on this!

Users of Taskcluster were hit pretty hard by this, especially in try
pushes. While we have a solution in mind, we have not had the
engineering bandwidth to implement it, so I'm glad to know that the
issue will no longer cause users pain.

Dustin