Flag day mitigation

Richard Newman

Mar 31, 2012, 7:47:14 PM
to servic...@mozilla.com
(Further to discussion with Ally...)

We're currently working on Firefox 14. At some point in a subsequent release (15? 16?), we'll be shipping a release that's incompatible with Sync in earlier releases, and will partition a user's devices between two separate services that happen to share a name.

We thus have a narrow window of time in which we can ship forward-facing mitigations in a current version, to behave more respectfully when the later release goes all honey badger on their devices.

For example:

* landing strings to report a protocol upgrade
* changing wording to be aware of two different setup processes for "Firefox Sync" and "Foxfire BIDSink", which will unfortunately have exactly the same name. (We can't assume that a user will have Firefox 16 on all of their devices.)
* implementing detection code by...
** extending or using the meta/global storage version logic to allow the new client to leave a 'marker' in the 1.1/v5 account;
** setting some other record in 'meta';
** annotating some client record field with a protocol version;
** allowing an account to be marked as obsolete post-upgrade, with a server response that Firefox 14 clients can use to detect their obsolescence.
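
As a rough illustration of the first detection option, the check an older client might run against meta/global could look like the sketch below. It assumes the record's payload is a JSON string carrying a numeric storageVersion (5 for the 1.1/v5 format named above). The function name, the return labels, and the idea that a newer client writes a higher number as its 'marker' are illustrative assumptions, not the shipped client logic.

```python
import json

# Storage format the old client understands (the "v5" in "1.1/v5" above).
SUPPORTED_STORAGE_VERSION = 5


def classify_meta_global(record, supported=SUPPORTED_STORAGE_VERSION):
    """Decide how an old client should react to a meta/global record.

    `record` is assumed to look like {"id": "global", "payload": "<JSON string>"}.
    Returns one of three hypothetical labels:
      "ok"               - server storage matches what this client supports
      "client-obsolete"  - a newer client left a higher version as a marker
      "server-outdated"  - server storage is older, missing, or unreadable
    """
    try:
        payload = json.loads(record["payload"])
        remote = int(payload["storageVersion"])
    except (KeyError, TypeError, ValueError):
        return "server-outdated"

    if remote > supported:
        return "client-obsolete"
    if remote < supported:
        return "server-outdated"
    return "ok"


def make_record(version):
    """Build a minimal meta/global record for experimenting with the check."""
    return {"id": "global", "payload": json.dumps({"storageVersion": version})}


if __name__ == "__main__":
    for version in (4, 5, 6):
        print(version, classify_meta_global(make_record(version)))
```

The "client-obsolete" branch is where the strings landed in the current release would be shown.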

Oh, and testing the hell out of the scenarios, so we know what to expect and can get SUMO ready.

And testing the behavior of Firefox 10 ESR, for those poor suckers who'll be trying to sync their Firefox 16 home machine with their Firefox 10 ESR machine at work.

If we're feeling super persuasive, we might be able to get this into 2 or 3 releases by the time Sync 2.0 ships, which would be a usability and support win IMO. If we don't, there's a real chance of users having totally partitioned sets of devices without noticing for weeks or months.

Thoughts? Suggestions? Am I overly concerned about our poor users? Does someone have the time to step up and drive this in a more hands-on manner than I can right now? Are we going to avoid this situation somehow?

-R

P.S., we also need to start planning out client behavior when a SunkBID 2.9 client tries to connect to an old third-party Sync 1.1 server. We certainly don't want the client to fall back to using Mozilla's servers, so we're going to need a branch in the setup flow, and new strings...

Toby Elliott

Mar 31, 2012, 8:19:08 PM
to Richard Newman, servic...@mozilla.com

On Mar 31, 2012, at 4:47 PM, Richard Newman wrote:
> We thus have a narrow window of time in which we can ship forward-facing mitigations in a current version, to behave more respectfully when the later release goes all honey badger on their devices.
>
> For example:
>
> * landing strings to report a protocol upgrade
> * changing wording to be aware of two different setup processes for "Firefox Sync" and "Foxfire BIDSink", which will unfortunately have exactly the same name. (We can't assume that a user will have Firefox 16 on all of their devices.)

Is a name change an option? Firefox Cloud? Firefox Unity? I dunno :P


> * implementing detection code by...
> ** extending or using the meta/global storage version logic to allow the new client to leave a 'marker' in the 1.1/v5 account;
> ** setting some other record in 'meta';
> ** annotating some client record field with a protocol version;
> ** allowing an account to be marked as obsolete post-upgrade, with a server response that Firefox 14 clients can use to detect their obsolescence.

Part of the problem here is that a lot of people may not even have the same account-name across the two systems, and no part of the new system touches the old one any more - we got rid of the central ldap dependency when we had to go to unique ids. We could theoretically feed across names from the old system to the new, but that would be a hack.

On the client side, it would require maintaining support for the old protocol in order to reupload the client record, and would need a fair bit of logic to go with it.


> Oh, and testing the hell out of the scenarios, so we know what to expect and can get SUMO ready.
>
> And testing the behavior of Firefox 10 ESR, for those poor suckers who'll be trying to sync their Firefox 16 home machine with their Firefox 10 ESR machine at work.
>

10 ESR is the scariest thing in my mind. I'm comforted by the fact that the number of people who want to sync an ESR'd FF (usually a fairly tightly controlled workplace) with home is likely to be pretty small.

> If we're feeling super persuasive, we might be able to get this into 2 or 3 releases by the time Sync 2.0 ships, which would be a usability and support win IMO. If we don't, there's a real chance of users having totally partitioned sets of devices without noticing for weeks or months.
>

I'd argue that if they don't notice for weeks or months, then they aren't really using sync :P

> Thoughts? Suggestions? Am I overly concerned about our poor users? Does someone have the time to step up and drive this in a more hands-on manner than I can right now? Are we going to avoid this situation somehow?

I can try to make things happen from the server end, but I suspect this will mostly turn out to be an interface problem. I think this won't be all that bad, though we should probably do some new sampling. Are 70% of our users still on one client? They won't notice at all. People who upgrade both probably won't notice. A few people will be prompted to upgrade when sync breaks. That just leaves people who are a) using sync, b) on multiple devices, c) one of which can't be upgraded. Is that a couple hundred irked users, or am I thinking way too low?

> P.S., we also need to start planning out client behavior when a SunkBID 2.9 client tries to connect to an old third-party Sync 1.1 server. We certainly don't want the client to fall back to using Mozilla's servers, so we're going to need a branch in the setup flow, and new strings…

We also need to make sure that setting up a replacement server for the new system is as easy as setting up the old one.

Toby

_______________________________________________
Services-dev mailing list
Servic...@mozilla.org
https://mail.mozilla.org/listinfo/services-dev

Richard Newman

Mar 31, 2012, 8:52:03 PM
to Toby Elliott, servic...@mozilla.com
>> * changing wording to be aware of two different setup processes for "Firefox Sync" and "Foxfire BIDSink", which will unfortunately have exactly the same name. (We can't assume that a user will have Firefox 16 on all of their devices.)
>
> Is a name change an option? Firefox Cloud? Firefox Unity? I dunno :P

Heh.

My guess is that any name which implies a continuity of trust will also imply continuity of service, because it's rare for companies to 'fire' all of their users during a routine upgrade.

> Part of the problem here is that a lot of people may not even have the same account-name across the two systems, and no part of the new system touches the old one any more - we got rid of the central ldap dependency when we had to go to unique ids. We could theoretically feed across names from the old system to the new, but that would be a hack.

I'm not sure that's needed; this would be some mechanism in the old server that could be triggered by the new client. The "new system to old system" bridge would be the new client. At least, that's what makes the most sense to me.

(We *could* tell the 2.0 server that it has a correspondence to a 1.1 account, but that seems like overkill.)

> On the client side, it would require maintaining support for the old protocol in order to reupload the client record, and would need a fair bit of logic to go with it.

Not necessarily: "I'm Firefox 16; I need to migrate Sync credentials; do a PUT with this username and password to this URI with this payload". That's "upload a new meta/global" or "put a sentinel in meta", right?
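
A minimal sketch of that single PUT, under stated assumptions: the storage path follows a 1.1-style layout of /1.1/<username>/storage/meta/<id>, authentication is HTTP Basic, and the record ID "fxmigrated" and its payload fields are invented here for illustration. The request is only constructed, never sent.

```python
import base64
import json
import urllib.request


def build_sentinel_put(node_url, username, password, payload):
    """Build (but do not send) a PUT that writes a sentinel record into 'meta'."""
    record_id = "fxmigrated"  # hypothetical record name, not an existing one
    url = "{}/1.1/{}/storage/meta/{}".format(
        node_url.rstrip("/"), username, record_id
    )
    # The record wraps its payload as a JSON string (assumed record shape).
    body = json.dumps({"id": record_id, "payload": json.dumps(payload)})
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_sentinel_put(
        "https://sync.example.invalid",  # placeholder node URL
        "exampleuser",
        "not-a-real-password",
        {"migratedBy": "Firefox/16"},  # hypothetical payload
    )
    print(request.get_method(), request.full_url)
```

The point of the sketch is that the new client needs one authenticated write, not a full implementation of the old protocol.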

>> Oh, and testing the hell out of the scenarios, so we know what to expect and can get SUMO ready.
>>
>> And testing the behavior of Firefox 10 ESR, for those poor suckers who'll be trying to sync their Firefox 16 home machine with their Firefox 10 ESR machine at work.
>>
>
> 10 ESR is the scariest thing in my mind. I'm comforted by the fact that the number of people who want to sync an ESR'd FF (usually a fairly tightly controlled workplace) with home is likely to be pretty small.

That's true, but documenting the behavior -- and ideally coming up with a mitigation that will tell them that their old account is stale, and allow them to accept partitioning instead -- seems worthwhile.

>> If we're feeling super persuasive, we might be able to get this into 2 or 3 releases by the time Sync 2.0 ships, which would be a usability and support win IMO. If we don't, there's a real chance of users having totally partitioned sets of devices without noticing for weeks or months.
>
> I'd argue that if they don't notice for weeks or months, then they aren't really using sync :P

Heh.

On a serious note: do you have all of your bookmarks on every device? How do you know?

I suspect most users who already have large established synced sets of data will only notice if their toolbar diverges, or they no longer see a device in Tabs From Other Computers (which is a one-week TTL, I think).

>> Thoughts? Suggestions? Am I overly concerned about our poor users? Does someone have the time to step up and drive this in a more hands-on manner than I can right now? Are we going to avoid this situation somehow?
>
> I can try to make things happen from the server end, but I suspect this will mostly turn out to be an interface problem.

I think so. My point can better be phrased as: "what can we add to the existing Sync 1.1 client (and/or the 2.0 client) to make it behave better when you upgrade one device to Sync + BID?".

That'll probably entail UX changes and writing stuff to the server, rather than server changes.

> I think this won't be all that bad, though we should probably do some new sampling. Are 70% of our users still on one client? They won't notice at all. People who upgrade both probably won't notice. A few people will be prompted to upgrade when sync breaks. That just leaves people who are a) using sync, b) on multiple devices, c) one of which can't be upgraded. Is that a couple hundred irked users, or am I thinking way too low?

I suspect there are several large groups of users who will be affected.

There are those we can't help unless we pick a solution that already works (such as bumping the 1.1 storageVersion):

* People who upgrade Firefox and use Firefox Home (wadda we get there, 100K downloads per month?) who are now screwed, and won't know it.
* People who are tied to an earlier version (testing on 3.6, enterprise deployments, stuck on an old add-on).

Then those who will be helped by fixes in 14/15:

* People who don't manage their own machines.
* People who once refused an update, and thus won't be updated automatically.
* People who use Firefox on multiple platforms; we have a recent history of not rolling out new major versions on some platforms for *weeks* because of bugs.

We could probably think of more. I think expecting 95% of our users to upgrade all of their machines within a few days without any kind of prompting is overly optimistic.

> We also need to make sure that setting up a replacement server for the new system is as easy as setting up the old one.

Aye. Ideally "server-full" will include everything and self-migrate...

Mike Hoye

Mar 31, 2012, 9:13:03 PM
to servic...@mozilla.org
On 12-03-31 8:19 PM, Toby Elliott wrote:
>
> On Mar 31, 2012, at 4:47 PM, Richard Newman wrote:
>>
>> And testing the behavior of Firefox 10 ESR, for those poor suckers who'll be trying to sync their Firefox 16 home machine with their Firefox 10 ESR machine at work.
>>
>
> 10 ESR is the scariest thing in my mind. I'm comforted by the fact that the number of people who want to sync an ESR'd FF (usually a fairly tightly controlled workplace) with home is likely to be pretty small.

While I don't have any data on this, I have a strong suspicion that
those people don't actually exist.

However: you might be able to mitigate this risk further by providing
the Enterprise mailing list with as much advance notice of this change
(and the concomitant risks) as possible, along with some instructions
about how to deliver Firefox-ESR to users with Sync redirected to a
private server or disabled.

--
Michael Hoye
Bespoke I/O
http://bespokeio.com

Toby Elliott

Mar 31, 2012, 9:51:23 PM
to Richard Newman, servic...@mozilla.com

On Mar 31, 2012, at 5:52 PM, Richard Newman wrote:

> There are those we can't help unless we pick a solution that already works (such as bumping the 1.1 storageVersion):
>
> * People who upgrade Firefox and use Firefox Home (wadda we get there, 100K downloads per month?) who are now screwed, and won't know it.

Ugh.

Actually, this is possibly our worst vector, and there's literally nothing we can do about it short of figuring out a Home plan. The other users may be dwarfed by this class.

> * People who are tied to an earlier version (testing on 3.6, enterprise deployments, stuck on an old add-on).

These people I'm not nearly as worried about. I'm just guessing, but I would not expect these groups to have high Sync usage.

>
> We could probably think of more. I think expecting 95% of our users to upgrade all of their machines within a few days without any kind of prompting is overly optimistic.

Oh, sure. But the nice thing about sync is that even after, say, a week of pain, it all settles back to normal, and a lot of people may not even notice that much.

Toby

Richard Newman

Mar 31, 2012, 10:13:52 PM
to Toby Elliott, servic...@mozilla.com
> Actually, this is possibly our worst vector, and there's literally nothing we can do about it short of figuring out a Home plan. The other users may be dwarfed by this class.

Figuring out a Home plan would be a really good idea anyway, as I'm sure we all agree :D

Still, I imagine Home respects storage version, mm? So we might have a way to get an error on the screen.

>> We could probably think of more. I think expecting 95% of our users to upgrade all of their machines within a few days without any kind of prompting is overly optimistic.
>
> Oh, sure. But the nice thing about sync is that even after, say, a week of pain, it all settles back to normal, and a lot of people may not even notice that much.

The issue with all this, though, is that *there is no pain*.

If you install Firefox 16, and either we don't proactively show a Sync setup screen, or you mindlessly type in your email address and password and don't read our warnings, then your device simply isn't syncing with the others, and none of your other devices will warn you about it. No pain at all... just no new data.

(Unless we do something about it, hence this thread.)

The guy with Home on his iPad and Fx16 on his desktop isn't going to realize at first that his bookmarks aren't updating. It'll break one day when we node-reassign him. Worse still if he updates a minority of his devices, or doesn't sign up for this new Sync version but expects his stuff to still be backed up.

That's my point: the minute we partition a user's devices, we need to make the "left behind" show some kind of "upgrade me!" notification.

This whole upgrade path is going to take some really stern UI wording...

("Sync is now powered by Mozilla Persona! YOU NEED TO SIGN UP AGAIN. NONE OF YOUR DATA IS BEING SYNCED RIGHT NOW. We promise it's worth it to... um... someone else. If you don't want to do this, you need to downgrade to Firefox 10 ESR, and figure out how to migrate to Xmarks")

... but reminders on other devices would be a Really Good Thing. Users are forgetful, they don't read warnings, and they'll make errors at every step of the process.

Richard Newman

Mar 31, 2012, 10:25:15 PM
to Toby Elliott, services-dev
> If you install Firefox 16, and either we don't proactively show a Sync setup screen, or you mindlessly type in your email address and password and don't read our warnings, then your device simply isn't syncing with the others, and none of your other devices will warn you about it. No pain at all... just no new data.

We could do something on the new device, like grab the client records from your old Sync account, and try to parse them, and show you some kind of UI telling you which ones are update-able... but the shitty thing is that by this point you've already installed a Firefox version that won't work with anything old, so we can't make any good recommendations. ("I see you're using Home! Sorry!")
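
One way to picture that UI: given already-decrypted client records from the old account, split them into devices that could move to the new service and devices that would be left behind. The record fields ("name", "app") and the UPGRADABLE_APPS table are assumptions made for this sketch. Real client records may not carry this information at all, which is part of the problem.

```python
# Hypothetical: apps for which a release that speaks the new protocol is
# assumed to exist. Anything else (e.g. Home) would be left behind.
UPGRADABLE_APPS = {"firefox-desktop", "fennec"}


def partition_devices(client_records, upgradable_apps=UPGRADABLE_APPS):
    """Split client records into (can_upgrade, left_behind) lists of names."""
    can_upgrade, left_behind = [], []
    for record in client_records:
        name = record.get("name", "(unnamed device)")
        if record.get("app") in upgradable_apps:
            can_upgrade.append(name)
        else:
            left_behind.append(name)
    return can_upgrade, left_behind


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        {"name": "Work laptop", "app": "firefox-desktop"},
        {"name": "iPad", "app": "firefox-home"},
        {"name": "Old phone"},
    ]
    upgradable, stranded = partition_devices(records)
    print("Can upgrade:", upgradable)
    print("Left behind:", stranded)
```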

Other suggestions welcome.

JR Conlin

Mar 31, 2012, 11:54:05 PM
to servic...@mozilla.org
Silly, hopefully clueless question:

Are we including metrics back on the user's ff version and platform in
newer instances of sync?
While there may be some possible issues with privacy, it would help at
times like this.

Richard Newman

Apr 1, 2012, 12:08:39 AM
to JR Conlin, servic...@mozilla.org
> Silly, hopefully clueless question:
>
> Are we including metrics back on the user's ff version and platform in newer instances of sync?
> While there may be some possible issues with privacy, it would help at times like this.

We submit some of that on every HTTP request.

"Firefox/13.0a2 FxSync/1.15.0.20120331042008."

I believe the build ID is enough to determine platform.
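
For anyone wanting to turn those headers into numbers, here is a small sketch that parses the string and tallies Firefox major versions. The layout it assumes (FxSync/<sync version>.<14-digit build ID>) is inferred from the single example above, and the function names are made up for this sketch.

```python
import re
from collections import Counter

# Assumed layout: "Firefox/<version> FxSync/<sync version>.<14-digit build ID>"
UA_PATTERN = re.compile(
    r"Firefox/(?P<firefox>\S+)\s+"
    r"FxSync/(?P<sync>\d+(?:\.\d+)*)\.(?P<build_id>\d{14})"
)


def parse_sync_user_agent(user_agent):
    """Return the firefox, sync and build_id fields, or None if no match."""
    match = UA_PATTERN.search(user_agent)
    return match.groupdict() if match else None


def firefox_major(version):
    """Extract the leading major version number, e.g. '13.0a2' -> 13."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else None


def tally_major_versions(user_agents):
    """Count requests per Firefox major version, skipping unparseable strings."""
    counts = Counter()
    for user_agent in user_agents:
        parsed = parse_sync_user_agent(user_agent)
        if parsed is not None:
            counts[firefox_major(parsed["firefox"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_sync_user_agent("Firefox/13.0a2 FxSync/1.15.0.20120331042008."))
```

This counts requests rather than users, so it only gives a rough view of the version mix.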

JR Conlin

Apr 1, 2012, 12:22:59 AM
to Richard Newman, servic...@mozilla.org
On 3/31/2012 9:08 PM, Richard Newman wrote:
>> Silly, hopefully clueless question:
>>
>> Are we including metrics back on the user's ff version and platform in newer instances of sync?
>> While there may be some possible issues with privacy, it would help at times like this.
> We submit some of that on every HTTP request.
>
> "Firefox/13.0a2 FxSync/1.15.0.20120331042008."
>
> I believe the build ID is enough to determine platform.
Spiff. So we could determine the percentage of users that would be hit
by this change. I hate to be a dork, but I'd say that if it's less than
10%, it's not worth worrying about.

Granted, there's no way for us to determine the number of folks who have
vastly differing versions on various platforms (e.g. someone who
regularly uses an ancient mobile version and nightly on the desktop),
but we can make a fairly educated guess from the usages.

Richard Newman

Apr 1, 2012, 12:47:20 AM
to JR Conlin, servic...@mozilla.org
> Spiff. So we could determine the percentage of users that would be hit by this change. I hate to be a dork, but I'd say that if it's less than 10%, it's not worth worrying about.

Right now it's 100% -- there are no clients which will be able to determine that their account has been set adrift, because we don't take any action at all. (Well, there is no plan to; as far as I know the migration logic isn't finished.)

If we add some logic to improve matters, our metrics will be able to distinguish the clients that behave well and count how many don't.

Mike Connor

Apr 2, 2012, 10:36:42 AM
to Richard Newman, servic...@mozilla.com
This is an attempt to catch up to the thread:

* We need to figure out a plan for iOS. This is known, and we have options on the table.
* Based on conversations with Kev about ESR, this is not really an issue as most orgs using ESR also want to disable Sync. It's not optimal, but I don't think we should gate significantly on that solution. (At the worst, 10 ESR will EOL 18 weeks after Fx16, if we even can hit Fx16, but we would hope that ESR consumers will move sooner.)
* To set expectations, Fx16 is the earliest possible release I can imagine shipping the revised setup in. It would not shock me to see this fall into Fx17 given the need to interop with B2G and Fennec (and getting the crypto bits all reviewed and stable).

On 2012-03-31, at 7:47 PM, Richard Newman wrote:

> (Further to discussion with Ally...)
>
> We're currently working on Firefox 14. At some point in a subsequent release (15? 16?), we'll be shipping a release that's incompatible with Sync in earlier releases, and will partition a user's devices between two separate services that happen to share a name.
>
> We thus have a narrow window of time in which we can ship forward-facing mitigations in a current version, to behave more respectfully when the later release goes all honey badger on their devices.

If by all honey badger you mean "asks a user to set up BID and set up their online profile" or something like that. This will not be a seamless migration, for various reasons, not least of which is "there's a lot of really weak passwords for Sync 1.1, and users still don't remember them."

> For example:
>
> * landing strings to report a protocol upgrade
> * changing wording to be aware of two different setup processes for "Firefox Sync" and "Foxfire BIDSink", which will unfortunately have exactly the same name. (We can't assume that a user will have Firefox 16 on all of their devices.)

That's a big assumption. I think we may want to take this opportunity to brand things differently.

> * implementing detection code by...
> ** extending or using the meta/global storage version logic to allow the new client to leave a 'marker' in the 1.1/v5 account;
> ** setting some other record in 'meta';
> ** annotating some client record field with a protocol version;
> ** allowing an account to be marked as obsolete post-upgrade, with a server response that Firefox 14 clients can use to detect their obsolescence.

I think all of this is fine, if we believe that the new UX can't sufficiently communicate the change to a "new" system.

> P.S., we also need to start planning out client behavior when a SunkBID 2.9 client tries to connect to an old third-party Sync 1.1 server. We certainly don't want the client to fall back to using Mozilla's servers, so we're going to need a branch in the setup flow, and new strings…

The client used to do a validity check on a custom server URL and show an error if an API call failed. I would assume that we would re-use this.

-- Mike

Richard Newman

Apr 2, 2012, 1:58:48 PM
to Mike Connor, servic...@mozilla.com
> * We need to figure out a plan for iOS. This is known, and we have options on the table.

Good. I would love to discuss them! :D

> * Based on conversations with Kev about ESR, this is not really an issue as most orgs using ESR also want to disable Sync. It's not optimal, but I don't think we should gate significantly on that solution. (At the worst, 10 ESR will EOL 18 weeks after Fx16, if we even can hit Fx16, but we would hope that ESR consumers will move sooner.)

File a bug?

> * To set expectations, Fx16 is the earliest possible release I can imagine shipping the revised setup in. It would not shock me to see this fall into Fx17 given the need to interop with B2G and Fennec (and getting the crypto bits all reviewed and stable).

I'm also concerned about when it hits Beta (and ideally Aurora), because this all certainly requires a very long bake period.

One thing I learned from Fennec is that we can't treat our Aurora users like Nightly testers. Or even our Nightly testers like Nightly testers. If we ship Sync 2.0 in Aurora without a stable-ish server infrastructure and interop with other devices, people will be peeved.

That is: our target date for replacing Home is, for several reasons, six weeks or more earlier than release.

>> We thus have a narrow window of time in which we can ship forward-facing mitigations in a current version, to behave more respectfully when the later release goes all honey badger on their devices.
>
> If by all honey badger you mean "asks a user to set up BID and set up their online profile" or something like that. This will not be a seamless migration, for various reasons, not least of which is "there's a lot of really weak passwords for Sync 1.1, and users still don't remember them."

The point I was addressing was the fallout from that: other devices not knowing that they've been left behind, and the new devices being all honey badger about it -- not caring. The "Flag day mitigation" in the title of this email is "how can we improve the user experience when users don't update all of their devices at the same time, or even cannot?".

> That's a big assumption. I think we may want to take this opportunity to brand things differently.

Then I'm glad to have publicly started this discussion :D

> I think all of this is fine, if we believe that the new UX can't sufficiently communicate the change to a "new" system.

I don't believe it can, at least not perfectly. (Think how many users ended up trying to install the Sync addon, despite messaging that "Sync is now built-in!".)

People are forgetful and don't pay attention. Belt and braces.

>> P.S., we also need to start planning out client behavior when a SunkBID 2.9 client tries to connect to an old third-party Sync 1.1 server. We certainly don't want the client to fall back to using Mozilla's servers, so we're going to need a branch in the setup flow, and new strings…
>
> The client used to do a validity check on a custom server URL and show an error if an API call failed. I would assume that we would re-use this.

I have no doubt that we can detect the situation. The UI is the more important part here -- it's very easy to forget these edge cases when putting together flows, and then run into them during a string freeze.
