reduce default OCSP timeouts.


Camilo Viecco

Oct 11, 2013, 3:57:28 PM
to mozilla's crypto code discussion list
Hello List

I am planning to land a patch to reduce the default (soft-fail) OCSP
network timeout values. Currently OCSP connections time out after 10
seconds and my plan is to change that to 3 seconds (hard fail will keep
the current 10-second timeout value).

With this change (according to telemetry) we will cover 95% of
successful checks on desktop and 90% on Fennec. (2 seconds is 90% on
desktop, 85% on Fennec.) Currently, cancelled connections on Fennec are
about 6% of connections.

Any issues with this change?

Thanks

Camilo

Bob Clary

Oct 11, 2013, 4:39:55 PM
to dev-tec...@lists.mozilla.org
How will this play with high latency connections such as found on
Satellite-based internet where ping times are 600-1000ms?

/bc


Wan-Teh Chang

Oct 11, 2013, 4:50:15 PM
to mozilla's crypto code discussion list
On Fri, Oct 11, 2013 at 12:57 PM, Camilo Viecco <cvi...@mozilla.com> wrote:
> Hello List
>
> I am planning to land a patch to reduce the default (soft-fail) OCSP network
> timeout values. Currently OCSP connections timeout after 10 seconds and my
> plan is to change that to 3 seconds (hard fail will keep the current 10
> second timeout value).

I would use a timeout of 5 seconds. 3 seconds seem a little short.

I agree 10 seconds are too long.

Wan-Teh

Eddy Nigg

Oct 11, 2013, 4:58:50 PM
to mozilla-dev...@lists.mozilla.org
On 10/11/2013 11:50 PM, From Wan-Teh Chang:
> I would use a timeout of 5 seconds. 3 seconds seem a little short. I
> agree 10 seconds are too long.

+1

--
Regards

Signer: Eddy Nigg, StartCom Ltd.
XMPP: star...@startcom.org
Blog: http://blog.startcom.org/
Twitter: http://twitter.com/eddy_nigg

Camilo Viecco

Oct 11, 2013, 7:17:29 PM
to mozilla's crypto code discussion list
On 10/11/13 1:39 PM, Bob Clary wrote:
> On 10/11/2013 12:57 PM, Camilo Viecco wrote:
>> Hello List
>>
>> I am planning to land a patch to reduce the default (soft-fail) OCSP
>> network timeout values. Currently OCSP connections timeout after 10
>> seconds and my plan is to change that to 3 seconds (hard fail will keep
>> the current 10 second timeout value).
>>
>> With this change (according to telemetry) we will cover 95% of
>> successful checks in desktop and 90% of fennec. (2 seconds is 90% of
>> desktop 85% of fennec). Currently fennec cancelled connections are
>> about 6% of connections.
>>
>> Any issues with this change?
>>
>> Thanks
>>
>> Camilo
>
>
> How will this play with high latency connections such as found on
> Satellite-based internet where ping times are 600-1000ms?
Since fetching the OCSP response takes 2 RTTs (not counting closing the
connection), a 3-second timeout would be sufficient for a 1000 ms RTT.

If you prefer, you can still enable strict (hard-fail) OCSP checking,
which gives you back the 10-second timeout.
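
The timeout split and the round-trip estimate above can be sketched as follows. This is only an illustration in Python, not Firefox code: the constant and function names are hypothetical, the 3-second value is the one proposed in this thread, and the 2-RTT cost is the estimate given above.

```python
# Hypothetical names; values come from this thread, not from Firefox source.
SOFT_FAIL_TIMEOUT_MS = 3_000   # proposed soft-fail default
HARD_FAIL_TIMEOUT_MS = 10_000  # hard fail keeps the current value
OCSP_ROUND_TRIPS = 2           # estimate quoted in the thread


def ocsp_timeout_ms(hard_fail: bool) -> int:
    """Pick the network timeout for the revocation-checking mode."""
    return HARD_FAIL_TIMEOUT_MS if hard_fail else SOFT_FAIL_TIMEOUT_MS


def network_time_ms(rtt_ms: int, round_trips: int = OCSP_ROUND_TRIPS) -> int:
    """Time spent on the wire, ignoring responder processing time."""
    return rtt_ms * round_trips


for rtt_ms in (600, 1_000):  # satellite ping times mentioned earlier
    needed = network_time_ms(rtt_ms)
    fits = needed <= ocsp_timeout_ms(hard_fail=False)
    print(f"RTT {rtt_ms} ms: {needed} ms on the wire, fits soft-fail: {fits}")
```

The sketch ignores DNS lookups, retransmissions and responder processing time, so it is a lower bound on the real fetch time.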

Camilo
>
> /bc
>
>

Camilo Viecco

Oct 11, 2013, 7:20:49 PM
to dev-tec...@lists.mozilla.org
On 10/11/13 1:58 PM, Eddy Nigg wrote:
> On 10/11/2013 11:50 PM, From Wan-Teh Chang:
>> I would use a timeout of 5 seconds. 3 seconds seem a little short. I
>> agree 10 seconds are too long.
>
> +1
>
Thanks Eddy/Wan-Teh:

5 seconds seems too high for a fail-open option, but let me ask you:
what percentage of checks are you comfortable with (given soft fail)?

Camilo


Gervase Markham

Oct 14, 2013, 9:57:39 AM
to mozilla-dev...@lists.mozilla.org
On 11/10/13 21:50, Wan-Teh Chang wrote:
> I would use a timeout of 5 seconds. 3 seconds seem a little short.
>
> I agree 10 seconds are too long.

Can you expand on what criteria you are using to make these judgements?

Fetching the OCSP response takes 2RTT, as Camilo said. So if your RTT is
1000ms (very long!) and your OCSP server takes 999ms to respond (also
well outside any sane performance requirement), 3s is still long enough
to get a response.
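
To make that arithmetic explicit, here is a small Python sketch. The helper name is hypothetical, and it assumes the 2-RTT cost quoted above with nothing else on the critical path.

```python
def responder_headroom_ms(timeout_ms: int, rtt_ms: int, round_trips: int = 2) -> int:
    """Time left for the OCSP responder once the round trips are paid for."""
    return timeout_ms - round_trips * rtt_ms


headroom = responder_headroom_ms(timeout_ms=3_000, rtt_ms=1_000)
print(headroom)        # 1000
print(999 < headroom)  # True: a 999 ms responder still fits
```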

(The fact that 3s only covers 95% of successful checks on Desktop
suggests that there are either some laggy networks or some sucky OCSP
servers out there...)

Gerv

Wan-Teh Chang

Oct 14, 2013, 2:01:12 PM
to mozilla's crypto code discussion list, mozilla-dev-tech-crypto
On Mon, Oct 14, 2013 at 6:57 AM, Gervase Markham <ge...@mozilla.org> wrote:
> On 11/10/13 21:50, Wan-Teh Chang wrote:
>> I would use a timeout of 5 seconds. 3 seconds seem a little short.
>>
>> I agree 10 seconds are too long.
>
> Can you expand on what criteria you are using to make these judgements?

My criteria are not scientific. 5 seconds are what came to my mind
when I saw this subject. I cannot explain it well. (There is a book
named "Blink" that talks about this.) Perhaps I know 5 seconds are a
commonly used timeout in networking; perhaps it is simply a nice
factor of 10.

Wan-Teh

Gervase Markham

Oct 15, 2013, 8:16:15 AM
to mozilla-dev...@lists.mozilla.org
On 14/10/13 19:01, Wan-Teh Chang wrote:
> My criteria are not scientific. 5 seconds are what came to my mind
> when I saw this subject. I cannot explain it well. (There is a book
> named "Blink" that talks about this.) Perhaps I know 5 seconds are a
> commonly used timeout in networking; perhaps it is simply a nice
> factor of 10.

The issue is that the timeout we pick is the amount of time a user has
to wait to see _anything_ of their web page, if the site has been unwise
enough to purchase a cert from a CA with a flaky OCSP server. Therefore,
there is a strong incentive to make this timeout as short as possible.

Gerv


Rob Stradling

Oct 15, 2013, 8:27:20 AM
to mozilla's crypto code discussion list
On 15/10/13 13:16, Gervase Markham wrote:
> On 14/10/13 19:01, Wan-Teh Chang wrote:
>> My criteria are not scientific. 5 seconds are what came to my mind
>> when I saw this subject. I cannot explain it well. (There is a book
>> named "Blink" that talks about this.) Perhaps I know 5 seconds are a
>> commonly used timeout in networking; perhaps it is simply a nice
>> factor of 10.
>
> The issue is that the timeout we pick is the amount of time a user has
> to wait to see _anything_ of their web page, if the site has been unwise
> enough to purchase a cert from a CA with a flaky OCSP server. Therefore,
> there is a strong incentive to make this timeout as short as possible.

This might help...

http://uptime.netcraft.com/perf/reports/OCSP?orderby=avg_total

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online

Kai Engert

Oct 15, 2013, 8:44:04 AM
to mozilla's crypto code discussion list
This is the classic security vs. performance discussion.

As soon as you decide to time out the connection (and give up on the
optional check), you can no longer learn that your route to the
destination server is MITMed and uses a revoked certificate.

What about users who connect through busy mobile phone networks?

What about users in rural areas, or in countries with slow dialup
Internet connections?

If the connection is fast, the timeout is irrelevant.

If the connection is slow, it's reasonable to wait longer.

I don't like the idea of giving up the OCSP check, simply because I'm on
a train, connected through a mobile network, and there's a glitch in
network coverage, that causes a delay.

Regards
Kai


Eddy Nigg

Oct 15, 2013, 9:59:22 AM
to mozilla-dev...@lists.mozilla.org
On 10/15/2013 03:27 PM, From Rob Stradling:
Just for the record, Netcraft doesn't send a valid OCSP request and
doesn't verify the OCSP response. They check only the headers returned.
One has to be a bit careful with such data - I verified this information
with them since their checks on StartCom's OCSP returned a bad request
and got 100% failure (since removed).

A better test is https://revocation-report.x509labs.com/ which sends a
real OCSP request. But here too, I don't think it checks the actual
response, as long as the server returns "something" (assuming 200 OK).

Patrick McManus

Oct 15, 2013, 10:57:12 AM
to mozilla's crypto code discussion list
On Tue, Oct 15, 2013 at 8:44 AM, Kai Engert <ka...@kuix.de> wrote:

> This is the classic security vs. performance discussion.
>
> As soon as you decide to timeout the connection (and give up on the
> optional check),


For better or for worse, Firefox is already making that tradeoff - at a 10s
threshold. No matter the threshold some tail events are going to give up
the check because of it - the question here is what amount of latency and
what size of tail is acceptable. What has changed is that
1] we have measurements that show such a large timeout isn't necessary for
most of the population to garner whatever benefit soft-fail gives
2] we have measurements that show a large number of connections hitting
this timeout, so its value is relevant

A timeout of 5 seconds exposes about 1% of our currently successful queries
(across all platforms) to this. 4 seconds pushes that to almost 2%, and the
proposed 3 seconds about 5%. I think 3 seconds is acceptable, while 4 might
actually be a sweet spot that I could also get behind at this point. Any
more than that is, imo, too much pain for too little benefit.

In return we go from a completely unusable experience (10 seconds) to a slow
but perhaps usable one.
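
The figures quoted above for 3, 4 and 5 seconds can be put into a small lookup to show how a chosen loss budget maps to a timeout. This is a hedged Python sketch: the function name is hypothetical, and the percentages are the rounded values from this message ("almost 2%" is entered as 2.0), not fresh telemetry.

```python
from typing import Optional

# Approximate share of currently successful OCSP queries (all platforms)
# that would hit the timeout, as quoted in this message.
LOST_PERCENT_BY_TIMEOUT_S = {5: 1.0, 4: 2.0, 3: 5.0}


def shortest_timeout_within(max_lost_percent: float) -> Optional[int]:
    """Return the shortest listed timeout whose loss stays within budget."""
    candidates = [
        timeout_s
        for timeout_s, lost in LOST_PERCENT_BY_TIMEOUT_S.items()
        if lost <= max_lost_percent
    ]
    return min(candidates) if candidates else None


print(shortest_timeout_within(5.0))  # 3
print(shortest_timeout_within(2.0))  # 4
print(shortest_timeout_within(0.5))  # None: no listed timeout is that safe
```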

-P

[As an aside, I prefer our telemetry data here over the server-to-server
data from places like Netcraft, just because we know it reflects the quirks
of the Firefox userbase, actual OCSP results, and our own current use of
POST vs GET, etc.]

Camilo Viecco

Oct 17, 2013, 2:02:26 PM
to dev-tec...@lists.mozilla.org
Thanks everyone for the discussion.

We mostly agree that the current 10 seconds for soft-fail is not good for
users.
The data reported by the server labs seems to indicate that the timeouts
are correlated with being in the southern hemisphere and with the CA in use.

5 seconds seems to be the value suggested by WTC and Eddy Nigg. 4 seconds
was the maximum penalty from the networking team (Patrick McManus).
While the issue of large RTT seems worse than I expected (especially for
places like AU, ZA and BR), they were still OK with 3 seconds.

However, if there is any compelling data for making it 4 seconds, please
share.

Thank you all

Camilo


On 10/15/13 6:59 AM, Eddy Nigg wrote:
> On 10/15/2013 03:27 PM, From Rob Stradling:
>>