Django and BREACH (remember that?)

Adam Brenecki

unread,

Aug 3, 2014, 9:15:15 PM8/3/14

to django-d...@googlegroups.com

Hi all,

So, a while ago, BREACH happened, and Django's CSRF implementation was vulnerable, as was Rails'. The paper that discussed it described a mitigation (and a Rails patch had already been made), so I implemented that same mitigation in a Django patch. Discussion on the Trac ticket has stalled, and I've been told this is the place to go.

Disclaimer: although I understand enough about it to write the patch, I'm not a security person - the reason I'm posting here is that I'm hoping to get an answer from someone that is a security person as to whether I'm on the right track.

To jog your memories (and also in the hope that if I'm misunderstanding the problem, someone will correct me), the short version of BREACH is: an attacker forcing a user to visit a HTTPS page that a) is gzipped, and b) contains in the response body a secret (in our case, the CSRF token in the form), as well as user input (e.g. reflected URL parameters), then observing the size of the HTTPS responses. When the user input partially or completely matches the secret, the compressed length is slightly shorter, and the attacker can use this to guess the secret byte-by-byte.

In section 3 of the paper, the authors describe a variety of mitigations. One of them is to, on each request where a secret S should appear, generate a new one-time pad P and instead write P + xor(P, S) in the response. (The paper also describes other mitigations, but this is the most feasible¹.)

My understanding is (and I'll reiterate here I'm not a security person) that this should make S impossible to deduce via BREACH, as it doesn't appear directly anywhere in the response. As P takes on a new value with every request, it simply can't be deduced by any method that involves making many requests; therefore the same thing will happen to xor(P, S). (I think the fact that P changes every request might have been a point of confusion in the Trac discussion; there's a lot of comments there talking about trying to deduce P).

The patch I've written implements this mitigation, with one difference: instead of using xor, it uses a Vigenère cipher (as suggested by FunkyBob), as xor was creating non-printable characters which caused problems. I think this should be OK as Vigenère is commonly used for one-time pads and does more or less the same thing to characters that xor does to bits.

This is pretty much what django-debreach does, except that it uses AES instead of xor (i.e. the output is K + AES(K, S).). However, this adds processing load to every request and a dependency on PyCrypto, and as far as I can tell this doesn't actually add any benefit over xor/Vigenère.

So, in summary, I think I'm doing nearly exactly what the paper says, and I think it effectively makes the attack practically impossible, but I'd love to hear from someone who has a better idea than I as to if I'm actually correct on all of this.

Thanks,

Adam

The Trac ticket: https://code.djangoproject.com/ticket/20869

My pull request: https://github.com/django/django/pull/1477

The original paper: http://breachattack.com/resources/BREACH%20-%20SSL,%20gone%20in%2030%20seconds.pdf (the relevant bit is section 3.4 at the bottom of page 10)

The related Rails pull request: https://github.com/rails/rails/pull/11729

¹ Section 3.4 describes the mitigation I implemented. Section 3.1 describes fuzzing the lengths of each response by appending random lengths of garbage data, then immediately dismisses this method as ineffective; Russell Keith-Magee tells me that a discussion within Django reached the same conclusion about this one. Apart from those two, they're all things like rate limiting and separating secrets from user input, which don't seem to me like things we can enforce on a Django level.

Justin Holmes

unread,

Aug 3, 2014, 11:51:54 PM8/3/14

to django-developers

Thanks for much for this contribution. I don't feel particularly qualified to review it, but there's are particular bits where general django knowledge (or even just a second pair of eyes) will help, let the know.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/66fb5b2b-5f18-4244-92fb-2427b5b63aa8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Florian Apolloner

unread,

Aug 4, 2014, 3:47:02 AM8/4/14

to django-d...@googlegroups.com

Hi,

On Monday, August 4, 2014 3:15:15 AM UTC+2, Adam Brenecki wrote:

So, a while ago, BREACH happened, and Django's CSRF implementation was vulnerable, as was Rails'. The paper that discussed it described a mitigation (and a Rails patch had already been made), so I implemented that same mitigation in a Django patch. Discussion on the Trac ticket has stalled, and I've been told this is the place to go.

Technically the only mitigation is to disable GZip. That said, randomizing the CSRF token provides an extra layer of security, but doesn't necessarily help (eg credit card data could still get leaked, so you'd still have to disable gzip).

The patch I've written implements this mitigation, with one difference: instead of using xor, it uses a Vigenère cipher (as suggested by FunkyBob), as xor was creating non-printable characters which caused problems. I think this should be OK as Vigenère is commonly used for one-time pads and does more or less the same thing to characters that xor does to bits.

What is wrong with xor+base64? Not that Vigenère cipher is complex, but we have a pretty hard stance against implementing "crypto" on our own.

Cheers,
Florian

Adam Brenecki

unread,

Aug 4, 2014, 5:01:14 AM8/4/14

to django-d...@googlegroups.com

On 4 August 2014 17:47, Florian Apolloner <f.apo...@gmail.com> wrote:

(eg credit card data could still get leaked, so you'd still have to disable gzip).

This patch is entirely about preventing leakage of the CSRF token specifically; as I understand it (again, disclaimer) it should do that pretty effectively, but of course it will do nothing at all to stop leakage of any other data.

What is wrong with xor+base64? Not that Vigenère cipher is complex, but we have a pretty hard stance against implementing "crypto" on our own.

Nothing, really; that's probably what I would have used had FunkyBob not suggested the Vigenère cipher. That's a perfectly reasonable stance, and I can change the patch to do that if it's preferable.

Florian Apolloner

unread,

Aug 4, 2014, 5:46:59 AM8/4/14

to django-d...@googlegroups.com

On Monday, August 4, 2014 11:01:14 AM UTC+2, Adam Brenecki wrote:

What is wrong with xor+base64? Not that Vigenère cipher is complex, but we have a pretty hard stance against implementing "crypto" on our own.

Nothing, really; that's probably what I would have used had FunkyBob not suggested the Vigenère cipher. That's a perfectly reasonable stance, and I can change the patch to do that if it's preferable.

Yes, please, let's keep this patch as simple as possible. Also the changes around https://github.com/django/django/pull/1477/files#diff-a3be722ce2831a8d11438021d44cedf1L51 need some explaining -- Why can't we return None there anymore?

Michael Mior

unread,

Aug 4, 2014, 10:21:43 AM8/4/14

to django-d...@googlegroups.com

This looks good, although it seems like there should be a config setting as I could imagine some use cases where the application expects the token not to change in this way. I'm not sure how common this and whether or not such a setting should be enabled by default, but I think it should be considered.

--
Michael Mior

Florian Apolloner

unread,

Aug 4, 2014, 10:33:32 AM8/4/14

to django-d...@googlegroups.com

On Monday, August 4, 2014 4:21:43 PM UTC+2, Michael Mior wrote:

This looks good, although it seems like there should be a config setting as I could imagine some use cases where the application expects the token not to change in this way. I'm not sure how common this and whether or not such a setting should be enabled by default, but I think it should be considered.

Any example for such an App? Ajax single page apps might come to mind -- Docs would also need updates, as currently the CSRF token is saved in a JS variable…

Donald Stufft

unread,

Aug 4, 2014, 4:11:59 PM8/4/14

to Adam Brenecki, django-d...@googlegroups.com

On August 3, 2014 at 9:48:53 PM, Adam Brenecki (adambr...@gmail.com) wrote:
> The patch I've written implements this mitigation, with one difference:
> instead of using xor, it uses a Vigenère cipher (as suggested by FunkyBob),
> as xor was creating non-printable characters which caused problems. I think
> this should be OK as Vigenère is commonly used for one-time pads and does
> more or less the same thing to characters that xor does to bits.
>

> This is pretty much what django-debreach does, except that *it* uses AES

> instead of xor (i.e. the output is K + AES(K, S).). However, this adds
> processing load to every request and a dependency on PyCrypto, and as far
> as I can tell this doesn't actually add any benefit over xor/Vigenère.
>
> So, in summary, I think I'm doing nearly exactly what the paper says, and I
> think it effectively makes the attack practically impossible, but I'd love
> to hear from someone who has a better idea than I as to if I'm actually
> correct on all of this.

Thoughts:

-1 on implementing our own Vigenère cipher.

If I understand the patch correctly it doesn't use the pad as a nonce exactly,
it will happily accept the same pad + CSRF token multiple times over and over.
Anything that treats the CSRF token as an opaque sequence of characters (which
is essentially what it is prior to this patch) should be backwards compatible
if my understanding is correct. Further more it looks like the original
unpadded CSRF token is still in the cookie so anything that pulls it's tokens
out of the cookie directly should be completely unaffected.

Is my understanding correct? If so then the only thing I can imagine this
breaking is something that expected a certain length for the CSRF token and
afaik the length of the token is not within our backwards compatibility
promises.

Curtis Maloney

unread,

Aug 4, 2014, 5:14:29 PM8/4/14

to django-d...@googlegroups.com

As mentioned the cipher choice was mine. It was, also as mentioned

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/etPan.53dfe906.327b23c6.1280a%40Thor.local.

Curtis Maloney

unread,

Aug 4, 2014, 5:17:10 PM8/4/14

to django-d...@googlegroups.com

Stupid phone

The reason for that cipher over xor was to avoid non printable characters.

Any other solution is fine. Xoring the lower 4 bits reduced the entropy too much in my view.

On Aug 5, 2014 6:11 AM, "Donald Stufft" <don...@stufft.io> wrote:

Reply all

Reply to author

Forward