On Thu, Sep 24, 2009 at 1:18 PM, Simon Willison <si...@simonwillison.net> wrote:
> It's also something that's hard to do correctly. At the sprints Armin
> pointed out that I should be using hmac, not straight sha1, for
> generating signatures (something Django itself gets wrong in the few
> places that implement signing already). Having a cryptographer-
> approved implementation will save a lot of people from making the same
> mistakes.
I have no idea how hmac differs from straight sha1, but this was
raised in a django-signedcookies issue as well, and I've since
integrated it. I'm with you on this; if somebody who knows better
recommends something, I'm inclined to listen.
> I think signed cookies should either be a separate API from
> response.set_cookie or should be provided as an additional argument to
> that method. I'm not a fan of signing using middleware (as seen in
> http://code.google.com/p/django-signedcookies/ ) since that approach
> signs everything - some cookies, such as those used by Google
> Analytics, need to remain unsigned.
I admit, I hadn't considered third-party cookies when I put it
together as a decorator. Client-side cookie access is likely
problematic as well, but that'll always be questionable anyway. You
can't validate a cookie in the client without divulging your secret
key and you can't just ignore the signature, because that defeats the
whole purpose. My app also provides a decorator, which might help in
some rare situations, but most of the time things like analytics
cookies will be in a base template and will always be included in
whatever view happens to have the decorator. I'm very surprised that
in all this time, nobody submitted a bug about the analytics problem.
> So the API could either be:
>
> response.set_signed_cookie(key, value)
>
> Or...
>
> response.set_cookie(key, value, signed=True)
>
> (I prefer the latter option)
I prefer the latter as well, for an added reason. I'd personally like
to invest some time in seeing if there are any other interesting
pitfalls in set_cookie() based on it deferring to Python's
SimpleCookie implementation. When writing Pro Django, I realized that
SimpleCookie expects everything to be strings, so #6657 came up with
secure=False resulting in a secure cookie after all. I don't know if
there are other such issues, but it might be worth looking at in
detail if we already have to add in signed cookie support.
And before anyone asks: no, I don't think tying the signing behavior
into secure=True would be a good idea. Secure is meant to tell the
browser it should only send the cookie back over a secure channel,
such as HTTPS. While people who need secure cookies may also want
signed cookies, they're two separate ideas that I don't think would do
well mixed together. Of course, that leaves us with a potential
response.set_cookie(key, value, secure=True, signed=True), but I think
it's worth it to be explicit.
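To make the shape of that API concrete, here's a toy sketch of set_cookie() growing a signed argument while secure stays independent. The HttpResponse class and sign() helper below are simplified stand-ins, not the real django.http implementation:

```python
import hashlib
import hmac

SECRET_KEY = b'example-secret'  # stands in for settings.SECRET_KEY

def sign(value):
    # Append an HMAC-SHA1 signature to the value.
    sig = hmac.new(SECRET_KEY, value.encode('utf-8'),
                   hashlib.sha1).hexdigest()
    return '%s.%s' % (value, sig)

class HttpResponse:
    """Toy response object; the real one lives in django.http."""

    def __init__(self):
        self.cookies = {}

    def set_cookie(self, key, value, secure=False, signed=False):
        # secure and signed stay independent flags: secure controls
        # transport (HTTPS only), signed controls tamper detection.
        if signed:
            value = sign(value)
        self.cookies[key] = {'value': value, 'secure': bool(secure)}
```

So response.set_cookie(key, value, secure=True, signed=True) is verbose, but each flag keeps its own meaning.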
But I do have one other concern with either of these APIs that is at
least worth discussing: the disparity between setting a signed cookie
and retrieving one.
response.set_cookie(key, value, signed=True)
value = signed.unsign(request.COOKIES[key])
Mainly, unsigning a cookie requires importing and directly using the
signing module anyway, so the benefit of having set_cookie() handle it
transparently is fairly weak (unless, of course, I'm missing something
obvious). I'd rather just see the signing code made available and
well-documented, so that at least the change in code when adding
signed cookies is equivalent on both sides.
Another option would be to have request.COOKIES be a custom
dictionary, with an extra .get_unsigned(key) method that would work
like .get(), but validates the signature along the way. That behavior
can't be added straight to __getitem__() though, because that has no
way of knowing whether the cookie was signed to begin with.
These are the types of issues that led me to just implement it as a
middleware, so the API could be as dead simple as possible. With these
new issues in mind, I don't think dead simplicity is really an option,
so I'd rather fall back to being explicit.
>>>> signed.unsign('hello.9asVJn9dfv6qLJ_BYObzF7mmH8c')
> 'hello'
>>>> signed.unsign('hello.badsignature')
> Traceback (most recent call last):
> ...
> BadSignature: Signature failed: badsignature
>
> BadSignature is a subclass of ValueError, meaning lazy developers
> (like myself) can do the following rather than importing the exception
> itself:
>
> try:
>     value = signed.unsign(signed_value)
> except ValueError:
>     return tamper_error_view(request)
I'm not a big fan of this, personally. Yes, it does make things
slightly easier by not requiring a separate import, but
django.core.exceptions.SuspiciousOperation exists for a reason,
and a missing or invalid signature would certainly qualify, in my
eyes. Yeah, I suppose we could perhaps make that a subclass of
ValueError or TypeError, but I would worry about people wrapping it up
into something else that might cause problems.
try:
    value = signed.unsign(signed_value).decode('utf-8')
except ValueError:
    # Whoops! UnicodeDecodeError winds up here as well!
I don't know how likely this is to happen, since the only examples I
could come up with offhand are:
* Unicode errors, which could possibly be handled by automatic
Unicode translating in the signing code
* Using int() for things like user IDs, in which case a non-integer
would be evidence of tampering anyway.
Of course, you could just as easily argue that people shouldn't be
doing that, or if they do, they should import BadSignature (or
whatever) and give it its own except block ahead of ValueError so the
two can be distinguished. But of course, that expects a non-lazy
programmer, and if we're trying to make it easier for lazy
programmers, I don't think the behavior of non-lazy ones matters much.
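For reference, the "non-lazy" pattern looks like this. The unsign() below is a toy stand-in for the proposed signing module (it accepts only a literal 'ok' signature), purely to show the exception ordering:

```python
class BadSignature(ValueError):
    """Stand-in for the proposed exception, as a ValueError subclass."""

def unsign(signed_value):
    # Toy stand-in for signed.unsign(): accepts any value whose
    # signature portion is the literal 'ok', and returns the body
    # as UTF-8 bytes (mimicking a bytes-returning API).
    value, _, sig = signed_value.rpartition('.')
    if sig != 'ok':
        raise BadSignature(sig)
    return value.encode('utf-8')

def read_value(signed_value):
    # Catch BadSignature *before* ValueError, so tampering and
    # ordinary decode errors stay distinguishable even though
    # BadSignature is itself a ValueError subclass.
    try:
        return unsign(signed_value).decode('ascii')
    except BadSignature:
        return 'TAMPERED'
    except ValueError:
        return 'DECODE_ERROR'
```

Reverse the two except blocks and ValueError swallows both cases, which is exactly the lazy-programmer trap described above.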
> Potential uses
> ==============
>
> Lots of stuff:
>
> - Signed cookies (obviously)
> - Generating CSRF tokens
> - Secure /logout/ and /change-language/ links
> - Securing /login/?next=/some/path/
> - Securing hidden fields in form wizards
> - Recover-your-account links in e-mails
I think this is a big win for including signing as a separate piece
from signed cookies, because you have a much bigger list of use cases,
which can make it easier for people to understand the value. One of
the most common questions I got on my app was, "why would I want to
sign my cookies?" I think expanding the available uses of signing will
help answer that question.
> So... what do people think? Is this a feature suitable for Django
> (obviously I think so)? Is this as simple as getting a cryptographer's
> input and dropping signed.py in to django.utils or are there other
> design factors we should consider?
All in all, I understand and appreciate the benefits of signing
things, and I'd like that behavior to be made available reliably and
easily, and it looks like that would require them being in core. I'm a
little worried about ease of use, though, because it seems like most
of the ways to make things easy can also make things either more
confusing or less flexible. I'm a little scattered at the moment, but
I think it's definitely worth refining into something that can go into
core.
-Gul
The behavior you mention here is exactly what django-signedcookies
does, but it can only do so because it can blindly assume that all
cookies are signed, which as you pointed out, causes other problems.
> We could fix this with a naming convention of some sort:
>
> response.set_cookie('key', 'value', sign=True)
> - results in a Set-Cookie: key__Xsigned=value header
That seems pretty ugly on the surface, but it does confine the
ugliness to somewhere most people won't bother to look. One potential
problem is if someone wants to use __Xsigned in the name of an
unsigned cookie, but a namespace clash like that should be extremely
rare in practice.
Also, does the name of a cookie factor into the cookie length limits?
My reading of RFC 2109 says yes, but it'd be worth verifying, since it
would cut down on the usable value space. With your compressed base64
stuff, that's not as big of a problem, but still something to look
into.
> request.unsign_cookie('key') might be an option, as at least that
> reflects the set_cookie / sign / unsign API a bit. Not ideal by a long
> shot though.
>> try:
>>     value = signed.unsign(signed_value).decode('utf-8')
>> except ValueError:
>>     # Whoops! UnicodeDecodeError winds up here as well!
>
> That's a great argument against subclassing ValueError - I hadn't
> considered the unicode case. You're right, if anything it should
> subclass SuspiciousOperation instead.
I don't know if it's completely anti-ValueError, because a ValueError
subclass does still make some sense semantically, and since you can
catch more than one exception type in a try block, it's perfectly
functional. It's just that when lazy people blindly catch ValueError
without checking for something more specific as well, things can
break.
So it really just comes down to whether we expect people to be
thorough or lazy. Hrm. Yeah, I guess that answers it. :)
-Gul
I do think that this should find its way into trunk, as signed
cookies are important in the use cases you mention and are really easy
to get wrong... and getting it wrong can be dangerous.
I'm not going to get into the dumps/loads bit right now because
there's enough to tackle on signed cookies.
On Thu, Sep 24, 2009 at 2:54 PM, Simon Willison <si...@simonwillison.net> wrote:
> Hmm... I hadn't considered that. I was thinking that the unsigning
> could be transparent, so by the time you access request.COOKIES['key']
> the value had already been unsigned (and if the signature failed the
> cookie key just wouldn't be set at all, as if the cookie never
> existed). But as you point out, this doesn't work because you can't
> tell if the cookie was signed or not in the first place.
>
> We could fix this with a naming convention of some sort:
>
> response.set_cookie('key', 'value', sign=True)
> - results in a Set-Cookie: key__Xsigned=value header
Unfortunately, this approach won't work.
A malicious client can just send "key" rather than "key__Xsigned" and
you'll never know that the cookie hasn't gone through validation.
This also means that you can't look for cookie values that match a
specific format (ex: body.signature) because a malicious client could
just drop the signature portion.
As always, we can't trust the client. :-(
This means that unsigning can *never* be fully transparent. We need a
symmetric specification of the fact that a given cookie should,
indeed, be signed.
> But that's pretty ugly. Not sure what to do about this one -
> request.unsign_cookie('key') might be an option, as at least that
> reflects the set_cookie / sign / unsign API a bit. Not ideal by a long
> shot though.
I'm not sure what the best solution is, but here are some of the
options I've considered:
1) request.unsign_cookie('foo') -- This breaks the parallelism with
existing cookies. Sometimes we'll be doing request.COOKIES['foo'] and
sometimes we'll be doing request.unsign_cookie('foo').
2) A decorator for views -- @unsign_cookies("foo", "bar") -- This
doesn't allow any sort of fall-back (you can't customize what to do if
a given cookie is improperly signed)
3) COOKIES as an intelligent object -- We can overload .get so we're
doing something like request.COOKIES.get('foo', signed=True) -- I
think this has the best chance at an interface that keeps a consistent
feel. It's completely backward compatible, though it breaks the
traditional expectation of what you can do via the `get` method on a
dictionary.
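A rough sketch of what option 3 could look like. Everything here is hypothetical; the sign()/unsign() helpers are toy stand-ins for the proposed signing module:

```python
import hashlib
import hmac

SECRET_KEY = b'example-secret'  # stands in for settings.SECRET_KEY

class BadSignature(Exception):
    pass

def sign(value):
    sig = hmac.new(SECRET_KEY, value.encode('utf-8'),
                   hashlib.sha1).hexdigest()
    return '%s.%s' % (value, sig)

def unsign(signed_value):
    value, _, sig = signed_value.rpartition('.')
    expected = hmac.new(SECRET_KEY, value.encode('utf-8'),
                        hashlib.sha1).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise BadSignature(sig)
    return value

class CookieDict(dict):
    """Hypothetical 'intelligent' request.COOKIES (option 3)."""

    def get(self, key, default=None, signed=False):
        # Plain .get() behavior is unchanged; signed=True validates
        # the signature and falls back to default on failure.
        try:
            value = self[key]
        except KeyError:
            return default
        if not signed:
            return value
        try:
            return unsign(value)
        except BadSignature:
            return default
```

Existing request.COOKIES.get('foo') calls keep working, which is the backward-compatibility point made above.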
Best,
- Ben
Also, just to throw this out there for the sake of completeness: could
the signature be stored under a separate name, rather than being
bundled with the original cookie itself?
Set-Cookie: key=value
Set-Cookie: key__Xsignature=signature_string
It seems like this could address a couple issues at once.
* There's a clear distinction between signed and unsigned cookies, so
request.COOKIES can be populated with only valid cookies
* The key/value pair remains unchanged, so things like Google
Analytics can happily ignore the signature if it was applied
unnecessarily (middleware is back on the table!)
Since there may be an upper limit on the number of allowed cookies,
though, doubling the number of cookies could present some very real
problems. RFC 2109 recommends allowing at least 20 cookies per domain
name, and it looks like at least Microsoft takes that to be a
maximum,[1] so the limit could be hit quickly (middleware is back
off the table!).
I'm not sure how many cookies people use on a regular basis, and this
would only be for explicitly signed cookies, so maybe it'd be okay,
but it's flirting dangerously close to being completely unworkable.
Worse yet, it doesn't look like there's any predictable way to know
which cookies would get lost in the event of having too many, so this
may end up causing some very weird application errors if things go
wrong.
At least now it's been recorded for future reference. (Hello, future
me, looking up information on why we did things the way we did! Do we
have flying cars yet?)
-Gul
And you've just added another reason my followup email is invalid,
though I sent that before reading your response. (Take that, future
me!)
> 3) COOKIES as an intelligent object -- We can overload .get so we're
> doing something like request.COOKIES.get('foo', signed=True) -- I
> think this has the best chance at an interface that keeps a consistent
> feel. It's completely backward compatible, though it breaks the
> traditional expectation of what you can do via the `get` method on a
> dictionary.
I was wondering about this option as well, after I mentioned adding a
request.COOKIES.get_unsigned() method. I actually like this idea a
lot, personally. As you mention, it's backward compatible, and I'm not
sure it completely breaks the traditional expectation. After all,
aren't subclasses expected to customize the behavior of their parents?
It doesn't change any existing behavior, but rather just adds
something extra.
The one downside to using get() directly, as opposed to an altogether
new method, is that get() doesn't raise a KeyError when a value
doesn't exist. That means if anyone's wrapping request.COOKIES[key] in
a try block and catching KeyError, changing to the new code is more
than just a one-liner. I'm personally okay with this, but it's
definitely worth noting.
-Gul
signer = Signer(key=...)
assert signer.unsign(signer.sign(value)) == value

This way you wouldn't have to pass around key, extra_key, and
potential further arguments, but a single Signer instance. Plus, you
could easily override hashing, concatenation, and serialization, as
well as add functionality transparently, e.g.:

sig = TimestampedSignatureFactory(ttl=3600)
# sig.unsign() will raise SignatureExpired after 3600 seconds
> 1) request.unsign_cookie('foo') -- This breaks the parallelism with
> existing cookies. Sometimes we'll be doing request.COOKIES['foo'] and
> sometimes we'll be doing request.unsign_cookie('foo').
>
> 2) A decorator for views -- @unsign_cookies("foo", "bar") -- This
> doesn't allow any sort of fall-back (you can't customize what to do if
> a given cookie is improperly signed)
>
> 3) COOKIES as an intelligent object -- We can overload .get so we're
> doing something like request.COOKIES.get('foo', signed=True) -- I
> think this has the best chance at an interface that keeps a consistent
> feel. It's completely backward compatible, though it breaks the
> traditional expectation of what you can do via the `get` method on a
> dictionary.
4) A signed.Cookies object:
signed_cookies = signed.Cookies(request, key=...)
signed_cookies.set_cookie(key, value, **kwargs)
value = signed_cookies[key]
or
signed_cookies = signer.get_cookies(request)
...
__
Johannes
The problem is that you don't know which cookies are signed and which
aren't for the reasons posted earlier. So you don't know which cookies
to put in SIGNED_COOKIES and which to put in COOKIES unless accessing
COOKIES gives you raw values of ALL cookies and SIGNED_COOKIES
attempts to unsign ALL cookies. That seems really clunky.
You have to sign and unsign the cookies yourself in code which makes
the sign_cookie/unsign_cookie API make sense.
Ian
request.COOKIES.get_signed(key) makes the most sense to me since it's
clear you are dealing with cookies and as you said it looks something
like the request.POST object. Though it's a bit more verbose than
request.unsign_cookie(key), accessing cookies directly through the
request object looks weird and unsign_cookie could be ambiguous.
Ian
Put me down as +1 in favor of adding support for signed cookies in some form.
As for the exact form that the API will take - I don't have any
particularly strong opinions at this point, and there are plenty of
big brains weighing in, so I will stay out of the discussion and let
the community evolve the idea.
By way of greasing the wheels towards trunk: if the outcome of this
mailing list thread were a wiki page that digested all the ideas,
concerns, and issues into a single page, it would make the final
approval process much easier. Luke Plant's wiki page on the proposed
CSRF changes [1] is a good model to follow here - I wasn't involved in
the early stages of that discussion, but thanks to that wiki page, I
was able to come up to speed very quickly and see why certain ideas
were rejected.
[1] http://code.djangoproject.com/wiki/CsrfProtection
Yours,
Russ Magee %-)
> SECRET_KEY considerations
> =========================
Can I add some other things I've been worrying about while we're on
the topic?
In other web apps (I think Wordpress?), there have been problems
associated with use of secret keys when the same key is used for
different purposes throughout the application.
Suppose one part of an app signs an e-mail address for the purpose of
an account confirmation link sent in an e-mail. The user won't be
able to forge the link unless they know HMAC(SECRET_KEY, email).
However, suppose another part of the website allows a user to set
their e-mail address (merely for convenience), and stores it in a
signed cookie. That means an attacker can now easily get hold of
HMAC(SECRET_KEY, email), and forge the link.
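The attack is easy to demonstrate concretely. In this hypothetical sketch (SECRET_KEY is a stand-in, and both functions deliberately repeat the vulnerable pattern), the cookie feature hands any user a valid confirmation token:

```python
import hashlib
import hmac

SECRET_KEY = b'example-secret'  # stands in for settings.SECRET_KEY

def confirmation_token(email):
    # Signs the account-confirmation link (no per-purpose prefix).
    return hmac.new(SECRET_KEY, email.encode('utf-8'),
                    hashlib.sha1).hexdigest()

def cookie_signature(value):
    # Signs the convenience cookie with the *same* key and the
    # *same* message format.
    return hmac.new(SECRET_KEY, value.encode('utf-8'),
                    hashlib.sha1).hexdigest()

# Because both compute HMAC(SECRET_KEY, email), reading your own
# signed cookie gives you a forged confirmation token for any
# address you can get stored in that cookie.
```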
There are many places in Django that use SECRET_KEY. I'm not
currently aware of any vulnerability, because in most cases the
attacker has only *limited* control over manipulating the message that
is being signed. But I may have missed some, and without some
systematic method, it would be easy for one place to open up
vulnerabilities for all the others.
So I propose:
- we review all the Django code involving md5/sha1
- we switch to HMAC where appropriate
- we add unique prefixes to the SECRET_KEY for every different
place it is used. So for the e-mail confirmation link, we use
HMAC("email-confirmation" + SECRET_KEY, message)
- also add the ability to do SECRET_KEY rotation, as Simon
suggested. This suggests we want a utility wrapper around hmac
that looks like hmac(unique_key_prefix, key, message) and handles
all the above details for us.
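A minimal sketch of that utility wrapper, assuming the prefix is simply concatenated onto the key as proposed (the function name and signature here are hypothetical):

```python
import hashlib
import hmac

def namespaced_hmac(unique_key_prefix, key, message):
    # Derive a per-purpose key by prefixing SECRET_KEY, so a
    # signature minted by one part of the app can't be replayed
    # against another part that signs the same message.
    derived = (unique_key_prefix + key).encode('utf-8')
    return hmac.new(derived, message.encode('utf-8'),
                    hashlib.sha1).hexdigest()
```

Two call sites with different prefixes then produce unrelated signatures even for identical messages.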
The main difficulty is the way this could break compatibility with
existing signed messages, especially persistent ones like those stored
in password files etc.
Luke
--
"Smoking cures weight problems...eventually..." (Steven Wright)
Luke Plant || http://lukeplant.me.uk/
I'm not a huge fan of request.SIGNED_COOKIES. I'd rather see them all in
request.COOKIES, but the same idea could still apply:
@expects_signed_cookies('username')
def user_home(request):
    ...
Alternatively, is it stupid to think that the number of signed cookies
used by an application is probably small enough that keeping a list of
them in settings.py wouldn't be too much bookkeeping? Then, when
setting/retrieving a cookie, a lookup is done in settings, and if the
key is marked as SIGNED, it does it without human intervention. Fewer
things to go wrong that way, I'd think, and any third-party app's cookies
can be secured with hardly any code changes.
SIGNED_COOKIES = ['username']
> Do you have any further information on the WordPress problems?
No, I can't find it. It might not have been WordPress. All I remember
is that it was along the lines of what I outlined in my previous e-
mail -- one part of the application was essentially allowing the user
to retrieve MD5(secret_key + user_supplied_data) (might have been HMAC
or SHA1), which allowed them to get past another bit of security.
This was something I was concerned with when I put together my app, so
I just added the name of the cookie to the signature as well, rather
than requiring some other explicit prefix. Since the two parts of the
application would need to use different names anyway to avoid other
problems, I figured the cookie name alone would be sufficient. If we
end up with something like response.set_signed_cookie() or
response.set_cookie(..., signed=True), the cookie name would be
available to the signing code internally, without any need to add in
some other key.
Of course, it'd still be worth documenting for the case of using the
signing stuff outside of cookies, because similar problems could creep
in elsewhere. I just think if we already have a name available, we
should be able to use it without any trouble at all. I wish there was
a way to sign the expiration as well, so people couldn't artificially
extend the life of the cookie, but since that doesn't come back in the
request, there'd be no way to validate it.
> So I propose:
>
> - we add unique prefixes to the SECRET_KEY for every different
> place it is used. So for the e-mail confirmation link, we use
> HMAC("email-confirmation" + SECRET_KEY, message)
I think this is good for everywhere the raw signing API is accessed,
perhaps to the point where that API can't even be used without a key
(a namespace, really - honking great idea!). Helpers on top of that
API could do without asking for that separately, if they can retrieve
a reasonable key from other forms of input, such as a cookie name or a
query-string argument name.
-Gul
Very true. I had considered baking the timestamp right into the value
portion as well, but I was concerned about the extra space it would
take. It looks like it would only be 8 characters max, if encoded as
base64, which could be shortened to 6 if we strip out the == at the
end. I think I remember reading somewhere that those can be reattached
programmatically on the other end if necessary. All in all, the space
usage isn't that bad, and since it would be an optional component
anyway, it wouldn't add any overhead to the common case.
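The padding trick does work: a 32-bit timestamp packs into 4 bytes, which base64-encodes to 8 characters ending in '==', and stripping those leaves 6. A quick illustration (these helper names are made up for the example):

```python
import base64
import struct

def encode_timestamp(ts):
    # 4 bytes -> 8 base64 chars ending in '=='; strip the padding
    # to save the two characters discussed above.
    return base64.urlsafe_b64encode(struct.pack('>I', int(ts))).rstrip(b'=')

def decode_timestamp(encoded):
    # Reattach padding so the length is a multiple of 4 again,
    # then unpack the original timestamp.
    padded = encoded + b'=' * (-len(encoded) % 4)
    return struct.unpack('>I', base64.urlsafe_b64decode(padded))[0]
```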
Would expire_after on the unsign just automatically imply
timestamp=True? There's been a lot of concern raised about parity in
the API, and it reads a little weird with the two different arguments.
I'm not sure it's a problem, but it's just a little funny.
Even though Ben rightly pointed out that we can't autodetect whether a
value is signed or not, I wonder if we could at least autodetect the
presence of a timestamp within a signature, once we already know that
the value is supposed to be signed. Essentially the unsigning code
could look for two different types of signatures, one with a timestamp
and one without. The timestamp would then be the actual expiration
time, rather than the time it was signed, so the API can look like
this instead (with a key added per prior discussion).
>>> s = signed.sign('key', 'value')
>>> v = signed.unsign('key', s)
>>> s = signed.sign('key', 'value', expire_after=24 * 60 * 60)
>>> v = signed.unsign('key', s)
This does make some assumptions about the format of the signed value,
but once we explicitly establish that the value is signed (by way of
passing it into the unsigning code), it's safe to make certain
assumptions about the format of the value. Or am I missing something
obvious here?
In the cookie case, it might be appropriate to use the combination of
an explicit expiration and signed=True to imply that an expiration
timestamp should be added to the cookie as well. I think that would be
the desired behavior, but I'm not quite sure.
Aside from some of the specifics of the cookie implementation, I think
we're getting close to an API here. I just hope we can get some input
from a cryptographer to make sure we get a solid implementation before
we go too far with this.
-Gul
But that only works for signatures that do in fact use a timestamp. If
the API makes timestamps optional, then there's still the question of
what to do for signatures that aren't stamped anyway. In those cases,
I assumed the protocol would be to simply try the most recent
SECRET_KEY and work backward if the signature failed. Once all current
and deprecated keys have been tried (which should only be a total of
two, I would think), it would raise BadSignature at that point.
If we already have that behavior for non-timestamped cookies (and
correct me if I'm wrong on that point), I don't know that it's much of
an argument in favor of which timestamp to use. As long as there's not
a huge list of valid SECRET_KEYs to choose from, the overhead of
trying them each individually should be negligible, so I'm not sure
how much of an issue it would be.
I do agree that you have a point about there being possible
reasons for expiring the key at a different point after the fact, but
I'd argue that in those cases, you'd want to set an explicit
expiration time. In your example, you provided expire_after=24 * 60 *
60, but that wouldn't let you expire a cookie because of an event that
happened 2 hours ago. I would think you'd want to pass in a specific
time that cookies should be considered expired instead.
Of course, that then goes back to stamping cookies with their creation
time anyway, because otherwise you couldn't really do it right. If I
say "expire as of time X" I want to expire all cookies issued prior to
that point in time, while leaving any issued after that point intact.
The only way to make that decision is to know the time it was issued,
rather than when it was originally expected to expire.
I think we're getting a bit ahead of ourselves, though. There's
nothing stopping an application from timestamping its own signatures
and validating them however it likes, so we're really just discussing a
reasonable default here. Heck, since the timestamp behavior is driven
entirely within an already-signed value, we don't need to provide any
behavior at all if it's not a common requirement. I think it's a good
idea, but I'd like to hear from other people before I really stand
behind including it.
-Gul
Regarding parity, let me advertise a Signer object again:
signer = signed.Signer(
    key=settings.SECRET_KEY,
    expire_after=3600,
)

s = signer.sign(v)
v = signer.unsign(s)

signer.set_cookie(response, name, value)
signer.get_cookie(request, name)
# or:
signer.get_cookies(request)[name]

# transparently sign cookies set via response.set_cookie()
# and unsign cookies in request.COOKIES in place:
@signer.sign_cookies
def view0(request):
    ...

@signer.sign_cookies('cookie', 'names')
def view1(request):
    ...
This would make more options and customization feasible (by
subclassing):
- put signatures in separate cookies (or a single separate cookie)
- automatically include request.user.id (to prevent users from
sharing cookies)
- swap out the hash, serialization, or compression algorithm without
changing the token format
- customize when and how expiration is determined
__
Johannes
Personally, I don't see much point in specifically reporting on
incorrectly signed cookies - imo they should just be treated as if
they never existed. If someone really cared, they can look in
request.COOKIES to see if the cookie was in there but not in
SIGNED_COOKIES.
http://dpaste.com/hold/111386/
This is how a class-based API might look. It's based on django-signed.
All cookie related code is untested.
__
Johannes