Should SECRET_KEY be allowed to be bytes?

Tim Graham

Dec 22, 2016, 2:41:20 PM
to Django developers (Contributions to Django itself)
There's debate in #24994 about whether or not settings.SECRET_KEY should or may be a bytestring. Some select quotes to summarize the discussion:

1. Aymeric Augustin, "Once Django drops support for Python 2 you'll have to go out of your way to put bytes in the SECRET_KEY.

Currently, since the type isn't enforced, every app that wants to do something with SECRET_KEY should conditionally convert it to bytes, typically with force_bytes, since it _could_ be bytes. I don't think this is common knowledge; it's even a pitfall.

Enforcing the type would simplify that code and avoid errors for the minority who uses bytes in pluggable apps that assume text e.g. by doing settings.SECRET_KEY.encode()."

2. Andres Mejia, "Cryptographic keys are supposed to be random bytes that don't necessarily represent a Unicode string. See also the RFC I linked in my comment.

I think it's fair to assume devs using the SECRET_KEY know it must be used as bytes. Various crypto libraries will refuse to accept them otherwise. This is true of the hmac, cryptography, and pyOpenSSL libraries.

As for my use case, a common practice is to use an external script or program to pipe secrets into processes that need them. I use something like this to not only set up my Django sites but to also rotate the secrets in them whenever necessary. The output from a subprocess.check_output() call is in bytes. As of now, since Django accepts the SECRET_KEY as bytes, I use random bytes for my SECRET_KEY and have it loaded in my Django sites via an external program."

-----

A past issue, #19980, noted that Signer was broken when SECRET_KEY contains non-ASCII bytes and it was fixed.

1. Claude Paroz, "The random string that Django produces in startproject never contains non-ascii characters. Now of course anyone is free to set it to any random content. However, do we gain anything in allowing SECRET_KEY to be a bytestring? In the spirit of Python 3, I would be more in favour of documenting that SECRET_KEY is a string (not a bytestring), and if you include in it any non-ASCII chars on Python 2, you should prefix it by u''."

2. Anonymous, "I would argue that, given that this is used in cryptographic contexts, the key should be bytes, not a string. At the lowest level, all crypto happens on bytes anyway, and with strings you always rely on an encoding. Even though we can assume utf-8 fairly reasonably, this still is an assumption, and basically only used to convert the unicode string to a byte string. Why then, let us not pass in the bytes explicitly, independent of an encoding?
Even more so, this would make it similar to other web frameworks. For example, in Flask, the SECRET_KEY value is recommended to be generated completely random, using os.urandom(). Thus, a byte string.
If consensus is around using a unicode string nonetheless, this should be explicitly documented at the very least, and maybe even type-checked."

3. Luke Plant, "I agree it makes more sense for SECRET_KEY to be a bytestring, especially with the argument about os.urandom, but we also need compatibility with unicode, especially as we cross the Python 2/3 barrier and as projects migrate.
So I think we should document that it can be either a string or bytestring, and it will be converted using UTF8 if it is a string."

4. Alex Gaynor, " in principle both the secret key and the salt_key should be bytes, not strings, so there should be no need to encode them."
-----

#24994 proposes to add a system check to disallow (or warn that it isn't proper) a SECRET_KEY that contains non-ASCII bytes (i.e. the case that was fixed in Signer in #19980).

I hope we can come to a decision and at least clarify the documentation. Perhaps deferring a code change (if any) until Django 2.0 so that only Python 3 is a consideration would help.

https://code.djangoproject.com/ticket/24994
https://code.djangoproject.com/ticket/19980

Ryan Hiebert

Dec 22, 2016, 3:18:52 PM
to django-d...@googlegroups.com

> On Dec 22, 2016, at 1:41 PM, Tim Graham <timog...@gmail.com> wrote:
>
> There's debate in #24994 about whether or not settings.SECRET_KEY should or may be a bytestring. Some select quotes to summarize the discussion:
>
> [snip]
>
> I hope we can come to a decision and at least clarify the documentation. Perhaps deferring a code change (if any) until Django 2.0 so that only Python 3 is a consideration would help.
>
> https://code.djangoproject.com/ticket/24994
> https://code.djangoproject.com/ticket/19980

It seems to me that whatever option is chosen, it should support some mechanism by which all provided bytes can be random. With ASCII-only bytes, this is not the case. With that goal in mind, I think there are, at least in principle, two choices, and the correct one depends on which priority is more important.

If it's more important to transparently have the right data type, then we should use bytes. This may mean that we prefer to generate bytes that are not fully random (top bits are always 0, etc.) so that we can avoid escape sequences.

If we'd rather ensure that all of the bits we generate both avoid escape sequences and provide all of their own bytes randomly, then we'll need to have settings.SECRET_KEY be encoded bytes, perhaps base64. In that case, it likely doesn't matter whether it's a string or bytes, as base64.b64decode permits either.

I suspect making SECRET_KEY encoded with something like base64 is likely a non-starter due to backward compatibility, so that means to me that using bytes would be preferable.
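
To illustrate the base64 point above, here is a minimal sketch (not from the thread; the variable names and the 32-byte length are assumptions) showing that, on Python 3, base64.b64decode accepts the encoded key as either text or bytes and returns the same raw bytes:

```python
import base64
import os

# Hypothetical key material: 32 random bytes, base64-encoded for storage.
raw = os.urandom(32)
encoded_bytes = base64.b64encode(raw)          # bytes, ASCII-only
encoded_text = encoded_bytes.decode("ascii")   # the same value as str

# b64decode accepts both str and bytes and returns the original raw bytes.
assert base64.b64decode(encoded_bytes) == raw
assert base64.b64decode(encoded_text) == raw
```

So an encoded setting could be stored as either type without changing what the decoding step yields.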

Tim Graham

Dec 22, 2016, 3:32:54 PM
to Django developers (Contributions to Django itself)
Perhaps times have changed but I forgot to mention that 8 years ago Malcolm rejected the idea that more randomness is required in the secret key. From the reporter of #9687:

"The generation of the SECRET_KEY setting for a new site uses an artificially low number of characters due to a design accident. As far as I can see, SECRET_KEY is not used in a way which would make it case-sensitive or require it to be read out, but instead is used in MD5 hashes where more randomness is preferred."

Malcolm:

"You don't explain what the "design accident" might be. If you are referring to the particular choice of characters, that is hardly an accident.

The question isn't whether the string is as random as possible (your alternative doesn't meet that requirement either), it's whether it's sufficiently random as to be effectively unguessable. The current code has 50^50 different strings (around 10^85) that are possible and the choices are statistically uniformly distributed over that space. You are proposing to extend the range of characters, but that doesn't change much. Since the chance of guessing the secret key string for a particular site is already unlikely (a billion guesses per second will take 10^68 years), the key weakness is the PRNG being used to generate the choices and that security isn't improved by increasing the range of characters (if you can accurately determine the state at the start of the string, you can still determine the whole string, regardless of the number of characters used).

At some point here, we just have to make a choice. Why not use more characters than your range? Why not switch to 150 characters in the secret, etc?

If you want to use a longer secret key in your projects, that's certainly going to be possible, since it's just a string and you can create it however you like. The one that Django generates isn't inherently insecure, though."
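
The figures in Malcolm's reply can be checked with a short calculation. This is only a sketch of the arithmetic, assuming a 50-character key drawn uniformly from a 50-character alphabet and an attacker making a billion guesses per second; the constant names are made up for the example:

```python
import math

ALPHABET_SIZE = 50   # characters the key is drawn from, per the quote
KEY_LENGTH = 50      # characters in the generated key, per the quote

keyspace = ALPHABET_SIZE ** KEY_LENGTH
magnitude = math.log10(keyspace)      # order of magnitude of the keyspace
entropy_bits = math.log2(keyspace)    # the same quantity expressed in bits

GUESSES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_to_exhaust = keyspace / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"keyspace ~ 10^{magnitude:.1f}, ~{entropy_bits:.0f} bits")
print(f"exhaustive search ~ {years_to_exhaust:.1e} years")
```

The result lands in the same range as the quoted numbers: roughly 10^85 possible keys and on the order of 10^68 years to try them all.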

https://code.djangoproject.com/ticket/9687

Ryan Hiebert

Dec 22, 2016, 3:43:57 PM
to django-d...@googlegroups.com

On Dec 22, 2016, at 2:32 PM, Tim Graham <timog...@gmail.com> wrote:

> Perhaps times have changed but I forgot to mention that 8 years ago Malcolm rejected the idea that more randomness is required in the secret key. From the reporter of #9687:

You're right, and I knew that, but didn't consider it in my response. I think it puts even more weight behind it being bytes. The expressibility argument (it should support all bytes being truly random) of my options is still reasonable I think, but the readability negative for bytestrings is nullified, since you can just make the string longer when you generate it.

+1 for bytes

Aymeric Augustin

Dec 22, 2016, 5:15:53 PM
to django-d...@googlegroups.com
Hello,

In my opinion, recommending or enforcing that SECRET_KEY contain random bytes would be a backwards incompatible change, bring no practical advantage, and make it more difficult to manage SECRET_KEY securely. I’m -1 on that.


startproject always generated an ASCII str on Python 2 and Python 3. While I don’t think startproject should be a consideration going forwards (if anything, it should be nuked from orbit, because if SECRET_KEY is stored in the code repository and copied on all developers’ laptops, it might as well be “hunter2”), startproject is still the best reference of what SECRET_KEY should look like.

Everyone I know mimics the format when they implement a more decent way to set SECRET_KEY, for example:

export SECRET_KEY=… # generated with pwgen -s 50

SECRET_KEY = os.environ['SECRET_KEY']

(Of course a configuration management system is a better option but that’s a luxury many small or medium projects can’t afford.)

Since very few people use bytes, especially on Python 3, recommending or enforcing bytes will be a de facto backwards incompatibility. Apps that use SECRET_KEY.encode() to obtain bytes and worked just fine will crash when the type of SECRET_KEY changes to bytes.


Forcing every Django project to change `SECRET_KEY = os.environ['SECRET_KEY']` to `SECRET_KEY = os.environ['SECRET_KEY'].encode()` doesn’t sound particularly useful to me.

Recommending random bytes (the point of the proposal, as far as I understand) is likely to cause security issues. For example, how many developers will accidentally end up with a null byte in a SECRET_KEY they initialize from an environment variable, making it much shorter than intended? If the docs started recommending generating SECRET_KEY with random bytes, that would certainly qualify as a security vulnerability.
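
As a rough illustration of the null-byte concern: environment variables are NUL-terminated C strings at the OS level, so a raw key containing a zero byte cannot survive that path intact. The sketch below only simulates C-style truncation in pure Python; the key value is made up for the example:

```python
# Hypothetical 16-byte key that happens to contain a NUL at index 3.
raw_key = b"\x8f\x12\xa7\x00\x5c\x9e\x01\x44\xd2\x3b\x77\xe0\x19\xac\x60\xf5"

# C-style string handling stops at the first NUL, so only this prefix survives.
as_c_string = raw_key.split(b"\x00", 1)[0]

print(len(raw_key), len(as_c_string))

# An ASCII-safe rendering of the same key has no NUL bytes to trip over.
hex_key = raw_key.hex()
assert "\x00" not in hex_key
```

With random bytes, each byte has a 1-in-256 chance of being zero, so truncation like this becomes plausible once keys are tens of bytes long.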


I get that SECRET_KEY is often used in cryptographic contexts where things will eventually be converted to bytes and hashed. However, a careful audit of its use in the current version of Django shows that a text key (unicode / str) will work everywhere while a bytes key will crash in some places. (This is a bug and it should be fixed.)

The reasons brought in support of the change look weak to me:

- “I think it's fair to assume devs using the SECRET_KEY know it must be used as bytes.” — well that doesn't include me or any Django dev I ever talked to about this topic
- “The output from a subprocess.check_output() call is in bytes” — this ignores the universal_newlines argument; really you have a choice, depending on your use case
- “Django accepts the SECRET_KEY as bytes” — more accurately, that works in most places, but not everywhere
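
For reference, the universal_newlines choice mentioned in the second bullet looks like this. It is a sketch only: the child process is a stand-in for whatever external program supplies the secret, and the printed value is a placeholder:

```python
import subprocess
import sys

# Stand-in for an external secret-providing program (hypothetical): a child
# Python process that prints a fixed ASCII value.
cmd = [sys.executable, "-c", "print('not-a-real-secret')"]

as_bytes = subprocess.check_output(cmd)                          # bytes
as_text = subprocess.check_output(cmd, universal_newlines=True)  # str

assert isinstance(as_bytes, bytes)
assert isinstance(as_text, str)
```

Text mode decodes the output with the locale's encoding, so it only suits programs that emit text rather than arbitrary binary data.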

Some comments also suggest an incomplete understanding of entropy. <secret> and base64.b64encode(<secret>) have the same entropy, and the latter can be used as is; it doesn’t need decoding. Talking of “bytes that aren’t fully random” doesn’t make sense in this context. If Django needs N bytes of entropy, then it should hash the SECRET_KEY (and perhaps a salt) with a hash function whose output has length N. In short, what matters is the total entropy of the secret key, as explained by Malcolm.
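
A small sketch of the entropy argument, assuming SHA-256 as the hash and a 32-byte secret (both chosen only for the example): the base64 form is reversible, so it carries the same information, and hashing either form yields a digest whose length depends only on the hash function.

```python
import base64
import hashlib
import os

secret = os.urandom(32)                 # 32 random bytes
encoded = base64.b64encode(secret)      # ASCII-safe bytes, reversible

# The encoding is reversible, so no information (entropy) is lost.
assert base64.b64decode(encoded) == secret

# Either form can be hashed directly; the digest length depends only on
# the hash function, not on how the key was encoded.
digest_from_raw = hashlib.sha256(secret).digest()
digest_from_encoded = hashlib.sha256(encoded).digest()
assert len(digest_from_raw) == len(digest_from_encoded) == 32
```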


So — is the theoretical purity of optimizing the encoding of SECRET_KEY and the economy of 20 bytes worth throwing all these new problems at developers? I don’t think so.

Best regards,

--
Aymeric.

Adam Johnson

Dec 22, 2016, 6:22:28 PM
to django-d...@googlegroups.com
+1 to what Aymeric wrote. I was just drafting an email with a similar argument about how it's hard to manage pure bytes in config management systems that write to env vars; that's why ASCII strings are so useful. They're also easy to copy/paste and verify when adding them to your config management.


--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5F15C6CB-FA9B-466A-A909-2D2DBB43E24A%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Ryan Hiebert

Dec 22, 2016, 7:05:36 PM
to django-d...@googlegroups.com

On Dec 22, 2016, at 5:22 PM, Adam Johnson <m...@adamj.eu> wrote:

> +1 to what Aymeric wrote. I was just drafting an email with a similar argument about how it's hard to manage pure bytes in config management systems that write to env vars; that's why ASCII strings are so useful. They're also easy to copy/paste and verify when adding them to your config management.

Thanks for this. As someone who writes 12-factor apps all day, this argument makes a lot of sense to me.

Tim Graham

Dec 23, 2016, 10:48:15 AM
to Django developers (Contributions to Django itself)
I'm unsure of exactly how to proceed, probably due to my poor string encoding/unicode understanding.

Is a system check warning to suggest basestring (py2) / str (py3) secret keys desirable?

Aymeric, on the ticket you said, "Currently, since the type isn't enforced, every app that wants to do something with SECRET_KEY should conditionally convert it to bytes, typically with force_bytes, since it _could_ be bytes. I don't think this is common knowledge; it's even a pitfall. Enforcing the type would simplify that code and avoid errors for the minority who uses bytes in pluggable apps that assume text e.g. by doing settings.SECRET_KEY.encode()."


But then above, you said, "a bytes key will crash in some places [in Django]. (This is a bug and it should be fixed.)"

It sounds like you're saying Django should promote the use of basestring/str but also allow bytestrings (even non-ASCII bytestrings as reported in #19980?).

Aymeric Augustin

Dec 23, 2016, 12:40:59 PM
to django-d...@googlegroups.com
On 23 Dec 2016, at 16:48, Tim Graham <timog...@gmail.com> wrote:

> also allow bytestrings (even non-ASCII bytestrings as reported in #19980?).


There are arguments both ways.

Allowing non-ASCII bytestrings means every app that needs bytes must call force_bytes(settings.SECRET_KEY) instead of settings.SECRET_KEY.encode(), or else it will break on non-ASCII bytestrings. This will likely cause bugs in pluggable apps.
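
To make the difference concrete, here is a self-contained sketch. The to_bytes helper is a hypothetical stand-in for the idea behind Django's force_bytes, not Django's implementation, and the example runs on Python 3, where bytes has no encode() method at all (on Python 2, calling encode() on a non-ASCII bytestring fails instead with an implicit ASCII decode error):

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it only if it is text.

    Hypothetical stand-in for the idea behind Django's force_bytes;
    not Django's implementation.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


text_key = "s3cr\u00e9t"          # text key with a non-ASCII character
bytes_key = b"\xff\x00\x10"       # bytes key that is not valid UTF-8

assert to_bytes(text_key) == b"s3cr\xc3\xa9t"
assert to_bytes(bytes_key) == bytes_key

# On Python 3, bytes has no .encode(), so code assuming text fails here.
try:
    bytes_key.encode()
except AttributeError as exc:
    print("encode() on bytes failed:", exc)
```

A pluggable app that always goes through a conversion helper works with either key type, whereas one that calls encode() directly only works with text keys.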

Disallowing non-ASCII bytestrings will require those who like them to change their ways, even if they don’t hit any incompatibility. Some people feel strongly that random bytestrings are the way to go. I don’t feel a great urge to frustrate them.

So count me in the “in doubt, do nothing” camp...

--
Aymeric.

Andres Mejia

Dec 23, 2016, 6:36:33 PM
to django-d...@googlegroups.com
Hello,

I saw these emails today and am finally getting around to responding to
them now. I'm one of the people commenting on those issues about the
SECRET_KEY being bytes and how I think it should still be allowed. I
think there's a misunderstanding about what I said in those issues.
Although I do recommend the use of random bytes in generating the
SECRET_KEY, I'm certainly not suggesting it be enforced.

I'll comment further inline.


On 12/22/2016 05:15 PM, Aymeric Augustin wrote:
> Hello,
>
> In my opinion, recommending or enforcing that SECRET_KEY contain random bytes would be a backwards incompatible change, bring no practical advantage, and make it more difficult to manage SECRET_KEY securely. I’m -1 on that.

I share this sentiment the other way, as in, I disagree with
recommending/enforcing the SECRET_KEY be a valid Unicode string. I
already voiced my objections in the issues.

>
> startproject always generated an ASCII str on Python 2 and Python 3. While I don’t think startproject should be a consideration going forwards (if anything, it should be nuked from orbit, because if SECRET_KEY is stored in the code repository and copied on all developers’ laptops, it might as well be “hunter2”), startproject is still the best reference of what SECRET_KEY should look like.
>
> Everyone I know mimics the format when they implement a more decent way to set SECRET_KEY, for example:
>
> export SECRET_KEY=… # generated with pwgen -s 50

What do you think is ultimately being used in the pwgen program? I'm
going to guess, at least on POSIX systems, it is /dev/urandom or
/dev/random, both of which return random bytes.

>
> SECRET_KEY = os.environ['SECRET_KEY']
>
> (Of course a configuration management system is a better option but that’s a luxury many small or medium projects can’t afford.)
>
> Since very few people use bytes, especially on Python 3, recommending or enforcing bytes will be a de facto backwards incompatibility. Apps that use SECRET_KEY.encode() to obtain bytes and worked just fine will crash when the type of SECRET_KEY changes to bytes.
>
>
> Forcing every Django project to change `SECRET_KEY = os.environ['SECRET_KEY']` to `SECRET_KEY = os.environ['SECRET_KEY'].encode()` doesn’t sound particularly useful to me.
>
> Recommending random bytes (the point of the proposal, as far as I understand) is likely to cause security issues. For example, how many developers will accidentally end up with a null byte in a SECRET_KEY they initialize from an environment variable, making it much shorter than intended? If the docs started recommending generating SECRET_KEY with random bytes, that would certainly qualify as a security vulnerability.
>
>
> I get that SECRET_KEY is often used in cryptographic contexts where things will eventually be converted to bytes and hashed. However, a careful audit of its use in the current version of Django shows that a text key (unicode / str) will work everywhere while a bytes key will crash in some places. (This is a bug and it should be fixed.)
>
> The reasons brought in support of the change look weak to me:
>
> - “I think it's fair to assume devs using the SECRET_KEY know it must be used as bytes.” — well that doesn't include me or any Django dev I ever talked to about this topic

Are you not one of the developers of the Django debug toolbar? Whether or
not you are, I presume the devs of the toolbar know that the SECRET_KEY
must be used as bytes, seeing that they have code in the toolbar to
ensure it is converted to bytes so it can be accepted by hashlib.

> - “The output from a subprocess.check_output() call is in bytes” — this ignores the universal_newlines argument; really you have a choice, depending on your use case
> - “Django accepts the SECRET_KEY as bytes” — more accurately, that works in most places, but not everywhere
>
> Some comments also suggest an incomplete understanding of entropy. <secret> and base64.b64encode(<secret>) have the same entropy, and the latter can be used as is; it doesn’t need decoding. Talking of “bytes that aren’t fully random” doesn’t make sense in this context. If Django needs N bytes of entropy, then it should hash the SECRET_KEY (and perhaps a salt) with a hash function whose output has length N. In short, what matters is the total entropy of the secret key, as explained by Malcolm.

What is likely to be the way that <secret> is generated? Seeing that
base64 is being mentioned here, that <secret> is likely to be generated
with os.urandom() or some other method that generates random bytes which
may or may not resemble a valid unicode string, otherwise why bother
base64 encoding the key in the first place. Also, base64.b64encode() in
fact returns bytes. You will need to decode it if you need to use it as
a string for some reason.

>
> So — is the theoretical purity of optimizing the encoding of SECRET_KEY and the economy of 20 bytes worth throwing all these new problems at developers? I don’t think so.
>
> Best regards,
>
What are these problems? The problem of trying to use encode() on a
bytes object? There's already the force_bytes() method from Django which
can handle that. Is it that some users, like myself, don't use
environment variables to store the secret key?

All Python crypto libraries use bytes as input and output as far as I
know. Knowing this, I set the SECRET_KEY as bytes. I normally generate
them using `dd if=/dev/urandom ...` and store it somewhere to be later
retrieved via another Python method, as bytes. For me, management of
these keys, as bytes, is not hard at all. I don't bother with using
base64 or some other method to ensure I have only certain characters in
the SECRET_KEY, because I know it's unnecessary. I would like to
continue using bytes for my cryptographic keys and hope that Django will
still allow this.

--
Andres

Aymeric Augustin

Dec 24, 2016, 4:52:38 PM
to django-d...@googlegroups.com
Hello Andres,

We both seem to agree with the status quo — supporting both text and bytes.


On 24 Dec 2016, at 00:36, 'Andres Mejia' via Django developers (Contributions to Django itself) <django-d...@googlegroups.com> wrote:
> On 12/22/2016 05:15 PM, Aymeric Augustin wrote:
>> export SECRET_KEY=… # generated with pwgen -s 50
> What do you think is ultimately being used in the pwgen program? I'm going to guess, at least on POSIX systems, it is /dev/urandom or /dev/random, both of which return random bytes.

I understand this, but it doesn’t change my argument. I’m saying that the format of SECRET_KEY doesn’t matter, as long as it contains enough entropy, since it will be injected in hashing algorithms designed to extract the entropy. I think we can agree on this.

We have different preferences for that format. You like keeping the original raw binary data as the SECRET_KEY. I find it more convenient to convert it to an ASCII-safe format, for example with pwgen. I really think this boils down to taste. I don’t think we can conclusively determine that one approach is superior to the other. I think my technique is more beginner friendly; while not applicable to you, it’s a concern for Django in general.

The only cost of supporting both options is that every use must go either through force_text or force_bytes to convert to a known type. 

- “I think it's fair to assume devs using the SECRET_KEY know it must be used as bytes.” — well that doesn't include me or any Django dev I ever talked to about this topic
(..)

Oops, I misunderstood “used as bytes” to mean “defined as bytes”. Sorry. I withdraw this.


And since I’ve been waving my hands about the types Django expects in a previous email, here’s the full audit. Below, text means unicode on Python 2 and str on Python 3. ASCII-safe bytes means bytes containing only ASCII characters, so they can be used transparently as if they were text on Python 2, because Python 2 will call decode() implicitly.

- django/conf/global_settings.py

Sets the default to an empty text string (note the unicode_literals import at the top for Python 2).

- django/conf/settings.py-tpl

Sets the generated value to ASCII-safe bytes on Python 2 and text on Python 3 (no unicode_literals there).

- django/core/signing.py:

Calls force_bytes to support bytes and text in get_cookie_signer.

- django/utils/crypto.py:

Calls force_bytes to support bytes and text in salted_hmac.

Assumes SECRET_KEY contains text in the `if not using_sysrandom` branch of `get_random_string`. This is the bug I hinted at in a previous email. It must have appeared when adding the unicode_literals import to that file. No one complained since June 2012. It only affects people setting their SECRET_KEY to bytes on Python 3 or ASCII-unsafe bytes on Python 2 on Unix-like systems that don’t provide /dev/urandom. This sounds uncommon.

While we’re there, we should use https://docs.python.org/3/library/secrets.html#module-secrets on Python >= 3.6.
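
As a rough idea of what using the secrets module could look like (a sketch only, assuming Python 3.7 or later; generate_secret_key is a hypothetical helper and the 50-character alphabet is an assumption, not a statement of what Django ships):

```python
import secrets

# Assumed alphabet: 50 ASCII characters, chosen for illustration.
CHARS = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length=50, allowed_chars=CHARS):
    """Return a text key built from a CSPRNG (hypothetical helper)."""
    return "".join(secrets.choice(allowed_chars) for _ in range(length))


key = generate_secret_key()
assert len(key) == 50 and key.isascii()
```

Because secrets always draws from the operating system's CSPRNG, a generator written this way would have no fallback branch that mixes SECRET_KEY into the seed.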


Best regards,

-- 
Aymeric.

Tim Graham

Dec 27, 2016, 1:49:15 PM
to Django developers (Contributions to Django itself)
Thanks Aymeric. How about this documentation addition:

Uses of the key shouldn't assume that it's text or bytes. Every use should go
through :func:`~django.utils.encoding.force_text` or
:func:`~django.utils.encoding.force_bytes` to convert it to the desired type.

https://github.com/django/django/pull/7750

Adam created https://code.djangoproject.com/ticket/27635 about the "use secrets" idea.

Aymeric Augustin

Dec 28, 2016, 4:45:50 AM
to django-d...@googlegroups.com
I’m happy with that.

-- 
Aymeric.

Cristiano Coelho

Aug 3, 2022, 5:16:53 PM
to Django developers (Contributions to Django itself)
Years later, sorry. But this is still broken and SECRET_KEY management is a mess!

Even though you can now use bytes, this line here will blow up if you attempt to use bytes as a secret key: https://github.com/django/django/blob/3.2.14/django/core/checks/security/base.py#L202

Basically, we are unable to use bytes in the secret key (and we want bytes so we can use a short string for other HMAC/signing operations). This means that our "str" keys will be twice as big (if encoded as hex) and will also always end up hashed, because a 64-byte secret becomes a 128-character hex string, which exceeds the HMAC-SHA256 block size of 64 bytes.
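
The block-size effect described above can be reproduced with the standard library alone. In this sketch the 64-byte key and the message are placeholders; the behaviour shown (keys longer than the block size are replaced by their hash before use) is the one specified for HMAC in RFC 2104:

```python
import hashlib
import hmac

raw_key = bytes(range(64))             # hypothetical 64-byte secret
hex_key = raw_key.hex().encode()       # 128 ASCII bytes once hex-encoded
message = b"payload"

block_size = hashlib.sha256().block_size   # 64 bytes for SHA-256
assert len(raw_key) <= block_size < len(hex_key)

# Keys longer than the block size are replaced by their hash (RFC 2104),
# so these two MACs are identical.
mac_long_key = hmac.new(hex_key, message, hashlib.sha256).digest()
mac_hashed_key = hmac.new(
    hashlib.sha256(hex_key).digest(), message, hashlib.sha256
).digest()
assert mac_long_key == mac_hashed_key
```

The raw 64-byte key fits the block exactly and is used as is, while its hex form is first reduced to a 32-byte digest.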
