Hello,
I saw these emails today and am finally getting around to responding to
them. I'm one of the people who commented on those issues about the
SECRET_KEY being bytes and why I think that should still be allowed. I
think there's a misunderstanding about what I said in those issues.
Although I do recommend the use of random bytes in generating the
SECRET_KEY, I'm certainly not suggesting it be enforced.
I'll comment further inline.
On 12/22/2016 05:15 PM, Aymeric Augustin wrote:
> Hello,
>
> In my opinion, recommending or enforcing that SECRET_KEY contain random bytes would be a backwards incompatible change, bring no practical advantage, and make it more difficult to manage SECRET_KEY securely. I’m -1 on that.
I share this sentiment the other way: I disagree with recommending or
enforcing that the SECRET_KEY be a valid Unicode string. I already
voiced my objections in the issues.
>
> startproject always generated an ASCII str on Python 2 and Python 3. While I don’t think startproject should be a consideration going forwards — if anything, it should be nuked from orbit, because if SECRET_KEY is stored in the code repository and copied on all developer’s laptops, it might as well be “hunter2” — startproject is still the best reference of what SECRET_KEY should look like.
>
> Everyone I know mimics the format when they implement a more decent way to set SECRET_KEY, for example:
>
> export SECRET_KEY=… # generated with pwgen -s 50
What do you think is ultimately being used in the pwgen program? I'm
going to guess, at least on POSIX systems, it is /dev/urandom or
/dev/random, both of which return random bytes.
>
>     SECRET_KEY = os.environ['SECRET_KEY']
>
> (Of course a configuration management system is a better option but that’s a luxury many small or medium projects can’t afford.)
>
> Since very few people use bytes, especially on Python 3, recommending or enforcing bytes will be a de facto backwards incompatibility. Apps that use SECRET_KEY.encode() to obtain bytes and worked just fine will crash when the type of SECRET_KEY changes to bytes.
>
>
> Forcing every Django project to change `SECRET_KEY = os.environ['SECRET_KEY']` to `SECRET_KEY = os.environ['SECRET_KEY'].encode()` doesn’t sound particularly useful to me.
>
> Recommending random bytes, the point of the proposal as far as I understand, is likely to cause security issues. For example, how many developers will accidentally end up with a null byte in a SECRET_KEY they initialize from an environment variable, making it much shorter than intended? If the docs started recommending generating SECRET_KEY with random bytes, that would certainly qualify as a security vulnerability.
>
>
> I get that SECRET_KEY is often used in cryptographic contexts where things will eventually be converted to bytes and hashed. However, a careful audit of its use in the current version of Django shows that a text key (unicode / str) will work everywhere while a bytes key will crash in some places. (This is a bug and it should be fixed.)
>
> The reasons brought in support of the change look weak to me:
>
> - “I think it's fair to assume devs using the SECRET_KEY know it must be used as bytes.” — well that doesn't include me or any Django dev I ever talked to about this topic
Are you not one of the developers of the Django Debug Toolbar? Whether
or not you are, I presume the devs of the toolbar know that the
SECRET_KEY must be used as bytes, seeing that they have code in the
toolbar to ensure it is converted to bytes so it can be accepted by
hashlib.
> - “The output from a subprocess.check_output() call is in bytes” — this ignores the universal_newlines argument; really you have a choice, depending on your use case
> - “Django accepts the SECRET_KEY as bytes” — more accurately, that works in most places, but not everywhere
>
> Some comments also suggest an incomplete understanding of entropy. <secret> and base64.b64encode(<secret>) have the same entropy, and the latter can be used as is; it doesn’t need decoding. Talking of “bytes that aren’t fully random” doesn’t make sense in this context. If Django needs N bytes of entropy, then it should hash the SECRET_KEY (and perhaps a salt) with a hash function whose output has length N. In short, what matters is the total entropy of the secret key, as explained by Malcolm.
How is <secret> likely to be generated? Seeing that base64 is being
mentioned here, <secret> is likely generated with os.urandom() or some
other method that produces random bytes, which may or may not form a
valid Unicode string; otherwise, why bother base64-encoding the key in
the first place? Also, base64.b64encode() in fact returns bytes. You
will need to decode it if you need to use it as a string for some
reason.
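To illustrate the point about base64, here is a minimal sketch on
Python 3. The variable names are my own, and os.urandom(32) is only an
assumed way of producing <secret>:

```python
import base64
import os

# Assumed stand-in for <secret>: 32 random bytes from the OS.
secret = os.urandom(32)

# b64encode() takes bytes and returns bytes (ASCII-only, but still bytes).
encoded = base64.b64encode(secret)

# Getting a str out of it requires an explicit decode step.
as_text = encoded.decode("ascii")

# The round trip recovers the original random bytes.
recovered = base64.b64decode(as_text)
```

So the encoded value is only "usable as is" by code that accepts bytes;
anything expecting str still needs the decode() call.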
>
> So — is the theoretical purity of optimizing the encoding of SECRET_KEY and the economy of 20 bytes worth throwing all these new problems at developers? I don’t think so.
>
> Best regards,
>
What are these problems? The problem of trying to call encode() on a
bytes object? Django already provides the force_bytes() function, which
can handle that. Or is it that some users, like myself, don't use
environment variables to store the secret key?
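As a rough sketch of what I mean, the helper below is a hypothetical,
simplified stand-in for that kind of conversion. It is not Django's
implementation, and the name to_bytes() is my own:

```python
def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return bytes whether given bytes or str."""
    if isinstance(value, bytes):
        # Already bytes: pass through untouched, no encode() call.
        return value
    return str(value).encode(encoding)


# On Python 3, calling encode() directly on bytes fails...
try:
    b"abc".encode()
except AttributeError:
    encode_failed = True
else:
    encode_failed = False

# ...while the helper accepts either type of key.
text_key = to_bytes("s3cret")
bytes_key = to_bytes(b"\x00\xff")
```

An app that goes through a helper like this instead of calling
SECRET_KEY.encode() keeps working with either type of key.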
All Python crypto libraries use bytes as input and output as far as I
know. Knowing this, I set the SECRET_KEY as bytes. I normally generate
it using `dd if=/dev/urandom ...` and store it somewhere to be later
retrieved by another Python function, as bytes. For me, management of
these keys, as bytes, is not hard at all. I don't bother with using
base64 or some other method to ensure I have only certain characters in
the SECRET_KEY, because I know it's unnecessary. I would like to
continue using bytes for my cryptographic keys and hope that Django will
still allow this.
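Roughly, my workflow looks like the sketch below. The file name and
the use of a temporary directory are placeholders for illustration
only, and os.urandom() stands in for the dd command:

```python
import hashlib
import hmac
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder path; a real key would live somewhere persistent.
    path = os.path.join(tmp, "secret.key")

    # Generate 50 random bytes and store them in binary mode.
    with open(path, "wb") as f:
        f.write(os.urandom(50))

    # Read the key back later, again as bytes.
    with open(path, "rb") as f:
        SECRET_KEY = f.read()

# The standard library's hmac takes a bytes key directly...
signature = hmac.new(SECRET_KEY, b"message", hashlib.sha256).hexdigest()

# ...and rejects a str key on Python 3.
try:
    hmac.new("text key", b"message", hashlib.sha256)
except TypeError:
    str_key_rejected = True
else:
    str_key_rejected = False
```

No base64 or character filtering is involved at any step; the key stays
bytes from generation to use.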
--
Andres