A more useful list of common passwords?

Brenton Cleeland

Mar 30, 2018, 1:24:19 AM
to Django developers (Contributions to Django itself)
Three years ago Django introduced the CommonPasswordValidator and included a list of 1,000 passwords considered to be "common". That list was based on leaked passwords and came from xato.net[1].

I'd like to update the list to

a) be from a more reliable / recent source
b) be larger and more in line with the NIST recommendations

Security researcher Troy Hunt has published a massive list of leaked passwords, including frequencies, on Have I Been Pwned[2]. The top 20,000 are available in a gist from Royce Williams[3], which includes the frequency, MD5 hash and plain-text password for each entry.

Interestingly, there are 27 passwords in the Django list that aren't in the HIBP list. I'd post them here but they're mostly short and not safe for work.
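
A comparison like this one can be sketched as a set difference. The snippet below is only an illustration: the helper name `missing_from` is hypothetical and the two short lists are made-up sample data, not entries taken from either real file.

```python
def missing_from(reference, candidates):
    """Return the candidates that do not appear in reference, keeping their order."""
    # Normalise both sides the same way so case and stray whitespace don't matter.
    reference_set = {p.strip().lower() for p in reference}
    return [p for p in candidates if p.strip().lower() not in reference_set]


# Made-up sample data; the real inputs would be the two password files.
current_list = ["password", "123456", "example-only-entry", "letmein"]
larger_list = ["123456", "password", "qwerty", "letmein"]

only_in_current = missing_from(larger_list, current_list)
print(only_in_current)
```

Run against the real files, the length of the result would be the count quoted above.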

I've created a ticket for the increase in size[4] but wanted to check in and make sure this is something django-developers thinks is valuable.

Cheers,
Brenton

[1]: https://web.archive.org/web/20150315154609/https://xato.net/passwords/more-top-worst-passwords/#.Wr3H1chxV25
[2]: https://haveibeenpwned.com/Passwords
[3]: https://gist.github.com/roycewilliams/281ce539915a947a23db17137d91aeb7
[4]: https://code.djangoproject.com/ticket/29274

Curtis Maloney

Mar 30, 2018, 1:26:49 AM
to django-d...@googlegroups.com
What sort of performance impact is this having over the existing list?

What's the additional memory load, if any?

--
Curtis
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To post to this group, send email to django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/0a215878-9d3f-4446-a018-602694f54904%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/0a215878-9d3f-4446-a018-602694f54904%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Curtis Maloney

Mar 30, 2018, 1:31:13 AM
to django-d...@googlegroups.com
By which I mean... hi Brenton! Great to see you being active again :)

It's great you've taken the time to do this, and the benefits are very
clear [improved security], but what are the costs?

Whilst you're at it, what is the new file size?

--
Curtis

Brenton Cleeland

Mar 30, 2018, 2:50:29 AM
to django-d...@googlegroups.com
Heya, Curtis!

The gzipped file size of the new file is 82K. That's with all 19,999 passwords from Royce's list.

I threw together a quick test that compares the default list to the new larger one by checking 10,000 random passwords. Speed difference is negligible, with both varying between 0.8–1.1 seconds on my machine.

Memory usage, on the other hand, is definitely higher. With the current Django list of 1,000 passwords, memory usage increases by 0.1MiB. With the new list it's 0.9-1.0MiB. This is expected, since the list is 20x the size. To put it into context, the project that I ran that test on (a fresh project using the standard template) was already using 30MiB to run the management command.

You can see the full output of the memory test here:
https://gist.github.com/sesh/c431b8cc6b5063e31f08b2a4dc3b46f0
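
For readers who want a rough feel for the numbers without a Django project, here is a minimal standalone sketch. It is not the test from the gist: it only measures a plain Python set of random strings with `tracemalloc` and `timeit`, the helper names `random_password` and `measure` are hypothetical, and the absolute figures will differ from those reported above.

```python
import random
import string
import timeit
import tracemalloc

ALPHABET = string.ascii_lowercase + string.digits


def random_password(rng, length=10):
    """Build a random string to stand in for a password."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def measure(list_size, lookups=10_000, seed=0):
    """Return (bytes allocated for the set, seconds spent on the lookups)."""
    rng = random.Random(seed)

    # Track only the allocations made while the set is being built.
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    passwords = {random_password(rng) for _ in range(list_size)}
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Time membership checks for passwords that are almost certainly absent.
    probes = [random_password(rng) for _ in range(lookups)]
    seconds = timeit.timeit(lambda: [p in passwords for p in probes], number=1)
    return after - before, seconds


for size in (1_000, 20_000):
    extra_bytes, seconds = measure(size)
    print(f"{size:>6} entries: {extra_bytes / 1024:.0f} KiB, {seconds:.4f}s for 10,000 lookups")
```

Because set lookups are hash based, the lookup time should stay roughly flat as the list grows, while memory grows with the number of entries.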

I think the trade-off of a little extra memory is worth it. If you really want to save memory you can (should?) disable the common password validator or provide your own shorter list anyway.

--
Cheers,
Brenton

Adam Johnson

Mar 30, 2018, 4:06:13 AM
to django-d...@googlegroups.com
This new file sounds good to me.

> Whilst you're at it, what is the new file size?

I downloaded the gist, took only column 3 (the actual passwords) and gzipped it, it came to 81K over the existing 3.8K. Uncompressed that's 163K over 7.1K.
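
The same measurement can be sketched in Python. This is an illustration only: the thread does not state how the gist's columns are separated, so the `:` delimiter, the helper names `password_column` and `sizes`, and the sample rows are all assumptions.

```python
import gzip


def password_column(lines, delimiter=":", column=2):
    """Yield one column (0-based) from each delimited line.

    Splitting at most `column` times keeps any delimiter characters that
    occur inside the password itself, provided it is the last column.
    """
    for line in lines:
        parts = line.rstrip("\n").split(delimiter, column)
        if len(parts) > column:
            yield parts[column]


def sizes(passwords):
    """Return (uncompressed bytes, gzipped bytes) for a one-per-line file."""
    raw = "\n".join(passwords).encode("utf-8") + b"\n"
    return len(raw), len(gzip.compress(raw))


# Made-up rows in an assumed frequency:hash:password layout.
sample = [
    "100:hash-a:123456",
    "90:hash-b:password",
    "80:hash-c:qwerty",
]
raw_size, gz_size = sizes(list(password_column(sample)))
print(f"uncompressed: {raw_size} bytes, gzipped: {gz_size} bytes")
```

On a list this small gzip's fixed overhead dominates, so the compressed size is only meaningful for the full file.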

It would probably warrant a smarter checking algorithm over the current one, where the validator loads the whole file into memory on initialization (and doesn't share it between instances).

OOI have you seen https://github.com/ubernostrum/pwned-passwords-django/ , which uses Troy Hunt's massive API for all leaked passwords ?

--
Adam

Curtis Maloney

Mar 30, 2018, 4:12:36 AM
to django-d...@googlegroups.com
On 03/30/2018 07:05 PM, Adam Johnson wrote:
> This new file sounds good to me.
>
> Whilst you're at it, what is the new file size?
>
>
> I downloaded the gist, took only column 3 (the actual passwords) and
> gzipped it, it came to 81K over the existing 3.8K. Uncompressed that's
> 163K over 7.1K.

Still a tiny drop compared to a running system... but something worth
keeping an eye on.

A quick look at the code shows, of course, that you can specify your own
file, so even if this new file is rejected, it can at least be easily
offered and used.
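
As a sketch of what that looks like in a project's settings, the validator accepts a `password_list_path` option. The file path below is a hypothetical placeholder, not a file that ships with Django.

```python
# settings.py fragment: point CommonPasswordValidator at a custom list.
AUTH_PASSWORD_VALIDATORS = [
    {
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
        "OPTIONS": {
            # Hypothetical path; the file holds one password per line.
            "password_list_path": "/srv/myproject/common-passwords.txt.gz",
        },
    },
]
```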

> It would probably warrant a smarter checking algorithm over the current
> one, where the validator loads the whole file into memory on
> initialization (and doesn't share it between instances).

The current solution is storing the strings in a set, so membership of
strings in a set _should_ be fairly efficient.
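
To illustrate both points (set membership, and sharing one loaded list between instances), here is a minimal standalone sketch. It is not Django's implementation: `load_passwords` and `SharedListChecker` are hypothetical names, and caching with `functools.lru_cache` is just one possible way to share the data.

```python
import functools
import gzip
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_passwords(path):
    """Read a gzipped, one-password-per-line file into a frozenset.

    The cache means each distinct path is read and stored only once,
    however many checker instances ask for it.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return frozenset(line.strip().lower() for line in f if line.strip())


class SharedListChecker:
    """Hypothetical checker whose instances share one in-memory list."""

    def __init__(self, path):
        self.passwords = load_passwords(path)

    def is_common(self, password):
        # Hash-based lookup: cost does not grow with the size of the list.
        return password.lower().strip() in self.passwords


# Demo with a throwaway file containing made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "common.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("123456\npassword\nqwerty\n")
    a = SharedListChecker(path)
    b = SharedListChecker(path)

print(a.passwords is b.passwords, a.is_common("Password"), a.is_common("a-safe-password"))
```

The trade-off is that a cached list stays in memory for the life of the process, even after every checker that used it is gone.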

--
Curtis

> OOI have you seen https://github.com/ubernostrum/pwned-passwords-django/
> , which uses Troy Hunt's massive API for all leaked passwords ?

The joy of pluggable validators is... people can choose their level of
strictness :)

--
C

Jessica F

Apr 10, 2018, 5:54:55 PM
to Django developers (Contributions to Django itself)

Hello! I'm Jessica, the assignee of this ticket. I am speaking on behalf of a group of newbies contributing to open source projects.
I was looking at the list of 20k passwords by Royce Williams, and there were 40 that were something like "$HEX[d0bfd197d5]". When I parsed them, nothing legible came out of it. I was wondering if this was an error on the list or was it intentional?


Brenton Cleeland

Apr 10, 2018, 5:59:20 PM
to django-d...@googlegroups.com
Hi Jessica (& team!),

My immediate thought is that those rows are errors. They should be ignored and not included in any list added to Django :)
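
Dropping those rows could look something like the sketch below. The pattern and the helper name `drop_hex_entries` are assumptions based only on the single example quoted in this thread, and the sample list is made up.

```python
import re

# Matches whole entries of the form $HEX[...] containing only hex digits.
HEX_ENTRY = re.compile(r"^\$HEX\[[0-9a-fA-F]*\]$")


def drop_hex_entries(passwords):
    """Return the passwords with any $HEX[...] rows removed."""
    return [p for p in passwords if not HEX_ENTRY.match(p)]


sample = ["123456", "$HEX[d0bfd197d5]", "password"]
cleaned = drop_hex_entries(sample)
print(cleaned)
```

Anchoring the pattern at both ends means an ordinary password that merely contains the text "$HEX" is left alone.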

Jessica F

Apr 11, 2018, 2:41:44 PM
to Django developers (Contributions to Django itself)
I see. Thank you very much!

Cheers,
Jessica


Kelly

Apr 13, 2018, 5:29:10 PM
to Django developers (Contributions to Django itself)
Hello!

I am Kelly, a member of the group working on ticket #29274. We really appreciate your help thus far!

We have successfully replaced the list of passwords and run the unit tests with ./runtests.py.

When looking at the CommonPasswordValidatorTest(TestCase) class found in https://github.com/django/django/blob/2cb6b7732dc7b172797cebb1e8f19be2de89e264/tests/auth_tests/test_validators.py, we noticed that only a few strings are being tested, namely 'godzilla' and 'a-safe-password'.

As we make our pull request, we were wondering if we should include more specific unit tests for the validator.

Cheers,

Kelly

James Bennett

Apr 13, 2018, 5:33:24 PM
to django-d...@googlegroups.com
One approach you might try is on every test run, randomly select some lines from the list of common passwords and verify they fail the validator. That way we know it's not just testing a single, fixed, contrived case.
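
A rough sketch of that idea, written so it runs without Django: `StandInValidator` is a hypothetical stand-in for the real validator, `COMMON` is made-up sample data, and `ValueError` takes the place of Django's `ValidationError`.

```python
import random
import unittest

# Made-up sample entries; a real test would read lines from the shipped list.
COMMON = ["123456", "password", "qwerty", "letmein", "dragon", "monkey"]


class StandInValidator:
    """Hypothetical stand-in that rejects any password found in its list."""

    def __init__(self, passwords):
        self.passwords = {p.strip().lower() for p in passwords}

    def validate(self, password):
        if password.lower().strip() in self.passwords:
            raise ValueError("This password is too common.")


class RandomSampleTest(unittest.TestCase):
    def test_random_common_passwords_are_rejected(self):
        validator = StandInValidator(COMMON)
        # A different sample is drawn on every run.
        for password in random.sample(COMMON, k=3):
            # subTest reports which sampled entry failed, if any.
            with self.subTest(password=password):
                with self.assertRaises(ValueError):
                    validator.validate(password)

    def test_uncommon_password_is_accepted(self):
        # Should return without raising.
        StandInValidator(COMMON).validate("a-safe-password")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RandomSampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

One thing to weigh with randomised tests is reproducibility: naming the sampled entry in the failure output, as `subTest` does here, makes an intermittent failure much easier to track down.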

Kelly

Apr 16, 2018, 11:16:05 AM
to Django developers (Contributions to Django itself)
Thank you for your quick reply! We will try that.

Cheers,

Kelly