[Django] #36190: High memory usage of CommonPasswordValidator

Django

Feb 15, 2025, 6:06:41 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
---------------------------------+-----------------------------------------
Reporter: Michel Le Bihan | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+-----------------------------------------
Hello,

I noticed that `CommonPasswordValidator` has very high memory usage.
Loading the `piotrcki-wordlist-top10m.txt` list, which has `9872702` entries,
uses about 755 MB of memory even though the list is only 87 MB on disk.
Since `CommonPasswordValidator` only needs to check for
membership, I think that a variant of a Bloom filter would be much
better than a hash table (`set()`).
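
To illustrate where the overhead comes from, here is a rough sketch, assuming CPython: every entry in a `set` of `str` costs a full string object plus a slot in the hash table. The helper name `approx_set_bytes` and the generated sample words are made up for this example, and the exact numbers vary by Python version.

```python
import sys


def approx_set_bytes(words):
    """Approximate bytes held by a set of str: hash table plus every string object."""
    unique = set(words)
    return sys.getsizeof(unique) + sum(sys.getsizeof(w) for w in unique)


# Synthetic stand-in for a word list (hypothetical data, not the real file).
sample = [f"password{i}" for i in range(100_000)]
disk_bytes = sum(len(w.encode("utf-8")) + 1 for w in sample)  # +1 per newline
memory_bytes = approx_set_bytes(sample)
print(f"as text: {disk_bytes / 1e6:.1f} MB, as set: {memory_bytes / 1e6:.1f} MB")
```

The per-object overhead dominates for short strings, which is why the in-memory size can be several times the on-disk size.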
--
Ticket URL: <https://code.djangoproject.com/ticket/36190>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Feb 15, 2025, 6:14:42 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: (none)
Type: Uncategorized | Status: new
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution:
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Michel Le Bihan):

* component: Uncategorized => contrib.auth
* keywords: => CommonPasswordValidator

--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:1>

Django

Feb 17, 2025, 12:08:15 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: assigned
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution:
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Priyanshu Singh Panda):

* owner: (none) => Priyanshu Singh Panda
* stage: Unreviewed => Accepted
* status: new => assigned
* type: Uncategorized => Cleanup/optimization

--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:2>

Django

Feb 17, 2025, 2:37:09 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* resolution: => needsinfo
* stage: Accepted => Unreviewed
* status: assigned => closed

Comment:

Michel, can you please share steps to reproduce? This will help in
investigating and resolving any issues.
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:3>

Django

Feb 18, 2025, 2:53:18 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Priyanshu Singh Panda):

Replying to [comment:3 Sarah Boyce]:
> Michel can you please share steps to reproduce? This will help in
> investigating and resolving any issues

I was able to reproduce this using the "piotrcki-wordlist-top10m.txt" file,
which contains around 10 million commonly used passwords. The
`CommonPasswordValidator` loads the file and stores the passwords in a set.
By default, the bundled list contains only 20,000 passwords, so the new
file is far larger and requires much more memory.
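{{{
# Excerpt from CommonPasswordValidator.__init__ (needs `import gzip`).
try:
    # The bundled list is gzipped; try that format first.
    with gzip.open(password_list_path, "rt", encoding="utf-8") as f:
        self.passwords = {x.strip() for x in f}
except OSError:
    # Fall back to a plain-text list.
    with open(password_list_path) as f:
        self.passwords = {x.strip() for x in f}
}}}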
I'm currently working on addressing the memory issue caused by loading
such a large file. Please assign this task to me so I can continue working
on optimizing it.
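
As a possible way to reproduce the measurement without Django, the sketch below loads a word list into a set the same way and reports the traced allocation with `tracemalloc`. The generated temporary file is a small stand-in for the real list, and `load_as_set` is a hypothetical helper, not Django code.

```python
import os
import tempfile
import tracemalloc


def load_as_set(path):
    """Mirror the validator's plain-text branch: one stripped line per entry."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f}


# Write a synthetic word list (stand-in for the real 10M-entry file).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"password{i}\n" for i in range(200_000))
    path = tmp.name

disk_bytes = os.path.getsize(path)

tracemalloc.start()
passwords = load_as_set(path)
traced_bytes, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
os.remove(path)

print(f"on disk: {disk_bytes / 1e6:.1f} MB, in memory: {traced_bytes / 1e6:.1f} MB")
```

Pointing `load_as_set` at the real list instead of the temporary file should show the same effect at full scale.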
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:4>

Django

Feb 18, 2025, 3:02:11 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Michel Le Bihan):

Hello,

Thanks for looking at this. How are you planning to reduce memory usage?
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:5>

Django

Feb 18, 2025, 3:10:36 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Priyanshu Singh Panda):

Replying to [comment:5 Michel Le Bihan]:
> Hello,
>
> Thanks for looking at this. How are you planning to reduce memory usage?

I plan to reduce memory usage and lookup time by implementing a Bloom
filter. It stores only a compact bit array instead of the passwords
themselves and answers membership queries probabilistically. I have also
measured the change with the `@profile` decorator, which tracks memory
usage per line of code. With the Bloom filter, both time and memory
usage dropped significantly.
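
The patch itself is not attached to the ticket, so the following is only a minimal sketch of what a Bloom filter for this purpose could look like. The class name `BloomFilter`, its parameters, and the choice of SHA-256 with double hashing are assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Bit array with k hash positions per item; no false negatives."""

    def __init__(self, capacity, error_rate=0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2.
        self.num_bits = max(
            8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


common = BloomFilter(capacity=1000, error_rate=0.001)
for word in ("123456", "password", "qwerty"):
    common.add(word)
print("password" in common)
```

A stored password is always reported as present; a password that was never added is occasionally reported as present too, at roughly the configured `error_rate`.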
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:6>

Django

Feb 18, 2025, 4:05:07 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Michel Le Bihan):

How will you deal with false positives? Maybe using a sorted array of
64-bit hashes and doing a binary search on it would give a much lower
FP ratio.
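
A minimal sketch of that idea, under assumptions: the hash function (BLAKE2b truncated to 8 bytes) and the helper names `hash64`, `build_index`, and `contains` are chosen for illustration only.

```python
import hashlib
from array import array
from bisect import bisect_left


def hash64(password):
    """Map a password to an unsigned 64-bit integer."""
    digest = hashlib.blake2b(password.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_index(lines):
    """Return a sorted, de-duplicated array of 64-bit hashes (8 bytes each)."""
    # Note: the temporary set makes building memory-hungry; only the
    # resulting array needs to stay resident.
    return array("Q", sorted({hash64(line.strip()) for line in lines}))


def contains(index, password):
    """Binary search for the password's hash in the sorted array."""
    h = hash64(password)
    i = bisect_left(index, h)
    return i < len(index) and index[i] == h


index = build_index(["123456\n", "password\n", "qwerty\n"])
print(contains(index, "password"), contains(index, "correct horse battery staple"))
```

Lookups cost O(log n) comparisons, and a false positive requires a full 64-bit hash collision with one of the stored entries.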
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:7>

Django

Feb 18, 2025, 5:24:54 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Priyanshu Singh Panda):

Replying to [comment:7 Michel Le Bihan]:
> How will you deal with false positives? Maybe using a sorted array of
> 64-bit hashes and doing a binary search on it would give a much lower
> FP ratio.

Both approaches seem optimal for handling false positives. I’ve already
set a custom false positive rate for the Bloom filter, which provides a
good balance between memory usage and performance. However, as you
mentioned, using a sorted array of 64-bit hashes with binary search could
further optimize the false positive rate.

Would you recommend switching to this method or keeping the Bloom filter
with the custom FP rate? I can also implement the sorted-hash approach if
needed for further optimization.

Can you let me know how you'd like to proceed, and kindly assign this task
to me.
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:8>

Django

Feb 18, 2025, 5:33:24 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Michel Le Bihan):

That's a question that should be answered by the Django team...
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:9>

Django

Feb 18, 2025, 5:33:45 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: new
Component: contrib.auth | Version: 5.1
Severity: Normal | Resolution:
Keywords: CommonPasswordValidator | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Michel Le Bihan):

* resolution: needsinfo =>
* status: closed => new

--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:10>

Django

Feb 18, 2025, 7:40:53 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: assigned
Component: contrib.auth | Version: dev
Severity: Normal | Resolution:
Keywords: CommonPasswordValidator | Triage Stage: Ready for checkin
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Priyanshu Singh Panda):

* cc: Priyanshu Singh Panda (added)
* needs_docs: 0 => 1
* stage: Unreviewed => Ready for checkin
* status: new => assigned
* version: 5.1 => dev

--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:11>

Django

Feb 18, 2025, 7:41:35 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: assigned
Component: contrib.auth | Version: dev
Severity: Normal | Resolution:
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Priyanshu Singh Panda):

* stage: Ready for checkin => Accepted

--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:12>

Django

Feb 18, 2025, 8:04:54 AM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: dev
Severity: Normal | Resolution: wontfix
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* resolution: => wontfix
* status: assigned => closed

Comment:

There is often a trade-off between memory usage and performance.

For `CommonPasswordValidator` to compromise on accuracy via a bloom filter
would be backwards incompatible, and therefore, not acceptable without a
strong consensus from the community.

In Django, you can write your own password validators. In this case, a
custom very large file is being used, and so it might make sense to use a
custom validator that uses a bloom filter (to deal with the very large
file). This custom validator could be provided by a third-party package.
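
As a rough sketch of such a custom validator: Django calls `validate(password, user=None)` and `get_help_text()` on each configured validator. The class name `LargeListPasswordValidator` and its `lookup` argument are hypothetical; a real package would more likely accept a file path through `OPTIONS` and build the lookup structure itself.

```python
try:
    from django.core.exceptions import ValidationError
except ImportError:  # fallback so the sketch also runs without Django installed

    class ValidationError(Exception):
        def __init__(self, message, code=None):
            super().__init__(message)
            self.code = code


class LargeListPasswordValidator:
    """Hypothetical validator backed by any container that supports `in`."""

    def __init__(self, lookup):
        # `lookup` could be a set, a Bloom filter, or a sorted-hash index wrapper.
        self.lookup = lookup

    def validate(self, password, user=None):
        # Normalize the same way as the built-in validator: lowercase and strip.
        if password.lower().strip() in self.lookup:
            raise ValidationError(
                "This password is too common.", code="password_too_common"
            )

    def get_help_text(self):
        return "Your password can't be a commonly used password."


validator = LargeListPasswordValidator({"password", "123456"})
validator.validate("x9!kQ-unlikely")  # passes silently
```

Keeping the lookup structure behind a plain `in` check lets the memory/accuracy trade-off be chosen per project without touching the validator logic.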

Anyone is welcome to discuss this further on the
[https://forum.djangoproject.com/c/internals/5 Django forum].
Note that any PR to improve the current state should not compromise on
accuracy and needs to include performance and memory benchmarking.
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:13>

Django

Feb 18, 2025, 2:41:03 PM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: dev
Severity: Normal | Resolution: wontfix
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Michel Le Bihan):

* Attachment "test2.py" added.

Django

Feb 18, 2025, 2:41:04 PM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: dev
Severity: Normal | Resolution: wontfix
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Michel Le Bihan):

* Attachment "test.py" added.

Django

Feb 18, 2025, 2:44:51 PM
to django-...@googlegroups.com
#36190: High memory usage of CommonPasswordValidator
-------------------------------------+-------------------------------------
Reporter: Michel Le Bihan | Owner: Priyanshu Singh Panda
Type: Cleanup/optimization | Status: closed
Component: contrib.auth | Version: dev
Severity: Normal | Resolution: wontfix
Keywords: CommonPasswordValidator | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Michel Le Bihan):

Here is a simple benchmark:
{{{
(venv) michel@debian:/dev/shm$ python3 test.py
Reading password list took: 3.177845547
Checking if password is in list took: 2.2280000000485245e-06
Password is in list: False
Size of passwords in MB: 755.3031387329102
(venv) michel@debian:/dev/shm$ python3 test2.py
Reading password list took: 17.459581553
Checking if password is in list took: 1.1341000000442136e-05
Password is in list: False
Size of passwords in MB: 75.32260131835938
}}}

As you can see, reading the password list takes much longer in the second
run (about 17.5 s versus 3.2 s) and is not really optimized. However,
checking whether a password is in the list is still basically instant
(about 11 microseconds), and memory usage drops roughly tenfold (about
75 MB versus 755 MB). As for false positives, I think that they are
extremely unlikely: 64 bits is really a lot.
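
To put rough numbers on both claims, here is a back-of-the-envelope calculation. It assumes one 8-byte hash per entry and a hash that behaves like a uniform random 64-bit value; the entry count is the one reported at the top of the ticket.

```python
ENTRIES = 9_872_702  # entry count reported in the ticket
HASH_BYTES = 8  # one 64-bit hash per entry

# Memory for a packed array of hashes, in MiB.
array_mib = ENTRIES * HASH_BYTES / 2**20

# Chance that a password NOT in the list collides with some stored hash
# (upper bound: each stored hash blocks at most one of 2**64 values).
false_positive = ENTRIES / 2**64

print(f"sorted hash array: ~{array_mib:.1f} MiB")
print(f"false-positive chance per lookup: ~{false_positive:.1e}")
```

The array size comes out at roughly 75 MiB, in line with the second benchmark figure, and the false-positive chance per lookup is on the order of 5 in 10 trillion.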
--
Ticket URL: <https://code.djangoproject.com/ticket/36190#comment:14>