[Django] #35572: Improve performance replacing os.listdir() with os.scandir()


Django

Jul 2, 2024, 5:02:08 AM
to django-...@googlegroups.com
#35572: Improve performance replacing os.listdir() with os.scandir()
-------------------------------------+-------------------------------------
Reporter: Paolo Melchiorre           | Type: Cleanup/optimization
Status: new                          | Component: Uncategorized
Version: dev                         | Severity: Normal
Keywords: scandir listdir python os  | Triage Stage: Unreviewed
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Use `os.scandir()` instead of `os.listdir()` in the remaining occurrences
in the code:
https://github.com/search?q=repo%3Adjango%2Fdjango+os.listdir&type=code

Based on the [https://docs.python.org/3/library/os.html#os.scandir Python
documentation]:
> Using scandir() instead of listdir() can significantly increase the
> performance of code that also needs file type or file attribute
> information, because os.DirEntry objects expose this information if the
> operating system provides it when scanning a directory.
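
To illustrate the quoted passage, here is a minimal sketch of the case where the entry type is needed. The function names `subdirs_listdir` and `subdirs_scandir` are hypothetical and are not taken from the Django code base; only `os.listdir()`, `os.path.isdir()`, `os.scandir()` and `DirEntry.is_dir()` are real standard-library calls.

```python
import os


def subdirs_listdir(path):
    """Subdirectory names via os.listdir(): one extra stat call per entry."""
    return sorted(
        name
        for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def subdirs_scandir(path):
    """Subdirectory names via os.scandir(): DirEntry.is_dir() can often
    reuse type information returned by the directory scan itself."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())
```

Both helpers should return the same names for the same directory; any difference would be in the number of system calls, and whether that is measurable depends on the platform and the directory size.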
--
Ticket URL: <https://code.djangoproject.com/ticket/35572>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jul 2, 2024, 5:39:29 AM
to django-...@googlegroups.com
Changes (by Sarah Boyce):

* stage: Unreviewed => Accepted

Comment:

Accepting, similar to #29689. Thank you.
Note that additional benchmarks in [https://github.com/django/django-asv
django-asv] are always welcome 👍
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:1>

Django

Jul 2, 2024, 8:46:38 AM
to django-...@googlegroups.com
Changes (by Amir Karimi):

* owner: (none) => Amir Karimi
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:2>

Django

Jul 2, 2024, 8:52:11 AM
to django-...@googlegroups.com
Changes (by Tim Graham):

* component: Uncategorized => Core (Other)

Comment:

The description makes it sound like this is a simple find and replace all,
however, do all usages "also need file type or file attribute
information"?
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:3>

Django

Jul 2, 2024, 9:32:09 AM
to django-...@googlegroups.com
Comment (by Amir Karimi):

Replying to [comment:3 Tim Graham]:
> The description makes it sound like this is a simple find and replace
> all, however, do all usages "also need file type or file attribute
> information"?

Good point! Except for this case:
https://github.com/django/django/blob/aa74c4083e047473ac385753e047e075e8f04890/scripts/manage_translations.py#L42
I didn't find any other usages where file attributes (`is_dir`, etc.) are
needed; the rest only need the entry names or the number of entries
returned by `os.listdir()`.
The only edge that `os.scandir()` may still have is its lower memory
consumption when it comes to large folders (which I suspect is the case in
any of these usages)
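
As a rough sketch of the names-only and count-only usages described above (the helper names are hypothetical, not from Django): when only names are needed, `os.scandir()` gives the same result but requires a `with` block, and its possible memory advantage comes from iterating lazily instead of building a full list.

```python
import os


def names_listdir(path):
    """Entry names as a list: os.listdir() builds the whole list at once."""
    return sorted(os.listdir(path))


def names_scandir(path):
    """Same names via os.scandir(); no attribute information is used here."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def count_entries_scandir(path):
    """Count entries lazily, without holding all names in memory."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)
```

For small directories the lazy iteration is unlikely to matter; it could only become relevant for directories with a very large number of entries.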
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:4>

Django

Feb 3, 2025, 10:55:27 PM
to django-...@googlegroups.com
Changes (by Marcus Vinicius Araujo):

* cc: Marcus Vinicius Araujo (added)
* owner: Amir Karimi => Marcus Vinicius Araujo

--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:5>

Django

Feb 3, 2025, 11:07:26 PM
to django-...@googlegroups.com
Changes (by Marcus Vinicius Araujo):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:6>

Django

Feb 3, 2025, 11:17:00 PM
to django-...@googlegroups.com
Comment (by Marcus Vinicius Araujo):

Replying to [comment:2 Amir Karimi]:

Amir, my bad. I'm new to contributing to this project, and I didn't notice
that this ticket had already been assigned to you.

Anyway, I submitted a patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:7>

Django

Feb 4, 2025, 4:08:49 AM
to django-...@googlegroups.com
Comment (by Amir Karimi):

Replying to [comment:7 Marcus Vinicius Araujo]:
> Replying to [comment:2 Amir Karimi]:
>
> Amir, my bad. I'm new to contributing to this project, and I didn't
> notice that this ticket had already been assigned to you.
>
> Anyway, I submitted a patch.

That's fine. I forgot I had such a ticket. Next time, you can first ask
what's going on with the task and if you don't hear back, you can assign
it to yourself.
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:8>

Django

Feb 6, 2025, 10:39:08 PM
to django-...@googlegroups.com
Comment (by Marcus Vinicius Araujo):

Replying to [comment:8 Amir Karimi]:
> Replying to [comment:7 Marcus Vinicius Araujo]:
> > Replying to [comment:2 Amir Karimi]:
> >
> > Amir, my bad. I'm new to contributing to this project, and I didn't
> > notice that this ticket had already been assigned to you.
> >
> > Anyway, I submitted a patch.
> That's fine. I forgot I had such a ticket. Next time, you can first ask
> what's going on with the task and if you don't hear back, you can assign
> it to yourself.

Will do! Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:9>

Django

Feb 11, 2025, 11:49:26 AM
to django-...@googlegroups.com
Comment (by Natalia Bidart):

After reviewing this ticket, the relevant documentation, and the PR, I
remain unconvinced of the benefits of this change. Given Tim's question
and answer (specifically, that only `manage_translations.py` requires file
type or file attribute information) the proposed change seems to add an
extra level of indentation to the code without a clear benefit.

Basic performance tests show only minimal improvements. In accordance with
our [https://docs.djangoproject.com/en/dev/internals/contributing/bugs-and-features/#requesting-performance-optimizations
current docs for requesting performance optimizations] I think we need to
mark this as `wontfix` until a clear and substantial performance gain is
demonstrated.
The risk of breaking systems that rely on `os.listdir` (for example, via
monkeypatching) outweighs the potential unproven benefits.
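
The monkeypatching risk can be sketched as follows. This is only an illustration with hypothetical helper names, assuming a caller patches `os.listdir` with `unittest.mock.patch`; it is not code from Django or from the PR.

```python
import os
from unittest import mock


def names_via_listdir(path):
    """Looks up os.listdir at call time, so a patch on it takes effect."""
    return sorted(os.listdir(path))


def names_via_scandir(path):
    """Never calls os.listdir, so a patch on os.listdir is silently bypassed."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def demo(path):
    """Return both results while os.listdir is patched to a fake listing."""
    with mock.patch("os.listdir", return_value=["fake.txt"]):
        return names_via_listdir(path), names_via_scandir(path)
```

Under the patch, the first helper returns the fake listing while the second still reads the real directory, which is the kind of behavior change that could surprise code relying on the patch.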
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:10>

Django

Feb 26, 2025, 11:10:21 AM
to django-...@googlegroups.com
Changes (by Sarah Boyce):

* resolution: => wontfix
* status: assigned => closed

Comment:

Agreed.
If someone finds a place where this is suitable (I'd avoid making
these changes in tests) and has a benchmark that shows a clear benefit,
it can be posted here and the ticket could be reopened.
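
One possible shape for such a benchmark, as a standalone `timeit` sketch rather than a django-asv benchmark. The directory layout, sizes, and function names are assumptions made for illustration, and the numbers will vary by platform and filesystem.

```python
import os
import tempfile
import timeit


def make_sample_dir(root, n_files=200, n_dirs=20):
    """Populate root with empty files and subdirectories (sizes are arbitrary)."""
    for i in range(n_files):
        open(os.path.join(root, f"file_{i}.txt"), "w").close()
    for i in range(n_dirs):
        os.mkdir(os.path.join(root, f"dir_{i}"))


def dirs_with_listdir(path):
    """Subdirectory names using os.listdir() plus os.path.isdir()."""
    return [n for n in os.listdir(path) if os.path.isdir(os.path.join(path, n))]


def dirs_with_scandir(path):
    """Subdirectory names using os.scandir() and DirEntry.is_dir()."""
    with os.scandir(path) as entries:
        return [e.name for e in entries if e.is_dir()]


def compare(path, number=200):
    """Time both variants; returns total seconds for `number` runs of each."""
    return {
        "listdir": timeit.timeit(lambda: dirs_with_listdir(path), number=number),
        "scandir": timeit.timeit(lambda: dirs_with_scandir(path), number=number),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_sample_dir(tmp)
        for name, seconds in compare(tmp).items():
            print(f"{name}: {seconds:.4f}s")
```

This only measures the case where the entry type is needed; for names-only usages the two calls do comparable work, so a benchmark there would need to show a benefit on its own terms.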
--
Ticket URL: <https://code.djangoproject.com/ticket/35572#comment:11>