[Django] #31804: Parallelize database cloning process


Django

Jul 20, 2020, 10:51:59 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: nobody
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords: parallel, mysqlpump
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Parallelizing database cloning processes would yield a nice speed-up for
running Django's own test suite (and all Django projects that use the
default test runner).

So far there are two main ways I see we can implement this:
- Use existing backend utilities, e.g. mysqlpump instead of mysqldump
- Use a normal multiprocessing pool on top of our existing cloning code

--
Ticket URL: <https://code.djangoproject.com/ticket/31804>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jul 20, 2020, 11:02:50 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmad A. Hussein
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ahmad A. Hussein):

* owner: nobody => Ahmad A. Hussein


--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:1>

Django

Jul 20, 2020, 11:31:43 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmad A. Hussein
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Ahmad A. Hussein:

Old description:

> Parallelizing database cloning processes would yield a nice speed-up for
> running Django's own test suite (and all Django projects that use the
> default test runner).
>
> So far there are two main ways I see we can implement this:
> - Use existing backend utilities, e.g. mysqlpump instead of mysqldump
> - Use a normal multiprocessing pool on top of our existing cloning code

New description:

Parallelizing database cloning processes would yield a nice speed-up for
running Django's own test suite (and all Django projects that use the
default test runner).

So far there are three main ways I see we can implement this:
- Use a multiprocessing pool at the setup_databases level that'll create
workers which run ```clone_test_db``` for each method
- Use a pool at the ```clone_test_db``` level which parallelizes the
internal ```_clone_test_db``` call
- Scrap parallelizing the cloning in general, but parallelize the
internals of specific backends (at least MySQL fits here)

In the first two options, we'd have to refactor MySQL's cloning process
since it has another call to ```_clone_db```. Otherwise a dump would be
created inside each parallel process, slowing the workers greatly.
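
To illustrate the first option and the "dump once" concern, here is a
minimal sketch. It is not Django's code: SQLite files stand in for the
backend, a thread pool stands in for the multiprocessing pool, and
`clone_db` / `clone_in_parallel` are hypothetical helpers. The only point
is that the source is prepared once, outside the workers, and each worker
does nothing but copy it.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def clone_db(source_path, target_path):
    """Copy one SQLite database into target_path using the backup API."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target_path


def clone_in_parallel(source_path, count, workers=4):
    """Create `count` clones of source_path, at most `workers` at a time."""
    source = Path(source_path)
    # Hypothetical naming scheme: test_db.sqlite3 -> test_db_1.sqlite3, ...
    targets = [
        str(source.with_name(f"{source.stem}_{i}{source.suffix}"))
        for i in range(1, count + 1)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every clone and re-raises any worker error.
        return list(pool.map(lambda t: clone_db(source_path, t), targets))
```

For a backend like MySQL, the equivalent of "prepare once" would be
producing the dump a single time before the pool starts, so that workers
only restore it.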

In the last option, we could consider using mysqlpump instead of mysqldump
for both exporting the database and restoring it. The con of this approach
is that it isn't general enough to apply to the other backends.
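
As a rough sketch of what the dump side of this could look like, the
helper below builds a mysqlpump command line with its
`--default-parallelism` option. The function name and defaults are
hypothetical, and nothing here is taken from Django's MySQL backend.
Note that mysqlpump only produces the dump, so loading it into the clone
would still go through the mysql client, and it is worth checking that
your MySQL version still ships mysqlpump.

```python
def build_mysqlpump_args(database, user, host="localhost", parallelism=4):
    """Return an argv list for a parallel dump of a single database."""
    return [
        "mysqlpump",
        f"--user={user}",
        f"--host={host}",
        # Number of threads per parallel processing queue.
        f"--default-parallelism={parallelism}",
        "--databases",
        database,
    ]
```

The resulting list could be passed to `subprocess.Popen` in the same way
a mysqldump command would be.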

Oracle's cloning process (although not merged in the current master) has
internal support for option 3 (users can specify a PARALLEL variable to
speed up the expdp/impdp utilities), and it can also use the first two
options.

The major con though with the first two options is forcing parallelization

--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:2>

Django

Jul 21, 2020, 2:18:16 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmad A. Hussein
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

Hi Ahmad. Yes: if you can get this going, super.

One thing that's been bugging me about #31169 is how slow the DB cloning
appears on Windows. (I need to measure exact times...) So if we can speed
that up, it would be a big win.

Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:3>

Django

Jul 21, 2020, 12:49:52 PM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmad A. Hussein
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Ahmad A. Hussein):

[https://github.com/django/django/pull/13217 PR]

Still needs more work

--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:4>

Django

Aug 1, 2023, 8:01:58 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: (none)
Type: New feature | Status: new
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: Ahmad A. Hussein => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:5>

Django

Oct 12, 2024, 4:11:54 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ahmed Ibrahim):

* owner: (none) => Ahmed Ibrahim
* status: new => assigned

Comment:

I'm taking this one!
--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:6>

Django

Oct 14, 2024, 10:49:56 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ahmed Ibrahim):

* has_patch: 0 => 1
* needs_tests: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:7>

Django

Mar 17, 2025, 6:49:33 PM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ahmed Ibrahim):

* needs_tests: 1 => 0

--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:8>

Django

Apr 27, 2025, 7:42:57 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by David Smith):

* needs_better_patch: 0 => 1

Comment:

The proposed patch currently crashes on Windows and PostgreSQL.
--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:9>

Django

Jul 17, 2025, 12:47:54 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Ahmed Ibrahim):

Replying to [comment:9 David Smith]:
> The proposed patch currently crashes on Windows and PostgreSQL.

Thanks for your response. They actually implemented this a while ago and
it was merged, so I'm not sure what to do with this ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:10>

Django

Jul 17, 2025, 11:24:00 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

There might be some confusion here Ahmed, the patch David is referring to
can be found
[https://github.com/django/django/pull/18668#issuecomment-2833414363 here]
and hasn't been merged yet.
--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:11>

Django

Jul 20, 2025, 10:20:25 AM
to django-...@googlegroups.com
#31804: Parallelize database cloning process
-------------------------------------+-------------------------------------
Reporter: Ahmad A. Hussein | Owner: Ahmed Ibrahim
Type: New feature | Status: assigned
Component: Database layer (models, ORM) | Version: dev
Severity: Normal | Resolution:
Keywords: parallel, mysqlpump | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Ahmed Ibrahim):

Replying to [comment:11 Simon Charette]:
> There might be some confusion here Ahmed, the patch David is referring
> to can be found
> [https://github.com/django/django/pull/18668#issuecomment-2833414363 here]
> and hasn't been merged yet.

Thank you for elaborating. I mixed things up: yes, this patch is authored
by me, but I don't know what made me believe that it's now redundant. I
will try to fix the issues reported, and I would appreciate any review.
--
Ticket URL: <https://code.djangoproject.com/ticket/31804#comment:12>