[Django] #33636: BulkProcessMixin on models.Model


Django

Apr 11, 2022, 9:53:25 AM
to django-...@googlegroups.com
#33636: BulkProcessMixin on models.Model
-------------------------------------+-------------------------------------
Reporter: Myung Eui | Owner: nobody
Yoon |
Type: New | Status: new
feature |
Component: Database | Version: 4.0
layer (models, ORM) |
Severity: Normal | Keywords: bulk, model
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
{{{
"""
class BulkyModel(models.Model, ModelBulkProcessMixin):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=10, null=False)

With ModelBulkProcessMixin, we can minimize memory usage.
Without ModelBulkProcessMixin, we must either hold a list of up to
100_000 objects in memory, or manually keep the list no larger than a
batch_size such as 10_000:

    if len(chunked_list) >= 10_000:
        Model.objects.bulk_create(chunked_list)
        chunked_list.clear()

and then check for any remaining objects in the list at the end:

    if len(chunked_list) > 0:
        Model.objects.bulk_create(chunked_list)
"""

names = [f"name-{num}" for num in range(100_000)]

with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))

self.assertEqual(100_000, BulkyModel.objects.all().count())
}}}
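The ticket text does not show how `gen_bulk_create` works internally. As a rough, framework-agnostic sketch of the idea it describes (buffer objects, write them every `batch_size` items, and write the remainder when the `with` block ends), here is a hypothetical `BulkBuffer` class. The class name and the choice to take a `flush` callable instead of a model are assumptions made for illustration, not the PR's API.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BulkBuffer(Generic[T]):
    """Hypothetical buffer (not the PR's code): collects objects and passes them
    to `flush` in lists of at most `batch_size`, so only one batch is held at a time."""

    def __init__(self, flush: Callable[[List[T]], object], batch_size: int = 10_000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._write = flush
        self._batch_size = batch_size
        self._pending: List[T] = []

    def add(self, obj: T) -> None:
        self._pending.append(obj)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is pending; a no-op when the buffer is empty."""
        if self._pending:
            # Swap in a fresh list so the callee owns the batch it receives.
            batch, self._pending = self._pending, []
            self._write(batch)

    def __enter__(self) -> "BulkBuffer[T]":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Normal exit: write the final, possibly short, batch.
            self.flush()
        else:
            # Error exit: drop unwritten objects and let the exception propagate.
            self._pending.clear()
        return False


if __name__ == "__main__":
    batches = []
    with BulkBuffer(batches.append, batch_size=3) as bulk:
        for n in range(7):
            bulk.add(n)
    print([len(b) for b in batches])  # [3, 3, 1]
```

With Django, `flush` could plausibly be `BulkyModel.objects.bulk_create`, or `lambda batch: BulkyModel.objects.bulk_update(batch, ["name"])` for updates. Discarding the partial batch on an exception is only one possible design; flushing it anyway is another, and wrapping the block in `transaction.atomic()` is the usual way to get all-or-nothing behavior.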

--
Ticket URL: <https://code.djangoproject.com/ticket/33636>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Apr 11, 2022, 9:55:47 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Myung Eui Yoon):

* owner: nobody => Myung Eui Yoon
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:1>

Django

Apr 11, 2022, 10:07:00 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Myung Eui Yoon):

* has_patch: 0 => 1


New description:

{{{
"""
class BulkyModel(models.Model, ModelBulkProcessMixin):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=10, null=False)

With ModelBulkProcessMixin, we can minimize memory usage.
Without ModelBulkProcessMixin, we must either hold a list of up to
100_000 objects in memory, or manually keep the list no larger than a
batch_size such as 10_000:

    if len(chunked_list) >= 10_000:
        Model.objects.bulk_create(chunked_list)
        chunked_list.clear()

and then check for any remaining objects in the list at the end:

    if len(chunked_list) > 0:
        Model.objects.bulk_create(chunked_list)
"""

names = [f"name-{num}" for num in range(100_000)]

with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))

self.assertEqual(100_000, BulkyModel.objects.all().count())
}}}

https://github.com/django/django/pull/15577

--

Comment:

Opened a pull request:
https://github.com/django/django/pull/15577

--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:2>

Django

Apr 11, 2022, 10:17:01 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Myung Eui Yoon:

New description:

{{{
class BulkyModel(models.Model, ModelBulkProcessMixin):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=10, null=False)

names = [f"name-{num}" for num in range(100_100_000)]

# Case: raw bulk_create
objs = [BulkyModel(name=name) for name in names]
# Every object must be held in memory at once, which can lead to an
# out-of-memory (OOM) error.
BulkyModel.objects.bulk_create(objs)

# Case: chunked bulk_create
objs = []
for name in names:
    objs.append(BulkyModel(name=name))
    if len(objs) >= 10_000:
        BulkyModel.objects.bulk_create(objs)
        objs.clear()
if objs:
    BulkyModel.objects.bulk_create(objs)
    objs.clear()

# Case: with ModelBulkProcessMixin
with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))
}}}


https://github.com/django/django/pull/15577
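The chunked case can also be written without a separate leftover check by pulling fixed-size slices from a lazy iterator with `itertools.islice`; Django's `bulk_create` documentation shows a similar loop for inserting objects from a generator. The helper below, `bulk_create_in_batches`, is hypothetical, and it takes the write function as a parameter (an assumption made so the sketch runs without Django).

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def bulk_create_in_batches(
    create: Callable[[List[T]], object],
    objs: Iterable[T],
    batch_size: int = 10_000,
) -> int:
    """Hypothetical helper: pass `objs` to `create` in lists of at most
    `batch_size`, consuming the iterable lazily. Returns the number of objects."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(objs)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # iterator exhausted; the last short batch was already sent
            break
        create(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    sizes = []
    count = bulk_create_in_batches(lambda batch: sizes.append(len(batch)), range(25), batch_size=10)
    print(count, sizes)  # 25 [10, 10, 5]
```

With Django it could presumably be called as `bulk_create_in_batches(BulkyModel.objects.bulk_create, (BulkyModel(name=f"name-{num}") for num in range(100_100_000)))`. Passing a generator expression instead of the prebuilt `names` list means that neither all names nor all model instances have to exist in memory at once.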

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:3>

Django

Apr 11, 2022, 10:19:48 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Myung Eui Yoon:

New description:

New feature: a convenient context manager for bulk create/update.
Just have the Model class inherit from ModelBulkProcessMixin.


https://github.com/django/django/pull/15577

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:4>

Django

Apr 11, 2022, 10:52:48 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Myung Eui Yoon:

New description:


https://github.com/django/django/pull/15578

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:5>

Django

Apr 11, 2022, 11:47:07 AM
to django-...@googlegroups.com
-------------------------------------+-------------------------------------
Reporter: Myung Eui Yoon | Owner: Myung Eui
| Yoon
Type: New feature | Status: closed
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: bulk, model | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: assigned => closed
* resolution: => wontfix


Comment:

Thanks for this patch; however, I don't see how this (quite complicated)
implementation optimizes creating objects in bulk in a significant way.
It's also not something that needs to be built into Django itself. It
sounds like a third-party package is the best way to proceed.

Please [https://docs.djangoproject.com/en/stable/internals/contributing/triaging-tickets/#closing-tickets
follow the triaging guidelines with regard to wontfix tickets] and take
the idea to DevelopersMailingList to reach a wider audience and see what
others think.

--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:6>
