With ModelBulkProcessMixin, we can minimize memory usage.
Without ModelBulkProcessMixin, we must either hold the whole list in memory
(up to 100_000 objects)
or manually cap the list at a batch size such as 10_000:
    if len(chunked_list) >= 10_000:
        Model.objects.bulk_create(chunked_list)
        chunked_list.clear()
and then flush whatever remains in the list at the end:
    if len(chunked_list) > 0:
        Model.objects.bulk_create(chunked_list)
"""
names = [f"name-{num}" for num in range(100_000)]
with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))
self.assertEqual(100_000, BulkyModel.objects.all().count())
}}}
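The manual chunking described in the docstring above can also be written against a lazy iterable, so that neither the names nor the instances are all held in memory at once. The following is a minimal sketch, not code from the patch: `iter_batches` is a hypothetical helper, and `save_batch` is a stand-in for `Model.objects.bulk_create` so the example runs without a database.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for Model.objects.bulk_create (assumption: no database here).
saved_batch_sizes = []


def save_batch(batch):
    saved_batch_sizes.append(len(batch))


# A generator expression: names are produced lazily, not as a 100_000-item list.
names = (f"name-{num}" for num in range(100_000))
for batch in iter_batches(names, 10_000):
    save_batch(batch)
```

Because the final, possibly shorter batch is yielded by the same loop, no separate "flush the remainder" step is needed.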
--
Ticket URL: <https://code.djangoproject.com/ticket/33636>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* owner: nobody => Myung Eui Yoon
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:1>
* has_patch: 0 => 1
New description:
{{{
"""
class BulkyModel(models.Model, ModelBulkProcessMixin):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=10, null=False)

With ModelBulkProcessMixin, we can minimize memory usage.
Without ModelBulkProcessMixin, we must either hold the whole list in memory
(up to 100_000 objects)
or manually cap the list at a batch size such as 10_000:
    if len(chunked_list) >= 10_000:
        Model.objects.bulk_create(chunked_list)
        chunked_list.clear()
and then flush whatever remains in the list at the end:
    if len(chunked_list) > 0:
        Model.objects.bulk_create(chunked_list)
"""
names = [f"name-{num}" for num in range(100_000)]
with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))
self.assertEqual(100_000, BulkyModel.objects.all().count())
}}}
https://github.com/django/django/pull/15577
--
Comment:
Issued Pull Request
https://github.com/django/django/pull/15577
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:2>
New description:
{{{
class BulkyModel(models.Model, ModelBulkProcessMixin):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=10, null=False)

names = [f"name-{num}" for num in range(100_100_000)]

# Case: raw bulk_create
# Every instance must be held in one big list, which can lead to an OOM error.
objs = [BulkyModel(name=name) for name in names]
BulkyModel.objects.bulk_create(objs)

# Case: chunked bulk_create
objs = []
for name in names:
    objs.append(BulkyModel(name=name))
    if len(objs) >= 10_000:
        BulkyModel.objects.bulk_create(objs)
        objs.clear()
# Flush whatever remains after the loop.
if objs:
    BulkyModel.objects.bulk_create(objs)
    objs.clear()

# Case: with ModelBulkProcessMixin
with BulkyModel.gen_bulk_create(batch_size=10_000) as bulk:
    for name in names:
        bulk.add(BulkyModel(name=name))
}}}
https://github.com/django/django/pull/15577
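For readers who want to see the shape of such a context manager without opening the pull request, here is a minimal sketch. It is not the implementation from the patch: `BulkBuffer` and its `flush` argument are hypothetical names, and the flush target is a plain list so the example runs without Django.

```python
class BulkBuffer:
    """Collect objects and hand them to `flush` in batches of `batch_size`."""

    def __init__(self, flush, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush = flush
        self._batch_size = batch_size
        self._pending = []

    def add(self, obj):
        self._pending.append(obj)
        if len(self._pending) >= self._batch_size:
            self._drain()

    def _drain(self):
        if self._pending:
            # Pass a copy so the callee may keep the list after ours is cleared.
            self._flush(list(self._pending))
            self._pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Write the remainder only on a clean exit; on error it is dropped.
        if exc_type is None:
            self._drain()
        return False  # never suppress exceptions


# Stand-in for BulkyModel.objects.bulk_create (assumption: no database here).
flushed = []
with BulkBuffer(flush=flushed.append, batch_size=3) as bulk:
    for num in range(7):
        bulk.add(f"name-{num}")
```

With Django, `flush` would be something like `BulkyModel.objects.bulk_create`. Whether the remainder should still be written when the block raises is a design choice; this sketch drops it.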
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:3>
New description:
New feature: a convenient context manager for bulk create/update.
Just inherit from ModelBulkProcessMixin in the Model class.
https://github.com/django/django/pull/15577
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:4>
New description:
https://github.com/django/django/pull/15578
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:5>
* status: assigned => closed
* resolution: => wontfix
Comment:
Thanks for this patch; however, I don't see how this (quite complicated)
implementation optimizes creating objects in bulk in a significant way.
It's also not something that needs to be built into Django itself. It
sounds like a third-party package is the best way to proceed.
Please [https://docs.djangoproject.com/en/stable/internals/contributing/triaging-tickets/#closing-tickets
follow the triaging guidelines with regards to wontfix tickets] and take
the idea to DevelopersMailingList to reach a wider audience and see what
others think.
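
The memory side of this discussion can be checked without Django. The sketch below is an illustration under stated assumptions, not a benchmark of the patch: `Row` is a hypothetical stand-in for a model instance, `clear()` stands in for a flush to the database, and only Python-side allocations are measured with the standard `tracemalloc` module.

```python
import tracemalloc


class Row:
    """Stand-in for a model instance; no Django or database involved."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


def peak_bytes(func):
    """Return the peak traced allocation, in bytes, while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_all(total=50_000):
    # Every instance exists at once, as with a single bulk_create(objs) call.
    objs = [Row(f"name-{num}") for num in range(total)]
    return len(objs)


def build_batched(total=50_000, batch_size=1_000):
    # At most batch_size instances exist at once; clear() stands in for a flush.
    objs = []
    for num in range(total):
        objs.append(Row(f"name-{num}"))
        if len(objs) >= batch_size:
            objs.clear()
    objs.clear()


peak_all = peak_bytes(build_all)
peak_batched = peak_bytes(build_batched)
print(f"all at once: {peak_all} bytes, batched: {peak_batched} bytes")
```

This says nothing about query count or insert speed; it only shows how much memory the pending instances occupy under each approach.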
--
Ticket URL: <https://code.djangoproject.com/ticket/33636#comment:6>