[Django] #30190: json lines (jsonl) serializer


Django

Feb 17, 2019, 8:37:40 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
------------------------------------------------+------------------------
Reporter: aliva | Owner: nobody
Type: New feature | Status: new
Component: Core (Serialization) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
I have created a patch to add a jsonl serializer/deserializer alongside
the json one.

The issue with the json serializer is that, when using dumpdata/loaddata,
the json deserializer tries to load the whole file, which causes a
MemoryError. With jsonl, loaddata reads the file line by line, so you
won't run out of memory.
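
The difference can be sketched in a few lines of Python. This is only an illustration of line-by-line reading, not the code in the attached patch; the `iter_jsonl` helper and the sample records (model name, fields) are made up for the example.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one decoded object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            # Only the current line is parsed and held in memory.
            yield json.loads(line)


# Hypothetical two-record fixture; the model and field names are made up.
sample = io.StringIO(
    '{"model": "app.item", "pk": 1, "fields": {"name": "a"}}\n'
    '{"model": "app.item", "pk": 2, "fields": {"name": "b"}}\n'
)
pks = [record["pk"] for record in iter_jsonl(sample)]
```

By contrast, `json.load()` on a single top-level array has to build the whole list before the first record can be processed.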

Note that I will add docs once I get the green light for this patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Feb 17, 2019, 8:38:47 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------

Reporter: aliva | Owner: nobody
Type: New feature | Status: new
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:

Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aliva):

* Attachment "patch.diff" added.

patch 1

Django

Feb 18, 2019, 4:57:10 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody
Type: New feature | Status: closed

Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo

Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => needsinfo


Comment:

Hi. Thanks for the report.

This seems like a reasonable idea for a serializer. It also seems, though,
perfectly suited to being wrapped up as a third-party application. In
general we tend to prefer those, rather than increasing the surface area of
the code in the core framework. As such, we'd need to see whether there is
a consensus on the mailing list to add the code. I wouldn't be surprised if
a third-party app already existed, so we should look into that too.

You'd need to add test cases as well as docs. I'd suggest putting that
together as a third-party app on GitHub, say, and then seeing whether there
is a willingness to move it into core, if that's what you want to do at
that point.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:1>

Django

Feb 18, 2019, 11:59:50 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

See #22259 where you'll find a link to an existing third-party library.

Personally I would be open to integrating this into Django, as several
tickets were opened in the past about memory issues with big files.
However, this should indeed be discussed on the django-developers mailing
list first.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:2>

Django

Mar 5, 2019, 3:07:03 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Ian Foote):

I've been using the library linked in the other ticket with good success.
I'd be in favour of adding something to core.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:3>

Django

Mar 5, 2019, 11:51:50 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Adam (Chainz) Johnson):

+1 to add to core from me too.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:4>

Django

Mar 6, 2019, 5:47:36 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aliva):

Hi

I have started writing tests for jsonl (I simply copied `test_json` to
`test_jsonl`) and am trying to fix the issues. There is one problem I'm
facing: with jsonl, output lines can be very long, so what is the preferred
way to get the test data past flake8?

* Should I add `NOQA: E501` to these lines?
* Or use this style:

{{{
a = """{"key": "value", """ \
    """"key2": "value2"}"""
}}}

* Or use this style:

{{{
a = """{
"key": "value",
"key2": "value2"
}""".replace("\n", "")
}}}

I have also added patch 2, which is a work in progress; some tests still
fail.

As an example, here is some test data from test_json.py:

{{{
mapping_ordering_str = """[
{
"model": "serializers.article",
"pk": %(article_pk)s,
"fields": {
"author": %(author_pk)s,
"headline": "Poker has no place on ESPN",
"pub_date": "2006-06-16T11:00:00",
"categories": [
%(first_category_pk)s,
%(second_category_pk)s
],
"meta_data": []
}
}
]
"""
}}}

It should be like this for jsonl

{{{
mapping_ordering_str = """{"model": "serializers.article", "pk": %(article_pk)s, "fields": {"author": %(author_pk)s, "headline": "Poker has no place on ESPN", "pub_date": "2006-06-16T11:00:00", "categories": [%(first_category_pk)s, %(second_category_pk)s], "meta_data": []}}\n"""
}}}
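
The relationship between the two fixtures above can be sketched as follows. This is an illustration only, not part of the patch: the `to_jsonl` helper is hypothetical, and the fixture is trimmed and uses placeholder pks instead of the `%(...)s` substitutions.

```python
import json

# Trimmed, illustrative fixture; pk values are placeholders.
json_fixture = """[
{
  "model": "serializers.article",
  "pk": 1,
  "fields": {
    "headline": "Poker has no place on ESPN",
    "categories": [1, 2],
    "meta_data": []
  }
}
]"""


def to_jsonl(json_text):
    """Convert a JSON array of objects into JSON Lines text."""
    return "".join(json.dumps(obj) + "\n" for obj in json.loads(json_text))


jsonl_fixture = to_jsonl(json_fixture)
```

Each object ends up on exactly one line, which is why the expected strings in the jsonl tests run well past the usual line-length limit.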

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:5>

Django

Mar 6, 2019, 5:48:21 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aliva):

* Attachment "patch-2-wip.diff" added.

patch 2 WIP

Django

Nov 26, 2019, 3:00:42 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Francisco Javier):

Django core MUST solve its own problems itself, rather than relying on a
third-party solution.

If dumpdata/loaddata is broken for very big datasets, the issue must be
addressed and solved as soon as possible.

In my particular case, I have an 800 MB SQLite3 database and want to
migrate to PostgreSQL. The dumpdata/loaddata combo is the only
straightforward way to do it.

If django-mljson (or the patch submitted in this ticket) solves the
MemoryError that loaddata raises on big datasets, Django core must
integrate mljson (or the submitted patch) as the default
serializer/deserializer for the dumpdata/loaddata process.

My issue is currently solved by using django-mljson, but I lost 2 days
figuring out 'what the heck' was going wrong.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:6>

Django

Nov 26, 2019, 3:02:17 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody
Type: Bug | Status: new

Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Francisco Javier):

* cc: Francisco Javier (added)
* type: New feature => Bug
* status: closed => new
* resolution: needsinfo =>
* needs_tests: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:7>

Django

Nov 26, 2019, 3:17:30 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody
Type: New feature | Status: closed

Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo

Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* status: new => closed
* resolution: => needsinfo

* type: Bug => New feature


Comment:

Please don't reopen closed tickets. This ticket should be discussed on the
DevelopersMailingList before reopening. We can reconsider it when we reach
a strong consensus on the mailing list.

Also please try to be more polite, your comment sounds like a demand. This
is not a good way to be heard.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:8>

Django

Jan 16, 2020, 10:54:53 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by aliva):

Hi!

I have discussed this on the
[https://groups.google.com/forum/#!topic/django-developers/MMm1AGS2Ibg mailing list]
and here is my [https://code.djangoproject.com/ticket/30190 PR].

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:9>

Django

May 26, 2020, 6:17:07 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
--------------------------------------+------------------------------------

Reporter: aliva | Owner: nobody
Type: New feature | Status: new
Component: Core (Serialization) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

OK, thanks for updating the ticket here.

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:11>

Django

Jun 16, 2020, 10:42:48 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------

Reporter: aliva | Owner: nobody
Type: New feature | Status: new
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:12>

Django

Jun 16, 2020, 10:52:18 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody
Type: New feature | Status: closed

Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by GitHub <noreply@…>):

* status: new => closed

* resolution: => fixed


Comment:

In [changeset:"e29637681be07606674cdccb47d1e53acb930f5b" e2963768]:
{{{
#!CommitTicketReference repository=""
revision="e29637681be07606674cdccb47d1e53acb930f5b"
Fixed #30190 -- Added JSONL serializer.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:13>

Django

Jun 17, 2020, 1:59:57 AM
to django-...@googlegroups.com
#30190: json lines (jsonl) serializer
-------------------------------------+-------------------------------------
Reporter: aliva | Owner: nobody

Type: New feature | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by GitHub <noreply@…>):

In [changeset:"78c811334c9e5477b86fd113fa2c4a261e167d15" 78c81133]:
{{{
#!CommitTicketReference repository=""
revision="78c811334c9e5477b86fd113fa2c4a261e167d15"
Refs #30190 -- Minor edits to JSONL serializer.

Follow up to e29637681be07606674cdccb47d1e53acb930f5b.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:14>
