The issue with the json serializer is that, when using dumpdata/loaddata, the
json deserializer tries to load the whole file at once, which causes a
MemoryError on large files. With jsonl, loaddata reads the file line by line,
so you won't run out of memory.
Note that I will add docs once I get the green light for this patch.
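The difference can be sketched in plain Python, independent of the patch: the JSON path has to parse the whole document before it can return anything, while the JSONL path parses one line at a time. The names `load_json_array` and `iter_jsonl` are hypothetical and are not part of Django or of the attached patch.

```python
import io
import json


def load_json_array(stream):
    """Parse a whole JSON array at once; memory grows with the file size."""
    return json.load(stream)


def iter_jsonl(stream):
    """Yield one object per non-empty line; only one line is parsed at a time."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample records, serialized in both formats.
records = [{"model": "app.item", "pk": pk, "fields": {}} for pk in range(3)]
json_text = json.dumps(records)
jsonl_text = "".join(json.dumps(r) + "\n" for r in records)

# Both formats carry the same objects; only the reading strategy differs.
assert load_json_array(io.StringIO(json_text)) == records
assert list(iter_jsonl(io.StringIO(jsonl_text))) == records
```

Because `iter_jsonl` is a generator, a caller can save each object as it arrives instead of holding the full list in memory.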
--
Ticket URL: <https://code.djangoproject.com/ticket/30190>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* Attachment "patch.diff" added.
patch 1
* status: new => closed
* resolution: => needsinfo
Comment:
Hi. Thanks for the report.
This seems like a reasonable idea for a serializer. It also seems, though,
perfectly suited to be wrapped up as a third-party application. In general
we tend to prefer those, rather than increasing the surface area of the
code in the core framework. As such, we'd need to see if there was a
consensus on the mailing list to add the code. I wouldn't be surprised if a
third-party app already existed, so we should look into that too.
You'd need to add test cases as well as docs. I'd suggest putting that
together as a third-party app on GitHub, say, and then seeing if there was
a willingness to move that into core, if that's what you want to do at
that point.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:1>
Comment (by Claude Paroz):
See #22259 where you'll find a link to an existing third-party library.
Personally I would be open to integrating this into Django, as several
tickets were opened in the past about memory issues with big files. However,
this should indeed be discussed on the django-developers mailing list
first.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:2>
Comment (by Ian Foote):
I've been using the library linked in the other ticket with good success.
I'd be in favour of adding something to core.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:3>
Comment (by Adam (Chainz) Johnson):
+1 to add to core from me too.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:4>
Comment (by aliva):
Hi
I have started writing tests for jsonl (I simply copied `test_json` to
`test_jsonl`) and am fixing the issues as they come up. One problem I'm
facing: with jsonl, output lines can be very long, so what is the preferred
way to get the test data past flake8?
* should I add `NOQA: E501` for these lines?
* use this style
{{{
a = """{"key": "value",""" \
""""key2": "value"}"""
}}}
* use this style
{{{
a = """{
"key": "value",
"key2": "value2"
}""".replace("\n", "")
}}}
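A further option, sketched here as an assumption and not something the patch uses: rely on Python's implicit concatenation of adjacent string literals inside parentheses, which keeps each source line short while the resulting value is still a single JSONL line. The field values are made-up sample data.

```python
import json

# Adjacent string literals inside parentheses are joined by the compiler,
# so no source line needs to exceed the flake8 line-length limit.
line = (
    '{"model": "serializers.article", "pk": 1, '
    '"fields": {"headline": "Poker has no place on ESPN"}}\n'
)

# The value is one JSONL record: a single line ending in a newline.
record = json.loads(line)
assert record["pk"] == 1
```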
I have also added patch 2, which is a work in progress; some tests still fail.
As an example, here is some test data from test_json.py:
{{{
mapping_ordering_str = """[
{
"model": "serializers.article",
"pk": %(article_pk)s,
"fields": {
"author": %(author_pk)s,
"headline": "Poker has no place on ESPN",
"pub_date": "2006-06-16T11:00:00",
"categories": [
%(first_category_pk)s,
%(second_category_pk)s
],
"meta_data": []
}
}
]
"""
}}}
It should be like this for jsonl
{{{
mapping_ordering_str = """{"model": "serializers.article", "pk": %(article_pk)s, "fields": {"author": %(author_pk)s, "headline": "Poker has no place on ESPN", "pub_date": "2006-06-16T11:00:00", "categories": [%(first_category_pk)s, %(second_category_pk)s], "meta_data": []}}\n"""
}}}
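One way to avoid hand-writing such long lines, sketched here as an assumption rather than what the patch does: keep the readable JSON array and derive the JSONL text from it. The helper `json_array_to_jsonl` is hypothetical, and the primary keys are made-up sample values standing in for the `%(...)s` placeholders.

```python
import json


def json_array_to_jsonl(json_text):
    """Turn a JSON array of objects into JSONL: one compact object per line."""
    objects = json.loads(json_text)
    return "".join(json.dumps(obj) + "\n" for obj in objects)


pretty = """[
{
  "model": "serializers.article",
  "pk": 1,
  "fields": {"author": 2, "headline": "Poker has no place on ESPN"}
}
]"""

jsonl = json_array_to_jsonl(pretty)
print(jsonl, end="")
```

Key order is preserved because `json.loads` builds dicts in document order and `json.dumps` writes them back in insertion order.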
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:5>
* Attachment "patch-2-wip.diff" added.
patch 2 WIP
Comment (by Francisco Javier):
Django core MUST solve its own problems itself, and not rely on
a third-party solution.
If dumpdata/loaddata is broken for very big datasets, the issue must be
addressed and solved as soon as possible.
In my particular case, I have an 800 MB SQLite3 database and want to
migrate to PostgreSQL. The dumpdata/loaddata combo is the only
straightforward way to do it.
If django-mljson (or the patch submitted in this ticket) solves the
MemoryError in loaddata for big datasets, Django core must integrate
mljson (or the submitted patch) as the default serializer/deserializer for
the dumpdata/loaddata process.
Actually, my issue is solved by using django-mljson, but I lost 2 days
figuring out 'what the heck' was going wrong.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:6>
* cc: Francisco Javier (added)
* type: New feature => Bug
* status: closed => new
* resolution: needsinfo =>
* needs_tests: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:7>
* status: new => closed
* resolution: => needsinfo
* type: Bug => New feature
Comment:
Please don't reopen closed tickets. This ticket should be discussed on the
DevelopersMailingList before reopening. We can reconsider it when we reach
a strong consensus on the mailing list.
Also, please try to be more polite; your comment sounds like a demand. This
is not a good way to be heard.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:8>
Comment (by aliva):
Hi!
I have talked about it on
[https://groups.google.com/forum/#!topic/django-developers/MMm1AGS2Ibg mailing list]
and here is my [https://code.djangoproject.com/ticket/30190 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:9>
* stage: Unreviewed => Accepted
Comment:
OK, thanks for updating the ticket here.
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:11>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:12>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"e29637681be07606674cdccb47d1e53acb930f5b" e2963768]:
{{{
#!CommitTicketReference repository=""
revision="e29637681be07606674cdccb47d1e53acb930f5b"
Fixed #30190 -- Added JSONL serializer.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:13>
Comment (by GitHub <noreply@…>):
In [changeset:"78c811334c9e5477b86fd113fa2c4a261e167d15" 78c81133]:
{{{
#!CommitTicketReference repository=""
revision="78c811334c9e5477b86fd113fa2c4a261e167d15"
Refs #30190 -- Minor edits to JSONL serializer.
Follow up to e29637681be07606674cdccb47d1e53acb930f5b.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/30190#comment:14>