[Django] #22088: XML deserializer strips leading whitespace on loaddata


Django

Feb 18, 2014, 5:29:24 PM
to django-...@googlegroups.com
#22088: XML deserializer strips leading whitespace on loaddata
--------------------------------------+---------------------------------
Reporter: Joseph-django@… | Owner: nobody
Type: Bug | Status: new
Component: Core (Serialization) | Version: 1.6
Severity: Normal | Keywords: xml deserialization
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+---------------------------------
If an object instance has a character field and the value of that field
starts with the tab character, loaddata removes that tab character when
the loaded fixture is in XML format.

Note that the XML dump data does not strip this leading tab character.
Also note that both the JSON dump and load data preserve the tab
character.


I have not tested this with other whitespace characters. This can be
easily reproduced by creating a simple model:


{{{
from django.db import models

class Foobar(models.Model):
    name = models.CharField(max_length=20)
}}}

Then create an instance of that model (e.g., in the Django shell) with a
name value such as `"\tBaz"`, and run manage.py dumpdata with
`--format=xml`.

Once the fixture has been generated, remove the existing instance (by
deleting it, flushing the app data, or your preferred method) and then
use manage.py loaddata to load the fixture. Note that the instance's
name no longer contains the tab character.

--
Ticket URL: <https://code.djangoproject.com/ticket/22088>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 1, 2014, 5:21:50 AM
to django-...@googlegroups.com
#22088: XML deserializer strips leading whitespace on loaddata
-------------------------------------+-------------------------------------
Reporter: Joseph-django@… | Owner: numerodix
Type: Bug | Status: assigned
Component: Core (Serialization) | Version: 1.6
Severity: Normal | Resolution:
Keywords: xml deserialization | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by numerodix):

* status: new => assigned
* cc: numerodix@… (added)
* needs_better_patch: => 0
* needs_tests: => 0
* owner: nobody => numerodix
* needs_docs: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:1>

Django

Mar 1, 2014, 5:28:54 AM
to django-...@googlegroups.com

Comment (by numerodix):

I can reproduce it.

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:2>

Django

Mar 1, 2014, 5:42:20 AM
to django-...@googlegroups.com

Comment (by numerodix):

Trying with a few special characters:

{{{#!python
Foobar.objects.create(name=' bar')
Foobar.objects.create(name='\abar')
Foobar.objects.create(name='\bbar')
Foobar.objects.create(name='\fbar')
Foobar.objects.create(name='\nbar')
Foobar.objects.create(name='\rbar')
Foobar.objects.create(name='\tbar')
Foobar.objects.create(name='\vbar')
}}}

XML pretty printer corrupts this completely:

{{{#!xml
<object pk="9" model="app.foobar">
<field type="CharField" name="name"> bar</field>
</object>
<object pk="10" model="app.foobar">
<field type="CharField" name="name">bar</field>
</object>
<object pk="11" model="app.foobar">
<field type="CharField" name="name"bar</field>
</object>
<object pk="12" model="app.foobar">
<field type="CharField" name="name">
bar</field>
</object>
<object pk="13" model="app.foobar">
<field type="CharField" name="name">
bar</field>
</object>
<object pk="14" model="app.foobar">
bar</field>eld type="CharField" name="name">
</object>
<object pk="15" model="app.foobar">
<field type="CharField" name="name"> bar</field>
</object>
<object pk="16" model="app.foobar">
<field type="CharField" name="name">
bar</field>
</object>
}}}

In terse mode it's more likely to be correct when loaded again, but
clearly this needs fixing:

{{{#!xml
<object pk="9" model="app.foobar"><field type="CharField" name="name">
bar</field></object><object pk="10" model="app.foobar"><field
type="CharField" name="name">bar</field></object><object pk="11"
model="app.foobar"><field type="CharField"
name="name"bar</field></object><object pk="12" model="app.foobar"><field
type="CharField" name="name">
bar</field></object><object pk="13"
model="app.foobar"><field type="CharField" name="name">
bar</field></object><object pk="15" model="app.foobar"><field
type="CharField" name="name"> bar</field></object><object pk="16"
model="app.foobar"><field type="CharField" name="name">
bar</field></object>
}}}

So that's dumpdata at fault, not loaddata.
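
For reference, a minimal sketch of which of the test characters above XML 1.0 permits at all. The character ranges come from the XML 1.0 `Char` production; the helper name `is_xml10_safe` is made up for illustration and is not part of Django. Note also that `\r` is legal, but parsers normalize line endings on input, so a literal carriage return comes back as `\n` unless it was written as a character reference.

```python
import re

# XML 1.0 "Char" production: tab, LF, CR, and everything from U+0020 up,
# excluding surrogates and U+FFFE/U+FFFF.
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml10_safe(text):
    """Return True if every character may appear literally in XML 1.0.

    Hypothetical helper for illustration only.
    """
    return _XML10_ILLEGAL.search(text) is None


# The same leading characters tried in the comment above.
samples = [" bar", "\abar", "\bbar", "\fbar", "\nbar", "\rbar", "\tbar", "\vbar"]
for value in samples:
    print(repr(value), is_xml10_safe(value))
```

Under this check only the space, `\n`, `\r` and `\t` variants are representable; `\a`, `\b`, `\f` and `\v` cannot appear in a well-formed XML 1.0 document in any form, so they would need some other encoding to round-trip.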

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:3>

Django

Mar 1, 2014, 6:04:12 AM
to django-...@googlegroups.com

Comment (by numerodix):

Another experiment: hand-edit the XML dump file using HTML escapes (1) to
see if loaddata will load it correctly. No, that doesn't work either.

So:
Problem 1: The XML serializers (SimplerXMLGenerator/pulldom) do not
round-trip these characters.
Problem 2: Even if they did, a tab character would be stripped due to:

{{{#!python
value = field.to_python(getInnerText(field_node).strip())
}}}

core.serializers.xml_serializer.py:214

(1) http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-users/200406.mbox/%3C40C5E61...@hotmail.com%3E
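
Problem 2 can be seen in isolation with the standard library alone. This is only a sketch: `inner_text` is a simplified stand-in written for this example, not Django's `getInnerText`, and the fixture fragment is hand-written.

```python
from xml.dom import minidom


def inner_text(node):
    """Concatenate the text and CDATA children of a DOM node.

    Simplified stand-in for illustration; not Django's implementation.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


# A field whose value starts with a real tab character.
doc = minidom.parseString('<field type="CharField" name="name">\tBaz</field>')
raw = inner_text(doc.documentElement)

print(repr(raw))          # the parser hands back the tab intact
print(repr(raw.strip()))  # the strip() call is what discards it
```

So for a plain tab the parser itself is not the lossy step; the `.strip()` applied to the inner text is.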

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:4>

Django

Mar 1, 2014, 6:22:57 AM
to django-...@googlegroups.com
#22088: XML deserializer strips leading whitespace on loaddata
-------------------------------------+-------------------------------------
Reporter: Joseph-django@… | Owner: numerodix
Type: Bug | Status: assigned
Component: Core (Serialization) | Version: 1.6
Severity: Normal | Resolution:
Keywords: xml deserialization | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by EvilDMP):

* stage: Unreviewed => Accepted


Comment:

Replying to [comment:3 numerodix]:

> {{{#!xml
> <object pk="11" model="app.foobar">
> <field type="CharField" name="name"bar</field>
> </object>
> }}}

Wow. If that's what you're getting in the dumped text file, that's
remarkable.

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:5>

Django

Mar 1, 2014, 7:14:27 AM
to django-...@googlegroups.com

Comment (by numerodix):

What we could do is try to wrap any special characters in a CDATA section.
So the xml would look like this:

{{{#!xml
<object pk="2" model="app.foobar"><field type="CharField"
name="name"><![CDATA[\t]]>bar</field></object>
}}}

The deserializer then gives us the tab character escaped:

{{{#!python
u'\\tbar'
}}}

So we'd have to strip the escape.

But this feels very ad-hoc and messy.
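
One possible explanation for the escaped result, assuming the CDATA section in the fixture held the two characters backslash and `t` rather than a real tab: XML has no backslash escapes, so the parser returns those two characters verbatim. The sketch below uses a made-up `inner_text` helper and the standard library parser, not Django's deserializer.

```python
from xml.dom import minidom


def inner_text(node):
    """Concatenate the text and CDATA children of a DOM node.

    Hypothetical helper for illustration only.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


# Raw string: the CDATA section contains a backslash followed by "t".
literal_escape = minidom.parseString(r"<field><![CDATA[\t]]>bar</field>")
# Normal string: the CDATA section contains an actual tab character.
real_tab = minidom.parseString("<field><![CDATA[\t]]>bar</field>")

escaped_value = inner_text(literal_escape.documentElement)
tab_value = inner_text(real_tab.documentElement)

print(repr(escaped_value))  # backslash and "t" survive as two characters
print(repr(tab_value))      # a real tab inside CDATA comes back as a tab
```

If that assumption holds, nothing would need unescaping as long as the serializer writes the real character inside the CDATA section.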

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:6>

Django

Mar 1, 2014, 8:35:16 AM
to django-...@googlegroups.com

Comment (by smeatonj):

Shouldn't all plain text (CharField/TextField) be wrapped in a CDATA
section, though? Then there's no need to look for special characters at
all. Are there any downsides to wrapping all text values in CDATA? I'm
unsure why the deserialiser would escape the tab (I haven't looked into
it), but that is a separate, though related, issue.

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:7>

Django

Mar 1, 2014, 9:21:37 AM
to django-...@googlegroups.com

Comment (by numerodix):

@smeatonj, I think it probably should, yes. The downside would be that
you're adding 12 bytes (`<![CDATA[` plus `]]>`) to every value even
though most strings would not need it.

One could optimize that by wrapping only strings that need to be wrapped,
according to a character range or something like that.

This serializer is also used by the syndication feed framework btw. It
would be nice to fix the problem in both places at once.
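
A rough sketch of the "wrap only when needed" idea, under stated assumptions: the rule in `needs_cdata` (leading or trailing whitespace) is just one possible criterion, and all three function names are invented for this example rather than taken from Django. Two caveats apply either way. CDATA does not make characters such as `\b` legal in XML 1.0, and wrapping alone would not help while the deserializer still calls `.strip()` on the inner text.

```python
from xml.sax.saxutils import escape

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def needs_cdata(text):
    """Hypothetical rule: wrap only values with leading/trailing whitespace."""
    return text != text.strip()


def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    safe = text.replace("]]>", "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def serialize_value(text):
    """Return the XML text for a field value (illustrative only)."""
    # Values that do not need wrapping are escaped the usual way.
    return to_cdata(text) if needs_cdata(text) else escape(text)


print(serialize_value("\tbar"))          # wrapped, 12 characters of overhead
print(serialize_value("plain <value>"))  # escaped, no CDATA overhead
```

Most values would then take the plain escaping path and pay no extra bytes.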

--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:8>

Django

Feb 28, 2021, 10:30:36 AM
to django-...@googlegroups.com
#22088: XML deserializer strips leading whitespace on loaddata
--------------------------------------+------------------------------------
Reporter: Joseph-django@… | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: 1.6
Severity: Normal | Resolution:
Keywords: xml deserialization | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Jacob Walls):

* owner: Martin Matusiak => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/22088#comment:9>
