I'm not sure whether this should be fixed in this module, or perhaps in
sax/saxutils, or somewhere like django.utils.encoding.force_text.
At the moment I'm working around this issue with a regex which replaces
this range of chars.
[1] http://www.w3.org/TR/REC-xml/#charsets
--
Ticket URL: <https://code.djangoproject.com/ticket/24985>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
New description:
I have some data from log files that I'd like to put into an RSS feed.
Unfortunately, due to the nature of this data, it sometimes contains
control characters (e.g. \0001, \0003). This causes the feed to fail RSS
feed reader validation because these characters, although valid UTF-8, are
not allowed in XML (1).
I'm not sure whether this should be fixed in this module, or perhaps in
sax/saxutils, or somewhere like django.utils.encoding.force_text.
At the moment I'm working around this issue with a regex which replaces
this range of chars.
(1) http://www.w3.org/TR/REC-xml/#charsets
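The regex itself is not shown in the report. A minimal sketch of such a workaround, assuming the goal is simply to drop the code points that the XML 1.0 character range in (1) excludes, might look like this. The name `strip_xml_illegal` is made up for illustration and is not part of Django or the standard library.

```python
import re

# Code points that XML 1.0 forbids: C0 controls other than tab (\x09),
# line feed (\x0A) and carriage return (\x0D), plus U+FFFE and U+FFFF.
# Lone surrogates are also forbidden but are left out of this sketch.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_xml_illegal(text, replacement=""):
    """Return text with every XML 1.0-illegal character replaced.

    Hypothetical helper: by default the offending characters are removed.
    """
    return _XML_ILLEGAL.sub(replacement, text)


# Tab, LF and CR survive; \x01 and \x03 are dropped.
print(repr(strip_xml_illegal("job\x01 started\x03\n")))
```

Passing a different `replacement` (for example `"?"`) keeps a visible marker where a character was removed, which may be preferable for log data.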
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:1>
* stage: Unreviewed => Accepted
* type: Bug => New feature
Comment:
We could check whether other web frameworks perform sanitization or make
alternate recommendations. If we don't make a change in Django, we could
at least update the docs to note the requirement to sanitize your own
input and recommend how to do so.
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:2>
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:3>
Comment (by claudep):
#20197 is similar but targets XML serialization with `dumpdata`. I just
added a patch in that ticket to loudly fail instead of silently producing
invalid XML. Automatic sanitization is tricky because, depending on the use
case, you might want to remove the offending chars, replace them with some
alternative coding, or simply fix the source.
The patch for #20197 also affects RSS production, as the same
`django.utils.xmlutils.SimplerXMLGenerator` is used. If it gets committed,
we might want to add a similar admonition in syndication docs.
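To illustrate the difference between silently producing invalid XML and failing loudly, here is a rough sketch built on the standard library's `xml.sax.saxutils.XMLGenerator`. It is not the patch from #20197: `StrictXMLGenerator` and `render_title` are hypothetical names, and the choice of `ValueError` is an assumption taken from the exception mentioned later in this thread.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# C0 control characters that XML 1.0 does not allow in character data.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


class StrictXMLGenerator(XMLGenerator):
    """Hypothetical generator that refuses control characters in text."""

    def characters(self, content):
        if content and _CONTROL_CHARS.search(content):
            # Fail loudly rather than write a document parsers will reject.
            raise ValueError("Control characters are not supported in XML 1.0")
        super().characters(content)


def render_title(generator_class, text):
    """Serialize <title>text</title> with the given generator class."""
    stream = io.StringIO()
    gen = generator_class(stream, encoding="utf-8")
    gen.startElement("title", {})
    gen.characters(text)
    gen.endElement("title")
    return stream.getvalue()


# The stock generator passes the control character through unchanged.
assert "\x01" in render_title(XMLGenerator, "boot\x01ok")

# The strict variant raises instead of emitting invalid XML.
try:
    render_title(StrictXMLGenerator, "boot\x01ok")
except ValueError as exc:
    print("refused:", exc)
```

Raising leaves the choice between removing, re-encoding, or fixing the source to the caller, which matches the concern that no single automatic strategy fits every use case.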
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:4>
Comment (by claudep):
Proposal for a documentation addition:
{{{
#!diff
diff --git a/docs/ref/contrib/syndication.txt b/docs/ref/contrib/syndication.txt
index 6c86be0..940123c 100644
--- a/docs/ref/contrib/syndication.txt
+++ b/docs/ref/contrib/syndication.txt
@@ -919,7 +919,10 @@ They share this interface:
     ``self.feed`` for use with `custom feed generators`_.
 
     All parameters should be Unicode objects, except ``categories``, which
-    should be a sequence of Unicode objects.
+    should be a sequence of Unicode objects. Beware that some control characters
+    are `not allowed <http://www.w3.org/International/questions/qa-controls>`_
+    in XML documents. If your content has some of them, you might encounter a
+    :exp:`ValueError` when producing the feed.
 
 :meth:`.SyndicationFeed.add_item`
     Add an item to the feed with the given parameters.
}}}
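For feed authors who would rather keep a trace of the offending characters than hit the error, one option is to clean item fields before they reach the feed. The sketch below replaces each control character with a visible escape. `escape_control_chars` and `clean_item` are hypothetical helpers, the URL is a placeholder, and the final comment only suggests how the result might be handed to `add_item`.

```python
import re

# C0 control characters that XML 1.0 does not allow in character data.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def escape_control_chars(text):
    """Replace each control character with a visible form such as \\x01."""
    return _CONTROL_CHARS.sub(lambda m: "\\x%02x" % ord(m.group()), text)


def clean_item(item):
    """Return a copy of item with every string value escaped.

    Non-string values (dates, sequences, None) are passed through
    untouched, so a ``categories`` sequence would need its own handling.
    """
    return {
        key: escape_control_chars(value) if isinstance(value, str) else value
        for key, value in item.items()
    }


raw = {
    "title": "worker\x01 restarted",
    "link": "http://example.com/log/1",  # placeholder URL
    "description": "exit code\x03 0",
}
item = clean_item(raw)
# The title now holds a literal backslash sequence, not a control character.
print(item["title"])
# With Django installed, the cleaned mapping could then be passed on,
# e.g. feed.add_item(**item) on a django.utils.feedgenerator feed object.
```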
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:5>
* type: New feature => Cleanup/optimization
* stage: Accepted => Ready for checkin
* component: contrib.syndication => Documentation
Comment:
exp -> exc, otherwise looks good.
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:6>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"1c90a3dccadc7d2da3704ff17ac9ff1a67743934" 1c90a3dc]:
{{{
#!CommitTicketReference repository="" revision="1c90a3dccadc7d2da3704ff17ac9ff1a67743934"
Fixed #24985 -- Added note about possible invalid feed content
Thanks Michael Wood for the report and Tim Graham for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:7>