[Django] #24985: Rss201rev2Feed invalid characters in character data for RSS


Django

Jun 15, 2015, 12:20:58 PM
to django-...@googlegroups.com
#24985: Rss201rev2Feed invalid characters in character data for RSS
-------------------------------------+--------------------
Reporter: michaelgwood | Owner: nobody
Type: Bug | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+--------------------
I have some data from log files that I'd like to put into an RSS feed.
Unfortunately, due to the nature of this data, it sometimes contains
control characters, e.g. \0001 \0003. This causes the feed to fail RSS
feed reader validation because these characters, although valid UTF-8,
are not allowed in XML [1].

I'm not sure whether this should be fixed in this module, or perhaps in
sax/saxutils or somewhere like django.utils.encoding.force_text.

At the moment I'm working around this issue with a regex which replaces
this range of chars.
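
A minimal sketch of what such a regex workaround could look like, assuming
Python 3 and the character ranges from the XML 1.0 "Char" production linked
below. The names `_INVALID_XML_CHARS` and `strip_invalid_xml_chars` are
hypothetical; this is not the reporter's actual regex.

```python
import re

# Characters outside the XML 1.0 "Char" production: C0 controls other than
# tab, LF and CR, lone surrogates, and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Return `text` with XML-invalid characters replaced by `replacement`."""
    return _INVALID_XML_CHARS.sub(replacement, text)
```

The helper would be applied to each title or description before it is handed
to the feed, for example `strip_invalid_xml_chars(log_line)`.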

[1] http://www.w3.org/TR/REC-xml/#charsets

--
Ticket URL: <https://code.djangoproject.com/ticket/24985>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jun 15, 2015, 12:21:57 PM
to django-...@googlegroups.com
#24985: Rss201rev2Feed invalid characters in character data for RSS
-------------------------------------+-------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: Bug | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by michaelgwood):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


New description:

I have some data from log files that I'd like to put into an RSS feed.
Unfortunately, due to the nature of this data, it sometimes contains
control characters, e.g. \0001 \0003. This causes the feed to fail RSS
feed reader validation because these characters, although valid UTF-8,
are not allowed in XML (1).

I'm not sure whether this should be fixed in this module, or perhaps in
sax/saxutils or somewhere like django.utils.encoding.force_text.

At the moment I'm working around this issue with a regex which replaces
this range of chars.

(1) http://www.w3.org/TR/REC-xml/#charsets

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:1>

Django

Jun 15, 2015, 12:46:32 PM
to django-...@googlegroups.com
#24985: Provide a way to sanitize invalid characters from Rss201rev2Feed
-------------------------------------+------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: New feature | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+------------------------------------
Changes (by timgraham):

* stage: Unreviewed => Accepted
* type: Bug => New feature


Comment:

We could look and see if other web frameworks perform sanitization or make
alternate recommendations. If we don't make a change in Django, we could
at least update the docs to note the requirement to sanitize your own
input and recommend how to do so.

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:2>

Django

Jun 15, 2015, 2:14:34 PM
to django-...@googlegroups.com
#24985: Provide a way to sanitize invalid characters from Rss201rev2Feed
-------------------------------------+------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: New feature | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+------------------------------------

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:3>

Django

Jun 19, 2015, 3:13:19 AM
to django-...@googlegroups.com
#24985: Provide a way to sanitize invalid characters from Rss201rev2Feed
-------------------------------------+------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: New feature | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+------------------------------------

Comment (by claudep):

#20197 is similar but targets XML serialization with `dumpdata`. I just
added a patch to that ticket that fails loudly instead of silently
producing invalid XML. Automatic sanitization is tricky because, depending
on the use case, you might want to remove the offending characters, replace
them with some alternative encoding, or simply fix the source.
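
To illustrate why no single automatic behavior fits every case, here is a
rough sketch of the first two options (removing versus re-encoding). The
function names and the `\xNN` escape format are assumptions made for this
example, not part of any Django API.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def remove_controls(text):
    """Option 1: silently drop the offending characters."""
    return _CONTROL_CHARS.sub("", text)


def escape_controls(text):
    """Option 2: re-encode each offending character as visible text."""
    return _CONTROL_CHARS.sub(
        lambda match: "\\x%02x" % ord(match.group()), text
    )
```

Dropping loses information that may matter in log data, while escaping keeps
it at the cost of changing how the text reads, so the choice depends on the
consumer of the feed.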

The patch for #20197 also affects RSS production, as the same
`django.utils.xmlutils.SimplerXMLGenerator` is used. If it gets committed,
we might want to add a similar admonition in syndication docs.
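
The "fail loudly" idea can be sketched with the standard library alone. This
is not the patch from #20197 or Django's `SimplerXMLGenerator`; the class
`StrictXMLGenerator` and the helper `render_title` are hypothetical names
used only to show the behavior.

```python
import re
from io import StringIO
from xml.sax.saxutils import XMLGenerator

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


class StrictXMLGenerator(XMLGenerator):
    """Hypothetical generator that refuses to write control characters."""

    def characters(self, content):
        if content and _CONTROL_CHARS.search(content):
            # Raise instead of silently emitting invalid XML.
            raise ValueError("Control characters are not allowed in XML 1.0")
        super().characters(content)


def render_title(title):
    """Serialize a single <title> element and return it as a string."""
    out = StringIO()
    handler = StrictXMLGenerator(out, "utf-8")
    handler.startDocument()
    handler.startElement("title", {})
    handler.characters(title)
    handler.endElement("title")
    handler.endDocument()
    return out.getvalue()
```

With this sketch, `render_title("bad\x01")` raises `ValueError`, which leaves
the decision about removing or replacing the characters to the caller.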

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:4>

Django

Jun 19, 2015, 5:53:25 PM
to django-...@googlegroups.com
#24985: Provide a way to sanitize invalid characters from Rss201rev2Feed
-------------------------------------+------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: New feature | Status: new
Component: contrib.syndication | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+------------------------------------

Comment (by claudep):

Proposal for a documentation addition:

{{{
#!diff
diff --git a/docs/ref/contrib/syndication.txt b/docs/ref/contrib/syndication.txt
index 6c86be0..940123c 100644
--- a/docs/ref/contrib/syndication.txt
+++ b/docs/ref/contrib/syndication.txt
@@ -919,7 +919,10 @@ They share this interface:
     ``self.feed`` for use with `custom feed generators`_.

     All parameters should be Unicode objects, except ``categories``, which
-    should be a sequence of Unicode objects.
+    should be a sequence of Unicode objects. Beware that some control characters
+    are `not allowed <http://www.w3.org/International/questions/qa-controls>`_
+    in XML documents. If your content has some of them, you might encounter a
+    :exp:`ValueError` when producing the feed.

 :meth:`.SyndicationFeed.add_item`
     Add an item to the feed with the given parameters.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:5>

Django

Jun 20, 2015, 7:53:05 PM
to django-...@googlegroups.com
#24985: Warn about invalid RSS characters in syndication docs
-------------------------------------+-------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for checkin
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* type: New feature => Cleanup/optimization
* stage: Accepted => Ready for checkin
* component: contrib.syndication => Documentation


Comment:

The role should be `:exc:`, not `:exp:`; otherwise looks good.

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:6>

Django

Jun 21, 2015, 2:55:09 PM
to django-...@googlegroups.com
#24985: Warn about invalid RSS characters in syndication docs
-------------------------------------+-------------------------------------
Reporter: michaelgwood | Owner: nobody
Type: Cleanup/optimization | Status: closed
Component: Documentation | Version: 1.7
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"1c90a3dccadc7d2da3704ff17ac9ff1a67743934" 1c90a3dc]:
{{{
#!CommitTicketReference repository="" revision="1c90a3dccadc7d2da3704ff17ac9ff1a67743934"
Fixed #24985 -- Added note about possible invalid feed content

Thanks Michael Wood for the report and Tim Graham for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24985#comment:7>
