[Django] #25401: django.utils.html.strip_tags can insert spurious semicolons


Django

Sep 14, 2015, 1:01:29 PM
to django-...@googlegroups.com
#25401: django.utils.html.strip_tags can insert spurious semicolons
-----------------------------+--------------------
Reporter: jbaldivieso | Owner: nobody
Type: Bug | Status: new
Component: Utilities | Version: 1.8
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+--------------------
In limited circumstances, strip_tags mangles legitimate text by inserting a
semicolon before an underscore. It happens when the input contains both a
bare ampersand and a tag.

{{{
>>> from django.utils.html import strip_tags

>>> # Good
>>> strip_tags("&first_name")
'&first_name'

>>> # Good
>>> strip_tags("first_name<br>")
u'first_name'

>>> # Bad: semicolon introduced before underscore
>>> strip_tags("&first_name<br>")
u'&first;_name'

}}}

Our use case is drafting rich emails in Markdown; perfectly safe Markdown
URLs with query strings can get mangled by this bug.

--
Ticket URL: <https://code.djangoproject.com/ticket/25401>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Sep 14, 2015, 1:08:26 PM
to django-...@googlegroups.com
#25401: django.utils.html.strip_tags can insert spurious semicolons
-----------------------------+------------------------------------
Reporter: jbaldivieso | Owner: nobody
Type: Bug | Status: new
Component: Utilities | Version: 1.8
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+------------------------------------
Changes (by bak1an):

* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* needs_tests: => 0
* needs_docs: => 0


Comment:

{{{
>>> import django
>>> from django.utils.html import strip_tags


>>> strip_tags("&first_name<br>")
u'&first;_name'

>>> django.get_version()
'1.9.dev20150914162508'
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:1>

Django

Sep 14, 2015, 1:09:39 PM
to django-...@googlegroups.com

Comment (by bak1an):

Can be reproduced on 1.8.x as well.

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:2>

Django

Sep 14, 2015, 1:18:46 PM
to django-...@googlegroups.com

Comment (by timgraham):

I haven't looked into this in detail, but I'm not sure this is something
we should try to fix. It seems to me the original string isn't valid HTML
(the ampersand isn't properly escaped).

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:3>

Django

Sep 14, 2015, 2:47:56 PM
to django-...@googlegroups.com
#25401: django.utils.html.strip_tags can insert spurious semicolons
-----------------------------+------------------------------------
Reporter: jbaldivieso | Owner: nobody
Type: Bug | Status: closed
Component: Utilities | Version: 1.8
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+------------------------------------
Changes (by claudep):

* status: new => closed
* resolution: => wontfix


Comment:

The `strip_tags` documentation now points to the `bleach` Python library
for a "more robust solution".
{{{
>>> import bleach
>>> bleach.clean("&first_name<br>", strip=True)
u'&amp;first_name'
}}}

If you have a not-too-hairy patch that would improve `strip_tags`, it
might be accepted (reopen the ticket in that case), but we are not aiming
for perfect output from this utility.

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:4>

Django

Sep 14, 2015, 2:51:37 PM
to django-...@googlegroups.com

Comment (by bak1an):

Indeed, the failing example is not valid HTML, so there is not much
sense in trying to guarantee any particular behaviour in cases like
this.

The results come directly from Python's HTMLParser. In this particular
case it recognizes '&first' as a character reference. In the first
case ({{{"&first_name"}}}) the entire string is treated as plain text
without being parsed at all, because it contains no tags.
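
The parser behaviour described above can be reproduced without Django. What follows is a minimal sketch, not Django's actual code: `EntityEchoParser` and `strip_once` are hypothetical names, and `convert_charrefs=False` is passed on the assumption that this runs on Python 3, where the default setting would otherwise bypass `handle_entityref`. Unlike `strip_tags`, the sketch always runs the parser, so it does not model the no-tags case.

```python
from html.parser import HTMLParser


class EntityEchoParser(HTMLParser):
    """Hypothetical stand-in for a tag stripper; not Django's MLStripper."""

    def __init__(self):
        # Assumption: Python 3. With convert_charrefs=False the parser
        # reports entity references through handle_entityref().
        super().__init__(convert_charrefs=False)
        self.fed = []

    def handle_data(self, data):
        self.fed.append(data)

    def handle_entityref(self, name):
        # Only the name is passed in, so the semicolon is always re-added,
        # whether or not the input actually had one.
        self.fed.append("&%s;" % name)

    def handle_charref(self, name):
        self.fed.append("&#%s;" % name)

    def get_data(self):
        return "".join(self.fed)


def strip_once(value):
    """Feed value through the parser and return the collected text."""
    parser = EntityEchoParser()
    parser.feed(value)
    parser.close()
    return parser.get_data()


print(strip_once("&first_name<br>"))      # &first;_name
print(strip_once("&amp;first_name<br>"))  # &amp;first_name
```

With the ampersand escaped as `&amp;`, the same sketch leaves the text intact, which matches the point that the original string is not valid HTML.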

I'm closing this ticket as invalid since after a closer review it does not
look like something that should be addressed within Django.

jbaldivieso, you can try the {{{bleach}}} library
(http://bleach.readthedocs.org/en/latest/); combining its {{{clean}}} and
{{{linkify}}} functions might resolve your Markdown processing issues.
You can also (at your own risk) try to hack Django's
{{{MLStripper.handle_entityref}}}
(https://github.com/django/django/blob/master/django/utils/html.py#L135).
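
One possible shape for such a hack, sketched on a standalone parser rather than on Django's `MLStripper`: re-add the semicolon only when the name is a known HTML entity. The names `LenientEntityParser` and `strip_once_lenient` are hypothetical, and the choice of `html.entities.name2codepoint` as the lookup table is an assumption made for illustration (Python 3 assumed).

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class LenientEntityParser(HTMLParser):
    """Hypothetical sketch; not part of Django."""

    def __init__(self):
        # Assumption: Python 3, with entity callbacks enabled.
        super().__init__(convert_charrefs=False)
        self.fed = []

    def handle_data(self, data):
        self.fed.append(data)

    def handle_entityref(self, name):
        if name in name2codepoint:
            # Known entity name: emit it as a proper reference.
            self.fed.append("&%s;" % name)
        else:
            # Unknown name: treat the ampersand as literal text.
            self.fed.append("&%s" % name)

    def handle_charref(self, name):
        self.fed.append("&#%s;" % name)

    def get_data(self):
        return "".join(self.fed)


def strip_once_lenient(value):
    """Strip tags from value, leaving unknown '&name' runs untouched."""
    parser = LenientEntityParser()
    parser.feed(value)
    parser.close()
    return parser.get_data()


print(strip_once_lenient("&first_name<br>"))           # &first_name
print(strip_once_lenient("a?x=1&first_name=Bob<br>"))  # a?x=1&first_name=Bob
```

This is only a heuristic: a query parameter whose name happens to be a real entity name (for example `&copy=1`) would still gain a semicolon.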


Of course, you can reopen this if you have use cases where Django's
{{{strip_tags}}} misbehaves with valid HTML.

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:5>

Django

Sep 14, 2015, 2:52:20 PM
to django-...@googlegroups.com

Comment (by bak1an):

claudep has faster typing skills.

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:6>

Django

Sep 14, 2015, 3:22:38 PM
to django-...@googlegroups.com

Comment (by claudep):

At least, we concur :-)

--
Ticket URL: <https://code.djangoproject.com/ticket/25401#comment:7>
