[Django] #19237: The use of > in single or double quoted attributes in strip_tags

17 views
Skip to first unread message

Django

unread,
Nov 4, 2012, 1:37:38 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+--------------------
Reporter: chris.khoo@… | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Easy pickings: 1 | UI/UX: 0
-------------------------------+--------------------
At the moment, if you run django.utils.html.strip_tags with '>' in single
or double quoted attributes, it won't work.

For example:
In [2]: strip_tags('Some <p
onclick="$(\'#id\').html(\'<i>input</i>\')">text</p> here')
Out[2]: u'Some input\')">text here'

Should return "Some text here"

Already patched it with tests in my github repo - khoomeister/django but
would like to go through the proper processes!

Chris

--
Ticket URL: <https://code.djangoproject.com/ticket/19237>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 4, 2012, 1:39:49 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+--------------------------------------

Reporter: chris.khoo@… | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by khoomeister):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:1>

Django

unread,
Nov 4, 2012, 1:40:23 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+--------------------------------------
Reporter: chris.khoo@… | Owner: anonymous
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by anonymous):

* status: new => assigned
* owner: nobody => anonymous


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:2>

Django

unread,
Nov 4, 2012, 1:41:52 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner: Chris
Type: Uncategorized | Khoo <chris.khoo@…>
Component: Uncategorized | Status: new
Severity: Normal | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage:
Needs tests: 0 | Unreviewed
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Chris Khoo <chris.khoo@…>):

* owner: anonymous => Chris Khoo <chris.khoo@…>
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:3>

Django

unread,
Nov 4, 2012, 1:42:10 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+--------------------------------------
Reporter: chris.khoo@… | Owner: anonymous
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0
-------------------------------+--------------------------------------

Changes (by Chris Khoo <chris.khoo@…>):

* owner: Chris Khoo <chris.khoo@…> => anonymous


* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:4>

Django

unread,
Nov 4, 2012, 2:07:28 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister

Type: Uncategorized | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by khoomeister):

* owner: anonymous => khoomeister


* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:5>

Django

unread,
Nov 4, 2012, 2:07:43 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by khoomeister):

* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:6>

Django

unread,
Nov 4, 2012, 2:08:20 AM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------+---------------------------------------

Comment (by khoomeister):

Doh - sorry, this is my first time using this!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:7>

Django

unread,
Nov 4, 2012, 12:44:14 PM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: assigned
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
------------------------------+---------------------------------------
Changes (by claudep):

* type: Uncategorized => Bug
* component: Uncategorized => Core (Other)
* stage: Unreviewed => Accepted


Comment:

Link to commit:
https://github.com/khoomeister/django/commit/605277ee32f60689e9a82a2e99767aef449c98ec

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:8>

Django

unread,
Nov 4, 2012, 8:03:02 PM11/4/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: assigned
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by khoomeister):

Current code only works with Python 2.7 - updated it to work with Python
2.5 & 2.6

Link to commit:
https://github.com/khoomeister/django/commit/da1ba5fd79c28f0643a1eb0ce18feb51dfe4a499

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:9>

Django

unread,
Nov 17, 2012, 8:22:04 AM11/17/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: assigned
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
------------------------------+---------------------------------------
Changes (by thiderman):

* cc: thiderman (added)


Comment:

I can confirm that this works and the tests pass on Python 2.6.5.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:10>

Django

unread,
Nov 17, 2012, 12:15:25 PM11/17/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Core (Other) | Status: assigned

Severity: Normal | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:11>

Django

unread,
Nov 24, 2012, 6:20:13 AM11/24/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Core (Other) | Status: closed
Severity: Normal | Version: master
Keywords: | Resolution: fixed

Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"bf1871d874a371ad0ae6c7e098e7665a468dca16"]:
{{{
#!CommitTicketReference repository=""
revision="bf1871d874a371ad0ae6c7e098e7665a468dca16"
Fixed #19237 -- Improved strip_tags utility

The previous pattern didn't properly addressed cases where '>'
was present inside quoted tag content.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:12>

Django

unread,
Nov 24, 2012, 6:22:22 AM11/24/12
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Core (Other) | Status: closed
Severity: Normal | Version: master
Keywords: | Resolution: fixed
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"9efe1a7210ee161d5688f66a759bcd8d89d33142"]:
{{{
#!CommitTicketReference repository=""
revision="9efe1a7210ee161d5688f66a759bcd8d89d33142"
[1.5.x] Fixed #19237 -- Improved strip_tags utility

The previous pattern didn't properly addressed cases where '>'
was present inside quoted tag content.

Backport of bf1871d87 from master.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:13>

Django

unread,
Feb 4, 2013, 6:29:03 AM2/4/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Core (Other) | Status: new

Severity: Normal | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Pablo Recio <pablo.recioquijano@…>):

* status: closed => new
* resolution: fixed =>


Comment:

I just ran out a really basic case where this fails:

{{{
>>> from django.utils.html import strip_tags
>>> strip_tags('<strong>foo</strong><a href="http://example.com">bar</a>')
u'bar'
}}}

But the result should be 'foobar'. I think this should be fixed for the
1.5 release, should I mark it as blocker?

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:14>

Django

unread,
Feb 4, 2013, 12:10:35 PM2/4/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new

Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 0 => 1
* severity: Normal => Release blocker
* easy: 1 => 0
* stage: Ready for checkin => Accepted


Comment:

Yes, either we quickly find the flaw in the new regex or we should revert
the commit.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:15>

Django

unread,
Feb 4, 2013, 12:14:00 PM2/4/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 1 => 0


Comment:

I may have found it. Regex gurus needed :-)

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:16>

Django

unread,
Feb 4, 2013, 1:21:30 PM2/4/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Pablo Recio <pablo.recioquijano@…>):

I'm not a regex guru, but it seems good for me. Although, I feel that this
stuff should be either reverted or adding a really big batch of tests for
releasing this into 1.5. Just my humble thought.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:17>

Django

unread,
Feb 6, 2013, 12:16:11 PM2/6/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

It seems the current agreement among Django devs is to revert the change
for 1.5 and switch to a non-regex technique for 1.6 (or maybe fallback to
regex when content is not valid HTML).

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:18>

Django

unread,
Feb 6, 2013, 3:20:11 PM2/6/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"20ac33100cd20c17d1ceab075d96698974cc4778"]:
{{{
#!CommitTicketReference repository=""
revision="20ac33100cd20c17d1ceab075d96698974cc4778"
Partially revert 9efe1a721, strip_tags improvements

The new regex seems not stable enough for being released. Stripping
with regex might need reevaluation for the next release.
Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:19>

Django

unread,
Feb 6, 2013, 3:23:57 PM2/6/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e"]:
{{{
#!CommitTicketReference repository=""
revision="d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e"
Improved regex in strip_tags

Thanks Pablo Recio for the report. Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:20>

Django

unread,
Feb 6, 2013, 3:27:08 PM2/6/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------
Changes (by claudep):

* severity: Release blocker => Normal


Comment:

So I reverted the fix in 1.5, and commit my proposed fix in 1.6.

Surely, the next step is to provide more tests. I have no religion about
the use of regex or an HTML parser to strip tags, only tests will convince
me!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:21>

Django

unread,
Feb 8, 2013, 3:23:14 AM2/8/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"20ac33100cd20c17d1ceab075d96698974cc4778"]:


{{{
#!CommitTicketReference repository=""
revision="20ac33100cd20c17d1ceab075d96698974cc4778"
Partially revert 9efe1a721, strip_tags improvements

The new regex seems not stable enough for being released. Stripping
with regex might need reevaluation for the next release.
Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:22>

Django

unread,
Feb 19, 2013, 11:17:40 AM2/19/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by jacob):

FTR, the new regex committed in bf1871d874a371ad0ae6c7e098e7665a468dca16
has severe performance issues on longer strings, please consider this a
veto on re-committing it.

We should probably avoid using regexes here entirely; I think HTMLParser
might be a better choice.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:23>

Django

unread,
Feb 19, 2013, 11:18:31 AM2/19/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by jacob):

As does the "improved" version in
d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e; both are broken on larger
strings.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:24>

Django

unread,
Mar 18, 2013, 6:32:09 AM3/18/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------

Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by aaugustin):

* severity: Normal => Release blocker


Comment:

Reading the comment, it isn't clear to me that the regex currently used in
master doesn't suffer from performance issues. Marking as a release
blocker until this is sorted out.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:26>

Django

unread,
Apr 1, 2013, 10:52:52 AM4/1/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"a01361b5ae9df5220c0ce7b42c5ff36094d5718c"]:
{{{
#!CommitTicketReference repository=""
revision="a01361b5ae9df5220c0ce7b42c5ff36094d5718c"
Added more tests for strip_tags utility

Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:27>

Django

unread,
Apr 1, 2013, 10:59:24 AM4/1/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

I just committed tests that show the current implementation has apparently
no performance issues on longer content. Anyone is still free to try a
parser-based implementation, of course!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:28>

Django

unread,
Apr 3, 2013, 7:35:35 AM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by simon29):

Finally, after nearly two weeks I've found the reason why my servers kept
crashing on a high traffic, multi server production site :-(

This new regex is prone to catastrophic backtracking; which, when given
the right string, spins the CPU into an infinite loop. Performance issues
is an understatement. Sorry, but this regex literally crashes sites and
absolutely cannot be included.
http://www.regular-expressions.info/catastrophic.html
http://stackoverflow.com/questions/6230489/100-cpu-usage-with-a-regexp-
depending-on-input-length

We need to revert this change until we have a parser-based implementation.
I'll attach a sample "problem string" for y'all to play with.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:29>

Django

unread,
Apr 3, 2013, 7:50:38 AM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by simon29):

* cc: simon@… (added)


Comment:

I've attached a simple patch using HTMLParser from stdlib. It's the only
way, as any regex based solution won't properly support multiple attribute
tags anyway.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:30>

Django

unread,
Apr 3, 2013, 7:52:38 AM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by simon29):

* easy: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:31>

Django

unread,
Apr 3, 2013, 7:54:21 AM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

I wonder which code version you are using, because in your attached test
(problem_string.py), you are still using the regex which was clearly
broken and which has been fixed in [d7504a3d7b8645].

Thanks for your `HTMLParser`-based version, I will test it ASAP.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:32>

Django

unread,
Apr 3, 2013, 2:19:52 PM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 0 => 1

* easy: 1 => 0


Comment:

I just posted an improved version of Simon's patch. So far, I found one
important different behaviour with the HTMLParser-based solution:
unterminated/not-escaped `<` are removed with any following content. I'm
not sure that this is an acceptable regression.

Tests with both file-based content (84Kb and 8Kb) show that the parsing
version is somewhat slower than the regex one, but I think this remains
acceptable.

And FWIW, I'm still waiting to receive a string which really fails with
the current regex.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:33>

Django

unread,
Apr 3, 2013, 2:47:51 PM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 1 => 0


Comment:

I think I found a solution for unterminated/not-escaped `<` in that latest
patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:34>

Django

unread,
Apr 3, 2013, 5:17:05 PM4/3/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by simon29):

My mistake, I had the previous regex. The new one works flawlessly; and
yes, a compiled "C" regex will always outperform a python parser. Unless
we can find a string to break the new one I'm now leaning in favour of the
regex solution.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:35>

Django

unread,
May 21, 2013, 3:27:40 PM5/21/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: new
Severity: Release blocker | Version: master

Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* stage: Accepted => Ready for checkin


Comment:

The patch looks good to me.

Even though no one managed to break the current regex yet, it's a fragile
technique, and bugs could easily escalate into security issues. So let's
play safe!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:36>

Django

unread,
May 22, 2013, 11:34:34 AM5/22/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: closed

Severity: Release blocker | Version: master
Keywords: | Resolution: fixed

Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"dc51ec8bc214cf60ebb99732363624c23df8005f"]:
{{{
#!CommitTicketReference repository=""
revision="dc51ec8bc214cf60ebb99732363624c23df8005f"
Fixed #19237 -- Used HTML parser to strip tags

The regex method used until now for the strip_tags utility is fast,
but subject to flaws and security issues. Consensus and good
practice lead use to use a slower but safer method.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:37>

Django

unread,
May 22, 2013, 3:18:39 PM5/22/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: new

Severity: Release blocker | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* status: closed => new
* resolution: fixed =>


Comment:

HTMLParser behaviour changed between 2.6-2.7, 3.2-3.3. So now we have
probably no other choice than ignoring content after any unclosed tag.
That means the behaviour is changed comparing to the regex-based solution.

I'm joining the proposed fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:38>

Django

unread,
May 23, 2013, 4:21:33 AM5/23/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: new
Severity: Release blocker | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by claudep):

Another patch after discussing with Anssi on IRC. Backwards-incompatible
due to `<` and `>` encoding, but still safer and less risk of content
disappearing...

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:39>

Django

unread,
May 23, 2013, 8:02:03 AM5/23/13
to django-...@googlegroups.com
#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: closed

Severity: Release blocker | Version: master
Keywords: | Resolution: fixed

Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"b664cb818d2e5896df2763299ea2c61a9af069a8"]:
{{{
#!CommitTicketReference repository=""
revision="b664cb818d2e5896df2763299ea2c61a9af069a8"
Fixed #19237 (again) - Made strip_tags consistent between Python versions
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:40>

Reply all
Reply to author
Forward
0 new messages