[Django] #19237: The use of > in single or double quoted attributes in strip

Django

unread,

Nov 4, 2012, 1:37:38 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------+--------------------
Reporter: chris.khoo@… | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Easy pickings: 1 | UI/UX: 0
-------------------------------+--------------------
At the moment, if you run django.utils.html.strip_tags with '>' in single
or double quoted attributes, it won't work.

For example:
In [2]: strip_tags('Some input\')">text here')
Out[2]: u'Some input\')">text here'

Should return "Some text here"

Already patched it with tests in my github repo - khoomeister/django but
would like to go through the proper processes!

Chris

--
Ticket URL: <https://code.djangoproject.com/ticket/19237>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Nov 4, 2012, 1:39:49 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+--------------------------------------

Reporter: chris.khoo@… | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+--------------------------------------
Changes (by khoomeister):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:1>

Django

unread,

Nov 4, 2012, 1:40:23 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+--------------------------------------
Reporter: chris.khoo@… | Owner: anonymous
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+--------------------------------------
Changes (by anonymous):

* status: new => assigned
* owner: nobody => anonymous

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:2>

Django

unread,

Nov 4, 2012, 1:41:52 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner: Chris
Type: Uncategorized | Khoo <chris.khoo@…>
Component: Uncategorized | Status: new
Severity: Normal | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage:
Needs tests: 0 | Unreviewed
Easy pickings: 1 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Chris Khoo <chris.khoo@…>):

* owner: anonymous => Chris Khoo <chris.khoo@…>
* status: assigned => new

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:3>

Django

unread,

Nov 4, 2012, 1:42:10 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+--------------------------------------
Reporter: chris.khoo@… | Owner: anonymous
Type: Uncategorized | Status: assigned

Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+--------------------------------------

Changes (by Chris Khoo <chris.khoo@…>):

* owner: Chris Khoo <chris.khoo@…> => anonymous

* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:4>

Django

unread,

Nov 4, 2012, 2:07:28 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister

Type: Uncategorized | Status: new
Component: Uncategorized | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+---------------------------------------
Changes (by khoomeister):

* owner: anonymous => khoomeister

* status: assigned => new

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:5>

Django

unread,

Nov 4, 2012, 2:07:43 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister

Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+---------------------------------------
Changes (by khoomeister):

* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:6>

Django

unread,

Nov 4, 2012, 2:08:20 AM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Uncategorized | Status: assigned

Component: Uncategorized | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

-------------------------------+---------------------------------------

Comment (by khoomeister):

Doh - sorry, this is my first time using this!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:7>

Django

unread,

Nov 4, 2012, 12:44:14 PM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister

Type: Bug | Status: assigned
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0

------------------------------+---------------------------------------
Changes (by claudep):

* type: Uncategorized => Bug
* component: Uncategorized => Core (Other)
* stage: Unreviewed => Accepted

Comment:

Link to commit:
https://github.com/khoomeister/django/commit/605277ee32f60689e9a82a2e99767aef449c98ec

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:8>

Django

unread,

Nov 4, 2012, 8:03:02 PM11/4/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

Easy pickings: 1 | UI/UX: 0

------------------------------+---------------------------------------

Comment (by khoomeister):

Current code only works with Python 2.7 - updated it to work with Python
2.5 & 2.6

Link to commit:
https://github.com/khoomeister/django/commit/da1ba5fd79c28f0643a1eb0ce18feb51dfe4a499

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:9>

Django

unread,

Nov 17, 2012, 8:22:04 AM11/17/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

Easy pickings: 1 | UI/UX: 0

------------------------------+---------------------------------------
Changes (by thiderman):

* cc: thiderman (added)

Comment:

I can confirm that this works and the tests pass on Python 2.6.5.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:10>

Django

unread,

Nov 17, 2012, 12:15:25 PM11/17/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Core (Other) | Status: assigned

Severity: Normal | Version: master
Keywords: | Resolution:

* stage: Accepted => Ready for checkin

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:11>

Django

unread,

Nov 24, 2012, 6:20:13 AM11/24/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Core (Other) | Status: closed
Severity: Normal | Version: master
Keywords: | Resolution: fixed

Changes (by Claude Paroz <claude@…>):

* status: assigned => closed
* resolution: => fixed

Comment:

In [changeset:"bf1871d874a371ad0ae6c7e098e7665a468dca16"]:
{{{
#!CommitTicketReference repository=""
revision="bf1871d874a371ad0ae6c7e098e7665a468dca16"
Fixed #19237 -- Improved strip_tags utility

The previous pattern didn't properly addressed cases where '>'
was present inside quoted tag content.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:12>

Django

unread,

Nov 24, 2012, 6:22:22 AM11/24/12

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

Comment (by Claude Paroz <claude@…>):

In [changeset:"9efe1a7210ee161d5688f66a759bcd8d89d33142"]:
{{{
#!CommitTicketReference repository=""
revision="9efe1a7210ee161d5688f66a759bcd8d89d33142"
[1.5.x] Fixed #19237 -- Improved strip_tags utility

The previous pattern didn't properly addressed cases where '>'
was present inside quoted tag content.

Backport of bf1871d87 from master.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:13>

Django

unread,

Feb 4, 2013, 6:29:03 AM2/4/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Core (Other) | Status: new

Severity: Normal | Version: master
Keywords: | Resolution:

Changes (by Pablo Recio <pablo.recioquijano@…>):

* status: closed => new
* resolution: fixed =>

Comment:

I just ran out a really basic case where this fails:

{{{
>>> from django.utils.html import strip_tags
>>> strip_tags('foo<a href="http://example.com">bar</a>')
u'bar'
}}}

But the result should be 'foobar'. I think this should be fixed for the
1.5 release, should I mark it as blocker?

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:14>

Django

unread,

Feb 4, 2013, 12:10:35 PM2/4/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new

Component: Core (Other) | Version: master

Severity: Release blocker | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 0 => 1
* severity: Normal => Release blocker
* easy: 1 => 0
* stage: Ready for checkin => Accepted

Comment:

Yes, either we quickly find the flaw in the new regex or we should revert
the commit.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:15>

Django

unread,

Feb 4, 2013, 12:14:00 PM2/4/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 1 => 0

Comment:

I may have found it. Regex gurus needed :-)

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:16>

Django

unread,

Feb 4, 2013, 1:21:30 PM2/4/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Pablo Recio <pablo.recioquijano@…>):

I'm not a regex guru, but it seems good for me. Although, I feel that this
stuff should be either reverted or adding a really big batch of tests for
releasing this into 1.5. Just my humble thought.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:17>

Django

unread,

Feb 6, 2013, 12:16:11 PM2/6/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

It seems the current agreement among Django devs is to revert the change
for 1.5 and switch to a non-regex technique for 1.6 (or maybe fallback to
regex when content is not valid HTML).

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:18>

Django

unread,

Feb 6, 2013, 3:20:11 PM2/6/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"20ac33100cd20c17d1ceab075d96698974cc4778"]:
{{{
#!CommitTicketReference repository=""
revision="20ac33100cd20c17d1ceab075d96698974cc4778"
Partially revert 9efe1a721, strip_tags improvements

The new regex seems not stable enough for being released. Stripping
with regex might need reevaluation for the next release.
Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:19>

Django

unread,

Feb 6, 2013, 3:23:57 PM2/6/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e"]:
{{{
#!CommitTicketReference repository=""
revision="d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e"
Improved regex in strip_tags

Thanks Pablo Recio for the report. Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:20>

Django

unread,

Feb 6, 2013, 3:27:08 PM2/6/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master

Severity: Normal | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0

------------------------------+---------------------------------------
Changes (by claudep):

* severity: Release blocker => Normal

Comment:

So I reverted the fix in 1.5, and commit my proposed fix in 1.6.

Surely, the next step is to provide more tests. I have no religion about
the use of regex or an HTML parser to strip tags, only tests will convince
me!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:21>

Django

unread,

Feb 8, 2013, 3:23:14 AM2/8/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"20ac33100cd20c17d1ceab075d96698974cc4778"]:

{{{
#!CommitTicketReference repository=""
revision="20ac33100cd20c17d1ceab075d96698974cc4778"
Partially revert 9efe1a721, strip_tags improvements

The new regex seems not stable enough for being released. Stripping
with regex might need reevaluation for the next release.
Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:22>

Django

unread,

Feb 19, 2013, 11:17:40 AM2/19/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by jacob):

FTR, the new regex committed in bf1871d874a371ad0ae6c7e098e7665a468dca16
has severe performance issues on longer strings, please consider this a
veto on re-committing it.

We should probably avoid using regexes here entirely; I think HTMLParser
might be a better choice.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:23>

Django

unread,

Feb 19, 2013, 11:18:31 AM2/19/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Core (Other) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+---------------------------------------

Comment (by jacob):

As does the "improved" version in
d7504a3d7b8645bdb979bab7ded0e9a9b6dccd0e; both are broken on larger
strings.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:24>

Django

unread,

Mar 18, 2013, 6:32:09 AM3/18/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

---------------------------------+---------------------------------------

Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new

Component: Utilities | Version: master
Severity: Release blocker | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0

---------------------------------+---------------------------------------
Changes (by aaugustin):

* severity: Normal => Release blocker

Comment:

Reading the comment, it isn't clear to me that the regex currently used in
master doesn't suffer from performance issues. Marking as a release
blocker until this is sorted out.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:26>

Django

unread,

Apr 1, 2013, 10:52:52 AM4/1/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by Claude Paroz <claude@…>):

In [changeset:"a01361b5ae9df5220c0ce7b42c5ff36094d5718c"]:
{{{
#!CommitTicketReference repository=""
revision="a01361b5ae9df5220c0ce7b42c5ff36094d5718c"
Added more tests for strip_tags utility

Refs #19237.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:27>

Django

unread,

Apr 1, 2013, 10:59:24 AM4/1/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

I just committed tests that show the current implementation has apparently
no performance issues on longer content. Anyone is still free to try a
parser-based implementation, of course!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:28>

Django

unread,

Apr 3, 2013, 7:35:35 AM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by simon29):

Finally, after nearly two weeks I've found the reason why my servers kept
crashing on a high traffic, multi server production site :-(

This new regex is prone to catastrophic backtracking; which, when given
the right string, spins the CPU into an infinite loop. Performance issues
is an understatement. Sorry, but this regex literally crashes sites and
absolutely cannot be included.
http://www.regular-expressions.info/catastrophic.html
http://stackoverflow.com/questions/6230489/100-cpu-usage-with-a-regexp-
depending-on-input-length

We need to revert this change until we have a parser-based implementation.
I'll attach a sample "problem string" for y'all to play with.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:29>

Django

unread,

Apr 3, 2013, 7:50:38 AM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Changes (by simon29):

* cc: simon@… (added)

Comment:

I've attached a simple patch using HTMLParser from stdlib. It's the only
way, as any regex based solution won't properly support multiple attribute
tags anyway.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:30>

Django

unread,

Apr 3, 2013, 7:52:38 AM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 1 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by simon29):

* easy: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:31>

Django

unread,

Apr 3, 2013, 7:54:21 AM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by claudep):

I wonder which code version you are using, because in your attached test
(problem_string.py), you are still using the regex which was clearly
broken and which has been fixed in [d7504a3d7b8645].

Thanks for your `HTMLParser`-based version, I will test it ASAP.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:32>

Django

unread,

Apr 3, 2013, 2:19:52 PM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 0 => 1

* easy: 1 => 0

Comment:

I just posted an improved version of Simon's patch. So far, I found one
important different behaviour with the HTMLParser-based solution:
unterminated/not-escaped `<` are removed with any following content. I'm
not sure that this is an acceptable regression.

Tests with both file-based content (84Kb and 8Kb) show that the parsing
version is somewhat slower than the regex one, but I think this remains
acceptable.

And FWIW, I'm still waiting to receive a string which really fails with
the current regex.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:33>

Django

unread,

Apr 3, 2013, 2:47:51 PM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------
Changes (by claudep):

* needs_better_patch: 1 => 0

Comment:

I think I found a solution for unterminated/not-escaped `<` in that latest
patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:34>

Django

unread,

Apr 3, 2013, 5:17:05 PM4/3/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
---------------------------------+---------------------------------------
Reporter: chris.khoo@… | Owner: khoomeister
Type: Bug | Status: new
Component: Utilities | Version: master
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+---------------------------------------

Comment (by simon29):

My mistake, I had the previous regex. The new one works flawlessly; and
yes, a compiled "C" regex will always outperform a python parser. Unless
we can find a string to break the new one I'm now leaning in favour of the
regex solution.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:35>

Django

unread,

May 21, 2013, 3:27:40 PM5/21/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags

-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Utilities | Status: new
Severity: Release blocker | Version: master

Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin

Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* stage: Accepted => Ready for checkin

Comment:

The patch looks good to me.

Even though no one managed to break the current regex yet, it's a fragile
technique, and bugs could easily escalate into security issues. So let's
play safe!

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:36>

Django

unread,

May 22, 2013, 11:34:34 AM5/22/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Utilities | Status: closed

Severity: Release blocker | Version: master

Keywords: | Resolution: fixed

Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed

Comment:

In [changeset:"dc51ec8bc214cf60ebb99732363624c23df8005f"]:
{{{
#!CommitTicketReference repository=""
revision="dc51ec8bc214cf60ebb99732363624c23df8005f"
Fixed #19237 -- Used HTML parser to strip tags

The regex method used until now for the strip_tags utility is fast,
but subject to flaws and security issues. Consensus and good
practice lead use to use a slower but safer method.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:37>

Django

unread,

May 22, 2013, 3:18:39 PM5/22/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Utilities | Status: new

Severity: Release blocker | Version: master
Keywords: | Resolution:

Changes (by claudep):

* status: closed => new
* resolution: fixed =>

Comment:

HTMLParser behaviour changed between 2.6-2.7, 3.2-3.3. So now we have
probably no other choice than ignoring content after any unclosed tag.
That means the behaviour is changed comparing to the regex-based solution.

I'm joining the proposed fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:38>

Django

unread,

May 23, 2013, 4:21:33 AM5/23/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister
Component: Utilities | Status: new
Severity: Release blocker | Version: master
Keywords: | Resolution:
Has patch: 1 | Triage Stage: Ready for
Needs tests: 0 | checkin
Easy pickings: 0 | Needs documentation: 0
| Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by claudep):

Another patch after discussing with Anssi on IRC. Backwards-incompatible
due to `<` and `>` encoding, but still safer and less risk of content
disappearing...

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:39>

Django

unread,

May 23, 2013, 8:02:03 AM5/23/13

to django-...@googlegroups.com

#19237: The use of > in single or double quoted attributes in strip_tags
-------------------------------------+-------------------------------------
Reporter: chris.khoo@… | Owner:
Type: Bug | khoomeister

Component: Utilities | Status: closed

Severity: Release blocker | Version: master

Keywords: | Resolution: fixed

Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed

Comment:

In [changeset:"b664cb818d2e5896df2763299ea2c61a9af069a8"]:
{{{
#!CommitTicketReference repository=""
revision="b664cb818d2e5896df2763299ea2c61a9af069a8"
Fixed #19237 (again) - Made strip_tags consistent between Python versions
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19237#comment:40>

Reply all

Reply to author

Forward

[Django] #19237: The use of > in single or double quoted attributes in strip_tags

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django

Django