[Django] #18239: Only use custom subclass of HTMLParser for Python versions with buggy stdlib HTMLParser

Django

Apr 29, 2012, 5:41:20 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
----------------------------------------+------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Normal | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
----------------------------------------+------------------------
Django currently has its own subclass of `HTMLParser` (in
`django.utils.html_parser.HTMLParser`). It exists in order to patch
[http://bugs.python.org/issue670664 a bug] in the standard library's
`HTMLParser` in Python 2.5 and in older versions of 2.6 and 2.7. The bug has
been fixed in Python 2.6.8 and 2.7.3, and will be fixed in the upcoming 3.3
as well. 3.3's `HTMLParser` also contains other fixes that conflict with
the patched version in Django, since Django's subclass relies on numerous
undocumented internals.

For better forward-compatibility, we should only use our patched subclass
for versions of Python known to contain the bug, and otherwise simply use
the standard library's `HTMLParser` directly.
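
As a rough illustration of the check proposed above, the following sketch
gates the choice on `sys.version_info`. The helper name
`needs_patched_parser` is hypothetical and is not Django's actual code; the
thresholds are the fixed releases named in this ticket thread (2.6.8 and
2.7.3 here, 3.2.3 in a later comment).

```python
import sys


def needs_patched_parser(version=None):
    """Return True if `version` is a Python release with the buggy HTMLParser.

    Hypothetical helper for illustration only. The thresholds are the fixed
    releases mentioned in this ticket: 2.6.8, 2.7.3 and 3.2.3.
    """
    v = tuple((version or sys.version_info)[:3])
    if v < (2, 6, 8):              # Python 2.5 and early 2.6 releases
        return True
    if (2, 7) <= v < (2, 7, 3):    # early 2.7 releases
        return True
    if (3, 0) <= v < (3, 2, 3):    # early 3.x releases
        return True
    return False


if needs_patched_parser():
    # An affected interpreter would define and use the patched subclass here.
    parser_choice = "patched subclass"
else:
    # Everything else can use the standard library parser directly.
    parser_choice = "stdlib HTMLParser"
```

Comparing tuples keeps the boundaries explicit, so a newly discovered fixed
release only requires changing one threshold.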

When we make this change, we can also roll back r17456, as that was simply
papering over a breakage due to the modified `HTMLParser` in 2.6.8 and
2.7.3 - that will no longer be a problem if we don't try to use our
subclass with those (and newer) Pythons.

--
Ticket URL: <https://code.djangoproject.com/ticket/18239>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Apr 29, 2012, 5:45:43 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------

Comment (by carljm):

(Thanks to Vinay Sajip for discovering and raising this issue.)

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:1>

Django

Aug 2, 2012, 8:02:04 AM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by rhertzog):

* cc: rhertzog (added)


Comment:

For me, the test suite of Django 1.4.1 fails with many invalid HTML parse
errors when I run it on Debian Sid with Python 2.7.3. Is this the same
issue as this bug?

Example of error:
{{{
======================================================================
ERROR: test_count (regressiontests.test_utils.tests.HTMLEqualTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/«PKGBUILDDIR»/tests/regressiontests/test_utils/tests.py", line 396, in test_count
    dom2 = parse_html('<p class="bar">foo</p>')
  File "/«PKGBUILDDIR»/django/test/html.py", line 213, in parse_html
    parser.feed(html)
  File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 160, in goahead
    k = self.parse_endtag(i)
  File "/«PKGBUILDDIR»/django/utils/html_parser.py", line 96, in parse_endtag
    self.handle_endtag(tag.lower())
  File "/«PKGBUILDDIR»/django/test/html.py", line 191, in handle_endtag
    tag, self.format_position()))
  File "/«PKGBUILDDIR»/django/test/html.py", line 153, in error
    raise HTMLParseError(msg, self.getpos())
HTMLParseError: Unexpected end tag `p` (Line 1, Column 18), at line 1, column 19
}}}
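
To see how an interpreter's unpatched standard library parser handles the
input from this traceback, a minimal sketch along these lines may help. The
`TagRecorder` class is a hypothetical name used only for illustration, and
the import uses the Python 3 module path (on Python 2 the module is
`HTMLParser`).

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper that records parser events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


recorder = TagRecorder()
recorder.feed('<p class="bar">foo</p>')  # same markup as in the failing test
recorder.close()

# A well-behaved parser reports the start tag before the matching end tag.
starts = [e[1] for e in recorder.events if e[0] == "start"]
ends = [e[1] for e in recorder.events if e[0] == "end"]
```

If `starts` and `ends` both contain a single `p`, the stock parser handles
this input and the failure is specific to the combination with Django's
subclass.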

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:2>

Django

Aug 3, 2012, 3:01:15 AM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Changes (by rhertzog):

* has_patch: 0 => 1
* severity: Normal => Release blocker


Comment:

Here's a patch that seems to solve the issue for me by doing what the bug
description suggests, i.e. using Django's own HTMLParser only with Python
versions that have the problem. It should be straightforward to adapt it
for the development version.

I took the liberty of increasing the severity, as Django is effectively
broken for me on Debian Sid right now.

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:3>

Django

Aug 6, 2012, 1:48:46 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Comment (by claudep):

Python 3.2.3 has the fix also.

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:4>

Django

Aug 14, 2012, 8:26:08 AM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Comment (by rhertzog):

I would appreciate an ack/review from a core developer before I upload
this patch to Debian, but it would be even better if I could just
cherry-pick the definitive fix from trunk.

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:5>

Django

Aug 16, 2012, 3:13:16 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [5c79dd586534bc88ce7dc81c2d781c772d28b121]:
{{{
#!CommitTicketReference repository=""
revision="5c79dd586534bc88ce7dc81c2d781c772d28b121"
Fixed #18239 -- Subclassed HTMLParser only for selected Python versions

Only Python versions affected by http://bugs.python.org/issue670664
should patch HTMLParser.
Thanks Raphaël Hertzog for the initial patch (for 1.4).
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:6>

Django

Aug 16, 2012, 3:13:28 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Comment (by Claude Paroz <claude@…>):

In [57d9ccc4aaef0420f6ba60a26e6af4e83b803ae9]:
{{{
#!CommitTicketReference repository=""
revision="57d9ccc4aaef0420f6ba60a26e6af4e83b803ae9"
[1.4.x] Fixed #18239 -- Subclassed HTMLParser only for selected Python
versions

Only Python versions affected by http://bugs.python.org/issue670664
should patch HTMLParser.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:7>

Django

Sep 9, 2012, 5:45:54 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Comment (by claudep):

Applied to all Python 2.6 versions in [fcec904e4f3582a45d4d8e309e71e9f0c4d79a0c]

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:8>

Django

Mar 26, 2016, 12:48:52 PM
to django-...@googlegroups.com
#18239: Only use custom subclass of HTMLParser for Python versions with buggy
stdlib HTMLParser
---------------------------------+------------------------------------
Reporter: carljm | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 1.3
Severity: Release blocker | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"2c125bded1834cadf3a6132d9ab87bc74f5ed728" 2c125bde]:
{{{
#!CommitTicketReference repository=""
revision="2c125bded1834cadf3a6132d9ab87bc74f5ed728"
Refs #18239 -- Removed an obsolete workaround for bugs in HTMLParser.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18239#comment:9>
