[Django] #32547: assertHTMLEqual allows invalid HTML

20 views
Skip to first unread message

Django

unread,
Mar 14, 2021, 10:12:27 AM3/14/21
to django-...@googlegroups.com
#32547: assertHTMLEqual allows invalid HTML
---------------------------------------------+------------------------
Reporter: François Poulain | Owner: nobody
Type: Uncategorized | Status: new
Component: Testing framework | Version: 3.1
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
---------------------------------------------+------------------------
Hi,

The docs for assertHTMLEqual says "html1 and html2 must be valid HTML.".
The formulation suggest that html validation is enforced. But is is
actually easy to get test succeeding with invalid html. Eg.
```
SimpleTestCase.assertHTMLEqual(None, '<div class="bling" class="blang">',
'<div class="bling" class="blang">')
```
The code rely on Python's HTMLParser
(https://docs.python.org/3/library/html.parser.html) for which the
documentation states it produce parsers "able to parse invalid markup.".

I suggest to correct documentation and/or to enforce validation on the
parser side.

How do you think about it?

--
Ticket URL: <https://code.djangoproject.com/ticket/32547>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Mar 14, 2021, 10:13:18 AM3/14/21
to django-...@googlegroups.com
#32547: assertHTMLEqual allows invalid HTML
-----------------------------------+--------------------------------------

Reporter: François Poulain | Owner: nobody
Type: Uncategorized | Status: new
Component: Testing framework | Version: 3.1
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+--------------------------------------
Description changed by François Poulain:

Old description:

> Hi,
>
> The docs for assertHTMLEqual says "html1 and html2 must be valid HTML.".
> The formulation suggest that html validation is enforced. But is is
> actually easy to get test succeeding with invalid html. Eg.
> ```
> SimpleTestCase.assertHTMLEqual(None, '<div class="bling" class="blang">',
> '<div class="bling" class="blang">')
> ```
> The code rely on Python's HTMLParser
> (https://docs.python.org/3/library/html.parser.html) for which the
> documentation states it produce parsers "able to parse invalid markup.".
>
> I suggest to correct documentation and/or to enforce validation on the
> parser side.
>
> How do you think about it?

New description:

Hi,

The docs for assertHTMLEqual says "html1 and html2 must be valid HTML.".
The formulation suggest that html validation is enforced. But is is
actually easy to get test succeeding with invalid html. Eg.
{{{
SimpleTestCase.assertHTMLEqual(None, '<div class="bling" class="blang">',
'<div class="bling" class="blang">')
}}}
The code rely on Python's HTMLParser
(https://docs.python.org/3/library/html.parser.html) for which the
documentation states it produce parsers "able to parse invalid markup.".

I suggest to correct documentation and/or to enforce validation on the
parser side.

How do you think about it?

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:1>

Django

unread,
Mar 15, 2021, 2:47:39 AM3/15/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
--------------------------------------+------------------------------------

Reporter: François Poulain | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 3.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Mariusz Felisiak):

* type: Uncategorized => Cleanup/optimization
* component: Testing framework => Documentation
* stage: Unreviewed => Accepted


Comment:

Agreed, we should clarify this in
[https://docs.djangoproject.com/en/dev/topics/testing/tools/#django.test.SimpleTestCase.assertHTMLEqual
assertHTMLEqual()],
[https://docs.djangoproject.com/en/dev/topics/testing/tools/#django.test.SimpleTestCase.assertHTMLNotEqual
assertHTMLNotEqual()] docs and
[https://github.com/django/django/blob/876dc0c1a7dbf569782eb64f62f339c1daeb75e0/django/test/html.py#L227-L232
parse_html()] docstring.

--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:2>

Django

unread,
Mar 17, 2021, 6:11:40 AM3/17/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
--------------------------------------+------------------------------------
Reporter: François Poulain | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 3.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0

--------------------------------------+------------------------------------
Changes (by Mariusz Felisiak):

* easy: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:3>

Django

unread,
Mar 17, 2021, 11:46:41 AM3/17/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
-------------------------------------+-------------------------------------
Reporter: François Poulain | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: assigned

Component: Documentation | Version: 3.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Hasan Ramezani):

* owner: nobody => Hasan Ramezani
* status: new => assigned
* has_patch: 0 => 1


Comment:

I've created a [https://github.com/django/django/pull/14143 PR] and
removed the `must be valid HTML` and `An AssertionError will be raised if
one of them cannot be parsed.` from
`assertHTMLEqual()` and `​assertHTMLNotEqual()` docs.

--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:4>

Django

unread,
Mar 18, 2021, 5:23:19 AM3/18/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
-------------------------------------+-------------------------------------
Reporter: François Poulain | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: assigned
Component: Documentation | Version: 3.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:5>

Django

unread,
Mar 18, 2021, 6:09:04 AM3/18/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
-------------------------------------+-------------------------------------
Reporter: François Poulain | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed
Component: Documentation | Version: 3.1
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"ceb4b9ee68dffc6ab0398886f1758f15f037c472" ceb4b9e]:
{{{
#!CommitTicketReference repository=""
revision="ceb4b9ee68dffc6ab0398886f1758f15f037c472"
Fixed #32547 -- Corrected notes about validation in HTML assertions docs.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:6>

Django

unread,
Mar 18, 2021, 6:09:32 AM3/18/21
to django-...@googlegroups.com
#32547: assertHTMLEqual()/assertHTMLNotEqual() allow invalid HTML.
-------------------------------------+-------------------------------------
Reporter: François Poulain | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed
Component: Documentation | Version: 3.1
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"6b020f3c94fb7f27875d5fe21a71a5ee7c9a7538" 6b020f3c]:
{{{
#!CommitTicketReference repository=""
revision="6b020f3c94fb7f27875d5fe21a71a5ee7c9a7538"
[3.2.x] Fixed #32547 -- Corrected notes about validation in HTML
assertions docs.

Backport of ceb4b9ee68dffc6ab0398886f1758f15f037c472 from main
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32547#comment:7>

Reply all
Reply to author
Forward
0 new messages