A critique of cgi.escape


Lawrence D'Oliveiro

Sep 23, 2006, 8:00:16 AM
The "escape" function in the "cgi" module escapes characters with special
meanings in HTML. The ones that need escaping are '<', '&' and '"'.
However, cgi.escape only escapes the quote character if you pass a second
argument of True (the default is False):

>>> cgi.escape("the \"quick\" & <brown> fox")
'the "quick" &amp; &lt;brown&gt; fox'
>>> cgi.escape("the \"quick\" & <brown> fox", True)
'the &quot;quick&quot; &amp; &lt;brown&gt; fox'

This seems to me to be dumb. The default option should be the safe one: that
is, escape _all_ the potentially troublesome characters. The only time you
can get away with NOT escaping the quote character is outside of markup,
e.g.

<TEXTAREA>
unescaped "quotes" allowed here
</TEXTAREA>

Nevertheless, even in that situation, escaped quotes are acceptable.

So I think the default for the second argument to cgi.escape should be
changed to True. Or alternatively, the second argument should be removed
altogether, and quotes should always be escaped.

Can changing the default break existing scripts? I don't see how. It might
even fix a few lurking bugs out there.

Fredrik Lundh

Sep 23, 2006, 2:19:19 PM
to pytho...@python.org
Lawrence D'Oliveiro wrote:

> So I think the default for the second argument to cgi.escape should be
> changed to True. Or alternatively, the second argument should be removed
> altogether, and quotes should always be escaped.

you're confused: cgi.escape(s) is designed to be used for ordinary text,
cgi.escape(s, True) is designed for attributes. if you use the code the
way it's intended to be used, it works perfectly fine.
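
The two call forms described above can be sketched as follows. This is only an illustration: `escape` is a stand-in written to follow the documented behaviour of cgi.escape (the cgi module is not available in recent Python releases), and the sample string and markup are made up.

```python
def escape(s, quote=False):
    """Stand-in following the documented cgi.escape behaviour (an assumption)."""
    s = s.replace("&", "&amp;")  # must run first so later entities are not re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

user_text = 'the "quick" & <brown> fox'

# Ordinary text between tags: literal quotes are harmless here.
body = "<p>%s</p>" % escape(user_text)

# Attribute value delimited by double quotes: quotes have to be escaped.
attr = '<input value="%s">' % escape(user_text, True)

# Using the text form inside an attribute ends the value at the first quote.
broken = '<input value="%s">' % escape(user_text)
```

The disagreement in the thread is about which of these forms should be the default, not about what each one produces.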

> Can changing the default break existing scripts? I don't see how. It might
> even fix a few lurking bugs out there.

I'm not sure this "every time I don't immediately understand something,
I'll write a change proposal instead of reading the library reference"
approach is healthy, really.

</F>

Lawrence D'Oliveiro

Sep 23, 2006, 6:41:02 PM
In message <mailman.499.11590355...@python.org>, Fredrik
Lundh wrote:

> Lawrence D'Oliveiro wrote:
>
>> So I think the default for the second argument to cgi.escape should be
>> changed to True. Or alternatively, the second argument should be removed
>> altogether, and quotes should always be escaped.
>
> you're confused: cgi.escape(s) is designed to be used for ordinary text,
> cgi.escape(s, True) is designed for attributes.

What works for attributes also works for ordinary text.

Jon Ribbens

Sep 23, 2006, 10:28:17 PM
In article <mailman.499.11590355...@python.org>, Fredrik Lundh wrote:
> Lawrence D'Oliveiro wrote:
>> So I think the default for the second argument to cgi.escape should be
>> changed to True. Or alternatively, the second argument should be removed
>> altogether, and quotes should always be escaped.
>
> you're confused: cgi.escape(s) is designed to be used for ordinary text,
> cgi.escape(s, True) is designed for attributes. if you use the code the
> way it's intended to be used, it works perfectly fine.

He's not confused, he's correct; the author of cgi.escape is the
confused one. The optional extra parameter is completely unnecessary
and achieves nothing except to make it easier for people to end up
with bugs in their code.

Making cgi.escape always escape the '"' character would not break
anything, and would probably fix a few bugs in existing code. Yes,
those bugs are not cgi.escape's fault, but that's no reason not to
be helpful. It's a minor improvement with no downside.

One thing that is flat-out wrong, by the way, is that cgi.escape()
does not encode the apostrophe (') character. This is essentially
identical to the quote character in HTML, so any code which is escaping
one should always be escaping the other.
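
A minimal sketch of the apostrophe problem, assuming a stand-in `escape` that follows the documented cgi.escape behaviour. The `escape_attr` helper and the markup are hypothetical, and `&#39;` is just one possible way to write an escaped apostrophe.

```python
def escape(s, quote=False):
    """Stand-in following the documented cgi.escape behaviour (an assumption)."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

def escape_attr(s):
    """Hypothetical variant that also escapes the apostrophe."""
    return escape(s, True).replace("'", "&#39;")

value = "O'Reilly"

# Double-quoted attribute: the apostrophe is harmless.
double_quoted = '<a title="%s">' % escape(value, True)

# Single-quoted attribute: the unescaped apostrophe ends the value early.
single_quoted = "<a title='%s'>" % escape(value, True)

# Escaping both quote characters is safe with either delimiter.
safe_single = "<a title='%s'>" % escape_attr(value)
```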

Lawrence D'Oliveiro

Sep 24, 2006, 12:49:22 AM
In message <slrnehbra1.k...@snowy.squish.net>, Jon Ribbens wrote:

> In article <mailman.499.11590355...@python.org>, Fredrik
> Lundh wrote:
>> Lawrence D'Oliveiro wrote:
>>>
>>> So I think the default for the second argument to cgi.escape should be
>>> changed to True. Or alternatively, the second argument should be removed
>>> altogether, and quotes should always be escaped.
>>
>> you're confused: cgi.escape(s) is designed to be used for ordinary text,
>> cgi.escape(s, True) is designed for attributes. if you use the code the
>> way it's intended to be used, it works perfectly fine.
>
> He's not confused, he's correct; the author of cgi.escape is the
> confused one.

Thanks for backing me up. :)

> One thing that is flat-out wrong, by the way, is that cgi.escape()
> does not encode the apostrophe (') character. This is essentially
> identical to the quote character in HTML, so any code which is escaping
> one should always be escaping the other.

I must confess I did a double-take on this. But I rechecked the HTML spec
(HTML 4.0, section 3.2.2, "Attributes"), and you're right--single quotes
ARE allowed as an alternative to double quotes. It's just I've never used
them as quotes. :)

Fredrik Lundh

Sep 24, 2006, 4:38:36 AM
to pytho...@python.org
Lawrence D'Oliveiro wrote:

> What works for attributes also works for ordinary text.

attributes and ordinary text are two different things in HTML and XML.
you're arguing that it's a good idea for *everyone* to bloat down
ordinary text just because you're too lazy to use a piece of code in the
intended way.

</F>

Fredrik Lundh

Sep 24, 2006, 4:48:54 AM
to pytho...@python.org
Jon Ribbens wrote:

> Making cgi.escape always escape the '"' character would not break
> anything, and would probably fix a few bugs in existing code. Yes,
> those bugs are not cgi.escape's fault, but that's no reason not to
> be helpful. It's a minor improvement with no downside.

the "improvement with no downside" would bloat down the output for
everyone who's using the function in the intended way, and will also
break unit tests.

> One thing that is flat-out wrong, by the way, is that cgi.escape()
> does not encode the apostrophe (') character.

it's intentional, of course: you're supposed to use " if you're using
cgi.escape(s, True) to escape attributes. again, punishing people who
actually read the docs and understand them is not a very good way to
maintain software.

btw, you're both missing that cgi.escape isn't good enough for general
use anyway, since it doesn't deal with encodings at all. if you want a
general purpose function that can be used for everything that can be put
in an HTML file, you need more than just a modified cgi.escape. feel
free to propose a general-purpose replacement (which should have a new
name), but make sure you think through *all* the issues before you do that.

</F>

Lawrence D'Oliveiro

Sep 24, 2006, 6:07:26 AM
In message <mailman.518.11590877...@python.org>, Fredrik
Lundh wrote:

> Jon Ribbens wrote:
>
>> Making cgi.escape always escape the '"' character would not break
>> anything, and would probably fix a few bugs in existing code. Yes,
>> those bugs are not cgi.escape's fault, but that's no reason not to
>> be helpful. It's a minor improvement with no downside.
>
> the "improvement with no downside" would bloat down the output for
> everyone who's using the function in the intended way, and will also
> break unit tests.

I don't understand this "bloat down" nonsense. Any tests that would break
are obviously testing the wrong thing.

> > One thing that is flat-out wrong, by the way, is that cgi.escape()
> > does not encode the apostrophe (') character.
>
> it's intentional, of course: you're supposed to use " if you're using
> cgi.escape(s, True) to escape attributes.

Attributes can be quoted with either single or double quotes. That's what
the HTML spec says. cgi.escape doesn't correctly allow for that. Ergo,
cgi.escape is broken. QED.

> btw, you're both missing that cgi.escape isn't good enough for general
> use anyway, since it doesn't deal with encodings at all.

Why does it need to?

Fredrik Lundh

Sep 24, 2006, 6:35:32 AM
to pytho...@python.org
Lawrence D'Oliveiro wrote:

> Attributes can be quoted with either single or double quotes. That's what
> the HTML spec says. cgi.escape doesn't correctly allow for that. Ergo,
> cgi.escape is broken. QED.

do you ever think before you post?

</F>

Georg Brandl

Sep 24, 2006, 6:41:14 AM
Lawrence D'Oliveiro wrote:
> In message <mailman.518.11590877...@python.org>, Fredrik
> Lundh wrote:
>
>> Jon Ribbens wrote:
>>
>>> Making cgi.escape always escape the '"' character would not break
>>> anything, and would probably fix a few bugs in existing code. Yes,
>>> those bugs are not cgi.escape's fault, but that's no reason not to
>>> be helpful. It's a minor improvement with no downside.
>>
>> the "improvement with no downside" would bloat down the output for
>> everyone who's using the function in the intended way, and will also
>> break unit tests.
>
> I don't understand this "bloat down" nonsense. Any tests that would break
> are obviously testing the wrong thing.

&quot; is 5 characters more than ".

>> > One thing that is flat-out wrong, by the way, is that cgi.escape()
>> > does not encode the apostrophe (') character.
>>
>> it's intentional, of course: you're supposed to use " if you're using
>> cgi.escape(s, True) to escape attributes.
>
> Attributes can be quoted with either single or double quotes. That's what
> the HTML spec says. cgi.escape doesn't correctly allow for that. Ergo,
> cgi.escape is broken. QED.

A function is broken if its implementation doesn't match the documentation.

As a courtesy, I've pasted it below.

escape(s[, quote])
Convert the characters "&", "<" and ">" in string s to HTML-safe sequences.
Use this if you need to display text that might contain such characters in HTML.
If the optional flag quote is true, the quotation mark character (""") is also
translated; this helps for inclusion in an HTML attribute value, as in <A
HREF="...">. If the value to be quoted might include single- or double-quote
characters, or both, consider using the quoteattr() function in the
xml.sax.saxutils module instead.
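
As a rough illustration of the quoteattr() function mentioned at the end of that documentation entry: it returns the value already wrapped in quotes and chooses the delimiter from the quote characters the value contains. The sample values and the `title` attribute are made up.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical attribute values covering each combination of quote characters.
values = ['plain', 'say "hi"', "it's", 'both " and \'']

# quoteattr supplies the surrounding quotes itself, so the template has none.
tags = ["<a title=%s>" % quoteattr(v) for v in values]

for tag in tags:
    print(tag)
```

A value containing only double quotes is wrapped in single quotes, and a value containing both kinds is wrapped in double quotes with the inner double quotes escaped.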


Now, do you still think cgi.escape is broken?


Georg

Fredrik Lundh

Sep 24, 2006, 6:53:32 AM
to pytho...@python.org
Georg Brandl wrote:

> A function is broken if its implementation doesn't match the documentation.

or if it doesn't match the designer's intent. cgi.escape is old enough
that we would have noticed that, by now...

</F>

Jon Ribbens

Sep 24, 2006, 8:17:20 PM
In article <ef5ncc$uus$1...@news.albasani.net>, Georg Brandl wrote:
>> Attributes can be quoted with either single or double quotes. That's what
>> the HTML spec says. cgi.escape doesn't correctly allow for that. Ergo,
>> cgi.escape is broken. QED.
>
> A function is broken if its implementation doesn't match the documentation.

Or if the design, as described in the documentation, is flawed in some
way.

> As a courtesy, I've pasted it below.
>

[...]


>
> Now, do you still think cgi.escape is broken?

Yes.

Jon Ribbens

Sep 24, 2006, 8:50:23 PM
In article <mailman.518.11590877...@python.org>, Fredrik Lundh wrote:
>> Making cgi.escape always escape the '"' character would not break
>> anything, and would probably fix a few bugs in existing code. Yes,
>> those bugs are not cgi.escape's fault, but that's no reason not to
>> be helpful. It's a minor improvement with no downside.
>
> the "improvement with no downside" would bloat down the output for
> everyone who's using the function in the intended way,

By a minuscule degree. That is a very weak argument by any standard.

> and will also break unit tests.

Er, so change the unit tests at the same time?

> > One thing that is flat-out wrong, by the way, is that cgi.escape()
> > does not encode the apostrophe (') character.
>
> it's intentional, of course:

I noticed. That doesn't mean it isn't wrong.

> you're supposed to use " if you're using cgi.escape(s, True) to
> escape attributes. again, punishing people who actually read the
> docs and understand them is not a very good way to maintain
> software.

In what way is anyone being "punished"? Deliberately retaining flaws
and misfeatures that can easily be fixed without damaging
backwards-compatibility is not a very good way to maintain software
either.

> btw, you're both missing that cgi.escape isn't good enough for general
> use anyway,

I'm sorry, I didn't realise this was a general thread about any and
all inadequacies of Python's cgi module.

> since it doesn't deal with encodings at all.

Why does it need to? cgi.escape is (or should be) dealing with
character strings, not byte sequences. I must admit,
internationalisation is not my forte, so if there's something
I'm missing here I'd love to hear about it.

By the way, if you could try and put across your proposed arguments as
to why you don't favour this suggested change without the insults and
general rudeness, it would be appreciated.

Lawrence D'Oliveiro

Sep 25, 2006, 3:04:41 AM
In message <mailman.524.11590953...@python.org>, Fredrik
Lundh wrote:

_We_ certainly have noticed it.

Fredrik Lundh

Sep 25, 2006, 8:41:37 AM
to pytho...@python.org
Lawrence D'Oliveiro wrote:

>> Georg Brandl wrote:
>>
>>> A function is broken if its implementation doesn't match the
>>> documentation.
>>
>> or if it doesn't match the designer's intent. cgi.escape is old enough
>> that we would have noticed that, by now...
>
> _We_ certainly have noticed it.

you're not the designer, you're just some random guy who thinks that if you
don't understand something at first, it has to be changed, even if that change
would break things for others. maybe you haven't done software long enough
to understand that software works better if you use it the way it was intended
to be used, but that's no excuse for being stupid.

</F>

Fredrik Lundh

Sep 25, 2006, 8:43:52 AM
to pytho...@python.org
Jon Ribbens wrote:

> Or if the design, as described in the documentation, is flawed in some
> way.

it does exactly what it says, and is perfectly usable as is, if you bother to
use it the way it was intended to be used.

(still waiting for the "jon's enhanced escape" proposal, btw, but I guess it's
easier to piss on others than to actually contribute something useful).

</F>

Fredrik Lundh

Sep 25, 2006, 9:16:06 AM
to pytho...@python.org
Jon Ribbens wrote:

>> since it doesn't deal with encodings at all.
>
> Why does it need to? cgi.escape is (or should be) dealing with
> character strings, not byte sequences. I must admit,
> internationalisation is not my forte, so if there's something
> I'm missing here I'd love to hear about it.

If you're really serious about making things easier to use, shouldn't
you look at the whole picture? HTML documents are byte streams, so
any transformation from internal character data to HTML must take both
escaping and encoding into account. If you and Lawrence have a hard
time remembering how to use the existing cgi.escape function, despite
its utter simplicity, surely it would make your life even easier if
there was an alternative API that would handle both the easy part
(escaping) and the hard part (encoding) ?

> By the way, if you could try and put across your proposed arguments as
> to why you don't favour this suggested change without the insults and
> general rudeness, it would be appreciated.

I've already explained that, but since you're convinced that your use
case is more important than other use cases, and you don't care about
things like stability and respect for existing users of an API, nor
the cost for others to update their code and unit tests, I don't see
much need to repeat myself. Breaking things just because you think
you can simply isn't the Python way of doing things.

</F>

Jon Ribbens

Sep 25, 2006, 9:30:52 AM
In article <mailman.559.11591881...@python.org>, Fredrik Lundh wrote:
> maybe you haven't done software long enough to understand that
> software works better if you use it the way it was intended to be
> used, but that's no excuse for being stupid.

So what's your excuse?

Jon Ribbens

Sep 25, 2006, 9:46:02 AM
In article <mailman.563.11591903...@python.org>, Fredrik Lundh wrote:
> If you're really serious about making things easier to use, shouldn't
> you look at the whole picture? HTML documents are byte streams, so
> any transformation from internal character data to HTML must take both
> escaping and encoding into account.

Ever heard of modular programming? I would suggest that you do indeed
take a step back and look at the whole picture - it's the whole
picture that needs to take escaping and encoding into account. There's
nothing to say that cgi.escape should take them both into account in
the one function, and in fact as you yourself have already commented,
good reasons for it not to, in that it would make it excessively
complicated.

> If you and Lawrence have a hard time remembering how to use the
> existing cgi.escape function, despite its utter simplicity, surely
> it would make your life even easier if there was an alternative API
> that would handle both the easy part (escaping) and the hard part
> (encoding) ?

You seem to be arguing that because, in an ideal world, it would be
better to throw away the 'cgi' module completely and start again, it
is not worth making minor improvements in what we already have.
I would suggest that this is, to put it mildly, not a good argument.

> I've already explained that, but since you're convinced that your use
> case is more important than other use cases, and you don't care about
> things like stability and respect for existing users of an API, nor
> the cost for others to update their code and unit tests, I don't see
> much need to repeat myself.

You are merely compounding your bad manners. All of your above
allegations are outright lies. I am not sure if you are simply not
understanding the simple points I am making, or are deliberately
trying to mislead people for some bizarre reason of your own.

> Breaking things just because you think you can simply isn't the
> Python way of doing things.

Your hyperbole is growing more extravagant. To begin with, you were
claiming that the suggested change would make things (minusculely)
less efficient, now you're claiming it will "break" unspecified
things. What precisely do you think it would "break"?

Duncan Booth

Sep 25, 2006, 9:50:07 AM
Jon Ribbens <jon+u...@unequivocal.co.uk> wrote:
>> and will also break unit tests.
>
> Er, so change the unit tests at the same time?

It is generally a principle of Python that new releases maintain backward
compatibility. An incompatible change such as that proposed here would
probably break many tests for a large number of people.

If the change were seen as a good thing, then a backwards compatible change
(e.g. introducing a function with a different name) might be considered,
but if so it should address the whole issue: the current lack of support
for encodings is IMHO a far bigger problem than whether or not a quote mark is
escaped.

> Why does it need to? cgi.escape is (or should be) dealing with
> character strings, not byte sequences. I must admit,
> internationalisation is not my forte, so if there's something
> I'm missing here I'd love to hear about it.

If I have a unicode string such as: u'\u201d' (right double quote), then I
want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
is better). For many purposes I could just encode it in the encoding to be
used for the page, typically latin1 or utf8, but sometimes that isn't
possible e.g. if you don't know the encoding at the point when you produce
the string, or if there is no translation for the character in the desired
encoding. The character reference will work whatever encoding is used for
the page.

There should be a one-stop shop where I can take my unicode text and
convert it into something I can safely insert into a generated html page;
at present I need to call both cgi.escape and s.encode to get the desired
effect.
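
The two-step approach described above might look like the following in current Python, where text is `str` and encoding yields `bytes`. This is a sketch under assumptions: `escape` is a stand-in following the documented cgi.escape behaviour, the input string is made up, and `xmlcharrefreplace` is the standard codec error handler that substitutes numeric character references for characters the target encoding cannot represent.

```python
def escape(s, quote=False):
    """Stand-in following the documented cgi.escape behaviour (an assumption)."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

# Hypothetical input containing curly quotes and markup characters.
text = 'a \u201cquoted\u201d <b> & more'

# Step 1: escape the markup characters while the data is still text.
escaped = escape(text)

# Step 2: encode; anything outside ASCII becomes a numeric character reference.
data = escaped.encode("ascii", "xmlcharrefreplace")
```

The order matters: escaping after the encoding step would turn the `&` of each character reference into `&amp;`.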

Jon Ribbens

Sep 25, 2006, 9:54:16 AM
In article <mailman.561.11591883...@python.org>, Fredrik Lundh wrote:
> (still waiting for the "jon's enhanced escape" proposal, btw, but I guess it's
> easier to piss on others than to actually contribute something useful).

Well, yes, you certainly seem to be good at the "pissing on others"
part, even if you have to lie to do it. You have had the "enhanced
escape" proposal all along - it was the post which started this
thread! If you are referring to your strawman argument about
encodings, you have yet to show that it's relevant.

If it'll make you any happier, here's the code for the 'cgi.escape'
equivalent that I usually use:

import re

_html_encre = re.compile("[&<>\"'+]")
_html_encodes = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\"": "&quot;",
                  "'": "&#39;", "+": "&#43;" }

def html_encode(raw):
    # Replace each special character with its entry in the table above.
    return re.sub(_html_encre, lambda m: _html_encodes[m.group(0)], raw)

Max M

Sep 25, 2006, 10:00:45 AM
Fredrik Lundh skrev:
> Jon Ribbens wrote:

>> By the way, if you could try and put across your proposed arguments as
>> to why you don't favour this suggested change without the insults and
>> general rudeness, it would be appreciated.
>
> I've already explained that, but since you're convinced that your use
> case is more important than other use cases, and you don't care about
> things like stability and respect for existing users of an API, nor
> the cost for others to update their code and unit tests, I don't see
> much need to repeat myself. Breaking things just because you think
> you can simply isn't the Python way of doing things.


This thread is highly entertaining but perhaps not that productive.


Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea. It would break just about every web app out there that
uses the escape module and uses testing. Which is probably most of them.
That could mean several man years of wasted time. It also makes the
escaped html harder to read for standard cases.

Fredrik is right that doing so is utterly ... well let us call it
&quot;unproductive&quot;. Stupid is such a harsh word ;-)

Whether someone finds the bloat minuscule and thus a small enough change
to warrant the rewrite does not really matter.

Lawrence is free to write a wrapper and use that instead.

my_escape = lambda st: cgi.escape(st, 1)

So. Lawrence is happy, and the escape works as expected. Several man
years have been saved.

Max M

Fredrik Lundh

Sep 25, 2006, 10:00:35 AM
to pytho...@python.org
Jon Ribbens wrote:

> There's nothing to say that cgi.escape should take them both into account
> in the one function

so what exactly are you using cgi.escape for in your code ?

> What precisely do you think it would "break"?

existing code, and existing tests.

</F>

Jon Ribbens

Sep 25, 2006, 10:05:26 AM
In article <Xns984996E6BA...@127.0.0.1>, Duncan Booth wrote:
> It is generally a principle of Python that new releases maintain backward
> compatibility. An incompatible change such as that proposed here would
> probably break many tests for a large number of people.

Why is the suggested change incompatible? What code would it break?
I agree that it would be a bad idea if it did indeed break backwards
compatibility - but it doesn't.

> There should be a one-stop shop where I can take my unicode text and
> convert it into something I can safely insert into a generated html page;

I disagree. I think that doing it in one is muddled thinking and
liable to lead to bugs. Why not keep your output as unicode until it
is ready to be output to the browser, and encode it as appropriate
then? Character encoding and character escaping are separate jobs with
separate requirements that are better off handled by separate code.

Jon Ribbens

Sep 25, 2006, 10:08:23 AM
In article <mailman.569.11591928...@python.org>, Fredrik Lundh wrote:
>> There's nothing to say that cgi.escape should take them both into account
>> in the one function
>
> so what exactly are you using cgi.escape for in your code ?

To escape characters so that they will be treated as character data
and not control characters in HTML.

>> What precisely do you think it would "break"?
>
> existing code, and existing tests.

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

Fredrik Lundh

Sep 25, 2006, 10:20:52 AM
to pytho...@python.org
Max M wrote:

> It also makes the escaped html harder to read for standard cases.

and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

</F>

Georg Brandl

Sep 25, 2006, 10:24:26 AM

Is that so hard to see? If cgi.escape replaced "'" with an entity reference,
code that expects it not to do so would break.

Georg

Duncan Booth

Sep 25, 2006, 10:25:51 AM
Jon Ribbens <jon+u...@unequivocal.co.uk> wrote:

> In article <Xns984996E6BA...@127.0.0.1>, Duncan Booth
> wrote:
>> It is generally a principle of Python that new releases maintain
>> backward compatibility. An incompatible change such as that proposed here
>> would probably break many tests for a large number of people.
>
> Why is the suggested change incompatible? What code would it break?
> I agree that it would be a bad idea if it did indeed break backwards
> compatibility - but it doesn't.

I guess you've never seen anyone write tests which retrieve some generated
html and compare it against the expected value. If the page contains any
unescaped quotes then this change would break it.
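
The kind of test described here could look roughly like this. Everything in the sketch is hypothetical: `escape` is a stand-in following the documented cgi.escape behaviour, and `render` and `render_always_quoted` are invented page-fragment helpers.

```python
def escape(s, quote=False):
    """Stand-in following the documented cgi.escape behaviour (an assumption)."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

def render(comment):
    # Hypothetical page fragment using the current default (quotes untouched).
    return "<p>%s</p>" % escape(comment)

def render_always_quoted(comment):
    # The same fragment if quotes were escaped unconditionally.
    return "<p>%s</p>" % escape(comment, True)

# Expected value recorded by a test written against the current behaviour.
expected = '<p>He said "no" &amp; left</p>'

passes_today = render('He said "no" & left') == expected
passes_after_change = render_always_quoted('He said "no" & left') == expected
```

Both outputs would display identically in a browser; only the exact string comparison tells them apart.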

>
>> There should be a one-stop shop where I can take my unicode text and
>> convert it into something I can safely insert into a generated html
>> page;
>
> I disagree. I think that doing it in one is muddled thinking and
> liable to lead to bugs. Why not keep your output as unicode until it
> is ready to be output to the browser, and encode it as appropriate
> then? Character encoding and character escaping are separate jobs with
> separate requirements that are better off handled by separate code.

Sorry, convert into something I can safely insert wasn't meant to imply
encoding: just entity escaping.

To be clear:

I'm talking about encoding certain characters as entity references. It
doesn't matter whether it's the character ampersand or right double quote,
they both want to be converted to entities. Same operation.

The resulting string might be a byte string or it might still be unicode:
the point being that the conversion I want is from unescaped to entity
escaped, not from unicode to byte encoded. Right now the only way the
Python library gives me to do the entity escaping properly has a side
effect of encoding the string. I should be able to do the escaping without
having to encode the string at the same time.

Jon Ribbens

Sep 25, 2006, 10:37:01 AM
In article <ef8oqr$9pt$1...@news.albasani.net>, Georg Brandl wrote:
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"? Can you come up with an example, or even an
>> explanation of how it *could* break existing code?
>
> Is that so hard to see? If cgi.escape replaced "'" with an entity reference,
> code that expects it not to do so would break.

Sorry, that's still not good enough. Why would any code expect such a
thing?

Max M

Sep 25, 2006, 10:48:03 AM
Jon Ribbens skrev:


Some examples are:

- Possibly any code that tests for string equality in a rendered
html/xml page. Testing is a preferred development tool these days.

- Code that generates cgi.escaped() markup and (rightfully) for some
reason expects the old behaviour to be used.

- 3. party code that parses/scrapes content from cgi.escaped() markup.
(you could even break Java code this way :-s )

Any change in Python that has these consequences will rightfully be
considered a bug. So what you are suggesting is to knowingly introduce a
bug in the standard library!


You are right that the html generated by cgi.escape() would (probably)
have the same visual appearance in the browsers. But that is a *very*
narrow definition of being bug free and not breaking stuff.

If you cannot think of other examples for yourself where your change
would introduce breakage, you are certainly not an experienced enough
programmer to suggest changes in the standard lib!


Max M

Jon Ribbens

Sep 25, 2006, 10:50:32 AM
In article <Xns98499CF9DC...@127.0.0.1>, Duncan Booth wrote:
> I guess you've never seen anyone write tests which retrieve some generated
> html and compare it against the expected value. If the page contains any
> unescaped quotes then this change would break it.

You're right - I've never seen anyone do such a thing. It sounds like
a highly dubious and very fragile sort of test to me, of very limited
use.

> I'm talking about encoding certain characters as entity references. It
> doesn't matter whether it's the character ampersand or right double quote,
> they both want to be converted to entities. Same operation.

This is that muddled thinking I was talking about. They are *not* the
same operation. You want to encode "<", for example, because it must
always be encoded to prevent it being treated as an HTML control
character. This has nothing to do with character encodings.

You might sometimes want to escape "right double quote" because it may
or may not be available in the character encoding you are using to output
to the browser. Yes, this might sometimes seem a bit similar to the
"<" escaping described above, because one of the ways you could avoid
the character encoding issue would be to use numeric entities, but it
is actually a completely separate issue and is none of the business of
cgi.escape.

By your argument, cgi.escape should in fact escape *every single*
character as a numeric entity, and even that wouldn't work properly
since "&", "#", ";" and the digits might not be in their usual
positions in the output encoding.

> Right now the only way the Python library gives me to do the entity
> escaping properly has a side effect of encoding the string. I should
> be able to do the escaping without having to encode the string at
> the same time.

I'm getting lost here - the opposite of what you say above is true.
cgi.escape does the escaping properly (modulo failing to escape
quotes) without encoding.

Max M

Sep 25, 2006, 10:59:26 AM
Jon Ribbens skrev:


Oh ... because you cannot see a use case for that *documented*
behaviour, it must certainly be wrong?


This function, which is correct by current documentation, will be broken
by your change.

def hasSomeWord(someword):
    import urllib
    # urllib.urlopen is the Python 2 spelling; the URL is only a placeholder.
    f = urllib.urlopen('http://www.example.com/cgi_escaped_content')
    content = f.read()
    f.close()
    return '"%s"' % someword in content

You might think that it is stupid code that should be changed to take
escaped quotes into account. But that is really not your business to
decide if the other behaviour is documented and correct.

I find it amazing that you cannot understand this. I will stop replying
in this thread now.

Max M

Jon Ribbens

Sep 25, 2006, 11:02:16 AM
In article <4517ec24$0$13947$edfa...@dread15.news.tele.dk>, Max M wrote:
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"? Can you come up with an example, or even an
>> explanation of how it *could* break existing code?
>
> Some examples are:
>
> - Possibly any code that tests for string equality in a rendered
> html/xml page. Testing is a preferred development tool these days.

Testing is good, but only if done correctly.

> - Code that generates cgi.escaped() markup and (rightfully) for some
> reason expects the old behaviour to be used.

That's begging the question again ("an example of code that would
break is code that would break").

> - 3. party code that parses/scrapes content from cgi.escaped() markup.
> (you could even break Java code this way :-s )

I'm sorry, I don't understand that one. What is "party code"? Code
that is scraping content from web sites already has to cope with
entities etc.

Your comment about Java is a little ironic given that I persuaded the
Java Struts people to make the exact same change we're talking about
here, back in 2002 (even if it did take 11 months) ;-)

> If you cannot think of other examples for yourself where your change
> would introduce breakage, you are certainly not an experienced enough
> programmer to suggest changes in the standard lib!

I'll take my own opinion on that over yours, thanks.

and-g...@doxdesk.com

unread,
Sep 25, 2006, 11:08:43 AM9/25/06
to
Jon Ribbens wrote:

> I'm sorry, that's not good enough. How, precisely, would it break
> "existing code"?

('owdo Mr. Ribbens!)

It's possible there could be software that relies on ' not being
escaped, for example:

# Auto-markup links to O'Reilly, everyone's favourite
# example name with an apostrophe in it
#
URI= 'http://www.oreilly.com/'
html= cgi.escape(text)
html= html.replace('O\'Reilly', '<a href="%s">O\'Reilly</a>' % URI)

Sure this may be rare, but it's what the documentation says, and
changing it may not only fix things but also subtly break things in
ways that are hard to detect.
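
A minimal sketch of that failure mode, assuming Python 3's html.escape as a stand-in for cgi.escape (newer Python versions no longer ship cgi.escape). With quote=False it replaces only &, < and >, like the old default; with quote=True it also replaces both quote characters. The sample text and the helper name link_oreilly are invented for illustration:

```python
import html

URI = 'http://www.oreilly.com/'
text = "Books from O'Reilly & friends"

def link_oreilly(escaped):
    # Plain-text search on already-escaped HTML, as in the example above
    return escaped.replace("O'Reilly", '<a href="%s">O\'Reilly</a>' % URI)

# Apostrophe left alone: the search string still matches
old_style = link_oreilly(html.escape(text, quote=False))

# Apostrophe replaced by an entity: the search silently finds nothing
new_style = link_oreilly(html.escape(text, quote=True))

print(old_style)
print(new_style)
```

Nothing raises in the second case; the link just stops appearing, which is the "hard to detect" part.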

A similar change to str.encode('unicode-escape') in Python 2.5 caused a
number of similar subtle problems. (In this case the old documentation
was a bit woolly so didn't prescribe the exact older behaviour.)

I'm not saying that the cgi.escape interface is *good*, just that it's
too late to change it.

I personally think the entire function should be deprecated, firstly
because it's insufficient in some corner cases (apostrophes as you
pointed out, and XHTML CDATA), and secondly because it's in the wrong
place: HTML-escaping is nothing to do with the CGI interface. A good
template library should deal with escaping more smoothly and correctly
than cgi.escape. (It may be able to deal with escape-or-not-bother and
character encoding issues automatically, for example.)

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

Jon Ribbens

unread,
Sep 25, 2006, 11:13:30 AM9/25/06
to
In article <4517eecf$0$14036$edfa...@dread15.news.tele.dk>, Max M wrote:
> Oh ... because you cannot see a use case for that *documented*
> behaviour, it must certainly be wrong?

No, but if nobody else can find one either, that's a clue that maybe
it's safe to change.

Here's a point for you - the documentation for cgi.escape says that
the characters "&", "<" and ">" are converted, but not what they are
converted to. Even by your own argument, therefore, code is not
entitled to rely on the output of cgi.escape being any particular
exact string.

> This function, which is correct according to the current documentation,
> will be broken by your change.
>
> def hasSomeWord(someword):
>     import urllib
>     f = urllib.urlopen('http://www.example.com/cgi_escaped_content')
>     content = f.read()
>     f.close()
>     return '"%s"' % someword in content

That function is broken already, no change required.

Duncan Booth

unread,
Sep 25, 2006, 11:35:51 AM9/25/06
to
Jon Ribbens <jon+u...@unequivocal.co.uk> wrote:

It's easy enough to come up with examples which might. For example, I
have doctests which evaluate tal expressions. I don't think I currently
have any which depend on quotes, but I can easily create one (I just
did, and it passes):

>>> print T('''<tal:x tal:content="python:'It\\'s a \\x22tal\\x22 string'" />''')
It's a "tal" string
>>> print T('''<x tal:attributes="title python:'It\\'s a \\x22tal\\x22 string'" />''')
<x title="It's a &quot;tal&quot; string" />

More likely I might output a field value and just happen to have used a quote
in it.

FWIW, in zope tal, the value of tal:content is escaped using the equivalent of
cgi.escape(s, False), and attribute values are escaped using
cgi.escape(s, True).

The function T I use is defined as:

def T(template, **kw):
    """Create and render a page template."""
    pt = PageTemplate()
    pt.pt_edit(template, 'text/html')
    return pt.pt_render(extra_context=kw).strip('\n')
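
The content/attribute split described above can be sketched as two small helpers. This is an illustration only, not Zope's code: the names escape_content and escape_attribute are hypothetical, and Python 3's html.escape stands in for cgi.escape. The attribute helper adds the double-quote replacement by hand so that apostrophes are left alone, as in the doctest output above:

```python
import html

def escape_content(s):
    """Escape text placed between tags: only &, < and > are replaced."""
    return html.escape(s, quote=False)

def escape_attribute(s):
    """Escape text placed inside a double-quoted attribute value."""
    # Mirrors the old quote=True behaviour: " is replaced, ' is left alone
    return html.escape(s, quote=False).replace('"', "&quot;")

value = 'It\'s a "tal" string'
print("<x>%s</x>" % escape_content(value))
print('<x title="%s" />' % escape_attribute(value))
```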

Fredrik Lundh

unread,
Sep 25, 2006, 11:45:34 AM9/25/06
to pytho...@python.org
Jon Ribbens wrote:

> Sorry, that's still not good enough.

that's not up to you to decide, though.

</F>

Jon Ribbens

unread,
Sep 25, 2006, 11:49:52 AM9/25/06
to
In article <1159196923.5...@i42g2000cwa.googlegroups.com>, and-g...@doxdesk.com wrote:
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"?
>
> ('owdo Mr. Ribbens!)

Good afternoon Mr Glover ;-)

> URI= 'http://www.oreilly.com/'
> html= cgi.escape(text)
> html= html.replace('O\'Reilly', '<a href="%s">O\'Reilly</a>' % URI)
>
> Sure this may be rare, but it's what the documentation says, and
> changing it may not only fix things but also subtly break things in
> ways that are hard to detect.

I'm not sure about "subtly break things", but you're right that the
above code would break. I could argue that it's broken already,
(since it's doing a plain-text search on HTML data) but given
real-world considerations it's reasonable enough that I won't be that
pedantic ;-)

> I personally think the entire function should be deprecated, firstly
> because it's insufficient in some corner cases (apostrophes as you
> pointed out, and XHTML CDATA), and secondly because it's in the wrong
> place: HTML-escaping is nothing to do with the CGI interface. A good
> template library should deal with escaping more smoothly and correctly
> than cgi.escape. (It may be able to deal with escape-or-not-bother and
> character encoding issues automatically, for example.)

I agree that in most situations you should probably be using a
template library, but sometimes a simple CGI-and-manual-HTML system
suffices, and I think (a fixed version of) cgi.escape should exist at
a low level of the web application stack.

Filip Salomonsson

unread,
Sep 25, 2006, 11:51:03 AM9/25/06
to pytho...@python.org
On 25 Sep 2006 15:13:30 GMT, Jon Ribbens <jon+u...@unequivocal.co.uk> wrote:
>
> Here's a point for you - the documentation for cgi.escape says that
> the characters "&", "<" and ">" are converted, but not what they are
> converted to.

If the documentation isn't clear enough, that means the documentation
should be fixed.

It does _not_ mean "you are free to introduce new behavior because
nobody should trust what this function does anyway".
--
filip salomonsson

Jon Ribbens

unread,
Sep 25, 2006, 11:52:22 AM9/25/06
to
In article <mailman.579.11591992...@python.org>, Fredrik Lundh wrote:
>> Sorry, that's still not good enough.
>
> that's not up to you to decide, though.

It's up to me to decide whether or not an argument is good enough to
convince me, thank you very much.

Jon Ribbens

unread,
Sep 25, 2006, 11:54:26 AM9/25/06
to
In article <mailman.580.11591994...@python.org>, Filip Salomonsson wrote:
>> Here's a point for you - the documentation for cgi.escape says that
>> the characters "&", "<" and ">" are converted, but not what they are
>> converted to.
>
> If the documentation isn't clear enough, that means the documentation
> should be fixed.

Incorrect - documentation can and frequently does leave certain
behaviours undefined. This is deliberate and (among other things)
is to allow for the behaviour to change in future versions without
breaking backwards-compatibility.

Fredrik Lundh

unread,
Sep 25, 2006, 12:03:18 PM9/25/06
to pytho...@python.org
Jon Ribbens wrote:

> It's up to me to decide whether or not an argument is good enough to
> convince me, thank you very much.

not if you expect anyone to take anything you say seriously.

</F>

Jon Ribbens

unread,
Sep 25, 2006, 12:17:18 PM9/25/06
to

Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.

Brian Quinlan

unread,
Sep 25, 2006, 1:11:18 PM9/25/06
to Jon Ribbens, pytho...@python.org

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries and his
contributions have been of consistent high quality.

You, on the other hand, are "just some guy" and people don't have a lot
of incentive to convince you of anything.

I have no opinion on the actual debate though. Just trying to help with
the social analysis :-)

Cheers,
Brian

Jon Ribbens

unread,
Sep 25, 2006, 1:20:41 PM9/25/06
to
In article <mailman.585.11592042...@python.org>, Brian Quinlan wrote:
>> Now you're just being ridiculous. In this thread you have been rude,
>> evasive, insulting, vague, hypocritical, and have failed to answer
>> substantive points in favour of sarcastic and erroneous sniping - I'd
>> suggest it's you that needs to worry about being taken seriously.
>
> Actually, at least in the context of this mailing list, Fredrik doesn't
> have to worry about that at all. Why? Because he is one of the most
> prolific contributors to the Python language and libraries

I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

Georg Brandl

unread,
Sep 25, 2006, 2:02:58 PM9/25/06
to
Jon Ribbens wrote:
> In article <4517eecf$0$14036$edfa...@dread15.news.tele.dk>, Max M wrote:
>> Oh ... because you cannot see a use case for that *documented*
>> behaviour, it must certainly be wrong?
>
> No, but if nobody else can find one either, that's a clue that maybe
> it's safe to change.
>
> Here's a point for you - the documentation for cgi.escape says that
> the characters "&", "<" and ">" are converted, but not what they are
> converted to.

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

If anyone doesn't know what is meant by this, he shouldn't really write apps
using the cgi module before doing a basic HTML course.

Or use the source.

Georg

Dan Bishop

unread,
Sep 25, 2006, 2:20:05 PM9/25/06
to
Fredrik Lundh wrote:
> Jon Ribbens wrote:
>
> > Making cgi.escape always escape the '"' character would not break
> > anything, and would probably fix a few bugs in existing code. Yes,
> > those bugs are not cgi.escape's fault, but that's no reason not to
> > be helpful. It's a minor improvement with no downside.
>
> the "improvement with no downside" would bloat down the output for
> everyone who's using the function in the intended way,

"Unless" "your" "CGI" "scripts" "output" "text" "like" "this," "I"
"think" "it's" "absurd" "to" "consider" "the" "bloat" "significant."

Jon Ribbens

unread,
Sep 25, 2006, 7:41:48 PM9/25/06
to
In article <ef95kk$oan$1...@news.albasani.net>, Georg Brandl wrote:
>> Here's a point for you - the documentation for cgi.escape says that
>> the characters "&", "<" and ">" are converted, but not what they are
>> converted to.
>
> It says "to HTML-safe sequences". That's reasonably clear without the need
> to reproduce the exact replacements for each character.
>
> If anyone doesn't know what is meant by this, he shouldn't really write apps
> using the cgi module before doing a basic HTML course.

So would you like to explain the difference between &#34; and &quot;,
or do you need to go on a "basic HTML course" first?

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:02:18 PM9/25/06
to
In message <Xns984996E6BA...@127.0.0.1>, Duncan Booth wrote:

> If I have a unicode string such as: u'\u201d' (right double quote), then I
> want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
> is better).

Right-double-quote is not an HTML special, so there's no need to quote it.
I'm only concerned here with characters that have special meanings in HTML
markup.

> There should be a one-stop shop where I can take my unicode text and
> convert it into something I can safely insert into a generated html page;
> at present I need to call both cgi.escape and s.encode to get the desired
> effect.

What you're really asking for is a version of cgi.escape that a) fixes the
bugs discussed in this thread, and b) copes with different encodings while
doing so.

To handle b), you would need to pass it some indication of what the encoding
of the string is. In any case, converting a literal right-double-quote to
&#8221; is not relevant to the purpose of cgi.escape.

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:39:54 PM9/25/06
to
In message <4517e10e$0$13929$edfa...@dread15.news.tele.dk>, Max M wrote:

> Lawrence is right that the escape method doesn't work the way he expects
> it to.
>
> Rewriting a library module simply because a developer is surprised is a
> *very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments in
this thread, yes. But "surprised" is what a lot of users of the existing
cgi.escape function are going to be when they discover their code isn't
doing what they thought it was.

> It would break just about every web app out there that
> uses the escape module...

How will it break them? Give an example.

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:41:30 PM9/25/06
to
In message <mailman.570.11591941...@python.org>, Fredrik
Lundh wrote:

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:45:23 PM9/25/06
to
In message <4517ec24$0$13947$edfa...@dread15.news.tele.dk>, Max M wrote:

> Jon Ribbens skrev:
>> In article <mailman.569.11591928...@python.org>, Fredrik
>> Lundh wrote:
>>>> There's nothing to say that cgi.escape should take them both into
>>>> account in the one function
>>> so what exactly are you using cgi.escape for in your code ?
>>
>> To escape characters so that they will be treated as character data
>> and not control characters in HTML.
>>
>>>> What precisely do you think it would "break"?
>>> existing code, and existing tests.
>>
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"? Can you come up with an example, or even an
>> explanation of how it *could* break existing code?
>
>
> Some examples are:
>
> - Possibly any code that tests for string equality in a rendered
> html/xml page.

You've got to be kidding. Any programmer knows that, to test two strings for
equality, you should do that on a canonical (non-encoded) representation.

> - Code that generates cgi.escaped() markup and (rightfully) for some
> reason expects the old behaviour to be used.

Whenever I use a channel-coding function, I expect the resulting output to
be only fit for feeding into the channel. I do NOT expect to do anything
else with it. Any kind of data manipulation I do, I do BEFORE feeding it
into the output channel, which means BEFORE putting it through the channel
coding.

> - 3. party code that parses/scrapes content from cgi.escaped() markup.
> (you could even break Java code this way :-s )

If that code follows the HTML rules, it will work.
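
One way to read "follows the HTML rules": decode the entities before comparing, so the check does not depend on which of several equivalent spellings the escaping function chose. A small sketch, assuming Python 3's html.unescape; the sample strings are invented for illustration:

```python
import html

# Three renderings of the same text; only the quote spelling differs
rendered_plain = 'the "quick" &amp; &lt;brown&gt; fox'
rendered_named = 'the &quot;quick&quot; &amp; &lt;brown&gt; fox'
rendered_numeric = 'the &#34;quick&#34; &amp; &lt;brown&gt; fox'

expected_text = 'the "quick" & <brown> fox'

# Comparing the decoded (canonical) text succeeds for all three
for rendered in (rendered_plain, rendered_named, rendered_numeric):
    assert html.unescape(rendered) == expected_text

# Comparing the raw markup byte-for-byte does not
assert rendered_plain != rendered_named
```

A test written the first way keeps passing whether or not quotes are escaped; one written the second way is the kind that would break.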

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:48:16 PM9/25/06
to
In message <mailman.579.11591992...@python.org>, Fredrik
Lundh wrote:

> In article <ef8oqr$9pt$1...@news.albasani.net>, Georg Brandl wrote:
>>> I'm sorry, that's not good enough. How, precisely, would it break
>>> "existing code"? Can you come up with an example, or even an
>>> explanation of how it could break existing code?
>>
>> Is that so hard to see? If cgi.escape replaced "'" with an entity
>> reference, code that expects it not to do so would break.
>
> Sorry, that's still not good enough. Why would any code expect such a
> thing?
>
> that's not up to you to decide, though.

Yes it is. An HTML-quoting function converts a string to its HTML-compatible
representation. Since it is now HTML-compatible, any code that tries to
work with it afterwards has got to expect it to be HTML-compatible. Which
means it has to allow for what HTML allows.

Lawrence D'Oliveiro

unread,
Sep 25, 2006, 11:53:34 PM9/25/06
to
In message <mailman.559.11591881...@python.org>, Fredrik
Lundh wrote:

> Lawrence D'Oliveiro wrote:
>
>>> Georg Brandl wrote:
>>>
>>>> A function is broken if its implementation doesn't match the
>>>> documentation.
>>>
>>> or if it doesn't match the designer's intent. cgi.escape is old enough
>>> that we would have noticed that, by now...
>>
>> _We_ certainly have noticed it.
>
> you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

Steven D'Aprano

unread,
Sep 26, 2006, 12:43:24 AM9/26/06
to
On Mon, 25 Sep 2006 16:48:03 +0200, Max M wrote:

> Any change in Python that has these consequences will rightfully be
> considered a bug. So what you are suggesting is to knowingly introduce a
> bug in the standard library!


It isn't like there have never been backwards _in_compatible changes to
the standard library before.

Ten seconds of googling finds
http://www.python.org/download/releases/2.3/highlights/:

int() - this can now return a long when converting a string with many
digits, rather than raising OverflowError. (New in 2.3a2: issues a
FutureWarning when sign-folding an unsigned hex or octal literal.)

Bastion and rexec - these modules are disabled, because they aren't
safe in Python 2.3 (nor in Python 2.2). (New in 2.3a2.)

Hex/oct literals prefixed with a minus sign were handled
inconsistently. This has been fixed in accordance with PEP 237. (New
in 2.3a2.)

Passing a float to C functions expecting an integer now issues a
DeprecationWarning; in the future this will become a TypeError. (New
in 2.3a2.)

None - assignment to variables or attributes named None will now
trigger a warning. In the future, None may become a keyword.

And more, all from one release.

If the behaviour of cgi.escape is "broken", or incomplete, or misleading,
then Python has a great mechanism for introducing incompatible changes
slowly: warnings.

It isn't good enough to say that the function does what it says it does,
if what it does is dangerous and misleading. Artificial example:

def sqr(x):
    """Returns the square of almost all numbers."""
    if x != 1: return x**2
    else: return -1

The function does exactly what it says, and yet still has badly dangerous
behaviour that risks introducing serious bugs. If people are relying on
unit tests which include specific tests for that behaviour, then the
function and the code needs to be fixed in parallel. That's what the
warnings module is for.
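
As a rough sketch of what such a transition could look like (not a proposal for the actual library code): the wrapper below is hypothetical, uses Python 3's html.escape as a stand-in for cgi.escape, and warns only when the caller relies on the default, so code that passes quote explicitly is unaffected:

```python
import html
import warnings

_UNSET = object()  # sentinel: lets us tell "not passed" from False

def escape(s, quote=_UNSET):
    """Hypothetical transitional wrapper around an escaping function."""
    if quote is _UNSET:
        warnings.warn(
            "the default for 'quote' will change to True in a future "
            "version; pass quote explicitly to keep the current behaviour",
            FutureWarning,
            stacklevel=2,
        )
        quote = False  # keep the old default for now
    return html.escape(s, quote=quote)

print(escape('the "quick" fox', quote=False))  # explicit: no warning
print(escape('the "quick" fox'))               # default: FutureWarning
```

Callers who see the warning get at least one release to decide which behaviour they want before the default actually changes.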

So any arguments about "breaking code" are a red herring: if cgi.escape
does the wrong thing (and that's arguable), and code relies on that
behaviour, then the code is already broken and needs to be fixed in
parallel with the function. So can we accept that:

(1) *if* there is a problem with cgi.escape it needs to be fixed;

(and, dear gods, I would hope that nobody here wants to argue that Python
should make backwards compatibility a higher virtue than correctness!)

(2) it doesn't need to be fixed *immediately* without warning;

(3) but it can be fixed through a gradual process with warning; and

(4) unit tests and code that expect the (presumed) bad behaviour can be
fixed gradually?

Now that we've got that out of the way, can we CALMLY and RATIONALLY
discuss whether cgi.escape is or isn't broken?

Or, more specifically, UNDER WHAT CIRCUMSTANCES it does the wrong thing?

--
Steven D'Aprano

Gabriel G

unread,
Sep 26, 2006, 12:18:57 AM9/26/06
to pytho...@python.org
At Monday 25/9/2006 11:08, Jon Ribbens wrote:

> >> What precisely do you think it would "break"?
> >
> > existing code, and existing tests.
>
>I'm sorry, that's not good enough. How, precisely, would it break
>"existing code"? Can you come up with an example, or even an
>explanation of how it *could* break existing code?

FWIW, a *lot* of unit tests on *my* generated html code would break,
and I imagine a *lot* of other people's code would break too. So
changing the defaults is not a good idea.
But if you want, import this on sitecustomize.py and pretend it said
quote=True:

import cgi
cgi.escape.func_defaults = (True,)
del cgi

Gabriel Genellina
Softlab SRL
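
The same trick can be sketched on a stand-in function. Python 3 spells the attribute __defaults__ rather than func_defaults, and the escape function below is a hypothetical minimal version written for this example, not the library's:

```python
def escape(s, quote=False):
    """Stand-in escaper: &, < and > always, double quotes on request."""
    s = s.replace("&", "&amp;")  # must come first
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

before = escape('say "hi"')

# Rebind the default for the single keyword argument
escape.__defaults__ = (True,)

after = escape('say "hi"')

print(before)
print(after)
```

This changes the default for every caller in the process, including third-party code, so it is a local workaround rather than a fix.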






Steve Holden

unread,
Sep 26, 2006, 2:46:33 AM9/26/06
to pytho...@python.org

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment. As far as rudeness
goes, I've found your approach to this discussion to be pretty
obnoxious, and I'm generally known as someone with a high tolerance for
idiotic behaviour.

If your intention was to troll you could not have crafted your
contributions in a better way.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Dan Bishop

unread,
Sep 26, 2006, 2:57:14 AM9/26/06
to

How exactly would you make s = s.replace('"',"&quot;") faster than
*not* doing the replacement?

Duncan Booth

unread,
Sep 26, 2006, 3:00:09 AM9/26/06
to
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:

>> (cgi.escape(s, True) is slower than cgi.escape(s), for reasons that
>> are obvious for anyone who's looked at the code).
>
> What you're doing is adding to the reasons why the existing cgi.escape
> function is stupidly designed and implemented. The True case is by far
> the most common, so to make that the slow case, as well as being the
> non-default case, is doubly brain-dead.

It is slightly slower because it does more. Both cases are about 15 times
faster than the regular expression implementation someone posted to this
thread yesterday.
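
For anyone who wants to measure this themselves, here is a sketch of such a comparison. Both functions are written for this example: escape_replace imitates the chained str.replace approach, and escape_regex is a hypothetical regular-expression version, not the one posted earlier in the thread. No particular speed ratio is claimed; the numbers depend on the input and the interpreter:

```python
import re
import timeit

def escape_replace(s, quote=False):
    """Chained str.replace calls; the quote step is the optional extra work."""
    s = s.replace("&", "&amp;")  # must come first
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
_PATTERN = re.compile(r'[&<>"]')

def escape_regex(s):
    """Single-pass regular-expression version (hypothetical)."""
    return _PATTERN.sub(lambda m: _ENTITIES[m.group(0)], s)

sample = 'the "quick" & <brown> fox ' * 100

# Both approaches must agree before timing them means anything
assert escape_replace(sample, True) == escape_regex(sample)

for label, func in [
    ("replace, quote=False", lambda: escape_replace(sample)),
    ("replace, quote=True", lambda: escape_replace(sample, True)),
    ("regex", lambda: escape_regex(sample)),
]:
    seconds = timeit.timeit(func, number=200)
    print("%-22s %.4f s" % (label, seconds))
```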

Duncan Booth

unread,
Sep 26, 2006, 3:00:10 AM9/26/06
to
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:

> In message <Xns984996E6BA...@127.0.0.1>, Duncan Booth
> wrote:
>
>> If I have a unicode string such as: u'\u201d' (right double quote),
>> then I want that encoded in my html as '&#8221;' (or &rdquo; but the
>> numeric form is better).
>
> Right-double-quote is not an HTML special, so there's no need to quote
> it. I'm only concerned here with characters that have special meanings
> in HTML markup.

There is no need to quote " or ' either except in particular situations.

Would you care to suggest how you get a right double quote into any
iso-8859-1 encoded web page without quoting it? Even if the page is utf-8
encoded, quoting it can be a good idea.

>
>> There should be a one-stop shop where I can take my unicode text and
>> convert it into something I can safely insert into a generated html
>> page; at present I need to call both cgi.escape and s.encode to get
>> the desired effect.
>
> What you're really asking for is a version of cgi.escape that a) fixes
> the bugs discussed in this thread, and b) copes with different
> encodings while doing so.
>
> To handle b), you would need to pass it some indication of what the
> encoding of the string is. In any case, converting a literal
> right-double-quote to &#8221; is not relevant to the purpose of
> cgi.escape.
>

You don't seem to understand about html entity escapes. &#8221; is a valid
way to express right double quote whatever the page encoding. There is no
need to know the encoding of the page in order to escape entities, just
escape anything which can be problematic.
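
A sketch of the two-step approach mentioned above, assuming Python 3's html.escape as a stand-in for cgi.escape; the sample text is invented. The markup characters are escaped first, and the codec's xmlcharrefreplace error handler then turns anything the target encoding cannot represent into a numeric character reference:

```python
import html

text = 'caf\u00e9 \u201dquoted\u201d & <b>'

# Step 1: escape the markup characters; the result is still text
escaped = html.escape(text, quote=False)

# Step 2: encode; unrepresentable characters become &#NNNN; references.
# Doing this after step 1 keeps the "&" of those references unescaped.
as_ascii = escaped.encode('ascii', 'xmlcharrefreplace')
as_latin1 = escaped.encode('iso-8859-1', 'xmlcharrefreplace')
as_utf8 = escaped.encode('utf-8')  # everything is representable here

print(as_ascii)
print(as_latin1)
print(as_utf8)
```

The right double quote comes out as &#8221; in both the ASCII and iso-8859-1 output, while the e-acute is only replaced where the encoding lacks it.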

Lawrence D'Oliveiro

unread,
Sep 26, 2006, 3:15:22 AM9/26/06
to
In message <1159253834.5...@m7g2000cwm.googlegroups.com>, Dan
Bishop wrote:

Wrong answer. Correctness comes first, then we worry about efficiency.

Lawrence D'Oliveiro

unread,
Sep 26, 2006, 3:16:24 AM9/26/06
to
In message <mailman.633.11592516...@python.org>, Gabriel G
wrote:

> At Monday 25/9/2006 11:08, Jon Ribbens wrote:
>
>> >> What precisely do you think it would "break"?
>> >
>> > existing code, and existing tests.
>>
>>I'm sorry, that's not good enough. How, precisely, would it break
>>"existing code"? Can you come up with an example, or even an
>>explanation of how it *could* break existing code?
>

> FWIW, a *lot* of unit tests on *my* generated html code would break...

Why did you write your code that way?

Georg Brandl

unread,
Sep 26, 2006, 3:20:12 AM9/26/06