Support full HTML set in (un)escapeHTML

disccomp

Nov 20, 2009, 3:47:37 PM

to Prototype: Core

I was using unescapeHTML when I noticed that it missed many of the
encoded characters in the string I passed it. I checked the code, and
it seems that these functions only deal with about 3 out of hundreds
of characters: the entities for " and ', for example, are not handled.

I came across a script which handles the named entities, maybe we can
build on it: http://pastie.org/708114
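
To make the idea concrete, here is a minimal sketch of a table-driven decoder. It is not the pastie script and not Prototype code: the names NAMED_ENTITIES and unescapeNamedEntities are made up for illustration, and the table is a tiny subset (HTML 4.01 defines about 250 named entities, and &apos; comes from XML/XHTML rather than HTML 4.01).

```javascript
// Illustrative subset only; a real table would list every named entity.
var NAMED_ENTITIES = {
  lt: '<', gt: '>', amp: '&', quot: '"', apos: "'",
  nbsp: '\u00A0', copy: '\u00A9', eacute: '\u00E9', hellip: '\u2026'
};

// Hypothetical helper, not part of Prototype.
function unescapeNamedEntities(str) {
  // A single pass means "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return str.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    // Unknown names are left untouched rather than guessed at.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : match;
  });
}
```

The lookup cost is small; the size concern is the full entity table itself.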

Ideas, thoughts, complaints?

Ngan Pham

Nov 20, 2009, 4:01:11 PM

to prototy...@googlegroups.com

I came across this problem myself. I think Prototype should provide some type of API for things like this. In all my apps, I have a prototype_ext.js file that makes small modifications/enhancements to Prototype. It would be nice to be able to do something like this:

prototype_ext.js:

Prototype.HTMLEntities.add('&blah;', '0x0039');

or

Prototype.HTMLEntities.add({
  '&foo;': '0x0039',
  '&bar;': '0x0040'
});

Kind of like Rails and its custom inflections.
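
A rough sketch of how such a registry might work, assuming both call forms proposed above. Prototype has no HTMLEntities object; everything here, including the standalone HTMLEntities name and its unescape method, is hypothetical.

```javascript
// Hypothetical registry; Prototype has no HTMLEntities object.
var HTMLEntities = {
  table: {},

  // Accepts add('&foo;', '0x0039') or add({ '&foo;': '0x0039', ... }).
  add: function (entity, codePoint) {
    if (typeof entity === 'object') {
      for (var key in entity) {
        if (Object.prototype.hasOwnProperty.call(entity, key)) {
          this.add(key, entity[key]);
        }
      }
      return this;
    }
    // Number() accepts plain numbers as well as strings such as '0x0039'.
    this.table[entity] = String.fromCharCode(Number(codePoint));
    return this;
  },

  // Replaces only the entities that have been registered.
  unescape: function (str) {
    var table = this.table;
    return str.replace(/&[A-Za-z0-9]+;/g, function (match) {
      return Object.prototype.hasOwnProperty.call(table, match)
        ? table[match]
        : match;
    });
  }
};
```

With this sketch, HTMLEntities.add('&blah;', '0x0039') registers the entity and HTMLEntities.unescape('a&blah;b') then decodes it; unregistered entities pass through unchanged.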

Tobie Langel

Nov 20, 2009, 8:10:24 PM

to Prototype: Core

Thanks for your input.

A correct character encoding should be all you really need to handle
such entities.

Best,

Tobie

disccomp

Nov 21, 2009, 10:05:42 AM

to Prototype: Core

Is it possible to manipulate the encoding of a string to convert HTML
entities? I'm trying to display the string in the title attribute of
an anchor. I'm using JSONP to get RSS feeds from Google's feed API, so
I have no ability to preprocess the data.

T.J. Crowder

Nov 21, 2009, 5:28:43 PM

to Prototype: Core

Tobie,

> A correct character encoding should be all you really need to handle
> such entities.

That's rather flip, don't you think? How does character encoding
choice solve &quot; or &nbsp;?

These entities are valid HTML, regardless of character encoding. There
are sometimes very good reasons for using them. It's fine to say
String#unescapeHTML won't handle these because it's too big a problem
and document that, but let's not just dismiss it like the person
asking the question is being dumb, which is how the above comes
across.

-- T.J.

Tobie Langel

Nov 21, 2009, 6:53:49 PM

to Prototype: Core

Good point (and sorry if the tone of my earlier post came out wrong,
that wasn't my intention).

There are indeed a number of entities that are part of the HTML 4.01
spec [1].

It's legitimate to want to be able to convert those, notably when
dealing with legacy or external content.

However, given the sheer size of the code compared to its usage (now
that UTF-8 is ubiquitous), I don't think this belongs in Prototype core.

Would be welcomed as a plugin, though.

Best,

Tobie

[1] http://www.w3.org/TR/html401/sgml/entities.html

disccomp

Nov 23, 2009, 7:37:00 PM

to Prototype: Core

How about this sweet little scripty I found[1], call it a textarea
hack:

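function html_entity_decode(str) {
  // A detached textarea decodes entities when its innerHTML is set.
  var ta = document.createElement("textarea");
  // Escape angle brackets first so the input is never parsed as markup.
  ta.innerHTML = str.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return ta.value;
}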

[1] http://javascript.internet.com/snippets/convert-html-entities.html#

Tobie Langel

Nov 24, 2009, 5:54:27 AM

to Prototype: Core

We previously used different variants of that and moved away from it.

It's slow and full of inconsistencies across browsers.
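
For comparison, a string-only sketch that avoids the DOM, covering only numeric character references such as &#39; or &#x2019;. The function name is made up for illustration, and String.fromCodePoint is an ES2015 addition that postdates this thread, so treat this as an assumption about a modern environment rather than something Prototype ships.

```javascript
// Hypothetical helper, not part of Prototype.
function unescapeNumericEntities(str) {
  // Matches hexadecimal (&#x27;) and decimal (&#39;) references.
  return str.replace(/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/g,
    function (match, hex, dec) {
      var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // Leave references beyond the Unicode range untouched.
      if (codePoint > 0x10FFFF) return match;
      return String.fromCodePoint(codePoint);
    });
}
```

Because it works on plain strings, the behavior does not depend on how a given browser parses innerHTML. Named entities would still need a lookup table.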
