
Function to convert > < to &gt; &lt; etc. ???


Tjipke A. van der Plaats

Jul 31, 2001, 5:57:09 AM
[Since this applies to WebSnap as well as to XML, posted to both
groups]

Does anyone know if there is a function somewhere in Delphi that converts
strings with special symbols like "&", ">" and "<" to "&amp;", "&gt;" and
"&lt;" to use in html/xml code?

(Couldn't find it anywhere in the source & help of delphi)

Thanks,

Tjipke van der Plaats

www.tiriss.com


Rolf Frei

Jul 31, 2001, 8:59:24 AM
Why do you not use StringReplace?

s := StringReplace(s, '>', '&gt;', [rfReplaceAll]);

Bye Rolf

"Tjipke A. van der Plaats" <in...@tiriss.com> wrote in message news:3b668121_1@dnews...

Rune Moberg

Jul 31, 2001, 9:27:53 AM
"Rolf Frei" <ro...@hms.ch> wrote in message news:3b66ab8c_2@dnews...

> Why do you not use StringReplace?
>
> s := StringReplace(s, '>', '&gt;', [rfReplaceAll]);

It's not very efficient, now is it?

You'll have to replace at least '<', '>' and '&', to mention three literals.
That's three string scans! I believe you should also replace '"' (&quot;)
just to be on the safe side. Luckily, you shouldn't have to worry about
characters above 127, assuming your web server correctly identifies the
character set used.
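
For reference, the StringReplace route for all four characters could look like the sketch below. EscapeMarkup is a hypothetical name, not an RTL function. The one detail that matters is the order: '&' has to be replaced first, otherwise the ampersands of the entities inserted by the later calls get escaped again. It costs four scans of the string, which is the inefficiency mentioned above.

```pascal
program EscapeWithStringReplace;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical helper name; not part of the Delphi RTL. }
function EscapeMarkup(const S: string): string;
begin
  { '&' goes first so that the '&' of the entities added by the
    following calls is not escaped a second time. }
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
end;

begin
  { prints: a &lt; b &amp; &quot;c&quot; &gt; d }
  WriteLn(EscapeMarkup('a < b & "c" > d'));
end.
```

Running the calls in the opposite order would turn '<' into '&amp;lt;', which is easy to miss in a quick test.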

--
Rune

Tjipke A. van der Plaats

Jul 31, 2001, 10:06:06 AM
[Conversation moved to borland.public.delphi.internet.websnap]

> > Why do you not use StringReplace?
> >
> > s := StringReplace(s, '>', '&gt;', [rfReplaceAll]);
>
> It's not very efficient, now is it?
>
> You'll have to replace at least '<', '>', '&' to mention three literals.
> That's three string scans! I believe you should also replace '"' (&quot;)
> just to be on the safe side.

Exactly what I thought!

I already wrote my own function, but this is such general functionality that I
thought it could/should be somewhere in WebSnap. But it looks like
WebSnap itself also uses StringReplace sometimes...

Just to let you know, my function is now as follows:

function EncodeForXML(const aString: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(aString) do
  begin
    case aString[i] of
      '&': Result := Result + '&amp;';
      '>': Result := Result + '&gt;';
      '<': Result := Result + '&lt;';
      '"': Result := Result + '&quot;'
    else
      Result := Result + aString[i]
    end;
  end;
end;

I know that this 'appending one char at a time' is not very efficient, but
good enough for my case.


Regards,

Danny Heijl

Jul 31, 2001, 3:35:25 PM
http://codecentral.borland.com/codecentral/ccweb.exe/listing?id=16272

The HTMLEncode and URLencode functions might be what you are looking for.

URLEncode escapes all "unsafe" characters for use in URL Query parameters.
HTMLEncode handles &<>" and all characters > 127 for use in HTML.
Should be more efficient than StringReplace.

Danny
---

"Tjipke A. van der Plaats" <in...@tiriss.com> wrote in message
news:3b668121_1@dnews...

Tjipke A. van der Plaats

Aug 2, 2001, 3:52:26 AM
> http://codecentral.borland.com/codecentral/ccweb.exe/listing?id=16272
>
> The HTMLEncode and URLencode functions might be what you are looking for.
>
> URLEncode escapes all "unsafe" characters for use in URL Query parameters.
> HTMLEncode handles &<>" and all characters > 127 for use in HTML.

Thanks!

But I see it is: Provided as freeware under the GPL licence...

Doesn't GPL mean that if I use it my own source also becomes GPL? Then I
can't use the function in commercial software, can I?
I've already written my own, but yours also handles #160 and higher; mine
doesn't yet. However, wouldn't converting #160 to &#160; instead of to
&nbsp; also do the trick?

I also think that this function should be somewhere in the internet stuff
delivered with Delphi (and should have a name without HTML because I also
use it for XML ;-)

Regards,

Tjipke

John Kaster (Borland)

Aug 3, 2001, 12:54:48 AM
"Tjipke A. van der Plaats" wrote:
> Does anyone know if there is a function somewhere in Delphi that converts
> strings with special symbols like "&", ">" and "<" to "&amp;", "&gt;" and
> "&lt;" to use in html/xml code?

Sounds like a good candidate for a derivative of HTTPEncode.

However, here's one I wrote for YAPP, along with notes:

{
<!ENTITY nbsp CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent CDATA "&#162;" -- cent sign -->
<!ENTITY pound CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect CDATA "&#167;" -- section sign -->
<!ENTITY uml CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not CDATA "&#172;" -- not sign -->
<!ENTITY shy CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg CDATA "&#174;" -- registered sign -->
<!ENTITY macr CDATA "&#175;" -- macron -->
<!ENTITY deg CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2 CDATA "&#178;" -- superscript two -->
<!ENTITY sup3 CDATA "&#179;" -- superscript three -->
<!ENTITY acute CDATA "&#180;" -- acute accent -->
<!ENTITY micro CDATA "&#181;" -- micro sign -->
<!ENTITY para CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil CDATA "&#184;" -- cedilla -->
<!ENTITY sup1 CDATA "&#185;" -- superscript one -->
<!ENTITY ordm CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN CDATA "&#222;" -- capital THORN, Icelandic -->
<!ENTITY szlig CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
<!ENTITY ucirc CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml CDATA "&#255;" -- small y, dieresis or umlaut mark -->
}
function THTML.Escape(const sText: string): string;
var
  i, l : integer;
begin
  l := Length( sText );
  Result := '';
  for i := 1 to l do
    case sText[ i ] of
      '<' : Result := Result + '&lt;';
      '>' : Result := Result + '&gt;';
      '&' : Result := Result + '&amp;';
      '"' : Result := Result + '&quot;';
      #92,
      #160 .. #255 :
        { a numeric character reference must end with ';' }
        Result := Result + '&#' + IntToStr( Ord( sText[ i ] ) ) + ';';
    else
      Result := Result + sText[ i ]
    end; { case }
end; { THTML.Escape() }


Adding one char at a time isn't as inefficient as you might think, if I
remember how strings are actually handled in Delphi.

--
John Kaster, Borland Developer Relations, http://community.borland.com
$1150/$50K: Thanks to my donors!
http://homepages.borland.com/jkaster/tnt/thanks.html
Buy Kylix! http://www.borland.com/kylix * Got source?
http://codecentral.borland.com
The #1 Java IDE: http://www.borland.com/jbuilder


Danny Heijl

Aug 3, 2001, 4:55:24 PM
As far as I am concerned you can use the code as you like.

I will remove the reference to the GPL. It is too restrictive.

Danny
---

Rune Moberg

Aug 4, 2001, 6:09:50 AM
"John Kaster (Borland)" wrote:
> #160 .. #255 : Result := Result + '&#' + IntToStr( Ord( sText[ i ] )

IMO that's not solving much. As long as you specify the character set in
your header, you shouldn't need to encode your 8 bit characters. (I've
never seen a browser strip out the high bit?) If you haven't set the
right character set, then encoding 'Æ' as anything else besides &AElig;
might prove futile (if the other party has something else as default
than ISO-8859-1).

I don't know what the XML standard has to say about 8-bit characters
though. (It's hard to be authoritative with these crossposts! <g>)

--
Rune

Danny Heijl

Aug 4, 2001, 9:07:01 AM
> Adding one char at a time isn't as inefficient as you might think, if I
> remember how strings are actually handled in Delphi.

This would depend on how you define "inefficient".
IntToStr calls FmtBuf calls NewAnsistring calls GetMem for every character.
And the concatenation calls LStrCatn calls Realloc etc...

Appending characters to a buffer with pointer manipulation needs just a
couple of assembler instructions with no calls at all. The Delphi compiler
generates *very* good code for this kind of loop.

Long reference-counted strings made string manipulation extremely easy, but
they have their price, especially when concatenating in loops.

Sorry if I sound pedantic, but one of the reasons I like Delphi so much is
the speed I can get out of it when speed is what I need, and the ease of
programming it gives me when speed is not so important.

Danny
---


John Kaster (Borland)

Aug 4, 2001, 12:18:07 PM
Danny Heijl wrote:
> Sorry if I sound pedantic, but one of the reasons I like Delphi so much is
> the speed I can get out of it when speed is what I need, and the ease of
> programming it gives me when speed is not so important.

Sorry if I sound pragmatic, but try the code.

Danny Heijl

Aug 5, 2001, 9:39:35 AM

"John Kaster (Borland)" <jka...@borland.com> wrote in message
news:3B6C203F...@borland.com...

>
> Sorry if I sound pragmatic, but try the code.
>

I did (code and test data below).

The result with D6: my HTMLEncode routine is consistently 150 times faster
than your code on my PIII 800.

This might not be relevant to you, but it may make a difference for a loaded
webserver.

Danny
---

**********************************
Test unit ( a form with Button1, Label1 and Label2) :
**********************************
uses webutil, stringconcatpatch;

procedure TForm1.Button1Click(Sender: TObject);
var
  sText: string;
  sResult: string;
  i, j: integer;
  m: TMemoryStream;
  ts, te: TDateTime;
begin
  Button1.Enabled := False;
  try
    { load the test data into a string }
    m := TMemoryStream.Create;
    try
      m.LoadFromFile('file1.txt');
      SetString(sText, PChar(m.Memory), m.Size);
    finally
      m.Free;
    end;
    { test 1: the concatenation + IntToStr line on its own }
    ts := Now;
    for j := 1 to 1000 do begin
      sResult := '';
      for i := 1 to Length(sText) do
        sResult := sResult + '&#' + IntToStr(Ord(sText[i]));
    end;
    te := Now;
    { TDateTime is in days, so multiply by 24 * 3600 to get seconds }
    Label1.Caption := Format('%3.9g', [(te - ts) * (24 * 3600)]);
    { test 2: HTMLEncode from webutil }
    ts := Now;
    for j := 1 to 1000 do begin
      sResult := HTMLEncode(sText);
    end;
    te := Now;
    Label2.Caption := Format('%3.9g', [(te - ts) * (24 * 3600)]);
  finally
    Button1.Enabled := True;
  end;
end;
**********************************
The contents of file1.txt :
**********************************
Dès le premier janvier 2002, lors du passage à l'Euro, les taxes prélevées
par
la Région wallonne augmenteront. Leur taux sera arrondi à la hausse en Euro.

Cette décision a été votée par le Parlement wallon au cours de sa dernière
séance avant les vacances. Pour la Région wallonne, il s'agit d'indexer ses
taxes en profitant de la conversion à l'Euro. Mais, précise le ministre
wallon
Charles Michel, qui a la tutelle sur les Communes, il n'est pas question que
les Communes wallonnes augmentent les taxes sur leurs habitants, quand on
passera à l'Euro.


Les Communes ne peuvent pas, elles, profiter du passage à l'Euro pour
augmenter
leurs taxes, ce sont les taxes régionales, c'est la Région wallonne,
l'Institution régionale wallonne qui va, elle, profiter du passage à l'Euro
pour permettre l'augmentation infime de certaines taxes.

Par contre, les Communes, elles, ne peuvent absolument pas profiter du
passage
à l'Euro pour augmenter leurs taxes. Les choses sont extrêmement claires
pour les Communes. Pour les Communes, on engage la règle fiscale depuis
1998,
les Communes sont forcées de respecter les normes qui sont fixées par la
Région
wallonne et qui empêchent les Communes de taxer trop les contribuables
wallons.

Rien n'est changé par rapport à ça. Par conséquent, au sein des Communes, il
n'y a pas le moindre changement dans le cadre du passage à l'Euro. Par
exemple,
le précompte immobilier relève des Communes. Et donc il ne fait aucun doute
que
l'on n'y touche pas. Idem pour le précompte professionnel.
**********************************

Danny Heijl

Aug 5, 2001, 9:51:42 AM

"Danny Heijl" <danny...@pandora.be> wrote in message
news:3b6d4c96_1@dnews...

Sorry, it is not 150 times, but only 31 times.
I initially isolated the slowest line of your code when looking at string
concatenation in the CPU window and did the speed comparison with that line.
Corrected version that uses your function below.
31 times is still a lot though.

Danny
---


>
> The result with D6: my HTMLEncode routine is consistently 150 times faster
> than your code on my PIII 800.

> Danny
> ---
>
> **********************************
> Test unit ( a form with Button1, Label1 and Label2) :
> **********************************

{$R *.dfm}

uses webutil, stringconcatpatch;

function HTMLEscape(const sText: string): string;
var
  i, l : integer;
begin
  l := Length( sText );
  Result := '';
  for i := 1 to l do
    case sText[ i ] of
      '<' : Result := Result + '&lt;';
      '>' : Result := Result + '&gt;';
      '&' : Result := Result + '&amp;';
      '"' : Result := Result + '&quot;';
      #92,
      #160 .. #255 :
        { a numeric character reference must end with ';' }
        Result := Result + '&#' + IntToStr( Ord( sText[ i ] ) ) + ';';
    else
      Result := Result + sText[ i ]
    end; { case }
end; { HTMLEscape }

procedure TForm1.Button1Click(Sender: TObject);
var
  sText: string;
  sResult: string;
  i, j: integer;
  m: TMemoryStream;
  ts, te: TDateTime;
begin
  Button1.Enabled := False;
  try
    m := TMemoryStream.Create;
    try
      m.LoadFromFile('file1.txt');
      SetString(sText, PChar(m.Memory), m.Size);
    finally
      m.Free;
    end;
    ts := Now;
    for j := 1 to 1000 do begin
      sResult := HTMLEscape(sText);

John Kaster (Borland)

Aug 8, 2001, 3:37:00 AM
Danny Heijl wrote:
> 31 times is still a lot though.

Agreed. Is this the string concatenation issue, or the concern you had
before, with:

"IntToStr calls FmtBuf calls NewAnsistring calls GetMem for every
character."

I was talking about the string concatenation, not the IntToStr call per
se. Have you posted your HTMLEncode routine? Would be interesting to see
how my routine performs if length were estimated in advance, then
resized at the end. I've used that on other string output routines that
had to be performant. For this one (YAPP) document generation
performance wasn't something I was too concerned about yet, since it
wasn't really for a production system.
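
A rough sketch of the "estimate the length in advance, then resize at the end" idea, for anyone who wants to time it. This is not the CodeCentral HTMLEncode routine and not YAPP code; EscapePreallocated and MaxExpansion are made-up names, and the characters above #159 are left out to keep it short. It reserves the worst case once (every character expanding to the six characters of '&quot;'), writes by index, and trims the unused tail with a final SetLength.

```pascal
program PreallocEscape;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Hypothetical name; a sketch, not the CodeCentral routine. }
function EscapePreallocated(const S: string): string;
const
  MaxExpansion = 6; { '"' becomes '&quot;', the longest replacement }
var
  i, j, k: Integer;
  E: string;
begin
  { Reserve the worst case once instead of growing per character. }
  SetLength(Result, Length(S) * MaxExpansion);
  j := 0; { number of characters written so far }
  for i := 1 to Length(S) do
  begin
    case S[i] of
      '&': E := '&amp;';
      '<': E := '&lt;';
      '>': E := '&gt;';
      '"': E := '&quot;';
    else
      E := ''; { no replacement needed }
    end;
    if E = '' then
    begin
      Inc(j);
      Result[j] := S[i];
    end
    else
      for k := 1 to Length(E) do
      begin
        Inc(j);
        Result[j] := E[k];
      end;
  end;
  { Trim the unused tail. }
  SetLength(Result, j);
end;

begin
  { prints: if a &lt; b then s := &quot;x &amp; y&quot;; }
  WriteLn(EscapePreallocated('if a < b then s := "x & y";'));
end.
```

The trade-off is memory: the buffer is temporarily up to six times the input. Counting the exact length in a first pass avoids that at the cost of scanning the input twice.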

Danny Heijl

Aug 11, 2001, 5:19:40 AM
> Have you posted your HTMLEncode routine?

http://codecentral.borland.com/codecentral/ccweb.exe/listing?id=16272

Danny
---

