XML - HTML Entities


חגי רצבי

Jul 15, 2020, 8:32:35 AM
to akomantoso-xml
Hi Everyone!

Do you write XML/HTML entities in your AKN XML files, or do you use the original literal characters?

On the one hand, the law should be kept as it is written in the source.
On the other hand, some characters cannot be written directly because they make the XML invalid, like a greater-than sign > in the middle of a sentence, etc.

For example:

This, as written in the source:
<p>Dryopteris pallida (Borry) C.Chr. ex Maire & Petitm.</p>

Or this:
<p>Dryopteris pallida (Borry) C.Chr. ex Maire &amp; Petitm.</p>

If you use XML/HTML entities, how do you declare them in your XML?

Best regards,
Hagay

Fabio Vitali

Jul 15, 2020, 9:31:48 AM
to akomant...@googlegroups.com
Dear Hagay,

XML predefines exactly five entities:

* < (&lt;)
* > (&gt;)
* & (&amp;)
* ' (&apos;)
* " (&quot;)

As such, you need not declare them in any form. If a document does declare them, the XML specification requires the declaration to match the predefined meaning.

All other entities must be declared in a DTD fragment, if you really need to use them.

E.g.:

> <?xml version="1.0" ?>
> <!DOCTYPE akomaNtoso [
> <!ENTITY nbsp "&#160;">
> ]>

Yet, if the only use of entities is to define mnemonics for characters, I suggest using UTF-8 and just writing the real characters.
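
A minimal sketch of the two options above, using Python's standard `xml.etree.ElementTree`. The choice of Python, the sample `<p>` content, and the helper name `parses` are assumptions made for illustration, and the Akoma Ntoso namespace is left out for brevity. It parses one document that declares `nbsp` in an internal DTD subset and one that simply contains the real U+00A0 character, then checks that both yield the same text.

```python
import xml.etree.ElementTree as ET

# Document 1: declares the nbsp entity in an internal DTD subset, then uses it.
with_entity = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE akomaNtoso [\n'
    '  <!ENTITY nbsp "&#160;">\n'
    ']>\n'
    '<akomaNtoso><p>Art.&nbsp;5</p></akomaNtoso>'
)

# Document 2: no DTD, the no-break space (U+00A0) is written as a real character.
with_real_char = '<akomaNtoso><p>Art.\u00a05</p></akomaNtoso>'

# Document 3: uses &nbsp; without declaring it anywhere.
undeclared = '<akomaNtoso><p>Art.&nbsp;5</p></akomaNtoso>'


def parses(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


text_from_entity = ET.fromstring(with_entity).find("p").text
text_from_real_char = ET.fromstring(with_real_char).find("p").text

# Both routes should produce the same string once parsed.
print(text_from_entity == text_from_real_char)
# An undeclared entity should be rejected by the parser.
print(parses(undeclared))
```

Since the parsed text is the same either way, writing the real character saves the DTD fragment entirely.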

Ciao

Fabio




--

Fabio Vitali
Dept. of Informatics
Univ. of Bologna ITALY
phone: +39 051 2094872
e-mail: fa...@cs.unibo.it
http://vitali.web.cs.unibo.it/

The sage and the fool
go to their graves
alike in this respect:
both believe the sage to be a fool.
Where, then, may wisdom be found?
Qi, "Neither Yes nor No", The codeless code

חגי רצבי

Jul 15, 2020, 12:41:23 PM
to akomantoso-xml
Thanks Fabio!

Your recommendation is to just use the real UTF-8 characters.
But I don't understand: I can't just put & in the middle of an element's value, because it's not valid XML, no?
The AKN XML needs to be valid, right?

Thanks again!

On Wednesday, July 15, 2020 at 16:31:48 UTC+3, fvitali wrote:

Ashok Hariharan

Jul 15, 2020, 1:47:55 PM
to akomant...@googlegroups.com
On Wed, Jul 15, 2020 at 10:11 PM חגי רצבי <hr4...@gmail.com> wrote:
> Thanks Fabio!
>
> Your recommendation is to just use the real UTF-8 characters.
> But I don't understand: I can't just put & in the middle of an element's value, because it's not valid XML, no?
> The AKN XML needs to be valid, right?

You can't put "&" as is, because "&" is itself the escape character in XML, i.e., it introduces the escaped forms shown here:

* < (&lt;) 
* > (&gt;) 
* & (&amp;) 

E.g., suppose you want to show other UTF-8 characters:
A B Š Ñ
You can use those directly in a UTF-8 XML file, without any escaped form.

But the moment you want to use "&", ">" or "<" literally, you have to write it in the escaped form:
A B & Š Ñ

in the xml:
 
<xml>A B &amp; Š Ñ</xml>

The above is only the XML encoding, not the law as written in the source. The rendered form is the law as written in the source, and that is what matters:

A B & Š Ñ
 
Even a Word .docx document containing "& & &" encodes it like this:

<w:r><w:t>&amp; &amp; &amp;</w:t></w:r>
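
The round trip described above can be sketched in Python with the standard library. Python itself and the helper name `is_well_formed` are assumptions made for this illustration. A raw "&" is rejected by the parser, the serializer writes `&amp;` on its own, and parsing the result gives back the text as written in the source.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

source_text = "Dryopteris pallida (Borry) C.Chr. ex Maire & Petitm."


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Pasting the source text into markup by hand leaves a raw "&" in the XML.
raw_markup = "<p>" + source_text + "</p>"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_markup = "<p>" + escape(source_text) + "</p>"

# An XML library escapes automatically when it serializes element text.
p = ET.Element("p")
p.text = source_text
serialized = ET.tostring(p, encoding="unicode")

# Parsing turns &amp; back into "&", so the text reads as written in the source.
round_tripped = ET.fromstring(serialized).text

print(is_well_formed(raw_markup))
print(serialized)
print(round_tripped == source_text)
```

In practice this means the escaping never has to be typed by hand: set the text through an XML library and read it back through one.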

rgds

Ashok Hariharan



חגי רצבי

Jul 16, 2020, 4:20:24 AM
to akomantoso-xml
Ok friends, now I finally understand it!!

Fabio  & Ashok,
Thanks a lot!

On Wednesday, July 15, 2020 at 20:47:55 UTC+3, Ashok Hariharan wrote:

