Help: 'ELEMENT' definition
Newsgroups: comp.text.sgml
From: loiterer2 <adem.m...@gmail.com>
Date: Fri, 25 Apr 2008 14:09:57 -0700 (PDT)
Local: Fri, Apr 25 2008 5:09 pm
Subject: Help: 'ELEMENT' definition
Hi,

I would like to --well, at least, I am hoping I can-- do 2 things:

First, write a DTD parser. Then, concoct a DTD parser FAQ from a
hands-on point of view.

I have written a number of parsers, but they were all easy. SGML as
used in DTDs is proving to be much harder.

The fact that there's hardly any usable information on the Web does
not make things any easier. By 'usable', I mean stuff you can look up
and it tells you what is what in a language you can understand. From
this POV, books I have seen (not many) have been way over my head.

All I need is simple, example-oriented explanations. And, no, 'read the
code stupid' does not help either.

So, I decided to ask here --hoping people would help create such
documentation.

While still hoping, I'll list a few examples for 'ENTITY' definition:

<!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
head elements -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % list "UL | OL">
<!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of
media descriptors -->
<!ENTITY % preformatted "PRE">

Now, it's obvious that what follows '<!' is what we are defining. In
this case an 'ENTITY'.

Then.. we have a '%' sign...

I am assuming that it tells us that we are about to find a name
string.

I don't remember seeing any 'ENTITY' definitions that did not have '%'
as the next non-whitespace char.
So, I am assuming that '%' must be present.
Is that a correct assumption?
If not, what else can there be, and what do they mean?

After ''%'' char, next, we have a piece of non-whitespace string.

I am assuming it means 'name' of the 'ENTITY' we are defining.

Is that assumption correct, or could there be something else meaning
something else?
And, is it case-sensitive --I believe it isn't but I might as well
have it confirmed.

Then, we have all sorts of gobbledygook..

I am assuming these to be the value(s) that ENTITY can have.

I don't have much problem with those that are explicitly listed, but
what does "IGNORE", "CDATA" mean?

What other stuff can be there apart from "IGNORE", "CDATA", and what
do they mean?

Could you help clarify these, please?

[I'll come back with others :) ]


Newsgroups: comp.text.sgml
From: Peter Flynn <peter.n...@m.silmaril.ie>
Date: Sat, 26 Apr 2008 00:32:10 +0100
Local: Fri, Apr 25 2008 7:32 pm
Subject: Re: Help: 'ELEMENT' definition

loiterer2 wrote:
> Hi,

> I would like to --well, at least, I am hoping I can-- do 2 things:

> First, write a DTD parser. Then, concoct a DTD parser FAQ from a
> hands-on point of view.

> I have written a number of parsers, but they were all easy. SGML as
> used in DTDs is proving to be much harder.

That is called Declaration Syntax (as opposed to Document Syntax).
It's not hard, per se, just different.

> The fact that there's hardly any usable information on the Web does
> not make things any easier. By 'usable', I mean stuff you can look up
> and it tells you what is what in a language you can understand. From
> this POV, books I have seen (not many) have been way over my head.

ISO 8879 (the standard document) is a commercial product of the ISO.
You have to buy it, or buy Goldfarb's _SGML Handbook_.

> All I need is simple, example-oriented explanations. And, no, 'read the
> code stupid' does not help either.

The best guide to writing DTDs is "Developing SGML DTDs" by Maler and
El Andaloussi.

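> Now, it's obvious that what follows '<!' is what we are defining. In
> this case an 'ENTITY'.

Yes. The '<!' itself is called the MDO (Markup Declaration Open).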

> Then.. we have a '%' sign...

The Parameter Entity Reference Open (pero).

> I am assuming that it tells us that we are about to find a name
> string.

Not quite. It defines that the name being declared is a PE (Parameter
Entity -- one that can be used only in replacements in the DTD) as
opposed to a General Entity (which is used in the actual document).

> I don't remember seeing any 'ENTITY' definitions that did not have '%'
> as the next non-whitespace char.

That's because the only ones you have seen are PEs. Here are some
General Entities:

<!ENTITY IBM CDATA "International Business Machines">
<!ENTITY foobar SYSTEM "chapter1.sgm">

You use them in the text to refer to &IBM; or to include &foobar;.

> So, I am assuming that '%' must be present.
> Is that a correct assumption?

No. See pp 394-401 of Goldfarb, especially Productions 101-104.

> If not, what else can there be, and what do they mean?

The pero is only used for PEs. GEs don't have a symbol there, but they
may use the reserved string #DEFAULT (production 103).
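
For a parser writer, the distinction between PEs and GEs can be sketched
in Python roughly as follows. This is only an illustration, not the
grammar from the standard: the ENTITY_RE pattern and the parse_entity
function are hypothetical names, and the pattern assumes the simple
declaration shapes quoted in this thread (an optional '%', a name, an
optional one-word keyword such as CDATA or SYSTEM, one quoted literal,
and optional trailing comments). PUBLIC identifiers, notations, and the
other forms the standard allows are not handled.

```python
import re

# Hypothetical pattern: covers only the declaration shapes shown in this thread.
ENTITY_RE = re.compile(
    r"""
    <!ENTITY \s+
    (?P<pero> % \s+ )?                              # '%' then whitespace: parameter entity
    (?P<name> \#DEFAULT | [A-Za-z][A-Za-z0-9.-]* )  # entity name, or the reserved #DEFAULT
    \s+
    (?: (?P<keyword> [A-Za-z]+ ) \s+ )?             # optional keyword, e.g. CDATA or SYSTEM
    (?P<quote> ["'] ) (?P<text> .*? ) (?P=quote)    # the quoted literal
    (?: \s* -- .*? -- )*                            # trailing comments, if any
    \s* >
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_entity(decl):
    """Split one simple ENTITY declaration into its parts."""
    m = ENTITY_RE.match(decl.strip())
    if m is None:
        raise ValueError("unsupported declaration: %r" % decl)
    return {
        "kind": "parameter" if m.group("pero") else "general",
        "name": m.group("name"),
        "keyword": m.group("keyword"),  # None when the literal follows the name directly
        "text": m.group("text"),
    }


if __name__ == "__main__":
    print(parse_entity('<!ENTITY % heading "H1|H2|H3|H4|H5|H6">'))
    print(parse_entity('<!ENTITY IBM CDATA "International Business Machines">'))
    print(parse_entity('<!ENTITY foobar SYSTEM "chapter1.sgm">'))
```

Note that in '<!ENTITY % MediaDesc "CDATA">' the string CDATA is just the
quoted replacement text of a PE, whereas in the IBM example it is a
keyword standing before the literal; the sketch reports these differently
(text versus keyword).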

> After ''%'' char, next, we have a piece of non-whitespace string.

The entity name.

> I am assuming it means 'name' of the 'ENTITY' we are defining.

Yep.

> Is that assumption correct, or could there be something else meaning
> something else?

Nope.

> And, is it case-sensitive --I believe it isn't but I might as well
> have it confirmed.

This is defined in the SGML Declaration for the specific DTD. It can be
made case-sensitive or case-insensitive.
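
In the SGML Declaration this is the NAMECASE setting, which has separate
switches for general names (GENERAL) and entity names (ENTITY). The
reference concrete syntax, and the declaration used with HTML 4, say
NAMECASE GENERAL YES ENTITY NO: element names are folded to upper case,
entity names are left alone and so are case-sensitive. A parser might
model that along these lines. The normalize_name function and its
parameters are hypothetical names, and Python's upper() is a
simplification of the folding the declaration actually defines.

```python
def normalize_name(name, is_entity_name, namecase_general=True, namecase_entity=False):
    """Fold a name to upper case if the relevant NAMECASE switch is YES.

    The defaults mirror NAMECASE GENERAL YES ENTITY NO; a real parser
    would read both switches from the SGML Declaration in use.
    """
    fold = namecase_entity if is_entity_name else namecase_general
    return name.upper() if fold else name


if __name__ == "__main__":
    print(normalize_name("body", is_entity_name=False))     # element name: folded
    print(normalize_name("heading", is_entity_name=True))   # entity name: kept as written
```

With those defaults, %heading; and %Heading; would name two different
entities, while <body> and <BODY> name the same element type.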

> Then, we have all sorts of gobbledygook..

This is the entity text. In the case of PEs, this is usually a content
model fragment, consisting of element type names in the form used in
element declarations, allowing the parameter entity reference to be used
in constructing complex content models. But it can also be a parameter
literal and a bunch of other things (like the HTML.Frameset value, used
in switching features on and off).
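
Both uses come down to textual replacement of %name; references inside
the DTD, which can be sketched in Python like this. The expand function,
the PE_REF pattern, and the "block" entity are illustrative assumptions
(the other entity values are the ones quoted from the original post), and
the sketch always requires the closing ';', which the standard does not.

```python
import re

# Hypothetical pattern for a parameter entity reference such as %heading;
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def expand(text, entities, depth=0):
    """Recursively replace %name; references with their entity text."""
    if depth > 20:
        raise ValueError("entity references nested too deeply (possible cycle)")

    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undeclared parameter entity: " + name)
        # The replacement text may itself contain references.
        return expand(entities[name], entities, depth + 1)

    return PE_REF.sub(replace, text)


if __name__ == "__main__":
    entities = {
        "heading": "H1|H2|H3|H4|H5|H6",
        "list": "UL | OL",
        "preformatted": "PRE",
        "HTML.Frameset": "IGNORE",
        # Illustrative only: a PE built from other PEs.
        "block": "P | %heading; | %list; | %preformatted;",
    }
    # A content model fragment assembled from PEs:
    print(expand("(%block;)+", entities))
    # A marked section switched off because the PE expands to IGNORE:
    print(expand("<![ %HTML.Frameset; [ ... ]]>", entities))
```

In the second case the parser sees <![ IGNORE [ ... ]]> after expansion
and skips the enclosed declarations; redeclaring the entity as "INCLUDE"
would switch them on.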

> I am assuming these to be the value(s) that ENTITY can have.

No, you will have to read the standard to find out. It's 650pp.

> I don't have much problem with those that are explicitly listed, but
> what does "IGNORE", "CDATA" mean?

Too much to explain here. Read Eve Maler's book.

> What other stuff can be there apart from "IGNORE", "CDATA", and what
> do they mean?

Lots and lots.

> Could you help clarify these, please?

Could you please go and read the documentation first, then ask about
what more you need to know.

///Peter

