Attributes on body tag

markus

Sep 20, 2012, 10:07:03 AM
to tagsoup...@googlegroups.com
Hi,

I attempt to read Microdata attributes for all elements including the body. In any case, no attributes on the body tag are returned. In Apache Tika's child implementation of TagSoup's Parser I have instructed the schema to add the itemscope and itemtype attributes to the body tag:

        HTML_SCHEMA.attribute("body", "itemscope", "nmtoken", null);
        HTML_SCHEMA.attribute("body", "itemtype", "nmtoken", null);

I am, however, unsure what the token should be. If I were editing the html.tssl file I would configure them as NMTOKENs. I checked the source and saw that if the type is CDATA the attribute is not added, so I was thinking this should work :)

Any hints on how I can instruct TagSoup to pass those attributes for the body tag to my content handler implementation?

Many thanks,
Markus

John Cowan

Sep 20, 2012, 2:11:35 PM
to markus, tagsoup...@googlegroups.com
markus scripsit:

> I attempt to read Microdata attributes for all elements including the body.
> In any case, no attributes on the body tag are returned.

TagSoup does not discard any attributes, ever. The only effects of declaring
an attribute in the schema are:

1) So that TagSoup can return the correct attribute types to its caller.

2) So that leading, trailing, and multiple spaces are correctly handled.

3) So that default values are provided.

If you don't declare an attribute, but it's present anyway, it is reported
with type CDATA.
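
To check what actually reaches a content handler, something like the following sketch may help. It is not taken from Tika or TagSoup: the class and method names (BodyAttributeDump, BodyHandler, collect) are made up for illustration, and the JDK's own SAX parser stands in for the reader so the sketch runs without extra jars, which means the sample markup has to be well-formed XML. With TagSoup on the classpath, an instance of org.ccil.cowan.tagsoup.Parser (which is also an XMLReader) could presumably be passed to collect() instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class BodyAttributeDump {

    /** Hypothetical handler: records "type:value" for every attribute seen on body. */
    static class BodyHandler extends DefaultHandler {
        final Map<String, String> seen = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!"body".equalsIgnoreCase(qName)) {
                return;
            }
            for (int i = 0; i < atts.getLength(); i++) {
                // getType() is the SAX attribute type, e.g. "CDATA" or "NMTOKEN"
                seen.put(atts.getQName(i), atts.getType(i) + ":" + atts.getValue(i));
            }
        }
    }

    /** Parses the markup with the given reader and returns the body attributes. */
    static Map<String, String> collect(XMLReader reader, String markup) throws Exception {
        BodyHandler handler = new BodyHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader (assumption); swap in a TagSoup Parser to test real HTML.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String page = "<html><body itemscope=\"\" itemtype=\"http://schema.org/WebPage\">"
                + "<p>hi</p></body></html>";
        collect(reader, page).forEach((name, info) -> System.out.println(name + " -> " + info));
    }
}
```

If the attributes show up here but not in the downstream handler, the loss is probably happening in a layer between the parser and that handler rather than in the parser itself.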

--
Time alone is real              John Cowan <co...@ccil.org>
the rest imaginary
like a quaternion --phma        http://www.ccil.org/~cowan

markus

Sep 21, 2012, 4:35:33 AM
to tagsoup...@googlegroups.com, markus, co...@mercury.ccil.org
Thanks John! I'll deep further in Apache Tika.

markus

Sep 21, 2012, 4:36:16 AM
to tagsoup...@googlegroups.com, markus, co...@mercury.ccil.org
Dig deeper, of course!