Allowing block elements in inline elements


jomaras

Oct 1, 2010, 3:57:33 PM
to tagsoup-friends
Hi everybody,

Does anybody know how to make TagSoup allow block elements inside inline
elements? For example, www.google.com has:

<span id="main">
<div id="ghead">.....</div>
...
</span>

and TagSoup can't handle that: it just ignores the div in the span
element.

Also, how can I compile TagSoup?
HTMLModels, Scanner, and Schema give errors on constructs like:
@@MODEL_DEFINITIONS@@

Thank you!


James Abley

Oct 4, 2010, 3:04:42 PM
to tagsoup-friends, jom...@gmail.com
The current version builds fine for me.

$ unzip tagsoup-1.2-src.zip
$ cd tagsoup-1.2
$ export CLASSPATH=~/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar # Default Java on Snow Leopard isn't picking up a TransformerFactory
$ ant

 
HTML 4.x says <span/> should only contain inline [1] content. Potentially in this instance you want <span/>s promoted to <div/>s? It's a messy business.

HTML 5 "Says No" [2],[3] from my reading of a recent draft as well as from running code.

"Cases where the default styles are likely to lead to confusion

Certain elements have default styles or behaviors that make certain combinations likely to lead to confusion. Where these have equivalent alternatives without this problem, the confusing combinations are disallowed.

For example, div elements are rendered as block boxes, and span elements as inline boxes. Putting a block box in an inline box is unnecessarily confusing; since either nesting just div elements, or nesting just span elements, or nesting span elements inside div elements all serve the same purpose as nesting a div element in a span element, but only the latter involves a block box in an inline box, the latter combination is disallowed."

I'm not sufficiently familiar with HTML5 to say whether disallowed means silently dropped or an error happens.

Cheers,

John Cowan

Oct 4, 2010, 5:42:58 PM
to james...@gmail.com, tagsoup-friends, jom...@gmail.com
On Mon, Oct 4, 2010 at 3:04 PM, James Abley <james...@gmail.com> wrote:

> The current version builds fine for me.
> $ wget "http://home.ccil.org/~cowan/XML/tagsoup/tagsoup-1.2-src.zip"
> $ unzip tagsoup-1.2-src.zip
> $ cd tagsoup-1.2
> $ export CLASSPATH=~/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar #
> Default Java on Snow Leopard isn't picking up a TransformerFactory
> $ ant

Good. The key is to always use ant.

> I'm not sufficiently familiar with HTML5 to say whether disallowed means
> silently dropped or an error happens.

Neither. It means that HTML authors who intend to conform to HTML5
MUST NOT do something. It does not necessarily mean that the behavior
of an HTML5-conforming browser is to report an error, or even that the
behavior is not defined by HTML5.

jomaras

Oct 5, 2010, 1:39:20 AM
to tagsoup-friends
Thank you for your responses (not to mention thanks for the awesome
piece of software!). I've managed to get it working.

I know that nesting block elements inside inline elements is not
allowed, but that doesn't stop people from using such constructs,
since browsers allow it. In my case I needed that markup to go
through the parser, so I had to enable it in TagSoup.
Maybe it wouldn't be a bad idea to have an option in TagSoup that
would allow people to parse these "div" inside "span", or "div" inside
"p", constructs.

Again, thank you for your time!


On Oct 4, 23:42, John Cowan <co...@ccil.org> wrote:

Brian Harcourt

Aug 5, 2012, 8:44:01 PM
to tagsoup...@googlegroups.com
Any chance you could share exactly how you enabled blocks within inlines in TagSoup?
I'm struggling with parsing HTML and often find the content I want buried in these malformed docs.
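
For anyone landing here with the same question: the thread never records the actual change, so what follows is only a sketch of the idea, not jomaras's fix. As far as I understand it, a schema-driven parser like TagSoup decides nesting with bit masks: each element type has a content model (the groups it may contain) and a set of groups it belongs to, and a child is accepted when the two overlap. The toy model below does not call TagSoup at all; the class ContentModelSketch, the helper allowBlocksIn, and the M_INLINE / M_BLOCK values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentModelSketch {
    // Hypothetical group flags; these are NOT TagSoup's real constants.
    static final int M_INLINE = 1 << 0;
    static final int M_BLOCK  = 1 << 1;

    /** Toy element type: the groups it may contain and the groups it belongs to. */
    static final class ElementType {
        final String name;
        int model;          // groups this element is allowed to contain
        final int memberOf; // groups this element is a member of

        ElementType(String name, int model, int memberOf) {
            this.name = name;
            this.model = model;
            this.memberOf = memberOf;
        }

        /** A child fits when its groups overlap the parent's content model. */
        boolean canContain(ElementType child) {
            return (model & child.memberOf) != 0;
        }
    }

    /** A strict schema: span holds only inline content, div holds both kinds. */
    static Map<String, ElementType> strictSchema() {
        Map<String, ElementType> schema = new HashMap<>();
        schema.put("span", new ElementType("span", M_INLINE, M_INLINE));
        schema.put("div", new ElementType("div", M_INLINE | M_BLOCK, M_BLOCK));
        return schema;
    }

    /** Widen one element's content model so it also accepts block children. */
    static void allowBlocksIn(Map<String, ElementType> schema, String name) {
        schema.get(name).model |= M_BLOCK;
    }

    public static void main(String[] args) {
        Map<String, ElementType> schema = strictSchema();
        ElementType span = schema.get("span");
        ElementType div = schema.get("div");

        System.out.println(span.canContain(div)); // false: div is not inline
        allowBlocksIn(schema, "span");
        System.out.println(span.canContain(div)); // true: span now accepts blocks
    }
}
```

In TagSoup itself the element definitions appear to come from a schema definition file that the ant build expands into the Java sources (which would explain why the @@...@@ placeholders fail when compiling without ant), so the equivalent change would presumably be made there and followed by a rebuild with ant.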