bug - string content of attribute being tokenized by tagsoup


Ihe Onwuka

Mar 28, 2014, 10:45:13 AM
to tagsoup...@googlegroups.com
Steps to recreate

run tagsoup on this page


and look at either or both of the following content attributes:

<meta property="og:description" content="

or

<meta name="twitter:description" content="

In both cases the value of the content attribute appears to be tokenized on whitespace instead of being kept as one string.

John Cowan

Mar 28, 2014, 11:57:01 AM
to Ihe Onwuka, tagsoup...@googlegroups.com
Ihe Onwuka scripsit:

> <meta property="og:description" content="
>
> or
>
> <meta name="twitter:description" content="

The element looks like this:

<meta property="og:description" content="Synopsis: Ed "Big Daddy" Roth was a genius of outlaw art who took America's obsession with all that is fast, loud, and streamlined and built it into an empire. In the 1950s, Roth was a hot-rodder who moved from bodywork and helping guys fine-tune the look" />

TagSoup does not know how to cope with double quotes inside a
double-quoted attribute value. The double quote before "Big" terminates
the attribute value, and the remaining text is assumed to be attributes
without values. Unfortunately, there is no fix or workaround for this.
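
To make the mechanism concrete, here is a rough sketch of a naive attribute scanner that behaves the way described above: the first matching quote closes the value, and whatever follows is read as bare attribute names. This is not TagSoup's actual code; `parse_attributes` is a hypothetical helper written only for illustration, and the sample string is a shortened version of the element quoted above.

```python
def parse_attributes(tag_body):
    """Naively split the inside of a start tag into (name, value) pairs.

    Illustration only (not TagSoup's implementation). A quote character
    always closes a quoted value; no attempt is made to guess whether an
    embedded quote was meant literally.
    """
    attrs = []
    i, n = 0, len(tag_body)
    while i < n:
        # Skip whitespace and the "/" of a self-closing tag.
        while i < n and (tag_body[i].isspace() or tag_body[i] == "/"):
            i += 1
        if i >= n:
            break
        # Read an attribute name up to whitespace or "=".
        start = i
        while i < n and not tag_body[i].isspace() and tag_body[i] != "=":
            i += 1
        name = tag_body[start:i]
        value = None  # None marks an attribute without a value
        if i < n and tag_body[i] == "=":
            i += 1
            if i < n and tag_body[i] in "\"'":
                # Quoted value: ends at the very next matching quote.
                quote = tag_body[i]
                end = tag_body.find(quote, i + 1)
                if end == -1:
                    end = n
                value = tag_body[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: ends at the next whitespace.
                start = i
                while i < n and not tag_body[i].isspace():
                    i += 1
                value = tag_body[start:i]
        attrs.append((name, value))
    return attrs


# Shortened form of the <meta> element from this thread (tag name omitted).
sample = 'property="og:description" content="Synopsis: Ed "Big Daddy" Roth was a genius" /'

for name, value in parse_attributes(sample):
    print(repr(name), repr(value))
```

With this input the scanner reports content as just "Synopsis: Ed " and then treats Big, Daddy", Roth and so on as separate attributes with no value, which matches the whitespace-delimited "tokens" reported in the original message.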

--
John Cowan    co...@ccil.org    http://ccil.org/~cowan
A few times, I did some exuberant stomping about, like a hippo auditioning
for Riverdance, though I stopped when I thought I heard something at the
far side of the room falling over in rhythm with my feet. --Joseph Zitt