Parsing HTML email replies generates multiple bodies

Peter Stibrany

Nov 22, 2010, 10:10:33 AM
to tagsoup-friends
Hello,

I am having trouble parsing HTML email replies. For example, this
is part of a reply generated by Thunderbird (edited):

<html>
<body bgcolor="#ffffff" text="#000000">
On 18.11.2010 16:12, Peter Stibrany wrote:
<blockquote cite="..." type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<title></title>
<table bgcolor="#ffffff" cellspacing="10" width="100%">
...
</table>
</blockquote>
</body>
</html>

The problem is that when TagSoup finds the meta and title elements within
the blockquote, it terminates the body element and generates new head and
body elements:

<html>
<body bgcolor="#ffffff" text="#000000">
On 18.11.2010 16:12, Peter Stibrany wrote:
<blockquote cite="..." type="cite">
</blockquote>
</body>

<head>
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<title></title>
</head>

<body>
<table bgcolor="#ffffff" cellspacing="10" width="100%">
...

Is it possible to disable this "body-splitting"? I would like to keep
a single body element; I don't mind having meta and title within the blockquote.

Thank you,
-Peter

Peter Stibrany

Jan 21, 2011, 4:20:00 AM
to tagsoup-friends
Bumping this thread ... hope never dies, and I'd prefer to use TagSoup
if there is a way to get around this problem.

Thanks,
-Peter
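
The thread does not show a TagSoup setting that turns this behavior off, so
what follows is only a possible workaround, not a TagSoup feature. It is a
minimal sketch that pre-filters the message before parsing: everything after
the first opening body tag is scanned, and any meta tags or title elements
found there are dropped, so the parser never sees the elements that trigger
the split. The class name HeadElementFilter and the method
stripHeadElementsInBody are hypothetical, and only java.util.regex is used.
Note that this removes the quoted meta and title rather than keeping them
inside the blockquote, and that a regex pass can misfire on unusual input
(for example, tag-like text inside comments or scripts).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-filter for HTML email replies; not part of TagSoup. */
final class HeadElementFilter {

    // First opening <body ...> tag, matched case-insensitively.
    private static final Pattern BODY_OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // <meta ...> tags and <title>...</title> elements. DOTALL lets the
    // title text span line breaks; [^>]* already matches newlines.
    private static final Pattern HEAD_ONLY = Pattern.compile(
            "<meta\\b[^>]*>|<title\\b[^>]*>.*?</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private HeadElementFilter() {}

    /**
     * Returns the input with meta and title elements removed from the part
     * that follows the first opening body tag. Anything before that tag,
     * including a genuine head section, is left untouched.
     */
    static String stripHeadElementsInBody(String html) {
        Matcher body = BODY_OPEN.matcher(html);
        if (!body.find()) {
            return html; // no body tag found: leave the input as it is
        }
        int split = body.end();
        String prefix = html.substring(0, split);
        String rest = html.substring(split);
        return prefix + HEAD_ONLY.matcher(rest).replaceAll("");
    }
}
```

The filtered string could then be wrapped in a StringReader and handed to the
parser in place of the original message. Whether that is enough to keep a
single body element in TagSoup's output would need to be checked against the
actual messages.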