Robert Sanders
Jul 25, 2012, 4:21:41 PM
to tagsoup...@googlegroups.com
Hi,
I've been tasked with converting some legacy HTML into an XML format for import into a CMS. The problem is that some metadata is encoded in non-compliant "tags" left over from whatever was used to generate the HTML in the first place (and whoever was doing that is long gone). And when I say "non-compliant" I don't mean the "bogons", I mean stuff like:
<body>
<!!uid 5usc500>
<h2>This is the page title</h2>
So, is there any relatively simple way to transform the <!!-type tags into something like an element, processing instruction, or comment? If not, I'll probably end up just scanning for them using a regexp and then parsing the rest of the HTML using TagSoup.
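
For reference, here is a rough sketch of the regexp fallback I have in mind, written in Python only to keep it short. It assumes every marker looks like the one above, that is "<!!", a name, an optional value, then ">". The names MARKER and markers_to_comments are made up for illustration, and turning the markers into comments is just one of the three options mentioned. Whether comments survive the later parsing step depends on how the parser is set up.

```python
import re

# Assumed marker shape: "<!!", a name, an optional whitespace-separated value, ">".
MARKER = re.compile(r"<!!(\w+)(?:\s+([^>]*?))?\s*>")


def markers_to_comments(html):
    """Rewrite each <!!name value> marker as an ordinary HTML comment."""
    def repl(match):
        name = match.group(1)
        value = match.group(2) or ""
        # "--" is not allowed inside a comment body, so break it up.
        text = f"{name} {value}".strip().replace("--", "- -")
        return f"<!-- {text} -->"

    return MARKER.sub(repl, html)


sample = "<body>\n<!!uid 5usc500>\n<h2>This is the page title</h2>"
print(markers_to_comments(sample))
```

The cleaned-up string would then go to the HTML parser as usual. The pattern uses only basic regexp syntax, so it should carry over to other regexp engines with little or no change.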