On Mon, Dec 03, 2012 at 01:49:08AM -0800, Benito2313 wrote:
> What is it that you're trying to do? HTML is an XML dialect, after
> all (or can be, if XHTML). You should be able to parse it with all
> XML tools.
>
> My program handles XML files.
> I can see the source code of the HTML when I open it in Notepad. How can I
> tell if it is XHTML?
I just checked the HTML output from Tesseract. It is XHTML, so it is
a proper dialect of XML. You can tell from the <?xml declaration on the
first line, plus the doctype and xmlns on the following lines.
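
If you would rather check this from a program than by eye, here is a
rough sketch in Python using only the standard library. The function
name looks_like_xhtml and the sample document are my own illustration,
not real Tesseract output; the idea is simply that an XHTML file should
parse with an ordinary XML parser and have an <html> root element in
the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace URI that XHTML documents declare via xmlns on <html>.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def looks_like_xhtml(data: bytes) -> bool:
    """Return True if data is well-formed XML with an XHTML <html> root.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML, so an XML tool cannot read it as-is.
        return False
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS


# Made-up sample with the three markers mentioned above:
# the <?xml declaration, the doctype, and the xmlns attribute.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head><title></title></head>
 <body><p>hello</p></body>
</html>"""

if __name__ == "__main__":
    print(looks_like_xhtml(sample))
```

To try it on a real file, read the file in binary mode and pass the
bytes to the function. If it returns True, your existing XML code
should be able to load the file too, though you may need to take the
namespace into account when looking up elements.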
Nick