On 11/27/2017 04:02 PM, emacstheviking wrote:
> Jan, Carlo,
> Thanks for spotting that. I didn't know about load_structure/3 showing
> errors. I've not used any of these predicates before. I took a closer
It is more that load_html/3 suppresses them, as invalid HTML is more the
rule than the exception. Most of the time the resulting tree is still
close to the intent of the author though. This is a rather bad
> look at the HTML being returned from the site, it's *awful*. Firefox
> complains about the closing HEAD tag so that's not a good start is it!
> I will have to find a means to isolate the table markup in the returned
> string and then either:
> - clean up the mess by replacing illegal markup with better markup
It finds the table ok (despite all the rest), but corrupts the table
itself due to illegal quotes.
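
A minimal sketch of the difference, assuming SWI-Prolog with library(sgml)
and library(xpath). The predicate names quiet_dom/2, noisy_dom/2 and
first_table/2 are made up for illustration, and Source stands for a file
name or a stream(In) term holding the page:

```prolog
:- use_module(library(sgml)).
:- use_module(library(xpath)).

% Parse quietly: load_html/3 suppresses syntax errors by default.
quiet_dom(Source, DOM) :-
    load_html(Source, DOM, []).

% Parse noisily: load_structure/3 prints a warning for each problem
% it finds. max_errors(-1) keeps it going however many there are.
noisy_dom(Source, DOM) :-
    load_structure(Source, DOM, [dialect(html), max_errors(-1)]).

% Pick the first <table> element anywhere in the DOM.
first_table(DOM, Table) :-
    once(xpath(DOM, //table, Table)).
```

Running noisy_dom/2 on the saved page should list the complaints that
load_html/3 hides. first_table/2 only locates the element and does
nothing to repair its content.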
> - write a set of DCG rules to extract what I want.
> I don't mind, it's all good practice which is why I am subjecting myself
> to this anyway!
If the target is just this site, a dedicated DCG hack may get what you
want. In general this is hard though, and I'd probably see whether there
are tools out there that implement the official HTML5 error handling and
can write valid HTML.
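
As a rough idea of what such a DCG hack could look like, here is a sketch
that pulls the text out of every <td>...</td> pair and skips everything in
between. It assumes SWI-Prolog 7 with library(dcg/basics), lowercase tags,
and no ">" inside attribute values. The names cells//1 and table_cells/2
are made up for illustration, and the site's real markup may need more
special cases:

```prolog
:- use_module(library(dcg/basics)).

% cells(-Cells): collect the text of each <td ...>...</td> in the input.
cells([Cell|Cells]) -->
    string(_), "<td", string_without(`>`, _), ">",   % skip to next cell, ignore attributes
    string(Codes), "</td>", !,                       % shortest text up to the closing tag
    { string_codes(Cell, Codes) },
    cells(Cells).
cells([]) -->
    remainder(_).                                    % no more cells: discard the rest

% table_cells(+Html, -Cells): Html is an atom or string with the markup.
table_cells(Html, Cells) :-
    string_codes(Html, Codes),
    phrase(cells(Cells), Codes).
```

For example, table_cells("<tr><td>a</td><td class='x'>b</td></tr>", Cells)
gives Cells = ["a", "b"]. Because the attributes are skipped as raw text,
broken quoting inside them does not matter to this approach.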
It seems this code was written by someone not familiar with the notion of
code injection :(
Cheers --- Jan
> Thanks again all,
> On 27 November 2017 at 14:25, Carlo Capelli <cc.car...@gmail.com
> Hi Sean
> I think the HTML structure is ill formed (well, Jan already spotted
> it while I was writing this).
> In Firefox, just show the source to get red coloring of bad structures.
> 2017-11-27 14:49 GMT+01:00 Sean Charles <obj...@gmail.com
> You received this message because you are subscribed to the Google
> Groups "SWI-Prolog" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to swi-prolog+...@googlegroups.com