Thanks for the reply. This HTML stuff is downright evil. In case anyone
else has a similar problem, I ended up working around the lack of
following-sibling by using a few patterns, as there were at most 2
headings and 4 tables per page. That is, if Headings are the headings
above the tables and Tables are the tables, then
build_tables([], [], Out) ->
    Out;
build_tables([H1, H2], [T1, T2], Out) ->
    %% do some cleanup of Hs and Ts
    build_tables([], [], [{H1, [T1]}, {H2, [T2]} | Out]);
%% This clause has to come before the general one below, which would
%% otherwise also match a single heading with two tables.
build_tables([H1], [T1, T2], Out) ->
    %% do some cleanup of Hs and Ts
    build_tables([], [], [{H1, [T1, T2]} | Out]);
build_tables([H1 | MoreH], [T1, T2 | MoreT], Out) ->
    %% do some cleanup of Hs and Ts
    build_tables(MoreH, MoreT, [{H1, [T1, T2]} | Out]);
build_tables([H1], [T1], Out) ->
    %% do some cleanup of Hs and Ts
    build_tables([], [], [{H1, [T1]} | Out]).
which seems to do the job, but I was hoping for a more generalised way
of doing this.
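One possible generalisation, sketched here only as an untested idea: instead of collecting the headings and the tables into two separate lists and pairing them by count, walk the sibling nodes once in document order and attach each table to the most recent heading. This assumes the headings and tables are siblings in the same children list, and that nodes have the {Tag, Attrs, Children} shape with binary tag names that mochiweb_html:parse/1 produces. The module name group_tables and the function group/1 are hypothetical, not part of mochiweb.

```erlang
-module(group_tables).
-export([group/1]).

%% Hypothetical helper (not part of mochiweb). Walks a list of sibling
%% nodes in document order and attaches each table to the most recent
%% heading, so any number of headings and tables is handled.
%% Assumes element nodes look like {Tag, Attrs, Children} with binary
%% tag names; any other node (text, comments, ...) is skipped.
group(Siblings) ->
    group(Siblings, undefined, [], []).

%% Cur: heading currently being filled (undefined before the first one).
%% Ts:  tables collected for Cur, in reverse order.
%% Acc: finished {Heading, Tables} pairs, in reverse order.
group([], Cur, Ts, Acc) ->
    lists:reverse(flush(Cur, Ts, Acc));
group([{Tag, _Attrs, _Children} = Node | Rest], Cur, Ts, Acc) ->
    case is_heading(Tag) of
        true ->
            %% A new heading closes the previous group.
            group(Rest, Node, [], flush(Cur, Ts, Acc));
        false when Tag =:= <<"table">> ->
            group(Rest, Cur, [Node | Ts], Acc);
        false ->
            group(Rest, Cur, Ts, Acc)
    end;
group([_Other | Rest], Cur, Ts, Acc) ->
    group(Rest, Cur, Ts, Acc).

%% Emit the current group. Nothing is emitted before the first heading
%% unless tables appeared there, in which case they are kept under
%% the heading 'undefined'.
flush(undefined, [], Acc) ->
    Acc;
flush(Heading, Ts, Acc) ->
    [{Heading, lists:reverse(Ts)} | Acc].

is_heading(Tag) ->
    lists:member(Tag, [<<"h1">>, <<"h2">>, <<"h3">>,
                       <<"h4">>, <<"h5">>, <<"h6">>]).
```

You would call group/1 on the children list of whichever element contains both the headings and the tables. If they live in different containers, the tree would need flattening first, which this sketch does not attempt.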
Thanks for writing mochiweb_xpath. It (with mochiweb_html) does make
processing scraped web pages easier.
Jeff.