HTML Parser in Haxe


Gautam Jain

Aug 30, 2015, 11:21:10 AM
to Haxe
I am looking for a way to parse a webpage and extract data from it in Haxe.
This is one of the pages I want to parse: http://nptel.ac.in/courses.php?disciplineId=106

Currently my application, Borg, is built with Adobe AIR, and I use StageWebView to get the data from webpages,
but I want to port it to Haxe.
The lack of support for newer AIR runtimes makes it difficult to run my application on Linux,
and the Android version of Borg carries an extra 30 MB for the captive runtime.

You can get more details about my application at http://gautamji.com/blog/borg/

Victor / tokiop

Aug 30, 2015, 12:48:30 PM
to haxe...@googlegroups.com
Hi,

The most up-to-date Haxe HTML parser seems to be the one maintained by
Yaroslav Sivakov: https://bitbucket.org/yar3333/haxe-htmlparser

For quick and dirty scraping, regexes can be less tedious than trying to
parse a random html page.
http://www.matthijskamstra.nl/blog/2015/07/24/scraping-with-haxe/
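
A minimal sketch of the regex approach, assuming the links of interest are plain <a href="..."> tags with double-quoted attributes and no nested markup. The sample markup and the names ScrapeLinks, Link and extractLinks are made up for illustration; they are not taken from the linked post or from any real page. It uses only Haxe's built-in EReg, so it should work on any target.

```haxe
// Hypothetical helper type: one scraped link.
typedef Link = {
    var href:String;
    var text:String;
}

class ScrapeLinks {
    // Collects the href and the text of every simple <a> tag in `html`.
    public static function extractLinks(html:String):Array<Link> {
        var links:Array<Link> = [];
        // "i" makes the match case-insensitive.
        var re = ~/<a\s[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>/i;
        var rest = html;
        // Match the first link, then continue in the text to its right.
        while (re.match(rest)) {
            links.push({href: re.matched(1), text: StringTools.trim(re.matched(2))});
            rest = re.matchedRight();
        }
        return links;
    }

    static function main() {
        // Made-up sample markup, not copied from a real page.
        var html = '<ul><li><a href="/a">Course A</a></li>'
            + '<li><a class="x" href="/b"> Course B </a></li></ul>';
        for (link in extractLinks(html)) {
            trace(link.href + " -> " + link.text);
        }
    }
}
```

A pattern like this breaks as soon as the markup uses single quotes, unquoted attributes, or tags nested inside the link, which is the usual trade-off of scraping with regexes.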

If you only target js/nodejs platforms, you could use JavaScript itself to
parse the HTML more reliably.

A tolerant "spaghetti-html-parser" or "tidy-html5" implementation in Haxe
would help.

Victor


Gautam Jain

Aug 30, 2015, 1:47:22 PM
to haxe...@googlegroups.com
Thanks Victor!
What about quaxe?
Does anyone know when quaxe will be available? I think it has a standards-compliant HTML parser in it too.




szczepan

Aug 30, 2015, 4:46:41 PM
to Haxe

Gautam Jain

Sep 1, 2015, 6:57:30 AM
to haxe...@googlegroups.com
Thank you. I will definitely look into that.
Other than that, I need to download files and write them to disk as soon as the bytes are available. Currently I am using AIR's URLStream and FileStream for this purpose.
How can I get the same result in Haxe?
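
One possible approach on the sys targets (neko, cpp, java and similar), sketched under assumptions: haxe.Http has a customRequest method that writes the response body into any haxe.io.Output, so passing it a file output from sys.io.File.write should stream the download to disk in chunks instead of buffering it all in memory. The class name Download and the function toFile are hypothetical. The call blocks until the transfer finishes, and HTTPS support depends on the target and Haxe version.

```haxe
import sys.io.File;

class Download {
    // Streams the response body of `url` into the file at `path`.
    // Blocking; sys targets only (not Flash or JS).
    public static function toFile(url:String, path:String):Void {
        var out = File.write(path, true); // true = binary mode
        var http = new haxe.Http(url);
        var error:String = null;
        // customRequest reports failures through onError.
        http.onError = function(msg) error = msg;
        // false = GET; the body is written to `out` as it is read.
        http.customRequest(false, out);
        out.close();
        if (error != null) throw "Download failed: " + error;
    }

    static function main() {
        var args = Sys.args();
        if (args.length != 2) {
            Sys.println("usage: Download <url> <output file>");
            return;
        }
        toFile(args[0], args[1]);
    }
}
```

To keep a UI responsive, a call like this would normally run in a background thread; that part is left out here.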


Sam MacPherson

Sep 1, 2015, 3:41:55 PM
to Haxe
The dom4 library has a decent parser in it: https://github.com/therealglazou/dom4

I use it for my haxe-dom project.

Gautam Jain

Sep 1, 2015, 3:56:50 PM
to Haxe
Cool, I'll look into that. I had started porting jsoup to Haxe, but maybe I won't need to. Thanks :)

Gautam Jain

Sep 2, 2015, 2:53:30 AM
to Haxe
Sam, could you please post some sample code here?



Sam MacPherson

Sep 2, 2015, 3:53:33 PM
to Haxe
There is an example in the dom4 library: https://github.com/therealglazou/dom4/blob/master/test/Test.hx

Lines 55-57 show how to parse an HTML document. You can then use the document variable just like in the browser.

Victor / tokiop

Sep 3, 2015, 4:41:01 PM
to haxe...@googlegroups.com
On 01/09/2015 21:56, Gautam Jain wrote:
> Cool. I'll look into that, I had started porting jsoup
> <https://github.com/jhy/jsoup> to Haxe, but maybe I won't need to. thanks :)

Porting jsoup is interesting, how is it going?

Dom4 looks like a high-quality XML/XHTML parser, but it is not very tolerant
of malformed HTML; for example, an unclosed <link> tag breaks it.

Could Dom4 become as tolerant as browsers, or is that not its goal?
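
To illustrate the strictness problem in general terms, here is a small sketch that uses Haxe's built-in Xml.parse as a stand-in for a strict XML-style parser. It does not use dom4, and the class name StrictParseDemo and the function parsesAsXml are made up. A void element such as <link> that is legal HTML without a closing tag is expected to be rejected by a strict parser, whereas a browser accepts it.

```haxe
class StrictParseDemo {
    // Returns true if Haxe's standard XML parser accepts `markup`.
    public static function parsesAsXml(markup:String):Bool {
        try {
            Xml.parse(markup);
            return true;
        } catch (e:Dynamic) {
            // The parser throws on mismatched or unclosed tags.
            return false;
        }
    }

    static function main() {
        // XHTML style: the void element is self-closed.
        var closed = '<head><link rel="stylesheet" href="a.css"/></head>';
        // HTML style: the same element without a closing slash.
        var unclosed = '<head><link rel="stylesheet" href="a.css"></head>';
        trace("self-closed link accepted: " + parsesAsXml(closed));
        trace("unclosed link accepted: " + parsesAsXml(unclosed));
    }
}
```

A tolerant HTML parser has to know which elements are void and recover from mismatched tags, which is the kind of behaviour jsoup and browsers implement.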

Gautam Jain

Sep 6, 2015, 3:44:54 AM
to Haxe
I think Dom4 is meant for building UIs in quaxe, for which it works fine.
But for HTML scraping, porting a library like jsoup will be a better solution.
I have started porting it over to Haxe; so far so good.

Do you know of a unified way to play videos (mp4, flv) using Haxe? Currently I need support for Windows, Mac and Linux.