GWT useful for HTML Parsing? Is it fast enough?


myapplicationquestions

Apr 18, 2010, 10:16:51 AM
to Google Web Toolkit
Hi All,

I have a requirement where I get a huge HTML report from a third-party
application (about 50,000 records), and I need to filter it so that I
show only the records matching certain criteria.

I am thinking of using GWT for this: I would walk the DOM and check
each row, and if a record does not match my criteria, I would delete
its TR using GWT.

Is this the correct approach? Is GWT fast enough for this?

Thanks,
Parag


Chris Lercher

Apr 18, 2010, 10:48:50 AM
to Google Web Toolkit
Hi Parag,

I would use GWT for many things, but in this case, I would probably
decide between

- Using pure JavaScript (should usually be enough to do this.)
- Using jQuery selectors, if it gets more complex.
- Or write a quick standalone Java app which parses the HTML (using
htmlunit, or NekoHTML directly), walks the DOM, and writes the result
to a file. (I don't know whether you have to integrate the solution
directly in a browser.)
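
To make the standalone Java option concrete, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the report is well-formed XHTML with lowercase tag names, which a real third-party report may well not be; for tag soup, an HTML parser such as the ones named above would have to take the place of `DocumentBuilder`. The class name `ReportFilter`, the method `filterRows`, and the "OPEN"/"CLOSED" sample values are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse, walk the DOM, drop rows, serialize.
final class ReportFilter {

    /** Removes every <tr> the predicate rejects and returns the remaining markup. */
    static String filterRows(String xhtml, Predicate<Element> keep) {
        try {
            // Assumption: the input is well-formed; a strict XML parser fails otherwise.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            NodeList rows = doc.getElementsByTagName("tr");
            // The NodeList is live, so iterate backwards: removing a row
            // never shifts the index of a row that has not been visited yet.
            for (int i = rows.getLength() - 1; i >= 0; i--) {
                Element row = (Element) rows.item(i);
                if (!keep.test(row)) {
                    row.getParentNode().removeChild(row);
                }
            }

            // Serialize the filtered DOM back to text with an identity transform.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(text));
            return text.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or serialize report", e);
        }
    }

    public static void main(String[] args) {
        String report = "<table><tr><td>OPEN</td></tr><tr><td>CLOSED</td></tr></table>";
        // Keep only the rows whose text contains "OPEN".
        System.out.println(filterRows(report, row -> row.getTextContent().contains("OPEN")));
    }
}
```

To write the result to a file instead of a string, the `StreamResult` could wrap a `java.io.File` rather than a `StringWriter`.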

Or do you plan on using some special feature of GWT for your tool?

Chris



Thomas Broyer

Apr 18, 2010, 7:01:05 PM
to Google Web Toolkit


On Apr 18, 4:48 pm, Chris Lercher <cl_for_mail...@gmx.net> wrote:
> Hi Parag,
>
> I would use GWT for many things, but in this case, I would probably
> decide between
>
> - Using pure JavaScript (should usually be enough to do this.)

Why would it be better than GWT?

> - Using jQuery selectors, if it gets more complex.

GwtQuery gives you the same, in GWT.

> - Or write a quick standalone Java App which parses the HTML (using
> htmlunit, or NekoHTML directly), walk the DOM, and write it to a file.

Yes, this is an actual alternative (I'd rather use
htmlparser.validator.nu though for the HTML parsing)

Using GWT or another JS toolkit is not the correct answer when the
question is "can GWT do it?" or "is GWT fast enough?": GWT can do
whatever JS can do, and speed is mainly a browser issue.

Chris Lercher

Apr 18, 2010, 8:09:47 PM
to Google Web Toolkit
Hi Thomas,

I agree. I just don't see any advantage for GWT in this case. So I'd
say that using it only makes sense if there are other reasons that
weren't expressed in the question.

By the way, GWT uses NekoHTML, too (it's in gwt-dev.jar). Why do you
prefer the HTML parser you mentioned?

Chris


On Apr 19, 1:01 am, Thomas Broyer <t.bro...@gmail.com> wrote:
> > - Using pure JavaScript (should usually be enough to do this.)
>
> Why would it be better than GWT?
>
> > - Using jQuery selectors, if it gets more complex.
>
> GwtQuery gives you the same, in GWT.
>
> > - Or write a quick standalone Java App which parses the HTML (using
> > htmlunit, or NekoHTML directly), walk the DOM, and write it to a file.
>
> Yes, this is an actual alternative (I'd rather use
> htmlparser.validator.nu though for the HTML parsing)

Why? GWT uses Neko, too...

Chris Lercher

Apr 18, 2010, 8:17:23 PM
to Google Web Toolkit
BTW, re-reading my original answer, maybe it was misleading. It was
not my intention to suggest that GWT is slower than the other
methods (even if it sounded that way). I just wanted to say that for
this problem it doesn't look like the most natural choice.

Thomas Broyer

Apr 19, 2010, 5:05:31 AM
to Google Web Toolkit


On Apr 19, 2:09 am, Chris Lercher <cl_for_mail...@gmx.net> wrote:
> Hi Thomas,
>
> I agree. I just don't see any advantage for GWT in this case. So I'd
> say, that using it only makes sense, if there are other reasons, which
> weren't expressed in the question.
>
> By the way, GWT uses NekoHTML, too (it's in gwt-dev.jar). Why do you
> prefer the HTML parser you mentioned?

Because it implements the HTML5 parsing rules, an algorithm that was
written to predictably parse web pages as found "in the wild", with
results that are as close as possible to what browsers do today (when
they disagree, a choice obviously had to be made), and which browsers
are implementing today for their next versions. Moreover, this
particular implementation is AFAIK the one that ships in Firefox (not
as the default parser for now, but it will be soon), after being
translated to C++ (by a script); it is also the one used to back the
HTML5 validator at validator.nu and validator.w3.org.
So I tend to trust its results more than those of any other "tag soup
parser".
(Oh, and for the record, htmlparser.validator.nu has successfully been
compiled with GWT! ;-) )
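
A small illustration of why a parser written for pages "in the wild" matters: the sketch below feeds two fragments to the JDK's strict XML parser. The second fragment omits its `td` and `tr` end tags, which browsers accept, yet the strict parser rejects it. The class and method names are hypothetical, and the JDK XML parser only serves as a contrast here; it is not one of the parsers discussed in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper that reports whether markup survives strict XML parsing.
final class StrictParseCheck {

    /** Returns true only if the markup is well-formed XML. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser found a well-formedness error, e.g. a missing end tag.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }

    public static void main(String[] args) {
        // Every element is closed explicitly.
        System.out.println(parsesAsXml("<table><tr><td>a</td></tr></table>"));
        // End tags for td and tr are omitted, as HTML allows.
        System.out.println(parsesAsXml("<table><tr><td>a<tr><td>b</table>"));
    }
}
```

An HTML5-style parser is designed to build a usable tree from the second fragment as well, which is what a report coming from a third-party application may require.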

BTW, GWT doesn't "use neko", it uses HTMLUnit (which happens to use
NekoHTML as its parser).

Chris Lercher

Apr 19, 2010, 5:53:35 AM
to Google Web Toolkit
Thanks, very interesting. I set a bookmark.


