Text extraction

Sebastian Sieber

Nov 22, 2009, 11:44:10 AM
Dear all,

I am developing a Firefox (FF) extension that extracts text from websites
and saves it to a file. The extension reads the URL from an XML file and
opens a new tab with the website. After traversing the DOM tree, it
saves the text as a text file.
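A minimal sketch of the traversal step, assuming the goal is to collect the page's visible text while skipping the contents of `script` and `style` elements. It is not the code behind the pastebin link; `extractText`, `pageText` and the skip list are hypothetical. It uses only the standard DOM properties `nodeType`, `nodeName`, `nodeValue` and `childNodes`, so it can also be tried outside the browser on plain objects. Writing the result to a file is left out.

```javascript
// Hypothetical helper (not from the original extension): collect the text
// of a DOM subtree by walking it recursively in document order.
const TEXT_NODE = 3;    // value of Node.TEXT_NODE
const ELEMENT_NODE = 1; // value of Node.ELEMENT_NODE

// Assumed list of elements whose contents are code or styling, not page text.
const SKIPPED = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

function extractText(node, parts = []) {
  if (node.nodeType === TEXT_NODE) {
    // Keep non-empty text, trimmed of surrounding whitespace.
    const text = node.nodeValue.trim();
    if (text) parts.push(text);
  } else if (node.nodeType !== ELEMENT_NODE ||
             !SKIPPED.has(node.nodeName.toUpperCase())) {
    // Recurse into the document node and into every non-skipped element.
    for (const child of Array.from(node.childNodes || [])) {
      extractText(child, parts);
    }
  }
  return parts;
}

// Join the collected fragments into one string, one fragment per line.
function pageText(doc) {
  return extractText(doc).join("\n");
}
```

In the extension, `pageText(doc)` would be called with the loaded document; for a page whose body holds `<p> Hello </p><script>…</script><p>world</p>` it yields the two lines `Hello` and `world`.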

Source: http://pastebin.com/m549d31f3

My problem: the event listener (see the loadPage function) calls the
getText function at least two or three times. Usually the first call
already happens when the tab is opened, and the last one when the page
is fully loaded. Why?

I need your help!
Thanks in advance!

Cheers,
Sebastian

Zorkzero

Nov 22, 2009, 7:40:25 PM
On 22 Nov., 17:44, Sebastian Sieber <sebastian.sie...@gmail.com>
wrote:

> My problem: the getText function is called at least two or three times
> by the event listener (see loadPage function). Usually already the
> first time when the tab is opened, the last time when the page is
> fully loaded. Why?

I don't know why, but I use the code from here:
https://developer.mozilla.org/en/Code_snippets/Tabbed_browser#Detecting_page_load,
and it works. I use the "DOMContentLoaded" event instead of the "load"
event.
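Zorkzero's suggestion can be sketched as follows, assuming the listener is attached to the tabbed browser (`gBrowser` in Firefox chrome code, passed in here as `browser`) and that `onPageReady` is a hypothetical callback such as the extension's `getText`. `event.originalTarget` is a Gecko-specific event property giving the document that fired the event. As the next message in the thread reports, this change alone still fires once per frame.

```javascript
// Hypothetical wrapper around the "listen for DOMContentLoaded" pattern.
// `browser` stands in for gBrowser; `onPageReady` receives each document.
function watchPageLoads(browser, onPageReady) {
  function listener(event) {
    // Gecko-specific: the document whose DOM has just been parsed.
    const doc = event.originalTarget;
    onPageReady(doc);
  }
  // Third argument false: register for the bubbling phase, not capturing.
  browser.addEventListener("DOMContentLoaded", listener, false);
  // Return a function that detaches the same listener again.
  return () => browser.removeEventListener("DOMContentLoaded", listener, false);
}
```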

Sebastian Sieber

unread,
Nov 23, 2009, 7:04:53 AM
On Nov 23, 1:40 am, Zorkzero <zorkz...@hotmail.com> wrote:
> On 22 Nov., 17:44, Sebastian Sieber <sebastian.sie...@gmail.com>
> wrote:
>
> > My problem: the getText function is called at least two or three times
> > by the event listener (see loadPage function). Usually already the
> > first time when the tab is opened, the last time when the page is
> > fully loaded. Why?
>
> I don't know why, but I use the code from here: https://developer.mozilla.org/en/Code_snippets/Tabbed_browser#Detecti...,
> and it works. I use the "DOMContentLoaded" event instead of the "load"
> event.

I also tried the code from the page you are referring to. Same
problem.

But at least I now know why it calls the function more than once. On a
very simple webpage it works perfectly, but if the page contains, e.g.,
iframes, an event fires whenever any of them finishes loading. If the
page is manipulated by JavaScript, this happens very often.
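One way to act only on the tab's own page is to ignore documents loaded into frames. A sketch, assuming the handler receives the document that fired the event (for example `event.originalTarget`); `isTopLevelDocument` is a hypothetical helper. A frame's window has a `top` property that points at the outermost window, so only the top-level document satisfies `win === win.top`.

```javascript
// Hypothetical guard: true only for the outermost document of a tab.
// Documents inside <iframe>/<frame> elements have a window whose `top`
// is a different (outer) window.
function isTopLevelDocument(doc) {
  if (!doc || doc.nodeName !== "#document") return false; // not a document
  const win = doc.defaultView; // the window that displays this document
  if (!win) return false;      // detached document, no window
  return win === win.top;
}
```

A load handler would then start with `if (!isTopLevelDocument(event.originalTarget)) return;` before calling `getText`.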

If someone has a solution, I would appreciate it!

Neil

Nov 23, 2009, 5:29:56 PM
Sebastian Sieber wrote:

>My problem: the getText function is called at least two or three times by the event listener (see loadPage function).
>

You're adding a capturing load listener, so it also receives the load
events of objects contained in the page. (At least you're turning off
images so they don't get in your way, but other objects, such as
frames, have load events too!)
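Neil's point can be illustrated with a handler that discards load events whose target is not a document. A sketch, assuming the listener is registered in the capturing phase as in `gBrowser.addEventListener("load", onLoad, true)`; `makeDocumentLoadHandler` and `onDocumentLoaded` are hypothetical names. Documents loaded into iframes still pass this check, so it would need a frame test as well to fire exactly once per tab.

```javascript
// Hypothetical factory for a capturing "load" listener. Because the listener
// captures, it also sees load events fired by <img>, <iframe>, <object>, etc.
// inside the page; this handler forwards only events targeting a document.
function makeDocumentLoadHandler(onDocumentLoaded) {
  return function onLoad(event) {
    const target = event.originalTarget; // Gecko-specific: the real target
    if (!target || target.nodeName !== "#document") {
      return; // an image, frame element, plugin object, ...: ignore it
    }
    onDocumentLoaded(target);
  };
}
```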

--
Warning: May contain traces of nuts.
