segfaults and bus errors

francisco treacy

Apr 6, 2010, 7:23:26 AM

to libxmljs

On Apr 3, 7:59 am, Marco Rogers <marco.rog...@gmail.com> wrote:
> Problems with repl - In the last thread we had[1], I said i was
> getting Bus Errors and Segfaults when testing various functions. This
> seems to be a problem with the node-repl environment vs actual running
> node rather than a problem with libxmljs. I'm going to investigate
> further. But you should be aware that the results of testing the
> module in the repl don't represent true behavior.

Jeff is busy right now. He might be able to tackle these bugs in a
couple of weeks. Meanwhile, Marco, are you taking it over? There is a
leak_stop branch which you're probably aware of.

As I said in the node.js ml thread, I consistently see segfaults and
bus errors within a running node program. So I would confirm: it's not
an issue with the REPL.

My use case is: I am opening an epub file, reading its TOC, getting
some file path data (via xpath), opening and parsing HTML from another
bunch of files, getting more data via xpath... this is when things get
problematic with medium-sized files.
I am able, however, to open a small epub file and process it
correctly.

So it definitely has something to do with the amount of XML / XPath
processing done within a single node program.

Please let me know what other information / data / code I can provide.

See also last comments
http://github.com/sprsquish/libxmljs/issues#issue/18

Francisco

Marco Rogers

Apr 7, 2010, 1:19:47 AM

to libxmljs

I know Jeff created the leak_stop branch but I haven't looked at it
and I'm not sure where he left it. I'm not taking over from Jeff. I've
just been in touch with him about contributing because I'm using
libxmljs pretty extensively for something I'm doing. So I'm doing
what I can on a fork and sending him pull requests.

I think you're right that the segfaults increase greatly with larger
files. I also find problems when using the lib extensively for long
periods. For instance I was trying to run benchmarks and was parsing
and using a very small xml file in a loop. At a certain point the
loop craps out and the program segfaults.

I'm definitely not the best person to be tackling this issue. But I'm
giving it a shot. Thanks for the info and I'll keep you posted on
anything noteworthy.

:Marco

Marco Rogers

Apr 7, 2010, 1:21:07 AM

to libxmljs

FYI here's my fork.

http://github.com/polotek/libxmljs

I've got a few updates there you might be interested in. I've got
pull requests in to Jeff. But obviously he's busy :)

:Marco

francisco treacy

Apr 9, 2010, 6:11:23 AM

to libx...@googlegroups.com

Hey Marco,

Just checked out and tried your fork (version 61d1a5).

It still behaves exactly as the older libxmljs versions from Jeff.

The html files are about 50kb. This works:

for (...) {
  var doc2 = libxml.parseHtmlFile(htmlFile);
  var body = doc2.get('//body');
}

However this:

for (...) {
  var doc2 = libxml.parseHtmlFile(htmlFile);
  var body = doc2.get('//body');
  sys.puts(body);
}

...blows up (segmentation fault) when I print out (or even just access)
the body node, after many iterations. It happens consistently after
~20 iterations in the forEach loop.

Hope this helps?

Francisco


francisco treacy

Apr 9, 2010, 6:38:39 AM

to libx...@googlegroups.com

ps: the body.toString() and body.text() methods cause the segfault.
If I call body.name(), it works just fine.

francisco treacy

Apr 9, 2010, 9:36:01 AM

to libx...@googlegroups.com

Update: *very strange* ... looks like it works now.

In the beginning of the file I had an unnecessary import of express.

var express = kiwi.require('express');

Commenting that out makes the program run without any trouble.
Hmmm...this goes beyond my understanding :)

Francisco

francisco treacy

Apr 9, 2010, 1:28:05 PM

to libx...@googlegroups.com

Yet another update:

Getting rid of the express require has helped. However, when processing
more than one book at a time (per node process), I *still* get these
kinds of errors.
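
One way to narrow this down is to process each book in its own node process, so that a crash in one book does not take the others down and the terminating signal gets recorded. What follows is only a minimal sketch under stated assumptions: it uses child_process.spawnSync from current Node.js (not available in the 0.1.x releases discussed in this thread), and runIsolated, scriptPath and args are hypothetical names, not part of libxmljs.

```javascript
var spawnSync = require('child_process').spawnSync;

// Hypothetical helper: run one script in a separate node process and
// report how that process ended. `args` are passed through to the script,
// for example the path of a single book to process.
function runIsolated(scriptPath, args) {
  var result = spawnSync(process.execPath, [scriptPath].concat(args || []), {
    encoding: 'utf8'
  });
  return {
    ok: result.status === 0,  // true only for a clean exit
    status: result.status,    // exit code, or null if killed by a signal
    signal: result.signal,    // for example 'SIGSEGV' or 'SIGBUS', else null
    stdout: result.stdout
  };
}
```

Calling runIsolated once per book (with a hypothetical worker script and the book path as its argument) and logging the `signal` field shows which book, if any, ends in a segfault or bus error.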

Marco Rogers

Apr 9, 2010, 11:55:29 PM

to libxmljs

Thanks for the specific info, Francisco. I know that express has a lot
of code in it and some of it modifies the global scope outside of the
modules. I'm sure that's not the ultimate source of our woes though.
Keep the info coming. I'll be spending significant time on this over
the weekend.

:Marco

Marco Rogers

Apr 10, 2010, 4:10:50 AM

to libxmljs

Okay, I've got a little test setup here that reproduces the issue, or
at least it used to. With this test, I'm ONLY getting seg faults with
node_g. With node proper it runs fine and I get no seg faults
anymore. I can run 100+ iterations. Memory still goes up: I've
run 1000 iterations and the memory went to 500+MB, but it didn't
crash. That's progress :)

The question is what's different between my set up and yours. Here is
the full gist of what I'm doing.

git://gist.github.com/361906.git

A few notes about this script:

- I've got some code at the top that makes some changes for
  compatibility with node 0.1.90. If you're using 0.1.33 it shouldn't
  affect anything, but if it does, you can remove this.
- I've got 2 ways of running the test.
  - looptest runs the iterations in one loop without pausing or
    yielding to the event loop. This is more like your original test.
    The memory stays higher, but it still works.
  - eventlooptest runs each iteration as a separate callback on the
    next round of the event loop. This is more efficient and is really
    how this type of stuff should be done. You might notice that the
    memory stays lower because the garbage collector has time to run
    between iterations. But it's not significantly slower.
- Both functions take a filename, a number of iterations and an optional
  parameter which is a wait time. At the end of the test, it will
  wait for the specified number of milliseconds and then print memory
  usage again. This is just to see if the garbage collector catches up
  after the load test.
- Don't run both tests at once. Just comment out either the call to
  looptest() or eventlooptest() at any given time.
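
For readers without access to the gist, the two styles can be sketched roughly as below. This is not the gist's code, only an illustration under assumptions: work() is a hypothetical stand-in for the real parse-and-query step, the `done` callback is an addition for illustration, and setImmediate is the current Node.js way to yield to the event loop (the node versions in this thread predate it).

```javascript
// Hypothetical stand-in for the real per-iteration work. In this thread it
// would be something like libxml.parseHtmlFile(file).get('//body').
function work(file) {
  return { file: file, text: new Array(1024).join('x') };
}

// Resident set size of this process in whole megabytes.
function rssMB() {
  return Math.round(process.memoryUsage().rss / (1024 * 1024));
}

// Print memory now, then again after waitMs, so that a late garbage
// collection pass shows up. `done` is an optional illustrative callback.
function finish(label, count, waitMs, done) {
  waitMs = waitMs || 0;
  console.log(label + ': ' + count + ' iterations, rss ' + rssMB() + ' MB');
  setTimeout(function () {
    console.log(label + ': after ' + waitMs + ' ms, rss ' + rssMB() + ' MB');
    if (done) done(count);
  }, waitMs);
}

// Style 1: all iterations in one synchronous loop, never yielding.
function looptest(file, iterations, waitMs, done) {
  for (var i = 0; i < iterations; i++) {
    work(file);
  }
  finish('looptest', iterations, waitMs, done);
}

// Style 2: one iteration per turn of the event loop.
function eventlooptest(file, iterations, waitMs, done) {
  var i = 0;
  function step() {
    if (i >= iterations) {
      finish('eventlooptest', i, waitMs, done);
      return;
    }
    work(file);
    i++;
    setImmediate(step); // yield before the next iteration
  }
  step();
}
```

With the real parsing call substituted for work(), something like eventlooptest('sample.html', 1000, 5000) would run 1000 iterations and print memory usage again five seconds later; 'sample.html' is a placeholder file name.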

:Marco
