I'm looking for pointers on what to dig into next. I suspect
something horrible is happening in node.io, specifically htmlparser.js.
I am using node.io to scrape
http://www.reuters.com/article/2012/03/30/utilities-southern-kemper-idUSL2E8EUAHQ20120330?feedType=RSS&feedName=utilitiesSector
and I get a segfault each time.
Information:
valgrind & strace output: http://pastebin.com/McidkC0g
System info: Linux 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:09:10 EDT 2010 i686 athlon i386 GNU/Linux
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3034       2809        224          0        531       1593
-/+ buffers/cache:        684       2349
Swap:         2047          0       2047
$ node -v
v0.6.15
$ npm -v
1.1.18
node.io scrape options:
var scrapeOptions = {
    silent: true,
    jsdom: true, // enable parsing of js files
    external_resources: ['script'],
    timeout: 10, // Timeout after 10 seconds
    max: 1,
    retries: 3 // Threads can retry 3 times before failing
};
FWIW, I also had this error in node.js v0.4.11, and one of the first
steps I took was to upgrade node.js, npm and relevant npm modules.