[NEWBIE] Traversing a list of links


Bruce Axtens

Apr 17, 2015, 10:38:10 AM
To: phan...@googlegroups.com
Given the following code, why do I get lots of these errors?

QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files
2015-04-17T22:35:47 [FATAL] QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe

I'm closing the files after I create them, after all.



var fs = require('fs');

function innerCall(nam, ref) {
    var ipage = require("webpage").create();
    ipage.settings.userAgent = 'RosettaCodeSlurper';
    ipage.open(ref, function(status) {
        console.log('Finished inner');
        var description = ipage.evaluate(function() {
            var start = document.getElementsByClassName("infobox")[0];
            var cursor = start.nextElementSibling;
            var desc = "";
            while (cursor.tagName !== "TABLE" && cursor.id !== "toc") {
                there = cursor;
                desc = desc + there.innerText + "\n";
                cursor = cursor.nextElementSibling;
            }
            return desc;
        });
        var fileName = nam + ".txt";
        fileName = fileName.replace(/\//g, "_");
        var h = fs.open(fileName, 'w');
        h.write(description);
        h.flush();
        h.close();
        ipage.close();
    });
}

var page = require('webpage').create();
page.settings.userAgent = 'RosettaCodeSlurper';
page.open('http://rosettacode.org/wiki/Category:Programming_Tasks', function(status) {
    console.log('Finished outer');
    var anchors = page.evaluate(function() {
        var result = [];
        var anchs = document.getElementById("mw-pages").getElementsByTagName("a");
        for (var i = 0; i < anchs.length; i++) {
            result.push([anchs[i].innerHTML, anchs[i].href]);
        }
        return result;
    });
    for (var i = 0; i < anchors.length; i++) {
        var txt = anchors[i][0];
        var hrf = anchors[i][1];
        innerCall(txt, hrf);
    }
    //page.close();

    phantom.exit();
});


Ivan

Apr 22, 2015, 5:02:52 PM
To: phan...@googlegroups.com
You should reuse the same webpage instance in your innerCall function. I suggest moving the ipage declaration outside the innerCall function and removing the ipage.close() call.

James M. Greene

Apr 22, 2015, 6:42:35 PM
To: phan...@googlegroups.com

If you reuse the `page` object as Ivan suggested, you'll need to synchronize your loop iterations.

Your problem is that, right now, you are spawning `anchors.length` new asynchronously executing `WebPage#open` calls simultaneously. Not only does that eat up a huge chunk of memory (2000 links => 2000 new WebPage instances created) but it also opens potentially the same number of file descriptors for writing out your results. This could be running into system/session limitations like `ulimit` maximums.

Last but certainly not least, your `phantom.exit()` call is going to run before all, most, or possibly any of your loop's asynchronous calls have finished.

Sincerely,
   James M. Greene
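
The two replies above can be combined into one rough sketch: reuse a single page, load the links one at a time, and call `phantom.exit()` only after the last callback has run. This is an illustration under assumptions, not code from the thread. `processSequentially` and `runInPhantom` are hypothetical names, the `anchors` array is a placeholder for the list that the outer `page.evaluate` returns, and the scraping step is reduced to `document.body.innerText` for brevity.

```javascript
// Hypothetical helper (not part of PhantomJS): run an async worker over
// `items` one at a time, then call `done` once every item has finished.
function processSequentially(items, worker, done) {
    var index = 0;
    function next() {
        if (index >= items.length) {
            done();
            return;
        }
        var item = items[index];
        index += 1;
        // The worker must call `next` exactly once when it has finished.
        worker(item, next);
    }
    next();
}

// Hypothetical wrapper showing how the helper might be used in PhantomJS.
function runInPhantom() {
    var fs = require('fs');
    var page = require('webpage').create(); // one shared page, reused per link
    page.settings.userAgent = 'RosettaCodeSlurper';

    // Placeholder input: [name, url] pairs, in the shape the outer
    // evaluate() call in the question returns.
    var anchors = [
        ['Example', 'http://rosettacode.org/wiki/Category:Programming_Tasks']
    ];

    processSequentially(anchors, function (anchor, next) {
        page.open(anchor[1], function (status) {
            if (status === 'success') {
                var text = page.evaluate(function () {
                    return document.body.innerText;
                });
                // fs.write opens, writes, and closes the file in one call.
                fs.write(anchor[0].replace(/\//g, '_') + '.txt', text, 'w');
            }
            next(); // only now start the next page load
        });
    }, function () {
        phantom.exit(); // exit after the last callback, not before
    });
}

// Only run the PhantomJS part when the `phantom` global actually exists.
if (typeof phantom !== 'undefined') {
    runInPhantom();
}
```

Because only one page load is in flight at any moment, at most one page and one output file are open at a time, which should keep the script well under typical `ulimit` file descriptor limits. The trade-off is speed: links are fetched strictly one after another.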

