Page doesn't always load in PhantomJS

Anthony McKeever

11 Jan 2016, 23:33:54
to phantomjs

Hey All,

I'm working on a web page scraper that uses PhantomJS, and one of the pages I'm scraping only loads about 40% of the time. I'm not sure whether I'm doing something wrong or whether it's a problem in phantom.

The URL I'm trying to hit is ''. The script below is supposed to print the src of every iframe on the page. The trouble is, it's a crapshoot whether the code prints the iframes at all. This isn't the only script I've written against this site that fails to load the page on some runs.
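One thing the sample script further down never does is look at the status argument that page.open() passes to its callback ('success' or 'fail'). If the failed runs are reported as 'fail' (an assumption, not something established in this post), retrying is one possible workaround. The sketch below uses a hypothetical openWithRetry() helper; maxAttempts and retryDelayMs are illustrative parameters, and the helper works with any object that has an open(url, callback) method, such as a PhantomJS webpage.

```javascript
// Hypothetical helper (not part of the PhantomJS API): call page.open()
// until the callback reports 'success' or maxAttempts is reached, then
// call done(status, attemptsUsed). ES5 only, so it also runs in PhantomJS 1.9.
function openWithRetry(page, url, maxAttempts, retryDelayMs, done) {
  var attempt = 0;

  function tryOnce() {
    attempt++;
    page.open(url, function (status) {
      if (status === 'success' || attempt >= maxAttempts) {
        done(status, attempt);
        return;
      }
      console.log('Attempt ' + attempt + ' for "' + url + '" returned ' +
                  status + '; retrying in ' + retryDelayMs + ' ms');
      // Re-open from a fresh timer tick rather than from inside the
      // load callback itself.
      setTimeout(tryOnce, retryDelayMs);
    });
  }

  tryOnce();
}
```

In the sample script, the page.open(url, function(status){ ... }) call inside phantomOps would become openWithRetry(page, url, 3, 1000, function(status, attempts){ ... }), with the iframe-printing code left as it is. If the failing runs still report 'success', the load itself is probably not what is going wrong.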

To run this, just point phantom at the JavaScript file. Example: /path/to/phantom/bin/phantom ~/path/to/file/test.js

You'll also need a copy of the latest jQuery; just change the path passed to page.injectJs() so it points at your copy.
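According to the PhantomJS API documentation, page.injectJs() returns true when the script was injected and false otherwise, and it injects into whatever document is currently loaded, so it belongs inside the page.open() callback. A minimal sketch of checking both the return value and whether jQuery is actually defined in the page afterwards; injectJQuery() is a hypothetical helper name, not a PhantomJS function.

```javascript
// Hypothetical helper: inject jQuery into the currently loaded page and
// confirm it is usable there. Returns true only if both checks pass.
function injectJQuery(page, jqueryPath) {
  // injectJs() returns false if the file could not be found or loaded.
  if (!page.injectJs(jqueryPath)) {
    console.log('injectJs failed for ' + jqueryPath);
    return false;
  }
  // The function passed to evaluate() runs inside the page, not in PhantomJS.
  var hasJQuery = page.evaluate(function () {
    return typeof window.jQuery === 'function';
  });
  if (!hasJQuery) {
    console.log('jQuery is not defined in the page after injection');
  }
  return hasJQuery;
}
```

In phantomOps, the bare page.injectJs(...) call could become if (!injectJQuery(page, '/path/to/jquery.js')) { return; } just before page.evaluate(), so a missing jQuery is reported directly instead of failing later inside the evaluated function.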

Some other details:
  • Code is being run on both CentOS 6.5 and Linux Mint 17.3 (Linux Mint is the dev environment); both OSes exhibit the same behavior.
  • The script would usually be called from a Node.js project (Node version 4.2.3), in which case a phantom.exit(); would be added to the end of the iterateUrls function. However, executing it via the Node project or directly through phantom produces the same behavior.
  • This code has been tested with PhantomJS v1.9.17, v1.9.18, and v1.9.19
  • PhantomJS was downloaded through NPM.
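page.open() returns immediately and calls its callback later, so a phantom.exit() placed at the end of a synchronous iterateUrls() can run before any page has finished loading. That may or may not be related to the intermittent loading, but a sketch of processing the URLs one at a time and exiting only after the last one can help rule it out. processSequentially() and its parameters are hypothetical names.

```javascript
// Hypothetical helper: run processUrl(url, next) for each URL in order.
// processUrl must call next() when it is finished with the page, for
// example at the end of its page.open() callback. onAllDone runs once,
// after the last URL, which is a safe place for phantom.exit().
function processSequentially(urls, processUrl, onAllDone) {
  var queue = urls.slice(); // copy, so the caller's array is not consumed

  function next() {
    if (queue.length === 0) {
      onAllDone();
      return;
    }
    processUrl(queue.shift(), next);
  }

  next();
}
```

With the sample script, processUrl would wrap page.open(), print the iframe sources and then call next(), while onAllDone would call phantom.exit(). Each URL starts only after the previous page.open() callback has run, so the single page object is never asked to load two URLs at once.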

Any thoughts?  And, thanks in advance!

Code Sample:
var webPage = require('webpage');
var page = webPage.create();
var urls = [''];

page.settings.userAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";

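page.onConsoleMessage = function(msg){
  //require('system').stderr.writeLine('console: ' + msg);
};

var phantomOps = function(url)
{
  page.open(url, function(status){
    // Point this at your local copy of jQuery.
    page.injectJs('/path/to/jquery.js');

    var iFrames = page.evaluate(function(){
      var urlArray = [];
      var frames = $("*").find("iframe");

      for(var i = 0; i < frames.length; i++)
        urlArray.push(frames[i].src);

      return urlArray;
    });

    for(var i = 0; i < iFrames.length; i++)
      console.log(iFrames[i]);
    console.log("Done - Press CTRL-C to exit.");
  });
};

var iterateUrls = function()
{
  // urls.shift() shrinks the array, so loop until it is empty instead of
  // comparing a growing index against a shrinking length.
  while(urls.length > 0)
  {
    var url = urls.shift();
    phantomOps(url);
  }
};

iterateUrls();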
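One possible explanation, not confirmed in this post, is that the iframes are added by the page's own scripts after the load event, so page.evaluate() sometimes runs before they exist. A common PhantomJS pattern for that case is to poll until a condition holds or a timeout expires. The sketch below uses hypothetical waitFor() and pageHasIframes() helpers; the timeout and interval values are arbitrary.

```javascript
// Hypothetical polling helper: call check() every intervalMs until it
// returns true (done(true)) or timeoutMs has passed (done(false)).
function waitFor(check, done, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      done(true);
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      done(false);
    }
  }, intervalMs);
}

// PhantomJS-only check; the inner function runs inside the loaded page.
function pageHasIframes(page) {
  return page.evaluate(function () {
    return document.getElementsByTagName('iframe').length > 0;
  });
}
```

Inside the page.open() callback, the existing evaluate-and-print code would move into the done callback: waitFor(function () { return pageHasIframes(page); }, function (found) { ... }, 10000, 250). If found comes back false on the runs that fail, that would point to the iframes never appearing rather than to a bug in the evaluate code.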

