Web Scraping AJAX with a twist (Cisco Network Switch Awesomeness involved)

Afiq Hamid

Apr 7, 2017, 5:23:10 AM
to phantomjs

Hi Everyone.

I recently got into PhantomJS (a nice and powerful tool, BTW, despite its natively asynchronous nature. Props to Mr. Ariya Hidayat). I'm currently using it in my first developer job.

I've been tasked with scraping network switch information (hostname, product ID, IP address, MAC address, etc.) from an old Cisco Catalyst 2960-X switch connected to my PC via LAN cable.

I got HTTP authentication working with the PhantomJS headless browser and can open the switch's first page, but it leads me to a startup page, as seen in the attached screenshot below.


I've set my user agent to a web crawler bot string, thinking that it would fool the system. (Screenshot acquired using page.render.)

This startup page only appears on the first login to the switch, after which the user must click the Continue button, whose form markup is shown below (written in AJAX, by the way).

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
<form METHOD="GET">
<input type="button" name="button1" value="Continue" onclick="setcookiesandLoadCiscoDeviceManager()"></form>
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

In Chrome we usually just click it and move on, which brings us to the main page of interest: the Cisco Device Manager page containing the information I need.



My question is: what is the best approach to get past the startup report with the PhantomJS headless browser? Either...
  1. Simulate a button press on the GET form above so that the page moves on to the next one ($.ajax() comes to mind), or...
  2. Call the function setcookiesandLoadCiscoDeviceManager() directly from the .js file (more on that later). This is more of a hack.

From my understanding, the architecture of the switch's web pages is as outlined in the picture below.




When the URL 10.44.39.252 is first requested, three frame sources are loaded. I know this because I logged them through the page.onNavigationRequested callback:

  1. Frmwrkresource.htm
  2. topbannernofpv.shtml
  3. setup_report.htm
The button "button1" lives in the DOM of setup_report.htm (does calling it a DOM still apply if it's written in AJAX?). When it is pressed, setcookiesandLoadCiscoDeviceManager() is called.
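One way to press a button that lives inside a frame is to switch PhantomJS into that frame first and click from there. The following is a minimal sketch, assuming the three frames are direct children of the top-level page and that `click()` on the input fires its inline `onclick` handler in this PhantomJS build. `pressContinue` is a hypothetical helper name, not part of PhantomJS.

```javascript
// Sketch: find the child frame whose URL contains frameUrlPart,
// then click the named input inside it.
// `page` is a PhantomJS WebPage object; pressContinue is a hypothetical helper.
function pressContinue(page, frameUrlPart, buttonName) {
    page.switchToMainFrame();
    var count = page.framesCount;            // number of direct child frames
    for (var i = 0; i < count; i++) {
        page.switchToFrame(i);               // descend into child frame i
        if (String(page.frameUrl).indexOf(frameUrlPart) !== -1) {
            // This function runs inside the frame's own JavaScript context.
            var clicked = page.evaluate(function (name) {
                var btn = document.querySelector('input[name="' + name + '"]');
                if (!btn) { return false; }
                btn.click();                 // assumed to trigger the inline onclick
                return true;
            }, buttonName);
            page.switchToMainFrame();
            return clicked;
        }
        page.switchToParentFrame();          // go back up before trying the next frame
    }
    page.switchToMainFrame();
    return false;                            // no frame matched
}
```

From a page.open callback it could be called as `pressContinue(page, 'setup_report.htm', 'button1')`. The same pattern would cover the second approach too: replace `btn.click()` with a direct call to `setcookiesandLoadCiscoDeviceManager()` inside `page.evaluate`, assuming that frame has loaded preflight.js.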

Among all the JavaScript resources loaded during the transition from the startup report to the device manager (10.44.39.252/xhome.htm), this function call, setcookiesandLoadCiscoDeviceManager(), appears only in preflight.js. So I'm thinking browser cookies are a major part of this problem.
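Rather than guessing cookie names and values, it may be easier to let the page's own handler set them and then compare `page.cookies` before and after the click. A small sketch follows. `changedCookies` is a hypothetical helper, and it only assumes that each cookie object has `name` and `value` fields, as the entries of `page.cookies` do.

```javascript
// Sketch: report cookies that are new or whose value changed between two snapshots.
// changedCookies is a hypothetical helper; before/after are arrays of cookie
// objects such as the ones returned by page.cookies in PhantomJS.
function changedCookies(before, after) {
    var seen = {};
    before.forEach(function (c) {
        seen[c.name] = c.value;
    });
    return after.filter(function (c) {
        var known = Object.prototype.hasOwnProperty.call(seen, c.name);
        return !known || seen[c.name] !== c.value;
    });
}

// Possible usage inside a PhantomJS script:
// var before = page.cookies;
// ... trigger the Continue button, wait for the next page ...
// console.log(JSON.stringify(changedCookies(before, page.cookies), null, 2));
```

The output would show exactly which cookies the switch expects, which could then be passed to phantom.addCookie on later runs.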

My source code is attached below. It is at various levels of completion.

var page = require('webpage').create();
var fs = require('fs');

//phantom.addCookie({
// Cisco_DeviceManager : 'value', /* required property */
// SSLPreference : 2, /* required property */
// gettingstarted : 1
//});

console.log("\n:Welcome to my Crawler Scrapper:");

var url = 'http://10.44.39.252/';

page.settings.userName='star';
page.settings.password='----------';
page.customHeaders={'Authorization': 'Basic '+btoa(page.settings.userName+':'+page.settings.password)};//reuse the credentials set above

page.settings.userAgent = 'PMG Web Crawler Bot/1.0';//I even tell it that i'm a web crawler

page.onNavigationRequested = function(url,type,willNavigate, main){
//perhaps one of the most important parts
console.log("\n----------------------------------------------");
console.log("Navigation Request Information:\n")
console.log('Trying to navigate to: ' + url); //where are you going?
console.log('Caused by: ' + type); //request type
console.log('Will it actually navigate: ' + willNavigate);
console.log('Sent from the page\'s main frame: ' + main);
console.log("----------------------------------------------\n");
};

page.onResourceError = function(resourceError){
console.log("\nHold Up, We have Errors!");
console.log("Resource Error Information: \n");
console.log('Resource Error ID: ' + resourceError.id + '\nURL: ' + resourceError.url);
console.log('Resource Error Code: ' + resourceError.errorCode + '\nDescription: ' + resourceError.errorString);
};

page.onConsoleMessage = function(msg) {
console.log("The Browser Replied:" + msg);
};

//////////////////////////////////////////////////////////////////
page.onLoadStarted = function(){
console.log("Loading Page...");

};

page.onLoadFinished = function(){
console.log("Loading finished:\n");
};
//////////////////////////////////////////////////////////////////

page.viewportSize = {
width: 1920,
height: 1200
};

var sel = 'button1'; //DOM manipulate, selector
var type = 'click'; //action (a trailing comma here would make the script fail to parse)

//webpage.open
page.open(url,function(status){
if(status === "success"){
page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() { //jquery syntax has been successfully included
setTimeout(function(){
var t = page.evaluate(function(sel) {

var a = $('title').text();
//console.log(sel);
return a;

}, sel); //arguments after the function are passed to it in order, so sel must come first
console.log("Title: " + t + "\n\n");

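//phantom.addCookie() takes one cookie per call, and each needs name, value and domain
[['Cisco_DeviceManager', 'value'], ['SSLPreference', '2'], ['gettingstarted', '1']].forEach(function(c){
phantom.addCookie({ name: c[0], value: c[1], domain: '10.44.39.252' });
});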

page.open('http://10.44.39.252/xhome.htm', function (status) {
//$ and document only exist inside the page, not in this PhantomJS script, so read the title through page.evaluate
var title = page.evaluate(function(){
return document.title;
});
console.log("Your Document is Ready:" + title + "\n");

/*ajax asynchronous http request (this would also have to run inside page.evaluate)
$.ajax({
async: false,//blocks the ajax call, SYNCHRONOUS ajax Request
url: 'http://10.44.39.252/setup_report.htm?button1=Continue', //blocking javascript request
type: 'GET',
data: {button1: 'Continue'},
success: function (out) {

console.log("REQUEST SENT!\n\n");
console.log(typeof(out));
$('button1').trigger(sel);
console.log($('.homecontent').text);
//$("button1").click(function(){
// $("input").trigger("select");
//});

},
error: function(){
console.log("Nein!");
}
});
*/


});

},3000);

setTimeout(function() {
page.render("phantomspecs1.jpg");
console.log("\nNow GTFO!")
phantom.exit();
},20000);

console.log("Wait for the Async...");//prints first!



},0);//closes includejs which doesnt operate in the next open...

}else{
console.log("Connect fail");
phantom.exit();
}
});

The setTimeout calls are there because I've been experimenting with async programming, which is ultimately what I need PhantomJS to do. But my knowledge of JavaScript, jQuery and AJAX is still lacking (I'm not a programmer by training; I landed a coding job after college, though I do know some basic concepts).
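A fixed setTimeout either waits too long or not long enough. A common alternative in PhantomJS scripts is to poll for a condition until it holds or a deadline passes. The helper below is a minimal sketch of that pattern, not the code from the post; `waitFor` and its parameter names are assumptions chosen for illustration.

```javascript
// Sketch: poll check() every intervalMs until it returns true (then call onReady)
// or until timeoutMs has elapsed (then call onTimeout).
// waitFor is a hypothetical helper, not a PhantomJS built-in.
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (check()) {
            clearInterval(timer);            // stop polling before continuing
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs || 250);
    return timer;
}

// Possible usage inside a PhantomJS script (assumes `page` is already open):
// waitFor(function () {
//     return page.evaluate(function () { return document.title; }) !== '';
// }, function () {
//     page.render('phantomspecs1.jpg');
//     phantom.exit();
// }, function () {
//     console.log('Timed out');
//     phantom.exit(1);
// }, 20000, 250);
```

Polling keeps the render and exit steps tied to the page actually being ready, rather than to a guess about how long the switch takes to respond.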

If any of you could point me in the right direction for the next step, I can finish the task and write up documentation on it. No doubt it would be valuable to the PhantomJS community (of which I am proud to be a part, BTW).

Sincerely,
Afiq Abdul Hamid,
Cyberjaya Malaysia



Afiq Hamid

Apr 9, 2017, 10:45:47 PM
to phantomjs