generating content from austincc.bncollege.com - issue!


bruce

Jun 10, 2017, 7:01:20 PM
to casp...@googlegroups.com
Hi.

Got a bit of an issue/prob.

The target site uses dynamically generated/obfuscated code to build
its page content, so a simple curl fetch doesn't return the data.

So, one would think the solution is to use casper/phantom/etc. and feed
the URL used in the browser into a test script to generate the
content. This doesn't work consistently, nor is it fast enough...

The script/process only occasionally returns the correct results.
Most of the time it returns content that is missing the book data.
Even when the test does work, it takes a few minutes.
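To tell a good run from a bad one automatically, the output can be checked before it is accepted. This is only a rough sketch: `hasBookData` is a hypothetical helper, and the two markers are an assumption borrowed from the test script further down (the "Remarketing" text and the `remarketingStoreId` input it waits for). They may not be the best indicator that the book data is really there.

```javascript
// Hypothetical check: does the fetched HTML look like a complete page?
// ASSUMPTION: these markers are the ones the test script waits for; a
// selector tied to the actual book rows would be a stronger signal.
var MARKERS = ['Remarketing', 'remarketingStoreId'];

function hasBookData(html) {
    // Empty or non-string output is always treated as a failed run.
    if (typeof html !== 'string' || html.length === 0) {
        return false;
    }
    // Accept the page if any one of the markers appears in it.
    return MARKERS.some(function (marker) {
        return html.indexOf(marker) !== -1;
    });
}

console.log(hasBookData('<input id="remarketingStoreId" value="65166">'));
console.log(hasBookData('<html><body>loading...</body></html>'));
```

The parent app could call something like this on the captured output and re-queue the URL when it returns false, instead of storing a page without the book data.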

If you paste the URL directly into a browser (Chrome/Firefox), you can
easily see the correct content.

We're not sure whether repeatedly running the test is somehow being
detected by the server, but loading the same URL from the browser has
no issue.


If someone is able to "help" resolve this issue, we're willing to
shell out some cash.

We're running the tests on Fedora, on a cloud VM.

Oh, one of the major/weird things we've noticed is that the test script
appears to fail at the selector wait. We've tried increasing the
timeout, but the results still aren't consistent.


Thoughts/comments are helpful. There is some urgency to this.

Thanks



The target site:
https://austincc.bncollege.com

http://austin.bncollege.com/webapp/wcs/stores/servlet/TBListView?&catalogId=10001&storeId=65166&langId=-1&termMapping=Y&courseXml=<?xml version='1.0' encoding='UTF-8'?><textbookorder><courses><course dept='ACCT' num='2302' sect='26234' term='A17' /><course dept='ACCT' num='2302' sect='26232' term='A17' /><course dept='ACNT' num='1329' sect='26238' term='A17' /><course dept='ACNT' num='1331' sect='26240' term='A17' /><course dept='ACNT' num='1347' sect='26242' term='A17' /></courses></textbookorder>
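Since the courseXml value contains spaces, angle brackets and quotes, it is easy to end up with a URL that is only partly encoded. Here is a minimal sketch of building it in one place. `buildCourseXml` and `buildUrl` are hypothetical names, the parameter values are copied from the URL above, and it assumes the server accepts a fully percent-encoded courseXml, which we have not confirmed.

```javascript
// Hypothetical helpers for assembling the TBListView URL.
// The base URL and fixed parameters are copied from the example URL.
var BASE = 'http://austin.bncollege.com/webapp/wcs/stores/servlet/TBListView';

// Build the XML payload for a list of {dept, num, sect, term} objects.
function buildCourseXml(courses) {
    var items = courses.map(function (c) {
        return "<course dept='" + c.dept + "' num='" + c.num +
               "' sect='" + c.sect + "' term='" + c.term + "' />";
    });
    return "<?xml version='1.0' encoding='UTF-8'?>" +
           "<textbookorder><courses>" + items.join('') +
           "</courses></textbookorder>";
}

// Percent-encode every parameter value and join into a query string.
// ASSUMPTION: the server decodes standard percent-encoding.
function buildUrl(courses) {
    var params = [
        ['catalogId', '10001'],
        ['storeId', '65166'],
        ['langId', '-1'],
        ['termMapping', 'Y'],
        ['courseXml', buildCourseXml(courses)]
    ];
    return BASE + '?' + params.map(function (p) {
        return p[0] + '=' + encodeURIComponent(p[1]);
    }).join('&');
}

var exampleUrl = buildUrl([
    { dept: 'ACCT', num: '2302', sect: '26234', term: 'A17' },
    { dept: 'ACNT', num: '1329', sect: '26238', term: 'A17' }
]);
console.log(exampleUrl);
```

The result has no raw spaces or angle brackets, so it can be passed as the --url1 value inside double quotes without further escaping.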



Script Usage:
casperjs bnbook_track.js --url1="http://austin.bncollege.com/webapp/wcs/stores/servlet/TBListView?catalogId=10001&storeId=65166&langId=-1&termMapping=Y&courseXml=%3C?xml%20version=%271.0%27%20encoding=%27UTF-8%27?%3E%3Ctextbookorder%3E%3Ccourses%3E%3Ccourse%20dept=%27ACCT%27%20num=%272301%27%20sect=%2725087%27%20term=%27A17%27%20/%3E%3C/courses%3E%3C/textbookorder%3E" --trackID='foo'




The test script:

============================================================================
/*
*
* bnbook_track.js: test to generate the term page for the bkstr input
*
* casperjs bnbook_track.js --url1="foo1" --track='zz'
*
* update jun 10/17
* -simply uses the "track" input as a way of tracking the process in the procTBL
* -the parent app then tracks the uid in an inproc file..
* -the external app then blows away the matching pid/process based on the amount of time running
* -a rough approach to ensuring the casperjs/bnbook fetch process runs consistently
*
*
*/
var casper = require('casper').create({
    verbose: true,
    logLevel: 'debug',
    waitTimeout: 50000, // new maximum waitTimeout
    pageSettings: {
        loadImages: false,
        javascriptEnabled: true,
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
        //userAgent: 'Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.16'
    }
});

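// NOTE: timeout and stepTimeout are CasperJS options (casper.options.*),
// not documented properties of the phantom object, so the two
// assignments below most likely have no effect.
phantom.timeout = 50000;
phantom.stepTimeout = 50000;
phantom.cookiesEnabled = true;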

var x = require('casper').selectXPath;


var url1 = casper.cli.get("url1");

/*
casper.options.onResourceRequested = function(casper, requestData, request) {
    // If any of these strings are found in the requested resource's URL, skip
    // this request. These are not required for running tests.
    var skip = [
        'staticxx.facebook.com',
        'googleads.g.doubleclick.net',
        'www.facebook.com',
        'www.google.com',
        'bid.g.doubleclick.net',
        'login.dotomi.com',
        'www.google.com',
        'insight.adsrvr.org',
        'cdn.krxd.net'
    ];

    skip.forEach(function(needle) {
        if (requestData.url.indexOf(needle) > 0) {
            request.abort();
        }
    })
};
*/


/*
casper.onResourceRequested = function(requestData, request) {
    if ((/http:\/\/.+?\.css/gi).test(requestData['url']) ||
        requestData.headers['Content-Type'] == 'text/css') {
        console.log('The url of the request is matching. Aborting: ' + requestData['url']);
        request.abort();
    }
};
*/


var u = "http://www.bncollege.com";

casper.start(u);
//casper.start();


var processPage = function() {
    this.echo(this.page.content).exit();
};

casper.thenOpen(url1).waitForText("Remarketing", processPage,
    //casper.thenOpen(url1).waitForSelector(x('//input[@id="remarketingStoreId"]'), processPage,
    function fail() {
        console.log("oops");
    }
);


casper.echo("dsdsdsd");

casper.run(function () {
    // echo results in some pretty fashion
    // this.echo(this.debugPage()).exit();
    // fs.write(cookie_file, JSON.stringify(phantom.cookies), 644);
    // this.echo("finish").exit();
    this.echo(this.page.content).exit();
});

============================================================================
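
One thing we may revisit to speed up the run is the commented-out request filter in the script. Below is a rough sketch of the matching part as a standalone function, so it can be tried outside casper. `shouldSkip` and `SKIP_HOSTS` are hypothetical names; the host list is the one from the script with the duplicate entry removed. It uses `!== -1` so that a match at position 0 also counts, and it stops at the first hit so the caller would abort a request only once.

```javascript
// Third-party hosts taken from the script's skip list (duplicate removed).
var SKIP_HOSTS = [
    'staticxx.facebook.com',
    'googleads.g.doubleclick.net',
    'www.facebook.com',
    'www.google.com',
    'bid.g.doubleclick.net',
    'login.dotomi.com',
    'insight.adsrvr.org',
    'cdn.krxd.net'
];

// Hypothetical predicate: true when the URL contains any listed host.
function shouldSkip(url, hosts) {
    for (var i = 0; i < hosts.length; i++) {
        if (url.indexOf(hosts[i]) !== -1) {
            return true; // first match is enough
        }
    }
    return false;
}

console.log(shouldSkip('http://www.google.com/ads/x.js', SKIP_HOSTS));
console.log(shouldSkip('http://austin.bncollege.com/webapp/x', SKIP_HOSTS));
```

In the script, the resource-request handler would call `shouldSkip(requestData.url, SKIP_HOSTS)` and call `request.abort()` only when it returns true. We have not verified that the book data still loads with these hosts blocked.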