I want to implement a simple HTML scraper in ql.io. The idea is that when you write a query like the one below, ql.io fetches the page, creates a browser-like environment (JSDOM), includes jQuery, runs the selectors from the query, and returns the result. My problem is that JSDOM runs asynchronously, while ql.io's monkey patch ('parse response') is invoked synchronously, so ql.io returns its result before JSDOM has finished. In the example below, the result is always an empty array. Is there any way to fix that?
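To show the timing issue outside of ql.io, here is a stripped-down sketch (same jsdom.env usage as in the patch further down; the parse function name is just for illustration). The function returns before the done callback ever runs:

var jsdom = require('jsdom');

// Minimal illustration: jsdom.env is asynchronous, so 'done' fires
// only after parse() has already returned.
function parse(body) {
    var result = [];
    jsdom.env({
        html: body,
        done: function(err, window) {
            // runs on a later tick, well after the return below
            result.push(window.document.title);
        }
    });
    return result; // still [] at this point
}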
Query:
and title = '[{"selector":"ul li a.title"}]'
and url = '[{"selector":"ul li a.title", "prop": "href"}]';
www.scraper.ql
-- definition of www.scraper table
create table www.scraper
on select get from "{url}"
using patch 'www.scraper.js'
resultset 'results'
www.scraper.js
var jsdom = require('jsdom');
var _ = require('underscore');

exports['parse response'] = function(args) {
    var result = [];

    // assemble the response body from the received buffers
    var body = '';
    _.each(args.body, function(buf) {
        body += buf.toString('UTF-8');
    });

    // create a JSDOM environment and include jQuery
    jsdom.env({
        html: body,
        scripts: ['http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js'],
        done: function(err, window) {
            // run the CSS selectors with jQuery, e.g.
            // window.$(selector).each(function() { result.push(window.$(this).text()); });
        }
    });

    // return result as the new response; this runs before 'done' fires,
    // so result is still an empty array here
    return {
        type: 'application/json',
        content: JSON.stringify(result)
    };
};
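For reference, this is roughly the extraction I intend to run inside the done callback once I can actually wait for it. The selector/prop handling mirrors the JSON arguments in the query above; the extract helper is only a sketch and not part of the patch yet:

// For each selector spec like {"selector": "ul li a.title", "prop": "href"},
// collect either the matched elements' text or the named attribute.
function extract(window, spec) {
    var values = [];
    window.$(spec.selector).each(function() {
        var el = window.$(this);
        values.push(spec.prop ? el.attr(spec.prop) : el.text());
    });
    return values;
}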