Split model

Ariya Hidayat

Apr 23, 2011, 14:15:50

– phan...@googlegroups.com

Since I will tag and release 1.1 this weekend, we can now discuss
1.2 and beyond.

One of the reasons I am reluctant to add more API functions to the
phantom object is outlined in
http://code.google.com/p/phantomjs/issues/detail?id=41. When I
started creating PhantomJS a few years ago for my own use, what I wanted
to do was run a script in the context of the web page. Since
PhantomJS was launched, it seems that what most people want (including but
not limited to testing use cases) is to be able to "control" the lifetime
and execution of a web page. IOW we want to run a "supervisor" script
that can fully control the "worker" web page.

If we start with an example first (and let's assume we already have
module support from
http://code.google.com/p/phantomjs/issues/detail?id=32), the code may
look like this (API names are subject to change):

    var WebPage = require('WebPage');
    var page = WebPage.open(url);

    page.addListener("progressChanged", function (percent) {
        if (percent >= 100) {
            page.rasterize(outputFileName);
        }
    });

Obviously sometimes we still want to run a script in the context of
the page itself, e.g. to grab certain elements using selectors. This
means it is also necessary to transfer data from the sandboxed
"worker" script back to the "supervisor" script (and vice versa). What
I propose in issue #41 is something similar to the WebWorker API, i.e.
a message-passing approach.

The outcome of this solution is that the worker script will never get
access to anything new (except the post-message function, which is
harmless). Thus, we will have the freedom to extend the API of the
supervisor script to cover things like socket communication, a file API,
etc.
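
As a rough illustration of the message-passing idea, here is a minimal sketch in plain JavaScript. Nothing in it is a PhantomJS API: createChannel, workerPort, supervisorPort and the hard-coded count are all hypothetical, and JSON copying is only an assumption about how data might cross the boundary. The point it shows is that the "worker" side receives a port and nothing else.

```javascript
// Hypothetical sketch: none of these names are PhantomJS APIs.
function createChannel() {
  const supervisorInbox = [];
  const workerHandlers = [];

  // The only capability handed to the worker side.
  const workerPort = {
    postMessage(data) {
      // Copy by serialization so no object references are shared.
      supervisorInbox.push(JSON.parse(JSON.stringify(data)));
    },
    onMessage(handler) {
      workerHandlers.push(handler);
    }
  };

  const supervisorPort = {
    postMessage(data) {
      const copy = JSON.stringify(data);
      workerHandlers.forEach((handler) => handler(JSON.parse(copy)));
    },
    received: supervisorInbox
  };

  return { workerPort, supervisorPort };
}

// "Worker" script: it sees its port and nothing from the supervisor.
function workerScript(port) {
  port.onMessage((msg) => {
    if (msg.command === 'count') {
      // A real page script would query the DOM here; 3 is a made-up value.
      port.postMessage({ selector: msg.selector, count: 3 });
    }
  });
}

const { workerPort, supervisorPort } = createChannel();
workerScript(workerPort);
supervisorPort.postMessage({ command: 'count', selector: 'input' });
console.log(supervisorPort.received);
```

Because the supervisor's extra capabilities never appear on the worker port, they can grow without widening what a page script can reach.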

Thoughts? Feedback?

--
Ariya Hidayat
http://www.linkedin.com/in/ariyahidayat

James Roe

Apr 23, 2011, 16:07:03

– phantomjs

Sounds great! My only comments are on what the code may look like, but
that's so minor it's not worth mentioning. Can't wait. :)

Ivan De Marino

Apr 27, 2011, 5:22:01

– phan...@googlegroups.com

The only concern I have is that it will require a complete rework.

Plus, we can't do this without modifying the actual (Qt)WebKit. Raising the bar for contribution ;)

In all my humbleness, I can't just jump in and help like with pure Qt.

But I guess it's the right way to go.

I'm wondering though: wouldn't it make more sense to rethink PhantomJS?

One of the things that PhantomJS lacks is the ability to be a true command-line interpreter. Something like the "python" interpreter... or even Bash!

The WebPage could become an API, an instantiable object. The scripts would then be executed not by WebKit's JavaScriptCore as we do now, but by QtScript (which, I know, uses JavaScriptCore internally and, maybe, in the future even V8(!!!)).

The next natural step would then be to make PhantomJS "CommonJS" compliant.

And, given that I don't think CommonJS defines an API for "web page manipulation", we could even contribute something back!!!

Am I taking it too far?

The Worker Thread approach would still be "centered around the page": what I feel is that we should actually move to a Script-Centric approach, where the webpage is just an object. Just like you showed in the example. Maybe we are already on the same page, and this is me thinking out loud.

PEOPLE, we need thoughts here.

Experienced Opinions are key!

Ariya Hidayat

Apr 27, 2011, 20:02:09

– phan...@googlegroups.com

> The Worker Thread approach would still be "centered around the page": what I
> feel is that we should actually move to a Script-Centric approach, where the
> webpage is just an object. Just like you showed in the example. Maybe we are
> already on the same page, and this is me thinking out aloud.

Most likely. If we ignore the require/module support and don't
care about regressions, this is as easy as creating a
WebPage QObject and adding it to the window object. That QObject
instance will have methods like open and signals like progressChanged.
It's a pity we have to wrap it in another QObject (as opposed to just
using QWebPage), but that makes it easier to avoid suddenly exposing every
single public method of QWebPage.
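
The wrapping idea can be sketched in plain JavaScript, independent of Qt. Everything below is hypothetical: FullPage merely stands in for a large class such as QWebPage, and createWebPage plays the role of the wrapper object. It only shows how a thin wrapper keeps the exposed surface down to the members you choose (here open and addListener, the names used earlier in this thread).

```javascript
// Stand-in for a large class such as QWebPage; all members are invented.
class FullPage {
  constructor() {
    this.url = null;
    this.listeners = {};
  }
  load(url) {
    this.url = url;
    this.emit('progressChanged', 100);
  }
  on(name, fn) {
    (this.listeners[name] = this.listeners[name] || []).push(fn);
  }
  emit(name, value) {
    (this.listeners[name] || []).forEach((fn) => fn(value));
  }
  internalSettings() {
    return { secret: true };
  }
}

// Wrapper that exposes only open() and addListener().
function createWebPage() {
  const inner = new FullPage(); // reachable only through the closure
  return Object.freeze({
    open(url) { inner.load(url); },
    addListener(name, fn) { inner.on(name, fn); }
  });
}

const page = createWebPage();
const seen = [];
page.addListener('progressChanged', (percent) => seen.push(percent));
page.open('http://example.com/');
console.log(seen, typeof page.internalSettings);
```

The script can drive the page and observe progress, but internalSettings (and anything else on the inner object) is simply not there to call.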

Ivan De Marino

Apr 28, 2011, 11:56:56

– phan...@googlegroups.com

What about basing the whole thing on QtScript? It would make a lot of sense if you ask me.

Ivan De Marino

May 4, 2011, 19:17:10

– phan...@googlegroups.com

Hello,

a quick follow-up on this topic, which to me represents the future direction of PhantomJS.

I have given it a lot of thought and I am starting to believe that rewriting PhantomJS based on QtScript would make a world of sense.

As it is, PhantomJS is serving people quite well, but we have found a few shortcomings:

- No final, decided, approved, agreed way to load scripts/dependencies (apart from doing it all in pure JavaScript)

- No programmatic way to simulate user interaction (apart from doing it all in pure JavaScript)

- No "clear" workflow of how a script is executed. Some people still get confused by the fact that they need to "pivot" their code around a R/W string (i.e. phantom.state)

QtScript might provide a change of perspective. It would be an approach 'à la Node.js', I'd dare say!

Ariya said it before: the WebPage would be something the Developer would "require" and use within a larger script. The webpage will be a resource within an execution context.

And given that QtScript is based on WebKit's JavaScriptCore (and, in the future, maybe V8), it would be a nice layered architecture. People could think of writing scripts with more than just one WebPage: there would be no real limit on the number of simultaneously manipulated pages (memory and CPU aside).

So, I started playing around with QtScript a bit more, to figure out what it can offer. And, boy, it is a powerful baby!

First of all, all of standard ECMAScript is there. In particular, the Global Object is exactly what you find defined here: http://interglacial.com/javascript_spec/a-15.html#a-15.1

So I decided to "play" and implement the "console" object. In its very basic form it was just this:

    #include <cstdio>
    #include <cstdlib>
    #include <iostream>

    #include <QCoreApplication>
    #include <QFile>
    #include <QScriptEngine>
    #include <QTextStream>

    #include "console.h"

    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);
        if (argc < 2) {
            fprintf(stderr, "*** you must specify a script file to evaluate\n");
            return -1;
        }

        // Read the whole script file into memory.
        QString fileName = QString::fromLatin1(argv[1]);
        QFile file(fileName);
        if (!file.open(QFile::ReadOnly)) {
            fprintf(stderr, "*** failed to open `%s' for reading\n", argv[1]);
            return -1;
        }
        QScriptEngine engine;
        QString code = QTextStream(&file).readAll();
        file.close();

        // Expose a Console instance to scripts as the global "console" object.
        Console console;
        QScriptValue consoleObj = engine.newQObject(&console);
        engine.globalObject().setProperty("console", consoleObj);

        QScriptValue result = engine.evaluate(code);
        if (engine.hasUncaughtException()) {
            int line = engine.uncaughtExceptionLineNumber();
            std::cerr << "uncaught exception at line " << line << ": " << qPrintable(result.toString()) << std::endl;
        }
        return EXIT_SUCCESS;
    }

The above code reads from the commandline the name of a script, loads it in memory and executes it. Extremely simple.

The "Console" class is just this:

    #include <iostream>

    #include "console.h"

    Console::Console(QObject *parent) :
        QObject(parent)
    {
    }

    // Print a message to standard output.
    void Console::log(const QString &message)
    {
        std::cout << qPrintable(message) << std::endl;
    }

    // Print a message to standard error.
    void Console::error(const QString &message)
    {
        std::cerr << qPrintable(message) << std::endl;
    }

Extremely lean and simple!
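
For illustration, a script handed to such an interpreter could be as small as the sketch below. It assumes only standard ECMAScript plus the two console methods described above being callable from scripts; summarize is a made-up helper, not part of any API.

```javascript
// Plain ECMAScript: relies only on the language itself and on the
// console.log / console.error methods exposed by the host.
function summarize(values) {
  var total = 0;
  for (var i = 0; i < values.length; i += 1) {
    total += values[i];
  }
  return 'sum of ' + values.length + ' values: ' + total;
}

console.log(summarize([1, 2, 3]));

try {
  null.property; // reading a property of null throws a TypeError
} catch (e) {
  console.error('caught: ' + e.name);
}
```

The same file also runs unchanged in any other host that provides a console object, which is part of the appeal of keeping the host API small.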

And this is just scratching the surface: QtScript can be used to define an entirely new JavaScript library, written in native code (performance!) and exposed in pure JavaScript (easy!).

This approach could also address Issue #41: the WebPage would not be able to access the "phantom script space", hence all the issues/fear we have about pages abusing the phantom api would be gone. There would be a "total inversion of flux" :P

OK, enough beating the keyboard.

What do you all think?

Does this make sense?

Did I explain myself properly? Or do you need me to write in slightly better English :P ?

Feedback is very much welcome.

PhantomJS could actually be "the Node.js for the command line".

Night.

James Roe

May 5, 2011, 14:46:08

– phantomjs

Ivan. Thanks so much for taking a look at that and testing it out! I'm
really liking your examples. :)

Your console representation got me thinking a little. Maybe we
could also allow a stdin mode, so to speak, where you can just write/
execute scripts from stdin, or even pipe a script into the command
(it might not be as useful or needed, but I think that some people may
actually find a use for that).

What do you think?

> And this is just scratching the surface: QtScript can be used to define an
> entire new Javascript Library, written in native code (performance!) and
> exposed in pure Javascript (easy!).
>
> This approach could also address Issue #41 <http://code.google.com/p/phantomjs/issues/detail?id=41>:
> the WebPage would not be able to access the "phantom script space", hence
> all the issues/fear we have about pages abusing the phantom api would be
> gone. There would be a "total inversion of flux" :P

Ivan De Marino

May 5, 2011, 16:05:47

– phan...@googlegroups.com

Absolutely!
It would allow writing shell scripts based on JS.
And, maybe I'm exaggerating, a shell based on JS :)

Ivan De Marino
Front End Developer @ Betfair

Sent from my iPhone 4

Ariya Hidayat

May 6, 2011, 6:17:06

– phan...@googlegroups.com

If this were a non-WebKit tool, Qt Script would make a lot of
sense. In fact, with the automatic binding generator it's possible to
write any application, in particular a Qt-based GUI one, entirely in
ECMAScript. I have demonstrated this; please also refer to Kent
Hansen's past blog entries on Qt Labs.

Right now, the reservation I have is debugger support. There is a
stand-alone Qt Script debugger Kent has built, but I would like to
leverage the Web Inspector and provide a unified remote debugger
(see issue #6).

All is not lost. The ease-of-use advantage, compared with the
current state of QtWebKit <-> native world bindings, will be there
once http://bugreports.qt.nokia.com/browse/QTBUG-11464 is solved.

Ivan De Marino

May 7, 2011, 20:09:21

– phan...@googlegroups.com

On Fri, May 6, 2011 at 12:17 PM, Ariya Hidayat <ariya....@gmail.com> wrote:

> If this would have been a non-WebKit tool, Qt Script makes a lot of
> sense. In fact, with the automatic binding generator it's possible to
> write any application, in particular GUI one based on Qt, entirely in
> ECMAScript. I have demonstrated this, also please refer to Kent
> Hansen's past blog entries on Qt Labs.

The main reason I'm not just jumping on the keyboard and starting to type is that I know very well that you (and the other Trolls like Alessandro) are the right people to "filter ideas through".

For example, I had almost assumed that QtScript was "in sync" with WebKit's JavaScriptCore, while it isn't.

Still, though, I believe QtScript would provide us with a great, cross-platform "backbone" for the future Phantom.

> Right now, the reservation I have is debugger support. There is a
> stand-along Qt Script debugger Kent has built, but I would like to
> leverage Web Inspector feature and provide a unified remote debugger
> (see issue #6).

Remote Web Inspector? http://pmuellr.github.com/weinre/

Still, the web inspector is built around a webpage: how hard would it be to move that to a more general scope? With the idea of generalizing PhantomJS, thinking of the webpage as a module, a "normal debugger" would still be a good idea. IMHO.

Maybe we could start with normal "script debugging" and then the WebInspector could be a module. Something like

    var page = require("WebPage"),
        inspector = require("WebInspector");
    inspector.attach(page);

This would not only allow very cool code (come on, initialising an inspector like that would be really cool ;) ), but would also let us start coding now, leaving this as a future task.

> All is not lost. The advantage of easy of use, compared with the
> current state of QtWebKit <-> native world, bindings will be there
> once http://bugreports.qt.nokia.com/browse/QTBUG-11464 is solved.

Do you think what's available today in QtScript is not good enough for our purpose?

I think it is... we should be able to build everything required with what's at hand today.

I'll read through the planned tasks of QTBUG-11464, to see if there is anything extra that could make the task even easier.

Ivan


--

Ivan De Marino

Front-End Developer @ Betfair

email: ivan.de...@gmail.com | detron...@gmail.com | ivan.d...@betfair.com

web: blog.ivandemarino.me | www.linkedin.com/in/ivandemarino | twitter.com/detronizator

mobile: +44 (0)7515 955 861

Ivan De Marino

May 7, 2011, 20:48:44

– phan...@googlegroups.com

Well, having a quick look at this, it seems pretty straightforward: http://doc.qt.nokia.com/4.7-snapshot/qwebinspector.html

The only limitation would be that it wouldn't be possible to debug the "script", just the "webpage manipulated by the script". But I'm not really sure a web inspector would make sense outside that context anyway.

Ariya Hidayat

May 8, 2011, 21:35:16

– phan...@googlegroups.com

> The main reason for me not just jumping on the keyboard and start typing is
> mainly because I know very well that you (and the other Trolls like
> Alessandro) are the right people to "filter ideas through".
> For example, I almost assumed that QtScript was "in synch" with WebKit
> JavascriptCore, while it isn't.
> Still tough, I believe QtScript would provide us with a great, cross
> platform "backbone" for future phantom.

Once the script engine is unified, it makes no difference whether we
use Qt Script or not. Qt Script just provides the convenient
wrapper/API.

> Remote Web Inspector? http://pmuellr.github.com/weinre/

That has nothing to do with it (though it is a nice workaround). But that's
a different chapter.

> Still, the web inspector is built around a webpage: how hard would be to
> move that to a more general scope? The idea of generalizing PhantomJS,
> thinking of the webpage as a module, would make a "normal debugger" still a
> good idea. IMHO.
> Maybe we could start with normal "script debugging" and than the
> WebInspector could be a module. Something like
>
> var page = require("WebPage"),
> inspector = require ("WebInspector");
> inspector.attach(page);
>
> This would not only allow very cool code (come on, initialising an
> inspector like that would be really cool ;) ), but would allow to start
> coding, setting this as a future task.

Inspecting the web page itself is one thing; debugging the code that
runs the above three lines is another.

> Do you think what's available today in QtScript is not good enough for our
> purpose?

As I said before, it won't allow us to have debugging support.

Ivan De Marino

May 9, 2011, 18:11:54

– phan...@googlegroups.com

To put it simply: are you proposing to wait?

Or to do what exactly?

I understand, you'd like to have the possibility to run a debugger for PhantomJS scripts. And that is a great feature.

But I don't understand: you talked about a possible approach (from a Troll) but at the same time, it seems like you think it is not doable.

I really want to help: I do believe Phantom has a great future. But we need a bit more "words" to figure out what you have in mind.

Don't worry, I know you are on holiday: this is just a place holder email ;)

Andrew Petersen

May 21, 2011, 15:43:17

– phan...@googlegroups.com

All,

I have never used PhantomJS, but have been researching it as a component in a nodejs project I'm working on. So please take my opinions lightly, as they could be grossly misinformed in regards to my PhantomJS knowledge.

I realize that the FAQ explicitly says:

Q: Can you bind this to NodeJS?

A: I am afraid I am not an expert on NodeJS.

However, after reading this thread (and the documentation, and many issue reports), it sounds like that is exactly what should happen. The main component node is missing is one of the most powerful components of the browser: the DOM and associated parsers (HTML5, XML). While there are great projects like JSDOM, it's a bit of reinventing the wheel when we already have browsers that are tried and true.

PhantomJS seems like a wonderful project, but has limited uses in its current form. It seems like it's really hard to get "stuff" out. There is limited security from internal code exploitation. So it's really good at running a controller-esque script that can load a webpage. It's not completely headless (I realize this isn't necessarily a PhantomJS limitation).

I believe PhantomJS should be twofold:

1) A command-line utility, much as it is today, that allows a script to cheaply run (in terms of resources) in the context of a webpage. While a few extra objects could be added to the global object, this should be limited to prevent security issues. At most, add "console.stdout" and "console.stderr" methods that allow for direct output. Everything else is just strictly what the browser gives you, but possibly allow for command-line configuration, such as allowing cross-domain XHR. There are so many JS libraries that do basically everything, it seems like a waste of time to try and implement a whole other set of APIs in another language (like QtScript).

2) A nodejs C++ extension that allows a user to do:

    // Look up the constructor first: "new require('phantomjs').Browser(...)"
    // would apply "new" to require itself.
    var Browser = require('phantomjs').Browser;
    var phantom = new Browser({ loadExternalResources: false, enableJavaScript: false });
    phantom.window.location = 'http://google.com';
    phantom.window.onload = function () { console.log('google is loaded!'); };
    // do some sanitization or something
    phantom.enableJavaScript();
    phantom.render(function (imgBuffer) {
        // save the buffer using the fs module
    });
    console.log(phantom.window.document.querySelectorAll('input').length);
    // and more!

From there, you get the entire ecosystem of Node AND browser libraries that could be run against a context of phantom.window.document and such. This would give node a rock-solid DOM, HTML5 parser, built-in security policy, AND all the great test libraries that people want, including the ability to trigger native events and such for BDD. In addition, PhantomJS gets an awesome scriptable "control" environment ready to rock in the form of nodejs. This includes the ability to run PhantomJS as a nodejs-controlled server, to spawn off new instances in the form of nodejs-controlled web workers, and more! Plus, all the tooling that has been created around nodejs is available to developers.

Ivan, you wrote:

> It would allow to write shell scripts based on JS.
> And, maybe I'm exaggerating, a shell based on JS :)

This is exactly what nodejs gives you! It is very easy these days to write a script in js, and do everything you could do in bash in JS. At the recent nodeconf, native Windows support (not relying on Cygwin) was announced as an important goal of the project, so that's coming too.

And just to note: I found PhantomJS because I was looking for a no-fuss way to sanitize large HTML documents in a performant manner. While my use case is admittedly limited, everything else I've read on this forum and in the bug reports suggests that other people would enjoy the usage as outlined above.

If you made it this far, thanks for reading! I realize I'm effectively a stranger, and that the opinions of strangers are often uninformed or irrelevant, and I am sorry if that is true for the above. I just want to help, and if even disagreeing with this helps, then it's a win in my book!

Thanks, and please let me know if clarification is needed in any of the ideas above!

James Roe

May 22, 2011, 2:10:46

– phantomjs

I do have to admit that having NodeJS type compatibility would be
amazingly useful. Would also allow us to not have to re-make so many
of the DOM elements, such as file API, etc. However, I'm not sure how
doable that actually is, seeing as it might be hard/impossible to
integrate into QtWebKit.

I think QtScript though can help fill that gap if it's not possible,
so all wouldn't be lost.

Note: I'm not an expert in Qt (yet), so take my opinions lightly.
Ariya and Ivan are the ones that would be more well informed to answer
this.

Ariya Hidayat

May 24, 2011, 18:50:44

– phan...@googlegroups.com

I'll try to address the concerns in brief.

It's probably safe to assume that, if it had been easy to make
headless WebKit part of NodeJS, I would have done that in the
first place. There are various technical challenges which need to be
solved first, if that is possible at all. To mention just one:
QtWebKit and Node will not share the same JS engine, and
therefore it's impossible to share the context.
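
One way to picture the practical consequence: when two engines cannot share a context, only a serialized copy of the data can travel between them. The sketch below uses JSON as the wire format, which is an assumption made for illustration (the thread does not specify one), and all the object names are invented.

```javascript
// What one side might want to hand over to the other engine.
const original = {
  title: 'Example',
  sizes: [1, 2, 3],
  onLoad: function () { return 'loaded'; }
};

// Crossing the boundary: only a serialized copy travels.
const wire = JSON.stringify(original);
const received = JSON.parse(wire);

// The copy is independent: changing it does not affect the original.
received.sizes.push(4);

console.log(wire);
console.log(original.sizes.length, received.sizes.length);
console.log(typeof received.onLoad); // functions do not survive the trip
```

Plain data crosses intact, while live objects, callbacks and shared state do not, which is why the two tools end up exchanging data rather than sharing objects.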

Don't get me wrong, I love Node and I use it for what it is intended
to be. The Windows support is getting better and has recently become one
of the focuses of the project. However, I have delayed PhantomJS
long enough; it's no fun to deny Windows users full
support just because of that.

In other words, there is nothing really wrong with the Unix approach
of "Write programs that do one thing and do it well". You can do all
your crazy JavaScript processing with Node and when you are ready,
just feed it to/from PhantomJS.

I'm sure that all the various technical challenges can be solved in time,
probably with the help of some experts out there. However, those
who need testing solutions using headless WebKit can
already use PhantomJS as it is right now (and with upcoming improvements
for 1.2, stay tuned).

Thanks for the feedback.

Regards,

Ariya

Andrew Petersen

May 25, 2011, 16:34:05

– phan...@googlegroups.com

Thanks for the thoughtful response.

Sorry if my initial query seemed to trivialize the work already accomplished or the work required for nodejs integration, that was definitely not my intent. PhantomJS is a great project, and it's clear that a lot of thought and hard work has gone into it.

Since you mentioned context sharing...

I had an idea for semi-sharing of context between two JS engines (with Qt/C++ as the mediator), and possibly an improvement for PhantomJS's security model. This is definitely not a true sharing, but it does allow for data to be passed, similar to a WebWorker.

Right now PJS exposes a global phantom object. For running tests, this is just fine, but for running potentially unsafe code (for example to sanitize, or scraping or something), this is an issue. I think everyone is aware of that, I'm just laying out the explanation. A possible solution is to use window.postMessage to communicate with Qt. On initialization, something like the following code could be run in the context of the QWebPage (not added as a script tag, purely injected):

    (function () {
        // these are generated by Qt, and could be anything relatively secure
        var localSecurityTarget = "http://local/21EC2020-3AEA-1069-A2DD-08002B30309D";
        var remoteSecurityTarget = "http://remote/21EC2020-3AEA-1069-A2DD-08002B30309D";

        var phantom = { /* current phantom stuff */ };

        // user's init script, or script passed on command line is injected here
        var userStuff = { something: function () {} };

        // this event listener could also be part of the user's injected code
        window.addEventListener('message', function (e) {
            // use the handler's own parameter (e), not an implicit global event
            if (e.origin === remoteSecurityTarget) {
                // do something here...
                userStuff.something();

                // example of talking to Qt:
                window.postMessage(JSON.stringify({
                    data: "I'm a message!"
                }), localSecurityTarget);
            }
        }, false);
    })();

Except that Qt is listening for the message event as well, with a similar check for the expected origin. Having everything in the closure prevents any script on the page from accessing the security targets. So, with a bit more work (and some script pre-processing), you could expose the current phantom object only within the context of the above closure, where the user's script would be. This seems secure, unless I'm missing something (pretty good chance of that :) ).
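
To make the origin check concrete, here is a self-contained sketch that runs outside a browser. createFakeWindow is a hypothetical stand-in for the page's message plumbing (its deliver helper does not exist on a real window), install is an invented name for the injected closure, and the target strings are shortened placeholders. It only demonstrates that messages from an unexpected origin are ignored while the targets stay hidden in the closure.

```javascript
// Minimal stand-in for window's message plumbing (not a real browser API).
function createFakeWindow() {
  const listeners = [];
  return {
    sent: [],
    addEventListener(type, fn) {
      if (type === 'message') listeners.push(fn);
    },
    postMessage(data, targetOrigin) {
      this.sent.push({ data, targetOrigin });
    },
    // Helper for this sketch: deliver a message as if it came from `origin`.
    deliver(data, origin) {
      listeners.forEach((fn) => fn({ data, origin }));
    }
  };
}

function install(window, localTarget, remoteTarget) {
  // The targets live only in this closure; page scripts get no reference.
  window.addEventListener('message', function (e) {
    if (e.origin !== remoteTarget) {
      return; // ignore anything not sent from the expected origin
    }
    window.postMessage(JSON.stringify({ echo: e.data }), localTarget);
  }, false);
}

const fakeWindow = createFakeWindow();
install(fakeWindow, 'http://local/token', 'http://remote/token');

fakeWindow.deliver('from page script', 'http://evil.example');
fakeWindow.deliver('from Qt', 'http://remote/token');

console.log(fakeWindow.sent);
```

Only the second message produces a reply. Whether this holds up in a real page depends on the page being unable to forge the origin, which this sketch does not attempt to model.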

Yip yip,
~ Drew*

Ariya Hidayat

May 25, 2011, 17:45:15

– phan...@googlegroups.com

Hi Andrew,

Check also my recent WebPage refactoring. Passing data from the web
page context back to the script is fairly easy using the evaluate()
function. However, I agree that we may need message passing (see my
comment #5 at http://code.google.com/p/phantomjs/issues/detail?id=41#c5)
for the other way around.
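
The direction that evaluate() covers can be simulated with a short Node.js sketch. This is not the PhantomJS implementation: createPage is a hypothetical stand-in, Node's vm module plays the role of the separate page context, and returning a JSON copy of the result is an assumption. It shows a function being run inside another context and its plain-data result coming back to the calling script.

```javascript
const vm = require('vm');

// Hypothetical stand-in for a page: a separate global object with its own state.
function createPage(globals) {
  const context = vm.createContext(Object.assign({}, globals));
  return {
    // Run `fn` inside the page context and return a JSON copy of its result.
    evaluate(fn) {
      const source = '(' + fn.toString() + ')()';
      const result = vm.runInContext(source, context);
      return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
    }
  };
}

const page = createPage({ title: 'Hello', links: ['a', 'b'] });

const info = page.evaluate(function () {
  // `title` and `links` resolve in the page context, not in this script.
  return { title: title, linkCount: links.length };
});

console.log(info);
```

Because the function is passed as source text, it cannot see variables of the calling script, and nothing here lets the script push data into the page later on, which is the gap message passing would fill.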

Keep the feedback coming!

Thanks.

Regards,

Ariya
