phantom 1.7 cookies-file question

Davis Ford

Oct 1, 2012, 9:02:36 PM
to phan...@googlegroups.com
I tacked this question on to another thread, but I think it got lost in the weeds, so I want to ask it here with another quick summary.


$ phantomjs --cookies-file=cookies.txt test.js

There are no cookies at first, so the script is forced to log in; cookies are set and persisted to cookies.txt.  You can cat the file and see that they are set.

Now, run it again the same way:

$ phantomjs --cookies-file=cookies.txt test.js

This time the script prints a populated phantom.cookies array, but the site still forces a login, even with a spoofed user agent. Why?

Thanks in advance

Admittedly, I have not tried with any other server - but Reddit cookies definitely do work in my browser -- as a regular user, so I don't see why they shouldn't also work here.

Davis Ford

Oct 6, 2012, 8:06:17 AM
to phan...@googlegroups.com
Hi - this is the second time this question has fallen on deaf ears.  I tried to frame it simply, and provide a workable example.  My assumptions are either wrong about the way this feature (cookies) is supposed to work, or I'm doing it wrong or there's a problem with the feature.

Unfortunately, I can't tell which it is via the documentation nor through exhaustive search through google, so the next step will be to go to the source code.  I was hoping to avoid that as it can be a considerable time investment.  I was hoping that it would be a fairly quick answer for someone who is familiar with how this feature is implemented.  I have seen a number of threads on cookies support, but nothing that illustrates the answer (for me at least).

So, is there any chance someone might have a quick answer to this query -- and if not, care to share the underlying reason why the question goes unanswered?  I'm starting to wonder if it is something *I did* concerning the way I framed the question.

James Greene

Oct 6, 2012, 8:12:30 AM
to phan...@googlegroups.com
Unfortunately, I believe that the only person who is familiar with the cookies functionality is Ivan (detro) and he is crazy busy with real life at the moment.  I haven't even seen him online in Gmail all week long.  So, yes, being an open source project, you may have to actually dig into this yourself. :(

If you do, be sure to share your new-found knowledge/assumptions so others can benefit from them.

~~James

Davis Ford

Oct 6, 2012, 8:38:19 AM
to phan...@googlegroups.com
Hi James,

Thank you for the reply -- I understand -- I'm also crazy busy with 9 different projects, so I was hoping for a quick answer, but alas, I will go dig into it myself and post anything I find here.  Assuming this really is an issue (as opposed to user error), I'm surprised no one else has raised it.  This is what led me to suspect I was doing something wrong -- surely other phantomjs users must be using the cookies feature...but perhaps not?


Ariya Hidayat

Oct 6, 2012, 10:03:58 AM
to phan...@googlegroups.com
Well, you still need to investigate whether the failed login is due to
the cookies or because the server rejects it for other reasons
(blocked user agent etc).

The cookies implementation passes our basic unit tests, see
test/webpage-spec.js. If something does not work, you have to trace the
difference between what the tests exercise and your case. For example,
you can run a web server and write a simple server-side app which
simulates the typical login mechanism.

In short, this is just standard debugging practice. You need to move
away from dealing with a remote server only to troubleshooting it with
everything you can control.
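A local test of this kind could look like the following. This is a minimal sketch, assuming Node.js's built-in http module as the server (the thread does not name one); the handler name handleLogin, the cookie name sid and the port are made up for illustration. It issues a session cookie on /login, a persistent one on /login?remember=1, and reports on any other URL whether the client sent the cookie back.

```javascript
// Hypothetical login simulator; all names here are illustrative.
function handleLogin(req, res) {
    var sent = req.headers.cookie || '';
    if (req.url.indexOf('/login') === 0) {
        var cookie = 'sid=abc123; Path=/';
        // Without an Expires attribute this is a session cookie.
        // "Remember me" turns it into a persistent cookie valid for one day.
        if (req.url.indexOf('remember=1') !== -1) {
            cookie += '; Expires=' + new Date(Date.now() + 86400000).toUTCString();
        }
        res.writeHead(200, { 'Set-Cookie': cookie, 'Content-Type': 'text/plain' });
        res.end('logged in');
    } else {
        var loggedIn = sent.indexOf('sid=abc123') !== -1;
        res.writeHead(loggedIn ? 200 : 401, { 'Content-Type': 'text/plain' });
        res.end(loggedIn ? 'welcome back' : 'login required');
    }
}

// To serve it with Node.js:
//   require('http').createServer(handleLogin).listen(8080);
```

Pointing the PhantomJS script at such a server makes it possible to see, under full control, which of the two cookie kinds survives a restart with --cookies-file.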


Regards,

--
Ariya Hidayat, http://ariya.ofilabs.com
http://twitter.com/ariyahidayat

Peter Warren

Oct 6, 2012, 10:32:29 AM
to phan...@googlegroups.com
I run :
console.log('\n\n cookies we know about => \n\n' + JSON.stringify(phantom.cookies, null, 2)); 
before page.open and have 4 cookies.

The script continues, logs into the site and returns, and I log the cookies again (this time I have 7 cookies, including an ASP.NET_SessionId cookie).
I am using the parameter --cookies-file=cookies.txt but it does not save the new cookies to the file (it seems to save them before page.open instead of at page.onLoadFinished).
If I run the script again, it has 4 cookies and I am not logged in.
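To pin down exactly which cookies appear during the login, the two snapshots can be compared by name. This is a small sketch; the helper name newCookieNames is made up, and it assumes cookie objects shaped like the ones phantom.cookies returns (each with a name property).

```javascript
// Returns the names of cookies present in `after` but not in `before`.
function newCookieNames(before, after) {
    var known = before.map(function (c) { return c.name; });
    return after
        .filter(function (c) { return known.indexOf(c.name) === -1; })
        .map(function (c) { return c.name; });
}

// In a PhantomJS script (not executed here):
//   var before = phantom.cookies;
//   ... log in and wait for the page to finish loading ...
//   console.log(JSON.stringify(newCookieNames(before, phantom.cookies)));
```

The resulting names can then be looked for in cookies.txt after the run.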

Ariya Hidayat

Oct 6, 2012, 10:35:09 AM
to phan...@googlegroups.com
Thanks for the analysis! This is very useful.

So if I understand correctly, the additional 3 cookies set later did
not make it to the cookies.txt file, right?

Davis Ford

Oct 6, 2012, 11:59:13 AM
to phan...@googlegroups.com
I updated the gist https://gist.github.com/3808630 to dump request and response for the document -- ignoring all static files.

I also made a few other tweaks.  When I run it without having cookies.txt, it forces authentication (as expected), and then prints the following cookies it is aware of (6 cookies total):

[
  {
    "domain": ".reddit.com",
    "httponly": false,
    "name": "reddit_session",
    "path": "/",
    "secure": false,
    "value": "14595681%2C2012-10-06T08%3A42%3A26%2Ceefcd2f109184370b0b74de07bf55ccc2b692fd5"
  },
  {
    "domain": ".reddit.com",
    "expires": "Sun, 07 Apr 2013 03:42:25 GMT",
    "expiry": 1365306145,
    "httponly": false,
    "name": "__utmz",
    "path": "/",
    "secure": false,
    "value": "55650728.1349538146.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
  },
  {
    "domain": ".reddit.com",
    "expires": "Sun, 07 Apr 2013 03:42:25 GMT",
    "expiry": 1365306145,
    "httponly": false,
    "name": "__utmc",
    "path": "/",
    "secure": false,
    "value": "55650728"
  },
  {
    "domain": ".reddit.com",
    "expires": "Sat, 06 Oct 2012 16:12:25 GMT",
    "expiry": 1349539945,
    "httponly": false,
    "name": "__utmb",
    "path": "/",
    "secure": false,
    "value": "55650728.1.10.1349538146"
  },
  {
    "domain": ".reddit.com",
    "expires": "Mon, 06 Oct 2014 15:42:25 GMT",
    "expiry": 1412610145,
    "httponly": false,
    "name": "__utma",
    "path": "/",
    "secure": false,
    "value": "55650728.1597129995.1349538146.1349538146.1349538146.1"
  },
  {
    "domain": ".reddit.com",
    "expires": "Thu, 31 Dec 2037 23:59:59 GMT",
    "expiry": 2145916799,
    "httponly": false,
    "name": "reddit_first",
    "path": "/",
    "secure": false,
    "value": "%7B%22organic_pos%22%3A%201%2C%20%22firsttime%22%3A%20%22first%22%7D"
  }
]

cat the contents of "cookies.txt" after this runs reveals...

[General]
cookies="@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\x4\0\0\0\x94reddit_first=%7B%22organic_pos%22%3A%202%2C%20%22firsttime%22%3A%20%22first%22%7D; expires=Thu, 31-Dec-2037 23:59:59 GMT; domain=.reddit.com; path=/\0\0\0\x80__utma=55650728.1597129995.1349538146.1349538146.1349538146.1; expires=Mon, 06-Oct-2014 15:42:28 GMT; domain=.reddit.com; path=/\0\0\0\x62__utmb=55650728.2.10.1349538146; expires=Sat, 06-Oct-2012 16:12:28 GMT; domain=.reddit.com; path=/\0\0\0\x8f__utmz=55650728.1349538146.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); expires=Sun, 07-Apr-2013 03:42:28 GMT; domain=.reddit.com; path=/)"

It isn't clear to me how that is encoded, but it doesn't seem to contain the same information.  Comparing the "expires" properties, there are only 4 cookies here, versus the 6 printed above.

Also, if I have persisted these cookies to cookies.txt, and I run the script again, and I dump out the requests that are made, I can see that the request headers never contain the cookies.  Here is one of the requests that is made for http://www.reddit.com despite knowing up front about the cookies before page.open happens:

  {
    "headers": [
      {
        "name": "User-Agent",
        "value": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0"
      },
      {
        "name": "Accept",
        "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
      },
      {
        "name": "Referer",
        "value": "http://www.reddit.com/"
      }
    ],
    "id": 31,
    "method": "GET",
    "time": "2012-10-06T15:42:27.291Z",
    "url": "http://www.reddit.com/"
  }

So, despite phantom reporting that it knows about these cookies via console.log(JSON.stringify(phantom.cookies)) before page.open occurs, when the request is made, no headers are set for the cookies.  
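A check like this can be automated in the request callback. The sketch below assumes the request object has the shape shown in the dump above (a headers array of name/value pairs); the helper name cookieHeaderOf is made up. One caveat, stated as an assumption rather than a confirmed fact: the cookie jar may attach the Cookie header further down the network stack, after this callback has fired, so an empty result here is a hint, not proof. Logging the received headers on the server side is the more reliable check.

```javascript
// Returns the value of the Cookie header of a request, or null if absent.
function cookieHeaderOf(requestData) {
    var headers = requestData.headers || [];
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === 'cookie') {
            return headers[i].value;
        }
    }
    return null;
}

// In a PhantomJS script (not executed here):
//   page.onResourceRequested = function (requestData) {
//       console.log(requestData.url + ' -> ' + cookieHeaderOf(requestData));
//   };
```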

Thanks for the tip on the unit test, I will also dig into that as well.  I also follow a similar approach of writing tests close to impl that mock out other resources, but I've found the need to run real world integration scenarios as well to flesh out the kinds of behavior or data that I wasn't expecting -- and then add this as another case to the unit test suite.  It seems like something is awry here with reddit (at least) -- could be other sites as well...when I have more time, I will investigate further and report back.

Peter Warren

Oct 7, 2012, 7:54:29 AM
to phan...@googlegroups.com
That is correct.


On Sunday, 7 October 2012 00:35:11 UTC+10, Ariya Hidayat wrote:
> I run :
> console.log('\n\n cookies we know about => \n\n' +
> JSON.stringify(phantom.cookies, null, 2));
> before page.open and have 4 cookies.
>
> The script continues and logs into the site and returns and I log the
> cookies again (this time I have 7 cookies including a ASP.NET_SessionId
> cookie.
> I am using the parameter --cookies-file=cookies.txt but it does not save the
> cookies to file (it seems to save them before page.open instead of at
> page.onLoadFinished)

Peter Warren

Oct 7, 2012, 7:56:12 AM
to phan...@googlegroups.com
And my tests were with my own website.

AllOlli

Oct 8, 2012, 7:09:42 AM
to phantomjs


On Oct 6, 5:59 pm, Davis Ford <davisf...@gmail.com> wrote:

> cat the contents of "cookies.txt" after this runs reveals...
>
> [General]
> cookies="@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\x4\0\0\0\x94reddit_first=%7B%22organic_pos%22%3A%202%2C%20%22firsttime%22%3A%20%22first%22%7D;
> expires=Thu, 31-Dec-2037 23:59:59 GMT; domain=.reddit.com;
> path=/\0\0\0\x80__utma=55650728.1597129995.1349538146.1349538146.1349538146.1;
> expires=Mo
> n, 06-Oct-2014 15:42:28 GMT; domain=.reddit.com;
> path=/\0\0\0\x62__utmb=55650728.2.10.1349538146; expires=Sat, 06-Oct-2012
> 16:12:28 GMT; domain=.reddit.com; pat
> h=/\0\0\0\x8f__utmz=55650728.1349538146.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
> expires=Sun, 07-Apr-2013 03:42:28 GMT; domain=.reddit.com; path=/)"
>

My case is much more simple: I already have a cookie file from an
earlier version (phantomjs 1.3) which works fine up to version 1.6.1
(Windows static builds). But when using 1.7 it fails. The only thing I
need to get past a login form is the session ID. If I call the page
with no cookie file and use v1.6.1 it creates a file containing
something like this:

[my.site.net]
PHPSESSID=97io6hlb3bj2fwwo6ocpi8ho80

But with v1.7 it creates this:

[General]
cookies=@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\0)

This file clearly contains no information whatsoever about the session
ID. I can never get past the login page even if I provide a pre-
prepared cookie file containing the session ID in the format used by
the older versions. For the moment it seems that the cookie support
was completely broken with v1.7.

Ivan De Marino

Oct 8, 2012, 12:17:09 PM
to phan...@googlegroups.com
1) cookie format has been "binarified" in 1.7
2) the cookie storage in <1.7 was buggy and faulty
3) this doesn't mean that 1.7 has no issues of its own: taming the QNetworkCookieJar is not an easy task.

I'd be happy to investigate the issue if you can provide a script/example.

Ivan


--
Ivan De Marino
Coder, Technologist, Cook, Italian

blog.ivandemarino.me | www.linkedin.com/in/ivandemarino | twitter.com/detronizator

Ivan De Marino

Oct 8, 2012, 12:25:58 PM
to phan...@googlegroups.com
Will look into this and let you know.

A bit of background (for those who care to know what's going on with the cookies API):
- before 1.7 we were dealing with cookies in a very poor manner, without providing a real "cookiejar": the backend that ultimately decides, given the URL, which cookies get stored and which get sent
- 1.7 contains a total rewrite of the cookies code: I have tried to implement a full CookieJar for PhantomJS, and to base it on the OFFICIAL Qt class - QNetworkCookieJar
- unfortunately the official class seems to have issues of its own, and I have been writing the subclass CookieJar, trying to "iron out" those issues
- during development, many times I had to go back and rethink the approach, as new, weird behaviours were exhibited by the CookieJar code

I'm even considering throwing away the "official" CookieJar implementation and making a completely new one.

Let me investigate using the Gist above: THAT is going to be really useful.

Ivan


Ivan De Marino

Oct 8, 2012, 1:13:15 PM
to phan...@googlegroups.com
Fortunately (for PhantomJS 1.7) this is not a bug at all.

I have tested this script on PhantomJS 1.7 AND on Chrome and Firefox.
Chrome is "guilty as charged" here: apparently it DOESN'T clear the Reddit Session Cookies when shut down.
Maybe this is due to the fact that, even when Chrome is closed (at least on my mac), 1 process hangs around for a while.

Reddit creates (correctly) session cookies when you login: as soon as the browser is shutdown, the session cookies HAVE TO EXPIRE.

PhantomJS does exactly that. Hence, even with the "--cookies-file=<filename>" option, it can't preserve Session Cookies: they are supposed to be purged (try --debug=true to see it happening).
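To see in advance which cookies will be purged, the cookie list can be split by whether an expiry is present. This is a sketch under one assumption: that a cookie object without expires/expiry fields, like reddit_session in the dump earlier in the thread, is a session cookie. The helper name splitBySessionness is made up.

```javascript
// Groups cookie names into session cookies (no expiry) and persistent ones.
function splitBySessionness(cookies) {
    var result = { session: [], persistent: [] };
    cookies.forEach(function (c) {
        var hasExpiry = c.expiry !== undefined || c.expires !== undefined;
        (hasExpiry ? result.persistent : result.session).push(c.name);
    });
    return result;
}

// In a PhantomJS script (not executed here):
//   console.log(JSON.stringify(splitBySessionness(phantom.cookies), null, 2));
```

Any login cookie that lands in the session group will not be in the cookies file on the next run.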

THERE IS A FIX for this though.
Reddit, like many other sites, offers a "remember me" option: with it, the Session Cookies are NOT stored as Session Cookies anymore.
But your script wasn't doing that: you were not "checking" that box.
I forked your Gist with this: https://gist.github.com/3853638

With that checkbox "checked", now you can login and preserve your session inside the Cookies file: it will be there next time you login.

I hope this helps.

Ivan

Davis Ford

Oct 8, 2012, 6:20:53 PM
to phan...@googlegroups.com, phan...@googlegroups.com
Awesome!! Thank you Ivan!



Peter Warren

Oct 9, 2012, 1:27:49 AM
to phan...@googlegroups.com
I had already set this option and saved an image to verify that the "remember me" option was set.
Maybe I am just doing the submit wrong and it is not actually logging in.
I am accessing a DotNetNuke website and my login button is like this <a id="dnn_ctr565_Login_Login_DNN_cmdLogin" title="Login" class="dnnPrimaryAction" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;dnn$ctr565$Login$Login_DNN$cmdLogin&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, true))">Login</a>

It must be doing something as the 3 new cookies are created.

Here is partial script:
page.onLoadFinished = function() {
    page.evaluate(function() {
        $.noConflict();
        jQuery(document).ready(function($) {
            // Code that uses jQuery's $ can follow here.
            $('input[id="dnn_ctr565_Login_Login_DNN_txtUsername"]').attr('value', 'username');
            $('input[id="dnn_ctr565_Login_Login_DNN_txtPassword"]').attr('value', 'password');
            jQuery("#dnn_ctr565_Login_Login_DNN_chkCookie").prop("checked", true);
            jQuery("#dnn_ctr565_Login_Login_DNN_cmdLogin").click();
            // or I could call the javascript function directly
            //page.WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("dnn$ctr565$Login$Login_DNN$cmdLogin", "", true, "", "", false, true));
        });
    });
    window.setTimeout(function () {
        page.render('login.png'); //this has username, password and remember me option set
        console.log('\n\n cookies we know about (2) => \n\n' + JSON.stringify(phantom.cookies, null, 2));
        phantom.exit();
    }, 5000);
};

I would appreciate it if you could tell me where I am going wrong.

Ivan De Marino

Oct 9, 2012, 6:34:58 AM
to phan...@googlegroups.com
I'd love to be of help, but I think there is a fundamental thing to understand here: SESSION COOKIES are supposed to DIE when you shut down the browser (and they WON'T be saved).
Also, it's up to the webapp HOW it handles its cookies.

I think it would be very helpful for you guys, just to learn what's going on, to add the option "--debug=true" to phantomjs.
This way you can see what is going on with the Session cookies, and notice if and when they are removed.

The output will contain a lot of things, but the Cookie Saved / Loaded stuff is clearly marked.

Ivan

If you want to simulate "what happens to my cookies when a browser closes", install a browser extension for manipulating cookies and clear the "Session Cookies".
You will see: many websites will suddenly log you out :)

AllOlli

Oct 9, 2012, 9:16:40 AM
to phan...@googlegroups.com

> I'd love to be of help, but I think there is a fundamental thing to understand here: SESSION COOKIES are supposed to DIE when you shut down the browser (and they WON'T be saved).
> Also, it's up to the webapp HOW it handles its cookies.

For my case, I now understand that my code no longer works because PhantomJS behaves more like a true browser regarding cookies: it correctly deletes the session cookie at the end of program execution. The problem is that I explicitly use the cookie file to transfer an existing session ID to the PhantomJS call. The cookie file is created manually before calling PhantomJS, and in versions before 1.7 it was then read and used. The reason this no longer works is obviously the change to a binary format, so cookies in the old format are not recognized. And since the session cookie is deleted on quit, I don't see its binary representation in the cookie file after the program run.

For my case, I think the solution will be to supply the session ID as parameter to the script and use the API to manually set the cookie in code.

I suggest some changes that will reduce transition problems for projects that use old PhantomJS versions in their projects:
  1. Implement some form of legacy reading for cookie files in the old format so they can still be supplied via this file in a non-binary manner.
  2. Add an option to deactivate cookie deletion at the end of program execution.
  3. Clearly state in the documentation that the cookie file contains binary data, along with some examples for this format.

Thanks for your feedback!


Peter Warren

Oct 9, 2012, 9:31:22 AM
to phan...@googlegroups.com
I agree AllOlli.
Point 2 alternative could be a saveCookie function which could be executed at any time but it would need to be accompanied by a loadCookie function which would either replace or add to the cookies loaded through the command line arguments.
BTW, I am using the debug=true option and I could really do with some help to understand why the cookies are not being set in my case.  I am loading a page, then setting some form values, then trying to execute a click event on the page and then expecting the page to submit to server and then set cookies on the return of this page.  Are multiple round trips to the server okay with the "page" variable?

AllOlli

Oct 9, 2012, 10:40:27 AM
to phan...@googlegroups.com

On Tuesday, October 9, 2012 3:16:40 PM UTC+2, AllOlli wrote:
> For my case, I think the solution will be to supply the session ID as parameter to the script and use the API to manually set the cookie in code.

Man, this was frustrating...
  • First I had to search without luck for any documentation on the cookie management API.
  • At last I found "https://github.com/ariya/phantomjs/pull/280" where it says "a quick reference can be found under the folder of src/phantomjs/test/webpage-spec.js, grep the text cookie".
  • Then I found out that calling addCookie or any other function before calling "page.open" would try to set the cookie for "about:blank" and the CookieJar would reject it.
  • The "page.render" call is based on the contents received by "page.open" so I can't set cookie before performing a request.
  • The only solution seems to be to do two requests which is completely braindead for server load, network delay and countless other reasons. D-Uh!

All in all, the current cookie implementation is much less usable than the pre 1.7 one for two reasons:

  1. The page object apparently does not allow setting the URL before actually requesting a page, so cookies will be rejected because they do not match the "about:blank" pseudo hostname.
  2. Cookies must be supplied from inside the script because of the new binary format as opposed to some simple text file format that could easily be created with any other tool.

I think that both of these issues should be addressed in future releases.

Ivan De Marino

Oct 9, 2012, 5:58:33 PM
to phan...@googlegroups.com
Everything you are trying to do can be done with a nice API.
But it's true: our lack of documentation is killing us.

That is a key issue, and we are well aware of it, but at the moment only few people understand that it's just few of us working on this and we NEED HELP TO WRITE THE FREAKING DOCUMENTATION.
Or, better, SOMEONE has to make a mechanism so that we can generate the documentation.

Anyway, I'll write something here now because I see your frustration:

A bit of documentation

* page.cookies
* page.addCookie({cookie object})
* page.deleteCookie("cookie name")
* page.clearCookies()

Those APIs are specific to the page they are in and, most importantly, THE URL the page is on.
That's how browsers work and that's how PhantomJS, a HEADLESS BROWSER, has to work.
In other words: if you are on "github.com" and try to set a cookie for "google.com", it will be rejected.
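The rule behind that rejection can be illustrated with a simplified domain match, loosely modeled on the domain-matching idea in RFC 6265. This is a sketch, not PhantomJS's actual code (that lives in Qt's QNetworkCookieJar); the function name domainMatches is made up, and host-only cookies, public suffixes and IP addresses are ignored.

```javascript
// Returns true if a cookie set for `cookieDomain` is visible to `host`.
function domainMatches(cookieDomain, host) {
    var domain = cookieDomain.toLowerCase().replace(/^\./, '');
    host = host.toLowerCase();
    if (host === domain) {
        return true;
    }
    // Otherwise the host must be a subdomain: it has to end with "." + domain.
    var suffix = '.' + domain;
    return host.length > suffix.length && host.slice(-suffix.length) === suffix;
}
```

Under this rule a cookie for "google.com" never matches a page on "github.com", and a page that is still on about:blank has no host that any cookie domain could match.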

But, there is hope if you want to manipulate cookies directly in the JAR (that means, without caring for a specific page).

* phantom.cookiesEnabled
* phantom.cookies
* phantom.addCookie({cookie object})
* phantom.deleteCookie("cookie name")
* phantom.clearCookies()

Those APIs allow you to set any cookie you want, before you load a page (or without the need to open one at all).
If you set a cookie on ".google.com" and then open a page on "www.google.com", that cookie will be picked up.
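Setting a session ID in the jar before opening any page could look like this. It is a minimal sketch: the helper makeSessionCookie and the placeholder values are made up, the field names follow the cookie objects printed earlier in the thread, and the phantom part only runs when the script is executed by PhantomJS.

```javascript
// Builds a cookie object with no expiry, i.e. a session cookie.
function makeSessionCookie(name, value, domain) {
    return {
        name: name,
        value: value,
        domain: domain,
        path: '/',
        httponly: false,
        secure: false
    };
}

// Placeholder values; a real script would take the session ID from its arguments.
var cookie = makeSessionCookie('PHPSESSID', 'placeholder-session-id', 'my.site.net');

if (typeof phantom !== 'undefined') {
    // Jar-level call: works before any page has been opened.
    var accepted = phantom.addCookie(cookie);
    console.log('cookie accepted: ' + accepted);
    // ... create a page, open http://my.site.net/ here,
    // and call phantom.exit() when done.
}
```

Because the cookie goes into the jar rather than into a page, the about:blank restriction on the page-level API does not apply, and a single request is enough.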

Preserving Session Cookies

Having said all this, now about the Session cookies: if you want to "PRESERVE" session cookies, you need to do it yourself.
You can access all the session cookies until they are purged (at shutdown): this means you can collect them, store them (use the "fs" API) and restore them at the next launch.

It's simple and it's, most importantly, the correct behaviour for the browser.
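A jar-level version of that collect/store/restore cycle might be sketched as follows. The helper names dumpCookies and loadCookies and the file name session.json are made up; the jar argument is anything exposing cookies and addCookie, which in a PhantomJS script would be the phantom object itself.

```javascript
// Serializes every cookie currently in the jar, session cookies included.
function dumpCookies(jar) {
    return JSON.stringify(jar.cookies, null, 2);
}

// Adds the saved cookies back; returns the names the jar refused.
function loadCookies(jar, json) {
    var rejected = [];
    JSON.parse(json).forEach(function (c) {
        if (!jar.addCookie(c)) {
            rejected.push(c.name);
        }
    });
    return rejected;
}

// In a PhantomJS script (not executed here):
//   var fs = require('fs');
//   if (fs.exists('session.json')) {
//       loadCookies(phantom, fs.read('session.json'));
//   }
//   ... do the work, then just before phantom.exit():
//   fs.write('session.json', dumpCookies(phantom), 'w');
```

Keeping the file as plain JSON also means it can be written by any other tool, which covers the use case of supplying a session ID from outside.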

Session file format change

To put it as simple as possible, the previous format was wrong.
Cookies were stored "per domain", and didn't take into consideration the fact that cookies for ".google.com" will be visible on "www.google.com / translate.google.com / whatever.google.com".
Only the fact that PhantomJS doesn't have an extremely large audience didn't make us look like fools with that implementation: but that needed to be rewritten and that's what I did.

The fact that cookies are stored in binary is just a convenience to speed things up: writing the cookies in a "textual" format would have required dealing with parsing and all sorts of stuff that:
- there was no time to deal with
- would have been slower than serializing/deserializing the binary data

Embrace improvements and changes, even if you don't get it

A short remark. I have read this thread and tried to be helpful.
I have spent time debugging code that was not mine, only to discover that the issue was in not understanding how session cookies work.
I'm now seeing you guys criticising work you frankly don't get, because "it doesn't behave like before and now my code breaks".
This is not a commercial software: this is open source software.
We make it while going along.

Sometimes we get it right, a lot of times we get it wrong.
It's not in our intention to create disruption or "ruin your day".
But it's our freaking time we spend on this stuff: have the mercy to understand that.
And the grace to spend more than a couple of neurons to realize that a change to improve things might break previous code.
We release versions for a reason.
There are some of us dedicating their time to build binary versions to release and create "versions".

You don't like a new version? Stick with the previous one.
You want "the new shiny thing in the new version", deal with the fact that there might be changes and things will/can break.

Ultimately, the code is there: if you think something is wrong or broken, or you just want to understand "what the heck are we doing",
start your favourite editor and read up.
You might be on to something, and anyone here will praise you for your fix/improvement.
HELP TO MAKE THE PROJECT BETTER if you think we are making mistakes.
Fix code, write tests, write documentation. That is ALL helpful.

Sorry, but I had to say something.
Maybe this was suitable for a blog post... but it's now here and I'm too lazy and tired to cut&paste it somewhere else.


James Greene

Oct 10, 2012, 2:25:27 AM
to phan...@googlegroups.com
Updated the API reference to include the newly redesigned Cookies API:

New entries:
If there is incorrect information, please feel free to update it.  I did the best I could based on what Ivan posted here, and looking at the pertinent source and tests.

~~James

Ivan De Marino

Oct 10, 2012, 2:29:25 AM
to phan...@googlegroups.com
You rock James!
Will take a look at it today.

-----
Ivan De Marino
Coder, Cook, Cyclist, Gamer

Sent while standing on one leg

James Greene

Oct 10, 2012, 2:30:56 AM
to phan...@googlegroups.com
Happy to help, albeit at 1:30am. =Þ
~~James

AllOlli

Oct 10, 2012, 5:14:24 AM
to phan...@googlegroups.com
Give it up! A big round of applause for Ivan and James!

Ivan De Marino

Oct 10, 2012, 6:11:57 AM
to phan...@googlegroups.com
that was not the point.

but I hope now you guys can interact better with cookies and it's clearer why things work the way they do.

I should probably start blogging on the features I add :)

On 10 October 2012 10:14, AllOlli <oliver...@gmx.de> wrote:
> Give it up! A big round of applause for Ivan and James!



Peter Warren

Oct 10, 2012, 9:46:21 AM
to phan...@googlegroups.com
I have nothing but admiration for the work that all you guys do. All the forum users (I am one of them) want to do is use your product. Everyone is at different levels of expertise and wanting to move forward but you guys are the only ones that can help us.  Only recently have I started to read and post to forums and I see a lot of people ask questions without really attempting to do anything on their own first, like reading documentation and looking at examples.  I have spent quite a bit of time trying to solve this on my own. I should have just kept to my problem without butting into AllOlli's comments.  Sorry if you took my comments as criticism.  It was not intended.

AllOlli

Oct 11, 2012, 6:12:30 AM
to phan...@googlegroups.com

On Wednesday, October 10, 2012 3:46:21 PM UTC+2, Peter Warren wrote:
> I have nothing but admiration for the work that all you guys do. All the forum users (I am one of them) want to do is use your product. Everyone is at different levels of expertise and wanting to move forward but you guys are the only ones that can help us.  Only recently have I started to read and post to forums and I see a lot of people ask questions without really attempting to do anything on their own first, like reading documentation and looking at examples.  I have spent quite a bit of time trying to solve this on my own. I should have just kept to my problem without butting into AllOlli's comments.

The most important part is to try your best to solve the problem. If that doesn't get you where you want to go, it's time to look at existing bug reports or discussions and if you find one exactly matching the problem, then join the discussion. You just have to find the right place for discussion and you get actual results. For my problem it turned out that the biggest problem was that all the information I could find about PhantomJS wasn't enough to correctly work with cookies. Enhanced documentation solved that, and because I told about my problem at the right place it all happened in just one day.

The possibility to directly reach the product developers is the biggest plus for open source software. Good luck reaching the right developer for a bug you found in Windows. One example: The Aero animation in Windows 7 when closing a window is choppy and broken whereas in Vista it is butter smooth. Some people don't even notice this, others are majorly distracted and have deactivated animations to ease their pain. The problem is discussed in some official Windows forum for almost three years on 29 pages and nothing was done about it. See how open source runs circles around big companies?

Now just the second part of the problem is left in some way: One could write code using the "fs" API to read cookies from a plain text file and set them but this takes away big parts of the usefulness of the numerous scripts in the "examples" directory that already solved most problems potential users have but will from now on not get them to results if they want to work on pages that require a valid login session. Just for convenience some way to provide cookies in a plain text file should be restored.

AllOlli

Oct 11, 2012, 6:17:19 AM
to phan...@googlegroups.com

On Tuesday, October 9, 2012 7:27:49 AM UTC+2, Peter Warren wrote:
> I had already set this option and saved an image to verify that the "remember me" option was set.
> Maybe I am just doing the submit wrong and it is not actually logging in.

I cannot directly provide a solution for your problem, but have you tried looking at the actual requests that you were sending? Did they include the cookie headers? If you cannot log the received headers on the server side, then as a last resort you could use a packet sniffer (Ethereal, Packetyzer) to spy on those HTTP requests. Also, you should activate the debug mode for PhantomJS and see whether your cookies are rejected by the CookieJar.

Ivan De Marino

Oct 11, 2012, 6:17:28 AM
to phan...@googlegroups.com
So, this is too difficult?

function saveCookies(cookiePersonalizedStorage, page) {
    var fs = require("fs");
    fs.write(cookiePersonalizedStorage, JSON.stringify(page.cookies));
}

function restoreCookies(cookiePersonalizedStorage, page) {
    var fs = require("fs");
    var cookies = fs.read(cookiePersonalizedStorage);
    page.cookies = JSON.parse(cookies);
}

It's really THAT easy.


AllOlli

Oct 11, 2012, 6:39:50 AM
to phan...@googlegroups.com

On Thursday, October 11, 2012 12:17:50 PM UTC+2, Ivan De Marino wrote:
> So, this is too difficult?
> [...]
>
> It's really THAT easy.

Yes it is. But that is not the point of my suggestion. I want to point out a regression in usability.

The point is that there is a directory full of examples that even novice users can make great use of. Those users could use ALL OF THEM AS IS for pages behind a login just by writing a cookie into a text file and calling the script. The directory provided a "swiss army knife" of ready-to-use commands. This is no longer the case because now code is needed and the scripts need to be changed. This is not difficult, but it takes away convenience and might turn away potential new users. Or, even worse, it might lead to a big increase in annoying novice questions on the communication channels. This is a moment where you developers need to decide if you want a large user base or if you want to go with a smaller audience of users who are more adept at changing code themselves.

Adding the two above functions to the core (use phantom.cookies in them instead) and adding a "--json-cookies-file=<filename>" switch directly to the command will make a lot of users happy. Trust me.
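
Until something like that exists in the core, the same effect can be approximated inside a script. The following is only a sketch: the "--json-cookies-file" switch is the hypothetical name proposed above, not an existing PhantomJS option, and jsonCookiesFileFromArgs is a made-up helper. It relies on PhantomJS passing any arguments that follow the script name to the script via require("system").args.

```javascript
// Look for the hypothetical "--json-cookies-file=<path>" switch among the
// script arguments. Returns the path, or null when the switch is absent.
function jsonCookiesFileFromArgs(args) {
    var prefix = '--json-cookies-file=';
    for (var i = 0; i < args.length; i++) {
        if (args[i].indexOf(prefix) === 0) {
            var path = args[i].slice(prefix.length);
            return path.length > 0 ? path : null;
        }
    }
    return null;
}

// This part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var system = require('system');
    var cookiesPath = jsonCookiesFileFromArgs(system.args);
    if (cookiesPath !== null && fs.exists(cookiesPath)) {
        // The file is assumed to hold JSON.stringify(phantom.cookies).
        JSON.parse(fs.read(cookiesPath)).forEach(function (cookie) {
            phantom.addCookie(cookie);
        });
    }
    // ... the rest of the example script would continue here ...
}
```

It would be called as: phantomjs example.js --json-cookies-file=cookies.json. The switch has to come after the script name, otherwise PhantomJS itself tries to interpret it.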

To make that clear: my own problem is solved; this is now an exercise in product design for end users.

AllOlli

unread,
Oct 11, 2012, 6:44:53 AM10/11/12
to phan...@googlegroups.com
For new visitors to the thread here are two examples for the binary cookie format.

This is a file with no cookies stored:
[General]
cookies=@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\0)

This is a file with two session cookies in it:
[General]
cookies="@Variant(\0\0\0\x7f\0\0\0\x16QList<QNetworkCookie>\0\0\0\0\x1\0\0\0\x2\0\0\0OPHPSESSID=5db1uf7858n6gsg3q9kfhshgo4; domain=.abcdefg.a-domain-name.net; path=/\0\0\0HPHPSESSID=jmkmbvpear5p6722m7pag2p3c1; domain=other12.adomain.com; path=/)"

This should clarify that with actual cookies in it, you should still see them, even though it's a binary format.
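
For anyone who wants to pull the readable cookies back out of such a file, here is a rough sketch. It is a heuristic based only on the two samples above, not on the Qt serialization code: in both samples each cookie is preceded by \0\0\0 and one character whose character code equals the length of the cookie text ("O" is 79, "H" is 72). The function name extractCookies is made up, and the sketch will miss cookies whose length is not a printable character or whose text contains a backslash.

```javascript
// Heuristically extract cookies from the text of a --cookies-file value.
// Assumes each cookie is stored as "\0\0\0" + <one length character> + text.
function extractCookies(settingsText) {
    var pattern = /\\0\\0\\0([^\\])([^\\]*)/g;
    var cookies = [];
    var match;
    while ((match = pattern.exec(settingsText)) !== null) {
        var declaredLength = match[1].charCodeAt(0);
        var text = match[2].slice(0, declaredLength);
        // Skip anything that does not look like "name=value; ..." of that length.
        if (text.length !== declaredLength || text.indexOf('=') < 1) {
            continue;
        }
        var parts = text.split('; ');
        var cookie = {
            name: parts[0].slice(0, parts[0].indexOf('=')),
            value: parts[0].slice(parts[0].indexOf('=') + 1)
        };
        for (var i = 1; i < parts.length; i++) {
            var eq = parts[i].indexOf('=');
            if (eq > 0) {
                cookie[parts[i].slice(0, eq)] = parts[i].slice(eq + 1);
            }
        }
        cookies.push(cookie);
    }
    return cookies;
}

// The second sample from above, with backslashes escaped for JavaScript.
var sample = '"@Variant(\\0\\0\\0\\x7f\\0\\0\\0\\x16QList<QNetworkCookie>' +
    '\\0\\0\\0\\0\\x1\\0\\0\\0\\x2' +
    '\\0\\0\\0OPHPSESSID=5db1uf7858n6gsg3q9kfhshgo4; domain=.abcdefg.a-domain-name.net; path=/' +
    '\\0\\0\\0HPHPSESSID=jmkmbvpear5p6722m7pag2p3c1; domain=other12.adomain.com; path=/)"';

console.log(JSON.stringify(extractCookies(sample), null, 2));
```

The length check is what keeps the pattern from picking up the header bytes at the start of the value, so the file with no cookies yields an empty list.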

Message has been deleted
Message has been deleted

0xnigarth

unread,
Oct 12, 2012, 1:16:02 PM10/12/12
to phan...@googlegroups.com
Hello,

Since I am a total noob at how Google Groups works, I guess I have to rewrite my first post; I don't know whether it was sent or lost.

Anyway, I have experienced behavior similar to what was described in some of the earliest posts in this group. I am trying to automate a process in a web app (.NET) that I am currently working on, without much luck so far. I have managed to get past the login screen and land on the home page of my application. However, any subsequent request triggers some kind of session timeout or rejection that redirects Phantom to the login screen.

After many hours of debugging and messing around I figured it was caused by some incorrect cookie behavior. It seemed that session cookies were not being persisted even with the --cookies-file option. Then I learned about the customHeaders page property. With it I tried to manually insert the cookie values that I received after the login, which are the sessionID and settings. I thought this was going to solve the problem, but again no luck.

Finally I decided to look at the request headers in Firefox's Firebug when I logged in manually. It was at this point that I came across the mysterious HttpOnly cookies.

Besides the sessionID and the settings value, Firebug shows that the requests carry a third cookie, the .AUTH value. I have not found out whether it is possible to retrieve this type of cookie with Phantom.

I don't know the details of how cookie values are retrieved from responses. However, if by any chance they are obtained through a page.evaluate() call, then HttpOnly cookies will not be present in phantom's page.cookies array.
I'm just making assumptions at this moment, but any help, insight or comments would be greatly appreciated.
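
The assumption can be checked from a script. By design, document.cookie never exposes HttpOnly cookies to page script, so printing it next to page.cookies shows which of the two a cookie is missing from. This is a minimal sketch: groupByHttpOnly is a made-up helper, the URL is a placeholder for the page behind the login, and it assumes the cookie objects carry the "httponly" field used elsewhere in this thread.

```javascript
// Split a list of PhantomJS cookie objects into names that page script may
// read and names flagged HttpOnly.
function groupByHttpOnly(cookies) {
    var groups = { scriptVisible: [], httpOnly: [] };
    cookies.forEach(function (cookie) {
        (cookie.httponly ? groups.httpOnly : groups.scriptVisible).push(cookie.name);
    });
    return groups;
}

// This part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://example.com/', function (status) {
        console.log('load status: ' + status);
        console.log('page.cookies:    ' + JSON.stringify(groupByHttpOnly(page.cookies)));
        console.log('document.cookie: ' + page.evaluate(function () {
            return document.cookie;
        }));
        phantom.exit(0);
    });
}
```

If the .AUTH cookie shows up under httpOnly in the first line but not in the second, PhantomJS has stored it and the problem lies elsewhere.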

Best Regards,

0xnigarth

Ivan De Marino

unread,
Oct 13, 2012, 4:27:39 AM10/13/12
to phan...@googlegroups.com
Hi,

well if PhantomJS doesn't handle HttpOnly cookies, there is a bug.
But I'm sure we handle (or at least try to handle) them.

If you think phantomjs fails to handle them, please provide a simple reproducible example that shows how HttpOnly cookies are ignored by phantomjs.

Also, given that this thread was about a specific topic, next time please start a separate thread.

Thanks
Ivan




Sergei Dorogin

unread,
Nov 26, 2012, 8:30:56 AM11/26/12
to phan...@googlegroups.com
Hello.
I'm having issues working with cookies.
I've read the whole thread and I understand that the format of the file supplied to --cookies-file isn't JSON.
I'm trying to solve what I thought was a pretty simple issue, but I'm stuck. I need to load a page, get its cookies, save them to a file and restore them later.
At first I tried to use the --cookies-file option. After the first session was authenticated I dumped the cookies like this:

var fs = require("fs");
fs.write("cookies.txt", JSON.stringify(phantom.cookies));
and then specified this "cookies" file for other sessions:
& $phantomDir\phantomjs --cookies-file=cookies.txt "selftest.js"

To my surprise, phantomjs changed the content of cookies.txt into the binary format that has been discussed a lot here. This is not a problem in itself. But the cookies are ignored (phantom.cookies is []).
Why do you change the cookies file content but ignore it? This is hard to understand.

Then I decided to go without the "--cookies-file" option: just save the cookies to a JSON file and restore them from JS code (as Ivan mentioned in the thread).
But I can't restore them! All assignments to phantom.cookies and calls to phantom.addCookie are ignored.
The repro code is below.

var g_cookiesFile = "selftest-cookies.txt";

function saveCookies() {
    var fs = require("fs");
    fs.write(g_cookiesFile, JSON.stringify(phantom.cookies));
    console.log("Saving cookies: " + JSON.stringify(phantom.cookies));
}

function restoreCookies() {
    console.log("Restoring cookies");
    var fs = require("fs");
    if (fs.exists(g_cookiesFile)) {
        var cookies = fs.read(g_cookiesFile);
        cookies = JSON.parse(cookies);
        console.log("Restored cookies: " + JSON.stringify(cookies));
        if (cookies.length > 0) {
            for (var i = 0; i < cookies.length; ++i) {
                phantom.addCookie(cookies[i]);
                console.log("add: " + JSON.stringify(cookies[i]));
            }
        }
        //also tried: phantom.cookies = cookies;
    }
}

restoreCookies();
console.log("Using cookies: " + JSON.stringify(phantom.cookies));

if (phantom.cookies.length === 0) {
    phantom.addCookie({
        'name':     'Valid-Cookie-Name',
        'value':    'Valid-Cookie-Value',
        'domain':   'localhost',
        'path':     '/',
        'httponly': false,
        'secure':   false,
        'expires':  (new Date()).getTime() + 3600
    });

    saveCookies();
}
phantom.exit(0);


Message has been deleted

Sergei Dorogin

unread,
Nov 26, 2012, 11:29:24 AM11/26/12
to phan...@googlegroups.com
It's strange. I posted a reply with explanation of fix for my issue, but now I can see only "This message has been deleted."
Anyway, I'll repeat.

It seems that phantomjs fails to set cookies whose expires/expiry fields are not Date values.
The problem is that JSON.stringify serializes the dates as strings such as "Пн, 26 ноя 2012 17:42:18 GMT" (that is, Mon, 26 Nov 2012 in a Russian locale), and JSON.parse leaves them as strings. phantomjs then fails, silently, to set a cookie with such a field.
The simplest fix:
function restoreCookies() {
    console.log("Restoring cookies");
    var fs = require("fs");
    if (fs.exists(g_cookiesFile)) {
        var cookies = fs.read(g_cookiesFile);
        cookies = JSON.parse(cookies);

        if (cookies.length > 0) {
            for (var i = 0; i < cookies.length; ++i) {
                var item = cookies[i];
                if (item.expires)
                    item.expires = Date.parse(item.expires);
                if (item.expiry)
                    item.expiry = Date.parse(item.expiry);
                phantom.addCookie(item);
            }
        }
    }
}
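
A slightly more defensive variant of the same idea, as a sketch only. Date.parse returns NaN for strings it cannot read (a locale-formatted date may be one of them), so this version converts only string fields and drops a field that does not parse instead of passing NaN on. The function name normalizeCookieDates is made up, and dropping the field is an assumption about what is acceptable: the cookie then has no expiry date at all.

```javascript
// Return a copy of a cookie loaded from JSON in which string-valued
// "expires"/"expiry" fields are replaced by millisecond timestamps.
// Strings that Date.parse cannot read are removed; numbers are left alone.
function normalizeCookieDates(cookie) {
    var copy = {};
    for (var key in cookie) {
        if (Object.prototype.hasOwnProperty.call(cookie, key)) {
            copy[key] = cookie[key];
        }
    }
    ['expires', 'expiry'].forEach(function (field) {
        if (typeof copy[field] === 'string') {
            var millis = Date.parse(copy[field]);
            if (isNaN(millis)) {
                delete copy[field];
            } else {
                copy[field] = millis;
            }
        }
    });
    return copy;
}
```

Inside the loop above it would be used as phantom.addCookie(normalizeCookieDates(cookies[i])). phantom.addCookie returns true or false, so logging its result also makes a silent rejection visible.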



Sergei Dorogin

unread,
Nov 26, 2012, 11:34:14 AM11/26/12
to phan...@googlegroups.com
Let me ask one more question.
phantom.cookies is empty when my code loads a remote URL (it works when loading localhost). Why? Is it on purpose?

I tried turning off the web-security option (--web-security=no).
My cookie is HttpOnly.

Sergei Dorogin

unread,
Nov 27, 2012, 3:57:20 AM11/27/12
to phan...@googlegroups.com
I created an issue - http://code.google.com/p/phantomjs/issues/detail?id=886

The key point is that the cookies are received in the response to an AJAX request.
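
For anyone trying to reproduce this, one way to see whether a Set-Cookie header actually arrives with the AJAX response is the page.onResourceReceived callback. This is a sketch, not taken from the issue: setCookieLines is a made-up helper, the URL and the two-second wait are placeholders, and splitting on newlines assumes that several Set-Cookie headers are reported joined into one value.

```javascript
// Collect the Set-Cookie values from a header list of {name, value} objects,
// the shape PhantomJS reports in response.headers.
function setCookieLines(headers) {
    var lines = [];
    for (var i = 0; i < headers.length; i++) {
        if (String(headers[i].name).toLowerCase() === 'set-cookie') {
            lines = lines.concat(String(headers[i].value).split('\n'));
        }
    }
    return lines;
}

// This part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.onResourceReceived = function (response) {
        if (response.stage !== 'end') {
            return; // each resource is reported at 'start' and at 'end'
        }
        setCookieLines(response.headers).forEach(function (line) {
            console.log(response.url + ' -> Set-Cookie: ' + line);
        });
    };
    page.open('http://example.com/', function (status) {
        // Leave some time for AJAX requests before dumping the cookie jar.
        setTimeout(function () {
            console.log('phantom.cookies: ' + JSON.stringify(phantom.cookies));
            phantom.exit(0);
        }, 2000);
    });
}
```

If the header is logged but phantom.cookies stays empty, the cookie was received and then not stored, which is the behavior the issue describes.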

James Greene

unread,
Nov 27, 2012, 9:46:47 AM11/27/12
to phan...@googlegroups.com
CCed Ivan on the issue.
~~James