So I just filed https://github.com/ariya/phantomjs/issues/12216. The quick summary is that `page.url` calls `QUrl::toString()` instead of `QUrl::toEncoded()`, which is inconvenient if you want to store page URLs in a database somewhere and then, later, re-access them using something else. In fact, it can destroy information and make it *impossible* to re-access the URL.
Here's a concrete example where this matters: right now, if you attempt to load the page `http://genesis-ec.com/`, you get a 302 Moved Temporarily which puts you at `http://genesis-ec.com/err.asp?shopcd=99999&errmsg=%81E%83V%83%87%83b%83v%82%AA%91I%91%F0%82%B3%82%EA%82%C4%82%A2%82%DC%82%B9%82%F1%3CBR%3E`. That query string is not valid UTF-8. (Based on playing around with the Character Encoding menu in Firefox and then pasting stuff into Google Translate, I think it's Shift_JIS.) `page.url` for the redirected page comes out as `http://genesis-ec.com/err.asp?shopcd=99999&errmsg=\ufffdE\ufffdV\ufffd\ufffd\ufffdb\ufffdv\ufffd\ufffd\ufffdI\ufffd\ufffd\ufffd\ufffd\u0102\ufffd\ufffd\u0702\ufffd\ufffd\ufffd<BR>`, which as you can see has lost information: every `\ufffd` is a replacement character standing in for bytes that can no longer be recovered.
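
To make the problem reproducible without PhantomJS in the loop, here is a minimal sketch in plain JavaScript. The helpers `percentDecodeToBytes` and `percentEncodeBytes` are hypothetical names made up for this illustration; they are not part of PhantomJS or Qt. The sketch only shows that a UTF-8-only decoder cannot handle the `errmsg` value, while keeping the percent-encoded form (or its raw bytes) round-trips exactly.

```javascript
// Hypothetical helpers for illustration; not part of PhantomJS or Qt.

// Turn a percent-encoded string into an array of raw byte values (0-255).
// Assumes the input is pure ASCII, as a properly encoded URL should be.
function percentDecodeToBytes(encoded) {
  var bytes = [];
  for (var i = 0; i < encoded.length; i++) {
    var hex = encoded.substr(i + 1, 2);
    if (encoded.charAt(i) === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

// Re-encode raw bytes, leaving only RFC 3986 unreserved characters literal.
function percentEncodeBytes(bytes) {
  return bytes.map(function (b) {
    var ch = String.fromCharCode(b);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      return ch;
    }
    return '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
  }).join('');
}

// The errmsg value from the redirect target above.
var errmsg = '%81E%83V%83%87%83b%83v%82%AA%91I%91%F0%82%B3' +
             '%82%EA%82%C4%82%A2%82%DC%82%B9%82%F1%3CBR%3E';

// A decoder that assumes UTF-8 rejects these bytes outright.
var utf8DecodeFailed = false;
try {
  decodeURIComponent(errmsg);
} catch (e) {
  utf8DecodeFailed = e instanceof URIError;
}

// Working at the byte level preserves everything.
var rawBytes = percentDecodeToBytes(errmsg);
var roundTripped = percentEncodeBytes(rawBytes);
```

Here `decodeURIComponent` throws a `URIError` rather than substituting replacement characters, but the underlying cause is the same: the bytes are not UTF-8, so any decode-to-string step that assumes UTF-8 either fails or discards them. The encoded form is the only representation that is safe to store and replay.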
This may be impossible to fix 100% without modifying Qt itself -- the QUrl documentation leads me to believe that it internally assumes URLs are always encoded in UTF-8, which, as the above example demonstrates, is wrong -- but it would be a step in the right direction to give access to `QUrl::toEncoded()`. Now, it would be trivial to add a `page.encoded_url` property, but I'm wondering if it would be *better* to define a "URL object" which exposes as much of the QUrl API as makes sense, and make that the value of `page.url` and various other properties (basically wherever PhantomJS internally stores a QUrl). For backward compatibility it would stringify as it always has, but one could also access `page.url.encoded`, `page.url.hostname`, and so on.
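
To make the "URL object" idea a bit more concrete, here is a rough sketch of how such a value could look from the script side. Everything in it is an assumption for discussion: the `PageUrl` constructor, the `display` field, and the example values are invented, and in a real patch the parts would be filled in from the underlying QUrl rather than passed in by hand.

```javascript
// Hypothetical sketch: PageUrl and its field names are illustrative only
// and do not exist in PhantomJS.
function PageUrl(parts) {
  this.encoded = parts.encoded;   // stand-in for QUrl::toEncoded()
  this.hostname = parts.hostname; // stand-in for the QUrl host component
  this._display = parts.display;  // stand-in for today's page.url string
}

// Stringify the way page.url does today, so string concatenation and
// String(...) conversions in existing scripts keep working.
PageUrl.prototype.toString = function () {
  return this._display;
};

// Made-up example values, not actual QUrl output.
var url = new PageUrl({
  display: 'http://example.com/a b',
  encoded: 'http://example.com/a%20b',
  hostname: 'example.com'
});

var asString = String(url);       // uses toString()
var concatenated = 'at ' + url;   // also uses toString()
var forStorage = url.encoded;     // lossless form for a database
```

One caveat with this shape: `typeof url` would be `'object'` rather than `'string'`, so scripts that compare `page.url` with `===` against a string literal, or call string methods on it directly, would behave differently. That compatibility cost is probably part of what needs discussing.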
This is a blocker for me on a project I'm using PhantomJS for, so I am volunteering to do the programming, but CONTRIBUTING.md says to discuss new features here first. :-) What do you think?
zw