File downloading


Vitaliy Slobodin

May 2, 2013, 8:49:54 AM
to phan...@googlegroups.com
Hi guys,

I finished a new feature for file downloading.

What's inside:
Two new callbacks for the WebPage:
1) onFileDownloadError - invoked when an error occurs while downloading a file
2) onFileDownload - invoked when a web server's response includes a "Content-Disposition" header with the "attachment" directive.
How it looks in action:
webpage.onFileDownload = function(url) {
    return "filename";
};

The `url` argument is the full URL of the file being downloaded.
The callback must return a string indicating where PhantomJS should save the file. To abort the download, return an empty string.
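The return-value contract above (a path to save to, or an empty string to abort) is easy to keep in a small, testable function. The sketch below is only an illustration, assuming the callback receives just the URL as described in this message; `chooseSavePath` and the `downloads` directory are hypothetical names. It takes the last path segment of the URL as the filename, which will not always be meaningful (for example when the URL points at a form handler rather than the file itself).

```javascript
// Hypothetical helper for the onFileDownload(url) callback described above.
// Returns the path to save to, or "" to abort the download.
function chooseSavePath(url, downloadDir) {
    // Drop any fragment and query string before looking at the path.
    var path = url.split("#")[0].split("?")[0];
    // Use the last path segment as the filename.
    var segments = path.split("/");
    var name = segments[segments.length - 1];
    if (!name) {
        return ""; // URL ends in "/": no filename to use, so abort
    }
    // Join with exactly one "/" between directory and filename;
    // an empty downloadDir means "relative to the current directory".
    var dir = downloadDir.replace(/\/+$/, "");
    return downloadDir ? dir + "/" + name : name;
}

// Possible wiring, using the callback from the branch:
// webpage.onFileDownload = function (url) {
//     return chooseSavePath(url, "downloads");
// };
```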
---
webpage.onFileDownloadError = function(errorMessage) {
    console.log(errorMessage);
};
This callback is invoked when PhantomJS is unable to download the file.
It is not invoked for network-related errors; in that case, the standard onResourceError callback is called instead.

Branch:
https://github.com/Vitallium/phantomjs/tree/download-support

Comments, thoughts, ideas?

Thanks,
Vitaliy.



James Greene

May 2, 2013, 10:22:00 AM
to phan...@googlegroups.com
Comments/Questions:
  1. Awesome, you rock!

  2. Regarding the filename to be returned:
    • Does it save to a particular folder (e.g. a "downloads" folder to be configured at the command line)?
    • Does it save relative to the current working directory?
    • Can I return an absolute path to make it save somewhere specific regardless of the answers to the above two questions?

  3. I'm assuming that once we enable reading data from HTTP responses (i.e. in `onResourceReceived`), people would also want to inspect/read download responses the same way rather than always having to save them.  Will that be feasible with the current implementation, or will we always have to save to a file first?

Sincerely,
    James Greene





--
You received this message because you are subscribed to the Google Groups "phantomjs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phantomjs+...@googlegroups.com.
Visit this group at http://groups.google.com/group/phantomjs?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Vitaliy Slobodin

May 2, 2013, 12:49:22 PM
to phan...@googlegroups.com
Hi,

1) ;) I'm planning to merge a bunch of new features/improvements in 2-3 days.
2) You can use whatever path you want: relative or absolute.
3) I thought about that. Yes, we can implement such a feature: we could set up a new callback that lets the user examine the current contents of the download buffer, something like an 'onFileDownloadProgress' callback.
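No such callback exists in the branch; 'onFileDownloadProgress' is only an idea at this point. As a minimal sketch of the bookkeeping it might feed, assuming it would report the size of each received chunk and that the expected total comes from a Content-Length header (which servers may omit), a hypothetical tracker could look like this. Without a known total it can count bytes but cannot report a percentage or completion.

```javascript
// Hypothetical progress bookkeeping for a future onFileDownloadProgress-style callback.
// totalBytes comes from Content-Length and may be unknown (null).
function createProgressTracker(totalBytes) {
    var received = 0;

    // Current state; percent and done are only meaningful when the total is known.
    function state() {
        var percent = null;
        if (typeof totalBytes === "number" && totalBytes > 0) {
            percent = Math.min(100, Math.floor(received * 100 / totalBytes));
        }
        return { received: received, percent: percent, done: percent === 100 };
    }

    return {
        // Record a chunk of the given size and return the updated state.
        add: function (chunkBytes) {
            received += chunkBytes;
            return state();
        },
        state: state
    };
}
```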

Regards,
Vitaliy.


Vasiliy P

May 2, 2013, 12:50:05 PM
to phan...@googlegroups.com
Thanks for the new feature, it sounds good. Is there any way to read more information about the file inside the callback? For instance, the file's size from the 'Content-Length' header, the MIME type, and so on.
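Until the download callback receives more than the URL, much of this is already exposed elsewhere: the response object passed to PhantomJS's `onResourceReceived` has a `headers` array of `{ name, value }` objects. A minimal sketch, assuming the download's response is reported there as well; `getHeader` and `describeDownload` are hypothetical helper names. Header names are compared case-insensitively, since HTTP header names are case-insensitive.

```javascript
// Case-insensitive lookup in a PhantomJS-style headers array: [{ name, value }, ...].
function getHeader(headers, wanted) {
    var lower = wanted.toLowerCase();
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === lower) {
            return headers[i].value;
        }
    }
    return null;
}

// Hypothetical summary of the details asked about in this message.
function describeDownload(headers) {
    var length = getHeader(headers, "Content-Length");
    var type = getHeader(headers, "Content-Type");
    return {
        // Content-Length is optional; keep null when absent or not a plain number.
        size: length !== null && /^\d+$/.test(length) ? parseInt(length, 10) : null,
        // Strip parameters such as "; charset=..." and normalize case.
        mimeType: type !== null ? type.split(";")[0].trim().toLowerCase() : null,
        disposition: getHeader(headers, "Content-Disposition")
    };
}

// Possible wiring:
// page.onResourceReceived = function (response) {
//     console.log(JSON.stringify(describeDownload(response.headers)));
// };
```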



Vitaliy Slobodin

May 2, 2013, 12:54:12 PM
to phan...@googlegroups.com
Hi Vasiliy,

Could you describe what kind of information would be useful for you?

Regards,
Vitaliy.


James Greene

May 2, 2013, 12:54:36 PM
to phan...@googlegroups.com
What happens if a download is triggered but this callback is not set up?  I'm assuming PhantomJS will ignore it rather than automatically downloading it to a Downloads/CWD folder using the filename that the Content-Disposition header suggests (auto-saving files like that could get pretty sketchy security-wise).
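If a script does choose to use a server-suggested filename, such as the one from Content-Disposition, reducing it to a bare filename before joining it to a directory keeps a value like `../../.bashrc` from escaping that directory. A minimal sketch; `safeFileName` and its fallback argument are hypothetical, and the sanitization is not exhaustive (it does not, for example, reject reserved Windows device names).

```javascript
// Reduce an untrusted, server-suggested filename to a single safe path component.
function safeFileName(suggested, fallback) {
    // Keep only the part after the last "/" or "\" so directories are discarded.
    var base = String(suggested || "").split(/[\/\\]/).pop();
    // Drop control characters and leading dots (hidden files, "." and "..").
    base = base.replace(/[\x00-\x1f]/g, "").replace(/^\.+/, "").trim();
    return base.length > 0 ? base : fallback;
}
```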

Sincerely,
    James Greene

Vitaliy Slobodin

May 2, 2013, 12:57:56 PM
to phan...@googlegroups.com
Yes, PhantomJS will ignore all download requests. Also, the `resourceError` callback will be invoked with code 5 (operation canceled).
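A script that deliberately leaves onFileDownload unset may want to tell these expected cancellations apart from genuine failures. A minimal sketch, assuming the object passed to PhantomJS's `onResourceError` carries `errorCode`, `errorString` and `url`; code 5 is Qt's `QNetworkReply::OperationCanceledError`. Other cancellations also report code 5, so this is a heuristic rather than a reliable download detector; `classifyResourceError` is a hypothetical name.

```javascript
// Qt's QNetworkReply::OperationCanceledError, reported for ignored downloads.
var OPERATION_CANCELED = 5;

// Split resource errors into cancellations (e.g. ignored downloads) and everything else.
function classifyResourceError(resourceError) {
    if (resourceError.errorCode === OPERATION_CANCELED) {
        return "canceled";
    }
    return "failed";
}

// Possible wiring:
// page.onResourceError = function (resourceError) {
//     if (classifyResourceError(resourceError) === "failed") {
//         console.log("Failed: " + resourceError.url + " (" + resourceError.errorString + ")");
//     }
// };
```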


Vasiliy P

May 2, 2013, 12:59:58 PM
to phan...@googlegroups.com
I think it would be really useful to have the file size, file name, and MIME type, taken straight from the HTTP headers. That way, PhantomJS script authors could build their own logic on top of this information, for example ignoring files larger than 1 TB.



Ariya Hidayat

May 23, 2013, 1:26:45 AM
to phan...@googlegroups.com
> I think it would be really useful to have file size, file name, mime type -
> just from HTTP headers. So, the phantom developer would be able to introduce
> some logic on top of this information. For example, ignore files larger than
> 1Tb

I like that.

Thus, the example code might look like:

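webpage.onFileDownload = function(downloadInfo) {
    // Save only files smaller than 4 MB
    if (downloadInfo.size < 1024 * 1024 * 4) {
        return "/some/path/" + downloadInfo.fileName;
    }
    // Returning an empty string aborts the download
    return "";
};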

Having an object passed to this callback will also provide more
extensibility in the future.

That's my only minor comment. Otherwise, it looks good, Vitaliy!


Thank you!

Regards,


--
Ariya Hidayat, http://ariya.ofilabs.com
http://twitter.com/ariyahidayat
http://gplus.to/ariyahidayat

Patrick Lalonde

Jun 5, 2013, 2:59:52 PM
to phan...@googlegroups.com
Thanks for adding this feature, I think this is what I have been looking for!

I am trying to download a PDF file that is accessed by clicking a form button; the filename and download URL are unknown.

With your version of PhantomJS, after I click the download button, the onFileDownload callback is triggered, which is the desired effect, but the download fails. onFileDownloadError reports no error.

The URL passed to the onFileDownload callback is the URL of the form I just clicked, whereas I was expecting the URL of the file being pushed to me.

Any suggestions on how to download such a file, or how to debug this download problem, would be much appreciated.

Thanks,

Patrick

Vitaliy Slobodin

Jun 6, 2013, 11:10:11 AM
to phan...@googlegroups.com
Hi,
This feature is still under development.
If you could create a working example for your situation, it would be great!

Regards,
Vitaliy.


Patrick Lalonde

Jun 6, 2013, 7:19:03 PM
to phan...@googlegroups.com
After experimenting with this version of PhantomJS I came to realize that, contrary to what I thought, the download was being received properly.

The problem in my case was that the path and filename assigned to the file get 'mangled', which causes the file to be stored in unexpected places.

After clicking a download url in the format of "/docviewer.aspx?keyid=dmWHxTXXXXXLI1lRULBT+5l7y+XXXXXXXXXXXXXXXXXXn+4UEjwZ4muZ9aXXXXXXXXX2/td99G/Lf+y2XRHK4wQ==", I end up with a file named "Lf+y2XRHK4wQ=="

If I return an absolute path from onFileDownload (/some/path/to/download), the file "Lf+y2XRHK4wQ==" will be stored as /some/path/toLf+y2XRHK4wQ==.  Relative paths produce different results.

If I open the file, it's perfect.

The filename seems to be derived from the URL instead of the content-disposition header.
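Taking the name from the Content-Disposition header, as Patrick expected, means parsing its `filename` parameter and, per RFC 6266, preferring the RFC 5987 `filename*` form when present. A minimal sketch, not the parser used in either branch (`filenameFromDisposition` is a hypothetical name). It handles quoted and unquoted `filename` values and UTF-8 `filename*` values, but not every corner of the header grammar; for instance, `filename*` values in charsets other than UTF-8 fall back to the plain `filename` parameter.

```javascript
// Extract a filename from a Content-Disposition header value, or return null.
function filenameFromDisposition(value) {
    if (!value) {
        return null;
    }
    // Prefer the RFC 5987 extended form, e.g. filename*=UTF-8''na%C3%AFve.txt
    var ext = /filename\*\s*=\s*([^']*)'[^']*'([^;]+)/i.exec(value);
    if (ext && ext[1].toUpperCase() === "UTF-8") {
        try {
            return decodeURIComponent(ext[2].trim());
        } catch (e) {
            // Malformed percent-encoding: fall back to the plain parameter below.
        }
    }
    // Plain form: filename="quoted name.pdf" or filename=unquoted.pdf
    var plain = /filename\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^;\s]+))/i.exec(value);
    if (!plain) {
        return null;
    }
    // Quoted values may contain backslash escapes such as \" which are unescaped here.
    return plain[1] !== undefined ? plain[1].replace(/\\(.)/g, "$1") : plain[2];
}
```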

I just wanted to clarify my previous feedback and wanted to thank you again for adding this feature, I'm looking forward to testing new versions.

Regards,

Patrick

Shola O

Jul 17, 2013, 5:27:27 PM
to phan...@googlegroups.com
I know this is an old thread, but I'm interested in how you added download support without WebKit crashing.

Shola

Matt McClure

Sep 22, 2013, 10:04:46 AM
to phan...@googlegroups.com
On Thursday, June 6, 2013 7:19:03 PM UTC-4, Patrick Lalonde wrote:
> The problem in my case was that the path and filename assigned to the file get 'mangled' which causes the file to be stored in unexpected places.
>
> After clicking a download url in the format of "/docviewer.aspx?keyid=dmWHxTXXXXXLI1lRULBT+5l7y+XXXXXXXXXXXXXXXXXXn+4UEjwZ4muZ9aXXXXXXXXX2/td99G/Lf+y2XRHK4wQ==", I end up with a file named "Lf+y2XRHK4wQ=="
>
> If I return an absolute path from onFileDownload (/some/path/to/download), the file "Lf+y2XRHK4wQ==" will be stored as /some/path/toLf+y2XRHK4wQ==.  Relative paths produce different results.
>
> The filename seems to be derived from the URL instead of the content-disposition header.

I implemented using the Content-Disposition header in a branch where I rebased Vitaliy's commits.[1]

Matt McClure

Sep 22, 2013, 10:19:40 AM
to phan...@googlegroups.com
On Sunday, September 22, 2013 10:04:46 AM UTC-4, Matt McClure wrote:
> I implemented using the Content-Disposition header in a branch where I rebased Vitaliy's commits.[1]


I'm using it via the CasperJS code below [2]. I'm finding that the URL passed to the onFileDownload callback is encoded differently from the one in the variant object passed to the resource.received callback. The former has a query parameter value with literal space characters (...&stmtdate=08/26/2013 12:00:00 am&...), whereas the corresponding parameter in the latter is URL-encoded (&stmtdate=08/26/2013%2012:00:00%20am&).

I'm guessing the right fix that would make the encodings consistent is in PhantomJS. I could use some remedial help understanding why the declarations of the two callbacks[3] differ and how the latter is declared and defined. It seems that the latter is generated at least in part by the MOC.

Where is the download code path decoding the URL string? Or where is the resource received code path encoding it?

[2]

    casper.on('resource.received', function(res) {
        casper.log('resource.received: ' + JSON.stringify(res, undefined, 4));
        delete casper.waitForResources[res.url];
        casper.log('onResourceReceived: waiting for ' + JSON.stringify(casper.waitForResources) +
                   ' and ' + casper.waitForResourcesUnknownUrl +
                   ' other resources');
    });
    
    casper.page.onFileDownload = function (url, responseData) {
        casper.waitForResources[url] = 1;
        casper.waitForResourcesUnknownUrl--;
        casper.log('onFileDownload: waiting for ' + JSON.stringify(casper.waitForResources) +
                   ' and ' + casper.waitForResourcesUnknownUrl +
                   ' other resources');
        // ISO timestamp with ':' replaced by '-' for use as a filename prefix
        var ds = new Date().toISOString();
        var prefix = ds.replace(/:/g, '-');
        return prefix + '-' + responseData.filename;
    };
    
    casper.start(..., function() {
        casper.waitForResources = {};
        casper.waitForResourcesUnknownUrl = 0;
        ...
    });
    
    casper.then(function() {
        casper.waitForResourcesUnknownUrl++;
        this.click(...download activation selector...);
    });
    
    casper.waitFor(function testFx() {
            casper.log('waiting for ' + JSON.stringify(casper.waitForResources) +
                       ' and ' + casper.waitForResourcesUnknownUrl +
                       ' other resources');
            return (Object.keys(casper.waitForResources).length === 0 &&
                    casper.waitForResourcesUnknownUrl === 0);
        },
        function then() {},
        function onTimeout() {},
        60000);
 
[3]

    connect(m_customWebPage, SIGNAL(unsupportedContent(QNetworkReply*)), this, SLOT(downloadRequested(QNetworkReply*)));

    connect(m_networkAccessManager, SIGNAL(resourceReceived(QVariant)),
            SIGNAL(resourceReceived(QVariant)));
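Until the two code paths agree, a script can sidestep the mismatch by comparing URLs through a decoded key, for example when adding to and deleting from the `waitForResources` map in the CasperJS snippet above. A minimal sketch of that workaround (`urlKey` is a hypothetical name); it does not fix the inconsistency in PhantomJS, it treats `+` literally rather than as a space, and it collapses distinctions such as `&` versus `%26`, so the key is only suitable for matching, not for making requests.

```javascript
// Build a comparison key for a URL so that, e.g., "12:00:00 am" and
// "12:00:00%20am" map to the same key. Use it only for matching.
function urlKey(url) {
    try {
        return decodeURIComponent(url);
    } catch (e) {
        // Malformed or unescaped "%": fall back to the raw string.
        return url;
    }
}
```

With it, the two lookups in the snippet would use `urlKey(url)` and `urlKey(res.url)` as their keys.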

Linto Cheeran

Jan 19, 2014, 11:46:08 AM
to phan...@googlegroups.com
PhantomJS 1.10.0 (development) works perfectly on my Ubuntu x64 laptop,
but it doesn't work on an Amazon t1.micro free-tier instance; it shows this error:

Program received signal SIGSEGV, Segmentation fault.
0x0995acbe in QByteArray::resize(int) ()
(gdb) 


How can I solve it?

Aurelien Leleu

Apr 29, 2014, 4:30:38 PM
to phan...@googlegroups.com
Hello,

At work, I use CasperJS to automate web browsing.

I simulate a click on a "save as" button (it is a JavaScript app, so I click by coordinates...).

Then an Excel file (.xls) is generated; in Firefox, I'm asked whether I want to download the file.

With CasperJS, I get an error (status failed 200).

Can your solution solve my problem?

If so, I don't know how to install it. Could you help me?

Thank you very much,

pelzi

Sep 23, 2014, 4:00:50 PM
to phan...@googlegroups.com
Hi Vitaliy,

that's exactly what I was missing in PhantomJS. Have you merged this into the main PhantomJS branch, or do you plan to?

Yours,
Andreas.

Linto Cheeran

Oct 18, 2014, 12:51:11 PM
to phan...@googlegroups.com
How can I tell whether a file download is complete or still in progress?

meidika wardana

Jan 1, 2015, 1:09:10 PM
to phan...@googlegroups.com
Hi, can someone please send me a brief tutorial on how to implement file downloading? I'm really confused. Please help.

Ankit Jain

Jan 4, 2015, 8:40:00 AM
to phan...@googlegroups.com
In which version of PhantomJS can I use this feature, and where can I download it?

Salomón Córdova

Feb 17, 2016, 8:17:29 AM
to phantomjs
Ariya, can you merge this feature into your latest stable PhantomJS version?

Regards!
Salomón.

Salomón Córdova

Feb 17, 2016, 8:25:24 AM
to phantomjs
Is there a prebuilt release of this branch for Ubuntu?