
Download file without webbrowser


A.D. Fundum

Aug 12, 2012, 2:01:46 PM
Is there a utility which can be used to download this file
automatically, i.e. without using a web browser?

https://europeanequities.nyx.com/en/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809

This URL is identical to the .SUBJECT EA of the downloaded file. So
far I haven't found a way to change the settings and to download the
file. The most important change is the file format (from MS Excel to
CSV or TXT). I know the old URL, which was downloadable with WGET, but
neither WGETSSL nor SM works with an appended
"&format=txt&formatDecimal=.&formatDate=dd/MM/yy".


--

Andreas Schnellbacher

Aug 12, 2012, 2:29:24 PM
A.D. Fundum wrote:

> Is there a utility which can be used to download this file
> automatically, i.e. without using a web browser?

Just use wget.

--
Andreas Schnellbacher

A.D. Fundum

Aug 12, 2012, 7:05:16 PM
>> Is there a utility which can be used to download this file
>> automatically, i.e. without using a web browser?

> Just use wget.

That'll be WGETSSL, and then the question still is: how?
Unfortunately the new URL with the old parameters (e.g. "format=txt")
downloads the form, but not the file.


--

Mr. G

Aug 13, 2012, 12:09:18 PM
On Sun, 12 Aug 2012 18:01:46 UTC, "A.D. Fundum" <what...@neverm.ind>
wrote:
Maybe Curl?

A.D. Fundum

Aug 14, 2012, 5:36:57 PM
>> CSV or TXT). I know the old URL, which was downloadable with WGET,
>> but WGETSSL nor SM works with an appended "&format=txt

> Maybe Curl?

Thanks, I'll take a look at it. So far the best solution is to open
the form with a WPS URL object. But that still involves a browser, and
my physical backup doesn't have a WPS.


--

A.D. Fundum

Aug 14, 2012, 8:39:13 PM
>> Maybe Curl?

Also downloads the form instead of the file:

curl -k --data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"


--

Dave Yeo

Aug 15, 2012, 12:37:07 AM
There are a lot of possible parameters to wget (and curl); it's just a
matter of figuring out the correct ones.
I know that simply using wget fails on quite a few files, yet if I use
the awget plugin to call wget, it succeeds.
Dave

A.D. Fundum

Aug 15, 2012, 12:18:01 PM
> I know that simply using wget fails on quite a few files yet if
> I use the awget plugin to call wget, it succeeds.

I suppose there is no WGET "autoconf" that scans the situation and
produces the required options?


--

Steven Levine

Aug 15, 2012, 3:47:27 PM
On Wed, 15 Aug 2012 04:37:07 UTC, Dave Yeo <dave....@gmail.com>
wrote:

Hi,

> There are a lot of possible parameters to wget (and curl), it's just a
> matter of figuring out the correct ones.

It will take a bit of tinkering to fetch this URL with wget or curl,
because it's not a file URL and does not resolve to a file URL. The
URL is a form, which means wget will have to be supplied with
--post-data. Wget can handle this, but someone needs to figure out
the form fields and inputs.

> I know that simply using wget fails on quite a few files yet if I use
> the awget plugin to call wget, it succeeds.

Wget often fails because it generates an illegal file name for the
platform. Awget handles this by determining a reasonable file name.
I have a wgetx.cmd wget wrapper which does this kind of thing.

Steven

--
---------------------------------------------------------------------
Steven Levine <ste...@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------

A.D. Fundum

Aug 16, 2012, 7:28:16 AM
> The URL is a form which means wget will have to be
> supplied with --post-data.

I tried that earlier (4 parameters and a button, with the form as the
source of information), but so far without a positive result.

> Wget often fails because it generates an illegal file
> name for the platform.

FTR: Wget 1.9.1's -O option is in use, i.e.:

wgetssl -O20120816.CSV --post-data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"

I also tried cookies, but the generated cookies.txt file was "empty".
The options above still download the form instead of the file. The URL
itself works with any eCS webbrowser (Netscape, Firefox/SM, Links).


--

Dave Yeo

Aug 16, 2012, 11:02:23 AM
IIRC, Lynx had a way to automate downloading files.
Dave

A.D. Fundum

Aug 16, 2012, 11:31:59 AM
> IIRC, Lynx had a way to automate downloading files.

I'll check that out too! One of the added options seems to provide
additional information. Please note I'm just pretending to be a
systems-abusing Windows-client, I'm using eCS. And the small
downloaded file is the form (about 7KB) itself, not the about 60KB of
data I'm after.

Near the end there's a ...

utime(20120816.CSV): The file or directory specified is read-only.

... which probably isn't important.

But this may be a problem:

europeanequities.nyx.com/nl/popup/data/download@ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7: The file or directory specified cannot be found.


--


[C:\]wgetssl -O20120816.CSV --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5b)" --page-requisites --server-response --restrict-file-names=windows --post-data "format=2&layout=1&decimal_separator=1&date_format=1&op=Go" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7"
--17:23:14-- https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7

=> `20120816.CSV'
Resolving europeanequities.nyx.com... 159.125.78.24
Connecting to europeanequities.nyx.com[159.125.78.24]:443... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Date: Thu, 16 Aug 2012 15:23:00 GMT
3 Server: Apache/2.2.3 (Red Hat)
4 X-Powered-By: PHP/5.2.17 ZendServer/5.0
5 Last-Modified: Thu, 16 Aug 2012 15:23:00 GMT
6 Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, max-age=1209600
7 ETag: "1345130580"
8 Expires: Thu, 30 Aug 2012 15:23:00 GMT
9 Dap: 16
10 Content-Length: 7042
11 Content-Type: text/html; charset=utf-8
12 Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/
13 Connection: close

100%[====================================>] 7,042 --.--K/s

utime(20120816.CSV): The file or directory specified is read-only.
17:23:16 (6.72 MB/s) - `20120816.CSV' saved [7,042/7,042]

europeanequities.nyx.com/nl/popup/data/download@ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_valuesA68e8d9d0ec59ac5717ef48dde90b02d7: The file or directory specified cannot be found.

FINISHED --17:23:17--
Downloaded: 7,042 bytes in 1 files

A.D. Fundum

Aug 16, 2012, 11:54:06 AM
> The file or directory specified cannot be found.

I retried it with a verified file (and "https://"): same error. When
using the same URL with a web browser, the form appears again:

https://europeanequities.nyx.com/en/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809

In production I use the same, unchanged URL every day; the hexadecimal
filter value doesn't really matter, so long as the URL is valid. In
the end the data is verified (full rewrite, a bigger PITA than this
Wget issue).


--

Steven Levine

Aug 16, 2012, 11:56:42 AM
On Thu, 16 Aug 2012 11:28:16 UTC, "A.D. Fundum" <what...@neverm.ind>
wrote:

> I tried that earlier (4 parameters and a button, with the form as the
> source of information), but so far without a positive result.

That just means you have not figured out the correct post data yet.
My counting says there are 7 post data items. There are two hidden
fields.
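
A minimal sketch of how the fields, including the hidden ones, could be listed from a saved copy of the form page. The SAMPLE_FORM markup and its form_build_id value are invented for illustration; only the seven field names come from this thread, and collect_fields is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Invented stand-in for the saved form page. Only the field names come from
# the thread; the markup and the form_build_id value are made up.
SAMPLE_FORM = """
<form method="post">
  <select name="format">
    <option value="1">first</option>
    <option value="2" selected>second</option>
  </select>
  <select name="layout"><option value="1" selected>first</option></select>
  <select name="decimal_separator"><option value="1">first</option></select>
  <select name="date_format"><option value="1">first</option></select>
  <input type="hidden" name="form_build_id" value="form-0123456789abcdef">
  <input type="hidden" name="form_id" value="nyx_download_form">
  <input type="submit" name="op" value="Go">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs a browser would post for a simple form."""

    def __init__(self):
        super().__init__()
        self.fields = {}          # field name -> value to post
        self._select_name = None  # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            # Unchecked radio buttons and checkboxes are not submitted.
            if kind in ("radio", "checkbox") and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""
        elif tag == "select" and attrs.get("name"):
            self._select_name = attrs["name"]
        elif tag == "option" and self._select_name:
            name = self._select_name
            # The first option is the default; a "selected" option overrides it.
            if name not in self.fields or "selected" in attrs:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select_name = None


def collect_fields(html):
    """Return a dict of every form field found in the given HTML text."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.fields


for name, value in collect_fields(SAMPLE_FORM).items():
    print(f"{name}={value}")
```

Hidden inputs are handled like any other input here, which is the point: they never show up on screen but still have to be part of the post data.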

> > Wget often fails because it generates an illegal file
> > name for the platform.
>
> FTR: Wget 1.9.1's -O option is in use, i.e.:

That was a comment to Dave. It was clear that your issue was
something else.

> cks&cmd=default&formKey=nyx_pd_filter_values%3A68e8d9d0ec59ac5717ef48dde90b02d7"

You need to double the %'s to keep the shell happy, but the problem is
elsewhere.

> I also tried cookies, but the generated cookies.txt file was "empty".

It's something else. Blocking cookies at the browser has no effect.

> The options above still download the form instead of the file. The URL
> itself works with any eCS webbrowser (Netscape, Firefox/SM, Links).

There are a couple of differences between these and wgetssl. The
first is they obviously know how to encode the post data correctly.
This means you need to percent-encode post data for wgetssl. It does
not do this for you.
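
As a rough illustration of what that encoding looks like, Python's urllib.parse.urlencode produces the application/x-www-form-urlencoded form that browsers send. The helper name encode_post_data and the "note" field are made up for the example; the other field names and values are the ones posted in this thread.

```python
from urllib.parse import urlencode


def encode_post_data(fields):
    """Percent-encode a dict of form fields into one post-data string."""
    return urlencode(fields)


# Values taken from the thread: nothing here needs escaping, so the encoded
# string is the same as the hand-written one.
print(encode_post_data({"format": "2", "layout": "1", "op": "Go"}))

# A made-up value with a space, "&" and "=" shows what encoding changes.
print(encode_post_data({"op": "Go", "note": "a b&c=d"}))
```

The resulting string is what would go inside the quotes after --post-data. With the values used in this thread the encoding is a no-op, which fits the observation that the problem lies elsewhere.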

The other is that they do not use self-signed certificates. I don't
think this matters, but one never knows.

I checked if the user agent mattered, and this does not appear to be
the case.

Paul Ratcliffe

Aug 16, 2012, 1:28:33 PM
On Thu, 16 Aug 2012 10:56:42 -0500, Steven Levine
<ste...@nomail.earthlink.net> wrote:

> You need to double the %'s to keep the shell happy, but the problem is
> elsewhere.
>
>> I also tried cookies, but the generated cookies.txt file was "empty".
>
> It's something else. Blocking cookies at the browser has no effect.

I find iptrace/ipformat very useful for this sort of thing. You can
see exactly what goes out on the wire and compare the two cases.
When they are the same, it will work. If they aren't, it might not.
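
When a wire trace is not practical, the same comparison can be made on the post data alone. A small sketch, assuming both request bodies are available as strings; the two strings are the --post-data values quoted in this thread, and diff_post_data is a hypothetical helper name.

```python
from urllib.parse import parse_qsl

# The two --post-data strings quoted in this thread.
FAILING = "format=2&layout=1&decimal_separator=1&date_format=1&op=Go"
WORKING = (
    "format=2&layout=2&decimal_separator=1&date_format=1&op=Go"
    "&form_build_id=form-db6385f424cebdc1a634c46fb963cd28"
    "&form_id=nyx_download_form"
)


def diff_post_data(failing, working):
    """Compare two urlencoded bodies field by field.

    Returns (missing, extra, changed): names only in `working`, names only
    in `failing`, and names present in both with different values.
    """
    a = dict(parse_qsl(failing, keep_blank_values=True))
    b = dict(parse_qsl(working, keep_blank_values=True))
    missing = sorted(set(b) - set(a))
    extra = sorted(set(a) - set(b))
    changed = {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}
    return missing, extra, changed


missing, extra, changed = diff_post_data(FAILING, WORKING)
print("missing:", missing)
print("extra:", extra)
print("changed:", changed)
```

The missing names are the two hidden fields; the differing layout value is simply a different choice in the form.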

A.D. Fundum

Aug 16, 2012, 9:27:18 PM
>> I tried that earlier (4 parameters and a button, with the form
>> as the source of information), but so far without a positive
>> result.

> That just means you have not figured out the correct post
> data yet. My counting says there are 7 post data items.
> There are two hidden fields.

Thanks, that's the solution! I overlooked this, for one thing because
my real problems start when I have to process the downloaded data
(website_redesign==burned_beyond_recognition)...

>>> Wget often fails because it generates an illegal
>>> file name for the platform.

>> FTR: Wget 1.9.1's -O option is in use, i.e.:

> That was a comment to Dave.

I was aware of that, hence the FTR. ;-)

> You need to double the %'s to keep the shell happy, but the
> problem is elsewhere.

I used the original .SUBJECT EA as-is, including a single %3A, and
double quotes. Like:

wgetssl -OSolv.ed! --post-data "format=2&layout=2&decimal_separator=1&date_format=1&op=Go&form_build_id=form-db6385f424cebdc1a634c46fb963cd28&form_id=nyx_download_form" "https://europeanequities.nyx.com/nl/popup/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809"
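
For anyone without wgetssl, roughly the same request can be put together with Python's standard library. This is only a sketch under assumptions: the URL and the seven fields are copied from the wgetssl line above, build_request and download are hypothetical helper names, and whether the form_build_id value stays valid over time is not something this thread establishes.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL and field values copied from the wgetssl command in this post.
FORM_URL = (
    "https://europeanequities.nyx.com/nl/popup/data/download"
    "?ml=nyx_pd_stocks&cmd=default"
    "&formKey=nyx_pd_filter_values%3Aaddfde0604ff3b35052feeb4143d2809"
)
FIELDS = {
    "format": "2",
    "layout": "2",
    "decimal_separator": "1",
    "date_format": "1",
    "op": "Go",
    "form_build_id": "form-db6385f424cebdc1a634c46fb963cd28",  # hidden field
    "form_id": "nyx_download_form",                            # hidden field
}


def build_request(url, fields):
    """Build a POST request; supplying a body makes urllib use POST."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)


def download(url, fields, target):
    """Post the form and save the response to `target` (needs network access)."""
    with urlopen(build_request(url, fields)) as response:
        with open(target, "wb") as out:
            out.write(response.read())


# Building the request is offline; download() is deliberately not called here.
request = build_request(FORM_URL, FIELDS)
print(request.get_method(), len(request.data), "bytes of post data")
```

A call such as download(FORM_URL, FIELDS, "20120816.CSV") would then be the counterpart of the -O option.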


--

Steven Levine

Aug 20, 2012, 12:19:09 PM
On Thu, 16 Aug 2012 17:28:33 UTC, Paul Ratcliffe
<ab...@orac12.clara34.co56.uk78> wrote:

Hi Paul,

> I find iptrace/ipformat very useful for this sort of thing. You can
> see exactly what goes out on the wire and compare the two cases.
> When they are the same, it will work. If they aren't, it might not.

I would have done that if this were not an SSL connection. The
encryption happens too early and the site is https only, so using
http: with iptrace was not an available option.