
how to download a file from webserver


M. Strobel

Jul 24, 2008, 4:29:59 AM
Hi,

I want to save a file from a web URL.

The URL looks like https://www.my-machine.com/download?ticket=1234567890

When I call this URL with a browser, I get the usual selection "open with ...", "save to disk", and the
filename is the one delivered by the webserver.

Because curl has a Tcl interface, I tried this with curl (on Windows this time) and the -O option, but it
does not work; -O only works for direct URLs to a file.

How do you save the download with the correct name in tcl, in a "high level" way?

Max

Glenn Jackman

Jul 24, 2008, 9:27:27 AM
At 2008-07-24 04:29AM, "M. Strobel" wrote:
> I want to save a file from a web URL.
> Because curl has a tcl interface, I tried this in curl (Windows this
> time) with the -O option, but it does not
> work, this only works for direct URLs to a file.

How, specifically, does it not work? Do you get an error? Do you
download a file that you didn't expect?

> How do you save the download with the correct name in tcl, in a "high
> level" way?

I would write something like this:

package require http
set fh [open my.file w]
fconfigure $fh -translation binary
http::geturl $url -channel $fh
close $fh

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

Cameron Laird

Jul 24, 2008, 10:07:17 AM
In article <6eqss7F...@mid.uni-berlin.de>,

From the information here, it *ought* to work. I don't
understand how your URL is not a "direct URL". Are you
saying that that particular URL does a redirect?

If I were in your shoes, my next step might be to experiment
with

wget $URL

just to establish a more pertinent baseline than "...
with a browser ..."

M. Strobel

Jul 24, 2008, 11:36:11 AM
Cameron Laird wrote:

> In article <6eqss7F...@mid.uni-berlin.de>,
> M. Strobel <sorry_no_...@nowhere.dee> wrote:
>> Hi,
>>
>> I want to save a file from a web URL.
>>
>> The URL looks like https://www.my-machine.com/download?ticket=1234567890
>>
>> When I call this URL with a browser, I get the usual selection "open
>> with ...", "save to disk", and the
>> filename is the one delivered by the webserver.
>>
>> Because curl has a tcl interface, I tried this in curl (Windows this
>> time) with the -O option, but it does not
>> work, this only works for direct URLs to a file.
>>
>> How do you save the download with the correct name in tcl, in a "high
>> level" way?
>>
>> Max
>
> From the information here, it *ought* to work. I don't
> understand how your URL is not a "direct URL". Are you
> saying that that particular URL does a redirect?
>

No, I mean that in contrast to a URL like http://my.machine.com/download/info.pdf

> If I were in your shoes, my next step might be to experiment
> with
>
> wget $URL
>
> just to establish a more pertinent baseline than "...
> with a browser ..."

Log: ---------------------------------------------------------------------------
strobel@lx007:~/_tmp> wget http://at.home.loc/auftraggeber/download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0
--17:33:40-- http://at.home.loc/auftraggeber/download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0
=> `download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0'
Resolving at.home.loc... 127.0.0.1
Connecting to at.home.loc|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 177 [text/plain]

100%[================================================================================================>] 177 --.--K/s

17:33:41 (15.35 MB/s) - `download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0' saved [177/177]

strobel@lx007:~/_tmp> ls -l
total 4
-rw-r--r-- 1 strobel users 177 Jul 24 17:33 download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0

-------------------------------------- end of log

The file is named debug.txt, and this is sent in a header like: Content-Disposition: attachment; filename=...

As I said, a browser like Firefox offers to save debug.txt.

Max

M. Strobel

Jul 24, 2008, 11:38:08 AM
Glenn Jackman wrote:

> At 2008-07-24 04:29AM, "M. Strobel" wrote:
>> I want to save a file from a web URL.
>> Because curl has a tcl interface, I tried this in curl (Windows this
>> time) with the -O option, but it does not
>> work, this only works for direct URLs to a file.
>
> How, specifically, does it not work? Do you get an error? Do you
> download a file that you didn't expect?
>
>> How do you save the download with the correct name in tcl, in a "high
>> level" way?
>
> I would write something like this:
>
> package require http
> set fh [open my.file w]

I don't know the name of my.file. The name is sent in a header like:

Content-Disposition: attachment; filename=

> fconfigure $fh -translation binary
> http::geturl $url -channel $fh
> close $fh
>

Max

Donal K. Fellows

Jul 24, 2008, 11:44:47 AM
M. Strobel wrote:
> The file is named debug.txt, and this is sent in a header like:
> Content-Disposition: attachment; filename=...
>
> As I said, a browser like Firefox offers to save debug.txt.

In that case you need to do a HEAD request to get the filename metadata
(set the -validate option to true in http::geturl, then poke around in
the token array directly), open a channel to that filename, and then do
a proper http::geturl with the -channel option set.

Only a few lines of code. :-) The only really interesting bit is how to
sanitize the filename to make sure that nothing "bad" leaks through.
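
A minimal sketch of that sanitizing step, assuming a conservative whitelist is acceptable for your files. The proc name sanitize_filename and the exact character set are illustrative assumptions, not something taken from the http package:

```tcl
# Hypothetical helper (name and whitelist are assumptions): reduce a
# server-supplied name to a basename that is safe to open in [pwd].
proc sanitize_filename {name} {
    # Drop surrounding whitespace and optional double quotes.
    set name [string trim [string trim $name] \"]
    # Treat backslashes as separators too, then drop any directory part.
    set name [string map {\\ /} $name]
    regsub {^.*/} $name {} name
    # Replace every character outside a conservative whitelist.
    regsub -all {[^A-Za-z0-9._-]} $name _ name
    # Refuse names that are empty or consist only of dots.
    if {$name eq "" || [regexp {^\.+$} $name]} {
        return -code error "unusable filename"
    }
    return $name
}
```

Combined with opening the file using {WRONLY CREAT EXCL}, this keeps a hostile header from writing outside the current directory or overwriting an existing file.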

Donal.

Glenn Jackman

Jul 24, 2008, 11:47:13 AM
At 2008-07-24 11:38AM, "M. Strobel" wrote:
> Glenn Jackman wrote:

> > package require http
> > set fh [open my.file w]
>
> I don't know the name of my.file. The name is sent in a header like:
>
> Content-Disposition: attachment; filename=
>
> > fconfigure $fh -translation binary
> > http::geturl $url -channel $fh
> > close $fh
> >

You might want to investigate the mime package in tcllib.

M. Strobel

Jul 24, 2008, 11:57:38 AM
Donal K. Fellows wrote:

> M. Strobel wrote:
>> The file is named debug.txt, and this is sent in a header like:
>> Content-Disposition: attachment; filename=...
>>
>> As I said, a browser like Firefox offers to save debug.txt.
>
> In that case you need to do a HEAD request to get the filename metadata
> (set the -validate option to true in http::geturl, poke around in the
> token array directly) open a channel to the filename, and then do a
> proper http::geturl with the -channel option set.
>

I still hope to find THE solution - high level, with one command.

> Only a few lines of code. :-) The only really interesting bit is how to
> sanitize the filename to make sure that nothing "bad" leaks through.
>
> Donal.


File names are a never-ending source of problems in this environment (Linux web server, all sorts of
uploaders, and the file names might be important). Next time I will store the files on the web server under a
numeric basename that serves as an index into a db table.

Max

Cameron Laird

Jul 24, 2008, 12:23:03 PM
In article <6ern3gF...@mid.uni-berlin.de>,


I'm confused now about the goal; is

exec wget $URL &

the sort of "one command" you're after?

M. Strobel

Jul 24, 2008, 2:59:43 PM
Cameron Laird wrote:

>
>
> I'm confused now about the goal; is
>
> exec wget $URL &
>
> the sort of "one command" you're after?

Yes it is.

It just does not work. And I don't have any solution now.

Max

Cameron Laird

Jul 24, 2008, 3:27:38 PM
In article <6es1osF...@mid.uni-berlin.de>,

How does
exec wget $URL &
not work?

M. Strobel

Jul 24, 2008, 4:48:58 PM
Cameron Laird wrote:

see my posting 17:36h 64 lines ... the filename is lost.

Max

Cameron Laird

Jul 24, 2008, 6:24:11 PM
In article <6es85mF...@mid.uni-berlin.de>,
M. Strobel <sorry_no_...@nowhere.dee> wrote:
.
.

.
>>> It just does not work. And I don't have any solution now.
>>>
>>> Max
>>
>> How does
>> exec wget $URL &
>> not work?
>
>see my posting 17:36h 64 lines ... the filename is lost.
>
>Max

Ah; I agree that Tcl does not have a single command
that effects the download AND computes the filename
of the download.
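
Outside Tcl proper, the external tools may get close to one command. A hedged sketch, assuming a curl build that has -J (--remote-header-name, used together with -O) and a wget that has --content-disposition; neither option comes up in this thread, so check your tool's manual first. The proc name is hypothetical:

```tcl
# Hypothetical helper (name is an assumption): build an exec-ready command
# that asks the external tool to take the file name from the
# Content-Disposition header instead of from the URL.
proc header_name_download_cmd {tool url} {
    switch -- $tool {
        curl    { return [list curl -s -O -J $url] }
        wget    { return [list wget -q --content-disposition $url] }
        default { return -code error "unsupported tool: $tool" }
    }
}

# Usage (needs network access and the tool on PATH, Tcl 8.5 for {*}):
#   exec {*}[header_name_download_cmd curl $url]
```

Note that this hands the choice of file name entirely to the tool, so any sanitizing is whatever that tool does.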

Donal K. Fellows

Jul 25, 2008, 4:57:52 AM
M. Strobel wrote:
> I still hope to find THE solution - high level, with one command.

Picky picky picky! This code is based exactly off the plan I wrote
previously, and was a Simple Matter Of Programming to create.

package require Tcl 8.5
package require http
proc filegrab url {
    set token [http::geturl $url -validate 1]
    upvar #0 $token head
    set cd [dict get $head(meta) Content-Disposition]
    http::cleanup $token
    if {![regexp {^attachment; filename=([^ /]+)} $cd -> filename]} {
        return -code error "no file available"
    }
    # Perhaps sanitize more here? Compromise and refuse to overwrite
    # any existing file instead (file already must be in [pwd]).
    set f [open $filename {WRONLY CREAT EXCL}]
    set token [http::geturl $url -channel $f]
    http::cleanup $token
    close $f
}

That works for me when I tried with grabbing a patch off a SourceForge
tracker. YMMV. (This is on the wiki at http://wiki.tcl.tk/21368)

Donal.

Alexandre Ferrieux

Jul 25, 2008, 9:28:35 AM
On Jul 25, 12:24 am, cla...@lairds.us (Cameron Laird) wrote:
>
> Ah; I agree that Tcl does not have a single command
> that effects the download AND computes the filename
> of the download.

Quite true, but the external executables wget/curl do in the nominal
case. From a cursory trial, it seems wget 'invents' filenames when it
doesn't find one in the response, like "index.html" and the like. It
is even smart enough to invent a name different from any existing one.
Can you provide an Internet-reachable URL which defeats this
mechanism?

-Alex


M. Strobel

Jul 28, 2008, 5:20:05 AM
Alexandre Ferrieux wrote:

Sorry for the late answer, I did not work this weekend :-)

I set up a test script; from the URLs below you will get

1. the bugzilla guide as pdf
2. a zip file with some programming examples in java

so the correct file name and the transfer integrity can be tested.

http://atweb3.alstertext-webmanager.de/download.php?ticket=123
http://atweb3.alstertext-webmanager.de/download.php?ticket=456

I can read in files sent as mail attachments (yes, in Tcl), so I hope this programming assignment will be
easier (so much for 'picky').

Max

Donal K. Fellows

Jul 28, 2008, 6:11:16 AM
M. Strobel wrote:
> http://atweb3.alstertext-webmanager.de/download.php?ticket=123
> http://atweb3.alstertext-webmanager.de/download.php?ticket=456

Check out http://wiki.tcl.tk/21368 - the code on that page seems to
work with those specific examples perfectly for me. (You probably
ought to add code so that it falls back sanely when there isn't a
filename in the headers, but I'll leave that as an exercise in use of
tcllib's uri package...)
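
One way that fallback could look, as a sketch only. It uses plain regsub instead of tcllib's uri package so that it runs without tcllib, and both the proc name and the index.html default are assumptions:

```tcl
# Hypothetical helper (name and default are assumptions): choose a local
# name from the URL when the response carries no filename header.
proc fallback_filename {url {default index.html}} {
    # Strip the fragment and query string, then the scheme://host prefix.
    regsub {[#?].*$} $url {} path
    regsub {^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*} $path {} path
    # Keep the last path component; fall back to the default if it is empty.
    set name [lindex [split $path /] end]
    if {$name eq ""} {
        return $default
    }
    return $name
}
```

For the test URLs in this thread it yields download.php, which is the same name wget derived from the URL minus the query string.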

Donal.

Fandom

Jul 28, 2008, 6:21:15 AM
Hi,

> Because curl has a tcl interface, I tried this in curl (Windows this time)
> with the -O option, but it does not
> work, this only works for direct URLs to a file.

So you are using 'curl' directly with 'exec', right?

In that case the option you need is '-o filename', not the capital O,
which takes the name of the file from the URL.

You may need to add '-L' (capital L) for curl to follow redirections
automagically.

Andres

M. Strobel

Jul 28, 2008, 7:20:35 AM
Fandom wrote:

The problem is I don't know the filename. The server sends it in a header line.

Max

M. Strobel

Jul 28, 2008, 7:22:53 AM
Donal K. Fellows wrote:

Thank you, I will test it ASAP - I know you told me already 27th of july 10:57h CEST...

My test url will help me too...

Max

Fandom

Jul 28, 2008, 9:10:30 AM

> The problem is I don't know the filename. The server sends it in a header line.

That can hardly be considered a problem:

#!/usr/local/bin/tclsh8.5

set filename [exec curl -s -I http://atweb3.alstertext-webmanager.de/download.php?ticket=123 | grep filename]

puts $filename

regexp {(?:filename=)(.*)} $filename nada filename

puts $filename

exec curl -s -o $filename http://atweb3.alstertext-webmanager.de/download.php?ticket=123

puts Done


Andres

Fandom

Jul 28, 2008, 9:16:02 AM
Oh, you are stuck on Windows (no grep); then you had better do:

set filename [exec curl -s -I http://atweb3.alstertext-webmanager.de/download.php?ticket=123]

puts $filename

regexp {(?:filename=)(.*?)(?:\n)} $filename nada filename
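
If the server quotes the value or appends further parameters, a slightly more tolerant match may help. A sketch, with a hypothetical proc name; it does not handle the encoded filename*= form:

```tcl
# Hypothetical helper (name is an assumption): pull the filename parameter
# out of raw header text, for example the output of "curl -s -I".
proc disposition_filename {headers} {
    # Quoted form first: filename="my file.pdf"
    if {[regexp -nocase {filename="([^"]*)"} $headers -> name]} {
        return $name
    }
    # Bare form: the value ends at ';', a quote, or any whitespace,
    # which includes the line ending after the header.
    if {[regexp -nocase {filename=([^;\s"]+)} $headers -> name]} {
        return $name
    }
    # No filename parameter found.
    return ""
}
```

An empty result means the caller has to choose a name some other way.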

Andres

M. Strobel

Jul 28, 2008, 10:45:15 AM
Fandom wrote:

Okay, there does not seem to be a "ready made" solution, but the headers are right there, and the file
contents are okay anyway.

Not so difficult, after all.

Max
