I want to save a file from a web URL.
The URL looks like https://www.my-machine.com/download?ticket=1234567890
When I call this URL with a browser, I get the usual selection "open with ...", "save to disk", and the
filename is the one delivered by the webserver.
Because curl has a Tcl interface, I tried this with curl (on Windows this time) using the -O option, but it does not
work; -O only works for direct URLs to a file.
How do you save the download with the correct name in Tcl, in a "high level" way?
Max
How, specifically, does it not work? Do you get an error? Do you
download a file that you didn't expect?
> How do you save the download with the correct name in tcl, in a "high
> level" way?
I would write something like this:
package require http
set fh [open my.file w]
fconfigure $fh -translation binary
http::geturl $url -channel $fh
close $fh
--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous
From the information here, it *ought* to work. I don't
understand how your URL is not a "direct URL". Are you
saying that that particular URL does a redirect?
If I were in your shoes, my next step might be to experiment with
wget $URL
just to establish a more pertinent baseline than "...
with a browser ..."
No, I mean that in contrast to a URL like http://my.machine.com/download/info.pdf
> If I were in your shoes, my next step might be to experiment with
>
> wget $URL
>
> just to establish a more pertinent baseline than "...
> with a browser ..."
Log: ---------------------------------------------------------------------------
strobel@lx007:~/_tmp> wget http://at.home.loc/auftraggeber/download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0
--17:33:40-- http://at.home.loc/auftraggeber/download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0
=> `download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0'
Resolving host name »at.home.loc«.... 127.0.0.1
Connecting to at.home.loc|127.0.0.1|:80... connected.
HTTP request sent, waiting for response... 200 OK
Length: 177 [text/plain]
100%[================================================================================================>] 177 --.--K/s
17:33:41 (15.35 MB/s) - »download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0« saved [177/177]
strobel@lx007:~/_tmp> ls -l
total 4
-rw-r--r-- 1 strobel users 177 24. Jul 17:33 download.php?ticket=eefeb3ec92bdeae224110cd0d0e3a7392bc44ee0
-------------------------------------- end of log
The file is named debug.txt, and this is sent in a header like: Content-Disposition: attachment; filename=...
As I said, a browser like Firefox offers to save debug.txt.
Max
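A minimal sketch of pulling the name out of such a header value, in case it helps. The proc name dispositionFilename is made up for illustration, and it only handles the plain filename= parameter (quoted or bare), not every form a server might send.

```tcl
# Hypothetical helper: extract the filename parameter from a
# Content-Disposition header value. Returns "" when none is found.
proc dispositionFilename {value} {
    # Quoted form, e.g. filename="my report.pdf"
    if {[regexp -nocase {filename\s*=\s*"([^"]*)"} $value -> name]} {
        return $name
    }
    # Bare token form, e.g. filename=debug.txt
    if {[regexp -nocase {filename\s*=\s*([^;\s]+)} $value -> name]} {
        return $name
    }
    return ""
}

puts [dispositionFilename {attachment; filename=debug.txt}]
puts [dispositionFilename {attachment; filename="my report.pdf"}]
```

The two calls print debug.txt and my report.pdf. The result still comes straight from the server, so it should be checked before it is used as a local file name.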
I don't know the name of my.file. The name is sent in a header like:
Content-Disposition: attachment; filename=
> fconfigure $fh -translation binary
> http::geturl $url -channel $fh
> close $fh
>
Max
In that case you need to do a HEAD request to get the filename metadata
(set the -validate option to true in http::geturl, then poke around in the
token array directly), open a channel to that filename, and then do a
proper http::geturl with the -channel option set.
Only a few lines of code. :-) The only really interesting bit is how to
sanitize the filename to make sure that nothing "bad" leaks through.
Donal.
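One possible way to approach the sanitizing step, as a sketch only. The proc name safeName and the whitelist of allowed characters are assumptions chosen for illustration; a real application may want to allow more characters or handle name collisions differently.

```tcl
# Hypothetical helper: reduce a server-supplied name to something that can
# only refer to a plain file in the current directory.
proc safeName {name} {
    # Drop everything up to the last "/" or "\" so no directory part survives.
    regsub {^.*[/\\]} $name {} name
    # Replace anything outside a conservative whitelist with "_".
    regsub -all {[^A-Za-z0-9._-]} $name _ name
    # Refuse empty names and names starting with a dot (".", "..", dotfiles).
    if {$name eq "" || [string index $name 0] eq "."} {
        return -code error "unusable filename"
    }
    return $name
}

puts [safeName ../../etc/passwd]
puts [safeName {my report.pdf}]
```

This prints passwd and my_report.pdf. Opening the result with the EXCL flag on top of this still seems worthwhile, since a harmless-looking name can collide with an existing file.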
You might want to investigate the mime package in tcllib.
I still hope to find THE solution - high level, with one command.
> Only a few lines of code. :-) The only really interesting bit is how to
> sanitize the filename to make sure that nothing "bad" leaks through.
>
> Donal.
File names are a never-ending source of problems in this environment (Linux web server, all sorts of uploaders,
and the file names might be important). Next time I will store the files on the web server with a numeric basename
as an index into a db table.
Max
I'm confused now about the goal; is
exec wget $URL &
the sort of "one command" you're after?
Yes it is.
It just does not work. And I don't have any solution now.
Max
see my posting 17:36h 64 lines ... the filename is lost.
Max
Ah; I agree that Tcl does not have a single command
that effects the download AND computes the filename
of the download.
Picky picky picky! This code is based exactly off the plan I wrote
previously, and was a Simple Matter Of Programming to create.
package require Tcl 8.5
package require http
proc filegrab url {
    set token [http::geturl $url -validate 1]
    upvar #0 $token head
    set cd [dict get $head(meta) Content-Disposition]
    http::cleanup $token
    if {![regexp {^attachment; filename=([^ /]+)} $cd -> filename]} {
        return -code error "no file available"
    }
    # Perhaps sanitize more here? Compromise and refuse to overwrite
    # any existing file instead (file already must be in [pwd]).
    set f [open $filename {WRONLY CREAT EXCL}]
    set token [http::geturl $url -channel $f]
    http::cleanup $token
    close $f
}
That worked for me when I tried grabbing a patch off a SourceForge
tracker. YMMV. (This is on the wiki at http://wiki.tcl.tk/21368)
Donal.
Quite true, but the external executables wget/curl do in the nominal
case. From a cursory trial, it seems wget 'invents' filenames when it
doesn't find one in the response, like "index.html" and the like. It
is even smart enough to invent a name different from any existing one.
Can you provide an Internet-reachable URL which defeats this
mechanism ?
-Alex
Sorry for the late answer, I did not work this weekend :-)
I set up a test script, from the urls below you will get
1. the bugzilla guide as pdf
2. a zip file with some programming examples in java
so the correct file name and the transfer integrity can be tested.
http://atweb3.alstertext-webmanager.de/download.php?ticket=123
http://atweb3.alstertext-webmanager.de/download.php?ticket=456
I can read in files sent as mail attachments (yes, in Tcl), so I hope this programming assignment will be easier
(so much for 'picky').
Max
Check out http://wiki.tcl.tk/21368 - the code on that page seems to
work with those specific examples perfectly for me. (You probably
ought to add code so that it falls back sanely when there isn't a
filename in the headers, but I'll leave that as an exercise in use of
tcllib's uri package...)
Donal.
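A rough sketch of such a fallback, for the case where no filename arrives in the headers. It uses only core regsub rather than tcllib's uri package, and the proc name nameFromUrl and the default name download.bin are invented for this example.

```tcl
# Hypothetical helper: derive a fallback file name from the URL's path.
proc nameFromUrl {url {default download.bin}} {
    # Cut off the query string and fragment.
    regsub {[?#].*$} $url {} path
    # Cut off the scheme and host, leaving only the path.
    regsub {^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*} $path {} path
    # Keep only the last path component.
    regsub {^.*/} $path {} name
    if {$name eq ""} {
        return $default
    }
    return $name
}

puts [nameFromUrl http://my.machine.com/download/info.pdf]
puts [nameFromUrl http://atweb3.alstertext-webmanager.de/download.php?ticket=123]
puts [nameFromUrl http://atweb3.alstertext-webmanager.de/]
```

The three calls print info.pdf, download.php and download.bin. For ticket-style URLs the result is not very informative, so it is probably best used only when the header lookup fails.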
So you are using 'curl' directly with 'exec', right?
In that case the option you need is '-o filename', not the capital O,
which derives the name of the file from the URL.
You may need to add '-L' for curl to follow redirections
automagically.
Andres
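For completeness, a sketch of letting the external tools pick the header name themselves. It assumes a curl recent enough to support -J (--remote-header-name) and a wget that supports --content-disposition; older builds may lack both. The proc names curlCommand and wgetCommand are made up here.

```tcl
# Build the argument lists instead of hard-coding one exec call, so the
# commands can be inspected before they are run.
proc curlCommand {url} {
    # -s silent, -L follow redirects, -O save under the remote name,
    # -J take that name from Content-Disposition when the server sends one.
    return [list curl -s -L -O -J $url]
}
proc wgetCommand {url} {
    # -q keeps wget quiet so exec does not treat its stderr output as an error.
    return [list wget -q --content-disposition $url]
}

# Usage (needs Tcl 8.5 for {*} and a reachable server):
#   exec {*}[curlCommand $url]
```

With either command the tool chooses the local file name, so the Tcl side never learns it unless it compares the directory contents before and after.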
The problem is I don't know the filename. The server sends it in a header line.
Max
Thank you, I will test it ASAP. I know you already told me on the 27th of July at 10:57h CEST...
My test url will help me too...
Max
That can hardly be considered a problem:
#!/usr/local/bin/tclsh8.5
# Fetch only the headers (-I) and keep the line that carries the filename.
set filename [exec curl -s -I http://atweb3.alstertext-webmanager.de/download.php?ticket=123 | grep filename]
puts $filename
# Take the text after "filename=", stopping before the trailing CR of the header line.
regexp {filename=([^\r\n]+)} $filename nada filename
puts $filename
exec curl -s -o $filename http://atweb3.alstertext-webmanager.de/download.php?ticket=123
puts Done
Andres
set filename [exec curl -s -I http://atweb3.alstertext-webmanager.de/download.php?ticket=123]
puts $filename
regexp {filename=([^\r\n]*)} $filename nada filename
Andres
Okay, there does not seem to be a "ready made" solution, but the headers are right there, and the file
contents are okay anyway.
Not so difficult, after all...
Max