
How to test if file exists on web?


Dave Farrance

Aug 4, 2008, 2:23:55 PM
How do I test if a file exists on the web, where the file can be
expressed as either an ftp or http url?

--
Dave Farrance

TJ

Aug 4, 2008, 2:35:07 PM
On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
> How do I test if a file exists on the web, where the file can be
> expressed as either an ftp or http url?

Hello,

You can use wget --spider :

$ wget --spider URI && command_if_success
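
A minimal sketch of how that slots into a script, assuming the URL is
already in a variable called url (the messages are just illustrative):

if wget -q --spider "$url"; then
    echo "exists"
else
    echo "not found" >&2
fi

With -q there is no output at all; the exit status alone carries the result.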

Dave Farrance

Aug 4, 2008, 2:49:55 PM
TJ <tj+u...@a13.fr> wrote:

>On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
>> How do I test if a file exists on the web, where the file can be
>> expressed as either an ftp or http url?

> You can use wget --spider :
>
>$ wget --spider URI && command_if_success

Thanks. That'll do fine.

--
Dave Farrance

Dave Farrance

Aug 4, 2008, 4:05:41 PM
>TJ <tj+u...@a13.fr> wrote:
>
>>On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
>>> How do I test if a file exists on the web, where the file can be
>>> expressed as either an ftp or http url?
>
>> You can use wget --spider :
>>
>>$ wget --spider URI && command_if_success

Ah, it seems to work fine for http but gives a false positive for ftp:

$ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
$
$ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
YES
$
$ wget --version
GNU Wget 1.11

--
Dave Farrance

h.stroph

Aug 4, 2008, 4:52:33 PM
In news:5gne94l6s61vujp6c...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>>>> How do I test if a file exists on the web, where the file can be
>>>> expressed as either an ftp or http url?
>>
>>> You can use wget --spider :
>>>
>>> $ wget --spider URI && command_if_success
>
> Ah, it seems to work fine for http but gives a false positive for ftp:
>
> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
> $
> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
> YES

Don't use the --spider option with ftp.


Dave Farrance

Aug 4, 2008, 5:02:42 PM
"h.stroph" <m...@privacy.net> wrote:

>> Ah, it seems to work fine for http but gives a false positive for ftp:
>>
>> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
>> $
>> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
>> YES
>
>Don't use the --spider option with ftp.

That'd download the file if the URL is valid, which is not what I want.

I've got a script that takes the URL of a very large file as a parameter,
sleeps until midnight, and then downloads the file. But I want to put in
a test to check that the URL is valid when the script is first run.
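
In outline it's something like this (a simplified sketch; the date
arithmetic works out the seconds left until midnight, and the validity
test is the part still missing):

#!/bin/bash
url=$1      # URL of the very large file
# TODO: bail out here if "$url" doesn't point at a real file
secs=$(($(date "+86400-%-H*3600-%-M*60-%-S")))   # seconds until midnight
sleep "$secs"
wget "$url"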

--
Dave Farrance

h.stroph

Aug 4, 2008, 5:20:45 PM
In news:33re94ps0q794e2dp...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>>> Ah, it seems to work fine for http but gives a false positive for
>>> ftp:
>>>
>>> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
>>> $
>>> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
>>> YES
>>
>> Don't use the --spider option with ftp.
>
> That'd download the file if the URL is valid, which is not what I
> want.

curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES
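
Here -s silences curl's progress output and -l asks for a name-only FTP
directory listing. In general form -- with host, dir and file standing in
for the pieces of the URL -- the test would be something like:

curl -sl "ftp://host/dir/" | grep -x "file" && echo YES

(grep -x matches whole lines, so a file called "invalid2" wouldn't count
as a hit for "invalid".)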


Dave Farrance

Aug 4, 2008, 5:39:03 PM
"h.stroph" <m...@privacy.net> wrote:

>curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES

The script would have to parse and split the URL, and handle http
differently, so I hope there's a simpler way.

--
Dave Farrance

Chris F.A. Johnson

Aug 4, 2008, 6:07:57 PM

It is simple:

url=ftp://ftp.mirrorservice.org/invalid
file=${url##*/}
host=${url%"$file"}
curl -sl "$hone" | grep "$file" && echo YES

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell/>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

h.stroph

Aug 4, 2008, 7:35:07 PM
In news:aqse94dmdb5mt7i6s...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>> curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES
>
> The script would have to parse and split the URL, and handle http
> differently, so I hope there's a simpler way.

if [ `echo "$URL" | grep '^ftp'` ]; then
    filename=`basename "$URL"`
    URL=`dirname "$URL"`
    curl -sl "$URL/" | grep "$filename" && echo YES ftp
else
    wget -q --spider "$URL" && echo YES http
fi


Bit Twister

Aug 4, 2008, 8:17:12 PM
On Mon, 04 Aug 2008 22:07:57 +0000, Chris F.A. Johnson wrote:
>
> url=ftp://ftp.mirrorservice.org/invalid
> file=${url##*/}
> host=${url%"$file"}
> curl -sl "$hone" | grep "$file" && echo YES

May I assume,

> curl -sl "$host" | grep "$file" && echo YES

mop2

Aug 4, 2008, 8:52:40 PM
On Mon, 04 Aug 2008 15:23:55 -0300, Dave Farrance <DaveFa...@omitthisyahooandthis.co.uk> wrote:

> How do I test if a file exists on the web, where the file can be
> expressed as either an ftp or http url?
>


Functions, test and output:

prompt$ cat t
#-----------------
#FTP: succeed as soon as a "213 " (file status) reply shows up in the
#server responses
f(){
    wget --server-response "$1" 2>&1 | while read; do
        [ "${REPLY:0:4}" = "213 " ] && killall wget && return
    done
}

URL=ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz
t=$(date +%s)
for e in 1 2 3; do
    F=$URL$e
    f $F
    echo \$?=$? $(($(date +%s)-$t))s $F
done

#---------------------------
#HTTP, pure bash: open a TCP connection via /dev/tcp and check the status line
f(){
    H=${1#*/*/}                    # strip "http://" -> host/path
    P=${H#*/}                      # path part
    exec 3<>/dev/tcp/${H%%/*}/80   # connect to the host on port 80
    printf "GET /$P HTTP/1.0\r\n\r\n">&3
    read <&3; REPLY=${REPLY%?}     # first reply line, trailing CR removed
    exec 3<&-
    C=${REPLY#* }                  # e.g. "200 OK"
    [[ ' 200 302 ' =~ "${C% *}" ]] # success if the status code is 200 or 302
}

URL=http://www.google.com/intl/en_ALL/images/logo.gi
for e in e f g; do
    F=$URL$e
    f $F
    echo \$?=$? $(($(date +%s)-$t))s $C $REPLY: $P
done
#-------------------------------

prompt$ . ./t
$?=1 4s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz1
$?=0 7s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz2
$?=1 11s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz3
$?=1 11s 404 Not Found HTTP/1.0 404 Not Found: intl/en_ALL/images/logo.gie
$?=0 11s 200 OK HTTP/1.0 200 OK: intl/en_ALL/images/logo.gif
$?=1 12s 404 Not Found HTTP/1.0 404 Not Found: intl/en_ALL/images/logo.gig
prompt$
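
The HTTP half of that relies on bash's /dev/tcp/<host>/<port> redirection
(available when bash is built with network support): opening it on a spare
file descriptor gives a raw TCP connection that a request can be written to
and the status line read back from. A stripped-down sketch of just that
mechanism, with the host and path hard-coded as an example:

exec 3<>/dev/tcp/www.google.com/80
printf 'HEAD /intl/en_ALL/images/logo.gif HTTP/1.0\r\nHost: www.google.com\r\n\r\n' >&3
read -r status <&3      # first response line, e.g. "HTTP/1.0 200 OK"
exec 3<&-
echo "$status"

A HEAD request is enough here because only the status line matters.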

Dave Farrance

Aug 5, 2008, 3:17:28 PM
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:

>How do I test if a file exists on the web, where the file can be
>expressed as either an ftp or http url?

Thanks to everybody that replied.

I think the best check for my purpose is to test whether the server
reports a file length -- and to display it as an extra check, because I'll
normally know roughly how big the file I intend to download should be,
such as a DVD iso.

"wget -S --spider $url" outputs the file size prefixed with "213" for ftp
urls or "Content-Length:" for http urls.

The script below checks the URL and, if it's valid, schedules the download
to start at midnight:

#!/bin/bash
url=$1
len=$(wget -S --spider "$url" 2>&1 | \
      grep -E '^ Content-Length:|^213 ' | tail -n1 | \
      sed 's/^ Content-Length://;s/^213 //')
[[ -z "$len" ]] && echo "Invalid URL" && exit 1
echo "File length = $len"
secs=$(($(date "+86400-%-H*3600-%-M*60-%-S")))
{ sleep $secs; wget "$url"; } &
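
Invoked with the URL as its single argument -- the script name and the
path here are made up for illustration:

$ ./fetch-at-midnight.sh http://www.mirrorservice.org/pub/some-distro/dvd.iso

It prints the reported length (or "Invalid URL" and exits), and for a
valid URL it leaves a background job that starts the download at midnight.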
