
How to test if file exists on web?


Dave Farrance

Aug 4, 2008, 2:23:55 PM
How do I test if a file exists on the web, where the file can be
expressed as either an ftp or http url?

--
Dave Farrance

TJ

Aug 4, 2008, 2:35:07 PM
On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
> How do I test if a file exists on the web, where the file can be
> expressed as either an ftp or http url?

Hello,

You can use wget --spider :

$ wget --spider URI && command_if_success
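
A minimal sketch of how that slots into a script, assuming the URL is
already in a variable called url (the messages are just illustrative):

if wget -q --spider "$url"; then
    echo "exists"
else
    echo "not found" >&2
fi

With -q there is no output at all; the exit status alone carries the result.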

Dave Farrance

Aug 4, 2008, 2:49:55 PM
TJ <tj+u...@a13.fr> wrote:

>On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
>> How do I test if a file exists on the web, where the file can be
>> expressed as either an ftp or http url?

> You can use wget --spider :
>
>$ wget --spider URI && command_if_success

Thanks. That'll do fine.

--
Dave Farrance

Dave Farrance

Aug 4, 2008, 4:05:41 PM
>TJ <tj+u...@a13.fr> wrote:
>
>>On 2008-08-04, Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:
>>> How do I test if a file exists on the web, where the file can be
>>> expressed as either an ftp or http url?
>
>> You can use wget --spider :
>>
>>$ wget --spider URI && command_if_success

Ah, it seems to work fine for http but gives a false positive for ftp:

$ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
$
$ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
YES
$
$ wget --version
GNU Wget 1.11

--
Dave Farrance

h.stroph

Aug 4, 2008, 4:52:33 PM
In news:5gne94l6s61vujp6c...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>>>> How do I test if a file exists on the web, where the file can be
>>>> expressed as either an ftp or http url?
>>
>>> You can use wget --spider :
>>>
>>> $ wget --spider URI && command_if_success
>
> Ah, it seems to work fine for http but gives a false positive for ftp:
>
> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
> $
> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
> YES

Don't use the --spider option with ftp.


Dave Farrance

Aug 4, 2008, 5:02:42 PM
"h.stroph" <m...@privacy.net> wrote:

>> Ah, it seems to work fine for http but gives a false positive for ftp:
>>
>> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
>> $
>> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
>> YES
>
>Don't use the --spider option with ftp.

That'd download the file if the URL is valid, which is not what I want.

I've got a script that takes the URL of a very large file as a parameter,
sleeps until midnight, and then downloads the file. But I want to put in
a test to check that the URL is valid when the script is first run.
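
In outline it's something like this (a simplified sketch; the date
arithmetic works out the seconds left until midnight, and the validity
test is the part still missing):

#!/bin/bash
url=$1      # URL of the very large file
# TODO: bail out here if "$url" doesn't point at a real file
secs=$(($(date "+86400-%-H*3600-%-M*60-%-S")))   # seconds until midnight
sleep "$secs"
wget "$url"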

--
Dave Farrance

h.stroph

Aug 4, 2008, 5:20:45 PM
In news:33re94ps0q794e2dp...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>>> Ah, it seems to work fine for http but gives a false positive for
>>> ftp:
>>>
>>> $ wget -q --spider http://www.mirrorservice.org/invalid && echo YES
>>> $
>>> $ wget -q --spider ftp://ftp.mirrorservice.org/invalid && echo YES
>>> YES
>>
>> Don't use the --spider option with ftp.
>
> That'd download the file if the URL is valid, which is not what I
> want.

curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES
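
Here -s silences curl's progress output and -l asks for a name-only FTP
directory listing. In general form -- with host, dir and file standing in
for the pieces of the URL -- the test would be something like:

curl -sl "ftp://host/dir/" | grep -x "file" && echo YES

(grep -x matches whole lines, so a file called "invalid2" wouldn't count
as a hit for "invalid".)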


Dave Farrance

Aug 4, 2008, 5:39:03 PM
"h.stroph" <m...@privacy.net> wrote:

>curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES

The script would have to parse and split the URL, and handle http
differently, so I hope there's a simpler way.

--
Dave Farrance

Chris F.A. Johnson

Aug 4, 2008, 6:07:57 PM

It is simple:

url=ftp://ftp.mirrorservice.org/invalid
file=${url##*/}
host=${url%"$file"}
curl -sl "$hone" | grep "$file" && echo YES

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell/>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

h.stroph

Aug 4, 2008, 7:35:07 PM
In news:aqse94dmdb5mt7i6s...@4ax.com,
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> typed:

>> curl -sl ftp://ftp.mirrorservice.org/ | grep invalid && echo YES
>
> The script would have to parse and split the URL, and handle http
> differently, so I hope there's a simpler way.

if [ `echo "$URL" | grep '^ftp'` ]; then
    filename=`basename "$URL"`
    URL=`dirname "$URL"`
    curl -sl "$URL/" | grep "$filename" && echo YES ftp
else
    wget -q --spider "$URL" && echo YES http
fi


Bit Twister

Aug 4, 2008, 8:17:12 PM
On Mon, 04 Aug 2008 22:07:57 +0000, Chris F.A. Johnson wrote:
>
> url=ftp://ftp.mirrorservice.org/invalid
> file=${url##*/}
> host=${url%"$file"}
> curl -sl "$hone" | grep "$file" && echo YES

May I assume,

> curl -sl "$host" | grep "$file" && echo YES

mop2

Aug 4, 2008, 8:52:40 PM
On Mon, 04 Aug 2008 15:23:55 -0300, Dave Farrance <DaveFa...@omitthisyahooandthis.co.uk> wrote:

> How do I test if a file exists on the web, where the file can be
> expressed as either an ftp or http url?
>


Functions, test and output:

prompt$ cat t
#-----------------
#FTP: succeed as soon as a "213 " (file status) reply shows up in the
#server responses
f(){
    wget --server-response "$1" 2>&1 | while read; do
        [ "${REPLY:0:4}" = "213 " ] && killall wget && return
    done
}

URL=ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz
t=$(date +%s)
for e in 1 2 3; do
    F=$URL$e
    f $F
    echo \$?=$? $(($(date +%s)-$t))s $F
done

#---------------------------
#HTTP, pure bash: open a TCP connection via /dev/tcp and check the status line
f(){
    H=${1#*/*/}                    # strip "http://" -> host/path
    P=${H#*/}                      # path part
    exec 3<>/dev/tcp/${H%%/*}/80   # connect to the host on port 80
    printf "GET /$P HTTP/1.0\r\n\r\n">&3
    read <&3; REPLY=${REPLY%?}     # first reply line, trailing CR removed
    exec 3<&-
    C=${REPLY#* }                  # e.g. "200 OK"
    [[ ' 200 302 ' =~ "${C% *}" ]] # success if the status code is 200 or 302
}

URL=http://www.google.com/intl/en_ALL/images/logo.gi
for e in e f g; do
    F=$URL$e
    f $F
    echo \$?=$? $(($(date +%s)-$t))s $C $REPLY: $P
done
#-------------------------------

prompt$ . ./t
$?=1 4s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz1
$?=0 7s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz2
$?=1 11s ftp://ftp.gnupg.org/gcrypt/gnupg/gnupg-2.0.9.tar.bz3
$?=1 11s 404 Not Found HTTP/1.0 404 Not Found: intl/en_ALL/images/logo.gie
$?=0 11s 200 OK HTTP/1.0 200 OK: intl/en_ALL/images/logo.gif
$?=1 12s 404 Not Found HTTP/1.0 404 Not Found: intl/en_ALL/images/logo.gig
prompt$
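
The HTTP half of that relies on bash's /dev/tcp/<host>/<port> redirection
(available when bash is built with network support): opening it on a spare
file descriptor gives a raw TCP connection that a request can be written to
and the status line read back from. A stripped-down sketch of just that
mechanism, with the host and path hard-coded as an example:

exec 3<>/dev/tcp/www.google.com/80
printf 'HEAD /intl/en_ALL/images/logo.gif HTTP/1.0\r\nHost: www.google.com\r\n\r\n' >&3
read -r status <&3      # first response line, e.g. "HTTP/1.0 200 OK"
exec 3<&-
echo "$status"

A HEAD request is enough here because only the status line matters.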

Dave Farrance

Aug 5, 2008, 3:17:28 PM
Dave Farrance <DaveFa...@OMiTTHiSyahooANDTHiS.co.uk> wrote:

>How do I test if a file exists on the web, where the file can be
>expressed as either an ftp or http url?

Thanks to everybody that replied.

I think the best check for my purpose is to test whether the server
reports a file length -- and to display it as an extra check, because I'll
normally know roughly how big the file I intend to download should be,
such as a DVD iso.

"wget -S --spider $url" outputs the file size prefixed with "213" for ftp
urls or "Content-Length:" for http urls.

The script below checks the URL and, if it's valid, schedules the download
to start at midnight:

#!/bin/bash
url=$1
len=$(wget -S --spider "$url" 2>&1 | \
      grep -E '^ Content-Length:|^213 ' | tail -n1 | \
      sed 's/^ Content-Length://;s/^213 //')
[[ -z "$len" ]] && echo "Invalid URL" && exit 1
echo "File length = $len"
secs=$(($(date "+86400-%-H*3600-%-M*60-%-S")))
{ sleep $secs; wget "$url"; } &
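
Invoked with the URL as its single argument -- the script name and the
path here are made up for illustration:

$ ./fetch-at-midnight.sh http://www.mirrorservice.org/pub/some-distro/dvd.iso

It prints the reported length (or "Invalid URL" and exits), and for a
valid URL it leaves a background job that starts the download at midnight.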
