Downloading a large number of files from different rsync servers for load balancing and efficiency


Hongyi Zhao

Apr 4, 2015, 1:43:17 AM
Hi all,

I'm using Debian and want to build a local repository so that I can
install packages more conveniently.

Since rsync is the tool Debian officially recommends for syncing files
between its mirror sites, I use the rsync client to download the .deb
packages from many different rsync servers around the world. This spreads
the load and makes the download efficient.

The steps are as follows:

1- Build the list of packages to download from the Packages.gz files for
the chosen distribution and architecture. For example, for testing
(codename jessie) on amd64, the package list can be extracted from the
following files:

https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-all/Packages.gz

After downloading all of the above files, I use the following command to
extract the list of .deb file names:

find /path/to/Packages.gz -type f -name Packages.gz -exec zcat \{\} + |
awk '/^Filename:/{ print $2 } ' > deb-file.list

At this point, deb-file.list contains a large number of lines like the
following:

----------
[snipped]
pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
pool/main/liba/libav/libswscale3_11.3-1_i386.deb
pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
[snipped]
----------
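Step 1 can also be scripted end to end. The sketch below is not from the thread, and the function names are hypothetical. It assumes the mirror base URL (without a trailing slash), the suite and the architecture are passed as arguments. Each Packages.gz is stored under a path that copies its URL, so the six identically named files do not overwrite each other. The awk step drops duplicate Filename: entries. If a binary-all index does not exist on the mirror, curl -f reports an error for that URL and the loop moves on to the next one.

```bash
#!/bin/bash
# Sketch of step 1 (hypothetical helper names). Assumes: base URL without a
# trailing slash, then suite, then architecture, e.g.
#   https://mirrors.ustc.edu.cn/debian jessie amd64

# Print the six index URLs (main/contrib/non-free x binary-<arch>/binary-all).
packages_urls() {
  local base=$1 suite=$2 arch=$3 comp dir
  for comp in main contrib non-free; do
    for dir in "binary-$arch" binary-all; do
      printf '%s/dists/%s/%s/%s/Packages.gz\n' "$base" "$suite" "$comp" "$dir"
    done
  done
}

# Download each index to ./dists/... so identically named files don't clash.
fetch_packages() {
  local base=$1 url
  packages_urls "$@" | while IFS= read -r url; do
    curl -fsS --create-dirs -o "${url#"$base"/}" "$url"
  done
}

# Print the Filename: field of every Packages.gz under a directory, once each.
extract_filenames() {
  find "$1" -type f -name Packages.gz -exec zcat {} + |
    awk '/^Filename:/ && !seen[$2]++ { print $2 }'
}

# Usage (needs network access):
#   fetch_packages https://mirrors.ustc.edu.cn/debian jessie amd64
#   extract_filenames dists > deb-file.list
```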

2- Next, I get the list of all available rsync servers run by Debian and
other open-source sites from here:

https://www.debian.org/CD/mirroring/rsync-mirrors

Note that although the page above says these rsync mirrors are for Debian
CD images, most of them also carry the regular (non-CD) part of the Debian
archive, so I can use them for my purpose.

From that page I build my own mirror list as follows:

curl https://www.debian.org/CD/mirroring/rsync-mirrors 2>/dev/null |
awk '/::debian-cd\//{ gsub(/debian-cd/, "debian", $NF); split($NF, a, "<"); print a[1] }' > mirrors.list

The contents of mirrors.list look like the following:

----------------
[snipped]
debian.mirror.digitalpacific.com.au::debian-cd/
mirror.as24220.net::debian-cd/
mirror.intrapower.net.au::debian-cd/
mirror.rackcentral.com.au::debian-cd/
debian.anexia.at::debian-cd/
debian.sil.at::debian-cd/
[snipped]
----------------

This method currently gives 94 rsync servers, which make up the contents
of mirrors.list.
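The sample above still shows the debian-cd module. Also, the gsub() in the awk command rewrites every "debian-cd" in the last field, including one that appears in a host name. What follows is a hedged sketch for tidying the list, with hypothetical function names. It assumes entries of the form host::debian-cd/ as in the sample, rewrites only that module suffix, and drops blank and duplicate lines. It then uses rsync --list-only to keep only mirrors whose debian module has a pool/ directory, since the note above says most mirrors carry it, not all.

```bash
#!/bin/bash
# Sketch (hypothetical helpers). Assumes mirrors.list entries look like
# "host::debian-cd/", as in the sample output in the post.

# Rewrite only the trailing module ("::debian-cd/" -> "::debian/"),
# then drop blank lines and duplicates.
normalize_mirrors() {
  sed 's|::debian-cd/$|::debian/|' | awk 'NF && !seen[$0]++'
}

# Succeed if the mirror answers and lists a pool/ directory (needs network).
has_pool() {
  rsync --list-only --contimeout=5 --timeout=10 "${1}pool/" >/dev/null 2>&1
}

# Usage:
#   normalize_mirrors < mirrors.list | while IFS= read -r m; do
#     has_pool "$m" && printf '%s\n' "$m"
#   done > mirrors.ok
```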

3- Finally, I use rsync to download all of the .deb files listed in
deb-file.list, spread across all of the rsync servers in mirrors.list.
Since the administrators of most of these servers impose bandwidth and
maximum-connection limits, I want to download only one .deb file from each
server at a time. When a download from a given server finishes, rsync
should take the next .deb file from deb-file.list for that server, and so
on, until all of the .deb files have been downloaded, using all of the
servers in parallel.

This needs a script. I've tried the following one, but it doesn't meet
the requirements above:

-------------------
mirror=1

while read -r -a line
do
mirror=`awk 'NR=='"$aa"'' mirrors.list`
rsync -amH --progress --append-verify --timeout=10 --contimeout=5 $mirror \
${line[0]} debs/ &
mirror=$[mirror+1]
done < deb-file.list

wait
-------------------

Any hints for this issue?

Regards
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

Hongyi Zhao

Apr 4, 2015, 1:44:37 AM
On Sat, 04 Apr 2015 05:43:14 +0000, Hongyi Zhao wrote:

> mirror=`awk 'NR=='"$aa"'' mirrors.list`

Oops, that's a typo; it should be:

mirror=`awk 'NR=='"$mirror"'' mirrors.list`

Hongyi Zhao

Apr 4, 2015, 2:20:51 AM
On Sat, 04 Apr 2015 05:43:14 +0000, Hongyi Zhao wrote:

> -------------------
> mirror=1
>
> while read -r -a line
> do mirror=`awk 'NR=='"$aa"'' mirrors.list`
> rsync -amH --progress --append-verify --timeout=10 --contimeout=5
> $mirror ${line[0]} debs/ &
> mirror=$[mirror+1]
> done < deb-file.list
>
> wait
>-------------------

Sorry, my typos mixed up the variable names. The script I actually have
is the following, but it falls well short of the requirements I posted:

-------------------
mirror=1

while read -r -a line
do
mirror_used=`awk 'NR=='"$mirror"'' mirrors.list`
rsync -amH --progress --append-verify --timeout=10 --contimeout=5 \
${mirror_used} ${line[0]} debs/ &
mirror=$[mirror+1]
done < deb-file.list

wait
-------------------

Ed Morton

Apr 4, 2015, 11:28:26 AM
On 4/4/2015 12:43 AM, Hongyi Zhao wrote:
<snip>
> For the above purpose, I must use a script to do it, I've tried the
> following one, but which cann't meet all of the above requirements:
>
> -------------------
> mirror=1
>
> while read -r -a line
> do
> mirror=`awk 'NR=='"$aa"'' mirrors.list`
> rsync -amH --progress --append-verify --timeout=10 --contimeout=5 $mirror
> ${line[0]} debs/ &
> mirror=$[mirror+1]
> done < deb-file.list

For those reading this and considering helping: note that despite having asked
and received answers to about 100 questions over the past few weeks alone (and
probably hundreds over the years) covering most or all of the following issues,
the OP in the above script has these glaringly obvious issues:

A shell variable that is never populated
Unquoted shell variables
Deprecated backticks
Missing IFS= on the while read line
Invalid arithmetic syntax
Incorrect use of a shell variable in an awk script
Invoking awk once per input line in a loop instead of just once
Populating an array when a scalar is required

Just something to think about....

Regards,

Ed.
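For readers working through that list, here is a sketch of the OP's loop with each point addressed. The function names are hypothetical. The mirror list is read once with mapfile instead of running awk per package, and variables are quoted. IFS= read -r reads each line into a scalar, arithmetic uses $(( )), and a shell value reaches awk through -v. It is only a dry run that pairs packages with mirrors round-robin; it does not limit concurrency, which is what the rest of the thread is about. The usage comment joins the mirror prefix and the pool path into one rsync source argument, because passing them as two separate arguments would not name a single remote file.

```bash
#!/bin/bash
# Sketch (hypothetical helpers): print "mirror<TAB>package" pairs,
# assigning mirrors round-robin. Fails if the mirror file is empty.
assign_mirrors() {
  local mirrorfile=$1 listfile=$2 line n=0
  local -a mirrors
  mapfile -t mirrors < "$mirrorfile"   # read the mirror list once
  (( ${#mirrors[@]} > 0 )) || return 1
  while IFS= read -r line; do          # IFS= and -r keep each line intact
    printf '%s\t%s\n' "${mirrors[n % ${#mirrors[@]}]}" "$line"
    n=$(( n + 1 ))                     # POSIX arithmetic instead of $[...]
  done < "$listfile"
}

# Pass a shell value into awk with -v instead of splicing it into the program.
nth_line() {
  awk -v n="$1" 'NR == n { print; exit }' "$2"
}

# Usage (prints the commands instead of running them):
#   assign_mirrors mirrors.list deb-file.list |
#   while IFS=$'\t' read -r mirror deb; do
#     echo rsync -amH --append-verify --timeout=10 --contimeout=5 \
#       "$mirror$deb" debs/
#   done
```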




Dave Sines

Apr 4, 2015, 10:12:53 PM
Hongyi Zhao <hongy...@gmail.com> wrote:
> Hi all,
>
> I'm using Debian, I want to make a local repository which can let me
> install packages more conveniently.
>
> Considering that the rsync tool is the Debian official proposed tool for
> syncing the files among its different rsync server sites, I use the rsync
> client to downloading the deb packages from the different rsync servers
> distributed around the world-wide for good loadbalancing and high
> efficiency.
>
> The steps are as follows:

[snip]

> At this point, the deb-file.list will contain a great number of lines
> like the following:
>
> ----------
> [snipped]
> pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
> pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
> pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
> pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
> pool/main/liba/libav/libswscale3_11.3-1_i386.deb
> pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
> pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
> [snipped]
> ----------
>
> 2- Secondly, I obtain the list for all of the available rsync servers

[snip]
Untested, etc.

#! /bin/bash -

mirrorlist=mirrors.list
filelist=deb-file.list

mirrors=()   # one rsync source prefix (host::module/) per mirror
active=()    # PID of the rsync running on each mirror, 0 if idle

while IFS= read -r line ; do
    mirrors+=( "$line" )
    active+=( 0 )
done < "$mirrorlist"

idx=0
bound="${#mirrors[@]}"

if [ "$bound" -eq 0 ] ; then
    echo "no mirrors" 1>&2
    exit 1
fi

# Move idx to the next idle mirror; return 1 if every mirror is busy.
available_mirror()
{
    set -- "$idx"
    while [ "${active[$idx]}" -ne 0 ]; do
        if kill -0 "${active[$idx]}" >/dev/null 2>&1 ; then
            idx=$(( ( $idx + 1 ) % $bound ))
            if [ "$idx" -eq "$1" ]; then
                return 1
            fi
        else
            active[$idx]=0
            return 0
        fi
    done
}

while IFS= read -r line ; do
    until available_mirror ; do
        sleep 5
    done
    # removed --progress from argument list.
    # The mirror prefix and the pool path form one source argument.
    rsync -amH --append-verify --timeout=10 --contimeout=5 \
        "${mirrors[$idx]}$line" debs/ &
    active[$idx]=$!
    idx=$(( ( $idx + 1 ) % $bound ))
done < "$filelist"

wait
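A different design for step 3, not proposed in the thread, avoids polling PIDs with kill -0 and sleep. It starts one background worker per mirror, and the workers take files from a shared queue protected by flock(1). Each mirror then has at most one rsync running, and a faster mirror naturally takes more files. A mirror whose rsync fails hands its file back to the queue and stops. This is a sketch under stated assumptions. flock (util-linux) and GNU sed -i are available, as on Debian. FETCH is a hypothetical hook that defaults to the rsync call. Anything left in the queue at the end is printed, not retried.

```bash
#!/bin/bash
# Sketch: shared-queue worker pool (hypothetical helper names).
# Assumes flock(1) and GNU "sed -i"; FETCH can be swapped for a dry run.

fetch_rsync() {   # $1 = mirror such as host::debian/, $2 = pool/... path
  rsync -amH --append-verify --timeout=10 --contimeout=5 "$1$2" debs/
}
FETCH=${FETCH:-fetch_rsync}

# Under the lock: remove the first queue line and print it; fail if empty.
pop_queue() {
  local queue=$1 item=
  {
    flock 9
    IFS= read -r item < "$queue" || [ -n "$item" ] || return 1
    sed -i 1d "$queue"
    printf '%s\n' "$item"
  } 9>"$queue.lock"
}

# Under the lock: append a line back onto the queue.
push_queue() {
  { flock 9; printf '%s\n' "$2" >> "$1"; } 9>"$1.lock"
}

# Fetch through one mirror until the queue is empty; on the first failure,
# return the file to the queue and retire this mirror.
worker() {   # $1 = mirror, $2 = queue file
  local deb
  while deb=$(pop_queue "$2"); do
    [ -n "$deb" ] || continue
    if ! "$FETCH" "$1" "$deb"; then
      push_queue "$2" "$deb"
      return 1
    fi
  done
}

# One worker per mirror; afterwards print whatever nobody could fetch.
run_pool() {  # $1 = mirrors.list, $2 = deb-file.list
  local queue mirror
  queue=$(mktemp) || return 1
  grep -v '^$' "$2" > "$queue"
  while IFS= read -r mirror; do
    [ -n "$mirror" ] && worker "$mirror" "$queue" < /dev/null &
  done < "$1"
  wait
  cat "$queue"
  rm -f "$queue" "$queue.lock"
}

# Usage:
#   run_pool mirrors.list deb-file.list > undelivered.list
```

Setting FETCH to a function that only logs its two arguments gives a dry run that does not touch the network.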

Hongyi Zhao

Apr 5, 2015, 7:17:47 AM
On Sun, 05 Apr 2015 03:11:53 +0100, Dave Sines wrote:
[snip]

First, thanks a lot; I'll try it out and test it.

> #! /bin/bash -

What does the trailing dash `-' mean here?

Dave Sines

Apr 5, 2015, 2:10:32 PM
Hongyi Zhao <hongy...@gmail.com> wrote:
> On Sun, 05 Apr 2015 03:11:53 +0100, Dave Sines wrote:
> [snip]
>
> Firstly, thanks a lot, I will try and test it.
>
>> #! /bin/bash -
>
> What's the last dash `-' mean used here?

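In this context it's equivalent to '--', which signals the end of option
processing. Without it, a script whose name begins with '-' is read by the
interpreter as an option. Here the script is named -i, so /bin/sh starts
an interactive shell instead of running it: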

bash$ printf '#! /bin/sh\necho "hello from $0"\n' > ./-i
bash$ chmod 755 ./-i
bash$ dash
$ PATH=$PATH:
$ -i
sh-4.3$ exit
exit
$ exit
bash$

The same sequence with the '-' on the #! line:

bash$ printf '#! /bin/sh -\necho "hello from $0"\n' > ./-i
bash$ chmod 755 ./-i
bash$ dash
$ PATH=$PATH:
$ -i
hello from -i
$ exit
bash$

Hongyi Zhao

Apr 6, 2015, 7:56:55 PM
On Sun, 05 Apr 2015 19:09:37 +0100, Dave Sines wrote:

>>> #! /bin/bash -

As far as I remember, most of the bash scripts I've used write that line
as:

#!/bin/bash

that is, without a space between ! and /. So which form is standard?

David W. Hodgins

Apr 6, 2015, 8:25:24 PM
On Mon, 06 Apr 2015 19:56:52 -0400, Hongyi Zhao <hongy...@gmail.com> wrote:

> On Sun, 05 Apr 2015 19:09:37 +0100, Dave Sines wrote:
>>>> #! /bin/bash -

> In my mind, most of the bash scripts I've used with the above line as
> this:
> #!/bin/bash
> I mean, don't have the space between ! and /. So, which form is standard?

Either is fine. See https://en.wikipedia.org/wiki/Shebang_(Unix)#Magic_number
for details.

Regards, Dave Hodgins

--
Change nomail.afraid.org to ody.ca to reply by email.
(nomail.afraid.org has been set up specifically for
use in usenet. Feel free to use it yourself.)

Hongyi Zhao

Apr 6, 2015, 8:43:13 PM
On Mon, 06 Apr 2015 20:25:13 -0400, David W. Hodgins wrote:

> On Mon, 06 Apr 2015 19:56:52 -0400, Hongyi Zhao <hongy...@gmail.com>
> wrote:
>
>> On Sun, 05 Apr 2015 19:09:37 +0100, Dave Sines wrote:
>>>>> #! /bin/bash -
>
>> In my mind, most of the bash scripts I've used with the above line as
>> this:
>> #!/bin/bash
>> I mean, don't have the space between ! and /. So, which
>> form is standard?
>
> Either is fine. See
> https://en.wikipedia.org/wiki/Shebang_(Unix)#Magic_number for details.

Thanks for your hints.

>
> Regards, Dave Hodgins