Why is my rsync transfer slow?

dbonde+forum+rsyn...@gmail.com

Jan 21, 2016, 3:41:28 AM
I am running an rsync job transferring about 45 million files,
approximately 1.8 TB of data (a Mac OS X Time Machine backup), over a
100 Mbit connection.

I use rsync 3.1.1 from MacPorts (I first tried the built-in rsync,
version 2.6.9, since it has a Mac OS X specific cache parameter, but it
ran out of memory) with the following parameters:

% rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b"
/source/ /destination/

The source is an external 3.5" HDD connected with Firewire 800. The
destination is a sparse disk image bundle mounted locally (but its
"source file" is on network storage). Initially I got good speeds, 7-9
MB/s for reasonably large files, but the longer this operation has been
going on (I restarted it three days ago, see below), the slower it gets.
There are also long pauses when nothing happens, like this:

2011-01-22-070305/Macintosh HD/Library/Application
Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/Mask3.png
1.28K 100% 3.26kB/s 0:00:00 (xfr#48406, ir-chk=1050/4166332)

2016/01/16 18:26:48
Volumes/src/Backups.backupdb/mm/2011-01-22-070305/Macintosh
HD/Library/Application
Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/Mask3.png
313

2011-01-22-070305/Macintosh HD/Library/Application
Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/banner-green.jpg
32.26K 100% 0.00kB/s 0:00:00 (xfr#48407, ir-chk=1049/4166332)

2016/01/16 19:17:37
Volumes/2TB/Backups.backupdb/mm/2011-01-22-070305/Macintosh
HD/Library/Application
Support/Apple/Mail/Stationery/Apple/Contents/Resources/Photos/Contents/Resources/Bamboo.mailstationery/Contents/Resources/banner-green.jpg
31279

As you can see, the first file finished at 18:26 and the second at
19:17: almost an hour for a file that is just 32 kB.
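
Pauses like this can be picked out of a long log automatically rather than by eye. The following is only a rough sketch: it assumes the log was produced with --out-format="%t %f %b" as in the command above, with one entry per line (the entries shown here are wrapped by the mail client), and the function names and the shortened sample paths are made up for illustration.

```python
from datetime import datetime

def parse_log_line(line):
    """Split one '%t %f %b' line into (timestamp, path, bytes)."""
    # %t prints as 'YYYY/MM/DD HH:MM:SS' (19 characters). The path may
    # contain spaces, so the byte count is taken from the right-hand end.
    stamp = datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")
    path, _, size = line[20:].rpartition(" ")
    return stamp, path, int(size)

def find_pauses(lines, threshold_seconds=60):
    """Yield (gap_seconds, path, bytes) for entries preceded by a long gap."""
    previous = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            stamp, path, size = parse_log_line(line)
        except ValueError:
            continue  # --progress lines and other output are skipped
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap >= threshold_seconds:
                yield gap, path, size
        previous = stamp

# Shortened, illustrative versions of the two entries quoted above.
sample = [
    "2016/01/16 18:26:48 Volumes/src/dir with space/Mask3.png 313",
    "2016/01/16 19:17:37 Volumes/src/dir with space/banner-green.jpg 31279",
]
for gap, path, size in find_pauses(sample):
    print(f"{gap:.0f} s before {path} ({size} bytes)")
```

For the two entries above this reports a gap of 3049 seconds, about 51 minutes, before the 31279-byte file.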

I don't think the transfer is CPU limited. There are some CPU spikes but
generally CPU load is less than 10%. The three rsync processes spawned
by this operation have, all in all, used almost exactly 5h of CPU time in
the 72h the transfer has been going on. The computer itself idles 23h a day.

Nor is memory a problem. Memory pressure has been "green" since the
operation began.

Kernel task has accumulated quite a bit of CPU time (57h when I write
this), but on the other hand, the uptime is 25 days and all these 57h
can't have been consumed by rsync.

Some final details

* I had had this process running for a couple of days when I restarted
it to get better logging three days ago. It took nine hours before the
first file was transferred.

* I first used Finder to transfer this directory tree from the same
source to the same destination. That took 3 days, all in all. Now I have
spent 6 days and I don't think I have transferred even a third of the tree.

* I have tried transferring files between the same source and
destination outside of this operation and they go at full speed.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Kevin Korb

Jan 21, 2016, 9:29:19 AM

First, don't use -z on a local copy. It will only make rsync slower
for no reason at all.

Second, 45 million files means 90 million calls to stat(). This will
take a while even if nothing needs copying.
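
To get a feel for what that many stat() calls could cost on a particular volume, one can time a sample and scale it up. This is a rough sketch, not something rsync provides: the function names are made up, the path in the usage comment is a placeholder, and the extrapolation assumes every call costs about the same. A second run over the same files will mostly hit the cache and look far faster than a cold run.

```python
import os
import time

def time_lstat(root, limit=100_000):
    """lstat() up to `limit` entries under `root`; return (count, seconds)."""
    count = 0
    elapsed = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            elapsed += time.perf_counter() - start
            count += 1
            if count >= limit:
                return count, elapsed
    return count, elapsed

def extrapolate(count, seconds, total_calls):
    """Scale the measured time to `total_calls` (assumes uniform cost)."""
    if count == 0:
        return 0.0
    return seconds * total_calls / count

# Example (the path is a placeholder):
# count, seconds = time_lstat("/path/to/backup/tree")
# print(extrapolate(count, seconds, 90_000_000) / 3600, "hours")
```

Only the lstat() call itself is timed; the directory walk is extra. Running it once against the source disk and once against the mounted image would show which side is the expensive one.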

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Ke...@FutureQuest.net (work)
Orlando, Florida k...@sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,

dbonde+forum+rsyn...@gmail.com

Jan 21, 2016, 2:15:58 PM
On 2016-01-21 15:00, Kevin Korb wrote:
> First, don't use -z on a local copy. It will only make rsync slower
> for no reason at all.

Thanks. Hadn't thought about that. I just copied most from the spelled
out "archive" list of switches. But is rsync so "stupid" that it really
considers z for a local transfer?
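
For reference, the bundled switches can be spelled out again mechanically. The sketch below uses the long option names from the rsync 3.x man page; the helper names are made up. Apple's patched 2.6.9 is reported to give some letters a different meaning (notably -E, extended attributes rather than executability), so treat that as something to check in the man page of the version in use.

```python
# Long names as documented for rsync 3.x.
SHORT_TO_LONG = {
    "H": "--hard-links",
    "z": "--compress",
    "v": "--verbose",
    "h": "--human-readable",
    "E": "--executability",
    "r": "--recursive",
    "l": "--links",
    "p": "--perms",
    "t": "--times",
    "g": "--group",
    "o": "--owner",
    "D": "--devices --specials",
    "W": "--whole-file",
}
ARCHIVE = set("rlptgoD")  # -a is documented as equal to -rlptgoD

def expand(bundle):
    """List the long option for each letter in a bundle such as '-HzvhE'."""
    return [SHORT_TO_LONG.get(c, f"(unknown: -{c})") for c in bundle.lstrip("-")]

def simplify(bundle, drop=""):
    """Remove the letters in `drop` and fold -rlptgoD back into -a."""
    letters = [c for c in bundle.lstrip("-") if c not in drop]
    if ARCHIVE.issubset(letters):
        letters = ["a"] + [c for c in letters if c not in ARCHIVE]
    return "-" + "".join(letters)

print(expand("-HzvhErlptgoDW"))
print(simplify("-HzvhErlptgoDW", drop="z"))
```

Dropping z and folding the archive letters gives -aHvhEW, which is the same set of options minus compression.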

> Second, 45 million files means 90 million calls to stat(). This will
> take a while even if nothing needs copying.

Hmm, is there a way to benchmark how long a stat() call takes?

And still, why is it so much slower than Finder? Finder is a dog when it
comes to file operations. Rsync (and cp) are usually many times faster.

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 11:48:11 AM
On 2016-01-21 09:20, dbonde+forum+rsyn...@gmail.com wrote:
> I run a rsync job transferring about 45 million files/approximately 1.8
> TB data (a Mac OS X Time Machine backup) over a 100 MBit connection.
>
> I use rsync 3.1.1 from MacPorts (I first tried the built in rsync,
> version 2.6.9, since it has a Mac OS X specific cache parameter, but it
> ran out of memory) with the following parameters
>
> % rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b"
> /source/ /destination/

Well, after some examination I found at least one problem with this
transfer (which is still running): hard links are not preserved.

This is how a certain file looks at the source, where it is backed up in
several locations using hard links:

source volume:

zsh-% ls -i "/…/backups/2011-06-23-040258/Pictures/DSCF0748.JPG"
9236871 /…/backups/2011-06-23-040258/Pictures/DSCF0748.JPG

zsh-% ls -i "/…/backups/2010-12-18-070445/Pictures/DSCF0748.JPG"
9236871 /…/backups/2010-12-18-070445/Pictures/DSCF0748.JPG


destination volume:

zsh-% ls -i "/…/backups/2011-06-23-040258/Pictures/DSCF0748.JPG"
20765913 /…/backups/2011-06-23-040258/Pictures/DSCF0748.JPG

zsh-% ls -i "/…/backups/2010-12-18-070445/Pictures/DSCF0748.JPG"
704428 /…/backups/2010-12-18-070445/Pictures/DSCF0748.JPG

As you can see, the inode number is the same on the source volume while
it is completely different on the destination volume.

Why are my hard links not preserved? I thought the purpose of -H was
to transfer the hard links rather than the file itself.
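
For checking more than a couple of files, the comparison done with ls -i above can be scripted. A minimal sketch, with function names made up for illustration; it compares device and inode numbers, which is the usual test for two names being hard links to the same file.

```python
import os

def link_info(path):
    """Return (device, inode, link_count) without following symlinks."""
    st = os.lstat(path)
    return st.st_dev, st.st_ino, st.st_nlink

def are_hard_linked(path_a, path_b):
    """True if both names refer to the same inode on the same device."""
    return link_info(path_a)[:2] == link_info(path_b)[:2]

def links_preserved(src_a, src_b, dst_a, dst_b):
    """True if the destination pair is linked exactly when the source pair is."""
    return are_hard_linked(src_a, src_b) == are_hard_linked(dst_a, dst_b)

# Example (paths are placeholders):
# links_preserved("/src/snap1/f.jpg", "/src/snap2/f.jpg",
#                 "/dst/snap1/f.jpg", "/dst/snap2/f.jpg")
```

Inode numbers themselves are never expected to match between source and destination; what matters is whether the two destination names share one inode, as the two source names do.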

Kevin Korb

Jan 23, 2016, 11:50:57 AM

It will, assuming it sees both links in the same rsync run.

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 3:06:42 PM
On 2016-01-23 17:50, Kevin Korb wrote:
> It will, assuming it sees both links in the same rsync run.

How does one handle interrupted transfers if one wants to preserve hard
links? Would --partial and --append-verify work?

Kevin Korb

Jan 23, 2016, 3:16:42 PM

As long as it still sees both links it is fine.

Essentially, the way it works is that whenever rsync -H (on the
source) sees a file with a link count >1 it remembers the
inode#>filename pair. If it finds another instance of that inode it
then links to the same file on the target. So, if you abort after it
copies one but before it links the other it will still handle it
correctly on the next run.

It just won't handle it if you rsync tree #1 then rsync tree #2. It
won't see a hard link that is common to both since it wasn't analyzing
them together.
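
The idea described above can be illustrated with a toy model. This is not rsync's implementation, just a sketch of the bookkeeping: the function name is made up, it only handles regular files and directories, and it assumes the destination starts out empty (so it does not model the resume-after-abort case).

```python
import os
import shutil

def copy_tree_preserving_links(src_root, dst_root):
    """Copy files, re-creating hard links that are seen within this one run."""
    seen = {}  # (st_dev, st_ino) -> destination path of the first copy
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                os.link(seen[key], dst)   # later name: link, don't copy
            else:
                shutil.copy2(src, dst)    # first name: copy the data
                if st.st_nlink > 1:
                    seen[key] = dst       # remember it for later names
    return seen
```

Because the table of seen inodes lives only for the duration of one call, copying tree #1 and then tree #2 in separate calls produces two independent copies of a file that was hard-linked across both trees, which is the limitation described above.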

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 3:45:57 PM
On 2016-01-23 21:16, Kevin Korb wrote:
> As long as it still sees both links it is fine.
>
> Essentially, the way it works is that whenever rsync -H (on the
> source) sees a file with a link count >1 it remembers the
> inode#>filename pair. If it finds another instance of that inode it
> then links to the same file on the target. So, if you abort after it
> copies one but before it links the other it will still handle it
> correctly on the next run.
>
> It just won't handle it if you rsync tree #1 then rsync tree #2. It
> won't see a hard link that is common to both since it wasn't analyzing
> them together.

I'm not sure I understand your answer. As you could see in my previous
message, the files that should have been linked but were duplicated are
located in the same root directory ("/backups"):

/backups/2011-06-23-040258/Pictures/DSCF0748.JPG

/backups/2010-12-18-070445/Pictures/DSCF0748.JPG

Why is rsync losing track of the links just because the transfer was
interrupted, if your explanation is correct?

Kevin Korb

Jan 23, 2016, 4:03:35 PM

What was your rsync source and target that made those?

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 4:58:40 PM
On 2016-01-23 22:02, Kevin Korb wrote:
> What was your rsync source and target that made those?

What do you mean? Filesystem is HFS (Mac OS X). Rsync version is 3.1.2
from MacPorts. Source is a regular directory/folder on an external HD,
destination is a disk image.

Kevin Korb

Jan 23, 2016, 5:00:20 PM

I want to know what your whole command line was so I can understand
your results.

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 5:11:15 PM
On 2016-01-23 22:59, Kevin Korb wrote:
> I want to know what your whole command line was so I can understand
> your results.

% rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b"
/source/ /destination/

(and after the interruption I removed -z)

Kevin Korb

Jan 23, 2016, 5:24:40 PM

I need to know what the paths were so I know how they relate to the
file names you listed.

dbonde+forum+rsyn...@gmail.com

Jan 23, 2016, 6:26:32 PM
On 2016-01-23 23:24, Kevin Korb wrote:
> I need to know what the paths were so I know how they relate to the
> file names you listed.

I posted the relevant parts of the path in a previous message

/Volumes/A/Backups.backupdb/mm/2011-06-23-040258/path/DSCF0748.JPG
/Volumes/B/Backups.backupdb/mm/2011-06-23-040258/path/DSCF0748.JPG

The only difference is the name of the volume, A and B, above.

Kevin Korb

Jan 23, 2016, 9:52:56 PM

Are you rsyncing from one to the other? Both of them to somewhere
else? One at a time to somewhere else? Why won't you just show your
actual command line and an ls -li of the correct source and incorrect
target?

dbonde+forum+rsyn...@gmail.com

Jan 24, 2016, 12:31:00 PM
On 2016-01-24 03:51, Kevin Korb wrote:
> Are you rsyncing from one to the other? Both of them to somewhere
> else? One at a time to somewhere else? Why won't you just show your
> actual command line and an ls -li of the correct source and incorrect
> target?


Are you trolling me? All the information you ask for above has been
clearly spelled out in previous messages, messages you have replied to.


Why do you need my username, drive/volume name and other personal
details from my folder hierarchy? Both paths are regular local paths
with no special characters except space. And they are identical except
for the volume name.

Selva Nair

Jan 24, 2016, 2:41:31 PM

On Sun, Jan 24, 2016 at 12:29 PM, <dbonde+forum+rsyn...@gmail.com> wrote:
> On 2016-01-24 03:51, Kevin Korb wrote:
>> Are you rsyncing from one to the other? Both of them to somewhere
>> else? One at a time to somewhere else? Why won't you just show your
>> actual command line and an ls -li of the correct source and incorrect
>> target?
>
> Are you trolling me? All the information you ask for above has been clearly spelled out in previous messages, messages you have replied to.

Sorry for butting in, but hope this helps:

The command line you posted earlier reads

 % rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b" /source/ /destination/

I think Kevin is asking you to write out /source/ and /destination/ exactly as used on the command line so that one can better understand what is going on. The issues you're facing are rather unusual, so a more complete description may help figure out what's going on. Sure, you can mask username/password etc., but do not simplify the source and destination paths.

Also, the description "The destination is a sparse disk image bundle mounted locally (but its "source file" is on a network storage)" is too cryptic. What kind of network storage? How is it mounted -- NFS? SMB? What kind of sparse disk image? What's a bundle? Not that I have any clue why the transfer could be so slow or why rsync is not detecting hard links in your case (it should, as Kevin initially pointed out), but someone else may be able to shed some light.

Just trying to help,

Selva


dbonde+forum+rsyn...@gmail.com

Jan 24, 2016, 4:49:22 PM
On 2016-01-24 20:39, Selva Nair wrote:

> Sorry for butting in, but hope this helps:
>
> The command line you posted earlier reads
>
> % rsync -HzvhErlptgoDW --stats --progress --out-format="%t %f %b"
> /source/ /destination/
>
> I think Kevin is asking you write out that /source/ and /destination
> exactly as used on the command line so that one could understand what is
> going on better.

That doesn't make sense. Both the source and destination paths contain
simple alphanumeric characters, no more no less. Why would it matter
whether the path is /abc/ or /def/ or even /123/?

> The issues you're facing are rather unusual so a more
> complete description may help figure what's going on. Sure, you can mask
> username/password etc but do not simplify source and destination paths.
>
> Also the the description "The destination is a sparse disk image bundle
> mounted locally (but its
> "source file" is on a network storage)" is too cryptic. What kind of
> network storage? How is it mounted -- NFS? SMB? What kind of sparse disk
> image? What's a bundle?

It is exactly as I wrote. On a network volume (A) a "sparse disk image
bundle" (B), i.e., a type of disk image used in OS X, is stored. B is
then mounted locally (i.e., local to where rsync is run) on a computer
(C) where it appears as one of many volumes.

In other words, B is stored on A. A is then mounted (using AFP) on C. C
then mounts B, which is stored on A (= opens a file on a network volume,
but instead of opening, e.g., a spreadsheet in Excel, opening B shows a
new volume on the desktop of C). The computer where it is mounted just
sees a mounted volume; it can't distinguish between a disk image stored
remotely and one stored on the computer's internal hard drive.

I assume you are familiar with the idea of disk images?

Selva Nair

Jan 24, 2016, 7:14:50 PM

On Sun, Jan 24, 2016 at 4:48 PM, <dbonde+forum+rsyn...@gmail.com> wrote:
> That doesn't make sense. Both the source and destination path contains simple alphanumeric characters, no more no less. Why would it matter whether the path is /abc/ or /def/ or even /123/?

Hmm.. I thought you are the one who has been asking for help. It does very much matter what your source and destination exactly are.

> I assume you are familiar with the idea of disk images?

There are many different kinds of disk images, and there are just as many ways of network mounting. If you say "I do rsync /a /b/ and it runs too slow", you are not going to get any useful responses.

Good luck,

Selva

Simon Hobson

Jan 25, 2016, 10:23:07 AM
dbonde+forum+rsyn...@gmail.com wrote:

> It is exactly as I wrote. On a network volume (A) a "sparse disk image bundle" (B), i.e., a type of disk image used in OS X, is stored. B is then mounted locally (i.e., local to where rsync is run) on a computer (C) where it appears as one of many volumes.
>
> In other words, B is stored on A. A is then mounted (using AFP) on C. C then mounts B (=opens a file on a network volume, but instead of opening e.g., a spreadsheet in Excel, opening B shows a new volume on the desktop of C) stored on A.


> The computer where it is mounted just sees a mounted volume - it can't distinguish between a disk image stored remotely or stored on the computers internal hard drive.

I wouldn't count on that !

> I assume you are familiar with the idea of disk images?

I think most are familiar with disk images - but not so many with the specific implementations used by OS X.
OS X has the concept of a "bundle". To the user this appears as a single file with its own name and icon. Internally it's a folder tree with a number of files/folders.
As a quick test, I've just created a 100M sparse image, here's the contents before I've added any files :
> $ ls -lRh a.sparsebundle/
> total 16
> -rw-r--r-- 1 simon staff 496B 25 Jan 14:36 Info.bckup
> -rw-r--r-- 1 simon staff 496B 25 Jan 14:36 Info.plist
> drwxr-xr-x 8 simon staff 272B 25 Jan 14:36 bands
> -rw-r--r-- 1 simon staff 0B 25 Jan 14:36 token
>
> a.sparsebundle//bands:
> total 34952
> -rw-r--r-- 1 simon staff 2.1M 25 Jan 14:37 0
> -rw-r--r-- 1 simon staff 2.4M 25 Jan 14:36 1
> -rw-r--r-- 1 simon staff 2.0M 25 Jan 14:36 2
> -rw-r--r-- 1 simon staff 912K 25 Jan 14:36 6
> -rw-r--r-- 1 simon staff 8.0M 25 Jan 14:36 b
> -rw-r--r-- 1 simon staff 1.7M 25 Jan 14:36 c
It is **NOT** the same as a unix sparse file !
The contents are divided up into chunks, with each chunk stored in a file of its own. I suspect this may also have an impact on performance. As the disk is filled, the "bands" files grow in number and size - with the disk filled, the bands are complete from 0 through c, with all but c being 8M.
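
The band layout lends itself to a quick calculation. This sketch assumes what the listing above suggests rather than anything documented here: bands of 8 MiB, named by their index in hexadecimal. The function names are made up.

```python
BAND_SIZE = 8 * 1024 * 1024  # 8 MiB, as seen in the listing (assumed default)

def band_count(image_bytes, band_size=BAND_SIZE):
    """Number of band files needed to hold a completely full image."""
    return -(-image_bytes // band_size)  # ceiling division

def band_name(offset, band_size=BAND_SIZE):
    """Band file holding byte `offset`, assuming hex-index file names."""
    return format(offset // band_size, "x")

# The 100M test image: 13 bands, named 0 through c.
print(band_count(100 * 1024 * 1024))
print(band_name(100 * 1024 * 1024 - 1))

# A full 1.8 TB image (decimal terabytes) under the same assumptions.
print(band_count(1_800_000_000_000))
```

Under these assumptions a full 1.8 TB image is spread over roughly 215,000 band files on the network volume, and every write inside the mounted image turns into a write to one of them.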

As an aside, there is also an unfortunate combination of name and Finder behaviour. If you set the Finder to show file extensions, it will show (e.g. in this case) "a.sparsebundle" - but if the name is a bit longer, it shows the beginning of the name, an ellipsis "...", and the end of the name including the extension. My mother was "a little confused" when she saw a folder on my screen with several "...arsebundle"s !


There are a lot of layers in your setup - any of them (or some combination thereof) could be slowing things down.

Rsync
Filesystem on B
Loopback mount (and associated systems) on B
AFP between A and B - is the host for A an OS X machine running native AFP, or something like Linux running Netatalk?
Filesystem on A - inc sparse bundle file support
Disk subsystem on A

A few things come to mind ...

1) I am aware that AFP has some performance issues with some combinations of operations - no I don't know if this is one of them.
2) More importantly, if you look back through the archives, there was a thread not long ago about poor performance of rsync for "very large" file counts - and 45 million is "large". I didn't pay much attention, but IIRC the originator of that thread was proposing some alterations to improve things.
3) While rsync is designed to operate efficiently over slow/high latency links - 100 Mbit/s is always going to have an impact on throughput.
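
As a back-of-the-envelope check on point 3, the raw link speed already sets a floor on the transfer time. The sketch assumes decimal units, the 1.8 TB and 100 Mbit figures from the original post, and an efficiency factor that is purely illustrative (0.64 corresponds to about 8 MB/s, the middle of the 7-9 MB/s reported there).

```python
def transfer_hours(total_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move total_bytes at the given link speed."""
    bytes_per_second = link_bits_per_second / 8 * efficiency
    return total_bytes / bytes_per_second / 3600

TOTAL_BYTES = 1.8e12  # 1.8 TB, decimal
LINK = 100e6          # 100 Mbit/s

print(transfer_hours(TOTAL_BYTES, LINK))                   # wire speed
print(transfer_hours(TOTAL_BYTES, LINK, efficiency=0.64))  # about 8 MB/s
```

At wire speed that is about 40 hours, and at around 8 MB/s about 62 hours, i.e. two to three days before any per-file overhead is counted.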

As an experiment, can you mount the disk of A locally on B? Shut down the system hosting A and put it in FireWire Target Mode, then connect it to B - A's disk then appears as a local FireWire disk on B. This will show whether AFP has any bearing on performance. If the computer hosting A doesn't support target mode then you're a bit stuffed - but there may be other options.
Or alternately, connect the external disk directly to A's host rather than to B.
Either way, you can then run rsync as a local copy without the network element.


But as I write this, something far far more important comes to mind. Files on HFS disks are not like files on other filesystems (though I believe NTFS has a feature which adds similar complications). I am not sure exactly how rsync handles this - I do recall that Apple's version adds support for the triplet of "metadata + resource fork + data fork". From memory this results in many files getting re-copied every time regardless of whether they were modified or not. Memory is only vague, but I think it was something to do with the comparison of source and dest not working properly when one end is looking at the "whole file" and the other is only looking at one part.

I would suggest doing a test copy using only a small part of the tree, and do the copy again (so no files actually changed) and watch carefully what's been copied. I vaguely recall (from a looong time ago) that any file with a resource fork was re-copied each time even though it's not changed.
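
One way to watch what a repeat run would copy, without copying anything, is rsync's --dry-run together with --itemize-changes. The wrapper below is only a sketch: the function names are made up, it needs rsync on the PATH, and it relies on the documented itemize format, in which lines starting with ">f" are regular files that would be transferred.

```python
import subprocess

def dry_run_command(src, dst):
    """rsync invocation that only reports what it would do."""
    return ["rsync", "-a", "-H", "--dry-run", "--itemize-changes", src, dst]

def count_file_transfers(itemized_output):
    """Count the lines where a regular file would be transferred ('>f...')."""
    return sum(1 for line in itemized_output.splitlines()
               if line.startswith(">f"))

def recopied_files(src, dst):
    """Run the dry run and return how many files rsync wants to send again."""
    result = subprocess.run(dry_run_command(src, dst),
                            capture_output=True, text=True, check=True)
    return count_file_transfers(result.stdout)

# Example (paths are placeholders): after a completed test copy,
# recopied_files("/small/part/of/tree/", "/test/destination/")
# should be 0 if nothing is being re-copied needlessly.
```

A non-zero count on an unchanged tree would support the suspicion that something (resource forks or otherwise) makes files look different on every run.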

If this is the case, and I'm not misremembering, then it's possible that the combination of "rsync not handling very large file sets well" and "resource forks causing issues" could be (at least partly) behind your performance problem.


Another test I'd be inclined to try would be to copy things one restore point at a time. As you'll be aware, each restore point is its own timestamped directory - hardlinked to the previous one for files that haven't changed. Try rsyncing only the last one, then the last two, then the last 3, then the last 4, and so on. You can use --include and --exclude to do this. See how performance varies as the number of included trees increases - I suspect the time taken increases more than linearly given the work involved in tracking hard-links.
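
The include/exclude arguments for "the last N restore points" can be generated rather than typed. This is a sketch under a few assumptions: the rsync source argument is the directory that directly contains the timestamped snapshot directories (with a trailing slash), the timestamped names sort chronologically as text, and the rsync in use supports the "dir/***" pattern (documented as added in 2.6.7). The function name is made up.

```python
def filter_args_for_last(snapshots, n):
    """rsync filter arguments selecting only the n most recent snapshots."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chosen = sorted(snapshots)[-n:]
    # '/name/***' matches the directory itself and everything below it.
    args = [f"--include=/{name}/***" for name in chosen]
    # Everything else at the top level of the transfer is left out.
    args.append("--exclude=/*")
    return args

names = ["2011-06-23-040258", "2010-12-18-070445", "2011-01-22-070305"]
print(filter_args_for_last(names, 2))
```

Timing one run per value of N, each into a fresh destination, would show how the run time grows as more hard-linked trees are included.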

dbonde+forum+rsyn...@gmail.com

Jan 25, 2016, 11:08:32 AM
Thank you. I will try your suggestions. First I will connect the NAS
directly to the computer. Do you recommend USB2 or 1 Gb Ethernet? Or
should I daisy chain the external HD and the NAS? Then it would look like this:

Computer <--FW800--> HD <--USB2--> NAS

The other option is

HD <--FW800--> Computer <--USB2 or Ethernet 1000Mbit --> NAS

But I still must say it is weird that rsync seems slower than Finder. I
also might have a look at ditto or CpMac.

dbonde+forum+rsyn...@gmail.com

Jan 25, 2016, 11:12:19 AM
On 2016-01-25 01:13, Selva Nair wrote:
> On Sun, Jan 24, 2016 at 4:48 PM,
> <dbonde+forum+rsyn...@gmail.com
> <mailto:dbonde+forum+rsyn...@gmail.com>> wrote:
>
> That doesn't make sense. Both the source and destination path
> contains simple alphanumeric characters, no more no less. Why would
> it matter whether the path is /abc/ or /def/ or even /123/?
>
>
> Hmm.. I thought your are the one who has been asking for help. It does
> very much matter what your source and destination exactly are.

Regarding paths, look at

<56A094E9...@gmail.com>
<56A3AE68...@gmail.com>

Simon Hobson

Jan 25, 2016, 5:18:56 PM
dbonde+forum+rsyn...@gmail.com wrote:

> Thank you. I will try your suggestions. First I will connect the NAS

Ah, you didn't mention NAS ! How is it connected to the computer hosting "A" ? If via network then you've added *another* layer.

> directly to the computer (Do you recommend USB2 or 1 Gb Ethernet? Or should I daisy chain external HD and NAS? Then it would look like this:
>
> Computer <--FW800--> HD <--USB2--> NAS

Highly unlikely to work - the drive won't have a Firewire to USB bridge. Having the two different ports is to allow the drive to be connected to *a* host with either Firewire *OR* USB.

> The other option is
>
> HD <--FW800--> Computer <--USB2 or Ethernet 1000Mbit --> NAS

If you use a network connection then you've still got that network layer. If connected via USB, does it appear to the host as "just a drive" ? If so then use that.

> I also might have a look at ditto or CpMac.

Also consider Carbon Copy Cloner - it has a free trial.

Simon Hobson

Jan 26, 2016, 3:07:53 AM
Simon Hobson <li...@thehobsons.co.uk> wrote:


>> The other option is
>>
>> HD <--FW800--> Computer <--USB2 or Ethernet 1000Mbit --> NAS
>
> If you use a network connection then you've still got that network layer.

Just thinking a bit more about that ...

Is your normal setup :
NAS --ethernet--> Computer hosting A
or
NAS --USB--> Computer hosting A

Or is "Computer hosting A" actually the NAS, not another Mac?

Robert DuToit

Jan 26, 2016, 8:01:33 AM
Just chiming in here,

I haven’t read all the previous posts so may be repeating…

Mike Bombich has a good piece on benchmarks for various source/destination scenarios with rsync.

https://bombich.com/kb/ccc3/how-long-should-clone-or-backup-take

Note that copying to a sparsebundle on local media is as fast as copying to a local disk. A sparsebundle holds a Mac filesystem, whereas a NAS share does not, so certain file metadata (hard links, ACLs, groups and owners) may not be supported when writing to the NAS directly. In that case it is important to turn off the rsync options that try to preserve those attributes. A sparsebundle on a NAS, however, is OK for the HFS options.

Network backups are always slower but you can see that sparseimage on network volume (AFP) is better according to Mike’s chart.

I also develop an rsync based backup app (backupList+) and some users have reported very slow times even to sparse images on NAS but I am not the expert Mike is on these matters. Network issues vary for sure….

Cheers, Rob

Simon Hobson

Jan 26, 2016, 11:58:08 AM
Robert DuToit <rdu...@comcast.net> wrote:

> Mike Bombich has a good piece on benchmarks for various source/destination scenarios with rsync.
>
> https://bombich.com/kb/ccc3/how-long-should-clone-or-backup-take

I hadn't seen that link, thanks.
There's an interesting anomaly in the first chart. Not surprisingly, for a single source connection method, increasing the speed of the destination connection increases throughput. And vice versa: for a given destination connection type, faster source connections give more throughput.
Almost !
The fact that internal SATA to internal SATA is slower than SATA-FW800 (either way, but SATA to FW800 is most pronounced) suggests that there is an internal bottleneck when using two internal SATA devices simultaneously on the machine used for the tests.
Also, internal SATA <-> FW800 is the only combination where there's significant asymmetry in rates, and it does look like there is a write speed issue on the internal SATA. It would be interesting to see what speed something like "dd if=/dev/zero of=/dev/${device} bs=1m" gets.
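A somewhat safer variant of that test, as a rough sketch: write to a scratch file on the mounted volume rather than to the raw device, since writing to /dev/${device} overwrites whatever is on it. The function name write_test, the scratch file name and the default size are made up for illustration and do not come from Mike's article.

```shell
# Hypothetical helper: write COUNT MiB of zeros to a scratch file in DIR,
# then report how long it took. bs is given in bytes because BSD dd spells
# it "1m" and GNU dd spells it "1M".
write_test() {
    dir=$1
    count=${2:-1024}
    scratch="$dir/ddtest.$$"
    start=$(date +%s)
    dd if=/dev/zero of="$scratch" bs=1048576 count="$count" 2>/dev/null || return 1
    sync    # ask the OS to flush cached writes before stopping the clock
    end=$(date +%s)
    rm -f "$scratch"
    echo "$count MiB in $((end - start)) s"
}
```

It would be called as something like "write_test /Volumes/destination 2048", where the path is a placeholder. The clock only has one-second resolution, so the test file needs to be large enough to take a good number of seconds.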

What I can say is his figures for SATA -> FW800 are in the same ballpark as I get (with a different backup package) on big files.


> Network backups are always slower but you can see that sparseimage on network volume (AFP) is better according to Mike’s chart.

That's not all that surprising really. If you think about it, it means the source computer can cache filesystem metadata and dirty data locally and only deals with the host for a relatively small number of relatively large files. So it really just has to throw a list of blocks to write at the destination, and the writes can be queued and (presumably if async options are set) overlap requests to keep the pipes full.
Much reduced network overhead compared with dealing with a large number of remote file operations - not so much the bandwidth required, but the latency of all the round trip network conversations that need to take place.

dbonde+forum+rsyn...@gmail.com

Feb 6, 2016, 1:47:04 PM
I scrapped all my previous progress and started over with a different
"connection setup", now the NAS is connected to the computer using 1
Gbit wired ethernet, while the source disk is still using FW800.

On bigger files I now typically get 20-30 MB/s so that was a substantial
improvement. There are still occasional hiccups (I'm waiting for one as I
write this, and this has lasted half an hour) but my impression is that
they are fewer than before.

However, there are still problems.

1. Does rsync leak memory? When I started this transfer Jan 31 (yes, a
week ago) the three rsync processes that were spawned took tens or
maybe hundreds of kB of memory.

Currently, according to Activity Monitor they use

Memory (both "regular" and compressed)

2.69 GB
2.47 GB
2.41 GB (2.40 GB compressed)

They have also used a lot of CPU time:

% ps -auxww | grep rsync
rsync 19545 11 20 SN+ 723:20.57 0.0
rsync 19544 46 20 UN+ 263:32.55 0.0
rsync 19543 31 20 SN+ 936:51.55 0.0

Mostly I still let this computer idle, but when I use it, it sometimes
freezes in a way that reminds me of how it feels when you use a computer
that doesn't have enough memory (it has 8 GB). Although, according to
Activity Monitor, memory pressure is still green (used 6.5 GB, Cache
1.4 GB, Swap 10 GB).
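To tell steady growth from a one-off jump, the memory could be sampled at intervals instead of being read off Activity Monitor once. The following is only a sketch: sum_rss_mb is a made-up name, and it assumes ps prints resident size in KiB in the second column. Resident size will not necessarily match Activity Monitor's "Memory" column, which may also count compressed pages.

```shell
# Hypothetical helper: read "PID RSS COMMAND" lines on stdin (RSS in KiB)
# and print the total resident memory, in MiB, of every process whose
# command contains PATTERN. The ps header line is skipped automatically
# because "COMM" does not contain the pattern.
sum_rss_mb() {
    awk -v pat="$1" 'index($3, pat) { kib += $2 } END { printf "%.1f\n", kib / 1024 }'
}

# Intended use, one timestamped sample per call (log path is a placeholder):
#   echo "$(date '+%F %T') $(ps -axo pid,rss,comm | sum_rss_mb rsync)" >> /tmp/rsync-mem.log
```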


2. Something is broken with the transfer and rsync's handling of hard links.

On the source:
% du -sh
1.0T .

on the destination

% du -sh
3.9T .

Yes, the destination is 4 times bigger and I have only transferred about
40% (some 80 directories out of 198) of the contents of the source (I am
currently at 63 million files).

I think I read somewhere that Time Machine uses a special filesystem
feature, hard links to directories. Does rsync handle this?
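A quick check of whether hard links to files survive the copy could look like the sketch below. du counts a hard-linked file only once per run, so a destination much larger than the source suggests that links ended up as separate copies. The name count_hardlinked is made up, and the check says nothing about hard links to directories.

```shell
# Hypothetical helper: count regular files under DIR that have more than
# one hard link. Run it on the same snapshot directory on the source and
# on the destination and compare the two numbers.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

If the source reports millions of such files for a snapshot and the destination reports close to none, the links were not preserved.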


I have also gathered some rudimentary statistics, the size of the
logfile of rsync as well as the number of lines (two lines per
transferred file) in the log file. Start time was Jan 31, 13:11


zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 1.6G 1 Feb 01:06 /tmp/rsync.log
zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 2.9G 1 Feb 09:11 /tmp/rsync.log
zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 8.4G 2 Feb 18:26 /tmp/rsync.log

zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 9.7G 3 Feb 01:37 /tmp/rsync.log
zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 12G 3 Feb 22:21 /tmp/rsync.log

zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 15G 4 Feb 21:20 /tmp/rsync.log
zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 15G 4 Feb 23:49 /tmp/rsync.log
zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 17G 5 Feb 07:22 /tmp/rsync.log

zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 18G 5 Feb 18:24 /tmp/rsync.log

zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 19G 6 Feb 10:18 /tmp/rsync.log

zsh-% ls -lh /tmp/rsync.log
-rw-r--r-- 1 db wheel 20G 6 Feb 19:04 /tmp/rsync.log

As you can see in the first 12 h the log grew to 1.6 GB. After 53 h it
was 8.4 GB but then the growth slowed down and in the last 24 h it just
grew 2 GB (18 to 20) (the last and the third last points). This is not
super exact but I think it gives a rough indication of the performance.


A similar metric but this time counting the lines in the log:

zsh-% date; wc -l /tmp/rsync.log
Mon 1 Feb 2016 09:23:41 CET
20045861 /tmp/rsync.log

zsh-% date; wc -l /tmp/rsync.log
Mon 1 Feb 2016 18:20:21 CET
29886346 /tmp/rsync.log

zsh-% date; time wc -l /tmp/rsync.log
Mon 1 Feb 2016 18:21:01 CET
29906243 /tmp/rsync.log
wc -l /tmp/rsync.log 6.53s user 4.51s system 46% cpu 23.860 total
zsh-%

zsh-% date; time wc -l /tmp/rsync.log
Wed 3 Feb 2016 01:37:28 CET
66717053 /tmp/rsync.log
wc -l /tmp/rsync.log 14.57s user 9.09s system 55% cpu 42.701 total

zsh-% date; time wc -l /tmp/rsync.log
Wed 3 Feb 2016 22:21:29 CET
83738765 /tmp/rsync.log
wc -l /tmp/rsync.log 18.44s user 12.98s system 39% cpu 1:19.75 total

zsh-% date; time wc -l /tmp/rsync.log
Thu 4 Feb 2016 21:20:57 CET
103578124 /tmp/rsync.log
wc -l /tmp/rsync.log 22.70s user 16.81s system 30% cpu 2:07.54 total

zsh-% date; time wc -l /tmp/rsync.log
Thu 4 Feb 2016 23:46:19 CET
106439787 /tmp/rsync.log
wc -l /tmp/rsync.log 23.41s user 18.00s system 26% cpu 2:34.63 total

zsh-% date; time wc -l /tmp/rsync.log
Fri 5 Feb 2016 07:22:06 CET
113853825 /tmp/rsync.log
wc -l /tmp/rsync.log 24.99s user 18.90s system 31% cpu 2:20.75 total

zsh-% date; time wc -l /tmp/rsync.log
Fri 5 Feb 2016 18:24:44 CET
122665638 /tmp/rsync.log
wc -l /tmp/rsync.log 26.91s user 19.15s system 42% cpu 1:48.37 total

zsh-% date; time wc -l /tmp/rsync.log
Sat 6 Feb 2016 10:18:36 CET
134194513 /tmp/rsync.log
wc -l /tmp/rsync.log 29.38s user 20.62s system 41% cpu 1:59.18 total

zsh-% date; time wc -l /tmp/rsync.log
Sat 6 Feb 2016 19:05:15 CET
136240005 /tmp/rsync.log
wc -l /tmp/rsync.log 29.85s user 21.57s system 33% cpu 2:32.81 total
zsh-%
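The samples above can be turned into a rough files-per-hour figure. This is a sketch only: files_per_hour is a made-up name, the elapsed seconds are worked out by hand from the date output, and it relies on the two-lines-per-file assumption stated earlier.

```shell
# Hypothetical helper: files per hour between two `wc -l` samples.
# Arguments: first line count, second line count, seconds between samples.
# Two log lines are assumed per transferred file.
files_per_hour() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%d\n", (b - a) * 3600 / (2 * s) }'
}

# Fri 5 Feb 18:24:44 -> Sat 6 Feb 10:18:36 is 57232 s
files_per_hour 122665638 134194513 57232
# Sat 6 Feb 10:18:36 -> Sat 6 Feb 19:05:15 is 31599 s
files_per_hour 134194513 136240005 31599
```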




What gives?

dbonde+forum+rsyn...@gmail.com

Feb 6, 2016, 1:49:28 PM
On 2016-01-26 at 09:06, Simon Hobson wrote:
>>> The other option is
>>>
>>> HD <--FW800--> Computer <--USB2 or Ethernet 1000Mbit --> NAS
>>
>> If you use a network connection then you've still got that network layer.
> Just thinking a bit more about that ...
>
> Is your normal setup :
> NAS --ethernet--> Computer hosting A
> or
> NAS --USB--> Computer hosting A
>
> Or is "Computer hosting A" actually the NAS, not another Mac?

NAS <--ethernet--> computer <--FW800--> external disk

The NAS can handle external USB drives and the external disk has USB2 so
I could set it up like this

external disk <--USB2--> NAS <--ethernet--> computer

but I assumed it meant that I had to transfer the data from the external
disk to the computer via both USB and ethernet and then transfer it back
to the NAS over ethernet which, intuitively, sounded slow.

Simon Hobson

Feb 6, 2016, 4:43:27 PM
dbonde+forum+rsyn...@gmail.com wrote:

> NAS <--ethernet--> computer <--FW800--> external disk
>
> The NAS can handle external USB drives and the external disk has USB2 so I could set it up like this
>
> external disk <--USB2--> NAS <--ethernet--> computer
>
> but I assumed it meant that I had to transfer the data from the external disk to the computer via both USB and ethernet and then transfer it back to the NAS over ethernet which, intuitively, sounded slow.


Your intuition is correct - on two counts.
First off, USB2 is a lot slower than FW800 - it's often slower than FW400 even though on paper it's faster at 480Mbps. That's all down to the underlying architecture.
The extra trip over gigabit ethernet probably won't slow things down. However, hosting the drive via the NAS may well cause another bottleneck, for two reasons: firstly there's the overhead of the hosting itself, and secondly the load of hosting the source drive adds to the load of hosting the destination drive.


dbonde+forum+rsyn...@gmail.com wrote:

> I scrapped all my previous progress and started over with a different "connection setup", now the NAS is connected to the computer using 1 Gbit wired ethernet, while the source disk is still using FW800.
>
> On bigger files I now typically get 20-30 MB/s so that was a substantial improvement.

That's still poor. That's only 160 to 240 Mbps - while you should easily get closer to 800 on large files.
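The conversion behind those figures, as a trivial sketch. The function name is made up, and 1 MB/s is simply taken as 8 Mbit/s with protocol overhead ignored.

```shell
# Hypothetical helper: convert MB/s to Mbit/s (8 bits per byte).
mbytes_to_mbits() {
    awk -v m="$1" 'BEGIN { printf "%d\n", m * 8 }'
}

mbytes_to_mbits 20   # prints 160
mbytes_to_mbits 30   # prints 240
```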

TBH I'd be pointing my finger at the NAS and wondering about its performance. Some makes/models have a reputation for rather crap performance, while others are much better.

I would suggest trying a couple of tests with a large file (or a small number of largish files) - at least several GB. Compare copying it with the Finder, using cp in a terminal window, and using rsync. Repeat with a similar amount of data in small files. That should give an idea of raw throughput capacity, and how much overhead in file management slows things down.
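The terminal half of that comparison could be sketched roughly as follows. The helper names, file names and the /Volumes/nas mount point are all placeholders, and the timing only has one-second resolution.

```shell
# Hypothetical helpers for the cp versus rsync comparison.

# Create a test file of SIZE MiB filled with zeros.
# Usage: make_testfile PATH SIZE
make_testfile() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# Run a command and print how many whole seconds it took.
# Usage: bench LABEL COMMAND [ARGS...]
bench() {
    label=$1; shift
    start=$(date +%s)
    "$@" || return 1
    end=$(date +%s)
    echo "$label: $((end - start)) s"
}

# Example, with /Volumes/nas standing in for the mounted destination:
#   make_testfile /tmp/big.bin 4096
#   bench cp    cp /tmp/big.bin /Volumes/nas/big-cp.bin
#   bench rsync rsync -W /tmp/big.bin /Volumes/nas/big-rsync.bin
```

A file of zeros compresses extremely well, so either leave out -z for this test or use a real large file such as a disk image or video.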

> Mostly I still let this computer idle but when I use it it sometimes freezes in a way that reminds me of how it feels when you use a computer that hasn't enough memory (it has 8 GB). Although, according to Activity Monitor, memory pressure is still green (used 6.5 GB, Cache 1.4 GB, Swap 10 GB).

Given those memory numbers, I'm not surprised you get pauses. I'm guessing you have 8G of real RAM, and it's up to 10G of swap. I have MenuMeters installed with a disk indicator - it's noticeable that under certain conditions the system gets "slow to respond" while the disk activity light is on solid (or nearly so). Swapping to/from disk is always very slow - several orders of magnitude slower than normal memory access.
When transferring large quantities of data, that tends to flush everything else from the cache - so as soon as you do anything needing disk access, your request has to go to disk (as it's not in cache) and it also goes onto a queue. That makes interactive work feel very slow - how it feels is out of proportion to the actual slowdown. I notice it when I'm doing backups - I use Retrospect and on large files it can reach something in the order of 500Mbps throughput overall, much much less with lots of smaller files when other overheads take over (not least, lots of disk head movement seeking each small file).