rsync & dolphin


bad sector

Jun 24, 2022, 6:29:56 AM

I copied a folder to another as a backup experiment using

rsync -avh /0/1/this /0/

'this' being an at-the-time inactive home folder. It went
smoothly and ended up with an identical number of files BUT
the total size AS SHOWN IN DOLPHIN is different. How can
this be?

source:
5.9 GiB (6,283,742,065) 32,715 files, 2,476 sub-folders

target:
5.9 GiB (6,282,464,113) 32,715 files, 2,476 sub-folders

but if instead I command

rsync -ach /0/1/this /0/

then the target folder size becomes

5.9 GiB (6,282,496,881)

What rsync command do I need to end up with EXACTLY
the same number of files and total size, the same as
if I had used dd?

The reason I want exactness is because after the copy
I want to add 1 file to the source, then do some sort of
diff test and see as a result ONLY that added file and
its size as the difference.
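That "diff test" can be done with rsync itself: a dry run with itemized changes lists only what differs between source and target. A minimal sketch, using throwaway temp directories in place of the real /0/1/this paths:

```shell
# Sketch (paths are hypothetical temp dirs): after an initial copy,
# a dry run with --itemize-changes lists only what differs.
src=$(mktemp -d); dst=$(mktemp -d)
echo "old data" > "$src/existing.txt"
rsync -a "$src/" "$dst/"            # initial backup
echo "new data" > "$src/added.txt"  # simulate adding one file
# -n (--dry-run) changes nothing on disk; -i itemizes each difference
rsync -ani "$src/" "$dst/"
```

Only the added file (and the directory whose mtime changed) should appear in the dry-run output; the unchanged file is not listed.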


marrgol

Jun 24, 2022, 7:28:38 AM
On 24/06/2022 at 12.29, bad sector wrote:
>
> I copied a folder to another as a backup experiment using
>
> rsync -avh /0/1/this /0/
>
> 'this' being an at he time inactive home folder. It went
> smoothly and ended up with identical number of files BUT
> the total size AS SHOWN IN DOLPHIN is diferent. How can
> this be?
>
> source:
> 5.9 GiB (6,283,742,065) 32,715 files, 2,476 sub-folders
>
> target:
> 5.9 GiB (6,282,464,113) 32,715 files, 2,476 sub-folders

The directories get “defragmented” and reduced in size.
And you are not copying hardlinks (if you have them in the source).


--
mrg

Dan Purgert

Jun 24, 2022, 2:12:57 PM

bad sector wrote:
>
> I copied a folder to another as a backup experiment using
>
> rsync -avh /0/1/this /0/
>
> target:
> 5.9 GiB (6,282,464,113) 32,715 files, 2,476 sub-folders
>
> but if instead I command
>
> rsync -ach /0/1/this /0/
>
> then the target folder size becomes
>
> 5.9 GiB (6,282,496,881)
>
> What rsync command do I need to end up with EXACTLY
> the same number of files and total size, the same as
> if I had used dd?

None will -- there's a distinct difference between a block-level copy
(via dd) and a file-level copy (cp / scp / rsync ) that will affect
"size on disk".

- Block level copies will include any extra fragmentation that the
source has (and therefore overhead), whereas new copies of files
will (on an EXT4 filesystem, anyway) be allocated enough extents
(i.e. contiguous blocks) to complete the write in one go.

- Both of your sets of option switches will not copy any extended ACL
information that may have been in SOURCE ('-A' for that)

- Different filesystems can have different blocksizes, which would
somewhat reduce "used space on disk" as well.
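The block-size point is easy to see with du, which can report both the apparent size (sum of file lengths, what Dolphin adds up) and the space actually allocated:

```shell
# Apparent size (sum of file lengths) vs. space actually allocated.
# A 1-byte file still occupies a whole filesystem block (often 4 KiB).
f=$(mktemp)
printf 'x' > "$f"
du --apparent-size -B1 "$f" | cut -f1   # 1 byte of content
du -B1 "$f" | cut -f1                   # allocated size, e.g. 4096
```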

With the assumption that both '-ah' and '-ach' ('-v' just ups the
verbosity to stdout, so ignoring that switch) were run sequentially with
the same SOURCE and DEST; the increase of ~30 KiB is most likely due to
file changes prompted by simply using the system -- for example, if
$HOME was your SOURCE, some graphical application(s) writing to
.xsession-errors, or new data in your .mozilla caches, etc.

>
> The reason I want exactness is because after the copy
> I want to add 1 file to the source, then do some sort
> diff test and see as a result ONLY that added file and
> its size as the difference.

Easiest way to do this is add the '-P' switch.

It changes the output from something like this (-avch):

sending incremental file list
./
testfile

sent 386 bytes received 38 bytes 848.00 bytes/sec
total size is 192.53K speedup is 454.08

To this (-avchP):
sending incremental file list
./
testfile2
21 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=1/8)

sent 413 bytes received 38 bytes 902.00 bytes/sec
total size is 192.55K speedup is 426.94

Note that in both cases, having nothing to update results in this:

sending incremental file list

sent 342 bytes received 12 bytes 708.00 bytes/sec
total size is 192.55K speedup is 543.93

Note that rsync counts its overhead (i.e. file metadata that is
generated on the fly to be shared from SOURCE to DEST, and DEST response
to SOURCE) in the bytes sent/received. The "Total Size" at the bottom
is the actual size of SOURCE (less overhead).

And if one were to remove the 'verbose' switch, all you get for
"unchanged" output is the single line "sending incremental file list";
which is kind of completely useless :).




--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

bad sector

Jun 24, 2022, 7:55:46 PM
I don't think I have any hard links in /home/userX, nor do I think I could recognise one if I saw one :-)

bad sector

Jun 24, 2022, 8:24:35 PM
Thanks!

I guess I'll just have to get used to the overhead difference.

To get away from dynamic changes I tried doing it from the home of userY as root; this should leave the userX home completely inactive (user homes link to folders on a data disk). I'll tinker around some more, gotta get used to the rsync command and feedback syntax etc.

I love dd, but if I can use rsync to change only what's new in the target backups, that should be much faster and therefore more frequently executable. Once I figure out all the other details I'll probably try to get down to run-level 3, command my cron script to execute before shutdown, and have it write any changes into 3 of 4 backup drives, rotating them so that one is not plugged in at all on any given session ..just in case.

up to now then, I would use

rsync -aAvchP /0/1/this /0/bu1/; rsync -aAvchP /0/1/this /0/bu2/; rsync -aAvchP /0/1/this /0/bu3/

It would be a little more involved, with mounts/umounts in between so that no two of them would be mounted at any one time, followed by some diff script to confirm what was done and a prompt to shut down or not.
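The three chained rsyncs plus the confirming diff could be sketched as one loop. Temp dirs stand in for /0/1/this and the bu1..bu3 targets here, and the mount/umount handling is left out:

```shell
# Sketch: back up one source into three destinations, then run a
# dry-run "diff test" per destination. Temp dirs stand in for the
# real /0/1/this and /0/bu1../bu3; mount/umount handling omitted.
base=$(mktemp -d)
src="$base/this"; mkdir -p "$src"; echo "data" > "$src/file"
for dest in "$base/bu1" "$base/bu2" "$base/bu3"; do
    mkdir -p "$dest"
    rsync -ach "$src" "$dest/"
    # dry run: prints nothing to itemize when the copy is complete
    rsync -ani "$src" "$dest/"
done
```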


Dan Purgert

Jun 24, 2022, 9:42:27 PM

bad sector wrote:
> On 6/24/22 14:12, Dan Purgert wrote:
>> bad sector wrote:
>>> [...]
>>> The reason I want exactness is because after the copy
>>> I want to add 1 file to the source, then do some sort
>>> diff test and see as a result ONLY that added file and
>>> its size as the difference.
>>
>> Easiest way to do this is add the '-P' switch.
>>
>> It changes the output from something like this (-avch):
>> [...]
>> To this (-avchP):
>> sending incremental file list
>> ./
>> testfile2
>> 21 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=1/8)
>>
>> sent 413 bytes received 38 bytes 902.00 bytes/sec
>> total size is 192.55K speedup is 426.94
>
> Thanks!
>
> I guess I'll just have to get used to the overhead difference.

What "overhead" difference? You mean the "semi-accidental
defragmentation" thing where space-on-disk is less?

>
> To get away from dynamic changes I tried doing it from the home of
> userY as root, this should leave the userX home completely inactive
> (user homes link to folders on a data disk). I'll tinker around some

In theory, anyway :)

> up to now then, I would use
>
> rsync -aAvchP /0/1/this /0/bu1/;rsync -aAvchP /0/1/this /0/bu2/;rsync
> -aAvchP /0/1/this /0/bu3/;

Assuming the intention is that "this" gets backed up in triplicate;
something like this:

if [[ bu1 is mounted ]] then
rsync original bu1
else
mount
rsync
fi

if [[ bu2 is mounted ]] then
rsync orig bu2
else
mount
rsync
fi

if [[ bu3 is mounted ]] then
rsync orig bu3
else
mount
rsync
fi
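One hedged way to flesh out that pseudocode: a helper that uses mountpoint(1) to check whether the target is already mounted, tries to mount it if not (this assumes an /etc/fstab entry exists for each backup path), and skips that target on failure. The /0/bu* paths are the poster's; substitute your own.

```shell
# Concrete sketch of the mount-check-then-rsync pseudocode above.
backup_to() {
    local dest="$1"
    if ! mountpoint -q "$dest"; then
        # assumes /etc/fstab knows this path; skip target if mount fails
        mount "$dest" 2>/dev/null || { echo "skipping $dest (mount failed)"; return 1; }
    fi
    rsync -aAvchP /0/1/this "$dest/"   # poster's options and source path
}
for d in /0/bu1 /0/bu2 /0/bu3; do
    backup_to "$d" || true
done
```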

Personally, all (linux) machines I own run a script along these lines:

#get user $HOME from /etc/passwd
dir=$(grep $1 /etc/passwd | cut -d\: -f6)
#add trailing slash to stop extra directory creation on DEST
dir=${dir}/
#rsync to remote host
rsync [OPTS] $dir backups@backup-server:remote/${HOSTNAME}/$1

This script is run daily on the laptop/desktop/VM(s).

In turn, I have an external SSD that I plug into "backup-server" weekly
and rsync the "backups" user $HOME/remote directory to. Finally I have
a bimonthly(ish) copy that goes to the inlaws (well, the SSD anyway).

> It would be a little more involved with mounts/umounts in between so
> that no two of them would be mounted at any one time.

What benefit would there be for only having one backup target active at
a time?


bad sector

Jun 24, 2022, 11:02:28 PM
On 6/24/22 21:42, Dan Purgert wrote:
> bad sector wrote:
>> On 6/24/22 14:12, Dan Purgert wrote:
>>> bad sector wrote:
>>>> [...]
>>>> The reason I want exactness is because after the copy
>>>> I want to add 1 file to the source, then do some sort
>>>> diff test and see as a result ONLY that added file and
>>>> its size as the difference.
>>>
>>> Easiest way to do this is add the '-P' switch.
>>>
>>> It changes the output from something like this (-avch):
>>> [...]
>>> To this (-avchP):
>>> sending incremental file list
>>> ./
>>> testfile2
>>> 21 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=1/8)
>>>
>>> sent 413 bytes received 38 bytes 902.00 bytes/sec
>>> total size is 192.55K speedup is 426.94
>
>> Thanks!
>
>> I guess I'll just have to get used to the overhead difference.
>
> What "overhead" difference? You mean the "semi-accidental
> defragmentation" thing where space-on-disk is less?

yes


> Assuming the intention is that "this" gets backed up in triplicate;
> something like this:
>
> if [[ bu1 is mounted ]] then
> rsync original bu1
> else
> mount
> rsync
> fi
>
> if [[ bu2 is mounted ]] then
> rsync orig bu2
> else
> mount
> rsync
> fi
>
> if [[ bu3 is mounted ]] then
> rsync orig bu3
> else
> mount
> rsync
> fi

I haven't decided how I'm going to do it, conceptually


> Personally, all (linux) machines I own run a script along these lines:
>
> #get user $HOME from /etc/passwd
> dir=$(grep $1 /etc/passwd | cut -d\: -f6)
> #add trailing slash to stop extra directory creation on DEST
> dir=${dir}/
> #rsync to remote host
> rsync [OPTS] $dir backups@backup-server:remote/${HOSTNAME}/$1
>
> This script is run daily on the laptop/desktop/VM(s).
>
> In turn, I have an external SSD that I plug into "backup-server" weekly
> and rsync the "backups" user $HOME/remote directory to. Finally I have
> a bimonthly(ish) copy that goes to the inlaws (well, the SSD anyway).

I had been doing bu's about once every 2-4
weeks but I want to step that up to every
week initially; maybe week-1 to bu-disk-1,
week-2 to bu-disk-2, week 3 to bu-disk-3.
Then on week-4 I'd bu to another disk that
is only then within walking distance of
the box... something along these lines.
Files I work on in any active frenzy I usually
just copy manually into the first bu-disk
each day.



>> It would be a little more involved with mounts/umounts in between so
>> that no two of them would be mounted at any one time.
>
> What benefit would there be for only having one backup target active at
> a time?

Old habit: mount/umount everything as needed,
never leave too much mounted.

This was easy at one point when I was using
a cascading KDE widget that let me mount/umount
anything with two clicks to existing dedicated
mountpoints (long story).

https://i.imgur.com/kUg7w8Z.png


Aragorn

Jun 25, 2022, 6:56:00 AM
On 24.06.2022 at 19:55, bad sector scribbled:

> I don't think I have any hard links in /home/userX, nor do I think I
> could recognise one if I saw one :-)

A hard-link is, for all intents and purposes, an additional file name,
which may or may not be stored in the same directory as the original
file. You can recognize a hard-link by using "ls -l" and looking at
the link counter, e.g. ...


[nx-74205:/dev/pts/2][/home/aragorn]
[aragorn] > ls -l .bashrc
-rw------- 1 aragorn aragorn 8299 Jan 29 07:58 .bashrc
           ↑
           link counter


If the link counter reads "2" or higher, then the file has multiple
hard-links.

You can also check using the "stat" command, e.g. ...


[nx-74205:/dev/pts/2][/home/aragorn]
[aragorn] > stat .bashrc
File: .bashrc
Size: 8299 Blocks: 24 IO Block: 4096 regular file
Device: 0,51 Inode: 260 Links: 1
Access: (0600/-rw-------) Uid: ( 1000/ aragorn) Gid: ( 1000/ aragorn)
Access: 2019-04-27 19:10:18.766646000 +0200
Modify: 2022-01-29 07:58:09.037628385 +0100
Change: 2022-01-29 07:58:09.037628385 +0100
Birth: 2019-04-27 19:17:24.826641978 +0200
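And to answer the original doubt directly: rather than inspecting files one at a time, find can sweep a whole tree for regular files whose link count exceeds 1. A small self-contained demo (temp dir and file names are made up):

```shell
# Find hard-linked regular files anywhere under a directory tree:
# files with more than one link show up under -links +1.
d=$(mktemp -d)
echo "content" > "$d/original"
ln "$d/original" "$d/extra-name"   # create a second hard link
find "$d" -type f -links +1       # lists both names
```

Running this against /home/userX with no output would confirm there are no hard-linked files to worry about.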


--
With respect,
= Aragorn =

bad sector

Jun 25, 2022, 7:38:15 PM
On 6/25/22 06:55, Aragorn wrote:
> On 24.06.2022 at 19:55, bad sector scribbled:
>
>> I don't think I have any hard links in /home/userX, nor do I think I
>> could recognise one if I saw one :-)
>
> A hard-link is, for all intents and purposes, an additional file name,
> which may or may not be stored in the same directory as the original
> file. You can recognize a hard-link by using "ls -l" and looking at
> the link counter, e.g. ...
>
>
> [nx-74205:/dev/pts/2][/home/aragorn]
> [aragorn] > ls -l .bashrc
> -rw------- 1 aragorn aragorn 8299 Jan 29 07:58 .bashrc
> ↑
> ↑
> link counter
>
>
> If the link counter reads "2" or higher, then the file has multiple
> hard-links.
>
> You can also check using the "stat" command, e.g. ...


Jesus, Aragorn, I didn't need this this evening after a hard day of heavy-equipment maintenance, I don't even know what planet I'm on right now :-)

I think I'm onto it: hard links are like workstations with CPUs all working the same data, albeit possibly differently; soft links are like extra keyboards and monitors connected to ONE of the workstations. But I still can't think of any use for the former.

David W. Hodgins

Jun 25, 2022, 7:51:49 PM
On Sat, 25 Jun 2022 19:38:07 -0400, bad sector <forgetski@postit_invalid_.gov> wrote:
> I think I'm onto it, hard links are like work stations with cpu's all working the same data albeit possibly differently, soft-links are like extra keyboards and monitors connected to ONE of the work stations. But I still can't think of any use for the former.

A softlink (symbolic link) is a directory entry that stores the path of another
directory entry for the file, which in turn points to the inode within a file
system. Delete the file, and the softlinks are broken, pointing to nothing.

A hard link is multiple directory entries all pointing to the same inode. Delete
one of them, and the others are unaffected other than the link count stored in
the inode.
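That difference in behaviour is easy to see in a few commands (temp dir and file names here are made up for the demo):

```shell
# Deleting the original name leaves a hard link intact but
# breaks a symlink, exactly as described above.
d=$(mktemp -d); cd "$d"
echo "payload" > file
ln file hard            # second directory entry, same inode
ln -s file soft         # separate entry storing the path "file"
rm file
cat hard                # still prints "payload"
cat soft 2>/dev/null || echo "soft is dangling"
```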

Regards, Dave Hodgins