I would like to copy the entire file system hierarchy from one drive to another, i.e. the contents of each directory as well as the regular files, on a Linux platform. I would be grateful to know the best way to do that, ideally with Linux's built-in tools. The file system is from the ext family.
To improve the copy speed, add -W (--whole-file) to avoid calculating deltas/diffs of the files. This is the default when both the source and destination are specified as local paths, since the real benefit of rsync's delta-transfer algorithm is reducing network usage.
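For instance, a plain local copy of one tree onto another might look like the sketch below (the mount points are placeholders for your own drives; -W is written out even though rsync would pick it by default for two local paths):

    # -a = archive mode (recursive, preserves permissions, times, symlinks, etc.)
    # -W = copy whole files instead of computing deltas
    rsync -aW /mnt/source-drive/ /mnt/destination-drive/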
Note: doing this may have side effects on running applications. E.g. if your system has applications that need to write to this particular partition exactly when you need to clone it, those applications would have to be stopped while the partition is cloned. Or, if it is your own application, rewrite it to handle this scenario.
The preferred and recommended way of doing disk/partition cloning is to do so on a non-mounted system since that will not have any non-deterministic side effects. The same goes for systems built on the concept of read-only mounts.
"This approach is considered to be better than disk cloning with dd since it allows for a different size, partition table and filesystem to be used, and better than copying with cp -a as well, because it allows greater control over file permissions, attributes, Access Control Lists (ACLs) and extended attributes."
Note that partclone is not installed by default on most systems. Use a live distro like Clonezilla, or install partclone from your distro's package manager (apt-get install partclone on Debian-based systems).
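A minimal sketch of cloning an ext4 partition with partclone (the device names and image path are just examples):

    # save only the used blocks of /dev/sda1 into an image file
    partclone.ext4 -c -s /dev/sda1 -o /mnt/backup/sda1.img
    # restore that image onto another partition later
    partclone.ext4 -r -s /mnt/backup/sda1.img -o /dev/sdb1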
One of the other big reasons for using rsync in a large, recursive copy like this is because of the -u switch (or --update). If there is a problem during the copy, you can fix it up, and rsync will pick up where it left off (I don't think scp has this). Doing it locally, cp also has a -u switch.
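As a rough sketch (paths are placeholders), simply re-running the same command after fixing the problem skips whatever was already copied:

    # -a preserves attributes, -u skips files already up to date at the destination
    rsync -au /mnt/old-drive/ /mnt/new-drive/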
Incidentally, if I ever have to use Windows, I use rsync from Cygwin to do large recursive copies, because of Explorer's slightly brain-dead insistence on starting from the beginning (although I find Finder on OS X even worse).
'dd' is awesome, but ddrescue (apt install gddrescue) is even better. If dd gets interrupted, there is no way to restart (another good reason to use rsync). When you use ddrescue with a logfile, it keeps track of which blocks have been copied.
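A hedged example with GNU ddrescue (the device names and map/log file name are assumptions); re-running the same command resumes from wherever the map file says it stopped:

    # -f is required before ddrescue will write to a block device
    ddrescue -f /dev/sda /dev/sdb /root/sda-to-sdb.map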
When backing up a dual boot Windows/Linux system, I use ntfsclone for the Windows partitions and ddrescue for the Linux partition and dd for the MBR. (I haven't tried to back up a dual boot system using GPT/UEFI.)
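Sketched out, with device names and image paths as placeholders, a backup along those lines might be:

    # MBR plus the classic DOS partition table lives in the first 512 bytes
    dd if=/dev/sda of=/mnt/backup/sda-mbr.img bs=512 count=1
    # Windows partition: ntfsclone stores only the used clusters
    ntfsclone --save-image -o /mnt/backup/sda1-ntfs.img /dev/sda1
    # Linux partition: ddrescue with a map file so it can be resumed
    ddrescue /dev/sda2 /mnt/backup/sda2-ext4.img /mnt/backup/sda2.map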
What I'd love to see is a ddrescue tool that can create files like ntfsclone where unallocated space is marked with control characters. This makes the image not directly mountable, but allows it to be only as big as the contained data.
Note that this prevents the update of access times, so if you need to re-run the rsync, it might attempt to copy ALL files again. To prevent that, add --ignore-times and --checksum options to your rsync line, so that it only copies the files that have been changed since your first attempt.
I'm assuming that none of that stuff is necessary as long as I'm just rsyncing from a computer to a FireWire-connected external drive. Am I wrong in assuming that? Are things really going to be more complicated than that innocuous command?
Rsync works fine across local drives. However, if it detects local paths it automatically goes into --whole-file mode which does not copy the diffs, but just copies the source file over the destination file. Rsync will still ignore files that haven't changed at all though. When bandwidth between the source and destination is high (like two local disks) this is much faster than reading both files, then copying just the changed bits.
In some rare instances, I've had to add additional parameters to account for changes in login accounts across remote machines, changing ports, and even specifying where 'rsync' lives on the remote host... but those are not directly applicable to your question.
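For what it's worth, those extra parameters tend to look something like this (the user, port and remote rsync path are all made-up examples):

    # -e sets the remote shell and its options; --rsync-path points at a non-default rsync binary
    rsync -a -e 'ssh -p 2222 -l backupuser' --rsync-path=/opt/bin/rsync \
        /local/data/ remotehost:/srv/backups/data/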
However, if one or both drives happen to be NTFS-formatted, whether accessed from *nix or from within Windows using MobaXterm/Cygwin, then rsync's incremental functionality won't work well with rsync -a (the archive flag).
One thing you might want to consider when using rsync however is to make use of the --link-dest option. This lets you keep multiple backups, but use hard links for any unchanged files, effectively making all backups take the space of an incremental. An example use would be:
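(A sketch, with the backup directory names as assumptions: backup.1 holds the previous run and backup.0 receives the new one; --link-dest is resolved relative to the destination directory.)

    # unchanged files become hard links into ../backup.1 instead of new copies
    rsync -a --delete --link-dest=../backup.1 /source/ /backups/backup.0/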
It really depends on whether you're running databases or not. Rsync will grab a snapshot of every file, and ignore any intervening writes. If you want to back up a database, you should look at setting up an ignore filter and running DB dump tools before the rsync.
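As a rough illustration (the dump command and paths are assumptions, not part of the original answer): dump the database to a flat file first, then exclude the live data directory from the copy so you don't capture a half-written snapshot.

    # assumes a MySQL/MariaDB server; adjust for your own database
    mysqldump --all-databases > /srv/dumps/all-databases.sql
    # then copy, skipping the live database files
    rsync -a --exclude='mysql/' /var/lib/ /mnt/backup/var/lib/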
Your command as written should work; however, you might want to look at a program called rsnapshot, which is built on top of rsync and keeps multiple versions of files so you can go back and look at things as they were last week or last month. The configuration is pretty easy, and it is really good at space optimization, so unless you have a lot of churn it doesn't take up much more space than a single backup.
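A tiny excerpt of what an rsnapshot configuration can look like (the values are illustrative; the real file separates fields with tabs, and older versions spell "retain" as "interval"):

    snapshot_root   /mnt/backup/snapshots/
    retain  daily   7
    retain  weekly  4
    backup  /home/  localhost/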
Finally, I ended up with 'backup2l - low-maintenance backup/restore tool'; it's easy. I like the way it manages planning and rotation (in levels). I run it from the command line whenever I have my USB external drive attached, but you can also automate it.
Try dirvish to do the backup.
It uses the hard links from rsync in the so-called vaults. You can keep as many of your older dumps as the USB disk can hold, or set it up in an automated way.
Once you understand the idea behind dirvish, it is more convenient to use than rsync with all its options by itself.
I do not use rsync with local drives, but rsync is wonderful for sync, cloning, backup and restore of data between networked Linux systems. A fantastic network-enabled Linux tool worth spending time learning. Learn how to use rsync with hard links (--link-dest=) and life will be good.
In the name of speed, rsync in my experience automatically changes many operational parameters when it believes it has detected two local drives. Making matters worse, what counts as "local" from rsync's perspective is not always really local: for example, rsync sees a mounted SMB network share as a local drive. One can argue, correctly, that for rsync as a program instance those drives are all "local", but that misses the point.
The point is that scripts that operate as expected between a local and a remote drive do not work as expected when the same script is used where rsync sees the data paths as two local drives. Many rsync options seem to change or stop working as expected when all the drives are local. File updates can slow to a crawl when one of the "local" drives is a networked SMB share or a large, slower USB drive.
For example, with "cwrsync -av /local/files/ /mountedSMBshare/files" and no -c (checksum) option, where the need for transfer should be determined by file size and date, with all local drives I see whole files copied between source and destination even when the files have not changed. This is not helpful behavior when one "drive" is a slower SMB networked share and the other a slow NTFS USB drive. An ssh into the SMB share's server would be much better, but that is not always possible, and Windows, much hated, is part of everyday commercial life.
I would have preferred that rsync's operation were consistent regardless of the drives' "location", and that it simply provided an option for the user to invoke "local" operation when a speed advantage was available and helpful. In my humble opinion this would make rsync's operation more consistent, easier to use and more functional.
I use rsync a lot, and I leave the delete trigger off. If I start to notice a large disparity between the two filesystems, I just log in, enable the delete trigger, and manually run it... when it's done, I disable it again.
This provides an added benefit if I accidentally delete something: even if the rsync job has already run, I can just SSH in, copy the file from my backup to my source, and the problem is solved (I do not expose my backup drive to anything but rsync: no SMB, no NFS, etc.).
Thanks for all of the replies. It's helpful to think through this. This wasn't what I had in mind when I created the NAS, but I'm new to this so still learning. I'll let it run as-is for a couple of weeks to assess how quickly the slave drive is filling up.
Rclone is a command-line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Over 70 cloud storage products support rclone, including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
Rclone has powerful cloud equivalents to the unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. Rclone's familiar syntax includes shell pipeline support and --dry-run protection. It is used at the command line, in scripts or via its API.
Rclone really looks after your data. It preserves timestamps and verifies checksums at all times. Transfers over limited bandwidth, intermittent connections, or subject to quota can be restarted from the last good file transferred. You can check the integrity of your files. Where possible, rclone employs server-side transfers to minimise local bandwidth use and transfers from one provider to another without using local disk.
Rclone is mature, open-source software originally inspired by rsync and written in Go. The friendly support community is familiar with varied use cases. Official Ubuntu, Debian, Fedora, Brew and Chocolatey repos include rclone. For the latest version, downloading from rclone.org is recommended.
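A hedged example of the rsync-like workflow (the remote name and bucket path are placeholders you would set up with "rclone config"):

    # preview what would change, then run the real sync
    rclone sync /mnt/data remote:backup-bucket/data --dry-run
    rclone sync /mnt/data remote:backup-bucket/data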