DISCLAIMER: Uploading to a cloud storage system implies that you trust the maintainers of that system, and everyone in between, not to mess with or read your data. If this is a concern for you, but you want to upload to Google Drive anyway, please consider using one of the encryption methods mentioned here (or another encryption method of your choosing): to.linuxcareer.com/using-openssl-to-encrypt-messages-and-files (Thanks to mostlyharmless for this info.)
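For example, a minimal sketch of the OpenSSL approach (the file names here are placeholders, and you will be prompted for a passphrase):

# encrypt a backup archive before uploading it
openssl enc -aes-256-cbc -salt -in backup.tar -out backup.tar.enc
# decrypt it again after downloading
openssl enc -d -aes-256-cbc -in backup.tar.enc -out backup.tar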
UPDATE Feb. 12 2016: The gsync project has had some problems with show-stopping bugs, so you might want to try using rclone, instead. This Linux user has had good experiences with rclone. It's worth noting that, although this HowTo is Google Drive-specific, rclone works with about a dozen cloud storage services, including Amazon S3 and Dropbox. --DaneM
First, get rclone from the project's download page. Download the appropriate file for your operating system and architecture. Decompress the file and either run it from the directory it creates or copy it to somewhere in your PATH, like "/usr/local/bin/", so you can run it from any location on the command line. Like gsync, this is a command-line program, so you will need to either run it from the terminal or write a shell script (BASH/SH recommended) to make it do what you want via a double-click. If you choose to do the latter, remember to make your shell script executable with "chmod +x" (followed by the script's filename), or by using your preferred desktop environment's Properties dialogue.
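A rough sketch of the install steps (the archive and script names below are placeholders for whatever you downloaded and wrote):

# unpack the downloaded archive and put the binary somewhere in your PATH
unzip rclone-*-linux-amd64.zip
sudo cp rclone-*-linux-amd64/rclone /usr/local/bin/
# if you wrap the backup in a shell script, make that script executable
chmod +x mybackup.sh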
When you run "rclone config", you will get to pick a name and type for your remote storage medium (referred to as "drive" in this HowTo), and it will ask you to do a few steps to authenticate with your Google account. Follow the instructions! If you run into trouble, go here:
This will upload everything once, then only upload changes later on (like rsync does). However, you probably don't want to execute the command in exactly this way, unless you want all your data dumped into the root folder of the remote location (like a tar bomb). So, instead, format the command like this:
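rclone sync -v / drive:MyBackupFolder/

(The destination folder name above is just an example.)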
This will put all the stuff from "/" into the Google Drive folder called "MyBackupFolder/", creating subdirectories as needed. "-v" provides information about what the program is doing while it executes, and is optional.
If you want to put everything from the remote drive onto your computer, erasing old files on your local machine and replacing them with newer ones from the remote source, just put the source (the remote storage location) first in the command, and put the destination (your local machine) second. (THIS IS DANGEROUS, AND WILL ERASE YOUR LOCAL DATA!) Also, you can transfer data from one place on remote storage to another place on remote storage using the same method: "rclone [options] <source> <destination>". Refer to the documentation for instructions specific to each cloud.
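For example (sketches only; the restore path and second remote name are assumptions):

# download: remote source first, local destination second (makes the local copy match the remote)
rclone sync -v drive:MyBackupFolder/ /home/user/restore/
# remote-to-remote: sync between two configured remotes
rclone sync -v drive:MyBackupFolder/ otherremote:MyBackupFolder/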
It took me a while to find a good, simple, reliable way to back up my stuff to Google Drive using rsync, so I've decided to share my method with the good people at LQ. In short, the answer is to use "gsync" (NOT "grsync", which is different and broken/incomplete). It supports (so far as I can tell) ALL the same options as rsync (glee!), and lets you use them with Google Drive! You can upload to, and download from, GD in this way by picking which to use as the SOURCE/DESTINATION folders.
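For reference, a plain local rsync backup of the sort discussed next might look like this (a sketch; the exact flags are illustrative, apart from "-c" and "--delete", which are covered below):

# back up the contents of /mnt/DATA/ into /mnt/DATA2, verifying with checksums and deleting extraneous files
sudo rsync -avc --delete /mnt/DATA/ /mnt/DATA2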
You can check what the options do using "man rsync". I don't always use the "-c" option, since it's slow (but more thorough for checking the data). This command will delete files from the destination that no longer exist in the source, and will overwrite duplicates. Use with care! Note the trailing slash on the source folder; this is important! A trailing slash on the source folder means to copy the stuff IN the folder into the destination (resulting in "/mnt/DATA2/filesandstuff"). No slash means to copy the folder itself into the destination (which would result in "/mnt/DATA2/DATA/filesandstuff", which is probably not what you want). The destination folder ignores trailing slashes. (Thanks to suicidaleggroll for this clarification.)
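The Google Drive upload then takes the same general form, with the remote as the destination. A sketch (the "drive://" destination syntax may differ between gsync versions, so check the gsync documentation for the exact form it expects):

# upload the important subdirectory to a matching folder on Google Drive
sudo gsync -rtv /mnt/DATA/IMPORTANTSTUFF/ drive://IMPORTANTSTUFF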
Please note that you should probably not upload an entire 1TB+ drive to GD unless you have, and want to use up, all that storage space on the cloud. Therefore, I've specified the subdirectory "/mnt/DATA/IMPORTANTSTUFF" to represent the important files/folders that I absolutely have to have backed up remotely. You'll need to run a separate command for each folder (including subdirectories) that you want to upload in this fashion; make sure to change both the source and destination in the command when you do. (I haven't yet figured out how to do them all as a batch job, short of writing a script for it.) Also, I use root (sudo) for this and the rsync command because it helps manage permissions properly; but if you're certain that the current user/login owns all the files involved, you don't need it (and probably shouldn't use it, as a general security/safety precaution).
Finally, if you want to be able to walk away from it and know how long it actually took when you come back, you can prepend the "time" command to the gsync or rsync command, like so:
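time sudo rsync -avc --delete /mnt/DATA/ /mnt/DATA2

(Reusing the example paths from above; substitute your own gsync or rsync command.)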
If you would like to automate this process using a desktop-clickable script, and are having trouble getting it to work with sudo, check out Bash Tips: Acquire root access from within a BASH script using sudo.
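A minimal sketch of that idea (not necessarily the linked article's exact method) is a script that re-launches itself under sudo when it isn't already running as root:

#!/bin/bash
# re-run this script with root privileges if needed
if [ "$EUID" -ne 0 ]; then
    exec sudo "$0" "$@"
fi
# ...then run the backup command of your choice, for example:
rsync -avc --delete /mnt/DATA/ /mnt/DATA2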
Update 1-20-15: A gsync bug is causing problems for uploading large amounts of data. In short, all gsync users are currently using the same Google API key, which has a limit on how much data can be uploaded by gsync in a single day. Patches are in the works to resolve this problem. This bug may also result in only empty directories being created, but the comments have a workaround for that. Please consult the GitHub bug report for further updates.
The gsutil rsync command makes the contents under dst_url the same as the contents under src_url, by copying any missing files/objects (or those whose data has changed), and (if the -d option is specified) deleting any extra files/objects. src_url must specify a directory, bucket, or bucket subdirectory. For example, to sync the contents of the local directory "data" to the bucket gs://mybucket/data, you could do:
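gsutil rsync data gs://mybucket/data

(This matches the directory and bucket named in the sentence above.)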
The -m option typically will provide a large performance boost if either the source or destination (or both) is a cloud URL. If both source and destination are file URLs, the -m option will typically thrash the disk and slow synchronization down.
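For instance (same names as above; note that -m is a top-level gsutil flag, so it goes before "rsync"):

gsutil -m rsync data gs://mybucket/data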
Note 1: Shells (like bash, zsh) sometimes attempt to expand wildcards in ways that can be surprising. Also, attempting to copy files whose names contain wildcard characters can result in problems. For more details about these issues, see Wildcard behavior considerations.
Note 2: If you are synchronizing a large amount of data between clouds, you might consider setting up a Google Compute Engine account and running gsutil there. Since cross-provider gsutil data transfers flow through the machine where gsutil is running, doing this can make your transfer run significantly faster than running gsutil on your local workstation.
The rsync -d option is very useful and commonly used, because it provides a means of making the contents of a destination bucket or directory match those of a source bucket or directory. This is done by copying all data from the source to the destination and deleting all other data in the destination that is not in the source. Please exercise caution when you use this option: it's possible to delete large amounts of data accidentally if, for example, you erroneously reverse source and destination.
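For example, to make the bucket path mirror the local directory, deleting anything in the bucket that is no longer present locally (names as above):

gsutil -m rsync -r -d data gs://mybucket/data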
As mentioned above, using -d can be dangerous because of how quickly data can be deleted. For example, if you meant to synchronize a local directory from a bucket in the cloud but instead run the command:
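gsutil -m rsync -r -d ./your-dir gs://your-bucket

and ./your-dir happens to be empty (or missing the data you expected), you can quickly delete the objects in gs://your-bucket. (The command above is a sketch with placeholder directory and bucket names.)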
Running gsutil rsync over a directory containing operating system-specific file types (symbolic links, device files, sockets, named pipes, etc.) can cause various problems. For example, running a command like:
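gsutil rsync -r ./dir gs://my-bucket

over a directory that contains symbolic links will make gsutil upload whatever the links point to as ordinary files, rather than preserving the links themselves. (The path and bucket names above are placeholders.)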
If you use gsutil rsync as a simple way to back up a directory to a bucket, restoring from that bucket will result in files where the symlinks used to be. At best this is wasteful of space, and at worst it can result in outdated data or broken applications, depending on what is consuming the symlinks.
Since gsutil rsync is intended to support data operations (like moving a data set to the cloud for computational processing) and it needs to be compatible both in the cloud and across common operating systems, there are no plans for gsutil rsync to support operating system-specific file types like symlinks.
While Cloud Storage is strongly consistent, some cloud providers only support eventual consistency. You may encounter scenarios where rsync synchronizes using stale listing data when working with these other cloud providers. For example, if you run rsync immediately after uploading an object to an eventually consistent cloud provider, the added object may not yet appear in the provider's listing. Consequently, rsync will miss adding the object to the destination. If this happens, you can rerun the rsync operation again later (after the object listing has "caught up").
If the -C option is provided, the command instead skips failing objects and moves on. At the end of the synchronization run, if any failures were not successfully retried, the rsync command reports the count of failures and exits with non-zero status. At this point you can run the rsync command again, and gsutil attempts any remaining needed copy and/or delete operations.
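For example (same placeholder names as earlier):

gsutil -m rsync -C data gs://mybucket/data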
If both the source and destination URL are cloud URLs from the same provider, gsutil copies data "in the cloud" (i.e., without downloading to and uploading from the machine where you run gsutil). In addition to the performance and cost advantages of doing this, copying in the cloud preserves metadata (like Content-Type and Cache-Control). In contrast, when you download data from the cloud it ends up in a file, which has no associated metadata other than file modification time (mtime). Thus, unless you have some way to hold on to or re-create that metadata, synchronizing a bucket to a directory in the local file system will not retain the metadata other than mtime.
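For example, a same-provider, in-the-cloud sync between two buckets (bucket names are placeholders) could look like:

gsutil -m rsync -r gs://source-bucket gs://destination-bucket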