Globus Google Drive connector transfer slow


Anthony Weaver

Jul 3, 2024, 3:32:35 PM
to Discuss
We've set up Globus Connect Personal on a Windows machine connected to an NMR instrument.  We then set up a recurring task that runs every ten minutes and syncs (rather than fully transfers) only new directories and files in a data folder from the computer to a Google Drive shared drive using our Globus Google Drive connector.  Basically, our researchers and their students want to share the data from the instrument through Google Drive but don't want everyone logging in and out of Google to upload data.

This is all working fine, except that the transfer times are very slow.  Each sync takes approximately 17 minutes to run even though it may only have 100-200 new files.  File sizes are tiny.  One sync even had no files to transfer and still took 17 minutes.  Also, the syncing is happening to a Globus guest collection we created and not directly to a mapped collection.

Thank you in advance for any replies

Tony Weaver

Is there some kind of setting we can adjust to make this faster?  When we set up the Google Gateway we did not set the --google-drive-user-api-rate-quota option because I do not believe we have a rate better than the default.

Lev Gorenstein

Jul 3, 2024, 4:55:50 PM
to Anthony Weaver, Discuss
Tony,

How large is the data folder (total), and what's your sync criteria?

If you are using checksum, then it has to be computed for every present file on the source, then cross-checked with the cloud version - and that's an expensive lot of API calls for cloud storage.  And the cost has a component that's proportional to the sheer volume of the source (even when there are no changes).
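As a rough illustration of the point above, here is a toy cost model. It is only a sketch, not a description of how Globus actually schedules its calls: the function names, the one-call-per-file figures, the folder size, and the effective call rate are all made-up assumptions. It shows why a pass with nothing new can take almost as long as a pass with a couple of hundred new files.

```python
def estimated_sync_calls(existing_files: int, new_files: int,
                         calls_per_check: int = 1,
                         calls_per_upload: int = 1) -> int:
    """Rough count of cloud API calls for one sync pass.

    Every file already present on the source has to be compared against
    the destination, so the first term grows with the size of the folder
    even when nothing has changed. The per-file call counts are assumptions.
    """
    return existing_files * calls_per_check + new_files * calls_per_upload


def estimated_sync_seconds(calls: int, calls_per_second: float) -> float:
    """Duration of the pass if calls proceed at a fixed effective rate."""
    return calls / calls_per_second


# Hypothetical numbers: 5,000 files already in the folder, 5 calls per second.
idle = estimated_sync_calls(existing_files=5_000, new_files=0)
busy = estimated_sync_calls(existing_files=5_000, new_files=200)
print(estimated_sync_seconds(idle, 5.0))  # 1000.0
print(estimated_sync_seconds(busy, 5.0))  # 1040.0
```

In this model the 200 new files add only 40 seconds; the bulk of the time comes from re-checking what is already there.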

At $JOB-1 I saw a similar effect when a user launched a multi-TB checksum-based sync from disk to an HPSS tape archive, causing a lot of re-staging :)


Lev

--
Lev Gorenstein
Solutions Architect
Globus // University of Chicago
e: l...@globus.org

Anthony Weaver

Jul 3, 2024, 7:55:04 PM
to Discuss, l...@globus.org, Discuss, Anthony Weaver
After doing some testing based on feedback from Lev, we found that we can go from the NMR computer running Globus Connect Personal to a POSIX collection fairly quickly, and from a POSIX collection to the Google Drive collection fairly quickly. So we will have to continue troubleshooting why we can't go from the NMR computer directly to Google quickly.

Mark Yashar

Jul 3, 2024, 8:12:15 PM
to Anthony Weaver, Discuss, l...@globus.org
Hi Anthony,

In case it is useful, Stanford University has public documentation on some of the challenges users run into when transferring files to and from Google Drive using Globus. I've copied and pasted some of the relevant documentation below:

Service Limitations

Globus has a number of limitations when working with Google Drive. These limits do not affect most common use cases, but they might affect you, so you should review them before starting to use Globus with Drive.

Drive Limitations

First, per the changes announced on January 23, there is now a 50 GB storage limit on your Stanford Google Drive space (the “My Drive” space), as well as a limit of 50 GB for each Shared Drive. If the transfer causes you to hit quota, it will eventually fail.

Next, there are some limits imposed by Google on all users of Drive:

The maximum file size on Google Drive is 5 TB. However, since the quota for all Drive spaces is 50 GB, the maximum file size on Google Drive is effectively 50 GB.

Users are limited to uploading 750 GB of data to Google Drive per calendar day. That includes both your own personal Drive, and all Shared Drive spaces. Although individual Drive spaces (your own Google Drive, and Shared drives) have a 50 GB quota, you can still reach the upload quota if you upload to many different spaces per day. Transfers that involve moving more than 750 GB of data into Drive will be automatically paused when this limit is reached, and resumed when your limit is reset. Google does not explicitly define when the calendar day resets.

If you hit the daily upload limit, and there is a file transfer in progress, that transfer will be allowed to complete. This is how it is possible to upload a multi-TB file, even when the daily upload limit is lower.

There are also limits specific to Shared Drives:

Google places a limit on the maximum number of items in a Shared Drive. The limit is 400,000 items per Shared Drive. Each file, folder, and Shortcut counts as one item.

All of the Google Drive general limitations mentioned above also apply to Shared Drives.

Globus Limitations

There are a number of limitations specific to Globus:

Globus does not support setting or copying custom permissions. Uploaded items will inherit the permissions of the parent folder.

Globus is also not able to copy Drive Shortcuts. Drive Shortcuts are similar to shortcuts on Windows or aliases on macOS. At this time, Globus is not able to copy or follow Shortcuts; attempting to copy a Shortcut will create an empty file at the destination, and generate a checksum (integrity-checking) error. Support for Shortcuts is currently on Globus’ backlog for implementation.

Files created by Google Products

Special limits apply to ‘files’ created by Google products, such as Docs, Sheets, Slides, etc.

When you ‘save’ one of these ‘files’ in Drive, Google does not actually store the data in Drive. Instead, Google Drive holds a pointer to the data within the specific Google product.

If you copy one of these ‘files’ from Drive, a file will be written at the destination, but the file will only contain the pointer data. For example, copying a Google Sheet will create a file with a .gsheet extension. The file is typically in JSON format, and if it has a url key, that will contain the URL needed to access the content in the appropriate Google product.

Also, if the destination is an endpoint system (like a desktop or laptop computer), and the Google Drive software is running at the destination, double-clicking the file will launch your default web browser, opening the appropriate Google product. For example, double-clicking on a .gsheet file will cause your preferred web browser to open the spreadsheet in Google Sheets.

If you try to upload one of these ‘pointer files’ back to Google Drive, the behavior is undefined. As of April, 2022, the following behavior was observed when uploading a .gsheet file to Drive:

  • The Google Sheet appeared in the new Drive location, and was still present in the original Drive location.

  • Globus reported a checksum verification failure after the upload.

  • Attempting to remove the Sheet from the new Drive location caused the sheet to move to the Trash. Had it not been restored, the Sheet would have been deleted after 30 days.

Therefore, it is best to exercise caution when transferring ‘files’ created by Google products. As an alternative, the content can be exported into a different format (for example, .xlsx for Google Sheets), and that exported file may be transferred as normal.

If you are OK with the limitations above, you should move on to authenticating to Google Drive.

Hope this helps

Mark Yashar

Research IT Domain Consultant, UC Berkeley

mya...@berkeley.edu




Steven Lee

Jul 3, 2024, 9:01:08 PM
to Anthony Weaver, Discuss, l...@globus.org, Mark Yashar
Hi Anthony,

In addition to the limitations listed by Mark, the Google Drive API also has rate limits: https://support.google.com/a/answer/10445916?hl=en. I quote:

Upload limits, quotas, and exemptions

Other system limits protect your data and ensure performance. For details, see Storage and upload limits for Google Workspace and Drive API usage limits.

To avoid exceeding the limits and quotas, use the following best practices and Monitor API quotas. If you think you might exceed these quotas, request an increase.

  • Batch multiple calls into a single request using the Google Drive API, especially when changing file metadata for many items that belong to the same user. See Google Drive API.
  • Do not allow a service account to create more than 400,000 files on behalf of a given user account.
  • The default quota limits for Drive API are 20,000 calls every 100 seconds, both per user and per project. This limit applies to the sum of read and write calls.
  • The rate of Drive API write requests is limited—avoid exceeding 3 requests per second of sustained write or insert requests, per account. Note: This rate limit can’t be increased.
  • If you encounter errors during the migration, follow steps to resolve them. See Retry failed requests to resolve errors.
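To put the quoted limits in perspective for a small-file workload, here is a back-of-the-envelope calculation. It assumes exactly one write request per uploaded file, which is a simplification (the real number of calls per file is not stated in this thread); the constant and function names are illustrative only.

```python
# Figures taken from the Drive API limits quoted above.
CALLS_PER_WINDOW = 20_000          # read + write calls, per user and per project
WINDOW_SECONDS = 100
SUSTAINED_WRITES_PER_SECOND = 3    # per account; cannot be increased


def max_calls_per_second() -> float:
    """Overall call ceiling implied by the 20,000 / 100 s quota."""
    return CALLS_PER_WINDOW / WINDOW_SECONDS


def min_seconds_for_writes(n_writes: int) -> float:
    """Fastest possible time to issue n_writes at the sustained write rate."""
    return n_writes / SUSTAINED_WRITES_PER_SECOND


def max_writes_in(seconds: int) -> int:
    """Most write requests that fit in a time window at the sustained rate."""
    return seconds * SUSTAINED_WRITES_PER_SECOND


print(max_calls_per_second())                  # 200.0
print(round(min_seconds_for_writes(200), 1))   # 66.7
print(max_writes_in(10 * 60))                  # 1800
```

Under that one-write-per-file assumption, 200 small files need at least about a minute of write time, and a ten-minute window can hold at most 1,800 writes.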

I tested the Google Drive connector when Globus first released it many years ago. I ran into the Google Drive API rate limit in my small file upload tests right away. I tried to get Google to raise the API rate limit without success and concluded Google Drive is not useful for research data storage unless one is willing to tar up the small files first. Cheap storage is slow….
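A minimal sketch of the "tar up the small files first" approach, using only the Python standard library. The function name and the choice of gzip compression are assumptions for illustration; nothing here is part of Globus or the Google Drive connector.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one gzip-compressed tar.

    Returns the number of files added. One archive means one upload
    instead of one upload per small file.
    """
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # was written somewhere inside src_dir.
            if not path.is_file() or path.resolve() == archive:
                continue
            # Store paths relative to src_dir so the archive unpacks cleanly.
            tar.add(path, arcname=path.relative_to(src).as_posix())
            count += 1
    return count
```

The trade-off is that the individual files are no longer browsable in Drive; whoever downloads the archive has to unpack it first.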


Steven Lee
Cornell University Center for Advanced Computing
721 Rhodes Hall, Ithaca NY 14853
Office: 607-255-2843
Email: sh...@cornell.edu

Anthony Weaver

Jul 3, 2024, 9:16:05 PM
to Discuss, sh...@cornell.edu, l...@globus.org, mya...@berkeley.edu, Anthony Weaver
Thank you for all the replies.  Given the many rate limits Google places on things, I'm guessing that's my bottleneck.  We are dumping to Google Drive because the chemists and their students wanted to share the data that way, and they want it available there soon after the equipment finishes so they can make adjustments before the next run.  Because of that, we set up a recurring data transfer for every 5 minutes between the NMR computer and Google Drive to try to automate things.  In that 5 minutes, maybe only 200 new files are produced, but if the chemists are running a lot of experiments then we're probably hitting one or more of Google's limits in a very short period of time.  I think we'll just have to not use Google Drive in this scenario.

Anthony Weaver

Jul 3, 2024, 9:17:30 PM
to Discuss, Anthony Weaver, sh...@cornell.edu, l...@globus.org, mya...@berkeley.edu
I mistyped: the recurring transfers are every 10 minutes, not 5, but the same ideas still apply.