Speeding up transfer of file ownership?

Peter Smulders

Jan 10, 2018, 6:14:49 PM
to GAM for G Suite
I am dealing with a set of files numbering in the tens of thousands, if not ten times as many, that need to be transferred from one user to another. This is not a situation like 'everything from UserA to UserB', but rather various folders with many, many subfolders and files. I have a working GAMADV-X command, so that is not the problem. Rather, it is the time it takes. The processing rate seems to fluctuate a bit but comes out at slightly below one file per second. This works out to hours and hours of a running process.

Before I start duct taping a solution, I want to ask if there is a built-in way to have GAMADV-X batch some of the requests, or run parallel processes, etc.

This is a fresh install of GAMADV-X; I have not touched gam.cfg other than the prescribed basic config details.

Thanks for any hints.

--peter

Ross Scroggs

Jan 11, 2018, 2:39:33 PM
to google-ap...@googlegroups.com
Peter,

This isn't easy. What command did you issue?

Ross

Peter Smulders

Jan 12, 2018, 4:13:51 AM
to GAM for G Suite
I used the straightforward

$ gam UserA transfer ownership [FolderID] UserB

I suspect the inherent difficulty in parallelisation lies in having to make sure files and folders are shared before transferring, maintaining execution order down branches to set the correct parent folder and taking asynchronicity into account.

For me personally, I will experiment with more commands in different sessions to transfer branches rather than full trees.
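A minimal sketch of that "one session per branch" idea, assuming the branch folder IDs are already known. The command list simply mirrors the command form quoted above; `build_command`, `run_branches`, `max_sessions` and `runner` are names made up for this illustration and are not part of GAMADV-X.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(source_user, folder_id, target_user):
    # Mirrors the command form quoted in this thread; adjust it to
    # whatever works in your own GAMADV-X install.
    return ["gam", source_user, "transfer", "ownership", folder_id, target_user]


def run_gam(command):
    # Default runner: start the command and return its exit code.
    return subprocess.run(command).returncode


def run_branches(source_user, target_user, folder_ids, max_sessions=4, runner=run_gam):
    """Run one transfer command per branch, at most max_sessions at a time.

    Returns {folder_id: exit_code}.
    """
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        futures = {
            folder_id: pool.submit(runner, build_command(source_user, folder_id, target_user))
            for folder_id in folder_ids
        }
        return {folder_id: future.result() for folder_id, future in futures.items()}
```

Passing a different `runner` (for example one that only prints the command) gives a dry run. Whether several sessions actually finish sooner depends on API rate limits, which this sketch does not account for.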

As an aside for a possible implementation: under the assumption you can access a list of IDs that share a parent folder (i.e. contents of a folder) maybe you could use a recursive algorithm:
  • (using pool of threads of size MAX_THREADS or whatever -- the usual batching facility)
  • (File ID as sole input)
  • Wait for available thread from pool
  • If it is a File --> check if excluded by any of the options to the gam command --> process --> release threadlock.
  • If it is Folder --> unless excluded, transfer folder --> grab a list of folder contents --> release threadlock --> call self on each item in this folder.
I suppose you would preparse the file set for inclusion / exclusion and use a flag to indicate that preparsing has already been done. This way, the parallelisation scales up or down to the limit of MAX_THREADS. It isn't 'pure' parallel batching because the division is across folder branches rather than fixed-size batches (so a very large folder with many files will still take a long time), but given a typical dataset I expect significant speed gains.
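To make the idea concrete, here is a rough Python sketch of the recursive scheme. It is not how GAMADV-X works internally; `is_folder`, `list_children` and `transfer` are hypothetical callables standing in for the real API calls, and `excluded` stands in for the preparsed exclusion set. Two assumptions of mine: an excluded folder excludes its whole branch, and a folder whose transfer fails is not descended into.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def transfer_tree(root_id, is_folder, list_children, transfer,
                  excluded=frozenset(), max_threads=8):
    """Transfer root_id and everything below it using a shared thread pool.

    Returns {item_id: exception} for items whose processing failed.
    """
    failures = {}
    lock = threading.Lock()
    pending = 1                  # items submitted but not yet finished
    done = threading.Event()

    def visit(item_id):
        nonlocal pending
        children = []
        try:
            if item_id not in excluded:
                transfer(item_id)
                if is_folder(item_id):
                    children = list(list_children(item_id))
        except Exception as exc:
            with lock:
                failures[item_id] = exc
        # Count the children before submitting them, so the counter
        # cannot reach zero while work is still outstanding.
        with lock:
            pending += len(children) - 1
            finished = pending == 0
        # The worker does not wait for its children; it hands them to
        # the pool and returns, freeing the thread for other items.
        for child in children:
            pool.submit(visit, child)
        if finished:
            done.set()

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        pool.submit(visit, root_id)
        done.wait()
    return failures
```

The pending counter is there because a worker that blocked on its children's results could exhaust the pool and deadlock on a deep tree; returning immediately and counting outstanding items avoids that.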

In case of API failures, maybe mark the IDs with a failure code and set a flag. Check for the flag at the end of the whole loop and reprocess the failures that might be transient issues.
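A small sketch of that reprocessing loop, assuming a hypothetical `process` callable that raises an exception when an API call fails. `process_with_retry` and `max_passes` are names invented for this example.

```python
def process_with_retry(item_ids, process, max_passes=3):
    """Process every ID, then reprocess only the failed ones.

    Makes at most max_passes passes in total and returns
    {item_id: last_exception} for IDs that never succeeded.
    """
    remaining = list(item_ids)
    failures = {}
    for _ in range(max_passes):
        failures = {}
        for item_id in remaining:
            try:
                process(item_id)
            except Exception as exc:
                # Mark the ID as failed instead of aborting the whole run.
                failures[item_id] = exc
        if not failures:
            break
        # Only the failed IDs are retried on the next pass.
        remaining = list(failures)
    return failures
```

A pause between passes would probably help with rate-limit errors; it is left out here to keep the sketch short.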

(just my $0.02 -- ignore at will)

+KimNilsson

Jan 15, 2018, 5:00:01 AM
to GAM for G Suite
You can always prepend a transfer command with a share command, making sure UserB is shared with the content that is about to be transferred.

Peter Smulders

Jan 15, 2018, 5:34:22 AM
to GAM for G Suite
... making sure UserB is shared with the content that is about to be transferred.

I don't know about the impact on efficiency, but GAMADV-X already takes care of this by default. I doubt that a separate command to ensure the share, added to the time for the transfer command, will be any faster, since there is always the overhead of setting up the data structure, etc., which then has to be done for two commands rather than just once.

I will experiment (just not right now) with running multiple share commands in parallel, operating on different branches of the tree and report back here.

--peter

Kim Nilsson

Jan 15, 2018, 6:55:07 AM
to Google Apps Manager
Yeah, separate threads for separate content is probably always a good idea. Maybe even using separate client accounts to spread the amount and speed of API calls.