Migration flattened folders


Duncan Isaksen-Loxton

Nov 24, 2021, 2:12:39 AM
to GAM for Google Workspace
Hi Folks, 

This is a bit of an odd one, but I'm hoping someone may have an idea on a magical solution. 

We recently migrated a client from an old Google Workspace tenant into an existing one (background: a merger of companies).

The source Google Drive had 128GB of data in it, across thousands of folders, and after the migration everything has been flattened into the root of My Drive. The folder hierarchy has been created, but the folders are all empty.

Why, we're not sure; the third-party migration tool's engineers can't give me an answer.

The only idea I have now is to script out the tree hierarchy on the source, and try to move everything in the destination based on that data. I can see several issues here, the biggest being that it will need to be based on names, as the file IDs won't match. I may be able to get the mapping from the tool, however.

We could delete and re-migrate, but that will take about 7 days. We could also delete the source and migrate, but that's risky as the user has modified some files (there is a backup elsewhere, however).

Does anyone else have any ideas on how I might be able to work this out for the client? 

Guibson Prieto

Nov 24, 2021, 2:18:42 AM
to google-ap...@googlegroups.com
May I ask what migration tool you are using, and how many users this is about? I'm guessing the issue is probably the mappings from source to destination, since they all need to exist in the destination. A more detailed breakdown would be helpful to determine a path forward.

Guibson Prieto


Duncan Isaksen-Loxton

Nov 24, 2021, 5:13:08 AM
to GAM for Google Workspace
It is the commercial CloudMigrator tool. 

Brian Kim

Nov 24, 2021, 5:48:00 AM
to GAM for Google Workspace
Seems like some of those may have been orphaned folders? Or owned by the user but located in a folder owned by a non-migrating user? Partial user migration is tricky, and CloudM's documentation is extensive but can at times be ambiguous. Even if you generated document mappings, depending on how many passes you ran, you will need to merge the CSVs into one file before being able to do anything useful with them.

I would start by doing something like the below in the source domain. This shows all files regardless of ownership, including the id, title, owner, and the full path to each file and folder.

Brian Kim

Nov 24, 2021, 5:49:55 AM
to GAM for Google Workspace
Correction:

gam redirect csv ./files.csv user username print filelist fields id,title,owners.emailaddress fullpath showownedby any 

Chris River

Nov 29, 2021, 4:36:25 PM
to GAM for Google Workspace
CloudMigrator stores the source file IDs in the properties of migrated files, so you can use this data to match against the source data. The property key you're looking for is named "CloudMigrator-OriginId".

If you run the command gam user <user> show fileinfo <migrated file ID>, you'll see output like this:
properties:
  key: CloudMigrator-OriginHash
  value: <A hash value>
  visibility: PUBLIC
  key: CloudMigrator-OriginId
  value: <Original file ID>
  visibility: PUBLIC
  key: CloudMigrator-Version
  value: 3.31.10.0
  visibility: PUBLIC

You can get all of the source IDs in a csv alongside the new IDs with this command: gam user <user> print filelist fields id,properties. The Origin ID will almost certainly be in the column "properties.1.value", but you'll want to check for any that don't match (there shouldn't be any additional properties on the migrated files, but there could be). You would then just need to match these up with the filelist from the source user (via gam user <user> print filelist fields id,parents), match up the parent IDs with the new parent IDs (via the OriginId property on the migrated parent folders), and then add the files to the appropriate matching parent IDs.
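The matching described above can be sketched in Python. The flattened column names here (properties.N.key / properties.N.value, parents.0.id) are assumptions about GAM's CSV output and may differ in your files; this is a rough sketch of the join logic, not a drop-in tool.

```python
def build_origin_map(dest_rows):
    """Map source ("origin") file IDs to migrated destination IDs by scanning
    every properties.N.key / properties.N.value column pair, rather than
    trusting properties.1.value to always hold the OriginId."""
    origin_to_dest = {}
    for row in dest_rows:
        for col, val in row.items():
            if col.endswith(".key") and val == "CloudMigrator-OriginId":
                # the matching value column sits alongside the key column
                origin_to_dest[row[col[:-len(".key")] + ".value"]] = row["id"]
    return origin_to_dest

def plan_moves(source_rows, origin_to_dest):
    """For each source file, pair its migrated copy with the migrated copy of
    its source parent folder; files whose parent was not migrated are skipped."""
    moves = []
    for row in source_rows:
        dest_id = origin_to_dest.get(row["id"])
        new_parent_id = origin_to_dest.get(row.get("parents.0.id", ""))
        if dest_id and new_parent_id:
            moves.append({"id": dest_id, "parentId": new_parent_id})
    return moves
```

The resulting rows could then be written to a csv and fed back through gam to perform the actual moves.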

In terms of having the CloudMigrator software fix the issue, you wouldn't need to delete the migrated data and re-migrate. I think you wouldn't even need to clear migration history for the user and re-migrate (this would force re-migration of all data, and won't result in duplicates). CloudMigrator cares a lot about folder hierarchy, so I think just running a delta migration pass (enable the option to skip overwriting updated files for safety) might be enough to fix the hierarchy. It would be the simpler option, but more risky since you would need to make sure the settings are configured correctly to prevent overwriting data; and if it didn't do what it was supposed to do the first time, then odds are that you're going to be wary of trusting the software to perform as expected with another pass.

Duncan Isaksen-Loxton

Feb 14, 2022, 4:14:58 AM
to GAM for Google Workspace
It's taken me a bit of time to get back to this, and Chris your suggestion works out in the spot check I've done. 

On the source I can see the complete list of files and folders. 

On the destination I can see the folders where these should be, and files are not in those folders. 

I now have a filelist of the destination, including the CloudMigrator-OriginId and the parent id. Second, I have a filelist of the source, including the file path of where each item lives.

So my question now is what's the best way to fix this? 

I could combine the data in Excel and VLOOKUP on it to match the folder paths, and that would give me the folder id in the same row of the orphaned file. 

Then I've got to get a GAM script / Apps Script to run over the csv filelist and, if the original parent is not the same as the current parent, move the file to its correct destination.

If anyone has some ideas on the best way to write this I'd be all ears. 

Thanks! 

Chris River

Feb 14, 2022, 12:05:59 PM
to GAM for Google Workspace
Fixing via another CloudM Migrate pass is likely still the best option. Test on a single account first (be mindful of Drive sharing, the test account should have minimal permissions to other folders to limit the impact of the test). You can fix via GAM, but it is going to be a lot of work and carries a greater risk of mistakes.

Excel is probably going to be the easiest, and it gives you a chance to spot check things to make sure everything will be updated as you expect it to. Since you have a filelist of the destination with originid's already, you can map the migrated folders to the source folders via these origin ID's. This will be more reliable than folder path.

I had to create shortcuts for a customer as we used Google Workspace Migrate, which (at least at the time) did not support migrating shortcuts. The script I created won't apply directly in this case, but since I needed to create shortcuts in the correct destination folders, a subset of my code did roughly what you're trying to do. I've tried to adapt it here for you, but mainly "on paper" and untested, so this script likely won't work as-is. Hopefully it can help you get most of the way, though.

$sourcedomain = "source.org"
$destdomain = "dest.org"
Import-Csv $usersfile | ForEach-Object -ThrottleLimit 2 -Parallel {
  Set-Alias "gam" ~/bin/gamadv-xtd3/gam
  $sourcedomain = $using:sourcedomain
  $destdomain = $using:destdomain
  $sourceuser = $_.source
  $destuser = $_.dest
  Write-Host "Starting to process $destuser..."
  $alldestfiles = "query","'me' in owners","not 'me' in owners" | # this worked around an issue with obtaining all items located within a user's Drive
   gam select $destdomain config cache_discovery_only true show_gettings false csv_output_header_filter JSON redirect csv - multiprocess noheader csv - gam user $destuser print filelist fullquery ~query fields name,id,parents,owners,mimetype,shortcutdetails.targetid,properties formatjson quotechar "'" 2>$null | % {
    $_.TrimStart("'").TrimEnd("'") | ConvertFrom-Json | Group-Object -Property {
      if ($_.Properties.key -eq 'CloudMigrator-OriginId') {
        ($_.Properties).where({ $_.key -eq 'CloudMigrator-OriginId' }).value
      } elseif ($_.shortcutDetails.targetId) {
        $_.shortcutDetails.targetId
      } else {
        $_.id
      }
    } -AsHashTable
  }
  $allsourcefiles = "query","'me' in owners","not 'me' in owners" |
   gam select $sourcedomain config cache_discovery_only true show_gettings false csv_output_header_filter JSON redirect csv - multiprocess noheader csv - gam user $sourceuser print filelist fullquery ~query excludetrashed fields name,id,parents,owners,mimetype,shortcutdetails.targetid formatjson quotechar "'" 2>$null | % {
    $_.TrimStart("'").TrimEnd("'") | ConvertFrom-Json | Group-Object id -AsHashTable
  }

  # Everything below here is more theory-crafted, and so is much less likely to work as-is than the lines above
  $incorrectdestparents = $alldestfiles.where({ $_.parents -ne $allsourcefiles.$($_.id).parents })
  if ($incorrectdestparents.count -gt 0) {
    [Array]$parentstoupdate = foreach ($item in $incorrectdestparents.where({ $_.Values.ownedByMe -eq $true -and ($_.Values.parents) }).values) {
      if ($alldestfiles.$($item.id).trashed -eq $true) {
        continue
      }
      $parentId = $null
      if ($item.parents.isRoot -eq $true) {
        $parentId = $null
      } else {
        if (-not $item.parents.id) {
          continue
        }
        $destparent = $alldestfiles.$($item.parents.id)
        if ($destparent) {
          $parentId = $destparent.id
        } else {
          $parentId = $item.parents.id
        }
      }
      [PSCustomObject]@{
        User = $destuser
        id = $item.id
        parentId = $parentId
        originFileId = $item.id
      }
    }
    if ($parentstoupdate.Length -gt 0) {
      [Array]$parentsUpdated = $parentstoupdate.Where({ $_.parentId.count -eq 1 }) | ConvertTo-Csv | gam select $destdomain config cache_discovery_only true show_gettings false csv_input_row_filter "parentId:regex:^.+$" redirect csv - multiprocess csv - gam user ~User update drivefile ~id parentid ~parentId csv 2>$null | ConvertFrom-Csv
    }

    # Optionally, if you want to keep an indicator on the updated files showing that you have updated them, you can do the below. You can also modify this (or add new properties) to store the original parent id (from the source or the destination), or anything else that you may find helpful
    $parentsUpdated | ConvertTo-Csv | gam select $destdomain config cache_discovery_only true show_gettings false csv - gam user ~User update drivefile ~id returnidonly publicproperty MovedToNewParent True | Out-Null

    $parentsUpdated
  }
} # End of the "Import-Csv $usersfile | ForEach-Object -ThrottleLimit 2 -Parallel {" loop

The script outputs an object that you can watch directly on the screen, or, more helpfully, pipe out to a csv (via | Export-Csv) so you have a record of the actions taken.

You also may need to factor in that it is no longer possible for files to have multiple parents, while in your source domain you may have some that do. You'll have to decide which folder to place the primary item in, and in which folder (if any) to create a shortcut; the above script does not handle creating shortcuts for multi-parented items. If you're going to take the route of fixing this via GAM, you'll need a more comprehensive test account to validate the scripting than you would if fixing via CloudM, as there are more variables that you'll need to account for.

Duncan Isaksen-Loxton

Feb 17, 2022, 4:06:47 AM
to GAM for Google Workspace
Thanks Chris, that's superb!

Interestingly we did re-run the migration several times in the hope it would fix it, but even the CloudM support team couldn't work out why this particular set of files had been flattened. There were several other folders in the same migration, with fairly deep trees that had no issue. 

I double-checked aliases and double folder locations, and thankfully there are none, so this is fairly straightforward.

I've opted for the Excel route. I brought both destination and source file lists into Excel sheets, removed anything that did not have the CloudMigrator-OriginId property, ignored any folders, and did some matching to get a final CSV containing the owner, file id, and new parent id (based on matching the CloudMigrator-OriginId to the origin sheet's id, getting the string representation of its parent folder, and then matching that to the destination parent folder to get the new parent id).

We did a couple of test runs with 4 and then 10 files and had the user confirm the changes. I've now run this on about 10,000 files and they look fine in their rightful homes.

We now have about 40,000 files that did not migrate (of the 500,000 or so), and I am running CloudM once more to bring those across with 'Do not overwrite changed files' checked.

My last step will be to do this all again and re-home them if necessary.
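The path-matching step described above (the VLOOKUP) can also be expressed in a few lines of Python. The column names here (path, destId) are illustrative assumptions, not GAM output names; a sketch of the logic only.

```python
def index_folders_by_path(folder_rows):
    """Index destination folders by their full path string: the code
    equivalent of the VLOOKUP against the folder sheet."""
    return {row["path"]: row["id"] for row in folder_rows}

def rehome(file_rows, dest_folders_by_path):
    """Pair each flattened file with the destination folder whose path
    matches the file's source parent path; files with no matching folder
    are skipped for manual review."""
    moves = []
    for row in file_rows:
        parent_path = row["path"].rsplit("/", 1)[0]
        folder_id = dest_folders_by_path.get(parent_path)
        if folder_id:
            moves.append({"id": row["destId"], "newParentId": folder_id})
    return moves
```

Note that name-based path matching is less reliable than the OriginId join (duplicate folder names collide), so spot checks like the ones above remain important.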

Brian Kim

Mar 25, 2022, 9:06:00 AM
to GAM for Google Workspace
I have a problem regarding the last migration I ran, involving a partial acquisition and subsequent migration, and thought I'd ask the collective wisdom of the migration experts in this thread. All I was given was a list of users to migrate, and I completed it successfully (or so I thought).

Three months later, a user reported missing files, and investigation showed that many items were owned by service accounts not in scope for migration, and that ownership is mixed (either due to AODocs being used in the past, or users leaving the organization and ownership being transferred). This also caused duplicate folders to appear (probably an existing problem due to two folders sharing a parent, but not visible to the end user).

In order to migrate the missing files, I asked the users to first identify the folder paths that should have been migrated (this user only owns a little over 2,000 items, but it looks like he added 100,000+ items via "Add to My Drive").

What are my options for fixing this issue?
  1. Selecting "Allow Alternate Ownership" in CloudM; this has caused some issues before and I've avoided it since
  2. Changing ownership after user identifies the paths and re-migrating (I realized from the thread that it's possible to avoid updated files being overwritten)
  3. Moving to Shared Drives
I started with:

gam config auto_batch_min 1 redirect csv ./ownedbyother.csv multiprocess csvfile users.csv:email print filelist fullpath showownedby others fields id,title,owners.emailaddress

bq load --autodetect --source_format=CSV dataset.table ./ownedbyother.csv

This file is too large, and trying to load it into BigQuery keeps failing, so I think I need to do some clean-up first (there are 29 columns, which means there is a file somewhere with 23 different paths!)
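One way to get a wide fullpath filelist into BigQuery is to melt it into long form first (one row per path), so the table has a fixed schema no matter how many path columns GAM emitted. A sketch, assuming the fixed columns are Owner, id, and title and every other non-empty column holds a path; check this against your actual header.

```python
import csv
import io

def melt_paths(rows, id_cols=("Owner", "id", "title")):
    """Yield one long-form row per non-empty path cell, collapsing the
    variable number of path columns into a single 'path' column."""
    for row in rows:
        base = {c: row.get(c, "") for c in id_cols}
        for col, val in row.items():
            if col not in id_cols and val:
                yield {**base, "path": val}

# Small demo on an in-memory CSV with two path columns
sample = "Owner,id,title,path.0,path.1\na@x.com,1,Doc,/A/Doc,/B/Doc\n"
long_rows = list(melt_paths(csv.DictReader(io.StringIO(sample))))
```

Writing long_rows back out with csv.DictWriter gives a narrow file that bq load --autodetect should handle without the column-count problem.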

Chris River

Mar 25, 2022, 10:09:56 AM
to GAM for Google Workspace
I recommend avoiding running another migration pass if at all possible; instead, just share the source data over to the destination domain and either leave it there or manually copy from it. If you need to run another migration pass to the live destination accounts, then I think the best option is to map the service accounts to new service accounts in the destination. For example, in your user list, map aod...@source.com to arc...@destination.com. This is similar to the "Allow Alternate Ownership" option, but it gives you control; in particular, it ensures that these items do not become owned by the user in the destination, so file ownership will be a better match for the current state.

So, I would try the following setup (assuming the rest of the settings are the default or otherwise configured as you prefer):
1. Source Platform > Document:
  • Migrate Items Only From Listed Users: Enabled
2. Destination Platform > Document:
  • Allow Alternate Item Ownership: Disabled
  • Skip Post Processing of Existing Items: Enabled
3. Users:
  • Include the user that is reporting the missing files, and ensure the "Migrate" option is enabled for them
  • Include all other source domain users (such as the AODocs user) that own files this user has in their Drive that you want to process data for. Ensure the "Migrate" option is disabled for these other accounts.
    • If you don't want to mirror these accounts over to the destination domain, and would instead prefer to combine the separate source service accounts into a single account in the destination (such as arc...@destination.com), then set the destination username for these to this individual account. You could even set this to the migrating user, if you do want to change the ownership of these to the migrating user.
4. Config Settings > Document:
  • Overwrite Updated Documents: Disabled
5. Config Settings > Address Replacement:
  • I would create an Address Replacements csv file that maps all source and destination accounts if any of these aren't just a simple domain change. (e.g. aod...@source.com,arc...@destination.com). This should at a minimum contain just the accounts with username differences, but you can include mirrored accounts as well for completeness (I usually do).
6. Config Settings > System:
  • If you are going to be mapping multiple source accounts to a single destination account, you'll need to enable the "Allow Multiple Sources" option.
This should result in the following:
  • All data that this user sees in their My Drive in the source that is owned by themselves or any of the other accounts listed in the Users page will be processed and migrated over.
  • Any files that were already previously migrated will be skipped and will not be updated or overwritten
  • The My Drive content in the destination domain for the other accounts listed in the user list should (mostly! see caution below) remain untouched, as their My Drive content in the source domain will not be processed
  • Ownership of migrated data will change depending on how you've configured your user and address mappings (e.g. from aodocs to archive).
Caution: CloudM really, really, really cares about folder paths. So much so that it will always (in my experience, at least) update folder paths in the destination domain to match the source domain, even when you have disabled the options to update already-migrated content. So, if the user has folders in their My Drive that are shared with other users, and migrated files/folders have been moved in the destination over the last few months as users have switched over, then these will be moved back to match the current locations in the source domain. It's been a while since I've investigated the behavior, but to err on the side of caution, I think this can include deleting already-migrated folders and creating new folders, which would cause the loss of any sharing permissions that have been changed, and orphan files that have been created or moved into these folders after the initial migration.

Running another migration after users have already switched over for any period of time is risky, so I recommend avoiding it if at all possible. It sounds like it might not be possible here; but if sharing the files from the source domain over to the destination domain would be sufficient (e.g. by sharing a folder or Shared Drive over, and placing these items in that folder), I would recommend that. Another option to consider might be to migrate to a set of temporary accounts in the destination, breaking all associations with real users, and then using the Drive ownership transfer tool (the one in the admin console, used to preserve data before deleting accounts) to transfer ownership of the new set of migrated files to the appropriate user(s). This would result in duplicates, but it would have less risk.

Brian Kim

Mar 25, 2022, 10:52:24 AM
to GAM for Google Workspace
Wow, thank you for such a detailed response! I will definitely err on the side of caution on this one, as I have definitely seen CloudM delete and create files/folders for something as simple as a permission change.

Duncan Isaksen-Loxton

Mar 27, 2022, 8:18:26 PM
to GAM for Google Workspace
Further to Chris's response, which frankly I can't beat (as he helped me last time!):

We had this issue a while back, and in another thread we discussed our solution. 

  1. We indexed all files in the user My Drive, owned by an external party. 
  2. Then all of these were copied with a known prefix so they are owned by our user. 
  3. We then searched for and indexed all files with the known prefix. 
  4. Then we migrated those into a Shared Drive, migrated the Shared Drive, and then replaced them all in the correct directory based on the source data (using an excel sheet to map the directory path by text to find the destination folder id). This allowed us to do a migration in isolation via CloudM only on that one Shared Drive. 
Good luck!