Removing Duplicate Files at Scale


Delvin Bonilla

Jan 22, 2026, 4:37:58 PM
to GAM for Google Workspace
Hi,

I'm working on a Drive storage reduction project in a large environment and have been analyzing duplicate files using admin tools, GAM, and GAT+. Reporting is clear, but I'm trying to understand what is actually doable.

Has anyone successfully deleted duplicate files at scale while preserving sharing, or is that not feasible due to ownership and permissions? Also, if duplicates are shared with different users, have you found any workable cleanup approaches?

I'm also curious whether anyone has used commands or workflows to safely delete only non-shared duplicates. I wanted to see if others have tackled this already, or if there is a recommended way to test outside of live data, such as a Workspace test environment.

Thank you.

Ross Scroggs

Jan 22, 2026, 5:24:37 PM
to google-ap...@googlegroups.com
How do you determine if two files are duplicates?

Ross
----
Ross Scroggs



--
You received this message because you are subscribed to the Google Groups "GAM for Google Workspace" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-apps-man...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/google-apps-manager/fabe8f79-f331-4837-b75d-cab484913e8en%40googlegroups.com.

Delvin Bonilla

Jan 23, 2026, 10:42:51 AM
to GAM for Google Workspace
Hi Ross,

We are currently using two methods. Initially, via GAM, we did a pretty basic check: if the name and the size are both identical, it's a duplicate. We didn't use any file checksums; however, by that alone we were able to find 23 TB worth of duplicates, with their associated file IDs.

We also use Google Admin Tools, which provides more information, such as MD5, path, and who the file is shared with.
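
For comparison, the two matching rules described above (name plus size versus a checksum) could be sketched roughly as follows. This is a minimal sketch, not the workflow actually used in this thread: the column names `id`, `name`, `size`, and `md5Checksum` are assumptions about the CSV header and should be checked against the real export, and `group_duplicates` and `load_rows` are hypothetical helper names.

```python
import csv
from collections import defaultdict


def group_duplicates(rows, key_fields):
    """Group rows that share the same value for every field in key_fields.

    Returns only the groups with more than one member.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row.get(field, "") for field in key_fields)
        # Skip rows with an empty key value (for example, a file that has
        # no checksum in the export), so blanks never match each other.
        if all(key):
            groups[key].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


def load_rows(csv_path):
    """Read a CSV export into a list of dicts keyed by header name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Illustrative rows only; column names are assumed, not taken from GAM.
    sample = [
        {"id": "1", "name": "report.pdf", "size": "100", "md5Checksum": "aaa"},
        {"id": "2", "name": "report.pdf", "size": "100", "md5Checksum": "bbb"},
        {"id": "3", "name": "copy.pdf", "size": "100", "md5Checksum": "aaa"},
    ]
    by_name = group_duplicates(sample, ("name", "size"))
    by_md5 = group_duplicates(sample, ("md5Checksum", "size"))
    print("name+size groups:", {k: [r["id"] for r in v] for k, v in by_name.items()})
    print("md5+size groups:", {k: [r["id"] for r in v] for k, v in by_md5.items()})
```

In the sample, files 1 and 2 match on name and size but have different checksums, while files 1 and 3 have the same checksum under different names, so the two rules can disagree in both directions.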

Ross Scroggs

Jan 23, 2026, 10:59:47 AM
to google-ap...@googlegroups.com
Delvin,

Send me a private Meet/Zoom invitation, I'm curious about 25TB of duplicates.

Ross
----
Ross Scroggs


Russ Thibeault

Jan 23, 2026, 1:58:15 PM
to GAM for Google Workspace
Please do follow up on this, whether or not you have success. This is a great endeavor that would benefit so many of us and I for one would appreciate any insights.

I'm also curious about the 25TB of dups. If you don't mind me asking, what is the total size of your Workspace Drive?

Ross Scroggs

Jan 23, 2026, 2:22:37 PM
to google-ap...@googlegroups.com
My bet is that domain shared files are being counted as duplicates. I'm offline for 4 hours, will check in later.

Ross
----
Ross Scroggs
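
One way to check the hypothesis in the message above is to see whether the same file ID appears on several rows of the export before any name or size matching is done. The following is a rough sketch under the assumption that the CSV has an `id` column and may contain one row per user who sees a given file; `listings_per_file` and `unique_files` are hypothetical helper names, not GAM features.

```python
from collections import Counter


def listings_per_file(rows, id_field="id"):
    """Count how many CSV rows refer to each file ID."""
    return Counter(row[id_field] for row in rows)


def unique_files(rows, id_field="id"):
    """Keep the first row seen for each file ID, dropping repeat listings."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row[id_field], row)
    return list(first_seen.values())


if __name__ == "__main__":
    # Illustrative rows only: file "A" is listed once per user who sees it.
    sample = [
        {"id": "A", "name": "handbook.pdf", "size": "500"},
        {"id": "A", "name": "handbook.pdf", "size": "500"},
        {"id": "A", "name": "handbook.pdf", "size": "500"},
        {"id": "B", "name": "notes.txt", "size": "20"},
    ]
    counts = listings_per_file(sample)
    repeated = {file_id: n for file_id, n in counts.items() if n > 1}
    print("file IDs listed more than once:", repeated)
    print(len(sample), "rows,", len(unique_files(sample)), "distinct file IDs")
```

If most of the apparent duplicates share a file ID, they are a single stored file seen by several users rather than separate copies, and collapsing rows by ID before grouping on name and size would give a more realistic figure.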


Delvin Bonilla

Jan 23, 2026, 2:33:36 PM
to GAM for Google Workspace
Hi Ross,

Thank you for the offer. I will send you one as soon as I can. It's a little busy here today, so most likely looking at Monday or Tuesday if that is ok with you.

For now the command we used was: 
gam config auto_batch_min 1 redirect csv ./all_files.csv multiprocess all users print filelist choose mydrive_any fields id,name,mimeType,owners,size,md5 fullpath showownedby any

Our total Workspace storage in use at the moment is 55 TB.

Ross Scroggs

Jan 23, 2026, 7:09:13 PM
to google-ap...@googlegroups.com
Delvin,

I'm in California (GMT-8 PST) and am available starting at 7:30AM either Monday or Tuesday; send me a private Meet/Zoom invitation.

Ross
----
Ross Scroggs



Bonilla, Delvin

Jan 29, 2026, 10:26:09 AM
to google-ap...@googlegroups.com
Hi Ross,

Thank you again for your support with this. It truly makes a difference, and we really appreciate it. 
I was able to send you a meeting for tomorrow if you can make it; if not, that's ok; we can try Monday as you stated.

Thank you again!
Delvin Bonilla


Ross Scroggs

Jan 29, 2026, 10:33:18 AM
to google-ap...@googlegroups.com
I have a schedule conflict tomorrow morning; Monday is free after 7:30AM PST.

Ross
----
Ross Scroggs



Ross Scroggs

Feb 6, 2026, 10:22:34 AM
to google-ap...@googlegroups.com
Delvin,

I'm still curious about the files with lots of duplicates; run the following for any of the files with 58 dups.

gam config auto_batch_min 1 num_threads 10 redirect csv ./FileInfo.csv multiprocess redirect stderr null multiprocess all users print filelist select id <DriveFileID> fields id,name,mimetype,owners.emailaddress norecursion showownedby any 


What do you get in FileInfo.csv?


Ross

----
Ross Scroggs


On Jan 29, 2026, at 10:47 AM, 'Bonilla, Delvin' via GAM for Google Workspace <google-ap...@googlegroups.com> wrote:

Hi Ross,

A few colleagues who’ve been working on this project with me are very interested in the discussion and would like to join the call, if that’s okay.
I’ve invited them to the meeting and wanted to give you a heads-up.

Thanks again,
Delvin Bonilla
Database Administrator | Lycée Français de New York



On Thu, Jan 29, 2026 at 10:42 AM Bonilla, Delvin <dbon...@lfny.org> wrote:
Hi Ross,

Thank you again. I updated the invitation.
See you then!

Have a great weekend!
Delvin Bonilla
Database Administrator | Lycée Français de New York



