Finding and removing duplicate files


Rick Davis GLSD

Jun 5, 2023, 3:37:34 PM
to GAM for Google Workspace
Please excuse me if this has been asked and answered. I feel like I have probably asked this question once or twice in the past and then got sidetracked before following through.

I am pretty sure this is an impossible task, but it is worth asking for suggestions. Years ago we replaced our teachers' and secretaries' ancient iMacs. At the time, the teachers and secretaries were in the habit of creating folders on their iMacs and then sharing them with other users. Some of these other users would then make a copy of the shared folder on their own iMac. Chaos created.

Fast forward to the replacement of said computers. Not wanting to lose any documents, I copied all files up to a Google Shared Drive. Chaos x 2.

Now when one of the teachers/secretaries searches for an old file they find two or more of the same file. 99% of the files have not been modified in many years, so they have the same name, file size, modification date, etc.

Is there a command or series of commands that I can run that can scrub through the entire organization's Google Drives, find duplicate files, and either move the duplicates to a common location or even delete them?

Kim Nilsson

Jun 7, 2023, 5:18:48 AM
to GAM for Google Workspace
Short answer: No.

Longer answer: You need to print all those files, and use whatever tool you have to deduplicate the list. Then GAM can delete all the duplicates.

Python is supposedly very good at managing large masses of text, but I don't think anyone has created a deduplication script for this very purpose yet. Maybe someone with master Python skills will take pity on you and help you out.

You could also start searching for deduplication services/scripts that may be repurposed for your situation.
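
The print-then-deduplicate step described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes the file listing has been exported to CSV with columns named `id`, `name`, `size`, and `md5Checksum` (field names borrowed from the Drive API, not confirmed as the exact output of any particular GAM command), and `find_duplicates` is a hypothetical helper. Google-native Docs carry no checksum, so rows without one are skipped rather than guessed at.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows):
    """Group rows that share name, size, and checksum; return groups with 2+ files."""
    groups = defaultdict(list)
    for row in rows:
        checksum = (row.get("md5Checksum") or "").strip()
        if not checksum:
            # No checksum (e.g. Google-native Docs): content cannot be compared, skip.
            continue
        key = (row["name"], row["size"], checksum)
        groups[key].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical listing; in practice this would be read from the exported CSV file.
SAMPLE = """id,name,size,md5Checksum
a1,Budget.pdf,1024,abc123
b2,Budget.pdf,1024,abc123
c3,Budget.pdf,2048,def456
d4,Notes,,
"""

duplicates = find_duplicates(csv.DictReader(io.StringIO(SAMPLE)))
for (name, size, checksum), ids in duplicates.items():
    # Keep the first ID; the rest are candidates for review before any deletion.
    print(f"{name}: keep {ids[0]}, review {ids[1:]}")
```

The resulting list of IDs to review could then be saved to a file and checked by hand before anything is deleted.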

Malek Arslan

Jun 7, 2023, 6:06:23 AM
to google-ap...@googlegroups.com
Hello Kim,
Can you please point me to an article on configuring the GAM tool on Cloud Shell?

Thank you 

--
You received this message because you are subscribed to the Google Groups "GAM for Google Workspace" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-apps-man...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-apps-manager/57d80ef1-7804-46a5-bb62-19f69a0e1e4en%40googlegroups.com.

Kim Nilsson

Jun 7, 2023, 10:42:38 AM
to GAM for Google Workspace
I recommend watching this video.

Amy Bailey

Jun 14, 2023, 9:45:41 AM
to GAM for Google Workspace
Kim mentioned finding deduplication services. Here is one that has a free trial: https://app.kincaidit.com/free-trial

I know several districts that use the tool to clean up duplicate files to help with storage quotas.

Ted White

Oct 8, 2024, 11:45:40 AM
to GAM for Google Workspace
@Amy & @Rick, have you tried Filerev? It seems to be specifically for finding and removing duplicates in Google Drive.

Rick Davis GLSD

Dec 11, 2024, 1:45:23 PM
to GAM for Google Workspace
I did take a look at it. I was intrigued at first glance and impressed by what it could potentially do. Then I was disappointed to find that the only way to scan shared drives, which is my main need, is to sign up for the premium plan, so I figured I would try some other suggestions first. Sure, $150 for a year is not a lot, but I don't know whether it will meet my need...

Rick

Rick Davis

Dec 11, 2024, 2:05:47 PM
to google-ap...@googlegroups.com
To be honest, I think my real issue here is training users to understand how to use sharing and collaboration. They all have a habit of making a copy of a file or document. I sometimes find duplicates within their own drive.

Rick Davis

Dec 11, 2024, 2:33:24 PM
to google-ap...@googlegroups.com
This project has me feeling like Alice falling down the rabbit hole. I just used Ross's script to scan and mark files in 23 different shared drives.

Partway through I realized I was only scanning and marking PDFs. So I probably need to learn more about running a list of commands through a spreadsheet or CSV file to look for multiple file types, like .docx, .png, .ppt, etc.
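
One possible way to cover several file types in a single pass is to build one search query that joins the MIME types with `or`, using the Drive API query syntax (`mimeType = '...'`). The sketch below is an assumption-laden illustration: `build_query` and the `MIME_TYPES` table are hypothetical names, and whether a given script or command accepts such a query is something to verify first.

```python
# Extension-to-MIME-type table for the file types mentioned above (illustrative).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".png": "image/png",
    ".ppt": "application/vnd.ms-powerpoint",
}


def build_query(extensions):
    """Return a Drive-style query string matching any of the given extensions."""
    clauses = [f"mimeType = '{MIME_TYPES[ext.lower()]}'" for ext in extensions]
    # Parenthesize the OR group so the trashed filter applies to every type.
    return "(" + " or ".join(clauses) + ") and trashed = false"


print(build_query([".pdf", ".png"]))
```

An extension missing from the table raises a `KeyError`, which is deliberate here: it is safer to stop than to silently skip a file type.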

On top of that, the command and script I am applying only look at the files within the referenced shared drive. That means if a user duplicated the file to their own drive or another shared drive, I'm not finding it with my current workflow. And if I am not correcting the user behavior or habits, this will be an endless mess, like a dog that never catches its tail.

Might be worth the $150 subscription after all.

The good news is ... we're not anywhere near our storage space quota.
The main issue is that when a user searches for a file by its filename, there is no telling which copy they are going to find, and if they don't modify it, they may find the same file in a different location the next time.

UUCP Tech

Mar 6, 2025, 3:29:03 PM
to GAM for Google Workspace
Rick - I'm trying not to fall too far down this same rabbit hole.

I'm curious (even though my name isn't Alice) as to what kind of progress you've made, if you're willing to share.

Thanks!

Rick Davis

Mar 6, 2025, 4:24:02 PM
to google-ap...@googlegroups.com
Sorry to say, not far at all. It's one of those projects that produces more questions than answers and gets "back-burnered" often. One large stumbling block for me is the inability to scan across the organization's entire set of files. I can scan one shared drive and mark duplicates, but that does no real good if it is not scanning and comparing ALL shared drives. The same goes for user drives: there could be a file in a user's drive that is also duplicated in one or more other user drives as well as a shared drive. 😵‍💫
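
If a separate listing can be produced for each shared drive and user drive, the cross-location comparison described above amounts to pooling every listing before grouping. Here is a minimal sketch of that idea, assuming each listing is already available as tuples of file ID, name, size, and MD5 checksum; the `cross_location_duplicates` helper and all sample data are hypothetical.

```python
from collections import defaultdict


def cross_location_duplicates(listings):
    """listings maps a location label to a list of (file_id, name, size, md5) tuples.

    Returns {(name, size, md5): [(location, file_id), ...]} for content found
    more than once anywhere, whether in one location or across several.
    """
    seen = defaultdict(list)
    for location, files in listings.items():
        for file_id, name, size, md5 in files:
            if md5:  # files without a checksum cannot be compared this way
                seen[(name, size, md5)].append((location, file_id))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


# Hypothetical data standing in for one listing per shared drive or user drive.
listings = {
    "Shared Drive A": [("a1", "Roster.docx", 500, "aaa"), ("a2", "Map.png", 900, "bbb")],
    "Shared Drive B": [("b1", "Roster.docx", 500, "aaa")],
    "user@example.com": [("u1", "Roster.docx", 500, "aaa"), ("u2", "Plan.ppt", 700, "ccc")],
}

for (name, size, md5), hits in cross_location_duplicates(listings).items():
    print(name, "->", hits)
```

Because every location feeds one shared dictionary, a copy in a user's drive is matched against copies in any shared drive, which a drive-by-drive scan cannot do.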

I looked at FileRev and it looks like it could do what I need to do, but I have not taken the time to assess the security risks ... given the permissions it will need ... enough to explain it to my IT Director and admins for approval.

Rick




UUCP Tech

Mar 8, 2025, 11:05:23 AM
to GAM for Google Workspace
We have that same problem of needing to scan across shared drives (although even being able to scan within a single shared drive would be helpful). I recently tried FileRev and it found a couple duplicates on My Drive, but I don't think we have the budget to justify paying for the full product, assuming it would even do what we need it to do. We do have some groups that insist on storing things in... let's just say "non-standard" ways. We're far better off than we were when the content was scattered across various members' personal computers, different Dropbox accounts, dusty cigar boxes under somebody's bed...

Your comment about needing to educate people is probably the most spot-on. Unfortunately, almost all of our users are volunteers, so they come and go.

You mentioned using Ross's script, but I don't see a link to that script in this thread (maybe I'm just missing it). Can you please point me to that script?

Thanks - and good luck!

Ross Scroggs

Mar 8, 2025, 11:07:55 AM
to google-ap...@googlegroups.com
Send me a Meet/Zoom invitation and I'll help.

Ross
----
Ross Scroggs



