Duplicate File Finder Windows Server 2012 R2


Annegret Haldiman

Aug 4, 2024, 10:08:49 PM
to lepervere
You might want to consider implementing it on the new file server as well if it is Windows Server based, since it allows you to filter file types and assign quotas to folders, which will make managing the content in shares much easier and, if implemented properly, adds another layer of defense to your server. Adding an SMTP server to the configuration lets it send explanatory emails when users try to copy unauthorized files to the protected share or approach their quota limit, and it can send scheduled usage reports to you as well.

You can assign a file policy to the root folder where the share(s) reside (monitoring only, no need to enforce it), and once it has had a chance to run you will be able to pull reports for large files, duplicate files, files with a specific extension, files by modified date (for unused files), and so forth.
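For reference, all of this can be driven from PowerShell on Server 2012 R2. A minimal sketch, where D:\Shares, smtp.example.local, and admin@example.local are placeholder values:

# Install FSRM with its management tools
Install-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools

# Point FSRM at an SMTP server for notifications and scheduled reports
Set-FsrmSetting -SmtpServer 'smtp.example.local' -AdminEmailAddress 'admin@example.local'

# Soft (monitoring-only) quota on the share root
New-FsrmQuota -Path 'D:\Shares' -Size 500GB -SoftLimit

# Weekly duplicate-files report, mailed to the admin
$task = New-FsrmScheduledTask -Time (Get-Date '23:00') -Weekly Sunday
New-FsrmStorageReport -Name 'Weekly duplicates' -Namespace @('D:\Shares') -Schedule $task -ReportType DuplicateFiles -MailTo 'admin@example.local'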


FSRM seems like a great idea to help keep it from becoming what it is now again, so that will absolutely be set up on the new server. Part of removing the duplicate files is to trim down the extraneous data before I move it, for the time, space, and sanity savings. I do have some cooperation from other departments to slim down and organize a bit, but in the end it will probably be a lot of unilateral decisions on my part.


Hello, was wondering if anyone can recommend a program that will search a server drive/drives/network shares for duplicate files? I have several users with a bad habit of duplicating files to work on a project, then just leaving all the copies behind, filling up my drive shares.


I run this on 2008 R2. It is also good for seeing who is using up the most space on the volume and what files they own. Another one I have used in the past is Fast Duplicate File Finder.


Enterprise data storage rapidly grows in volume and density year after year. However, not all of this data is business critical. Studies show that more than 30 percent of the data stored by an organization is redundant, obsolete, or trivial (ROT).


It analyzes file metadata to find and report on duplicate copies of files in Windows file servers and workgroup environments. Users can also delete these copies right from the dashboard for quick and simple storage optimization.


DataSecurity Plus can now find and report on duplicate files in your domain. You can view useful breakdowns of the file type composition of detected duplicate files under the Storage Overview tab, along with details on how much space they take up.


Our duplicate file finder technology does more than just free up disk space - it reduces the expense of time-consuming backup operations, brings clarity to document storage areas, speeds up file searches and indexing processes, reduces clutter, and improves accessibility for users.


Under active development since 2005, Duplicate File Detective is a business-class duplicate finder with deep scalability, robust duplicate file detection, and excellent performance.

Try it and see why so many users depend on Duplicate File Detective.


For one of the projects, I needed a PowerShell script to find duplicate files in the shared network folder of a file server. There are a number of third-party tools for finding and removing duplicate files in Windows, but most of them are commercial or are not suitable for automated scenarios.


The following PowerShell one-liner allows you to recursively scan a folder (including its subfolders) and find duplicate files, reporting any files that share the same hash.
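A minimal sketch of such a one-liner, assuming PowerShell 4.0 or later for Get-FileHash and using C:\Share as a placeholder path:

Get-ChildItem -Path 'C:\Share' -Recurse -File |
    Get-FileHash |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object Path, Hash }

Files that land in the same hash group have identical content; the last stage prints their paths alongside the shared hash.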


From Out-GridView docs: The PassThru parameter of Out-GridView lets you send multiple items down the pipeline. The PassThru parameter is equivalent to using the Multiple value of the OutputMode parameter.
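Building on that, a sketch of using -PassThru to pick duplicates interactively and remove them; \\FS01\Share is a placeholder, and -WhatIf keeps the run harmless until you trust the selection:

Get-ChildItem -Path '\\FS01\Share' -Recurse -File |
    Get-FileHash |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object -Skip 1 } |   # keep one copy per group
    Out-GridView -Title 'Select duplicates to delete' -PassThru |
    Remove-Item -WhatIf

Only the rows you highlight in the grid and confirm with OK continue down the pipeline to Remove-Item.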


I have a 15TB storage network, and I am down to about 2.5TB now (due to a large number of duplicates). I have tried many scanners, but with little success; they all eventually crash under the massive amount of data. Is there any program you know of that can handle loads this large? I don't care about the platform it runs on.


If you haven't done so already, you may be able to work around your problem by cramming more RAM into the machine that's running the duplicate detector (assuming it isn't already maxed out). You could also split the remaining files into subsets and scan pairs of those subsets until you've tried every combination. However, in the long run, this may not be a problem best tackled with a duplicate detector program that you have to run periodically.


You should look into a file server with data deduplication. In a nutshell, this automatically stores only one physical copy of each file, with each "copy" hardlinked to the single physical file. (Some systems actually use block-level deduplication rather than file-level dedup, but the concept is the same.)


Newer advanced filesystems such as ZFS, BTRFS, and lessfs have dedup support, as does the OpenDedup fileserver appliance OS. One or more of those filesystems might already be available on your Linux servers. Windows Storage Server also has dedup. If you have some money to throw at the problem, some commercial SAN/NAS solutions have dedup capability.
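On the Windows side, a minimal sketch of enabling the built-in dedup on Server 2012 R2, with E: as a placeholder volume:

Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume 'E:' -UsageType Default
Set-DedupVolume -Volume 'E:' -MinimumFileAgeDays 3   # only optimize files older than 3 days
Start-DedupJob -Volume 'E:' -Type Optimization
Get-DedupStatus -Volume 'E:'                         # check space savings

Windows dedup chunks files below the file level, so it can also reclaim space from partially identical files.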


Keep in mind, though, that dedup will not necessarily help with small, slightly modified versions of the same files. If people are littering your servers with multiple versions of their files all over the place, you should try to get them to organize their files better and use a version control system--which saves only the original file and a chain of incremental differences.


64 GB should be sufficient for caching at least one billion checksum/file-path entries in physical memory, assuming 128-bit checksums and average metadata (filesystem path, file size, date, etc.) no larger than 52 bytes. Of course, the OS will start paging at some point, but the program shouldn't crash--that is, assuming the duplicate file finder itself is a 64-bit application.


If your duplicate file finder is only a 32-bit program (or a script running on a 32-bit interpreter), the number of files you can process could be vastly smaller if PAE is not enabled: more on the order of 63 million (4 GB / (128 bits + 52 bytes)), under the same assumptions as before. If you have more than 63 million files, use a larger checksum, or cache more than 52 bytes of metadata per file on average, then you probably just need to find a 64-bit duplicate file finder. In addition to the programs mgorven suggested (which I assume are available in 64-bit builds, or at least can easily be recompiled), there is a 64-bit version of DupFiles available for Windows.
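Spelling out the arithmetic above, under the same assumptions (128-bit checksum plus 52 bytes of metadata per entry):

$entryBytes = (128 / 8) + 52            # 16-byte checksum + 52 bytes metadata = 68 bytes
[math]::Floor(64GB / $entryBytes)       # ~1.01 billion entries fit in 64 GB
[math]::Floor(4GB / $entryBytes)        # ~63 million entries fit in 4 GB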


He uses Adobe Bridge CC on an iMac connecting via smb:// to a Windows server where the RAW camera files are stored. He selects the folder and the RAW files display in the filmstrip at the bottom of the Bridge window, listed from smaller file numbers to larger, left to right. He then selects an image from there to preview it. If it is one he wants to keep, he makes no changes. If it is one he wants to trash, he marks it with a single star. When he has completed evaluating the photos in the folder, he selects the single-star photos so they all show in Preview and trashes them from there.


The issue he is experiencing is that while he goes through this evaluation process in Bridge, duplicate files start showing up in the filmstrip. It appears to be completely random as to when this happens and to which RAW file. Sometimes it is not just duplicate but triplicate files that show up. The dupe files are named exactly the same as the originals and have the same metadata, file info, and size.


So when he is going through the files in the filmstrip, if a smaller-numbered file name duplicates, it shifts all the files in the filmstrip to the right, causing him to lose his place. With this happening randomly in a folder of a few hundred RAW images, all the jumping around gets frustrating.


If we see the dupes, we can go to another iMac, use Adobe Bridge to navigate to the same folder location, and see the duplicate files listed there as well. Yet when we navigate to that folder on this second iMac using Finder, the dupes are not visible, even though Bridge, still open on the second screen, continues to show them.


Same issue here; it just tripled the number of files and data on our NAS.



We wanted to use Bridge more, but I guess it's time to look at Capture One or similar, as no one from Adobe seems to bother addressing the matter seriously.


Please do look at Capture 1, On1, and any other application that you think will help you, I do not care. I do not get paid by Adobe if you use Adobe products or any other product. If I thought that blue cheese would be just what you needed, I'd recommend that.


That is wrong from so many different angles as to be astronomical. Don't believe me? If you really, really want a good, reliable NAS-based application for your purposes, get ready to spend many hundreds if not thousands of dollars.


Wow, why so defensive...or aggressive? This is an Adobe Forum, so why assume anyone is having a go at you?!



But since you seem to have it all figured out where in the license agreement does it say what you state above:

"And Bridge, which was originally designed and has been maintained for single uses on a computer with (perhaps) external hard drives, is free."


Defensive or aggressive? I am sorry that I came off that way. But please, the way you came off was that Adobe doesn't give a "darn" about the fact that you are trying to use software in a way it was not designed for, and then you say they do not take the matter seriously.


Adobe will not support working with files from a server (NAS). They will provide support if the files are on the Mac (Apple) device itself. We have also attempted using USB-attached storage and still have the same issue. That is also impractical when several people are accessing many terabytes of images, which is why we use the Windows-based server.
