Find Duplicate Images


Argelia Long

Aug 4, 2024, 7:28:55 PM
to glycafkital
I have a few (38000) picture/video files in a folder. Approximately 40% of these are duplicates which I'm trying to get rid of. My question is, how can I tell if 2 files are identical? So far I have tried using a SHA1 of the files, but it turns out that many duplicate files had different hashes. This is the code I was using:

As outlined above, duplicate detection can be based on a hash. However, if you want near-duplicate detection, meaning that you are searching for images that show basically the same thing but have been scaled, rotated, etc., you might need a content-based image retrieval approach. There's LIRE, a Java library for that, and you'll find the "SimpleApplication" in the Download section. What you then can do is to


This may depend on the image format, but you could first compare the height and width and then go pixel by pixel, comparing the RGB values. To make it more efficient, you can decide on a comparison threshold. For example:
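A minimal Python sketch of that idea, assuming both images have already been decoded into rows of (R, G, B) tuples (the loading step, for example with an imaging library, is left out). The function name `images_match`, the per-channel `tolerance`, and the `max_diff_ratio` threshold are illustrative assumptions, not part of the original answer.

```python
def images_match(pixels_a, pixels_b, tolerance=10, max_diff_ratio=0.01):
    """Return True if two decoded images are equal within a threshold.

    pixels_a, pixels_b: lists of rows, each row a list of (R, G, B) tuples.
    tolerance: largest per-channel difference still treated as equal.
    max_diff_ratio: fraction of pixels that may differ (assumed default).
    """
    # Cheap check first: different dimensions can never match pixel by pixel.
    if len(pixels_a) != len(pixels_b):
        return False
    if any(len(ra) != len(rb) for ra, rb in zip(pixels_a, pixels_b)):
        return False

    total = sum(len(row) for row in pixels_a)
    if total == 0:
        return True
    allowed = int(total * max_diff_ratio)

    differing = 0
    for row_a, row_b in zip(pixels_a, pixels_b):
        for pa, pb in zip(row_a, row_b):
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                differing += 1
                # Stop as soon as the threshold is exceeded.
                if differing > allowed:
                    return False
    return True
```

Checking the dimensions first and stopping once the threshold is exceeded keeps the cost low for pairs that are obviously different.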


I wrote a pure Java library just for this a few days back. You can feed it a directory path (sub-directories included), and it will list the duplicate images, with absolute paths, that you may want to delete. Alternatively, you can use it to find all unique images in a directory too.


If this does yield the same result for two files which compared different before, then metadata is in fact the source of your problem, and you can either use some command line approach like this, or update your code to read the image and compute the hash from the raw uncompressed data.


If, on the other hand, different files still compare different, then you have some changes to the actual image data. One possible cause might be the addition or removal of an alpha channel, particularly if you are dealing with PNG here. With JPEG, on the other hand, you'll likely have images uncompressed and then recompressed again, which will lead to slight modifications and data loss. JPEG is an inherently lossy codec, and any two images will likely differ unless they were created using the same application (or library), with the same settings and from the same input data. In that case you'll need to perform a fuzzy image matching. Tools like Geeqie can perform such things. If you want to do this yourself, you'll have a lot of work ahead of you, and should do some research up front.


It's been a long time, so I should probably explain how I finally solved my problem. The real trick was to not use hashes to begin with and instead just compare the timestamps in the EXIF data. Given that these pictures were taken either by me or my wife, it would have been quite unlikely for different files to have the same timestamp, hence this simpler solution was actually much more reliable.
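A rough Python sketch of grouping files by timestamp, under stated assumptions: `read_timestamp` is a hypothetical callable that returns the EXIF capture time for a path (or `None` when the file has none), and how it reads the EXIF data is deliberately left out. This is not the code actually used above, only an illustration of the approach.

```python
from collections import defaultdict


def group_by_timestamp(paths, read_timestamp):
    """Group paths that share the same capture timestamp.

    read_timestamp is a hypothetical callable: path -> timestamp or None.
    """
    groups = defaultdict(list)
    for path in paths:
        stamp = read_timestamp(path)
        if stamp is None:
            continue  # no timestamp available: leave for manual review
        groups[stamp].append(path)
    # Only timestamps shared by two or more files are duplicate candidates.
    return {
        stamp: sorted(files)
        for stamp, files in groups.items()
        if len(files) > 1
    }
```

Files without a timestamp are skipped rather than grouped together, since a missing value says nothing about whether two files are the same picture.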


I am wondering if there is a way to compare folders in Ubuntu? I have tried to organize my photo folders many times...and for this reason I have several folders that contain the same files (maybe a couple of extra ones) and it would be great to have a tool to figure out which files are extra and which files are identical.


diff will help you find duplicate files in two different directories, but if your mess is greater or if, for any other reason, you want to find duplicate (exact) image files in a whole directory, including subdirectories, you can use the gthumb image browser viewer, which is probably already installed in your system.
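For comparison, here is a small Python sketch of the two-directory case using the standard `filecmp` module. The function name `compare_folders` and the shape of the returned dictionary are assumptions made for illustration; the sketch only looks at the top level of each folder and compares file contents byte by byte.

```python
import filecmp


def compare_folders(dir_a, dir_b):
    """Report which top-level files are extra and which are identical."""
    listing = filecmp.dircmp(dir_a, dir_b)
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    identical, different, errors = filecmp.cmpfiles(
        dir_a, dir_b, listing.common_files, shallow=False
    )
    return {
        "only_in_a": sorted(listing.left_only),
        "only_in_b": sorted(listing.right_only),
        "identical": sorted(identical),
        "different": sorted(different + errors),
    }
```

Files with the same name but different content end up under `"different"`, so nothing is silently treated as a duplicate because of its name.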


Gthumb provides a tool to search for duplicate media/audio/video/images/text/all files in a directory. To do this, just select your directory in the view mode that displays a left pane with your directory tree, and then select Edit > Find duplicates... from the menu. A dialogue window shows the duplicates and lets you choose which file(s) to delete. This procedure is visual and helpful in many cases, but it is slow if you have many duplicate files to delete.


But just before the image search engines return matching results, they'll quickly test the uploaded image with a number of other images in their databases to ensure the most accurate results are served. Typically, when available, the search engines may make use of metadata of the image such as the file name of the image, date, camera used, etc.


Despite all these processes, our tool delivers results pretty fast. If there is no precisely matched result for the specific query, then the tool will track similar images from the search engines for you. To utilize our tool, there is no need to log in or register. There is no restriction on using the tool as you can perform an unlimited number of searches.


You can do a reverse image search not just on your desktop computers but on your smartphone devices as well. Today the sites are becoming more and more mobile-friendly, which is why people can put these online tools to use anywhere, anytime.


Some photo search engines also allow users to paste the URL of an image to search for it. Once you've provided the photo or its URL, the photo match tool will scan the internet for results matching it. Hence, searching with an image allows you to quickly access relevant information about a given photo, including information about the objects and people in it along with their corresponding metadata.


Well, despite the technicalities involved, this concept is pretty easy to understand: whereas in the standard search you type in keywords to find text-based content, to search by image, you only have to upload the photo you want to search for. And that brings us to an important point:


This could be the people, places, animals, products, etc. in the photo. By uploading a search query to your reverse image search engine, you will be able to identify those objects as the engine will return information about them.


So if, for example, you want to see different styles or colors of the exact same object in a photo, you can simply reverse search the photo to see that. The same search also shows whether the image appears anywhere else on the Internet, for example under a Creative Commons license.


If you are the original owner of a photo, you can simply perform an image search, on mobile as well, and find out who is using your artwork without giving you credit.


Just as with finding plagiarized photos, you can run an image search on your personal photos to see if anyone is using them on a fake social media account, for example with a Facebook image search. This protects your reputation and personal identity.


This image finder tool is free to use and is built to deliver up-to-date results, including images and their relevant information. The tool integrates with three of the biggest search engines in the world: Google, Bing, and Yandex. When you search for images, the tool pulls the picture-related information it can find from these three search engines and presents it to you. The tool is already used by hundreds of thousands of people around the globe.


The DupliChecker photo search app (tool) is built for people from all walks of life, including anyone who wants to know how to do an image search on the iPhone. Whether you want to use it for personal, professional, or commercial purposes, you are welcome to do so. We only ask that you use it for legitimate reasons. Below is a list of our most popular user groups.


Our platform requires that you upload an image or enter the path (URL) of an image to get the results you want. We would like to state here that after you have provided the image for search purposes, we do NOT store or share your photos. This means that your images are completely secure. We respect your privacy and will never violate it.


I am looking for a program that I can run on Windows that can check for duplicate images (or larger/smaller resolutions of the same image, possibly with different extensions), and if there are exact duplicates, replace all of them with a hard-link. Otherwise, if they are the same image but at different resolutions, I need to keep the largest one and replace the rest with hard-links.


I know that there are lots of programs to determine duplicate files and create hard-links. However I am unaware of any that can find duplicate images with different resolution and create hard-links based on a given parameter.


I ended up googling a bit more (after failing with Python for a while), and found a program called AllDup (freeware). It has successfully satisfied all of my needs, and more. There seem to be only two downsides to it. The first is a moderately complicated GUI (since it's not just for images, there is a large number of unrelated options). The second is that AllDup is not fully automated. However, it's as close as I need.


Not C#, but Python + Pillow can quickly walk the directory tree, extracting the image information and some sort of fingerprint, such as the MD5 of a fixed, reduced resolution of each image, to locate duplicates that differ only in scale. It can handle most image file types, other than raw, and you could also specify that it compare files that are in different formats, possibly with a preference for which to keep in the event of a duplicate of similar dimensions.
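A minimal sketch of such a fingerprint, written with the standard library only. It assumes each image has already been decoded to grayscale rows of 0-255 integers (for example with Pillow; that step is not shown), and the function name `reduced_fingerprint` and the 8x8 grid size are illustrative choices. Block averaging stands in here for whatever resampling an imaging library would do.

```python
import hashlib


def reduced_fingerprint(gray, size=8):
    """MD5 of an image reduced to a size x size grid by block averaging.

    gray: list of rows of 0-255 integers, at least size x size pixels.
    """
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the fingerprint grid")

    cells = bytearray()
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            total = sum(
                gray[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            # Integer average of the block, always within 0-255.
            cells.append(total // ((y1 - y0) * (x1 - x0)))
    return hashlib.md5(bytes(cells)).hexdigest()
```

An exact hash of the reduced grid only matches when the reduced values are identical, so copies that were recompressed or resampled with a different filter may still get different fingerprints; rounding the averages to coarser levels before hashing is one way to make the match more tolerant.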


It can also handle file deletion and hard link creation for you but before deletion I would suggest a compare of the pairs of images, scaled to the lower of the two resolutions, to make sure that they are duplicates.
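The hard-link step for exact duplicates could look roughly like the sketch below. It is an assumption-laden illustration, not a finished tool: the names `file_digest` and `hardlink_exact_duplicates` are made up here, only byte-identical files are handled (no resolution matching), the first path in sorted order is arbitrarily kept as the original, and all files are assumed to sit on one volume that supports hard links.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA1 of a file's bytes, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_exact_duplicates(root):
    """Replace byte-identical files under root with hard links."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)

    replaced = []
    for paths in by_digest.values():
        paths.sort()
        keeper = paths[0]  # arbitrary choice: first path in sorted order
        for dup in paths[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already a hard link to the keeper
            # Create the link under a temporary name, then swap it in,
            # so the duplicate is never missing if something fails.
            tmp = dup + ".tmp-link"
            os.link(keeper, tmp)
            os.replace(tmp, dup)
            replaced.append(dup)
    return replaced
```

Running it on a copy of the folder first is a sensible precaution, since replacing files with hard links means a later edit to one name changes all of them.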


Duplicate Photo Cleaner is the only software for managing duplicate and similar photos on Windows, Mac, and mobile phones you'll ever need. It's different from other duplicate photo finders because it compares photos just like a human would and detects similarities the smart way. With Duplicate Photo Cleaner, it's easy to find photos of the same subject, resized pictures, edited images, and more. It's also great for removing duplicate photos taken using your phone's Burst mode!


Duplicate Photo Cleaner is the only image similarity finder that works equally well on Windows and Mac. No matter which operating system you use, you can be absolutely sure that no duplicate photos will go unnoticed even if they hide in Adobe Lightroom, Mac Photos, or on your other connected device with a drive letter.
