Greetings, is there an app that searches for and (eventually) deletes duplicate files?
Mainly pictures (so files with picture-related extensions); however, an app that searches for all duplicate files in storage would be very handy.
Thanks @MeiRos,
I know a couple of those.
I was hoping for an app more integrated with the Nextcloud application.
One that allows me to decide what sort of duplicate files to search for (pictures/photos or other types).
Also one that makes it easy to choose which files to delete (picking the correct folder where the files live).
Hi Meonkeys,
I gave Duplicate Finder a try but I couldn't get it to work. It just spins the loading circle forever!
Do you have any tips?
I tried the occ command, but I only get the > prompt after sending the command with options (user and path).
PS
Sorry for the late response, but I gave up for a while!
File an issue or check their GitHub repo. Using search, reading the documentation, or just browsing the app store could have saved you a ton of time on this! I see it received an update for NC 22 as of yesterday. (As for the > prompt: a bare > after a shell command usually means the shell is still waiting for input, for example because of an unclosed quote in one of the options.)
There is also a simple duplicate tagger that lets you decide whether to keep or delete each file: GitHub - GAS85/nextcloud_scripts, a collection of scripts that makes Nextcloud administration easier through automation and better reporting.
This is not really a space issue, I have lots, but I have nearly 1000 ebooks that I sometimes file in multiple places. (No, I don't use the dreaded Kindle, thanks.) So I may download a book into a "holiday reading" folder, then I may also download it into the "publishers" folder, and at some point I may have reviewed it and moved it to the "author" folder, and then maybe I have forgotten that I downloaded it into the "lost interest in this" folder. So there could be anywhere from 1-4 versions of a book floating around.
If I search specifically by name it shows me all of the versions and then I can delete the ones that are in the "wrong" place, but I don't have the time or inclination (despite my love of procrastination by rearranging files) to search 1000 titles. So is there a simple Dropbox search function for duplicates, or do I have to download some kind of app that will do the job? Can you export file titles and locations to a spreadsheet? At least that would let me filter/sort and then go back and find the ones I want to delete.
If an app is required, I don't want one that automatically deletes duplicates; I want to pick which ones to keep. But if you have any recommendations for one that is easy to use and pretty much straightforward (nothing fancy required, just finding the duplicates, or files with similar names, since sometimes they get saved with slightly different titles), I'd appreciate the help.
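As far as I know there's no built-in duplicate search that does this, but if you keep a locally synced copy of the folder, the spreadsheet part is easy to script. Here is a minimal Python sketch (the `~/Dropbox/Books` path is hypothetical; adjust to your setup): it hashes every file, groups byte-identical copies, and writes the titles and locations of anything duplicated to a CSV you can filter and sort.

```python
import csv
import hashlib
from collections import defaultdict
from pathlib import Path

ROOT = Path("~/Dropbox/Books").expanduser()  # hypothetical folder; change to yours

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Group files by content hash, so copies match even under different names.
groups = defaultdict(list)
for p in ROOT.rglob("*"):
    if p.is_file():
        groups[file_digest(p)].append(p)

# Write only groups with more than one copy to a CSV for sorting/filtering.
with open("duplicates.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["hash", "title", "location"])
    for digest, paths in groups.items():
        if len(paths) > 1:
            for p in paths:
                writer.writerow([digest[:12], p.name, str(p.parent)])
```

Since multiple downloads of the same ebook are usually byte-identical, a content hash catches them even when the filenames differ slightly; copies saved under different titles from different sources (and therefore different bytes) would still need a name-based search.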
Awesome Duplicate Photo Finder is a free, powerful tool that helps you find and remove duplicate photos on your computer. With this app you can easily clean your photo collection of duplicates or even similar images.
This program is very easy to use. You can do everything you need with duplicate photos in just a couple of mouse clicks. Awesome Duplicate Photo Finder is able to compare resized pictures or even pictures with corrected colors (black and white photos, etc.). It supports all major image types: JPG, BMP, GIF, PNG, TIFF, CR2 (Canon RAW).
To get started, just drag some folders from Windows Explorer into the program's window, click the "Start" button, and have fun!
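For what it's worth, matching resized or color-corrected copies can't be done with byte-level checksums; tools in this category typically rely on some form of perceptual hashing. Here is a rough Python sketch of that general technique using the third-party Pillow and imagehash libraries (an illustration only, not this program's actual implementation; the filenames and threshold are made up).

```python
# pip install pillow imagehash  -- third-party libraries, not part of the tool above
from PIL import Image
import imagehash

# Perceptual hashes stay close for resized or lightly recolored copies,
# unlike byte-level checksums, which change on any re-encode.
hash_a = imagehash.average_hash(Image.open("holiday_001.jpg"))
hash_b = imagehash.average_hash(Image.open("holiday_001_small.jpg"))

# Subtracting two hashes gives their Hamming distance:
# 0 means identical, small values mean visually similar.
if hash_a - hash_b <= 5:  # the threshold is a judgment call
    print("Probably the same picture")
```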
There are already utilities which will compare files to see if they're identical, but they don't cope well with Lightroom images. Because Lightroom can write metadata back into the files, identical photos can be different on disk because of something as simple as a 'last edit time' value. They also don't help where one image has been changed, for example scaled, but is still essentially the same image.
By using the EXIF data, this plugin finds all photos which came from the same source; it compares the camera model, serial number, lens, ISO speed, etc. It can also filter by keywords and filename to narrow results down further.
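As a rough illustration of the EXIF approach (the plugin itself runs inside Lightroom, so this Python sketch using Pillow is only an analogy; the folder name and the particular tags compared are assumptions, a subset of what the plugin checks):

```python
# pip install pillow -- an analogy in Python, not the plugin's actual code
from collections import defaultdict
from pathlib import Path
from PIL import Image
from PIL.ExifTags import TAGS

def exif_signature(path: Path) -> tuple:
    """Build a signature from EXIF fields that survive pixel edits and rescaling."""
    exif = Image.open(path).getexif()
    named = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    # Camera model plus the camera-written timestamp identifies the shot
    # even when the pixels themselves have been edited or resized.
    return (named.get("Model"), named.get("DateTime"))

# Group photos that share a signature; groups of 2+ are duplicate candidates.
groups = defaultdict(list)
for p in Path("photos").rglob("*.jpg"):  # hypothetical folder
    groups[exif_signature(p)].append(p)

for sig, paths in groups.items():
    if len(paths) > 1:
        print(sig, "->", [str(p) for p in paths])
```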
Once you've identified all the duplicates, using custom matching rules if need be, you can safely and easily choose which to delete. DON'T trust anything that offers to automatically delete all your duplicates; you WILL lose important photos! This plugin is safe. It groups all combinations of duplicates together so that you can choose individually, or do a mass-delete safely, guaranteeing to leave at least one copy of each duplicate set. See here for more information.
Over time your Lightroom catalog can accumulate damaged records. These usually cause no problems - unless you try to access that photo in Lightroom, when strange things can start to happen. If you're seeing this, you need to try the Duplicate Finder's "Repair Catalog Damage" function.
Since it accesses your entire catalog at once, the Duplicate Finder is - unfortunately - very good at identifying the faulty records in a damaged catalog. It will now quickly and simply identify exactly which records are faulty and isolate them for you, allowing you to easily repair many types of catalog damage by removing and re-importing the affected photos. See over at Learn to Lightroom for more details.
The duplicate finder is an invaluable tool in JabRef. It would be great if the user had more control over the tool, for a thorough search for duplicates, just like Bookends. Searching by just the title, or by author, year, and combinations of them, would give the user the power to clean up duplicates.
You can nonetheless help us by reporting false positives (i.e., different entries which JabRef marks as duplicates) or duplicates which JabRef does not recognize. Then we can use these cases to fine-tune the duplicate-detection algorithm.
So far, there are no false positives. That is, there is minimal chance that JabRef detects non-duplicates as duplicates. Given the Merge option, false positives are not really much of a worry. And the comparison feature at that point is amazing. I totally love that part.
This feature does not appear to have been addressed in 4 years, and with the increased focus on systematic literature reviews I have seen over that period, I think it is something that needs to be revived and worked on.
From my observation, the duplicate detection algorithm appears to try to be too clever for its own good, using some form of overall similarity measure with a few exceptions or weightings on specific data fields. While this minimizes false positives, it leaves many potential duplicates completely unidentified. During a systematic literature review, false positives are the least of my worries, as they are minimal compared to the number of actual duplicates that occur, so I think the feature would be better served by more complete identification of potential duplicates.
Some databases create BibTeX entries differently from others, in which case duplicates are completely missed because the entries are not of the same type, even when most other details match. Springer Link, for example, assigns conference papers as InCollection as opposed to InProceedings, so matching against other databases does not identify the duplicates correctly. About the only information that seems to override this is an exactly matching DOI, which leads to the next problem.
It would be nice if everyone output DOI identifiers in exactly the same way; unfortunately, this is not the case. Some databases output the DOI as a URL (sometimes in the url field rather than the doi field) instead of the plain identifier by itself. Since there is currently no quality tool to normalise DOIs, this causes problems with matching them. Also, some databases escape LaTeX special characters in DOIs, such as underscores (\_ instead of just _), which also prevents the DOIs from being matched. Ideally, the bare identifier, the URL forms, and the LaTeX-escaped variants should all be considered matching, as sketched below.
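As a sketch of what such normalisation could look like (this is not an existing JabRef routine, just a Python illustration of the rules described above; the DOI in the example is made up):

```python
import re

def normalize_doi(raw: str) -> str:
    """Reduce the DOI variants described above to one comparable form."""
    doi = raw.strip()
    # Strip URL prefixes such as https://doi.org/ or http://dx.doi.org/.
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi, flags=re.IGNORECASE)
    # Undo LaTeX escaping of special characters such as \_ for _.
    doi = doi.replace(r"\_", "_")
    # DOIs are case-insensitive, so compare in a single case.
    return doi.lower()

# All three variants collapse to the same identifier.
assert normalize_doi(r"https://doi.org/10.1000/ABC\_123") == \
       normalize_doi("http://dx.doi.org/10.1000/abc_123") == \
       normalize_doi("10.1000/ABC_123")
```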
Some databases export a bunch of their own metadata along with the core BibTeX fields for each entry. This appears to interfere with the duplicate detection, as it increases the degree of dissimilarity, especially if the entry it is compared to is very small. Web of Science, for example, is particularly bad for creating bloated entries with unnecessary details. It is even worse when two databases export information under the same field name but in different ways, so the content is completely different: for example, one using the notes field for citation details and another using it for a comment like 'online first'.
At least in the context of a systematic literature review, it might be worth having the duplicate detection put its emphasis on a few key fields (such as author, title, date, doi, and url) irrespective of entry type, as in the sketch below. I think this would maximize candidate detection, which in this context matters more than minimizing false positives. If general similarity measures are used on those fields, the thresholds might need to be set appropriately high, but I would assume that, after appropriate normalisation, those fields would be almost exact matches in duplicate entries.
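To make that concrete, here is a hedged Python sketch of field-focused matching (entries as plain dicts; the threshold, the SequenceMatcher choice, and the sample entries are my assumptions, not JabRef's actual detector):

```python
from difflib import SequenceMatcher

# Key fields compared regardless of entry type (InCollection vs InProceedings etc.).
KEY_FIELDS = ("author", "title", "year", "doi", "url")

def field_similarity(a: str, b: str) -> float:
    """Normalized similarity ratio between two field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def likely_duplicates(entry_a: dict, entry_b: dict, threshold: float = 0.9) -> bool:
    # An exactly matching (normalized) DOI settles it immediately.
    if entry_a.get("doi") and entry_a.get("doi") == entry_b.get("doi"):
        return True
    # Otherwise require every key field present in both to be a near-exact match.
    shared = [f for f in KEY_FIELDS if entry_a.get(f) and entry_b.get(f)]
    return bool(shared) and all(
        field_similarity(entry_a[f], entry_b[f]) >= threshold for f in shared
    )

print(likely_duplicates(
    {"title": "A Study of Things", "author": "Doe, J.", "year": "2020"},
    {"title": "A study of things", "author": "Doe, J.", "year": "2020"},
))  # True: case differences fall within the threshold
```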
It would be very nice to have a comprehensive list of found duplicates, like a duplicate finder report. Then I could check or uncheck the duplicates I want to process, and the results would be more transparent.
If I select a folder for the "Search directorys" option whose name starts with a hash character (#), the app instead selects the next directory above it that does not start with a hash, and there seems to be no way around it... It would be great if that could be fixed, or I'll just have to change my directory structure. ^^