Hello everyone,
I would like to clean up my corpus a bit by removing duplicate files. Could anyone recommend some software that could do the job?
Support for plain-text (.txt) files alone would be sufficient, though PDF support would also be great. I want to automatically find duplicate academic journal articles that I and others have downloaded.
I looked online, but there are too many options, and I don't like downloading utility-type software from untrusted sources. Most of the programs I found also seem to be designed to find duplicate lines of text within a program rather than duplicate files.
Adam