How to find duplicate PDF, txt, or doc files?


adam turner

Jan 16, 2017, 2:02:52 AM
to AntConc-Discussion

Hello everyone,

I would like to clean up my corpus a bit by removing duplicate files. Could anyone recommend some software that could do the job?
Support for txt files alone would be sufficient, but PDF support would be great too. I want to automatically find duplicate academic journal articles that I and others have downloaded.

I looked on the internet, but there are too many options, and I don't like downloading utility-type software from untrusted sources. Most programs also seem to be designed to find duplicate lines of text in a program, not duplicate files.

Adam 

Laurence Anthony

Jan 16, 2017, 7:10:09 AM
to ant...@googlegroups.com
Hi,

I've used the following tool in the past:


As you say, though, I cannot say for certain whether the software can be trusted. You would have to use it at your own risk.

Laurence.


JFlorian

Jan 16, 2017, 8:41:55 AM
to ant...@googlegroups.com
Perhaps I'm not considering the bigger picture, but... why not just use Windows search?
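
Another option that avoids downloading any utility is a short Python script that groups files by a hash of their contents. What follows is only a minimal sketch, not a tool anyone in this thread has recommended: the names `file_digest` and `find_duplicates` and the default extension list are illustrative assumptions. It only catches exact byte-for-byte duplicates, so two PDFs of the same article downloaded from different sites, or two txt files that differ in whitespace or encoding, would probably not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, extensions=(".txt", ".pdf", ".doc")):
    """Group files under `root` by content hash.

    Returns a list of groups; each group is a sorted list of paths
    whose contents are byte-for-byte identical. The extension list
    is an assumption and can be changed to suit the corpus.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):  # walks subfolders as well
        if path.is_file() and path.suffix.lower() in extensions:
            groups[file_digest(path)].append(path)
    # Keep only hashes shared by two or more files.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("my_corpus")` (where `my_corpus` is a placeholder folder name) returns one list per set of identical files. The script deletes nothing, so the results can be checked by hand before anything is removed.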