It seems apparent that Duke has basically been abandoned: there have been no updates since 2016 and information about it is pretty sparse. I needed this functionality, but numerous bugs made it difficult to achieve what I wanted. With version 1.2 I could build the index (it took about 12 hours) and it would then run, but the --noreindex option would produce an error, and I could not afford to reindex every time. I noticed that Duke 1.3 was in the source repo along with build instructions, so I tried that, but the build failed with a bunch of missing libraries that I could not locate. I found a built version here:
https://oss.sonatype.org/content/repositories/snapshots/no/priv/garshol/duke/duke/1.3-SNAPSHOT/
This version worked with --noreindex, but would not actually build the index without running out of memory. So I was planning to build the index with 1.2 and then match against it with 1.3.
If you still want to use Duke at this point, you can probably get it to work by going through all that.
I wondered why this was the only real open source deduplication tool I could find. I also saw that some of the config files turned off fuzzy matching in Lucene, so I went to look into Lucene. It turns out that Lucene and Solr have basically all the functionality of Duke and are well maintained and documented. Solr is a search engine based on Lucene that supports "fuzzy matching", so if you load your data into it and then search against it, it will give you the matches (duplicates). It has all the same matching engines as Duke plus a whole lot more.
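As a rough illustration of the "load your data, then search against it" idea, here is a minimal Python sketch that only builds a fuzzy query and the matching request URL for Solr's standard query parser, where `term~2` means "match terms within 2 edits". The core name `records`, the field name `name`, the host and the helper function names are all assumptions made up for this example; nothing here contacts a server.

```python
from urllib.parse import urlencode

# Characters with special meaning in the Lucene/Solr standard query syntax.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape query syntax characters inside a single term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def fuzzy_query(field: str, value: str, max_edits: int = 2) -> str:
    """Build a query that fuzzy-matches every whitespace-separated token.

    Fuzzy search in the standard query parser allows an edit distance
    of 0, 1 or 2 per term.
    """
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1 or 2")
    terms = [escape_term(t) for t in value.split()]
    return " AND ".join(f"{field}:{t}~{max_edits}" for t in terms)


def select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build the URL for a request to the core's /select handler."""
    # fl=*,score asks Solr to return the relevance score with each hit.
    params = urlencode({"q": query, "rows": rows, "fl": "*,score"})
    return f"{base_url}/{core}/select?{params}"


if __name__ == "__main__":
    # Hypothetical record to check for duplicates.
    q = fuzzy_query("name", "Jon Smith")
    print(q)
    print(select_url("http://localhost:8983/solr", "records", q))
```

You would run one such query per incoming record and treat high-scoring hits as candidate duplicates. Where to set the score cutoff depends on your data, so that part is left out.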
So I expect the reason everyone lost interest in Duke is that they figured this out and Duke is now redundant. Unfortunately it took me a week to realize this. Hopefully I can save anyone who reads this some time.