I don't mean to dredge up ancient history, but this post answered a question I had, and I wanted to add a few keywords to it:
My question was whether Unicode, UTF-8/16/etc., double-byte characters, and non-English characters are compatible with any/all of the algorithms in Dedupe. It sounds like the answer is YES! I would expect something like affine gap to carry over fine, but some algorithms might fail miserably (n-char prefix, maybe?).
I suppose there may also be special considerations for addresses and other fields that don't follow US/English conventions.
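
To make the n-char prefix worry concrete, here is a minimal sketch in plain Python. It does not use Dedupe's code: `n_char_prefix` is a hypothetical stand-in I wrote for illustration, and I have not checked how Dedupe itself builds prefixes. It only shows that slicing a Python `str` works on code points (so multi-byte text is not cut mid-character), and that the case I would watch for is the same text stored in composed vs. decomposed form.

```python
import unicodedata


def n_char_prefix(value: str, n: int = 3) -> str:
    """Hypothetical stand-in for an n-character prefix key (not Dedupe's code)."""
    # Python str slicing counts Unicode code points, not bytes.
    return value[:n]


# Same visible word, two encodings of the accented letters.
composed = "\u00c5ngstr\u00f6m"  # 'Å' and 'ö' as single code points (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # base letter + combining mark

# Without normalization the prefixes differ, because the combining ring
# takes up one of the three slots in the decomposed form.
raw_match = n_char_prefix(composed) == n_char_prefix(decomposed)

# Normalizing both sides to the same form first makes the prefixes agree.
norm_match = n_char_prefix(unicodedata.normalize("NFC", composed)) == n_char_prefix(
    unicodedata.normalize("NFC", decomposed)
)

# CJK text slices per character, regardless of how many bytes each one needs.
cjk_prefix = n_char_prefix("東京都港区", 2)
```

Here `raw_match` comes out `False` and `norm_match` comes out `True`, so if anything did go wrong with prefixes on non-English data, my guess is that normalizing the fields (for example with `unicodedata.normalize`) before handing them over would be the first thing to try.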
Cheers!