Happy New Year, Chris (and all Cascading folks)!
We are looking into making binary comparators easier to use. The current API is pretty good: if we pass you a Comparator that implements StreamComparator, we are good to go.
Serializers are a different story. First, as far as I know, the only way to set them up is via Hadoop's reflection-based mechanism (which dispatches on .getClass of the data). Second, the two pieces of code that need to know how data is serialized live far apart: one in the Hadoop config, the other in the Fields comparators.
It occurs to me that it would be nice if I could pass a Comparator that also implements cascading.tuple.Serializer (for instance). Then the Comparator, StreamComparator, and Serializer for my data type would all live in one place.
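Roughly, I'm imagining something like the minimal sketch below, in plain Java so it compiles on its own. FieldSerializer and RawFieldComparator are hypothetical stand-ins for the proposed cascading.tuple.Serializer and for a StreamComparator-style raw comparator; they are not existing Cascading interfaces, and LongFieldComparator is just an illustration of one class owning comparison and serialization for a single type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Comparator;

class ComparatorSerializerSketch {

  // Hypothetical stand-in for the proposed cascading.tuple.Serializer.
  interface FieldSerializer<T> {
    void serialize(T value, OutputStream out) throws IOException;

    T deserialize(InputStream in) throws IOException;
  }

  // Hypothetical stand-in for a raw-bytes comparator in the spirit of StreamComparator.
  interface RawFieldComparator {
    int compareStreams(InputStream lhs, InputStream rhs) throws IOException;
  }

  // Illustrative: one class owns object comparison, raw comparison and serialization for longs.
  static final class LongFieldComparator
      implements Comparator<Long>, RawFieldComparator, FieldSerializer<Long> {

    @Override
    public int compare(Long a, Long b) {
      return Long.compare(a, b);
    }

    // Reads one serialized long from each stream and compares the primitive values.
    @Override
    public int compareStreams(InputStream lhs, InputStream rhs) throws IOException {
      return Long.compare(new DataInputStream(lhs).readLong(), new DataInputStream(rhs).readLong());
    }

    // Fixed 8-byte big-endian encoding; no type token, since the field's type is known statically.
    @Override
    public void serialize(Long value, OutputStream out) throws IOException {
      new DataOutputStream(out).writeLong(value);
    }

    @Override
    public Long deserialize(InputStream in) throws IOException {
      return new DataInputStream(in).readLong();
    }
  }

  private static final LongFieldComparator LONGS = new LongFieldComparator();

  // Convenience wrappers so callers need not handle checked IOExceptions.
  static byte[] toBytes(long value) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try {
      LONGS.serialize(value, buf);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  static long fromBytes(byte[] bytes) {
    try {
      return LONGS.deserialize(new ByteArrayInputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static int compareBytes(byte[] lhs, byte[] rhs) {
    try {
      return LONGS.compareStreams(new ByteArrayInputStream(lhs), new ByteArrayInputStream(rhs));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point is only that a single object answers every "how is this field represented?" question, so nothing about it needs to live separately in the Hadoop config.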
This also gives a nice side benefit in performance: since I am statically telling you how to serialize a particular field, you don't need to write that field's type token into each record. That can give significant savings, especially for wide records of small data types.
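To put a rough number on it, here is a toy comparison that assumes, purely for illustration, a one-byte type token in front of each field and a fixed four-byte int encoding. The token value and layout are made up for this sketch and are not the actual Cascading or Hadoop wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TypeTokenOverhead {

  // Illustrative token value; not the actual encoding used by Cascading or Hadoop.
  static final int INT_TOKEN = 1;

  // Dynamic dispatch: each field is preceded by a one-byte token naming its type.
  static byte[] writeWithTokens(int[] row) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    try {
      for (int v : row) {
        out.writeByte(INT_TOKEN);
        out.writeInt(v);
      }
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Static dispatch: the reader already knows every field is an int, so only values are written.
  static byte[] writeWithoutTokens(int[] row) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    try {
      for (int v : row) {
        out.writeInt(v);
      }
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }
}
```

Under these assumptions a ten-field row of ints takes 50 bytes with tokens and 40 without, so the tokens are 20% of the record; for one-byte fields the same scheme would double the record size.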
The .getClass-based dispatch also has significant issues, which I won't go into in this email, and the above solution sidesteps those as well.
The nice part is that this proposal would be fully API compatible, since users would opt in by passing a Comparator that also implements the serializer interface.
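The opt-in could be as simple as an instanceof check wherever a field comparator is picked up. In the sketch below, FieldSerializer and choosePath are hypothetical names used only for illustration; I'm not claiming this is where or how Cascading would actually make the decision.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

class OptInDispatch {

  // Hypothetical stand-in for the proposed serializer interface.
  interface FieldSerializer<T> {
    byte[] serialize(T value);
  }

  // An ordinary comparator, as users pass today; nothing changes for it.
  static final Comparator<String> PLAIN = Comparator.naturalOrder();

  // A comparator that opts in by also implementing FieldSerializer.
  static final class StringFieldComparator
      implements Comparator<String>, FieldSerializer<String> {

    @Override
    public int compare(String a, String b) {
      return a.compareTo(b);
    }

    @Override
    public byte[] serialize(String value) {
      return value.getBytes(StandardCharsets.UTF_8);
    }
  }

  // Hypothetical: the check Cascading could make when it sees a field comparator.
  static String choosePath(Comparator<?> comparator) {
    if (comparator instanceof FieldSerializer) {
      return "static"; // serialize with the comparator itself; no type token needed
    }
    return "dynamic"; // fall back to the existing getClass-based serializer lookup
  }
}
```

Existing code that passes plain comparators keeps taking the "dynamic" path, which is what makes the change backward compatible.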
We could sketch an implementation of this, but I know you prefer to write the Cascading code yourself, so I thought we should discuss it before any code is written.
Best,
--