Hi Guys,
New group member here, and I figured I would ask a question by means of introduction. :-)
How do folks tend to think about case sensitivity and Unicode in their API design?
vs
My leaning is towards something fairly basic, but I wanted to get some outside opinions. My thoughts at this point:
Customer data should be case sensitive; everything else should be case insensitive.
This means the value below would be case sensitive with "Chris" as the name:
(easy and obvious via query parameters)
(not as easy or obvious in the path)
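A rough sketch of that rule for the query-parameter case, in Python. The `parse_query` helper and the `name=Chris` query string are made up for illustration and are not taken from any real API: parameter names are treated as API vocabulary and lower-cased, while values are treated as customer data and left alone.

```python
from urllib.parse import parse_qsl


def parse_query(query: str) -> dict[str, str]:
    """Hypothetical helper: fold parameter names to lower case, keep values as sent."""
    # Keys belong to the API, so they compare case-insensitively.
    # Values are customer data, so their case is preserved.
    return {key.lower(): value for key, value in parse_qsl(query)}


print(parse_query("NAME=Chris") == parse_query("name=Chris"))  # same request
print(parse_query("name=Chris") == parse_query("name=chris"))  # different customer data
```

The path case is harder because nothing in the URL itself marks which segments are API vocabulary and which are customer data, so the router would need to know that per route.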
Now, to complicate things, Unicode comes into play. My leaning is towards saying "All string comparisons MUST be done using only Unicode Normalization Form KC (NFKC)". I've not seen anyone else do this in their APIs, which has me questioning whether it's the right direction. How (if at all) are folks dealing with Unicode, and especially its normalization aspects?
Comparing non-normalized Unicode strings seems like a problem that must be avoided. Having two names that look the same but compare differently seems like a recipe for disaster. For example, the string "fi" can be represented either by the characters "f" and "i" (U+0066 U+0069) or by the single ligature character "ﬁ" (U+FB01). Similar considerations apply to many other character combinations for which Unicode defines ligatures, and the mess that would make of my database, plus the complications for callers doing searches, is just scary.
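To make the ligature example concrete, here is a minimal sketch in Python using the standard library's `unicodedata.normalize`. The `nfkc_equal` helper name is mine, purely for illustration; it normalizes both sides to NFKC before comparing and does no case folding, so customer data stays case sensitive.

```python
import unicodedata


def nfkc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


plain = "\u0066\u0069"  # "f" followed by "i"
ligature = "\ufb01"     # the single "fi" ligature character

print(plain == ligature)            # raw code point comparison
print(nfkc_equal(plain, ligature))  # comparison after NFKC
print(nfkc_equal("Chris", "chris")) # case is still significant
```

The same idea would apply at write time: normalizing once before storing means the database only ever holds one form, and searches only need to normalize the caller's input.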
Cheers,
Chris