Case Sensitivity and Unicode Normalization in APIs?


Chris Mullins

Jan 27, 2015, 2:07:09 PM
to api-...@googlegroups.com
Hi Guys,

New group member here, and I figured I would ask a question by way of introduction. :-)

How do folks tend to think about case sensitivity and Unicode in their API design? 


vs


My leaning is towards something fairly basic, but I wanted to get some outside opinions. My thoughts at this point:

Customer data should be case sensitive; everything else should be case insensitive.

This means the value below would be case sensitive with "Chris" as the name:

(easy and obvious via query parameters)  

(not as easy or obvious in the path)
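
A minimal sketch of the query-parameter case, assuming Python's standard `urllib.parse` module. The helper name `parse_query` and the `api.example.com` URLs are hypothetical and only illustrate the rule: parameter names are lowercased, while values (the customer data) are kept exactly as sent.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_query(url: str) -> dict:
    """Lowercase parameter names; keep parameter values byte-for-byte as sent."""
    query = urlsplit(url).query
    return {name.lower(): value for name, value in parse_qsl(query)}

# Hypothetical URLs, for illustration only.
a = parse_query("https://api.example.com/customers?Name=Chris")
b = parse_query("https://api.example.com/customers?name=Chris")
c = parse_query("https://api.example.com/customers?name=chris")

same_name_key = (a == b)      # parameter name differs only in case
same_customer = (a == c)      # value differs in case, so it is a different name
```

Here `a` and `b` compare equal because only the parameter name changed case, while `c` does not match because the value "chris" is treated as different customer data from "Chris". Doing the same for a value embedded in the path would need the router to know which path segments are data, which is why that case is less obvious.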

Now, to complicate things, Unicode comes into play. My leaning is towards saying "All string comparisons MUST be done using only Unicode Normalization Form KC". I've not seen anyone else do this in their APIs, which has me questioning whether it's the right direction. How (if at all) are folks dealing with Unicode, and especially with its normalization aspects?

Dealing with non-normalized Unicode strings seems like a problem that must be avoided. Having two names that look the same but compare differently is a recipe for disaster. For example, the string "fi" can be represented either by the characters "f" and "i" (U+0066 U+0069) or by the single ligature character "ﬁ" (U+FB01). Similar considerations apply to many other character combinations for which Unicode defines ligatures. The mess that would make of my database, and the complications for callers doing searches, are just scary.
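
To make the ligature example concrete, here is a minimal sketch assuming Python's standard `unicodedata` module. The helper name `nfkc_equal` is hypothetical; it simply normalizes both sides to NFKC before comparing.

```python
import unicodedata

def nfkc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to Normalization Form KC."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

plain = "\u0066\u0069"   # "f" followed by "i"
ligature = "\ufb01"      # the single ligature code point U+FB01

raw_equal = (plain == ligature)                 # code-point comparison
normalized_equal = nfkc_equal(plain, ligature)  # comparison after NFKC

# NFKC also composes combining sequences, so a precomposed "é" (U+00E9)
# and "e" followed by a combining acute accent (U+0301) compare equal.
accent_equal = nfkc_equal("\u00e9", "e\u0301")
```

Without normalization the two spellings of "fi" are different strings; after NFKC they compare equal. Note that NFKC does not fold case, so case handling stays a separate decision.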

Cheers,
Chris

sune jakobsson

Jan 28, 2015, 2:08:52 AM
to api-...@googlegroups.com
You need to distinguish between "match" and "search": are you testing for equality, or are you looking for something where your local language tolerates or uses different spellings?
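
One way to sketch that distinction, assuming Python's standard `unicodedata` module and `str.casefold()`. The function names `exact_match`, `search_key`, and `loose_search` are hypothetical, and real language-aware search would need more than case folding.

```python
import unicodedata

def exact_match(stored: str, query: str) -> bool:
    """'Match': equality after NFKC normalization, case preserved."""
    return unicodedata.normalize("NFKC", stored) == unicodedata.normalize("NFKC", query)

def search_key(s: str) -> str:
    """'Search': NFKC plus case folding; renormalize because folding can denormalize."""
    folded = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", folded)

def loose_search(names: list, query: str) -> list:
    """Return the stored names whose search key contains the query's search key."""
    key = search_key(query)
    return [name for name in names if key in search_key(name)]
```

With this split, `exact_match("Chris", "chris")` is false, while `loose_search(["Chris", "CHRISTINE", "Sune"], "chris")` returns both "Chris" and "CHRISTINE".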

BR Sune


Chris Mullins

Jan 28, 2015, 7:29:54 PM
to api-...@googlegroups.com
Hi Sune,

APIs must deal with the many aspects of localization and the complexity that comes with it. That's only possible, though, if proper platform features (such as normalization) are enforced. I've not heard anyone argue against Normalization Form KC, but I'm never sure whether that's simply because people don't know what it is and what it implies, or because the answer is an unequivocal yes.

The discussions I've had about "Should APIs be case sensitive?" have been mixed, and I'm hoping to gain some additional insight into the problem from folks here.

Cheers,
Chris