proposal: case-insensitive enum

Ryan Pedela

Sep 27, 2013, 2:23:06 PM
to json-...@googlegroups.com
First of all, I want to say how impressed I am by the current standard. My particular validation use case is quite complicated, and so far I have only experienced one limitation: case-insensitive enum.

The background can be found in this issue for the Node.js tv4 module.
https://github.com/geraintluff/tv4/issues/70

As I mention in the issue, I am working on a REST API for PostgreSQL and I want to support using case-insensitive SQL keywords just like in raw SQL. Let me give you a more specific example. Let's say you have a simple SELECT statement with a WHERE clause.

SELECT * FROM my_table WHERE my_col IN (1, 2, 3);

The JSON for this SQL query is below. $ means this is something special in a SQL expression such as a SQL operator like "IN". In this case, I would like to support "$IN" and "$in" since it is convention when writing SQL to use upper-case for keywords, but it is not required.

{
  "select": "*",
  "from": "my_table",
  "where": {
    "$expr": [ { "$column": "my_col" }, "$in", [ 1, 2, 3 ] ]
  }
}

In my opinion, the simplest solution is just to add a "caseInsensitive" flag or something similar to the standard. There are other use cases for a case-insensitive enum too. As geraintluff mentions in the GitHub issue, this could also be applied to "pattern" for case-insensitive pattern matching.
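A rough sketch of what such a flag could mean inside a validator. The "caseInsensitive" keyword is only the proposal from this post, not part of any JSON Schema draft, and matchesEnum and operatorSchema are hypothetical names made up for illustration. It only handles primitive enum values and uses locale-independent lower-casing, which is enough for ASCII keywords like "$IN".

```javascript
// Hypothetical: "caseInsensitive" is the flag proposed in this thread,
// not a keyword in any published JSON Schema draft.
function matchesEnum(value, schema) {
  const allowed = schema.enum || [];
  if (!schema.caseInsensitive || typeof value !== "string") {
    // Without the flag: exact match only (primitive values, for brevity).
    return allowed.includes(value);
  }
  // Locale-independent lower-casing; fine for ASCII keywords such as "$IN".
  const folded = value.toLowerCase();
  return allowed.some(
    (item) => typeof item === "string" && item.toLowerCase() === folded
  );
}

const operatorSchema = { enum: ["$in", "$eq", "$lt"], caseInsensitive: true };

console.log(matchesEnum("$IN", operatorSchema)); // true
console.log(matchesEnum("$in", operatorSchema)); // true
console.log(matchesEnum("$like", operatorSchema)); // false
console.log(matchesEnum("$IN", { enum: ["$in"] })); // false, flag not set
```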



Geraint

Sep 27, 2013, 4:09:27 PM
to json-...@googlegroups.com
The biggest challenge I can see with something like this is locales.

The classic case is the "Turkish i" situation. In English, lower-case "i" (U+0069) is case-insensitively equivalent to "I" (U+0049). However, in Turkic languages, the lower-case dotted "i" (U+0069) is case-insensitively equivalent to an upper-case dotted "İ" (U+0130), and the upper-case undotted "I" (U+0049) is equivalent to the lower-case undotted "ı" (U+0131). Lithuanian also has some interesting behaviour with "i".

The upshot is that you can't definitely say whether "getir" is case-insensitively equivalent to "GETIR" unless you know what language you're working in.
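To make that concrete, here is a small sketch in JavaScript. It assumes an engine that accepts a locale argument for toLocaleLowerCase() and ships Turkish case rules, which not every environment does; equalIgnoringCase is a hypothetical helper written for this example.

```javascript
const upper = "GETIR";

// Locale-insensitive mapping: "I" (U+0049) becomes "i" (U+0069).
console.log(upper.toLowerCase()); // "getir"

// With Turkish rules, "I" (U+0049) maps to dotless "ı" (U+0131) instead.
// The result depends on the engine supporting the "tr" locale.
console.log(upper.toLocaleLowerCase("tr"));

// Hypothetical helper: compare two strings after lower-casing,
// optionally under the rules of a given locale.
function equalIgnoringCase(a, b, locale) {
  return locale
    ? a.toLocaleLowerCase(locale) === b.toLocaleLowerCase(locale)
    : a.toLowerCase() === b.toLowerCase();
}

console.log(equalIgnoringCase("getir", "GETIR")); // true
// Under Turkish rules the answer can differ from the line above.
console.log(equalIgnoringCase("getir", "GETIR", "tr"));
```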

There's also the German "ß", which when capitalised becomes "SS". This means that "MOSSE" is case-insensitively equivalent to both "Moße" and "Mosse" - but are those two case-insensitively equivalent to each other?
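A quick JavaScript sketch of that ambiguity, using only the built-in toUpperCase() and toLowerCase(). The helper names viaUpper and viaLower are made up for this example. Whether two strings count as "equal ignoring case" depends on which direction you fold them.

```javascript
console.log("Moße".toUpperCase()); // "MOSSE"
console.log("Mosse".toUpperCase()); // "MOSSE"

// Two plausible definitions of case-insensitive equality.
const viaUpper = (a, b) => a.toUpperCase() === b.toUpperCase();
const viaLower = (a, b) => a.toLowerCase() === b.toLowerCase();

console.log(viaUpper("Moße", "Mosse")); // true
console.log(viaLower("Moße", "Mosse")); // false: "moße" vs "mosse"
```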

For your particular case, these aren't a problem - I think it's probably safe to say that the SQL keywords "INSERT" and "IN" are in English, and therefore unambiguously equivalent to "insert" and "in". However, it does present a possible challenge when defining a keyword like this in the standard.

Right now, my guess is you're using JavaScript's String.prototype.toLowerCase(), which is described as "locale-independent". What that actually means, though, is that it's locale-insensitive. Basically, implementations pick a "generic" locale (I'll give you one guess what that is) which is perceived to work OK for most cases.

So I'm not opposed to such a keyword in principle, it's just... possibly quite complicated to specify in practice.

Geraint

da...@fogmine.com

Oct 21, 2013, 11:26:48 PM
to json-...@googlegroups.com
+1. I would like a case-insensitive option as well. My current case is enums, but having it for patterns in general would be awesome!

Andrew Todd

Oct 22, 2013, 11:45:18 AM
to json-...@googlegroups.com
I agree with Geraint that this would get very complicated very quickly, and it could be difficult to implement consistently across validators written in different languages. More to the point, if you are defining a public interface, you should avoid confusion and unpredictability as much as possible, and case-insensitivity decreases predictability.

It's not entirely clear to me why you would want to try to shoehorn a relational language into a "REST API," but if you must, and you are convinced you should break out all keywords into property names, just tell people to use all-caps or all lower case. If they're incapable of understanding that, you should probably assume that they're incapable of using your API without messing things up.

Ryan Pedela

Oct 22, 2013, 12:07:46 PM
to json-...@googlegroups.com
Seems like those rules could just be put into the spec, since they appear to be IF statements. Granted, I am sure there are many rules, but it doesn't seem that crazy to me. Is it necessary to be 100% complete on day one? Why not progressively add rules as needed? Maybe this could be a shortcut: toLocaleLowerCase(). Unfortunately it is based on the OS settings, but maybe the source code would be helpful with the rules.

The REST API is just my most recent example use-case. There have been several times where I needed case-insensitive enum throughout my career.

The question of why we are building a REST API for PostgreSQL is not relevant to this discussion or JSON schema. You can contact me directly if anyone is interested.

Geraint

Oct 22, 2013, 12:20:00 PM
to json-...@googlegroups.com
Interesting point, actually - you want "IN" to be case-insensitive because it's like that in SQL.

However, SELECT, WHERE and FROM are also case-insensitive.  Do you need your JSON representation to allow a "Select" property instead of "select" (but not both)?

Personally, I think the cleanest solution here is to pick a case for each keyword and stick with it.  (My feeling is that lower-case is more JSONic). Clients can use toLocaleLowerCase() before they send the data off.

Now, your actual data parser might be more easygoing than the spec, but I think that often having a strict theoretical standard actually makes things easier for people using your API.
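As a sketch of that client-side step: the function below lower-cases "$"-prefixed tokens before the query is sent, so the schema itself can stay strict. normalizeOperators is a hypothetical helper, and it assumes (for illustration only) that any string or property name starting with "$" is an operator. It uses plain toLowerCase() since the SQL keywords are ASCII.

```javascript
// Lower-case every "$"-prefixed token (e.g. "$IN" -> "$in").
// Assumption for this sketch: "$"-prefixed strings are always operators.
function normalizeOperators(node) {
  if (typeof node === "string") {
    return node.startsWith("$") ? node.toLowerCase() : node;
  }
  if (Array.isArray(node)) {
    return node.map(normalizeOperators);
  }
  if (node !== null && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([key, value]) => [
        key.startsWith("$") ? key.toLowerCase() : key,
        // Column names under "$column" are data, so leave them untouched.
        key.toLowerCase() === "$column" ? value : normalizeOperators(value),
      ])
    );
  }
  return node; // numbers, booleans, null
}

const query = {
  select: "*",
  from: "my_table",
  where: { $EXPR: [{ $Column: "my_col" }, "$IN", [1, 2, 3]] },
};

console.log(JSON.stringify(normalizeOperators(query)));
// {"select":"*","from":"my_table","where":{"$expr":[{"$column":"my_col"},"$in",[1,2,3]]}}
```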

Ryan Pedela

Oct 22, 2013, 12:33:14 PM
to json-...@googlegroups.com
"Now, your actual data parser might be more easygoing than the spec, but I think that often having a strict theoretical standard actually makes things easier for people using your API."

Exactly! That is why I want case-insensitive enum. :)



Thanks,

Ryan Pedela
Datalanche CEO, co-founder
www.datalanche.com
rpe...@datalanche.com
513-571-6837


--
You received this message because you are subscribed to a topic in the Google Groups "JSON Schema" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/json-schema/V-ZAJfV7mt8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to json-schema...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Geraint

Oct 22, 2013, 12:42:47 PM
to json-...@googlegroups.com
Ah - by "strict", I meant "choose a case and stick with it".  So, it's stricter in the sense that it only allows one way to specify the same data - if a query is the same, then the JSON representation will also be the same, even if transcoded by different clients.

I realise this doesn't really solve your problem as such - but case-insensitivity is actually quite a burden to implement (and is a minefield of subtle bugs), and I'm not sure that it's actually required here at all.

After all, "select", "from" and "where" are fixed-case in your data - why should "in" be made case-insensitive?

Ryan Pedela

Oct 23, 2013, 10:56:12 AM
to json-...@googlegroups.com
I think these are the rules (easily parsed): http://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt
This talks about case mapping: http://www.unicode.org/reports/tr21/

Any other problems with implementation?
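For what it's worth, here is a sketch of how parsing could look. The two sample lines are written from memory in the file's layout (code; lower; title; upper; optional conditions; comment) and should be checked against the real file. parseSpecialCasing and toChars are names I made up.

```javascript
// Sample lines in the SpecialCasing.txt layout:
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
const sample = [
  "# Unconditional mapping",
  "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S",
  "# Conditional mapping (Turkish)",
  "0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
].join("\n");

// Turn a field such as "0053 0053" into the string "SS".
const toChars = (field) =>
  field
    .split(/\s+/)
    .filter(Boolean)
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join("");

function parseSpecialCasing(text) {
  const rules = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    if (line === "") continue;
    const [code, lower, title, upper, conditions = ""] = line
      .split(";")
      .map((field) => field.trim());
    rules.push({
      char: toChars(code),
      lower: toChars(lower),
      title: toChars(title),
      upper: toChars(upper),
      // Empty for unconditional rules, e.g. ["tr"] for Turkish-only ones.
      conditions: conditions.split(/\s+/).filter(Boolean),
    });
  }
  return rules;
}

const rules = parseSpecialCasing(sample);
console.log(rules[0].char, "->", rules[0].upper); // ß -> SS
console.log(rules[1].lower, rules[1].conditions); // i [ 'tr' ]
```

Parsing is the easy part; applying the conditional rules (locale and context) is where the work is.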


Geraint

Oct 23, 2013, 11:33:08 AM
to json-...@googlegroups.com
Yes, that is the appropriate document to be looking at to perform case-transformations.

The issue I was attempting to flag was not that a solution didn't exist - instead, my concerns are that in environments where specifying a locale for case-transformations is often unsupported (such as JavaScript), demanding Unicode-case-transformation support is not completely trivial.  (Additionally, we have to decide on behaviour for things like "Moße"/"Mosse" equivalence, but that's a lesser concern.)

So, like many keyword suggestions, it's about trying to cover bases while keeping the standard (and therefore implementations) as simple as possible.  Personally, I feel the balance lies with this being too complicated - and I'm honestly still confused why this case-insensitivity is needed, when other keywords in your format seem fine being case-sensitive.

You mentioned there were other situations - would you be able to share some of these as examples?  It might help me understand why this is needed.

Ryan Pedela

Oct 23, 2013, 6:55:36 PM
to json-...@googlegroups.com
Obviously my example was a bad one, since it prompted more discussion about the REST API design (which would literally take hours to fully explain) than about whether there should be case-insensitive enum support. Whether you agree that I need case-insensitivity for the REST API or not is beside the point. Although not every day, I have needed case-insensitivity in a variety of cases, including enum values. Isn't one of the goals of JSON Schema to validate strings? Well, sometimes you just don't care about the case when validating a string. Can I think of a specific example other than my current one? No, but the fact that every major language has toLower() and toUpper() shows case-insensitivity is needed.

Let me give you an analogy. I cannot think of any specific example where I need the function cosh(), but if I was building a standardized math library it would be in there. Why? Well because it is a math function that I have used before, will likely use again, and others will use it too. But the use is so infrequent (at least for me) that I cannot think of a specific example when I used it.

If I was proposing something really strange, I can understand wanting a specific example. But not in this case.

So far the main concern has been the implementation issues for locales. Although not obvious to me what is so hard, I have never done it before either so I will defer. I generally think one should optimize for the 80% use case so a compromise could be to make locale-dependent case-insensitivity an optional part of the standard and point people to resources like the document above to help them implement it.

Chris Miles

Oct 24, 2013, 5:49:58 AM
to json-...@googlegroups.com
Leaving aside whether case insensitivity is 'a good thing'(tm), the thought occurs that we have Regular Expressions for pattern and patternProperties.

Regular expressions have several ways to express case insensitivity.

Could we leverage regular expressions in the definition of enums?
Would a pattern keyword do the required validation in this case?

Chris

Geraint

Oct 24, 2013, 6:00:44 AM
to json-...@googlegroups.com
On Thursday, October 24, 2013 10:49:58 AM UTC+1, Chris Miles wrote:
Regular expressions have several ways to express case insensitivity.


This is actually the perspective that interests me most - often, regular expressions support 'flags', including one for case-insensitivity.  JSON Schema currently doesn't support these, so case-insensitivity has to be part of the expression itself (e.g. [Hh][Ii], that kind of thing).

I can imagine something for "pattern" (like "patternFlags", or an object-value for "pattern"), but specifying flags for "patternProperties" is a harder problem.  Might be useful, though - I myself have wished for "patternFlags" occasionally.
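Until something like that exists, the [Hh][Ii] trick can be generated rather than hand-written. A minimal sketch, assuming ASCII letters only (anything else is matched case-sensitively); toCaseInsensitivePattern is a hypothetical helper, and its output is meant for the existing "pattern" keyword.

```javascript
// Build a case-insensitive "pattern" from a list of allowed strings.
// Only ASCII letters get a [Xx] class; other characters match exactly.
function toCaseInsensitivePattern(values) {
  const alternatives = values.map((value) =>
    Array.from(value, (ch) => {
      if (/[a-zA-Z]/.test(ch)) {
        return "[" + ch.toUpperCase() + ch.toLowerCase() + "]";
      }
      // Escape regex metacharacters such as "$".
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }).join("")
  );
  return "^(?:" + alternatives.join("|") + ")$";
}

const pattern = toCaseInsensitivePattern(["$in", "$eq"]);
console.log(pattern); // ^(?:\$[Ii][Nn]|\$[Ee][Qq])$

const operatorRegex = new RegExp(pattern);
console.log(operatorRegex.test("$IN")); // true
console.log(operatorRegex.test("$In")); // true
console.log(operatorRegex.test("$like")); // false
```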

Ryan Pedela

Oct 24, 2013, 6:30:35 AM
to json-...@googlegroups.com
{
  "type": "string",
  "pattern": ".*",
  "patternFlags": {
    "insensitive": true
  }
}

Or something similar would certainly work for case-insensitive string validation, but doesn't it still have the locale problem?
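Roughly how I imagine a JavaScript validator would interpret it. "patternFlags" and "insensitive" are only the names floated in this thread, and matchesPattern is a made-up function, so treat this as a sketch rather than anything a real validator does.

```javascript
// Hypothetical: "patternFlags" is only a suggestion from this thread.
function matchesPattern(value, schema) {
  // "pattern" only constrains strings; other values pass.
  if (typeof value !== "string" || schema.pattern === undefined) return true;
  const insensitive = Boolean(
    schema.patternFlags && schema.patternFlags.insensitive
  );
  return new RegExp(schema.pattern, insensitive ? "i" : "").test(value);
}

const flagged = {
  type: "string",
  pattern: "^\\$in$",
  patternFlags: { insensitive: true },
};

console.log(matchesPattern("$IN", flagged)); // true
console.log(matchesPattern("$in", { pattern: "^\\$in$" })); // true
console.log(matchesPattern("$IN", { pattern: "^\\$in$" })); // false
```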

Geraint

Oct 24, 2013, 8:13:24 AM
to json-...@googlegroups.com

It most definitely does.  In fact, it's probably even worse, because if implementing your own proper Unicode case-transformations is a slight difficulty, getting Unicode support for regular expressions sounds like a nightmare.
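As an illustration, this is what JavaScript's own "i" flag does with the earlier examples (other regex engines may well behave differently, which is part of the problem):

```javascript
// English-style pairing of "i" and "I" is built in.
console.log(/^getir$/i.test("GETIR")); // true

// The Turkish dotted capital "İ" (U+0130) is not treated as a match for "i".
console.log(/^getir$/i.test("GETİR")); // false

// "ß" never matches "SS" or "ss": one character cannot match two.
console.log(/^mosse$/i.test("Moße")); // false
console.log(/^moße$/i.test("MOSSE")); // false
```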

I would personally prefer not to have the standard just gloss over it and say "Beware of encoding issues!  Try not to use any characters that might cause trouble."  The thing is, that looks like it's the only possible way to have any kind of case-insensitivity anywhere.