Specifying word counts with json-schema?

Rupert Smith

Aug 14, 2015, 5:47:38 AM
to JSON Schema
Hi,

I have some validation rules that place a restriction on word counts. This can be achieved with a regular expression - one that matches words and does so up to a maximum count. For example, this regex "^(?:\\b\\w+\\b[\\s\\r\\n]*){1,50}$" matches up to 50 words (but needs some fine tuning, it does not allow much punctuation).
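To make the behaviour of that regex concrete, here is a minimal sketch in Python (the language choice and the helper name `within_word_limit` are assumptions for illustration, not part of any json-schema tool). The pattern is the one quoted above with the JSON string escaping removed.

```python
import re

# The pattern from the post, with the JSON string escaping removed.
WORD_LIMIT = re.compile(r"^(?:\b\w+\b[\s\r\n]*){1,50}$")


def within_word_limit(text: str) -> bool:
    """Return True when text is 1 to 50 runs of word characters
    separated only by whitespace (hypothetical helper)."""
    return WORD_LIMIT.search(text) is not None


fifty = " ".join(["word"] * 50)
fifty_one = " ".join(["word"] * 51)

print(within_word_limit(fifty))           # 50 words: accepted
print(within_word_limit(fifty_one))       # 51 words: rejected
print(within_word_limit("hello, world"))  # rejected: the comma is not allowed
```

The last call shows the punctuation limitation: a comma directly after a word is enough to make the whole string fail.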

The trouble with the regex approach is that UI code has no way of understanding the intention of the regex, so it is hard to generate a good error message for it. Ideally the error message would say "must enter 50 words or less".

Word count might be a tricky thing to standardize: what punctuation is allowed, how different languages are handled, and so on.

Would it make sense for json-schema to have a word count validation keyword?

Would it make sense for json-schema to explicitly define a user extensible mechanism for adding new validations that are not part of its base specification? This would mean for example, that if I add my own "word-count" validation keyword, I could reasonably expect tools built around json-schema to not fail when encountering a non-standard keyword, and to provide extension points to invoke a call-back in my code in order to apply the validation check for my definition of word-count.
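As a rough sketch of what such an extension point could look like, assuming a hypothetical registry (`CUSTOM_KEYWORDS`, `register_keyword` and `run_custom_keywords` are invented names, not the API of any existing json-schema library): keywords with no registered call-back are skipped rather than treated as errors, and "word-count" uses my own definition of a word.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical registry: keyword name -> call-back(keyword_value, instance).
# A call-back returns an error message, or None when the instance is valid.
KeywordCheck = Callable[[Any, Any], Optional[str]]
CUSTOM_KEYWORDS: Dict[str, KeywordCheck] = {}


def register_keyword(name: str, check: KeywordCheck) -> None:
    CUSTOM_KEYWORDS[name] = check


def run_custom_keywords(schema: Dict[str, Any], instance: Any) -> List[str]:
    """Apply registered call-backs; skip every keyword without one."""
    errors: List[str] = []
    for keyword, value in schema.items():
        check = CUSTOM_KEYWORDS.get(keyword)
        if check is None:
            continue  # standard or unknown keyword: do not fail on it
        message = check(value, instance)
        if message is not None:
            errors.append(message)
    return errors


def word_count(limit: int, instance: Any) -> Optional[str]:
    # Assumed definition of a word: a whitespace-separated token.
    if isinstance(instance, str) and len(instance.split()) > limit:
        return f"must enter {limit} words or less"
    return None


register_keyword("word-count", word_count)

schema = {"type": "string", "maxLength": 1000, "word-count": 50, "x-unknown": True}
print(run_custom_keywords(schema, "one two three"))
print(run_custom_keywords(schema, " ".join(["w"] * 51)))
```

Standard keywords such as "type" and "maxLength" would still be handled by the validator itself; the sketch only covers the call-back part.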

Rupert

p.s. As a workaround we are hard-coding some of our error messages in the UI. For the word-count case, we just hard code an error message giving the correct validation for that field, for example "Must be 50 words or less and a maximum of 1000 characters".

Rupert Smith

Aug 14, 2015, 6:11:42 AM
to JSON Schema
On Friday, August 14, 2015 at 10:47:38 AM UTC+1, Rupert Smith wrote:
Hi,

The trouble with the regex approach is that UI code has no way of understanding the intention of the regex, so it is hard to generate a good error message for it. Ideally the error message would say "must enter 50 words or less".

Word count might be a tricky thing to standardize: what punctuation is allowed, how different languages are handled, and so on.

Would it make sense for json-schema to have a word count validation keyword?

Would it make sense for json-schema to explicitly define a user extensible mechanism for adding new validations that are not part of its base specification? This would mean for example, that if I add my own "word-count" validation keyword, I could reasonably expect tools built around json-schema to not fail when encountering a non-standard keyword, and to provide extension points to invoke a call-back in my code in order to apply the validation check for my definition of word-count.

Another possibility is to add error message definitions into the json-schema against a "message" property. 


shows how this is done in Java JSR303. The idea would be that I could define a regex for the word-count, but override the default message to one which is worded to describe the error as being word-count related rather than the generic "does not match pattern" message.
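A minimal sketch of the idea, assuming a "message" property sitting next to "pattern" in the schema. Note that "message" is not a standard json-schema keyword, and `pattern_error` is a hypothetical helper written only for this illustration.

```python
import re
from typing import Any, Dict, Optional

schema: Dict[str, Any] = {
    "type": "string",
    "pattern": r"^(?:\b\w+\b[\s\r\n]*){1,50}$",
    # Assumption: a non-standard "message" property overriding the default text.
    "message": "must enter 50 words or less",
}


def pattern_error(schema: Dict[str, Any], value: str) -> Optional[str]:
    """Return None when value matches "pattern", else the error text to show."""
    pattern = schema.get("pattern")
    if pattern is None or re.search(pattern, value):
        return None
    # Prefer the schema's own message; fall back to a generic one.
    return schema.get("message", f"does not match pattern {pattern}")


print(pattern_error(schema, "a few words"))
print(pattern_error(schema, " ".join(["w"] * 51)))
```

Without the "message" property the UI would only have the generic "does not match pattern" text to show.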

Rupert

 

Jason Desrosiers

Aug 17, 2015, 1:36:12 AM
to JSON Schema
Yes, a `wordCount` keyword could be tricky to specify.  Maybe you would be interested in getting the ball rolling with a proposal.  https://github.com/json-schema/json-schema/wiki/v5-Proposals


You can count on validator implementations to ignore keywords that they don't understand, but I don't think many validator implementations provide hooks for custom keywords.  I would also like to see implementations provide this feature.

As for the error `message` keyword, I saw a proposal like this somewhere recently.  I tried looking for it again, but can't find it now.  My intuition is telling me this isn't the way to go, but I can't articulate why.  There might not be anything wrong with it, but I've come to trust my intuition in these matters.

Rupert Smith

Aug 20, 2015, 11:16:22 AM
to JSON Schema
On Monday, August 17, 2015 at 6:36:12 AM UTC+1, Jason Desrosiers wrote:
As for the error `message` keyword, I saw a proposal like this somewhere recently.  I tried looking for it again, but can't find it now.  My intuition is telling me this isn't the way to go, but I can't articulate why.  There might not be anything wrong with it, but I've come to trust my intuition in these matters.

I think your intuition might be the same as mine - the purpose of the schema is to define the structure of valid data, and not to render error messages when comparing data against a schema. The reporting of error messages might be the responsibility of some validation tool. So I think I would agree with your intuition.