Exception when serializing a dictionary with an empty-string key


zerbedeu

Mar 2, 2012, 2:58:49 PM
to rav...@googlegroups.com
Using RavenDB v701 (both embedded and client), I get an unexpected exception when storing a Dictionary<string, string> that contains an entry whose key is the empty string.

The sample code below reproduces the issue, and also shows that Newtonsoft.Json serializes it correctly.

using System.Collections.Generic;
using System.Diagnostics;
using Newtonsoft.Json;
using Raven.Client.Embedded;

namespace RavenEmptyStringKey {
    class Sample { public Dictionary<string, string> Values { get; set; } }

    class Program {
        static void Main() {
            var sample = new Sample {Values = new Dictionary<string, string> {{"", "irrelevant"}}};
            var json = JsonConvert.SerializeObject(sample);
            var loaded = JsonConvert.DeserializeObject<Sample>(json);

            // Newtonsoft.Json has no issue serializing this object
            Debug.Assert(loaded.Values.Count == 1 && loaded.Values[""] == "irrelevant");

            // Dispose the embedded store when done so its resources are released
            using (var documentStore = new EmbeddableDocumentStore()) {
                documentStore.Initialize();
                using (var session = documentStore.OpenSession()) {
                    session.Store(sample);
                    // Unexpectedly throws; likely cause: the empty-string key fails the check at
                    // https://github.com/ravendb/ravendb/blob/master/Raven.Abstractions/Json/Linq/RavenJTokenWriter.cs#L128
                    session.SaveChanges();
                }
            }
        }
    }
}
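The comment in the repro points at a property-name check in RavenJTokenWriter.cs, but the thread does not show that code. The following is a hypothetical sketch, not RavenDB's implementation: the class and method names (`PropertyNameValidation`, `ValidatePropertyName`, `ValidatePropertyNameTooStrict`) are illustrative only. It contrasts a check that rejects null-or-empty names, which an empty dictionary key would trip over, with one that rejects only a missing (null) name, which is all the JSON grammar requires.

```csharp
using System;

// Hypothetical sketch; the real RavenJTokenWriter check is not reproduced here.
public static class PropertyNameValidation
{
    // Rejects only a null name. An empty string is still a legal JSON member
    // name, because a JSON string may contain zero characters.
    public static void ValidatePropertyName(string name)
    {
        if (name == null)
            throw new ArgumentNullException(nameof(name), "A JSON property name cannot be null.");
    }

    // Stricter variant: also rejects "", so a Dictionary<string, string>
    // with an empty-string key would fail here even though the JSON is valid.
    public static void ValidatePropertyNameTooStrict(string name)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("Property name cannot be null or empty.", nameof(name));
    }
}
```

With the lenient check, `ValidatePropertyName("")` returns normally while `ValidatePropertyName(null)` still throws, so a genuinely missing name is caught without rejecting the empty key.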

Oren Eini (Ayende Rahien)

Mar 2, 2012, 3:07:45 PM
to rav...@googlegroups.com
RavenDB serializes to JSON, and JSON doesn't permit empty keys in a dictionary.

zerbedeu

Mar 3, 2012, 7:47:17 AM
to rav...@googlegroups.com
Thank you for your quick reply! I was not aware of such a limitation; as you probably noticed from the first part of my code, I round-trip such a dictionary to JSON and back using Newtonsoft.Json without any issue.

Furthermore, looking at the spec at http://www.json.org/, empty keys seem to be explicitly permitted:
  • An object is an unordered set of name/value pairs, each of the form: string : value
  • Later on: A string is a sequence of zero or more Unicode characters [...] 

Since Newtonsoft.Json (which I believe is the underlying JSON serialization library in RavenDB) handles such cases without issues, I think this is an unnecessary limitation of RavenDB, and I would like to understand the reason behind it.

Kyle Hamilton

Mar 3, 2012, 1:00:37 AM
to rav...@googlegroups.com
There is no reference to this limitation in RFC 4627, the definitive JSON spec.

-Kyle H

Quoted (from RFC4627):

2.2. Objects

An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.

object = begin-object [ member *( value-separator member ) ]
end-object

member = string name-separator value

[...]

2.5. Strings

The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".

Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
string containing only a single reverse solidus character may be
represented more compactly as "\\".

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

string = quotation-mark *char quotation-mark

char = unescaped /
       escape (
           %x22 /          ; "    quotation mark  U+0022
           %x5C /          ; \    reverse solidus U+005C
           %x2F /          ; /    solidus         U+002F
           %x62 /          ; b    backspace       U+0008
           %x66 /          ; f    form feed       U+000C
           %x6E /          ; n    line feed       U+000A
           %x72 /          ; r    carriage return U+000D
           %x74 /          ; t    tab             U+0009
           %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C ; \

quotation-mark = %x22 ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
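To make the grammar point concrete, here is a minimal syntax-only recognizer for the `string` production quoted above, written as a sketch (the class and method names are illustrative, and it does not decode escapes or handle anything beyond this one rule). Because `*char` means "zero or more", the two-character input `""` passes; an empty name is therefore a valid `member` name under RFC 4627.

```csharp
using System;

// Sketch of the RFC 4627 rule: string = quotation-mark *char quotation-mark
public static class Rfc4627String
{
    public static bool IsJsonString(string s)
    {
        if (s == null || s.Length < 2 || s[0] != '"' || s[s.Length - 1] != '"')
            return false;

        int i = 1, end = s.Length - 1; // scan strictly between the two quotation marks
        while (i < end)
        {
            char c = s[i];
            if (c == '\\')
            {
                if (i + 1 >= end) return false;             // escape would consume the closing quote
                char e = s[i + 1];
                if ("\"\\/bfnrt".IndexOf(e) >= 0) { i += 2; continue; }
                if (e == 'u' && i + 6 <= end && IsHex4(s, i + 2)) { i += 6; continue; }
                return false;                               // not one of the listed escapes
            }
            // unescaped = %x20-21 / %x23-5B / %x5D-10FFFF (backslash handled above)
            if (c < 0x20 || c == '"') return false;
            i++;
        }
        return true; // zero loop iterations is fine: *char allows an empty string
    }

    // True if the four characters starting at 'start' are hexadecimal digits.
    private static bool IsHex4(string s, int start)
    {
        for (int k = start; k < start + 4; k++)
            if (!Uri.IsHexDigit(s[k])) return false;
        return true;
    }
}
```

For example, `IsJsonString("\"\"")` and `IsJsonString("\"\\u005C\"")` are both true, while an unterminated `"\"` is rejected.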

zerbedeu

Mar 3, 2012, 9:45:35 AM
to rav...@googlegroups.com
Thank you, Kyle, for finding a more authoritative source! Section 1 of http://www.ietf.org/rfc/rfc4627 also seems to indicate that there is no such limitation:

   A string is a sequence of zero or more Unicode characters [UNICODE].

   An object is an unordered collection of zero or more name/value
   pairs, where a name is a string and a value is a string, number,
   boolean, null, object, or array.
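As a quick check that an off-the-shelf parser agrees with this definition, the sketch below uses System.Text.Json (assumed available, i.e. .NET Core 3.0 or later; this is a swapped-in library, not the Newtonsoft.Json used by RavenDB at the time). The helper names `EmptyKeyDemo`, `MemberNames`, and `RoundTrip` are illustrative. It parses an object whose only member name is empty and round-trips a Dictionary<string, string> with an empty-string key through JSON text.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class EmptyKeyDemo
{
    // Parses a JSON object and returns its member names in document order.
    public static List<string> MemberNames(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        var names = new List<string>();
        foreach (JsonProperty p in doc.RootElement.EnumerateObject())
            names.Add(p.Name);
        return names;
    }

    // Serializes a dictionary to JSON text and deserializes it back.
    public static Dictionary<string, string> RoundTrip(Dictionary<string, string> values)
    {
        string json = JsonSerializer.Serialize(values);
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json);
    }
}
```

`MemberNames("{\"\": \"irrelevant\"}")` yields a single name, the empty string, and the round-tripped dictionary still maps `""` to `"irrelevant"`, mirroring the Newtonsoft.Json result in the original repro.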

njy

Mar 3, 2012, 10:06:43 AM
to rav...@googlegroups.com
There are some edge cases where a valid JSON string is not actually usable inside a JS engine or something like that, and this seems to be one of those cases.
You can express the thing in a valid JSON representation, but to then *manipulate* the damn thing you need a JS engine, right? And that is where the problem lies: the engine parsing the JSON fragment and building the in-memory object (a JS VM inside a browser, Json.Net, or whatever) cannot do it.
That's the problem: by following the spec you can declare something that will never be usable by a JS engine.

Does that make sense? Or am I missing something?


zerbedeu

Mar 3, 2012, 2:55:55 PM
to rav...@googlegroups.com
A JS engine has no issue with such a dictionary either (see http://jsfiddle.net/eRDCt/); neither does the official spec, nor the underlying JSON serialization engine that RavenDB uses. Therefore, I believe this is simply a corner case that is incorrectly rejected by the current RavenDB code.

Oren Eini (Ayende Rahien)

Mar 4, 2012, 8:55:13 AM
to rav...@googlegroups.com
You are correct, will be fixed in the next build.

Oren Eini (Ayende Rahien)

Nov 15, 2012, 5:38:10 PM
to rav...@googlegroups.com
That was fixed quite a while ago.

On Thu, Nov 15, 2012 at 8:06 PM, Phil S <pjsa...@gmail.com> wrote:
Is there an ETA on the fix, or an issue # tracking this?

Phil S

Jan 7, 2013, 5:07:55 PM
to rav...@googlegroups.com
This bug still appears in tests with build 960, which was the active build at the time of your reply. Is there any documentation of a fix before or since that build?

Oren Eini (Ayende Rahien)

Jan 7, 2013, 5:08:41 PM
to rav...@googlegroups.com
Try the 2.0 build