64-bit integers in JSON


Steven Jewel

Mar 28, 2014, 11:33:54 AM
to clearsk...@googlegroups.com
Hello everyone,

There are three places where it'd be extremely handy to have 64-bit
integers in JSON:

1. File sizes. We are otherwise limited to 4GB files.

2. Local revision counters. I updated the spec last night to explain
how to work around this, but it'd be better not to have to worry about it.

3. Vector clocks. We are otherwise limited to 4 billion
changes per file, which could hypothetically happen if someone were
trying to sync something that changes often (like a sqlite database
[0]). I can work around this by allowing multiple vector clocks per
record, but it's really messy.

My proposal is that we allow 64-bit integers in our JSON. From what I
can tell, JavaScript is the main place where 64-bit integers don't work [1],
but the JSON spec doesn't restrict the size of integers. Here are
four proposals:

1. Send 64-bit integers in the JSON stream. We patch jsoncons if
necessary to support this. I like this approach because we're unlikely
to ever have a browser-based JavaScript implementation [2] of
clearskies, and Node.js has Int64 support via libraries.

{
  "size": 15298937712826087
}


2. Send 64-bit integers as strings in the JSON stream. This only uses
two more characters, but it means that JSON implementations don't need
to be patched.

{
  "size": "15298937712826087"
}


3. Hex-encode 64-bit integers. This uses fewer characters (although if
we implement the gzip extension it won't matter anyway).

{
  "size": "0x365a4d83d056e7"
}


4. Send 64-bit integers as an array of two 32-bit integers.

{
  "size": [3562061, 2211469031]
}
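
For comparison, here's a rough sketch (nothing implemented yet, and the
function names are made up) of how a receiver might decode #2, #3, and
#4 into a native 64-bit integer in C++, once the JSON layer hands over
the raw values:

#include <cstdint>
#include <string>

// #2: decimal string, e.g. "15298937712826087"
uint64_t from_decimal_string(const std::string& s) {
    return std::stoull(s, nullptr, 10);
}

// #3: hex string, e.g. "0x365a4d83d056e7"
// (std::stoull accepts the 0x prefix when the base is 16)
uint64_t from_hex_string(const std::string& s) {
    return std::stoull(s, nullptr, 16);
}

// #4: [high, low] pair of 32-bit integers
uint64_t from_pair(uint32_t high, uint32_t low) {
    return (static_cast<uint64_t>(high) << 32) | low;
}

#1 needs no decoding step at all, which is part of its appeal.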


My vote is for #1 since I believe that's the way the industry will go,
but I don't feel very strongly about it. Any of these will work for our
purposes.

Steven

[0]: I realize that it's not a good idea to sync a sqlite database with
any sync tool that doesn't have OS-provided snapshots, for consistency
reasons, but I'm just trying to think of a case where a file could have
more than 2^32 writes during its lifetime.

[1]: Supposedly ints work up to 2^53 in JavaScript, which is enough for
9 PB files.

[2]: There are 64-bit int libraries for browsers:
http://docs.closure-library.googlecode.com/git/class_goog_math_Long.html

Pedro Larroy

Mar 28, 2014, 11:39:44 AM
to clearsk...@googlegroups.com
Hi Jewel.

I checked jsoncons and it supports 64-bit integers fine. I also support #1 for file sizes.
For the vector clocks and the share revision number I'm using arbitrary-precision integers (GMP) encoded as strings in base 10. I think this way is 100% correct since there's no overflow.

And in dynamic languages arbitrary-precision integers are built in, so for the version fields I support #2.
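
Something like this, roughly (the exact jsoncons calls may differ by
version, and the field names are just for illustration):

#include <jsoncons/json.hpp>
#include <gmpxx.h>
#include <cstdint>
#include <string>

void example() {
    // #1: jsoncons can parse a 64-bit integer in the stream directly.
    jsoncons::json f = jsoncons::json::parse(R"({"size": 15298937712826087})");
    int64_t size = f["size"].as<int64_t>();

    // #2: a version counter sent as a base-10 string, decoded into a
    // GMP arbitrary-precision integer, so it can never overflow.
    jsoncons::json v = jsoncons::json::parse(R"({"version": "99999999999999999999"})");
    mpz_class version(v["version"].as<std::string>(), 10);
}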

Pedro.




--
Pedro Larroy Tovar   |    http://pedro.larroy.com/

Steven Jewel

Mar 28, 2014, 11:55:43 AM
to clearsk...@googlegroups.com
On 03/28/2014 09:39 AM, Pedro Larroy wrote:
> I checked jsoncons and it supports 64-bit integers fine. I also support
> #1 for file sizes.
> For the vector clocks and the share revision number I'm using
> arbitrary-precision integers (GMP) encoded as strings in base 10. I
> think this way is 100% correct since there's no overflow.
>
> And in dynamic languages arbitrary-precision integers are built in, so
> for the version fields I support #2.

I feel like arbitrary precision is acceptable for our purposes because
we're not doing anything computationally intensive with the numbers, but
what about storage and comparison in sqlite? In particular, I'm worried
about the `last_updated_rev` field in the database.
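
To make the worry concrete: sqlite stores INTEGER values as signed
64-bit, so a native column compares numerically, whereas
arbitrary-precision revisions would have to go in as TEXT and compare
lexicographically ("9" would sort after "10") unless we zero-pad them.
Here's a sketch of the 64-bit version, assuming a hypothetical `files`
table:

#include <sqlite3.h>
#include <cstdint>

void find_newer(sqlite3* db, int64_t since_rev) {
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "SELECT path FROM files WHERE last_updated_rev > ?",
        -1, &stmt, nullptr);
    // Numeric comparison is safe because sqlite INTEGERs are 64-bit.
    sqlite3_bind_int64(stmt, 1, since_rev);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        // ... handle each newer record ...
    }
    sqlite3_finalize(stmt);
}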

A peer writing 100k records per second won't overflow its revision
number for half a million years, so I think 64-bit is an acceptable
limitation for that field.

For vector clocks I believe it's also going to be impossible to overflow
given our system constraints, and especially because there's a vector
clock per file.

In any case, this is an implementation decision more than a spec
decision, so I will leave it up to your judgment. I will add a
sub-section to the wire protocol once we've come to consensus.

Steven

Pedro Larroy

Mar 28, 2014, 1:48:08 PM
to clearsk...@googlegroups.com
If you did the math and overflow is not an issue, then we can switch it to a 64-bit integer. Anyway, it was a good exercise to use GMP in C++; now I know how it works. I don't know if we win or lose anything with either approach, except maybe minus one dependency.


It affects the spec because the numbers are strings for versions. Should we switch it back to 64-bit numbers or leave it as arbitrary precision? Maybe somebody else wants to comment on this as well.

Pedro.

