Sorting dates in CouchDB

4 views
Skip to first unread message

Carl Bourne

unread,
Oct 8, 2012, 5:21:08 AM10/8/12
to us...@couchdb.apache.org
Hi I just wanted to sanity check what I'm doing here.

I have a bunch of documents stored in Couch with expiry dates like this: "expire": "2020-07-28T15:13:00+00:00"

I have a simple map function defined like this:

function(doc) {
emit(doc.expire);
}

I'm then querying the data ranges like this: ?startkey="2012-10-02"&endkey="2012-12-02"

Is this the correct way to do this type of thing?

Regards,

Carl

Aurélien Bénel

unread,
Oct 8, 2012, 5:35:21 AM10/8/12
to us...@couchdb.apache.org
> have a bunch of documents stored in Couch with expiry dates like this: "expire": "2020-07-28T15:13:00+00:00"
> (...)
> I'm then querying the data ranges like this: ?startkey="2012-10-02"&endkey="2012-12-02"

You can do that, but be aware that you're comparing strings of different lengths.
One consequence is that you won't get timestamps from 2012-12-02.

This is because of the following alphabetical order:
2012-10-01T8:00:00+00:00
2012-10-02
2012-10-02T9:00:00+00:00
2012-12-01T10:00:00+00:00
2012-12-02
2012-12-02T11:00:00+00:00


Regards,

Aurélien


Martin Hewitt

unread,
Oct 8, 2012, 5:40:03 AM10/8/12
to us...@couchdb.apache.org
We split our dates into an array of integers, so today's date would look like:

"date":[2012,10,8]

And we do the same for time, and then combine them in the view function and emit a compound array

emit([folder, date], doc._id);

We can then query by:

?startkey=["test", [2012,10]&endkey=["test", [2012,10,{}]

To get all of October's entries in the "test" folder.

If you choose to go down this route (i.e. parsing your date string into a JS Date object in the view function), just be aware that the Javascript date object uses 0-based months, (January as month 0 instead of 1).

Martin

Carl Bourne

unread,
Oct 8, 2012, 5:44:06 AM10/8/12
to us...@couchdb.apache.org
Thanks Aurélien,

> One consequence is that you won't get timestamps from 2012-12-02.

OK so if I understand you correctly if I'm only querying using the date part e.g. "2012-12-02" then it will work OK?

Also, is there a better way to do this that overcomes this issue?

Carl Bourne

unread,
Oct 8, 2012, 5:58:04 AM10/8/12
to us...@couchdb.apache.org
Aurélien,

> You can do that, but be aware that you're comparing strings of different lengths.
> One consequence is that you won't get timestamps from 2012-12-02.


I just checked some of my data and all of the stored strings are of the same length. Am I understanding you correctly here? e.g.

"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-04T22:19:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-05T23:13:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:01:00+00:00"
"expire":"2012-10-06T00:07:00+00:00"
"expire":"2012-10-06T00:07:00+00:00"
"expire":"2012-10-06T00:07:00+00:00"

Regards,

Carl

On 8 Oct 2012, at 10:35, Aurélien Bénel <aurelie...@utt.fr> wrote:

Aurélien Bénel

unread,
Oct 8, 2012, 7:53:02 AM10/8/12
to us...@couchdb.apache.org
Hi Carl,

> OK so if I understand you correctly if I'm only querying using the date part e.g. "2012-12-02" then it will work OK?

I'm not sure to understand, but if you use "key" instead of "startkey" and "endkey", you will need the exact value you emitted.

You have indeed several solutions:

1. emit the whole timestamp as a string, and query with startkey and endkey using an endkey that is alphabetically greater than the last key you want to grasp (for example : "2012-12-03" or even "2012-12-02Z" instead of "2012-12-02") :

?startkey="2012-10-02"&endkey="2012-12-03"
or
?startkey="2012-10-02"&endkey="2012-12-02Z"

{rows:[
{"key"="2012-10-02T9:00:00+00:00", ...},
{"key"="2012-12-01T10:00:00+00:00", ...},
{"key"="2012-12-02T11:00:00+00:00", ...}
]}

2. emit as a string the part of the timestamp you want to query on, and use startkey and endkey more naturally:

?startkey="2012-10-02"&endkey="2012-12-02"

{rows:[
{"key"="2012-10-02", ...},
{"key"="2012-12-01", ...},
{"key"="2012-12-02", ...}
]}

3. emit the whole timestamp as an array and query a part of it:

?startkey=[2012, 10, 2]&endkey=[2012, 12, 2, {}]

{rows:[
{"key"=[2012, 10, 2, 9, 0], ...},
{"key"=[2012, 12, 1, 10, 0], ...},
{"key"=[2012, 12, 2, 11, 0], ...}
]}

4. emit the whole timestamp as an array and query a (reduced) group of it:

?group_level=3&startkey=[2012, 10, 2]&endkey=[2012, 12, 2]

{rows:[
{"key"=[2012, 10, 2], ...},
{"key"=[2012, 12, 1], ...},
{"key"=[2012, 12, 2], ...}
]}


There's more than one way to do it. ;)


Regards,

Aurélien

Carl Bourne

unread,
Oct 8, 2012, 9:08:39 AM10/8/12
to us...@couchdb.apache.org
Thanks Aurélien,

I'll try these options to see which is going to work best for me!

Kai Griffin

unread,
Oct 8, 2012, 3:02:02 PM10/8/12
to us...@couchdb.apache.org
Hi Carl,
If I were storing dates, then I'd favour storing them as unix dates
(which are long integers), not strings; not least because it's more
efficient, but also because you leave the door open to doing something
more interesting with the dates down the track without having to parse
them to/from strings first. That said, I think your start key should
find the first 2012-10-02 successfully, but your end-key would only find
up to 2012-12-01T23:59:59 (note that's 1 Dec, not 2 Dec), because there
is no expiry date exactly matching just 2012-12-02. Since your query
returned no rows at all, maybe the endkey is failing entirely; not sure
why that would be.
Cheers,
Kai

Michael Zedeler.

unread,
Oct 8, 2012, 3:11:23 PM10/8/12
to us...@couchdb.apache.org
On 2012-10-08 11:58, Carl Bourne wrote:
> Aur�lien,
>
>> You can do that, but be aware that you're comparing strings of different lengths.
>> One consequence is that you won't get timestamps from 2012-12-02.
> I just checked some of my data and all of the stored strings are of the same length. Am I understanding you correctly here? e.g.
>
> "expire":"2012-10-04T22:19:00+00:00"
> "expire":"2012-10-04T22:19:00+00:00"
> [...]
> "expire":"2012-10-06T00:07:00+00:00"
Aur�liens point is that when you use strings, they are sorted
lexicographically, so

'abc' < 'bbc'

(Because 'a' < 'b'.)

And thus

'2012-10-06' < '2012-10-06T00:07:00+00:00'

(Because '' < 'T00:07:00+00:00' - the empty string always ranks lower
than any other string.)

So in order to get all data from 2012-10-06, you can either use the end
key '2012-10-07' or include some letter that you know will serve as a
safe delimiter, such as 'Z' (my suggestion) - the end key then becomes
'2012-10-06Z'.

Regards,

Michael.

Mark Hahn

unread,
Oct 8, 2012, 3:11:51 PM10/8/12
to us...@couchdb.apache.org
> I'd favour storing them as unix dates

I find it really useful to be able to read the dates in futon and debug
dumps. Are you sure it's a lot slower?
>> ?startkey="2012-10-02"&endkey=**"2012-12-02"

Jens Alfke

unread,
Oct 8, 2012, 3:16:56 PM10/8/12
to us...@couchdb.apache.org

On Oct 8, 2012, at 12:02 PM, Kai Griffin <ka...@resourceandrevenue.com> wrote:

> If I were storing dates, then I'd favour storing them as unix dates
> (which are long integers), not strings; not least because it's more
> efficient, but also because you leave the door open to doing something
> more interesting with the dates down the track without having to parse
> them to/from strings first.


—Jens

Kai Griffin

unread,
Oct 8, 2012, 3:26:32 PM10/8/12
to us...@couchdb.apache.org
No, I'm not sure. I can't help but think that at least in view indexing
it's less expensive to use numbers than strings... but that's purely
instinct and not based on knowledge of how erlang works - perhaps a
hold-over from obsolete thinking on my part. I'm often doing date calcs
and grouping by years, months, week numbers, or quarters, etc... so I've
always just felt more comfortable having them stored as dates to begin
with so there's no additional conversion process to extract those year,
month, etc values.

Tim Tisdall

unread,
Oct 8, 2012, 3:55:34 PM10/8/12
to us...@couchdb.apache.org
I think it'd only be a lot slower if you did parsing of the date. As you have it, it's just a simple string comparison. You're doing it the way I would. Unix timestamp is good too, but makes it hard to read the dates manually.

Mark Hahn

unread,
Oct 8, 2012, 4:14:23 PM10/8/12
to us...@couchdb.apache.org
> so I've always just felt more comfortable having them stored as dates to
begin with so there's no additional conversion process to extract those
year, month, etc values.

Whether you use numbers or strings it still needs to do a conversion from
json (a string) to a Date object. For that matter it might be slower to
convert from the JSON number in a string to a Date than from a string to a
Date. I have no idea of course.

I just realized I was thinking of code in the client. Using a number for a
key lookup is probably faster. I wonder if Couch stores the JSON as a
string in the DB or converted to binary.

On Mon, Oct 8, 2012 at 12:26 PM, Kai Griffin <ka...@resourceandrevenue.com>wrote:

> No, I'm not sure. I can't help but think that at least in view indexing
> it's less expensive to use numbers than strings... but that's purely
> instinct and not based on knowledge of how erlang works - perhaps a
> hold-over from obsolete thinking on my part. I'm often doing date calcs
> and grouping by years, months, week numbers, or quarters, etc... so I've
> always just felt more comfortable having them stored as dates to begin with
> so there's no additional conversion process to extract those year, month,
> etc values.
>
> On 08/10/2012 21:11, Mark Hahn wrote:
>
>> I'd favour storing them as unix dates
>>>
>> I find it really useful to be able to read the dates in futon and debug
>> dumps. Are you sure it's a lot slower?
>>
>> On Mon, Oct 8, 2012 at 12:02 PM, Kai Griffin <ka...@resourceandrevenue.com
>> >**wrote:
>>>> ?startkey="2012-10-02"&endkey=****"2012-12-02"

Kai Griffin

unread,
Oct 8, 2012, 7:44:33 PM10/8/12
to us...@couchdb.apache.org
On 08/10/2012 22:14, Mark Hahn wrote:
>> so I've always just felt more comfortable having them stored as dates to
> begin with so there's no additional conversion process to extract those
> year, month, etc values.
>
> Whether you use numbers or strings it still needs to do a conversion from
> json (a string) to a Date object. For that matter it might be slower to
> convert from the JSON number in a string to a Date than from a string to a
> Date. I have no idea of course.
>
> I just realized I was thinking of code in the client. Using a number for a
> key lookup is probably faster. I wonder if Couch stores the JSON as a
> string in the DB or converted to binary.
>

First of all, having now browsed this topic of dates-as-strings vs
dates-as-long-ints a little bit, I didn't realise until now that storing
full date strings in this context is fairly commonplace, even in text
books. It suggests that many of my assumptions about performance might
be ill-founded. I still struggle to see how "new Date(dateString)"
could be as speed-efficient as "new Date(milliseconds)" within a map
function, but this only impacts view creation (and only if you actually
have to instantiate a date object like that), so the penalty is paid
only once per document (but in my case this is tens of millions of
documents, so it still might be a big enough penalty to worry about).

I guess as a storage-space miser (and admittedly also on a philosophical
level), I still personally prefer that dates, representing as they do
milliseconds, remain numbers in storage, and come to life only
fleetingly as long, readable strings at run-time on the client-side.
Even if the numeric version is stored as a string internally within
JSON/couch, it's still less than half the size of the equivalent full
date string. This doesn't help with reading Futon or log files,
admittedly. In my own case, I have several different kinds of date-time
values in each document, and over 30 million documents, growing daily -
the storage space difference between numeric and full-string versions of
dates would definitely be significant.

Mark Hahn

unread,
Oct 8, 2012, 7:48:38 PM10/8/12
to us...@couchdb.apache.org
It's obviously a personal call. Personal to the developer and personal to
the specific app needs. It would be interesting to have a poll, just out
of curiosity.

Jens Alfke

unread,
Oct 8, 2012, 7:49:20 PM10/8/12
to us...@couchdb.apache.org

On Oct 8, 2012, at 1:14 PM, Mark Hahn <ma...@hahnca.com> wrote:

> I've always just felt more comfortable having them stored as dates to
> begin with so there's no additional conversion process to extract those
> year, month, etc values.
> For that matter it might be slower to
> convert from the JSON number in a string to a Date than from a string to a
> Date. I have no idea of course.

I don’t know how SpiderMonkey does it, but dates are very often represented internally as timestamps*, so the conversion from a number would be very cheap. Whereas accessing the month/date/etc. would involve first parsing the date string to a numeric timestamp, then processing that timestamp to re-derive those fields.

> I wonder if Couch stores the JSON as a
> string in the DB or converted to binary.

I believe CouchDB stores documents internally as serialized Erlang terms. I don’t know much about this format other than that it’s binary not ASCII. On the other hand, the docs have to be converted to JSON to be sent across the wire to the view server, and then parsed into JavaScript objects. There might be tin cans and string involved too. :p

—Jens

Mark Hahn

unread,
Oct 8, 2012, 7:54:50 PM10/8/12
to us...@couchdb.apache.org
> There might be tin cans and string involved too.

Immutable string?
Reply all
Reply to author
Forward
0 new messages