Hi Kyle,
Thanks for your support.
What I am trying to do is to find a way to capture arbitrary data,
coming from different sources under a key-value scheme, organized by
user sessions.
I have no way of knowing for sure what the keys or the values will be,
since that is left to the discretion of a third-party client
application.
So my color/shape example is just as likely as anything else.
Also, since we will likely want to save extra data for
each key-value pair, I feel the following representation would be
impractical:
data_bag: {
    color: ['red', 'green'],
    shape: ['square']
}
If I want some contextual data, say a timestamp, attached to
each key-value pair, I would need something like:
data_bag: {
    color: [
        {value: 'red', timestamp: '2010-05-04 10:25:32+0200'},
        {value: 'green', timestamp: '2010-05-04 10:27:58+0200'}
    ],
    shape: [
        {value: 'square', timestamp: '2010-05-04 10:25:35+0200'}
    ]
}
Now, this model would most likely solve part of the problem,
since I figure I can use $elemMatch to ignore the timestamp part when
querying.
However, I can't think of a way to ask for all documents matching the
following condition: (color: 'red' AND shape: 'square') OR (color:
'blue' AND shape: 'triangle').
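A minimal sketch of how that condition might be written, assuming a server version that supports the $or operator and the data_bag layout with {value, timestamp} entries shown above. The helper name allOf and the collection name sessions are hypothetical; the code only builds the query document and does not talk to a server.

```javascript
// Build one AND clause: every listed key must have an entry with the given value.
// Dot notation reaches into the array of {value, timestamp} objects, so the
// timestamp is simply not part of the condition.
function allOf(pairs) {
  const clause = {};
  for (const [key, value] of Object.entries(pairs)) {
    clause["data_bag." + key + ".value"] = value;
  }
  return clause;
}

// (color: 'red' AND shape: 'square') OR (color: 'blue' AND shape: 'triangle')
const query = {
  $or: [
    allOf({ color: "red", shape: "square" }),
    allOf({ color: "blue", shape: "triangle" }),
  ],
};

// Hypothetical usage in the shell: db.sessions.find(query)
console.log(JSON.stringify(query, null, 2));
```

Each AND group becomes one sub-document, and the groups are listed under $or, so adding a third alternative only means adding one more allOf(...) entry.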
Additionally, the reason I initially thought of a key: 'color',
value: 'red' design is that I would like to set an index to improve
query speed.
And with
data_bag: {
    color: [
        {value: 'red', timestamp: '2010-05-04 10:25:32+0200'},
        {value: 'green', timestamp: '2010-05-04 10:27:58+0200'}
    ],
    shape: [
        {value: 'square', timestamp: '2010-05-04 10:25:35+0200'}
    ]
}
I would need to index data_bag.color, data_bag.shape and so on. Since
I don't know what the keys will be, I cannot define proper indexes,
unless an index on data_bag itself would speed up queries
on data_bag's properties (how would an index on an array of objects be
handled?).
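For comparison, here is a sketch of the key/value design mentioned above, where data_bag becomes an array of {key, value, timestamp} entries so that the field names are fixed whatever the client sends. It assumes that an index over fields of array elements (a multikey index) applies here; the helper name toPairs and the index specification are illustrative assumptions, and nothing below contacts a server.

```javascript
// Turn { color: [{value, timestamp}, ...], shape: [...] } into a flat array of
// { key, value, timestamp } entries, so the field names are fixed and known.
function toPairs(dataBag) {
  const pairs = [];
  for (const [key, entries] of Object.entries(dataBag)) {
    for (const entry of entries) {
      pairs.push({ key: key, value: entry.value, timestamp: entry.timestamp });
    }
  }
  return pairs;
}

const session = {
  data_bag: toPairs({
    color: [
      { value: "red", timestamp: "2010-05-04 10:25:32+0200" },
      { value: "green", timestamp: "2010-05-04 10:27:58+0200" },
    ],
    shape: [{ value: "square", timestamp: "2010-05-04 10:25:35+0200" }],
  }),
};

// Assumed index specification covering every pair, whatever the key is.
const indexSpec = { "data_bag.key": 1, "data_bag.value": 1 };

// $elemMatch requires key and value to match within the same array element.
const redColor = { data_bag: { $elemMatch: { key: "color", value: "red" } } };

console.log(session.data_bag.length, indexSpec, JSON.stringify(redColor));
```

The point of the reshaping is that a single index definition no longer depends on knowing the keys in advance; whether it performs well enough would still need to be measured.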
I also considered a different model where each key-value pair
would be a single document:
{
    session_key: 1,
    key: 'color',
    value: 'red',
    timestamp: '2010-05-04 10:25:32+0200'
},
{
    session_key: 1,
    key: 'shape',
    value: 'square',
    timestamp: '2010-05-04 10:25:35+0200'
},
{
    session_key: 1,
    key: 'color',
    value: 'green',
    timestamp: '2010-05-04 10:27:58+0200'
}
By indexing {key:1, value:1}, I can get high speed querying even on
very large collections.
Now, I needed to add this session_key in order to group all the
information for one session together. But I also need session-specific
metadata, say geographical information.
If this were an RDBMS, I would just have a separate table and perform a
join.
In a document database, I think the correct way would be to store this
data in each document (denormalizing the info). It eats up disk
space, but fair enough, it's an acceptable trade-off.
But here, when I want to query for all sessions with color: 'red' AND
shape: 'square', I would need to run an aggregation (map/reduce, I
gather), which will inevitably hurt speed considerably.
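To make the grouping step concrete, here is a minimal in-memory sketch of what such an aggregation has to compute. It runs on a plain array, not against MongoDB; the function name sessionsMatchingAll is hypothetical, and the session 2 documents are made up for illustration (timestamps are omitted for brevity).

```javascript
const docs = [
  { session_key: 1, key: "color", value: "red" },
  { session_key: 1, key: "shape", value: "square" },
  { session_key: 1, key: "color", value: "green" },
  { session_key: 2, key: "color", value: "red" },      // made-up sample data
  { session_key: 2, key: "shape", value: "triangle" }, // made-up sample data
];

// Return the session keys that contain every requested key-value pair.
// `wanted` is assumed to contain no duplicate pairs.
function sessionsMatchingAll(documents, wanted) {
  // Map: session_key -> set of indexes into `wanted` seen for that session.
  const seen = new Map();
  for (const doc of documents) {
    const i = wanted.findIndex((w) => w.key === doc.key && w.value === doc.value);
    if (i === -1) continue;
    if (!seen.has(doc.session_key)) seen.set(doc.session_key, new Set());
    seen.get(doc.session_key).add(i);
  }
  const result = [];
  for (const [sessionKey, hits] of seen) {
    if (hits.size === wanted.length) result.push(sessionKey);
  }
  return result;
}

const matches = sessionsMatchingAll(docs, [
  { key: "color", value: "red" },
  { key: "shape", value: "square" },
]);
console.log(matches);
```

Every document has to be regrouped by session_key before the AND can be evaluated, which is the extra pass that the single-document-per-session layout avoids.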
Now, this is my perception of the problem; I would be glad to hear
your thoughts on it.
Thanks,
Guillaume
http://groups.google.com/group/mongodb-user/browse_thread/thread/7067...