MongoDB Arrays changing into Objects


Glenn

Feb 23, 2011, 8:22:10 PM
to mongodb-user
I'm pretty sure I'm going to get a barrage of responses telling me
that I shouldn't be trying to use sub-documents in Mongo, but c'mon,
they give us the API to do so. There should be a way.

Anyways, my system takes advantage of the $push command to build up
arrays as sub-documents inside the document "rows". And I even use
arrays nested inside objects, etc (I have an ORM built on top of
it). For example, the document could look like:

"buckets" : { "bucket1" : { "apples" : [ { "color" : "green" },
{ "color" : "red" } ] } }

There's a "buckets" object containing a single bucket in this case.
It has an "apples" property which is an array of objects.
My ORM works perfectly most of the time (It is currently used in the
production Facebook game: CityZen), however, for some reason it will
occasionally corrupt the arrays and turn them into objects keyed by
index-strings, so the above would become:

"buckets" : { "bucket1" : { "apples" : { "0" : { "color" : "green" },
"1" : { "color" : "red" } } } }

This, naturally, causes any subsequent $push calls to fail, and thus
breaks the game state for that user.
The worst part is that it doesn't happen all the time. However,
recently it's cropped up quite a bit.

I know that there are some issues with removing elements from arrays
(requiring you to call a $pull after the $unset), and I do perform
those steps when needed. However, this particular array never
actually gets any elements removed from it (in theory at least).

I'm wondering if anyone has any insights into why this type-change
might be occurring and ideas on how to avoid it, other than the
suggestion not to use sub-documents. (I simply can't rewrite the
whole system at this point).
Please let me know if you do. Thanks.

Gaetan Voyer-Perrault

Feb 23, 2011, 8:30:21 PM
to mongod...@googlegroups.com
Are you using the PHP driver? If so what version?

How does your ORM work?



Glenn

Feb 23, 2011, 8:52:33 PM
to mongodb-user
I am using the PHP driver version 1.1.3.
The ORM is essentially a dynamic PHP class which you can use to build
the document selector. Then you can get the value at that selector
with a call to getValue(), or set it with setValue(). There are also
other helper methods like push(), remove(), etc. EX:

Metabase::get("user", "123123")->buckets["bucket1"]->apples[1]->getValue();
(which returns: "red")

The "buckets" member can be indexed like a dictionary (rather than an
object) since it is defined as such in my ORM schema, but regardless
it transitions to using Mongo-style syntax when accessing the actual
DB. For example this line:

Metabase::push("user", "123123")->buckets["bucket1"]->apples->push(array("color" => "yellow"));

Would become this PHP Mongo query:

$mongoConnection->update(
    array("key" => "123123"),
    array('$push' => array("buckets.bucket1.apples" => array("color" => "yellow"))),
    array("upsert" => true, "safe" => true)
);


On Feb 23, 5:30 pm, Gaetan Voyer-Perrault <ga...@10gen.com> wrote:
> Are you using the PHP driver? If so what version?
>
> How does your ORM work?
>

Gates

Feb 24, 2011, 3:36:23 PM
to mongodb-user
Here's why I was asking this question.

Fundamentally, in PHP, the following JSON object basically *is* an
array:
{ "0" : { "color" : "green" }, "1" : { "color" : "red" } }

PHP isn't great about differentiating between "array" and "hash
table". So that thing may actually behave as an array in PHP even
though it's technically an "object" in JSON.

Step 1: solving the live problem (if it's causing you problems)
- I would add some error handling code here.
- If your write is failing due to this error, you want to do two
things
-a. Reload the source data
-b. Re-save that data "as a whole"

It's not as efficient, but it should restructure the data correctly.

Step 2: trying to repro the problem.

Grab a test user and try this:
Metabase::push("user", "123123")->
    buckets["bucket1"]->
    apples->
    push(
        array(
            array("color" => "yellow"),
            array("color" => "green")
        )
    );

Does the output of that work with Metabase?

I would try some variants of this query on Metabase and see what goes
in & comes out (is Metabase using an ArrayObject anywhere?).

I've personally seen this type of weird disconnect between PHP and
Mongo, but in all cases, it came back to me saving the wrong data in
some way. I don't want to pin this on Metabase specifically, but
there's likely something using ArrayObject or hashed arrays unknowingly.

- Gates

Glenn

Feb 25, 2011, 1:00:52 PM
to mongodb-user
Well, Metabase itself actually does descend from ArrayObject (in order
to get all the dynamic array class functionality), but I never save
the actual Metabase object into the DB. Furthermore, I'm always using
$push to modify arrays in the DB, never simply passing in an entire
array as the value (although, I would assume that should work too for
normal arrays).
Anyways, for now I have implemented a temp hack fix which simply fixes
any non-array values upon game load. It's not optimal, and I haven't
found the true source of the problem, but unfortunately, I don't have
enough time to dig deeper at the moment.
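A load-time fix of this kind could look roughly like the sketch below. It is plain JavaScript rather than the actual Metabase code, and the helper name repairIndexKeyedObject is made up for illustration. It assumes the only damage is an array stored as an object keyed "0", "1", and so on.

```javascript
// Hypothetical helper: turn an object keyed "0", "1", ... back into an array.
// Returns the value unchanged if it is already an array or is not index-keyed.
function repairIndexKeyedObject(value) {
  if (Array.isArray(value) || value === null || typeof value !== "object") {
    return value;
  }
  const keys = Object.keys(value);
  // Only treat it as a damaged array if every key is a non-negative integer string.
  const allIndexKeys =
    keys.length > 0 && keys.every((k) => /^(0|[1-9]\d*)$/.test(k));
  if (!allIndexKeys) {
    return value;
  }
  // Sort numerically and drop the keys, which also closes gaps in the numbering.
  return keys
    .map(Number)
    .sort((a, b) => a - b)
    .map((i) => value[i]);
}

const corrupted = { "0": { color: "green" }, "1": { color: "red" } };
console.log(JSON.stringify(repairIndexKeyedObject(corrupted)));
// [{"color":"green"},{"color":"red"}]
```

A genuine map whose keys happen to be numeric would be converted too, so a check like this is only safe on paths the schema declares as arrays.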

Thanks, Gates, for your input though. If I find the core issue, I'll
be sure to post my results here.

-Glenn

Charles Woodcock

Mar 24, 2011, 11:26:34 AM
to mongodb-user
If you have a PHP array which has some indices missing, e.g.
array(0 => 'a', 2 => 'b')
then it will be saved as an object (and not an array) in Mongo.

This is the correct behaviour of the database apparently. I had the
same problem when using array_unique() (which preserves keys and thus
creates this sort of situation).

The solution is to use array_values(), which will reindex the PHP
array to:
array(0 => 'a', 1 => 'b')
which gets saved correctly.
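The rule described above can be modeled in a few lines of JavaScript. This is a sketch of the behaviour, not the driver's code: a PHP array is represented as an ordered list of [key, value] pairs, and looksLikeList and arrayValues are illustrative names. The assumption is that the driver stores a BSON array only when the keys are exactly 0, 1, 2, ... in order.

```javascript
// Model a PHP array as an ordered list of [key, value] pairs.
// Assumption: it is stored as an array only when the keys are exactly
// 0, 1, 2, ... in order; anything else becomes an embedded document.
function looksLikeList(pairs) {
  return pairs.every(([key], position) => key === position);
}

// Rough equivalent of PHP's array_values(): keep the values, renumber from 0.
function arrayValues(pairs) {
  return pairs.map(([, value], position) => [position, value]);
}

const gappy = [[0, "a"], [2, "b"]]; // array(0 => 'a', 2 => 'b')
console.log(looksLikeList(gappy)); // false
console.log(looksLikeList(arrayValues(gappy))); // true
```

Anything that removes elements while preserving keys (array_unique(), unset(), array_filter()) can leave a gap like the one in gappy, which is why reindexing just before the save is the safe place to do it.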

jp

May 3, 2011, 6:02:23 PM
to mongodb-user
I have seen the same thing and have a discussion that I believe is
caused by the same behavior at:

http://groups.google.com/group/mongodb-user/browse_thread/thread/e8dbc9f139f6717e/79efc8b26819ef0d?lnk=gst&q=serenityexperience#79efc8b26819ef0d

The PHP documentation should explain clearly that lists of embedded
documents MUST use sequential zero-based indexes, while a PHP
associative array will be treated like a single sub-document. Thus,
$push and $pushAll cannot be used to append to such documents.

Cheers


On Mar 24, 8:26 am, Charles Woodcock
<charles.woodc...@totalinternetgroup.nl> wrote:

Gates

May 4, 2011, 6:48:55 PM
to mongodb-user
@JP, I agree that something should be clarified here. I personally had
this issue (which is how I knew the answer above).

I'll see what's involved in updating the php.net documentation.

- Gates

On May 3, 3:02 pm, jp <j...@serenityexperience.com> wrote:
> I have seen the same thing and have a discussion that I believe is
> caused by same behavior at:
>
> http://groups.google.com/group/mongodb-user/browse_thread/thread/e8db...

jp

May 4, 2011, 8:50:54 PM
to mongodb-user
@Gates Cheers! It took me a day to learn Mongo, a day to design and 2
days to implement. I find it great for RAD, but there are these few
little things that cause great headaches. I had things working, or
so I thought, then found this issue and spent a lot of time trying
this and trying that just to figure out what was going wrong.

Along these lines, my design could benefit from my embedded documents
being a map rather than an array... I now understand I cannot use
$push or $pushAll, but I assume it would be fine to use things like
$set. I also assume indexing across these sub-objects should not be
an issue.

My question along these lines: Is there a benefit to using the array
construct vs an object (AKA map) construct? We currently cannot
select back which of the array elements we want from the embedded list
so there seem to be no features implemented around these embedded
arrays, but if I stored as map then my PHP code could more easily/
quickly locate specific elements within my set based on known keys.

Cheers

Gates

May 5, 2011, 1:26:13 PM
to mongodb-user
> We currently cannot select back which of the array elements we want from the embedded list....
> ... there seems to be no features implemented around these embedded arrays

So there are some features around embedded arrays. There's the
$elemMatch operator and dot notation works with arrays.

But, it's true that there's limited support for treating sub-documents
as if they were documents themselves. You can query on the existence
of a sub-document, but you still get back the whole parent.

> Is there a benefit to using the array construct vs an object (AKA map) construct?

So we've jumped from a bug to a schema design problem, but I think
you've hit on an important consideration here.

Here are two versions of the same basic data:

{ _id: 1,
text: "I am parent",
children: [
{ id: 'a', name: "john"},
{ id: 'b', name: "nancy"}
]
}

Pros:
- nice if you want to $push children.

Cons:
- annoying to update "john", generally have to update the whole
document.

Another version of this same document.

{ _id: 1,
text: "I am parent",
children: {
'a': { name: "john"},
'b': { name: "nancy"}
}
}

Notice how "children" is now an object? I've used that unique child ID
as key and the child object as a value.

Pros:
- Good if you want to update, $set: {'children.a'...}.
- Also useful for adding, $set: {'children.c':...}.

Cons:
- Where do you get new IDs? (ObjectIds?)
- You lose array ops ($push, $pop, $size, ...)

So, as always there are trade-offs here.

In some scenarios, it's important to be able to drill in and modify
specific pieces of data.
In other cases you really just want to manipulate the array with pops
and pushes.
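To make the trade-off concrete, here is a small sketch that only builds the update documents for the two layouts; it does not talk to a database. The function names and the child "mary" are made up for illustration, while $push and $set are the operators discussed above.

```javascript
// Layout 1: children is an array, so appending is a $push.
function appendChildToArray(child) {
  return { $push: { children: child } };
}

// Layout 2: children is an object keyed by child ID, so adding or
// replacing one child is a $set on the dotted path children.<id>.
function setChildInMap(childId, child) {
  return { $set: { ["children." + childId]: child } };
}

console.log(JSON.stringify(appendChildToArray({ id: "c", name: "mary" })));
// {"$push":{"children":{"id":"c","name":"mary"}}}
console.log(JSON.stringify(setChildInMap("c", { name: "mary" })));
// {"$set":{"children.c":{"name":"mary"}}}
```

With the map layout the child ID becomes part of a field path, so IDs containing a dot or starting with "$" would need escaping or should be avoided.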

- Gates

jp

May 5, 2011, 11:44:45 PM
to mongodb-user
This is excellent insight and so far one of my favorite discussions!
Thanks for the detailed explanation, with examples! Although, I am
not clear as to what you mean by the con "Where do you get new IDs?
(ObjectIds?)"

I believe much of the PHP users' confusion comes from PHP's use of a
map for all its array behavior (which I have come to really like).
What are your thoughts on gearing Mongo towards these similar
constructs?

I hope to see new features in the future around subdocuments, mainly
because they are the whole benefit of document stores in the first
place. The ability to get away from relational models and create far
more intuitive and optimal representations. I would say these models
mesh far better with an object-oriented design/implementation in code,
but also create a more optimal retrieval methodology. Before I came
to know Mongo I had my own in memory implementation that is very
similar (even has secondary indexing) but of course proprietary to my
code and I found huge performance gains over traditional storage
models.

Aside from the current known issue of not being able to select
specifically the sub/embedded documents upon return I would like to
see the ability to see and treat these more like recursive objects.
Would be ideal to be able to say, select back just the sub-object if
desired. Even have the cursor able to iterate over the sub
collection. I could see in place updates happen while iterating as
well which

In some cases it makes better sense to have these in their own
collection, while in others there are redundancies created, thus more
indexing, larger storage requirements and potentially slower models.

Thanks again for your excellent response, I hope this will provide
clarity for others and help them avoid the troubles we had to
experience.

Cheers

Gates

May 6, 2011, 3:39:08 PM
to mongodb-user
> "Where do you get new IDs? (ObjectIds?)"

Assume you are creating a document like this:
{
children: {
'a': { ... },
'b': { ... }
}
}

The fields 'a' and 'b' are basically IDs.
So in some way you have to generate this ID or get it from somewhere.

> Aside from the current known issue of not being able to select specifically the sub/embedded documents upon return...

So to clarify this try the following commands:
db.test.insert( { a : 1, children : { b : { x:2}, c: {x:3} } } );
db.test.find( { }, { 'children.b' : 1 } )
=> { _id : ..., "children" : { "b" : { "x" : 2 } } }

This brings back the _id field and only the b sub-document of
children.
It does not return a:1 or children.c.

If you structure your document as *objects of objects*, then it's easy
to pull out a specific embedded document.
If you structure your document as *arrays of objects*, this does not
work as well.

db.test.insert( { a : 2, children : [ {name:'b', x:2}, {name:'c', x:3} ] } );
db.test.find( {a:2}, { children:1 } );

This will give you only the "children" field. But you cannot pull back
only the "children.b" sub-document.
You can $slice and pull back the first N children, but you can't pull
back the specific child.
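The same difference shows up on the client side. The sketch below is plain JavaScript working on in-memory copies of the two test documents, not a query against MongoDB, and getPath and findChildByName are illustrative names. With objects of objects a dotted path reaches one child directly; with arrays of objects you fetch the whole array and filter it yourself.

```javascript
// Follow a dotted path such as "children.b" through nested objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) =>
      node !== null && typeof node === "object" ? node[key] : undefined,
    doc
  );
}

// Array layout: take the whole children array, then pick the child by name.
function findChildByName(doc, name) {
  return (doc.children || []).find((child) => child.name === name);
}

const asMap = { a: 1, children: { b: { x: 2 }, c: { x: 3 } } };
const asArray = { a: 2, children: [{ name: "b", x: 2 }, { name: "c", x: 3 }] };

console.log(JSON.stringify(getPath(asMap, "children.b"))); // {"x":2}
console.log(JSON.stringify(findChildByName(asArray, "b"))); // {"name":"b","x":2}
```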

> What are your thoughts on gearing Mongo towards these similar constructs?

There are several existing JIRA tickets requesting some more features
around *arrays of objects*.
Here's one for returning only matching objects from the array:
https://jira.mongodb.org/browse/SERVER-828

There are many more:
http://bit.ly/jc6HRk

The best spot to start work on such features is to create a JIRA
request or vote on an existing one.

- Gates

jp

May 6, 2011, 8:10:02 PM
to mongodb-user
Thanks again! We appreciate how well you are able to clarify these
issues :)

Thankfully, in my case the ID's are a non-issue in many, if not all,
of my use cases. Other than the ID I personally see little advantage
to the array implementation...If $push and what not were implemented
against all subdocuments (assoc. arrays) then there is the option to
have control over your keys or not. I am lately biased because of the
ease PHP offers by implementing arrays as assoc. arrays ;) Even back
in my C++ days more often than not I needed assoc. rather than simple
lists.

Considering the differences in functionality there are significant
design decisions which need to be considered when deciding between these
two, as things exist today. As in PHP, it would be nice to see these
two options function nearly (if not completely) synonymously in future
versions. A good roadmap as to future plans is also beneficial (if
not critical) as we are making decisions now that could be affected
into the future, potentially breaking our applications :)

All this aside, are there any performance differences between these
two methods in regards to storage, insert/update and/or querying?
That alone could make a significant case to implement in one direction
over the other regardless of offered functionality/behavior.

I look forward to reviewing these JIRA issues, thanks for sharing!

Cheers

jp

May 6, 2011, 11:18:39 PM
to mongodb-user
After thinking more about this I believe another major difference/
issue is if you choose the subdocument approach over the array I am
not sure I can see a way that you can properly index fields within the
collection of subdocuments.

For example, if we have the array structure:

$doc = array(
    "_id" => (int)1,
    "SUBS" => array(
        array("NAME" => "SUBNAME1", "STR" => "Some Data"),
        array("NAME" => "SUBNAME2", "STR" => "Other Data")
    )
);

I can see how and am using indexes that can use the dot notation like
"SUBS.NAME," "SUBS.STR"

However, given a document structure like:

$doc = array(
    "_id" => (int)1,
    "SUBS" => array(
        "SUBNAME1" => array("STR" => "Some Data"),
        "SUBNAME2" => array("STR" => "Other Data")
    )
);

I cannot think of a way that the subdocument can be indexed in this
way...any ideas? If that is true then this is the biggest CON against
associative arrays for subdocument collections/lists. That is, IF you
need to be able to search for a document that contains subdocuments
matching specific criteria.
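One way to see the problem is to list the dotted field paths each layout produces. The sketch below is plain JavaScript on in-memory copies of the two documents, and indexablePaths is an illustrative name. It assumes, as the "SUBS.NAME" example above suggests, that array elements are addressed without an index number in the path.

```javascript
// List the dotted field paths an index could be declared on.
// Arrays are stepped through without adding an index number to the path,
// which mirrors how "SUBS.NAME" addresses every element of SUBS.
function indexablePaths(value, prefix = "") {
  if (Array.isArray(value)) {
    return [...new Set(value.flatMap((item) => indexablePaths(item, prefix)))];
  }
  if (value !== null && typeof value === "object") {
    return [
      ...new Set(
        Object.entries(value).flatMap(([key, child]) =>
          indexablePaths(child, prefix ? prefix + "." + key : key)
        )
      ),
    ];
  }
  return [prefix];
}

const asArray = {
  _id: 1,
  SUBS: [
    { NAME: "SUBNAME1", STR: "Some Data" },
    { NAME: "SUBNAME2", STR: "Other Data" },
  ],
};
const asMap = {
  _id: 1,
  SUBS: { SUBNAME1: { STR: "Some Data" }, SUBNAME2: { STR: "Other Data" } },
};

console.log(JSON.stringify(indexablePaths(asArray)));
// ["_id","SUBS.NAME","SUBS.STR"]
console.log(JSON.stringify(indexablePaths(asMap)));
// ["_id","SUBS.SUBNAME1.STR","SUBS.SUBNAME2.STR"]
```

In the array layout every subdocument shares the path "SUBS.STR", so one index covers them all. In the map layout each key yields its own path, so a single index on STR across all subdocuments has nothing to attach to.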

Cheers