Conceptual error in implementing parseNsFullyQualified method


lsaf...@uncc.edu

Dec 4, 2016, 10:49:32 PM
to mongodb-dev
Hi all,

While working on my project (a proxy for MongoDB, written in Java), I observed some very strange behavior in MongoDB's implementation.

My observation: MongoDB is sensitive to the order of fields in a BSON document! This should not be the case. MongoDB should look up an element by its name, not by its position.

Scenario:
Suppose we want to run db.players.find() in the football database.

The wire protocol will generate

[71, 0, 0, 0, 38, 0, 0, 0, 0, 0, 0, 0, -38, 7, 0, 0, 102, 111, 111, 116, 98, 97, 108, 108, 0, 102, 105, 110, 100, 0, 36, 0, 0, 0, 2, 102, 105, 110, 100, 0, 8, 0, 0, 0, 112, 108, 97, 121, 101, 114, 115, 0, 3, 102, 105, 108, 116, 101, 114, 0, 5, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0]
The BSON document in this message (shown in green in the original post; the 36 bytes starting at 36, 0, 0, 0) is the metadata field.

The whole message is

Header
    Message length is 71
    Request ID is 38
    Response To is 0
    opCode is OP_COMMAND
Body
    database is football
    commandName is find
    metadata is {"find":"players","filter":{}}
    commandArgs is {}

My app generates the following (with the processing part commented out):
[71, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, -38, 7, 0, 0, 102, 111, 111, 116, 98, 97, 108, 108, 0, 102, 105, 110, 100, 0, 36, 0, 0, 0, 3, 102, 105, 108, 116, 101, 114, 0, 5, 0, 0, 0, 0, 2, 102, 105, 110, 100, 0, 8, 0, 0, 0, 112, 108, 97, 121, 101, 114, 115, 0, 0, 5, 0, 0, 0, 0]

The whole message is
 
Header
    Message length is 71
    Request ID is 15
    Response To is 0
    opCode is OP_COMMAND
Body
    database is football
    commandName is find
    metadata is {"filter":{},"find":"players"}
    commandArgs is {}
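
The header and the two leading strings of this message can be decoded with a few lines of Java. This is only a sketch under the layout described in this post (a 16-byte little-endian header followed by two NUL-terminated strings); the class and method names are made up for illustration and the two BSON documents are not parsed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of MongoDB or its drivers.
class OpCommandPeek {
    // The message generated by the proxy, written as Java signed bytes.
    static final byte[] MESSAGE = {
        71, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, -38, 7, 0, 0,
        102, 111, 111, 116, 98, 97, 108, 108, 0,
        102, 105, 110, 100, 0,
        36, 0, 0, 0,
        3, 102, 105, 108, 116, 101, 114, 0, 5, 0, 0, 0, 0,
        2, 102, 105, 110, 100, 0, 8, 0, 0, 0, 112, 108, 97, 121, 101, 114, 115, 0,
        0,
        5, 0, 0, 0, 0
    };

    // Reads a NUL-terminated UTF-8 string starting at the buffer's current position.
    static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) { /* advance to just past the terminating NUL */ }
        int len = buf.position() - start - 1;
        return new String(buf.array(), buf.arrayOffset() + start, len, StandardCharsets.UTF_8);
    }

    // Decodes the four header integers and the two strings that follow them.
    static String peek(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt();
        int requestId = buf.getInt();
        int responseTo = buf.getInt();
        int opCode = buf.getInt(); // 2010 is the numeric value of OP_COMMAND
        String database = readCString(buf);
        String commandName = readCString(buf);
        return "length=" + length + " requestId=" + requestId + " responseTo=" + responseTo
                + " opCode=" + opCode + " database=" + database + " commandName=" + commandName;
    }

    public static void main(String[] args) {
        System.out.println(peek(MESSAGE));
    }
}
```

The -38 in the byte list is the signed form of 0xDA, which is why the buffer is read as little-endian 32-bit integers rather than byte by byte.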

MongoDB then fails with the following message:

Header
    Message length is 112
    Request ID is 104
    Response To is 15
    opCode is OP_COMMANDREPLY
Body
    metadata is {"code":73,"errmsg":"Invalid collection name: football","ok":0.000000,"waitedMS":}
    commandArgs is {}

Instead of resolving football.players, MongoDB resolves only football. Why? Because MongoDB builds the namespace by appending the string value of the first element of the metadata field to the database name. In the example above, the first element's type is BSONDocument, not String, so the code returns an error. Instead of retrieving the first element, the code should retrieve the find element.
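
The effect can be imitated in Java with an insertion-ordered map standing in for a BSON document. This is a rough sketch of the first-element lookup described above, not MongoDB code; the class name and the doc helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical imitation of a first-element namespace lookup; not MongoDB code.
class FirstElementNamespace {
    // Builds an insertion-ordered document from alternating keys and values.
    static Map<String, Object> doc(Object... keysAndValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    // Uses the first element's value as the collection name only if it is a string;
    // otherwise falls back to the bare database name.
    static String parseNs(String dbname, Map<String, Object> cmdObj) {
        if (cmdObj.isEmpty()) {
            return dbname;
        }
        Object first = cmdObj.values().iterator().next();
        if (!(first instanceof String)) {
            return dbname;
        }
        return dbname + "." + first;
    }

    public static void main(String[] args) {
        Map<String, Object> findFirst = doc("find", "players", "filter", doc());
        Map<String, Object> filterFirst = doc("filter", doc(), "find", "players");
        System.out.println(parseNs("football", findFirst));   // football.players
        System.out.println(parseNs("football", filterFirst)); // football
    }
}
```

With the filter element first, the lookup never sees the collection name, which matches the "Invalid collection name: football" error.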

Details

mongo/src/mongo/db/commands/find_cmd.cpp

bool run(OperationContext* txn, const std::string& dbname, BSONObj& cmdObj, int options, std::string& errmsg, BSONObjBuilder& result)
{
        const NamespaceString nss(parseNs(dbname, cmdObj));
        if (!nss.isValid() || nss.isCommand() || nss.isSpecialCommand()) {
            return appendCommandStatus(result,
                                       {ErrorCodes::InvalidNamespace,
                                        str::stream() << "Invalid collection name: " << nss.ns()});
        }
        ...
}


mongo/src/mongo/db/commands.cpp

string Command::parseNs(const string& dbname, const BSONObj& cmdObj) const {
    BSONElement first = cmdObj.firstElement();
    if (first.type() != mongo::String)
        return dbname;

    return str::stream() << dbname << '.' << cmdObj.firstElement().valuestr();
}

string Command::parseNsFullyQualified(const string& dbname, const BSONObj& cmdObj) {
    BSONElement first = cmdObj.firstElement();   //this line is the problem
    uassert(ErrorCodes::BadValue,
            str::stream() << "collection name has invalid type " << typeName(first.type()),
            first.canonicalType() == canonicalizeBSONType(mongo::String));
    const NamespaceString nss(first.valueStringData());
    uassert(ErrorCodes::InvalidNamespace,
            str::stream() << "Invalid namespace specified '" << nss.ns() << "'",
            nss.isValid());
    return nss.ns();
}


I think this should be fixed. What do you think?

Regards,
Lida

ra...@10gen.com

Dec 7, 2016, 12:58:40 PM
to mongodb-dev
Indeed, MongoDB is sensitive to the order of fields in BSON documents.  This is by design, and the order of fields affects the semantics of a given request in many different situations.  To name just a few:

- Command request documents.  For commands sent using the OP_QUERY wire protocol message, the command is dispatched based on the first element of the command request document.  {"foo": "x", "bar": "y"} will run the "foo" command with the "bar" option, while {"bar": "y", "foo": "x"} will run the "bar" command with the "foo" option.
- BSON documents which specify a sort or ordering.  The query "find().sort({a: 1, b: 1})" will sort documents based on the value of "a", then the value of "b", while the query "find().sort({b: 1, a: 1})" will sort documents based on the value of "b", then the value of "a".  Similar rules apply to index specifications and shard keys.
- Insert requests to the database.  Documents stored on disk preserve the original order of fields received in the insert request (with the exception of the "_id" field, which may be re-ordered), and updates which change the value of an existing field are guaranteed to not change the order of fields in documents (server version 2.6 and later only).
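
To make the sort case above concrete, here is a small in-memory Java sketch in which the order of keys in the sort specification decides the result. It only imitates the semantics with plain collections and does not talk to a server; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an order-sensitive sort specification.
class SortSpecOrder {
    // Builds a comparator that applies the sort keys in the order they appear in the spec.
    static Comparator<Map<String, Integer>> comparatorFor(LinkedHashMap<String, Integer> spec) {
        Comparator<Map<String, Integer>> result = (x, y) -> 0;
        for (Map.Entry<String, Integer> e : spec.entrySet()) {
            String field = e.getKey();
            Comparator<Map<String, Integer>> byField =
                    Comparator.comparing((Map<String, Integer> d) -> d.get(field));
            if (e.getValue() < 0) {
                byField = byField.reversed(); // -1 means descending
            }
            result = result.thenComparing(byField);
        }
        return result;
    }

    static Map<String, Integer> doc(int a, int b) {
        Map<String, Integer> d = new LinkedHashMap<>();
        d.put("a", a);
        d.put("b", b);
        return d;
    }

    // Sorts two sample documents ascending by firstKey, then by secondKey.
    static String demo(String firstKey, String secondKey) {
        LinkedHashMap<String, Integer> spec = new LinkedHashMap<>();
        spec.put(firstKey, 1);
        spec.put(secondKey, 1);
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(doc(1, 2));
        docs.add(doc(2, 1));
        docs.sort(comparatorFor(spec));
        return docs.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo("a", "b")); // [{a=1, b=2}, {a=2, b=1}]
        System.out.println(demo("b", "a")); // [{a=2, b=1}, {a=1, b=2}]
    }
}
```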

That said, I agree that the error message that you received for your OP_COMMAND is quite confusing, since the command is indeed being dispatched based on the "command name" header.  The case where the first element of the command request doesn't match the "command name" should really be caught by the network layer.  I just filed https://jira.mongodb.org/browse/SERVER-27318 to improve this behavior (thanks for reporting this!).

In summary, the proxy you're writing needs to preserve the order of fields in all documents it receives, including command request documents.  Assuming you're writing the proxy at the network level (that is, your code serializes and deserializes MongoDB wire protocol messages), you may find importing org.bson from the MongoDB Java Driver (http://api.mongodb.com/java/current/) to be useful, in that the library provides abstractions for BSON objects which preserve the order of elements, and it provides a serialization/deserialization API for these objects as well.  Potentially, you might also be able to reuse wire protocol message parsing code from the driver itself.
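
For a proxy written from scratch, the simplest way to keep field order is to back the document class with an insertion-ordered map and serialize by iterating it. The sketch below assumes only two BSON element types (string, 0x02, and embedded document, 0x03) and uses hypothetical names throughout; it is not the org.bson API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal encoder: strings and embedded documents only.
class OrderedBsonSketch {
    // Builds an insertion-ordered document from alternating keys and values.
    static Map<String, Object> doc(Object... keysAndValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    static byte[] int32(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0);
    }

    @SuppressWarnings("unchecked")
    static byte[] encode(Map<String, Object> doc) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // LinkedHashMap iterates in insertion order, so elements keep their position.
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String) {
                body.write(0x02);
                writeCString(body, e.getKey());
                byte[] s = ((String) v).getBytes(StandardCharsets.UTF_8);
                body.write(int32(s.length + 1), 0, 4); // length includes the trailing NUL
                body.write(s, 0, s.length);
                body.write(0);
            } else if (v instanceof Map) {
                body.write(0x03);
                writeCString(body, e.getKey());
                byte[] inner = encode((Map<String, Object>) v);
                body.write(inner, 0, inner.length);
            } else {
                throw new IllegalArgumentException("type not supported in this sketch");
            }
        }
        byte[] b = body.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(int32(b.length + 5), 0, 4); // 4 length bytes + elements + final NUL
        out.write(b, 0, b.length);
        out.write(0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(doc("find", "players", "filter", doc()))));
        System.out.println(Arrays.toString(encode(doc("filter", doc(), "find", "players"))));
    }
}
```

The first line printed matches the 36-byte document in the original request, and the second matches the reordered document the proxy produced: same length, same elements, different byte sequence.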

~ rassi

lsaf...@uncc.edu

Dec 8, 2016, 1:58:03 AM
to mongodb-dev
Dear Rassi,

This behavior seemed strange to me because, according to the JSON specification (http://www.json.org/):

An object is an unordered set of name/value pairs.

I have fixed my BSONDocument class and now it keeps the order.

Thank you for suggesting the libraries. However, for learning purposes, I prefer to build from scratch. I started developing for Mongo 3. Recently, however, I upgraded to Mongo 3.2 and found that the wire protocol has changed a lot between these two versions. I am currently coping with the changes (I will soon move to Mongo 3.4). Instead of relying on specialized messages, Mongo now relies on a few message types, mainly OP_COMMAND.

Regards,
Lida

ra...@10gen.com

Dec 9, 2016, 3:53:48 PM
to mongodb-dev
> It was strange behavior for me. Because according to JSON specification (http://www.json.org/):  An object is an unordered set of name/value pairs. 

Ah.  Indeed, JSON is unordered, but BSON is not ("BSON documents (objects) consist of an ordered list of elements.", https://en.wikipedia.org/wiki/BSON).  I understand how that is confusing.  See also this old thread from the BSON mailing list for some related discussion: https://groups.google.com/d/topic/bson/lLJviAyN_po/discussion.

> Thank you for suggesting the libraries. However, for learning purpose, I prefer to build from scratch. I started developing for Mongo 3. However recently, I upgraded to Mongo 3.2 and I found that the wire protocol has changed a lot between these two versions. I am currently coping with the changes (I will soon change to Mongo 3.4). Instead of relying on specialized messages, Mongo now relies on a few command types mainly OP_COMMAND.

Yep, OP_COMMAND was introduced in MongoDB 3.2, and it is used in the MongoDB shell and for various intra-cluster server communication tasks.  Its use may grow in MongoDB 3.6.

> I have fixed my BSONDocument class and now it keeps the order.

Glad to hear your issue is fixed!

~ Rassi