C# driver using aggregation pipeline automatically on simple regex search


Marc-Olivier Labarre

Oct 5, 2019, 5:12:38 AM
to mongodb-user
Hi,

I've been debugging an issue on the C# driver, and I noticed it doesn't behave like I expect it to.

I'm using a regex filter that contains special characters and Unicode, and through the C# driver, the regex doesn't work.
I've managed to make it work both directly through a database client (Robo 3T) and through the Python driver.

I've enabled MongoDB profiling to log all queries, and discovered that what I thought was a single regex filter was, through the driver, sent as an aggregation pipeline, and that the regex somehow lost its Unicode characters.

The filter in C# is simply:

    var filters = Builders<BsonDocument>.Filter.Regex("Title", new BsonRegularExpression(reg, "i"));

And when I check the MongoDB profiler, I find that my query has been transmitted as follows:

{
    "op" : "command",
    "ns" : "contents",
    "command" : {
        "aggregate" : "contents",
        "pipeline" : [ 
            {
                "$match" : {
                    "Title" : /HOW\\ TO\\ LEVEL\\ FAST\\ IN\\ GHOST\\ RECON\\ BREAKPOINT\\ \\|\\ HOW\\ TO\\ LEVEL\\ FAST\\ IN\\ GHOST\\ RECON\\ BREAKPOINT\\ \\|\\ ���/i
                }
            }, 
    ...
    "nreturned" : 0,
}

Similarly, with Python, the following filter:

    c = collection.find({'Title': {'$regex': reg, '$options': 'i'}})

produces the expected query profile on mongo:

{
    "op" : "query",
    "ns" : "contents",
    "command" : {
        "find" : "contents",
        "filter" : {
            "Title" : {
                "$options" : "i",
                "$regex" : "HOW\\ TO\\ LEVEL\\ FAST\\ IN\\ GHOST\\ RECON\\ BREAKPOINT\\ \\|\\ 🔥DOUBLE\\ XP🔥\\ \\|\\ GHOST\\ RECON\\ BREAKPOINT\\ XP\\ GUIDE"
            }
        },
    ...
    "nreturned" : 1,
}

Note: the regex string itself is the title of a YouTube video, with the special characters escaped.
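As a rough illustration of the note above, here is a minimal Python sketch of how such an escaped pattern might be produced. It assumes the escaping is done with the standard library's `re.escape` (the thread does not say which function was actually used), and `build_title_filter` is a hypothetical helper name, not part of any driver.

```python
import re

# Video title as it appears (unescaped) in the profile output in this post.
TITLE = "HOW TO LEVEL FAST IN GHOST RECON BREAKPOINT | 🔥DOUBLE XP🔥 | GHOST RECON BREAKPOINT XP GUIDE"


def build_title_filter(title):
    """Hypothetical helper: build a case-insensitive $regex filter for a literal title."""
    # re.escape backslash-escapes regex metacharacters (here the spaces and "|")
    # so the title is matched literally; on Python 3.7+ the emoji is left as-is.
    return {"Title": {"$regex": re.escape(title), "$options": "i"}}


title_filter = build_title_filter(TITLE)
pattern = title_filter["Title"]["$regex"]

# Sanity check with Python's own regex engine. MongoDB uses PCRE, so this is
# only an approximation of what the server does with the pattern.
assert re.fullmatch(pattern, TITLE.lower(), re.IGNORECASE)
```

The resulting dict has the same shape as the filter passed to `collection.find` in the Python query above.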

This was done with the official driver, version 2.9.2, against a dockerized vanilla MongoDB 3.6.

-----

So I have two questions:
1) Is it normal that a simple regex search is transformed to use the aggregation pipeline?
2) What happened to the Unicode characters in the regex?


Thanks

Wan Bachtiar

Oct 15, 2019, 10:34:37 PM
to mongodb-user

Is it normal that a simple regex search is transformed to use the aggregation pipeline?

Hi Marc,

I tried to reproduce this with the MongoDB .NET/C# driver v2.9.2 and was unable to. How are you using the filters variable?
The following example code:

    var filter = Builders<BsonDocument>.Filter.Regex("Title", new BsonRegularExpression("TEST", "i"));
    var results = collection.Find(filter).ToList();

Showed up in profiler as:

"op": "query",
  "ns": "dbname.collname",
  "command": {
    "find": "collname",
    "filter": {
      "Title": /TEST/i
    },

Also, if you’re using regex just for case-insensitive search, you can try using a case-insensitive index instead. For example:

    var results = collection.Find<SomeObject>(
        x => x.Title == "TEST",
        new FindOptions() { Collation = new Collation("en", strength: CollationStrength.Secondary) }
    ).ToList();
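
For comparison with the Python query earlier in the thread, the same idea might look roughly like this in PyMongo. This is a sketch under assumptions: `find_title_ignoring_case` is a hypothetical helper, `collection` is expected to be a PyMongo `Collection`, and the collation is passed as a plain dict, where strength 2 corresponds to `CollationStrength.Secondary`.

```python
# Collation document intended to mirror new Collation("en", strength: Secondary).
CASE_INSENSITIVE_EN = {"locale": "en", "strength": 2}


def find_title_ignoring_case(collection, title):
    """Hypothetical helper: exact match on Title, ignoring case via collation."""
    # No regex is involved, so nothing in the title needs to be escaped.
    return collection.find({"Title": title}, collation=CASE_INSENSITIVE_EN)
```

For the query to be served by a case-insensitive index, the index has to be created with the same collation as the one specified in the query.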

If you have follow up questions, could you please provide:

  • An example document that you’re trying to match
  • A code snippet for the query, i.e. how you are using the filters variable

Regards,
Wan.

Marc-Olivier Labarre

Oct 21, 2019, 1:17:23 PM
to mongodb-user
Hi Wan,

Thanks for your reply.
The answers are actually in the post itself.
The document was a single document with only one field with this string as a value, and the code snippet is what I posted, with reg being the Regex-escaped string of the query: "HOW\\ TO\\ LEVEL\\ FAST\\ IN\\ GHOST\\ RECON\\ BREAKPOINT\\ \\|\\ 🔥DOUBLE\\ XP🔥\\ \\|\\ GHOST\\ RECON\\ BREAKPOINT\\ XP\\ GUIDE"

Note: the real scenario was more complex, but I reduced it to its simplest form to isolate the problem while keeping the faulty behavior.

I think your example is too simple (using "TEST"), and will allow the driver to use shortcuts - you can see the actual regex is dropped/transformed and MongoDB will do a simple case-insensitive filter match for the exact string, instead of matching a more general pattern.

In a similar way, I suspect the driver is using a different "shortcut" to transform what should be a regex query into a (sadly-invalid) aggregate pipeline.
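
To compare what each driver actually sends, the profiler entries can be inspected programmatically. The following Python sketch is only an illustration: `summarize_profile_entry` is a hypothetical helper, and it relies solely on the `command`, `find`, `filter`, `aggregate` and `pipeline` fields visible in the profile output quoted in this thread.

```python
def summarize_profile_entry(entry):
    """Hypothetical helper: return (command_name, filter) for a system.profile document."""
    command = entry.get("command", {})
    if "find" in command:
        return "find", command.get("filter", {})
    if "aggregate" in command:
        # Collect the bodies of all $match stages in the pipeline.
        matches = [stage["$match"] for stage in command.get("pipeline", []) if "$match" in stage]
        return "aggregate", matches
    return "other", None


# Trimmed-down versions of the two entries quoted in this thread;
# the regex values are replaced by a placeholder string.
csharp_entry = {
    "op": "command",
    "command": {"aggregate": "contents", "pipeline": [{"$match": {"Title": "<regex>"}}]},
}
python_entry = {
    "op": "query",
    "command": {"find": "contents", "filter": {"Title": {"$regex": "<regex>", "$options": "i"}}},
}

print(summarize_profile_entry(csharp_entry))
print(summarize_profile_entry(python_entry))
```

Running such a helper over the profiler entries would show at a glance which calls arrive as `find` and which arrive as `aggregate`.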

Wan Bachtiar

Oct 30, 2019, 3:13:34 AM
to mongodb-user

Hi Marc,

It would be helpful if you can provide:

  • An example document that you’re trying to match
  • A minimal and reproducible code example

Using the code example that I posted above, I’m unable to reproduce the issue that you’re getting (the change from Find to an aggregation $match). If the code snippet uses proper UTF-8 encoding for the emoji, a MongoCommandException is thrown with the message "Regular expression is invalid: invalid UTF-8 string", although you must have escaped this, since based on your post you don’t seem to have this issue.
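
To make the encoding point concrete, here is a small Python sketch showing how the 🔥 emoji (U+1F525) from the title is represented in UTF-8 and UTF-16, and how a damaged byte sequence decodes to U+FFFD replacement characters similar to the ones visible in the aggregate profile entry. It is an illustration only and does not establish where the characters were lost in the C# code path.

```python
FIRE = "\U0001F525"  # the 🔥 emoji used in the video title

# UTF-8 needs four bytes for a code point outside the Basic Multilingual Plane.
utf8_bytes = FIRE.encode("utf-8")

# UTF-16 (the in-memory string encoding in .NET) needs a surrogate pair.
utf16_bytes = FIRE.encode("utf-16-be")

# A correct round trip preserves the character.
round_trip = utf8_bytes.decode("utf-8")

# Dropping the last byte leaves an invalid sequence; decoding it with
# errors="replace" substitutes U+FFFD instead of raising an error.
damaged = utf8_bytes[:-1].decode("utf-8", errors="replace")

print(utf8_bytes.hex(), utf16_bytes.hex(), repr(damaged))
```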

Regards,
Wan.
