Retrieving escaped characters using pymongo


Dídac Busquets

Feb 28, 2020, 12:44:08 AM
to mongodb-user
Hello,

I have a mongo collection with a document that includes escaped characters (using \") like this:

{
    "_id" : ObjectId("5e57d89d5a7f537828c12a53"),
    "field" : "my \"string\" with escaped chars"
}

If I retrieve all the documents from the database, and print them, using

for doc in collection.find({}):
   print(doc)

the output is

{'_id': ObjectId('5e57d89d5a7f537828c12a53'), 'field': 'my "string" with escaped chars'}



As you can see, the escaped characters have been transformed to non-escaped ones.

How can I maintain the escaped characters so I get the original "my \"string\" with escaped chars" string?

Thanks

Iľja Pelech

Feb 28, 2020, 2:57:54 AM
to mongodb-user
You can use some built-in Python functionality, e.g.:

>>> a = 'abc\t\n\'def'
>>> repr(a)
'"abc\\t\\n\'def"'
>>> a.encode("utf-8", "backslashreplace")
b"abc\t\n'def"
>>>

As you can see - apostrophe and quotes are "not so" special :-)

Shane Harvey

Feb 28, 2020, 12:15:16 PM
to mongodb-user
The two strings in your example are the same so no transformation is needed: 

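>>> from pymongo import MongoClient
>>> client = MongoClient()  # assumes a local mongod on the default port
>>> original = "my \"string\" with \'escaped\' chars \t\n "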
>>> original
'my "string" with \'escaped\' chars \t\n '
>>>
>>> client.test.t.insert_one({'field': original})
<pymongo.results.InsertOneResult object at 0x10b705230>
>>> doc = client.test.t.find_one()
>>> doc
{'_id': ObjectId('5e5948f07b934e872e3feaa8'), 'field': 'my "string" with \'escaped\' chars \t\n '}
>>> original == doc['field']
True

The only difference is the quote character (single quote ' or double quote ") that Python uses when displaying the string.
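
The same point can be checked without a database. This is only a minimal sketch using plain Python strings (the variable names are made up for illustration): the backslash in a string literal is source-code syntax, so both spellings below produce the same characters, and only repr() adds quoting back when the value is displayed.

```python
# Two ways of writing the same string literal in Python source.
escaped_literal = "my \"string\" with escaped chars"
plain_literal = 'my "string" with escaped chars'

# The backslashes are consumed by the parser; neither value contains one.
same_value = escaped_literal == plain_literal
has_backslash = "\\" in escaped_literal

print(same_value)             # are the two values identical?
print(has_backslash)          # is any backslash character stored?
print(escaped_literal)        # str(): the raw characters
print(repr(escaped_literal))  # repr(): a Python literal, quote style chosen by Python
```

The quote style in the repr() output is Python's choice for display and says nothing about how the value is stored.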

Dídac Busquets

Feb 28, 2020, 12:29:09 PM
to mongodb-user
It's not yet exactly what I need.

The document with the escaped double quotes was not created from Python; I inserted it using Robo3T. It really needs to have double quotes, as it will later be processed by a Java app that doesn't like single quotes.

Are you able to retrieve the escaped double quotes when you read a document that shows them when inspected with Robo3T?

Shane Harvey

Feb 28, 2020, 1:26:01 PM
to mongodb-user
> Are you able to retrieve the escaped double quotes when you read a document that shows them when inspected with Robo3T?

PyMongo reads the exact value of the field without modifying it. There are no "escaped double quotes" in the example string because it does not contain any backslashes at all. A string with backslashes would look like this:

>>> client.test.t.insert_one({'field': r'my \"string\" with escaped chars'})
<pymongo.results.InsertOneResult object at 0x10545f1c0>
>>> client.test.t.find_one()
{'_id': ObjectId('5e595a6e7735850d1b1fc08c'), 'field': 'my \\"string\\" with escaped chars'}

So it may be a problem that the inserted data does not contain the backslashes you expect.
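
One way to check what a value really contains is to count characters instead of reading the printed form. The sketch below uses a hypothetical helper, describe(), which is not part of PyMongo, and plain strings standing in for values read from the collection.

```python
def describe(value):
    """Return (length, number of backslash characters) for a string."""
    return len(value), value.count("\\")

# Stand-ins for field values; the raw string keeps its backslashes.
without_backslashes = 'my "string" with escaped chars'
with_backslashes = r'my \"string\" with escaped chars'

print(describe(without_backslashes))
print(describe(with_backslashes))
```

If the count for a value read from the database is zero, the backslashes were never stored, whatever the GUI tool displays.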

Alternatively, the app can encode the original string in your example using Python's json module, like this:

>>> import json
>>> s = json.dumps('my "string" with escaped chars')
>>> s
'"my \\"string\\" with escaped chars"'
>>> print(s)
"my \"string\" with escaped chars"

The Java app can then parse the JSON encoded string.
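
As a rough illustration of that round trip, the sketch below uses Python's own json.loads as a stand-in for the Java parser (an assumption; the Java side is not shown in this thread). The escaped quotes exist only in the JSON text, and decoding recovers the original value.

```python
import json

value = 'my "string" with escaped chars'

encoded = json.dumps(value)    # JSON text: inner quotes are backslash-escaped
decoded = json.loads(encoded)  # what a JSON parser should recover

print(encoded)
print(decoded == value)
```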

If the app needs to send the whole BSON document, it can use MongoDB Extended JSON like this:

>>> from bson.objectid import ObjectId
>>> from bson.json_util import dumps, CANONICAL_JSON_OPTIONS
>>> s = dumps({'_id': ObjectId('5e5948f07b934e872e3feaa8'), 'field': 'my "string" with \'escaped\' chars \t\n '}, json_options=CANONICAL_JSON_OPTIONS)
>>> s
'{"_id": {"$oid": "5e5948f07b934e872e3feaa8"}, "field": "my \\"string\\" with \'escaped\' chars \\t\\n "}'
>>> print(s)
{"_id": {"$oid": "5e5948f07b934e872e3feaa8"}, "field": "my \"string\" with 'escaped' chars \t\n "}

Dídac Busquets

Mar 2, 2020, 5:07:17 AM
to mongod...@googlegroups.com

Thanks Shane,

Your comment helped me find what the problem was. It’s the “print” bit that is not displaying the backslashes. My code was actually doing a return str(doc), as it is a REST API server, and the output of str did not contain the backslashes.

I’ve changed that to return json.dumps(doc) and it keeps them.

So problem solved!
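
For anyone landing here later, the difference between the two calls can be sketched as follows. The doc below is a plain dict standing in for a document read from MongoDB; "_id" is left out on purpose, because the standard json module cannot serialize an ObjectId by itself (bson.json_util.dumps, mentioned earlier in the thread, handles that case).

```python
import json

# Stand-in for a document read from MongoDB, without the "_id" field.
doc = {"field": 'my "string" with escaped chars'}

as_repr = str(doc)         # Python repr of the dict: single-quoted, not JSON
as_json = json.dumps(doc)  # JSON text: double-quoted, inner quotes escaped

print(as_repr)
print(as_json)
```

Only the json.dumps output is valid JSON, which is why a JSON consumer such as the Java app accepts it while the str output gets rejected.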



--
Dídac Busquets

Shane Harvey

Mar 2, 2020, 3:01:39 PM
to mongodb-user
Great! Happy to help!