Hi Georg -
Ok, I finally got to the bottom of this.
Since TinkerGraph uses a HashMap for its index, you can see what's being stored in the index by using Gremlin to return the contents of the map.
Here's what's being stored in the TinkerGraph index using your Bulbs `g.university.create(name=name)` method above...
{"results":[{"name":{"Université de Montréal":[{"name":"Université de Montréal","element_type":"university","_id":"0","_type":"vertex"}]},"element_type":{"university":[{"name":"Université de Montréal","element_type":"university","_id":"0","_type":"vertex"}]}}],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":3.732632}
All that looks good -- the encodings look right.
To create and index a vertex like the one above, Bulbs uses a custom Gremlin script via an HTTP POST request with a JSON content type.
Here's the problem...
Rexster's index lookup REST endpoint uses URL query params, and Bulbs encodes URL params as UTF-8 byte strings.
To see how Rexster handles URL query params encoded as UTF-8 byte strings, I executed a Gremlin script via a URL query param that simply returns the encoded string...
{"results":["Université de Montréal"],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":16.59432}
Egad! That's not right. As you can see, that text is mangled.
In a twist of irony, we have Gremlin returning gremlins, and that's what Rexster is using for the key's value in the index lookup, which as we can see is not what's stored in TinkerGraph's HashMap index.
Here's what's going on...
This is what the unquoted byte string looks like in Bulbs:
>>> name
u'Universit\xe9 de Montr\xe9al'
>>> bulbs.utils.to_bytes(name)
'Universit\xc3\xa9 de Montr\xc3\xa9al'
`'\xc3\xa9'` is the UTF-8 encoding of the Unicode character `u'\xe9'` (which can also be written as `u'\u00e9'`).
UTF-8 is a variable-width encoding that uses 2 bytes for this character, and Jersey/Grizzly 1.x (Rexster's app server) has a bug where it doesn't properly handle multi-byte encodings like UTF-8 in URL query params.
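The mangled output is consistent with the UTF-8 bytes being read back as latin1, one character per byte. Here's a minimal sketch of that effect. It's written for Python 3 (the session above is Python 2), and it only imitates the symptom, it is not what Grizzly literally runs:

```python
# Python 3 sketch of the mangling; the variable names are illustrative.
name = "Universit\u00e9 de Montr\u00e9al"

# What the client puts on the wire: each accented e becomes the 2 bytes 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")

# What a server sees if it assumes latin1 (1 byte per character) instead.
mangled = utf8_bytes.decode("latin-1")

print(utf8_bytes)
print(mangled)
print(mangled == name)
```

Each accented character comes back as two characters, which is why the lookup key no longer matches what's stored in the index.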
It looks like this is fixed in Jersey/Grizzly 2.0, but switching Rexster from Jersey/Grizzly 1.x to Jersey/Grizzly 2.x is a big ordeal.
Last year TinkerPop decided to switch to Netty instead, and so for the TinkerPop 3 release this summer, Rexster is in the process of morphing into Gremlin Server, which is based on Netty rather than Grizzly.
Until then, here are a few workarounds...
Since Grizzly 1.x can't handle multi-byte encodings like UTF-8 in URL params, client libraries need to encode URL params as 1-byte latin1 (AKA ISO-8859-1), which is Grizzly's default encoding.
Here's the same value encoded as a latin1 byte string...
{"results":["Université de Montréal"],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":17.765313}
As you can see, using a latin1 encoding works in this case, because `é` is in the latin1 range. Characters outside that range can't be encoded this way.
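For client code, the difference comes down to which encoding is used when percent-encoding the query param. Here's a rough Python 3 sketch using the standard library's `urllib.parse.quote`. The `latin1_safe` helper is something I made up for illustration, it is not part of Bulbs:

```python
from urllib.parse import quote

name = "Universit\u00e9 de Montr\u00e9al"

# UTF-8 percent-encoding: the accented e becomes two escaped bytes.
utf8_param = quote(name, encoding="utf-8")

# latin1 percent-encoding: the accented e becomes a single escaped byte.
latin1_param = quote(name, encoding="latin-1")


def latin1_safe(text):
    """Hypothetical helper: True if every character fits in latin1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(utf8_param)
print(latin1_param)
print(latin1_safe(name))
```

A check like `latin1_safe` matters because latin1 only covers 256 characters, so this workaround can't carry names written in, say, Japanese or Cyrillic.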
However, for general purposes, it's probably best for client libraries to use a custom Gremlin script via an HTTP POST request with a JSON content type and thus avoid the URL param encoding issue altogether -- this is what Bulbs is going to do, and I'll push the Bulbs update to GitHub later today.
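To show why the JSON route sidesteps the problem, here's a small Python 3 sketch that only builds the request body and doesn't send anything. The `script` and `params` key names and the one-word script are assumptions for illustration, so check them against what your Rexster version expects:

```python
import json

name = "Universit\u00e9 de Montr\u00e9al"

# Hypothetical payload: a script that just returns the bound param.
payload = {"script": "name", "params": {"name": name}}

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the body is plain ASCII and there is nothing for the server to misdecode.
body = json.dumps(payload)
body_bytes = body.encode("ascii")

# Decoding the JSON restores the original string exactly.
round_tripped = json.loads(body_bytes.decode("ascii"))["params"]["name"]

print(body)
print(round_tripped == name)
```

Since the accented characters travel as `\u00e9` escapes, the result doesn't depend on which charset the server assumes for the request.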
- James