I've recently been working on adding support for UTF-8 (or, in fact, any encoding) to net-ldap. When it comes to sending data, it turns out we can simply treat strings as raw bytes internally, as long as the client and server agree on the encoding. This seems reasonable to me. One fix is needed, though: String#to_ber has to produce "ASCII-8BIT" encoded strings. You can see the change here: https://github.com/danabr/ruby-net-ldap/commit/2bc334d2904c68ad2ca0cb8e95661a87cbae2632
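To illustrate the idea (this is only a rough sketch, not the code from the commit), here is a hypothetical helper, ber_octet_string, that views the string as raw bytes before building a BER octet string. The helper name and the short-form length limit are assumptions made for the example:

```ruby
# Hypothetical helper for illustration; this is NOT net-ldap's String#to_ber.
# It encodes a string as a BER OCTET STRING (tag 0x04), short-form length only.
def ber_octet_string(str)
  # Relabel a copy of the string as raw bytes; the bytes themselves are unchanged.
  raw = str.dup.force_encoding(Encoding::ASCII_8BIT)
  len = raw.bytesize
  raise ArgumentError, "sketch only handles lengths up to 127" if len > 127

  # Array#pack returns an ASCII-8BIT string, so header and payload now share
  # one encoding and can be concatenated without compatibility errors.
  header = [0x04, len].pack("C2")
  header + raw
end
```

The point is that the length is counted in bytes rather than characters, and the result stays binary regardless of the encoding the caller used.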
This has worked great for me in the project I am working on, which uses UTF-8 and an OpenLDAP server. The code must be tested on more setups, though.
Now, so far I have only been dealing with the easy part, namely sending data to the server. It becomes a little bit more complicated when receiving data. Ideally, I would like it to work something like this:
  Net::LDAP.open(..., :encoding => 'utf-8') do |ldap|
    ldap.search(...) do |entry|
      entry[:dn]    # utf-8
      entry[:cn][0] # utf-8
    end
  end
The question is where the decoding should happen. I've been thinking about storing the encoding as a property of the connection and then passing it on to Net::LDAP::PDU, which would do the work. But I don't really see a "clean" way of doing it.
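One possible shape for this, sketched with made-up names (EncodingSketch and tag do not exist in net-ldap), is a small function that walks an already parsed value and relabels every string with the connection's encoding. It assumes the server really sends bytes in that encoding, so it uses force_encoding rather than transcoding:

```ruby
# Hypothetical sketch; none of these names are part of net-ldap.
module EncodingSketch
  # Recursively relabel every string in a parsed value with the given encoding.
  # force_encoding only changes the label, it does not convert any bytes.
  def self.tag(value, encoding)
    case value
    when String then value.dup.force_encoding(encoding)
    when Array  then value.map { |v| tag(v, encoding) }
    when Hash   then value.each_with_object({}) { |(k, v), h| h[k] = tag(v, encoding) }
    else value
    end
  end
end

# Simulated raw data as it might arrive from the wire (binary strings).
raw_entry = {
  :dn => "cn=J\xC3\xB6rg,dc=example,dc=com".b,
  :cn => ["J\xC3\xB6rg".b]
}

# The encoding would come from the connection, e.g. the :encoding option.
entry = EncodingSketch.tag(raw_entry, "utf-8")
```

Something like this could be called wherever the parsed PDU is turned into an entry, which would keep the BER parsing itself unaware of encodings.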
Do you have any suggestions?