I didn't see any resolution from these discussions:
BERT 2.0 Proposal: New complex type - {bert, string, Encoding,
Binary}:
http://groups.google.com/group/bert-rpc/browse_thread/thread/b3ccda7b76a3a631
String encoding semantics:
http://groups.google.com/group/bert-rpc/browse_thread/thread/b9816d8d70593714
...and I needed a solution ASAP, so I hacked something together:
Encoding and sending UTF-8 strings from Ruby was easy to fix: using
String#bytesize instead of String#length means the byte size is
reported when encoding Strings and when calculating the BERT packet
length.
Once the sizes are based on bytes and not characters, packets
containing multibyte Strings go out over the wire just fine, instead
of getting truncated or being invalid BERT terms as they previously
were. No extra encoding information is added -- the receiver is
expected to have knowledge of the encoding type.
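To illustrate why this matters (a standalone sketch, not code from the
fork): on Ruby 1.9+, String#length counts characters while
String#bytesize counts bytes, and the two diverge for multibyte UTF-8
strings -- the wire format needs the byte count.

```ruby
# "héllo" is 5 characters, but the é occupies 2 bytes in UTF-8.
s = "héllo"
puts s.length    # => 5 (character count -- wrong for packet sizing)
puts s.bytesize  # => 6 (byte count -- what the wire format needs)
```

Sizing the packet with `length` here would truncate the term by one
byte, which is exactly the kind of invalid BERT packet described above.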
On the decoding side, I added an optional parameter to the decode
method, "force_to_encoding", that allows you to specify which encoding
you would like Strings and Binaries to be decoded with (e.g.
BERT.decode(@bert, 'utf-8')). If you leave it out, everything works as
before. The current implementation uses String#force_encoding, which
AFAIK is Ruby 1.9-only, but it should be possible to add Ruby 1.8
support as well by fixing up BERT.force_encoding to support it (both
the Ruby decoder and the C decoder have been modified to call
BERT.force_encoding when necessary).
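As a rough sketch of what happens on the decode side (assuming Ruby
1.9+; variable names here are illustrative, not from the fork):
String#force_encoding just re-tags the bytes with a new encoding, it
doesn't transcode them, so it's cheap and safe when the sender and
receiver agree on UTF-8.

```ruby
# Bytes come off the wire untagged (binary / ASCII-8BIT).
raw = "h\xC3\xA9llo".dup.force_encoding("ASCII-8BIT")
puts raw.encoding          # => ASCII-8BIT

# Re-tag the same bytes as UTF-8, as BERT.decode(@bert, 'utf-8') would.
utf8 = raw.force_encoding("utf-8")
puts utf8.encoding         # => UTF-8
puts utf8.valid_encoding?  # => true (the bytes were already valid UTF-8)
```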
I also added a few other things to support Ruby 1.9 (like shielding
Fixnum/Bignum type coercion and doing byte-level comparison in the
unit tests).
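For anyone curious why byte-level comparison is needed in the tests, a
hypothetical example (not from the fork's test suite): on Ruby 1.9,
strings with identical bytes but incompatible encoding tags compare
unequal, so asserting on the bytes avoids spurious failures.

```ruby
a = "héllo"                                       # tagged UTF-8
b = "h\xC3\xA9llo".dup.force_encoding("ASCII-8BIT")  # same bytes, binary tag

puts a == b              # => false (non-ASCII content, incompatible encodings)
puts a.bytes == b.bytes  # => true  (byte content is identical)
```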
See my BERT and BERT-RPC forks for the changes:
http://github.com/kenpratt/bert
http://github.com/kenpratt/bertrpc
Thanks!
-Ken