String encoding/decoding for non-ASCII encodings

Ken Pratt

Mar 18, 2010, 7:17:47 PM
to BERT-RPC
I'm using the Ruby BERT and BERT-RPC libraries, as well as the Erlang
Bert.erl library, and needed the ability to easily send UTF-8 strings
back and forth between them.

I didn't see any resolution from these discussions:

BERT 2.0 Proposal: New complex type - {bert, string, Encoding,
Binary}:
http://groups.google.com/group/bert-rpc/browse_thread/thread/b3ccda7b76a3a631

String encoding semantics:
http://groups.google.com/group/bert-rpc/browse_thread/thread/b9816d8d70593714

...and I needed a solution ASAP, so I hacked something together:

Encoding and sending UTF-8 strings from Ruby was easy to fix: by using
String#bytesize instead of String#length, the encoder now reports the
size in bytes both when encoding Strings and when calculating the BERT
packet length. Once the sizes are based on bytes rather than
characters, packets containing multibyte Strings go out over the wire
just fine; previously they were truncated or formed invalid BERT
terms. No extra encoding information is added -- the receiver is
expected to know the encoding.
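
To illustrate why the byte count matters, here is a minimal sketch in
plain Ruby. It is not the code from the fork: encode_binary_term and
declared_payload are hypothetical helpers that write and read an
Erlang-style binary term (version byte 131, tag 109, 4-byte big-endian
length, then the raw bytes). It assumes Ruby 2.0 or later for String#b.

```ruby
# Hypothetical helper (not the BERT gem's API): builds a binary term as
# version byte 131, BINARY_EXT tag 109, 4-byte big-endian size, raw bytes.
def encode_binary_term(str, size)
  [131, 109, size].pack("CCN") + str.b
end

# Hypothetical helper: reads back only as many payload bytes as the
# header declares, the way a receiver would.
def declared_payload(packet)
  _version, _tag, size = packet.unpack("CCN")
  packet.byteslice(6, size) # header is 1 + 1 + 4 = 6 bytes
end

text = "h\u00e9llo"  # "e" with an acute accent takes 2 bytes in UTF-8

text.length    # => 5 (characters)
text.bytesize  # => 6 (bytes)

wrong = encode_binary_term(text, text.length)   # header says 5 bytes
right = encode_binary_term(text, text.bytesize) # header says 6 bytes

# With the character count in the header, the receiver stops one byte
# short and the final byte is left dangling after the term.
truncated = declared_payload(wrong)
complete  = declared_payload(right).force_encoding("UTF-8")
```

With the byte count in the header, complete equals the original
string; with the character count, truncated is missing its last byte.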

On the decoding side, I added an optional parameter to the decode
method, "force_to_encoding", that allows you to specify which encoding
you would like Strings and Binaries to be decoded with (e.g.
BERT.decode(@bert, 'utf-8')). If you leave it out, everything works as
before. The current implementation uses String#force_encoding, which
AFAIK is Ruby 1.9-only, but it should be possible to add Ruby 1.8
support as well by fixing up BERT.force_encoding to support it (both
the Ruby decoder and the C decoder have been modified to call
BERT.force_encoding when necessary).
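
As a rough sketch of the decoding idea (again, not the fork's actual
code), the following shows what tagging decoded bytes with an encoding
looks like on Ruby 1.9+. apply_encoding is a hypothetical stand-in for
the BERT.force_encoding hook described above; the real hook may handle
more types or behave differently.

```ruby
# Hypothetical stand-in for the fork's hook: relabels Strings (and
# Strings nested in Arrays) with the requested encoding. With no
# encoding given, values pass through untouched, as before.
def apply_encoding(value, encoding = nil)
  return value if encoding.nil?
  case value
  when String then value.dup.force_encoding(encoding)
  when Array  then value.map { |v| apply_encoding(v, encoding) }
  else value
  end
end

raw = "h\u00e9llo".b  # bytes as they would arrive off the wire (binary)

untouched = apply_encoding(raw)           # encoding stays ASCII-8BIT
decoded   = apply_encoding(raw, "utf-8")  # same bytes, tagged as UTF-8

decoded.length           # => 5
decoded.valid_encoding?  # => true
```

Note that String#force_encoding only relabels the bytes; it does not
transcode them, so the caller still has to know the sender's encoding.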

I also added a few other things to support Ruby 1.9 (like shielding
Fixnum/Bignum type coercion and doing byte-level comparison in the
unit tests).

See my BERT and BERT-RPC forks for the changes:
http://github.com/kenpratt/bert
http://github.com/kenpratt/bertrpc

Thanks!

-Ken
