UTF-8 test and patch


Jeremy Leader

Sep 7, 2011, 2:00:21 PM
to protobu...@googlegroups.com, Theron Stanford
Hi, the list has been quiet for a while!

My co-worker, Theron Stanford, discovered that protobuf-perlxs doesn't correctly
preserve the UTF-8 flag on deserialized string fields. The protobuf spec
(http://code.google.com/apis/protocolbuffers/docs/proto.html#scalar) says that
"A string must always contain UTF-8 encoded or 7-bit ASCII text"; so we believe
that it's appropriate to upgrade non-UTF-8 Perl strings to UTF-8 before
serializing, and to set the UTF-8 flag after deserializing.

We wrote a test case (utf8test.tgz) that tests all the types of string fields
(required, optional, repeated) and all the different accessors (set_*, add_*,
copy_from, to_hashref) on UTF-8, Latin-1, and plain ASCII strings.

Currently, 8 of the tests (all the ones with UTF-8 input) fail. We've created a
patch (protobuf-perlxs-1.1-utf8_fix.patch) that does the following:

- for repeated and non-repeated string fields, and for copying to a string
field from a hashref, if the input is a non-UTF-8 string, copy it, call
sv_utf8_upgrade on the copy, and serialize from the upgraded string

- for all string fields, when deserializing, call SvUTF8_on to set the UTF-8 flag.
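
To make the serialization step concrete, here is a rough standalone C sketch of
what upgrading a Latin-1 byte string to UTF-8 amounts to at the byte level. It
is not the code from the patch and does not use the Perl API: the function name
latin1_to_utf8 is hypothetical, and it assumes the input is pure Latin-1 (one
byte per character). In the XS code itself, sv_utf8_upgrade performs the
equivalent conversion on the copied SV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper for illustration only (not part of protobuf-perlxs
 * or the Perl API). Converts `len` Latin-1 bytes into a freshly allocated,
 * NUL-terminated UTF-8 buffer and stores the encoded length in *out_len.
 * Returns NULL if allocation fails. The caller must free() the result.
 */
unsigned char *latin1_to_utf8(const unsigned char *in, size_t len,
                              size_t *out_len)
{
    unsigned char *out;
    size_t i, j = 0;

    assert(in != NULL && out_len != NULL);

    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* 7-bit ASCII is already valid UTF-8. */
            out[j++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence: 110xxxxx 10xxxxxx. */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    *out_len = j;
    return out;
}
```

For example, the four Latin-1 bytes of "caf\xE9" come out as five UTF-8 bytes,
with the final character encoded as 0xC3 0xA9. Sending the raw 0xE9 byte
instead would put a string on the wire that is not valid UTF-8, which is why
the upgrade has to happen before serializing.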

Any comments, suggestions, complaints? Anyone else using protobuf-perlxs?

--
Jeremy Leader
jle...@oversee.net

protobuf-perlxs-1.1-utf8_fix.patch
utf8test.tgz

Dave Bailey

Sep 9, 2011, 5:28:39 PM
to protobu...@googlegroups.com
sorry, meant to send this to the list as well:

hi jeremy and theron,

thanks for the patch, i will get it into the next release.  i have a few other issues to fix, and have been meaning to add support for extensions.  i hope to get this out this month.

-dave



