Unicode surrogate code points as values

Jesse Millikan
Apr 10, 2013, 2:59:28 PM
to lamb...@googlegroups.com
The one-code-point string "\ud8f0" (or any single surrogate code point) is not "valid" Unicode, because a surrogate code point does not represent a character on its own. A few tests depend on such strings (e.g. one in modules/unittest/test_float.py). This string is an allowed value in Python, where a string is a sequence of Unicode code points, but not in Racket, where a string is a sequence of Unicode characters. As a result, an error occurs in both places where the respective parsers try to turn these literals into Racket strings.
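
A small Python sketch of the distinction: Python happily builds the one-element string, but the code point falls outside the Unicode scalar values (all code points except the surrogate range U+D800 to U+DFFF), which is the set Racket characters correspond to. The helper `is_scalar_value` is a hypothetical name used only for illustration.

```python
# A lone high surrogate, the same value the test file uses.
s = "\ud8f0"

# Python strings are sequences of code points, so this is a valid
# one-element string.
print(len(s))        # 1
print(hex(ord(s)))   # 0xd8f0

def is_scalar_value(cp: int) -> bool:
    """Hypothetical helper: True for Unicode scalar values, i.e. every
    code point except surrogates. Racket chars cover exactly this set."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

print(is_scalar_value(ord(s)))  # False

# Strict UTF-8 encoding rejects the lone surrogate too.
try:
    s.encode("utf-8")
    error_reason = None
except UnicodeEncodeError as exc:
    error_reason = exc.reason
print(error_reason)  # surrogates not allowed
```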

Sorry if you've discussed this already; I didn't find it in a quick search of the group. Anyway, if the native parser is going to be merged soon, it will probably have to be allowed to fail on those test files, just as the Python-parser strategy does now.

Joe Gibbs Politz
Apr 10, 2013, 3:24:33 PM
to lamb...@googlegroups.com
I think that error is fine for now. Eventually the right solution is
to pick a representation of Python strings that isn't Racket strings,
but that doesn't have to happen right away.
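
One possible shape for such a representation, sketched in Python under the assumption that the interpreter stores a Python string as a sequence of integer code points and converts to a host string (standing in for a Racket string) only when every code point is a scalar value. `PyStr`, `is_host_representable`, and `to_host` are hypothetical names for illustration, not part of lambda-py.

```python
from dataclasses import dataclass
from typing import Tuple

def is_scalar_value(cp: int) -> bool:
    # Unicode scalar values: every code point except the surrogate range.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

@dataclass(frozen=True)
class PyStr:
    """Hypothetical interpreter-side string: raw code points as ints.
    Surrogates are stored like any other code point, as in Python."""
    code_points: Tuple[int, ...]

    def __post_init__(self) -> None:
        # Reject anything that is not a Unicode code point at all.
        for cp in self.code_points:
            if not 0 <= cp <= 0x10FFFF:
                raise ValueError(f"not a code point: {cp:#x}")

    def __len__(self) -> int:
        # Length counts code points, matching Python's len().
        return len(self.code_points)

    def concat(self, other: "PyStr") -> "PyStr":
        return PyStr(self.code_points + other.code_points)

    def is_host_representable(self) -> bool:
        # A character-based host string can hold this value only if it
        # contains no surrogate code points.
        return all(is_scalar_value(cp) for cp in self.code_points)

    def to_host(self) -> str:
        if not self.is_host_representable():
            raise ValueError("contains surrogate code points")
        return "".join(chr(cp) for cp in self.code_points)
```

With this design, `PyStr((0xD8F0,))` is an ordinary value of length 1, and the error moves from parse time to the few operations (such as printing or interop) that genuinely need a character string.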

We haven't discussed this before, so thanks for bringing it up!