
DBCS versus skip_backward


Peter Gibbs

Nov 3, 2003, 5:17:52 AM
to perl6-internals
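Whilst attempting to implement DBCS encoding, I have discovered that
skip_backward cannot be implemented reliably for this encoding style.
Because 1-byte and 2-byte characters are mixed, the byte just before the
current position may be a complete 1-byte character or the second byte
of a 2-byte character, and that cannot always be decided without
scanning from the start of the string.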
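
A minimal sketch of the ambiguity, for illustration only. It assumes
Shift_JIS-style lead-byte ranges (0x81-0x9F and 0xE0-0xFC); the helper
names is_lead_byte and char_widths are hypothetical and are not part of
Parrot. The two buffers end in the same two bytes, yet the last
character is 2 bytes wide in one and 1 byte wide in the other:

```python
def is_lead_byte(b: int) -> bool:
    # Assumption: Shift_JIS-style lead-byte ranges. Other DBCS code pages
    # use different ranges, but the ambiguity arises the same way.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def char_widths(data: bytes) -> list[int]:
    """Width in bytes of each character, found by scanning forward."""
    widths = []
    i = 0
    while i < len(data):
        # A lead byte followed by another byte starts a 2-byte character.
        w = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        widths.append(w)
        i += w
    return widths


a = bytes([0x83, 0x41])        # one 2-byte character
b = bytes([0x83, 0x83, 0x41])  # one 2-byte character, then 1-byte 0x41

# Both buffers end in the same two bytes ...
assert a[-2:] == b[-2:]
# ... but the final character has a different width in each.
print(char_widths(a)[-1], char_widths(b)[-1])
```

Looking only at the bytes before the current position, 0x83 0x41 reads
as a plausible 2-byte character in both cases, so a purely local
backward step picks the wrong boundary in the second buffer.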

Some of the available options:
1) Throw an exception if somebody tries to skip_backward in a DBCS
string
2) Standardise on a single Unicode format for all internal string
processing
3) Convert all strings in DBCS encoding to another format, either always
or only when skip_backward is invoked
4) Pass additional context information to skip_backward, so it can fall
back to counting forward when required
5) Remove skip_backward completely
6) Do not support DBCS encoding
7) Create an index for DBCS strings (i.e. a map of character offset
versus byte offset) - this would also require that skip_backward
receive additional data
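
Options 4 and 7 might look roughly like the sketch below. This is not
Parrot code: the function names, the signatures and the Shift_JIS-style
lead-byte ranges are all assumptions made for illustration. Both
variants expect pos to be a character boundary.

```python
from bisect import bisect_left


def is_lead_byte(b: int) -> bool:
    # Assumption: Shift_JIS-style lead-byte ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def build_index(data: bytes) -> list[int]:
    """Option 7: byte offset of every character start, via one forward scan."""
    offsets = []
    i = 0
    while i < len(data):
        offsets.append(i)
        i += 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
    return offsets


def skip_backward_scan(data: bytes, pos: int, n: int) -> int:
    """Option 4: the caller passes the whole buffer as context, and the
    routine recounts forward from the start. Costs O(pos) per call."""
    if n == 0:
        return pos
    starts = build_index(data[:pos])  # starts of all characters before pos
    if n > len(starts):
        raise IndexError("skip_backward past start of string")
    return starts[-n]


def skip_backward_indexed(index: list[int], pos: int, n: int) -> int:
    """Option 7: use a prebuilt index. Costs O(log n) per call, but the
    index must be stored and kept in step with the string."""
    k = bisect_left(index, pos)  # number of characters starting before pos
    if n > k:
        raise IndexError("skip_backward past start of string")
    return pos if n == 0 else index[k - n]


# Characters start at byte offsets 0, 1, 3 and 5.
data = bytes([0x41, 0x83, 0x41, 0x83, 0x83, 0x42])
index = build_index(data)
print(skip_backward_scan(data, len(data), 2))
print(skip_backward_indexed(index, len(data), 2))
```

The trade-off is the usual one: option 4 needs no extra storage but
rescans on every call, whereas option 7 pays for the index once and
must rebuild or patch it whenever the string is modified.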

More options, preferences, comments, etc all welcome.

Regards
Peter Gibbs
EmKel Systems

Michael Scott

Nov 3, 2003, 7:26:24 AM
to perl6-i...@perl.org
In an attempt to understand what the plan is with regard to ICU and
Parrot strings in general, I've been gathering together links to
previous bits of discussion on:


http://www.vendian.org/parrot/wiki/bin/view.cgi/Main/ParrotDistributionUnicodeSupport

Obviously what is still needed is a Strings PDD. I wonder could we
write it interactively on the wiki?

Mike
