Enumerating all canonically equivalent strings


BobH

Jun 20, 2011, 5:51:12 PM

to perl-u...@perl.org

Does there exist a standard module or function that, given a Combining
Character Sequence (or, more generally, an arbitrary Unicode text
string), will generate a list of all canonically equivalent strings?

For example, if given the character U+1EAD, I'd like to get back a list
of all these canonically equivalent sequences:

0061 0302 0323
0061 0323 0302
00E2 0323
1EA1 0302
1EAD
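
To make the example concrete, here is a minimal sketch that checks the five sequences above really are canonically equivalent, by comparing their NFD forms with the core module Unicode::Normalize. It only verifies equivalence; it does not enumerate the equivalents, which is the part I am asking about. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# The five sequences from the example, as lists of hex code points.
my @sequences = (
    [qw(0061 0302 0323)],
    [qw(0061 0323 0302)],
    [qw(00E2 0323)],
    [qw(1EA1 0302)],
    [qw(1EAD)],
);

# Canonically equivalent strings share the same NFD form, so count
# how many distinct NFD forms the five sequences produce.
my %nfd_forms;
for my $seq (@sequences) {
    my $string = join '', map { chr(hex($_)) } @$seq;
    my $nfd    = join ' ', map { sprintf('%04X', ord($_)) } split //, NFD($string);
    $nfd_forms{$nfd}++;
}

print "$_ (x$nfd_forms{$_})\n" for sort keys %nfd_forms;
```

All five sequences collapse to the single NFD form 0061 0323 0302, so the script prints one line with a count of 5.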

(I don't particularly care whether the interface works in terms of
arrays of Unicode scalar values or of UTF-encoded strings.)

Some years ago I created such a module for my own use (I called it
Unicode::MakeEquivalents), and am now wondering whether there exists a
standard solution to this problem (so I can abandon my own stuff), or
whether I should pursue adding this functionality to CPAN somewhere.

Suggestions?

Bob
