Unicode supplemental planes in Harbour


KevinC

Oct 6, 2020, 7:42:40 AM
to Harbour Developers
Surely Przemek and Viktor and everyone helping them know that
  • Unicode is more than a 16-bit encoding
  • its upper limit is 0x10FFFF, not 0xFFFF
  • it consists of 17 "planes", each of which contains 2^16 code points
  • the first plane, the Basic Multilingual Plane (BMP), has code points 0 to 0xFFFF
  • the other 16 planes, the supplemental planes, have code points 0x10000 to 0x10FFFF
  • the first supplemental plane, the Supplementary Multilingual Plane (SMP), has code points 0x10000 to 0x1FFFF
  • until recently only the BMP was normally used
  • the recent addition of emoji like 😊 😲 😔 to the SMP has made it much more widely used
Why then do we see the following?
  • UTF8 for U+F60A <private-use-F60A> = e"\xEF\x98\x8A"
  • UTF8 for U+1F60A SMILING FACE WITH SMILING EYES = e"\xF0\x9F\x98\x8A"
  • HB_UTF8CHR(0x1F60A) --> e"\xEF\x98\x8A"
  • HB_NUMTOHEX(HB_UTF8ASC(HB_UTF8CHR(0x1F60A))) --> "F60A"
  • HB_NUMTOHEX(HB_UTF8ASC(e"\xF0\x9F\x98\x8A")) --> "F60A"
Is it a bug or a feature?
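
For anyone who wants to check the arithmetic, here is a minimal sketch of UTF-8 encoding done by hand. It is written in Python rather than Harbour purely for illustration, and the names utf8_encode, plane_of and truncate_to_16_bits are hypothetical helpers, not Harbour or library functions. It shows that masking 0x1F60A down to 16 bits yields 0xF60A, whose UTF-8 form is exactly the three bytes listed above. That is consistent with the code point being truncated to 16 bits somewhere along the way, although this sketch says nothing about how Harbour is actually implemented.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point (0..0x10FFFF) as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:          # 1 byte: ASCII
        return bytes([cp])
    if cp < 0x800:         # 2 bytes
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 3 bytes: the rest of the BMP
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: planes 1 to 16
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def plane_of(cp: int) -> int:
    """Return the plane number (0 = BMP, 1 = SMP, ... 16)."""
    return cp >> 16


def truncate_to_16_bits(cp: int) -> int:
    """Keep only the low 16 bits, as a 16-bit character type would."""
    return cp & 0xFFFF


if __name__ == "__main__":
    cp = 0x1F60A
    print(plane_of(cp))                                # plane of the code point
    print(utf8_encode(cp).hex())                       # full four-byte encoding
    print(hex(truncate_to_16_bits(cp)))                # value after 16-bit masking
    print(utf8_encode(truncate_to_16_bits(cp)).hex())  # encoding of the masked value
```

The same masking would also explain the HB_UTF8ASC results: decoding the four-byte sequence gives 0x1F60A, and keeping only the low 16 bits of that leaves 0xF60A.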

KevinC

Oct 6, 2020, 9:03:37 AM
to Harbour Developers
A few years ago, I wrote a small library called UnicodeLib, in pure Harbour, with conversion functions that handle all Unicode planes - http://kevincarmody.com/software/unicodelib.zip