Duncan Roe
Jul 29, 2025, 7:04:49 AM
to mlug
Hi Everyone,
Following on from last night's hazy binary arithmetic:
To recap: UTF-16 code units stand for themselves as Unicode code points up to
64K, except for the surrogate range, whose pairs encode a further 1M code points
(the top 128K of these, planes 15 and 16, are reserved for private use).
Well-formed UTF-16 always contains surrogates in pairs, high followed by low.
| first High Surrogate  U+D800  1101 1000 0000 0000
| last High Surrogate   U+DBFF  1101 1011 1111 1111
| first Low Surrogate   U+DC00  1101 1100 0000 0000
| last Low Surrogate    U+DFFF  1101 1111 1111 1111
All High Surrogates start 110110 and all Low Surrogates start 110111, i.e. a
6-bit prefix each, leaving 10 bits for data and giving 20 bits of data in a pair.
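For anyone who wants to poke at the bit layout, here is a rough Python sketch of the pair arithmetic. The function names (encode_pair, decode_pair, is_well_formed) are made up for illustration. The one detail not spelled out above is that the 20 data bits hold the code point minus 0x10000, since everything below that already fits in a single code unit.

```python
# Illustrative sketch only; function names are hypothetical, not from any library.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # prefix 110110
LOW_START, LOW_END = 0xDC00, 0xDFFF     # prefix 110111


def encode_pair(cp):
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    v = cp - 0x10000                  # 20-bit value, 0x00000..0xFFFFF
    high = HIGH_START | (v >> 10)     # prefix + top 10 data bits
    low = LOW_START | (v & 0x3FF)     # prefix + bottom 10 data bits
    return high, low


def decode_pair(high, low):
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a high surrogate followed by a low surrogate")
    return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF))


def is_well_formed(units):
    """True if every surrogate in a list of 16-bit units is correctly paired."""
    i = 0
    while i < len(units):
        u = units[i]
        if HIGH_START <= u <= HIGH_END:
            # A high surrogate must be followed immediately by a low one.
            if i + 1 >= len(units) or not LOW_START <= units[i + 1] <= LOW_END:
                return False
            i += 2
        elif LOW_START <= u <= LOW_END:
            return False              # low surrogate with no preceding high
        else:
            i += 1
    return True
```

For example, encode_pair(0x1F600) gives (0xD83D, 0xDE00), which matches the bytes Python's own "utf-16-be" codec produces for that character.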
Cheers ... Duncan.