Saving (and reusing) Unicode strings

Frederick H. Bartlett

Jan 17, 2002, 6:47:55 PM

For example, given a document beginning with this paragraph:

This is Greek: biblios

where "biblio" are ordinary Greek characters and "s" is the variant
sigma, I can access ActiveDocument.Paragraphs(1).Range.Text and save
it to, say, myString. However, if I add a new paragraph and set its
Text to myString, "biblios" will show only as boxes -- that is, as
unknown Unicode characters. If I then select the boxes and change the
font to "Symbol", I get "biblio" back, but the sigma ends up as a ")".

Now, there's got to be a way to save a Unicode string from Word in
such a way that one can figure out what it's made of and then reuse
it. After all, if I select the text, I can copy it accurately.

What does vb do to Unicode strings that fouls them up?

Thanks!
Fred

Klaus Linke

Jan 18, 2002, 1:33:13 AM

Hi Fred,

> What does vb do to Unicode strings that fouls them up?

Nothing; the problem is that you don't have a Unicode string to start
with, because "Symbol" isn't a Unicode font.
Old fonts like that (which have been around a lot longer than Unicode
itself) are called "decorative" fonts.

What's more, if you insert a character from a decorative font with
"Insert > Symbol", Word really inserts some kind of pointer to the
definition of that character (with the real font and code
information); this pointer shows up as ")" in a Unicode string. This
"protection" prevents the font from being changed for those protected
symbols.

You probably typed most of the Greek characters from the keyboard, but
inserted the variant sigma from the dialog.

I would recommend using Greek characters from the Greek subset
("real" Greek Unicode) instead of the "Symbol" font. Install a Greek
keyboard to type the characters, and use the Microsoft Visual Keyboard
to show which characters are on which key.

To help you change existing text to Unicode I append two macros.

SymbolsUnprotect replaces the "protected" symbols from decorative
fonts -- those that show up as ")" -- with regular characters.

SymbolToUnicode replaces any character from the "Symbol" font with the
corresponding Unicode character.
The list I used is from www.unicode.org:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt

Some fonts do not contain all those Unicode characters, so you may see
some square boxes after you run the conversion macro. In that case,
you can change the font to one with wider character coverage
(Lucida Sans Unicode, Verdana Ref, or Arial Unicode MS, for example).

Greetings, Klaus

Sub SymbolsUnprotect()
    ' Replaces "protected" symbols (inserted with Insert > Symbol)
    ' by regular characters in the same decorative font.
    Dim SelFont As String
    Dim SelCharNum As Long

    Selection.Collapse wdCollapseStart
    Selection.Find.ClearFormatting
    With Selection.Find
        ' Wildcard search for the codes 61472-61695 (&HF020-&HF0FF)
        .Text = "[" & ChrW(61472) & "-" & ChrW(61695) & "]"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    While Selection.Find.Execute
        ' The Insert > Symbol dialog reports the real font and
        ' character code of the symbol that is currently selected
        With Dialogs(wdDialogInsertSymbol)
            SelFont = .Font
            SelCharNum = .CharNum
        End With

        ' Overwrite the protected symbol with a regular character
        Selection.Font.Name = SelFont
        Selection.TypeText Text:=ChrW(SelCharNum)

        ' Replace the last 2 lines with the following to
        ' protect symbols from decorative fonts:
        ' Selection.InsertSymbol _
        '     Font:=SelFont, _
        '     CharacterNumber:=SelCharNum, _
        '     Unicode:=True

    Wend
End Sub

Sub SymbolToUnicode()
Dim myFont As String
Dim myCharNum As Long
Dim myRange As Range
Dim myChar As Range
Dim i As Long, CharCount As Long
Set myRange = Selection.Range.Duplicate
CharCount = myRange.ComputeStatistics(wdStatisticCharacters)
For Each myChar In myRange.Characters
i = i + 1
StatusBar = Format(100 * i / CharCount, "###") & "%"
If myChar.Font.Name = "Symbol" Then
myCharNum = AscW(myChar.text) And &HFFFF&
' Decorative Fonts are mapped to a
' "private use" code page starting at &HF000
myCharNum = myCharNum - &HF000&
myChar.Font.Name = myChar.Style.Font.Name
Select Case myCharNum
Case &H22 ' # FOR ALL
myChar.text = ChrW(&H2200)
Case &H24 ' # THERE EXISTS
myChar.text = ChrW(&H2203)
Case &H27 ' # CONTAINS AS MEMBER
myChar.text = ChrW(&H220B)
Case &H2A ' # ASTERISK OPERATOR
myChar.text = ChrW(&H2217)
Case &H2D ' # MINUS SIGN
myChar.text = ChrW(&H2212)
Case &H40 ' # APPROXIMATELY EQUAL TO
myChar.text = ChrW(&H2245)
Case &H41 ' # GREEK CAPITAL LETTER ALPHA
myChar.text = ChrW(&H391)
Case &H42 ' # GREEK CAPITAL LETTER BETA
myChar.text = ChrW(&H392)
Case &H43 ' # GREEK CAPITAL LETTER CHI
myChar.text = ChrW(&H3A7)
Case &H44 ' # GREEK CAPITAL LETTER DELTA
myChar.text = ChrW(&H394)
' symbol.txt also maps &H44 to INCREMENT, ChrW(&H2206). A second
' "Case &H44" would never be reached, so only DELTA is used.
Case &H45 ' # GREEK CAPITAL LETTER EPSILON
myChar.text = ChrW(&H395)
Case &H46 ' # GREEK CAPITAL LETTER PHI
myChar.text = ChrW(&H3A6)
Case &H47 ' # GREEK CAPITAL LETTER GAMMA
myChar.text = ChrW(&H393)
Case &H48 ' # GREEK CAPITAL LETTER ETA
myChar.text = ChrW(&H397)
Case &H49 ' # GREEK CAPITAL LETTER IOTA
myChar.text = ChrW(&H399)
Case &H4A ' # GREEK THETA SYMBOL
myChar.text = ChrW(&H3D1)
Case &H4B ' # GREEK CAPITAL LETTER KAPPA
myChar.text = ChrW(&H39A)
Case &H4C ' # GREEK CAPITAL LETTER LAMDA
myChar.text = ChrW(&H39B)
Case &H4D ' # GREEK CAPITAL LETTER MU
myChar.text = ChrW(&H39C)
Case &H4E ' # GREEK CAPITAL LETTER NU
myChar.text = ChrW(&H39D)
Case &H4F ' # GREEK CAPITAL LETTER OMICRON
myChar.text = ChrW(&H39F)
Case &H50 ' # GREEK CAPITAL LETTER PI
myChar.text = ChrW(&H3A0)
Case &H51 ' # GREEK CAPITAL LETTER THETA
myChar.text = ChrW(&H398)
Case &H52 ' # GREEK CAPITAL LETTER RHO
myChar.text = ChrW(&H3A1)
Case &H53 ' # GREEK CAPITAL LETTER SIGMA
myChar.text = ChrW(&H3A3)
Case &H54 ' # GREEK CAPITAL LETTER TAU
myChar.text = ChrW(&H3A4)
Case &H55 ' # GREEK CAPITAL LETTER UPSILON
myChar.text = ChrW(&H3A5)
Case &H56 ' # GREEK SMALL LETTER FINAL SIGMA
myChar.text = ChrW(&H3C2)
Case &H57 ' # GREEK CAPITAL LETTER OMEGA
myChar.text = ChrW(&H3A9)
' symbol.txt also maps &H57 to OHM SIGN, ChrW(&H2126). A second
' "Case &H57" would never be reached, so only OMEGA is used.
Case &H58 ' # GREEK CAPITAL LETTER XI
myChar.text = ChrW(&H39E)
Case &H59 ' # GREEK CAPITAL LETTER PSI
myChar.text = ChrW(&H3A8)
Case &H5A ' # GREEK CAPITAL LETTER ZETA
myChar.text = ChrW(&H396)
Case &H5C ' # THEREFORE
myChar.text = ChrW(&H2234)
Case &H5E ' # UP TACK
myChar.text = ChrW(&H22A5)
Case &H60 ' # RADICAL EXTENDER
myChar.text = ChrW(&HF8E5)
Case &H61 ' # GREEK SMALL LETTER ALPHA
myChar.text = ChrW(&H3B1)
Case &H62 ' # GREEK SMALL LETTER BETA
myChar.text = ChrW(&H3B2)
Case &H63 ' # GREEK SMALL LETTER CHI
myChar.text = ChrW(&H3C7)
Case &H64 ' # GREEK SMALL LETTER DELTA
myChar.text = ChrW(&H3B4)
Case &H65 ' # GREEK SMALL LETTER EPSILON
myChar.text = ChrW(&H3B5)
Case &H66 ' # GREEK SMALL LETTER PHI
myChar.text = ChrW(&H3C6)
Case &H67 ' # GREEK SMALL LETTER GAMMA
myChar.text = ChrW(&H3B3)
Case &H68 ' # GREEK SMALL LETTER ETA
myChar.text = ChrW(&H3B7)
Case &H69 ' # GREEK SMALL LETTER IOTA
myChar.text = ChrW(&H3B9)
Case &H6A ' # GREEK PHI SYMBOL
myChar.text = ChrW(&H3D5)
Case &H6B ' # GREEK SMALL LETTER KAPPA
myChar.text = ChrW(&H3BA)
Case &H6C ' # GREEK SMALL LETTER LAMDA
myChar.text = ChrW(&H3BB)
Case &H6D ' # GREEK SMALL LETTER MU
myChar.text = ChrW(&H3BC)
' symbol.txt also maps &H6D to MICRO SIGN, ChrW(&HB5). Only the
' first matching Case runs, so the Greek letter is used here.
Case &H6E ' # GREEK SMALL LETTER NU
myChar.text = ChrW(&H3BD)
Case &H6F ' # GREEK SMALL LETTER OMICRON
myChar.text = ChrW(&H3BF)
Case &H70 ' # GREEK SMALL LETTER PI
myChar.text = ChrW(&H3C0)
Case &H71 ' # GREEK SMALL LETTER THETA
myChar.text = ChrW(&H3B8)
Case &H72 ' # GREEK SMALL LETTER RHO
myChar.text = ChrW(&H3C1)
Case &H73 ' # GREEK SMALL LETTER SIGMA
myChar.text = ChrW(&H3C3)
Case &H74 ' # GREEK SMALL LETTER TAU
myChar.text = ChrW(&H3C4)
Case &H75 ' # GREEK SMALL LETTER UPSILON
myChar.text = ChrW(&H3C5)
Case &H76 ' # GREEK PI SYMBOL
myChar.text = ChrW(&H3D6)
Case &H77 ' # GREEK SMALL LETTER OMEGA
myChar.text = ChrW(&H3C9)
Case &H78 ' # GREEK SMALL LETTER XI
myChar.text = ChrW(&H3BE)
Case &H79 ' # GREEK SMALL LETTER PSI
myChar.text = ChrW(&H3C8)
Case &H7A ' # GREEK SMALL LETTER ZETA
myChar.text = ChrW(&H3B6)
Case &H7E ' # TILDE OPERATOR
myChar.text = ChrW(&H223C)
Case &HA0 ' # EURO SIGN
myChar.text = ChrW(&H20AC)
Case &HA1 ' # GREEK UPSILON WITH HOOK SYMBOL
myChar.text = ChrW(&H3D2)
Case &HA2 ' # PRIME
myChar.text = ChrW(&H2032)
Case &HA3 ' # LESS-THAN OR EQUAL TO
myChar.text = ChrW(&H2264)
Case &HA4 ' # FRACTION SLASH
myChar.text = ChrW(&H2044)
' symbol.txt also maps &HA4 to DIVISION SLASH, ChrW(&H2215). A second
' "Case &HA4" would never be reached, so only FRACTION SLASH is used.
Case &HA5 ' # INFINITY
myChar.text = ChrW(&H221E)
Case &HA6 ' # LATIN SMALL LETTER F WITH HOOK
myChar.text = ChrW(&H192)
Case &HA7 ' # BLACK CLUB SUIT
myChar.text = ChrW(&H2663)
Case &HA8 ' # BLACK DIAMOND SUIT
myChar.text = ChrW(&H2666)
Case &HA9 ' # BLACK HEART SUIT
myChar.text = ChrW(&H2665)
Case &HAA ' # BLACK SPADE SUIT
myChar.text = ChrW(&H2660)
Case &HAB ' # LEFT RIGHT ARROW
myChar.text = ChrW(&H2194)
Case &HAC ' # LEFTWARDS ARROW
myChar.text = ChrW(&H2190)
Case &HAD ' # UPWARDS ARROW
myChar.text = ChrW(&H2191)
Case &HAE ' # RIGHTWARDS ARROW
myChar.text = ChrW(&H2192)
Case &HAF ' # DOWNWARDS ARROW
myChar.text = ChrW(&H2193)
Case &HB2 ' # DOUBLE PRIME
myChar.text = ChrW(&H2033)
Case &HB3 ' # GREATER-THAN OR EQUAL TO
myChar.text = ChrW(&H2265)
Case &HB4 ' # MULTIPLICATION SIGN
myChar.text = ChrW(&HD7)
Case &HB5 ' # PROPORTIONAL TO
myChar.text = ChrW(&H221D)
Case &HB6 ' # PARTIAL DIFFERENTIAL
myChar.text = ChrW(&H2202)
Case &HB7 ' # BULLET
myChar.text = ChrW(&H2022)
Case &HB8 ' # DIVISION SIGN
myChar.text = ChrW(&HF7)
Case &HB9 ' # NOT EQUAL TO
myChar.text = ChrW(&H2260)
Case &HBA ' # IDENTICAL TO
myChar.text = ChrW(&H2261)
Case &HBB ' # ALMOST EQUAL TO
myChar.text = ChrW(&H2248)
Case &HBC ' # HORIZONTAL ELLIPSIS
myChar.text = ChrW(&H2026)
Case &HBD ' # VERTICAL ARROW EXTENDER
myChar.text = ChrW(&HF8E6)
Case &HBE ' # HORIZONTAL ARROW EXTENDER
myChar.text = ChrW(&HF8E7)
Case &HBF ' # DOWNWARDS ARROW WITH CORNER LEFTWARDS
myChar.text = ChrW(&H21B5)
Case &HC0 ' # ALEF SYMBOL
myChar.text = ChrW(&H2135)
Case &HC1 ' # BLACK-LETTER CAPITAL I
myChar.text = ChrW(&H2111)
Case &HC2 ' # BLACK-LETTER CAPITAL R
myChar.text = ChrW(&H211C)
Case &HC3 ' # SCRIPT CAPITAL P
myChar.text = ChrW(&H2118)
Case &HC4 ' # CIRCLED TIMES
myChar.text = ChrW(&H2297)
Case &HC5 ' # CIRCLED PLUS
myChar.text = ChrW(&H2295)
Case &HC6 ' # EMPTY SET
myChar.text = ChrW(&H2205)
Case &HC7 ' # INTERSECTION
myChar.text = ChrW(&H2229)
Case &HC8 ' # UNION
myChar.text = ChrW(&H222A)
Case &HC9 ' # SUPERSET OF
myChar.text = ChrW(&H2283)
Case &HCA ' # SUPERSET OF OR EQUAL TO
myChar.text = ChrW(&H2287)
Case &HCB ' # NOT A SUBSET OF
myChar.text = ChrW(&H2284)
Case &HCC ' # SUBSET OF
myChar.text = ChrW(&H2282)
Case &HCD ' # SUBSET OF OR EQUAL TO
myChar.text = ChrW(&H2286)
Case &HCE ' # ELEMENT OF
myChar.text = ChrW(&H2208)
Case &HCF ' # NOT AN ELEMENT OF
myChar.text = ChrW(&H2209)
Case &HD0 ' # ANGLE
myChar.text = ChrW(&H2220)
Case &HD1 ' # NABLA
myChar.text = ChrW(&H2207)
Case &HD2 ' # REGISTERED SIGN SERIF
myChar.text = ChrW(&HF6DA)
Case &HD3 ' # COPYRIGHT SIGN SERIF
myChar.text = ChrW(&HF6D9)
Case &HD4 ' # TRADE MARK SIGN SERIF
myChar.text = ChrW(&HF6DB)
Case &HD5 ' # N-ARY PRODUCT
myChar.text = ChrW(&H220F)
Case &HD6 ' # SQUARE ROOT
myChar.text = ChrW(&H221A)
Case &HD7 ' # DOT OPERATOR
myChar.text = ChrW(&H22C5)
Case &HD8 ' # NOT SIGN
myChar.text = ChrW(&HAC)
Case &HD9 ' # LOGICAL AND
myChar.text = ChrW(&H2227)
Case &HDA ' # LOGICAL OR
myChar.text = ChrW(&H2228)
Case &HDB ' # LEFT RIGHT DOUBLE ARROW
myChar.text = ChrW(&H21D4)
Case &HDC ' # LEFTWARDS DOUBLE ARROW
myChar.text = ChrW(&H21D0)
Case &HDD ' # UPWARDS DOUBLE ARROW
myChar.text = ChrW(&H21D1)
Case &HDE ' # RIGHTWARDS DOUBLE ARROW
myChar.text = ChrW(&H21D2)
Case &HDF ' # DOWNWARDS DOUBLE ARROW
myChar.text = ChrW(&H21D3)
Case &HE0 ' # LOZENGE
myChar.text = ChrW(&H25CA)
Case &HE1 ' # LEFT-POINTING ANGLE BRACKET
myChar.text = ChrW(&H2329)
Case &HE2 ' # REGISTERED SIGN SANS SERIF
myChar.text = ChrW(&HF8E8)
Case &HE3 ' # COPYRIGHT SIGN SANS SERIF
myChar.text = ChrW(&HF8E9)
Case &HE4 ' # TRADE MARK SIGN SANS SERIF
myChar.text = ChrW(&HF8EA)
Case &HE5 ' # N-ARY SUMMATION
myChar.text = ChrW(&H2211)
Case &HE6 ' # LEFT PAREN TOP
myChar.text = ChrW(&HF8EB)
Case &HE7 ' # LEFT PAREN EXTENDER
myChar.text = ChrW(&HF8EC)
Case &HE8 ' # LEFT PAREN BOTTOM
myChar.text = ChrW(&HF8ED)
Case &HE9 ' # LEFT SQUARE BRACKET TOP
myChar.text = ChrW(&HF8EE)
Case &HEA ' # LEFT SQUARE BRACKET EXTENDER
myChar.text = ChrW(&HF8EF)
Case &HEB ' # LEFT SQUARE BRACKET BOTTOM
myChar.text = ChrW(&HF8F0)
Case &HEC ' # LEFT CURLY BRACKET TOP
myChar.text = ChrW(&HF8F1)
Case &HED ' # LEFT CURLY BRACKET MID
myChar.text = ChrW(&HF8F2)
Case &HEE ' # LEFT CURLY BRACKET BOTTOM
myChar.text = ChrW(&HF8F3)
Case &HEF ' # CURLY BRACKET EXTENDER
myChar.text = ChrW(&HF8F4)
Case &HF1 ' # RIGHT-POINTING ANGLE BRACKET
myChar.text = ChrW(&H232A)
Case &HF2 ' # INTEGRAL
myChar.text = ChrW(&H222B)
Case &HF3 ' # TOP HALF INTEGRAL
myChar.text = ChrW(&H2320)
Case &HF4 ' # INTEGRAL EXTENDER
myChar.text = ChrW(&HF8F5)
Case &HF5 ' # BOTTOM HALF INTEGRAL
myChar.text = ChrW(&H2321)
Case &HF6 ' # RIGHT PAREN TOP
myChar.text = ChrW(&HF8F6)
Case &HF7 ' # RIGHT PAREN EXTENDER
myChar.text = ChrW(&HF8F7)
Case &HF8 ' # RIGHT PAREN BOTTOM
myChar.text = ChrW(&HF8F8)
Case &HF9 ' # RIGHT SQUARE BRACKET TOP
myChar.text = ChrW(&HF8F9)
Case &HFA ' # RIGHT SQUARE BRACKET EXTENDER
myChar.text = ChrW(&HF8FA)
Case &HFB ' # RIGHT SQUARE BRACKET BOTTOM
myChar.text = ChrW(&HF8FB)
Case &HFC ' # RIGHT CURLY BRACKET TOP
myChar.text = ChrW(&HF8FC)
Case &HFD ' # RIGHT CURLY BRACKET MID
myChar.text = ChrW(&HF8FD)
Case &HFE ' # RIGHT CURLY BRACKET BOTTOM
myChar.text = ChrW(&HF8FE)
End Select
i = i - 1
End If
Next myChar
myRange.Select
StatusBar = "Finished changing ""Symbol"" font to Unicode"
End Sub

Frederick H. Bartlett

Jan 18, 2002, 9:42:13 AM

Klaus,

Thanks for the macros; SymbolToUnicode looks especially useful.

I hope you don't mind a few more questions. I come to Word and VB from
Perl, Python, TeX, and PostScript, so I'm a little bit at sea. Until a
month or two ago, I had never used a WYSIWYG word processor (and I'm
really missing Emacs and TeX right about now!).

My understanding is that Word (and VB) use Unicode internally. If that
be true, how could it be possible to create non-Unicode strings? More to
the point, why would the software let me? (OK, so strings get converted
to ANSI when they're passed to API functions ... but that's an even
crazier design decision.)

When I convert a string containing ANSI text and Greek (like the one
under discussion) to a byte array, I discover that the byte values are
ASCII + 00 for the ANSI stuff (where "a" is "61 00") and ASCII + F0 for
the Greek (so alpha is "61 F0"). And end-sigma is "28 00", or ")". (You
were entirely correct about the method I used to enter this stuff.) But
I can't even assign the byte array to a string variable and get back the
original information. Why are these operations not commutative? And if
&H2800& is a marker for a pointer, why can't I just access the pointer
and get what it points to? (Instead of, as in SymbolToUnicode, writing
an explicit translation table.) And how does it make sense to use a
single marker for an entire class of pointers rather than the pointers
themselves?

Is this stuff documented anywhere? I've been through several references
(Appleman, Cornell, Roman, Lomax) and Microsoft's MSDN site, and I
haven't found an explanation. Nor do I see -- at all -- how this system
makes sense. I did just find an article by Dave Rado which begins with
the incontrovertible assertion that "symbols ... are a _nightmare_ to
work with in Word 97 and above!"; this doesn't bode well.

I would like to be able to, say, parse a string and replace non-ASCII
characters with entity references (&alpha;, etc.) -- and be able to
take entity references and turn them into characters Word understands. The
fact that I couldn't even copy a string using the Text property was
rather disconcerting! (And, no, I don't want to use Word's "save as
html" function: my goal is to translate Word docs to valid XML.)

Thanks for your help,
Fred

Klaus Linke

Jan 19, 2002, 2:04:22 AM

Hi Fred,

> My understanding is that Word (and VB) use Unicode internally.
> If that be true, how could it be possible to create non-Unicode
> strings? More to the point, why would the software let me?

My previous explanation was so terse that it may have been misleading.

Long before Unicode was invented, people used decorative fonts like
Zapf Dingbats, Wingdings or Symbol to put special characters into the
text. The fonts themselves don't contain any information about the
Unicode value of any given character (and for many fonts that contain
pictures of flowers or animals ... there isn't any Unicode character
that looks like the glyph).

So Word has no way to convert text using those fonts to Unicode.
Simply preventing the use of those fonts in Word would have been too
drastic, because many people use them and expect to continue
exchanging texts containing those fonts with other programs
(DTP...).

The Unicode consortium offers a way out for such occasions. There are
"private use" areas within the 65536 codes (U+E000 to U+F8FF), which
can be used for characters that don't conform to the standard.

Microsoft puts characters from decorative fonts into a "private use"
code page starting at &HF000.

Since there may be thousands of decorative fonts about, not every one
of them *can* get its own code page; using the same code page for all
of them seems the only sensible design decision.
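
The mapping described above can be sketched in a few lines. This is only an illustration: the names IsDecorativeCode and DecorativeCharCode are invented here and are not part of Word or VBA. It assumes, as stated above, that a decorative-font character is stored as &HF000 plus its position in the font.

```vba
' Hypothetical helpers; the names are illustrative only.

' True if the first character of ch lies in &HF000-&HF0FF,
' the range used for characters from decorative fonts.
Function IsDecorativeCode(ByVal ch As String) As Boolean
    Dim code As Long
    ' AscW returns a signed 16-bit value; the mask makes it positive
    code = AscW(ch) And &HFFFF&
    IsDecorativeCode = (code >= &HF000& And code <= &HF0FF&)
End Function

' Position of the glyph inside the decorative font (0-255),
' or -1 if the character is not in the private-use range.
Function DecorativeCharCode(ByVal ch As String) As Long
    If IsDecorativeCode(ch) Then
        DecorativeCharCode = (AscW(ch) And &HFFFF&) - &HF000&
    Else
        DecorativeCharCode = -1
    End If
End Function

Sub DemoDecorativeCode()
    ' &HF061 is how an alpha typed in the Symbol font is stored
    Debug.Print Hex(DecorativeCharCode(ChrW(&HF061)))   ' 61
    Debug.Print DecorativeCharCode("a")                 ' -1
End Sub
```

Note that the code alone says nothing about which font the character came from; that information lives in the formatting, not in the string.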

> (OK, so strings get converted to ANSI when they're passed to
> API functions ... but that's an even crazier design decision.)

I don't see it like that. When VBA was designed, just about all APIs
expected ANSI strings, so I think this, too, was a sensible and
practical design decision. I guess in the future (VB.Net), the default
will be Unicode.

> When I convert a string containing ANSI text and Greek (like
> the one under discussion) to a byte array, I discover that the
> byte values are ASCII + 00 for the ANSI stuff (where "a" is "61 00")
> and ASCII + F0 for the Greek (so alpha is "61 F0"). And end-sigma
> is "28 00", or ")". (You were entirely correct about the method
> I used to enter this stuff.)
> But I can't even assign the byte array to a string variable and
> get back the original information. Why are these operations not
> commutative?

Because all decorative fonts use the same ("private use") code page.
If you only use *one* decorative font, you can re-assign this font to
the characters from the code page &HF000-&HF0FF, and so make the
operation commutative.
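
A minimal sketch of that re-assignment, assuming the document uses only one decorative font ("Symbol" here). RestoreDecorativeFont and DemoRestore are invented names, and the sketch does nothing for "protected" symbols, which still come back as a placeholder character.

```vba
' Sketch only: re-applies ONE decorative font to every character of
' a range whose code lies in &HF000-&HF0FF. This gives wrong results
' if the original text mixed several decorative fonts.
Sub RestoreDecorativeFont(ByVal target As Range, ByVal fontName As String)
    Dim ch As Range
    Dim code As Long
    For Each ch In target.Characters
        ' Mask the signed value returned by AscW to a positive Long
        code = AscW(ch.Text) And &HFFFF&
        If code >= &HF000& And code <= &HF0FF& Then
            ch.Font.Name = fontName
        End If
    Next ch
End Sub

Sub DemoRestore()
    Dim myString As String

    ' The Text property returns the character codes, not the fonts
    myString = ActiveDocument.Paragraphs(1).Range.Text

    ' Append the saved text to the end of the document; at this
    ' point the private-use characters show up as boxes
    ActiveDocument.Content.InsertAfter myString

    ' Give the private-use characters their font back
    RestoreDecorativeFont ActiveDocument.Content, "Symbol"
End Sub
```

Walking the whole document character by character is slow; passing a smaller range to RestoreDecorativeFont is preferable when the position of the copied text is known.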

> And if &H2800& is a marker for a pointer, why can't I
> just access the pointer and get what it points to?
> (Instead of, as in SymbolToUnicode, writing an explicit
> translation table.)

I think you are mixing two different problems:

-- The translation table is needed because the font itself (which is
older than the Unicode standard) doesn't contain the translation
table.

-- You can get the character and font for the "protected" character as
described in Dave's article -- I used the same method in my macro:

With Dialogs(wdDialogInsertSymbol)
SelFont = .Font

' ... "And &HFFFF&" converts the signed short
' integer to a positive long integer
SelCharNum = .CharNum And &HFFFF&
End With

This is rather slow and cumbersome ... but after you run the macro
"SymbolsUnprotect()" from my last post, you can access the font --
Character.Font.Name -- and code -- AscW(Character.Text) -- without
problems.

As every decorative font contains a different glyph at a given
position, you need the code *and* the font.
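
For illustration, a short sketch that lists both pieces of information for each character of the current selection. The name ListCodesAndFonts is invented; the sketch assumes that SymbolsUnprotect has already been run, since protected symbols would otherwise report only the placeholder code.

```vba
' Prints font name and character code for each selected character
' to the Immediate window of the VBA editor.
Sub ListCodesAndFonts()
    Dim ch As Range
    For Each ch In Selection.Range.Characters
        ' "And &HFFFF&" turns the signed value from AscW into a
        ' positive Long, so &HF061 prints as F061
        Debug.Print ch.Font.Name, "&H" & Hex(AscW(ch.Text) And &HFFFF&)
    Next ch
End Sub
```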

> Is this stuff documented anywhere? I've been through several
> references (Appleman, Cornell, Roman, Lomax) and Microsoft's MSDN
> site, and I haven't found an explanation. Nor do I see -- at all --
> how this system makes sense.

> I did just find an article by Dave Rado which begins with the
> incontrovertible assertion that "symbols ... are a _nightmare_ to
> work with in Word 97 and above!"; this doesn't bode well.

Dave Rado's article is the best article I know on this subject, but I
find this sentence very controvertible. As long as you don't use
decorative fonts, you will run into few problems.

There are still a few problems in Word 2000 that you may encounter.

Examples:
- You have to look out for the control codes below 32 which Word uses
for special characters: ChrW(30) is used instead of ChrW(&H2011) for
non-breaking hyphens...
- "Paste Special" as unformatted Unicode text will change or remove
some characters like Chr(172), Chr(182) ... again probably because of
compatibility issues.
- Not all fonts contain all characters, so you may sometimes see boxes
for characters not contained in the font. And sometimes Word won't let
you change the font, because it falsely thinks that doing so would
produce boxes.
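
As a sketch of the first point, a string taken from Word can be normalized before further processing. WordHyphensToUnicode is an invented name, the function handles only the non-breaking hyphen mentioned above, and it assumes a VBA version that has the Replace function (Word 2000 or later).

```vba
' Replaces Word's internal code for the non-breaking hyphen,
' ChrW(30), with the Unicode character U+2011.
' Other control codes below 32 would need their own entries.
Function WordHyphensToUnicode(ByVal s As String) As String
    WordHyphensToUnicode = Replace(s, ChrW(30), ChrW(&H2011))
End Function

Sub DemoWordHyphens()
    Dim s As String
    s = "non" & ChrW(30) & "breaking"
    ' The fourth character is now U+2011
    Debug.Print AscW(Mid$(WordHyphensToUnicode(s), 4, 1)) = &H2011   ' True
End Sub
```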

> I would like to be able to, say, parse a string and replace
> non-ascii characters with entity references (&alpha;, etc.) -- and
> be able to take entity references and turn them into characters
> Word understands.
> The fact that I couldn't even copy a string using the Text property
> was rather disconcerting! (And, no, I don't want to use Word's
> "save as html" function: my goal is to translate Word docs to valid
> XML.)

We are doing the same in our shop, and use numeric entity references
(&#x03B1; for the Greek alpha...).

I use 4 macros, two for strings (fast), and two for documents (a lot
slower) to switch between entity references and Unicode characters;
the one replacing Unicode with entity references in documents gives a
warning if it encounters decorative fonts.
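
The string versions could look roughly like the sketch below. These are not the macros mentioned above; UnicodeToEntities and EntitiesToUnicode are invented names, and the sketch makes several simplifying assumptions: only hexadecimal references of the form &#x....; with one to four digits are handled, ASCII characters such as "&" and "<" are passed through unescaped, and there is no warning for codes from decorative fonts.

```vba
' Replaces every character above 127 with a reference like &#x03B1;
Function UnicodeToEntities(ByVal s As String) As String
    Dim i As Long, code As Long
    Dim result As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code > 127 Then
            ' Pad the hex value to four digits
            result = result & "&#x" & Right$("0000" & Hex$(code), 4) & ";"
        Else
            result = result & Mid$(s, i, 1)
        End If
    Next i
    UnicodeToEntities = result
End Function

' Replaces references like &#x03B1; with the character itself
Function EntitiesToUnicode(ByVal s As String) As String
    Dim result As String, hexDigits As String
    Dim pos As Long, startPos As Long, endPos As Long
    Dim valid As Boolean
    pos = 1
    Do
        startPos = InStr(pos, s, "&#x")
        If startPos = 0 Then Exit Do
        ' Copy the text in front of the candidate reference
        result = result & Mid$(s, pos, startPos - pos)
        endPos = InStr(startPos, s, ";")
        valid = False
        If endPos > startPos + 3 Then
            hexDigits = Mid$(s, startPos + 3, endPos - startPos - 3)
            valid = (Len(hexDigits) <= 4) And _
                    Not (hexDigits Like "*[!0-9A-Fa-f]*")
        End If
        If valid Then
            result = result & ChrW(CLng("&H" & hexDigits))
            pos = endPos + 1
        Else
            ' Not a reference this sketch understands: keep "&#x"
            ' as literal text and continue searching after it
            result = result & "&#x"
            pos = startPos + 3
        End If
    Loop
    EntitiesToUnicode = result & Mid$(s, pos)
End Function

Sub DemoEntities()
    Debug.Print UnicodeToEntities("alpha: " & ChrW(&H3B1))   ' alpha: &#x03B1;
    Debug.Print EntitiesToUnicode("&#x03B1;") = ChrW(&H3B1)  ' True
End Sub
```

Working on strings keeps the conversion fast, but it also means the font is out of the picture: a character in the range &HF000-&HF0FF would be written out as a reference that is meaningless without the decorative font, which is why a check for such characters is worth adding before output.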

Hope that clears the alphabet soup a bit :-)

Klaus