
Symbol Characters


Eric

May 3, 2004, 2:00:46 PM
I'm trying to extract all special characters from a
document and transform them to Unicode. This seems fairly
easy to do with upper-ASCII characters in Times, Times New
Roman, Arial or other "standard" fonts. Using AscW
*should* return an accurate Unicode value.

However, how do I search for and find symbol font characters
and convert them to Unicode? I have found no reference to
doing this, and the AscW function doesn't seem to
recognize that a character is in a symbol
font.

Thanks!
Eric

Klaus Linke

May 3, 2004, 2:37:53 PM
Hi Eric,

> I'm trying to extract all special characters from a
> document and transform them to unicode. This seems fairly
> easy to do with upper-ascii characters in Times, Times New
> Roman, Arial or other "standard" fonts. Using AscW
> *should* return an accurate unicode value.

The characters are Unicode already; I guess your macro replaces them with
some ASCII tags/entities "…" or such?
AscW *does* return the code.

> However, how do I search and find symbol font characters
> and convert them to unicode? I have found no reference to
> doing this, and the AscW function doesn't seem to
> recognize the difference that a character is in symbol
> font.

Word can't tell which characters are in some symbol font. Some symbols may
not even have a representation in Unicode.
So Word stores them with codes above &HF000 (U+F000), which lies in the
Unicode "private use" range that programs/users can use for their own
purposes.

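As a minimal sketch of how that code block can be tested for in VBA: the helper below checks whether a character lies in the range &HF020 to &HF0FF (the same range the macro further down searches for) and returns its 8-bit position in the symbol font. SymbolCodeOf is a hypothetical name, not part of Word's object model, and it assumes a non-empty string is passed in. AscW returns a signed Integer, so these codes come back negative unless masked.

```vba
' Sketch only: SymbolCodeOf is a hypothetical helper, not a Word built-in.
' Returns the 8-bit position (32..255) of a character stored in Word's
' symbol-font block, or -1 if the character lies outside that block.
Function SymbolCodeOf(ByVal ch As String) As Long
    Dim code As Long

    ' AscW returns a signed Integer, so U+F020..U+F0FF arrive as negative
    ' numbers; masking with &HFFFF& gives the unsigned code point.
    code = AscW(ch) And &HFFFF&

    ' The trailing "&" makes the literals Long; without it, &HF020 would
    ' itself be read as a negative Integer.
    If code >= &HF020& And code <= &HF0FF& Then
        SymbolCodeOf = code - &HF000&
    Else
        SymbolCodeOf = -1
    End If
End Function
```

For example, Debug.Print SymbolCodeOf(ChrW(61537)) prints 97, the position that "a" occupies in an 8-bit font layout.
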
Another problem is that Word protects symbols from symbol fonts that users
have inserted from "Insert > Symbol" against changes (so they don't get
messed up if you change the font or style).
On the downside, that means that Word won't tell you the font that was
used, and AscW will report a code of 40 = "(".

The latter problem can be cured with the macro below.

After you have run the macro, you'll get the code for symbols from symbol
fonts. But this isn't enough information to identify the symbol, since all
symbol fonts use the same code block -- you'll need the font name too.

For some symbol fonts such as "Symbol" or "Zapf Dingbats", you'll find the
mappings to Unicode on the unicode.org FTP server, or on the CD in the
Unicode Standard book in the "Mappings" folder. For "Symbol", you'll find a
macro here:
http://www.google.com/groups?selm=a7bvbk%24dm%2402%241%40news.t-online.com&rnum=4
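
To illustrate what such a mapping looks like, here is a deliberately partial sketch for the "Symbol" font. It is not the macro at the link above; SymbolToUnicodeSample is a hypothetical name, and the handful of entries shown are only a sample. A real conversion would need the complete table from the unicode.org mapping file, and a different table for every other symbol font.

```vba
' Sketch only: a partial lookup for the "Symbol" font, not a full mapping.
' symbolCode is the 8-bit position in the font (private-use code - &HF000).
' Returns the Unicode code point, or -1 if the position is not in the sample.
Function SymbolToUnicodeSample(ByVal symbolCode As Long) As Long
    Select Case symbolCode
        Case &H61: SymbolToUnicodeSample = &H3B1     ' Greek small alpha
        Case &H62: SymbolToUnicodeSample = &H3B2     ' Greek small beta
        Case &H67: SymbolToUnicodeSample = &H3B3     ' Greek small gamma
        Case &H70: SymbolToUnicodeSample = &H3C0     ' Greek small pi
        Case &HA5: SymbolToUnicodeSample = &H221E    ' infinity
        Case &HB1: SymbolToUnicodeSample = &HB1      ' plus-minus sign
        Case Else: SymbolToUnicodeSample = -1        ' not in this sample table
    End Select
End Function
```

The result can be passed to ChrW to produce the replacement character, once the text has been switched to a font that contains the glyph.
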

Regards,
Klaus

Sub SymbolsUnprotect()
    ' Retypes every character in the symbol code block as an ordinary
    ' character in the font reported by the Insert > Symbol dialog, so
    ' that the font name and character code become readable afterwards.
    Dim SelFont As String
    Dim SelCharNum As Long

    Selection.Collapse wdCollapseStart
    Selection.Find.ClearFormatting
    With Selection.Find
        ' Wildcard range ChrW(61472)-ChrW(61695), i.e. &HF020 to &HF0FF
        .Text = "[" & ChrW(61472) & "-" & ChrW(61695) & "]"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    While Selection.Find.Execute
        ' The dialog knows the real font and code of the selected symbol
        With Dialogs(wdDialogInsertSymbol)
            SelFont = .Font
            SelCharNum = .CharNum
        End With

        Selection.Font.Name = SelFont
        Selection.TypeText Text:=ChrW(SelCharNum)

        ' replace the last 2 lines with the following to
        ' protect symbols from decorative fonts:
        ' Selection.InsertSymbol _
        '     Font:=SelFont, _
        '     CharacterNumber:=SelCharNum, _
        '     Unicode:=True

    Wend
End Sub


Eric

May 4, 2004, 3:54:42 PM
Klaus,

The macro you included here and the one to which you
referred me at Google worked wonderfully. Danke!

One more question - is the macro to unprotect symbols
necessary with Word 2003? I don't see the shaded field
designator around inserted symbols in Word 2003, and the
SymbolToUnicode macro worked fine in Word 2003 without
having to unprotect the inserted symbols. I just want to
make sure I'm not missing anything.

Thanks,
Eric


Klaus Linke

May 6, 2004, 9:06:26 AM
Hi Eric,

The macro to "unprotect" the symbols is only necessary if symbols from
"decorative" fonts (Symbol, Wingdings, ...) have been inserted from the
"Insert > Symbol" dialog.

That goes for all versions, AFAIK.

One thing you might still miss are symbol *fields*. Those can be inserted
by a user or by some (WP) conversion programs; you'd need to unlink them (=
turn them into regular characters), or write a special macro to deal with
them.
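
A minimal sketch of the unlinking approach, assuming the SYMBOL fields are in the main body of the active document (headers, footers, footnotes and text boxes would need their own pass). UnlinkSymbolFields is a hypothetical macro name. Whether the unlinked result keeps the right font is worth checking on a copy of the document first.

```vba
' Sketch only: UnlinkSymbolFields is a hypothetical name.
' Converts every SYMBOL field in the main text story to its plain result.
Sub UnlinkSymbolFields()
    Dim i As Long

    ' Walk backwards: unlinking a field removes it from the collection,
    ' which would shift the indexes of the fields after it.
    For i = ActiveDocument.Fields.Count To 1 Step -1
        With ActiveDocument.Fields(i)
            If .Type = wdFieldSymbol Then .Unlink
        End With
    Next i
End Sub
```
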

Greetings,
Klaus
