How to convert typing information?

Andrew

Oct 28, 2003, 11:16:18 AM
Hi, friends,

I am trying to understand the mechanism on how to
determine the candidate Chinese characters and display
them by using typing information.

For example, if I type ba, how can I find all Chinese
characters pronounced ba in code page 936? Do I need
to prepare a separate table in advance that associates each
pronunciation with all of its candidate characters?
If yes, how should these candidate characters be
represented in the table: by their addresses in the code
page? If yes, how do I convert such an address into a visual
character picture (bitmap?) and display it?

I know functions of MS IME can do this conversion. But,
what is the logic/algorithm behind it?

Thanks a lot.

Michael (michka) Kaplan [MS]

Oct 29, 2003, 1:05:33 PM
Are you looking for some brainstorming on a good algorithm to use to do such
a lookup? The table format that pronunciation-based IMEs use is not really
documented AFAIK. Also, there is no Win32 API that provides this
information.

With that said, if there were a native speaker who did not mind looking at
20,000+ characters and figuring out where the different pronunciation
boundaries were, one could get the sort key values for each boundary case
and then be able to determine pronunciation by looking at the sort key (a
non-trivial but hardly impossible effort, one that I have done for customers
in the past with Korean).

Note that this would answer a different question ("How do you pronounce * ?"
rather than "How can I get all of the characters that have pronunciation
ba?").
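
The boundary idea above can be sketched as a binary search. This is only an illustration under stated assumptions: the integer sort keys and the four-entry boundary table are made up, and `pronunciation_for` is a hypothetical helper. Real sort keys would come from the platform's collation support (for example the Win32 `LCMapString` function with `LCMAP_SORTKEY`) and are byte strings, not small integers.

```python
from bisect import bisect_right

# Hypothetical data: (first sort key of a pronunciation group, syllable).
# The integer keys are invented for illustration; real sort keys are byte strings.
BOUNDARIES = [
    (1000, "a"),
    (1040, "ai"),
    (1100, "ba"),
    (1180, "bai"),
]
BOUNDARY_KEYS = [key for key, _ in BOUNDARIES]


def pronunciation_for(sort_key):
    """Return the syllable whose group contains sort_key.

    Returns None if sort_key sorts before the first group.
    """
    index = bisect_right(BOUNDARY_KEYS, sort_key) - 1
    if index < 0:
        return None
    return BOUNDARIES[index][1]


print(pronunciation_for(1100))  # first key of the "ba" group
print(pronunciation_for(1179))  # last key before the "bai" group starts
```

As the note says, this maps one character to one pronunciation; it does not by itself list every character for a given syllable.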

Though to be honest the entire effort to do the above is of limited use
since there are so many characters with multiple pronunciations (and from
looking at other sources it is clear that the most common pronunciation is
not always what is present)....
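
For the original question (all characters pronounced ba), a table in the other direction is needed. Here is a minimal sketch, assuming a hand-built character-to-readings table. The few entries are a tiny illustrative sample, not data from any IME, and `build_candidate_table` is a hypothetical name. A character with several readings is simply listed under each of them.

```python
from collections import defaultdict

# Tiny illustrative sample: character -> toneless pinyin readings.
READINGS = {
    "八": ["ba"],
    "把": ["ba"],
    "爸": ["ba"],
    "扒": ["ba", "pa"],      # more than one reading
    "行": ["xing", "hang"],  # more than one reading
}


def build_candidate_table(readings):
    """Invert character -> readings into reading -> list of candidate characters."""
    table = defaultdict(list)
    for char, syllables in readings.items():
        for syllable in syllables:
            table[syllable].append(char)
    return dict(table)


CANDIDATES = build_candidate_table(READINGS)
print(CANDIDATES.get("ba", []))  # every sample character that can be read "ba"
```

The table stores the characters themselves as Unicode strings, so no code page address is involved in the lookup.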

--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies

This posting is provided "AS IS" with
no warranties, and confers no rights.


"Andrew" <anon...@discussions.microsoft.com> wrote in message
news:019f01c39d6e$d1a523d0$a101...@phx.gbl...

Andrew

Oct 29, 2003, 3:56:37 PM
>Are you looking for some brainstorming on a good algorithm
to use to do such a lookup?

Yes. But I am also trying to understand how the typing
information is converted into meaningful character(s) and how
the character(s) are finally displayed on the screen with
the selected font.

(I guess:
(1) First, one needs to build a table to associate typing
info with certain character(s), and this character(s) is
represented by its code and code page number in the table.
(2) By looking up this table using typing info, the system
determines the character location(s) in the code page.
(3) Based on the above location info, the system opens the selected
font .ttf file, moves a pointer to the location(s), and
retrieves drawing/painting info.
(4) The system then finds the correct window and position, and
displays the character(s) according to that drawing/painting
info.
But I am not sure.)

Michael (michka) Kaplan [MS]

Oct 31, 2003, 1:14:06 AM
"Andrew" <anon...@discussions.microsoft.com> wrote...

> >Are you looking for some brainstorming on a good algorithm
> to use to do such a lookup?
>
> Yes. But I am also trying to understand how the typing
> information is converted into meaningful character(s) and how
> the character(s) are finally displayed on the screen with
> the selected font.
>
> (I guess:
> (1) First, one needs to build a table to associate typing
> info with certain character(s), and this character(s) is
> represented by its code and code page number in the table.

I assume you mean pronunciation info for each character. Code pages do not
directly enter into it, and all CJK languages contain more characters than
their respective "code pages".
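
To illustrate the point about code pages, here is a minimal Python sketch, assuming the lookup table stores plain Unicode strings. Converting to code page 936 is then a separate encoding step, and it can fail for characters that lie outside that code page. The two sample characters are chosen only for illustration.

```python
# Characters are handled as Unicode; the code page only matters when encoding to bytes.
common = "八"           # U+516B, present in code page 936
rare = "\U00020000"     # a CJK Extension B ideograph, outside code page 936

encoded = common.encode("cp936")  # two bytes in the legacy code page
assert encoded.decode("cp936") == common

try:
    rare.encode("cp936")
    rare_fits = True
except UnicodeEncodeError:
    # The character exists in Unicode but has no slot in code page 936.
    rare_fits = False

print(len(encoded), rare_fits)
```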

> (2) By looking up this table using typing info, the system
> determines the character location(s) in the code page.

Not sure what you mean here, but I think this is incorrect. An IME only ever
deals with choosing characters from the list of characters it believes to be
valid candidates.

> (3) Based on the above location info, the system opens the selected
> font .ttf file, moves a pointer to the location(s), and
> retrieves drawing/painting info.

I think this is confusing an IME with what the rendering engine does -- a
very different topic, with a very different component piece in the system
managing it.

> (4) The system then finds the correct window and position, and
> displays the character(s) according to that drawing/painting
> info.

Same as my comment to #3.

Note that the issues in #2 do not exist and the issues in #3 and #4 are not
related to the algorithm you would be considering for what an IME-like
component would do.

