Wow. Thanks for that, Lukhnos!
> On Nov 9, 12:25 am, Eric Rasmussen <keras...@mac.com> wrote:
> > This has been changed somewhat in Leopard -- we've already seen what
> > may be a fairly serious bug possibly related to wrongly assigning the
> > ".cin" extension to a file that should have the ".inputplugin"
> > extension. My current thinking is that ".inputplugin" is for the Apple
> > format that used to be converted into a ".dat" file by the Input
> > Method Plug-in Converter. But I don't know anything about what
> > constitutes a valid ".cin" format file -- how are they different from
> > the Apple format? Do they really work in Leopard? Or are there
> > limitations?
> I'll begin with some history that I know.
> .cin was first introduced by Xcin, an input method framework for X11
> developed in the mid 1990s, as a data format for table-based input
> methods. By table-based I mean input methods that can be implemented,
> or seen, as a table look-up mechanism. Around 90% of input methods
> (Chinese and beyond) can be implemented that way. Apple's .inputplugin
> also belongs to that category. Almost every mainstream input method
> framework supports at least one form of user-customizable IME
> creation. .cin seems to have become one of the standard data formats
> because it's simple and many user-generated tables are already in wide
> circulation.
> I have very limited knowledge of Xcin and other frameworks, but in the
> early days, .cin was intended as a source format, not to be consumed
> directly by the input method framework (or more precisely, by the
> table-based input method "generator"). Back then a .cin could also use
> any encoding recognized by the framework. So phone.cin (renamed to
> bpmf.cin in OV) was encoded in Big5, pinyin.cin in GB, and so on.
> When we were developing the "generic" module (first named OVIMXcin,
> later renamed to OVIMGeneric) to support .cin in OpenVanilla, we made
> two decisions. First, we no longer require users to run a
> compiler/converter to turn a .cin into a binary format, as used to be
> the case; the .cin is consumed by the input method module directly.
> Second, all .cin files must use UTF-8 encoding. This opened the door to
> bigger character sets and the famous "♨" input method.
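For anyone holding an old table in a legacy encoding, here is a minimal sketch of re-encoding it as UTF-8. The function name to_utf8 is my own, and you have to know the source encoding yourself ("big5" and "gb2312" are the Python codec names); nothing here detects it for you.

```python
def to_utf8(raw, source_encoding):
    """Re-encode the bytes of a legacy .cin file as UTF-8.

    raw is the file content as bytes; source_encoding is a Python codec
    name such as "big5" or "gb2312" (an assumption the caller supplies).
    """
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on a wrong guess
    return text.encode("utf-8")


# Example: one Big5-encoded chardef line converted to UTF-8.
legacy_line = "a 日\n".encode("big5")
converted = to_utf8(legacy_line, "big5")
```

A wrong guess at the source encoding usually fails loudly with UnicodeDecodeError, which is safer than silently writing garbage into the table.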
> So what constitutes a valid .cin file? For OpenVanilla, a .cin file
> consists of three sections:
> 1. a header consisting of directives beginning with "%", like %ename,
> %selkey, %endkey. Some of them are metadata; others are control
> directives.
> 2. a keyname block between the directives "%keyname begin" and
> "%keyname end". This tells the generic input method which character to
> display in the composing stage for each key typed (mostly to
> represent radicals in radical-based input methods).
> 3. a chardef block between the directives "%chardef begin" and
> "%chardef end". This is the body of the data table. "chardef" is
> something of an anachronistic misnomer. The block used to define the
> mapping from key sequences to characters (hence the name), but modern
> implementations like OV and gcin allow phrases in this block.
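To make the three sections concrete, here is a rough sketch of a reader for that layout. It is not OpenVanilla's actual parser; parse_cin and the SAMPLE table are made up for illustration, and I am assuming each data line is a key followed by whitespace and then the value.

```python
from collections import defaultdict


def parse_cin(text):
    """Split .cin text into (header directives, keyname map, chardef table)."""
    header = {}
    keynames = {}
    chardefs = defaultdict(list)
    section = None  # None means header; otherwise "keyname" or "chardef"

    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        # "%keyname begin/end" and "%chardef begin/end" switch sections.
        if (parts[0] in ("%keyname", "%chardef") and len(parts) == 2
                and parts[1] in ("begin", "end")):
            section = parts[0][1:] if parts[1] == "begin" else None
            continue
        if section is None:
            # Header: keep "%name value" directives, skip everything else.
            if parts[0].startswith("%"):
                header[parts[0][1:]] = parts[1] if len(parts) == 2 else ""
            continue
        if len(parts) != 2:
            continue  # malformed data line; a stricter reader might raise
        key, value = parts
        if section == "keyname":
            keynames[key] = value
        else:
            chardefs[key].append(value)  # one key sequence, many candidates
    return header, keynames, dict(chardefs)


# Hypothetical miniature table, not taken from any real .cin file.
SAMPLE = (
    "%ename Demo\n"
    "%selkey 123456789\n"
    "%keyname begin\n"
    "a 日\n"
    "b 月\n"
    "%keyname end\n"
    "%chardef begin\n"
    "a 日\n"
    "ab 明\n"
    "ab 盟\n"
    "%chardef end\n"
)

header, keynames, chardefs = parse_cin(SAMPLE)
```

Storing a list per key sequence matters because one sequence often maps to several candidates, which is what the selection keys in %selkey are for.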
> Different frameworks have implemented the details somewhat
> differently. OV's implementation disallows Windows-style CR LF line
> endings (so only the UNIX-style \n is used, which is also what OS X
> uses), and comment lines (beginning with #) are not allowed in the
> chardef block.
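Those constraints are easy to check before dropping a file in. A small sketch follows, assuming the rules are exactly as described above (UTF-8, LF-only, no "#" lines inside chardef); check_ov_constraints is a hypothetical helper, not part of OV. Note that it cannot tell a comment from a table that legitimately uses "#" as a key, so treat that message as a warning.

```python
def check_ov_constraints(raw):
    """Return a list of problems found in raw, the .cin content as bytes."""
    problems = []
    if b"\r" in raw:
        problems.append("contains CR characters; use LF-only line endings")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        return problems

    in_chardef = False
    for number, line in enumerate(text.split("\n"), start=1):
        words = line.split()
        if words == ["%chardef", "begin"]:
            in_chardef = True
        elif words == ["%chardef", "end"]:
            in_chardef = False
        elif in_chardef and words and words[0].startswith("#"):
            problems.append(f"line {number}: possible comment inside chardef block")
    return problems


clean = check_ov_constraints(b"%chardef begin\na x\n%chardef end\n")
crlf = check_ov_constraints(b"%chardef begin\r\na x\r\n%chardef end\r\n")
commented = check_ov_constraints(b"%chardef begin\n# note\na x\n%chardef end\n")
```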
> Although .cin contains enough information for key-character/phrase
> mapping, many input methods (like Cangjie/"Changjie" or Simplex/
> Jianyi) require finer control. For OpenVanilla, the control is
> provided in the form of input method preferences (with some
> mind-boggling names like "force composition when reaching maximum
> length of radical" or "use space to select the 1st candidate").
> Different input methods require different controls (and those are a
> must -- failure to provide them yields barely usable input methods).
> gcin differs from OV's implementation in that it allows those control
> directives to be expressed in the .cin header, with its own directive
> extensions.
> OpenVanilla's repository of .cin is available at:
> http://openvanilla.googlecode.com/svn/trunk/Modules/SharedData/
> Zonble has written an excellent tutorial (in Chinese) on how to create
> your own input method by writing a .cin, which is more or less the
> standard text now:
> http://docs.google.com/View?docid=ah6d8th954vw_201fd5dkx
> Technically .cin is really just a set of key-value pairs with its own
> convention. OV makes heavy use of .cin as a format. Things like
> reverse radical/pinyin lookup or associated phrases are also done
> with .cin-based data tables. I see it as a good sign that Apple has
> adopted a popular (and mostly consistent and cross-framework
> compatible) data format for Leopard.
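The reverse lookup idea falls out of the key-value view almost for free. A rough sketch, assuming the chardef table is already in memory as a dict from key sequence to candidate list and the keyname block as a dict from key to radical; build_reverse_lookup is my own name, and this is not how OV actually stores its lookup tables.

```python
def build_reverse_lookup(chardefs, keynames):
    """Invert a chardef table: character or phrase -> displayed key sequences.

    chardefs maps a key sequence to its candidates; keynames maps a single
    key to the radical shown while composing. Keys with no keyname entry
    are shown as typed.
    """
    reverse = {}
    for keys, values in chardefs.items():
        display = "".join(keynames.get(k, k) for k in keys)
        for value in values:
            reverse.setdefault(value, []).append(display)
    return reverse


# Hypothetical miniature table for illustration only.
lookup = build_reverse_lookup(
    {"a": ["日"], "ab": ["明"]},
    {"a": "日", "b": "月"},
)
```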
> So what about Leopard? As far as I know, dropping a UTF-8-encoded
> .cin into ~/Library/Input Methods or /Library/Input Methods and then
> logging in again just works. A new input method, using the name defined
> in the .cin, shows up in the Input Menu tab of the International
> preferences panel. I'm not aware of any per-method level control so
> far (I might be very ignorant on this).
> As for limitations, I'm not aware of any either. OV's own
> implementation (and many others) is limited only by memory and your
> patience (loading a .cin with 200,000 entries on a G3 is no small
> thing; a database-backed design would solve the problem). Leopard's own
> take should not differ much, so it should be very flexible and easily
> customizable.
> d.