Currently, capitalization handling is hard-coded in C, and is English-centric.
In addition, there are two regexes that match capitalized words (CAPITALIZED-WORDS and PL-CAPITALIZED-WORDS). Because (at least currently) only the first regex that matches is used, regex-based suffix guessing doesn't work for capitalized words.
My idea is to move the handling of capitalized words out of the hard-coded rules in C and into the LG rules.
Requirements:
- Minimal handling by program code - the rest will be done by the LG rules.
- Flexibility - as little language dependency as possible.
To that end I propose to treat a capitalized word as a non-capitalized word preceded by an initial "virtual" null morpheme that signifies its capitalization.
So that the LG rules can select the proper word meaning, two alternatives are to be generated:
Input word: Qwerty
alt1: nonCAP.ZZZ qwerty
alt2: 1stCAP.ZZZ qwerty
If the word is all capitals (to be used by languages that use such words), maybe:
alt2: allCAP.ZZZ qwerty
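The generation step above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not existing LG code: the names `cap_kind`, `classify_cap` and `make_alternatives` are hypothetical, the case tests are ASCII-only (a real tokenizer would need locale- or UTF-8-aware tests), and leaving a non-capitalized or mixed-case word such as "iPhone" as a single untouched alternative is my assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capitalization classes; they mirror the proposed markers. */
typedef enum { CAP_NONE, CAP_FIRST, CAP_ALL } cap_kind;

/* Classify a word. ASCII-only assumption (see the text above). */
static cap_kind classify_cap(const char *word)
{
    assert(word != NULL);
    if (!isupper((unsigned char)word[0])) return CAP_NONE;
    if (word[1] == '\0') return CAP_FIRST;      /* single letter, e.g. "A" */
    for (const char *p = word + 1; *p != '\0'; p++)
        if (islower((unsigned char)*p)) return CAP_FIRST;
    return CAP_ALL;                             /* no lowercase letter found */
}

/* Copy "<prefix><word>" into dst (size n); returns 1 on success, 0 if it
 * does not fit. */
static int put(char *dst, size_t n, const char *prefix, const char *word)
{
    int r = snprintf(dst, n, "%s%s", prefix, word);
    return r >= 0 && (size_t)r < n;
}

/* Fill alt1/alt2 (each of size n) with the alternatives for `word`.
 * Returns 2 for a capitalized word, 1 if the word is passed through
 * unchanged (alt2 is then empty), and 0 if a buffer is too small. */
static int make_alternatives(const char *word, char *alt1, char *alt2, size_t n)
{
    char lower[128];
    size_t len = strlen(word);
    if (n == 0 || len >= sizeof lower) return 0;

    cap_kind kind = classify_cap(word);
    if (kind == CAP_NONE) {
        alt2[0] = '\0';
        return put(alt1, n, "", word) ? 1 : 0;
    }

    /* Lowercase copy, including the terminating NUL. */
    for (size_t i = 0; i <= len; i++)
        lower[i] = (char)tolower((unsigned char)word[i]);

    const char *marker = (kind == CAP_ALL) ? "allCAP.ZZZ " : "1stCAP.ZZZ ";
    if (!put(alt1, n, "nonCAP.ZZZ ", lower)) return 0;
    if (!put(alt2, n, marker, lower)) return 0;
    return 2;
}
```

Calling `make_alternatives("Qwerty", a, b, sizeof a)` with two equally sized buffers fills them with the two alternatives listed above; which alternative survives is then left entirely to the dictionary.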
In the English dictionary, the LEFT-WALL, colon and bullets will have the proper connectors for selecting the appropriate form of the word. The ZZZ null morphemes will be discarded after the linkage step; the program will then re-capitalize the words appropriately for display if needed.
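The display step might look something like the sketch below. The function name `recapitalize` is hypothetical, the case conversion is ASCII-only, and I am assuming that the displayed form simply follows whichever marker the linkage selected; whether a word linked through nonCAP.ZZZ should instead keep its original input capitalization is a policy question this proposal leaves open.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rebuild the display form of `word` (lowercase, as stored in the
 * alternative) from the null-morpheme marker chosen by the linkage.
 * Writes into out (size n). Returns 1 on success, 0 if the buffer is
 * too small or the marker is not recognized. ASCII-only assumption. */
static int recapitalize(const char *marker, const char *word,
                        char *out, size_t n)
{
    assert(marker != NULL && word != NULL && out != NULL);

    size_t len = strlen(word);
    if (len >= n) return 0;
    memcpy(out, word, len + 1);                 /* includes the NUL */

    if (strcmp(marker, "1stCAP.ZZZ") == 0) {
        out[0] = (char)toupper((unsigned char)out[0]);
    } else if (strcmp(marker, "allCAP.ZZZ") == 0) {
        for (size_t i = 0; i < len; i++)
            out[i] = (char)toupper((unsigned char)out[i]);
    } else if (strcmp(marker, "nonCAP.ZZZ") != 0) {
        return 0;                               /* unknown marker */
    }
    return 1;                                   /* nonCAP: left as is */
}
```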
I'm weak in LG rules so this proposal doesn't include a suggestion in that regard, but hopefully it can be done.
Amir