Dear Laurence,
Thank you very much for your report and all these details!
I have summed up your major points below (hoping I haven't forgotten any):
- show stress on the whole syllable rather than only on the vocalic nucleus;
- convert the output to lower case;
- add a 4th degree of stress: the open transition;
- the more words, the more schwa/i/u and open transitions there should be.
1. show stress on the whole syllable rather than only on the vocalic nucleus
This was actually our first option. To make it possible, I need to know where each syllable begins and ends, like this: /æk.ˈsɛp.tə.bəl/.
The thing is, the English Wiktionary gives syllabification for only about 1/5 of its transcriptions (20,499 out of the 100,524 transcriptions of the dictionary's 58,038 words), and likewise only about 1/5 of the words (11,599 out of 58,038) have at least one syllabified transcription.
If we had syllabification for all words (as in the French Wiktionnaire), we could do it, but in the present situation I don't know how to deal with the remaining 4/5 of the words that aren't syllabified.
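To make the dependency on syllable boundaries concrete, here is a minimal sketch in Python of what stress-per-syllable needs as input. It is not WikiColor's actual code: the function name split_syllables is hypothetical, and it assumes that boundaries are written with "." and that a stress mark sits at the start of its syllable, as in the example above. A transcription without any "." is treated as not syllabified, which also (wrongly) covers monosyllables.

```python
PRIMARY = "\u02c8"    # IPA primary stress mark
SECONDARY = "\u02cc"  # IPA secondary stress mark


def split_syllables(ipa):
    """Return a list of (syllable, stress) pairs.

    Returns None when the transcription carries no "." boundary,
    i.e. when we cannot tell where syllables begin and end.
    """
    body = ipa.strip("/[]")
    if "." not in body:
        return None
    syllables = []
    for chunk in body.split("."):
        if not chunk:
            continue  # tolerate a stray leading or trailing dot
        if chunk.startswith(PRIMARY):
            syllables.append((chunk[1:], "primary"))
        elif chunk.startswith(SECONDARY):
            syllables.append((chunk[1:], "secondary"))
        else:
            syllables.append((chunk, "unstressed"))
    return syllables


print(split_syllables("/æk.\u02c8sɛp.tə.bəl/"))
print(split_syllables("/æk\u02c8sɛptəbəl/"))
```

With the first transcription the whole syllable "sɛp" can be shown as stressed; with the second one only the position of the stress mark is known, not where the syllable ends.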
2. convert the output to lower case
No problem with that. We wanted to keep the case information so that the output stays as close as we can to the real-life text form. That works well for French, but as we deal with font size in English, I should probably deactivate it and convert everything to lower case by default.
3. add a 4th degree of stress: the open transition
I may be able to do that with the rule you gave me (if there are one or more consonants before or after a schwa/i/u in one syllable).
But again, the problem comes from the syllabification information, which I have for only a small part of the words.
A solution could be to turn every schwa/i/u in this situation into an open transition, and to leave the rest as they are until the user indicates the syllabification in the collaborative dictionary (if they want to).
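Here is a rough sketch in Python of that fallback, under my own reading of the rule. Since syllable boundaries are unknown, it looks at the neighbouring sounds in the whole word instead of in the syllable, so it will over-apply in some cases. The vowel inventory VOWELS and the function names are hypothetical and incomplete, and the input is assumed to be already split into one symbol per sound, with stress marks removed.

```python
# Reduced vowels that may become an open transition (from the rule above).
REDUCED = {"ə", "i", "u"}

# Hypothetical, incomplete vowel inventory; anything else counts as a consonant.
VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌ")


def is_consonant(symbol):
    """True when the symbol is not in the (assumed) vowel inventory."""
    return symbol not in VOWELS


def open_transition_indices(phonemes):
    """Return the positions of schwa/i/u that touch at least one consonant.

    Without syllabification the neighbours are taken from the whole word,
    which is only an approximation of the per-syllable rule.
    """
    hits = []
    for i, symbol in enumerate(phonemes):
        if symbol not in REDUCED:
            continue
        before = i > 0 and is_consonant(phonemes[i - 1])
        after = i + 1 < len(phonemes) and is_consonant(phonemes[i + 1])
        if before or after:
            hits.append(i)
    return hits


print(open_transition_indices(["t", "ə", "b", "ə", "l"]))
```

For a word with a syllabified transcription, the same check could be run inside each syllable instead of across the whole word.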
Question: should we represent the open transition vowel with a dot? We would then lose the information about how the word is written. But maybe knowing how to READ is the only objective of WikiColor?
4. the more words, the more schwa/i/u and open transition there should be
This is something I should be able to do, by giving priority to the shortened or schwaed versions. For now, WikiColor suggests transcriptions in the order of the dictionary (putting non-alignable transcriptions at the end of the list).
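The reordering could look like the following Python sketch. It is only an illustration: the (transcription, alignable) pairs and the function names are hypothetical, and "schwaed" is approximated by counting ə plus any i or u not followed by the length mark ː. Ties keep the dictionary order, and non-alignable transcriptions stay at the end.

```python
import re

# ə, or i/u not followed by the length mark (so "iː" and "uː" do not count).
REDUCED_VOWEL = re.compile(r"ə|[iu](?!ː)")


def reduced_count(ipa):
    """Count the reduced vowels in a transcription."""
    return len(REDUCED_VOWEL.findall(ipa))


def rank_transcriptions(candidates):
    """Sort (ipa, alignable) pairs: alignable first, most reduced first.

    sorted() is stable, so equally ranked entries keep dictionary order.
    """
    return sorted(
        candidates,
        key=lambda item: (not item[1], -reduced_count(item[0])),
    )


dictionary_order = [("ænd", True), ("ənd", True), ("ɛn", False), ("ən", True)]
for ipa, alignable in rank_transcriptions(dictionary_order):
    print(ipa, alignable)
```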
If someone knows of a free resource that gives the syllabification of words, or knows some syllabification rules we could implement, let us know!
sylvain