Note: I'm using the eltk list now:
So, take a look at the table in utils.CharConverter. Here's an entry:
'ɪ':['lax close front unrounded','I','\ic','I'],
Maybe you could leverage the 'name' entry in position 0 of the list.
Otherwise, you may want to read in the OATS/PHOIBLE ontology and get some work out of it, and/or a reasoner.
--Scott
On Wed, Jul 29, 2009 at 2:35 PM, S. Moran
<st...@u.washington.edu> wrote:
i suppose that's one way. check the encoding of the data (but that could be various flavors of unicode, so maybe just narrow it down to utf-8 at this point). if bad encoding, throw a warning. as for comparing each character to the IPA set: you have that, it seems, but without knowledge about what is a consonant, what is a vowel, what is a diacritic, or whatever other shit is in there. and since most segments seem to be multi-character, i think we'll need some algorithm to compare:
C + [diacritics]*   # a consonant and any # of diacritics (but how do we handle consonant clusters if they're phonemic?)
V + [diacritics]*
and possibly others.
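A minimal sketch of the C + [diacritics]* / V + [diacritics]* check in Python. The CONSONANTS and VOWELS sets are tiny hypothetical stand-ins (not the CharConverter table), and treating Unicode categories Mn and Lm as "diacritics" is an assumption. It does not answer the cluster question: a segment with two base letters is simply rejected.

```python
import unicodedata

# Hypothetical toy inventories; a real check would load the full IPA table.
CONSONANTS = set("pbtdkgmnszfvhlrjwŋʃʒ")
VOWELS = set("aeiouɪʊəɛɔæ")


def is_diacritic(ch):
    # Assumption: combining marks (Mn, e.g. U+0303) and spacing modifier
    # letters (Lm, e.g. ʰ or ː) count as diacritics.
    return unicodedata.category(ch) in ("Mn", "Lm")


def classify_segment(seg):
    """Return 'C' or 'V' for base + diacritics*, or None if no pattern fits."""
    # NFD splits precomposed letters into base letter + combining marks.
    seg = unicodedata.normalize("NFD", seg)
    if not seg:
        return None
    base, rest = seg[0], seg[1:]
    if not all(is_diacritic(ch) for ch in rest):
        return None  # e.g. a cluster like "st": second char is a base letter
    if base in CONSONANTS:
        return "C"
    if base in VOWELS:
        return "V"
    return None
```

With these toy sets, an aspirated t and a nasalized a are accepted as single segments, while "st" comes back as None and would need its own rule if clusters turn out to be phonemic.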
On Wed, 29 Jul 2009, scott farrar wrote:
Nope, not there yet.
Have at it! How are you doing it:
-detect encoding first, then compare each char. to the IPA set?
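A rough sketch of that two-step idea in Python, assuming the input arrives as bytes and is expected to be UTF-8. IPA_CHARS is a small hypothetical stand-in for the real IPA set, check_ipa is an illustrative name, and letting every combining mark through is an assumption, not something CharConverter does.

```python
import unicodedata
import warnings

# Hypothetical, deliberately tiny stand-in for the full IPA character set.
IPA_CHARS = set("pbtdkgmnszfvhlrjwaeiouɪʊəɛɔæŋʃʒθð")


def check_ipa(raw):
    """Return (text, bad_chars); text is None if raw is not valid UTF-8."""
    # Step 1: check the encoding, warn if it is not UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        warnings.warn("input is not valid UTF-8")
        return None, []
    # Step 2: compare each character to the IPA set.
    # NFD first, so precomposed letters become base + combining marks.
    text = unicodedata.normalize("NFD", text)
    bad = [
        ch for ch in text
        if ch not in IPA_CHARS
        and not unicodedata.combining(ch)  # assumption: allow any combining mark
        and not ch.isspace()
    ]
    return text, bad
```

An empty bad_chars list means every character was found in the set; it says nothing yet about whether the characters combine into well-formed segments.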
On Wed, Jul 29, 2009 at 1:48 PM, S. Moran <st...@u.washington.edu> wrote:
i need a method to validate whether or not input is in correct unicode ipa. there's no such thing in CharConverter, correct?