unicode ipa validator

scott farrar

Jul 29, 2009, 5:49:36 PM

to el...@googlegroups.com, S. Moran

Note: I'm using the eltk list now:

So, take a look at the table in utils.CharConverter. Here's an entry:

'ɪ':['lax close front unrounded','I','\ic','I'],

Maybe you could leverage the 'name' entry in position 0 of the list.
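A rough sketch of what leveraging the name could look like, assuming the table is a plain dict shaped like the entry above. Only the 'ɪ' row comes from the actual table; the other rows, the keyword sets, and the `classify` helper are made up for illustration and are not part of utils.CharConverter.

```python
# Hypothetical excerpt of a table shaped like the entry above.
# Only the 'ɪ' row is from the thread; the other rows are assumed.
TABLE = {
    'ɪ': ['lax close front unrounded', 'I', r'\ic', 'I'],
    'p': ['voiceless bilabial plosive', 'p', 'p', 'p'],   # assumed row
    '\u0303': ['nasalized', '~', r'\~', '~'],             # assumed row (combining tilde)
}

# Assumed keyword sets; a real table may use different naming conventions.
CONSONANT_WORDS = {'plosive', 'nasal', 'fricative', 'approximant',
                   'trill', 'tap', 'lateral'}
VOWEL_WORDS = {'close', 'mid', 'open', 'rounded', 'unrounded'}


def classify(char, table=TABLE):
    """Guess a coarse class from the name in position 0 of the entry.

    Returns 'consonant', 'vowel', 'other', or None if the character
    is not in the table at all.
    """
    entry = table.get(char)
    if entry is None:
        return None
    words = set(entry[0].split())
    if words & CONSONANT_WORDS:
        return 'consonant'
    if words & VOWEL_WORDS:
        return 'vowel'
    return 'other'
```

Anything that comes back as None is not in the table, so it is a candidate for a "not IPA" warning.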

Otherwise, you may want to read in the OATS/PHOIBLE ontology and get some work out of it, and/or a reasoner.

--Scott

On Wed, Jul 29, 2009 at 2:35 PM, S. Moran <st...@u.washington.edu> wrote:

i suppose that's one way. check the encoding of the data (it could be any of various flavors of unicode, so maybe just narrow it down to utf-8 at this point). if the encoding is bad, throw a warning. as for comparing each character to the IPA set: it seems you already have that set, but without knowledge of what is a consonant, what is a vowel, what is a diacritic, or whether there's other stuff. and since most segments seem to be multi-character, i think we'll need some algorithm to compare:

C + [diacritics]*   # a consonant and any number of diacritics (but how do we handle consonant clusters if they're phonemic?)
V + [diacritics]*   # a vowel and any number of diacritics

and possibly others.
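A minimal sketch of the "base + any number of diacritics" check, just to make the idea concrete. The character inventories here are tiny assumed samples, not a full IPA set, and `is_valid_segment` is a hypothetical name. Input is normalized to NFD first so that precomposed characters (e.g. 'ã') split into base plus combining mark.

```python
import unicodedata

# Assumed, deliberately incomplete sample inventories.
CONSONANTS = set('ptkbdgmnszʃʒŋ')
VOWELS = set('ieaouɪʊɛɔə')
# combining tilde, combining ring below, aspiration (modifier h), length mark
DIACRITICS = set('\u0303\u0325\u02b0\u02d0')


def is_valid_segment(segment):
    """True if segment is one consonant or vowel followed by zero or
    more diacritics: C + [diacritics]* or V + [diacritics]*."""
    segment = unicodedata.normalize('NFD', segment)
    if not segment:
        return False
    base, rest = segment[0], segment[1:]
    if base not in CONSONANTS and base not in VOWELS:
        return False
    return all(ch in DIACRITICS for ch in rest)
```

As written this rejects 'ts', so phonemic clusters or affricates would need an extra rule (or an explicit list of allowed multi-base segments); the sketch leaves that question open.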

On Wed, 29 Jul 2009, scott farrar wrote:

Nope, not there yet.

Have at it! How are you doing it:

-detect encoding first, then compare each char. to the IPA set?
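Roughly, that two-step approach might look like the sketch below. It assumes the input arrives as raw bytes and that UTF-8 is the only encoding accepted; `IPA_CHARS` is a small made-up sample set and `check_ipa_bytes` is a hypothetical helper, not something that exists in the toolkit.

```python
# Assumed sample set; a real check would use the full IPA inventory.
IPA_CHARS = set('ptkaiuɪʃ')


def check_ipa_bytes(data, allowed=IPA_CHARS):
    """Step 1: make sure the bytes decode as UTF-8.
    Step 2: report every non-whitespace character outside the allowed set.

    Returns (ok, problems), where problems is a list of messages.
    """
    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError as err:
        return False, ['not valid UTF-8: {}'.format(err.reason)]

    problems = [
        'U+{:04X} at index {}'.format(ord(ch), i)
        for i, ch in enumerate(text)
        if ch not in allowed and not ch.isspace()
    ]
    return not problems, problems
```

For example, the ASCII capital 'I' typed in place of 'ɪ' decodes fine as UTF-8 but would be flagged in the second step.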

On Wed, Jul 29, 2009 at 1:48 PM, S. Moran <st...@u.washington.edu> wrote:

i need a method to validate whether or not input is in correct unicode ipa. there's no such thing in CharConverter, correct?
