Hyphenation errors with inflected word forms


to...@bluecanoelearning.com

Jan 4, 2018, 2:57:13 PM
to Wordnik API
I'm seeing many cases where the hyphenation for a root word is correct, but an inflected form (e.g. -ed) returns the entire word with no hyphenation. Setting "useCanonical" to "true" doesn't seem to help.

Some examples:

confuse & confused
overwhelm & overwhelmed
annoy & annoyed

This seems to be incredibly common and turns out to be a huge problem for my usage scenario.

I can make a workaround that will solve the problem for some percentage of cases, but if the entire word is being returned without hyphenation because of a lookup failure, it would really be better to return *no* result than an incorrect one.

Thanks,
Tony

Erin McKean

Jan 5, 2018, 12:45:50 PM
to Wordnik API
Thanks Tony!

Unfortunately we aren't doing algorithmic hyphenation yet, only lookups from a dataset. So inflected forms that aren't in the dataset get a null result (which, as you see, comes back as the unhyphenated form).

We're looking into expanding this dataset for the next version of the API and adding some algorithmic entries (possibly with confidence measurements) but I don't have a date for that yet.

If you can share some more information about your use case (either here, or directly to api...@wordnik.com) I may be able to suggest some alternatives!

Thanks again!

Erin
---------------------
Erin McKean
Wordnik
@emckean/@wordnik/@wordnikapi
the Wordnik mission: every English word, available to everyone, everywhere

PS Help support Wordnik by adopting your favorite word today! https://www.wordnik.com/adoptaword

to...@bluecanoelearning.com

Jan 5, 2018, 1:43:53 PM
to Wordnik API
Thanks for the reply, Erin. The problem with the hyphenation API is that for the inflected forms, it returns not an empty result, but a *wrong* one. Here's the response for the word "confused":

[ { "text": "confused", "seq": 0 } ]

If the response were an empty array, then it would be clear that I need to take other steps, like using the "related words" API to find the root form, etc. But since it returns an incorrect result, I now have to be suspicious of any response that returns just a single syllable.
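A minimal sketch of the client-side check this forces, assuming the response shape shown above (a list of objects with "text" and "seq"). The function name hyphenation_or_none is hypothetical and not part of the Wordnik API. Note that it cannot distinguish a lookup miss from a genuinely one-syllable word, which is exactly the ambiguity described here:

```python
from typing import Dict, List, Optional


def hyphenation_or_none(word: str, segments: List[Dict]) -> Optional[List[str]]:
    """Return the syllable texts in order, or None if the response is not trustworthy.

    Hypothetical helper: treats a single segment equal to the whole word
    as a possible lookup miss rather than a real hyphenation.
    """
    # Order segments by their "seq" field in case they arrive unsorted.
    ordered = sorted(segments, key=lambda s: s.get("seq", 0))
    parts = [s["text"] for s in ordered]

    # An empty response carries no hyphenation at all.
    if not parts:
        return None

    # One segment spanning the entire word: either a true monosyllable
    # or a miss in the dataset. The caller has to fall back either way.
    if len(parts) == 1 and parts[0].lower() == word.lower():
        return None

    return parts


response = [{"text": "confused", "seq": 0}]
print(hyphenation_or_none("confused", response))  # None
```

When this returns None, the caller would fall back to other steps, such as looking up the root form.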

My scenario is a custom dictionary UX based on the "color vowel" teaching methodology. Each word in the dictionary is color-coded based on the vowel sound of the stressed syllable in the word. Also, I want to underline the stressed vowel in the word. The pronunciation API gives me the vowel sound of the stressed syllable, but the hyphenation API is critical for being able to underline the correct letters in the stressed syllable.
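To illustrate why the syllable boundaries matter for this, here is a rough sketch that maps a stressed syllable to a character range in the word. It assumes the syllable texts concatenate back to the original word and that the index of the stressed syllable is already known from some other source; both function names are hypothetical, and the brackets stand in for whatever underlining the UI would apply:

```python
from typing import List, Tuple


def stressed_span(parts: List[str], stressed_index: int) -> Tuple[int, int]:
    """Return the (start, end) character offsets of the stressed syllable."""
    # The syllable starts after all the characters of the preceding syllables.
    start = sum(len(p) for p in parts[:stressed_index])
    return start, start + len(parts[stressed_index])


def mark_stressed(parts: List[str], stressed_index: int) -> str:
    """Rebuild the word with the stressed syllable wrapped in brackets."""
    word = "".join(parts)
    start, end = stressed_span(parts, stressed_index)
    return word[:start] + "[" + word[start:end] + "]" + word[end:]


print(mark_stressed(["con", "fuse"], 1))  # con[fuse]
```

Finding the vowel letters inside that range is a separate step; the span only narrows down where to look.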

In cases where I can't get correct hyphenation for an inflected form, I'll have to instead show users the root form, which is okay (but not great). But the current behavior of the hyphenation API makes it hard, since any single-syllable response is possibly wrong.

It sounds like the fix might be a one-line change somewhere in the API: return an empty array rather than the entire word as a single syllable.

Thanks,
Tony

Erin McKean

Jan 7, 2018, 12:01:17 PM
to Wordnik API
Ah, thanks Tony -- I see the problem now.

Unfortunately, we're not able to update the API at this point, but I hope we'll have a beta version of the new API up for testing soon. :(

This sounds like a great use case, though!

On Thursday, January 4, 2018 at 11:57:13 AM UTC-8, to...@bluecanoelearning.com wrote: