Seeking a tool to assign semantic domains to our FLEx entries


Dale Hoskins

Mar 18, 2026, 10:17:34 PM
to FLEx list
I have nearly 5000 well-developed entries, but fewer than half of them have been assigned a semantic domain. Has anyone developed an AI tool or a procedure that would help us quickly assign reasonable semantic domain guesses? We would confirm them later.

chris...@sil.org

Mar 19, 2026, 9:30:51 AM
to FLEx list
Let me know if you hear of one! I think you could probably achieve this today with an LLM hooked up to FlexTools.
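
As a rough sketch of what that could look like (everything here is hypothetical: the domain list is truncated, the function names are invented, and the model call is stubbed out), a FlexTools module could send each gloss to a chat model with a constrained prompt and parse the code it returns:

```python
# Hypothetical sketch: ask an LLM to suggest a semantic domain for a gloss.
# The model call is a stub; in practice you would plug in any chat-completion
# API and loop over entries from inside a FlexTools module.

DOMAINS = {            # tiny illustrative subset of the domain list
    "1.4": "Living things",
    "5.2": "Food",
    "6.2": "Agriculture",
}

def build_prompt(gloss, domains=DOMAINS):
    """Build a prompt that restricts the model to known domain codes."""
    menu = "\n".join(f"{code}: {name}" for code, name in sorted(domains.items()))
    return (
        "Choose the single best semantic domain for the dictionary gloss below.\n"
        f"Allowed domains:\n{menu}\n"
        f"Gloss: {gloss}\n"
        "Answer with the domain code only."
    )

def parse_reply(reply, domains=DOMAINS):
    """Pull a known domain code out of the model's reply, else None."""
    token = reply.strip().split()[0].rstrip(".,")
    return token if token in domains else None

def fake_llm(prompt):
    """Stand-in for a real model call, just to show the round trip."""
    return "5.2"

prompt = build_prompt("maize porridge")
code = parse_reply(fake_llm(prompt))
# code == "5.2" -- a tentative guess, to be confirmed by hand later
```

Constraining the answer to a fixed code list keeps the guesses checkable; anything the parser doesn't recognize is simply left unassigned.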

Chris

Jonathan Dailey

Mar 19, 2026, 9:38:50 AM
to flex...@googlegroups.com
You could use the Suggest button in the Bulk Edit area, but the results would be very tentative.

--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/flex-list/f401cc2c-51c6-46ec-a5a2-7aa9f20491e1n%40googlegroups.com.

Mike Maxwell

Mar 20, 2026, 10:53:07 PM
to flex...@googlegroups.com
I'm assuming you want to assign the entries for words in some language
to semantic domains based on their glosses, not on the words themselves.

As you say, this probably cannot be done completely automatically. But
you might be able to get part way towards your goal using a thesaurus to
map the English word glosses (possibly ambiguously) to more abstract
concepts. One public-domain thesaurus is WordNet
(https://en.wikipedia.org/wiki/WordNet, see also
https://wordnet.princeton.edu/related-projects). This would get you to
hypernyms, which you might more easily map to semantic domains.
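
The hypernym idea can be sketched in a few lines. The table below is a toy stand-in for WordNet (with real WordNet, e.g. via NLTK, you would walk synset.hypernyms() instead), and the concept-to-domain mapping is something you would build by hand once:

```python
# Toy sketch: climb each gloss's hypernym chain until we reach an abstract
# concept that has been mapped to a semantic domain. The HYPERNYM table here
# is hand-built stand-in data for a real thesaurus like WordNet.

HYPERNYM = {           # word -> its more abstract parent (toy data)
    "dog": "animal",
    "goat": "animal",
    "maize": "plant",
    "animal": "organism",
    "plant": "organism",
}

DOMAIN_OF = {          # abstract concept -> semantic domain (hand-mapped once)
    "animal": "1.6 Animal",
    "plant": "1.5 Plant",
}

def domain_for_gloss(word, max_steps=5):
    """Climb the hypernym chain; return the first domain found, else None."""
    for _ in range(max_steps):
        if word in DOMAIN_OF:
            return DOMAIN_OF[word]
        word = HYPERNYM.get(word)
        if word is None:
            return None
    return None

print(domain_for_gloss("goat"))   # hits "animal" after one step -> 1.6 Animal
```

Ambiguous glosses would yield several candidate chains, so the output is best treated as a ranked suggestion list rather than an assignment.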

There also exist natural language programs like word2vec, which create a
vector-based semantic-like representation of words (in theoretically any
language) based on co-occurrences in text. This requires large amounts
of text, and it does not directly create a representation grounded in
the real world. But if nearly half of your words already have semantic
domains, it *might* be possible to map those words to existing
English-based vector semantic domains and make inferences about the
semantic domains of the other words in your dictionary. But that's more
a research project than a tried-and-true method, and it assumes you have
a large corpus.
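
The inference step of that idea is simple to illustrate with made-up 2-D "embeddings" (real word2vec vectors would have hundreds of dimensions and come from a trained model, e.g. via gensim): each unlabeled word inherits the domain of its nearest already-labeled neighbor.

```python
# Sketch of nearest-neighbor domain inference over word vectors.
# The vectors and labels below are invented toy data.
import math

labeled = {            # word -> (vector, known semantic domain)
    "goat":  ((0.9, 0.1), "1.6 Animal"),
    "maize": ((0.1, 0.9), "1.5 Plant"),
}

def nearest_domain(vec):
    """Return the domain of the labeled word closest to vec (Euclidean)."""
    best, best_dist = None, float("inf")
    for word, (lvec, domain) in labeled.items():
        d = math.dist(vec, lvec)
        if d < best_dist:
            best, best_dist = domain, d
    return best

# A hypothetical unlabeled word whose vector sits near "goat":
print(nearest_domain((0.8, 0.2)))   # -> 1.6 Animal
```

In practice you would take a vote over the k nearest labeled neighbors rather than the single closest one, and flag low-confidence cases for review.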

A bilingual corpus aligned at the word level to words in English or some
other glossing language could also be used. Again, I think this is more
of a research project than a way of speeding up your work. (It also
assumes that your corpus is parsed, i.e. that each inflected wordform
has a pointer to its lexeme.)
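
A minimal sketch of the alignment route (the alignment pairs and the English domain lexicon here are invented): for each lexeme, count the domains of the English words it aligns to and take the most frequent.

```python
# Toy sketch: infer a lexeme's domain by majority vote over the domains of
# the English words it is aligned to in a parsed, word-aligned corpus.
from collections import Counter

# (vernacular lexeme, aligned English word) pairs -- invented data
alignments = [
    ("mbuzi", "goat"), ("mbuzi", "goat"), ("mbuzi", "sheep"),
]

english_domain = {"goat": "1.6 Animal", "sheep": "1.6 Animal"}

def guess_domain(lexeme):
    """Most frequent domain among the lexeme's aligned English words."""
    counts = Counter(
        english_domain[eng]
        for lex, eng in alignments
        if lex == lexeme and eng in english_domain
    )
    return counts.most_common(1)[0][0] if counts else None
```

The counting is trivial; the hard part, as noted above, is getting parsed, word-aligned data in the first place.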


--
Mike Maxwell
