I'm assuming you want to assign the entries for words in some language
to semantic domains based on their glosses, not on the words themselves.
As you say, this probably cannot be done completely automatically. But
you might be able to get part of the way towards your goal by using a
thesaurus to map the English glosses (possibly ambiguously) to more
abstract concepts. One freely available thesaurus is WordNet
(https://en.wikipedia.org/wiki/WordNet; see also
https://wordnet.princeton.edu/related-projects). WordNet would give you
hypernyms (more general terms) for the glosses, and those might be easier
to map to semantic domains than the glosses themselves.
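To make the hypernym idea concrete, here is a minimal sketch. It assumes a tiny, made-up hypernym table standing in for WordNet (`HYPERNYMS`) and a hand-built table from a few high-level concepts to semantic domains (`DOMAIN_OF_CONCEPT`); both names and all their entries are hypothetical. The code walks upward from a gloss and collects every domain it reaches, so an ambiguous gloss such as "bank" can come back with more than one candidate domain for a person to choose from.

```python
# Hypothetical toy stand-in for WordNet: each word or concept maps to its
# more general concepts. An ambiguous gloss ("bank") lists the parents of
# each of its senses; real WordNet keeps senses as separate synsets.
HYPERNYMS = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "oak": ["tree"],
    "tree": ["plant"],
    "hammer": ["hand tool"],
    "hand tool": ["tool"],
    "bank": ["slope", "institution"],
    "slope": ["landform"],
}

# Hypothetical mapping from a few abstract concepts to semantic domains.
DOMAIN_OF_CONCEPT = {
    "animal": "Animal",
    "plant": "Plant",
    "tool": "Tool",
    "landform": "Land",
    "institution": "Organization",
}

def candidate_domains(gloss):
    """Walk up the hypernym graph from a gloss and collect every domain reached."""
    found = set()
    seen = set()
    frontier = [gloss]
    while frontier:
        concept = frontier.pop()
        if concept in seen:
            continue
        seen.add(concept)
        if concept in DOMAIN_OF_CONCEPT:
            found.add(DOMAIN_OF_CONCEPT[concept])
        frontier.extend(HYPERNYMS.get(concept, []))
    return found

for gloss in ["dog", "oak", "hammer", "bank", "idea"]:
    print(gloss, sorted(candidate_domains(gloss)))
```

A gloss with no path to a mapped concept (here "idea") returns an empty set and would still need manual assignment. With the real WordNet, NLTK's WordNet interface (`nltk.corpus.wordnet`) provides `synsets()` and `Synset.hypernym_paths()`, which could replace the toy table for the upward walk.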
There are also natural language processing tools like word2vec, which
create a vector-based, semantic-like representation of words (in theory,
of any language) based on their co-occurrences in text. This requires
large amounts of text, and it does not directly create a representation
grounded in the real world. But if nearly half of your words already have
semantic domains, it *might* be possible to map those words to existing
English-based vector representations and make inferences about the
semantic domains of the other words in your dictionary. But that's more
a research project than a tried-and-true method, and it assumes you have
a large corpus.
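One simple way to "make inferences" from the words that already have domains is a nearest-neighbour vote: for an unlabeled word, find the labeled words whose vectors are most similar and take the majority domain. The sketch below assumes every word already has a vector in one shared space (obtaining such vectors for a vernacular language, or aligning them with an English space, is the hard research part and is not shown). The three-dimensional vectors, the `VECTORS` and `KNOWN_DOMAINS` tables, and the choice of `k=3` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors. Real word2vec vectors usually
# have a few hundred dimensions and are trained on a large corpus.
VECTORS = {
    "wolf":   [0.9, 0.1, 0.0],
    "bear":   [0.8, 0.2, 0.1],
    "oak":    [0.1, 0.9, 0.0],
    "pine":   [0.0, 0.8, 0.2],
    "fox":    [0.85, 0.15, 0.05],  # domain not yet assigned
    "willow": [0.05, 0.85, 0.1],   # domain not yet assigned
}

# Hypothetical: words that already have a semantic domain in the dictionary.
KNOWN_DOMAINS = {"wolf": "Animal", "bear": "Animal", "oak": "Plant", "pine": "Plant"}

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def suggest_domain(word, k=3):
    """Majority vote among the k labeled words most similar to `word`."""
    target = VECTORS[word]
    neighbours = sorted(
        KNOWN_DOMAINS,
        key=lambda w: cosine(target, VECTORS[w]),
        reverse=True,
    )[:k]
    votes = Counter(KNOWN_DOMAINS[w] for w in neighbours)
    return votes.most_common(1)[0][0]

for word in ["fox", "willow"]:
    print(word, suggest_domain(word))
```

The output is only a suggestion to be checked by hand; with real data, words whose neighbours disagree would be the ones most in need of review.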
A bilingual corpus aligned at the word level to words in English or some
other glossing language could also be used. Again, I think this is more
of a research project than a way of speeding up your work. (It also
assumes that your corpus is parsed, i.e. that each inflected wordform
has a pointer to its lexeme.)
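With a word-aligned bilingual corpus, the basic bookkeeping might look like the sketch below: follow each inflected wordform to its lexeme (the parsing requirement mentioned above), look up the domain of the English word it is aligned to, and tally the domains per lexeme. The vernacular forms, the alignments in `ALIGNED_PAIRS`, the parser output in `LEXEME_OF`, and the English-to-domain table `DOMAIN_OF_ENGLISH` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned corpus: (vernacular wordform, aligned English word).
ALIGNED_PAIRS = [
    ("kuta", "dog"), ("kutas", "dogs"), ("kutane", "dog"),
    ("mira", "water"), ("miran", "water"), ("mira", "river"),
    ("mira", "drink"),
]

# Hypothetical parser output: each inflected wordform points to its lexeme.
LEXEME_OF = {
    "kuta": "kuta", "kutas": "kuta", "kutane": "kuta",
    "mira": "mira", "miran": "mira",
}

# Hypothetical domains already associated with some English words.
DOMAIN_OF_ENGLISH = {
    "dog": "Animal", "dogs": "Animal",
    "water": "Water", "river": "Water",
    "drink": "Drinking",
}

def domain_votes(pairs):
    """Tally, per lexeme, the domains of the English words it is aligned to."""
    votes = defaultdict(Counter)
    for wordform, english in pairs:
        lexeme = LEXEME_OF.get(wordform)
        domain = DOMAIN_OF_ENGLISH.get(english)
        # Skip unparsed wordforms and English words with no known domain.
        if lexeme is not None and domain is not None:
            votes[lexeme][domain] += 1
    return votes

for lexeme, counter in sorted(domain_votes(ALIGNED_PAIRS).items()):
    print(lexeme, counter.most_common())
```

A lexeme whose votes split across domains (here "mira", mostly Water with one Drinking) is a candidate for either several domains or a closer look by hand.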
Mike Maxwell