Link Grammar for Russian


Anton Kolonin @ Gmail

Jul 8, 2020, 12:46:07 AM
to link-grammar
Hi, is there a Russian Link Grammar guru here at the moment?

I am trying the sentence "папа сидел на диване" at
http://sz.ru/parser/parse.pl
and it is parsed just fine.

However, when I check the grammar files in

https://github.com/opencog/link-grammar/tree/master/data/ru

I cannot find the inflected form "диване", while I am able to find the
uninflected form "диван":

Antons-MacBook-Pro:ru akolonin$ grep -r диване *
Antons-MacBook-Pro:ru akolonin$ grep -r диван *
stem.dict:диван.ndmsi диез.ndmsi диплом.ndmsi доступ.ndmsi дуэт.ndmsi жетон.ndmsi
stem.dict:диван.ndmsv диез.ndmsv диплом.ndmsv доступ.ndmsv дуэт.ndmsv жетон.ndmsv
stem.dict:диамагнетик.ndmsv диванчик.ndmsv диптих.ndmsv дистих.ndmsv диуретик.ndmsv дифтонг.ndmsv
stem.dict:диамагнетик.ndmsi диванчик.ndmsi диптих.ndmsi дистих.ndmsi диуретик.ndmsi дифтонг.ndmsi
words/words.36:дивани.=
words/words.6:диван.=
words/words.235:диванчик.=


Does it mean
A) the parser at http://sz.ru/parser/parse.pl is running a more advanced
version of the Russian grammar
OR
B) there is some LG magic about handling inflections?

Can anyone point me to the current maintainer of the Russian dictionary
and/or a corresponding reference?

Cheers,
-Anton


Amir Plivatsky

Jul 9, 2020, 4:56:30 PM
to link-grammar
Hi Anton,
Regarding A: I don't know, but in any case it is not related to what you observed here.
 
> B) there is some LG magic about handling inflections?

Yes. It tries to split each word into a stem and a suffix.
Stems are dictionary entries ending with ".=", while suffixes start with "=".
For диване it finds these possible splits:
   див.= =ане
   диван.= =е

(див.=, =ане, диван.=, and =е are all found in the dictionary.)
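
To illustrate the idea only: the following is a minimal Python sketch of stem/suffix enumeration, not the actual link-grammar tokenizer. The function name `candidate_splits` and the toy `STEMS` and `SUFFIXES` sets are made up for this example; the real dictionary holds far more entries and the real tokenizer handles more cases.

```python
def candidate_splits(word, stems, suffixes):
    """Return every (stem, suffix) pair whose concatenation equals word.

    Stems are rendered with a trailing ".=" and suffixes with a leading "=",
    mirroring the dictionary notation described above.
    """
    splits = []
    # Try every cut point that leaves a non-empty stem and a non-empty suffix.
    for i in range(1, len(word)):
        stem, suffix = word[:i], word[i:]
        if stem in stems and suffix in suffixes:
            splits.append((stem + ".=", "=" + suffix))
    return splits


# Toy entries (assumption): just enough to reproduce the example word.
STEMS = {"див", "диван", "пап"}
SUFFIXES = {"ане", "е", "а"}

print(candidate_splits("диване", STEMS, SUFFIXES))
```

With these toy sets, the word диване yields two candidates, one per stem that is a prefix of the word and is followed by a known suffix. Which candidate ends up in the parse is decided later by the grammar, not by the splitter.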

You can see these splits using the following debug arguments:

$ link-parser ru -v=6 -debug=flatten_wordgraph,print_sentence_word_alternatives

(See alt0 and alt1 of word7.)

You can also see the splits using the "word-graph" display:
$ link-parser ru -wordgraph=1

(You can enlarge the window as you would a browser window: stretch it, and zoom with Ctrl + mouse wheel or Ctrl-+.)

You can also see which splits were actually used by adding the -morphology flag.

> Can one point to the current maintainer of the Russian dictionary and/or
> corresponding reference?

See the morph.dict file.



Amir

Linas Vepstas

Jul 9, 2020, 8:22:01 PM
to link-grammar
What Amir said. You should explore the various flags and settings. You should see the following output:
$ link-parser ru
link-grammar: Info: Dictionary found at /usr/local/share/link-grammar/ru/4.0.dict
link-grammar: Error: Aspell: No word lists can be found for the language "ru".
link-grammar: Info: ru: Spell checker disabled.
link-grammar: Info: Dictionary version 5.3.15, locale ru_RU.UTF-8
link-grammar: Info: Library version link-grammar-5.8.0. Enter "!help" for help.
linkparser> папа сидел на диване
Found 1 linkage (1 had no P.P. violations)
    Unique linkage, cost vector = (UNUSED=0 DIS= 0.00 LEN=12)

     +----Sm3----+----Ew---+----Jp---+
     |           |         |         |
папа.nlmsi сидел.vnndpms на.jp диване.ndmsp

linkparser> !morph
Display word morphology turned on.
linkparser> папа сидел на диване
Found 1 linkage (1 had no P.P. violations)
    Unique linkage, cost vector = (UNUSED=0 DIS= 0.00 LEN=12)

          +------Sm3------+        +-------Jp------+
  +-LLAEY-+      +--LLAOC-+---Ew---+      +--LLAAQ-+
  |       |      |        |        |      |        |
пап.= =а.nlmsi си.= =дел.vnndpms на.jp диван.= =е.ndmsp

linkparser>

The second form shows the stem-suffix split that was chosen. The stem-suffix links always start with the two letters LL, and there are tens of thousands, perhaps hundreds of thousands, of them; the link names are auto-generated from the input sources.
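
As a small illustration of how the two displays relate, here is a hedged Python sketch that glues the morphology-mode tokens back into whole words. It is only a reading aid for the output above, not part of link-grammar; the helper name `join_morphemes` is hypothetical, and it assumes every "stem.=" token is immediately followed by its "=suffix" token.

```python
def join_morphemes(tokens):
    """Glue each 'stem.=' token to the '=suffix.subscript' token that follows.

    Returns a list of (word, subscript) pairs; tokens that are not split
    (such as 'на.jp') are passed through unchanged.
    """
    words = []
    pending_stem = None
    for tok in tokens:
        if tok.endswith(".="):
            # A stem: remember it until its suffix arrives.
            pending_stem = tok[:-2]
        elif tok.startswith("=") and pending_stem is not None:
            # A suffix: strip the leading "=" and separate the subscript.
            suffix, _, subscript = tok[1:].partition(".")
            words.append((pending_stem + suffix, subscript))
            pending_stem = None
        else:
            # An unsplit word with an optional subscript.
            word, _, subscript = tok.partition(".")
            words.append((word, subscript))
    return words


# Token line copied from the !morph display above.
line = "пап.= =а.nlmsi си.= =дел.vnndpms на.jp диван.= =е.ndmsp"
print(join_morphemes(line.split()))
```

The rejoined words and subscripts match the word line of the first (non-morphology) display of the same sentence.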

The parser at http://sz.ru/parser/parse.pl was an earlier version of the same system, and the current link-parser should be completely compatible with it, as both are built with the same scripts from the same source dictionaries... or at least, they were. The dictionaries in the current link-parser might be newer; I don't know if the scripts powering the web site at sz.ru were kept up to date.

The documentation for link types is here: http://sz.ru/parser/doc/ and the suffix documentation is here: http://sz.ru/parser/doc/morph/ Some of the link types for Russian and English are quite similar: for example, the meanings of W, M, J, S, SI, A, E, EA, and MV are more or less the same in English and Russian.

There is also more info here: http://slashzone.ru/parser/

-- Linas




--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva
