NELL for Russian Language


Kseniya Buraya

Aug 22, 2017, 2:37:49 PM
to NELL: Never-Ending Language Learner
Hi, all!

I'm a second-year master's student and a researcher at the Machine Learning Research Group (ITMO University, Russia).
This year we made an attempt to adapt the CPL component of the NELL system to morphologically rich languages, namely Russian. We published a paper about our first steps in this work: https://goo.gl/q1xkaF.

Our CPL component currently works with a very small ontology (just over 20 categories), and we evaluate its quality on Russian Wikipedia articles. As our next step, we want to train our CPL component on the initial English ontology translated into Russian. Unfortunately, our CPL is very slow. It is implemented in Python, and we have problems running it on large text corpora and large ontologies.

We are very interested in continuing this work and would like to ask for help and collaboration. Is there any description of the architecture of the CPL component (its source code and the overall implementation)? Is there any chance for us to collaborate with the original NELL creators and add the Russian language to the system?

Thanks!
Best wishes,
Kseniya.

Bryan Kisiel

Sep 27, 2017, 1:25:43 PM
to NELL: Never-Ending Language Learner
Hi Kseniya,

Sorry to take so long to get back to you -- people travelling during the
summer and then starting a new semester led to some lapses in coordination.
This is an interesting paper. Indeed, applying NELL to additional
languages is a key area of interest for us, and the team here at CMU was
unaware that you'd already been in contact with some of our Brazilian
collaborators who have been spearheading most of the work on multilingual
NELL. As you might already know, for the languages we've used so far,
what we usually do is have a number of filters for possible patterns and
their arguments based on POS tagging, although the POS requirements are
not specific to each pattern as in your approach. I suspect your work
could improve precision in other languages as well, since the existing CPL
does commonly make similar kinds of errors in other languages we've tried.
The best defense NELL has against that for now is relying on lack of
corroboration from other learning algorithms helping to filter out such
errors.
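
To make the distinction concrete, here is a rough sketch of a global POS
filter next to per-pattern POS requirements. It is only an illustration,
not NELL's actual code: the tag names, the example pattern, and the function
names are all made up, and tokens are assumed to arrive already POS-tagged.

```python
from typing import Dict, List, Set, Tuple

# A token is a (word, POS tag) pair; tagging is assumed to happen upstream.
Token = Tuple[str, str]

# Hypothetical global filter: the same allowed tags for every pattern's argument.
GLOBAL_ARG_TAGS: Set[str] = {"NOUN", "PROPN", "ADJ"}

# Hypothetical per-pattern requirements on the last token of the argument.
PATTERN_ARG_TAGS: Dict[str, Set[str]] = {
    "cities such as _": {"PROPN"},
}


def passes_global(argument: List[Token]) -> bool:
    """Accept a non-empty argument whose tokens all carry an allowed tag."""
    return bool(argument) and all(tag in GLOBAL_ARG_TAGS for _, tag in argument)


def passes_pattern(pattern: str, argument: List[Token]) -> bool:
    """Apply the global filter, then any requirement tied to this pattern."""
    if not passes_global(argument):
        return False
    required = PATTERN_ARG_TAGS.get(pattern)
    if required is None:
        return True  # no pattern-specific requirement registered
    return argument[-1][1] in required


candidate = [("big", "ADJ"), ("things", "NOUN")]
print(passes_global(candidate))                       # global filter alone
print(passes_pattern("cities such as _", candidate))  # with per-pattern rule
```

In this toy case the global filter accepts "big things" as an argument, while
the per-pattern requirement rejects it for "cities such as _".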

The only published description of CPL more detailed than the one in our
AAAI10 paper, which it seems you've already found, is in Andrew Carlson's
thesis work, linked from http://www.cs.cmu.edu/~acarlson/. But
we've made a number of small tweaks beyond that over the years. We've
also tried alternate CPL implementations since then, and in fact we're
using a variant we internally call "CPL2" for category learning that uses
some fundamentally different formulas for scoring patterns and instances.
The main intentions for CPL2 are to improve the recall/precision tradeoff,
and to be more responsive to human feedback. This is ongoing work, so we
haven't published anything about it yet.

If you want to email me directly, I can share some additional details with
you, and we can explore the possibility of further collaboration.

bki...@cs.cmu.edu