Using FreeLing with R to process a corpus

Matías Guzmán

Nov 17, 2012, 4:45:49 AM
to corplin...@googlegroups.com
Hello all,

I need to lemmatize a corpus (in Spanish and English), find a series of sentences based on the lemmas, and then parse these sentences into trees. FreeLing can lemmatize and parse Spanish and English quite well, but there seems to be no implementation for R. I could call FreeLing with system("analyze [...]"), but that doesn't seem like a great solution. Does anyone know a better workaround?
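For the step of finding sentences by lemma, one option is to work from the analyzer's text output, however it gets invoked. The sketch below is in Python rather than R. It assumes the tagged output has one token per line as whitespace-separated `form lemma tag probability` fields, with a blank line after each sentence. The sample text is hand-written for illustration, not real analyzer output, and `parse_tagged` and `sentences_with_lemma` are hypothetical helper names.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    tag: str


def parse_tagged(output: str) -> List[List[Token]]:
    """Split tagged analyzer output into sentences of Token tuples.

    Assumes one token per line ("form lemma tag probability") and a
    blank line after each sentence.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            # Blank line: close the sentence in progress, if any.
            if current:
                sentences.append(current)
                current = []
        elif len(fields) >= 3:
            current.append(Token(fields[0], fields[1], fields[2]))
    if current:
        sentences.append(current)
    return sentences


def sentences_with_lemma(sentences: List[List[Token]], lemma: str) -> List[List[Token]]:
    """Keep only the sentences that contain at least one token with this lemma."""
    return [s for s in sentences if any(t.lemma == lemma for t in s)]


# Hand-written sample in the assumed format (not real analyzer output).
SAMPLE = """\
El el DA0MS0 1
gato gato NCMS000 1
come comer VMIP3S0 1
. . Fp 1

Los el DA0MP0 1
perros perro NCMP000 1
duermen dormir VMIP3P0 1
. . Fp 1
"""

hits = sentences_with_lemma(parse_tagged(SAMPLE), "comer")
for sentence in hits:
    print(" ".join(token.form for token in sentence))
```

The selected sentences could then be written back out as plain text and passed through the parser in a second run.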

Thanks,

Matías Guzmán

Kevin Parent

Nov 17, 2012, 5:27:28 AM
to corplin...@googlegroups.com
Unless a lemmatizing package has appeared since the last time I had to do this and couldn't (a few years ago), R doesn't really have the ability. So I believe you either do it yourself or use FreeLing. I asked a question about lemmatizing a while back. Doing it yourself takes a little effort, as you basically have to build the lemma tables by hand, but all in all it's not that hard and not as long a process as you might first assume, and of course you can reuse the tables for any other project. Searching the archives will bring that conversation up.
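As a rough illustration of the do-it-yourself route, here is a minimal sketch in Python (not R) of a lemma table used as a lookup. The tab-separated `form<TAB>lemma` layout, the tiny table, and the names `load_lemma_table` and `lemmatize` are all assumptions made for the example, not anything from the earlier conversation.

```python
import csv
import io
from typing import Dict, Iterable, List


def load_lemma_table(handle) -> Dict[str, str]:
    """Read tab-separated 'form<TAB>lemma' rows into a lookup dict."""
    table: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            # Store forms lowercased so lookups are case-insensitive.
            table[row[0].lower()] = row[1]
    return table


def lemmatize(tokens: Iterable[str], table: Dict[str, str]) -> List[str]:
    """Look each token up; unknown forms fall back to the lowercased token."""
    return [table.get(tok.lower(), tok.lower()) for tok in tokens]


# Tiny made-up table; a real one would be read from a file with open(...).
TABLE_TEXT = "went\tgo\ngoes\tgo\nmice\tmouse\n"
table = load_lemma_table(io.StringIO(TABLE_TEXT))
print(lemmatize(["Mice", "went", "home"], table))
```

A flat table like this maps each form to a single lemma, so ambiguous forms (Spanish "fue" can belong to either "ir" or "ser") cannot be resolved without part-of-speech information.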




--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To post to this group, send email to corplin...@googlegroups.com.
To unsubscribe from this group, send email to corpling-with...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/corpling-with-r?hl=en.



--
Kevin Parent, Ph.D, ACS, ALB
Korea Maritime University
Chair, Korea Toastmasters Territorial Council

Schoolmasters: https://sites.google.com/site/schoolmasterstemp/
National Korea Toastmasters: www.koreatm.org



Pep Vallbé

Nov 17, 2012, 12:54:22 PM
to corplin...@googlegroups.com
Hi, as far as I know there's no implementation in R for that and you might have to use them separately. I'm not sure whether Python would do for you, but in that language there are good applications for NLP (e.g., http://nltk.org/) and statistical analysis (e.g., http://pandas.pydata.org/). 

cheers!

pep
Pep Vallbé


Earl Brown

Dec 29, 2012, 5:47:08 PM
to corplin...@googlegroups.com
I don't know if TreeTagger:

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

can do all you need, but there is an R package called "koRpus" that can interface with it:

http://cran.r-project.org/web/packages/koRpus/index.html

That's what I use to do my POS tagging and lemmatization in Spanish. It also has support for English and other languages.

Best, Earl Brown

Matías Guzmán

Dec 29, 2012, 6:14:09 PM
to corplin...@googlegroups.com
Thanks Earl, but I already did the work in Java; R is too slow for what I wanted anyhow ;)



Earl Brown

Oct 17, 2013, 5:34:14 PM
to corplin...@googlegroups.com
Matías and/or others, I'd love to use FreeLing more for my lemmatization and tagging in Spanish, as it is far more robust than TreeTagger. However, I'm running into the trouble Matías seemed to refer to when using system("analyze ...") in R, including error messages like this:

/usr/local/bin/analyze: line 39:   363 Segmentation fault: 11  $FREELING/bin/analyzer $param
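When an external analyzer is driven from a script, it helps to detect this kind of crash explicitly instead of silently reading empty output. Below is a minimal sketch in Python (not R), using only the standard `subprocess` module. The helper name `run_analyzer` is hypothetical, and a stand-in command replaces the real `analyze` call so the example runs without FreeLing installed. It relies on two conventions: Python reports a child killed by a signal as a negative return code, and a shell wrapper script typically reports it as 128 plus the signal number (139 for a segmentation fault).

```python
import subprocess
import sys
from typing import List


def run_analyzer(command: List[str], text: str, timeout: float = 60.0) -> str:
    """Feed text to an external command on stdin and return its stdout.

    Raises RuntimeError when the command exits abnormally.
    """
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        timeout=timeout,
    )
    code = result.returncode
    if code < 0:
        # The child itself was killed by a signal.
        raise RuntimeError(f"command killed by signal {-code}")
    if code > 128:
        # Shell-style status: a wrapped process probably died from a signal.
        raise RuntimeError(
            f"command exited with status {code}, likely signal {code - 128}"
        )
    if code != 0:
        raise RuntimeError(f"command failed with status {code}: {result.stderr.strip()}")
    return result.stdout


# Stand-in for the real analyzer: a child Python process that upper-cases stdin.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_analyzer(STAND_IN, "hola mundo"))
```

In real use, `command` would be the analyzer invocation with whatever arguments the installation requires; those are deliberately left out here.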

My question is more of a point-me-in-the-right-direction type of question than a question about code:
What resources would help me gain (enough) proficiency in Java to be able to use FreeLing?

I bought:

Hammond, Michael. 2002. Programming for Linguists: Java Technology for Language Research. Blackwell.

and it helped me learn about classes, objects, and methods, but I feel I need more on how to call FreeLing, which is written in C++, from Java, or even from within R with the rJava package or, better yet, the Rcpp package.

Thanks for your suggestions. Earl Brown

Matías Guzmán Naranjo

Oct 17, 2013, 5:42:14 PM
to corplin...@googlegroups.com
Hey Earl,

I actually know enough Java for FreeLing, and did use it that way. But if you don't, the Python API is great and it's easier to learn. I just liked Java's speed; Python can be a bit slow on large corpora. I would say that some 3-6 months in either language should be enough.

Best, Matías

