I'm basically attempting to reproduce that work. Now to the real
problem: I am in need of a parser to extract verb-subject/object pairs
from texts. One of the professors here directed me to Link Grammar (
http://www.link.cs.cmu.edu/link/ ), which does what I would need...
except that it is written in C. I've meddled with the API a bit, but
to my dismay I discovered that (mostly) all my knowledge of C
programming has waned, to the point of not even being able to get
their examples to work (in my defense, I'm _really_ good with Ruby and
Smalltalk now :-D ). Still, I am in need of a parser. Now, the forum
for Link Grammar indicates (
http://hartford.lti.cs.cmu.edu/linkparser/phorum/read.php?1,7 ) that
there is a Perl interface available for it, so I said to myself "Hey,
why not see if there is something similar available in Ruby?" I looked
at RAA, but could not find anything related (perhaps I overlooked some
category?). So the question at hand is: Is there any English text
parser available in Ruby that would (somewhat) match my needs? I
_could_ write my programs in Perl, but I'd much rather use the one
language I've come to love: Ruby.
Cheers...
C.W.S.
> Greetings!
Moin.
> I am in need of a parser to extract verb-subject/object pairs
> from texts. One of the professors here directed me to Link Grammar (
> http://www.link.cs.cmu.edu/link/ ), which does what I would need...
> except that it is written in C. I've meddled with the API a bit, but
> to my dismay I discovered that (mostly) all my knowledge of C
> programming has waned, to the point of not even being able to get
> their examples to work [...] Is there any English text
> parser available in Ruby that would (somewhat) match my needs?
While I can't say for certain whether something like that already
exists, Ruby/DL might help a lot with writing an interface to
the C library in pure Ruby. It's part of standard Ruby and there's quite
a lot of documentation available on the web.
> Is there any English text
> parser available in Ruby that would (somewhat) match my needs? I
> _could_ write my programs in Perl, but I'd much rather use the one
> language I've come to love: Ruby.
Hello Claus,
I have hacked together a small proof-of-concept extension that binds to
LinkParser. It is far from complete, but you can download it from here:
www.tua.ch/ruby/link/050121-link-4.1b.tar.gz
I would like to maintain this library, although a first version will not
be out before next month. Please send me whatever changes you make, via
'darcs' or as a unified diff.
To get started, run 'make' on the link package itself. Then go to the obj
directory and run 'ar r liblink.a *.o'; copy the resulting library into
your library path.
Then change to the /ext directory, run 'ruby extconf.rb' and then 'make',
'make install'.
You should now be able to run tc_linkparser.rb in /tests.
Don't expect too much, it's only a base for what is to be. But look on
the bright side, you get to decide on the API ;). Plus it gets you
started with C again.
Hope this helps,
kaspar
hand manufactured code - www.tua.ch/ruby
> Again, many thanks for the quick assistance.
Tell me if you need it, the API took quick shape after another few hours of
'hack mode' just before the we..
Note that whatever I've written is under the Ruby license, but the other
stuff is under a GPL license.
www.tua.ch/ruby/link/050128-link-4.1b-for-ruby.tgz
This is probably going to be a real release sometime soon.
best regards,
> I'll give it
> another try once I get access to a clean machine.
Hey, just tell me if you need help on this. You are the only user as far as
I can tell (that is: until I figure out something useful for this tool).
Performance issues? Isn't the thing rather well optimized? I wonder what
you are running through Lingua::LinkParser to get these issues...
----8<----
baron@daedalus:~/tmp/link-4.1b-ruby/tests$ ruby tc_linkparser.rb
Loaded suite tc_linkparser
Started
Opening ./4.0.dict
F Opening ./4.0.dict
E
Finished in 0.010642 seconds.
1) Failure:
test_basic(TestLinkParser) [tc_linkparser.rb:30]:
Exception raised:
Class: <StandardError>
Message: <"Could not find dictionary.">
---Backtrace---
tc_linkparser.rb:16:in `initialize'
tc_linkparser.rb:16:in `new'
tc_linkparser.rb:16:in `initialize'
tc_linkparser.rb:10:in `new'
tc_linkparser.rb:10:in `new'
/usr/lib/ruby/1.8/singleton.rb:95:in `instance'
/usr/lib/ruby/1.8/singleton.rb:84:in `instance'
tc_linkparser.rb:20:in `dict'
tc_linkparser.rb:31:in `test_basic'
tc_linkparser.rb:30:in `assert_nothing_raised'
tc_linkparser.rb:30:in `test_basic'
---------------
2) Error:
test_link_each(TestLinkParser):
StandardError: Could not find dictionary.
tc_linkparser.rb:16:in `initialize'
tc_linkparser.rb:16:in `new'
tc_linkparser.rb:16:in `initialize'
tc_linkparser.rb:10:in `new'
tc_linkparser.rb:10:in `new'
/usr/lib/ruby/1.8/singleton.rb:95:in `instance'
/usr/lib/ruby/1.8/singleton.rb:84:in `instance'
tc_linkparser.rb:20:in `dict'
tc_linkparser.rb:65:in `test_link_each'
2 tests, 1 assertions, 1 failures, 1 errors
baron@daedalus:~/tmp/link-4.1b-ruby/tests$ ls
4.0.affix 4.0.constituent-knowledge 4.0.knowledge tiny.dict
. 4.0.batch 4.0.dict tc_linkparser.rb words
baron@daedalus:~/tmp/link-4.1b-ruby/tests$
----8<----
I have two guesses as to what the problem is:
- I copied liblink.a to the wrong directory(ies), or
- 4.0.dict needs to be in the same directory as the library.
I haven't investigated it further, though, for unrelated personal reasons.
The performance issues themselves are not with Lingua::LinkParser, but
with Link Grammar itself. They show up when running it on the sentences
in http://notpublic.wrong.button.com/sent.txt , which a friend extracted
from a NYT article. Some time out, others just fail, and on one Link
Grammar just refuses to work because the sentence is longer than 70
words! Fortunately for me, the first phase of my work does not require
_all_ sentences to be parsed - just enough to extract a sufficient
number of VS/VO pairs (~40000).
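Since not every sentence has to be parsed, one option is to drop the
over-long ones before they ever reach the parser. A minimal sketch,
assuming the 70-word limit mentioned above; the names MAX_WORDS and
partition_by_length are made up for illustration, and splitting on
whitespace is only a rough stand-in for the parser's own word counting:

```ruby
# Hypothetical limit, taken from the 70-word failure described above.
MAX_WORDS = 70

# Split the input into sentences short enough to hand to the parser
# and sentences that are skipped up front.
# Word count is approximated by splitting on whitespace.
def partition_by_length(sentences, max_words = MAX_WORDS)
  sentences.partition { |s| s.split.size <= max_words }
end

parseable, skipped = partition_by_length([
  "The cat chased the mouse.",
  (["word"] * 71).join(" ")   # an artificial 71-word "sentence"
])
puts "#{parseable.size} to parse, #{skipped.size} skipped"
```

The skipped list is kept rather than thrown away so that it is easy to
see how much of the corpus is lost to the length limit.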
Lastly: I've been in contact with Martin Chase, who worked on the
Linguistics and LinkParser ports
(http://raa.ruby-lang.org/list.rhtml?name=linkparser and
http://www.deveiate.org/code/linguistics-overview.html - a Ruby gem is
also available for the latter). I plan to use Linguistics because I
also need to make use of WordNet. So far LinkParser is unoptimized (to
the point that pruning, and probably memoization too, is a necessity),
and it needs the words in the sentence to be tagged if used with
4.0.dict. The plan I have so far is to use your Link Grammar wrapper
instead of LinkParser for the Linguistics module, since it is
considerably faster.
Cheers!
-CWS
> 1) Failure:
> test_basic(TestLinkParser) [tc_linkparser.rb:30]:
> Exception raised:
> Class: <StandardError>
> Message: <"Could not find dictionary.">
Yeah, this looks as though the dictionary could not be found. I usually run
the tests directly from the tests directory, where I also have all those
dict files. If that does not work (and it certainly seems not to), you can
try setting the DICTPATH environment variable.
I was also looking at how to give Dictionary.new an optional argument
which would be the path to the dictionaries.
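Roughly, the lookup order I have in mind could be sketched in plain Ruby
like this. It is not the extension's code: find_dict_dir is a
hypothetical helper, and treating DICTPATH as a list of directories
separated by the platform's path separator is an assumption here:

```ruby
# Hypothetical helper, not part of the extension: return the first
# directory that contains the dictionary file, or nil if none does.
# Order: explicit argument, then DICTPATH, then the current directory.
def find_dict_dir(explicit = nil, dict_name = "4.0.dict", env = ENV)
  candidates = []
  candidates << explicit if explicit
  # Assumption: DICTPATH holds one or more directories, separated like PATH.
  if env["DICTPATH"]
    candidates.concat(env["DICTPATH"].split(File::PATH_SEPARATOR))
  end
  candidates << Dir.pwd
  candidates.find { |dir| File.file?(File.join(dir, dict_name)) }
end

dir = find_dict_dir
puts(dir ? "4.0.dict found in #{dir}" : "4.0.dict not found")
```

Running this from the tests directory is a quick way to check whether
the dictionary files are where the test case expects them.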
Sure hope we can sort this out. And yeah, the Ruby version seems to be very
slow - I don't really understand why this should be rewritten.