Re: [opencog-dev] hello

Linas Vepstas

Aug 16, 2011, 2:12:52 PM

to Ben Goertzel, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

On 16 August 2011 12:32, Ben Goertzel <b...@goertzel.org> wrote:
>
> One project that has never been done, and is fairly good for a
> beginner I think, is to try to test how well RelEx works relative to
> other parsers...
[...]
> The dependency relations used in that tool's output are probably not
> identical to RelEx's, but I suspect a mapping is possible, especially
> since RelEx now has a "compatibility mode" that gives Stanford Parser
> style output...

FWIW, relex now has a unit test for the "stanford compatibility mode",
covering about 50 sentences (and they all pass). Run it by saying
"ant test" or "ant check".

While I put this together, I discovered that the stanford parser was far
from perfect; when the two parsers disagreed, it was often the case that
stanford just did something crazy. My gut impression was that it made
errors about twice as often as relex, but I did not try to measure an
error rate.

One big problem is that different parsers seem to generate different
kinds of output, making direct comparison difficult or impossible :-(
This is why I created the "compatibility mode".
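
Once both parsers emit Stanford-style lines such as nsubj(ran-2, John-1), a comparison reduces to set operations on triples. The following is only a rough sketch of that idea: the function names and the sample relations are made up for illustration and are not part of relex or the Stanford parser.

```python
import re

# Matches one Stanford-style dependency line, e.g. "nsubj(ran-2, John-1)".
DEP_RE = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")

def parse_deps(text):
    """Turn lines of 'rel(gov-i, dep-j)' into a set of (rel, i, j) triples."""
    triples = set()
    for line in text.strip().splitlines():
        m = DEP_RE.match(line.strip())
        if m is None:
            continue  # skip blank or malformed lines
        rel, _gov, gov_idx, _dep, dep_idx = m.groups()
        triples.add((rel, int(gov_idx), int(dep_idx)))
    return triples

def disagreements(a_text, b_text):
    """Return (triples only in a, triples only in b), each sorted."""
    a, b = parse_deps(a_text), parse_deps(b_text)
    return sorted(a - b), sorted(b - a)
```

Listing the triples that only one parser produced does not say which parser is right, but it does narrow down the sentences worth inspecting by hand.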

--linas

Ben Goertzel

Aug 16, 2011, 2:18:58 PM

to linasv...@gmail.com, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

Interesting, Linas...

So, my suggested project for the student interested in an NLP project,
was to use some scripts to transform a large corpus into dependency
form (tweaked to agree with Stanford output format, say), and then
test RelEx vs. Stanford on this large dependency corpus...

If RelEx really does work better, that's a nice result...

If not, it tells us some places we need to improve RelEx...
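
One way such a test could be scored is with unlabeled and labeled attachment scores, sketched here under the assumption that both the gold corpus and the parser output have been reduced to one (head, label) pair per token. The data layout and the function name are assumptions for illustration; RelEx output may need extra massaging to fit this shape.

```python
def attachment_scores(gold, predicted):
    """gold, predicted: dicts mapping token index -> (head index, label).

    Returns (UAS, LAS): the fraction of gold tokens whose head matches,
    and the fraction whose head and label both match.
    """
    if not gold:
        raise ValueError("gold must contain at least one token")
    head_ok = 0
    both_ok = 0
    for tok, (g_head, g_label) in gold.items():
        p = predicted.get(tok)
        if p is None:
            continue  # a token the parser left unattached counts as wrong
        p_head, p_label = p
        if p_head == g_head:
            head_ok += 1
            if p_label == g_label:
                both_ok += 1
    n = len(gold)
    return head_ok / n, both_ok / n
```

Running the same function over the output of each parser gives two pairs of numbers that can be compared directly.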

- Ben

--
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
CTO, Genescient Corp
Vice Chairman, Humanity+
Adjunct Professor of Cognitive Science, Xiamen University, China
Advisor, Singularity University and Singularity Institute
b...@goertzel.org

"My humanity is a constant self-overcoming" -- Friedrich Nietzsche

Linas Vepstas

Aug 16, 2011, 2:42:45 PM

to Ben Goertzel, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

On 16 August 2011 13:18, Ben Goertzel <b...@goertzel.org> wrote:
> Interesting, Linas...
>
> So, my suggested project for the student interested in an NLP project,
> was to use some scripts to transform a large corpus into dependency
> form (tweaked to agree with Stanford output format, say), and then
> test RelEx vs. Stanford on this large dependency corpus...

I'm not sure, but this "tweaking" may be non-trivial. For example,
I found that sometimes stanford made distinctions where relex didn't,
and vice versa, so much of my work was reproducing these kinds of quirks.
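
One common way to cope with a parser drawing distinctions that the other does not is to collapse both label sets to a shared, coarser set before comparing. The mapping below is purely illustrative: nsubj, nsubjpass, dobj and iobj are Stanford relation names, but which relations would actually need merging for relex is not stated here and would have to be worked out case by case.

```python
# Illustrative only: fine-grained label -> shared coarse label.
COARSE = {
    "nsubj": "subj",
    "nsubjpass": "subj",
    "dobj": "obj",
    "iobj": "obj",
}

def normalize(triples, mapping=COARSE):
    """Rewrite the label of each (rel, gov, dep) triple via the mapping.

    Labels that are not in the mapping pass through unchanged.
    """
    return {(mapping.get(rel, rel), gov, dep) for rel, gov, dep in triples}
```

The cost is that a collapsed comparison can no longer detect errors in exactly the distinctions that were merged away.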

I assume that other corpora will have similar characteristics.

> If RelEx really does work better, that's a nice result...
>
> If not, it tells us some places we need to improve RelEx...

Well, I have a huge number of test sentences that are parsed
incorrectly; there's no shortage of things to fix :-)

The most urgent work that remains undone is to do the relex
side of the new/improved conjunction handling. When adding
conjunction support, I added some 6 or so new link types to link-grammar.
Relex doesn't know about any of them, and so now it will mangle
parses with conjunctions.

One of the places where relex and stanford were "fundamentally
incompatible" was exactly in the handling of conjunctions. The
new link-grammar conjunction links fix this, but the rules needed
to generate the correct output are still missing.

I think that fixing/finishing this might be the most important thing to
hack on in relex these days.

--linas

Ben Goertzel

Aug 16, 2011, 2:47:50 PM

to linasv...@gmail.com, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

Thanks Linas, that is a good suggestion.

So for G Meera: perhaps Linas's suggestion would be a good place to
start. The task would be to modify RelEx to make use of the link
parser's new conjunction handling.... He could provide specific
guidance on this as needed...

Following this, testing against a larger corpus as I suggested would
be valuable..

And all this will help NLGen as well as RelEx...

-- Ben G

Linas Vepstas

Aug 16, 2011, 3:13:46 PM

to Ben Goertzel, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

On 16 August 2011 13:47, Ben Goertzel <b...@goertzel.org> wrote:
>
> Following this, testing against a larger corpus as I suggested would
> be valuable..

Simply finding one that is compatible or almost-compatible, and determining
just how different it is, would be useful!

But yes, I think that fixing/finishing conjunction handling would be best.

--linas

Ben Goertzel

Aug 16, 2011, 3:24:54 PM

to linasv...@gmail.com, ope...@googlegroups.com, joel...@gmail.com, 练睿婷, link-grammar

Well what I found was a script for converting the Penn Treebank and
Brown Corpus to dependency-grammar format...

The task then becomes to try to tweak this script to give
Stanford-parser-style output, I guess...
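
As a rough idea of what that tweak might look like: assuming (and this is a guess, since the script's actual output format is not given here) that it emits CoNLL-X style tab-separated rows with the token index in column 1, the word in column 2, the head index in column 7 and the relation in column 8, the rewrite into rel(gov-i, dep-j) lines is mostly bookkeeping. The function name is made up for illustration, and the harder part, making the relation labels themselves agree with Stanford's, is not addressed.

```python
def conll_to_stanford(block):
    """Convert one sentence of CoNLL-X style rows to Stanford-style lines.

    Assumes tab-separated columns: ID, FORM, ..., HEAD (7th), DEPREL (8th).
    A head index of 0 is rendered as the artificial ROOT node.
    """
    rows = [line.split("\t") for line in block.strip().splitlines()
            if line.strip()]
    # Look up word forms by token index so heads can be printed by name.
    forms = {0: "ROOT"}
    for cols in rows:
        forms[int(cols[0])] = cols[1]
    out = []
    for cols in rows:
        idx, form = int(cols[0]), cols[1]
        head, rel = int(cols[6]), cols[7]
        out.append(f"{rel}({forms[head]}-{head}, {form}-{idx})")
    return out
```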