initial code based on PROIEL dependency analysis


James Tauber

Apr 12, 2010, 6:26:32 AM
to graded...@googlegroups.com
Until this weekend, all the GNT graded reader work I'd done used clause boundaries from OpenText.org.

With the availability of the PROIEL dependency tree analysis, I thought I'd give that a go.

I've uploaded to github code for extracting the clauses in John's Gospel and generating a very basic reading programme from that.

Clauses were extracted by finding each 'pred' arc and linearizing all nodes from that point down. Where preds were embedded, clauses were generated for both the inner and the outer pred.
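
To make the extraction step concrete, here is a minimal sketch in Python. It is not the code uploaded to github; the token tuple layout, the toy sentence, the relation labels other than 'pred', and the function name extract_clauses are all assumptions made for illustration. It also assumes token ids follow surface word order, so sorting the ids linearizes a subtree.

```python
from collections import defaultdict

# Toy data, NOT actual PROIEL annotation: (id, form, head_id, relation).
# head_id 0 marks the root; relation labels the arc coming from the head.
TOKENS = [
    (1, "οἶδα", 0, "pred"),
    (2, "ὅτι", 1, "comp"),
    (3, "ἀληθής", 4, "xobj"),
    (4, "ἐστιν", 2, "pred"),
]


def extract_clauses(tokens):
    """Return one list of forms per 'pred' arc, in surface order."""
    children = defaultdict(list)
    forms = {}
    for tok_id, form, head_id, _rel in tokens:
        children[head_id].append(tok_id)
        forms[tok_id] = form

    def subtree(node_id):
        # Collect the node itself and everything below it.
        ids = [node_id]
        for child in children[node_id]:
            ids.extend(subtree(child))
        return ids

    clauses = []
    for tok_id, _form, _head, rel in tokens:
        if rel == "pred":
            # Assumption: ids reflect word order, so sorting linearizes.
            clauses.append([forms[i] for i in sorted(subtree(tok_id))])
    return clauses


if __name__ == "__main__":
    for clause in extract_clauses(TOKENS):
        print(" ".join(clause))
```

With the toy sentence, the outer pred yields the whole sentence and the embedded pred yields the inner clause as a second, separate entry, which is the inner-and-outer behaviour described above.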

Note that the current code is based only on forms, with no use made of syntactic or morphological information. I also can't do inline replacement into an English context because I don't have an English text mapped to the PROIEL analysis.
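
As a rough illustration of what a forms-only reading programme could look like, the sketch below greedily picks, at each step, the clause that requires the fewest not-yet-seen forms. This greedy rule is an assumption for the sake of the example, not necessarily the ordering used in the uploaded code, and order_clauses is a hypothetical name.

```python
def order_clauses(clauses):
    """Greedy, forms-only ordering (an assumed strategy, for illustration).

    Each clause is a list of surface forms. Returns a list of
    (new_forms, clause) pairs in the order they would be presented.
    """
    known = set()
    remaining = list(clauses)
    programme = []
    while remaining:
        # Cheapest clause = fewest forms the reader has not met yet.
        best = min(remaining, key=lambda c: len(set(c) - known))
        new_forms = sorted(set(best) - known)
        programme.append((new_forms, best))
        known.update(best)
        remaining.remove(best)
    return programme


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a"], ["a", "b"]]
    for new_forms, clause in order_clauses(sample):
        print("learn", new_forms, "then read", " ".join(clause))
```

Because only surface forms are compared, two inflections of the same lemma count as two separate items to learn, which is the limitation noted above.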

However, my initial impression is that the PROIEL analysis will be preferable to work with moving forward.

James

Patrick Narkinsky

Apr 12, 2010, 7:41:30 AM
to graded...@googlegroups.com
James,

Could you clarify in what ways you see the PROIEL data being superior
to the OpenText data? One obvious one that leaps to mind is that
OpenText seems to be a dead project...

Thanks,

Patrick

--
Patrick Narkinsky
pat...@narkinsky.com

"Let things true be preferred to things false, things eternal to
things momentary, things useful to things agreeable."

Lucius Caelius Lactantius


jtauber

Apr 12, 2010, 8:15:01 AM
to graded-reader
PROIEL is actively maintained, is redistributable under a CC license, is
based on a freely redistributable text and is a less idiosyncratic
analysis.

Admittedly, I haven't spent THAT much time with it but it seems that
it will be easier to extract the kind of syntactic information I'm
interested in from it.

James


