lxxmorph with better lemmas


James Cuénod

May 21, 2016, 5:51:37 PM
to Open Scriptures
The CCAT files are a great source of morphological data for the LXX, but "verbal prefixes are separated from the root," so their lemmas are not as helpful (see http://ccat.sas.upenn.edu/gopher/text/religion/biblical/lxxmorph/*ReadMe.Analysis). Is there a good LXX source with better lemmatization?

David Troidl

May 22, 2016, 11:44:16 AM
to openscr...@googlegroups.com

I have done considerable work on the lemmas, not only on the split-apart ones but also on inconsistencies.  The same word was given different lemmas in different places, some of them obvious misspellings, others just different.  I have all the data recorded in a set of OSIS files, which I would be happy to share.  The only problem is that the CCAT license is rather restrictive, so I'm not sure how I could share them properly.

David

--
You received this message because you are subscribed to the Google Groups "Open Scriptures" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openscripture...@googlegroups.com.
To post to this group, send email to openscr...@googlegroups.com.
Visit this group at https://groups.google.com/group/openscriptures.
For more options, visit https://groups.google.com/d/optout.

James Cuénod

May 22, 2016, 4:11:36 PM
to Open Scriptures
Hi David, thanks. Yes, I didn't realise how restrictive that license is. Is there a better source to use?

David Troidl

May 23, 2016, 9:26:13 AM
to openscr...@googlegroups.com

I don't really know of a better source.  That's why I continue to work with the CCAT data.

Teus Benschop

May 24, 2016, 2:00:32 AM
to openscr...@googlegroups.com
Would it be a right thing to do to share the modifications on the CCAT data as a set of patch files? Then one could use the original CCAT files, apply the shared patch files, and have the improved data as a result.
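
As a rough illustration of the patch-file idea, here is a minimal Python sketch that uses the standard library's difflib to produce a unified diff between an original file and a corrected copy. The sample lines, the file name gen.txt, and the helper make_patch are all hypothetical; they only imitate the general look of CCAT data and are not taken from the actual files.

```python
import difflib

# Hypothetical stand-ins for an original file and a locally corrected copy.
original = [
    "E)PEFE/RETO V1 IMI3S FE/RW E)PI\n",
    "A)BUSSOU N2 GSF A)/BUSSOS\n",
]
corrected = [
    "E)PEFE/RETO V1 IMI3S E)PIFE/RW\n",
    "A)BUSSOU N2 GSF A)/BUSSOS\n",
]


def make_patch(before, after, name):
    """Return a unified diff (as one string) that turns `before` into `after`."""
    return "".join(
        difflib.unified_diff(before, after, fromfile=name + ".orig", tofile=name)
    )


patch_text = make_patch(original, corrected, "gen.txt")
print(patch_text)
```

Keep in mind that a unified diff quotes the removed lines and a few surrounding context lines of the original, so the patch is not entirely free of the source text.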

David Troidl

May 24, 2016, 9:51:39 AM
to openscr...@googlegroups.com

That sounds like an excellent idea.  I am not an expert programmer.  My first thought would be to make a complete replacement list, giving the original lemma and its replacement, whether there are changes or not.  If there is a better way, I am open to suggestions.
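
A complete replacement list of this kind could be as simple as a two-column, tab-separated file. The Python sketch below shows one possible shape; the column layout, the sample entries, and the function names are assumptions made for illustration, not an existing format.

```python
# Hypothetical replacement list: original lemma, TAB, replacement lemma.
# Unchanged lemmas are listed too, so that every lemma has an entry.
REPLACEMENTS = "FE/RW E)PI\tE)PIFE/RW\nA)/BUSSOS\tA)/BUSSOS\n"


def load_replacements(text):
    """Parse 'original<TAB>replacement' lines into a dictionary."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        original, replacement = line.split("\t")
        table[original] = replacement
    return table


def replace_lemma(lemma, table):
    """Look up the replacement; a KeyError exposes a lemma missing from the list."""
    return table[lemma]


table = load_replacements(REPLACEMENTS)
print(replace_lemma("FE/RW E)PI", table))
```

Because unchanged lemmas are listed as well, a failed lookup signals a lemma that was never reviewed rather than one that needs no change.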

The other question is, my lemmas are in Unicode polytonic Greek.  Would the patch file be better taking the original betacode CCAT lemmas, or their Unicode forms?  I also use real polytonic Greek, not the NFC mutilation.  Would that be a problem?
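
One likely reading of the NFC concern is that normalization replaces the polytonic oxia characters from the Greek Extended block with their tonos counterparts. The Python sketch below shows the effect with the standard unicodedata module, and one way a merge tool could compare lemmas without altering the stored text. The helper same_lemma is a hypothetical name.

```python
import unicodedata

# The same word with two different precomposed accented omicrons.
oxia = "λ\u1f79γος"   # U+1F79, omicron with oxia (Greek Extended block)
tonos = "λ\u03ccγος"  # U+03CC, omicron with tonos (Greek and Coptic block)


def same_lemma(a, b):
    """Compare two lemmas by their fully decomposed (NFD) forms only."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(oxia == tonos)                                # raw strings differ
print(unicodedata.normalize("NFC", oxia) == tonos)  # NFC folds oxia into tonos
print(same_lemma(oxia, tonos))                      # comparison ignores the difference
```

Normalizing only at comparison time leaves the lemma file itself in whatever form its author prefers.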

David

Teus Benschop

May 25, 2016, 12:41:46 PM
to openscr...@googlegroups.com
I didn't realize that the two formats are so different. Looking at the CCAT files, and thinking of how different an XML file looks, it makes no sense to use patch files to transform the one into the other.

It would make more sense to write a program that takes the CCAT data and creates an XML file out of it, and then to apply your patches on top of that XML (OSIS) file.

In any case, the patch files can deal with the original beta codes as well as with the polytonic Greek. The patch system is unaware of whatever script is used.
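
A conversion program along these lines could look roughly like the Python sketch below. It assumes a simplified, tab-separated input (surface form, parse code, lemma), which is not the real CCAT column layout, and it borrows the w element with lemma and morph attributes from OSIS without claiming to produce a complete or valid OSIS document. The function names and the sample token are illustrative.

```python
import xml.etree.ElementTree as ET


def line_to_w(line):
    """Turn one 'surface<TAB>morph<TAB>lemma' line into a <w> element."""
    surface, morph, lemma = line.rstrip("\n").split("\t")
    w = ET.Element("w", {"lemma": lemma, "morph": morph})
    w.text = surface
    return w


def lines_to_verse(osis_id, lines):
    """Wrap the converted words of one verse in a <verse> element."""
    verse = ET.Element("verse", {"osisID": osis_id})
    for line in lines:
        verse.append(line_to_w(line))
    return verse


# Made-up token in a CCAT-like style, with the preverb after the root.
verse = lines_to_verse("Gen.1.2", ["E)PEFE/RETO\tV1 IMI3S\tFE/RW E)PI\n"])
xml_text = ET.tostring(verse, encoding="unicode")
print(xml_text)
```

Corrections can then be applied to the generated XML, in beta code or in Unicode, since a text patch treats both as plain characters.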

Teus.

p.juanmig...@gmail.com

Aug 15, 2016, 2:00:46 AM
to Open Scriptures
Hello. I saw this post by chance, and I got interested because I am working on an analytic Greek OT including Apocrypha/Deuterocanonicals, which is not available anywhere. David, I was wondering if you could share your OSIS files with me, they would be a great help.

Thank you very much. Blessings,
Juan.

David Troidl

Aug 15, 2016, 8:20:19 AM
to openscr...@googlegroups.com

Hello Everyone,

My apologies to those who corresponded with me earlier on this thread.  I have been hesitating to release, because the resources are in a state of flux.  The more I work on revising my Septuagint translation of Genesis, the more I see a need for revising the Greek Word List accordingly.  Some of the definitions are relics from very old work, others are up to the minute.  Treat them accordingly.

I have finally decided to commit to a release.  Please excuse any imperfections.  There is a newly created repo for GreekResources at https://github.com/openscriptures/GreekResources.  The available readme files should explain the contents.  Please refer any questions or difficulties back to this list.

Due to the restrictive nature of the CCAT license, I am not able to release my own local files of the Septuagint.  My work on the lemmas is available in the LxxLemmas directory, with its own readme.  Merging them into the CCAT files, or any files derived from them, should be straightforward.  Note that books whose source is split into two separate files, such as Genesis, will have to be merged to match the lemma files.
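
For anyone scripting the merge, the Python sketch below shows the general idea under a strong assumption: that a lemma file supplies exactly one lemma per token, in the same order as the source. The readme in the LxxLemmas directory is the authority on the real format. The sample tokens and the function merge_lemmas are hypothetical.

```python
def merge_lemmas(source_parts, lemmas):
    """Join split source files, then pair each token with its revised lemma.

    source_parts: one list of (surface, morph) tuples per source file.
    lemmas: one revised lemma per token across the whole book.
    """
    tokens = [token for part in source_parts for token in part]
    if len(tokens) != len(lemmas):
        raise ValueError(
            f"{len(tokens)} tokens but {len(lemmas)} lemmas; files are out of step"
        )
    return [(surface, morph, lemma) for (surface, morph), lemma in zip(tokens, lemmas)]


# Made-up data: a book whose source is split into two files.
part1 = [("E)N", "P"), ("A)RXH=|", "N1 DSF")]
part2 = [("E)POI/HSEN", "VAI AAI3S")]
lemmas = ["ἐν", "ἀρχή", "ποιέω"]

merged = merge_lemmas([part1, part2], lemmas)
for row in merged:
    print(row)
```

The length check is there because a positional merge silently misaligns every later token if a single line is missing from either side.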

I hope these resources will be helpful.

Peace,

David

jta...@jtauber.com

Jan 3, 2017, 11:51:27 AM
to Open Scriptures
I just wanted to update people on this thread that I've started applying my morphology software to the CCAT data and, in the process, have been producing patch files with corrections I've needed to make so far.


Note that they haven't been independently verified; they are just the changes necessary to get my morphological analyzer to work.

As part of other work I'm doing, I should solve the "split preverb to normal lemmas" mapping problem soon.

Aaron Laws

Jan 5, 2017, 1:25:02 PM
to openscr...@googlegroups.com
Excellent, glad to hear of your progress! 