ETCBC parsings and WLC text


Nathan Bierma

Jan 27, 2015, 9:49:53 AM
to openscr...@googlegroups.com
Some of you already saw this on openscriptures-hb, but I wanted to see if anyone on the larger OS list had any ideas. Thanks!  Nathan 

-----
Dear all, our developer has hit some snags while working on integrating the ETCBC parsings into OS' WLC text. He spells them out below. See his Python scripts at https://github.com/ctslearning/etcbc2wlc and his report at http://goo.gl/Tt7G6p.

Does anyone have any advice or guidance? Is the attempt to automate this integration viable or should we abandon it? 

>>>
Following Open Scriptures' morphology (http://openscriptures.github.io/morphhb/parsing/HebrewMorphologyCodes.html), I've written a conversion from ETCBC (http://shebanq-doc.readthedocs.org/en/latest/features/comments/0_overview.html). (See https://github.com/ctslearning/etcbc2wlc). This somewhat works. There are quite a few things that WLC has that ETCBC doesn't, or at least that I haven't been able to find in ETCBC. Here's a list of them:
  • Suffixes. While the suffix is part of the word, there is no parsing information on it whatsoever.
  • Adjective types: ordinal number.
  • Pronoun types: indefinite, relative.
  • Particle types: affirmation, exhortation, demonstrative, direct object, relative.
  • Hebrew verb stems: polel, polal, hithpolel, poel, poal, palel, pulal, qal passive, pilpel, polpal, hithpalpel, nithpael, pealal, pilel, hothpaal, tiphil, hishtaphel, nithpalel, nithpoel, hithpoel.
  • Aramaic verb stems: hithpeel, ithpaal, hithpaal, saphel, hophal, ithpeel, hishtaphel, ishtaphel, hithaphel, polel, ithpoel, hithpolel, hithpalpel, hephal, poel, palpel, ithpalpel, ithpolel, ittaphal.
  • Verb tenses: sequential perfect (weqatal), cohortative, jussive.
  • States: emphatic.
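A minimal Python sketch of why this list limits automation, assuming the conversion is a lookup table from ETCBC verbal-stem values to OSHB stem codes. The ETCBC value spellings (`qal`, `nif`, `hif`, ...) and the names `ETCBC_TO_OSHB_STEM`, `convert_stem` and `report_gaps` are hypothetical, not taken from the etcbc2wlc scripts. If ETCBC has no separate value for a stem such as polel or hithpolel, no table can produce the OSHB code for it; the best a converter can do is flag those words for manual review.

```python
from typing import Iterable, List, Optional

# Hypothetical table: ETCBC verbal-stem value -> OSHB Hebrew stem code.
# The ETCBC spellings are assumptions; check them against the ETCBC docs.
ETCBC_TO_OSHB_STEM = {
    "qal": "q",   # qal
    "nif": "N",   # niphal
    "piel": "p",  # piel
    "pual": "P",  # pual
    "hif": "h",   # hiphil
    "hof": "H",   # hophal
    "hit": "t",   # hithpael
}
# OSHB codes for stems such as polel or hithpolel never appear as values
# above, so this table alone can never produce them.


def convert_stem(etcbc_stem: str) -> Optional[str]:
    """Return the OSHB stem code, or None when the table has no entry."""
    return ETCBC_TO_OSHB_STEM.get(etcbc_stem)


def report_gaps(etcbc_stems: Iterable[str]) -> List[str]:
    """List (sorted, without duplicates) the stem values that need manual review."""
    return sorted({s for s in etcbc_stems if convert_stem(s) is None})
```

For example, `report_gaps(["qal", "hif", "poel"])` returns `["poel"]`, where `"poel"` stands in for any value the table cannot map.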
After building that conversion, I wanted to test it, and since Open Scriptures has Ruth's morphology, I ran tests against it. Attached is an Apple Numbers file with the results. (See http://goo.gl/Tt7G6p ). This brought up more specifications that Open Scriptures didn't document, such as lemmas having a plus sign to designate two words that are artificially separated for lexical purposes. One of these wasn't even marked. There was also a variant that the BHS/ETCBC used but WLC didn't, and this wasn't documented in the Appendix you provided (אַל in Ruth 3:17). In the end I was able to get nearly all the words to line up (column D), but these inconsistencies made that pretty hard. And this was just with four chapters. I can't imagine how this might go for the 150 chapters in Psalms, for example.
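One way to make the word alignment less manual is to align the two word sequences automatically and review only the stretches that disagree. A minimal sketch using Python's standard `difflib.SequenceMatcher` on word forms stripped of Unicode combining marks (vowel points and accents); the function names are hypothetical, and whether consonants alone are a safe comparison key across the whole Hebrew Bible is an assumption. A `replace` of two words by one (or the reverse) would point to a word split or joined differently in the two texts, while an `insert` or `delete` would point to a word present in only one of them, like the variant in Ruth 3:17.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import List, Tuple


def consonants(word: str) -> str:
    """Strip vowel points, accents and other Unicode combining marks from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def align_words(
    etcbc_words: List[str], wlc_words: List[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (opcode, etcbc_span, wlc_span) for every stretch that does not line up."""
    matcher = SequenceMatcher(
        a=[consonants(w) for w in etcbc_words],
        b=[consonants(w) for w in wlc_words],
        autojunk=False,  # don't let frequent words be treated as junk
    )
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report the original (pointed) words so they can be checked by eye.
            mismatches.append((tag, etcbc_words[i1:i2], wlc_words[j1:j2]))
    return mismatches
```

With placeholder tokens, `align_words(["A", "B", "C", "D"], ["A", "BC", "D"])` returns `[("replace", ["B", "C"], ["BC"])]`, so only that one stretch would need a human decision.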

But even after getting most of the words to line up, there are still quite a few differences between Open Scriptures' morphology and what my script produces. I'm not saying the script is perfect, but even in the places where it's clearly working, the ETCBC data itself differs. Sometimes OS has information that ETCBC doesn't (Row 5). Other times the part of speech is different (Row 19), and other times the gender differs (Row 61). And I'm not fresh enough on my Hebrew to look at the actual word and know which is right.
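To triage differences like these without reading every row, each pair of codes could be sorted into rough categories and counted. A minimal sketch, assuming single-segment OSHB-style codes (no `/`-separated prefixes) laid out as a language letter, a part-of-speech letter, then feature letters; `classify_difference` and `tally` are hypothetical helpers. Because each feature slot means something different for each part of speech, the sketch reports slot positions rather than feature names.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_difference(os_code: str, converted_code: str) -> str:
    """Roughly classify how two single-segment OSHB-style codes disagree."""
    if os_code == converted_code:
        return "match"
    if os_code[:1] != converted_code[:1]:
        return "language"
    if os_code[1:2] != converted_code[1:2]:
        return "part of speech"
    if len(os_code) != len(converted_code):
        return "missing or extra features"
    # Same length and part of speech: list the feature slots that differ.
    positions = [i for i, (a, b) in enumerate(zip(os_code, converted_code)) if a != b]
    return "features differ at " + ", ".join(str(i) for i in positions)


def tally(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each category occurs across (OS, converted) code pairs."""
    return Counter(classify_difference(a, b) for a, b in pairs)
```

For a noun code such as `HNcmsa` (Hebrew, noun, common, masculine, singular, absolute), slot 3 is gender, so `classify_difference("HNcmsa", "HNcfsa")` returns `"features differ at 3"`, the kind of gender disagreement in Row 61.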

All this to say, I'm beginning to wonder if we'll be able to completely automate this. The ETCBC is complete, and we even have a working morphology converter (however imperfect), but if I can't get the words (Column D) or the morphology (Column G) to match up 100% for Ruth's four chapters, converting the entire Old Testament isn't within our grasp yet.
-----
