Re: ETCBC4 Hebrew morphology & syntax is now free for non-commercial use

Daniel Owens

Nov 28, 2014, 3:15:47 AM

to David Instone-Brewer, David Troidl, Pete Myers, openscri...@googlegroups.com, Jesse Griffin, Darrell Smith, Butrus Damaskus

This is very good news, thank you for sharing it with us. I hope they are able to resolve all questions about legal issues that may result from their association with the German Bible Society. But in any case this is good news for STEP and anyone else wanting to make use of the data. Are you preparing a module for STEP?

Daniel

On 11/27/14 8:15 PM, David Instone-Brewer wrote:

Dear David & Daniel (and other OpenScripture friends)

I'm forwarding an email (below) about a Hebrew OT morphology which has just become public.
Please note that the email was a response to a personal enquiry and is not an official public statement.

Here's a quick overview of the situation:
* ETCBC4 is a database of morphology & syntax for the whole OT, developed over the last 4 decades by researchers at Amsterdam's Free University.
* As I understand it, this is an extremely accurate and academically respectable dataset that is arguably better than the Westminster one (though I have not investigated it properly myself).
* It is licensed for use in Logos, Accordance etc., like the rival Westminster Hebrew morphology.
* The owners now want to make it freely available. The German Bible Society (who put some seed money into it) were not happy with this.
* The Eep Talstra Centre (who have developed ETCBC4) are nevertheless asserting their right to allow free distribution of this data (see the email below).

David IB

///   Dr David Instone-Brewer

dib   Senior Research Fellow in Rabbinics and the New Testament

^    Tyndale House, 36 Selwyn Gardens, Cambridge, CB3 9BA, UK

\=/   Rabb...@Tyndale.cam.ac.uk      www.TyndaleHouse.com

From: Dirk Roorda [ mailto:dirk....@dans.knaw.nl]
Sent: 20th November 2014 11:48
Re: The ETCBC4 database is now in the public domain

After realizing that, we hesitated no longer and put the complete ETCBC4 database in the public domain.
So, if you download it from DANS, you can do with it what you like, except use it for commercial applications, and we would appreciate proper attribution (cite the persistent identifier).

The DANS source is the EASY archive: urn:nbn:nl:ui:13-048i-71
There you will find the complete data sources, in several formats. Have a look at the readme file there.

The old WIVU site is being replaced by shebanq: http://shebanq.ancient-data.org

By the way, the ETCBC4 database from DANS is a snapshot.
Constantijn Sikkel and others are still adding to the database.
When they are "ready and finished", we will archive the result as ETCBC4s at DANS, in more or less the same format.

Dirk Roorda
researcher
+31 (0)6 13 66 50 23
dirk....@dans.knaw.nl
Data Archiving and Networked Services (DANS)
DANS promotes sustained access to digital research data. DANS is an institute of KNAW and NWO.
www.dans.knaw.nl

Daniel Owens

Nov 28, 2014, 7:19:17 AM

to David Instone-Brewer, David Troidl, Pete Myers, openscri...@googlegroups.com, Jesse Griffin, Darrell Smith, Butrus Damaskus

That is good news. Thanks for clarifying.

Daniel

On 11/28/14 6:08 PM, David Instone-Brewer wrote:

Yes, I will probably use this, but I'm busy with other things at present.
The dispute with the German Bible Society is (according to Dirk at ETC) about their use of the BHS text.
However we have no need to use the BHS, seeing that the OpenScripture text is arguably better, so there isn't any legal issue as far as I know.

David IB


Nathan Bierma

Dec 8, 2014, 11:31:06 AM

to Daniel Owens, David Instone-Brewer, David Troidl, Pete Myers, openscri...@googlegroups.com, Jesse Griffin, Darrell Smith, Butrus Damaskus, John Dyer

This is awesome news! Is anyone planning on working on an integrator to get the parsings as data attributes in XML or HTML, as with the files released by OpenScriptures HB for Jonah and Ruth? Is this possible without Strong's numbers in the ETCBC4? (Or does ETCBC4 have them? I haven't been able to open their files.) I could ask our developer to try it but I don't want to duplicate efforts.
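As a rough illustration of what "parsings as data attributes" could mean, here is a minimal Python sketch. The attribute names `data-morph` and `data-lemma` and the function name are hypothetical, not taken from the OpenScriptures files, and the lemma is optional so that a word can still be tagged when no Strong's number is available.

```python
from html import escape


def word_span(text, morph, lemma=None):
    """Wrap one word in a span that carries its parsing as data attributes.

    The attribute names are illustrative assumptions, not an OSHB convention.
    """
    attributes = [("data-morph", morph)]
    if lemma is not None:
        # Only emit a lemma when the source actually supplies one.
        attributes.append(("data-lemma", lemma))
    rendered = "".join(
        ' {}="{}"'.format(name, escape(value, quote=True))
        for name, value in attributes
    )
    return "<span{}>{}</span>".format(rendered, escape(text))


# A conjunction with a morphology code but no lemma.
print(word_span("\u05D5", "HC"))
```

Whether the lemma can be filled in from ETCBC4 alone is exactly the open question above; the sketch only shows that the markup itself does not depend on it.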

Thanks,

Nathan

--
You received this message because you are subscribed to the Google Groups "OpenScriptures Hebrew Bible" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openscriptures...@googlegroups.com.
Visit this group at http://groups.google.com/group/openscriptures-hb.

--

Nathan Bierma

Educational Technologist

Interim Associate Director, Distance Education

Calvin Theological Seminary

Nathan Bierma

Jan 12, 2015, 3:18:51 PM

to Daniel Owens, David Instone-Brewer, David Troidl, Pete Myers, openscri...@googlegroups.com, Jesse Griffin, Darrell Smith, Butrus Damaskus

A follow-up about the BHS question. David I-B says there's no need to use BHS since the Open Scriptures text (which is WLC-based, correct?) is superior. I agree, but I and our contract developer want to integrate the DANS/ETCBC parsings with our WLC/OS html files, and the developer is asking about discrepancies between the source and target versions and whether this would work. So two questions.

First, can we generate new html files from the ETCBC data, basically using this script:

http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/text/plain.ipynb#Verse-list

Or would those essentially be BHS files and run afoul of permissions?

Second, if we attempt to integrate ETCBC parsings with our existing WLC/OS files instead, will we encounter significant discrepancies? I'd always heard that discrepancies between BHS and WLC were negligible but I don't know for sure.

Basically we're still trying to figure out how exactly to take this gold mine and harvest it in our app. Thanks for any guidance you can give.

Nathan


Jesse Griffin

Jan 12, 2015, 3:30:51 PM

to Nathan Bierma, Daniel Owens, David Instone-Brewer, David Troidl, Pete Myers, openscri...@googlegroups.com, Darrell Smith, Butrus Damaskus

Hello,

Appendix B of "A Reader's Hebrew Bible" lists 27 differences between WLC and BHS. Only 11 of those are consonant differences. I'm not sure if that helps, but it is a short list of possible discrepancies.

Thank you,

Jesse Griffin

David Troidl

Jan 13, 2015, 11:49:03 AM

to Jesse Griffin, Nathan Bierma, Daniel Owens, David Instone-Brewer, Pete Myers, openscri...@googlegroups.com, Darrell Smith, Butrus Damaskus

Hi,

The WLC, at least, is still an ongoing work. More recent updates catalog changes in agreement with the BHS, especially now that they have been comparing with the BHQ. There are cases where they decide to follow BHQ over BHS.

David


Nathan Bierma

Jan 26, 2015, 11:17:36 AM

to openscri...@googlegroups.com, jag...@gmail.com, nbie...@calvinseminary.edu, dcow...@gmail.com, te...@tyndale.cam.ac.uk, peterdan...@gmail.com, sceptre...@yahoo.com, butrus...@gmail.com

Dear all, our developer has hit some snags while working on integrating the ETCBC parsings into OS's WLC text. He spells them out below. See his Python scripts at https://github.com/ctslearning/etcbc2wlc and his report at http://goo.gl/Tt7G6p .

Does anyone have any advice or guidance? Is the attempt to automate this integration viable or should we abandon it?

>>>

Following Open Scriptures' morphology codes (http://openscriptures.github.io/morphhb/parsing/HebrewMorphologyCodes.html), I've written a conversion from ETCBC (http://shebanq-doc.readthedocs.org/en/latest/features/comments/0_overview.html); see https://github.com/ctslearning/etcbc2wlc. This somewhat works. There are quite a few things that WLC has that ETCBC doesn't, or at least that I haven't been able to find. Here's a list of them:

Suffixes. While the suffix is part of the word, there is no parsing information on it whatsoever.
Adjective types: ordinal number.
Pronoun types: indefinite, relative.
Particle types: affirmation, exhortation, demonstrative, direct object, relative.
Hebrew verb stems: polel, polal, hithpolel, poel, poal, palel, pulal, qal passive, pilpel, polpal, hithpalpel, nithpael, pealal, pilel, hothpaal, tiphil, hishtaphel, nithpalel, nithpoel, hithpoel.
Aramaic verb stems: hithpeel, ithpaal, hithpaal, saphel, hophal, ithpeel, hishtaphel, ishtaphel, hithaphel, polel, ithpoel, hithpolel, hithpalpel, hephal, poel, palpel, ithpalpel, ithpolel, ittaphal.
Verb tenses: sequential perfect (weqatal), cohortative, jussive.
States: emphatic.

After building that conversion, I wanted to test it out, and since Open Scriptures has Ruth's morphology, I ran tests against it. Attached is an Apple Numbers file with the results (see http://goo.gl/Tt7G6p ). This brought up more specifications that Open Scriptures didn't include, such as lemmas having a plus sign to designate two words that are artificially separated for lexical purposes. There was one of these that wasn't even marked. There was also a variant that the BHS/ETCBC used but WLC didn't, and this wasn't documented in the Appendix you provided (אַל in Ruth 3:17). In the end I was able to get nearly all the words to line up (Column D), but these inconsistencies made that pretty hard. And this was just with four chapters. I can't imagine how this might go for the 150 chapters in Psalms, for example.

But even after getting most of the words to line up, there are still quite a few variances between Open Scriptures' morphology and what my script determines. I'm not saying my script is perfect, but even in the places where it's clearly working, the ETCBC data itself differs. Sometimes OS has info that the ETCBC doesn't (Row 5). Other times the part of speech is different (Row 19), and other times the gender differs (Row 61). And I'm not fresh enough on my Hebrew to look at the actual word and know which is right.

All this to say, I'm beginning to wonder if we'll be able to completely automate this. The ETCBC is complete, and we even have a working morphology converter (however imperfect), but if I can't get the words (Column D) or the morphology (Column G) to match up 100% for Ruth's four chapters, converting the entire Old Testament isn't within our grasp yet.

---
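For anyone wanting to repeat this kind of check, a minimal Python sketch of the tally might look like the following. It assumes the words are already aligned and that each row holds a reference, the Open Scriptures code and the converted ETCBC code; the function name and the sample rows are invented for illustration and are not taken from the report.

```python
from collections import Counter


def tally_codes(rows):
    """Compare two morphology codes per word and collect the disagreements.

    rows: iterable of (reference, os_code, converted_code) tuples.
    """
    counts = Counter()
    disagreements = []
    for reference, os_code, converted_code in rows:
        if os_code == converted_code:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
            disagreements.append((reference, os_code, converted_code))
    return counts, disagreements


# Invented rows for illustration only; they are not from the spreadsheet.
sample_rows = [
    ("word-1", "HNcmsa", "HNcmsa"),
    ("word-2", "HNcfsa", "HNcmsa"),  # gender differs
    ("word-3", "HAamsa", "HNcmsa"),  # part of speech differs
]

counts, disagreements = tally_codes(sample_rows)
print("{} of {} codes agree".format(counts["match"], sum(counts.values())))
for reference, os_code, converted_code in disagreements:
    print(reference, os_code, converted_code)
```

Keeping the list of disagreements, rather than only a percentage, makes it easier to hand the doubtful words to someone who can judge which parsing is right.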


Nathan Bierma

Jan 27, 2015, 4:44:10 PM

to David Instone-Brewer, openscri...@googlegroups.com

Thanks, David, for the reply. Here's an answer from the developer on the Ruth data; he's more familiar with the example (and the data) than I am:

-----

Here's the raw data from the two examples:

Ruth 2:11-7 [{'language': 'Hebrew', 'part_of_speech': 'verb', 'tense': 'infa', 'person': 'unknown', 'word': 'הגד', 'number': 'unknown', 'lexical_set': 'none', 'stem': 'hof', 'gender': 'unknown', 'state': 'a'}]

Ruth 4:15-7 [{'language': 'Hebrew', 'part_of_speech': 'conj', 'tense': 'NA', 'person': 'NA', 'word': 'ו', 'number': 'NA', 'lexical_set': 'none', 'stem': 'NA', 'gender': 'NA', 'state': 'NA'}, {'language': 'Hebrew', 'part_of_speech': 'prep', 'tense': 'NA', 'person': 'NA', 'word': 'ל', 'number': 'NA', 'lexical_set': 'none', 'stem': 'NA', 'gender': 'NA', 'state': 'NA'}, {'language': 'Hebrew', 'part_of_speech': 'verb', 'tense': 'infc', 'person': 'unknown', 'word': 'כלכל', 'number': 'unknown', 'lexical_set': 'none', 'stem': 'piel', 'gender': 'unknown', 'state': 'c'}]

The question marks mean there is no mapping. This happens a lot for verbs, and the OS morph simply removes empty values. Eventually the morphology converter will get rid of these, but there are legitimate times when there should be a value but the ETCBC doesn't have it (1:1-11, 1:6-24, 1:7-15, etc.) so that's why I've left in the question marks for now.

-----
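To make the relation between the raw records and the codes concrete, here is a minimal Python sketch that turns the two records above into the strings quoted in David's message below ("HVHa???a" and "HC/R/Vpc???c"). It is not the developer's converter: the lookup tables cover only the values seen in these two examples, and the person, gender and number value names are assumptions.

```python
# Partial lookup tables. The letters follow the two codes quoted in this
# thread; everything beyond those two examples is an assumption.
LANGUAGE = {"Hebrew": "H", "Aramaic": "A"}
POS = {"verb": "V", "conj": "C", "prep": "R"}
STEM = {"hof": "H", "piel": "p"}
TENSE = {"infa": "a", "infc": "c"}
PERSON = {"p1": "1", "p2": "2", "p3": "3"}   # assumed value names
GENDER = {"m": "m", "f": "f"}                # assumed value names
NUMBER = {"sg": "s", "pl": "p", "du": "d"}   # assumed value names
STATE = {"a": "a", "c": "c"}


def morpheme_code(morpheme):
    """Code for one morpheme; "?" marks a value that has no mapping."""
    pos = POS.get(morpheme["part_of_speech"], "?")
    if morpheme["part_of_speech"] != "verb":
        return pos
    return pos + "".join([
        STEM.get(morpheme["stem"], "?"),
        TENSE.get(morpheme["tense"], "?"),
        PERSON.get(morpheme["person"], "?"),
        GENDER.get(morpheme["gender"], "?"),
        NUMBER.get(morpheme["number"], "?"),
        STATE.get(morpheme["state"], "?"),
    ])


def word_code(morphemes):
    """Language letter, then the morpheme codes joined with "/"."""
    language = LANGUAGE.get(morphemes[0]["language"], "?")
    return language + "/".join(morpheme_code(m) for m in morphemes)


ruth_2_11_7 = [
    {"language": "Hebrew", "part_of_speech": "verb", "tense": "infa",
     "person": "unknown", "word": "הגד", "number": "unknown",
     "lexical_set": "none", "stem": "hof", "gender": "unknown", "state": "a"},
]
ruth_4_15_7 = [
    {"language": "Hebrew", "part_of_speech": "conj", "tense": "NA",
     "person": "NA", "word": "ו", "number": "NA", "lexical_set": "none",
     "stem": "NA", "gender": "NA", "state": "NA"},
    {"language": "Hebrew", "part_of_speech": "prep", "tense": "NA",
     "person": "NA", "word": "ל", "number": "NA", "lexical_set": "none",
     "stem": "NA", "gender": "NA", "state": "NA"},
    {"language": "Hebrew", "part_of_speech": "verb", "tense": "infc",
     "person": "unknown", "word": "כלכל", "number": "unknown",
     "lexical_set": "none", "stem": "piel", "gender": "unknown", "state": "c"},
]

print(word_code(ruth_2_11_7))
print(word_code(ruth_4_15_7))
```

The question marks come from the "unknown" values falling through the tables, which matches the explanation that they stand for a missing mapping rather than for anything stored in the ETCBC data.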

On Tue, Jan 27, 2015 at 11:59 AM, David Instone-Brewer <te...@tyndale.cam.ac.uk> wrote:

Dear Nathan

Thanks for making these forays into the ETCBC data. I'm busy with LXX lexicography at present, and I haven't looked into it.
I suspect these differences are due to different philosophies of grammar.

For example, from a quick look at http://shebanq-doc.readthedocs.org/en/latest/features/comments/0_overview.html ,
it seems that the ETCBC is using a 4-stem model for verbs:
Qal - the standard or simple
Piel - doubled - ie intensive
Nifal - with a nun prefix - ie passive or reflexive
Hiphil - ie causative.

This kind of model is preferred by modern comparative studies because it works with all Semitic languages - Hebrew, Aramaic etc. and even Punic and Phoenician. Other stems are regarded as rare or derivative.

However, the page at https://shebanq-doc.readthedocs.org/en/latest/features/comments/vs.html
appears to say that ETCBC recognises many other stems.
So I don't really know.

I can't see how the ETCBC data shows these rarer forms, because the information appears to be missing from the Ruth.csv
eg at Ruth.2.11-7 the verb huggad (Strong 5046 8715 - a Hophal Infinitive) is indicated as "HVHa???a" in the ETCBC morphology column
eg at Ruth 4.15-7 the verb u-le-kalkel (Strong 3557 8771 - a Pilpel Imperfect) is indicated as HC/R/Vpc???c in the ETCBC morphology.

What was the original of these two entries? Or are the question marks in the ETCBC data?

David IB

At 16:23 26/01/2015, Nathan Bierma wrote:

Hi David, I think you're on the OpenScriptures-HB list, but just in case, I wanted to send you this query and see if you had any ideas. Getting these parsings integrated would be of great benefit for both our apps. Thanks!

Nathan


Jesse Griffin

Jan 28, 2015, 5:32:33 PM

to Nathan Bierma, David Instone-Brewer, openscri...@googlegroups.com

Hello,

I likewise wish I had more time to look at this.

If I understand the issue correctly, perhaps a more fool-proof method would be to iterate through the words in a given version (whichever one is simpler), strip the vowels and then attempt to pull that same word out of the other version. If you are moving verse by verse then this method should get you a correct match of the words every time. Then you could use the original words (with vowels and cantillation) in each of the sources to look up the parsing for that term and add them to your unified output. You will still have the mismatches in parsing data, but I think this method would be more effective at lining up the words.
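A minimal Python sketch of that idea, assuming each verse is already available as a list of pointed words in both sources (the function names are hypothetical):

```python
import unicodedata


def consonants(word):
    """Drop all combining marks (vowel points and cantillation)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def align_verse(words_a, words_b):
    """Pair each word in words_a with the first unused word in words_b
    that has the same consonants; the partner is None when nothing matches."""
    unused = list(enumerate(words_b))
    pairs = []
    for index_a, word in enumerate(words_a):
        key = consonants(word)
        match = next((item for item in unused if consonants(item[1]) == key), None)
        if match is not None:
            unused.remove(match)
        pairs.append((index_a, match[0] if match is not None else None))
    return pairs
```

Words that come back paired with None are the ones that would still need a manual look, for example where one source splits or spells a word differently.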

I also think that using http://www.nltk.org/ might be helpful, but again, I haven't had time to dive into NLTK yet.

Thank you,

Jesse Griffin

Nathan Bierma

Jan 29, 2015, 10:59:48 AM

to Jesse Griffin, David Instone-Brewer, openscri...@googlegroups.com

Thanks Jesse. A reply from our developer:

-----

This is actually what the Ruth export does already. But even using that approach, the four chapters of Ruth turned up several mismatches. So yes, it's pretty effective, but not 100% which is what we'd need for a machine import.

David Troidl

Jan 29, 2015, 5:04:45 PM

to Nathan Bierma, Jesse Griffin, David Instone-Brewer, openscri...@googlegroups.com

A few comments from my experience matching texts in the past, especially the MapM with the OSHB. I start by matching the versification, to make sure the verses are matching properly. Then I compare word counts for each verse. In cases of a mismatch, the words can be mapped to the proper ones. In my experience the hand editing these steps require is manageable. Then I compare the individual words in three forms. If the words match, everything is fine. If I remove the cantillation and they match, I record the difference. This is more common than actual word discrepancies. Then, if the vowel forms don't match, I compare the consonantal forms and record the differences here.

Once you have the words corresponding in the two texts, the morphology can be transferred over. In writing the morphology in OSHB form, use "x" for missing or unknown values, rather than "?". The results may not be a perfect parsing of the text, but should be usable.
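The three-form comparison and the "x" convention could be sketched in Python roughly as follows. The code-point ranges are my own reading of the Unicode Hebrew block (accents in U+0591 to U+05AF, points from U+05B0), and treating meteg as a point rather than as cantillation is an assumption; the function names are hypothetical.

```python
import re
import unicodedata

# Assumed ranges from the Unicode Hebrew block.
CANTILLATION = re.compile("[\u0591-\u05AF]")
POINTS = re.compile("[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")


def compare_words(word_a, word_b):
    """Classify how two aligned words differ, from the mildest level up."""
    a = unicodedata.normalize("NFC", word_a)
    b = unicodedata.normalize("NFC", word_b)
    if a == b:
        return "identical"
    a, b = CANTILLATION.sub("", a), CANTILLATION.sub("", b)
    if a == b:
        return "cantillation differs"
    if POINTS.sub("", a) == POINTS.sub("", b):
        return "vowels differ"
    return "consonants differ"


def to_oshb_unknown(code):
    """Replace "?" placeholders with "x" for missing or unknown values."""
    return code.replace("?", "x")


print(to_oshb_unknown("HVHa???a"))
```

Recording the level returned for each word gives the kind of difference log described above, with the consonantal mismatches being the ones that need manual attention.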

Peace,

David

Teus Benschop

Aug 17, 2015, 11:12:56 AM

to OpenScriptures Hebrew Bible, te...@tyndale.cam.ac.uk, David...@aol.com, peterdan...@gmail.com, jag...@gmail.com, sceptre...@yahoo.com, butrus...@gmail.com

At https://shebanq.ancient-data.org/, there are useful tools to query the database.

This one (https://shebanq.ancient-data.org/hebrew/verse?version=4b&book=Genesis&chapter=1&verse=1) seems to query any information from the database with regard to Genesis 1:1.
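For reference, a link of that shape can be built in Python as below. The parameter names are copied from the URL above; the function name is hypothetical, and I have not checked which other books or versions the site accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://shebanq.ancient-data.org/hebrew/verse"


def verse_url(book, chapter, verse, version="4b"):
    """Build a verse link using the query parameters seen in the URL above."""
    query = urlencode(
        {"version": version, "book": book, "chapter": chapter, "verse": verse}
    )
    return BASE_URL + "?" + query


print(verse_url("Genesis", 1, 1))
```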

Thank you for this announcement.

I am building a Lexicon for http://bibledit.org and the announcement is going to help a lot.
