Parsing procedure for words with two lemmas?


Joel D. Ruark

Jun 22, 2018, 9:43:00 AM
to OpenScriptures Hebrew Bible
Hello everyone, we have come across a few compound words in the OSHB which contain two lemmas for two different Strong's numbers.  These words are: אביעד (Isa 9:6), אחריכן (Ezra 3:5 and a few other verses in 2 Chr), and חרייונים (II Ki 6:25).  Andy Hubert tells us that marking two lemmas on the same word presents some technical problems.  Here is Andy's message:

"It would be an overstatement to say we CANNOT add two lemmas to a single word. We can. However, it sure makes things significantly more complicated for programs using our data. And since it is so few words (2-3?) with few total occurrences (a dozen?), I think we would need a very compelling reason to go this way.  For example, I am working on a BibleTags project that displays the parsing info and a gloss under a word that is clicked upon. (See https://biblearc.com/path/hebrew/en/1/7/ for how this looks at present.) Eventually, this will include a host of info about the lemma. If we allow for a word to have multiple lemmas, I would need to come up with some sort of tabbed system to view such info for each lemma in turn. For the search feature I envision, my data structure is going to need to be quite a bit more complicated and I would need to decide when such combo lemmas should make the results and how. Even just importing the data is made more complicated, and db structures with a simple lemma column are no longer sufficient, etc.  Can it be done? Of course. Is it worth the significant extra work and complication to do so for a small handful of words? I don’t think so. I think it could even have a deterring effect for others who otherwise utilize our data. Or they might use it, but end up doing less powerful things with it given the burden of the extra complication."

Jesse Griffin asked me to get on here to talk about how other data sets have approached this problem.  The solution that we've been kicking around is to make new Strong's numbers for these words and add them to the end of the Strong's lexicon (i.e. 8675, 8676, 8677) in order to avoid the technical problems of having to assign two lemmas to a word.  Jesse isn't wild about generating new Strong's numbers, so we're trying to brainstorm possible solutions.  Any thoughts here?

Thanks!  Joel
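Andy's concern about db structures with a simple lemma column can be illustrated with a minimal Python sketch. The in-memory SQLite table, its columns, and the `+`-joined encoding for a two-lemma word are all hypothetical, chosen only for illustration: an exact-match lookup for Strong's 3651 misses the word when both numbers share one cell, and finds it once each part has its own row.

```python
import sqlite3

def strong_hits(rows, number):
    """Return refs whose lemma cell exactly equals `number` (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (ref TEXT, lemma TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?)", rows)
    hits = conn.execute(
        "SELECT ref FROM words WHERE lemma = ?", (number,)
    ).fetchall()
    conn.close()
    return [ref for (ref,) in hits]

# Two lemmas squeezed into one cell (illustrative "+" encoding).
combined = [("Ezra.3.5", "310+3651")]
# One row per part, one Strong's number per row.
split = [("Ezra.3.5", "310"), ("Ezra.3.5", "3651")]

print(strong_hits(combined, "3651"))  # []
print(strong_hits(split, "3651"))     # ['Ezra.3.5']
```

Handling the combined form would require pattern matching or a separate lemma table, which is the kind of extra work Andy describes.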

sceptreofjudah

Jun 22, 2018, 10:01:49 AM
to Joel D. Ruark, OpenScriptures Hebrew Bible
I wanted to do this when developing the software for HB, but David was of the opinion not to. It is doable, but it would take quite a bit of effort at this late date.

--
You received this message because you are subscribed to the Google Groups "OpenScriptures Hebrew Bible" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openscriptures...@googlegroups.com.
Visit this group at https://groups.google.com/group/openscriptures-hb.

Joel D. Ruark

Jun 22, 2018, 12:03:21 PM
to OpenScriptures Hebrew Bible
Hmmm...so in the end, what did you decide to do?  Did you stick with the two lemmas in one word and just deal with the technical glitches?

David Troidl

Jun 22, 2018, 5:33:25 PM
to openscri...@googlegroups.com

Let me start with the first word.  We can get to the others later.  In the MapM, which is based mainly on the Aleppo Codex, it is divided by a maqqef: https://he.wikisource.org/wiki/%D7%99%D7%A9%D7%A2%D7%99%D7%94%D7%95_%D7%98/%D7%98%D7%A2%D7%9E%D7%99%D7%9D

Here is a screenshot of the verse from the facsimile of the LC: http://www.tanach.us/LCFolios/LC_Folio_223v.pdf

On the left half of the top line, the first two pairs of words of the verse are each separated by a maqqef.  The word in question, near the left side of the fourth line, seems to me to indicate a maqqef, at least as clearly as the second pair above.  It also bears the same relationship to the yod as in the first pair.

I would like to get opinions from the group on this.  Is there agreement?  Would it be worth submitting to the WLC?


Peace,

David





David Troidl

Jun 22, 2018, 7:14:05 PM
to openscri...@googlegroups.com

Now for the other words.  None of these words actually has two lemmas in the OSHB repository, though that represents a loss of information.  See my notes below.


On 6/22/2018 9:43 AM, Joel D. Ruark wrote:
> Hello everyone, we have come across a few compound words in the OSHB which contain two lemmas for two different Strong's numbers.  These words are: אביעד (Isa 9:6),

Still open for discussion.  See my previous email.

> אחריכן (Ezra 3:5 and a few other verses in 2 Chr),

Ezra 3:5 and 2Chr 20:1, 20:35, and 24:4 are all listed with a maqqef in the MapM, and 2Chr 20:1 and 24:4 are also listed with a maqqef in BDB.  My thought is to insert the maqqef and add a note that we are following the MapM for clarity.  We could also check against the Aleppo Codex, if someone is willing to look them up.  Does this seem reasonable?

> and חרייונים (II Ki 6:25).

This ketiv actually has its own Strong's number (2755).  This is what is used in the OSHB, and it would resolve the issue.  It also appears to be one word in the LC facsimile.  I think it could be treated as a single noun for parsing purposes.

Let me know what you think.

Peace,

David

David Troidl

Jun 24, 2018, 7:23:24 AM
to openscri...@googlegroups.com

Here is another possible solution.  Since the LC is presenting us with two words juxtaposed, we could do the same.  For example, in Ezra 3:5, replace
          <w lemma="c/310 a" n="1.1.0.0">וְ/אַחֲרֵיכֵ֞ן</w>
with
          <w lemma="c/310 a">וְ/אַחֲרֵי</w><w lemma="3651" n="1.1.0.0">כֵ֞ן</w>
with no space in between.  The cantillation is marked to treat the first word as connected to the second, as if there were a maqqef in between.

This would seem to resolve the issue satisfactorily.  Are there any objections to this approach?

Peace,

David
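David's proposed replacement can be sketched with Python's standard `xml.etree.ElementTree` module. `split_w` is a hypothetical helper, not part of the OSHB tooling; it assumes namespace-free markup, cuts the word at a character offset, leaves no whitespace between the two parts, and moves the `n` attribute to the second part, as in the Ezra 3:5 example.

```python
import xml.etree.ElementTree as ET

def split_w(w, offset, first_lemma, second_lemma):
    """Split one <w> element into two adjacent <w> elements (hypothetical helper).

    The first part gets text[:offset], the second text[offset:]. The `n`
    attribute moves to the second part, which carries the accent in Ezra 3:5.
    """
    text = w.text or ""
    first = ET.Element("w", {"lemma": first_lemma})
    first.text = text[:offset]
    first.tail = ""  # no whitespace, so readers still see a single word
    second = ET.Element("w", {"lemma": second_lemma})
    if w.get("n") is not None:
        second.set("n", w.get("n"))
    second.text = text[offset:]
    second.tail = w.tail  # keep whatever followed the original word
    return first, second

w = ET.fromstring('<w lemma="c/310 a" n="1.1.0.0">וְ/אַחֲרֵיכֵ֞ן</w>')
first, second = split_w(w, len("וְ/אַחֲרֵי"), "c/310 a", "3651")
print(ET.tostring(first, encoding="unicode") + ET.tostring(second, encoding="unicode"))
```

Run on the Ezra 3:5 element, this should print markup matching the proposed replacement line. Morph attributes, which the Ezra examples do not carry, would need the same per-part treatment.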



Joel D. Ruark

Jun 25, 2018, 4:23:22 AM
to OpenScriptures Hebrew Bible
Thanks for weighing in, David.  I must admit that I'm a semanticist and not a manuscript expert, but I see your argument about the maqqef here.  After looking carefully at the online LC folio (not just your screenshot), I agree that it *could* be a maqqef, but I don't think it's as clear as the example you mention three lines above it.  Overall, though, I agree that this issue is worth submitting to the WLC on the basis of other manuscript evidence.  There could very well be a maqqef there whose stroke simply didn't make it all the way to the 'ayin.

Joel D. Ruark

Jun 25, 2018, 4:27:08 AM
to OpenScriptures Hebrew Bible
This sounds like a good idea.  Let me run it by Andy and make sure this will work for him, then I'll get back to you.

David Troidl

Jul 16, 2018, 7:00:46 PM
to openscri...@googlegroups.com

I haven't heard anything in a while.  But what about שַׁלְהֶ֥בֶתְיָֽה, the last word in Song 8:6?

David IB

Jul 20, 2018, 6:25:24 AM
to OpenScriptures Hebrew Bible
Joel, thanks for pointing these out. 
I don't think there is any problem with Sng.008.006-18 שַׁלְהֶ֥בֶתְיָֽה (similarly at Job.015.030-07; Ezk.021.003-26)

However, we may have to consider a few others: 
Neh.002.013-17 המ//פרוצים (in Leningrad a partial pasuq appears to separate the words) H1992=הֵ֫מָּה=they(masc.)/H9015=¦/H6555=פָּרַץ=to break through
Psa.106.001-01 הַֽלְלוּ//יָ֨הּ/׀ H1984b=הָלַל=to boast//H3050=יָהּ=YH--/H9015=¦
Jer.029.023-18 הו//ידע H1931=הוּא=he/she/it//H3045=יָדַע=to know
Mal.001.013-03 מַ/תְּלָאָ֜ה H4100=מָה=what?/H8513=תְּלָאָה=hardship (OS=H4972=מַתְּלָאָה=weariness)
1Ch.015.013-02 לְ/מַ/בָּ/רִ֥אשׁוֹנָ֖ה H9005=l=to/H4100=מָה=what?/H9003=b=in/H7223=רִאשׁוֹן=first

David IB
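David IB's references appear to follow a Book.CCC.VVV-WW pattern. A minimal Python sketch for converting them to the OSIS-style IDs used elsewhere in this thread (Song.8.6, Ps.106.1, 1Chr.15.13) might look like this. The book-abbreviation table covers only the books mentioned here, and reading the suffix as the word's position in the verse is an assumption.

```python
import re
from typing import NamedTuple

# Assumed mapping from the abbreviations used in this list to OSIS book IDs;
# it covers only the books that appear in this thread.
OSIS_BOOK = {"Sng": "Song", "Psa": "Ps", "Ezk": "Ezek", "1Ch": "1Chr",
             "Neh": "Neh", "Jer": "Jer", "Mal": "Mal", "Job": "Job"}

# Book.CCC.VVV-WW, e.g. "Neh.002.013-17"
REF = re.compile(r"^(\w+)\.(\d{3})\.(\d{3})-(\d+)$")

class WordRef(NamedTuple):
    osis: str  # OSIS-style verse ID, e.g. "Neh.2.13"
    word: int  # assumed: position of the word within the verse

def parse_ref(ref: str) -> WordRef:
    m = REF.match(ref)
    if m is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    book, chapter, verse, word = m.groups()
    return WordRef(f"{OSIS_BOOK[book]}.{int(chapter)}.{int(verse)}", int(word))

print(parse_ref("Sng.008.006-18"))  # WordRef(osis='Song.8.6', word=18)
print(parse_ref("1Ch.015.013-02"))  # WordRef(osis='1Chr.15.13', word=2)
```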

David Troidl

Jul 24, 2018, 8:15:46 PM
to openscri...@googlegroups.com

The issue with Sng.008.006-18 is the 'yah' ending, which the other two instances don't have.  Would this be H3050, as in Ps.106.1?



Neh.002.013-17    המ//פרוצים    (in Leningrad a partial pasuq appears to separate the words) H1992=הֵ֫מָּה=they(masc.)/H9015=¦/H6555=פָּרַץ=to break through

This is a ketiv, and the qere makes exactly that correction.  That, together with the fact that the WLC divides the word after the he, seems to weigh against this interpretation.



Psa.106.001-01    הַֽלְלוּ//יָ֨הּ/׀    H1984b=הָלַל=to boast//H3050=יָהּ=YH--/H9015=¦

Added to the list.



Jer.029.023-18    הו//ידע    H1931=הוּא=he/she/it//H3045=יָדַע=to know

Seems right, but I'll have to remember how to mark up a double ketiv with a single qere.



Mal.001.013-03    מַ/תְּלָאָ֜ה    H4100=מָה=what?/H8513=תְּלָאָה=hardship    (OS=H4972=מַתְּלָאָה=weariness)

Strong H4972 is specifically listed as composed of H4100 and H8513, and the KJV has 'what a weariness'.



1Ch.015.013-02    לְ/מַ/בָּ/רִ֥אשׁוֹנָ֖ה    H9005=l=to/H4100=מָה=what?/H9003=b=in/H7223=רִאשׁוֹן=first

Added to the list.

David

David Troidl

Jul 24, 2018, 8:40:53 PM
to openscri...@googlegroups.com

Here is what I have so far.  I would like to make the corrections soon.  Any comments would be appreciated.

Ezra.3.5
-          <w lemma="c/310 a" n="1.1.0.0">וְ/אַחֲרֵיכֵ֞ן</w>
+          <w lemma="c/310 a">וְ/אַחֲרֵי</w><w lemma="3651" n="1.1.0.0">כֵ֞ן</w>
2Chr.20.1
-          <w lemma="310 a" n="0.0.0.1">אַֽחֲרֵיכֵ֡ן</w>
+          <w lemma="310 a">אַֽחֲרֵי</w><w lemma="3651" n="0.0.0.1">כֵ֡ן</w>
2Chr.20.35
-          <w lemma="c/310 a" n="1.1.1">וְ/אַחֲרֵיכֵ֗ן</w>
+          <w lemma="c/310 a">וְ/אַחֲרֵי</w><w lemma="3651" n="1.1.1">כֵ֗ן</w>
2Chr.24.4
-          <w lemma="310 a" n="1">אַחֲרֵיכֵ֑ן</w>
+          <w lemma="310 a">אַחֲרֵי</w><w lemma="3651" n="1">כֵ֑ן</w>
Isa.9.5
-          <w lemma="1" n="0.0" morph="HNcmsc/Ncmsa">אֲבִי/עַ֖ד</w>
+          <w lemma="1" morph="HNcmsc">אֲבִי</w><w lemma="5703" n="0.0" morph="HNcmsa">עַ֖ד</w>
Ps.106.1
-          <w lemma="1984 b" n="2" morph="HVpv2mp/Np">הַֽלְלוּ/יָ֨הּ</w>
+          <w lemma="1984 b" morph="HVpv2mp">הַֽלְלוּ</w><w lemma="3050" n="2" morph="HNp">יָ֨הּ</w>
Jer.29.23
-          <w lemma="d/3045">הו/ידע</w><note type="variant"><catchWord>הו/ידע</catchWord>
+          <w lemma="1931">הו</w><w lemma="3045">ידע</w><note type="variant"><catchWord>הוידע</catchWord>
1Chr.15.13
-          <w lemma="l/m/b/7223" n="1.0" morph="HR/R/Rd/Aafsa">לְ/מַ/בָּ/רִ֥אשׁוֹנָ֖ה</w>
+          <w lemma="l/4100" morph="HR/Ti">לְ/מַ</w><w lemma="b/7223" n="1.0" morph="HRd/Aafsa">בָּ/רִ֥אשׁוֹנָ֖ה</w>

David
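Each `lemma` value in the list above seems to follow a simple pattern: prefix codes separated by `/`, then a Strong's number, optionally followed by a space and a letter (`310 a`, `1984 b`). A small Python parser built on that assumption (inferred from these examples, not from a published OSHB specification) makes the effect of the edits visible: in the old 1Chr.15.13 value, מה was encoded only as the prefix code `m`, while after the split it has its own entry, 4100.

```python
from typing import NamedTuple, Tuple

class Lemma(NamedTuple):
    prefixes: Tuple[str, ...]  # prefix codes such as "c", "l", "b"
    strong: int                # Strong's number of the main word
    augment: str               # optional letter, e.g. "a" in "310 a"

def parse_lemma(attr: str) -> Lemma:
    """Parse a lemma attribute such as "c/310 a" (format inferred from this thread)."""
    *prefixes, main = attr.split("/")
    number, _, augment = main.partition(" ")
    if not number.isdigit():
        raise ValueError(f"no Strong's number in {attr!r}")
    return Lemma(tuple(prefixes), int(number), augment)

print(parse_lemma("c/310 a"))     # Lemma(prefixes=('c',), strong=310, augment='a')
print(parse_lemma("l/m/b/7223"))  # old 1Chr.15.13 value: mah only as prefix "m"
print(parse_lemma("l/4100"))      # new first part of 1Chr.15.13
```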

David Troidl

Jul 25, 2018, 5:13:50 PM
to openscri...@googlegroups.com

Okay.  Thanks for the input on this thread.  I have added two items to the list:

Song.8.6
-          <w lemma="7957" n="0" morph="HNcfsa">שַׁלְהֶ֥בֶתְיָֽה</w><seg type="x-sof-pasuq">׃</seg>
+          <w lemma="7957" morph="HNcfsa">שַׁלְהֶ֥בֶתְ</w><w lemma="3050" n="0" morph="HNp">יָֽה</w><seg type="x-sof-pasuq">׃</seg>
2Chr.30.3
-          <w lemma="l/m/4078" n="0.1">לְ/מַ/דַּ֔י</w>
+          <w lemma="l/4100">לְ/מַ</w><w lemma="1767" n="0.1">דַּ֔י</w>

This last one is specifically listed under 'mah' in BDB.

David
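One way to guard edits like these is to confirm that a split changes only the markup, never the text. The Python sketch below assumes that `/` inside `<w>` marks a morpheme boundary and that concatenating the `<w>` texts of a fragment gives its surface form; `word_texts` and `preserves_text` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def word_texts(fragment: str) -> list:
    """Texts of all <w> elements in a namespace-free XML fragment."""
    root = ET.fromstring(f"<r>{fragment}</r>")
    return [w.text or "" for w in root.iter("w")]

def preserves_text(before: str, after: str) -> bool:
    """True if both fragments spell the same word once "/" separators are dropped."""
    def surface(fragment: str) -> str:
        return "".join(word_texts(fragment)).replace("/", "")
    return surface(before) == surface(after)

before = '<w lemma="l/m/4078" n="0.1">לְ/מַ/דַּ֔י</w>'
after = '<w lemma="l/4100">לְ/מַ</w><w lemma="1767" n="0.1">דַּ֔י</w>'
print(preserves_text(before, after))  # True
```

Non-`<w>` elements such as the sof pasuq `<seg>` are ignored, so the Song 8:6 lines can be checked the same way.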

Joel D. Ruark

Jul 26, 2018, 6:48:32 AM
to openscri...@googlegroups.com
I have no comments.  Everything looks right to me.


--
_______________________________
Joel D. Ruark, Th.M.
Ph.D. Candidate, Old Testament Studies
Stellenbosch University
Skype: joelruark

David Troidl

Jul 26, 2018, 9:45:46 AM
to openscri...@googlegroups.com

I have committed the changes, and included some Strong's number corrections I had accumulated.

David Troidl

Jul 27, 2018, 9:03:34 PM
to openscri...@googlegroups.com

I just updated the gh-pages branch.  You can see the results of this work at the cantillation demo:
http://openscriptures.github.io/morphhb/structure/OshbVerse/.  Go to one of the affected verses.  The word appears normal, but when you hover over the word, you see each part has its own popup.  Examples are Ezra 3:5 and Psalm 106:1, the first word in either verse.

David
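The behavior David describes (one visible word, a separate popup per part) follows from the split markup. The Python sketch below shows one way a reader could group adjacent `<w>` elements; it is not the OshbVerse demo's actual code. It assumes the `<w>` elements are direct children of a namespace-free verse element and treats whitespace in an element's tail as a word boundary. The Psalm 106:1 fragment is trimmed to its first word.

```python
import xml.etree.ElementTree as ET

def display_words(verse):
    """Group <w> children not separated by whitespace into display words.

    Returns a list of groups; each group is a list of (text, lemma) pairs,
    one pair per popup.
    """
    groups, current = [], []
    for child in verse:
        if child.tag == "w":
            current.append((child.text or "", child.get("lemma")))
        # Whitespace after any child (e.g. a newline between <w> lines) ends a word.
        if child.tail and child.tail.isspace() and current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

verse = ET.fromstring(
    '<verse><w lemma="1984 b">הַֽלְלוּ</w><w lemma="3050" n="2">יָ֨הּ</w>\n</verse>'
)
for group in display_words(verse):
    shown = "".join(text.replace("/", "") for text, _ in group)
    print(shown, [lemma for _, lemma in group])  # one display word, two lemmas
```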
