OSHB Morphology

David Troidl

Dec 11, 2013, 9:04:46 PM
to OSHB, openscr...@googlegroups.com, opensid...@googlegroups.com
We are announcing our first release of the morphology for portions of the Open Scriptures Hebrew Bible.

This first release contains Ruth and Jonah complete, as well as small portions of Genesis, Isaiah and Psalms.  Files can be downloaded here: https://github.com/openscriptures/morphhb

We are using the Hebrew Morphology Codes:
http://openscriptures.github.io/morphhb/parsing/HebrewMorphologyCodes.html

Anyone interested in learning more, or even contributing to our efforts, can join the mailing list
openscri...@googlegroups.com
or register at the parsing site:
http://hb.openscriptures.org/OshbParse/index.php

Thanks to all who have contributed to the construction of the website and the parsing so far.

Peace,

David



John Dyer

Dec 13, 2013, 12:09:43 PM
to openscr...@googlegroups.com
Congrats on the release. Can't wait to use it!


--
You received this message because you are subscribed to the Google Groups "Open Scriptures" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openscripture...@googlegroups.com.
To post to this group, send email to openscr...@googlegroups.com.
Visit this group at http://groups.google.com/group/openscriptures.
For more options, visit https://groups.google.com/groups/opt_out.



--
John Dyer - http://j.hn/

scruf...@gmail.com

Dec 25, 2013, 7:24:35 PM
to openscr...@googlegroups.com, OSHB, opensid...@googlegroups.com
This is such great progress. I'd love to help...

scruf...@gmail.com

Dec 25, 2013, 7:31:53 PM
to openscr...@googlegroups.com, OSHB, opensid...@googlegroups.com
How are the XML files generated? Would it be conceivable that JSON files could also be generated?

Daniel Owens

Dec 25, 2013, 9:27:28 PM
to openscr...@googlegroups.com
The XML files are generated based on a MySQL database dump, if I remember correctly. I am not familiar with JSON, but I would think a transformation from XML would not be very difficult.

Daniel

David Troidl

Dec 26, 2013, 1:50:49 PM
to openscr...@googlegroups.com
Here's a quick first approximation of the book of Jonah.  I still have to figure out how to get rid of the extra commas.  Is this the kind of thing you're looking for?

David
Jonah.js

Ben Dwyer

Dec 26, 2013, 4:12:13 PM
to openscr...@googlegroups.com
This looks great. It's not in the format that I will need it in, but it's a good step in that direction. How did you create it? I am thinking about writing an XML -> JSON converter for the ASV...



David Troidl

Dec 26, 2013, 5:09:59 PM
to openscr...@googlegroups.com
I wrote an XSLT style sheet and transformed the XML.  If you're going from OSIS to JSON, the principle should be pretty much the same.

David
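
For anyone who would rather use a script than XSLT, the same kind of transformation can be sketched in Python. This is only an illustration, not the style sheet used for Jonah.js: the sample markup, the element names (`chapter`, `verse`, `w`), the `lemma` and `morph` attribute values, and both function names are assumptions, and the namespace handling that real OSIS files need is left out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample with placeholder values.
# The real OSHB files are OSIS XML and may be structured differently.
SAMPLE = """
<book>
  <chapter>
    <verse>
      <w lemma="c/1234" morph="morph1">word1</w>
      <w lemma="1234" morph="morph2">word2</w>
    </verse>
    <verse>
      <w lemma="5678" morph="morph3">word3</w>
    </verse>
  </chapter>
</book>
"""


def book_to_arrays(xml_text):
    """Return chapters -> verses -> [word, lemma, morph] as nested lists."""
    root = ET.fromstring(xml_text)
    book = []
    for chapter in root.iter("chapter"):
        verses = []
        for verse in chapter.iter("verse"):
            words = [
                [w.text or "", w.get("lemma", ""), w.get("morph", "")]
                for w in verse.iter("w")
            ]
            verses.append(words)
        book.append(verses)
    return book


def book_to_json(xml_text):
    """Serialize the nested lists; json.dumps places the commas itself."""
    return json.dumps(book_to_arrays(xml_text), ensure_ascii=False)
```

Building the structure first and serializing it in one step avoids the stray-comma problem that tends to come up when JSON is written out piece by piece.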

Pete Norcross

Dec 27, 2013, 10:55:44 AM
to openscr...@googlegroups.com
It'd be easier to go from MySQL to JSON. I've fallen out of development of my JSON-based software since I got my new job at Comcast, but I'd be glad to write a SQL to JSON PHP script, if anybody has a JSON schema they'd like me to convert to. 

Nathan Bierma

Dec 28, 2013, 10:33:18 AM
to openscr...@googlegroups.com
I realize this isn't what was asked for, but if it's helpful to anyone, our developer made a utility for converting this XML to HTML using Node.js. Here it is:


Nathan 

Nathan Bierma
Educational Technologist
Calvin Theological Seminary

scruf...@gmail.com

Jan 1, 2014, 5:43:43 PM
to openscr...@googlegroups.com
What is the best way to describe a JSON schema? I have played around with a few different ones, but the one I use at the moment is this:

bookName: [
[ chapter 1 ],
[ chapter 2 ],
[ chapter 3 ],
[ ... ],
[ [verse 1],
  [verse 2],
  [verse 3],
  [ ... ]
],
[
[ [ 'word', 'lemma', 'morph' ], [ 'word', 'lemma', 'morph' ], [ 'word', 'lemma', 'morph' ], [ 'word', 'lemma', 'morph' ], [ 'word', 'lemma', 'morph' ], [ ..., ..., ... ] ]
],
]

That is probably more confusing than helpful. You can see an example here: http://javascripture.org/data/kjvdwyer7.js

I have found this simple array to be faster than a keyed object.
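
One way to read the outline above: a book is an array of chapters, a chapter is an array of verses, and a verse is an array of [word, lemma, morph] triples. A minimal Python sketch of that shape follows. The values are placeholders and `get_word` is a hypothetical helper; neither is taken from kjvdwyer7.js.

```python
# book[chapter][verse][position] -> [word, lemma, morph]
# All values below are placeholders, not real data.
book = [
    [  # chapter 1
        [["word1", "lemma1", "morph1"], ["word2", "lemma2", "morph2"]],  # verse 1
        [["word3", "lemma3", "morph3"]],  # verse 2
    ],
    [  # chapter 2
        [["word4", "lemma4", "morph4"]],  # verse 1
    ],
]


def get_word(book, chapter, verse, position):
    """Look up one triple using 1-based chapter, verse and word numbers."""
    return book[chapter - 1][verse - 1][position - 1]


word, lemma, morph = get_word(book, 1, 2, 1)
```

Because chapter and verse numbers are implied by position rather than stored, the data only lines up with another text if both use the same versification.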

David Troidl

Jan 2, 2014, 9:10:54 AM
to openscr...@googlegroups.com
Here's a second rough draft, in this format.  I worked out the comma problem.  There are a few questions left, though.
1)    We have the punctuation separated from the word.  In this file, I put the punctuation in its own element.  Can you handle it that way?
2)    The same goes for samekh and pe markers.
3)    I completely left out the qere alternatives, and only included the ketiv.  I can probably isolate the qere, but how do you want to deal with them in the markup?
4)    I also left out the notes.  Do you want anything done with them?  They include the original WLC notes, the KJV versification, and a few other things.
5)    The OSHB uses the Masoretic text versification.  Are you set up to handle that, without any chapter and verse indicators?

Let me know what you think.

David
Jonah.js

scruf...@gmail.com

Jan 7, 2014, 3:52:42 PM
to openscr...@googlegroups.com
Thanks David, these are interesting questions.

1. I think it's good to keep the punctuation separate, but we'll need to escape it so it doesn't result in invalid JSON - where did you put it in the array? Maybe the first element of the array should itself be an array when there is punctuation - like:
[ 'string before word', 'word', 'string after word' ]?

2. What are samekh and pe markers?

3. What are qere and ketiv?

4. Maybe I should revise the format to allow for the notes, but let's resolve the other questions first :)

5. The way I handled this before was to adjust the Hebrew text to match the KJV. Is there a standard versification which matches the two together?

As I think about this more, and also consider things posted in the other discussions, I am becoming more of the opinion that I should try to use an open standard for this data so I can plug more translations in without doing too much work, but I am not sure there is a standard yet....

David Troidl

Jan 7, 2014, 4:41:16 PM
to openscr...@googlegroups.com
1.    I put the punctuation in its own array, inline with the word arrays.  The Hebrew punctuation has maqqef, that works like a hyphen, paseq, a vertical bar that separates words, and sof passuq, that signals the end of a verse (it looks like a colon).  In the Jonah file, the maqqef and sof passuq are both in the first line.  The paseq is more rare.

2.    Samekh and pe are Hebrew letters.  They are used here to represent places in the original manuscript where there is, basically, a line break or a paragraph break.

3.    Ketiv means "what is written", qere means "what is read".  The qere are basically marginal notes in the original manuscript, that indicate how the corresponding word in the text is supposed to be read.

4.    The notes can be handled later.

5.    I have notes in the OSHB that indicate 1973 places where the Masoretic versification differs from the KJV.  It often affects chapter breaks, and then carries on through the rest of the chapter.  So it would require a considerable amount of work to get it lined up with the KJV.  The order of the books is also significantly different, but that should be easier to rearrange.

David
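
A consumer of the file has to tell word arrays apart from the punctuation and marker arrays. Here is a rough Python sketch of one way to do that. It assumes, without confirmation from the Jonah file, that a word is a three-element array and that punctuation or a samekh/pe marker is a one-element array holding the character itself. The function names are made up for illustration.

```python
MAQQEF = "\u05be"     # works like a hyphen
PASEQ = "\u05c0"      # vertical bar that separates words
SOF_PASUQ = "\u05c3"  # end of verse, looks like a colon
SAMEKH = "\u05e1"     # Hebrew letter used as a break marker
PE = "\u05e4"         # Hebrew letter used as a break marker

PUNCTUATION = {MAQQEF, PASEQ, SOF_PASUQ}
MARKERS = {SAMEKH, PE}


def classify(entry):
    """Return 'word', 'punctuation', 'marker' or 'unknown' for one verse entry."""
    if len(entry) == 3:
        return "word"
    if len(entry) == 1 and entry[0] in PUNCTUATION:
        return "punctuation"
    if len(entry) == 1 and entry[0] in MARKERS:
        return "marker"
    return "unknown"


def words_only(verse):
    """Keep only the [word, lemma, morph] entries of a verse."""
    return [entry for entry in verse if classify(entry) == "word"]
```

A caller that only needs lemmas and morphology can run `words_only` first, while a display layer can keep the punctuation entries and render them between words.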

scruf...@gmail.com

Mar 12, 2014, 7:45:35 PM
to openscr...@googlegroups.com
Hi David,

This is great. I have a few issues with trying to interpret the Strong's numbers. When the Strong's numbers are in a numerical format they are the second element of the array, but sometimes they begin with l/ or c/ (not sure why) and then the morphology comes second and the Strong's number moves to third place. Is this an error in your script?

Ben

scruf...@gmail.com

Mar 12, 2014, 7:58:45 PM
to openscr...@googlegroups.com
Actually, ignore what I said. The positioning seems to be fine; it just looks strange in my text editor because it's in an RTL format.

I am still unsure what the difference between l/1234 and c/1234 is in front of the Strong's numbers....

David Troidl

Mar 12, 2014, 8:01:49 PM
to openscr...@googlegroups.com
The letters are for the prefixes on the words.  The "l" is for the Hebrew lamed, which means "to", and "c" is for the conjunction, "and".

David

scruf...@gmail.com

Sep 15, 2014, 7:18:31 PM
to openscr...@googlegroups.com
I also see an "a", "b", "d", "m". Are these documented anywhere?

Thanks again. I'll have something to show for this soon :)

David Troidl

Sep 15, 2014, 8:27:53 PM
to openscr...@googlegroups.com
It doesn't appear that they are documented anywhere.  The "b" before a slash is for Hebrew bet, meaning "in", and "d" is for the definite article, Hebrew he.  After the numbers, separated by a space, the a, b, etc. are the augments that distinguish the parts when BDB breaks a Strong's entry down into several.

David
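
Putting the two explanations together, a lemma string seems to consist of optional prefix letters, each followed by a slash, then the Strong's number, then an optional augment letter after a space. Here is a small Python sketch of a parser on that reading. The function name and the returned shape are made up for illustration, and only the prefixes explained in this thread are given meanings.

```python
# Prefix letters explained in this thread. Any other letter is kept
# as-is and reported as "unknown" rather than guessed at.
PREFIX_MEANINGS = {
    "l": "lamed, 'to'",
    "c": "conjunction, 'and'",
    "b": "bet, 'in'",
    "d": "definite article (he)",
}


def parse_lemma(lemma):
    """Split e.g. 'b/d/1234 a' into prefixes, Strong's number and augment."""
    *prefixes, core = lemma.split("/")
    number, _, augment = core.strip().partition(" ")
    return {
        "prefixes": prefixes,
        "strongs": number,
        "augment": augment or None,
        "meanings": [PREFIX_MEANINGS.get(p, "unknown") for p in prefixes],
    }
```

Keeping the Strong's number as a string avoids losing anything if a value turns out not to be purely numeric.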

scruf...@gmail.com

Sep 19, 2014, 10:36:06 AM
to openscr...@googlegroups.com
Hi David,

I have added the Hebrew morphological data to javascripture.org if you want to take a look. If you have loaded the site before you will need to wait for the new version to be downloaded, and then you'll be prompted to update.

Let me know how you get on. Thanks for all your help with this.

Ben

David Troidl

Sep 19, 2014, 7:49:56 PM
to openscr...@googlegroups.com
Hi Ben,

It has some interesting functionality.  One little glitch I noticed: when I click on a word, the definition shows on the left, with the morphology at the bottom.  If I double-click on the same word, the references show on the right, and everything is still fine on the left.  But then if I click on the same word again, the morphology disappears.  In the New Testament, it says Morphology: undefined.  In the Old, it has Morphology::

David

scruf...@gmail.com

Sep 19, 2014, 8:43:39 PM
to openscr...@googlegroups.com
Thanks for the bug report. This is now fixed in the latest version. :)

scruf...@gmail.com

Sep 23, 2014, 5:23:07 PM
to openscr...@googlegroups.com
David,

Looking on the parsing site it seems like the data in the git repo only contains verified morphological data. Would it be possible to do an export which contains everything that has been done, even if it's not been verified?

Thanks
Ben

David Troidl

Sep 23, 2014, 6:17:46 PM
to openscr...@googlegroups.com
Ben,

The XML and SQL dumps of the database are at:


Those are automatically regenerated every 10 minutes.

David

johnmarsing

Sep 29, 2014, 2:43:52 PM
to openscr...@googlegroups.com
David,

I got excited when I saw that there was a script to create the word table as opposed to working with XML (which I’m not good at).  I was hoping that I could use this script in some way and make it work with the database that I’m using, which is in SQL Server.  The problem is that it looks like it would be very hard to convert this script to T-SQL (which is what SQL Server uses).


Therefore I was wondering how hard it would be for you to make an actual MySQL database consisting of the word table.  My thinking is that I can then download this database and use ODBC to get at the content from within SQL Server.

What do you think?  It would be far easier for me to go about it this way than to set up the environment required to run a MySQL server, which I’ve never used before.


Thanks,

John Marsing

www.MyHebrewBible.com


Jesse Griffin

Sep 29, 2014, 3:02:31 PM
to openscr...@googlegroups.com
I've set up a read-only user account.  You can connect with this information:

port 3306
user hbread
pass OgepbamDoor5



Thank you,
Jesse Griffin

John Marsing

Sep 29, 2014, 4:38:43 PM
to openscr...@googlegroups.com
Hello Jesse,
The user and password didn't work, at least not through the browser. Do I need to FTP into it?

Jesse Griffin

Sep 29, 2014, 4:40:36 PM
to openscr...@googlegroups.com
John,

Those credentials provide access to the MySQL database.  You should be able to use those to create the ODBC connection that you want.

Thank you,
Jesse Griffin

scruf...@gmail.com

Oct 25, 2014, 6:25:41 PM
to openscr...@googlegroups.com, openscri...@googlegroups.com, opensid...@googlegroups.com
How often is the repo itself updated? Does it depend on how much parsing work has been done?


David Troidl

Oct 26, 2014, 7:56:29 AM
to openscr...@googlegroups.com
Once the parsing is done, it has to be verified by an editor.  Then when we have a significant amount done, we will update the repo.