Morphology Codes

David Troidl

Jul 8, 2012, 8:39:22 AM
to openscr...@googlegroups.com, opensid...@googlegroups.com
Hi All,

We are in the process of finalizing the Hebrew morphology codes. The
latest version is available at
https://github.com/openscriptures/morphhb/tree/master/parsing

We have added a language indicator, H for Hebrew, A for Aramaic, and
refined the verb aspect codes, depending on language. There is a new
Adjective type for gentilic adjectives. The page has been restructured
for printing. Unfortunately, the only browser that seems to respect the
CSS page-break directive properly is Firefox.

Comments would be appreciated. Any little issues can be fixed now,
before we start applying the codes to the Hebrew text.

Peace,

David

Butrus Damaskus

Jul 8, 2012, 9:20:17 AM
to openscr...@googlegroups.com
Is there a direct link to the HTML file that could be viewed in a browser?
Thanks!
> --
> You received this message because you are subscribed to the Google Groups
> "Open Scriptures" group.
> To post to this group, send email to openscr...@googlegroups.com.
> To unsubscribe from this group, send email to
> openscripture...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/openscriptures?hl=en.
>

David Troidl

Jul 8, 2012, 9:26:52 AM
to openscr...@googlegroups.com
Sorry, I'm not an expert in git, but all I was able to find was a
listing of the HTML source. Maybe someone else would know better than I.

David

Weston Ruter

Jul 8, 2012, 12:05:59 PM
to openscr...@googlegroups.com
We could create a gh-pages branch, which would then make all of the files available under openscriptures.github.com/morphhb.

Butrus Damaskus

Jul 8, 2012, 1:20:57 PM
to openscr...@googlegroups.com
Thanks! I already cloned the git repository locally and viewed it by
the browser as a local file.

(It would however still not be bad to have the current version of this
HTML page accessible online and linkable from other documents...)

John Dyer

Jul 8, 2012, 2:34:53 PM
to openscr...@googlegroups.com, opensid...@googlegroups.com
David,
I know you want substantive feedback, but can I take a minute just to say, "Great work!"?

Can't wait to use this!

JD

--
John Dyer - http://j.hn/

David Troidl

Jul 8, 2012, 4:54:52 PM
to openscr...@googlegroups.com
That sounds fine for the static HTML page.  But what would happen to XML, XSLT, XSD, etc.?

David Troidl

Jul 8, 2012, 4:56:03 PM
to openscr...@googlegroups.com
Thank you.

Weston Ruter

Jul 8, 2012, 5:33:29 PM
to openscr...@googlegroups.com, David Troidl
David, other formats like XML, XSLT, and XSD should all work as expected. Any client-side technology should work (JavaScript, SVG, etc.), including images.

I created a gh-pages branch for morphhb and pushed it out to GitHub. You can now see the parsings page there without having to download the repo: http://openscriptures.github.com/morphhb/parsing/HebrewMorphologyCodes.html

I'd suggest that you modify the README to point to any such resources that should be browsed, and even add an index.html to the root of the gh-pages branch, perhaps with an extended README, which could then serve as the homepage of the project, accessible at http://openscriptures.github.com/morphhb/

You can pull down the gh-pages branch like this:

git checkout --track -b gh-pages origin/gh-pages

And then I'd recommend you treat gh-pages as a downstream branch; in other words, only merge into gh-pages and never from gh-pages into master. That way you can add HTML pages to gh-pages that are just for the site without having them appear in the master branch, for example:

git checkout gh-pages
git add index.html
git commit -m "Adding homepage"
git push

You can then just periodically update gh-pages with the latest from master and push it out:

git checkout gh-pages
git merge master
git push

Hope this helps!
Weston

Chris Little

Jul 9, 2012, 3:38:30 AM
to openscr...@googlegroups.com
There's a maxim that one should avoid making any semantic distinction
that is signaled by a difference in capitalization. The chief problem
with such distinctions is that they are hard to read aloud to others and
so you introduce the possibility of transmission errors whenever someone
fails to realize that q does not equal Q, etc.

It also makes loss of information possible whenever someone naively
case-folds your data. (Sword will, for example, manifest this error, if
the morphology codes are encoded as proposed and then used for a
morphology lexicon, as we've done for other systems. But we should be
able to fix this problem, in theory.) It's also common, not just in
Sword, to case-fold prior to performing a search. So the distinction
between q & Q, h & H, etc. will be lost by most applications that
perform searches over tagged data, unless case-sensitive searches are
possible and specifically designated.
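
A minimal Python sketch of the case-folding concern follows. The tags are illustrative placeholders in the style of "qp3ms", not entries from the actual morphology table, and the helper name is hypothetical.

```python
from collections import defaultdict

# Illustrative tags only: each pair differs solely in the case of the
# leading stem letter (q vs Q, h vs H).
tags = ["qp3ms", "Qp3ms", "hp3ms", "Hp3ms"]

def collisions_after_casefold(codes):
    """Group codes that become identical once case is folded."""
    groups = defaultdict(list)
    for code in codes:
        groups[code.casefold()].append(code)
    # Keep only the folded keys that more than one distinct code maps to.
    return {key: members for key, members in groups.items()
            if len(set(members)) > 1}

collisions = collisions_after_casefold(tags)
for folded, members in sorted(collisions.items()):
    print(folded, "<-", members)
```

Any index or search that folds case before comparing would treat each colliding pair as a single code.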

I don't necessarily have a recommendation for what to use instead, since
you have > 26 stem codes for Hebrew. Some options would be to use
2-character codes for each stem type (but I do recommend using a
constant width for all stem codes). You could also use all 26 letters
and then add another non-letter like @ or & to represent the 27th stem-type.

The coding looks good otherwise (to this Semitic languages neophyte).

--Chris

Butrus Damaskus

Jul 9, 2012, 3:46:49 AM
to openscr...@googlegroups.com
On Mon, Jul 9, 2012 at 9:38 AM, Chris Little <chri...@crosswire.org> wrote:
> There's a maxim that one should avoid making any semantic distinction that
> is signaled by a difference in capitalization.

Whose maxim? Since this is "machine-readable" code I don't see any problems
with capitals.


Chris Little

Jul 9, 2012, 5:32:38 AM
to openscr...@googlegroups.com
On 7/9/2012 12:46 AM, Butrus Damaskus wrote:
> Whose maxim? Since this is "machine-readable" code I don't see any problems
> with capitals.

I suspect that the purpose of the coding is to create something that is
simultaneously machine-readable and human-readable. More specifically, I
think the objective is to come up with a coding that is easy for humans
and machines to transmit with a high degree of accuracy and reasonably
easy for humans to parse, at least in the more common cases. Machine
parsing of the proposed coding would be possible only with specialized
parsers--parsers that know the coding scheme and can read the code string.

Developing a machine-readable coding, without concern for
human-readability is trivial: Just enumerate the values for each
category. E.g. the 27 Hebrew stem types can simply be enumerated
1-27. Then string the different categories together. Or... enumerate
every possible combination of values: "Hebrew verb qal perfect
3rd-person masculine singular" == 1, and so forth. They're perfectly
machine-readable, but not good codings if a human ever has to look at them.
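
As a rough sketch of those two purely numeric codings in Python: the category lists below are shortened, illustrative stand-ins (the real scheme has more categories and 27 Hebrew stem values), and the function and variable names are hypothetical.

```python
from itertools import product

# Illustrative category values only, not the full scheme.
STEMS = ["qal", "niphal", "piel"]
ASPECTS = ["perfect", "imperfect"]
PGN = ["3ms", "3fs"]

def encode_per_category(stem, aspect, pgn):
    """Option 1: number the values within each category, then join them."""
    return "{}.{}.{}".format(
        STEMS.index(stem) + 1,
        ASPECTS.index(aspect) + 1,
        PGN.index(pgn) + 1,
    )

# Option 2: number every possible combination of values, starting at 1.
COMBINATIONS = {
    combo: number
    for number, combo in enumerate(product(STEMS, ASPECTS, PGN), start=1)
}

print(encode_per_category("qal", "perfect", "3ms"))
print(COMBINATIONS[("qal", "perfect", "3ms")])
```

Both outputs are unambiguous to a program, but neither tells a human reader anything without the lookup tables at hand.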

If parsing with general-purpose XML parsers is desired, then TEI feature
structures would be worth looking into.

--Chris

David Troidl

Jul 9, 2012, 8:24:08 PM
to openscr...@googlegroups.com
Chris,

I can understand this maxim in defining variable and function names in a
case-sensitive language like JavaScript, or in choosing XML ID values.
But this certainly can't be talking about the content of the document.

We have spent considerable time, behind the scenes, hashing out this
aspect of the morphology, and finally arrived at a workable system.
Using case to associate related values is common in Hebrew morphology
anyway. If you plan to support the Westminster Hebrew Morphology, the
issue will have to be dealt with. The capitals generally represent the
passive form of the lower case values.

Peace,

David

Daniel Owens

Jul 10, 2012, 8:54:39 AM
to openscr...@googlegroups.com
Chris,

A few thoughts from someone involved in the behind-the-scenes hashing out... Thank you for your feedback. We were hoping for this kind of feedback to help David and me catch our blind spots.

I believe that the case issue is dealt with in SWORD SVN. If the module maker wants to avoid any potential issues, pre-processing could be done. I am likely to be the module maker, so I will be sensitive to the issue (already encountered it with WHM). Hopefully there will be a new stable release of SWORD before this text is fully analyzed. :)
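
One possible shape for such pre-processing, sketched in Python. This is an assumption for illustration, not what SWORD or any module-making tool actually does: the "@" marker and both function names are hypothetical, and the marker is assumed never to occur in a real code.

```python
ESCAPE = "@"  # hypothetical marker, assumed absent from real codes

def to_case_safe(code):
    """Rewrite a code so it survives case-folding: 'Q' becomes '@q'."""
    return "".join(
        ESCAPE + ch.lower() if ch.isupper() else ch
        for ch in code
    )

def from_case_safe(safe):
    """Invert to_case_safe, even if the case was changed in between."""
    out = []
    chars = iter(safe.lower())
    for ch in chars:
        if ch == ESCAPE:
            # The character after the marker was originally uppercase.
            out.append(next(chars, "").upper())
        else:
            out.append(ch)
    return "".join(out)

print(to_case_safe("Qp3ms"))
print(from_case_safe(to_case_safe("Qp3ms").upper()))
```

The trade-off is that the escaped codes no longer have a constant width, so this would suit an internal index better than the published coding.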

The other concern that you raised was the potential for confusion when codes are spoken. Perhaps you have a use case in mind that I am not anticipating. But in my experience people parse orally by saying the name of the stem and aspect and then abbreviating person, gender, and number, such as "Qal, Perfect, 3ms," not "qp3ms." Therefore oral/aural confusion is unlikely.

Feel free to give a rejoinder.

Daniel

Nathan Bierma

Feb 8, 2013, 9:47:48 AM
to openscr...@googlegroups.com
It's exciting to see what has taken shape in this important project. What is the current status and roadmap? This is something we'd like to use in our Hebrew courses. 

Many thanks for your work, 

Nathan 

Nathan Bierma
Educational Technologist
Calvin Theological Seminary

David Troidl

Feb 9, 2013, 9:18:04 AM
to openscr...@googlegroups.com
Hi Nathan,

I had made some progress in generating morphology values up to the end of August.  Then a difficult semester hit, and I confined my efforts to finishing the update of the lexicon.  The lexicon has been released in the new form, but there is still some work to be done.  Right now I am rewriting my software to work with the new lexicon.  I'm not sure when I'll be able to get back to the morphology project.

Peace,

David

Darrell Smith

Feb 9, 2013, 3:03:56 PM
to openscr...@googlegroups.com
This is definitely a project that needs to be re-activated. What I see as the main obstacle is not the morphology and other language-internal issues but the design of the web framework that will allow volunteers to scan a text and assign morph codes to the words in the text. I'm very sure there will be no shortage of volunteers to do this grunt work. In the past, the discussion about the web framework digressed to Python and those with that expertise; they dropped the ball on the entire thing while David and others continued working on the morphology. The key to getting this project going is getting a web framework team active on it again. I took no part in it after it digressed to Python/Django because I don't have that skill set and am too old, with health issues, to WANT to become proficient in it. My forte was PHP, but it has been a while since I was actively coding. This is just my 2 cents. David, thanks for doing what you have done.

Darrell Smith
 

David Troidl

Feb 9, 2013, 9:06:53 PM
to openscr...@googlegroups.com
Thanks Darrell,

Yes, there have been problems getting the web side up and running.  That was the reason Daniel and I decided to try the automated route first.  I use PHP to write local applications, but I have never made a website, except the single page for the OSHB homepage.  So until we can get some programmers on board, it looks like the automated approach will have to carry the ball, as soon as I have time and inspiration to get back to it.

The main reason I got so involved with the Hebrew side of this is that so little seemed to be available elsewhere.  Through OpenScriptures, I've heard from a number of people, with various skill sets, who would have liked to get involved.  I think the lack of coordination on the programming side has held the project back for some time.  I guess all we can do is pray for a solution.

Peace,

David

Daniel Owens

Feb 10, 2013, 7:37:21 PM
to openscr...@googlegroups.com
For what it's worth, I think PHP may be the way to go. Perhaps in time I can try my hand at learning PHP, but if both of you, Darrell and David, can code in PHP, that is a great start. And I know of another PHP programmer who knows Hebrew. Maybe I can rope him into helping us.

Then again, David's automated approach seemed to have promise of saving us lots of time. The only drawback was that it created a bottleneck, since David was the only one working on it.

The way I see it we have multiple options (and there could be more):

1. Revive David's automated approach and take it as far as we can (i.e., delay a crowd-source approach for a while).

2. Revive David's automated approach just long enough to get some data out there quickly (e.g., strong verbs, the weak verbs David has already worked on, nouns, and other easier parts of speech) and then put up a web framework that can harness crowd-sourcing to go the rest of the way.

3. Put up a web framework for crowd-sourcing and just start from scratch.

I would love to see a framework out there that could harness the efforts of all potential contributors. I am working through Hebrew texts these days and would love to include the morphology analysis I have been doing, but I do not think we should be hasty to seek a tedious crowd-source approach. Option #3 seems the least ideal.

Daniel