WLC Word List

David Troidl

Jun 22, 2012, 9:13:59 PM

to openscr...@googlegroups.com, opensid...@googlegroups.com

Hi All,

I just uploaded the first release of the WLC Word List,
https://github.com/openscriptures/morphhb. This is a compilation of the
word forms in the WLC, according to vowel form and augmented Strong
number. The consonantal form is included for searching and sorting
purposes. The separate references for each form contain the prefix and
suffix data for that instance, as IDs referring to the prefix and suffix
section at the beginning of the document. Full documentation is
included in WlcWordList.html. The full package is available in the
downloads.

My goal is to have a database-friendly catalog of words available for
morphological parsing. This will give the flexibility for recording
parsings, as well as using existing parsings of the same form to aid in
new assignments. The format contains all the WLC references for each
form. This makes it easy to focus on return on investment: parsing the
forms that occur most often first yields the greatest benefit. Some of
the parsings, like the prefixes and possibly the suffixes, should be
straightforward and apply to a great many instances. See the examples
in the documentation.
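
As a rough illustration of the frequency-first idea, here is a minimal Python sketch. It assumes each form has already been loaded into memory together with its list of WLC references; the `Form` record, its field names, and the helper names are hypothetical and not part of the Word List format.

```python
from collections import namedtuple

# Hypothetical in-memory record for one word-list entry. The field names
# are illustrative only and are not taken from the Word List schema.
Form = namedtuple("Form", ["vowel_form", "strong", "refs"])


def by_frequency(forms):
    """Return the forms ordered from most to fewest references."""
    return sorted(forms, key=lambda f: len(f.refs), reverse=True)


def coverage(forms, top_n):
    """Fraction of all word instances covered by the top_n most frequent forms."""
    ordered = by_frequency(forms)
    total = sum(len(f.refs) for f in ordered)
    covered = sum(len(f.refs) for f in ordered[:top_n])
    return covered / total if total else 0.0
```

Calling `coverage(forms, 1000)` would then estimate how much of the text is handled once the 1000 most frequent forms have been parsed.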

The format also provides for easily identifying discrepancies. I
already have experience with this. As I was building the list, I came
across a Strong number error in Numbers. When I corrected this, it took
a singleton form and merged it into a form with more entries. I also
found a very unusual prefix in Eccl.4.10. BDB identifies the word as
two separate words, though it is written as one in the MT. So in this
case we find a 'prefix' between the body of the word and the suffix. If
anyone notices other such discrepancies, please post to the list so
corrections can be made.

I just discovered that the WLC has been updated from 4.14 to 4.16. So
some work lies ahead. The application I used to make the list will work
again, but rerunning it would lose any changes made in the meantime. I
may have to do a separate update for the WLC and the Word List.

Peace,

David

Darrell Smith

Jun 23, 2012, 2:02:40 AM

to openscr...@googlegroups.com

Thanks David!

From: David Troidl <David...@aol.com>
To: openscr...@googlegroups.com; opensid...@googlegroups.com
Sent: Friday, June 22, 2012 6:13 PM
Subject: WLC Word List

-- You received this message because you are subscribed to the Google Groups "Open Scriptures" group.
To post to this group, send email to openscr...@googlegroups.com.
To unsubscribe from this group, send email to openscriptures+unsub...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/openscriptures?hl=en.

Daniel Owens

Jun 23, 2012, 9:10:33 AM

to openscr...@googlegroups.com

This is a very helpful step. I counted over 46,000 different <w> elements, and then there are <p> and <s>. I like the idea of working on high-frequency forms first. I wonder, do you think we could start parsing these using the XML files while we wait for James to get us a more robust interface?

Daniel

David Troidl

Jun 23, 2012, 9:35:14 AM

to openscr...@googlegroups.com

I was inclined to add the parsings for the prefixes, especially, as well as the examples I included in the documentation, but decided to leave it plain, to allow for different implementations. By all means, feel free to start in. One thing I left out of the example: I included an <m> element, under the <w>, specifically to record morph codes that had been applied already. A series of <m>'s will record various parsing choices to be used, and can be extended when new ones are found. Eventually these could be shown in a popup, with links to click, to add the parsing to the word. This may be unnecessary in a database, but helpful working in the XML.
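
A minimal sketch of recording a parsing choice as an <m> child of a <w>, using Python's standard `xml.etree.ElementTree`. It assumes un-namespaced <w> and <m> elements with the morph code stored as the text of <m>; the actual Word List files may use a namespace or attributes instead, and the helper name `add_morph` is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_morph(w, code):
    """Append an <m> child holding `code` to a <w> element.

    Assumption: morph codes are stored as the text of <m>. A code that is
    already recorded under this <w> is not added a second time.
    """
    existing = {m.text for m in w.findall("m")}
    if code not in existing:
        m = ET.SubElement(w, "m")
        m.text = code
    return w
```

Skipping duplicates keeps the series of <m> elements usable as a pick list of distinct parsing choices for that form.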

Somewhere I had the numbers. There are about 18,000 consonantal forms, 44,000 vowel forms, and distinguishing by augment brings the total up over 46,000. Because the increase from the vowel forms to the vowel/augment forms was so small, I chose the form I did. This accounts for over 300,000 words total.
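
For anyone wanting to reproduce counts like these, here is a hedged Python sketch of counting distinct forms at the three levels. It assumes each entry is available as a (consonantal form, vowel form, augmented Strong number) tuple and reads the third level as the pair of vowel form and augmented Strong number; both the input shape and that reading are assumptions, and `count_levels` is a hypothetical name.

```python
def count_levels(entries):
    """Count distinct forms at three levels of granularity.

    entries: iterable of (consonantal, vowel_form, augmented_strong) tuples.
    Returns (consonantal count, vowel-form count, vowel/augment count).
    """
    consonantal = set()
    vowel = set()
    vowel_augment = set()
    for cons, vow, strong in entries:
        consonantal.add(cons)
        vowel.add(vow)
        # Assumed third level: vowel form paired with augmented Strong number.
        vowel_augment.add((vow, strong))
    return len(consonantal), len(vowel), len(vowel_augment)
```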

David

Daniel Owens

Jun 23, 2012, 10:04:39 AM

to openscr...@googlegroups.com

Okay, so if we add a morph value to the WLC Word List, will it be easy enough to bring that back into the individual WLC files, especially before we have James import them into a database?

Daniel

David Troidl

Jun 23, 2012, 10:12:26 AM

to openscr...@googlegroups.com

With the word form, the Strong number and the scripture reference, we should be able to pinpoint the word in at least 99.9% of cases. It would be 100% unless there just happen to be two words in the same verse with exactly the same vowel form, prefixes, suffix and Strong number, but different parsings.
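
The matching described above can be sketched in Python as a lookup key, with a check for the rare ambiguous case. This assumes each word instance is a dict with the keys `ref`, `form`, `strong`, `prefixes`, `suffix` and `morph`; those key names and the function name are hypothetical, not part of the WLC or Word List files.

```python
from collections import defaultdict


def ambiguous_keys(instances):
    """Find lookup keys that map to more than one distinct parsing.

    instances: iterable of dicts with the (assumed) keys
    ref, form, strong, prefixes, suffix, morph.
    Returns a dict mapping each ambiguous key to its set of parsings.
    """
    parsings = defaultdict(set)
    for inst in instances:
        key = (
            inst["ref"],
            inst["form"],
            inst["strong"],
            tuple(inst["prefixes"]),
            inst["suffix"],
        )
        parsings[key].add(inst["morph"])
    return {k: v for k, v in parsings.items() if len(v) > 1}
```

An empty result means every parsing in the list can be carried back to exactly one word in the verse files; any key that does come back would need to be resolved by hand.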

David

jonathon

Jun 24, 2012, 11:08:22 AM

to openscr...@googlegroups.com

On 06/23/2012 01:35 PM, David Troidl wrote:
> I included an <m> element, under the <w>, specifically to record morph codes that had been applied already.

That implies that anybody using the XML will be able to easily do the
fifth item in the SBL Bible Software Shootout. Is that implication correct?

"I want to study the inflections of the Hebrew middle weak verb, and I
want to see what the range of possible variations is for each of the
conjugations (perfect, imperative, etc.) person, number, gender, and
stem. This means I need to find all the middle weak verbs, find all
their occurrences, and organize them in such a way that the variation of
their inflections are immediately apparent. The goal of the data
organization would be to allow me to write an article about the
variations of the Hebrew middle weak verb."

jonathon

evstevemd

Jun 27, 2012, 12:42:35 PM

to Open Scriptures

Good work, keep it up!
