JLPTEI questions

Ze'ev Clementson

Feb 2, 2010, 11:31:41 AM
to jewishlitu...@googlegroups.com
Hi Efraim,

I thought it would be useful to have a "JLPTEI questions" thread as a
"catch-all" for JLPTEI questions. Any "good" ones could perhaps be
added to the JLPTEI page on the wiki.

Here's the first one:

For the Tanach, there are 2 sets of files that are generated from the
version of the WLC contained in the ISTA file:

1. Intermediate xml files that have a TEI namespace and a
"urn:jewish-liturgy-extension" namespace associated with them.
2. Final xml files that have a TEI namespace and a
"http://jewishliturgy.org/ns/jlptei/1.0" namespace associated with
them.

The files that I looked at in #2 validate against the jlptei.rnc
schema; however, the files in #1 do not validate against that schema.
What schema is the "urn:jewish-liturgy-extension" namespace associated
with and when is that schema supposed to be used? Or, since these are
just "throw-away" intermediate xml files, is it sufficient that the
files are just well-formed xml and there is not really a schema for
validation purposes?

Thanks,
Ze'ev

Efraim Feinstein

Feb 2, 2010, 12:00:59 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> I thought it would be useful to have a "JLPTEI questions" thread as a
> "catch-all" for JLPTEI questions. Any "good" ones could perhaps be
> added to the JLPTEI page on the wiki.
>

A lot of them will probably result in documentation edits, I suspect.

> For the Tanach, there are 2 sets of files that are generated from the
> version of the WLC contained in the ISTA file:
>

Note that these are being redone to come directly from the WLC, but the
XML won't change much (it might gain some new features).

There are still a few critical TODO items to finish before I put that
conversion into the database.

> 1. Intermediate xml files that have a TEI namespace and a
> "urn:jewish-liturgy-extension" namespace associated with them.
>

That's now considered an error; the urn:jewish-liturgy-extension
namespace shouldn't exist anymore and is being phased out as the files
that had it are updated. As far as I know, there are no files that are
used by anything with that namespace in the trunk. There are some files
that haven't been converted yet (or that won't be) still lurking
around. I recently deleted a whole load of files from svn trunk because
they were still there from POC encoding. Can you check that you have an
updated svn?

The files I still have that reference that namespace are:
text/haggadah*.tei -- These are the POC haggadah sources. They won't
compile with any of the transforms in trunk, so, I guess they really
should be removed (I just need to make sure we have a copy in the
proof-of-concept tag).
text/bandits-siddur.tei -- another POC. Only exists in POC-encoding.
Should be moved to proof-of-concept tag.
text/jewishliturgyproject.tei -- The old project wide header. Needs to
be updated.
text/settings.tmpl.xml -- old settings file example. Should be removed.
code/set.xsl2, code/stage?.xsl2 -- Old POC staged-transform files.
Replaced by the code in transforms; what hasn't been replaced yet can be
replaced from the proof-of-concept tag. Should be removed.
code/tanach2jlptei.py -- unused code
code/depend.xsl2 -- dependency generator for makefiles; should be rewritten.


> 2. Final xml files that have a TEI namespace and a
> "http://jewishliturgy.org/ns/jlptei/1.0" namespace associated with
> them.
>

That's the correct namespace.

> Or, since these are
> just "throw-away" intermediate xml files, is it sufficient that the
> files are just well-formed xml and there is not really a schema for
> validation purposes?
>

In general, the intermediate files will be well-formed, but invalid
XML. They'll use an extra namespace, usually prefixed jx (
http://jewishliturgy.org/ns/jlp-processor ). In the case of those
intermediates, there's no useful schema that exists that can validate
them because they use fragmentation markup for combining multiple
hierarchies. That said, there shouldn't be any in svn, and if there
are, I need to remove them. There will eventually be some in a database
cache, which will speed up the client-side JLPTEI processing.
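
Purely as an illustration (the jx prefix and namespace URI are as
described above, but the element names are invented for this sketch),
such an intermediate might look like:

<tei:div xmlns:tei="http://www.tei-c.org/ns/1.0"
         xmlns:jx="http://jewishliturgy.org/ns/jlp-processor">
  <!-- hypothetical fragmentation markup; no schema describes jx:* -->
  <jx:fragment>...</jx:fragment>
</tei:div>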

--
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org

Ze'ev Clementson

Feb 2, 2010, 1:10:18 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Feb 2, 2010 at 9:00 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> I thought it would be useful to have a "JLPTEI questions" thread as a
>> "catch-all" for JLPTEI questions. Any "good" ones could perhaps be
>> added to the JLPTEI page on the wiki.
>>
>
> A lot of them will probably result in documentation edits, I suspect.
>
>> For the Tanach, there are 2 sets of files that are generated from the
>> version of the WLC contained in the ISTA file:
>>
>
> Note that these are being redone to come directly from the WLC, but the XML
> won't change much (it might gain some new features).

If you are redoing the Tanach generation code, you might want to
consider using David Troidl's WLC version from openscriptures. His is
also based on the WLC xml file but includes Strongs number references
for each word. This could be useful for text-writers who want to
display a Hebrew/English word translation. I've used David's version
in my Hebrew Bible iPhone app to display word definitions (see
attached screenshot).

> There are still a few critical TODO items to finish before I put that
> conversion into the database.
>
>> 1. Intermediate xml files that have a TEI namespace and a
>> "urn:jewish-liturgy-extension" namespace associated with them.
>>
>
> That's now considered an error; the urn:jewish-liturgy-extension namespace
> shouldn't exist anymore and is being phased out as the files that had it are
> updated.  As far as I know, there are no files that are used by anything
> with that namespace in the trunk.  There are some files that haven't been
> converted yet (or that won't be) still lurking around.  I recently deleted a
> whole load of files from svn trunk because they were still there from POC
> encoding.  Can you check that you have an updated svn?

I've just re-sync'ed and see that you've removed the intermediate
files. That's a lot tidier. One thing I did notice though is that the
code/input-conversion/Makefile needs to be fixed for the tanach-clean
target. At the moment, it's:

tanach-clean:
	rm -f $(TEXTDIR)/text

This doesn't remove anything. Also, the WLC code outputs directly to
the text directory. Shouldn't it output to a separate wlc text
directory instead (perhaps by setting TEXTTAG)? Alternatively, if all
jlptei xml documents are to be output to the TOPDIR/text directory,
the "clean" targets in the makefiles should explicitly list the
members to delete.
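
For example, the first option might look something like this in the
Makefile (a sketch only; the variable name is an assumption): the WLC
output gets its own subdirectory, and the clean target can then remove
that directory wholesale:

WLC-OUTPUT-DIR = $(TEXTDIR)/wlc

tanach-clean:
	rm -rf $(WLC-OUTPUT-DIR)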

> The files I still have that reference that namespace are:
> text/haggadah*.tei -- These are the POC haggadah sources.  They won't
> compile with any of the transforms in trunk, so, I guess they really should
> be removed (I just need to make sure we have a copy in the proof-of-concept
> tag).
> text/bandits-siddur.tei -- another POC.  Only exists in POC-encoding.
>  Should be moved to proof-of-concept tag.
> text/jewishliturgyproject.tei -- The old project wide header.  Needs to be
> updated.
> text/settings.tmpl.xml -- old settings file example.  Should be removed.
> code/set.xsl2, code/stage?.xsl2 -- Old POC staged-transform files.  Replaced
> by the code in transforms; what hasn't been replaced yet can be replaced
> from the proof-of-concept tag.  Should be removed.
> code/tanach2jlptei.py -- unused code
> code/depend.xsl2 -- dependency generator for makefiles; should be rewritten.

Ok, so once these files have all been dealt with, the extra namespace
won't be an issue.

>> 2. Final xml files that have a TEI namespace and a
>> "http://jewishliturgy.org/ns/jlptei/1.0" namespace associated with
>> them.
>>
>
> That's the correct namespace.
>
>>  Or, since these are
>> just "throw-away" intermediate xml files, is it sufficient that the
>> files are just well-formed xml and there is not really a schema for
>> validation purposes?
>>
>
> In general, the intermediate files will be well-formed, but invalid XML.
>  They'll use an extra namespace, usually prefixed jx (
> http://jewishliturgy.org/ns/jlp-processor ).  In the case of those
> intermediates, there's no useful schema that exists that can validate them
> because they use fragmentation markup for combining multiple hierarchies.
>  That said, there shouldn't be any in svn, and if there are, I need to
> remove them.  There will eventually be some in a database cache, which will
> speed up the client-side JLPTEI processing.

Ok, thanks for the information.

- Ze'ev

[Attachment: HebrewBibleDict.png]

Efraim Feinstein

Feb 2, 2010, 5:10:40 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> I've just re-sync'ed and see that you've removed the intermediate
> files. That's a lot tidier. One thing I did notice though is that the
> code/input-conversion/Makefile needs to be fixed for the tanach-clean
> target. At the moment, it's:
>
> tanach-clean:
> rm -f $(TEXTDIR)/text
>
> This doesn't remove anything.

oops. It used to. An older version of the Tanach transformation code
output to text/text ... fixed in r436.

> Also, the WLC code outputs directly to
> the text directory. Shouldn't it output to a separate wlc text
> directory instead (perhaps by setting TEXTTAG)?

Probably, it should. The text/ directory was originally going to be the
place where texts were shared. Now, the eXist database is. So, the
text/ directory is more of a scratch area for transforms to dump their
final output than anything else. Fixed in r436.

> Alternatively, if all
> jlptei xml documents are to be output to the TOPDIR/text directory,
> the "clean" targets in the makefiles should explicitly list the
> members to delete.
>

I like your original idea better. It's easier to implement. :-)

> Ok, so once these files have all been dealt with, the extra namespace
> won't be an issue.
>

It shouldn't be an issue now. None of those files are referenced by
anything else.

Efraim Feinstein

Feb 2, 2010, 5:20:22 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> If you are redoing the Tanach generation code, you might want to
> consider using David Troidl's WLC version from openscriptures. His is
> also based on the WLC xml file but includes Strongs number references
> for each word.

I plan on linking in to the Open Scriptures version a bit later. For
now, I'm just trying to get the basic texts up. It's why I asked on the
Open Scriptures list about word counting -- for the conversion of our
@xml:id to their @osisID. I think I know how to do the conversion now,
but I haven't tested it yet.

It's also going to become important when we integrate Open Scriptures'
MorphHB into our copy of the WLC. On a technical level, our
implementation will be different from Open Scriptures', because ours is
based on a stand-off annotation model and theirs is an inline/attribute
annotation model. They should be bidirectionally compatible, though.
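
To make the contrast concrete, a sketch (the inline attribute names are
illustrative rather than the actual MorphHB encoding, and the ids are
invented):

<!-- inline/attribute model: the analysis lives on the word itself -->
<w lemma="..." morph="...">בְּרֵאשִׁית</w>

<!-- stand-off model: the word is untouched; a link points at it -->
<tei:w xml:id="v1w1">בְּרֵאשִׁית</tei:w>
<tei:link type="analysis" targets="#v1w1 #morph1"/>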

Unless we want to store and maintain a dictionary, we'll need to rely on
their REST API as the destination of the link.

Ze'ev Clementson

Feb 2, 2010, 5:25:01 PM
to jewishlitu...@googlegroups.com
On Tue, Feb 2, 2010 at 2:10 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>> Ok, so once these files have all been dealt with, the extra namespace
>> won't be an issue.
>>
>
> It shouldn't be an issue now. None of those files are referenced by
> anything else.

No, it's not an issue. I should have said that once these files have
all been dealt with, all of the tidy-ups related to the old namespace
will be complete.

- Ze'ev

Ze'ev Clementson

Feb 2, 2010, 6:07:03 PM
to jewishlitu...@googlegroups.com

It's useful to have direct access to the Strongs number (as opposed to
only getting it via an API) even if we don't store the dictionary. For
example, for my iphone app, I created an SQL database from a
combination of David's dictionary data, WLC (with Strongs#) data,
placenames (with latitude/longitude info from the OpenBible.info site
maintained separately for displaying the location on Google maps), and
Hebrew Bible verse linkages (see attached diagram). The more things
that we can inter-relate, the more types of output we can produce. If
you don't maintain the Strong's number in the opensiddur version of
the Tanach, then you will have to do a word-for-word link to the
openscriptures version in order to get the Strongs numbers (which lead
to the dictionary definitions and (eventually) the morphological
detail).

- Ze'ev

[Attachment: HebrewBible.png]

Efraim Feinstein

Feb 2, 2010, 6:48:27 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> It's useful to have direct access to the Strongs number (as opposed to
> only getting it via an API) even if we don't store the dictionary. For
>

I actually don't see the point of a Strong's number in a hyperlinked
text. A Strong's number is just a database key for a database written on
dead trees.

Here's what I have in mind for the grand vision of the future:

Each word is a tei:w with an @xml:id, such as:
<tei:w xml:id="v1w1">בְּרֵאשִׁ֖ית</tei:w>

For the morphological data, we can represent it as (xml:id's are more or
less random, and the features are not yet defined either):
<tei:fs type="morph" xml:id="morph1">
  <tei:f name="part-of-speech">...</tei:f>
  <tei:f name="lemma">...</tei:f>
</tei:fs>

The morphological data is linked to the word through a tei:link:
<tei:link type="analysis" targets="#v1w1 #morph1"/>

For dictionary definitions, you can do something similar (the
morph.lemma may also be a way to do the link):
<tei:link type="definition" targets="#v1w1 #definition1"/>

Open Scriptures will have the data, and if we know the correspondence
between an Open Scriptures @osisId and an Open Siddur @xml:id, we can
convert the MorphHB/definition data as an aggregate to JLPTEI and store
it on our own *or* link through the Open Scriptures REST API if we don't
want to store the data. For example (I'm making up the URI here):
<tei:link type="definition"
          targets="#v1w1 http://openscriptures.org/definitions/word/00001"/>

Then, the conversion would occur on-access by API instead of in the
import to the database.

Does this make sense?

Ze'ev Clementson

Feb 2, 2010, 8:26:47 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

Yes, it makes sense to leverage work others are doing. And, you are
correct in that Strongs numbers were developed a long time ago for a
different medium. However:

1. It assumes that the Open Scriptures web service will be available
when Open Siddur users want to access the information (e.g. - the Open
Scriptures REST API will have been developed, the server will be setup
and operational, and there will be no access issues).
2. If the Open Scriptures REST API (which hasn't been defined or
developed yet) does not provide the level of granularity needed, some
parsing of the returned data will be required.
3. It is always easier to manage composite works if you have control
of specific versions of the source documents. For example, if the WLC
xml files change due to new research and Open Scriptures update their
version of the WLC and associated dictionary/morphology/etc documents
before Open Siddur updates its tei:links to the underlying data, then
anyone who attempts to use Open Siddur code to create a new Siddur
that relies on information that is being retrieved via the Open
Scriptures REST API faces potential "breakage". It would probably be
better to maintain a "local" copy of the Open Scriptures documents
rather than rely on an as-yet-not-developed API. This would also make
it easier for people to download a copy of the Open Siddur project and
work off-line.
4. If Open Siddur maintains a "local" copy of the Open Scriptures
documents, why wouldn't we want them maintained in eXist where they
would be accessible for querying? Do we then write code to give users
the ability to both access the local copies AND use the Open
Scriptures REST API?
5. Even if Strongs numbers are old-fashioned and there may be other
ways to create word definitions in the future, the Strongs definitions
are available digitally (I haven't found open source digital
dictionaries of Biblical Hebrew anywhere else - have you?) and can be
cross-referenced to other dictionary sources if/when they become
available.

That's why I think that (although it does make sense to try to
leverage work others are doing) it still makes sense to incorporate
links to the Strongs number definitions in the Open Siddur version of
the Tanach, and that local copies of the Strongs definitions should be
part of the Open Siddur project.

- Ze'ev

Efraim Feinstein

Feb 2, 2010, 9:17:35 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> 1. It assumes that the Open Scriptures web service will be available
> when Open Siddur users want to access the information (e.g. - the Open
> Scriptures REST API will have been developed, the server will be setup
> and operational, and there will be no access issues).
>

This is an open and more general question -- how much do we want to
aggregate supplemental data that is only very indirectly
siddur-related? Particularly, when the other data is also open and we
need to somehow stay synchronized with their ongoing development.

We have the issue a bit with the WLC. One reason I want to derive our
WLC encoding directly from the WLC is to ensure that we preserve a
mapping between our data and the original, so we can get updates. The
process will become more complicated as our Tanach improves and starts
acquiring variant texts. It should be doable because the Tanach is a
stable text and (to the best of my knowledge) the WLC corrections at
this point don't involve adding or deleting whole words.

The Tanach is also a primary component of the siddur. So, it makes
sense to store it in our database according to our procedures.

Ordinarily, a pure linked-data strategy would say "don't aggregate
data, link data."

It's an interesting question.

> 2. If the Open Scriptures REST API (which hasn't been defined or
> developed yet) does not provide the level of granularity needed, some
> parsing of the returned data will be required.
>

I plan on keeping an eye on their API development and chiming in when I
think there's a feature we need.

> 3. It is always easier to manage composite works if you have control
> of specific versions of the source documents. For example, if the WLC
> xml files change due to new research and Open Scriptures update their
> version of the WLC and associated dictionary/morphology/etc documents
> before Open Siddur updates its tei:links to the underlying data, then
> anyone who attempts to use Open Siddur code to create a new Siddur
> that relies on information that is being retrieved via the Open
> Scriptures REST API faces potential "breakage". It would probably be
> better to maintain a "local" copy of the Open Scriptures documents
> rather than rely on an as-yet-not-developed API. This would also make
> it easier for people to download a copy of the Open Siddur project and
> work off-line.
>

I wouldn't consider using this strategy for anything that's primary data
for us. The morphological data, for example, will be stored locally in
a JLPTEI feature-structure format, and could be a part of a more general
grammatical mapping project for the siddur.

Defining the persistence of URIs is going to be a problem both for us
and for Open Scriptures.

> 4. If Open Siddur maintains a "local" copy of the Open Scriptures
> documents, why wouldn't we want them maintained in eXist where they
> would be accessible for querying? Do we then write code to give users
> the ability to both access the local copies AND use the Open
> Scriptures REST API?
>

Good point re: querying. While I think the data would be query-able
either way, it could put a huge bandwidth/lag tax on both servers and
wouldn't be indexed if we rely entirely on the REST API.

If we use the Open Scriptures REST API at all, we will need to write our
own adaptor layer to bring the data into a format we can use.

> 5. Even if Strongs numbers are old-fashioned and there may be other
> ways to create word definitions in the future, the Strongs definitions
> are available digitally (I haven't found open source digital
> dictionaries of Biblical Hebrew anywhere else - have you?) and can be
> cross-referenced to other dictionary sources if/when they become
> available.
>

That's why I want to store the links as stand-off annotation instead of
directly in the document. It makes it easier to turn individual
features on or off or to switch dictionaries.

I don't have any issue with the dictionary. I'm questioning the
usefulness of the numbers as an indexing regime.


You've made a good case. I hope it's clear to anyone reading this why I
think it touches directly on a basic design issue.

Ze'ev Clementson

Feb 2, 2010, 10:37:43 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

It's not necessary to use the actual Strongs numbers for indexing.
Instead of embedding the Strongs number in the Tanach xml files
directly and using those numbers as an index, one could create a
unique id that is initially associated with the Strongs number
definition (for convenience, these initial id's could be the same as
the Strongs number; however, that wouldn't have to be the case). If
you look at the SQL schema that I sent in my previous email (the
HebrewBible.png file), you'll see that this is what I did for my
iphone app. The "Dictionary" has a Primary Key that is "id" and the
StrongsNumber is just a column in the table. Initially, there could be
a 1:1 relationship between id and StrongsNumber; however, eventually
(when definitions are perhaps broken up by binyan, p-o-s, etc) new
unique id's would be added and associated with words in the Tanach.
These new unique id's would not have a 1:1 correspondence to a
StrongsNumber, but that wouldn't be an issue because "id" is different
from StrongsNumber. However, the addition of new definitions and the
breakup of existing StrongsNumber definitions could be an incremental
process (if it is done at all) and doesn't preclude us from making
immediate use of the Strongs definitions.
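
Translated into JLPTEI terms, a hypothetical sketch of such an entry
(tei:entry, tei:form, tei:orth, tei:sense, and tei:xr are TEI dictionary
module elements; the id and content are invented): the entry's primary
key is its own @xml:id, and the Strongs number is carried only as a
cross-reference:

<tei:entry xml:id="dict-00001">
  <tei:form><tei:orth>רֵאשִׁית</tei:orth></tei:form>
  <tei:sense>...</tei:sense>
  <tei:xr type="strongs"><tei:ref>H7225</tei:ref></tei:xr>
</tei:entry>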

- Ze'ev

Efraim Feinstein

Feb 2, 2010, 11:29:26 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> It's not necessary to use the actual Strongs numbers for indexing.
> Instead of embedding the Strongs number in the Tanach xml files
> directly and using those numbers as an index, one could create a
> unique id that is initially associated with the Strongs number

> [snip]

> (when definitions are perhaps broken up by binyan, p-o-s, etc) new
> unique id's would be added and associated with words in the Tanach.
> These new unique id's would not have a 1:1 correspondence to a
> StrongsNumber, but that wouldn't be an issue because "id" is different
> from StrongsNumber. However, the addition of new definitions and the
> breakup of existing StrongsNumber definitions could be an incremental
> process (if it is done at all) and doesn't preclude us from making
> immediate use of the Strongs definitions.
>

OK, I think I can buy that. Here's how I would propose implementing it:
(0) finish writing the code to import an updated WLC into the Open
Siddur (it's >90% done already)
(1) determine correspondence between Open Scriptures @osisId and Open
Siddur @xml:id in Tanach (we need to do this anyway for MorphHB!)
(2) Obtain Strong's Dictionary from Open Scriptures (the
Brown-Driver-Briggs dictionary is also public domain; I'm not sure if
there's a complete digitized version of it yet); convert the dictionary
to TEI (the TEI does have an already-defined dictionary module, which we
can import into the JLPTEI schema). Each entry has an @xml:id,
initially related to the Strong's number.
(3) The ids from #1 and #2 are linked to each other in a linkage file
(sketched below).
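
A hypothetical sketch of the linkage file in step 3 (file names and ids
are invented): one tei:link per word/entry pair, e.g.:

<tei:link type="definition"
          targets="tanach.xml#v1w1 strongs.xml#dict-00001"/>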

Weston Ruter

Feb 2, 2010, 11:44:50 PM
to Ze'ev Clementson, jewishlitu...@googlegroups.com, Efraim Feinstein, open-scriptures, Robert Hunt
(CC'ing Open Scriptures group: see entire thread at http://groups.google.com/group/jewishliturgy-discuss/browse_thread/thread/6c323b6a32196432 )

Hi Ze'ev, my responses appear inline:

On Tue, Feb 2, 2010 at 5:26 PM, Ze'ev Clementson wrote:
> 1. It assumes that the Open Scriptures web service will be available
> when Open Siddur users want to access the information (e.g. - the Open
> Scriptures REST API will have been developed, the server will be setup
> and operational, and there will be no access issues).

A lot of us at Open Scriptures are concerned with the system's ability to work offline, and such an offline mode is just another way of looking at a cache. So we would be working toward allowing the entire database to be downloadable to a local cache for offline, or for use as a cache to ensure reliability and increase performance.

> 2. If the Open Scriptures REST API (which hasn't been defined or
> developed yet) does not provide the level of granularity needed, some
> parsing of the returned data will be required.

This is true, but granularity is also at the forefront of our concerns. Every word and even every punctuation mark should be individually addressable.
 
> 3. It is always easier to manage composite works if you have control
> of specific versions of the source documents. For example, if the WLC
> xml files change due to new research and Open Scriptures update their
> version of the WLC and associated dictionary/morphology/etc documents
> before Open Siddur updates its tei:links to the underlying data, then
> anyone who attempts to use Open Siddur code to create a new Siddur
> that relies on information that is being retrieved via the Open
> Scriptures REST API faces potential "breakage". It would probably be
> better to maintain a "local" copy of the Open Scriptures documents
> rather than rely on an as-yet-not-developed API. This would also make
> it easier for people to download a copy of the Open Siddur project and
> work off-line.
 
On Tue, Feb 2, 2010 at 6:17 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
> […] Particularly, when the other data is also open and we need to somehow stay synchronized with their ongoing development. […] Defining the persistence of URIs is going to be a problem both for us and for Open Scriptures.

This is a very important point, and it highlights the need for having stable identifiers for linking between documents. Our idea is to rely on unified texts as the stable anchor points that other datasets can be linked onto; after all, “Cool URIs don't change”! For example, the version of the WLC we have right now could constitute the initial contents of the unified text, and each data point within this text would receive a unique ID. If the WLC changes, we would not change the corresponding data points in the unified text, but rather we would merge in the changes and give those new data points new IDs, leaving the previous ones intact so that existing links are not broken.
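
For instance, a sketch with invented ids: if the reading at data point
w42 were later corrected, the unified text would keep w42 untouched and
merge the correction in under a fresh id:

<w id="w42">original reading</w>   <!-- retained; existing links resolve -->
<w id="w99">corrected reading</w>  <!-- new data point, new ID -->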

On Tue, Feb 2, 2010 at 5:26 PM, Ze'ev Clementson wrote:
> 5. Even if Strongs numbers are old-fashioned and there may be other
> ways to create word definitions in the future, the Strongs definitions
> are available digitally (I haven't found open source digital
> dictionaries of Biblical Hebrew anywhere else - have you?) and can be
> cross-referenced to other dictionary sources if/when they become
> available.

> That's why I think that (although it does make sense to try to
> leverage work others are doing) it still makes sense to incorporate
> links to the Strongs number definitions in the Open Siddur version of
> the Tanach, and that local copies of the Strongs definitions should be
> part of the Open Siddur project.

Strong's numbers should indeed not be under-valued, even if they are obsolete, and their unique value as such makes their inclusion inline reasonable. Even so, I personally favor the approach of keeping the Strong's numbers in a separate location that links back to their corresponding words in the scriptural document; this approach is scalable in that it provides a model for additional lemma identification systems to be added in the future, and it keeps the scriptural documents more lightweight; it also separates the maintenance of the two documents.

I hope this is helpful.

Weston Ruter
Founder
Open Scriptures Project
http://openscriptures.org/

Ze'ev Clementson

Feb 3, 2010, 12:29:30 AM
to jewishlitu...@googlegroups.com
Hi Efraim,

Sounds good to me!

With regards to a digitized version of the Brown-Driver-Briggs (BDB)
dictionary, there are a number of scanned PDF copies of BDB available;
however, I don't think anyone has created a freely-available complete
digital version of it. I know that David Troidl is working on a BDB
"outline" with cross-references. He has done a tremendous job of
tagging the WLC with Strongs numbers and I hope his BDB work also
becomes available at some stage. However, he hasn't indicated when (or
whether) he will be releasing it and under what license.

- Ze'ev

Ze'ev Clementson

Feb 3, 2010, 12:29:50 AM
to Weston Ruter, jewishlitu...@googlegroups.com, Efraim Feinstein, open-scriptures, Robert Hunt
Hi Weston,

Thanks for your comments - see my follow-ups below:

On Tue, Feb 2, 2010 at 8:44 PM, Weston Ruter <westo...@gmail.com> wrote:
> (CC'ing Open Scriptures group: see entire thread at
> http://groups.google.com/group/jewishliturgy-discuss/browse_thread/thread/6c323b6a32196432
> )
>
> Hi Ze'ev, my responses appear inline:
>
> On Tue, Feb 2, 2010 at 5:26 PM, Ze'ev Clementson wrote:
>> 2. If the Open Scriptures REST API (which hasn't been defined or
>> developed yet) does not provide the level of granularity needed, some
>> parsing of the returned data will be required.
>
> This is true, but granularity is also at the forefront of our concerns.
> Every word and even every punctuation mark should be individually
> addressable.

Has much work been done on the Open Scriptures REST API? I've been
following the Open Scriptures discussions for a few months now;
however, I haven't seen any comments about this actually being worked
on. David Troidl has been the most prolific svn poster with his
contributions of the Strongs-tagged WLC and the Strongs dictionary and
his ongoing updates/edits to those. But, the REST API doesn't seem to
be getting much attention. Have any developers committed to working on
it and is there any timeframe for when specific milestones might be
met?

Thanks,
Ze'ev

Ze'ev Clementson

Feb 3, 2010, 1:25:20 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Feb 2, 2010 at 8:29 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> (1) determine correspondence between Open Scriptures @osisId and Open Siddur
> @xml:id in Tanach (we need to do this anyway for MorphHB!)
> (2) Obtain Strong's Dictionary from Open Scriptures (the Brown-Driver-Briggs
> dictionary is also public domain; I'm not sure if there's a complete
> digitized version of it yet); convert the dictionary to TEI (the TEI does
> have an already-defined dictionary module, which we can import into the
> JLPTEI schema).  Each entry has an @xml:id, initially related to the
> Strong's number. (3) The ids from #2 and #1 link each other in a linkage
> file.

Still trying to come to grips with jlptei, TEI, and ODD (quite a lot
of reading!), so to use our recent discussion as an example, is the
following correct?

In order to add dictionary support to jlptei.xml:
1. Add: <moduleRef key="dictionary" />
2. Optionally, delete elements in the dictionary module that aren't
needed using:
<elementSpec mode="delete" module="dictionary" ident="xxxxxx" />

Is that all there is to it, or am I missing some steps?

Thanks,
Ze'ev

Efraim Feinstein

Feb 3, 2010, 1:40:03 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
>
> In order to add dictionary support to jlptei.xml:
> 1. Add: <moduleRef key="dictionary" />
> 2. Optionally, delete elements in the dictionary module that aren't
> needed using:
> <elementSpec mode="delete" module="dictionary" ident="xxxxxx" />
>
> Is that all there is to it, or am I missing some steps?
>

Yeah, from a schema generation perspective, that's it!

From a guidelines perspective, it would be best to then specify exactly
how our dictionary entries should appear in the wiki-based guidelines.
I haven't done that yet for most of the vanilla-TEI that we import, but
it will have to be done eventually.

One of the nice things about the TEI is that a lot of the work is
already done for you. Someone else already figured out what a schema
for a dictionary should look like, and all we have to do is import the
schema module.

For dictionary type markup, it would be worth checking out what other
open source projects (like FreeDict or WordNet) are doing. I haven't
done it because I hadn't considered dictionaries a primary data source.

Ze'ev Clementson

Feb 3, 2010, 4:06:16 PM
to jewishlitu...@googlegroups.com, David Troidl
Hi Efraim,

On Wed, Feb 3, 2010 at 10:40 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Hi,
>
> Ze'ev Clementson wrote:
>>
>> In order to add dictionary support to jlptei.xml:
>> 1. Add: <moduleRef key="dictionary" />
>> 2. Optionally, delete elements in the dictionary module that aren't
>> needed using:
>> <elementSpec mode="delete" module="dictionary" ident="xxxxxx" />
>>
>> Is that all there is to it, or am I missing some steps?
>>
>
> Yeah, from a schema generation perspective, that's it!

Cool!

> From a guidelines perspective, it would be best to then specify exactly how
> our dictionary entries should appear in the wiki-based guidelines.  I
> haven't done that yet for most of the vanilla-TEI that we import, but it
> will have to be done eventually.
> One of the nice things about the TEI is that a lot of the work is already
> done for you.  Someone else already figured out what a schema for a
> dictionary should look like, and all we have to do is import the schema
> module.
> For dictionary type markup, it would be worth checking out what other open
> source projects (like FreeDict or WordNet) are doing.  I haven't done it
> because I hadn't considered dictionaries a primary data source.

FreeDict has some (not very useful) instructions here:
http://sourceforge.net/apps/mediawiki/freedict/index.php?title=FreeDict_HOWTO_-_TOC

However, they don't really make full use of the TEI Dictionary module.
I had a look at some of the dictionary files in svn and they're too
"basic" to be a model for us. Here are examples of the English/Arabic
dictionary and the German/English dictionary (CAUTION: big files):
http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/eng-ara/eng-ara-nophon.tei?view=markup&pathrev=649
http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/deu-eng/deu-eng.tei?revision=625&view=markup

I didn't find any versions of WordNet that use TEI. This page
indicates that the WordNet database files are all ascii text files:
http://wordnet.princeton.edu/wordnet/man/wnintro.5WN.html

I was interested in OpenCyc at one stage (it uses an RDF-based
ontology that incorporates data from Cyc), but it isn't TEI either.

I did a search for "dictionary tei" and found the following:
http://www.crosswire.org/wiki/TEI_Dictionaries

There are (at the bottom of the page) sample TEI P5 documents for
Strong's (a poor subset of the info that David Troidl has in his
Strong's OSIS doc) and Webster's dictionaries.

If you like, I can make a first stab at converting David's Strong's
Hebrew dictionary into TEI/JLPTEI. I've already worked with it before
to get the info into my iphone app, so I'm familiar with the OSIS
format he used. Alternatively, if you want to have a go at it, I'm
fine with that too.

I've cc'ed David on this email - David, if you haven't been following
this thread, we're discussing converting your excellent OSIS Strongs
dictionary to the Open Siddur JLPTEI format and (at a later stage)
adding the Tanach dictionary links into the version of the Tanach that
is maintained in the Open Siddur repository. You have a wealth of
experience using OSIS and TEI. Do you have any comments on the
conversion process of the dictionary info from OSIS to TEI, and what
you feel could be improved/changed?

- Ze'ev

Ze'ev Clementson

Feb 3, 2010, 4:54:27 PM
to david...@aol.com, jewishlitu...@googlegroups.com
On Wed, Feb 3, 2010 at 1:28 PM, <david...@aol.com> wrote:

> On 2/3/2010 4:06 PM, Ze'ev Clementson wrote:
>> I've cc'ed David on this email - David, if you haven't been following
>> this thread, we're discussing converting your excellent OSIS Strongs
>> dictionary to the Open Siddur JLPTEI format and (at a later stage)
>> adding the Tanach dictionary links into the version of the Tanach that
>> is maintained in the Open Siddur repository. You have a wealth of
>> experience using OSIS and TEI. Do you have any comments on the
>> conversion process of the dictionary info from OSIS to TEI, and what
>> you feel could be improved/changed?
>
> Actually, I've looked at TEI a couple of times, but was put off by the
> obvious learning curve.  My only experience with TEI files was converting
> Christopher Kimball's WLC files to OSIS.  He has an easy, understandable
> subset of TEI, with his own schema, that I could follow.

Ok, thanks David!

- Ze'ev

Efraim Feinstein

Feb 3, 2010, 5:10:50 PM
to jewishlitu...@googlegroups.com
Hi,

Ze'ev Clementson wrote:
> However, they don't really make full use of the TEI Dictionary module.
>

It's unlikely we will either. The TEI is huge, and the challenge is
always choosing which elements to remove, and which parts of the data
not to encode.

> There are (at the bottom of the page) sample TEI P5 documents for
> Strong's (a poor subset of the info that David Troidl has in his
> Strong's OSIS doc) and Webster's dictionaries.
>

Sounds like the OSIS doc might be the best conversion source.

Any licensing issues we need to be concerned with? (There shouldn't be
if it's acknowledged to be an exact transcription of the PD dictionary.)

> If you like, I can make a first stab at converting David's Strong's
> Hebrew dictionary into TEI/JLPTEI.

I'm not going to get a chance to do it any time soon. Since you seem to
be interested in getting this working, go ahead and scratch your itch.

I haven't seen the OSIS document, but I can't imagine you'll need much
(anything?) from the liturgy extension in encoding a dictionary.

It makes me think about whether we should be using differently
restricted schemas for different types of files. (A liturgy segment is
not a dictionary is not a bibliography is not a contributor list is not
a notes file...)

Is there a download link for the OSIS Strong's dictionary?

Thanks,

Ze'ev Clementson

Feb 3, 2010, 5:21:05 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Wed, Feb 3, 2010 at 2:10 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> However, they don't really make full use of the TEI Dictionary module.
>>
>
> It's unlikely we will either.  The TEI is huge, and the challenge is always
> choosing which elements to remove, and which parts of the data not to
> encode.

Yes, but they only use (literally) a couple of the Dictionary module
elements while we would have an immediate need for some of the other
ones.

>> There are (at the bottom of the page) Sample TEI P5 documents for
>> Stong's (a poor subset of the info that David Troidl has in his
>> Strong's OSIS doc) and Webster's dictionaries.
>>
>
> Sounds like the OSIS doc might be the best conversion source.

It's the best version I've come across.

> Any licensing issues we need to be concerned with?  (There shouldn't be if
> it's acknowledged to be an exact transcription of the PD dictionary.)

David's header says "Public Domain".

>> If you like, I can make a first stab at converting David's Strong's
>> Hebrew dictionary into TEI/JLPTEI.
>
> I'm not going to get a chance to do it any time soon.  Since you seem to be
> interested in getting this working, go ahead and scratch your itch.

Ok, I'll have a go and keep you updated as I progress.

> I haven't seen the OSIS document, but I can't imagine you'll need much
> (anything?) from the liturgy extension in encoding a dictionary.

Probably not.

> It makes me think about whether we should be using differently restricted
> schemas for different types of files.  (A liturgy segment is not a
> dictionary is not a bibliography is not a contributor list is not a notes
> file...)

Maybe not. For the time being, I'll just add the Dictionary module to
jlptei.xml and use that. After I progress, we can discuss whether it
makes more sense to break it out or keep it in.

> Is there a download link for the OSIS Strong's dictionary?

Yes, here's a link to the file's location in svn:
http://code.google.com/p/open-scriptures/source/browse/trunk/data/strongs-dictionaries/hebrew/StrongHebrewG.xml

- Ze'ev

Efraim Feinstein

Feb 4, 2010, 12:31:42 AM
to jewishlitu...@googlegroups.com, David Troidl
Ze'ev Clementson wrote:
>> It's unlikely we will either. The TEI is huge, and the challenge is always
>> choosing which elements to remove, and which parts of the data not to
>> encode.
>>
>
> Yes, but they only use (literally) a couple of the Dictionary module
> elements while we would have an immediate need for some of the other
> ones.
>

FreeDict may just be a bad model to use.

The best way to do this might be to go the other way -- figure out what
we need and delete what we don't.

>
>> Any licensing issues we need to be concerned with? (There shouldn't be if
>> it's acknowledged to be an exact transcription of the PD dictionary.)
>>
>
> David's header says "Public Domain".
>

Sounds good. See question below.

>>> If you like, I can make a first stab at converting David's Strong's
>>> Hebrew dictionary into TEI/JLPTEI.
>>>
>> I'm not going to get a chance to do it any time soon. Since you seem to be
>> interested in getting this working, go ahead and scratch your itch.
>>
>
> Ok, I'll have a go and keep you updated as I progress.
>

OK -- use a new subdirectory of the svn input-conversion directory for
the code, and a new subdirectory of sources/ for the data.

>> It makes me think about whether we should be using differently restricted
>> schemas for different types of files. (A liturgy segment is not a
>> dictionary is not a bibliography is not a contributor list is not a notes
>> file...)
>>
>
> Maybe not. For the time being, I'll just add the Dictionary module to
> jlptei.xml and use that. After I progress, we can discuss whether it
> makes more sense to break it out or keep it in.
>

Yes. Splitting the schemas is something I've been keeping in the back
of my mind for a while. It might make validation (which is currently
very rudimentary) a bit easier to write and more accurate.

>> Is there a download link for the OSIS Strong's dictionary?
>>
>
> Yes, here's a link to the file's location in svn:
> http://code.google.com/p/open-scriptures/source/browse/trunk/data/strongs-dictionaries/hebrew/StrongHebrewG.xml
>

It looks like there's a reference in there to something with a 1980
copyright. Can we verify that it's either actually free or only
non-copyrightable elements are included in the file? Otherwise, we may
have to strip them out.

It looks to me like converting the dictionary entries themselves should
be relatively easy. Converting the header will probably be a bit more
difficult. See the JLPTEI docs on the individual file header, the
global bibliography and the contributor lists.

Ze'ev Clementson

Feb 4, 2010, 1:34:37 AM
to David Troidl, jewishlitu...@googlegroups.com
Hi David,

On Wed, Feb 3, 2010 at 9:31 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>> Is there a download link for the OSIS Strong's dictionary?
>>>
>>
>> Yes, here's a link to the file's location in svn:
>>
>> http://code.google.com/p/open-scriptures/source/browse/trunk/data/strongs-dictionaries/hebrew/StrongHebrewG.xml
>>
>
> It looks like there's a reference in there to something with a 1980
> copyright.  Can we verify that it's either actually free or only
> non-copyrightable elements are included in the file?  Otherwise, we may have
> to strip them out.

David: Efraim is referring to the TWOT copyright notice in the header
of your StrongHebrewG.xml file. Could you please comment on whether
any copyrightable material from TWOT was actually used in
StrongHebrewG.xml - if so, I may need to remove that material from the
JLPTEI version as it would compromise the Public Domain status of the
dictionary as well as the status of any composite documents that
incorporated material from the dictionary.

Thanks,
Ze'ev

Ze'ev Clementson

Feb 4, 2010, 11:38:32 AM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Wed, Feb 3, 2010 at 10:40 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:

> Hi,
>
> Ze'ev Clementson wrote:
>>
>> In order to add dictionary support to jlptei.xml:
>> 1. Add: <moduleRef key="dictionary" />
>> 2. Optionally, delete elements in the dictionary module that aren't
>> needed using:
>> <elementSpec mode="delete" module="dictionary" ident="xxxxxx" />
>>
>> Is that all there is to it, or am I missing some steps?
>>
>
> Yeah, from a schema generation perspective, that's it!

I added "<moduleRef key="dictionary" />" to the jlptei.xml file, did a
"make schema-clean" followed by a "make schema". Then, I did a diff of
the resulting jlptei.rnc file with a copy of the previous one. Aside
from the creation date, the two are the same (no extra elements added
for the dictionary module). Did I miss a step?

Thanks,
Ze'ev

Efraim Feinstein

Feb 4, 2010, 11:51:43 AM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> I added "<moduleRef key="dictionary" />" to the jlptei.xml file, did a
>

It's always the simple things... the module key is "dictionaries"
instead of "dictionary".
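
So the working customization reads (the elementSpec line repeats the
optional deletion step from earlier; superEntry is a real element of the
dictionaries module, chosen here only for illustration):

<moduleRef key="dictionaries" />
<elementSpec mode="delete" module="dictionaries" ident="superEntry" />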

Ze'ev Clementson

Feb 4, 2010, 12:26:40 PM
to jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 8:51 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> I added "<moduleRef key="dictionary" />" to the jlptei.xml file, did a
>>
>
> It's always the simple things... the module key is "dictionaries" instead of
> "dictionary"

Agghhh!

You would think that roma would at least give you an error/warning message...

- Ze'ev

Ze'ev Clementson

Feb 4, 2010, 2:03:46 PM
to jewishlitu...@googlegroups.com
Hi Efraim,

On Tue, Feb 2, 2010 at 2:10 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> I've just re-sync'ed and see that you've removed the intermediate
>> files. That's a lot tidier. One thing I did notice though is that the
>> code/input-conversion/Makefile needs to be fixed for the tanach-clean
>> target. At the moment, it's:
>>
>> tanach-clean:
>>        rm -f $(TEXTDIR)/text
>>
>> This doesn't remove anything.
>
> oops.  It used to.  An older version of the Tanach transformation code
> output to text/text ... fixed in r436.
>
>>  Also, the WLC code outputs directly to
>> the text directory. Shouldn't it output to a separate wlc text
>> directory instead (perhaps by setting TEXTTAG)?
>
> Probably, it should.  The text/ directory was originally going to be the
> place where texts were shared.  Now, the eXist database is.  So, the text/
> directory is more of a scratch area for transforms to dump their final
> output than anything else.  Fixed in r436.

Unfortunately, your fix also introduced a bug. You changed

From old code:

$(TEXTDIR)/Tanach.xml: $(WLC-CONVERSION-DIR)/tanach2jlptei.xsl2 $(TANACH-SOURCE)
	$(XSLT) -it main -o $(TEXTDIR)/Tanach.catalog.xml \
	  $(WLC-CONVERSION-DIR)/tanach2jlptei.xsl2 \
	  input-file=`$(LIBDIR)/absolutize $(TANACH-SOURCE)` \
	  result-directory=`$(LIBDIR)/absolutize $(TEXTDIR)`

To new code:

$(WLC-OUTPUT-DIR)/Tanach.xml: $(WLC-CONVERSION-DIR)/tanach2jlptei.xsl2 $(TANACH-SOURCE)
	$(XSLT) -it main -o $(WLC-OUTPUT-DIR)/Tanach.catalog.xml \
	  $(WLC-CONVERSION-DIR)/wlc2jlptei.xsl2 \
	  input-file=`$(LIBDIR)/absolutize $(TANACH-SOURCE)` \
	  result-directory=`$(LIBDIR)/absolutize $(WLC-OUTPUT-DIR)`

The WLC-OUTPUT-DIR changes are fine; however, you also changed
"tanach2jlptei.xsl2" -> "wlc2jlptei.xsl2" and there is no
"wlc2jlptei.xsl2" file in svn (maybe you have it in local changes
related to the new wlc conversion you're working on).

- Ze'ev

Efraim Feinstein

Feb 4, 2010, 2:11:18 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> The WLC-OUTPUT-DIR changes are fine; however, you also changed
> "tanach2jlptei.xsl2" -> "wlc2jlptei.xsl2" and there is no
> "wlc2jlptei.xsl2" file in svn (maybe you have it in local changes
> related to the new wlc conversion you're working on).
>

Yeah, that's it. In svn now.

Ze'ev Clementson

Feb 4, 2010, 3:39:00 PM
to david...@aol.com, Efraim Feinstein, jewishlitu...@googlegroups.com
Thanks David!

Efraim, as David indicates below, no content from the TWOT dictionary
was used, only the TWOT numbers (as a xref to the strongs numbers).
Should I leave them in the JLPTEI version of the doc or should I strip
them out? They could be useful for someone who wants to manually
lookup the TWOT material for a word (there are TWOT word lookup sites
available in addition to the paper version).

- Ze'ev

On Thu, Feb 4, 2010 at 12:29 PM, <david...@aol.com> wrote:
> Hi Ze'ev,
>
> On 2/4/2010 1:34 AM, Ze'ev Clementson wrote:
>> Hi David,
>>
>> On Wed, Feb 3, 2010 at 9:31 PM, Efraim Feinstein
>> <efraim.f...@gmail.com> wrote:
>>> Ze'ev Clementson wrote:
>>>>> Is there a download link for the OSIS Strong's dictionary?
>>>>
>>>> Yes, here's a link to the file's location in svn:
>>>> http://code.google.com/p/open-scriptures/source/browse/trunk/data/strongs-dictionaries/hebrew/StrongHebrewG.xml
>>>
>>> It looks like there's a reference in there to something with a 1980
>>> copyright.  Can we verify that it's either actually free or only
>>> non-copyrightable elements are included in the file?  Otherwise, we may
>>> have to strip them out.
>>
>> David: Efraim is referring to the TWOT copyright notice in the header
>> of your StrongHebrewG.xml file. Could you please comment on whether
>> any copyrightable material from TWOT was actually used in
>> StrongHebrewG.xml - if so, I may need to remove that material from the
>> JLPTEI version as it would compromise the Public Domain status of the
>> dictionary as well as the status of any composite documents that
>> incorporated material from the dictionary.
>
> I just responded to the previous email.  Only the numbers are used, no
> content of the dictionary.  And they only appear in @gloss, if you did
> want to remove them.
>
> Peace,
>
> David

Efraim Feinstein

Feb 4, 2010, 3:56:07 PM
to Ze'ev Clementson, david...@aol.com, jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> Efraim, as David indicates below, no content from the TWOT dictionary
> was used, only the TWOT numbers (as a xref to the strongs numbers).
> Should I leave them in the JLPTEI version of the doc or should I strip
> them out? They could be useful for someone who wants to manually
> lookup the TWOT material for a word (there are TWOT word lookup sites
> available in addition to the paper version).
>

Your choice. I don't think database keys are independently
copyrightable in the US (they might be in other countries, where the
concept of database copyright exists).

Speaking of database keys and dictionaries: Probably, we could have a
nearly complete liturgical dictionary if we had the BDB and Jastrow
(which has both Aramaic and a good deal of rabbinic Hebrew). Both are
public domain; neither is currently transcribed in a free format (as
far as I know).

That would give us at least 3 dictionaries: Strongs, BDB, and Jastrow.
How do we propose combining/linking dictionaries? Do we just link them
all separately and independently? Is there any way we should be marking
up correspondences to words generically? Is that even possible given
different dictionaries' organizations?
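
One hypothetical shape for the "link them all separately and
independently" option (file names and ids are invented), following the
stand-off model described earlier: one linkage file per dictionary, all
pointing at the same word ids:

<tei:link type="definition" targets="tanach.xml#v1w1 strongs.xml#strongs-7225"/>
<tei:link type="definition" targets="tanach.xml#v1w1 bdb.xml#bdb-1234"/>
<tei:link type="definition" targets="tanach.xml#v1w1 jastrow.xml#jastrow-5678"/>

Turning a dictionary on or off would then just be a matter of including
or excluding its linkage file.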

Ze'ev Clementson

Feb 4, 2010, 5:59:17 PM
to jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 11:11 AM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> The WLC-OUTPUT-DIR changes are fine; however, you also changed
>> "tanach2jlptei.xsl2" -> "wlc2jlptei.xsl2" and there is no
>> "wlc2jlptei.xsl2" file in svn (maybe you have it in local changes
>> related to the new wlc conversion you're working on).
>>
>
> Yeah, that's it.  In svn now.

I did an svn update (getting join-wlc.xsl2 & wlc2jlptei.xsl2), then did
a "make clean" and a "make", but am getting errors on the tanach build.

~/jewishliturgy/trunk $ make
java -Xms1024m -Xmx1024m -cp
"././lib/saxon9he.jar:././lib/resolver-1.2.jar:././common"
net.sf.saxon.Transform -ext:on
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
-r:org.apache.xml.resolver.tools.CatalogResolver -s
code/grammar-parser/xpointer.xml -o code/grammar-parser/xpointer.xsl2
././code/grammar-parser/grammar.xsl2
java -Xms1024m -Xmx1024m -cp
"././lib/saxon9he.jar:././lib/resolver-1.2.jar:././common"
net.sf.saxon.Transform -ext:on
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
-r:org.apache.xml.resolver.tools.CatalogResolver -s
code/grammar-parser/xptr-tei.xml -o code/grammar-parser/xptr-tei.xsl2
././code/grammar-parser/grammar.xsl2
java -Xms1024m -Xmx1024m -cp
"././lib/saxon9he.jar:././lib/resolver-1.2.jar:././common"
net.sf.saxon.Transform -ext:on
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
-r:org.apache.xml.resolver.tools.CatalogResolver -it main -o
code/transforms/xhtml/muxhtml.xsl2
././code/transforms/xhtml/muxhtml-generator.xsl2
/bin/cp ././code/grammar-parser/xpointer.xsl2 ././code/common
/bin/cp ././code/grammar-parser/xptr-tei.xsl2 ././code/common
java -Xms1024m -Xmx1024m -cp
"././lib/saxon9he.jar:././lib/resolver-1.2.jar:././common"
net.sf.saxon.Transform -ext:on
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
-r:org.apache.xml.resolver.tools.CatalogResolver -it main -o
././text/wlc/Tanach.catalog.xml
././code/input-conversion/wlc/wlc2jlptei.xsl2
input-file=`././lib/absolutize ./../sources/tanach/t3utf.dat`
result-directory=`././lib/absolutize ././text/wlc`
Recoverable error on line 352 of wlc2jlptei.xsl2:
FODC0005: java.io.FileNotFoundException:
/Users/bc/jewishliturgy/trunk/code/input-conversion/wlc/joined-wlc.xml
(No such file or directory)
Error on line 352 of wlc2jlptei.xsl2:
XTTE0570: An empty sequence is not allowed as the value of variable
$tanach-as-xml
Transformation failed: Run-time errors were reported
make: *** [text/wlc/Tanach.xml] Error 2

Ze'ev Clementson

Feb 4, 2010, 6:03:18 PM
to Efraim Feinstein, david...@aol.com, jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 12:56 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> Efraim, as David indicates below, no content from the TWOT dictionary
>> was used, only the TWOT numbers (as a xref to the strongs numbers).
>> Should I leave them in the JLPTEI version of the doc or should I strip
>> them out? They could be useful for someone who wants to manually
>> lookup the TWOT material for a word (there are TWOT word lookup sites
>> available in addition to the paper version).
>>
>
> Your choice.  I don't think database keys are independently copyrightable in
> the US (they might be in other countries, where the concept of database
> copyright exists).

I'll leave it in the converted doc then.

> Speaking of database keys and dictionaries: Probably, we could have a nearly
> complete liturgical dictionary if we had the BDB and Jastrow (which has both
> Aramaic and a good deal of rabbinic Hebrew).  Both are public domain;
> neither is currently transcribed in a free format (as far as I know).
>
> That would give us at least 3 dictionaries: Strongs, BDB, and Jastrow.  How
> do we propose combining/linking dictionaries?  Do we just link them all
> separately and independently?  Is there any way we should be marking up
> correspondences to words generically?  Is that even possible given different
> dictionaries' organizations?

David has done a lot of work with BDB, so I'll let him comment on
this. Once I've worked through the OSIS->TEI mappings, I'll have a
better understanding of the TEI dictionary options and might have
something to add at that time.

Ze'ev

Efraim Feinstein

Feb 4, 2010, 6:29:03 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
>
>
> I did an svn update (getting join-wlc.xsl2 & wlc2jlptei.xsl2), then did
> a "make clean" and a "make", but am getting errors on the tanach build.
>

Fixed in r438

Sorry.

Ze'ev Clementson

Feb 4, 2010, 6:47:37 PM
to jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 3:29 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>>
>> I did an svn update (getting join-wlc.xsl2 & wlc2jlptei.xsl2), then did
>> a "make clean" and a "make", but am getting errors on the tanach build.
>>
>
> Fixed in r438
>
> Sorry.

Still getting errors after "make clean" and "make":

././code/input-conversion/wlc/join-wlc.xsl2
input-directory=`././lib/absolutize ./../sources/tanach/WLC/Books`
output-file=././text/wlc/joined-wlc.xml
././lib/absolutize: line 12: cd: ./../sources/tanach/WLC: No such file or directory
Recoverable error on line 248 of join-wlc.xsl2:
FODC0005: java.io.FileNotFoundException:
/Users/bc/jewishliturgy/trunk/Books/Genesis.xml (No such file or directory)
Recoverable error on line 253 of join-wlc.xsl2:
FODC0005: Document has been marked not available:
file:/Users/bc/jewishliturgy/trunk/Books/Genesis.xml
Recoverable error on line 253 of join-wlc.xsl2:
FODC0005: java.io.FileNotFoundException:
/Users/bc/jewishliturgy/trunk/Books/Exodus.xml (No such file or directory)

etc for all books...

Efraim Feinstein

Feb 4, 2010, 7:47:55 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
>
> Still getting errors after "make clean" and "make":
>

The WLC source text itself has now been added.

I really should pick through it to make sure I remove the non-free parts
(the website xsl/js). They're redistributable, but under NC terms.

Ze'ev Clementson

Feb 4, 2010, 7:56:26 PM
to jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 4:47 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> Still getting errors after "make clean" and "make":
>>
>
> The WLC source text itself has now been added.
> I really should pick through it to make sure I remove the non-free parts
> (the website xsl/js).  They're redistributable, but NC termed.

Ouch, that's a lot of stuff. Don't you really just need the contents
of the Books subdirectory?

Efraim Feinstein

Feb 4, 2010, 7:59:35 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> Ouch, that's a lot of stuff. Don't you really just need the contents
> of the Books subdirectory?
>
Yes (which is the largest part of "a lot of stuff") [r440 removes almost
everything else]

Ze'ev Clementson

Feb 4, 2010, 11:39:07 PM
to jewishlitu...@googlegroups.com
On Thu, Feb 4, 2010 at 4:59 PM, Efraim Feinstein
<efraim.f...@gmail.com> wrote:
> Ze'ev Clementson wrote:
>>
>> Ouch, that's a lot of stuff. Don't you really just need the contents
>> of the Books subdirectory?
>>
>
> Yes (which is the largest part of "a lot of stuff") [r440 removes almost
> everything else]

Great - that's a lot tidier (it gets rid of the stuff that isn't
needed and also all the NC stuff), thanks!

I just did a "make" and the build works fine now. One thing I did
notice is that a "make clean" removes all of the non-svn objects
except for one: trunk/code/common/params.xsl2
You should probably update the trunk/code/Makefile to also remove that file.

Thanks again,
Ze'ev

Efraim Feinstein

Feb 4, 2010, 11:56:24 PM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> Great - that's a lot tidier (it gets rid of the stuff that isn't
> needed and also all the NC stuff), thanks!
>
> I just did a "make" and the build works fine now. One thing I did
> notice is that a "make clean" removes all of the non-svn objects
> except for one: trunk/code/common/params.xsl2
> You should probably update the trunk/code/Makefile to also remove that file.
>

It shouldn't be removed in make clean.

params.xsl2 is used as the way a user sets transformation parameters
that affect how the XSLT works. The version in svn is params.tmpl.xsl2
(a template file). If the user never creates a local params.xsl2, the
template is copied by make. Otherwise, the user's local copy is used.

As of now, the only thing it sets is the $debug-level parameter (which
can also be set from the command line), because all high-level
transformation parameters are going to be part of the feature structure
conditional system (that way, they can be set from JLPTEI settings files).
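
A sketch of that behavior as a make rule (the rule form and paths are
assumptions, not the actual Makefile):

code/common/params.xsl2: code/common/params.tmpl.xsl2
	test -e $@ || cp $< $@  # copy the template only if no local copy exists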

Ze'ev Clementson

Feb 5, 2010, 1:14:46 AM
to jewishlitu...@googlegroups.com

So, what do you do with your local copy? Do you just do "svn propset
svn:ignore params.xsl2 ." so that svn will ignore it?

Ze'ev

Efraim Feinstein

Feb 5, 2010, 9:01:26 AM
to jewishlitu...@googlegroups.com
Ze'ev Clementson wrote:
> So, what do you do with your local copy? Do you just do "svn propset
> svn:ignore params.xsl2 ." so that svn will ignore it?
As of now, it's just not committed. I haven't done it, but I don't see
that there would be any harm in using svn:ignore on the generated files.