A Modest OSIS Proposal
We at Open Scriptures need a consistent XML markup format for our documents. We tend to favor OSIS, but have encountered a few limitations. We are not interested in replacing OSIS or calling for a total rewrite, but several areas need to be addressed before we can fully commit to it.
There is a significant learning curve for OSIS adoption, which the manual can ease. However, the manual contains quite a number of errors, beginning with the 'First OSIS' example, the very place a beginner should be able to start. It would be useful to revive the manual as an ongoing project with regular updates. The PDF is essential, but a wiki approach may also be helpful. A common source document could feed both.
In discussing our current and future projects, we have identified a few areas of inadequacy in the current OSIS feature-set.
a) Remote headers: One requirement for fragment markup, and for effective document libraries and repositories, is the ability to link to a document containing a full header, rather than requiring the full header to appear in each valid OSIS document. This header could appear in another regular OSIS document, or in a separate 'bibliographic' document designed merely to contain header information. This is especially useful in getting away from monumental, monolithic documents, toward a more layered approach of linking variants, notes and lexical information to an existing source document.
b) Virtual elements: A second requirement for distributing valid OSIS fragments through web services is a form of virtual, or shadow, element to supply the context of the given fragment. A new global attribute indicating this virtual status is essential to distinguish such elements from the actual markup of the document.
c) Milestone paragraphs: To handle the conflicting hierarchies of Book-Chapter-Verse (BCV), Book-Section-Paragraph (BSP) and levels of quotes, OSIS already has milestonable elements. But in cases where BCV is the primary form of the document, as in the King James Version or the Robinson-Pierpont text, there are also paragraph break indicators. The most natural way of encoding these documents would be to maintain the BCV structure as the primary markup, but also to allow milestone paragraph markers. Even though there are various versification schemes, they have a historical foundation, and they are the structure most used in web service implementations. Paragraphing, by contrast, varies from one translation to another, and there is no consensus on which to base a definitive structure. We are asking for the 'p' element to be made milestonable.
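To illustrate request (c), here is a minimal sketch of how BCV-primary markup with milestone paragraphs might look and be consumed. It assumes the milestone form of 'p' would carry the same sID/eID attributes as the existing milestonable elements. The fragment, its paragraph IDs and the function name verses_by_paragraph are hypothetical, the OSIS namespace is omitted for brevity, and the sketch only handles paragraph breaks that fall on verse boundaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: verse containers stay the primary structure, and the
# proposed milestone <p sID/eID> pairs mark paragraph boundaries.
# The verse text is placeholder content and the OSIS namespace is omitted.
FRAGMENT = """
<chapter osisID="Gen.1">
  <p sID="Gen.1.p1"/>
  <verse osisID="Gen.1.1">verse text</verse>
  <verse osisID="Gen.1.2">verse text</verse>
  <p eID="Gen.1.p1"/>
  <p sID="Gen.1.p2"/>
  <verse osisID="Gen.1.3">verse text</verse>
  <p eID="Gen.1.p2"/>
</chapter>
"""


def verses_by_paragraph(xml_text):
    """Map each milestone paragraph ID to the verse osisIDs it encloses."""
    paragraphs = {}
    current = None  # ID of the paragraph that is currently open, if any
    for child in ET.fromstring(xml_text.strip()):
        if child.tag == "p" and "sID" in child.attrib:
            current = child.get("sID")
            paragraphs[current] = []
        elif child.tag == "p" and "eID" in child.attrib:
            current = None
        elif child.tag == "verse" and current is not None:
            paragraphs[current].append(child.get("osisID"))
    return paragraphs


PARAGRAPHS = verses_by_paragraph(FRAGMENT)
```

Because the paragraph markers are empty elements, the verse containers remain intact, and a web service can still address any verse directly while paragraphing is recovered as a secondary layer.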
MorphHB is our project for collating available texts of the Hebrew Bible. Currently, we are discussing markup issues revolving around the Westminster Leningrad Codex.
a) We are marking up qere readings using <rdg type="x-qere">. Given the importance of ketiv/qere readings in the Hebrew scriptures in general, we would like 'ketiv' and 'qere' to be official values for the type attribute on the rdg element.
b) For the sake of addressability in eclectic texts, we have chosen to mark up the Hebrew punctuation marks: maqqef, paseq and sof pasuq. To facilitate programmatic access and readability, we are using ASCII transliterations for attribute values: <seg type="x-maqqef">, etc. If it is deemed worthwhile, a separate punctuation element could be created. In any case, please add the attribute values maqqef, paseq, sof-pasuq, geresh, and gershayim for whichever punctuation markup element is chosen.
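As a rough illustration of both MorphHB requests, the sketch below builds punctuation segments and reads qere readings back out of a fragment. Everything beyond the <seg type="x-maqqef"> and <rdg type="x-qere"> forms quoted above is an assumption made for the example: the wrapping note element, the osisID, the placeholder words KETIV and QERE, and the helper names. Only the three marks we currently tag are mapped, and the namespace is again omitted.

```python
import xml.etree.ElementTree as ET

# Unicode code points for the three marks we currently tag.
PUNCTUATION = {
    "maqqef": "\u05BE",
    "paseq": "\u05C0",
    "sof-pasuq": "\u05C3",
}


def punctuation_seg(name, official=False):
    """Build a <seg> for one mark; the 'x-' prefix is kept until the value is official."""
    seg = ET.Element("seg", {"type": name if official else "x-" + name})
    seg.text = PUNCTUATION[name]
    return seg


def qere_readings(verse):
    """Collect the text of every qere reading, under either the current or the requested value."""
    return [
        "".join(rdg.itertext())
        for rdg in verse.iter("rdg")
        if rdg.get("type") in ("x-qere", "qere")
    ]


# Hypothetical verse: placeholder words stand in for the Hebrew text, and the
# <note type="variant"> wrapper around <rdg> is an assumption of this sketch.
VERSE = ET.fromstring(
    '<verse osisID="Example.1.1">'
    "<w>KETIV</w>"
    '<note type="variant"><rdg type="x-qere"><w>QERE</w></rdg></note>'
    "</verse>"
)
VERSE.append(punctuation_seg("sof-pasuq"))

QERE = qere_readings(VERSE)
SEG_TYPES = [seg.get("type") for seg in VERSE.iter("seg")]
```

If the requested values were adopted, passing official=True would emit type="maqqef" and so on, and the lookup in qere_readings already accepts both spellings, so existing documents would not need to change at once.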