OSIS Bible example?


Russell Allen

May 11, 2012, 4:53:52 AM
to openscr...@googlegroups.com
Hi guys,

Does anyone know of an example OSIS file that fully uses its capabilities? It wouldn't need to be a whole Bible; even a single book would do, as long as it goes beyond unformatted chapters and verses and shows headers, paragraphs, poetry, etc.

Cheers, Russell




Peter von Kaehne

May 11, 2012, 10:10:40 AM
to openscr...@googlegroups.com
CrossWire publishes the OSIS XML file of its KJV2006.

As far as OSIS capabilities go, this is a pretty good one. It uses
extensive word-level markup (Strong's numbers, morphology, lemmata, etc.).
But much more is possible: it has no paragraphing, for example.
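
To make the word-level markup concrete, here is a minimal Python sketch that reads lemma attributes from `<w>` elements. The XML fragment is hypothetical, written in the style of such markup rather than copied from the KJV2006 file, and the `morph` value is only illustrative.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Hypothetical fragment in the style of word-level markup (not taken from KJV2006).
SAMPLE = f"""
<osis xmlns="{OSIS_NS}">
  <osisText osisIDWork="Sample" osisRefWork="Bible" xml:lang="en">
    <div type="book" osisID="Gen">
      <chapter osisID="Gen.1">
        <verse osisID="Gen.1.1"><w lemma="strong:H07225">In the beginning</w> <w lemma="strong:H0430">God</w> <w lemma="strong:H01254" morph="strongMorph:TH8804">created</w></verse>
      </chapter>
    </div>
  </osisText>
</osis>
"""


def words_with_lemmas(xml_text):
    """Return (osisID, word text, lemma, morph) for every <w> inside a <verse>."""
    root = ET.fromstring(xml_text)
    rows = []
    for verse in root.iter(f"{{{OSIS_NS}}}verse"):
        for w in verse.iter(f"{{{OSIS_NS}}}w"):
            rows.append((verse.get("osisID"), w.text, w.get("lemma"), w.get("morph")))
    return rows


for row in words_with_lemmas(SAMPLE):
    print(row)
```

This only covers container-style verses; a file that marks verses as milestones would need a different walk.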

There is, though, an example OSIS file on the CrossWire wiki.

Peter




David Troidl

May 11, 2012, 5:35:16 PM
to openscr...@googlegroups.com
Hi Russell,

The first thing that comes to mind is the set of "original" OSIS examples:
http://www.bibletechnologies.net./osistext/
I haven't looked at these in years, but they were intended to be
examples of the markup.

For the whole KJV, there's:
http://www.crosswire.org/~dmsmith/kjv2006/sword/kjvxml.zip

And I'm attaching my take on the book of Ephesians, to give you another
point of view. The nice thing about OSIS is that it allows a fair
amount of latitude. This also means it requires some design decisions.

Peace,

David
EphOsis.xml

Weston Ruter

May 11, 2012, 5:51:44 PM
to openscr...@googlegroups.com
> The nice thing about OSIS is that it allows a fair amount of latitude. This also means it requires some design decisions.

As an aside, I personally don't see why it is nice since it means that it is extremely difficult to develop an application that can consume an arbitrary OSIS XML document since all of the possible variations have to be accounted for. I feel there should be a strict subset of the current OSIS schema that specifies one and only one way to mark up a text.
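
One concrete case of this variation: OSIS lets a verse be encoded either as a container element or as a pair of empty milestone elements (`sID`/`eID`), so a consumer has to handle both. The Python sketch below is only an illustration under that assumption; the two sample fragments and the `verse_texts` helper are hypothetical and cover only the simplest cases.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
VERSE = f"{{{OSIS_NS}}}verse"

# The same two verses, encoded two different ways (hypothetical samples).
CONTAINER = (
    f'<chapter xmlns="{OSIS_NS}" osisID="Eph.1">'
    '<verse osisID="Eph.1.1">Paul, an apostle</verse>'
    '<verse osisID="Eph.1.2">Grace to you</verse>'
    "</chapter>"
)
MILESTONE = (
    f'<chapter xmlns="{OSIS_NS}" osisID="Eph.1"><p>'
    '<verse sID="Eph.1.1" osisID="Eph.1.1"/>Paul, an apostle<verse eID="Eph.1.1"/> '
    '<verse sID="Eph.1.2" osisID="Eph.1.2"/>Grace to you<verse eID="Eph.1.2"/>'
    "</p></chapter>"
)


def verse_texts(xml_text):
    """Return {osisID: text} for container verses and milestone verses alike."""
    root = ET.fromstring(xml_text)
    verses = {}
    current = None  # osisID of the currently open milestone verse, if any

    def add(text):
        # Text only belongs to a verse while a milestone verse is open.
        if current is not None and text:
            verses[current] = verses.get(current, "") + text

    def walk(elem):
        nonlocal current
        if elem.tag == VERSE and elem.get("sID"):    # milestone start
            current = elem.get("osisID") or elem.get("sID")
            verses.setdefault(current, "")
        elif elem.tag == VERSE and elem.get("eID"):  # milestone end
            current = None
        elif elem.tag == VERSE:                      # container verse
            verses[elem.get("osisID")] = "".join(elem.itertext())
        else:
            add(elem.text)
            for child in elem:
                walk(child)
                add(child.tail)

    walk(root)
    return verses


print(verse_texts(CONTAINER) == verse_texts(MILESTONE))
```

Even this small example needs two code paths for one construct, which is the kind of cost being described here.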

--
You received this message because you are subscribed to the Google Groups "Open Scriptures" group.
To post to this group, send email to openscriptures@googlegroups.com.
To unsubscribe from this group, send email to openscriptures+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/openscriptures?hl=en.





Russell Allen

May 11, 2012, 10:38:41 PM
to openscr...@googlegroups.com
Thanks David!

At the moment I'm using usfm and I have some homemade Python which does usfm->html, pdf, text etc. transformations.

I'm looking into redoing my workflow to use osis, primarily to make verification of the files easier.

Are there existing transform tools? I looked around but osis seems to exist in a vacuum - there is a spec, but that's it.

Or should I be sticking with usfm?

Cheers, Russell

Peter von Kaehne

May 12, 2012, 2:04:00 AM
to openscr...@googlegroups.com
usfm is the most common format we receive at Crosswire from outside
agencies.

So we have a reasonable set of transform scripts, which are under active
development: usfm2osis.pl, title_cleanup.pl and xreffix.pl.

You can find them in our svn repository under sword-tools.

Peter

Peter von Kaehne

May 12, 2012, 2:05:56 AM
to openscr...@googlegroups.com
The sword engine in turn can be used to do the transformations you want
to see afterwards, from OSIS.

Peter


Russell Allen

May 12, 2012, 4:15:55 AM
to openscr...@googlegroups.com
That's interesting, thanks.

Is osis sufficiently specified that I can be reasonably sure that if my files are compliant osis, your tools will work on them? Or do they work best on osis from usfm2osis.pl?

Last time I tried usfm2osis.pl it didn't work perfectly on my usfm, but if I did a one-off conversion then I could either hand-fix the output or write my own usfm->osis converter (which wouldn't be hard as I've already got the backend).

Russell

Peter von Kaehne

May 12, 2012, 6:09:31 AM
to openscr...@googlegroups.com
As Weston writes, the arbitrariness of some OSIS constructs makes it
difficult to capture all permutations in software.

At CrossWire we have a loosely (un)defined way we prefer, allowing for a
fair amount of permutation and variation, but not all.

usfm2osis.pl is an incomplete tool that is still growing. This means:

a) there are specific aspects left out on purpose, which are handled in
other scripts
b) it captures what we encounter, and if we encounter something else it
gets added. So if you use USFM tags in a correct fashion (as
defined by the USFM handbook from UBS) and usfm2osis.pl does not deal
with them, we will happily accept patches or (if your work with it is
of a sustained and useful nature) permit you to commit directly.

We have tried to keep usfm2osis.pl multiplatform and not dependent on
someone having Sword Perl bindings. Hence some aspects are not dealt with
properly, mainly cross-references. USFM allows any kind of reference,
in whatever language, but OSIS references are well defined. So

\x Matthaeus 15,6 \x*

needs to become

<note type="crossReference"><reference osisRef="Matt.15.6">Matthaeus
15,6</reference></note>

(I hope I got this right, no time to check)

To achieve that you need to parse the original reference, allowing both
for localised Bible book names and for a huge variety of localised ways
of marking references, including ranges, lists, etc. Note the comma in
the German reference; in an English one it would likely be a colon.
libsword can do this mostly without hiccups, but to incorporate libsword
into a Perl script you need to have functioning Sword Perl bindings. So
we left this out of usfm2osis.pl. There are other aspects, too:
title_cleanup.pl, for instance, creates titles which conform specifically
with what our frontends expect.
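
The reference parsing described above can be sketched in a few lines of Python. This is not how libsword or xreffix.pl does it; the `BOOKS` table, the regular expression and the `xref_to_osis` helper are hypothetical, and ranges, lists and numbered books (such as "1 John") are deliberately not handled.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lookup table: localised book names -> OSIS book abbreviations.
BOOKS = {"Matthaeus": "Matt", "Matthew": "Matt", "Johannes": "John", "John": "John"}

# Book name, chapter, a chapter/verse separator (comma or colon), verse.
REF = re.compile(r"^\s*(?P<book>[^\d]+?)\s+(?P<chapter>\d+)\s*[,:]\s*(?P<verse>\d+)\s*$")


def xref_to_osis(text):
    """Turn a single localised reference into an OSIS cross-reference note."""
    m = REF.match(text)
    if not m or m.group("book") not in BOOKS:
        raise ValueError(f"cannot parse reference: {text!r}")
    book = BOOKS[m.group("book")]
    osis_ref = f"{book}.{m.group('chapter')}.{m.group('verse')}"
    return (
        f'<note type="crossReference"><reference osisRef={quoteattr(osis_ref)}>'
        f"{escape(text.strip())}</reference></note>"
    )


print(xref_to_osis("Matthaeus 15,6"))
print(xref_to_osis("Matthew 15:6"))
```

Both calls produce the same `osisRef` while keeping the original wording as the visible text. A real converter would need a book-name table per language and a grammar for ranges and lists.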

But in summary, I can say we fix things as we go along, as we receive
USFM texts that the existing tool cannot yet handle.
USFM is so huge and of such variable quality that it would be a gigantic
undertaking to capture everything in one go.

Peter

Neal Audenaert

May 12, 2012, 9:45:15 AM
to openscr...@googlegroups.com
Throwing in my $0.02 from an academic perspective. There's a philosophical question about markup here. My experience is more with TEI than OSIS, so the goals of OSIS may (probably do) diverge a bit here, but these systems both address the same general needs: rich markup of texts. 

I'd argue that markup allows an editor to describe the structure of a text that he wants to communicate. Traditionally, that was done in printed books with page layout, typesetting, etc. Embedded markup allows people to be more precise, detailed, and explicit in what they describe. A markup vocabulary (e.g., OSIS or TEI) provides a set of tools for describing a range of features in many different texts according to a variety of editorial perspectives and goals. For example, an editor might want to describe the physical material a text was published on, the narrative structure of the text, textual criticism, rhyming analysis in a poem, literary allusions, etc. On this view, there is no sense in which a single text can illustrate all possible features of a markup vocabulary.

As for Weston's concerns, from TEI's perspective, this latitude is nice in that it allows editors the freedom to represent information in a way that is best suited to their needs. The downside, of course, is that you can't write software that supports "TEI documents"; you have to write tools to support certain features of TEI documents or documents created as part of a particular encoding project. TEI has gone to great lengths to avoid calling itself a standard. Instead, the TEI consortium provides a vocabulary for encoding documents, a set of guidelines for best practice, and a community of practice.

By focusing on editorial freedom, TEI excludes the possibility of writing one-size-fits-all software: you simply can't (er... shouldn't) write a program to handle all possible interesting features of all possible texts for all possible audiences. Alternatively, you could build a system that says "this is the way you have to edit your texts" and force editors to conform to the limitations of your system.

This maximal-flexibility-for-editors approach is what has allowed TEI to gain wide adoption within academic humanities circles, because it side-steps all of the editorial debates that academics find to be the most fruitful ground for discussion. It can do (almost) everything, and you build custom software to represent your content. With XML, that software might be as simple as an XSLT stylesheet that processes your document into something suitable for use in an off-the-shelf application.
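
As a rough illustration of how small such project-specific software can be, here is a sketch of the same idea in Python rather than XSLT: a template table that maps a handful of OSIS elements to HTML. The sample fragment, the `TEMPLATES` table and the HTML class names are all assumptions made for the example; anything not in the table is simply passed through as text.

```python
import xml.etree.ElementTree as ET
from html import escape

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"


def q(name):
    """Qualify a local element name with the OSIS namespace."""
    return f"{{{OSIS_NS}}}{name}"


# Hypothetical sample with a title, a poetry block and a prose paragraph.
SAMPLE = f"""<div xmlns="{OSIS_NS}" type="book" osisID="Ps">
<title>Psalm 23</title>
<lg><l>The LORD is my shepherd;</l><l>I shall not want.</l></lg>
<p>A prose paragraph.</p>
</div>"""

# One HTML template per supported OSIS element (an assumed, project-specific choice).
TEMPLATES = {
    q("title"): "<h2>{}</h2>",
    q("p"): "<p>{}</p>",
    q("lg"): '<div class="poetry">{}</div>',
    q("l"): '<span class="line">{}</span><br/>',
}


def to_html(elem):
    """Recursively render an element, escaping text and keeping tail text."""
    inner = escape(elem.text or "") + "".join(
        to_html(child) + escape(child.tail or "") for child in elem
    )
    return TEMPLATES.get(elem.tag, "{}").format(inner)


print(to_html(ET.fromstring(SAMPLE)))
```

The table encodes one project's editorial decisions; a document marked up under different conventions would need a different table, which is the trade-off described above.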

Bottom line: my take is that document markup systems should emphasize editorial freedom (that may be constrained by convention) and place the burden of interpreting documents on the shoulders of software developers because we value the intellectual contribution that a skilled editor brings to the table. 

Neal
--
Neal Audenaert
Institute for Digital Christian Heritage

David Troidl

May 12, 2012, 11:12:47 AM
to openscr...@googlegroups.com
Hi Russell,

On 5/11/2012 10:38 PM, Russell Allen wrote:
> Thanks David!
>
> At the moment I'm using usfm and I have some homemade python which does usfm->html, pdf,text etc transformations.
>
> I'm looking into redoing my workflow to use osis, primarily to make verification of the files easier.
>
> Are there existing transform tools? I looked around but osis seems to exist in a vacuum - there is a spec, but that's it.
Looking around, I found:
http://code.google.com/p/osis-converters/wiki/Compatibility
Crosswire.org used to have a whole bunch of information on OSIS and a
tool for USFM to OSIS conversion. Right now all Google links to their
Wiki seem to fail. Maybe someone from Crosswire could shed some light
on this.
>
> Or should I be sticking with usfm?
There has been some discussion on the developer list about this.
Apparently it is easier to program an editor to write USFM, whereas OSIS
has all the advantages of XML, for parsing and validation. Personally,
I just spent significant time and effort getting my translations from
ODF files to OSIS. And I am very happy with the results. I also
developed an XSL-FO transformation, so I can still get PDF output, when
I need it.
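
As an illustration of the parsing and validation advantage, the Python sketch below checks that a file is well-formed XML and that its verse IDs look sane and are unique. It is not schema validation (that would need a schema-aware validator and the OSIS schema); the `VERSE_ID` pattern is a simplification of the real osisID grammar, and the sample strings are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Simplified Book.Chapter.Verse pattern; the real osisID grammar is broader.
VERSE_ID = re.compile(r"^[A-Za-z0-9]+\.\d+\.\d+$")


def check_osis(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    seen = set()
    for verse in root.iter(f"{{{OSIS_NS}}}verse"):
        osis_id = verse.get("osisID")
        if osis_id is None:
            continue  # e.g. milestone end markers carry only an eID
        if not VERSE_ID.match(osis_id):
            problems.append(f"unexpected osisID: {osis_id}")
        elif osis_id in seen:
            problems.append(f"duplicate osisID: {osis_id}")
        seen.add(osis_id)
    return problems


# Hypothetical samples: one clean, one with a duplicate and a malformed ID.
GOOD = (
    f'<chapter xmlns="{OSIS_NS}" osisID="Eph.1">'
    '<verse osisID="Eph.1.1">a</verse><verse osisID="Eph.1.2">b</verse></chapter>'
)
BAD = (
    f'<chapter xmlns="{OSIS_NS}" osisID="Eph.1">'
    '<verse osisID="Eph.1.1">a</verse><verse osisID="Eph.1.1">b</verse>'
    '<verse osisID="Eph.1">c</verse></chapter>'
)

print(check_osis(GOOD))
print(check_osis(BAD))
```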

Peace,

David

Daniel Owens

May 12, 2012, 11:19:23 AM
to openscr...@googlegroups.com
I agree with Neal, though I see the problems that it introduces for software developers. The other day I started digitizing Abbott-Smith's Manual Greek Lexicon of the New Testament (https://github.com/dowens76/Abbott-Smith), and I have found TEI (with some help from OSIS, according to the schema defined by CrossWire.org) to be just the right tool for that job. There are some limitations, but for the most part it allows me to create rich markup. This satisfies the editorial interest in rich markup.

To deal with the lack of support within SWORD for certain far-flung features of TEI (i.e., the software developer challenge), I will probably make some changes to simplify the text when moving to create a SWORD module, perhaps using XSLT. I can leave the base text in rich TEI but then simplify for the sake of function.

Daniel

Peter von Kaehne

May 12, 2012, 11:28:10 AM
to openscr...@googlegroups.com
On 12/05/12 16:12, David Troidl wrote:
> Crosswire.org used to have a whole bunch of information on OSIS and a
> tool for USFM to OSIS conversion. Right now all Google links to their
> Wiki seem to fail. Maybe someone from Crosswire could shed some light
> on this.

Our server had a catastrophic hard drive failure and is being
rebuilt.

svn is working

ftp too

the wiki should be up and running today or tomorrow.

Peter

Daniel Owens

May 12, 2012, 11:16:33 AM
to openscr...@googlegroups.com
I agree with Neal, though I see the problems that it introduces for software developers. The other day I started digitizing Abbott-Smith's Manual Greek Lexicon of the New Testament ( https://github.com/dowens76/Abbott-Smith), and I have found TEI (with some help from OSIS, according to the schema defined by CrossWire.org) to be just the right tool for that job. There are some limitations, but for the most part it allows me to create rich markup. I will probably make some changes to simplify the text when moving to create a SWORD module, perhaps using XSLT.

Daniel


Russell Allen

May 13, 2012, 7:33:15 AM
to openscr...@googlegroups.com
Thanks guys. Interesting points.

Supplementary question: is there an existing tool to take a Bible in either usfm or OSIS and change the chapter/verse structure from NRSV to the standard Jewish system (e.g. JPS Tanach)? Or can I mark up multiple numbering systems in a single OSIS file?

Best, Russell

David Troidl

May 13, 2012, 11:38:32 AM
to openscr...@googlegroups.com
Hi Russell,

It is possible to mark up multiple numbering systems (see the OSIS
manual), but I would not recommend it. In the WLC:
https://github.com/openscriptures/morphhb/downloads
I used the Hebrew system, and just added notes where the English system
differs.

I don't know of any tool that does this directly.
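
For what a remapping step might look like, here is a minimal Python sketch that converts English-style verse IDs to Hebrew numbering. The `RULES` table is a deliberately tiny, hand-written illustration covering a few well-known differences; it is not a complete versification map, and `english_to_hebrew` is a hypothetical helper, not part of any existing tool.

```python
# Each rule: (book, English chapter, first verse, last verse, Hebrew chapter, verse offset).
# A tiny illustrative subset; real versification differences cover many more passages.
RULES = [
    ("Joel", 2, 28, 32, 3, -27),  # Joel 2:28-32 (English) -> Joel 3:1-5 (Hebrew)
    ("Joel", 3, 1, 21, 4, 0),     # Joel 3:1-21 (English)  -> Joel 4:1-21 (Hebrew)
    ("Mal", 4, 1, 6, 3, 18),      # Mal 4:1-6 (English)    -> Mal 3:19-24 (Hebrew)
    ("Ps", 3, 1, 8, 3, 1),        # Ps 3:1-8 (English)     -> Ps 3:2-9 (superscription is v. 1)
]


def english_to_hebrew(osis_id):
    """Map an English-numbered Book.Chapter.Verse ID to Hebrew numbering.

    IDs not covered by RULES are returned unchanged, which is only correct
    where the two systems agree or the table is complete for that book.
    """
    book, chapter, verse = osis_id.split(".")
    chapter, verse = int(chapter), int(verse)
    for rule_book, eng_ch, first, last, heb_ch, offset in RULES:
        if book == rule_book and chapter == eng_ch and first <= verse <= last:
            return f"{book}.{heb_ch}.{verse + offset}"
    return osis_id


for ref in ("Joel.2.28", "Mal.4.6", "Ps.3.1", "Gen.1.1"):
    print(ref, "->", english_to_hebrew(ref))
```

Renumbering IDs is the easy part; verses that are split or merged between systems, and cross-references that point into renumbered passages, would need separate handling.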

Peace,

David

Chris Little

May 13, 2012, 11:41:53 PM
to openscr...@googlegroups.com
We specifically designed OSIS to support multiple, concurrent reference
schemes, so I would not say that it is not recommended.

--Chris