As Weston writes, the arbitrariness of some OSIS constructs makes it
difficult to capture all permutations in software.
At CrossWire we have a loosely (un)defined way we prefer, allowing for a
fair amount of permutation and variation, but not all.
usfm2osis.pl is an incomplete tool that is still growing. This means
a) some specific aspects are left out on purpose and are handled in
other scripts;
b) it captures what we have encountered, and when we encounter something
else, that gets added. So if you use USFM tags correctly (as defined by
the USFM handbook from UBS) and usfm2osis.pl does not deal with them,
we will happily accept patches or (if your work on it is of a sustained
and useful nature) permit you to commit directly.
We have tried to keep usfm2osis.pl multiplatform and not dependent on
someone having the Sword Perl bindings. Hence some aspects are not dealt
with properly, mainly cross-references. USFM allows any kind of
reference, in whatever language, but OSIS references are well defined. So
\x Matthaeus 15,6 \x*
needs to become
<note type="crossReference"><reference osisRef="Matt.15.6">Matthaeus
15,6</reference></note>
(I hope I got this right, no time to check)
To achieve that you need to parse the original reference, allowing both
for localised Bible book names and for a huge variety of localised ways
of marking references, including ranges, lists etc. Note the comma in
the German reference; in an English one it would likely be a colon.
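To make the parsing problem concrete, here is a minimal sketch in Python.
It is not what usfm2osis.pl or libsword does; the BOOK_NAMES table and the
functions parse_reference and crossref_to_osis are hypothetical names made
up for this illustration. It handles only "Book chapter,verse" and a simple
verse range within one chapter, and takes the chapter/verse separator as a
parameter.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lookup table: localised book name -> OSIS book abbreviation.
# A real converter would need a complete table for every language it supports.
BOOK_NAMES = {
    "Matthaeus": "Matt",
    "Matthäus": "Matt",
    "Markus": "Mark",
    "Matthew": "Matt",
}


def parse_reference(text, cv_sep=","):
    """Return an osisRef such as 'Matt.15.6', or None if the text is not understood.

    cv_sep is the chapter/verse separator (',' in German, ':' in English).
    Only 'Book chapter<sep>verse' and 'Book chapter<sep>verse-verse' are handled.
    """
    pattern = re.compile(
        r"^\s*(?P<book>.+?)\s+(?P<chapter>\d+)"
        + re.escape(cv_sep)
        + r"\s*(?P<verse>\d+)(?:\s*-\s*(?P<end>\d+))?\s*$"
    )
    match = pattern.match(text)
    if match is None:
        return None
    book = BOOK_NAMES.get(match.group("book"))
    if book is None:
        return None
    chapter = match.group("chapter")
    start = f"{book}.{chapter}.{match.group('verse')}"
    if match.group("end"):
        # OSIS spells out both ends of a range in full.
        return f"{start}-{book}.{chapter}.{match.group('end')}"
    return start


def crossref_to_osis(text, cv_sep=","):
    """Wrap the text of a cross-reference in an OSIS note element."""
    label = escape(text.strip())
    ref = parse_reference(text, cv_sep)
    if ref is None:
        # Leave unresolved references untagged rather than guessing.
        return f'<note type="crossReference">{label}</note>'
    return (
        '<note type="crossReference">'
        f"<reference osisRef={quoteattr(ref)}>{label}</reference></note>"
    )


print(crossref_to_osis("Matthaeus 15,6"))
```

Even this toy version needs a per-language book table and a per-language
separator, and it still ignores lists, cross-chapter ranges and
abbreviations, which is where the real effort goes.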
libsword can do this mostly without hiccups, but to incorporate libsword
into a Perl script you need functioning Sword Perl bindings. So we left
this out of usfm2osis.pl. There are other aspects.
title_cleanup.pl creates titles which conform specifically to what our
frontends expect.
But in summary, I can say that we fix things as we go along, as we get
USFM texts that we need to deal with but cannot handle with the existing
tool. USFM is so huge and of such variable quality that it would be a
gigantic undertaking to really capture everything in one go.
Peter