On Sat, Apr 6, 2013 at 2:31 PM, Tony Proctor <
to...@proctor.net> wrote:
> It sounds like you may have been put off by a misconception Ben. FHISO (and
> especially me) would love you to make a submission to their CfP. However,
> they're not looking for complete standards or entire solutions at this
> stage.
>
The online CfP contains examples, which don't look too intimidating.
Let me mull over what I think the case would be (and get through the
current crunch time on Open Source Indexing) and we'll see what we
come up with.
> One of my functional requirements for the Data Model is that it have support
> for uncertainty in transcriptions. I didn't specify a technical solution but
> I would hope that the selected one would either be something else done
> within FHISO or a recognised standard. I prefer FreeUKGEN's UCF over things
> like TEI alternatives for many of the reasons you give. Plus a RegEx syntax
> is powerful from a real processing perspective, as opposed to simply being
> expressive. RegEx algorithms are well-established.
>
Regarding TEI, I really think it's important to separate the
underlying concepts (which are the product of decades of discussion by
the people with the most experience working with manuscripts) from the
idea of TEI as a data-entry format. There are a lot of objections to
having users type angle brackets, but hand-editing XML is not a
requirement for TEI. (I gave a talk on the topic at the TEI meeting
[transcript here:
http://manuscripttranscription.blogspot.com/2012/11/what-does-it-mean-to-support-tei-for.html
] and will be doing a similar one on your side of the pond at the end
of this month.) Programs may store transcripts in other formats and
use TEI for export and interchange, or (as is the case with
Papyri.info and Itinera Nova) they may use TEI internally but present
traditional print apparatus to editors and readers.
So I wonder what the appropriate yardstick to use is. It's very
easy to imagine scripts to convert the "narrative" concept in STEMMA
to a subset of TEI and back again, or to do the same for my own
FromThePage wiki syntax.
The same is likely true of most print notations.
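To give a rough idea of how small such a conversion script could be,
here is a minimal Python sketch. The [?reading?] notation for an
uncertain reading is made up for this example (it is not the actual
STEMMA or FromThePage syntax), and the function names are hypothetical.
The only TEI element used is <unclear>, so this covers a tiny subset at
best.

```python
import re
from xml.sax.saxutils import escape, unescape

# Hypothetical plain-text notation (an assumption for illustration, not
# FromThePage or STEMMA syntax): [?reading?] marks an uncertain reading.
UNCERTAIN = re.compile(r"\[\?(.+?)\?\]")
UNCLEAR = re.compile(r"<unclear>(.*?)</unclear>")


def notation_to_tei(text):
    """Convert the bracket notation to a TEI fragment using <unclear>."""
    escaped = escape(text)  # escape &, <, > before adding any markup
    return UNCERTAIN.sub(r"<unclear>\1</unclear>", escaped)


def tei_to_notation(fragment):
    """Convert a fragment whose only markup is <unclear> back again."""
    plain = UNCLEAR.sub(r"[?\1?]", fragment)
    return unescape(plain)  # restore &, <, > in the plain text


line = "Bapt. [?Marie?] d. of John & Ann Smith"
tei = notation_to_tei(line)
print(tei)
print(tei_to_notation(tei) == line)
```

The round trip works here only because the notation and the TEI subset
carry exactly the same information, which is the point of the
comparison that follows.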
I don't think it would be possible to convert most genealogical record
"indexing" transcription projects into TEI or printable editions,
however.
To use the example I know best--the FreeREG database of parish
registers--it seems nearly impossible to reconstitute the original
text of an entry from the abstracts we have transcribed in our
database. You cannot get from a baptismal entry like those described at
http://www.freereg.org.uk/howto/enterdata.htm#baptisms to an
approximation of the words on the original register -- there's no way
to know that the original was in Latin in order to print "baptizata"
instead of "baptized", for example. This is why I struggle with
whether to consider an indexing database as a peculiar kind of edition
or as an edition-like abstract. Once you add the needs of querying,
it seems like a different animal from a digital edition of family
letters altogether, though they both require support for uncertainty.
(Meanwhile the TEI folks are really struggling with ways to represent
financial records and other tabular material, though I'm not sure a
structured DB approach would work any better.)
Ben