(1) In my paper I cited the LSID as an example of an identifier from
which we could take some hints. One of the things I like about it
is that it is hierarchically organized and delegates the actual
coining of IDs to all of the parties involved, without the need for a
centralized authority. Most attempts to organize data with a
centralized authority have failed. So I see the format of the IBA-ID
(this is something off the top of my head, but I hope it shows the
direction) more like this:
iba:cbeta:m:T10N0279:cbeta
(the :m is for manifestation, vis the discussion in the paper about
the FRBR stuff, the trailing :cbeta indicates the cbeta version, as
opposed to iba:cbeta:m:T10N0279:taisho, which would indicate the
printed text. However as I said this needs a lot more careful
thinking)
Using the catalog, one might then discover that a different electronic
version exists at
iba:sat:Taisho-Vol10-No279
(the sat registry uses completely different conventions)
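A minimal sketch of how such an identifier might be split, assuming only that the first two colon-separated fields are "iba" and the name of the coining registry. Everything after that is left to the registry. The function names and the cbeta field layout (object type, text number, version) are illustrative assumptions, not an agreed convention:

```python
from typing import NamedTuple


class IbaId(NamedTuple):
    authority: str  # registry that coined the ID, e.g. "cbeta" or "sat"
    local: str      # the rest; its syntax is the registry's own business


def parse_iba_id(text: str) -> IbaId:
    """Split 'iba:<authority>:<local part>' without interpreting the local part."""
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0] != "iba" or not parts[1] or not parts[2]:
        raise ValueError(f"not an IBA-ID: {text!r}")
    return IbaId(authority=parts[1], local=parts[2])


def parse_cbeta_local(local: str) -> dict:
    """Hypothetical cbeta convention: <object type>:<text number>:<version>."""
    object_type, text_no, version = local.split(":")
    return {"object_type": object_type, "text_no": text_no, "version": version}


cbeta = parse_iba_id("iba:cbeta:m:T10N0279:cbeta")
sat = parse_iba_id("iba:sat:Taisho-Vol10-No279")
print(cbeta.authority, parse_cbeta_local(cbeta.local))
print(sat.authority, sat.local)  # the sat local part stays opaque
```

Only the generic parser needs to be shared by all parties; each registry can keep its own rules for the local part.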
(2) The microformat I was suggesting is indeed a microformat along the
lines of microformats.org. I think it muddies the waters to make
up an ad-hoc XML vocabulary and call that a microformat. The
usefulness of those at microformats.org is that they can piggyback on
the existing infrastructure without disturbing it. They achieve an
important goal without changing anything in that infrastructure, while
adding semantics that can be used by pieces of code that understand
them.
>
>
>
>> (the :m is for manifestation, vis the discussion in the paper about
>> the FRBR stuff, the trailing :cbeta indicates the cbeta version, as
>> opposed to iba:cbeta:m:T10N0279:taisho, which would indicate the
>> printed text. However as I said this needs a lot more careful
>> thinking)
>>
>>
>
> I agree too. (This does need more careful thinking)
> The microformat or tag-standard I envision is for digital objects only.
Which is reasonable enough. In the sample above, I introduced a sort of
"object-type-identifier" (m:), which makes it possible to identify what
kind of object this is an ID for. Maybe you can adopt this idea and insert
something like do: (or d: or o:) to make it extensible later?
> I do
> not think with the current man-power and distribution we can work out a
> Buddhist Studies ontology.
>
I quite understand, you would like to see something you could run with
right away, which is a reasonable and important request. If you follow
some basic ideas, I think it would be perfectly viable to start this,
without first having the whole ontology in place.
> Concerning the LSID let's remember the life sciences are building on 200
> years of experience to build a scientific taxonomy of life forms. Their
> legacy structures are much more extensive than the ones we face. Also I tend
> to believe that the natural sciences have an easier time to build
> non-ambivalent taxonomies than the Humanities.
>
We should not drive against the wall called taxonomies. But on the
other hand, we might want to avoid throwing everything in the same
basket. There should be a middle way between these extremes.
> The FRBR-structure is a clever thing, but reminds me a bit of topic-maps in
> the sense you can do everything and nothing with it.
>
Well... At least it is a bit closer to our domain.
>> Using the catalog, one might then discover that a different electronic
>> version exists at
>> iba:sat:Taisho-Vol10-No279
>>
>> (the sat registry uses completely different conventions)
>>
>
>
> Which catalog? What unifying system would this catalog use to relate all
> buddhist texts: all works, expressions, manifestations, and items which
> would include all different versions of digital objects?
>
Yes, somehow. But it need not be in one place, and it need not spring
forth complete from the start, like Athene.
> I suggest we take it one by one. An ontology of Buddhism would of course be
> nice to have and perhaps even useful.
> However, what I need at work and the sooner the better, is a mechanism that
> allows me to keep track of my digital resources that should be useful for
> others as well. This involves critically a file integrity check as I tried
> to provide for in the last draft.
>
>
But why do you need an ID for that? A hash code as part of your metadata
set should go a long way toward that. On the contrary, I think there is a
danger in overloading the ID with it; the LSID people have run into some
grief with their insistence on byte identity for their objects. But
maybe it takes some experiments to find out where to draw the line here.
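To illustrate the point, here is a rough sketch in which the checksum lives in the metadata record and the ID itself stays untouched when the bytes change. The record layout, the function names, and the choice of SHA-256 are assumptions made for the example only:

```python
import hashlib


def describe(identifier: str, data: bytes) -> dict:
    """Build a metadata record; the ID itself carries no checksum."""
    return {
        "id": identifier,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def is_intact(record: dict, data: bytes) -> bool:
    """True if the bytes still match the checksum stored in the record."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


original = b"placeholder bytes standing in for the file"
record = describe("iba:cbeta:m:T10N0279:cbeta", original)

corrected = original + b"\n"  # e.g. a later correction to the file
print(is_intact(record, original))   # integrity check on the archived copy
print(is_intact(record, corrected))  # a changed file fails the check, the ID survives
```

With this split, a corrected file only needs a new checksum in its record, not a new identifier.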
> I personally have my doubts about this kind of microformats. This is a
> discussion that has lasted a few years now and I tend to agree with the
> arguments forwarded by Obasanjo (
> http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=4ac2b018-2612-4eae-8070-fc83db842d38)
> and Harold (http://cafe.elharo.com/xml/must-ignore-vs-microformats/)
>
I would not really call those arguments. Of course, you should not use a
microformat instead of a fully developed API, which addresses
quite a different kind of problem. On the other hand, Harold is not
even bringing forward any argument, except to ask who cares if your
X(HT)ML is not valid. If you play by the rules, that is, use
namespaces as appropriate, there should also be no problem in using your
XML tools on microformats.
> However, I am ready to let myself be convinced. Why not have a microformat
> for Buddhist studies? I would prefer an XML feed, but there is no reason why
> we could not have both.
>
Exactly. As you know, I am also tossing around the idea of exchanging data
with Atom feeds, which would be a good wrapper for your tags.
So maybe the next step would be to identify what kind of information
items need to be packaged in such a microformat? You already started
that discussion with your sample.
> And of course I am happy to call the XML convention suggested in the draft,
> something else, if you are going to design a microformat in the HTML
> attribute style. My suggestion of "IBA-tag(?) cum IBA-no" allows users to
> reference a text, a library to archive a resource and perform an integrity
> check, and websites to embed it (provided there is a provider). Quite handy
> for a start, no?
>
Indeed, this would be a good start. We might call it "IBA exchange
vocabulary" until we come up with a better name. I see various uses for
this, for example:
* Identifying a text range in a blog post that comments on, translates, or
otherwise relates to it
* Grouping/linking text passages for comparison, thematic
similarities etc.
* Adding hidden information to a catalog display page so that tools like
Zotero can immediately grab the information and inject it into the user's
bibliography
These are just three off the top of my head.
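As a rough illustration of the first and third uses, the sketch below shows how a tool might pick such markers out of an ordinary HTML page. The class name "iba-id" and the use of the title attribute to carry the identifier are purely hypothetical; nothing of the kind has been agreed on yet:

```python
from html.parser import HTMLParser


class IbaIdCollector(HTMLParser):
    """Collect the title attribute of every element whose class list has 'iba-id'."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "iba-id" in classes and attributes.get("title"):
            self.ids.append(attributes["title"])


# A blog post fragment; the markup stays valid HTML for ordinary browsers.
page = """
<p>My notes on <span class="iba-id" title="iba:cbeta:m:T10N0279:cbeta">this
text</span>, compared with the
<a class="external iba-id" title="iba:sat:Taisho-Vol10-No279" href="#">SAT version</a>.</p>
"""

collector = IbaIdCollector()
collector.feed(page)
print(collector.ids)
```

Software that does not know the convention simply ignores the extra class and title values.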
> Let's see how far we can develop our ideas until Hanoi, where we can have
> some late-night discussions. Let's hope we will be able to hammer out some
> concrete conventions.
>
Well, I have to make a trip to Cambridge, MA between now and then, so I
can't say how much I will be able to take part in the former, but the
latter I am in, of course.
All the best,
Christian
--
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
Jenjo...@gmail.com wrote:
>
> Besides, on top of the IBA-id, we still need other resolution services to
> locate wanted objects and access services to retrieve them.
> It is very possible that every organization that agrees to join the IBA-id
> project would like to have its own local resolution and access
> service.
>
Yes, I was thinking of that. But then you would still need to have a
master list of resolution providers (=organisations) so that you can
route the request to other places. But however you implement it, a
hierarchical structure of identifiers starting with iba: is probably
what you will want to have.
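A small sketch of that routing idea, assuming the master list is nothing more than a table from authority name to the base URL of its resolution service. The URLs, the table, and the function name are invented for illustration:

```python
from urllib.parse import quote

# Hypothetical master list: authority name -> base URL of its resolver.
PROVIDERS = {
    "cbeta": "https://resolver.cbeta.example/",
    "sat": "https://resolver.sat.example/",
}

LOCAL_AUTHORITY = "cbeta"  # the authority this server answers for itself


def route(identifier: str) -> str:
    """Return 'local' or the URL to which the request should be forwarded."""
    scheme, _, rest = identifier.partition(":")
    authority, _, local = rest.partition(":")
    if scheme != "iba" or not authority or not local:
        raise ValueError(f"not an IBA-ID: {identifier!r}")
    if authority == LOCAL_AUTHORITY:
        return "local"
    if authority not in PROVIDERS:
        raise LookupError(f"unknown authority: {authority!r}")
    # Percent-encode the local part so that ':' and '/' survive in a URL path.
    return PROVIDERS[authority] + quote(local, safe="")


print(route("iba:cbeta:m:T10N0279:cbeta"))
print(route("iba:sat:Taisho-Vol10-No279"))
```

The list itself could be mirrored at every participating site, so no single server has to answer for everyone.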
> If the namespace can be identified from the iba-id, it could save
> computing time for a local server when deciding whether to search in the
> local database or
> to forward the query to the central catalog database.
>
>
I was thinking it would be better to avoid the need for one centralized
database.
>
> To me, a microformat is a kind of product that sits between HTML and
> XML, between human-readable and machine-readable formats.
> Because a microformat takes partial advantage of being HTML, it is
> valid and has some pre-defined display style.
> As such, the load of displaying a page with microformat + CSS is
> probably much lighter than the load of using dynamic XSLT + XML.
> But as many discussions have mentioned, a microformat is not as flexible as
> XML.
>
Microformats are conceived as attributes used on other XML (or HTML)
formats; as such they have to follow the XML recommendation, but since
they are only attributes, there is quite a serious limit to what you can
do with them. As I wrote to Marcus, I see them not as an alternative, but as
a first step in defining a more full-blown vocabulary that could be used
in things like Atom feeds, for a web service API and the like.
> So I think microformats would benefit the situation when we want to
> create a web page for both machines and humans at the same time.
> But if what we want to create is the metadata describing the
> digital object identified by an IBA-id, machine readability seems the more
> important requirement, and so an XML format
> might be more suitable here.
>
>
Yes, exactly.
> Others like Ralph and Louis have pointed out that there are existing
> identifier schemes like the successful ARK and DOI or the (less
> successful) PURLs.
> I have looked at these before writing up the first draft.
> ARK, DOI and PURL are attempts to provide persistent identifiers for
> digital objects, but the IBA TagNo tries to achieve two more things
> that go beyond that:
[snip-snip...]
Marcus, these features (checksums and distributivity) are desirable but
I do not see how they are mutually exclusive with the DOI standard. The
way I understand it, the IBA system proposed has these characteristics:
1. The smallest unit of information which uniquely identifies an
electronic resource managed under the IBA scheme is the identifier of
the form "iba:[authority]:[unique serial number]". The serial number is
unique within the scope of the authority responsible for it. So you
could have iba:ddbc:1234 and iba:uva:1234 and these would be identifiers
for different resources.
2. Based on what I read, the relationship between a given unique
identifier in point 1 and an MD5 checksum is a one-to-one relationship
and this relationship is permanent.
3. a) The unique identifier (see point 1 above) is a key into a
registry.
b) The registry contains meta-data about the electronic resource. The
meta-data is of two kinds:
* "Housekeeping" meta-data: the date the record was put in the registry,
who created it, checksum, etc.
* Meta-data about the resource itself: author, title, date of
publication, how to get the resource, etc.
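The three points above can be sketched roughly as follows. The class and field names are my own guesses at what such a registry record might hold, and all values are placeholders; only the overall shape (an identifier keying a record with housekeeping and resource meta-data, bound permanently to one MD5 checksum) comes from the description:

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Housekeeping:
    registered_on: date
    registered_by: str
    md5: str


@dataclass
class ResourceInfo:
    author: str
    title: str
    published: str
    location: str  # how to get the resource


@dataclass
class RegistryEntry:
    housekeeping: Housekeeping
    resource: ResourceInfo


registry = {}  # full identifier -> RegistryEntry


def register(identifier: str, entry: RegistryEntry) -> None:
    """Add an entry; an identifier, once registered, keeps its checksum."""
    existing = registry.get(identifier)
    if existing is not None and existing.housekeeping.md5 != entry.housekeeping.md5:
        raise ValueError(f"{identifier} is already bound to another checksum")
    registry[identifier] = entry


def make_entry(data: bytes, title: str) -> RegistryEntry:
    """Build a placeholder entry for some bytes (all values are invented)."""
    house = Housekeeping(date(2000, 1, 1), "someone", hashlib.md5(data).hexdigest())
    return RegistryEntry(house, ResourceInfo("unknown", title, "n.d.", "n/a"))


# Same serial number under two authorities: two different resources.
register("iba:ddbc:1234", make_entry(b"resource A", "Resource A"))
register("iba:uva:1234", make_entry(b"resource B", "Resource B"))
print(sorted(registry))
```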
I really don't see what prevents using a DOI of the form:
10.9999/ddbc:1234
(This example assumes that 9999 is the number assigned to IBA). It would
fulfill the same functions as:
iba:ddbc:1234
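To make the equivalence concrete, here is a small sketch of the mapping in both directions. The prefix 10.9999 is the assumed one from the example, not a real assignment, and the function names are invented:

```python
IBA_DOI_PREFIX = "10.9999"  # assumed prefix, as in the example; not a real assignment


def iba_to_doi(iba_id: str) -> str:
    """Rewrite 'iba:<authority>:<serial>' as '<prefix>/<authority>:<serial>'."""
    scheme, _, rest = iba_id.partition(":")
    if scheme != "iba" or not rest:
        raise ValueError(f"not an IBA-ID: {iba_id!r}")
    return f"{IBA_DOI_PREFIX}/{rest}"


def doi_to_iba(doi: str) -> str:
    """Inverse mapping, valid only for DOIs under the assumed IBA prefix."""
    prefix, _, suffix = doi.partition("/")
    if prefix != IBA_DOI_PREFIX or not suffix:
        raise ValueError(f"not a DOI under the IBA prefix: {doi!r}")
    return f"iba:{suffix}"


print(iba_to_doi("iba:ddbc:1234"))
print(doi_to_iba("10.9999/ddbc:1234"))
```

Since the mapping is purely mechanical, the two forms carry exactly the same information.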
Looking at the DOI standard I can't find any place where the meta-data
associated with a DOI number is *disallowed* to contain checksums. The
DOI standard, as I understand it, provides a lot of flexibility as
to what is encoded in the meta-data, which thus could conceivably contain a
checksum. One of the DOI mantras is that DOI was designed with multiple
business models in mind. Conceivably one such business model (e.g. the IBA
business model) could require the use of checksums.
Now, the DOI in this example is four characters longer but will be accepted by:
1. Software which is designed to detect, parse (as much as they can be
parsed...), resolve and otherwise manipulate DOIs.
2. Organizations which have provisions for DOI numbers. This can be as
banal as an indexing organization which has a database having a field
for DOI numbers but not for IBA numbers.
I'm not saying that DOI is specifically the standard to follow but if
IBA mints its own independent system of unique ids, it is going to
hinder interoperability.
I've looked briefly at ARK and I see there is no specific provision for
checksums in there right now but I am not convinced that there is no way
to provide such functionality. (Excuse the double negative.) The one
concern I have about ARK is that it does not seem fully baked yet.
My perspective here is technical, not political or financial. I am not
implying that these two other aspects can be neglected but I've read the
objection to DOI as technical in nature so I'm giving a technical
response.
Ciao,
Louis