glossary project


Marcus Bingenheimer

May 25, 2008, 2:20:26 AM
to iba...@googlegroups.com
Dear IBA group members,

Over the next two years we will make a major effort to integrate the different archives
we have produced at Chung-hwa and DDBC over the last decade into one
platform. The aim is to make all collections (http://www.ddbc.edu.tw/en/digital_archives/projects.html)
searchable in new ways, probably along the lines of the Collex tool at the Nines archive (http://nines.org/collex). Collex allows users to interact cleverly with a large number of archives on 19th-century scholarship. I wish we had something like this for Buddhism.

As part of this effort we want to collect and organize existing Buddhist glossaries and make them available in various ways (e.g. as a Firefox extension, an Open Office plug-in, an Opera widget, through an online interface...).
We will probably start with these four:

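a) Ding Fubao (丁福保), Chinese - Chinese (approx. 30,000 entries)
b) Soothill-Hodous, Chinese - English (approx. 16,800 entries)
c) CBETA Thesaurus (同義詞典), Chinese - Chinese (approx. 11,000 entries)
d) Chin-San-Manchu Glossary, Chinese - Sanskrit - Manchu (approx. 1,070 entries)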

There are more in the pipeline, especially Tibetan and Sanskrit related material.
We will only publish material that is either in the public domain, published under an Open Source license, or for which we receive written permission from the copyright holder.

We would like to ask those of you who hold databases related to Buddhist studies to make suggestions about how we should proceed with this. Which kinds of glossaries and indices, including new ones, would you like to see?

all the best

marcus




--
============================
Dr. Marcus Bingenheimer 馬德偉
Director, Library and Information Center 圖書資訊館館長
Dharma Drum Buddhist College 法鼓佛教研修學院 (DDBC)
No. 2-6 Xishihu, Jinshan 20842, Taipei County, Taiwan, R.O.C.
台灣,20842台北縣金山鄉西勢湖2-6號 Tel: +886-2-2498-7171 # 2381
http://buddhistinformatics.ddbc.edu.tw/~mb/

chris

May 30, 2008, 3:45:28 AM
to iba-net
Dear Marcus,

Thanks for bringing us up-to-date on your plans. It seems that DDBC
is continuing to inspire by its creative and imaginative use of
technology to support digital scholarship. The idea of collecting
glossaries is a good and important one and well within the scope of
IBA. There are some things I might be able to offer you, but I will
have to investigate legal issues before I can say more.

However, I think what you say about your planned use of Collex is very
interesting. Collex and the NINES project are an obvious example to
look at for the future development of IBA. I have briefly looked at
it several times, but have not done serious research using it (do you
know somebody in the field of 19th-century studies? It might be worth
asking what users in the field think about it). So my thoughts given
here are based only on what I can glean from the available
documentation and a bit of playing with the interface. The question I
am trying to figure out is of course how my vision of IBA differs from
Collex/NINES and what place there still is for Collex.

There is also an article which introduces some of the features from a
user's perspective at http://www.erudit.org/revue/ravon/2007/v/n47/016707ar.html
The description of Collex there says "the social software and faceted
browsing system that powers NINES, a “networked infrastructure for
nineteenth-century electronic scholarship.”" It works by asking the
participating projects to submit metadata on their collections and
resources to some kind of centralized infrastructure, but the digital
objects themselves remain in the custody of the participating projects.

This is the first problem I see when applying this to IBA: we would
probably think of a federation of Collex sites rather than one
centralized one? The two might not be mutually exclusive, but this has
to be carefully thought through. This point is in fact related to the
next one:

Collex requires users to log in to the system, and all the
user-generated data (which is the researcher's work) is held in the
system with no apparent way to get it out. I think this is pretty
unacceptable: the recent developments with Social Software sites
(SSS) have shown clearly that users have to own their content and
be able to control its flow. So within IBA, I hope we can come up
with an infrastructure that allows users to use services like those
provided by Collex without giving up ownership of their work (this is
something I tried to make clear in my presentation in February at
CBETA10/EBTI15). Since then there has been some development and it
looks like the SSS are figuring out how to get this right. And of
course IBA should use OpenID to start with.

There are some other quibbles I have, but they are minor things. This
is of course not to discourage you from basing your developments on
COLLEX. I hope you will keep us current on the developments and
thoughts you and your team have while working to realize this. Is
there a blog or other site where your process of creating this site is
traced? I think that would be interesting reading for all of us.

BTW, although it says that Collex is open source, I have not been able
to locate the source. Do you have any hints where to look?


All the best,

Christian


Simon Wiles

May 30, 2008, 5:14:48 AM
to iba...@googlegroups.com
Hi Christian,

> BTW, although it says that Collex is open source, I have not been able
> to locate the source. Do you have any hints where to look?

The source for Collex can be found in the SVN repository here:

https://subversion.lib.virginia.edu/repos/patacriticism/collex/trunk/


It's pretty buggy though, and clearly still needs quite a lot of work.

Take care,

Simon
simon...@gmail.com

Simon Wiles


Dharma Drum Buddhist College 法鼓佛教研修學院 (DDBC)
No. 2-6 Xishihu, Jinshan 20842, Taipei County, Taiwan, R.O.C.

台灣,20842台北縣金山鄉西勢湖2-6號 Tel: +886-2-2498-7171 # 2228

Marcus Bingenheimer

May 30, 2008, 6:24:38 AM
to iba...@googlegroups.com
Hi Christian,

I really meant " *along the lines* of the Collex tool".
We are still assessing whether the Collex software is the way to go or whether it is actually cheaper to develop something else.

We (that is, Simon, Joey and I) are pretty sure, however, that the Lucene/Solr stack is the way to go. Last time I looked at the query-time figures it was really lightning fast; Lucene is mature and scales well to the amount of data we might have to deal with.

> Collex requires users to log in to the system and all the user
> generated data (which is the researcher's work) is held in the system
> with no apparent way to get it out.

The interface at Nines does not really require users to register before searching across all archives this meta-archive offers. The login is for customization of certain features like collecting and tagging objects. True, this is the main promise of Collex, but in a first stage - say, until next summer - we will strive to build a front end that can search across multilingual collections and support "faceting" of information.
Customization features are not high on my agenda these days.
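
To make "faceting" a little more concrete, here is a minimal sketch in Python. It is not taken from Collex or Solr; the record fields (`lang`, `collection`) and the sample records are invented for illustration. The idea is simply that every search result set is summarized by counts per field value, and picking a value narrows the set.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are illustrative only.
RECORDS = [
    {"title": "Glossary entry A", "lang": "zh", "collection": "glossaries"},
    {"title": "Glossary entry B", "lang": "en", "collection": "glossaries"},
    {"title": "Catalogue record C", "lang": "zh", "collection": "catalogues"},
    {"title": "Catalogue record D", "lang": "sa", "collection": "catalogues"},
]


def filter_records(records, **selected):
    """Keep only records whose fields match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, field):
    """Count how many records fall under each value of one facet field."""
    return Counter(r[field] for r in records if field in r)


if __name__ == "__main__":
    # Facet over the whole set, then narrow by language and facet again.
    print(facet_counts(RECORDS, "lang"))
    narrowed = filter_records(RECORDS, lang="zh")
    print(facet_counts(narrowed, "collection"))
```

A real index engine computes these counts over its inverted index rather than by scanning records, but the user-facing behaviour is the same: counts first, then narrowing.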
 
> I think this is pretty
> unacceptable:

Agree completely. When customization happens it should be transparent and user-empowering.
 
> And of course
> IBA should use OpenID to start with.

Of course.
 
> Is
> there a blog or other site where your process of creating this site is
> traced?  I think that would be interesting reading for all of us.

Not yet. Though we will try to keep everybody in the loop.
The plan is to develop the technology to pull the 15+ different collections at DDBC together. This then could serve as a model to build an IBA site (other suggestions are welcome). As to one or many such IBA sites: since with Lucene/Solr/Collex we are talking about an index engine, the idea is that everybody keeps their own data, but lets IBA index it and provides stable URLs. So basically everybody can develop their own archive and interface, but there is one common gateway where material of academic value can be searched in intelligent ways, and that can assist with metadata, collection development, archiving, etc.
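
A rough sketch of that division of labour, in Python, might look like the following. Everything here is hypothetical (the `Gateway` and `IndexEntry` names, the record fields, the example.org URLs); it only illustrates the principle that the gateway holds metadata plus a stable URL, while the objects themselves stay with the contributing archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """What the gateway stores: metadata and a stable URL, never the object."""
    url: str      # stable URL supplied by the contributing archive
    archive: str  # name of the archive that keeps the actual data
    title: str


class Gateway:
    """Hypothetical common gateway that indexes metadata from many archives."""

    def __init__(self):
        self._entries = {}

    def __len__(self):
        return len(self._entries)

    def harvest(self, archive, records):
        """Index the records an archive exposes; the stable URL is the key."""
        for rec in records:
            entry = IndexEntry(url=rec["url"], archive=archive, title=rec["title"])
            self._entries[entry.url] = entry  # re-harvesting replaces, never duplicates

    def search(self, term):
        """Case-insensitive title search; results point back to the archives."""
        term = term.casefold()
        hits = (e for e in self._entries.values() if term in e.title.casefold())
        return sorted(hits, key=lambda e: e.url)


if __name__ == "__main__":
    gw = Gateway()
    gw.harvest("archive-a", [{"url": "https://a.example.org/item/1", "title": "Sanskrit glossary"}])
    gw.harvest("archive-b", [{"url": "https://b.example.org/item/9", "title": "Tibetan Glossary"}])
    for hit in gw.search("glossary"):
        print(hit.archive, hit.url)
```

Keying on the stable URL is the design choice that matters: an archive can change its interface or be re-harvested at any time without the gateway losing track of an item.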
As for the organizational structure of IBA we also might take a page out of the "book of Nines" (http://www.nines.org/contributors/boards.html). Their setup with a steering committee and various editorial boards for different content areas looks quite good to me.
Another model could be the TEI consortium, which in my eyes has proved democratic and transparent, while attaining a high academic standard.
The IBA community would supervise the technical as well as content development.
Perhaps we will have something like a Buddhist TEI one day.

all the best


marcus

Christian Wittern

May 30, 2008, 11:26:18 PM
to iba...@googlegroups.com
Marcus Bingenheimer wrote:
> Hi Christian,
>
> I really meant " *along the lines* of the Collex tool".
> We are still assessing whether the Collex software is the way to go or
> whether it is actually cheaper to develop something else.
>
> We (that is, Simon, Joey and I) are pretty sure, however, that the
> Lucene/Solr stack is the way to go. Last time I looked at the query-time
> figures it was really lightning fast; Lucene is mature and scales well
> to the amount of data we might have to deal with.
>
This is what I have been arriving at as well. Since February, I have
played around with Solr and have quite a good impression. It is fast
and it does faceted searching really well. One problem is that it does
not yet support Unicode Extension B, but that should come along sooner
or later.
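
For anyone wondering why Extension B tends to be a special case: this thread does not say where exactly the limitation lies, but these characters sit outside the Basic Multilingual Plane (U+20000 to U+2A6DF) and therefore need two UTF-16 code units each, which software that assumes one unit per character can mishandle. A small Python sketch, with helper names of my own choosing, shows the difference:

```python
EXT_B_FIRST = 0x20000  # first code point of CJK Unified Ideographs Extension B
EXT_B_LAST = 0x2A6DF   # last code point of the Extension B block


def is_extension_b(ch):
    """True if the single character lies in the CJK Extension B block."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


def utf16_units(ch):
    """Number of 16-bit code units needed to encode the character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


if __name__ == "__main__":
    bmp_char = "\u6F22"          # 漢, inside the Basic Multilingual Plane
    ext_b_char = "\U00020000"    # first Extension B ideograph
    for ch in (bmp_char, ext_b_char):
        print(f"U+{ord(ch):04X}", is_extension_b(ch), utf16_units(ch))
```

A test collection for any candidate engine should therefore include at least a few Extension B characters, both in indexed text and in queries.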

So I see Collex mainly as an example to learn from, not necessarily to
adopt wholesale, and it is nice to see that you seem to have a similar view.

>> Collex requires users to log in to the system and all the user
>> generated data (which is the researcher's work) is held in the system
>> with no apparent way to get it out.
>
> The interface at Nines does not really require users to register before
> searching across all archives this meta-archive offers. The login is for
> customization of certain features like collecting and tagging objects. True,
> this is the main promise of Collex, but in a first stage - say, until next
> summer - we will strive to build a front end that can search across
> multilingual collections and support "faceting" of information.
> Customization features are not high on my agenda these days.
>
>

Clearly you have to move step by step, but it is also good to know where
you want to get to :-)


>
>
>> Is
>> there a blog or other site where your process of creating this site is
>> traced? I think that would be interesting reading for all of us.
>>
>
>
> Not yet. Though we will try to keep everybody in the loop.
> The plan is to develop the technology to pull the 15+ different collections
> at DDBC together. This then could serve as a model to build an IBA site
> (other suggestions are welcome). As to one or many such IBA sites: since
> with Lucene/Solr/Collex we are talking about an index engine, the idea is
> that everybody keeps their own data, but lets IBA index it and provides
> stable URLs. So basically everybody can develop their own archive and
> interface, but there is one common gateway where material of academic
> value can be searched in intelligent ways, and that can assist with
> metadata, collection development, archiving, etc.
>

Here is where I think Collex has got it right: the collections only
expose data that can be harvested (but does not have to be). And since
Solr can also do distributed search, I think it is way better to talk
about a protocol (what we send over the wire, and what we expect back)
than to do all the indexing yourself. But I agree that either way could
probably work one way or another.
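
As an illustration of "what we send over the wire", here is a sketch in Python that builds a Solr-style distributed query URL. The parameter names (`q`, `wt`, `shards`, `facet`, `facet.field`) are Solr request parameters; the host names, the field names and the `build_query_url` helper are assumptions made up for this example, and nothing here reflects an agreed IBA protocol.

```python
from urllib.parse import urlencode


def build_query_url(base_url, query, shards=(), facet_fields=()):
    """Build a select URL that asks one Solr node to fan out to several shards."""
    params = [("q", query), ("wt", "json")]
    if shards:
        # Solr expects a comma-separated list of host:port/path entries.
        params.append(("shards", ",".join(shards)))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical participating archives, each running its own index.
    url = build_query_url(
        "http://gateway.example.org:8983/solr/select",
        "dharma",
        shards=["a.example.org:8983/solr", "b.example.org:8983/solr"],
        facet_fields=["lang", "collection"],
    )
    print(url)
```

If each archive only has to answer requests of this shape and return results in an agreed format, it matters much less who runs the index and where.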


> As for the organizational structure of IBA we also might take a page out of
> the "book of Nines" (http://www.nines.org/contributors/boards.html). Their
> setup with a steering committee and various editorial boards for different
> content areas looks quite good to me.
> Another model could be the TEI consortium, which in my eyes has proved
> democratic and transparent, while attaining a high academic standard.
> The IBA community would supervise the technical as well as content
> development.
> Perhaps we will have something like a Buddhist TEI one day.
>
>

Well, I wonder what the others think? But for the moment, it's probably
best to play around and see what works and what does not work.

All the best,

Christian

--

Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN

Christian Wittern

May 30, 2008, 11:26:52 PM
to iba...@googlegroups.com
Simon Wiles wrote:
>
> https://subversion.lib.virginia.edu/repos/patacriticism/collex/trunk/
>
>
Thanks. I will check it out.