Web-based development of a thesaurus, which tool?


Bastiaan Braams

Aug 29, 2013, 7:22:17 AM
to skos...@googlegroups.com
Greetings. I wonder if SKOSEd would be the right tool for me or what else I should consider. I want to work with a few colleagues from around the world on the construction of a small thesaurus. The area of interest for the thesaurus is a sub-field of atomic and molecular physics. I estimate that the number of preferred concepts or terms in the thesaurus will be about 100 with perhaps another 200 non-preferred terms, and the relations will be the simplest ones: narrower term, broader term, related term, and pointers between non-preferred and preferred terms. Small as it is, I still think that it would be nice to have a proper thesaurus editor at the disposal of the group. It would be very nice if everything were web-based and it would be ideal if I did not have to manage the web hosting other than by using some standard public wiki or web-hosting tools. The colleagues are subject-matter experts and I am not thinking about automatic vocabulary extraction tools; I just want to put up a draft as a start and have it open for editing (and commenting, somehow) by the colleagues.

Antoine Isaac

Aug 29, 2013, 8:21:04 AM
to skos...@googlegroups.com
Dear Bastiaan,

http://www.w3.org/2001/sw/wiki/SKOS gives access to a list of SKOS tools, including several online registries/editors.
You could use one of these after kick-starting a draft vocabulary offline in SKOSEd.

You can use the general SKOS list (public-...@w3.org) if you want to reach other tools' developers.

Best,

Antoine



Bastiaan Braams

Aug 30, 2013, 4:34:24 PM
to skos...@googlegroups.com
<<You could use one of these after kick-starting a draft vocabulary offline in SKOSEd.>>

That sounds good, and I tried to make a start on it. However, I am stuck already. If someone can direct me to appropriate documentation or to another group then please do so. Let me describe my experience.

I downloaded and installed Protege-4.3 on my Ubuntu system by following the instructions at protege.stanford.edu. I invoke Protege by doing './run.sh &' in the Protege-4.3 base directory. I want to open an ontology, any ontology, just to see what is there, so I do "File > Open from URL ...". First I try the pizza ontology as it was recommended somewhere as a sandbox, but it fails; I find out that the web page is down. Next I try the koala ontology; it sounds like it might be another sandbox. It seems to be empty, although I would not really know. I try the Wine ontology and it seems to have some content. Fine!

Next, on to the SKOS editor. I download version 2.0-alpha from code.google.com/p/skoseditor/, unpack the zip archive and move the file org.protege.skoseditor.jar into the Protege-4.3/plugins directory; I am just guessing that this is the thing to do. I close and open Protege again and I notice a SKOSEd button in the top menu bar. Under that button I see that I can import a SKOS-DL ontology and I can convert labels to annotations.

That was some time ago and I haven't made progress since. I can't figure out how to start a new SKOS thesaurus. I can't find any instructions on the web about how to use SKOSEd either; all that I find is how to download it.

Simon Jupp

Aug 30, 2013, 6:32:01 PM
to skos-dev
Hi Bastiaan,

There is a SKOSEd tab that you need to activate in Protege under Window > Tabs > SKOS view. You can then use the Asserted Concept Hierarchy panel on the right to start adding new Concepts. Try loading the simple skos file attached to see how it looks.

Protege is primarily an OWL ontology editor, so SKOSEd gives you a very OWL-centric view of SKOS. This can be useful, but can also be a distraction if you aren't that familiar with Protege and OWL. If you want a web-based solution, I heard that the Protege team are developing support for a collaborative SKOS editor for Web-Protege, but I'll have to check on the status of that.

If you have any more specific problems or questions about SKOSEd then feel free to contact me directly. 

Simon

Attachment: music.skos.owl.zip

Bastiaan Braams

Aug 31, 2013, 3:13:30 PM
to skos...@googlegroups.com
<<I heard that the Protege team are developing support for a collaborative SKOS editor for Web-Protege>>

I had been looking for that already and look forward to finding it in due time. My thesaurus needs are modest by the standards of SKOS and SKOS is modest compared to OWL. I think that I find SKOSEd too big for my purpose; and besides, I really want my project on the web for collaboration.

Desperation for a lightweight web-based collaborative treatment may drive me to use the Google Docs+Drive file system. I think the idea is not even crazy and so I would like to lay it out here. The Docs file system has some notable features. (1) The folder structure allows a file or folder to have multiple parents all treated equally, not like the links in Unix or the shortcuts in Windows. (2) Entity names are just tags; there can be multiple items that share the same name even within the same folder. (3) Every object has a description field. (4) Collaboration is possible and the rights of non-owners can be controlled per folder.

For my thesaurus I have in mind now to create a folder for every preferred term and a text file for every non-preferred term; the name of the folder or file will be the name of the concept. The folder structure (the containment relations) will be set up to represent the broader concept and narrower concept (BT/NT) relationship; as noted, a concept can have more than one BT and the file system handles it gracefully. A text file that represents a non-preferred term will belong to one folder if there is a one-to-one USE/UF relation. If the relation is indicated by USE+ then the file will belong to more than one folder. (I am inclined to understand it to mean: use one or more of the indicated preferred terms.) The related terms (RT) relation may be indicated by a file that I give the name _RT and that I assign to two or more folders to indicate mutual relations among the represented preferred concepts; as noted, there can be arbitrarily many files all called _RT. (The underscore only serves to set it off in an alphabetical listing.) If I want to use classification codes for the preferred terms then I may use files with a name in the style #<code> and assign such a file to a concept's folder to indicate the code for that concept. Scope notes (SN) for a preferred term go into the description field for the associated folder, but that description field may be used for additional notes (definitions, historical notes) as well.
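To make the mapping concrete, here is a minimal Python sketch of how the relations could be read off such a multi-parent containment structure. It is only an in-memory model, not a call to any Google API: the `Item` class, the `derive_relations` function, the item ids and the sample terms are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Item:
    """One Drive-like object; names are not unique, so items are keyed by id."""
    name: str
    kind: str                                    # "folder" (preferred term) or "file"
    parents: list = field(default_factory=list)  # ids of the containing folders
    description: str = ""                        # scope note or formal usage pointer


def derive_relations(items):
    """Read BT, USE, RT and classification codes off the containment structure."""
    bt, use, rt, codes = set(), set(), set(), {}
    for item in items.values():
        parent_names = [items[p].name for p in item.parents]
        if item.kind == "folder":
            # A folder inside another folder: the parent is a broader term.
            bt.update((item.name, parent) for parent in parent_names)
        elif item.name == "_RT":
            # One _RT file shared by several folders relates them pairwise.
            rt.update(tuple(sorted(pair)) for pair in combinations(parent_names, 2))
        elif item.name.startswith("#"):
            # A file named #<code> carries the classification code of its folder.
            for parent in parent_names:
                codes[parent] = item.name[1:]
        else:
            # Any other file is a non-preferred term; several parents mean USE+.
            use.update((item.name, parent) for parent in parent_names)
    return {"BT": bt, "USE": use, "RT": rt, "CODE": codes}


# Hypothetical sample data, not taken from the actual thesaurus.
items = {
    "f1": Item("Collision processes", "folder"),
    "f2": Item("Ionization", "folder", ["f1"]),
    "f3": Item("Recombination", "folder", ["f1"]),
    "f4": Item("Electron-impact ionization", "folder", ["f1", "f2"]),
    "d1": Item("EII", "file", ["f4"], "USE Electron-impact ionization"),
    "d2": Item("_RT", "file", ["f2", "f3"], "inverse processes"),
    "d3": Item("#A.1", "file", ["f2"]),
}
relations = derive_relations(items)
```

Each BT pair is stored as (narrower, broader), so the NT relation is the same set read in the other direction. Keying the items by id rather than by name is what lets several files all be called _RT.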

The non-preferred terms are represented by a text file, which also has a description field attached to it, besides which it may have textual content. I will be inclined to use the description field for the more formal usage pointers and to use the content of the text file, if it is to be used at all, for informal notes and discussion. Likewise the _RT files have a description field and textual content, and I'll use the description field for the terse formal explanation of the relationship and the textual content, if it will be used at all, for informal notes and discussion.

The folders and the files have a different icon in the listing, so there is a visual distinction between preferred and non-preferred terms. Color can be used for some other distinction. It might be used not at all, or it might be used in some informal way to indicate some class of concepts (individuals, actions, etc.). Items can be given a star and I might use that, if it is to be used at all, to indicate the most highly preferred terms.

There are drawbacks to this treatment, and I'll drop it if Web-Protege comes up with a lightweight Thesaurus/SKOS editor. One drawback is that Google Docs does not have a history recovery mechanism for folder structure. (They do have such a mechanism for file content.) It means that one needs to be careful and use a backup. The second drawback then follows: Google Docs does not have a suitable backup system. They have Takeout (www.google.com/takeout/), but it does not preserve the description fields and it has been flaky with respect to Google Docs [1]. The biggest drawback is that there isn't at this time a tool to extract a transportable representation of the thesaurus out of this Docs folder and file structure. However, it doesn't look so far-fetched to imagine such a tool.
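As a rough idea of what the output side of such an extraction tool might look like, the following Python sketch writes a set of thesaurus relations out as SKOS in Turtle syntax. It uses only the standard library and does not read anything from Google Docs. The namespace `http://example.org/thesaurus/`, the function names and the sample terms are assumptions made for illustration. Mapping a non-preferred term to `skos:altLabel` on each preferred concept it points to is one common simplification; it does not capture the "one or more of" reading of USE+.

```python
import re

BASE = "http://example.org/thesaurus/"   # hypothetical namespace


def iri(label):
    """Build a concept IRI from a label by reducing it to a URI-safe slug."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", label).strip("-").lower()
    return f"<{BASE}{slug}>"


def literal(text, lang="en"):
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_skos_turtle(preferred, broader, use, related):
    """
    preferred: preferred labels, one concept each
    broader:   (narrower, broader) label pairs   -> skos:broader
    use:       (non-preferred, preferred) pairs  -> skos:altLabel
    related:   (label, label) pairs              -> skos:related, both directions
    """
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for label in sorted(preferred):
        props = ["a skos:Concept", f"skos:prefLabel {literal(label)}"]
        props += [f"skos:altLabel {literal(n)}" for n, p in sorted(use) if p == label]
        props += [f"skos:broader {iri(b)}" for n, b in sorted(broader) if n == label]
        for a, b in sorted(related):
            if label in (a, b):
                props.append(f"skos:related {iri(b if label == a else a)}")
        lines.append(f"{iri(label)} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not taken from the actual thesaurus.
ttl = to_skos_turtle(
    preferred=["Ionization", "Recombination", "Electron-impact ionization"],
    broader={("Electron-impact ionization", "Ionization")},
    use={("EII", "Electron-impact ionization")},
    related={("Ionization", "Recombination")},
)
print(ttl)
```

A file in this form could then be imported into any SKOS-aware editor, which would address the transportability concern even if the editing itself stays in the folder structure.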

A charming idea beyond my immediate need is that the file structure that encodes the thesaurus could also be used to hold a bibliographical database for which the thesaurus is intended. The entries in the database could be RIS or BibTeX files and they can just live in their proper place in the structure. However, I don't really have this application in mind now; I just want to create the thesaurus.

[1] (2012-12-27) Google Drive files not showing up in Google Takeout.

tee toth

Jan 21, 2014, 2:08:46 PM
to skos...@googlegroups.com

On Friday, August 30, 2013 6:32:01 PM UTC-4, Simon wrote:
> There is a SKOSEd tab that you need to activate in Protege under Window > Tabs > SKOS view. You can then use the Asserted Concept Hierarchy panel on the right to start adding new Concepts. Try loading the simple skos file attached to see how it looks.

I got the SKOSEd tab no problem when I put the jar into the plugins directory and restarted Protege.
And I could also load your music skos file no problem (thanks for the example!) and it showed the broader relationships you had set up.

That is about how far I got with putting my own glossary into SKOS; i.e. only class relationships, which is nice if you want to build an ontology.

But where do I put the definitions of terms? Isn't providing definitions of words part of the purpose of SKOS? After all, it's part of the SKOS reference: http://www.w3.org/2009/08/skos-reference/skos.html#definition

So how do I define Opera as "The style of music where it's not over until the fat lady sings"? (and keep all the nice subclass relationships you set up).
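At the data level, independent of any editor, a definition is attached to a concept with the `skos:definition` property alongside its other properties. The Python sketch below builds such a description as Turtle text. The `http://example.org/music#` namespace, the local names and the helper function are hypothetical; they are not taken from the attached music file, and the sketch does not show the steps in the SKOSEd user interface.

```python
MUSIC = "http://example.org/music#"   # hypothetical namespace for illustration


def literal(text, lang="en"):
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_with_definition(local_name, pref_label, definition, broader=None):
    """Describe one concept; the definition sits next to the hierarchy, not instead of it."""
    props = [
        "a skos:Concept",
        f"skos:prefLabel {literal(pref_label)}",
        f"skos:definition {literal(definition)}",
    ]
    if broader is not None:
        props.append(f"skos:broader <{MUSIC}{broader}>")
    return f"<{MUSIC}{local_name}> " + " ;\n    ".join(props) + " ."


opera = concept_with_definition(
    "Opera",
    "Opera",
    "The style of music where it's not over until the fat lady sings",
    broader="Music",   # hypothetical parent concept
)
print("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n")
print(opera)
```

The `skos:broader` statement is kept as it is, so adding a definition does not disturb the existing hierarchy.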

Tee

tee toth

Jan 28, 2014, 11:52:50 AM
to skos...@googlegroups.com
I finally figured out the SKOSEd plugin, and wrote a first-timer's tutorial for it.

Using SKOSEd is easy, but not necessarily intuitive.

See http://code.google.com/p/skoseditor/wiki/SKOSEditorTutorial

-Tee
