Gathering Publications

Bram Vandeputte

Mar 23, 2010, 4:10:49 AM

to scit...@googlegroups.com

Hi all,

This message is an update on our ongoing work on the collection of publication metadata, and an effort to keep this work (that involves STELLAR members as well as others) synchronized with the rest of WP6.

Discussion is mainly taking place in a Google group (http://groups.google.com/group/scitel20).

We have also started to document the work on the Stellar Wiki:
http://www.stellarnet.eu/d/6/3/KULDocumentation

Here is a short overview:

BuRST:

extending: We want to add geographical data to affiliations. Would this be possible by extending BuRST?
feed paging mechanism:

How do you envision feeding large amounts of publication data? At some point we will need a paging or selection mechanism (e.g. by modification date). Possible options are agreeing on an RSS paging mechanism, or using OAI-PMH with (part of) a BuRST feed.
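
To make the OAI-PMH option a bit more concrete, here is a minimal Python sketch of harvesting by modification date and following resumption tokens page by page. It is only an illustration under assumptions: the base URL and the `burst` metadata prefix are hypothetical, and `fetch` is a placeholder for whatever HTTP call you prefer. Only the `ListRecords` verb, the `from` argument and the `resumptionToken` element come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix, modified_since=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token is not None:
        # A resumptionToken is an exclusive argument: only `verb` goes with it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if modified_since is not None:
            # Selective harvesting: only records changed on or after this date.
            params["from"] = modified_since
    return base_url + "?" + urlencode(params)


def harvest(base_url, metadata_prefix, fetch, modified_since=None):
    """Yield every <record> element, requesting one page at a time.

    `fetch` is any callable that takes a URL and returns the response XML
    (assumption: the caller supplies it, so no HTTP library is imposed here).
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, modified_since, token)
        root = ET.fromstring(fetch(url))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        has_token = token_el is not None and token_el.text
        token = token_el.text.strip() if has_token else None
        if not token:
            # A missing or empty token means the list is complete.
            return
```

For a quick try-out, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`. An RSS-based paging mechanism would have the same loop shape, with a "next page" link in the feed playing the role of the token.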

pub.fm API:

With pub.fm we are trying to offer a simple, commonly agreed API that lets tools and widgets extract information from the publication data without having to parse all BuRST feeds over and over.
This is the API we will make available on our datasets. If other datasets (like the Stellar OA) exposed this API, it would become easy to plug those tools into them. Likewise, if other Stellar widgets used this API, they could easily be put on top of every dataset that offers the pub.fm API.
link to documentation: http://www.stellarnet.eu/d/6/3/KULDocumentation#pub.fm_API

architecture:

We are trying to come up with a way of making all TEL-related BuRST feeds available through one central endpoint. Possible options are setting up our own "super" DB, which harvests from all available datasets (OA, ECTEL, ED-MEDIA, ...), making one existing repository the "super" repository, or agreeing on a common API and maybe putting some federated search on top...
link to documentation: http://www.stellarnet.eu/d/6/3/KULDocumentation#Architecture
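
As a rough illustration of the "common API plus federated search" option, the Python sketch below sends one query to several datasets and merges the answers. Everything in it is an assumption made for the example: each dataset is represented by a hypothetical callable that takes a query string and returns records as dictionaries with a `title` key, and duplicates are detected by a naive normalized-title match. It is not the pub.fm API.

```python
def federated_search(sources, query):
    """Query every source and merge the results.

    `sources` maps a dataset name to a search callable (hypothetical
    interface: query string in, list of dicts with a "title" key out).
    Returns (merged_records, errors_by_source).
    """
    merged = {}
    errors = {}
    for name, search in sources.items():
        try:
            records = search(query)
        except Exception as exc:
            # One unreachable dataset should not sink the whole search.
            errors[name] = str(exc)
            continue
        for record in records:
            # Naive duplicate detection: case- and whitespace-insensitive title.
            key = " ".join(record["title"].lower().split())
            entry = merged.setdefault(key, {**record, "sources": []})
            entry["sources"].append(name)
    return list(merged.values()), errors
```

The trade-off against a harvested "super" DB is visible even in this toy version: nothing has to be copied or kept in sync, but every search is only as fast and as reliable as the slowest dataset, and duplicate detection has to happen at query time.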

We would really like to get your feedback, and therefore propose a poll for organising the next FlashMeeting on this topic:
http://www.doodle.com/xtuytsz4cg24t2uf

greetings,

Bram

Erik Duval

Mar 25, 2010, 5:27:30 PM

to scit...@googlegroups.com, w...@lists.stellarnet.eu

Thanks to all who responded!

Seems like Tuesday 30 March @ 11 am will work best (sorry, Peter!). I have made a booking at

http://fm.ea-tel.eu/fm/2bd9fe-20968

I would suggest that we use this occasion for a one-hour open discussion on what we are all doing and where we want to go, so that we all understand each other's interests...

Best from Washington, on my way home,

--Erik Duval

http://erikduval.wordpress.com/

--
You received this message because you are subscribed to the Google Groups "SciTEL2.0" group.
To post to this group, send email to scit...@googlegroups.com.
To unsubscribe from this group, send email to scitel20+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/scitel20?hl=en.

Bram Vandeputte

Mar 30, 2010, 7:18:50 AM

to scit...@googlegroups.com

This is an attempt to post the OmniGraffle file of the architecture overview.

greetings,

Bram

Publications architecture.graffle

Thomas Ullmann

Mar 30, 2010, 12:03:23 PM

to SciTEL2.0

Hello Bram and SciTELs,

Thank you for putting this together.

I previously worked at KMi on a Semantic Search Engine visualization, so
I am somewhat familiar with Semantic Web technology. Personally I see
great potential in it for information management, especially for
enriching information with other (semantic) data sources.

In the average case this technology can help to mash different sources
together without the need to set up anything, by just using the
resources out there (e.g. the diverse SPARQL endpoints of the LOD
initiative). In other cases, especially with more complicated queries,
the time from query to result is simply too long. Users will not wait
for this.
I am also not sure how well the integration of several different
SPARQL endpoints works in practice.

Maybe we can brainstorm here about a demonstration Wookie widget
that shows some of the benefits of semantic data. For example,
we could build the following widget using DBLP, or later the OA:
- enter your name
- select the author you are interested in from a list of authors
with the same name
- show the publications of this person

Challenges:
- Matching the names to author instances (I know that we can use
regular expressions in SPARQL, but this could be a bit tricky).
- Parsing the result triples, either with JavaScript in the Wookie
widget (if there is a library), or by getting JSON either from the
SPARQL endpoint (if the DBLP endpoint supports this) or from our own
parser on a server.
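
The widget steps and the two challenges above could be sketched roughly as follows, here in Python for the server-side parser variant. This is only a sketch under assumptions: the vocabulary (`foaf:name`, `foaf:maker`, `dc:title`) is a guess at how a DBLP endpoint might model authors and publications and would need checking against the real data, and the function names are made up for the example. The result parsing relies only on the standard SPARQL JSON results layout (`head.vars` and `results.bindings`).

```python
import json

# Characters with a special meaning in SPARQL (XPath-style) regular expressions.
REGEX_META = set("\\.?*+(){}[]^$|")


def sparql_regex_literal(text):
    """Escape `text` so it matches literally inside FILTER regex("...")."""
    escaped = "".join("\\" + ch if ch in REGEX_META else ch for ch in text)
    # Second pass: escape backslashes and quotes for the SPARQL string literal.
    return escaped.replace("\\", "\\\\").replace('"', '\\"')


def author_query(name):
    """Steps 1 and 2: list author instances whose name contains `name`."""
    pattern = sparql_regex_literal(name)
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT DISTINCT ?author ?name WHERE {\n"
        "  ?author foaf:name ?name .\n"
        f'  FILTER regex(?name, "{pattern}", "i")\n'
        "}"
    )


def publications_query(author_uri):
    """Step 3: list the publications of the selected author (assumed vocabulary)."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?pub ?title WHERE {\n"
        f"  ?pub foaf:maker <{author_uri}> ;\n"
        "       dc:title ?title .\n"
        "}"
    )


def parse_bindings(results_json):
    """Flatten a SPARQL JSON result set into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: row[var]["value"] for var in variables if var in row}
        for row in data["results"]["bindings"]
    ]
```

Escaping the name before it goes into the regex filter takes care of the obvious tricky cases such as "J. Smith", but it does not solve real name disambiguation (initials, accents, name order). A regex filter also forces the endpoint to scan every name, which feeds into the response-time concern raised above.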

Best,

Thomas

Wolfgang Reinhardt

Mar 31, 2010, 7:51:55 AM

to scit...@googlegroups.com, Wolfgang Reinhardt

Hi all,

maybe I missed it in yesterday's discussion, but how do you plan to extract keywords from the publications? And are you only focused on "words", or also interested in "phrases"? In Paderborn we're using OpenCalais and Orchestr8 to extract keywords and key phrases from text, and this works pretty well. Some years ago we implemented something ourselves based on GATE, which was pretty prone to errors...

Best

Wolle
