Linked Data: Open Calais+Triplify


Joss Winn

Sep 14, 2009, 11:11:02 AM
to jiscpress
Hello all,

I just wanted to update you on what Alex and I have been cooking
up recently, and we welcome your comments.

As we've discussed previously, Alex has developed a plugin for
WordPressMU (WPMU) that operates as a background service on all blogs
across the platform, sending content to Open Calais
(http://opencalais.com/) and Yahoo's term extraction API
(http://developer.yahoo.com/search/content/V1/termExtraction.html) and
retrieving contextual tags (Open Calais) and extracted terms (Yahoo).
These are then stored in a separate WordPress DB table with links to
the corresponding blog posts. There are options in the plugin to
adjust the relevancy of the returned tags and to opt specific blogs
out of the process altogether. There's also the option to set the
process working at different intervals (hourly/twice daily/daily) or
to force it to run. When first installed, it will also work through
any existing content on the platform once.
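To make the storage step a little more concrete, here is a minimal Python sketch of a separate tag table with a relevance threshold and a per-blog opt-out. The table name `semantic_tags`, its columns, the `store_tags` helper and the 0.3 default threshold are all hypothetical: this thread doesn't describe the plugin's actual schema, and the real plugin is PHP running inside WPMU.

```python
import sqlite3

# Hypothetical schema: the plugin's real table layout isn't described in
# the thread, so every name here is illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS semantic_tags (
    blog_id   INTEGER NOT NULL,
    post_id   INTEGER NOT NULL,
    source    TEXT    NOT NULL,  -- 'calais' or 'yahoo'
    term      TEXT    NOT NULL,
    relevance REAL    NOT NULL,
    PRIMARY KEY (blog_id, post_id, source, term)
)
"""


def store_tags(conn, blog_id, post_id, source, tags,
               min_relevance=0.3, opted_out=frozenset()):
    """Store (term, relevance) pairs for one post; return how many were kept.

    Blogs listed in `opted_out` are skipped entirely, and tags scoring
    below `min_relevance` are discarded.
    """
    if blog_id in opted_out:
        return 0
    rows = [(blog_id, post_id, source, term, relevance)
            for term, relevance in tags if relevance >= min_relevance]
    # INSERT OR REPLACE means re-processing a post updates rows, not duplicates them.
    conn.executemany(
        "INSERT OR REPLACE INTO semantic_tags VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
kept = store_tags(conn, blog_id=2, post_id=10, source="calais",
                  tags=[("Open access", 0.8), ("JISC", 0.5), ("Monday", 0.1)])
print(kept)  # 2
```

Because of the composite primary key, a forced re-run over the same post replaces its rows rather than piling up duplicates.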

In addition, Alex's plugin analyses and indexes the stored tags and,
through a sidebar widget, displays 'related' posts based on the
relationships found across the platform through the Open Calais
and Yahoo tags. I like to think of the widget as a bit like AdSense
in Gmail, where ads are displayed that relate to the content of your
email. In terms of the JISCPress project, where every blog site is a
single document and every blog post a document section, each document
section has contextual and extracted tags (as well as normal, human
created tags) and the sidebar widget displays links to related
documents held on the JISCPress platform. I think Alex intends to
include the option of displaying links to related documents or links
to related document sections.
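The thread doesn't say how the widget ranks related posts, so the following Python sketch swaps in the simplest plausible approach: count the tags another post shares with the current one. Keys are (blog, post_id) pairs, matching the JISCPress mapping of blog to document and post to section; the `by_document` flag illustrates the documents-versus-sections option. The function name and data layout are hypothetical.

```python
from collections import Counter


def related(post_tags, target, limit=5, by_document=False):
    """Rank other posts (or whole documents) by tags shared with `target`.

    `post_tags` maps (blog, post_id) to a set of tags, which may mix
    Calais, Yahoo and human-created tags. With by_document=True, scores
    are summed per blog (i.e. per JISCPress document) and the target's
    own document is excluded.
    """
    target_tags = post_tags[target]
    scores = Counter()
    for key, tags in post_tags.items():
        if key == target:
            continue
        if by_document and key[0] == target[0]:
            continue
        shared = len(target_tags & tags)
        if shared:
            scores[key[0] if by_document else key] += shared
    return [key for key, _ in scores.most_common(limit)]


posts = {
    ("science", 1): {"open access", "jisc", "repositories"},
    ("science", 2): {"open access", "jisc"},
    ("policy", 4): {"repositories"},
    ("policy", 5): {"budget"},
}
print(related(posts, ("science", 1)))                    # [('science', 2), ('policy', 4)]
print(related(posts, ("science", 1), by_document=True))  # ['policy']
```

A raw shared-tag count favours heavily tagged posts; weighting by relevance or normalising by tag-set size would be natural refinements.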

So, that's the first piece of work he's done, and the code is zipped up
here:
http://code.google.com/p/jiscpress/downloads/list

Alex will add it to the SVN repo after making a few more changes:
http://code.google.com/p/jiscpress/source/checkout

Have a look at it, but I advise that you wait until we announce that
it's ready for general use.

The second piece of work Alex has done is to integrate Triplify
(http://triplify.org/Overview) with WPMU. Triplify is a small
application that is configured to look at a relational database and
produce an RDF (N-Triples) representation of it as a flat file. I've
written about Triplify here:
http://joss.blogs.lincoln.ac.uk/2009/04/27/triplify-make-your-blog-mashable/
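Triplify itself is PHP and is driven by SQL queries in its config file; the Python sketch below shows only the core idea of turning relational rows into N-Triples, one triple per non-null column. The URI pattern (base, class name, row id) and the Dublin Core predicates are assumptions chosen for illustration, and all values are emitted as plain string literals for simplicity.

```python
def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(base_uri, cls, rows, predicates):
    """Turn database rows into N-Triples lines, one triple per non-null column.

    Each row is a dict with an 'id' key; `predicates` maps a column name
    to the predicate URI it should be published under.
    """
    lines = []
    for row in rows:
        subject = f"<{base_uri}{cls}/{row['id']}>"
        for column, predicate in predicates.items():
            value = row.get(column)
            if value is None:
                continue
            lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


rows = [{"id": 1, "post_title": 'Hello "world"', "post_date": "2009-09-14"}]
predicates = {
    "post_title": "http://purl.org/dc/elements/1.1/title",
    "post_date": "http://purl.org/dc/elements/1.1/date",
}
print(rows_to_ntriples("http://example.org/triplify/", "post", rows, predicates))
```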

Until now, a Triplify configuration script has been available for
single-blog WordPress installs but not for WPMU. Alex has now written
a WPMU config script for Triplify, and a first draft of it is here:
http://code.google.com/p/jiscpress/source/browse/trunk/Triplify%20WPMU/config.inc.php
Some small changes still need to be made to that file, so don't go
using it in earnest just yet.
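One likely reason a single-blog config can't simply be reused (an assumption, not something stated in this thread) is that WPMU 2.x stores each blog's content in its own tables, prefixed `wp_<blog_id>_`, so table names have to be generated per blog. This Python sketch builds such per-blog queries using standard WordPress column names; it is not Alex's config.inc.php, and the `wpmu_queries` helper is hypothetical.

```python
def wpmu_queries(blog_ids, base_prefix="wp_"):
    """Build per-blog SQL queries, assuming WPMU's wp_<blog_id>_ table prefixes.

    A single-blog config can hard-code `wp_posts`; on WPMU each blog has
    its own tables, so the table names must be generated per blog.
    """
    queries = {}
    for blog_id in blog_ids:
        blog_id = int(blog_id)  # never interpolate untrusted text into SQL
        prefix = f"{base_prefix}{blog_id}_"
        queries[blog_id] = {
            "post": (f"SELECT ID, post_title, post_date FROM {prefix}posts "
                     "WHERE post_status = 'publish'"),
            "comment": ("SELECT comment_ID, comment_post_ID, comment_content "
                        f"FROM {prefix}comments WHERE comment_approved = '1'"),
        }
    return queries


q = wpmu_queries([1, 7])
print(q[7]["post"])
```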

What this does is provide an RDF/XML file of triples for each blog on
a WPMU platform. The first time a blog's Triplify endpoint is
requested, it also registers that blog with Triplify's registry
(http://triplify.org/Registry) so that crawlers can easily find it.
Here's an example of a WPMU JISCPress document's Triplify RDF, viewed
through Triplify's Exhibit viewer:
http://triplify.org/exhibit/?url=http%3A%2F%2Fscience.jiscpress.org%2Ftriplify%2F
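The example link above is just the blog's Triplify endpoint, percent-encoded into the viewer's `url` parameter. A short Python sketch that reproduces it, assuming a subdomain install like science.jiscpress.org (the helper names are hypothetical; registry registration isn't shown because its interface isn't described here):

```python
from urllib.parse import quote


def triplify_endpoint(blog_subdomain, domain="jiscpress.org"):
    """Per-blog Triplify URL, assuming a subdomain install like science.jiscpress.org."""
    return f"http://{blog_subdomain}.{domain}/triplify/"


def exhibit_url(endpoint):
    """Link to the Triplify Exhibit viewer; safe='' also encodes ':' and '/'."""
    return "http://triplify.org/exhibit/?url=" + quote(endpoint, safe="")


print(exhibit_url(triplify_endpoint("science")))
# http://triplify.org/exhibit/?url=http%3A%2F%2Fscience.jiscpress.org%2Ftriplify%2F
```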

Note that all of the document content is sourced from the JISCPress
platform. The Triplify output merely provides triples which link to
the original source content.

Once this is polished (a day or so from now), Alex will integrate it
with a Triplify plugin he is writing for WPMU which checks for
Triplify on the server and pushes the RDF/XML from each blog to Talis'
Connected Commons RDF triple store (http://www.talis.com/platform/cc/)
as a background process. This work has already started and we hope to
have a first version working this week. I wrote a bit about this idea
a while ago, too:
http://joss.blogs.lincoln.ac.uk/2009/04/29/getting-your-triples-into-talis-connected-commons/
and Talis seem to like the idea so we're hoping they accept our use of
the free Connected Commons platform.
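The thread doesn't describe how the plugin talks to Connected Commons, so the store URL below is a placeholder and no authentication is shown. This Python sketch only illustrates the shape of the background job: POST each blog's RDF/XML in turn and collect the blogs that failed so a later run can retry them. The `send` parameter lets the network call be swapped out, as in the offline example at the end.

```python
import urllib.request

# Placeholder only: the real Connected Commons store URL and any
# authentication it needs are not described in this thread.
STORE_URL = "https://store.example.org/meta"


def build_push_request(store_url, rdf_xml):
    """Prepare (but do not send) an HTTP POST carrying one blog's RDF/XML."""
    return urllib.request.Request(
        store_url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def default_send(request, timeout=30):
    """Send a request and return its HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def push_all(blog_rdf, store_url=STORE_URL, send=default_send):
    """POST each blog's RDF/XML in turn and return the blogs that failed."""
    failed = []
    for blog, rdf_xml in blog_rdf.items():
        try:
            status = send(build_push_request(store_url, rdf_xml))
        except OSError:  # URLError, HTTPError and timeouts all subclass OSError
            failed.append(blog)
            continue
        if not 200 <= status < 300:
            failed.append(blog)
    return failed


def fake_send(request):
    """Offline stand-in for the network call, for trying the loop out."""
    return 500 if b"broken" in request.data else 204


print(push_all({"science": "<rdf:RDF/>", "policy": "<broken/>"}, send=fake_send))  # ['policy']
```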

Finally, I'm hoping that Alex will agree to include all the tags
returned from Open Calais and Yahoo in the data that Triplify exposes
as RDF/XML. This would happen anyway with an existing Open Calais
plugin such as tagaroo, which stores tags in the normal WordPress tag
table, but our plugin uses a separate table, so the Triplify config
script will need to be aware of that. It's something we can change
for our own purposes while releasing the config script for use with a
generic WordPress install.
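Making Triplify "aware" of the separate table amounts to adding a query against it and publishing each stored tag as a triple about its post. In the real config that would be PHP/SQL; this Python sketch shows the idea. The `semantic_tags` table, its columns, the post URI pattern and the choice of Dublin Core `subject` as the predicate are all assumptions for illustration.

```python
import sqlite3

DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"


def tag_triples(conn, blog_id, post_uri_base):
    """Emit one N-Triples line per stored tag, read from the separate tag table."""
    rows = conn.execute(
        "SELECT post_id, term FROM semantic_tags WHERE blog_id = ? "
        "ORDER BY post_id, term",
        (blog_id,),
    )
    lines = []
    for post_id, term in rows:
        # Escape backslashes, quotes and newlines for an N-Triples literal.
        literal = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{post_uri_base}{post_id}> <{DC_SUBJECT}> "{literal}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE semantic_tags (blog_id INTEGER, post_id INTEGER, term TEXT)")
conn.executemany("INSERT INTO semantic_tags VALUES (?, ?, ?)",
                 [(2, 10, "Open access"), (2, 10, "JISC"), (3, 1, "Budget")])
for line in tag_triples(conn, 2, "http://science.jiscpress.org/triplify/post/"):
    print(line)
```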

The end result of all of this work is that JISCPress as a document
hosting platform will have each document tagged with contextual and
extracted tags. Each document, including those 'semantic' tags, will
be represented as RDF triples, and those triples will then be pushed
nightly to Talis Connected Commons. The potential for this is
considerable, not only for a platform like JISCPress but also for a
large WPMU platform such as wordpress.com.

Effectively it means that the 6 million blogs on wordpress.com could
have hosted RDF/XML representations, making a huge contribution to the
overall store of Linked Data. At this scale there are significant
resourcing issues, but I think we will be able to show a proof of
concept in a few days' time.

Thanks Alex!

Joss

Alex

Sep 14, 2009, 12:49:16 PM
to JISCPress
Just to reiterate a point Joss made, these plugins were not built with
scalability in mind.

I will post a message here when I've finished the two updates I want
to make.

Alex