Cocoon Sitemap + Jena Assembler = Peanut Butter + Strawberry Jam?

stub

Mar 19, 2008, 1:59:01 PM
to using Cocoon as a Semantic Platform
A bit of background:
Assembler descriptions are RDF frames used to instantiate RDF models
from a variety of sources (OWL or Turtle files, Jena SQL databases,
etc.) with different kinds of inference (RDFS, OWL, DIG/Pellet/FaCT++,
custom Jena inference rules), and also to set up SPARQL input datasets
based on those models.

Both models and rules may be specified inline, or in separate files.

The scoop:
http://jena.sourceforge.net/assembler/index.html
http://jena.sourceforge.net/assembler/assembler-howto.html
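To give a flavor of what these look like, here's a minimal sketch of an
assembler description, based on my reading of the howto above: an RDFS
inference model layered over a plain in-memory model loaded from a file.
The ex: resource names and the file path are invented for illustration;
check the howto for the authoritative ja: vocabulary.

```turtle
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix ex: <http://example.org/assembler#> .

# An inference model wrapping a file-backed base model.
ex:myModel a ja:InfModel ;
    ja:baseModel [ a ja:MemoryModel ;
                   ja:content [ ja:externalContent <file:ontology.owl> ] ] ;
    # Jena's built-in RDFS rule reasoner, selected by its well-known URL.
    ja:reasoner [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ] .
```

Swapping the reasoner URL (or pointing ja:baseModel at a database-backed
model) changes the inference or storage without touching any java code,
which is exactly the declarative flavor I'm talking about below.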

This facility is quite powerful, and potentially a strong complement
to the Cocoon sitemap. Consider the following similarities:

Sitemaps and Assemblers are both:
*Part of an open-source java framework with a liberal license
(Apache/BSD/MIT style)

*Declarative, rule based descriptions of application components

*Nestable in multiple dimensions (sitemap mounting, pipeline
nesting and aggregation, model imports, quoted models,
merged models)

*Modifiable and reloadable at runtime

*A "master" artifact used to coordinate action of many "slave"
artifacts (e.g. XSLT and Jena Rules)

*Able to concentrate different kinds of logic in a single
artifact, or to spread it over many artifacts

*Concentrated attempts to leverage standards-based web technology
using a problem-specific language encoded as data.

(Sitemaps are XML, Assembler Descs are RDF, so there is no
special parser for either one, but the semantics of each
language are carefully defined to match a problem domain).

...so there is a lot of philosophical overlap. On the other hand,
they offer mostly orthogonal functionality. Assemblers are used to
identify the location of ontologies in files and databases, and
how these ontologies should be combined and inferenced over to set
up a query (or other model operation). Sitemaps are used to define
pipelines of XML-compatible processing.
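For anyone on the Jena side who hasn't seen one, a sitemap pipeline looks
roughly like this (a minimal sketch with standard Cocoon components; the
match pattern and file names are made up):

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Match a request URI, read an XML source, transform, serialize -->
      <map:match pattern="report/*">
        <map:generate src="data/{1}.xml"/>
        <map:transform src="stylesheets/report.xsl"/>
        <map:serialize type="xhtml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```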

One thing Assemblers don't do is specify how outputs should be
connected to inputs in a multi-stage processing pipeline.
Assemblers aren't used to identify transformation/query/update
operations, either (they just set up the models/datasets and
the inference). But hey, those things are exactly what Cocoon
sitemaps are good for!

So, I guess it's obvious that I'm smelling a major peanut butter
and jelly scenario here (apologies to those allergic to peanuts.
Black beans and avocado? Laurel and Hardy? Cagney and Lacey?)

We have been experimenting with using the Assembler under Cocoon
for over a year. The combination is quite potent. There is some
code for doing this in the recent Peruser release, but it's not
fully exposed and it's not documented at all yet. Some parts of
the Peruser code were built before we were aware of the Assembler
stuff, hence we wound up duplicating some Assembler functionality,
and we have been thinking about how to make the integration tighter now.

If we imagine an application as a set of pipelines interacting with
semantic models (as well as other Cocoon-accessible resources: SQL,
web services, etc.), all based on open source and standards, the
possibilities are exciting (to me, anyway)! There are some
technical problems that come up, of course, having to do with naming,
caching, updates, among other things.

One thing to keep in mind is that there are at least two ways to
produce XML output from a model or dataset:
1) As RDF/XML (which can then become another model as needed)
2) As SPARQL XML Results Format

Each of these formats can of course be transformed with XSLT, and
there are examples of doing both in the current Peruser codebase.
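For illustration, here is roughly what the two formats look like for a
single statement/binding. The namespaces are the standard ones; the
resource URI and title are invented:

```xml
<!-- 1) RDF/XML: the model itself, re-parseable as another model -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <dc:title>Some Title</dc:title>
  </rdf:Description>
</rdf:RDF>

<!-- 2) SPARQL XML Results Format: variable bindings from a SELECT -->
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="title"/></head>
  <results>
    <result>
      <binding name="title"><literal>Some Title</literal></binding>
    </result>
  </results>
</sparql>
```

The first is natural when the result should feed another model; the
second is natural when you just want tabular data to style for a user.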

It's not terribly hard to set up a scenario where we do a
SPARQL query over some inferenced models, use that result to pull
in some SQL as XML data (using Cocoon SQL-Transformer) and some
more web service XML data (perhaps from an eXist-DB XQuery service),
XSL-Transform some of that XML into RDF/XML and then instantiate
some more models, do some more inference (perhaps using Pellet or
Fact++), execute another SPARQL query, and finally transform for user
output, all within a single pipeline (or multiple pipelines), all
run-time modifiable and with no custom java code.
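A sketch of how part of such a pipeline might read in a sitemap. Fair
warning: the "sparql" generator type here is hypothetical (stock Cocoon
doesn't ship one; I'm assuming a Peruser-style component), the
SQLTransformer is standard Cocoon, and every file name is invented:

```xml
<map:match pattern="mashup">
  <!-- 1. SPARQL query over the assembled, inferenced models
       (hypothetical generator type) -->
  <map:generate type="sparql" src="queries/select-things.rq"/>
  <!-- 2. Pull in SQL rows as XML via the standard SQLTransformer -->
  <map:transform type="sql">
    <map:parameter name="use-connection" value="mydb"/>
  </map:transform>
  <!-- 3. Reshape the combined XML into RDF/XML for re-instantiation -->
  <map:transform src="stylesheets/to-rdfxml.xsl"/>
  <!-- 4. Final transform for user output -->
  <map:transform src="stylesheets/to-html.xsl"/>
  <map:serialize type="xhtml"/>
</map:match>
```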

Until now I hadn't been ready to commit 100% to the idea that the
Assembler+Sitemap combination should be a major focus of work,
for a variety of reasons. But now, I am really starting to lean in
this direction.

I am wondering if many other people (from both the Cocoon and Jena
communities) are interested in pursuing this technology combination
as one of the pillars of our open source semantic application
platforms? If not, what do you see as the main sticking points
or better alternative pathways to building truly rich and dynamic
semantic applications using all open source technology?

If others are gung-ho about this idea, I'd like to discuss how we
can share ideas and work together.

Also, if anyone knows of other work going on in this area, I'd love
to hear about it.

peace,

Stu

P.S. At what point is this thing going to break my line into two ugly
pieces, exactly? My last post looked like crap...and seemed to be
broken at 70 chars, so I formatted this one to (mostly) fit in that.
This very long line is a test.