On Apr 24, 2013, at 9:57 PM, Peter Murray <peter....@LYRASIS.ORG> wrote:
> On Apr 24, 2013, at 5:37 PM, Aaron Coburn <aco...@amherst.edu> wrote:
>>
>> Or, for something completely different (which is what I am actively experimenting with now), try handling ingests with a message broker [3] and an integration framework such as Camel [4] -- then everything is done asynchronously and can easily be spread over multiple machines.
>
> Interesting! Is this related to the microservices framework in Islandora? Some of the derivative processes (particularly the JP2 creation, it seems) can take quite a while -- and I haven't even gotten to audio and video yet -- so I'll probably be looking at other options.
It is the same basic idea -- both use asynchronous application messaging to handle distributed processing (that's a mouthful!). The existing microservices framework in Islandora is built on a series of Python scripts that listen to ActiveMQ over the STOMP protocol. This works really well, and I have written my own share of Python scripts for similar tasks. The nice thing about using Camel, though, is that you *don't actually write any code*. Plus, you can skip all of the boilerplate that an equivalent Python script requires (starting/stopping, connecting/reconnecting to the message broker, exception handling, logging, running as a service, etc.).
For example, this snippet of "code" implements pretty much everything FedoraGSearch does:
<route>
  <from uri="activemq:name.of.queue"/>
  <!-- recipientList evaluates ${header.pid} per message; a plain <to> URI is static -->
  <recipientList>
    <simple>http4://fedora-host:8080/fedora/objects/${header.pid}/objectXML?authUsername=...&amp;authPassword=...</simple>
  </recipientList>
  <convertBodyTo type="org.w3c.dom.Document"/>
  <to uri="xslt:file:///path/to/stylesheet"/>
  <to uri="http4://solr-host:8983/solr/update"/>
</route>
And deploying this is as simple as copying the XML file to a directory (if you are using Karaf as a container).
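For anyone who has not deployed to Karaf before, here is a minimal sketch of what such a file might look like. It assumes the Camel blueprint support is installed and that an "activemq" component is already configured in the container; the queue name and the log endpoint are only placeholders, not my actual setup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Blueprint wrapper: Karaf picks this up when the file is dropped into its deploy directory -->
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
  <camelContext xmlns="http://camel.apache.org/schema/blueprint">
    <route>
      <!-- Placeholder queue name; assumes an "activemq" component exists in the container -->
      <from uri="activemq:name.of.queue"/>
      <!-- Placeholder endpoint: just log each message that arrives -->
      <to uri="log:ingest"/>
    </route>
  </camelContext>
</blueprint>
```

Removing the file from the deploy directory should stop the route again, which makes it easy to try things out.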
In production, it is a bit more involved, because I aggregate, filter and split up the messages to send them to a number of different endpoints (Solr for search, Jena for linked data, CouchDB for intermediate data caching, Proai for metadata harvesters, etc.). But all of that ends up being really easy to implement and maintain.
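To give a rough idea of the filtering and fan-out part, here is a simplified sketch. The filter condition and the downstream queue names are made up for illustration and are not my production routes:

```xml
<route>
  <from uri="activemq:name.of.queue"/>
  <!-- Hypothetical filter: only pass along messages that carry a pid header -->
  <filter>
    <simple>${header.pid} != null</simple>
    <!-- Send a copy of each message to several downstream queues (hypothetical names) -->
    <multicast>
      <to uri="activemq:index.solr"/>
      <to uri="activemq:index.jena"/>
      <to uri="activemq:cache.couchdb"/>
    </multicast>
  </filter>
</route>
```

Each downstream queue can then have its own small route, so one slow endpoint does not hold up the others.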
Aaron