Strategy for Synchronization with Institutional Database


Roger Hyam

Jul 19, 2012, 9:31:29 AM
to island...@googlegroups.com
Hi,

We have several institutional databases that contain both core business objects (e.g. specimen records) and organizational objects (e.g. people in ActiveDirectory and other places).

We want to have these objects represented by objects in our Islandora repository and have cron jobs run periodically to keep the repository updated with changes in the source dbs.

I have done something similar to this before by just indexing multiple resources using Solr, but Islandora would act as a "sticky index" where objects remain in their final state even if they drop out of the institutional databases. The repository will also contain a lot of stuff that doesn't reside anywhere else.

What is the best strategy for developing these synchronization scripts? Would it be better to develop a Drupal module for each script or have free standing command line programs? I am comfortable hacking in PHP and Java.

I'm sure this kind of thing is done quite commonly. Is there any documentation or any example code?

Many thanks,

Roger

Mark Leggott

Aug 10, 2012, 12:01:38 PM
to island...@googlegroups.com
Hi Roger,

Not sure if you got an answer to this offline or not, so I thought I would add a couple of comments.

- We have some examples of sync with external/enterprise systems, but as you can imagine they are highly customized to the local systems.
- We have an example here at UPEI where we built integration with the locally developed financial system. When a scanned set of PO documents was uploaded, we queried the financial system for that PO record, and it sent back a package of XML using a custom program stored on their side.
- We also have numerous examples of Fedora records being populated via queries of external systems (typically via an API) with creation/updating of the record whenever the existing object is accessed.
- We have another system, developed via DiscoveryGarden, which takes an export of the organization's ERP data (coming from multiple enterprise systems) each night and does an insertion/sync depending on the existence of new/modified records. This is the trickiest, since the organization can edit the records in the repository (they have information there not present in the ERP systems), so the system has to sync only specific fields so that data on the repo side is not lost.
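
To make the field-level sync in the last example a bit more concrete, here is a minimal sketch in Python. It is not the code from any of the systems mentioned above. The repository is stood in for by a plain dict keyed by record id (a real script would read and write Fedora objects instead), and the names `sync_export` and `SOURCE_FIELDS` and the field names are all hypothetical. The idea is only that the source system "owns" a fixed set of fields, the repository owns everything else, and records missing from the export are left alone.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]

# Hypothetical: the fields the source system is allowed to overwrite.
# Any other field on a repository record is treated as repository-owned.
SOURCE_FIELDS: Set[str] = {"name", "department", "email"}


def sync_export(
    repo: Dict[str, Record],
    export: Iterable[Record],
    key: str = "id",
    source_fields: Set[str] = SOURCE_FIELDS,
) -> Dict[str, List[str]]:
    """Merge one export into `repo` in place and report what happened.

    - New ids are inserted using only the source-owned fields.
    - Existing ids have only their source-owned fields overwritten.
    - Ids that are in `repo` but not in `export` are not touched.
    """
    report: Dict[str, List[str]] = {"created": [], "updated": [], "unchanged": []}
    for row in export:
        rec_id = row[key]
        # Keep only the fields the source system owns.
        incoming = {f: row[f] for f in source_fields if f in row}
        existing = repo.get(rec_id)
        if existing is None:
            repo[rec_id] = dict(incoming)
            report["created"].append(rec_id)
        elif any(existing.get(f) != v for f, v in incoming.items()):
            # Overwrite source-owned fields; repository-only fields survive.
            existing.update(incoming)
            report["updated"].append(rec_id)
        else:
            report["unchanged"].append(rec_id)
    return report
```

A nightly cron job would then call something like `sync_export(repo, rows_from_export)` and log the returned report. Because nothing is ever deleted, an object that drops out of the source system simply stays in its last synced state.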

Anyway - there are lots of examples that I can think of, so if you had a more detailed description of your requirement we might be able to provide some examples. I don't believe there are specific examples of code in Git or anywhere else, although we are actively working on a section of the islandora.ca site that would provide this kind of service. We should be making it available in the Fall.

Re the Active Directory piece, you can integrate AD with Islandora via the Drupal LDAP module, which provides basic sync between systems. You can also add additional processing on the Drupal/Islandora side to enhance what is done. Re Drupal module vs free-standing, we have done both, and it depends again on the specific requirements and therefore on which makes better business or technical sense. In the financial system example above there was a business interest in not having us get into their back-end database, so they built a small Java app that we sent our query to, and it sent us back what we needed.

Mark