Creating Islandora Tuque object in drush: authentication fails

Roger Hyam

Apr 23, 2013, 7:37:55 AM

to island...@googlegroups.com

Hi Guys,

I'm writing a drush importer script. To do this I have written a module with a simple form that will import one item at a time so I can test it works. I then have a drush command that calls the same code in a batch mode from the command line.

I was having problems getting authentication to work when I called the code through drush, because the Drupal user wasn't being passed through the Tuque connection. I was using drush -u to supply the user login name, and I could echo the global user variable to see the user, but it wasn't getting into Tuque. I could print_r the Tuque object and see the anonymous user there, as well as in the Fedora logs.

I put a debug line in the __construct method of the IslandoraTuque class like this:

error_log( "IslandoraTuque::__construct: " . $user->name );

When I called the code through the web interface I got the line printed out.

When I called the code through drush I didn't!

In drush we seem to be making an IslandoraTuque object without calling the constructor... hmmm. Not sure how this is possible.

I have got my drush script working by adding this code before I start accessing the repository:

    // Make sure we have an authenticated connection: copy the Drupal user's
    // credentials onto the Tuque connection before touching the repository.
    global $user;
    $tuque = islandora_get_tuque_connection();
    $tuque->connection->username = $user->name;
    $tuque->connection->password = $user->pass;

It works, but it seems a bit of a hack, and it suggests there is a bug somewhere that might come back to bite us later. Someone with more knowledge of the code might like to take a look and see if there is something obvious.

Of course I may be missing something but I thought I should flag it up just in case.

Many thanks,

Roger

Roger Hyam

Apr 23, 2013, 7:41:02 AM

to island...@googlegroups.com

Answering my own stupidity!!!

Of course it wasn't in the apache logs - it was through drush - duhhh!

Roger

Nick Ruest

Apr 23, 2013, 7:52:27 AM

to island...@googlegroups.com

I'd love to see your Drush import script when you're finished, if you
don't mind.

I'm always curious how others import content.

-nruest

Adam Vessey

Apr 23, 2013, 8:19:16 AM

to island...@googlegroups.com

When you run a drush script, it has no user selected by default (similar
to anonymous?). To make something happen as a particular user, you
should specify which user in the drush command line, using either (the
long form) "--user" or (the short form) "-u". I believe this option can
take either a uid number (1 = super-user), or a user name.

- Adam

Roger Hyam

Apr 23, 2013, 8:25:03 AM

to island...@googlegroups.com

Hi Adam,

Yes I was setting the user with -u and I could get the drupal user object in the script and dump it to the log just fine and dandy. It just wasn't picked up by IslandoraTuque. I suspect that drush bootstraps the user later than drupal or something? Don't know.

Roger

Adam Vessey

Apr 23, 2013, 8:39:26 AM

to island...@googlegroups.com

Ah derp. Missed that in the second paragraph... Sorry. :P

Where exactly were you probing in the constructor? If it's before taking in the "global $user" bit (https://github.com/Islandora/islandora/blob/7.x/includes/tuque.inc#L71), we might end up with a false-positive that it's not picking it up... On the other hand, if it's doing an islandora_get_tuque_connection() somewhere before it gets bootstrapped, then an instantiation of IslandoraTuque as anonymous could be stored in the static variable (created/accessed via drupal_static()) in islandora_get_tuque_connection().
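
To make that second scenario concrete, here is a small stand-alone sketch of a cached connection that captures whichever user is current on the first call. It is written in Python purely for illustration and is not Islandora or Tuque code: `current_user`, `get_connection` and `reset_connection` are hypothetical names standing in for the global user, the connection getter and a cache reset.

```python
# Illustrative only: a memoised connection factory with the same shape as the
# problem described above. None of these names come from Islandora or Tuque.

current_user = {"name": "anonymous"}  # stands in for Drupal's global $user
_connection_cache = {}                # stands in for a static variable store


def get_connection():
    """Build the connection on first use, then keep returning the cached one."""
    if "connection" not in _connection_cache:
        # The user is read once, at construction time, and never again.
        _connection_cache["connection"] = {"username": current_user["name"]}
    return _connection_cache["connection"]


def reset_connection():
    """Drop the cached connection so the next call rebuilds it."""
    _connection_cache.clear()


# Something asks for a connection before the user has been set up...
early = get_connection()

# ...then the real user is loaded (as a late bootstrap step might do).
current_user["name"] = "admin"

# Later callers still get the stale, anonymous connection.
stale = get_connection()

# Clearing the cache forces a rebuild with the right user.
reset_connection()
fresh = get_connection()
```

If this is what is happening, the constructor does run, just earlier than expected and with the anonymous user, which would be consistent with seeing the right user in the script but not in the connection.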

- Adam

Roger Hyam

Apr 23, 2013, 8:56:43 AM

to island...@googlegroups.com

A couple of people have asked if I could share the script so find attached a zip up of my development directory.

Be warned. This is very early work in progress!!! It still needs a lot of work to get it to go properly and my strategy may prove wrong when I try it on the full data set (700k objects).

The basic plot is that we have a database full of specimen info. Three types: herbarium, living accessions and plants. We want objects representing them in the repo that we occasionally sync with the production database. Specimen objects in the repo are really just things to link more interesting digital objects to.

We have numbers for our specimens so we will use those as the object ids and have different namespaces for the three kinds.
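
As a rough illustration of that ID scheme: Fedora PIDs take the form namespace:identifier, so the specimen number can serve directly as the identifier part. The sketch below is in Python rather than the module's PHP, and the namespace names and the `specimen_pid` helper are made up for the example; they are not the ones used in the attached code.

```python
# Hypothetical mapping from specimen kind to Fedora PID namespace.
# The real namespaces are not given in this thread.
NAMESPACES = {
    "HERBARIUM": "herbarium",
    "LIVING_ACCESSION": "accession",
    "PLANT": "plant",
}


def specimen_pid(kind, number):
    """Build a PID of the form namespace:identifier for one specimen."""
    return f"{NAMESPACES[kind]}:{number}"


pid = specimen_pid("HERBARIUM", 10)
```

Because the PID is derived from data already held in the source database, a sync run can look up the matching repository object without keeping a separate mapping table.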

We will have a raw XML representation of the specimen as a DS and then create derivatives from it - chiefly Darwin Core, Dublin Core and maybe MODS.

The source DB is unreliable at keeping track of what has changed. The things we are querying are actually like view tables - the products of complex queries. We therefore do a test to see if the MD5 of the raw DS matches the MD5 of the proposed new raw XML and fail if it does - but with an option to force overwrite. This should mean routine updates are a lot faster. I'm worried about how fast this thing will run.
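
The check described above might look something like the following. This is a minimal sketch in Python, not the PHP in the attached module: `needs_update` and its `force` flag are hypothetical names, and it returns False for an unchanged record where the real code reports a failure.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def needs_update(stored_xml, new_xml, force=False):
    """Decide whether the raw XML datastream should be (re)written.

    stored_xml is the current datastream content, or None if the object
    does not exist yet; new_xml is the freshly generated raw XML.
    """
    if force or stored_xml is None:
        return True
    # Same digest means the source record has not changed, so skip it.
    return md5_hex(stored_xml) != md5_hex(new_xml)


unchanged = needs_update(b"<specimen id='10'/>", b"<specimen id='10'/>")
changed = needs_update(b"<specimen id='10'/>", b"<specimen id='10' det='x'/>")
forced = needs_update(b"<specimen id='10'/>", b"<specimen id='10'/>", force=True)
```

The comparison is only as stable as the XML serialisation: if the export reorders attributes or changes whitespace between runs, every record will look changed.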

This code creates a form for testing import of single specimens through the admin interface and also provides a drush command that looks like this:

drush -u admin rbge_specimen_sync HERBARIUM 10 true

So far I have just got end to end with importing 10 herbarium specimens! I need to add scale and the other specimen types to the mix. Things may change a lot but it is probably better to share it now as it is likely to get more complex and harder to understand as I add edge-case and performance-improving code.

I hope this is of some use to someone. Please let me know about all the dumb things you can see me doing.

Thanks,

Roger

rbge_specimen_WORK_IN_PROGRESS.zip

Peter Murray

Apr 23, 2013, 9:56:05 AM

to island...@googlegroups.com

Thanks for sharing your work, Roger. I'm intrigued by this approach using `drush` because up until now I was planning on writing straight into the Fedora repository. I can see the benefit of using the already established business logic of the Islandora Drupal modules, but I was thinking the processing overhead wouldn't be worth it.

Have others approached command-line bulk loading the same way?

Peter

--
Peter Murray
Assistant Director, Technology Services Development
LYRASIS
Peter....@lyrasis.org
+1 678-235-2955
800.999.8558 x2955

Jordan Dukart

Apr 23, 2013, 10:04:40 AM

to island...@googlegroups.com

Can offer that at DGI, for bulk ingesting purposes, drush is usually our preferred way. Some devs here may argue about the Python approach using fcrepo though.

Peter Murray

Apr 24, 2013, 5:02:56 PM

to island...@googlegroups.com

On Apr 23, 2013, at 10:04 AM, Jordan Dukart <jor...@discoverygarden.ca> wrote:
>
> Can offer that at DGI, for bulk ingesting purposes, drush is usually our preferred way. Some devs here may argue about the Python approach using fcrepo though.

Thanks, Jordan. Other perspectives?

Peter

Randy Fischer

Apr 24, 2013, 5:37:03 PM

to island...@googlegroups.com

On Wed, Apr 24, 2013 at 5:02 PM, Peter Murray <peter....@lyrasis.org> wrote:

> Thanks, Jordan. Other perspectives?

I'm a programmer at FLVC, not an islandora developer, so no idea if it's appropriate for me to post here...

I'm having good luck so far with rubydora for Digitool migrations. There's a lot of filesystem work involved (maintaining ingest queues, error processing and logging for external access), so I went for a language our developers were most familiar with. I'll put my code up on GitHub when it's not so <cough> nascent.

Aaron Coburn

Apr 24, 2013, 5:37:39 PM

to <islandora-dev@googlegroups.com>

If you are looking for a Python library that is more actively maintained than fcrepo, take a look at eulfedora [1]. This is what I have been using for bulk ingests. If you prefer fcrepo, you should, at the very least, use DGI's own fork of fcrepo, which will definitely work much better than the PyPI version.

Or, if you prefer Ruby, try rubydora [2].

Or, for something completely different (which is what I am actively experimenting with now), try handling ingests with a message broker [3] and an integration framework such as Camel [4] -- then everything is done asynchronously and can easily be spread over multiple machines.

Aaron

[1] https://github.com/emory-libraries/eulfedora
[2] https://github.com/projecthydra/rubydora
[3] http://activemq.apache.org
[4] http://camel.apache.org

Peter Murray

Apr 24, 2013, 6:32:30 PM

to island...@googlegroups.com

For whatever it's worth from my perspective, thanks for posting your observations (islandora developer or not). Although Ruby is not my language of choice, I'd appreciate seeing the code where you have it in a mostly presentable state to compare it with implementations in other languages. (Who knows? I might learn some Ruby in the process…)

Randy Fischer

Apr 24, 2013, 7:08:04 PM

to island...@googlegroups.com

> For whatever it's worth from my perspective, thanks for posting your observations (islandora developer or not). Although Ruby is not my language of choice, I'd appreciate seeing the code where you have it in a mostly presentable state to compare it with implementations in other languages. (Who knows? I might learn some Ruby in the process…)

Sure thing Peter. I've learned a lot reading your posts and questions, so I'll be happy to return the favor.

-Randy Fischer

Peter Murray

Apr 24, 2013, 9:57:59 PM

to island...@googlegroups.com

On Apr 24, 2013, at 5:37 PM, Aaron Coburn <aco...@amherst.edu> wrote:
>
> Or, for something completely different (which is what I am actively experimenting with now), try handling ingests with a message broker [3] and an integration framework such as Camel [4] -- then everything is done asynchronously and can easily be spread over multiple machines.

Interesting! Is this related to the microservices framework in Islandora? Some of the derivative process (particularly the JP2 creation, it seems) can take quite a while -- and I haven't even gotten to audio and video yet -- so I'll probably be looking at other options.

Aaron Coburn

Apr 25, 2013, 9:46:05 AM

to <islandora-dev@googlegroups.com>

On Apr 24, 2013, at 9:57 PM, Peter Murray <peter....@LYRASIS.ORG> wrote:

> On Apr 24, 2013, at 5:37 PM, Aaron Coburn <aco...@amherst.edu> wrote:
>>
>> Or, for something completely different (which is what I am actively experimenting with now), try handling ingests with a message broker [3] and an integration framework such as Camel [4] -- then everything is done asynchronously and can easily be spread over multiple machines.
>
> Interesting! Is this related to the microservices framework in Islandora? Some of the derivative process (particularly the JP2 creation, it seems) can take quite a while -- and I haven't even gotten to audio and video yet -- so I'll probably be looking at other options.

It is the same basic idea -- both use asynchronous application messaging to handle distributed processing (that's a mouthful!). The existing microservices framework in Islandora is built on a series of Python scripts that listen to ActiveMQ over the STOMP protocol. This works really well, and I have written my own share of Python scripts to do similar tasks. The nice thing about using Camel, though, is that you *don't actually write any code*. Plus, you can skip all of the boilerplate that is required for an equivalent Python script (starting/stopping, connecting/reconnecting to the message broker, exception handling, logging, running as a service, etc).

For example, this snippet of "code" implements pretty much everything FedoraGSearch does:

<route>
  <from uri="activemq:name.of.queue"/>
  <to uri="http4://fedora-host:8080/fedora/objects/${header.pid}/objectXML?authUsername=...&amp;authPassword=..."/>
  <convertBodyTo type="org.w3c.dom.Document"/>
  <to uri="xslt:file:///path/to/stylesheet"/>
  <to uri="http4://solr-host:8983/solr/update"/>
</route>

And deploying this is as simple as copying the XML file to a directory (if you are using Karaf as a container).

In production, it is a bit more involved, because I aggregate, filter and split up the messages in order to send to a number of different endpoints (solr for search, jena for linked data, couchdb for intermediate data caching, proai for metadata harvesters, etc). But all of that ends up being really easy to implement and maintain.

Aaron
