Non-ARC/NHMRC curated activities

Grant Jackson

Nov 22, 2011, 5:11:37 AM
to redbox-repo
Hi Greg,

ARC/NHMRC activity records are characterised by:
- not needing a handle
- not needing to be published via the OAI portal

since they link to ARC/NHMRC PURLs at the ANDS Registry.

I am interested in creating non-ARC/NHMRC activities which:
- create a handle
- need to be published via the OAI portal
- for the moment I would be happy with CSV columns identical to ARC/NHMRC
activities

I have done this in v1.0.1-SNAPSHOT & v1.1 without handles & without PIDs.
I presume the above is feasible. Please point me to doco if I've missed it.

I imagine I need to do something like:

- put some CSV data at /mint/home/data/Activities_Other_Projects.csv;
seems ok

- reuse template at /mint/home/templates/activities/rif.vm; seems ok

- reuse rules at /mint/home/harvest/Activities.py; seems ok

- reuse directory tree /mint/portal/Activities; seems ok

- Copy /mint/home/harvest/Activities_NHMRC_2010.json to
Activities_Other_Projects.json (new file) & make changes. Seems more
complicated. Perhaps:
* update fileLocation
* update recordIDPrefix
* set "curation": ["handle"],
* set "metadata": ["jsonVelocity"]
* set "neverPublish": false
* set "alreadyCurated": false
* add handle section (copied from Parties_People.json)
* update templatesPath (if not reusing template)
* update repository.name
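
A minimal sketch of the copy-and-modify step listed above, assuming the harvest file is plain JSON. The helper names (`set_path`, `derive_config`) are hypothetical, and the toy `base` dictionary is only a stand-in for Activities_NHMRC_2010.json: the nesting of `neverPublish` and `alreadyCurated` under `curation` is an assumption, while the `transformer` layout follows the snippet quoted later in this thread. Keys whose location is not shown anywhere in the thread (fileLocation, recordIDPrefix, templatesPath, repository.name, the handle section) are left out rather than guessed.

```python
import copy


def set_path(config, dotted_path, value):
    """Set an existing nested key; raise KeyError if any part is missing."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for name in parents:
        node = node[name]
    if leaf not in node:
        # Fail loudly when the assumed nesting does not match the real file.
        raise KeyError(dotted_path)
    node[leaf] = value


def derive_config(base, overrides):
    """Return a modified deep copy of base, leaving base untouched."""
    derived = copy.deepcopy(base)
    for dotted_path, value in overrides.items():
        set_path(derived, dotted_path, value)
    return derived


# Toy stand-in for the copied file. The nesting and starting values are assumptions.
base = {
    "curation": {"neverPublish": True, "alreadyCurated": True},
    "transformer": {"curation": [], "metadata": ["jsonVelocity"]},
}

overrides = {
    "transformer.curation": ["handle"],
    "transformer.metadata": ["jsonVelocity"],
    "curation.neverPublish": False,     # assumed location
    "curation.alreadyCurated": False,   # assumed location
}

other_projects = derive_config(base, overrides)
```

Because `set_path` refuses to create new keys, a wrong guess about where a setting lives raises `KeyError` instead of silently adding a key that Mint would ignore.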

Have I missed any files/dirs?

Have I missed anything within /mint/home/harvest/Activities_Other_Projects.json (e.g. is an "ingest-relations" section needed)? Thanks.

Cheers, Grant

Grant Jackson

Nov 22, 2011, 5:16:51 AM
to redbox-repo
> I have done this in v1.0.1-SNAPSHOT & v1.1 without handles &
> without PIDs.

Oops! I meant to say "without handles & without curation".

Cheers, Grant

Greg Pendlebury

Nov 22, 2011, 5:55:41 AM
to redbo...@googlegroups.com
Without having tried it myself, I would say that your line of thought looks correct. The outline you've provided sounds like it hits all the key points for curation/publication.

If it doesn't work as expected I can set up something fairly similar on the dev server to test.

Ta,
Greg

Grant Jackson

Nov 22, 2011, 7:38:34 PM
to redbox-repo
Hi Greg,

Ok thanks. I've attempted it now and the activity is not yet curated (but
looks close).


SUMMARY

- Mint Java exception during CSV file loading.
- Mint Java exception during publication of the associated dataset.
- Mint main.log for loading & publication; seems ok.
- Handle is assigned to the activity (& the dataset) ok.
- Activity preview screen shows the link back (via a handle) to the
dataset ok.
- Activity object history seems ok.
- But no activity appears in the Published Objects view (hence no activity
in OAI).


ADDITIONAL DETAILS

All Java exceptions (from transactionManager.log) are in the attached file.

Most recent object history for the activity is:

- com.googlecode.fascinator.messaging.EmailNotificationConsumer notify
- Curation Publication flag set
- Curation This object is ready for publication
- Curation Curation completed.
- Curation Object curation requested.


Any thoughts? Thank you.

Cheers, Grant

mint-activity-transactionManager.log.txt

Greg Pendlebury

Nov 22, 2011, 10:23:58 PM
to redbo...@googlegroups.com
The errors look like they didn't hold anything up, but if you'd like to remove them: judging by the stack trace, your config is telling it to run the 'IngestedRelationshipsTransformer' without supplying any config for what it should do. The errors do indicate that it is acting fairly stupid in the absence of config, so that could be improved.

That would mean your .json file looks something like this:
    "transformer": {
        "curation": ["handle"],
        "metadata": ["ingest-relations", "jsonVelocity"]
    },

And you just want to remove 'ingest-relations':
    "transformer": {
        "curation": ["handle"],
        "metadata": ["jsonVelocity"]
    },
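
As a rough sketch, the same edit can be made programmatically on a parsed harvest config. The helper name `drop_transformer` is hypothetical, and the inline JSON holds only the `transformer` section quoted above, not a full harvest file.

```python
import json


def drop_transformer(config, name, stage="metadata"):
    """Remove one transformer name from config["transformer"][stage]."""
    names = config["transformer"][stage]
    config["transformer"][stage] = [n for n in names if n != name]
    return config


# Only the section quoted above; a real harvest file has more keys.
config = json.loads("""
{
    "transformer": {
        "curation": ["handle"],
        "metadata": ["ingest-relations", "jsonVelocity"]
    }
}
""")

drop_transformer(config, "ingest-relations")
print(json.dumps(config["transformer"], indent=4))
```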

Having said this, you could alternatively supply some configuration for the Transformer (you could link it to known Parties etc. at ingest time) but this is off-topic for your first prototype.

In terms of the object not showing in the published view, it occurs to me that the activities rules file probably isn't looking for the publication flag and indexing it appropriately. The Parties rules file has something like this:
        # Publication
        published = self.params["published"]
        if published is not None:
            self.utils.add(self.index, "published", "true")

You could copy these lines into Activities.py to try them out. If you see some ARC/NHMRC Activities start to appear in the published view after this, we'll probably need to make that code a bit smarter... or you can avoid the issue entirely and use a separate rules file. But I don't think that will arise anyway: the Curation Manager isn't supposed to set the publication flag at all if 'neverPublish' is true.
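
The four publication lines quoted above can be tried outside Mint with stand-ins. Everything named `Stub...` or `NoneDict` below is hypothetical and is not Mint's real code: `NoneDict` mimics a params object that yields None for a missing key (which the `is not None` check implies), and `StubUtils.add` is only assumed to record a field/value pair.

```python
class NoneDict(dict):
    """Hypothetical params stand-in: a missing key yields None, not KeyError."""

    def __missing__(self, key):
        return None


class StubUtils:
    """Hypothetical stand-in for the helper the rules file calls self.utils."""

    def add(self, index, field, value):
        index.setdefault(field, []).append(value)


class StubRules:
    """Just enough state to run the publication lines in isolation."""

    def __init__(self, params):
        self.params = params
        self.index = {}
        self.utils = StubUtils()

    def index_publication(self):
        # Publication (the same four lines as in the rules-file snippet)
        published = self.params["published"]
        if published is not None:
            self.utils.add(self.index, "published", "true")
        return self.index


flagged = StubRules(NoneDict(published="true")).index_publication()
unflagged = StubRules(NoneDict()).index_publication()
```

With these stand-ins, only the record whose params carry a `published` value gets a `published` field in its index, which is the behaviour the published view depends on.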

Ta,
Greg

Grant Jackson

Nov 23, 2011, 12:07:38 AM
to redbox-repo
Hi Greg,

Oops. The Java exceptions were fixed as you described. Thanks.

I copied /mint/home/harvest/Parties_People.py to Activities.py; did not
restart Mint; loaded a new activity; created/published a new dataset linked
to the activity. Symptom remains the same as previous SUMMARY (but without
Java errors).

I also attempted an edit of the original Activities.py by adding the
following lines (at the same spot as Parties_People.py):

- self.utils.add(self.index, "known_ids", handle)

- the 4 publication lines as per your previous email

I couldn't decide what to do with the "self.utils.registerNamespace" lines
so I left them unchanged. Anyway, the symptom remained the same.

Greg Pendlebury

Nov 23, 2011, 12:34:01 AM
to redbo...@googlegroups.com
Hey Grant,


>> I copied /mint/home/harvest/Parties_People.py to Activities.py

Do you mean you copied the whole file, or just the differences (which you noted you 'also' tried)? There aren't many differences between the two, but you'll note there are some.

>> did not restart Mint

You'll need to do this, as the scripts are cached after the first time they are executed... it gets a tad annoying, but it's an enormous performance boost when you are ingesting large datasets.

It sounds like your edits should do the trick though, once Mint has been restarted.
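
To illustrate why an edited rules file has no effect until a restart, here is a toy script cache. It is not Fascinator's implementation: `ScriptCache` and `demo` are hypothetical names, and `clear()` merely stands in for restarting the server.

```python
import os
import tempfile


class ScriptCache:
    """Toy cache: compile each script once, keyed by its path."""

    def __init__(self):
        self._compiled = {}

    def run(self, path):
        # The file is read and compiled only on the first call for this path.
        if path not in self._compiled:
            with open(path) as handle:
                self._compiled[path] = compile(handle.read(), path, "exec")
        namespace = {}
        exec(self._compiled[path], namespace)
        return namespace

    def clear(self):
        """Stands in for a server restart."""
        self._compiled.clear()


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "Activities.py")
        cache = ScriptCache()
        with open(script, "w") as handle:
            handle.write("published = False\n")
        first = cache.run(script)["published"]
        # Edit the script on disk; the cache still holds the old version.
        with open(script, "w") as handle:
            handle.write("published = True\n")
        stale = cache.run(script)["published"]
        cache.clear()
        fresh = cache.run(script)["published"]
        return first, stale, fresh
```

In this sketch the second run still sees the old script, and only the run after `clear()` picks up the edit. Skipping the read-and-compile step on every record is where the ingest speed-up comes from.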


>> I couldn't decide what to do with the "self.utils.registerNamespace" lines so I left them unchanged

Yes, I don't think they need to be there. They are used to prep the SAX parser inside the util library when you are about to parse XML, so they probably got left behind by accident from something long ago, since there's no XML in sight in that file. They aren't needed, but they also aren't hurting anything.

Ta,
Greg

Greg Pendlebury

Nov 23, 2011, 12:36:33 AM
to redbo...@googlegroups.com
>> >> did not restart Mint
>> You'll need to do this, as the scripts are cached after the first time they are executed... it gets a tad annoying, but it's an enormous performance boost when you are ingesting large datasets.

Sorry, I forgot to note as well, that this annoyance is one I've already flagged as an issue to address (probably in the next core Fascinator release), since it can actually cause problems beyond simple annoyance:
http://code.google.com/p/the-fascinator/issues/detail?id=60

Ta,
Greg