Re: Non-ARC/NHMRC curated activities

Grant Jackson

Nov 23, 2011, 9:10:23 PM
to redbox-repo
Hi Greg,

Excellent, thanks. The restart this morning did the trick; after saving a
related dataset, the activity records moved into the "Published Objects"
view & became visible via the OAI-PMH portal.

My next problem was that the OAI key & OAI identifier were set to the
(uncurated) OID. I fixed this by changing the activity rif.vm (copying the
"Handle" fragments from the people rif.vm).

My next problem was that the activities did not point back to datasets via
<relatedObject> elements. Again I fixed this by changing the activity
rif.vm (copying the "Relations" fragments from the people rif.vm).
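Both rif.vm fixes above come down to two rules: use the curated handle (not the internal OID) as the record key, and emit one <relatedObject> per linked dataset. A minimal Python sketch of that logic follows, building the XML with the standard library's ElementTree instead of Velocity. The function name, its arguments and the example identifiers are hypothetical; the RIF-CS namespace and other required elements are omitted; and "hasOutput" is assumed as the activity-to-collection relation type.

```python
import xml.etree.ElementTree as ET


def build_activity_record(oid, handle, dataset_keys):
    """Sketch of a RIF-CS activity fragment (namespace and other required elements omitted).

    oid          -- internal object ID (hypothetical argument)
    handle       -- curated handle, or None if curation has not run yet
    dataset_keys -- keys of the related datasets (collections)
    """
    reg = ET.Element("registryObject")
    # Prefer the curated handle as the key; fall back to the OID only when no handle exists
    ET.SubElement(reg, "key").text = handle if handle else oid
    activity = ET.SubElement(reg, "activity", type="project")
    for ds_key in dataset_keys:
        # One <relatedObject> per dataset, pointing from the activity back to the dataset
        related = ET.SubElement(activity, "relatedObject")
        ET.SubElement(related, "key").text = ds_key
        ET.SubElement(related, "relation", type="hasOutput")
    return reg


# Placeholder identifiers for illustration only
record = build_activity_record("oid-123", "hdl:0000/activity-1", ["hdl:0000/dataset-9"])
print(ET.tostring(record, encoding="unicode"))
```

If no handle has been assigned yet, the sketch falls back to the OID, which reproduces the uncurated-key symptom described above.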

Hence curated activities look pretty good now. I've attached my
non-ARC/NHMRC activity rif.vm in case anyone wants to repeat this.

I still need to:

- separate the ARC/NHMRC activity rif.vm from the non-ARC/NHMRC activity
rif.vm (by pointing to a different "templatesPath" from the
Activities_Other_Projects.json file)

- customise/add some RIF-CS fields

but I'm hoping these changes will be straightforward (i.e. similar to v1.1).

QUESTION: Do you think the new Activities_Other_Projects.py (diff is
attached) will be sufficient or should other changes be made? Thanks.

Cheers, Grant


Quoting Greg Pendlebury <greg.pe...@gmail.com>:

> Hey Grant,
>
> >> I copied /mint/home/harvest/Parties_People.py to Activities.py
>
> Do you mean you copied the whole file, or just the differences (which you
> noted you 'also' tried)? There aren't many differences between the two, but
> you'll note there are some.
>
> >> did not restart Mint
>
> You'll need to do this, as the scripts are cached after the first time they
> are executed... it gets a tad annoying, but it's an enormous performance
> boost when you are ingesting large datasets.
>
> It sounds like your edits should do the trick though, if a reboot occurs.
>
> >> I couldn't decide what to do with the "self.utils.registerNamespace"
> >> lines so I left them unchanged
>
> Yes, I don't think they need to be there. They are used to prep the SAX
> parser inside the util library when you are about to parse XML... so they
> probably got left behind by accident from something long ago, since there's
> no XML in sight in that file. They aren't needed, but they also aren't
> hurting anything.
>
> Ta,
> Greg
>
> On 23 November 2011 15:07, Grant Jackson <gra...@csem.flinders.edu.au> wrote:
>
> > Hi Greg,
> >
> > Oops. The java exception errors were fixed as you described. Thanks.
> >
> > I copied /mint/home/harvest/Parties_People.py to Activities.py; did not
> > restart Mint; loaded a new activity; created/published a new dataset
> > linked to the activity. Symptom remains the same as previous SUMMARY (but
> > without java errors).
> >
> > I also attempted an edit of the original Activities.py by adding the
> > following lines (at the same spot as Parties_People.py):
> >
> > - self.utils.add(self.index, "known_ids", handle)
> >
> > - the 4 publication lines as per your previous email
> >
> > I couldn't decide what to do with the "self.utils.registerNamespace" lines
> > so I left them unchanged. Anyway, the symptom remained the same.
> >
> > Cheers, Grant
> >
> >
> > Quoting Greg Pendlebury <greg.pe...@gmail.com>:
> >
> > > The errors look like they didn't hold anything up, but if you'd like to
> > > remove them, it looks like your config is telling it to run the
> > > 'IngestedRelationshipsTransformer' (from the stack trace) but not giving
> > > it any config as to what it should do... although the errors do indicate
> > > that it is acting fairly stupid in the absence of config, so that could
> > > be improved.
> > >
> > > That would mean your .json file looks something like this:
> > > "transformer": {
> > > "curation": ["handle"],
> > > "metadata": ["ingest-relations", "jsonVelocity"]
> > > },
> > >
> > > And you just want to remove 'ingest-relations':
> > > "transformer": {
> > > "curation": ["handle"],
> > > "metadata": ["jsonVelocity"]
> > > },
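If you would rather script that edit than make it by hand, here is a minimal Python sketch. It assumes the "transformer" block sits at the top level of the harvest .json file, as in the snippet above; the function name drop_metadata_transformer is hypothetical.

```python
import copy
import json


def drop_metadata_transformer(config, transformer_id):
    """Return a copy of a harvest config with one metadata transformer removed."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    # Assumes a top-level "transformer" block; if it is absent, nothing is changed
    transformer = updated.get("transformer", {})
    transformer["metadata"] = [t for t in transformer.get("metadata", []) if t != transformer_id]
    return updated


config = json.loads(
    '{"transformer": {"curation": ["handle"], "metadata": ["ingest-relations", "jsonVelocity"]}}'
)
fixed = drop_metadata_transformer(config, "ingest-relations")
print(json.dumps(fixed["transformer"]))
```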
> > >
> > > Having said this, you could alternatively supply some configuration for
> > > the Transformer (you could link it to known Parties etc. at ingest time)
> > > but this is off-topic for your first prototype.
> > >
> > > In terms of the object not showing in the published view, it occurs to
> > > me that the activities rules file probably isn't looking for the
> > > publication flag and indexing it appropriately. The Parties rules file
> > > has something like this:
> > >     # Publication
> > >     published = self.params["published"]
> > >     if published is not None:
> > >         self.utils.add(self.index, "published", "true")
> > >
> > > You could copy that into Activities.py to try it out. If you see some
> > > ARC/NHMRC Activities start to appear in the published view after this,
> > > we'll probably need to make that code a bit smarter... or you can avoid
> > > the issue entirely and use a separate rules file. But I don't think that
> > > will arise anyway; the Curation Manager isn't supposed to set the
> > > publication flag at all if 'neverPublish' is true.
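The interplay between the Curation Manager and the rules file can be modelled in plain Python. This is a hypothetical sketch, not Fascinator code. A dict stands in for the index, and Utils.add stands in for self.utils.add. The curation_params function is an assumed simplification of the Curation Manager setting the "published" parameter only when neverPublish is false. The sketch uses .get() because a plain Python dict raises KeyError for a missing key, whereas the snippet above evidently expects None when the flag is absent.

```python
class Utils:
    """Hypothetical stand-in for the rules file's self.utils helper."""

    def add(self, index, field, value):
        # Append a value to a multi-valued index field
        index.setdefault(field, []).append(value)


def curation_params(never_publish):
    """Assumed behaviour: the publication flag is only set when neverPublish is false."""
    params = {}
    if not never_publish:
        params["published"] = "true"
    return params


def index_publication(utils, index, params):
    # Same logic as the "# Publication" snippet from the Parties rules file
    published = params.get("published")
    if published is not None:
        utils.add(index, "published", "true")
    return index


other_projects = index_publication(Utils(), {}, curation_params(never_publish=False))
arc_nhmrc = index_publication(Utils(), {}, curation_params(never_publish=True))
print(other_projects, arc_nhmrc)
```

Under this model, sharing one rules file is safe: ARC/NHMRC activities never receive the flag, so they never gain a "published" index entry.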
> > >
> > > Ta,
> > > Greg
> > >
> > > On 23 November 2011 10:38, Grant Jackson <gra...@csem.flinders.edu.au> wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > Ok thanks. I've attempted it now and the activity is not yet curated
> > > > (but looks close).
> > > >
> > > >
> > > > SUMMARY
> > > >
> > > > - Mint Java exception during CSV file loading.
> > > > - Mint Java exception during publication of the associated dataset.
> > > > - Mint main.log for loading & publication; seems ok.
> > > > - Handle is assigned to the activity (& the dataset) ok.
> > > > - Activity preview screen shows the link back (via a handle) to the
> > > > dataset ok.
> > > > - Activity object history seems ok.
> > > > - But no activity appears in the Published Objects view (hence no
> > > > activity in OAI).
> > > >
> > > >
> > > > ADDITIONAL DETAILS
> > > >
> > > > All Java exceptions (from transactionManager.log) are in the attached
> > > > file.
> > > >
> > > > Most recent object history for the activity is:
> > > >
> > > > - com.googlecode.fascinator.messaging.EmailNotificationConsumer notify
> > > > - Curation Publication flag set
> > > > - Curation This object is ready for publication
> > > > - Curation Curation completed.
> > > > - Curation Object curation requested.
> > > >
> > > >
> > > > Any thoughts? Thank you.
> > > >
> > > > Cheers, Grant
> > > >
> > > >
> > > > Quoting Greg Pendlebury <greg.pe...@gmail.com>:
> > > >
> > > > > Without having tried it myself, I would say that your line of thought
> > > > > looks correct. The outline you've provided sounds like it hits all the
> > > > > key points for curation/publication.
> > > > >
> > > > > If it doesn't work as expected I can set up something fairly similar
> > > > > on the dev server to test.
> > > > >
> > > > > Ta,
> > > > > Greg
> > > > >
> > > > > On 22 November 2011 20:16, Grant Jackson <gra...@csem.flinders.edu.au> wrote:
> > > > >
> > > > > > > I have done this in v1.0.1-SNAPSHOT & v1.1 without handles &
> > > > > > > without PIDs.
> > > > > >
> > > > > > Oops! I meant to say "without handles & without curation".
> > > > > >
> > > > > > Cheers, Grant
> > > > > >
> > > > > >
> > > > > > Quoting Grant Jackson <gra...@csem.flinders.edu.au>:
> > > > > >
> > > > > > > Hi Greg,
> > > > > > >
> > > > > > > ARC/NHMRC activity records are characterised by:
> > > > > > > - not needing a handle
> > > > > > > - not needing to be published via the OAI portal
> > > > > > >
> > > > > > > since they link to ARC/NHMRC PURLs at the ANDS Registry.
> > > > > > >
> > > > > > > I am interested in creating non-ARC/NHMRC activities which:
> > > > > > > - create a handle
> > > > > > > - need to be published via the OAI portal
> > > > > > > - for the moment I would be happy with CSV columns identical to
> > > > > > > ARC/NHMRC activities
> > > > > > >
> > > > > > > I have done this in v1.0.1-SNAPSHOT & v1.1 without handles & without PIDs.
> > > > > > > I presume the above is feasible. Please point me to doco if I've missed it.
> > > > > > >
> > > > > > > I imagine I need to do something like:
> > > > > > >
> > > > > > > - put some CSV data at /mint/home/data/Activities_Other_Projects.csv;
> > > > > > > seems ok
> > > > > > >
> > > > > > > - reuse template at /mint/home/templates/activities/rif.vm; seems ok
> > > > > > >
> > > > > > > - reuse rules at /mint/home/harvest/Activities.py; seems ok
> > > > > > >
> > > > > > > - reuse directory tree /mint/portal/Activities; seems ok
> > > > > > >
> > > > > > > - Copy /mint/home/harvest/Activities_NHMRC_2010.json to
> > > > > > > Activities_Other_Projects.json (new file) & make changes. Seems more
> > > > > > > complicated. Perhaps:
> > > > > > > * update fileLocation
> > > > > > > * update recordIDPrefix
> > > > > > > * set "curation": ["handle"],
> > > > > > > * set "metadata": ["jsonVelocity"]
> > > > > > > * set "neverPublish": false
> > > > > > > * set "alreadyCurated": false
> > > > > > > * add handle section (copied from Parties_People.json)
> > > > > > > * update templatesPath (if not reusing template)
> > > > > > > * update repository.name
> > > > > > >
> > > > > > > Have I missed any files/dirs?
> > > > > > >
> > > > > > > Have I missed anything within
> > > > > > > /mint/home/harvest/Activities_Other_Projects.json (eg. is a
> > > > > > > "ingest-relations" section needed)? Thanks.
> > > > > > >
> > > > > > > Cheers, Grant
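One way to avoid hand-editing mistakes when deriving Activities_Other_Projects.json from Activities_NHMRC_2010.json is a small Python script. It deep-copies the original config and applies the changes listed above. This is a sketch only. The dotted key paths in OTHER_PROJECTS_CHANGES are assumptions about where each setting lives, so check them against the real file. The example base config is an illustrative stand-in. The handle section, templatesPath, recordIDPrefix and repository.name changes are left out because their values aren't given in the thread.

```python
import copy
import json

# Hypothetical key paths; verify each one against the real Activities_NHMRC_2010.json
OTHER_PROJECTS_CHANGES = {
    "harvester.csv.fileLocation": "/mint/home/data/Activities_Other_Projects.csv",
    "transformer.curation": ["handle"],
    "transformer.metadata": ["jsonVelocity"],
    "curation.neverPublish": False,
    "curation.alreadyCurated": False,
}


def set_path(config, dotted, value):
    """Set config["a"]["b"]["c"] = value for dotted == "a.b.c", creating dicts as needed."""
    *parents, leaf = dotted.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def derive_config(base, changes):
    """Deep-copy the base config and apply every dotted-path change to the copy."""
    derived = copy.deepcopy(base)
    for dotted, value in changes.items():
        set_path(derived, dotted, value)
    return derived


# Illustrative stand-in for the ARC/NHMRC config, not its real contents
base = {
    "transformer": {"curation": [], "metadata": ["ingest-relations", "jsonVelocity"]},
    "curation": {"neverPublish": True, "alreadyCurated": True},
}
derived = derive_config(base, OTHER_PROJECTS_CHANGES)
print(json.dumps(derived, indent=2))
```

To use it on the real files, load /mint/home/harvest/Activities_NHMRC_2010.json with json.load, pass the result to derive_config, and write the output to Activities_Other_Projects.json with json.dump. The deep copy leaves the original config object unmodified.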

non-arc-nhmrc-activity-rif.vm.txt
non-arc-nhmrc-activities.py-diff.txt

Greg Pendlebury

Nov 23, 2011, 9:29:24 PM
to redbo...@googlegroups.com
That's great news Grant.

The diff looks pretty good to me. I was surprised at how small it was, but looking at the two files side-by-side I guess they were already very similar.

Ta,
Greg