Hi Greg,
Making the change to harvest directly into 'Published' was very easy.
The second part is proving tricky, and I believe an existing problem I have is the stumbling block.
The harvester I am using is a combination of the CSV and Filesystem harvesters.
The harvester creates a payload called 'metadata.json', not a '.tfpackage' as required by ReDBox.
The format of the 'metadata.json' differs from that of the '.tfpackage'. To resolve the problem, I am using the harvester rules file (directoryNames.py) to create a 'formData.tfpackage'.
I had to do this because I could not otherwise edit my harvested data in ReDBox. I used the 'rda_harvest.py' supplied with the demo ReDBox as a code example.
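For illustration only, here is a minimal sketch of the kind of mapping step described above. It is not the actual directoryNames.py: the `to_form_data` helper, the dotted-key scheme, and the sample field names are all hypothetical, and the real '.tfpackage' keys depend on the ReDBox form in use. It only shows the general idea of flattening a nested 'metadata.json'-style structure into flat form data and serializing it as JSON.

```python
import json


def to_form_data(metadata, prefix=""):
    """Flatten nested dicts/lists into a flat dict with dotted keys.

    Hypothetical helper: the key scheme is an assumption for
    illustration, not the real '.tfpackage' layout.
    """
    flat = {}
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            flat.update(to_form_data(value, prefix + key + "."))
    elif isinstance(metadata, list):
        # Lists become 1-based numbered keys, e.g. "keywords.1"
        for index, value in enumerate(metadata, start=1):
            flat.update(to_form_data(value, "%s%d." % (prefix, index)))
    else:
        # Scalar value: store it under the accumulated key
        flat[prefix.rstrip(".")] = metadata
    return flat


# Made-up input standing in for a harvested 'metadata.json'
harvested = {
    "title": "Example occurrence dataset",
    "keywords": ["birds", "climate"],
    "contact": {"name": "A. Researcher"},
}

form_data = to_form_data(harvested)
package_json = json.dumps(form_data, indent=2, sort_keys=True)
print(package_json)
```

In a real rules file, the resulting JSON string would be stored as the 'formData.tfpackage' payload through the storage API rather than printed.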
When running a harvest, the first entry I see in the 'transactionManager.log' is from the Curation Manager, as shown below.
2012-10-23 09:27:50,467 transactionManager DEBUG CurationManager
{
"harvester": {
"type": "directory",
"directory": {
"targets": [
{
"baseDir": "${fascinator.home}/data/Edgar",
"recordIDPrefix": "jcu.edu.au/tdh/collection/"
}
]
},
"file-system": {
"caching": "basic",
"cacheId": "config-caching"
},
"default-files": {
"default-metadata-filename": "edgar_default_metadata.json",
"override-metadata-filename": "metadata.json"
},
"metadata-types": [
{
"type": "occurrences"
},
{
"type": "suitability"
}
]
},
"transformer": {
"curation": [
],
"metadata": [
"jsonVelocity"
]
},
"curation": {
"neverPublish": false,
"alreadyCurated": false
},
"transformerOverrides": {
},
"indexer": {
"script": {
"type": "python",
"rules": "directoryNames.py"
},
"params": {
"repository.name": "ReDBox",
"repository.type": "Metadata Registry"
}
},
"stages": [
{
"name": "inbox",
"label": "Inbox",
"description": "Potential records for investigation.",
"security": [
"guest"
],
"visibility": [
"librarian",
"reviewer",
"admin"
]
},
{
"name": "investigation",
"label": "Investigation",
"description": "Records under investigation.",
"security": [
"librarian",
"reviewer",
"admin"
],
"visibility": [
"librarian",
"reviewer",
"admin"
],
"template": "workflows/inbox"
},
{
"name": "metadata-review",
"label": "Metadata Review",
"description": "Records to be reviewed by a data librarian.",
"security": [
"librarian",
"reviewer",
"admin"
],
"visibility": [
"librarian",
"reviewer",
"admin"
],
"template": "workflows/dataset"
},
{
"name": "final-review",
"label": "Final Review",
"description": "Completed records ready for publication and approval into the repository.",
"security": [
"reviewer",
"admin"
],
"visibility": [
"librarian",
"reviewer",
"admin"
],
"template": "workflows/dataset"
},
{
"name": "live",
"description": "Records already published in the repository.",
"label": "Published",
"security": [
"reviewer",
"admin"
],
"visibility": [
"guest"
],
"template": "workflows/dataset"
},
{
"name": "retired",
"description": "Records that have been retired.",
"label": "Retired",
"security": [
"admin"
],
"visibility": [
"guest"
],
"template": "workflows/dataset"
}
],
"oid": "5863d7f96f78a5799ebb6f19a06e324d"
}
Next is the following:
2012-10-23 09:27:50,580 transactionManager INFO nVelocityTransformer Transforming PID '.tfpackage' from OID '5863d7f96f78a5799ebb6f19a06e324d'
2012-10-23 09:27:50,604 transactionManager ERROR nVelocityTransformer Error accessing payload in storage: '{}'
com.googlecode.fascinator.api.storage.StorageException: ID '.tfpackage' does not exist.
The transformer is failing because I don't have a 'formData.tfpackage' at that point. This is expected: the 'formData.tfpackage' gets created when the Indexer runs the harvester rules file, which happens next.
2012-10-23 09:27:50,610 transactionManager ERROR ManagerQueueConsumer Error processing order from Transaction Manager:
{
"type": "TRANSFORMER",
"target": "jsonVelocity",
"oid": "5863d7f96f78a5799ebb6f19a06e324d",
"config": {
}
}
2012-10-23 09:27:50,623 transactionManager DEBUG SolrIndexer First time parsing config file: '27f9c7326edfc81e240c97cd937b31ec'
2012-10-23 09:27:50,630 transactionManager DEBUG SolrIndexer First time parsing rules script: '7bcd41bbd55eb97d1252887657fec59b'
2012-10-23 09:27:55,569 transactionManager INFO SolrIndexer Creating 'formData.tfpackage' payload for object '5863d7f96f78a5799ebb6f19a06e324d'
2012-10-23 09:28:01,850 transactionManager DEBUG SolrIndexer Indexing has altered metadata, closing object.
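The ordering seen in these log excerpts can be checked mechanically. The following is a small sketch, assuming each log entry starts with a timestamp, logger name, level, and component name as in the lines quoted here; the `component_order` helper and the regular expression are illustrative and not part of ReDBox.

```python
import re

# Timestamp, logger name, level, component, then an optional message
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)$"
)


def component_order(lines):
    """Return component names in order of first appearance in the log."""
    order = []
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that are not log entries (e.g. JSON dumps) are skipped
        if match and match.group(4) not in order:
            order.append(match.group(4))
    return order


# Lines copied from the log excerpts in this message
sample = [
    "2012-10-23 09:27:50,467 transactionManager DEBUG CurationManager",
    "{",
    "2012-10-23 09:27:50,580 transactionManager INFO nVelocityTransformer "
    "Transforming PID '.tfpackage' from OID '5863d7f96f78a5799ebb6f19a06e324d'",
    "2012-10-23 09:27:50,610 transactionManager ERROR ManagerQueueConsumer "
    "Error processing order from Transaction Manager:",
    "2012-10-23 09:27:55,569 transactionManager INFO SolrIndexer "
    "Creating 'formData.tfpackage' payload for object '5863d7f96f78a5799ebb6f19a06e324d'",
]

order = component_order(sample)
print(order)
```

For the sample lines, the transformer component appears before 'SolrIndexer', which matches the problem described: the payload is requested before the rules file has created it.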
How can I get the harvester rules file, processed by the Indexer, to run before the Transformer?
I'm hoping that once this is done, curation may kick in properly. Is this a correct assumption?
My last-resort fix is to go back and revisit the harvester to ensure it creates a 'formData.tfpackage' instead of a 'metadata.json'. I would then also need to revisit my harvester rules file to alter the data mapping, as the formats of the '.tfpackage' and 'metadata.json' differ. I've been trying to avoid these changes, as they would mean a large amount of rework.
Thanks,
Jay.