I have a dataset in the form of a zip archive that I would like to ingest into ReDBox. Since this needs to be an automatic process, I decided to look at the new JSON Harvester for this task.
I have the JSON Harvester client polling a directory for JSON files. The following is a sample JSON file that I am using:
{
    "type": "DatasetJson",
    "harvesterId": "jsonHarvester",
    "data": {
        "data": [
            {
                "varMap": {
                    "file.path": "${fascinator.home}/packages/<oid>.tfpackage"
                },
                "tfpackage": {
                    "dc:created": "2014-09-08",
                    "dc:title": "Test twenty two",
                    "title": "Test 22",
                    "metaList": [
                        "dc:title"
                    ],
                    "redbox:newForm": "false",
                    "redbox:formVersion": "1.7-SNAPSHOT",
                    "repository_type": "Metadata Registry",
                    "repository_name": "ReDBox",
                    "redbox:submissionProcess.redbox:submitted": "null",
                    "viewId": "default",
                    "packageType": "dataset",
                    "dc:identifier.dc:type.rdf:PlainLiteral": "handle",
                    "dc:identifier.dc:type.skos:prefLabel": "HANDLE System Identifier"
                },
                "datasetId": "test twenty two",
                "owner": "admin",
                "attachmentDestination": {
                    "tfpackage": [
                        "<oid>.tfpackage",
                        "metadata.json",
                        "$file.path"
                    ],
                    "workflow.metadata": [
                        "workflow.metadata"
                    ]
                },
                "attachmentList": [
                    "tfpackage",
                    "workflow.metadata",
                    "cratepackage"
                ],
                "customProperties": [
                    "file.path"
                ],
                "workflow.metadata": {
                    "id": "dataset",
                    "formData": {
                        "title": "",
                        "description": ""
                    },
                    "pageTitle": "Metadata Record 22",
                    "label": "Metadata Review 22",
                    "step": "metadata-review"
                },
                "cratepackage": {
                    "path": ""
                }
            }
        ]
    }
}
As you can see, there is an attachmentList array where the attachments are specified. Each name listed there refers to a JSON object in the same record. My dataset, however, is a zip archive containing several files (Word documents, images, etc.), and I wonder whether I can attach it so that it is ingested into ReDBox.
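To illustrate what I mean, here is a minimal Python sketch of my reading of the file. It is not part of ReDBox or the harvester; the function name describe_attachments and the trimmed-down sample are my own, and it only assumes that each name in attachmentList refers to a sibling key of the same record, as in the file above.

```python
import json

# Trimmed-down copy of the record above; only the attachment-related keys are kept.
SAMPLE = """
{
  "type": "DatasetJson",
  "harvesterId": "jsonHarvester",
  "data": {
    "data": [
      {
        "tfpackage": {"dc:title": "Test twenty two"},
        "workflow.metadata": {"id": "dataset", "step": "metadata-review"},
        "cratepackage": {"path": ""},
        "attachmentList": ["tfpackage", "workflow.metadata", "cratepackage"]
      }
    ]
  }
}
"""


def describe_attachments(record):
    """Map each name in attachmentList to the Python type of its sibling value.

    Hypothetical helper: it only inspects the JSON and says nothing about
    how the harvester itself treats these entries.
    """
    summary = {}
    for name in record.get("attachmentList", []):
        if name in record:
            summary[name] = type(record[name]).__name__
        else:
            summary[name] = "missing"
    return summary


harvest = json.loads(SAMPLE)
for record in harvest["data"]["data"]:
    print(describe_attachments(record))
# prints {'tfpackage': 'dict', 'workflow.metadata': 'dict', 'cratepackage': 'dict'}
```

Every entry resolves to a plain JSON object, and none of them points at a binary file such as my zip archive.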
The attachments in the JSON file above are just JSON objects, and they look like nothing more than metadata to me. So how do I attach the actual dataset, if that is possible at all? Does anyone have any idea?