Jacek,
Direct upload (which I’m assuming is what is being used) is a multi-step process in which the files are first uploaded to S3 and then Dataverse is called to add them to the dataset. It sounds like that last step failed for some reason in your case. In a case like that, I would have expected the tool to report some error, so if this problem is repeatable with no error messages, it would be worth submitting an issue.
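For reference, here is a minimal sketch of that sequence in Python (using the requests library). The server URL, API token, DOI, file name, and mimeType below are all placeholders, error handling is omitted, and this is the small-file case – large files get multipart-upload URLs instead:

import hashlib
import json
import requests

SERVER = "https://demo.dataverse.org"   # placeholder installation URL
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"   # placeholder API token
PID = "doi:10.5072/FK2/EXAMPLE"         # placeholder dataset DOI
HEADERS = {"X-Dataverse-key": API_TOKEN}
path = "file1.txt"                      # placeholder local file

data = open(path, "rb").read()

# Step 1: ask Dataverse for a one-time pre-signed S3 URL plus a
# storageIdentifier for the new file.
r = requests.get(f"{SERVER}/api/datasets/:persistentId/uploadurls",
                 params={"persistentId": PID, "size": len(data)},
                 headers=HEADERS)
info = r.json()["data"]

# Step 2: PUT the bytes straight to S3, tagged as temporary per the guide.
requests.put(info["url"], data=data,
             headers={"x-amz-tagging": "dv-state=temp"})

# Step 3: register the file with the dataset -- the step that appears to
# have failed in your case. MD5 is the default fixity algorithm; adjust
# @type if your installation uses SHA-1 etc.
json_data = {"storageIdentifier": info["storageIdentifier"],
             "fileName": path,
             "mimeType": "text/plain",
             "checksum": {"@type": "MD5",
                          "@value": hashlib.md5(data).hexdigest()}}
r = requests.post(f"{SERVER}/api/datasets/:persistentId/add",
                  params={"persistentId": PID},
                  headers=HEADERS,
                  files={"jsonData": (None, json.dumps(json_data))})
print(r.status_code, r.text)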
Using opaque names in the storage (like 18fc37c8662-df6b1c8cd7cd) is how Dataverse works. The link between the file name and this identifier is kept in the database. (Note that file names can be changed per dataset version.) So that itself is not an indication of an S3 misconfiguration. It could still be that you have some issue with Dataverse being able to access the bucket – if you do, uploads of smaller files via the Dataverse UI would also be failing for that S3 store/bucket.
Things you could try:
If you want to keep the files already in S3, you could try manually calling the last step in the direct-upload API – e.g. https://guides.dataverse.org/en/latest/developers/s3-direct-upload-api.html#adding-the-uploaded-file-to-the-dataset for a single file, or the multifile version in the next section of that page. The JSON payload required for these calls is what you are seeing in the log, as you’ve shown below. If that works, it may have been a one-time issue with your upload or perhaps a bug in the tool. If it doesn’t work, you’ll at least have a repeatable case to debug with.
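As a rough illustration of the multifile variant (placeholder server, token, DOI, and bucket name; the storageIdentifier, fileName, checksum, and other values should be copied from the jsonData in your log rather than invented):

import json
import requests

SERVER = "https://demo.dataverse.org"   # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"   # placeholder
PID = "doi:10.5072/FK2/EXAMPLE"         # placeholder

# One entry per file already sitting in S3; take these values from the
# jsonData in your log. The identifier below is the one from your post,
# the bucket name is a placeholder.
file_entries = [
    {"storageIdentifier": "s3://your-bucket:18fc37c8662-df6b1c8cd7cd",
     "fileName": "file1.txt",                    # placeholder
     "mimeType": "text/plain",                   # placeholder
     "checksum": {"@type": "MD5", "@value": "..."},  # from your log
     "categories": ["Data"]},
]

r = requests.post(f"{SERVER}/api/datasets/:persistentId/addFiles",
                  params={"persistentId": PID},
                  headers={"X-Dataverse-key": API_TOKEN},
                  files={"jsonData": (None, json.dumps(file_entries))})
print(r.status_code, r.text)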
Given the log entry below where “categories”:[“Data”] is set, it looks like you are not using the DVUploader – it doesn’t set categories (or perhaps this log entry wasn’t from the tool?). Perhaps it was python-dvuploader? There may be a local log file from that tool (I know DVUploader creates one, but I don’t know how python-dvuploader handles errors). That may indicate what the underlying issue is – though perhaps not, given that there’s nothing in the Dataverse log. In any case, I don’t think any of these tools can handle the case where the files are already in place, so if you want to retry with the same or a different tool, you’d need to delete the files now in S3 and re-run the uploads. (I would definitely make sure you have the latest version of whatever tool is being used.) If these are the only files in the dataset, deleting them with the AWS command-line client might be easiest. Alternately, if there are good files in the dataset as well, the Cleanup Storage of a Dataset API call should also remove any files in S3 that are not listed in the dataset – see the sketch below.
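For the cleanup route, a minimal sketch (again with placeholder server, token, and DOI; I believe this endpoint requires a superuser API token, and running with dryrun=true first just lists what would be removed without deleting anything):

import requests

SERVER = "https://demo.dataverse.org"   # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx"   # placeholder, superuser token I believe
PID = "doi:10.5072/FK2/EXAMPLE"         # placeholder

# dryrun=true only reports the orphaned files; set it to false to delete.
r = requests.get(f"{SERVER}/api/datasets/:persistentId/cleanStorage",
                 params={"persistentId": PID, "dryrun": "true"},
                 headers={"X-Dataverse-key": API_TOKEN})
print(r.json())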
I hope that helps. If you have/get additional details about when the failure occurs, we can perhaps identify an issue that can be fixed in Dataverse or the upload tool.
-- Jim