Error with Redetect File Type API

Steven FEREY

Jan 11, 2021, 10:56:06 AM
to Dataverse Users Community
Hello,

We have errors after executing the resource

curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=false"

for some files:
{"status": "ERROR", "message": "Command edu.harvard.iq.dataverse.engine.command.impl.RedetectFileTypeCommand@33f6c80f failed: Client's transaction aborted"}

With the parameter dryRun=true, everything is OK:
{"status": "OK", "data": {"dryRun": true, "oldContentType": "text/tab-separated-values"}}
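For anyone reproducing this, here is a minimal sketch of running the dry run over several file ids so the responses can be reviewed before sending dryRun=false. It assumes API_TOKEN and SERVER_URL are set as in the command above; the helper names redetect_url and redetect are made up for illustration and are not part of Dataverse. As this report shows, a clean dry run does not guarantee that the real call will succeed.

```shell
#!/bin/sh
# Hypothetical helper: builds the redetect URL for a file id and a dryRun flag.
redetect_url() {
  printf '%s/api/files/%s/redetect?dryRun=%s' "$SERVER_URL" "$1" "$2"
}

# Hypothetical helper: sends the same POST request as the curl command above.
redetect() {
  curl -sS -H "X-Dataverse-key: $API_TOKEN" -X POST "$(redetect_url "$1" "$2")"
}

# Dry-run every file id given on the command line and print each response.
for id in "$@"; do
  printf '%s: ' "$id"
  redetect "$id" true
  printf '\n'
done
```

Saved as a script, it would be called with the file ids as arguments; nothing is modified on the server because only dryRun=true is sent.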

The error in the application server:
javax.validation.ConstraintViolationException: One or more Bean Validation constraints were violated while executing Automatic Bean Validation on callback event: preUpdate for class: edu.harvard.iq.dataverse.DataFile. Please refer to the embedded constraint violations for details.

This results in HTTP 500 errors on the affected datasets.
To resolve this, we need to re-download the affected files.

Is this problem known to the community?
Is the redetect resource (with dryRun=false) recommended?

Dataverse version: 5.0

Thank you very much for your feedback.
Steven.

Philip Durbin

Jan 11, 2021, 11:31:17 AM
to dataverse...@googlegroups.com
Hi Steven,

We have some tests in a class called FileTypeDetectionIT.java* that regularly exercise both dryRun=true and dryRun=false. I just ran these tests locally and they passed. I'm not sure why you're seeing that ConstraintViolationException.

My first thought is that you might see that same ConstraintViolationException for that file in other contexts, such as editing, indexing, etc. Maybe you could try getting metadata for the file with the following API and let us know if it works: https://guides.dataverse.org/en/5.0/api/native-api.html#getting-file-metadata

Constraint violations could be any number of things, such as a disallowed character in a filename. I believe the indexing API will report details about the violation, perhaps in server.log.

I hope this helps!

Phil



Steven FEREY

Jan 21, 2021, 11:26:44 AM
to Dataverse Users Community
Hello Philip,

Thank you for your reply.
I have made progress analyzing the problem. Here are the details for one example file:

Dataverse V5.0
S3 storage
file extension saved in S3: .tabular
current MIME type for this file: "application/octet-stream"
The .tabular extension is declared in the MimeTypeDetectionByFileExtension.properties file => tabular=text/tab-separated-values

Here is the problem:
When the redetect API resource is called, the file is remote, so its content is copied into a temporary file: tempFileTypeCheck.tmp
The extension of that temporary file is then looked up in MimeTypeDetectionByFileExtension.properties, but .tmp is not listed there.
The server reports: "tmp is a file extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileExtension.properties file."
As a result, the redetect API resource returns the "application/octet-stream" MIME type for this file :(
The expected result is "text/tab-separated-values"

Shouldn't the temporary file have the same extension as the original file?
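To make the question concrete, here is a simplified sketch of an extension-only lookup. It is not Dataverse's actual code, which is not shown in this thread: mime_for_name stands in for the properties file (only the tabular entry comes from the details above), temp_name_for is a hypothetical fix that reuses the original extension, and data.tabular is an invented file name.

```shell
#!/bin/sh
# Stand-in for MimeTypeDetectionByFileExtension.properties (assumption:
# only the "tabular" entry is taken from the thread; the rest is illustrative).
mime_for_name() {
  case "$1" in
    *.tabular) echo "text/tab-separated-values" ;;
    *)         echo "application/octet-stream" ;;
  esac
}

# Hypothetical fix: give the temporary copy the original file's extension,
# falling back to .tmp when the original name has none.
temp_name_for() {
  case "$1" in
    *.*) echo "tempFileTypeCheck.${1##*.}" ;;
    *)   echo "tempFileTypeCheck.tmp" ;;
  esac
}

# Lookup on the temporary name reported in the thread.
mime_for_name "tempFileTypeCheck.tmp"
# Lookup when the temporary name keeps the original extension.
mime_for_name "$(temp_name_for "data.tabular")"
```

The first lookup falls through to application/octet-stream because .tmp is unknown, while the second resolves to text/tab-separated-values once the extension is preserved.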

Thank you so much.
Steven.

Philip Durbin

Jan 21, 2021, 11:44:54 AM
to dataverse...@googlegroups.com
Hmm, I'm not sure what the best solution would be but you definitely seem to have found a bug. If you would create an issue about this at https://github.com/IQSS/dataverse/issues I would appreciate it.

Steven FEREY

Jan 22, 2021, 4:26:05 AM
to Dataverse Users Community
Hi Philip,

Thank you very much for your answer.
I will write up these details in an issue.

Thank you

Steven FEREY

Jan 22, 2021, 4:40:38 AM
to Dataverse Users Community