Fail to publish dataset

Rondineli Gama Saad

Nov 29, 2022, 4:53:40 AM

to dataverse...@googlegroups.com

Hello,

We have a problem with one dataset. When I try to publish it, I get the following error:

Failed to Publish Dataset – The dataset could not be published because one or more of the datafiles in the dataset could not be validated (physical file missing, checksum mismatch, etc.) Please contact support for further assistance.

I can neither delete the draft nor publish it. If an admin deletes the draft and attempts to update the files, the same error occurs again. I'm not sure what to do.

The storage service used is AWS S3. Is there any solution for this problem?
Dataverse version: 5.12.1
S3 cmd version: s3cmd-2.3.0-1.el7.noarch
Log error during the publish process:
[2022-11-16T22:17:11.830-0300] [Payara 5.2021.6] [INFO] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=308 _ThreadName=__ejb-thread-pool1] [timeMillis: 1668647831830] [levelValue: 800] [[
Failed to open datafile id 4247 for reading]]

[2022-11-16T22:17:11.873-0300] [Payara 5.2021.6] [WARNING] [] [edu.harvard.iq.dataverse.DatasetServiceBean] [tid: _ThreadID=308 _ThreadName=__ejb-thread-pool1] [timeMillis: 1668647831873] [levelValue: 900] [[
CommandException caught when executing the asynchronous portion of the Dataset Publication Command.]]

[2022-11-16T22:17:33.674-0300] [Payara 5.2021.6] [WARNING] [] [edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter] [tid: _ThreadID=88 _ThreadName=http-thread-pool::http-listener-1(6)] [timeMillis: 1668647853674] [levelValue: 900] [[
could not read image with ImageIO.read()]]

[2022-11-16T22:17:33.675-0300] [Payara 5.2021.6] [WARNING] [] [com.amazonaws.services.s3.internal.S3AbortableInputStream] [tid: _ThreadID=88 _ThreadName=http-thread-pool::http-listener-1(6)] [timeMillis: 1668647853675] [levelValue: 900] [[
Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.]]

[2022-11-16T22:17:34.859-0300] [Payara 5.2021.6] [WARNING] [] [edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter] [tid: _ThreadID=88 _ThreadName=http-thread-pool::http-listener-1(6)] [timeMillis: 1668647854859] [levelValue: 900] [[
could not read image with ImageIO.read()]]

[2022-11-16T22:17:34.859-0300] [Payara 5.2021.6] [WARNING] [] [com.amazonaws.services.s3.internal.S3AbortableInputStream] [tid: _ThreadID=88 _ThreadName=http-thread-pool::http-listener-1(6)] [timeMillis: 1668647854859] [levelValue: 900] [[
Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.]]

I appreciate any help or suggestion.
Best Regards,

Rondineli Saad
IT Coordinator / Digital Preservation / Information Security

SciELO - Scientific Electronic Library Online
FAPESP - CAPES - CNPq - BVS-BIREME/OPAS/OMS - FapUNIFESP

______________________________

Rua Dr. Diogo de Faria, 1087
04037-003 - São Paulo - SP - Brasil
www.scielo.org | www.scielo.br


Don Sizemore

Nov 29, 2022, 7:19:13 AM

to dataverse...@googlegroups.com

Hello,

It looks like you have a file which is physically missing from storage:

Failed to open datafile id 4247 for reading

If the dataset doesn't have many files, you could just mouse over the datafile links and watch for "fileId=4247" in your browser's link preview thingy.

Or, if the dataset does have many files:

A database query like

select storageidentifier from dvobject where id='4247';

should tell you which file is missing from the bucket.

A database query like

select label from filemetadata where datafile_id='4247';

should tell you the original filename.
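To check the bucket directly, the storage identifier can be turned into an object path. Here is a minimal Python sketch, assuming the file's storageidentifier has the form `<driver>://<bucket>:<object-id>` and that the object sits under the dataset's `<authority>/<identifier>` path inside the bucket. Both the layout and the sample values are assumptions to verify against your own installation, and the helper names are hypothetical:

```python
from typing import Tuple


def s3_location(storage_identifier: str, dataset_path: str) -> Tuple[str, str]:
    """Split a file's storageidentifier into (bucket, key).

    Assumption: the identifier looks like '<driver>://<bucket>:<object-id>'
    and the object is stored at '<dataset_path>/<object-id>' in that bucket.
    """
    _driver, sep, rest = storage_identifier.partition("://")
    if not sep:
        raise ValueError(f"no '://' in storage identifier: {storage_identifier!r}")
    bucket, sep, object_id = rest.partition(":")
    if not sep or not bucket or not object_id:
        raise ValueError(f"expected '<bucket>:<object-id>', got: {rest!r}")
    key = f"{dataset_path.strip('/')}/{object_id}"
    return bucket, key


def s3cmd_info_command(bucket: str, key: str) -> str:
    """Build an 's3cmd info' command line that looks up the object."""
    return f"s3cmd info s3://{bucket}/{key}"


if __name__ == "__main__":
    # Hypothetical values; substitute the query result and your dataset's DOI path.
    bucket, key = s3_location("s3://my-bucket:184a1b2c3d4-0123456789ab",
                              "10.5072/FK2/ABCDEF")
    print(s3cmd_info_command(bucket, key))
```

If the printed `s3cmd info` command returns an error for that path, the object is probably missing from the bucket.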

Either way, if you replace the original file, you should be able to publish the dataset.
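Because the publish error also lists a checksum mismatch as a possible cause, it may be worth confirming that a copy you put back into storage matches the checksum on record. Here is a rough Python sketch, assuming the recorded value is an MD5 hex digest (which I believe is the Dataverse default, but check the checksum type stored with the file in your database). The function names are hypothetical:

```python
import hashlib


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path: str, recorded_hex: str, algorithm: str = "md5") -> bool:
    """Compare a local file against a recorded hex digest (case-insensitive)."""
    return file_checksum(path, algorithm) == recorded_hex.strip().lower()
```

Calling `matches_recorded("/path/to/local/copy", "<recorded checksum>")` returns True only when the local copy is byte-for-byte what the database expects.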

I hope this helps?

Don


Rondineli Gama Saad

Dec 1, 2022, 8:45:39 AM

to Dataverse Users Community

Hello Don,

First of all, thanks for answering my question. This dataset has 2 files. When I try to download one of them, I get the error: "Datafile 4247: Failed to locate and/or open physical file."

I will try to remove the file and add it again. I will reply to this message with the result as soon as possible.

Thanks again!

Rondineli Gama Saad

Dec 1, 2022, 1:55:25 PM

to Dataverse Users Community

Hello Don,

I deleted the file and uploaded it again, and it worked properly. I changed the storage source to file instead of S3. Maybe there is a problem with the S3 connection.

Best Regards,
