BagIt Bags Zip file


Sherry Lake

Aug 4, 2021, 3:30:46 PM
to Dataverse Users Community
We have zip bags now, on local disk. But are we supposed to be able to "unzip" the bag ".zip" file?

We tried to unzip and got the following error:

  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.


How do we know what is in the bag?


Also, what does this setting do? 
ArchiverClassName - the fully qualified class to be used for archiving.


curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName


BUT that is not the same as the current options from this page: https://guides.dataverse.org/en/5.4/installation/config.html#archiversettings

Thanks!

Sherry Lake



James Myers

Aug 4, 2021, 3:43:14 PM
to dataverse...@googlegroups.com

Yes, the Bag should be something you can unzip and inspect. If you haven’t set up the ArchiverClassName and related settings, it’s possible that the problem is misconfiguration.

 

If not, a problem with the Bag might indicate that some metadata and/or file problem exists – hopefully documented in the server log.

 

W.r.t. configuration, there are currently three options for where Bags are sent when they are created:

- to DSpace as the front end of DuraCloud/Chronopolis (and DPN before it folded)

- to Google Cloud

- to a local file directory (which, as planned by Odum, could be synced with iRODS for archival purposes).

 

The BagIt Export section (https://guides.dataverse.org/en/5.4/installation/config.html?highlight=archiverclassname#id104) has examples of the setup for all three.

 

All three share the same code for generating the Bag, which is streamed to its final destination, wherever that may be. Network interruptions during the Bag transfer could therefore result in an incomplete Bag. Since the Bag code is also retrieving and zipping all of the files in the dataset, the Bag could also be incomplete if any of those files are missing or there are network access problems reading them. There is some functionality in the code to retry connections, so truly intermittent problems shouldn’t be an issue, but if a file just isn’t accessible, Bag creation would probably fail.
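
For anyone hitting the same unzip error: the message says the zip's end-of-central-directory record is missing, which is what you would expect if the Bag was cut off before it was fully written. A minimal Python sketch for checking a Bag and listing what is in it, using only the standard zipfile module, might look like this. The function name inspect_bag is made up for illustration, and the check only covers zip integrity, not BagIt manifest validation.

```python
import zipfile


def inspect_bag(path):
    """Return the entry names in a zipped Bag, or None if the file is not a complete zip."""
    # is_zipfile() looks for the end-of-central-directory record; a Bag that
    # was cut off mid-transfer will normally be missing it.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as bag:
        # testzip() reads every member and returns the name of the first one
        # with a bad CRC or header, or None if all members are readable.
        bad_member = bag.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad_member}")
        return bag.namelist()
```

A return value of None corresponds to the "End-of-central-directory signature not found" case. For a complete Bag, the returned list should include the BagIt tag files (bagit.txt and a manifest) and the payload under data/.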

 

-- Jim


Sherry Lake

Aug 4, 2021, 4:04:11 PM
to Dataverse Users Community
Thanks, Jim.

We are using the local file directory option (and that is where our "zip" files went; they are just not unzippable).

I think our problem is that we used the example in the BagIt Export section (the link you put in above):
curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName

What is the "Harvard" stuff in this command?

Should we simply use this?
curl -X PUT -d 'LocalSubmitToArchiveCommand' http://localhost:8080/api/admin/settings/:ArchiverClassName

James Myers

Aug 4, 2021, 4:09:32 PM
to dataverse...@googlegroups.com

Ah – sorry. The short version is a typo. The long edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand is the fully qualified name of the class being used, i.e. the class name together with its package hierarchy. (All of the Dataverse code is in the edu.harvard.iq.dataverse package or one of its sub-packages.)
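
To make the difference concrete, here is a rough Python sketch of the request the curl command makes, plus a sanity check that would reject the short name. It assumes the admin API is on localhost:8080 as in the commands in this thread. The helper names are hypothetical, and the prefix check only makes sense for the archiver classes that ship with Dataverse, since a custom class could live in another package.

```python
import urllib.request

# Assumption: the admin API is reachable on localhost:8080, as in the curl
# commands quoted in this thread.
ADMIN_SETTINGS_URL = "http://localhost:8080/api/admin/settings"

# Package prefix taken from the class name quoted in this thread.
DATAVERSE_PACKAGE = "edu.harvard.iq.dataverse."


def is_fully_qualified(class_name):
    """True if class_name carries the Dataverse package prefix."""
    return class_name.startswith(DATAVERSE_PACKAGE)


def build_setting_request(setting, value):
    """Build, but do not send, a PUT request equivalent to `curl -X PUT -d value`."""
    return urllib.request.Request(
        f"{ADMIN_SETTINGS_URL}/{setting}",
        data=value.encode("utf-8"),
        method="PUT",
    )


# Example: the setting discussed above. Passing the request to
# urllib.request.urlopen() would actually send it; that step is left out
# so the sketch has no side effects.
class_name = "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand"
if is_fully_qualified(class_name):
    request = build_setting_request(":ArchiverClassName", class_name)
```

With the short name 'LocalSubmitToArchiveCommand', is_fully_qualified() returns False, so the request would never be built.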

 

If you can make an issue about the typo, that would be helpful (PR as well if you’re up for that).

Sherry Lake

Aug 4, 2021, 4:36:16 PM
to dataverse...@googlegroups.com
Ok, server log complains about failed validity checks and can't retrieve the file (see partial server log - I can send more). I have confirmed that the URL to the datafile is correct (you won't be able to access it as this server is behind a firewall, but it downloads fine for me using the URL).

Here are our settings:
":ArchiverSettings":":BagItLocalPath"
":ArchiverClassName":"edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand"
":BagItLocalPath":"/home/xw5d/BagItStorage/"


We get an xml file and a .zip file, but not complete.


[2021-08-04T16:18:37.408-0400] [Payara 5.2021.4] [WARNING] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=28674 _ThreadName=pool-37-thread-1] [timeMillis: 1628108317408] [levelValue: 900] [[
  Attempt# 1 : Unable to retrieve file: https://dvdev.lib.virginia.edu/api/access/datafile/236171
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
        at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:128)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)
        at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:259)
        at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:642)
        at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:461)
        at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:361)
        at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
        at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
        at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
        at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:178)
        at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:164)
        at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1152)
        at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1063)
        at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:402)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at edu.harvard.iq.dataverse.util.bagit.BagGenerator$3.get(BagGenerator.java:989)
        at org.apache.commons.compress.archivers.zip.ZipArchiveEntryRequest.getPayloadStream(ZipArchiveEntryRequest.java:62)
        at org.apache.commons.compress.archivers.zip.ScatterZipOutputStream.addArchiveEntry(ScatterZipOutputStream.java:99)
        at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator$3.call(ParallelScatterZipCreator.java:210)
        at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator$3.call(ParallelScatterZipCreator.java:206)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
        at java.base/sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:350)
        at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:248)
        at java.base/sun.security.validator.Validator.validate(Validator.java:264)
        at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:321)
        at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:237)
        at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:110)
        at org.apache.http.ssl.SSLContextBuilder$TrustManagerDelegate.checkServerTrusted(SSLContextBuilder.java:416)
        at java.base/sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:1509)
        at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:626)
        ... 31 more
Caused by: java.security.cert.CertPathValidatorException: validity check failed
        at java.base/sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:135)
        at java.base/sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:233)
        at java.base/sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:141)
        at java.base/sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:80)
        at java.base/java.security.cert.CertPathValidator.validate(CertPathValidator.java:309)
        at java.base/sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:345)
        ... 39 more
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Sat May 30 06:48:38 EDT 2020
        at java.base/sun.security.x509.CertificateValidity.valid(CertificateValidity.java:274)
        at java.base/sun.security.x509.X509CertImpl.checkValidity(X509CertImpl.java:687)
        at java.base/sun.security.provider.certpath.BasicChecker.verifyValidity(BasicChecker.java:190)
        at java.base/sun.security.provider.certpath.BasicChecker.check(BasicChecker.java:144)
        at java.base/sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125)
        ... 44 more
]]

[2021-08-04T16:18:37.409-0400] [Payara 5.2021.4] [SEVERE] [] [] [tid: _ThreadID=28674 _ThreadName=pool-37-thread-1] [timeMillis: 1628108317409] [levelValue: 1000] [[
  javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed




James Myers

Aug 4, 2021, 4:54:32 PM
to dataverse...@googlegroups.com

This is pretty clearly something about the certificate – hard to know exactly what without being able to see it. In general, the Bag generator code is pretty lenient about checking certificates (since the requests are all for files that should be on your own server anyway), but there must be something about this one that Java doesn’t like. If this is a locally signed certificate just used for dev, it may be that the certificate isn’t in the Java keystore. It’s also possible that the certificate doesn’t list dvdev.lib.virginia.edu as the server name it was issued for, etc.

In any case, I think this is probably something that you can resolve locally, i.e. by adding your cert to the local Java keystore on your machine (lots of web resources on how to do that). If not, or if that’s not clear, let me know and we might be able to do a quick screen share so I can see the cert/help debug.
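
One detail in the posted trace narrows this down: the innermost cause is a CertificateExpiredException with "NotAfter: Sat May 30 06:48:38 EDT 2020". So some certificate in the chain being validated had expired more than a year before the August 2021 request. The trace does not say which certificate it was. A small Python sketch for pulling those dates out of a server log might look like the following; the function name is hypothetical and the pattern is based only on the log lines quoted in this thread.

```python
import re

# Matches the cause line Java prints for an expired certificate, e.g.
# "Caused by: java.security.cert.CertificateExpiredException: NotAfter: <date>".
EXPIRED_CERT = re.compile(
    r"CertificateExpiredException: NotAfter: (?P<not_after>.+?)\s*$"
)


def expired_cert_dates(log_lines):
    """Return the distinct NotAfter values found, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = EXPIRED_CERT.search(line)
        if match and match.group("not_after") not in seen:
            seen.append(match.group("not_after"))
    return seen
```

Passing an open server log file (any iterable of lines) to expired_cert_dates() gives the expiry dates to compare against the certificates installed on the server and in the Java trust store.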

Sherry Lake

Aug 12, 2021, 11:36:39 AM
to Dataverse Users Community
Thanks, Jim.

We fixed our root certificate and now are making Bags. Our Bags will be going into our preservation system - APTrust.

--
Sherry

Sherry Lake

Mar 3, 2022, 2:52:38 PM
to Dataverse Users Community
Jim,

The sysadmin who fixed the cert error on our test server is out for a while, and we have now encountered the same Java cert error on our production machine. Unfortunately, my backup sysadmin isn't sure how it was fixed on test.

Can I have him contact you directly to debug and fix it?

If so, send a good email address for you to shL...@virginia.edu and I will pass it along to him. You two can set up screen share time, etc.

Thanks 
Sherry
