corrupted hdf5 file


Valentyn Stadnytskyi

Jun 7, 2020, 2:07:05 PM
to h5py
All,

I am fairly sure this topic has come up before; I found several questions on Stack Overflow and other sites. I installed hdf5-tools:

$ sudo apt install hdf5-tools

Then I checked my corrupted file:

$ h5debug test.h5
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 140359847631744:
  #000: ../../../src/H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1208 in H5F_open(): unable to read superblock
    major: File accessibilty
    minor: Read failed
  #002: ../../../src/H5Fsuper.c line 273 in H5F__super_read(): file signature not found
    major: File accessibilty
    minor: Not an HDF5 file
cannot open file
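
The "file signature not found" error above can be checked by hand. A minimal sketch, assuming the 8-byte signature and the allowed superblock offsets (0, then 512 and each doubling after that) described in the HDF5 file format specification; the function name find_hdf5_signature is made up for this example and is not part of h5py or the HDF5 tools:

```python
import os

# 8-byte magic number that starts every HDF5 superblock
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path):
    """Return the byte offset of the HDF5 signature, or None if it is absent.

    Only the offsets the format allows are checked: 0, 512, 1024, 2048, ...
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            # Next candidate offset: 512 after 0, then keep doubling.
            offset = 512 if offset == 0 else offset * 2
    return None
```

A result of None only says that no superblock signature sits where the library looks for it; it does not say whether any dataset bytes further into the file are still intact.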

Somehow h5clear and h5check are not installed, even though they are mentioned here: https://support.hdfgroup.org/HDF5/doc/RM/Tools.html

Of course, I want to get my data back. But I have a bigger question: how do I make sure it never happens again? I use Python, and whenever I write a file I always use the "with h5py.File() as f:" construct to ensure the file is always properly closed. If I open a file read-only, I do "f = h5py.File(filename,'r')", and if I close or exit my program the file doesn't get corrupted (at least it hasn't happened so far). I do it this way so that I can access the data anywhere in my code without a "with" statement. The rest of my team does the same. I guess this was discussed in https://github.com/h5py/h5py/issues/397

So is it OK to open a file with

f = h5py.File(filename, 'r')

and never close it with

f.close()?




Thomas Kluyver

Jun 7, 2020, 3:54:41 PM
to h5...@googlegroups.com
Hi Valentyn,

As far as I know, if you open a file read-only (mode 'r'), it shouldn't be modified at all, so it shouldn't get corrupted whatever happens.

Using 'with' to open a file for writing is a good idea, as that makes sure it's closed when you leave the with block. But even if you don't do that, h5py tries to do the right thing anyway: the file should be automatically closed when the last object belonging to it (the file object, and any group or dataset objects from it) is closed.

If your program crashes hard (e.g. a segfault, or you use 'kill -9' on it), then no more code gets to run - not even to exit a with block or a try/finally statement. So if there's a file open for writing when this happens, it could be corrupted.
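
One common way to limit the damage from a hard crash is to write to a temporary file and rename it onto the real name only after the write has finished. This is a general technique and not something h5py does for you; the sketch below assumes the temporary file and the target are on the same filesystem, and write_atomically and write_fn are hypothetical names chosen for this example:

```python
import os
import tempfile


def write_atomically(final_path, write_fn):
    """Call write_fn(tmp_path), then rename the result to final_path on success."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the target so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    except BaseException:
        # The write failed: drop the partial file, keep any previous final_path.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With h5py, write_fn would be a function that opens h5py.File(tmp_path, 'w') inside a with block and writes the datasets. If the process is killed midway, a stray .tmp file may be left behind, but the file at final_path is either the old complete version or missing, never half-written. This guards against a crashed process, not necessarily against power loss.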

Thomas

--
You received this message because you are subscribed to the Google Groups "h5py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h5py+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/h5py/88d00c60-73ee-431d-a41e-4700367a01a4o%40googlegroups.com.

Valentyn Stadnytskyi

Jun 8, 2020, 8:38:56 AM
to h5py
The HDF5 Tools webpage describes h5clear as a way to fix a corrupted file: https://support.hdfgroup.org/HDF5/doc/RM/Tools.html However, there is no mention of h5clear on this page: https://support.hdfgroup.org/products/hdf5_tools
I installed the tools with sudo apt install hdf5-tools.

Thomas Kluyver

Jun 8, 2020, 9:06:18 AM
to h5...@googlegroups.com
Newer versions of the Ubuntu hdf5-tools package (e.g. 20.04: https://packages.ubuntu.com/focal/amd64/hdf5-tools/filelist ) seem to have it, while older ones don't. It looks like Ubuntu 18.04 has a pretty old version of HDF5 - 1.10.0. You might want to try with a newer version, e.g. from conda.

Thomas


Ray Osborn

Jun 8, 2020, 12:22:28 PM
to h5py
This has been a helpful thread. I confess I didn't even know that h5clear existed, even though I have been using h5py for years. In our application we have a file-locking scheme to prevent simultaneous writing, but if it fails for any reason, could running h5clear on a file that is already opened by another process cause worse damage, or does HDF5 prevent that from happening?

I am wondering about adding an option to clear a file if it fails to open, perhaps in a subprocess, but if there is advice on what to check for first, that would be a great help. For example, does the OSError specify whether status flags need to be cleared?

Ray



Thomas Kluyver

Jun 8, 2020, 12:37:42 PM
to h5...@googlegroups.com
HDF5 also does its own file locking, which is meant to prevent simultaneous writing (unless you turn this off by setting an environment variable).

I don't know much about h5clear. h5clear --help suggests that it can do some different things depending on the options you use. I'd guess that it's reasonably safe if you're confident nothing currently has the file open to write, but clearing a file while it's being written could cause problems. But that's just a guess. If other people don't chime in here, try the HDF forum: https://forum.hdfgroup.org/ .
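
The idea of clearing status flags from a subprocess could be sketched roughly as follows. It assumes, per the caution above, that nothing else has the file open for writing, and it uses only the -s option of h5clear; the function name clear_status_flags and its tool parameter are hypothetical, added so the lookup can be pointed at a different executable:

```python
import shutil
import subprocess


def clear_status_flags(path, tool="h5clear"):
    """Run `h5clear -s path`; return its exit code, or None if the tool is missing.

    Assumption: no other process currently has the file open for writing.
    """
    executable = shutil.which(tool)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-s", path], capture_output=True, text=True
    )
    if result.returncode != 0:
        # Show whatever h5clear reported on stderr.
        print(result.stderr.strip())
    return result.returncode
```

A non-zero return code means h5clear could not process the file, so the caller should not assume the file is usable afterwards.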

Thomas


Valentyn Stadnytskyi

Jun 8, 2020, 1:19:53 PM
to h5...@googlegroups.com
Yeah. The reason I started this thread is a corrupted file. I don't know how that happened. I kept it on Dropbox in online-only mode. Is it possible it got corrupted during transfer? Has anyone experienced file corruption by Dropbox?

In any case, I am trying to figure out how to recover the file. I have opened it as a text file, and I could open it in Python with open(filename, 'rb'), but I am not sure whether I can fix anything inside it this way. I have also looked on the web, but I can't find any good instructions. Today I installed Conda on my Mac, and something did not install correctly, because I don't have Conda in my terminal. I was able to open Anaconda Navigator, though, and when I tried to import hdf5 it said the package doesn't exist. See below. It is confusing how I am supposed to use the HDF5 tools.

Jupyter QtConsole 4.6.0
Python 3.7.6 (default, Jan  8 2020, 13:42:34) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.

import h5py

import hdf5
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-5a2a3b48ef54> in <module>
----> 1 import hdf5

ModuleNotFoundError: No module named 'hdf5'

h5py.__version__
Out[3]: '2.10.0'




Osborn, Raymond

Jun 8, 2020, 1:26:00 PM
to 'Jason Greenlaw - NOAA Affiliate' via h5py
You run the HDF5 tools from the terminal command line, not within Python. There is no Python package called HDF5, only h5py. 

Also, when run from the command line, ‘conda’ is not capitalized. Do you get anything with ‘which conda’? Or ‘which h5clear’ for that matter. If your conda environment installed h5py successfully, you should have ‘h5clear’ in <path-to-anaconda>/bin. Because conda installs the HDF5 library along with h5py, you get all the command-line tools for free, even though they are not used within Python.
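
If the terminal is not set up, the same lookup can be done from Python. A small sketch: locate_tool is a made-up helper, and the fallback assumes a conda-style layout where executables sit in the bin directory under sys.prefix (true on Linux and macOS, different on Windows):

```python
import os
import shutil
import sys


def locate_tool(name):
    """Return the full path of a command-line tool, or None if it cannot be found."""
    found = shutil.which(name)  # same lookup as `which name` in the shell
    if found is None:
        # Assumption: conda-style layout with executables in <prefix>/bin.
        found = shutil.which(name, path=os.path.join(sys.prefix, "bin"))
    return found


for tool in ("conda", "h5clear", "h5debug"):
    print(tool, "->", locate_tool(tool))
```

Run from the Python that belongs to the conda environment, this prints either a full path or None for each tool.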

Ray


Valentyn Stadnytskyi

Jun 9, 2020, 9:56:48 AM
to h5...@googlegroups.com
I assume this means that the file is not recoverable. Right?

$ which h5clear
/home/femtoland/anaconda3/bin/h5clear

$ h5clear -s corrupted_file_1.h5 
h5clear error: h5tools_fopen

I have looked at the file history in dropbox and it looks like it hasn’t been changed since the creation. So it means that I haven’t closed it right when I created it. That is sad.


Jérôme Kieffer

Jun 9, 2020, 11:40:33 AM
to h5...@googlegroups.com
On Tue, 9 Jun 2020 09:56:24 -0400
Valentyn Stadnytskyi <v.stad...@gmail.com> wrote:

> I have looked at the file history in dropbox and it looks like it hasn’t been changed since the creation. So it means that I haven’t closed it right when I created it. That is sad.

HDF5 uses a different file driver depending on the filesystem the file sits on.
If Dropbox does not follow the POSIX recommendations (likely), it is
possible that the error comes from this.

NFS is not POSIX-compliant either.
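
One way to tell whether a file was already bad when it was created or was damaged later by a sync service or network filesystem is to record a checksum right after writing and compare it later. A minimal sketch using only the standard library; the function names and the ".sha256" sidecar file are conventions invented for this example:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_checksum(path):
    """Store the file's checksum next to it, in path + '.sha256'."""
    checksum = sha256_of(path)
    with open(path + ".sha256", "w") as fh:
        fh.write(checksum + "\n")
    return checksum


def matches_saved_checksum(path):
    """Return True if the file still matches the checksum saved earlier."""
    with open(path + ".sha256") as fh:
        expected = fh.read().strip()
    return sha256_of(path) == expected
```

Calling save_checksum right after the file is closed, and matches_saved_checksum after it has been synced or copied, shows whether the bytes changed in between. It cannot detect a file that was already incomplete when the checksum was taken.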

Cheers,
--
Jerome