Segfault reading variable length packet table

Chris Jewell

Sep 15, 2015, 8:51:35 AM
to h5py
Hello all,

I am experiencing a segfault when trying to read a table containing variable-length records made up of a compound datatype. This is probably best explained using the output of h5dump -H:

HDF5 "thefile.hd5" {
GROUP "/" {
   GROUP "posterior" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 188361 ) / ( 188361 ) }
      }
      DATASET "infections" {
         DATATYPE  H5T_VLEN { H5T_COMPOUND {
            H5T_STD_I32LE "idx";
            H5T_IEEE_F32LE "val";
         }}
         DATASPACE  SIMPLE { ( 100000 ) / ( H5S_UNLIMITED ) }
      }
      DATASET "parameters" {
         DATATYPE  H5T_ARRAY { [21] H5T_IEEE_F32LE }
         DATASPACE  SIMPLE { ( 100000 ) / ( H5S_UNLIMITED ) }
         ATTRIBUTE "tags" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 21 ) / ( 21 ) }
         }
      }
   }
}
}



The "posterior/infections" data is actually written using an FL_PacketTable in C++, and I need to read it into Python.  In Python I do:

>>> import h5py
>>> import numpy as np
>>> f=h5py.File("thefile.hd5","r")
>>> infec = f['posterior/infections']
>>> infec
<HDF5 dataset "infections": shape (100000,), type "|O8">
>>> infec[0]
Segmentation fault: 11


The other datasets in the file read fine.
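For completeness, here is a minimal self-contained sketch (the file name "inspect.hd5" is a placeholder, not my real file) showing how one can check which compound base type h5py associates with such a vlen dataset, via h5py.check_dtype:

```python
import numpy as np
import h5py

# Build the same vlen-of-compound dtype that h5dump reports for
# "posterior/infections", write a tiny dataset, and read its dtype back.
rec = np.dtype([('idx', np.int32), ('val', np.float32)])
vlen = h5py.special_dtype(vlen=rec)

with h5py.File("inspect.hd5", "w") as f:
    dset = f.create_dataset("infections", (10,), maxshape=(None,), dtype=vlen)
    dset[0] = np.array([(1, 0.3), (2, 0.1)], dtype=rec)

with h5py.File("inspect.hd5", "r") as f:
    dset = f["infections"]
    # check_dtype recovers the compound base type hidden behind the
    # object ("|O8") dtype that the dataset reports.
    base = h5py.check_dtype(vlen=dset.dtype)
    print(base)
```

On my end this reports the expected compound base type, so h5py does seem to understand the datatype itself.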


I created a testcase using h5py:

>>> iType = np.dtype([('idx', np.int32), ('val', np.float32)])
>>> x = h5py.special_dtype(vlen=iType)
>>> f = h5py.File("test.hd5","w")
>>> dset = f.create_dataset("mydataset",(100,),maxshape=(None,),dtype=x)
>>> dset[0] = np.array([(1,0.3),(2,0.1)], dtype=iType)
>>> dset[1] = np.array([(1,0.3),(2,0.1),(5,0.8),(8,0.2)],dtype=iType)
>>> f.close()



which looks like

HDF5 "test.hd5" {
GROUP "/" {
   DATASET "mydataset" {
      DATATYPE  H5T_VLEN { H5T_COMPOUND {
         H5T_STD_I32LE "idx";
         H5T_IEEE_F32LE "val";
      }}
      DATASPACE  SIMPLE { ( 100 ) / ( H5S_UNLIMITED ) }
   }
}
}

similar to my 'posterior/infections' dataset above.  I can read from this testcase fine using code similar to the above.
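For anyone who wants to reproduce, here is the full round trip as one self-contained script (same data as the console session above):

```python
import numpy as np
import h5py

# Write a small vlen-of-compound dataset, then read it back.
iType = np.dtype([('idx', np.int32), ('val', np.float32)])
x = h5py.special_dtype(vlen=iType)

with h5py.File("test.hd5", "w") as f:
    dset = f.create_dataset("mydataset", (100,), maxshape=(None,), dtype=x)
    dset[0] = np.array([(1, 0.3), (2, 0.1)], dtype=iType)
    dset[1] = np.array([(1, 0.3), (2, 0.1), (5, 0.8), (8, 0.2)], dtype=iType)

with h5py.File("test.hd5", "r") as f:
    first = f["mydataset"][0]   # length-2 record array; no segfault here
    second = f["mydataset"][1]  # length-4 record array
    print(first['idx'], second['idx'])
```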


Any ideas on what might be causing the segfault? Is it a bug, or am I missing something?


Cheers,


Chris




Dan Guest

Sep 18, 2015, 7:11:18 PM
to h5py
I'm having the same problem. 

I'm also creating files with the C++ interface, but in my case I'm reading a slightly more complicated data structure: the datatype consists of several nested variable-length compound types. I was able to read the data by removing some of the fields in the compound type, but I was unable to pin down an obvious cause for the segfault.

One question this raises: is it possible to write a dataset which dumps fine with h5dump, but is corrupted and will crash h5py?
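For what it's worth, h5dump converts data through the C library's own paths, while h5py asks the library to convert into a numpy dtype it derives from the file type, so in principle a layout mismatch (say, member offsets from the writer's packed C++ struct) could trip up one reader and not the other. A minimal sketch of comparing the on-disk compound layout against numpy's, using h5py's low-level API; the file and dataset names are placeholders, and with a real file you would just open it read-only:

```python
import numpy as np
import h5py

# Placeholder file: create a vlen-of-compound dataset so the inspection
# below has something to examine.
rec = np.dtype([('idx', np.int32), ('val', np.float32)])
with h5py.File("check.hd5", "w") as f:
    f.create_dataset("d", (4,), dtype=h5py.special_dtype(vlen=rec))

with h5py.File("check.hd5", "r") as f:
    tid = f["d"].id.get_type()            # on-disk HDF5 datatype
    assert tid.get_class() == h5py.h5t.VLEN
    comp = tid.get_super()                # the compound element type
    names = [comp.get_member_name(i).decode()
             for i in range(comp.get_nmembers())]
    offsets = [comp.get_member_offset(i)
               for i in range(comp.get_nmembers())]
    print(names, offsets)                 # compare with rec.fields offsets
```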

Thanks,

 - Dan