Segfault reading variable length packet table

Chris Jewell

Sep 15, 2015, 8:51:35 AM
to h5py
Hello all,

I am experiencing a segfault when trying to read a table containing variable-length records made up of a compound datatype. This is probably best explained using the output of h5dump -H:

HDF5 "thefile.hd5" {
GROUP "/" {
   GROUP "posterior" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 188361 ) / ( 188361 ) }
      }
      DATASET "infections" {
         DATATYPE  H5T_VLEN { H5T_COMPOUND {
            H5T_STD_I32LE "idx";
            H5T_IEEE_F32LE "val";
         }}
         DATASPACE  SIMPLE { ( 100000 ) / ( H5S_UNLIMITED ) }
      }
      DATASET "parameters" {
         DATATYPE  H5T_ARRAY { [21] H5T_IEEE_F32LE }
         DATASPACE  SIMPLE { ( 100000 ) / ( H5S_UNLIMITED ) }
         ATTRIBUTE "tags" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 21 ) / ( 21 ) }
         }
      }
   }
}
}



The "posterior/infections" data is actually written using an FL_PacketTable in C++, and I need to read it into Python.  In Python I do:

>>> import h5py
>>> import numpy as np
>>> f=h5py.File("thefile.hd5","r")
>>> infec = f['posterior/infections']
>>> infec
<HDF5 dataset "infections": shape (100000,), type "|O8">
>>> infec[0]
Segmentation fault: 11


The other datasets in the file read fine.
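For completeness, here is a minimal self-contained sketch (the file name "inspect.hd5" is a placeholder, not my real file) showing how one can check which compound base type h5py associates with such a vlen dataset, via h5py.check_dtype:

```python
import numpy as np
import h5py

# Build the same vlen-of-compound dtype that h5dump reports for
# "posterior/infections", write a tiny dataset, and read its dtype back.
rec = np.dtype([('idx', np.int32), ('val', np.float32)])
vlen = h5py.special_dtype(vlen=rec)

with h5py.File("inspect.hd5", "w") as f:
    dset = f.create_dataset("infections", (10,), maxshape=(None,), dtype=vlen)
    dset[0] = np.array([(1, 0.3), (2, 0.1)], dtype=rec)

with h5py.File("inspect.hd5", "r") as f:
    dset = f["infections"]
    # check_dtype recovers the compound base type hidden behind the
    # object ("|O8") dtype that the dataset reports.
    base = h5py.check_dtype(vlen=dset.dtype)
    print(base)
```

On my end this reports the expected compound base type, so h5py does seem to understand the datatype itself.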


I created a testcase using h5py:

>>> iType = np.dtype([('idx', np.int32), ('val', np.float32)])
>>> x = h5py.special_dtype(vlen=iType)
>>> f = h5py.File("test.hd5","w")
>>> dset = f.create_dataset("mydataset",(100,),maxshape=(None,),dtype=x)
>>> dset[0] = np.array([(1,0.3),(2,0.1)], dtype=iType)
>>> dset[1] = np.array([(1,0.3),(2,0.1),(5,0.8),(8,0.2)],dtype=iType)
>>> f.close()



which looks like

HDF5 "test.hd5" {
GROUP "/" {
   DATASET "mydataset" {
      DATATYPE  H5T_VLEN { H5T_COMPOUND {
         H5T_STD_I32LE "idx";
         H5T_IEEE_F32LE "val";
      }}
      DATASPACE  SIMPLE { ( 100 ) / ( H5S_UNLIMITED ) }
   }
}
}

similar to my 'posterior/infections' dataset above.  I can read from this testcase fine using code similar to the above.
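For anyone who wants to reproduce, here is the full round trip as one self-contained script (same data as the console session above):

```python
import numpy as np
import h5py

# Write a small vlen-of-compound dataset, then read it back.
iType = np.dtype([('idx', np.int32), ('val', np.float32)])
x = h5py.special_dtype(vlen=iType)

with h5py.File("test.hd5", "w") as f:
    dset = f.create_dataset("mydataset", (100,), maxshape=(None,), dtype=x)
    dset[0] = np.array([(1, 0.3), (2, 0.1)], dtype=iType)
    dset[1] = np.array([(1, 0.3), (2, 0.1), (5, 0.8), (8, 0.2)], dtype=iType)

with h5py.File("test.hd5", "r") as f:
    first = f["mydataset"][0]   # length-2 record array; no segfault here
    second = f["mydataset"][1]  # length-4 record array
    print(first['idx'], second['idx'])
```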


Any ideas on what might be causing the segfault? Is it a bug, or am I missing something?


Cheers,


Chris




Dan Guest

Sep 18, 2015, 7:11:18 PM
to h5py
I'm having the same problem. 

I'm also creating files with the C++ interface, but in my case I'm reading a slightly more complicated data structure: the datatype consists of several nested variable-length compound types. I was able to read the data by removing some of the fields in the compound type, but I was unable to pin down an obvious cause for the segfault.

One question this raises: is it possible to write a dataset which dumps fine with h5dump, but is corrupted and will crash h5py?
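For what it's worth, h5dump converts data through the C library's own paths, while h5py asks the library to convert into a numpy dtype it derives from the file type, so in principle a layout mismatch (say, member offsets from the writer's packed C++ struct) could trip up one reader and not the other. A minimal sketch of comparing the on-disk compound layout against numpy's, using h5py's low-level API; the file and dataset names are placeholders, and with a real file you would just open it read-only:

```python
import numpy as np
import h5py

# Placeholder file: create a vlen-of-compound dataset so the inspection
# below has something to examine.
rec = np.dtype([('idx', np.int32), ('val', np.float32)])
with h5py.File("check.hd5", "w") as f:
    f.create_dataset("d", (4,), dtype=h5py.special_dtype(vlen=rec))

with h5py.File("check.hd5", "r") as f:
    tid = f["d"].id.get_type()            # on-disk HDF5 datatype
    assert tid.get_class() == h5py.h5t.VLEN
    comp = tid.get_super()                # the compound element type
    names = [comp.get_member_name(i).decode()
             for i in range(comp.get_nmembers())]
    offsets = [comp.get_member_offset(i)
               for i in range(comp.get_nmembers())]
    print(names, offsets)                 # compare with rec.fields offsets
```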

Thanks,

 - Dan