Read H5File from a folder inside a zipped folder into pandas dataframe

Sonia Samipillai

Sep 27, 2019, 4:52:26 PM
to h5py

The directory structure I have looks like this: 


file.zip/2019/file.h5


where:


file.zip is the zipped folder

2019 is the folder inside the zipped folder


I can extract the folder using extractall and read the h5 file from the extracted folder. However, I am looking to read it directly from the zipped folder into a pandas DataFrame. It's an H5File, not an HDFStore.

Thomas Kluyver

Sep 28, 2019, 4:53:41 AM
to h5...@googlegroups.com
As of h5py 2.9, you can pass a Python file-like object to h5py.File to open it as an HDF5 file.

So you could use ZipFile.open() to get a file-like object (https://docs.python.org/3.7/library/zipfile.html#zipfile.ZipFile.open ), and then pass it to h5py.
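A minimal sketch of that approach, assuming h5py 2.9 or later, Python 3.7 or later, and a dataset that is 2-D so it maps directly onto a DataFrame. The helper names, the member path "zipfolder/2019/sample.h5", and the dataset name "dset" are illustrative assumptions, not part of h5py or zipfile:

```python
import io
import zipfile

import h5py
import numpy as np
import pandas as pd


def make_demo_zip():
    """Build an in-memory zip holding one small HDF5 file (demo data only)."""
    h5_bytes = io.BytesIO()
    with h5py.File(h5_bytes, "w") as hf:
        hf.create_dataset("dset", data=np.arange(24, dtype="int32").reshape(4, 6))
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        # Hypothetical member path, mirroring the layout described in this thread.
        zf.writestr("zipfolder/2019/sample.h5", h5_bytes.getvalue())
    return archive


def read_dataset_from_zip(archive, member, dataset):
    """Read one 2-D dataset from an HDF5 file stored inside a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open() returns a file-like object for the archive member.
        with zf.open(member) as fileobj:
            with h5py.File(fileobj, "r") as hf:
                # [()] copies the whole dataset into a NumPy array before
                # the HDF5 file and the zip member are closed.
                return pd.DataFrame(hf[dataset][()])


df = read_dataset_from_zip(make_demo_zip(), "zipfolder/2019/sample.h5", "dset")
print(df.shape)
```

The archive argument can be a path on disk instead of the in-memory buffer used here, since zipfile.ZipFile accepts either.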

I suspect that won't be very efficient, but if the file's not very big, maybe it will be fast enough. Otherwise, the fallback is extracting it to a temporary directory.
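The temporary-directory fallback could be sketched as follows, using only the standard library. The function name read_member_via_tempdir and its reader callback are hypothetical; the callback stands in for whatever loads the extracted file, for example a function that opens it with h5py.File(path, "r") and builds the DataFrame:

```python
import tempfile
import zipfile


def read_member_via_tempdir(archive, member, reader):
    """Extract one archive member to a temporary directory and hand its path to reader."""
    with zipfile.ZipFile(archive) as zf, tempfile.TemporaryDirectory() as tmpdir:
        # extract() returns the full path of the extracted file.
        extracted_path = zf.extract(member, path=tmpdir)
        # The directory is deleted when this with-block exits, so reader must
        # finish loading the data (not just open the file) before it returns.
        return reader(extracted_path)
```

Extracting a single member avoids unpacking the whole archive, which matters if the zip holds many files.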

--
You received this message because you are subscribed to the Google Groups "h5py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h5py+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/h5py/82d39bae-8504-4c4a-b582-f0cd9ece4acd%40googlegroups.com.

Sonia

Sep 28, 2019, 3:02:40 PM
to h5py
I tried doing as you mentioned:

import zipfile
import h5py

with zipfile.ZipFile('/.../zipfolder.zip') as z:
    for filename in z.namelist():
        df = h5py.File(z.open('zipfolder/2019/sample.h5', 'r'))

Error: TypeError: expected str, bytes or os.PathLike object, not ZipExtFile

Please advise.

Sonia

Sep 28, 2019, 3:14:11 PM
to h5py
I am using h5py version 2.10.0 in Google Colab.

Thomas Kluyver

Sep 29, 2019, 8:45:36 AM
to h5...@googlegroups.com
Looking through the Python docs, it looks like it will only work with Python 3.7 or later: in earlier versions, the handle for a file inside a zipfile doesn't implement .seek(). It's working for me with Python 3.7:

In [3]: zf = zipfile.ZipFile('foo.zip')

In [4]: fiz = zf.open('foo.h5')

In [5]: fiz
Out[5]: <zipfile.ZipExtFile name='foo.h5' mode='r' compress_type=deflate>

...

In [11]: hf = h5py.File(fiz)

In [12]: hf['data'][:]
Out[12]: array([1, 2, 3])
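One way to check whether a given interpreter provides a seekable handle for zip members is to ask the handle itself. This is a stdlib-only sketch; the function name and the placeholder archive are assumptions for illustration, and the member holds dummy bytes rather than a real HDF5 file:

```python
import io
import sys
import zipfile


def zip_member_is_seekable(archive, member):
    """Return True if the handle from ZipFile.open() reports that it supports seek()."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fileobj:
            return fileobj.seekable()


# Build a tiny in-memory archive so the check runs without any files on disk.
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w") as zf:
    zf.writestr("foo.h5", b"placeholder bytes, not a real HDF5 file")

print(sys.version_info[:2], zip_member_is_seekable(demo_archive, "foo.h5"))
```

On Python 3.7 or later this should report a seekable handle, provided the underlying archive file is itself seekable.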


Sonia

Sep 29, 2019, 10:43:36 AM
to h5py
Thank you for looking into this. Much appreciated.

import zipfile
import h5py
zf = zipfile.ZipFile('zipfolder.zip')
fiz = zf.open('zipfolder/2019/sample.h5')
hf = h5py.File(fiz)
hf['data'][:]

This throws "TypeError: expected str, bytes or os.PathLike object, not ZipExtFile".

File structure: folder1.zip --> folder1 --> 2019 --> sample.h5

Code to create a sample file: here is the code to recreate a sample h5 file like the one I am trying to use in this scenario.

Step 1:

import h5py
file = h5py.File('sample.h5', 'w')
dataset = file.create_dataset("dset", (4, 6), h5py.h5t.STD_I32BE)
file.close()

Step 2: After the file is created, put it in a folder "2019". Place "2019" inside another folder called zipfolder and zip it. The directory structure now looks like "file.zip/2019/file.h5".
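Step 2 can also be done from Python rather than by hand. The sketch below uses only the standard library; the function name zip_under_layout and the archive member path passed to it are illustrative assumptions. It returns the member names so the exact string to pass to ZipFile.open() is visible:

```python
import zipfile


def zip_under_layout(src_path, zip_path, arcname):
    """Store src_path in a new zip under the member name arcname and list the result."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname controls the folder layout inside the archive,
        # e.g. "zipfolder/2019/sample.h5" (a hypothetical layout).
        zf.write(src_path, arcname=arcname)
    with zipfile.ZipFile(zip_path) as zf:
        # namelist() shows the member names exactly as ZipFile.open() expects them.
        return zf.namelist()
```

A call such as zip_under_layout('sample.h5', 'zipfolder.zip', 'zipfolder/2019/sample.h5') would then give a member name that includes the top-level folder, which is the name to use when opening the file from the archive.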

Thomas Kluyver

Sep 29, 2019, 2:21:48 PM
to h5...@googlegroups.com
What version of Python are you using?


Sonia Samipillai

Sep 29, 2019, 4:28:16 PM
to h5...@googlegroups.com

Thomas Kluyver

Sep 30, 2019, 5:35:32 AM
to h5...@googlegroups.com
Is there a traceback with the exception? Can you include it? And can you double-check what you get from h5py.__version__?

Sonia

Sep 30, 2019, 11:23:29 AM
to h5py
It runs fine now. I just noticed that even though I had upgraded the package, the session was still loading the old version. After a restart it worked fine. Many thanks, much appreciated!
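A stale import like this can be spotted by comparing the version loaded in the running process with the version installed on disk. This is a sketch under assumptions: the function name is hypothetical, importlib.metadata requires Python 3.8 or later, and not every package exposes a __version__ attribute:

```python
import importlib
import sys
from importlib import metadata  # available from Python 3.8


def loaded_and_installed_versions(module_name, dist_name=None):
    """Return (version loaded in this process, version installed on disk)."""
    # Reuse the module object already imported in this session, if any.
    module = sys.modules.get(module_name) or importlib.import_module(module_name)
    loaded = getattr(module, "__version__", None)
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed
```

Calling loaded_and_installed_versions("h5py") after an upgrade and seeing two different values would suggest the session still holds the old module, in which case restarting the runtime picks up the new one.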