[SciPy-User] Maximum file size for .npz format?


Jose Gomez-Dans

Mar 11, 2010, 6:10:33 AM
to SciPy Users List
Hi!
I need to save a fairly large set of arrays to disk. I have saved it using numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly large ;D). When I try to load it using numpy.load, the zipfile module complains about
BadZipfile: Bad magic number for file header

I can't open it with the normal zip utility present on the system, but it could be that it's barfing about files being larger than 2 GB.
Is there some file size limit for .npz files? Is there any way I can recover the data? (I guess I could try decompressing the file with 7z and extracting the individual .npy files.)

Thanks!
Jose

Robert Kern

Mar 11, 2010, 10:48:41 AM
to SciPy Users List
On Thu, Mar 11, 2010 at 05:10, Jose Gomez-Dans <jgome...@gmail.com> wrote:
> Hi!
> I need to save a fairly large set of arrays to disk. I have saved it using
> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
> large ;D). When I try to load it using numpy.load, the zipfile module
> complains about
> BadZipfile: Bad magic number for file header
>
> I can't open it with the normal zip utility present on the system, but it
> could be that it's barfing about files being larger than 2 GB.
> Is there some file size limit for .npz files?

Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
not yet support the ZIP64 format.
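
In the meantime, the simplest workaround is to skip the zip container
entirely and write each array to its own .npy file with numpy.save,
which has no such limit. A minimal sketch (the array names here are
made up):

import numpy as np

# stand-ins for the real arrays
arrays = {'a': np.zeros((1000, 1000)), 'b': np.ones(500)}

# write one .npy per array instead of a single .npz
for name, arr in arrays.items():
    np.save(name + '.npy', arr)

# read them back individually
a = np.load('a.npy')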

> Is there any way I can recover the data? (I
> guess I could try decompressing the file with 7z and extracting the
> individual .npy files.)

Possibly. However, if the normal zip utility isn't working, 7z
probably won't, either. Worth a try, though.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

Lafras Uys

Mar 12, 2010, 2:50:23 AM
to SciPy Users List

>> I need to save a fairly large set of arrays to disk. I have saved it using
>> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
>> large ;D). When I try to load it using numpy.load, the zipfile module
>> complains about
>> BadZipfile: Bad magic number for file header
>>
>> I can't open it with the normal zip utility present on the system, but it
>> could be that it's barfing about files being larger than 2 GB.
>> Is there some file size limit for .npz files?
>
> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
> not yet support the ZIP64 format.
>
>> Is there any way I can recover the data? (I
>> guess I could try decompressing the file with 7z and extracting the
>> individual .npy files.)
>
> Possibly. However, if the normal zip utility isn't working, 7z
> probably won't, either. Worth a try, though.

I've had similar problems; my solution was to move to HDF5. There are
two options for accessing and working with HDF5 files from Python: h5py
(http://code.google.com/p/h5py/) and PyTables
(http://www.pytables.org/). Both packages have built-in numpy support.

Regards,

Lafras

Paul Anton Letnes

Mar 12, 2010, 12:22:55 PM
to SciPy Users List

On 11. mars 2010, at 23.50, Lafras Uys wrote:

>>> I need to save a fairly large set of arrays to disk. I have saved it using
>>> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
>>> large ;D). When I try to load it using numpy.load, the zipfile module
>>> complains about
>>> BadZipfile: Bad magic number for file header
>>>
>>> I can't open it with the normal zip utility present on the system, but it
>>> could be that it's barfing about files being larger than 2 GB.
>>> Is there some file size limit for .npz files?
>>
>> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
>> not yet support the ZIP64 format.
>>
>>> Is there any way I can recover the data? (I
>>> guess I could try decompressing the file with 7z and extracting the
>>> individual .npy files.)
>>
>> Possibly. However, if the normal zip utility isn't working, 7z
>> probably won't, either. Worth a try, though.
>
> I've had similar problems; my solution was to move to HDF5. There are
> two options for accessing and working with HDF5 files from Python: h5py
> (http://code.google.com/p/h5py/) and PyTables
> (http://www.pytables.org/). Both packages have built-in numpy support.
>
> Regards,
> Lafras

I've experienced similar issues too, but I moved to NetCDF. The only disadvantage was that I did not find any python modules that work well _and_ support numpy. Hence, I am considering moving to HDF5. Which python module would people here recommend? (Or, alternatively, did I miss a great netCDF python module that someone could tell me about?)

Cheers,
Paul.

Gökhan Sever

Mar 12, 2010, 12:29:58 PM
to SciPy Users List
There is http://code.google.com/p/netcdf4-python/

I know netcdf4 is a subset of HDF5. What advantages are there to using HDF5 instead of NetCDF4?


--
Gökhan

Keith Goodman

Mar 12, 2010, 12:30:19 PM
to SciPy Users List
On Fri, Mar 12, 2010 at 9:22 AM, Paul Anton Letnes
<paul.ant...@gmail.com> wrote:
> I've experienced similar issues too, but I moved to NetCDF. The only disadvantage was that I did not find any python modules that work well _and_ support numpy. Hence, I am considering moving to HDF5. Which python module would people here recommend? (Or, alternatively, did I miss a great netCDF python module that someone could tell me about?)
>
> Cheers,
> Paul.

I use h5py. I think it is great. It gives you a dictionary-like
interface to your archive. Here's a quick example:

>> import numpy as np
>> import h5py
>> a = np.random.rand(1000,1000)
>> f = h5py.File('/tmp/myfile.hdf5')
>> f['a'] = a # <-- Save
>> f.keys()
['a']
>> f.filename
'/tmp/myfile.hdf5'
>> b = f['a'][...] # <-- Load (indexing reads the data into memory)

Paul Anton Letnes

Mar 12, 2010, 1:18:18 PM
to SciPy Users List


I don't know any particular advantages of the file format itself. There are, however, several Python modules for HDF5 that use numpy. Your suggestion of a netCDF module might be a good one, but it does not build on my system: it does not find the netcdf library, only the hdf5 lib - even if they reside in the same folder... I'll see if it works out eventually!

-Paul

Anne Archibald

Mar 12, 2010, 1:27:27 PM
to SciPy Users List
On 11 March 2010 10:48, Robert Kern <rober...@gmail.com> wrote:
> On Thu, Mar 11, 2010 at 05:10, Jose Gomez-Dans <jgome...@gmail.com> wrote:
>> Hi!
>> I need to save a fairly large set of arrays to disk. I have saved it using
>> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
>> large ;D). When I try to load it using numpy.load, the zipfile module
>> complains about
>> BadZipfile: Bad magic number for file header
>>
>> I can't open it with the normal zip utility present on the system, but it
>> could be that it's barfing about files being larger than 2 GB.
>> Is there some file size limit for .npz files?
>
> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
> not yet support the ZIP64 format.

Could it be arranged that an exception is raised when creating a >4GB
.npz file, so people do not find themselves with unrecoverable data?
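
Even something as blunt as a size check before writing would help. A
sketch of what I mean (untested, and it ignores the small .npy and zip
header overhead):

import numpy as np

ZIP_LIMIT = 2**32 - 1  # the classic (non-ZIP64) zip format limit

def checked_savez(fname, **arrays):
    # refuse to write an archive the plain zip format cannot hold
    total = sum(a.nbytes for a in arrays.values())
    if total >= ZIP_LIMIT:
        raise ValueError("arrays total %d bytes; a plain zip archive "
                         "is limited to 4 GB" % total)
    np.savez(fname, **arrays)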

>> Is there any way I can recover the data? (I
>> guess I could try decompressing the file with 7z and extracting the
>> individual .npy files.)
>
> Possibly. However, if the normal zip utility isn't working, 7z
> probably won't, either. Worth a try, though.

If your data is valuable enough — irreplaceable space mission results,
say — some careful spelunking in the code combined with some knowledge
about your data might allow you to semi-manually reconstruct it from
the damaged zip file. This would be a long and painful labour, but
would probably produce a python module that supported zip64 as an
incidental side effect.
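
Since numpy.savez stores its members uncompressed, the raw .npy
streams should still be sitting verbatim inside the broken archive, so
the spelunking may be less painful than it sounds. A rough, untested
sketch of the idea -- scan for the .npy magic bytes and let numpy's own
header parser do the rest (the original member names are lost):

import numpy.lib.format as fmt
from cStringIO import StringIO

MAGIC = '\x93NUMPY'  # every .npy stream begins with these bytes

def carve_arrays(path):
    # for an 11 GB file you would want mmap rather than read()
    data = open(path, 'rb').read()
    start = data.find(MAGIC)
    while start != -1:
        nxt = data.find(MAGIC, start + 1)
        end = nxt if nxt != -1 else len(data)
        # read_array parses the .npy header (dtype, shape, order) and
        # reads exactly that many data bytes; any trailing zip
        # bookkeeping before the next member is simply ignored
        yield fmt.read_array(StringIO(data[start:end]))
        start = nxt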

Anne

Robert Kern

Mar 12, 2010, 1:35:48 PM
to SciPy Users List
On Fri, Mar 12, 2010 at 12:27, Anne Archibald <peridot...@gmail.com> wrote:
> On 11 March 2010 10:48, Robert Kern <rober...@gmail.com> wrote:
>> On Thu, Mar 11, 2010 at 05:10, Jose Gomez-Dans <jgome...@gmail.com> wrote:
>>> Hi!
>>> I need to save a fairly large set of arrays to disk. I have saved it using
>>> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
>>> large ;D). When I try to load it using numpy.load, the zipfile module
>>> complains about
>>> BadZipfile: Bad magic number for file header
>>>
>>> I can't open it with the normal zip utility present on the system, but it
>>> could be that it's barfing about files being larger than 2 GB.
>>> Is there some file size limit for .npz files?
>>
>> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
>> not yet support the ZIP64 format.
>
> Could it be arranged that an exception is raised when creating a >4GB
> .npz file, so people do not find themselves with unrecoverable data?

If you can arrange it, sure.

Chris Barker

Mar 12, 2010, 1:42:21 PM
to SciPy Users List
On 12. mars 2010, at 09.29, Gökhan Sever wrote, quoting Paul Anton Letnes:
>> I've experienced similar issues too, but I moved to NetCDF. The
>> only disadvantage was that I did not find any python modules that work well
>> _and_ support numpy. Hence, I am considering moving to HDF5. Which
>> python module would people here recommend? (Or, alternatively, did I
>> miss a great netCDF python module that someone could tell me about?)

yes, numpy support is critical -- why anyone would write a netcdf
wrapper and not use numpy is beyond me.

> There is http://code.google.com/p/netcdf4-python/
>
> I know netcdf4 is a subset of HDF5. What advantages are there to using HDF5 instead of NetCDF4?

The way I think about it is that netcdf is a more structured subset of
hdf5. If the structure imposed by netcdf works well for your needs, it's
a good option. There are also things like the CF metadata standard that
make it easier to exchange data with others.

However, if you are using it only to save and reload data for your own
app, PyTables may be a better bet.

I've found the netcdf4-python package to be robust and to have a nice
interface -- and it certainly works well with numpy. My only build issues
with it have actually been with getting HDF5 built right; the netcdf part has
been easier, and the python bindings very easy -- at least on OS-X.
Windows support is not in as good shape, though if you don't need
opendap support, I think there are Windows binaries for netcdf4 you
could use.

Also, I think the python bindings still support netcdf3, if you don't
need the extra stuff netcdf4 gives you (you may, if you need big files
-- not sure about that).

If netcdf seems like the right way to go for you, then I'm sure you can
get netcdf4-python working -- and Jeff Whitaker is very helpful if you
have trouble.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Bruce Southey

Mar 12, 2010, 1:45:55 PM
to SciPy Users List
On 03/12/2010 12:35 PM, Robert Kern wrote:
> On Fri, Mar 12, 2010 at 12:27, Anne Archibald <peridot...@gmail.com> wrote:
>> On 11 March 2010 10:48, Robert Kern <rober...@gmail.com> wrote:
>>> On Thu, Mar 11, 2010 at 05:10, Jose Gomez-Dans <jgome...@gmail.com> wrote:
>>>> Hi!
>>>> I need to save a fairly large set of arrays to disk. I have saved it using
>>>> numpy.savez, and the resulting file is around 11 GB (yes, I did say fairly
>>>> large ;D). When I try to load it using numpy.load, the zipfile module
>>>> complains about
>>>> BadZipfile: Bad magic number for file header
>>>>
>>>> I can't open it with the normal zip utility present on the system, but it
>>>> could be that it's barfing about files being larger than 2 GB.
>>>> Is there some file size limit for .npz files?
>>>
>>> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
>>> not yet support the ZIP64 format.
>>
>> Could it be arranged that an exception is raised when creating a >4GB
>> .npz file, so people do not find themselves with unrecoverable data?
>
> If you can arrange it, sure.

Hi,
Please note that there is the related ticket 991:
http://projects.scipy.org/numpy/ticket/991

Jose, what Python version are you using, and does the 'normal zip utility
present on the system' actually support ZIP64? You should try seeing if 7z
sees it.

Bruce


Philip Austin

Mar 12, 2010, 4:10:04 PM
to SciPy Users List
Chris Barker wrote:
>
> Also, I think the python bindings still support netcdf3, if you don't
> need the extra stuff netcdf4 gives you (you may, if you need big files
> -- not sure about that).

You can do netcdf3 without HDF5 and with big files by
compiling netcdf4-python (on Linux) via:

> export NETCDF3_DIR=/home/phil/usr64/netcdf_3.6.3
> ~/usr64/bin/python setup-nc3.py install

then in python open a dataset for writing with:

theFile = Dataset(ncfilename, 'w', format='NETCDF3_64BIT')

-- Phil

Ryan May

Mar 12, 2010, 7:35:35 PM
to SciPy Users List
> I've experienced similar issues too, but I moved to NetCDF. The only disadvantage was that I did not find
> any python modules that work well _and_ support numpy. Hence, I am considering moving to HDF5.
> Which python module would people here recommend? (Or, alternatively, did I miss a great netCDF
> python module that someone could tell me about?)

You could try pupynere, which is pure python, only a single file
(netcdf 3 only).

http://pypi.python.org/pypi/pupynere/

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Andrew Collette

Mar 12, 2010, 8:30:25 PM
to SciPy Users List
> I use h5py. I think it is great. It gives you a dictionary-like
> interface to your archive. Here's a quick example:

Just to add, h5py (and PyTables also) allows you to read/write subsets
of your data:

>>> f = h5py.File('foo.hdf5','w')
>>> f['a'] = np.random.rand(1000,1000)
>>> subset = f['a'][200:300, 400:500:2] # only reads this slice from the file

H5py also supports transparent compression on a per-dataset basis,
with no limits on the size of the datasets or files. Slicing is still
efficient for compressed datasets since HDF5 supports a chunked
storage model. There's a general introduction to h5py here:

http://h5py.alfven.org/docs/guide/quick.html
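
Turning compression on is just a keyword argument at dataset creation;
a minimal sketch:

>>> import numpy as np
>>> import h5py
>>> f = h5py.File('foo.hdf5', 'w')
>>> dset = f.create_dataset('a', data=np.random.rand(1000, 1000),
...                         compression='gzip', chunks=True)
>>> f.close()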

Andrew Collette

Paul Anton Letnes

Mar 12, 2010, 8:47:01 PM
to SciPy Users List

On 12. mars 2010, at 16.35, Ryan May wrote:

>> I've experienced similar issues too, but I moved to NetCDF. The only disadvantage was that I did not find
>> any python modules that work well _and_ support numpy. Hence, I am considering moving to HDF5.
>> Which python module would people here recommend? (Or, alternatively, did I miss a great netCDF
>> python module that someone could tell me about?)
>
> You could try pupynere, which is pure python, only a single file
> (netcdf 3 only).
>
> http://pypi.python.org/pypi/pupynere/
>
> Ryan

pupynere is read-only, which of course is a show-stopper.

Everyone else, thanks for the good advice. I still can't get netcdf4-python to work with my netcdf4 library, though; it won't detect it, for some mysterious reason... Anyway, h5py seems like a nice module - thanks, Keith! I think I might go that route instead.

Paul

Ryan May

Mar 12, 2010, 8:54:01 PM
to SciPy Users List
On Fri, Mar 12, 2010 at 7:47 PM, Paul Anton Letnes
<paul.ant...@gmail.com> wrote:
> On 12. mars 2010, at 16.35, Ryan May wrote:
>> You could try pupynere, which is pure python, only a single file
>> (netcdf 3 only).
>>
>> http://pypi.python.org/pypi/pupynere/
>
> pupynere is read-only, which of course is a show-stopper.
>
> Everyone else, thanks for good advice. I still can't get netcdf4-python to work with my netcdf4 library though, which it won't detect, for some mysterious reason... Anyway, h5py seems like a nice module - thanks Keith! I think I might go that route instead.
>

No, it does allow writing. At the top of the link I sent before
(which I'm guessing you didn't read :) ):

"Pupynere is a Python module for reading and writing NetCDF files...."

It works pretty well for me. The only problem is that it doesn't
allow modifying files, but that's not too bad of a limitation. The
pure python part makes it really simple to install (it doesn't even
rely on having the official netcdf libraries installed.)
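
Writing a file with it looks something like this (a quick sketch from
memory, so double-check it against the docs):

import numpy as np
from pupynere import netcdf_file

f = netcdf_file('simple.nc', 'w')   # plain netcdf 3, pure python
f.createDimension('time', 10)
time = f.createVariable('time', 'd', ('time',))
time[:] = np.arange(10.0)
f.close()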

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Stéfan van der Walt

Mar 14, 2010, 3:43:26 PM
to SciPy Users List
On 11 March 2010 17:48, Robert Kern <rober...@gmail.com> wrote:
>> I can't open it with the normal zip utility present on the system, but it
>> could be that it's barfing about files being larger than 2 GB.
>> Is there some file size limit for .npz files?
>
> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
> not yet support the ZIP64 format.

I see Bruce dug up http://projects.scipy.org/numpy/ticket/991. Is
this the right route to go, or do we need a more sophisticated
solution?

Regards
Stéfan

Christopher Barker

Mar 14, 2010, 8:24:38 PM
to SciPy Users List
Stéfan van der Walt wrote:
>> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
>> not yet support the ZIP64 format.
>
> I see Bruce dug up http://projects.scipy.org/numpy/ticket/991. Is
> this the right route to go, or do we need a more sophisticated
> solution?

It might be nice to build something on libbzip2 -- it looks like the
license is right, it's got good compression qualities, supports 64-bit sizes,
and it's getting pretty widely used.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Robert Kern

Mar 14, 2010, 10:23:48 PM
to SciPy Users List
On Sun, Mar 14, 2010 at 18:24, Christopher Barker <Chris....@noaa.gov> wrote:
> Stéfan van der Walt wrote:
>>> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
>>> not yet support the ZIP64 format.
>>
>> I see Bruce dug up  http://projects.scipy.org/numpy/ticket/991.  Is
>> this the right route to go, or do we need a more sophisticated
>> solution?
>
> It might be nice to build something on libbzip2 -- it looks like the
> license is right, it's got good compression qualities, supports 64-bit sizes,
> and it's getting pretty widely used.

bzip2 does not support random access. It just compresses a single
file. It is not a replacement for zipfile.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

Nathaniel Smith

Mar 15, 2010, 12:47:56 AM
to SciPy Users List
On Sun, Mar 14, 2010 at 5:24 PM, Christopher Barker
<Chris....@noaa.gov> wrote:
> It might be nice to build something on libbzip2 -- it looks like the
> license is right, it's got good compression qualities, supports 64-bit sizes,
> and it's getting pretty widely used.

If we're going to pull in a new C library for this purpose, then maybe
we should just use libhdf5? :-).

-- Nathaniel

Jose Gomez-Dans

Mar 15, 2010, 6:32:32 AM
to SciPy Users List
Hi!

On 12 March 2010 18:45, Bruce Southey <bsou...@gmail.com> wrote:
> Jose, what Python version are you using, and does the 'normal zip utility
> present on the system' actually support ZIP64? You should try seeing if 7z
> sees it.

I'm using Python 2.5 on x86_64. The file is not recognised by either the local zip (v2.31) or 7za (v4.61). The latter lists the files and reports "Unsupported method".

It appears from this that the PyTables/HDF5 route may be the most promising for me.

Thanks!
J

Bruce Southey

Mar 15, 2010, 9:20:17 AM
to scipy...@scipy.org
Hi,
Good, then you should try the patch for ticket 991:
http://projects.scipy.org/numpy/ticket/991.

Note that the Python 2.6 documentation says:
"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GByte in size)."

This is better than the Python 2.5.2 documentation. So can you try Python 2.6?

It would be great to resolve your numpy issue, especially for other users - at least to say whether or not the patch or a new Python solves it.
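
If you end up patching numpy yourself, note that zipfile only writes
the ZIP64 records when asked to. A small sketch of the idea on 2.6
(the .npy file here is hypothetical, written beforehand with
numpy.save):

import zipfile

# allowZip64 defaults to False, so it must be passed explicitly
zf = zipfile.ZipFile('big.npz', 'w', zipfile.ZIP_STORED, allowZip64=True)
zf.write('arr_0.npy')
zf.close()

# reading ZIP64 archives back is transparent on 2.6
zf = zipfile.ZipFile('big.npz')
print zf.namelist()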

Bruce

Christopher Barker

Mar 15, 2010, 1:50:30 PM
to SciPy Users List
Nathaniel Smith wrote:
> On Sun, Mar 14, 2010 at 5:24 PM, Christopher Barker
> <Chris....@noaa.gov> wrote:
>> It might be nice to build something on libbzip2 -- It looks like the
>> license is right, it's got good compression qualities, supports 64 bit,
>> and it getting pretty widely used.
>
> If we're going to pull in a new C library for this purpose, then maybe
> we should just use libhdf5? :-).

except that HDF5 is one big honking pain in the butt to build.

Robert's right, bzip isn't a great option anyway -- it can do multiple
files, but it just concatenates them, and doesn't appear to provide an
index. I used to use afio, which would zip each file first, then put
them all in an archive -- I liked that approach. We could, of course,
build something like that with bzip, but it looks like Python's zipfile
will work for >= 2.6, so no need for something new.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Giovanni Marco Dall'Olio

Mar 12, 2010, 12:44:04 PM
to SciPy Users List
On Fri, Mar 12, 2010 at 6:22 PM, Paul Anton Letnes <paul.ant...@gmail.com> wrote:


> (Or, alternatively, did I miss a great netCDF python module that someone could tell me about?)

According to the documentation, PyTables can also read/write netCDF:
- http://www.pytables.org/docs/manual/ch07.html



--
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it

Francesc Alted

Mar 15, 2010, 2:20:51 PM
to SciPy Users List
On Friday 12 March 2010 18:44:04, Giovanni Marco Dall'Olio wrote:

> On Fri, Mar 12, 2010 at 6:22 PM, Paul Anton Letnes <
>
> paul.ant...@gmail.com> wrote:
> > (Or, alternatively, did I miss a great netCDF python module that someone
> > could tell me about?)
>
> As for the documentation, PyTables can also read/write netCDF:
> - http://www.pytables.org/docs/manual/ch07.html

No. The above is only an API that emulates the NetCDF interface of the
Scientific package; it does not create (nor read) pure NetCDF files.
For creating/reading NetCDF files, it is better to use the netcdf4-python project:

http://code.google.com/p/netcdf4-python/

--
Francesc Alted

Francesc Alted

Mar 15, 2010, 2:26:30 PM
to SciPy Users List
On Monday 15 March 2010 18:50:30, Christopher Barker wrote:

> > If we're going to pull in a new C library for this purpose, then maybe
> > we should just use libhdf5? :-).
>
> except that hdf is one big honking pain in the butt to build.

I wouldn't say that HDF5 is very difficult to build/install. In fact, it
is a matter of "./configure; make install" --and that only if there is not a
binary package available for your OS, which there usually is. It is just
that it is another dependency to add to numpy/scipy ...and a *big* one.

--
Francesc Alted

Andrew Collette

Mar 15, 2010, 2:56:32 PM
to SciPy Users List
> I wouldn't say that HDF5 is very difficult to build/install.  In fact, it
> is a matter of "./configure; make install" --and that only if there is not a
> binary package available for your OS, which there usually is.  It is just
> that it is another dependency to add to numpy/scipy ...and a *big* one.

Except on Windows, of course, where you need to have Visual Studio and
a lot of patience. :) Even on UNIX one of the major support issues
I've had with h5py is that everyone has a slightly different version
of HDF5, built in a slightly different way.

Andrew

Christopher Barker

Mar 15, 2010, 3:25:09 PM
to SciPy Users List
Francesc Alted wrote:

> I wouldn't say that HDF5 is very difficult to build/install. In fact, it
> is a matter of "./configure; make install" --and that only if there is not a
> binary package available for your OS, which there usually is.

clearly, you've never tried to build a Universal binary on OS-X ;-)

> It is just
> that it is another dependency to add to numpy/scipy ...and a *big* one.

yeah, that too.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Francesc Alted

Mar 15, 2010, 4:04:20 PM
to SciPy Users List
On Monday 15 March 2010 19:56:32, Andrew Collette wrote:

> > I wouldn't say that HDF5 is very difficult to build/install. In fact,
> > it is a matter of "./configure; make install" --and that only if there is
> > not a binary package available for your OS, which there usually is.
> > It is just that it is another dependency to add to numpy/scipy ...and a
> > *big* one.
>
> Except on Windows, of course, where you need to have Visual Studio and
> a lot of patience. :)

Yeah, Windows, but this is a platform where everybody expects to have a
binary ...and fortunately HDF5 is not an exception there. :)

> Even on UNIX one of the major support issues
> I've had with h5py is that everyone has a slightly different version
> of HDF5, built in a slightly different way.

Well, I cannot say the same about PyTables, but that could be somewhat
expected, as you try to support many more HDF5 low-level features than I do.

--
Francesc Alted

Francesc Alted

Mar 15, 2010, 4:07:44 PM
to SciPy Users List
On Monday 15 March 2010 20:25:09, Christopher Barker wrote:

> Francesc Alted wrote:
> > I wouldn't say that HDF5 is very difficult to build/install. In fact,
> > it is a matter of "./configure; make install" --and that only if there is
> > not a binary package available for your OS, which there usually is.
>
> clearly, you've never tried to build a Universal binary on OS-X ;-)

Ok, touché. But is there any package for which it is easy to build a
Universal binary on Mac OS-X? ;-)

--
Francesc Alted

Robert Kern

Mar 15, 2010, 4:11:39 PM
to SciPy Users List
On Mon, Mar 15, 2010 at 15:07, Francesc Alted <fal...@pytables.org> wrote:
> On Monday 15 March 2010 20:25:09, Christopher Barker wrote:
>> Francesc Alted wrote:
>> > I wouldn't say that HDF5 is very difficult to build/install.  In fact,
>> > it is a matter of "./configure; make install" --and that only if there is
>> > not a binary package available for your OS, which there usually is.
>>
>> clearly, you've never tried to build a Universal binary on OS-X ;-)
>
> Ok, touché.  But is there any package for which it is easy to build a
> Universal binary on Mac OS-X? ;-)

numpy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

Francesc Alted

Mar 15, 2010, 4:16:06 PM
to SciPy Users List
On Monday 15 March 2010 21:11:39, Robert Kern wrote:

> On Mon, Mar 15, 2010 at 15:07, Francesc Alted <fal...@pytables.org> wrote:
> > On Monday 15 March 2010 20:25:09, Christopher Barker wrote:
> >> Francesc Alted wrote:
> >> > I wouldn't say that HDF5 is very difficult to build/install. In
> >> > fact, it is a matter of "./configure; make install" --and that only if
> >> > there is not a binary package available for your OS, which there
> >> > usually is.
> >>
> >> clearly, you've never tried to build a Universal binary on OS-X ;-)
> >
> > Ok, touché. But is there any package for which it is easy to build a
> > Universal binary on Mac OS-X? ;-)
>
> numpy.

Good! So, I should start by looking at how it achieves that then :)

--
Francesc Alted

David Warde-Farley

Mar 16, 2010, 11:05:05 AM
to SciPy Users List
Hey Francesc,

On 2010-03-15, at 4:16 PM, Francesc Alted <fal...@pytables.org> wrote:

> Good! So, I should start by looking at how it achieves that then :)

Actually I did build a 4-way universal binary of HDF5. Long story
short, because of some quirks of the build process for HDF5 it isn't
possible to add some -arch flags to CFLAGS -- you actually have to
build it separately for each architecture and then do something like

make install DESTDIR=path/for/arch

and then stitch them together manually with lipo. I'm also not sure
of the correct way, if there is one, to handle "hdf5.settings" -- it's
unclear to me whether programs ever look at this at build time/
runtime, or if it's just there for the user's convenience.

An installer for the binaries I made is at

http://www.cs.toronto.edu/~dwf/mirror/hdf5-1.8.4-quad.pkg

I can write up some instructions if that would be helpful.

David

Francesc Alted

Mar 16, 2010, 12:29:39 PM
to SciPy Users List
On Tuesday 16 March 2010 16:05:05, David Warde-Farley wrote:

> Hey Francesc,
>
> On 2010-03-15, at 4:16 PM, Francesc Alted <fal...@pytables.org> wrote:
> > Good! So, I should start by looking at how it achieves that then :)
>
> Actually I did build a 4-way universal binary of HDF5. Long story
> short, because of some quirks of the build process for HDF5 it isn't
> possible to add some -arch flags to CFLAGS -- you actually have to
> build it separately for each architecture and then do something like
>
> make install DESTDIR=path/for/arch
>
> and then stitch them together manually with lipo. I'm also not sure
> of the correct way, if there is one, to handle "hdf5.settings" -- it's
> unclear to me whether programs ever look at this at build time/
> runtime, or if it's just there for the user's convenience.

Yes, I think "hdf5.settings" is for convenience purposes only (a fast way
to look up the compiler flags the library was configured with). So you
don't need to worry too much about this.

> An installer for the binaries I made is at
>
> http://www.cs.toronto.edu/~dwf/mirror/hdf5-1.8.4-quad.pkg

Hey, that's great.

> I can write up some instructions if that would be helpful.

Please do. I'm definitely interested!

Thanks,

--
Francesc Alted

Christopher Barker

Mar 16, 2010, 12:49:12 PM
to SciPy Users List
David Warde-Farley wrote:
> Actually I did build a 4-way universal binary of HDF5.

> An installer for the binaries I made is at


>
> http://www.cs.toronto.edu/~dwf/mirror/hdf5-1.8.4-quad.pkg
>
> I can write up some instructions if that would be helpful.

Wonderful! Any chance you've tackled netcdf4 as well?

Thanks,

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

David Cournapeau

Mar 16, 2010, 3:21:34 PM
to SciPy Users List
On Tue, Mar 16, 2010 at 10:05 AM, David Warde-Farley <d...@cs.toronto.edu> wrote:
> Hey Francesc,
>
> On 2010-03-15, at 4:16 PM, Francesc Alted <fal...@pytables.org> wrote:
>
>> Good!  So, I should start by looking at how it achieves that then :)
>
> Actually I did build a 4-way universal binary of HDF5. Long story
> short, because of some quirks of the build process for HDF5 it isn't
> possible to add some -arch flags to CFLAGS -- you actually have to
> build it separately for each architecture and then do something like
>
>        make install DESTDIR=path/for/arch
>
> and then stitch them together manually with lipo.

Actually, that's a much better way of doing things generally. I wish
this were the standard way of building universal binaries; it is
usually more robust than using multiple -arch arguments, especially
when you export a public API with C headers. I use this technique
myself for all my packages' pure C libraries (audiolab and samplerate
in particular).

cheers,

David Warde-Farley

Mar 16, 2010, 4:56:51 PM
to SciPy Users List
On Tue, Mar 16, 2010 at 09:49:12AM -0700, Christopher Barker wrote:
> David Warde-Farley wrote:
> > Actually I did build a 4-way universal binary of HDF5.
>
> > An installer for the binaries I made is at
> >
> > http://www.cs.toronto.edu/~dwf/mirror/hdf5-1.8.4-quad.pkg
> >
> > I can write up some instructions if that would be helpful.
>
> Wonderful! Any chance you've tackled netcdf4 as well?

Sorry, I haven't, but it should be roughly the same procedure. I'll write up
some instructions.

One thing with HDF5 was that I believe I had to compile i386 and x86_64
binaries on an Intel Mac and ppc/ppc64 on a PowerPC Mac; there was some issue
with cross-compiling that I never quite got sorted out. It may compile a
binary which it then wants to actually *run* for another step of the build
process. Rosetta should take care of this on Intel, but the machine I was
sitting in front of was a G5, and the Intel build on that machine failed
miserably, so I logged in remotely to an Intel Mac. I will try compiling all
4 on an Intel machine and see if that works.

David
