Problem with io.fits and a big catalogue file


Yannick Roehlly

Nov 1, 2015, 5:40:42 PM
to astro...@googlegroups.com
Dear all,

I have a problem with a big catalogue FITS file (12.8 million rows,
480 columns, 37 GiB). Only reading it and saving it thereafter leads to a
truncated 2 GiB file whether I use memmap or not (see the sample session
below).

For me, the first problem is that astropy does not throw any error. Does
anybody have any idea on how to debug that?
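
Until the library reports this itself, a crude check after writing is to compare the size on disk of the copy with that of the original. The sketch below is only an illustration, assuming that an unmodified round trip should produce a file of the same length; `looks_truncated` is a hypothetical helper, not part of astropy.

```python
import os
import tempfile


def looks_truncated(source_path, copy_path):
    """Return True if the copy is shorter on disk than the source.

    Hypothetical helper: it only compares file sizes and knows nothing
    about the FITS format.
    """
    return os.path.getsize(copy_path) < os.path.getsize(source_path)


# Tiny self-check with two throwaway files standing in for the catalogue
# and its truncated copy.
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full.bin")
    short = os.path.join(tmp, "short.bin")
    with open(full, "wb") as handle:
        handle.write(b"\0" * 2880 * 2)
    with open(short, "wb") as handle:
        handle.write(b"\0" * 2880)
    print(looks_truncated(full, short))
```

In the session below this would be called as `looks_truncated("WP3-XMMLSS-OptNIR-v0.fits", "tmp.fits")` right after `writeto`.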

Yannick

PS: The problem occurs both with the current stable astropy and the
development version.

Sample session:
---------------

In [1]: from astropy.io import fits

In [2]: hdu_list = fits.open("WP3-XMMLSS-OptNIR-v0.fits")

In [3]: hdu_list.writeto("tmp.fits")

In [4]: fits.open("tmp.fits")
WARNING: File may have been truncated: actual file length (2147577472) is
smaller than the expected size (39496055040) [astropy.io.fits.file]
Out[4]:
[<astropy.io.fits.hdu.image.PrimaryHDU at 0x7fca823ae240>,
<astropy.io.fits.hdu.table.BinTableHDU at 0x7fca82379dd8>]

In [5]: hdu_list = fits.open("WP3-XMMLSS-OptNIR-v0.fits", memmap=True)

In [6]: hdu_list.writeto("tmp2.fits")

In [7]: fits.open("tmp2.fits")
WARNING: File may have been truncated: actual file length (2147577472) is
smaller than the expected size (39496055040) [astropy.io.fits.file]
Out[7]:
[<astropy.io.fits.hdu.image.PrimaryHDU at 0x7fca8242cdd8>,
<astropy.io.fits.hdu.table.BinTableHDU at 0x7fca822c37f0>]

--
Everyone is more or less mad on one point.
-- Rudyard Kipling

Demitri Muna

Nov 1, 2015, 5:49:48 PM
to astro...@googlegroups.com
Hi,

The 2GB file size is a flag suggesting a file system limitation.


What operating system and file system are you using?
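
A quick way to rule the file system in or out, independently of astropy, would be to create a file slightly larger than 2 GiB in the same directory and see whether the size sticks. This is a minimal sketch under stated assumptions: `can_hold_file_of_size` is a hypothetical helper, and it relies on the file system supporting sparse files, otherwise it really writes the full size to disk.

```python
import os
import tempfile

TWO_GIB = 2**31


def can_hold_file_of_size(directory, size=TWO_GIB + 4096):
    """Return True if a file of `size` bytes can be created in `directory`.

    The file is made by seeking to the last byte and writing a single
    byte, so on file systems with sparse-file support almost nothing is
    actually stored. The temporary file is always removed afterwards.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.seek(size - 1)
            handle.write(b"\0")
        return os.path.getsize(path) == size
    except OSError:
        # Typically raised when the file system refuses the large offset.
        return False
    finally:
        os.remove(path)


# Small demonstration size; call it without `size` to test the 2 GiB limit.
print(can_hold_file_of_size(tempfile.gettempdir(), size=1024 * 1024))
```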

Cheers,
Demitri

_________________________________________
Demitri Muna

Department of Astronomy
The Ohio State University

http://scicoder.org/workshop


Yannick Roehlly

Nov 1, 2015, 5:59:23 PM
to astro...@googlegroups.com
On Monday, 2 November 2015, 09:49:40, Demitri Muna wrote:
> The 2GB file size is a flag suggesting a file system limitation.

Hi Demitri,

I'm using 64-bit Linux (Debian and Scientific Linux). Also, I managed to
process the file with Topcat and (although I am not 100% sure) with Python
fitsio.

Yannick
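
A side note on the numbers in the warning: on Linux a single write() call transfers at most 0x7ffff000 bytes (2,147,479,552), as stated in the write(2) man page. The arithmetic below is a guess at what the truncated length might mean, not a diagnosis. It assumes the truncated file consists of the header blocks followed by exactly one such maximal write of table data.

```python
FITS_BLOCK = 2880             # FITS files are organised in 2880-byte blocks
LINUX_MAX_WRITE = 0x7FFFF000  # most bytes one write() call transfers on Linux

ACTUAL = 2147577472     # truncated length reported in the warning
EXPECTED = 39496055040  # expected length reported in the warning

# Assumption: everything not covered by one maximal write() is header.
header_bytes = ACTUAL - LINUX_MAX_WRITE

print(EXPECTED % FITS_BLOCK)             # 0, the expected size is block-aligned
print(header_bytes)                      # 97920
print(divmod(header_bytes, FITS_BLOCK))  # (34, 0)
```

If that reading is right, the remainder is exactly 34 FITS blocks, which would be a plausible header size for a 480-column table, and it would point at a single short write rather than at the file system.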

--
You have an ambitious nature and may make a name for yourself.

Erik Tollerud

Nov 8, 2015, 8:42:26 AM
to astropy-dev
Can you tell if it *reads* correctly?  That is, can you access parts of the file that are beyond the first 2 GB?
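
One way to check this, sketched under the assumption that the table sits in extension 1 as in the session above, is to ask for the last row, which lies far beyond the 2 GiB mark in a 37 GiB file. `last_row` is a hypothetical helper. It only relies on indexing and on a `.data` attribute, so it is meant to be called with the object returned by `fits.open(...)`; the stand-in class below exists only so that the snippet runs without the real catalogue.

```python
def last_row(hdu_list, ext=1):
    """Return (number_of_rows, last_row) for the table in extension `ext`.

    Fetching the last row forces a read from the end of the data, so a
    file that is unreadable past some offset should raise here.
    """
    data = hdu_list[ext].data
    return len(data), data[-1]


class StandInHDU:
    """Minimal stand-in for a table HDU (illustration only)."""

    def __init__(self, rows):
        self.data = rows


# Stand-in for an opened file: a primary HDU plus one three-row table.
fake_hdu_list = [StandInHDU([]), StandInHDU([(1, 2.0), (2, 3.0), (3, 4.0)])]
print(last_row(fake_hdu_list))
```

With the real file the call would be `last_row(fits.open("WP3-XMMLSS-OptNIR-v0.fits", memmap=True))`.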

---
Erik T



Yannick Roehlly

Nov 8, 2015, 6:17:31 PM
to astro...@googlegroups.com
On Sunday, 8 November 2015, 05:42:05, Erik Tollerud wrote:
> Can you tell if it *reads* correctly? That is, can you access parts of the
> file that are beyond the first 2 GB?

Hi Erik,

No, if I try to access the data it raises:

ValueError: mmap length is greater than file size

(more below)

I will try to make a script generating a big FITS file to have an easy
example.

Regards,

Yannick


Full error:
===========

In [10]: f[1].data.shape
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.4/site-packages/astropy/utils/decorators.py in
__get__(self, obj, owner)
338 try:
--> 339 return obj.__dict__[self._key]
340 except KeyError:

KeyError: 'data'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
<ipython-input-10-8eb3dff55040> in <module>()
----> 1 f[1].data.shape

/opt/anaconda3/lib/python3.4/site-packages/astropy/utils/decorators.py in
__get__(self, obj, owner)
339 return obj.__dict__[self._key]
340 except KeyError:
--> 341 val = self._fget(obj)
342 obj.__dict__[self._key] = val
343 return val

/opt/anaconda3/lib/python3.4/site-packages/astropy/io/fits/hdu/table.py in
data(self)
397 @lazyproperty
398 def data(self):
--> 399 data = self._get_tbdata()
400 data._coldefs = self.columns
401 # Columns should now just return a reference to the
data._coldefs

/opt/anaconda3/lib/python3.4/site-packages/astropy/io/fits/hdu/table.py in
_get_tbdata(self)
166 else:
167 raw_data = self._get_raw_data(self._nrows, columns.dtype,
--> 168 self._data_offset)
169 if raw_data is None:
170 # This can happen when a brand new table HDU is being
created

/opt/anaconda3/lib/python3.4/site-packages/astropy/io/fits/hdu/base.py in
_get_raw_data(self, shape, code, offset)
521 offset=offset)
522 elif self._file:
--> 523 return self._file.readarray(offset=offset, dtype=code,
shape=shape)
524 else:
525 return None

/opt/anaconda3/lib/python3.4/site-packages/astropy/io/fits/file.py in
readarray(self, size, offset, dtype, shape)
246 return Memmap(self._file, offset=offset,
247 mode=MEMMAP_MODES[self.mode], dtype=dtype,
--> 248 shape=shape).view(np.ndarray)
249 else:
250 count = reduce(lambda x, y: x * y, shape)

/opt/anaconda3/lib/python3.4/site-packages/numpy/core/memmap.py in
__new__(subtype, filename, dtype, mode, offset, shape, order)
255 bytes -= start
256 offset -= start
--> 257 mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
258
259 self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,

ValueError: mmap length is greater than file size


--
A journey of a thousand miles must begin with a single step.
-- Lao Tsu

Yannick Roehlly

Nov 11, 2015, 6:44:44 PM
to astro...@googlegroups.com
On Monday, 9 November 2015, 00:17:27, you wrote:
> I will try to make a script generating a big FITS file to have an easy
> example.

Hi,

I wrote a script exposing the problem and reported the bug[1]. Strangely,
while writing the script I found that if I count the number of rows in the
table before saving it to a new file, then this newly saved file is not
truncated.

Yannick

[1] https://github.com/astropy/astropy/issues/4307

--
To do little and do it badly, there is no need to leave early.
