Support for FITS files larger than memory


Sri Krishna

Jun 23, 2016, 1:27:21 PM
to astro...@googlegroups.com
Hi,

How would one deal with FITS files which are larger than the physical RAM on the machine?

If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).

I am trying to read in an interferometric visibility dataset (i.e., in the random groups format).

I think this is an important question to figure out because several radio telescopes have come up recently that generate ~ 1 TB of data a day, which falls neatly into the category of 'small enough to fit on disk but too large for memory'.

There exist solutions like dask which have figured this out, but I'm not too familiar with the dask internals. One possible solution for astropy could be to do something similar to xarray which just calls dask under the hood, but where the dependency on dask is entirely optional.
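
To illustrate the kind of thing I mean, here is a rough, untested sketch (not something astropy supports out of the box) that wraps a memory-mapped image HDU in a dask array so that reductions run chunk by chunk; the filename and chunk size are just placeholders:

    from astropy.io import fits
    import dask.array as da

    # Hypothetical sketch: hand the memory-mapped array to dask so that
    # computations are evaluated one chunk at a time instead of all at once.
    ffile = fits.open('TEST.FITS', memmap=True)               # placeholder filename
    arr = da.from_array(ffile[0].data, chunks=(1024, 1024))   # arbitrary chunk size, assumes a 2-D image

    # Reductions are lazy and stream over the chunks:
    print(arr.mean().compute())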

Thanks,

Srikrishna Sekhar

John K. Parejko

Jun 23, 2016, 5:44:54 PM
to astro...@googlegroups.com
This strongly suggests to me that a single FITS file containing all of a day's data is not the right data format for these telescopes to be generating.

John

Thomas Robitaille

Jun 24, 2016, 1:22:13 PM
to astropy-dev mailing list
Hi Sri,

On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:
> Hi,
>
> How would one deal with FITS files which are larger than the physical RAM on
> the machine?
>
> If I open a FITS file with the memmap=True keyword, the memory map tries to
> allocate enough *virtual* memory for the FITS file, but the physical memory
> is just read in chunks. This fails if my FITS file is itself larger than
> memory (or memory + swap, I suppose).

I haven't noticed this behavior before - I have accessed data from
parts of 30+Gb files without having the RAM go over 50Mb. Could you
provide an example of how you are reading and accessing the data?

Cheers,
Tom

>
> I am trying to read in an interferometric visibility dataset (i.e., in the
> random groups format).
>
> I think this is an important question to figure out because several radio
> telescopes have come up recently that generate ~ 1 TB of data a day, which
> falls neatly into the category of 'small enough to fit on disk but too large
> for memory'.
>
> There exist solutions like dask which have figured this out, but I'm not too
> familiar with the dask internals. One possible solution for astropy could be
> to do something similar to xarray which just calls dask under the hood, but
> where the dependency on dask is entirely optional.
>
> Thanks,
>
> Srikrishna Sekhar
>

Srikrishna Sekhar

Jun 25, 2016, 5:00:32 AM
to astro...@googlegroups.com


On Fri, 24 Jun, 2016 at 10:51 PM, Thomas Robitaille <thomas.r...@gmail.com> wrote:
> Hi Sri,
>
> On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:
>> Hi,
>>
>> How would one deal with FITS files which are larger than the physical RAM on the machine?
>>
>> If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).
>
> I haven't noticed this behavior before - I have accessed data from parts of 30+Gb files without having the RAM go over 50Mb. Could you provide an example of how you are reading and accessing the data?

Hi Tom,

Perhaps your RAM + swap is > 30 GB so enough virtual memory can be allocated? Anyway I've provided a code snippet below. I'm intentionally using a really large file (30 GB) and my RAM + swap is 4 GB. I'm running on Arch Linux x64.

        from astropy.io import fits
        ffile = fits.open('TEST.FITS', memmap=True)
        # So far so good
        data = ffile[0].data.data
        # It errors out here

The last line is where it crashes - the error is "OSError: [Errno 12] Cannot allocate memory", with a larger traceback that I can post if it's relevant.
Even if I just want to access a small part of it, like

data = ffile[0].data[0:20].data

it errors out with the OSError.

Thanks for your help!

Thanks,
Krishna

Eric Jeschke

Jun 26, 2016, 5:15:15 AM
to astropy-dev, kitchi.s...@gmail.com
Does the particular FITS file use image scaling?

Specifically, citing this:

Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.
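
If scaling is in play, a quick (untested) check along these lines -- with 'TEST.FITS' as a placeholder for your file -- shows the keywords and opens the file without applying the scaling:

    from astropy.io import fits

    with fits.open('TEST.FITS', memmap=True, do_not_scale_image_data=True) as ffile:
        hdr = ffile[0].header
        print(hdr.get('BSCALE'), hdr.get('BZERO'))   # None if the keywords are absent
        # With do_not_scale_image_data=True the raw, unscaled values are returned,
        # so the whole scaled array should not need to be held in memory.
        data = ffile[0].data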

~Eric

Srikrishna Sekhar

Jun 26, 2016, 7:41:53 AM
to astro...@googlegroups.com
Hi Eric,


On Sun, 26 Jun, 2016 at 2:45 PM, Eric Jeschke <er...@redskiesatnight.com> wrote:
Does the particular FITS file use image scaling?

Specifically, citing this:

Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.

~Eric

BSCALE = 1 and BZERO = 0 in the header file. I tried passing do_not_scale_image_data=True to fits.open() but that still caused the out of memory OSError.
Even if BSCALE is 1 does it try to do the scaling? 

Let me try and manually delete the BSCALE and BZERO keywords from the header and see if that solves the problem.
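
Concretely, I'm thinking of something like this (an untested sketch, with the same placeholder filename as before):

    from astropy.io import fits

    # Open in update mode and drop the scaling keywords if they are present;
    # the change is flushed back to the file when the context manager exits.
    with fits.open('TEST.FITS', mode='update', memmap=True) as ffile:
        hdr = ffile[0].header
        for key in ('BSCALE', 'BZERO'):
            if key in hdr:
                hdr.remove(key)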

Thanks,
Krishna

On Thursday, June 23, 2016 at 7:27:21 AM UTC-10, Sri Krishna wrote:
Hi,

How would one deal with FITS files which are larger than the physical RAM on the machine?

If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).

I am trying to read in an interferometric visibility dataset (i.e., in the random groups format).

I think this is an important question to figure out because several radio telescopes have come up recently that generate ~ 1 TB of data a day, which falls neatly into the category of 'small enough to fit on disk but too large for memory'.

There exist solutions like dask which have figured this out, but I'm not too familiar with the dask internals. One possible solution for astropy could be to do something similar to xarray which just calls dask under the hood, but where the dependency on dask is entirely optional.

Thanks,

Srikrishna Sekhar

Srikrishna Sekhar

Jun 26, 2016, 12:02:56 PM
to astro...@googlegroups.com


On Sun, 26 Jun, 2016 at 5:17 PM, Srikrishna Sekhar <kri...@gmail.com> wrote:
Hi Eric,

On Sun, 26 Jun, 2016 at 2:45 PM, Eric Jeschke <er...@redskiesatnight.com> wrote:
Does the particular FITS file use image scaling?

Specifically, citing this:

Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.

~Eric

BSCALE = 1 and BZERO = 0 in the header file. I tried passing do_not_scale_image_data=True to fits.open() but that still caused the out of memory OSError.
Even if BSCALE is 1 does it try to do the scaling? 

Let me try and manually delete the BSCALE and BZERO keywords from the header and see if that solves the problem.

I removed the BSCALE and BZERO keywords using the .remove() method, and wrote it back to the 30 GB file. I have verified that the updated HDU does not contain either the BSCALE or BZERO keywords. When I try to access the data like

ffile = fits.open('TEST.FITS')
data = ffile[0].data.data

I still get the OSError [Errno 12] Cannot allocate memory.

Mark Taylor

Jun 26, 2016, 4:52:26 PM
to astro...@googlegroups.com
Krishna,

On Sat, 25 Jun 2016, Srikrishna Sekhar wrote:

> Perhaps your RAM + swap is > 30 GB so enough virtual memory can be allocated?
> Anyway I've provided a code snippet below. I'm intentionally using a really
> large file (30 GB) and my RAM + swap is 4 GB. I'm running on Arch Linux x64.
>
> from astropy.io import fits
> ffile = fits.open('TEST.FITS', memmap=True)
> # So far so good
> data = ffile[0].data.data
> # It errors out here
>
> The last line is where it crashes - The error is - "OSError: [Errno 12] Cannot
> allocate memory" with a larger traceback that I can post if it's relevant.

I'm not any kind of python or AstroPy expert so this may well be wide
of the mark, but are you sure you're running a 64-bit python binary
on your 64-bit OS? The error looks similar to what I'd expect if
you had a 32-bit address space. Running something like
"file `which python`" should give you a clue whether that's the case.

Mark

--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.t...@bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/

Thomas Robitaille

Jun 26, 2016, 6:21:57 PM
to astropy-dev mailing list
Hi Krishna,

On 26 June 2016 at 21:52, Mark Taylor <M.B.T...@bristol.ac.uk> wrote:
> Krishna,
>
> On Sat, 25 Jun 2016, Srikrishna Sekhar wrote:
>
>> Perhaps your RAM + swap is > 30 GB so enough virtual memory can be allocated?
>> Anyway I've provided a code snippet below. I'm intentionally using a really
>> large file (30 GB) and my RAM + swap is 4 GB. I'm running on Arch Linux x64.
>>
>> from astropy.io import fits
>> ffile = fits.open('TEST.FITS', memmap=True)
>> # So far so good
>> data = ffile[0].data.data
>> # It errors out here
>>
>> The last line is where it crashes - The error is - "OSError: [Errno 12] Cannot
>> allocate memory" with a larger traceback that I can post if it's relevant.
>
> I'm not any kind of python or AstroPy expert so this may well be wide
> of the mark, but are you sure you're running a 64-bit python binary
> on your 64-bit OS? The error looks similar to what I'd expect if
> you had a 32-bit address space. Running something like
> "file `which python`" should give you a clue whether that's the case.

Another way to find out if you are using 32- or 64-bit Python is to do:

python -c 'import sys; print(sys.maxsize)'

If you get 9223372036854775807, your Python is 64-bit, and if you get
2147483647, it is 32-bit.

Cheers,
Tom

>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.t...@bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
>

Srikrishna Sekhar

Jun 27, 2016, 3:08:32 AM
to astro...@googlegroups.com
Hi Tom,


On Mon, 27 Jun, 2016 at 3:51 AM, Thomas Robitaille <thomas.r...@gmail.com> wrote:
> Hi Krishna,
>
> On 26 June 2016 at 21:52, Mark Taylor <M.B.T...@bristol.ac.uk> wrote:
>> Krishna,
>>
>> On Sat, 25 Jun 2016, Srikrishna Sekhar wrote:
>>> Perhaps your RAM + swap is > 30 GB so enough virtual memory can be allocated? Anyway I've provided a code snippet below. I'm intentionally using a really large file (30 GB) and my RAM + swap is 4 GB. I'm running on Arch Linux x64.
>>>
>>> from astropy.io import fits
>>> ffile = fits.open('TEST.FITS', memmap=True)
>>> # So far so good
>>> data = ffile[0].data.data
>>> # It errors out here
>>>
>>> The last line is where it crashes - the error is "OSError: [Errno 12] Cannot allocate memory", with a larger traceback that I can post if it's relevant.
>>
>> I'm not any kind of python or AstroPy expert so this may well be wide of the mark, but are you sure you're running a 64-bit python binary on your 64-bit OS? The error looks similar to what I'd expect if you had a 32-bit address space. Running something like "file `which python`" should give you a clue whether that's the case.
>
> Another way to find out if you are using 32- or 64-bit Python is to do:
>
> python -c 'import sys; print(sys.maxsize)'
>
> If you get 9223372036854775807, your Python is 64-bit, and if you get 2147483647, it is 32-bit.

Looks like I'm running 64-bit Python. 

> python -c 'import sys; print(sys.maxsize)'
9223372036854775807

I am definitely running a 64-bit OS as well.

From what I've read about memmap, this shouldn't be happening on a 64-bit system, no? Does fits.open() try to scale the image even if BSCALE = 1 and BZERO = 0? I'm very confused about what's happening.

Thanks for all your help!

Thanks,
Krishna 

Ole Streicher

Jun 27, 2016, 3:47:09 AM
to astro...@googlegroups.com
Srikrishna Sekhar <kri...@gmail.com> writes:
> ffile = fits.open('TEST.FITS')
> data = ffile[0].data.data
>
> I still get the OSError [Errno 12] Cannot allocate memory.

Just a wild guess: What does

$ /sbin/sysctl vm.overcommit_memory

give (should be zero)? And could you (as root) execute

# /sbin/sysctl vm.overcommit_memory=1

and then try again? As far as I remember, mmap() counts toward
overcommitment, and with vm.overcommit_memory=0, obvious overcommitments
are rejected.

Best regards

Ole

Srikrishna Sekhar

Jun 27, 2016, 4:16:42 AM
to astro...@googlegroups.com
Looks like this was the problem! Setting vm.overcommit_memory = 1 solved it, thanks!

Would it be useful to add a line about this in the documentation? I just checked the value of vm.overcommit_memory on several people's computers and they're all set to 0, so it looks like this is a fairly common issue.

Are there any known problems with setting it to 1?

Thanks!
Krishna



Demitri Muna

Jun 27, 2016, 8:16:08 PM
to astro...@googlegroups.com, Srikrishna Sekhar
Hi,

On Jun 27, 2016, at 9:22 AM, Srikrishna Sekhar <kri...@gmail.com> wrote:

Looks like this was the problem! Setting vm.overcommit_memory = 1 solved it, thanks!

Would it be useful to add a line about this in the documentation? I just checked the value of vm.overcommit_memory on several people's computers and they're all set to 0, so it looks like this is a fairly common issue.

Are there any known problems with setting it to 1?

While this may have solved the problem for you, and adjusting this system setting is a fine way to keep working for the moment, it is not a long-term solution and definitely not something to put in the documentation as standard practice. See here for details, for example:


Admittedly, the Linux kernel documentation specifically points out scientific data as a use case for the "1" setting:

1	-	Always overcommit. Appropriate for some scientific
		applications. Classic example is code using sparse arrays
		and just relying on the virtual memory consisting almost
		entirely of zero pages.

However, our use case here is not sparse arrays.

The real question to ask is what are you trying to do with this file? Do you *really* need it in memory all at once (even memory mapped)? I doubt it. You are not displaying it as an image - you don't have enough pixels on your screen to do that. Are you calculating some statistics? You can do that by loading pieces of the data at a time. Do you need to extract some subset of the data? Then do that rather than load the whole file.
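
For example, a mean can be accumulated one slab at a time without ever holding the full array -- a rough, untested sketch, assuming a plain 2-D image HDU (the filename and slab size here are arbitrary):

    from astropy.io import fits

    slab_rows = 1024   # arbitrary number of rows to read per pass
    total = 0.0
    npix = 0

    with fits.open('TEST.FITS', memmap=True) as ffile:
        hdu = ffile[0]
        nrows = hdu.header['NAXIS2']
        for start in range(0, nrows, slab_rows):
            slab = hdu.section[start:start + slab_rows]   # reads only this slab from disk
            total += slab.sum(dtype='float64')
            npix += slab.size

    print(total / npix)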

Cheers,
Demitri

_________________________________________
Demitri Muna
http://muna.com

Department of Astronomy
La Ohio State University


Erik Bray

Jun 29, 2016, 11:10:57 AM
to astropy-dev
On Fri, Jun 24, 2016 at 7:21 PM, Thomas Robitaille
<thomas.r...@gmail.com> wrote:
> Hi Sri,
>
> On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:
>> Hi,
>>
>> How would one deal with FITS files which are larger than the physical RAM on
>> the machine?
>>
>> If I open a FITS file with the memmap=True keyword, the memory map tries to
>> allocate enough *virtual* memory for the FITS file, but the physical memory
>> is just read in chunks. This fails if my FITS file is itself larger than
>> memory (or memory + swap, I suppose).
>
> I haven't noticed this behavior before - I have accessed data from
> parts of 30+Gb files without having the RAM go over 50Mb. Could you
> provide an example of how you are reading and accessing the data?

FWIW (and I haven't read the whole thread yet so I might be behind on
this already), there is a known issue--there's a ticket for it on
GitHub but I forget where--that on some systems, particularly those
with overcommit disabled, trying to mmap a large file will still fail
if you don't have enough physical RAM, because the system needs to be
able to guarantee enough memory for copy-on-write of the entire
array.

A workaround that's been known to work in the past is opening with
`mode='denywrite'`, which explicitly disables copy-on-write. If you can
find the ticket there are more details there, but I'm on my phone right
now.
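
In other words, something like this (untested from memory; the filename is a placeholder):

    from astropy.io import fits

    # 'denywrite' maps the file read-only (no copy-on-write), so the kernel
    # should not need to reserve memory for the whole array even when
    # overcommit is disabled.
    ffile = fits.open('TEST.FITS', memmap=True, mode='denywrite')
    data = ffile[0].data   # still memory-mapped; pages are read on access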

Srikrishna Sekhar

Jun 29, 2016, 11:46:59 AM
to astro...@googlegroups.com


On Wed, 29 Jun, 2016 at 8:40 PM, Erik Bray <erik....@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 7:21 PM, Thomas Robitaille <thomas.r...@gmail.com> wrote:
>> Hi Sri,
>>
>> On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:
>>> Hi,
>>>
>>> How would one deal with FITS files which are larger than the physical RAM on the machine?
>>>
>>> If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).
>>
>> I haven't noticed this behavior before - I have accessed data from parts of 30+Gb files without having the RAM go over 50Mb. Could you provide an example of how you are reading and accessing the data?
>
> FWIW (and I haven't read the whole thread yet so I might be behind on this already), there is a known issue--there's a ticket for it on GitHub but I forget where--that on some systems, particularly those with overcommit disabled, trying to mmap a large file will still fail if you don't have enough physical RAM, because the system needs to be able to guarantee enough memory for copy-on-write of the entire array.
>
> A workaround that's been known to work in the past is opening with `mode='denywrite'`, which explicitly disables copy-on-write. If you can find the ticket there are more details there, but I'm on my phone right now.

Looks like this is the issue. Either way, Ole Streicher's suggestion of setting vm.overcommit_memory=1 via sysctl resolved my problem.

However, is it possible to open the FITS file more lazily? By which I mean neither allocate virtual memory nor load the entire FITS file into memory, but read the file on demand as the array is sliced (like dask arrays, for example).

Thanks,
Krishna

Perry Greenfield

Jun 29, 2016, 11:54:28 AM
to astro...@googlegroups.com
For arrays (i.e., image HDUs) one can use .section to return only part of the data without mapping the whole image into memory. I don't believe that is available for tables, though, nor for compressed images (for the latter, it would most likely involve retrieving individual tiles). It could of course be added for tables. Erik can correct me if I'm wrong.
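
For example (a quick sketch -- the filename and slice are arbitrary):

    from astropy.io import fits

    with fits.open('TEST.FITS', memmap=True) as ffile:
        # Returns just this cutout as an ordinary ndarray, read from disk on
        # demand, without touching the rest of the image.
        cutout = ffile[0].section[0:128, 0:128]
        print(cutout.shape, cutout.dtype)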

Perry

Erik Bray

Jul 1, 2016, 9:54:47 AM
to astropy-dev
On Wed, Jun 29, 2016 at 5:54 PM, Perry Greenfield
<perrygr...@gmail.com> wrote:
> For arrays (i.e., Image HDUs) one can use the .section() to only return part
> of the data without requiring mapping the whole image into memory. I don't
> believe that is available for tables though, nor compressed images (for the
> latter, it most likely would involve retrieving individual tiles). It could
> be added of course for tables. Erik can correct me if I'm wrong.

That sounds about right. And it's long been an open issue for
compressed images (the fact that they can't be loaded by individual
tiles kind of defeats the purpose, but no one has bothered....)