Hi Sri,

On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:

> Hi,
>
> How would one deal with FITS files which are larger than the physical RAM on the machine? If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).

I haven't noticed this behavior before - I have accessed data from parts of 30+ GB files without the RAM usage going over 50 MB. Could you provide an example of how you are reading and accessing the data?
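For what it's worth, a minimal sketch of the access pattern Thomas describes (the filename and slice here are placeholders):

    from astropy.io import fits

    # With memmap=True the data are mapped into virtual memory,
    # not read into RAM up front.
    with fits.open('big_image.fits', memmap=True) as hdul:
        # Only the pages touched by this slice are read from disk,
        # so physical memory stays small even for a 30+ GB file.
        chunk = hdul[0].data[1000:2000, :]
        print(chunk.mean())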
Does the particular FITS file use image scaling?
Specifically, citing this:

> Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.

~Eric
On Thursday, June 23, 2016 at 7:27:21 AM UTC-10, Sri Krishna wrote:

> Hi,
>
> How would one deal with FITS files which are larger than the physical RAM on the machine? I am trying to read in an interferometric visibility dataset (i.e., in the random groups format).
>
> If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).
>
> I think this is an important question to figure out because several radio telescopes have come up recently that generate ~1 TB of data a day, which falls neatly into the category of 'small enough to fit on disk but too large for memory'.
>
> There exist solutions like dask which have figured this out, but I'm not too familiar with the dask internals. One possible solution for astropy could be to do something similar to xarray, which just calls dask under the hood, but where the dependency on dask is entirely optional.
>
> Thanks,
> Srikrishna Sekhar
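To illustrate the dask route Srikrishna mentions, here is a minimal sketch that wraps a memmapped FITS array in a dask array so reductions run one chunk at a time (the filename and chunk shape are placeholders, and this assumes dask is installed):

    import dask.array as da
    from astropy.io import fits

    hdul = fits.open('big_image.fits', memmap=True)
    # Wrap the memmap-backed ndarray; dask pulls one chunk at a time.
    arr = da.from_array(hdul[0].data, chunks=(1024, 1024))
    # The reduction runs out-of-core; the full array is never in RAM.
    print(arr.mean().compute())
    hdul.close()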
Hi Eric,
On Sun, 26 Jun, 2016 at 2:45 PM, Eric Jeschke <er...@redskiesatnight.com> wrote:
> Does the particular FITS file use image scaling?
>
> Specifically, citing this:
>
> > Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.
>
> ~Eric

BSCALE = 1 and BZERO = 0 in the header file. I tried passing do_not_scale_image_data=True to fits.open(), but that still caused the out-of-memory OSError. Even if BSCALE is 1, does it try to do the scaling?

Let me try and manually delete the BSCALE and BZERO keywords from the header and see if that solves the problem.
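A sketch of that experiment (the filename is a placeholder, and this edits the header in memory only, on the assumption that removing the keywords keeps io.fits off the scaling code path):

    from astropy.io import fits

    with fits.open('big_uv.fits', memmap=True,
                   do_not_scale_image_data=True) as hdul:
        hdr = hdul[0].header
        # Drop the (no-op) scaling keywords so no scaled copy is attempted.
        for key in ('BSCALE', 'BZERO'):
            hdr.remove(key, ignore_missing=True)
        data = hdul[0].data  # still a memmap-backed array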
Hi Krishna,

On 26 June 2016 at 21:52, Mark Taylor <M.B.T...@bristol.ac.uk> wrote:

> Krishna,
>
> On Sat, 25 Jun 2016, Srikrishna Sekhar wrote:
>
> > Perhaps your RAM + swap is > 30 GB so enough virtual memory can be allocated? Anyway, I've provided a code snippet below. I'm intentionally using a really large file (30 GB) and my RAM + swap is 4 GB. I'm running on Arch Linux x64.
> >
> > from astropy.io import fits
> > ffile = fits.open('TEST.FITS', memmap=True)  # So far so good
> > data = ffile[0].data.data  # It errors out here
> >
> > The last line is where it crashes - the error is "OSError: [Errno 12] Cannot allocate memory", with a larger traceback that I can post if it's relevant.
>
> I'm not any kind of python or AstroPy expert so this may well be wide of the mark, but are you sure you're running a 64-bit python binary on your 64-bit OS? The error looks similar to what I'd expect if you had a 32-bit address space. Running something like "file `which python`" should give you a clue whether that's the case.

Another way to find out if you are using 32- or 64-bit Python is to do:

    python -c 'import sys; print(sys.maxsize)'

If you get 9223372036854775807, your Python is 64-bit, and if you get 2147483647, it is 32-bit.
Best regards,
Ole
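For reference, a slightly fuller version of Ole's check from inside Python (purely illustrative):

    import platform
    import sys

    # 64-bit CPython reports sys.maxsize == 2**63 - 1;
    # 32-bit reports 2**31 - 1.
    print(sys.maxsize > 2**32)         # True on a 64-bit interpreter
    print(platform.architecture()[0])  # e.g. '64bit'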
Looks like this was the problem! Setting vm.overcommit_memory = 1 solved it, thanks!

Would it be useful to add a line about this in the documentation? I just checked the value of vm.overcommit_memory on several people's computers and they're all set to 0, so it looks like this is a fairly common issue.

Are there any known problems with setting it to 1?
Quoting the Linux kernel documentation on overcommit accounting:

> 1 - Always overcommit. Appropriate for some scientific applications. Classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.
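For anyone who wants to check their own machine, a small sketch that reads the current policy on Linux (equivalent to running sysctl vm.overcommit_memory; changing it persistently is done outside Python and needs root):

    # 0 = heuristic overcommit, 1 = always overcommit, 2 = never overcommit.
    with open('/proc/sys/vm/overcommit_memory') as f:
        print(f.read().strip())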
On Fri, Jun 24, 2016 at 7:21 PM, Thomas Robitaille <thomas.r...@gmail.com> wrote:

> Hi Sri,
>
> On 23 June 2016 at 18:27, Sri Krishna <kitchi.s...@gmail.com> wrote:
>
> > Hi,
> >
> > How would one deal with FITS files which are larger than the physical RAM on the machine? If I open a FITS file with the memmap=True keyword, the memory map tries to allocate enough *virtual* memory for the FITS file, but the physical memory is just read in chunks. This fails if my FITS file is itself larger than memory (or memory + swap, I suppose).
>
> I haven't noticed this behavior before - I have accessed data from parts of 30+ GB files without the RAM usage going over 50 MB. Could you provide an example of how you are reading and accessing the data?

FWIW (and I haven't read the whole thread yet, so I might be behind on this already), there is a known issue--there's a ticket for it on GitHub but I forget where--that on some systems, particularly those with overcommit disabled, trying to mmap a large file will still fail if you don't have enough physical RAM, because the system needs to be able to guarantee enough memory for copy-on-write of the entire array. A workaround that's been known to work in the past is using `mode='readonly'`, which explicitly disables copy-on-write. If you can find the ticket there's more details there, but I'm on my phone right now.
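A sketch of that workaround as described above (the filename is a placeholder; whether it helps will depend on the astropy version and the system's overcommit settings):

    from astropy.io import fits

    # Open read-only; per the message above, this avoids the
    # copy-on-write reservation that trips the overcommit accounting.
    hdul = fits.open('TEST.FITS', memmap=True, mode='readonly')
    data = hdul[0].data  # memmap-backed, paged in on access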