Creating a dask-powered DataArray from memory mapped file


Srikrishna Sekhar

Jun 18, 2016, 2:17:30 PM
to xarray
Hi,

I've looked through the documentation explaining how to create a DataArray that is accelerated by Dask. But my situation is slightly different and I was wondering if anyone has a solution/work-around.

I'm reading in an n-dimensional memory-mapped array using the astropy library (it is a FITS file, if that is important). Is it possible to pass this memory-mapped structure into xarray and tell it to treat the data in chunks using dask?

Converting the FITS file to HDF5 or an equivalent format is infeasible, for reasons I can explain but that aren't strictly relevant here.

Thanks,
Krishna

Ryan Abernathey

Jun 20, 2016, 2:53:34 PM
to xar...@googlegroups.com
Hi Krishna,

I have a similar situation: I want to use xarray to analyze output from an ocean model which uses a custom binary format. Outputting to / converting to netCDF is not an option.

I tried to implement something like this by subclassing xarray.backends.common.AbstractDataStore and creating something I called a memmap array wrapper.

The code is messy (currently trying to refactor), but it basically works.

However, I'm not convinced memmap is the best way to handle this. One problem is that memmap allocates memory for all the data even if it doesn't read the file. That means this approach can't be used for out-of-core-sized datasets.

I would love to hear some suggestions for a better approach.

Cheers,
Ryan


--
You received this message because you are subscribed to the Google Groups "xarray" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xarray+un...@googlegroups.com.
To post to this group, send email to xar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xarray/255696a1-bd55-4be5-81e3-bf8c65ec08de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephan Hoyer

Jun 20, 2016, 3:52:54 PM
to xar...@googlegroups.com
On Sat, Jun 18, 2016 at 11:17 AM, Srikrishna Sekhar <kri...@gmail.com> wrote:
I'm reading in an n-dimensional memory mapped array using the astropy library (it is a FITS file, if that is important). Is it possible to pass this memory mapped structure into xarray and tell it to recognize the data in chunks using dask?

I think dask's da.from_array function should work perfectly for this. Take a look at these docs for guidance from the dask side:
http://dask.pydata.org/en/latest/array-creation.html

You can pass dask arrays into xarray data structures in the exact same way as numpy arrays.
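
As a rough illustration of the approach described above, here is a minimal sketch. It assumes a raw binary file opened with np.memmap as a stand-in for the FITS data (with astropy, the memory-mapped array would come from the HDU instead); the file name, array shape, chunk sizes, and dimension names are all made up for the example.

```python
import os
import tempfile

import dask.array as da
import numpy as np
import xarray as xr

# Stand-in for the FITS data: a small raw binary file opened with np.memmap.
# The shape, dtype, and file name are hypothetical.
shape = (4, 100, 100)
path = os.path.join(tempfile.mkdtemp(), "cube.dat")
np.arange(np.prod(shape), dtype="float32").reshape(shape).tofile(path)
mm = np.memmap(path, dtype="float32", mode="r", shape=shape)

# Wrap the memmap in a dask array, one chunk per plane along the first axis.
darr = da.from_array(mm, chunks=(1, 100, 100))

# Pass the dask array to xarray just like a numpy array.
# The dimension names are hypothetical.
cube = xr.DataArray(darr, dims=("freq", "y", "x"), name="cube")

# The chunking defined on the dask side is visible from xarray.
print(cube.chunks)

# Operations stay lazy until .compute() (or .load()) is called.
mean_spectrum = cube.mean(dim=("y", "x")).compute()
print(mean_spectrum.shape)
```

In this sketch the DataArray reports the same chunks that were given to da.from_array, and the reduction is evaluated chunk by chunk only when .compute() is called.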

Srikrishna Sekhar

Jun 21, 2016, 2:55:47 AM
to xarray

Thanks! I'll give that a shot. If I do this, I assume xarray will understand the chunks I define in the dask arrays?

@Ryan - Doesn't memmapping only allocate memory when I slice into the memmap or access its elements? I ran a few tests using astropy's memmap functionality and that's what I observed, but I could have been doing them wrong.

Thanks,
Krishna

Ryan Abernathey

Jun 21, 2016, 10:07:26 AM
to xar...@googlegroups.com
I think it allocates virtual memory for the entire file but only resident memory for the data you actually read.
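
A small sketch of that distinction, assuming a plain np.memmap over a hypothetical raw file (it does not measure memory use; it only shows which objects are still file-backed and which are real in-memory copies):

```python
import os
import tempfile

import numpy as np

# Hypothetical raw file with 8 planes of 100 x 100 float32 values.
shape = (8, 100, 100)
path = os.path.join(tempfile.mkdtemp(), "big.dat")
np.zeros(shape, dtype="float32").tofile(path)

# Mapping the file covers its full size in the address space,
# but no plane is copied into an ordinary array yet.
mm = np.memmap(path, dtype="float32", mode="r", shape=shape)

# Slicing gives a view that is still backed by the mapped file.
view = mm[0]

# np.array() makes an explicit in-memory copy of just this one plane.
plane = np.array(view)

print(mm.nbytes, plane.nbytes)
```

Here mm.nbytes is the size of the whole mapping, while plane.nbytes is the size of the single plane that was actually copied into memory.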

I don't understand all the system-level details, but I am encountering memory errors with memmapped files more frequently than I expected. This is currently a major bottleneck in my workflow.
 
