Dataset.to_netcdf() problem when engine='scipy'

Seth P

Nov 29, 2017, 1:53:22 PM
to xarray
Saving a Dataset using ds.to_netcdf() works fine if I specify engine='h5netcdf', but crashes if I don't specify engine (in which case it uses scipy). Any idea what's causing this error? I don't know if it's an xarray or scipy issue. I include below the traceback as well as the output of xr.show_versions().

<xarray.Dataset>
Dimensions:                     (foo: 90, foo1: 90, item: 8050, model_date: 5350)
Coordinates:
  * model_date                  (model_date) datetime64[ns] 1990-01-01 ...
  * item                        (item) <U7 'ITM0001' 'ITM0002' ...
  * foo                         (foo) <U16 'FooDescriptionA' ...
  * foo1                        (foo1) <U16 'FooDescriptionA' ...
Data variables:
    my_data_array               (model_date, item, foo) float64 0.626 ...
Traceback (most recent call last):
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\backends\api.py", line 618, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\core\dataset.py", line 1071, in dump_to_store
    store.sync()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\backends\scipy_.py", line 209, in sync
    self.ds.flush()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 391, in flush
    self._write()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 403, in _write
    self._write_var_array()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 451, in _write_var_array
    self._write_var_metadata(name)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 490, in _write_var_metadata
    self._pack_int(vsize)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 760, in _pack_int
    self.fp.write(array(value, '>i').tostring())
OverflowError: Python int too large to convert to C long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "my_script.py", line 324, in <module>
    ds.to_netcdf(**parse_to_netcdf_arguments(args_vars))
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\core\dataset.py", line 1132, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\backends\api.py", line 623, in to_netcdf
    store.close()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\xarray\backends\scipy_.py", line 212, in close
    self.ds.close()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 281, in close
    self.flush()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 391, in flush
    self._write()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 403, in _write
    self._write_var_array()
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 451, in _write_var_array
    self._write_var_metadata(name)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 490, in _write_var_metadata
    self._pack_int(vsize)
  File "C:\Users\Seth\Anaconda3\lib\site-packages\scipy\io\netcdf.py", line 760, in _pack_int
    self.fp.write(array(value, '>i').tostring())
OverflowError: Python int too large to convert to C long


In [1]: import xarray as xr

In [2]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: None
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.1
setuptools: 37.0.0
pip: 9.0.1
conda: 4.3.29
pytest: 3.3.0
IPython: 6.2.1

sphinx: 1.6.5

Stephan Hoyer

Nov 30, 2017, 12:47:43 AM
to xar...@googlegroups.com
I think this Dataset is too big for SciPy's netCDF3 file format, which has a 32-bit (2 GB) size limit for arrays. Unfortunately, this is a hard limit for netCDF3, but SciPy should certainly have a better error message.
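
For a rough sense of the numbers, here is a minimal sketch of the arithmetic, using the dimension sizes from the Dataset repr above. The helper name `variable_nbytes` is made up for illustration, and `struct.pack` is only a stand-in for the `array(value, '>i')` call in the traceback, not what SciPy actually runs.

```python
import struct
from math import prod

# Largest value that fits in a signed 32-bit integer (2**31 - 1).
INT32_MAX = 2**31 - 1


def variable_nbytes(shape, itemsize):
    """Bytes needed for a dense array of the given shape and element size.

    Hypothetical helper for illustration; not part of xarray or SciPy.
    """
    return prod(shape) * itemsize


# my_data_array has dims (model_date, item, foo) and dtype float64 (8 bytes).
nbytes = variable_nbytes((5350, 8050, 90), 8)
print(nbytes)              # 31008600000
print(nbytes > INT32_MAX)  # True

# Packing that size as a big-endian signed 32-bit integer fails,
# which is analogous to the overflow raised inside _pack_int.
try:
    struct.pack(">i", nbytes)
except struct.error as exc:
    print("cannot pack:", exc)
```

So the single data variable is about 31 GB, well over what the 32-bit size field can hold.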

Separately, someone has been working on how the default netCDF backend is chosen, so we should hopefully be able to make this work by default in the future:

Seth P

Nov 30, 2017, 12:08:32 PM
to xarray
That makes sense. Thanks. Yes, the error message could certainly be clearer.