ERDDAP™ version v2.25_1 is now available!

chris.john

Nov 13, 2024, 10:23:53 AM
to ERDDAP-Announce

chris.john

Nov 13, 2024, 4:07:16 PM
to ERDDAP-Announce
Roy Mendelssohn asked me to share this with the group:

To add on to Chris's announcement:

1.  Please do update to this version if at all possible.  It is not just for the new features; it also makes sure that all libraries being used are at their latest patch levels.  Try to keep your Tomcat, Apache/nginx, and Java as up to date as possible as well.  ERDDAP v2.25_1 has passed several security scans running on our server.

2.  Chris mentioned the Zarr read support.  I know the intent over the next few releases is to try to improve object-store support in general.  If there are things that would help, raise them on the GitHub page.

3.  I have been playing with the parquet support.  What follows is not exhaustive, so YMMV.  In one example, a 320MB csv file was reduced to 17.3MB.  Usually when you compress files there is a tradeoff between file size and access speed, but in my tests the parquet read was faster.  There are also two options for parquet downloads, ".parquet" and ".parquetWMeta".  The latter includes file metadata.  The base parquet format only allows for one non-data column, which is used for the column names, so the metadata are put as key-value pairs in the metadata section.  Be aware that not all parquet readers (and writers) access the metadata section.  Parquet table downloads, like the example above, are also often up to 90% smaller than the same csv file.  So if the file being served is parquet and the download is parquet, the result is much faster access and download.
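As a rough illustration of the two download options (a sketch, not part of Roy's tests), the helper below builds a tabledap request URL ending in either ".parquet" or ".parquetWMeta". The server address, dataset ID, and variable names are hypothetical placeholders, and the function name `tabledap_url` is made up for this example; only the two file-type suffixes come from the message above.

```python
from urllib.parse import quote

# The two parquet file types mentioned in the announcement.
PARQUET_TYPES = (".parquet", ".parquetWMeta")


def tabledap_url(base_url, dataset_id, variables, constraints=(), file_type=".parquet"):
    """Build a tabledap download URL for a parquet response.

    base_url, dataset_id, and variables are placeholders supplied by the
    caller; nothing here contacts a server.
    """
    if file_type not in PARQUET_TYPES:
        raise ValueError(f"file_type must be one of {PARQUET_TYPES}")
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as '>' and '<' in each constraint.
        query += "&" + quote(constraint, safe="=.-_:,")
    return f"{base_url.rstrip('/')}/tabledap/{dataset_id}{file_type}?{query}"


if __name__ == "__main__":
    # Hypothetical server and dataset, for illustration only.
    print(tabledap_url(
        "https://example.org/erddap",
        "myDataset",
        ["time", "sst"],
        ["time>=2024-01-01T00:00:00Z"],
        file_type=".parquetWMeta",
    ))
```

Switching `file_type` between the two values is the only change needed to request the version with or without the metadata section.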

If you write parquet files, missing values should be written as "null".  The writers I tested all did this automatically if the missing value is denoted by the standard for that language.  For example, in R, arrow::write_parquet() will write NA as null, and will read the null back as NA.  For R users, in the next few weeks I will be adding support for parquet downloads to rerddap::tabledap()  (it is already implemented, just not tested a lot, and I am old and lazy).

You can also download grids into parquet files.  This can be convenient when the desired end result is to have the data in what the tidyverse calls "long-form", which is needed to use any of the tidyverse functions (programs such as rerddap::griddap() already do this for you, but it is convenient in other settings).  The read time for parquet is slower than reading a netcdf file, probably because to create the parquet file the data must be "melted"  (a tidyverse term) to long-form.  However, while for smaller extracts (say 5MB-10MB) the file sizes are similar, as the extracts get larger the parquet files become smaller than the netcdf files.  For example, for one large MUR extract, the netcdf file was 116.7MB while the parquet file was 92.9MB.  So while the parquet read time was slower (time to start the download), the overall times were comparable.
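To make "long-form" concrete, here is a minimal Python sketch of melting a small gridded array into one row per (time, latitude, longitude) cell.  It only illustrates the shape of the result; it is not how ERDDAP or rerddap do it internally, and the function name `melt_grid` and the sample values are invented for the example.

```python
from itertools import product


def melt_grid(times, lats, lons, values):
    """Flatten values[t][y][x] into long-form rows, one row per grid cell."""
    rows = []
    for (ti, t), (yi, lat), (xi, lon) in product(
        enumerate(times), enumerate(lats), enumerate(lons)
    ):
        rows.append(
            {"time": t, "latitude": lat, "longitude": lon, "value": values[ti][yi][xi]}
        )
    return rows


if __name__ == "__main__":
    # Made-up 1 x 2 x 2 grid; None stands in for a missing value (null).
    grid = [[[1.0, 2.0], [3.0, None]]]
    for row in melt_grid(["2024-11-01"], [10.0, 10.5], [-120.0, -119.5], grid):
        print(row)
```

Each grid cell repeats its coordinate values in every row, which is why the melt step takes extra work compared with reading the compact netcdf layout directly.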

4.  If you have an ERDDAP that has been having memory issues, do consider updating to this version.  Our testing suggests this version is more stable.  But with the new Java memory model, heap size isn't usually the limiting factor if set appropriately; overall memory size is.  So if you have lots of files with heavy use, you need a lot of memory, and you must have your settings such that ERDDAP never swaps.  Also consider changing the garbage collector you are using.  Right now we are using "-XX:+UseZGC -XX:+ZGenerational", and you might consider further tuning by setting "-XX:SurvivorRatio" to some higher value than the default.
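One way to keep an eye on the "never swaps" point is to look at how much swap the host is using.  The sketch below assumes a Linux server where /proc/meminfo is available, and the function name `swap_used_kb` is made up for the example.  A nonzero result only shows that something on the machine has been swapped out; it does not prove that ERDDAP itself is the process affected.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB) from text in the format of Linux /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


if __name__ == "__main__":
    # Assumes a Linux host; /proc/meminfo does not exist on other systems.
    with open("/proc/meminfo") as f:
        print(f"swap in use: {swap_used_kb(f.read())} kB")
```

Running this periodically (for example from cron) and watching for growth is a simple check alongside whatever JVM and garbage-collector settings you choose.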

Most of all, thank you to everyone who has helped in the development of ERDDAP, whether by giving suggestions, reporting bugs, or supplying code (which is really appreciated).  Thanks to Chris for his work on this, and thanks to all of you for using and supporting ERDDAP.

-Roy