DARS - fast and light DAP/2 server written in rust


Gaute Hope

Sep 10, 2020, 12:46:06 PM
to OPeNDAP Tech
Hi,

I have been experimenting with a DAP/2 server written in rust: DARS (https://github.com/gauteh/dars, or https://hub.docker.com/r/gauteh/dars for docker). 

The focus is on being lightweight and asynchronous. It is not finished, but it is possible to test it out! The plan is to only support the DAP/* protocols (currently only v2, since client support for newer versions appears to be spotty at the moment), so it will only cover a fraction of the features that Hyrax and THREDDS offer.

It has basic support for HDF5, NetCDF4 (through HDF5) and NcML. Since the HDF5 library is not thread-safe at all, a concurrent (and highly experimental) HDF5 reader was written, inspired by the DMR++ module in Hyrax (thanks for the advice and previous work!). It performs on par with, or better than, the official HDF5 library for sequential reads, and far better for concurrent reads: https://github.com/gauteh/hidefix
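The core idea can be sketched roughly like this (a hypothetical Python toy, not the actual hidefix code): index the byte offset and size of every compressed chunk once up front, then serve each read with an independent open/seek/read/decompress that touches no shared library state, so reads can run concurrently without locks:

```python
import concurrent.futures
import os
import tempfile
import zlib

def build_file(chunks):
    """Write gzip-compressed chunks back to back; return (path, index).

    The index records (offset, size) per chunk -- the one-time step
    that a chunk-index reader performs with the HDF5 library."""
    fd, path = tempfile.mkstemp()
    index = []
    with os.fdopen(fd, "wb") as f:
        for data in chunks:
            comp = zlib.compress(data)
            index.append((f.tell(), len(comp)))
            f.write(comp)
    return path, index

def read_chunk(path, entry):
    """Independent read: open, seek, read, decompress -- no shared state."""
    offset, size = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.decompress(f.read(size))

chunks = [b"a" * 100, b"b" * 200, b"c" * 300]
path, index = build_file(chunks)
# Reads over the index can now proceed in parallel threads.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    out = list(pool.map(lambda e: read_chunk(path, e), index))
assert out == chunks
os.remove(path)
```

The real reader additionally has to handle the HDF5 chunk B-tree, shuffle filters and datatype conversion, but the offset-index-plus-raw-reads split is the part that sidesteps the library's global lock.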

Benchmarking is difficult, or even meaningless, but I have tried to compare the performance (requests per second) of dars, THREDDS, and Hyrax for a couple of cases. Note that it would be better to look at latency percentile histograms, but those are difficult to compare between servers that perform differently. The tests make as many requests as possible using 10 concurrent connections, for:

  * metadata (DAS & DDS)
  * data: a small request (40 kB) slicing through a large dataset (464 MB)
  * data: a large request for an entire large dataset (464 MB)
  * data: a small request for an entire small dataset (740 kB)

The tests are run against the _default_ docker images for the servers. The large dataset (MEPS) is chunked, compressed (gzip), and shuffled, while the small dataset (coads_climatology) is chunked but uncompressed.
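For reference, the request types above map to DAP2 URLs along these lines (the host, dataset path, and variable name here are made up for illustration; only the .das/.dds/.dods suffixes and the bracketed hyperslab constraints are DAP2):

```python
# Hypothetical URLs illustrating the benchmarked request types.
base = "http://localhost:8001/data/meps.nc"

cases = {
    "metadata (DAS)": base + ".das",
    "metadata (DDS)": base + ".dds",
    # small request slicing through the large dataset:
    # one [start:stride:stop] constraint per dimension
    "small slice": base + ".dods?air_temperature[0:1:0][0:1:99][0:1:99]",
    # large request: the entire dataset, no constraint expression
    "entire dataset": base + ".dods",
}

for name, url in cases.items():
    print(name, "->", url)
```

A load generator then hammers these URLs over a fixed number of concurrent connections and counts completed responses per second.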


Note that I encountered frequent out-of-memory errors with THREDDS, especially when benchmarking against the large dataset.

Hope this might be of interest!

Best regards, Gaute

James Gallagher

Sep 10, 2020, 12:54:44 PM
to OPeNDAP Tech, Gaute Hope
This is very interesting work! Thanks very much for posting. Have you tried testing with clients like Python's xarray, Panoply, or Matlab?

Thanks,
James

Gaute Hope

Sep 10, 2020, 1:11:25 PM
to James Gallagher, OPeNDAP Tech
I've tested with netCDF-based clients, like Python netCDF4, ncdump and nco. I should definitely try more clients! Some missing features can cause problems; for example, a missing fillvalue attribute on Float32 variables messes up min and max values... Easy to add, but not done yet.
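For clients to mask fill values correctly, the DAS response would need to carry something like the following (the variable name and fill value here are made up for illustration):

```
Attributes {
    air_temperature {
        Float32 _FillValue 9.96921e+36;
    }
}
```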

Gaute

Gaute Hope

Sep 10, 2020, 3:02:27 PM
to OPeNDAP Tech, Gaute Hope, James Gallagher
As far as I can see, both Panoply and xarray use netCDF to get to DAP. Xarray uses Python-netCDF4, which I have integration tests for.

By the way, I have a temporary demo server running here:


But it will be shut down unless the cost stays negligible. It has the lowest resource settings possible on Google Compute Engine, so it won't be fast.

Regards, Gaute

Gaute Hope

Sep 14, 2020, 1:32:12 PM
to James Gallagher, Tech OPeNDAP


On Mon, Sep 14, 2020 at 5:51 PM James Gallagher <jgall...@opendap.org> wrote:


I did test Panoply with the server and could not get it to work. It might be something to do with the CE processing, but I don’t really know.

Ok, thanks for testing. I think ncview triggers the same issue. I believe it might be related to some missing attributes (specifically _FillValue for Float32s). I will make sure these tools work in the future.

-- gaute

Beto Dealmeida

Sep 15, 2020, 3:05:33 PM
to Gaute Hope, James Gallagher, Tech OPeNDAP
Not sure if this is still relevant or useful, but back in 2012 I wrote a blog post about making a DAP server work with as many clients as possible:

https://robertodealmeida.posthaven.com/making-sure-your-dataset-can-be-accessed-thro

Best,
--Beto

James Gallagher

Sep 17, 2020, 7:04:15 PM
to Gaute Hope, James Gallagher, Tech OPeNDAP
I did test Panoply with the server and could not get it to work. It might be something to do with the CE processing, but I don’t really know.

James


--
James Gallagher
jgall...@opendap.org

