Hello OPeNDAP!
I have been tasked with converting some data to DMR++ and using VirtualiZarr to host some type of Zarr API endpoint. I believe Hyrax is the right tool for this, but I think I'm running up against how new these capabilities are, at least as far as the documentation is concerned.
I am currently running the Hyrax container on an AWS EC2 instance, with netCDF data in S3. I used the ingest_s3bucket tool to create dmrpp files on the EC2 filesystem, which is then mounted as a volume in the container using the docker run parameter: --volume ~/tmp/data:/usr/share/hyrax
Is there any way to have the dmrpp files exist only on S3?
Although the dmrpp files are orders of magnitude smaller than the netCDF data, I'm seeing file sizes of ~60 KB per dmrpp, which adds up to about 20 TB. That is simply too much data for the EC2 instance's local storage, and really expensive for EFS (Elastic File System). Also, if we try to use s3fs to mount the S3 bucket in the container, we would see a performance impact from traversing such large buckets during catalog building/loading. So this becomes a real architectural problem, and I hope there is a workaround that allows DMR++ files hosted in object storage to be used by the Hyrax BES.
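For a rough sense of scale, here is a minimal sketch of the file count those two figures imply. It assumes decimal units (1 KB = 1000 bytes, 1 TB = 1000^4 bytes) and treats ~60 KB as an average size; the function name is just illustrative.

```python
# Rough scale estimate from the figures above (decimal units assumed).
DMRPP_SIZE_BYTES = 60 * 1000   # ~60 KB per dmrpp file (treated as an average)
TOTAL_BYTES = 20 * 1000**4     # ~20 TB of dmrpp files in total


def implied_file_count(total_bytes: int, per_file_bytes: int) -> int:
    """Number of files implied by a total size and an average file size."""
    return total_bytes // per_file_bytes


count = implied_file_count(TOTAL_BYTES, DMRPP_SIZE_BYTES)
print(f"~{count / 1e6:.0f} million dmrpp files")
```

That works out to roughly 333 million objects, which is why walking the bucket through an s3fs mount during catalog building looks like a non-starter to me.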
Does Hyrax have a Zarr API endpoint?
Thanks,
Myles McManus, P.E.
Data Scientist
Contractor - Team Alpha Omega for NCEI
NOAA's National Centers for Environmental Information (NCEI)
NCEI Data Stewardship Division
151 Patton Avenue, Asheville, NC 28801-5001 (E/NE5)
Email: myles,b,mcm...@noaa.gov | Voice/Text: (828) 419-1569