Advanced Compression module for OpenDAP


Robert Bouillon

Sep 26, 2023, 10:54:50 AM
to sup...@opendap.org, pcorn...@me.com

Hello,


It was suggested by Joe Loberti at a recent local networking event that I reach out to Peter Cornillon about OPeNDAP.


I’ve developed a novel compression algorithm that should significantly improve both the compression ratio and decompression performance (in some cases allowing compressed data to be accessed directly, without decompressing it first).


I’m looking for potential applications of this technology and was wondering whether it could benefit the system you’ve created. The technology is comparable to Parquet, ORC, and Avro.


To be clear, I’m not looking for money. I’m in the early stages and am looking for applications of this technology, large datasets against which I can test the compression algorithm, and potential use cases that would benefit from the ability to work with the data in its compressed state.


Thanks!

--

Robert Bouillon

James Gallagher

Sep 27, 2023, 6:23:39 PM
to sup...@opendap.org, Robert Bouillon, pcorn...@me.com
Robert,

It sounds very interesting. Can you describe some of the applications where you think it would be a good fit? That is, is the technique limited to certain kinds, or organizations, of data?

Thanks,
James

James Gallagher

Peter Cornillon

Sep 28, 2023, 5:22:24 PM
to Robert Bouillon, Peter Cornillon, sup...@opendap.org
Hi Robert,

Joe is correct, I have very large archives of satellite data, and storage space is a big issue for me. Right now I’m struggling with an approximately 50 TB dataset, and I want to acquire 3 to 4 times more similar data. To that end, I’ve just ordered a 250 TB RAID array.

Most of the data are in netCDF, which uses chunking to allow extraction from a file without decompressing the entire file. If your method could be used in place of the compression netCDF uses, and it proved to be as fast or faster while compressing more, I, and a lot of other people, would be very interested. A good starting point would be to rewrite one of my netCDF files without chunking, compress the resulting file with your approach, and compare its size to that of the chunked file you started with. If it is the same size or smaller, the next step would be to determine whether it is faster. That would be tougher, in that you would have to dive into the netCDF code and replace the compression they use with yours.
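A rough sketch of that first comparison, assuming the netCDF4 Python package and hypothetical file names (sst_chunked.nc, sst_contiguous.nc): it rewrites the root group of a chunked, deflated file as a contiguous, uncompressed copy, so the new algorithm can be run on the copy and the result compared in size with the original chunked file.

# Rewrite a chunked, deflated netCDF-4 file as a contiguous, uncompressed
# copy (root group only). File names are hypothetical placeholders.
import os
from netCDF4 import Dataset

SRC = "sst_chunked.nc"      # original chunked, deflated file
DST = "sst_contiguous.nc"   # unchunked, uncompressed rewrite

with Dataset(SRC) as src, Dataset(DST, "w", format="NETCDF4") as dst:
    # Copy the global attributes.
    dst.setncatts({k: src.getncattr(k) for k in src.ncattrs()})

    # Copy dimensions; unlimited dimensions become fixed-size because
    # contiguous (unchunked) storage does not allow unlimited dimensions.
    for name, dim in src.dimensions.items():
        dst.createDimension(name, len(dim))

    # Copy each variable with contiguous storage and no deflation.
    for name, var in src.variables.items():
        fill = getattr(var, "_FillValue", None)
        out = dst.createVariable(name, var.dtype, var.dimensions,
                                 zlib=False, contiguous=True, fill_value=fill)
        out.setncatts({k: var.getncattr(k) for k in var.ncattrs()
                       if k != "_FillValue"})
        out[:] = var[:]  # loads the whole variable; fine for one test file

print(f"original (chunked, deflated): {os.path.getsize(SRC):,} bytes")
print(f"rewritten (contiguous):       {os.path.getsize(DST):,} bytes")

Compressing sst_contiguous.nc with the new algorithm and comparing that size against sst_chunked.nc gives the size comparison described above; the speed comparison would still require hooking the algorithm into the netCDF/HDF5 compression path.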

Let me know if you want to follow up on this.

I’m taking a few days off—should be back in the office this coming Thursday (10/5).

Peter


Peter Cornillon
 Graduate School of Oceanography
 University of Rhode Island
 215 South Ferry Road
 Narragansett, RI 02882 USA
 Office Phone: (401) 874-6283
 Cell: (401) 742-2911

Robert Bouillon

Sep 29, 2023, 10:56:45 AM
to Peter Cornillon, Peter Cornillon, sup...@opendap.org

Hi Peter,


This sounds like the perfect application for the technology I’ve developed. I’d love to connect when you’re back in the office. When would be a good time for us to meet? I can do a Zoom meeting, or we can chat in person if you’d like; I live in North Kingstown.


--

Robert
