Would porting the code for CUDA / OpenCL be hard to do? [Dev lvl 0/10]

Dr. Dietrich Davidstein

Oct 3, 2016, 7:04:31 PM10/3/16
to Brotli
Hello fellow zopfli- and brotlians,

Brotli supports multiple CPU threads, but compressing large files is only feasible on a machine with many cores (e.g. a mainboard with dual or quad Xeon sockets).

If it's relevant – here's the setup Google used to perform their tests:
"The test computer we used is an Intel® Xeon® CPU E5-1650 v2 running at 3.5 GHz with six cores and six additional hyperthreading contexts. We run Linux 3.13.0. All codecs were compiled using the same compiler, GCC 4.8.4, at -O2 level optimization. All tests were run single-threaded on an otherwise idle computer."
 
"Only" six cores (12 with hyperthreading) – but that´s more than my already 5 year old sandy bridge i7...

Trying to compress a folder of a few hundred MB made up of random file types
(mostly .indd [InDesign documents] with embedded text, images [jpg, tiff, psd, png], and vector graphics [eps, ai, pdf])
takes forever with my i7 2600K (@ 4.3 GHz).
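
For reference, one way to at least keep several CPU cores busy today is to compress the files independently and in parallel – a rough sketch, assuming Node's built-in zlib Brotli bindings (the folder path and quality value are placeholders; async zlib calls run on the libuv threadpool, so a handful of compressions can overlap):

// Rough sketch: compress each file in a folder concurrently so several CPU
// cores stay busy. Uses Node's built-in zlib Brotli bindings; async zlib work
// runs on the libuv threadpool (size set by UV_THREADPOOL_SIZE).
// The folder path and quality value are placeholders.
import { readdir, readFile, writeFile } from 'node:fs/promises';
import * as path from 'node:path';
import * as zlib from 'node:zlib';

function brotli(data: Buffer, options: zlib.BrotliOptions): Promise<Buffer> {
  return new Promise((resolve, reject) =>
    zlib.brotliCompress(data, options, (err, out) => (err ? reject(err) : resolve(out))),
  );
}

async function compressFolder(dir: string, quality = 9): Promise<void> {
  const names = await readdir(dir); // assumes a flat folder of regular files
  await Promise.all(
    names.map(async (name) => {
      const file = path.join(dir, name);
      const data = await readFile(file);
      const compressed = await brotli(data, {
        params: {
          [zlib.constants.BROTLI_PARAM_QUALITY]: quality,
          [zlib.constants.BROTLI_PARAM_SIZE_HINT]: data.length,
        },
      });
      await writeFile(file + '.br', compressed);
    }),
  );
}

compressFolder('./my-folder').catch(console.error);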

I know that's not the intention behind Brotli – to be a low-resource/high-efficiency compressor like 7-Zip or the like (please correct me if I misinterpreted this!).

Brotli wasn't written to be the fastest compressor out there, was it? But it has incredible compression efficiency!


Since I don't have any experience with porting C code to OpenCL or CUDA, I hope someone in the community could answer these few questions for me:

1) Would porting the code to a GPU shader language be hard to do? Just a few hours? Or 2-3 days of work? More? :-/

2) Could the smaller VRAM of the card (most people have 2-4 GB, max. 8 GB) compared to the much larger RAM usually available (on average 8-32 GB) cause problems?
    How important is the size of the RAM? Could you compress large files/folders, say over 1000 MB, with only 2 GB of RAM installed? (See the streaming sketch after this list.)
    Is it possible to give a rough prediction of how an NVIDIA card with 1500-2300 shaders would perform compared to just using the CPU?
    Pardon me if I'm thinking too far ahead – I know it's hard to estimate without even having the GPU option.

3) Would there be enough people interested in using a GPU implementation?

4) If you know in practice how to do it, but don't have the time or nerves to port the code – where would you start? The CUDA / OpenCL documentation?
   Or is it pointless to try if you don't have any experience with compression algorithms in general?
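
Sketch referenced in question 2: as far as I understand, the Brotli encoder only ever looks back through a sliding window (at most 16 MiB with the standard lgwin of 24), so streaming the input keeps memory bounded by the window and quality settings rather than by the file size. A minimal sketch, again assuming Node's built-in zlib bindings; paths and settings are placeholders:

// Rough sketch: stream a big file through the Brotli encoder so only the
// sliding window and stream buffers sit in memory, not the whole file.
// Paths and settings are placeholders.
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import * as zlib from 'node:zlib';

async function compressLargeFile(src: string, dst: string): Promise<void> {
  const encoder = zlib.createBrotliCompress({
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11, // best density, slowest
      [zlib.constants.BROTLI_PARAM_LGWIN]: 24,   // 16 MiB window, the standard maximum
    },
  });
  await pipeline(createReadStream(src), encoder, createWriteStream(dst));
}

compressLargeFile('./big-archive.tar', './big-archive.tar.br').catch(console.error);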


I wonder why I didn't find anything here or on GitHub – I hope I'm not the only one interested in GPU support – I guess this could solve Brotli's greatest problem: its speed!

I just thought about all the ML Neural Style projects – convolutional neural nets would be impossible to compute without a decent GPU.
Of course you can use the CPU, if you are willing to wait a few hours for a single image...

-------

PS:
To the compression experts:
Do you know of any compression software that utilizes the GPU?

And more importantly – what's the Weissman score of Brotli? :-)

Evgenii Kliuchnikov

Oct 6, 2016, 11:43:53 AM10/6/16
to Brotli
Hello.

  The current implementation tries to use the strong sides of the CPU. If you just port it to a GPU, you will most probably get worse speed.
  To use the strong sides of a GPU you would need to design a new compressor architecture. So my estimate is not hours or days, but weeks of work.
  Our team is open to everything new and innovative, so we would be glad to help if someone tries to do this. Unfortunately we do not have much in the way of human resources to work on this ourselves right now...

  Perhaps it doesn't make sense to try to cover the whole speed-density space with a single implementation. If I were going to write a GPU Brotli encoder I would choose a narrow use case first (e.g. ultra-fast encoding, or a high-load environment, or best-possible density) and try to write an encoder just for that case...

sandip1...@gmail.com

Jul 13, 2020, 4:15:30 PM7/13/20
to Brotli
Let me give you my scenario:
We do server-side rendering of pages for eCommerce. We get more than 1 million requests per day; during sale weeks we get 3 million+ requests per day. This number is increasing, and sometimes we get more than 200,000 (2 lakh) requests in an hour. For every request we compress our content with Brotli or gzip. If I rely on the CPU, the cost will increase every day. We also perform some other tasks for every request. If we could develop a solution where I can utilize the hundreds of cores of a GPU from Node.js the way we can use CPU cores, then the server cost would be lower for every website.
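
For reference, the per-request path on the CPU looks roughly like this today – a minimal sketch, assuming Node's built-in zlib Brotli support; renderPage() and the quality value are placeholders (lower quality settings trade some density for far less CPU per request):

// Rough sketch of per-response Brotli compression in Node.js with a low
// quality setting, to keep CPU cost per request small.
// renderPage() and the quality value are illustrative only.
import * as http from 'node:http';
import * as zlib from 'node:zlib';

function renderPage(url: string): string {
  return `<html><body>Rendered ${url}</body></html>`; // stand-in for the real SSR renderer
}

const server = http.createServer((req, res) => {
  const html = renderPage(req.url ?? '/');
  // Naive check for Brotli support in the Accept-Encoding header.
  if ((req.headers['accept-encoding'] ?? '').includes('br')) {
    res.setHeader('Content-Encoding', 'br');
    zlib.brotliCompress(
      html,
      { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 4 } }, // low quality = cheap CPU
      (err, body) => {
        if (err) {
          res.writeHead(500);
          res.end();
        } else {
          res.end(body);
        }
      },
    );
  } else {
    res.end(html);
  }
});

server.listen(8080);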

Eugene Kluchnikov

Sep 2, 2020, 5:59:55 AM9/2/20
to Brotli
I suppose the content is generated by instantiating a template? In that case a special encoder could "preprocess" the template and compress much faster / cheaper.
GPU computations are most efficient for calculations that are independent. In the case of LZ compression (especially at high densities) the whole compression process is a chain of dependencies, so it does not fit the GPU computation model.
