Hello..............
As you may have noticed, I have implemented my Parallel archiver
and my Parallel compression library.
Here they are:
https://sites.google.com/site/aminer68/parallel-archiver
and
https://sites.google.com/site/aminer68/parallel-compression-library
Other than that, you have to know that GPUs are basically vector
processors: they can execute hundreds or thousands of ADD instructions
all in lock step, and they work best on programs with very few
data-dependent branches.
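As a minimal sketch of that lock-step model (my own illustration, not
code from the libraries above), here is a trivial CUDA kernel where
thousands of threads each perform the same ADD with no data-dependent
branching:

#include <cuda_runtime.h>

// Each thread adds one pair of elements; all threads in a warp
// execute the same ADD in lock step, and the only branch is a
// uniform bounds check.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Host-side launch, n threads grouped into blocks of 256:
//   vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);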
Compression algorithms in general map more naturally onto an SPMD
(Single Program Multiple Data) or MIMD (Multiple Instruction Multiple
Data) programming model, which is better suited to multicore CPUs.
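To make the SPMD idea concrete, here is a rough sketch in plain C++
of chunk-based parallel compression on CPU threads. Note that
compressChunk() is a hypothetical stand-in I wrote for this example,
not the API of my Parallel Compression Library: each thread runs the
same program on its own independent slice of the input, taking
whatever data-dependent branches it needs.

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real codec: it just counts run lengths
// so the example is self-contained; a real implementation would emit
// compressed output for its chunk.
void compressChunk(const unsigned char *src, std::size_t len)
{
    std::size_t runs = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (i == 0 || src[i] != src[i - 1]) ++runs;
    (void)runs;
}

void parallelCompress(const unsigned char *data, std::size_t size,
                      unsigned numThreads)
{
    std::vector<std::thread> workers;
    std::size_t chunk = (size + numThreads - 1) / numThreads;

    // Same program, multiple data: one independent chunk per thread.
    for (unsigned t = 0; t < numThreads; ++t) {
        std::size_t begin = t * chunk;
        if (begin >= size) break;
        std::size_t len = std::min(chunk, size - begin);
        workers.emplace_back(compressChunk, data + begin, len);
    }
    for (auto &w : workers) w.join();
}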
Video compression algorithms can be accelerated by GPGPU processing
(for example with CUDA) only to the extent that a very large number of
pixel blocks are being cosine-transformed or convolved (for motion
detection) in parallel, and the IDCT or convolution subroutines can be
expressed as branchless code.
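To illustrate that structure, here is a simplified CUDA sketch of my
own (a toy example, not a production IDCT): one thread block per 8x8
pixel block, one thread per output sample, and a fixed-length
multiply-accumulate loop against a constant coefficient matrix, so
there are no data-dependent branches and the warps stay in lock step.

#include <cuda_runtime.h>

#define N 8  // 8x8 pixel blocks, as in typical DCT-based codecs

// Constant transform matrix (an 8-point basis would be loaded here);
// it lives in constant memory and is the same for every block.
__constant__ float coeff[N][N];

// One thread block per 8x8 pixel block, one thread per output sample.
// Only a row pass is shown to keep the sketch short; a real IDCT
// would do a separable row pass followed by a column pass.
__global__ void transformBlocks(const float *in, float *out, int numBlocks)
{
    int blockId = blockIdx.x;
    if (blockId >= numBlocks) return;

    const float *src = in  + blockId * N * N;
    float       *dst = out + blockId * N * N;

    int row = threadIdx.y;
    int col = threadIdx.x;

    // Branchless, fixed-length multiply-accumulate, identical for
    // every thread.
    float acc = 0.0f;
    for (int k = 0; k < N; ++k)
        acc += coeff[col][k] * src[row * N + k];

    dst[row * N + col] = acc;
}

// Launch: dim3 threads(N, N);
//         transformBlocks<<<numBlocks, threads>>>(d_in, d_out, numBlocks);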
GPUs also favor algorithms with high arithmetic intensity (the ratio of
math operations to memory accesses). Algorithms with low arithmetic
intensity (like adding two vectors) can be massively parallel and SIMD,
but still run slower on the GPU than on the CPU because they are
memory bound.
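As a rough worked example: the vector-add kernel sketched earlier does
one floating-point add for every 12 bytes it moves (two 4-byte loads
and one 4-byte store), i.e. about 0.08 operations per byte, while a
modern GPU typically needs on the order of tens of operations per byte
of memory traffic before the ALUs, rather than the memory bus, become
the bottleneck; so such a kernel is firmly memory bound.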
Thank you,
Amine Moulay Ramdane.