I want to do Huffman decoding optimization of libjpeg in Chrome, and also want to contribute code to Chromium


Zhang Peixuan

Dec 12, 2012, 4:39:57 AM
to chromium...@chromium.org, chromi...@chromium.org

Hello All,

     I'm a programmer; my team and I are developing a parallel version of libjpeg-turbo, and we hope to contribute the code to Chromium. We used OpenCL to implement the optimization.

     We have completed the optimization of inverse quantization, IDCT, upsampling, color conversion, etc., and performance has improved.

     But we want to further improve the performance, so we tried to optimize Huffman decoding in libjpeg-turbo.

     Because of the nature of the algorithm, we can't run Huffman decoding in parallel directly (it has to be decoded one MCU at a time). However, we could do this optimization in web browsers:

     A JPEG file is often accessed more than once in a web browser (we may see it a 2nd time when the same page is opened again), so we could record some node information to a file the 1st time, and the 2nd time we meet the same JPEG file we could use that node information to run Huffman decoding in parallel.
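
     To make this concrete, here is a rough sketch in C of the record-and-reuse idea. The struct and helper names below are only illustrative stand-ins for the real libjpeg-turbo entropy-decoder internals, not our actual code:

#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the entropy-decoder state and primitives. */
typedef struct {
  uint32_t num_mcus;    /* total MCUs in the scan */
  int      dc_pred[4];  /* DC predictor per component */
  /* ... bit reader, Huffman tables, output buffer ... */
} decoder_state;

uint64_t current_bit_offset(const decoder_state *st);  /* position in entropy-coded data */
void     seek_to_bit(decoder_state *st, uint64_t bit_offset);
void     decode_one_mcu(decoder_state *st);            /* ordinary sequential decode */

/* One piece of "node information" saved on the first visit. */
typedef struct {
  uint64_t bit_offset;  /* absolute bit position where this MCU starts */
  int      dc_pred[4];  /* DC predictors carried into this MCU */
  uint32_t mcu_index;   /* index of the first MCU this checkpoint covers */
} mcu_checkpoint;

/* First visit: decode sequentially, saving a checkpoint every interval MCUs (interval > 0). */
void record_checkpoints(decoder_state *st, mcu_checkpoint *out, uint32_t interval) {
  for (uint32_t mcu = 0; mcu < st->num_mcus; mcu++) {
    if (mcu % interval == 0) {
      mcu_checkpoint *cp = &out[mcu / interval];
      cp->bit_offset = current_bit_offset(st);
      memcpy(cp->dc_pred, st->dc_pred, sizeof cp->dc_pred);
      cp->mcu_index = mcu;
    }
    decode_one_mcu(st);
  }
}

/* Later visit: each worker (CPU thread or OpenCL work-group) seeks to its
 * checkpoint and decodes its slice of MCUs independently of the others. */
void decode_slice(decoder_state *st, const mcu_checkpoint *cp, uint32_t count) {
  seek_to_bit(st, cp->bit_offset);
  memcpy(st->dc_pred, cp->dc_pred, sizeof st->dc_pred);
  for (uint32_t i = 0; i < count; i++)
    decode_one_mcu(st);
}

     Each checkpoint is just a bit offset plus a few DC predictors, so the node information stays very small compared with the image itself.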

 

     We have written OpenCL kernels for this parallel Huffman decoding, and we need to save the node information files to the local disk. However, we have no permission to do so while the sandbox is enabled.

 

     So I would like to know: could we use the Disk Cache to save/load the node information, or is there some other way to do it? If we write this parallel Huffman decoding code, would the Chromium community be willing to accept it? Are there other considerations that determine whether it is worthwhile to do so?

 

     In any case, we are willing to contribute the existing code we have to Chromium. Could someone tell us how to go about contributing it?
 
Sincerely,
 
                                       Peixuan Zhang
                                                  2012/12/12

Yang Guo

Dec 12, 2012, 4:50:44 AM
to chromium...@chromium.org, chromi...@chromium.org
Hi,

I'm not familiar with the topic and am not in any position to comment on whether Chromium should integrate your optimizations. But here is a thought. If I understood correctly, you found a way to parallelize image decoding for the JPEG format if a small amount of pre-computed information is available to support parallelism. If this algorithm indeed proves to be correct and offers better performance, wouldn't the best thing to do be to add this information to a newer version of the JPEG format, so that images encoded that way can always be decoded concurrently? I suggest bringing this up with the Joint Photographic Experts Group. Of course this would take longer than adding an ad-hoc implementation to Chromium.

If I misunderstood this issue, please ignore my comment.

Yang

Scott Hess

Dec 12, 2012, 9:34:55 AM
to zhangpe...@gmail.com, chromium...@chromium.org, chromi...@chromium.org
I am not specifically knowledgeable in this area, but two questions
immediately come to mind:

1) How will you determine that they are the same JPEG files (one
possible approach is sketched after these questions)? I ask because
image decoding is often a lovely place to find buffer-overrun and
stack-overrun attack vectors.

2) Does this save enough time to even make it worth hitting the disk?
I would expect Huffman decoding to be limited by bandwidth on modern
CPUs, so you could probably do an entire fresh decode in the time it
takes you to just check the disk for cached decode information.
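
(For (1), the obvious approach would be to key any cached data by a hash
of the complete JPEG byte stream, roughly like the hypothetical sketch
below, which just uses OpenSSL's SHA256 for illustration. Even with a
matching hash, the decoder would have to treat whatever it reads back as
untrusted and bounds-check every offset, since a corrupted cache entry is
itself an attack vector.)

#include <openssl/sha.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: derive a cache key from the full JPEG byte stream so that
 * saved decode hints are only ever applied to exactly the same bytes. */
void cache_key_for_jpeg(const unsigned char *jpeg_bytes, size_t jpeg_len,
                        char out_key[2 * SHA256_DIGEST_LENGTH + 1]) {
  unsigned char digest[SHA256_DIGEST_LENGTH];
  SHA256(jpeg_bytes, jpeg_len, digest);
  for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
    sprintf(&out_key[2 * i], "%02x", digest[i]);
  /* out_key is now a 64-character lowercase hex string, NUL-terminated. */
}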

As far as evaluating whether Chromium would accept it, you haven't
posted any hypothetical benchmarks or reference images which will be
improved by this approach. Merely parallelizing an algorithm is not
the goal, there are many algorithms which can be made parallel without
improving overall system performance at all (or even making it worse
by using resources other operations could make better use of).

-scott
> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev

krtulmay

Dec 12, 2012, 6:02:27 PM
to chromium...@chromium.org, zhangpe...@gmail.com, chromi...@chromium.org, sh...@chromium.org
And when you say you "used OpenCL to implement the optimization", do you mean the cryptographic library now known as Botan?

And you should probably post programming related items to Chromium-dev:  http://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev

Zhang Peixuan

Dec 12, 2012, 7:28:26 PM
to yan...@chromium.org, chromium...@chromium.org, chromi...@chromium.org
Yeah, I understand what you mean, but that seems more difficult, because it would mean modifying a standard that has been in use for 20 years.
 
Of course, even without the Huffman decoding optimization, the current version still gives an obvious speed boost. I'll post some performance data later today.

2012/12/12 Yang Guo <yan...@chromium.org>

--

Zhang Peixuan

Dec 12, 2012, 7:40:35 PM
to Scott Hess, chromium...@chromium.org, chromi...@chromium.org
For question 1, my idea is to do it in Chrome: we could store the information in the Disk Cache. But you're right, it does bring some security risks.
For question 2, I will post some performance data later today.
 
Thanks a lot.

2012/12/12 Scott Hess <sh...@chromium.org>

Zhang Peixuan

Dec 12, 2012, 7:47:50 PM
to krtulmay, chromium...@chromium.org, chromi...@chromium.org, sh...@chromium.org
Sorry, I don't understand what you mean.
We have written a GPU-based libjpeg-turbo using OpenCL, so I don't think it has anything to do with a cryptographic library.
 
I will post more information later today.
 
Thanks a lot.

2012/12/13 krtulmay <krtu...@gmail.com>

Zhang Peixuan

Dec 13, 2012, 1:44:17 AM
to Alpha Lam, chromium...@chromium.org, chromi...@chromium.org, Alpha (Hin-Chung) Lam
Hi Alpha, this PDF file shows what we have done.
The Huffman decoding approach was described in my earlier email.
 
For performance, here is some initial data; I will run more tests and post the results tomorrow.
 
On an AMD A10M machine, the performance is:

Resolution    CPU       GPU
4096x3200     177 ms     98 ms
1920x1080      31 ms     22 ms
1024x768       10 ms      6 ms

This performance data does not include parallel Huffman decoding; it covers only the optimizations described in the PDF file.
2012/12/13 Alpha Lam <hc...@chromium.org>
(using @chromium.org this time)

Hi,

Is there a paper I can refer to? Also performance gain numbers? I'm
skeptical that this is a big enough performance win.

We have implemented an architecture to decode multiple images in
parallel, which gives us the biggest performance gain. I'm not sure the
case you mentioned is common enough.

Alpha

2012/12/12 Zhang Peixuan <zhangpe...@gmail.com>

--
Doc1.pdf

Zhang Peixuan

Dec 14, 2012, 12:46:20 AM
to Alpha Lam, chromium...@chromium.org, chromi...@chromium.org
I think OpenCL is necessary for JPEG decoding optimizations.
 
As I understand it, what you describe is only suitable for video, not for still images.
GPU acceleration is widely used, so I think the sandbox needs to support OpenCL.

2012/12/14 Alpha Lam <hc...@chromium.org>
GL shaders
