Issue 422 in zxing: suggested speedup in com.google.zxing.common.reedsolomon

zx...@googlecode.com

unread,

May 27, 2010, 8:47:11 PM5/27/10

to zx...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 422 by jmsachs: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

I did some quick performance tests in com.google.zxing.common.reedsolomon
with a (255,247) code using rev1393 of the svn repos.

If I run the encoder 5000 times on a source array, it takes about 4900msec.
If I run the decoder 5000 times (each time clone() a source array with
errors that can be corrected, then ReedSolomonDecoder.decode()) on my CPU,
it takes about 1200msec.

I'm looking at the Galois field math and I noticed an easy speedup after
having read http://nisl.wayne.edu/Papers/Tech/GF.pdf earlier today ("Fast
Software Implementations of Finite Field Operations", Huang and Xu)

In GF256.multiply, the code currently reads:

return expTable[(logTable[a] + logTable[b]) % 255];

This can be sped up by a factor of about 2 by eliminating the modulo 255
using a technique similar to casting out nines (X mod 9 = (sum of X's
base-10 digits) mod 9)

int logsum = logTable[a] + logTable[b];
return expTable[(logsum & 0xff) + (logsum >> 8)];

Huang and Xu say you need to augment the exponentiation table but you
really don't, when you're only multiplying two numbers. expTable is an
array of indices 0 to 255, containing numbers from 1 to 255. logTable is an
array of indices 1 to 255 (0 never used) containing numbers from 0 to 255.
The expression "logsum" given above contains a number from 0 to 510. The
sum of its radix-256 digits is either (0 + (a number between 0 and 255)) or
(1 + (a number between 0 and 254)) so the sum of its radix-256 digits is
between 0 and 255 and therefore something that can be used directly as an
index in the existing expTable. (The expression I gave above computes the
sum of its radix-256 digits)

If you use a similar technique to let logsum = the sum of several
logTable[x], then the sum of its digits can be slightly higher (should be
worst-case of 255+(k-2) where 2 <= k <= 255 is the number of addends in the
logsum expression) and the table can be extended.

Why go to all this trouble? Because a modulo 255 takes more time than
bitwise arithmetic and addition. I ran this one simple change and my timing
test for decoding reduces from about 1200msec to about 500msec.

The encoder still takes much longer than decoding, which seems really
suspicious....

zx...@googlecode.com

unread,

May 27, 2010, 8:55:15 PM5/27/10

to zx...@googlegroups.com

Comment #1 on issue 422 by jmsachs: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

> Huang and Xu say you need to augment the exponentiation

> table but you really don't

correction/clarification: this is an off-by-one issue with the expTable
only needing
255 values (0 - 254) in the original code, not 256, if it is used for
multiplication
only, since in the original code it is expTable[{something} % 255], and
expTable[255]
= expTable[0]. But if you use the Huang/Xu speedup then you do need to
populate that
extra value expTable[255].

zx...@googlecode.com

unread,

May 27, 2010, 9:25:46 PM5/27/10

to zx...@googlegroups.com

Comment #2 on issue 422 by jmsachs: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

The bottleneck in encoding seems to be in the GF256Poly.divide() method.
Perhaps I
can suggest a speedup there.

zx...@googlecode.com

unread,

May 28, 2010, 12:21:14 AM5/28/10

to zx...@googlegroups.com

Updates:
Status: Accepted
Owner: srowen
Labels: -Type-Defect -Priority-Medium Type-Enhancement Priority-Low

Comment #3 on issue 422 by sro...@gmail.com: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

OK. As you say, RS decoding takes only like a millisecond to begin with.
I'll look at the change later when I'm
back. RS encoding is going to be slower, and it's less important still, but
sure if you can see an improvement
that's good.

zx...@googlecode.com

unread,

May 28, 2010, 8:10:36 AM5/28/10

to zx...@googlegroups.com

Comment #4 on issue 422 by jmsachs: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

for barcode encoding/decoding it's probably not so important. I'm looking
at the RS
encoding/decoding for ECC in large files streaming to/from disk (at roughly
100KB/sec), so am concerned with efficiency.

thanks for implementing -- I'm amazed given all the java libs out there
that this is
one of the few libraries containing Reed-Solomon encoding

zx...@googlecode.com

unread,

May 28, 2010, 11:13:29 AM5/28/10

to zx...@googlegroups.com

Comment #5 on issue 422 by jmsachs: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

As I thought, the divide is not efficient primarily because it's creating a
bunch of
new arrays. I rewrote it and as a result encoding sped up by a factor of
about 15 (!)
so it's comparable to decoding in speed. (10MB test file: encode = 35 sec
previously
-> 2.4 sec now ; decode = 1.2-1.3sec)

Following is the core part of relevant changes. When I have a chance, I'll
get you a
compatible file set with minimal changes -- I've factored out some
interfaces /
factory classes for my own reasons, so the code I am using has morphed a
bit from
what's in the google svn.

The following uses a mutable working array to perform the divide in-place.
Each step
of the divide subtracts (or adds) out a multiple of the divisor so that the
leading
term of the working array to set to 0. In place of that zeroed out term, it
puts the
scaling factor which is the appropriate term of the quotient.

private void scaleAndSubtract(int[] w, GFPoly p, int j0, int a)
{
// subtracts p * (a*x^n) from w
// i0 = initial index of w (leading term of coefficients)
for (int i = p.getDegree(), j = j0; i >= 0; --i, ++j)
{
int b = getField().multiply(p.getCoefficient(i), a);
w[j] = getField().addOrSubtract(w[j], b);
}
}

@Override public GF256Poly2[] divide(GFPoly divisor)
{
if (!getField().equals(divisor.getField())) {
throw new IllegalArgumentException("GF256Polys do not have same GF256
field");
}
if (divisor.isZero()) {
throw new IllegalArgumentException("Divide by 0");
}

int[] w = this.coefficients.clone();
// working array -- this algorithm divides in place

int denominatorLeadingTerm = divisor.getCoefficient(divisor.getDegree());
int inverseDenominatorLeadingTerm =
getField().inverse(denominatorLeadingTerm);

int quotientDegree = getDegree() - divisor.getDegree();
for (int i = quotientDegree, j0 = 0; i >= 0; --i, ++j0)
{
int scale = getField().multiply(w[j0], inverseDenominatorLeadingTerm);
scaleAndSubtract(w, divisor, j0, scale);

// w[j0] should be 0 at this point. Store quotient coefficients here.
w[j0] = scale;
}

GF256Poly2 quotient = this.factory.buildPolynomial(w, 0, quotientDegree);
GF256Poly2 remainder = this.factory.buildPolynomial(w, quotientDegree+1,
divisor.getDegree()-1);
return new GF256Poly2[] { quotient, remainder };
}

zx...@googlecode.com

unread,

May 28, 2010, 11:39:25 AM5/28/10

to zx...@googlegroups.com

Comment #6 on issue 422 by sro...@gmail.com: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

I'll commit the first changes. It can actually get faster by removing the
check for one operand being 1. Saving two
branches helps the pipeline more than it hurts by having to carry out the
rest of the operation in those cases.

Sure post a patch with the rest of your findings. I am sure it can be sped
up.

Credit goes partly to Google's code base, we got permission to pinch and
adapt some of the routines used
internally.

zx...@googlecode.com

unread,

Jun 1, 2010, 2:28:37 PM6/1/10

to zx...@googlegroups.com

Updates:
Status: Fixed

Comment #7 on issue 422 by sro...@gmail.com: suggested speedup in
com.google.zxing.common.reedsolomon
http://code.google.com/p/zxing/issues/detail?id=422

Reopen if you have a concrete patch with more speedups

Reply all

Reply to author

Forward