Profiling my application shows that the loop I'm currently using to set this
memory is very costly. Below is a sample of my code:
f1 = (byte) (buffer[idx++]);
f2 = (byte) (buffer[idx++]);
for (int k = 0; k < xchpx; k++)
{
    PixelData[iDecodeIndex][ppx++] = f1;
    PixelData[iDecodeIndex][ppx++] = f2;
}
The two assignment lines account for 40% of the function's time. Does
anyone know how I can improve this? Thanks in advance for any help.
Gregg Sirrine
Waterford Institute
gsir...@waterford.org
-------------------==== Posted via Deja News ====-----------------------
http://www.dejanews.com/ Search, Read, Post to Usenet
No. Not likely to ever be one.
You do know that all Java data is initialized to zero
when it is created?
One thing you might consider is pre-initializing an
array to your desired value, keeping a static reference
to it around, and then doing System.arraycopy to copy
that initialized array (or parts of it) to another
working array.
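In code, that might look something like the sketch below. The pattern
bytes (0xAB/0xCD) and the 4096-element pattern size are placeholders,
not values from the original post:

```java
public class PrefilledCopy {
    // Static, pre-initialized pattern array kept around for reuse.
    private static final byte[] PATTERN = new byte[4096];
    static {
        for (int i = 0; i < PATTERN.length; i += 2) {
            PATTERN[i] = (byte) 0xAB;      // placeholder for f1
            PATTERN[i + 1] = (byte) 0xCD;  // placeholder for f2
        }
    }

    // Fill dst[0..len) by copying chunks from the pattern array
    // instead of assigning one element at a time.
    public static void fill(byte[] dst, int len) {
        for (int off = 0; off < len; off += PATTERN.length) {
            System.arraycopy(PATTERN, 0, dst, off,
                             Math.min(PATTERN.length, len - off));
        }
    }
}
```

Note this only works as-is when the destination offsets keep the
two-byte pattern aligned; an odd starting offset would need the copy
source shifted by one.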
Some quick suggestions to try, in addition to the array copy
suggestion.
1. Set up a reference to PixelData[iDecodeIndex] outside the loop. (A
sufficiently smart compiler would notice the loop invariant expression
and move it out of the loop itself, so this might not help.)
2. If array copy helps, but needs too much memory for the extra array,
consider initializing a moderate sized chunk of array with the right
pattern, and then copying it. This will also work if the pattern
changes during the run - initialize a portion of the array using the
current loop, then start copying the already initialized portion,
doubling the size of the area you copy on each iteration until you
have at least half the array initialized, then copy enough to fill
whatever is left over.
3. You don't say whether the loop control is a significant factor. If
it is, consider loop unrolling - deal with more than two elements in
each iteration.
4. The way your loop is coded suggests that pairs of bytes really form
an element. If that is the case, consider switching to short, and
splitting the short into two bytes when you need individual bytes. (I
don't know whether this is reasonable from the point of view of the
rest of the program.) It seems likely that number of elements
processed is a more significant factor than amount of space,
especially if it is running significantly slower than memset. The
bigger the elements, the fewer you need to process.
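Applied to the original loop, suggestions 1 and 3 might look like the
sketch below. It reuses the poster's names (f1, f2, xchpx) but wraps
the loop in a hypothetical helper so the hoisted row reference and the
running index are explicit:

```java
public class UnrolledFill {
    // Writes xchpx (f1, f2) pairs into row starting at ppx and
    // returns the updated index. Call it with the row reference
    // hoisted out of the loop:
    //     byte[] row = PixelData[iDecodeIndex];   // suggestion 1
    //     ppx = fillPairs(row, ppx, xchpx, f1, f2);
    public static int fillPairs(byte[] row, int ppx, int xchpx,
                                byte f1, byte f2) {
        int k = 0;
        for (; k + 1 < xchpx; k += 2) {   // two pairs per iteration
            row[ppx++] = f1;              // (suggestion 3, unrolled)
            row[ppx++] = f2;
            row[ppx++] = f1;
            row[ppx++] = f2;
        }
        if (k < xchpx) {                  // leftover pair, odd xchpx
            row[ppx++] = f1;
            row[ppx++] = f2;
        }
        return ppx;
    }
}
```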
Patricia
An array copy cannot be an efficient memset. For every productive
store, it does a wasted load. Loads that cache miss (likely if the
area is large) tend to do worse things to processor pipelines than
stores. It may be less inefficient than doing it an element at a time,
but is likely to take at least twice as long as a really efficient
native code memset would.
If one does need to use the arraycopy technique, and performance
matters, there are a couple of changes that might improve the version
in the FAQ. One is to start by writing a chunk using the conventional
store loop, rather than just one byte. Short arraycopies may be less
efficient than the loop. The other is to stop doubling once the block
is large enough to hide the arraycopy call overhead. This may reduce
cache flushing due to reading in a very big area.
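Putting those two refinements together, the FAQ-style doubling fill
might be sketched as below. The SEED and MAX_BLOCK sizes are guesses
to illustrate the idea, not tuned values:

```java
public class DoublingFill {
    public static void fill(byte[] a, byte value) {
        final int SEED = 64;        // seed chunk written with a plain loop,
                                    // since short arraycopies may lose to it
        final int MAX_BLOCK = 8192; // stop doubling here to limit the area
                                    // re-read on each copy
        int len = Math.min(SEED, a.length);
        for (int i = 0; i < len; i++) {
            a[i] = value;
        }
        // Grow the initialized region by copying it onto itself,
        // doubling until the block size cap is reached.
        while (len < a.length) {
            int n = Math.min(Math.min(len, MAX_BLOCK), a.length - len);
            System.arraycopy(a, 0, a, len, n);
            len += n;
        }
    }
}
```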
Patricia
An efficient memset is easily built from System.arraycopy(). For
an example, see the Java Programmers FAQ item 3.19:
http://www.best.com/~pvdl/javafaq.html
What I would like to see in the core is System.arraycompare():
class System {
    public static native int arraycompare(Object src, int sbeg,
                                          Object dst, int dbeg, int n);
}
which returns the index of the first element which is different between
two arrays. The caller can then compare the elements in whatever
way is desired. For instance, to do an unsigned byte[] compare:
byte[] b1 = ...;
byte[] b2 = ...;
int i = System.arraycompare(b1, 0, b2, 0, b1.length);
// i == b1.length would mean no difference was found; also note the
// parentheses - in Java, > binds tighter than &
if (i < b1.length && (b1[i] & 0xFF) > (b2[i] & 0xFF)) {
    ...
}
--
Stuart D. Gathman <stu...@bmsi.com> / <..!uunet!bms88!stuart>
Business Management Systems Inc.
Phone: 703 591-0911 Fax: 703 591-6154
"Microsoft is the QWERTY of Operating Systems" - SDG