Do modern JVMs use aliasing information?

Steven Stewart-Gallus

unread,

Jan 19, 2019, 2:51:33 PM1/19/19

to mechanical-sympathy

It is well known that compilers can't optimise things as well if values may alias. Because src and dest may overlap:

public static void multiply(int[] accum, int accumOffset, int[] src, int srcOffset, int n){
    for (var ii = 0; ii < n; ++ii) {
        dest[ii] *= src[ii];
    }
}

does not have the same semantics as possibly better optimised versions.

This is behind the difference between the memcpy and memmove functions in C and why C99 has the restrict keyword and also why Fortran was historically faster for a long time.

Now Java doesn't have the restrict keyword but you can create separate byte arrays and then checking whether arrays can possibly overlap is just a pointer comparison and then if you have to you can dispatch to manually optimised versions for that case.

The thing is I am starting to wonder if aliasing is preventing one of my programs from being optimised as much as it should.

Do modern JVMs use aliasing information? Is it possible to give them more aliasing information?

Should I look into doing something like:

public static void multiply(int[] dest, int destStart, int[] src, int srcStart, int n) {
    if (src != dest) {
        multiply_nonaliasing(dest, destStart, src, srcStart, n);
    } else if (srcStart + n < destStart || destStart + n < srcStart) {
        multiply_nonoverlapping(dest, destStart, src, srcStart, n);
    } else {
        // maybe throw an error instead
        multiply_overlapping(dest, destStart, src, srcStart, n);
    }
}

How would one code multiply_nonalaising and nonoverlapping give the JVM such information?

I believe I have one of the worst case examples were I am emulating a big long flat heap memory.

private static byte[] bigHeap = new byte[8 * 4096];

It seems silly to me but I am starting to wonder if an approach like:

private static byte[][] bigHeapOfPages = new byte[8][4096];

might theoretically be possible to make faster because I can make sure things do not alias each other.

There seem to be some complicated scholarly papers online about alias disambiguation which mention this would be useful for solving performance difficulties on the JVM but I don't know what the current state of things are right now and what I can do personally right now.

Richard Startin

unread,

Jan 20, 2019, 1:00:30 PM1/20/19

to mechanical-sympathy

Vladimir Ivanov left some insights on how C2 handles this issue here http://richardstartin.uk/mmm-revisited/#comment-4915. In short, C2's autovectorizer gives up when it can't prove dest and source don't alias eachother, for instance, if there are offsets into the arrays, and the offsets differ.

Steven Stewart-Gallus

unread,

Jan 20, 2019, 1:48:38 PM1/20/19

to mechanical-sympathy

I see, so the same doesn't apply to read only workflow though right?

private static int dot(int[] heap, int n, int a, int b) {
    var sum = 0;


    for (var ii = 0; ii < n; ++ii) {


         sum += heap[a + ii] * heap[b + ii];
    }
    return sum;
}

And one last question.

Can VarHandle byte array usage vectorize yet?

It seems to me it might be fastest to do something like:

private static void mul(byte[] heap, int n, int accum, int src) {
        int[] newSrc = copyToTempIntBuf(heap, src, n);
        int[] newAccum = copyToTempIntBuf(heap, accum, n);
        // c2 needs same indices to vectorise


        for (var ii = 0; ii < n; ++ii) {


                newAccum[ii] *= newSrc[ii];
        }
        freeTempBuf(newSrc);
        copyBack(heap, newAccum, n);
        freeTempBuf(newAccum);
}

But that is extremely circumlocutious and I doubt you can efficiently copy between byte arrays and int arrays without Unsafe anyway (even using byte array varhandles.)

Steven Stewart-Gallus

unread,

Jan 21, 2019, 12:43:53 AM1/21/19

to mechanical-sympathy

I found a much more complete reference (also by Vladimir Ivanov it seems) but it may be out of date (it's from 2017.)

http://cr.openjdk.java.net/~vlivanov/talks/2017_Vectorization_in_HotSpot_JVM.pdf

Steven Stewart-Gallus

unread,

Jan 28, 2019, 7:50:30 PM1/28/19

to mechanica...@googlegroups.com

Ahah!

I believe I might have a possible hacky workaround!

Instead of doing:

static void mul(int[] accum, int accumOffset, int[] src, int srcOffset, int n) {
    for (int ii = 0; ii < n; ++ii) {
        accum[ii + accumOffset] *= src[ii + srcOffset];
    }
}

do

static void mul(int[] accum, int accumOffset, int[] src, int srcOffset, int n) {
    var diff = srcOffset - accumOffset;
    rotate(accum, diff);
    for (int ii = 0; ii < n; ++ii) {
        accum[ii + srcOffset] *= src[ii + srcOffset];
    }
    rotate(accum, -diff);
}

where rotate rotates the array in place and so avoids expensive allocations (and does the same amounts of copies as allocating temporaries would.)

I can see how rotating once can be done in place (and using System.arraycopy so it could be fast.)

static void shiftOnce(int[] array) {
    var len = array.length;
    var start = array[0];
    System.arraycopy(array, 0, array, 1, len - 1);
    array[0] = start;
}

But I can't really see a good way of rotating an array in place for higher counts (I doubt repeated iteration would be good.)

I'll still need to benchmark but this might be good.

Reply all

Reply to author

Forward