Aeron Memory Visibility


Dominik Konik

Aug 19, 2016, 3:04:54 PM
to mechanical-sympathy
Hi,

We believe we have found a memory-visibility problem with the buffer that Aeron passes to the handler while polling a subscription. We are also linking a barebones setup that reproduces the misbehavior; it can be found here.

The Behavior:
We observe that the bytes in the buffer passed to the FragmentHandler occasionally change while the handler is running. More specifically, at the beginning of the FragmentHandler we call buffer.getLong(offset), then we do some work, and at the end of the FragmentHandler we call buffer.getLong(offset) again. The second call does not always return the same value as the first. For example, with the following code in the FragmentHandler:

// Sequence index carried in the first 8 bytes of each message.
final long receivedIndex = buffer.getLong(offset);

if (receivedIndex != previous + 1) {
    System.out.println("MISSED MESSAGE AT INDEX: " + previous + "->" + receivedIndex);
    // Dump the same 8 bytes individually (eight %x specifiers, eight arguments).
    System.out.printf("%x %x %x %x %x %x %x %x\n",
        buffer.getByte(offset), buffer.getByte(offset + 1),
        buffer.getByte(offset + 2), buffer.getByte(offset + 3),
        buffer.getByte(offset + 4), buffer.getByte(offset + 5),
        buffer.getByte(offset + 6), buffer.getByte(offset + 7));
    // Read the same long a second time for comparison.
    System.out.println("Offset(again): " + buffer.getLong(offset) + "\n");
}

++messageNum;
previous = receivedIndex;


We get output of:


MISSED MESSAGE AT INDEX: 188610->96091
c3 e0 2 0 0 0 0 0
Offset(again): 188611

MISSED MESSAGE AT INDEX: 96091->188612
c4 e0 2 0 0 0 0 0
Offset(again): 188612

MISSED MESSAGE AT INDEX: 371101->278582
9e a9 5 0 0 0 0 0
Offset(again): 371102

MISSED MESSAGE AT INDEX: 278582->371103
9f a9 5 0 0 0 0 0
Offset(again): 371103

As you can see, in the first case you would expect the returned value to be 188610 + 1 = 188611, but the first call to getLong returns a different number (96091). When getLong is called again, the correct value is returned, which suggests that Aeron is handing us the buffer before the memory it covers is fully visible to all threads.
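
As a cross-check on the byte dumps above, the eight printed bytes can be decoded by hand. The sketch below is not part of the reproducer: it uses a plain java.nio.ByteBuffer as a stand-in for the Aeron buffer, and it assumes the long is stored in little-endian order (the native order on x86). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Aeron or of the linked reproducer.
final class DecodePrintedBytes {

    // Interprets eight byte values, in the order they were printed,
    // as a single little-endian long (assumed byte order).
    static long decodeLittleEndian(int... bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int b : bytes) {
            buf.put((byte) b);
        }
        return buf.getLong(0); // absolute read, uses the buffer's byte order
    }

    public static void main(String[] args) {
        // Bytes from the first and third dumps.
        System.out.println(decodeLittleEndian(0xc3, 0xe0, 0x02, 0, 0, 0, 0, 0)); // 188611
        System.out.println(decodeLittleEndian(0x9e, 0xa9, 0x05, 0, 0, 0, 0, 0)); // 371102
    }
}
```

Under that assumption, the bytes dumped right after the mismatching read already decode to the expected index (188611 and 371102), matching the "Offset(again)" lines rather than the first getLong result.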


Additionally, we have only observed this when sending 5-10 packets back to back as fast as possible (as opposed to, say, one packet at a time in ping-pong style).
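
For anyone who wants to exercise the same gap check without an Aeron setup, here is a minimal single-threaded sketch. It is only an approximation of the handler logic: java.nio.ByteBuffer stands in for the Aeron buffer, the burst is simulated by writing consecutive indices into one buffer, and every name in it (GapCheckSketch, onFragment, runBurst) is hypothetical. It also counts reads where a second getLong disagrees with the first, which is the symptom described above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the handler logic; no Aeron types are used.
final class GapCheckSketch {
    private long previous = -1; // index of the last message seen (first expected index is 0)
    private int gaps;           // reads that were not previous + 1
    private int unstableReads;  // reads where a second getLong disagreed with the first

    // Mirrors the check in the handler: read the index, compare, read again.
    void onFragment(ByteBuffer buffer, int offset) {
        long first = buffer.getLong(offset);
        if (first != previous + 1) {
            gaps++;
        }
        long second = buffer.getLong(offset);
        if (second != first) {
            unstableReads++;
        }
        previous = first;
    }

    int gaps() { return gaps; }

    int unstableReads() { return unstableReads; }

    // Simulates a burst of back-to-back messages carrying indices 0..burstSize-1.
    static GapCheckSketch runBurst(int burstSize) {
        ByteBuffer buffer = ByteBuffer.allocate(burstSize * Long.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < burstSize; i++) {
            buffer.putLong(i * Long.BYTES, i);
        }
        GapCheckSketch check = new GapCheckSketch();
        for (int i = 0; i < burstSize; i++) {
            check.onFragment(buffer, i * Long.BYTES);
        }
        return check;
    }

    public static void main(String[] args) {
        GapCheckSketch check = runBurst(10);
        System.out.println("gaps=" + check.gaps() + " unstableReads=" + check.unstableReads());
    }
}
```

With a single thread and an ordinary heap buffer both counters stay at zero, so this only provides a baseline for the check itself; it does not reproduce the cross-thread behavior reported here.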


System Configurations:

Both boxes are Haswell machines with a direct connection between them (no switch), running RHEL 7 with kernel 3.10.0-327.22.2.el7.x86_64.


Martin Thompson

Aug 20, 2016, 5:58:54 AM
to mechanical-sympathy
Thanks for reporting this.

I've raised an issue:


A more suitable place to have further discussion would be here:


The code for this in Aeron is quite simple and it should be fairly easy to identify what is going on.

Best,
Martin...

Martin Thompson

Aug 20, 2016, 6:27:03 AM
to mechanical-sympathy
Also, could you please update the GitHub issue with the JVM version and vendor, plus the Aeron version?


Thanks,
Martin...

On Friday, 19 August 2016 20:04:54 UTC+1, Dominik Konik wrote:

Avi Kivity

Aug 21, 2016, 3:33:59 AM
to mechanica...@googlegroups.com

Please post issues with Aeron to the Aeron mailing list or bug tracker, not the mechanical-sympathy mailing list.

