Question about SBE and DirectBuffer


Wayne

Sep 23, 2016, 10:10:38 PM
to mechanica...@googlegroups.com
Hello,

I ran into some very strange behavior when using SBE's Java generated stubs. The application is driven by a fork-join thread pool, and I have a ThreadLocal ByteBuffer for the SBE encoder and decoder. Very rarely, the encoded header is inconsistent while the encoded data is correct. To debug this I logged all the encoded bytes to a file stream, and found that the incorrectly encoded headers are sometimes all 0 and sometimes appear to belong to another SBE message. It looks as if the header was not written at all, so whatever was left in the buffer from last time is still there. The data part seems fine, though. Btw, I always clear the ByteBuffer before each use.

This problem happens when I use allocateDirect() for the ByteBuffer (as the ThreadLocal initialValue), and seems to disappear if I just use allocate(). Is allocateDirect() not thread safe, so that it hands the same off-heap memory to more than one of my thread-local buffers? Or could there be some low-level memory model issue that makes the Agrona UnsafeBuffer's putXXX methods ineffective?

Thanks a lot.
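
For readers following along, the setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain java.nio rather than SBE or Agrona, and the class name, the 4096-byte capacity, and the two-field "header plus body" layout are all hypothetical, not taken from the original application. One detail worth keeping in mind is that ByteBuffer.clear() only resets position and limit; it does not erase the bytes, so stale content from a previous message remains visible wherever a write did not land.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PerThreadBuffers {
    // Hypothetical capacity; the original post does not state one.
    static final int CAPACITY = 4096;

    // Each thread lazily gets its own direct buffer on its first call to get().
    static final ThreadLocal<ByteBuffer> BUFFER = ThreadLocal.withInitial(
        () -> ByteBuffer.allocateDirect(CAPACITY).order(ByteOrder.LITTLE_ENDIAN));

    // Writes a 2-byte "header" and an 8-byte "body" (made-up layout),
    // then reads the header back from the same thread-local buffer.
    static short encodeAndReadHeader(short templateId, long payload) {
        ByteBuffer buffer = BUFFER.get();
        buffer.clear();                 // resets position/limit only; bytes are NOT zeroed
        buffer.putShort(0, templateId); // absolute write at offset 0
        buffer.putLong(2, payload);     // absolute write at offset 2
        return buffer.getShort(0);
    }

    public static void main(String[] args) {
        short header = encodeAndReadHeader((short) 7, 42L);
        System.out.println("header read back: " + header);
    }
}
```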

Todd Montgomery

Sep 24, 2016, 11:36:27 AM
to mechanical-sympathy
It would be better to open an issue on the Agrona repo, where we can ask questions and answer them. I'm not quite sure offhand, but there are a number of possibilities that could cause this behavior.

In any case, this isn't really an issue for this mailing list.

-- Todd

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Wayne

Sep 24, 2016, 1:42:49 PM
to mechanical-sympathy
Thanks, Todd. I will open an issue there. So you don't think Java's ByteBuffer.allocateDirect() is the potential problem?

Simone Bordet

Sep 26, 2016, 4:39:00 AM
to mechanica...@googlegroups.com
Hi,

On Sat, Sep 24, 2016 at 7:42 PM, Wayne <wei...@gmail.com> wrote:
> Thanks, Todd. I will open an issue there. So you don't think java
> ByteBuffer.allocateDirect() is the potential problem?

Has the issue been opened? I'd like to follow the discussion.

Coincidentally, we have seen similar buffer corruption in Jetty, but
have not yet been able to pinpoint the cause.
It would be interesting to figure out whether it's a JVM issue.

What JVM version are you using?

--
Simone Bordet
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Wayne

Sep 26, 2016, 8:36:43 PM
to mechanical-sympathy
Hi Simone,

I made the ThreadLocal initialValue() method synchronized and the problem appears to have gone away so far. I am doing more tests before opening the issue in Agrona.

I am using Java 8u101 64-bit on Linux. What does your buffer problem look like? Can you replicate it?
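
The workaround mentioned above might look something like the sketch below. It is an assumption about the shape of the change, not the actual application code: the class name and the capacity parameter are hypothetical, and the thread does not establish why (or whether) serializing the allocation really helps.

```java
import java.nio.ByteBuffer;

// A ThreadLocal whose initialValue() is synchronized on the ThreadLocal
// instance, so only one thread at a time runs the direct allocation.
final class SynchronizedInitialBuffer extends ThreadLocal<ByteBuffer> {
    private final int capacity; // hypothetical; the post does not give a size

    SynchronizedInitialBuffer(int capacity) {
        this.capacity = capacity;
    }

    @Override
    protected synchronized ByteBuffer initialValue() {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        SynchronizedInitialBuffer buffers = new SynchronizedInitialBuffer(4096);
        // The same thread always sees the same buffer instance.
        System.out.println(buffers.get() == buffers.get());
    }
}
```

Note that the lock only covers the one-time allocation per thread; later reads and writes to the buffer are not synchronized by this change.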

Simone Bordet

Sep 27, 2016, 10:46:47 AM
to mechanica...@googlegroups.com
Hi,

On Tue, Sep 27, 2016 at 2:36 AM, Wayne <wei...@gmail.com> wrote:
> Hi Simone,
>
> I made the ThreadLocal initialValue() method synchronized and the problem
> appears to go away so far.

That's weird!

> I am doing more test before opening the issue in
> Agrona.

Ok, we are also trying to replicate.

Martin Thompson

Nov 15, 2016, 1:44:20 PM
to mechanical-sympathy
We have had another, similar issue raised in a single-threaded example. It seems that values written to a direct buffer sometimes read back later as zero, as if the writes never happened. I'm going to look into creating an example of this that is not SBE or Agrona specific, because it looks like a direct ByteBuffer issue.

Anyone else seen similar?
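
A library-independent check along the lines described above could be sketched as follows. This is not the reproducer from the thread, and it is an assumption that plain ByteBuffer.putShort exercises a comparable code path to the Unsafe-based writes Agrona performs. The class and method names are hypothetical. On a healthy JVM the mismatch count should be zero.

```java
import java.nio.ByteBuffer;

final class DirectBufferRoundTrip {
    // Repeatedly fills a direct buffer with a non-zero short at every even
    // offset, reads each value back, and counts reads that differ.
    static long countMismatches(int capacity, int iterations) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(capacity);
        long mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            short expected = (short) ((i % 32767) + 1); // in 1..32767, never zero
            for (int offset = 0; offset + Short.BYTES <= capacity; offset += Short.BYTES) {
                buffer.putShort(offset, expected);
            }
            for (int offset = 0; offset + Short.BYTES <= capacity; offset += Short.BYTES) {
                if (buffer.getShort(offset) != expected) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Many iterations give the JIT a chance to compile the loops.
        System.out.println("mismatches: " + countMismatches(4096, 200_000));
    }
}
```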

Vitaly Davidovich

Nov 15, 2016, 2:14:23 PM
to mechanica...@googlegroups.com
Could be this: https://bugs.openjdk.java.net/browse/JDK-8087134.

Are the failures happening when C1 is enabled (i.e. tiered compilation is enabled)?


Martin Thompson

Nov 15, 2016, 5:05:40 PM
to mechanical-sympathy
Not sure if it is related. The case we are seeing is with Unsafe.putShort and it does not result in a SIGSEGV.

Vitaly Davidovich

Nov 15, 2016, 5:16:18 PM
to mechanica...@googlegroups.com
If you read that bug report, it sometimes results in a SIGSEGV and sometimes in corruption whereby a 0 is read out rather than the stored value. The SIGSEGV is likely due to a broken address calculation resulting in access to an invalid page, whereas a 0 is a bogus access to a valid page. Which one you hit depends on the allocation pattern.

One of the linked bug reports is for an int, whereas the one I pasted is for longs. I don't know if shorts are affected the same way or not, but I see no reason why not.

Having said that, I don't know if it's related to your issue. But turning off tiered compilation and/or running with -Xint to see if it still reproduces is a worthwhile experiment.
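
When running that experiment it can help to log which mode each run actually used. The sketch below is one way to do it, assuming the flags are passed on the command line as -Xint or -XX:-TieredCompilation; the class and method names are hypothetical, and flags supplied by other means may not show up in this list.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JitModeReport {
    // Classifies a run from its JVM arguments (exact string match only).
    static String describe(List<String> jvmArgs) {
        if (jvmArgs.contains("-Xint")) {
            return "interpreter only (-Xint)";
        }
        if (jvmArgs.contains("-XX:-TieredCompilation")) {
            return "tiered compilation disabled";
        }
        return "default JIT settings";
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("Mode: " + describe(jvmArgs));
    }
}
```

If the corruption disappears under -Xint or with tiered compilation off, that points toward a JIT problem rather than application logic, though it would not be conclusive on its own.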


Fayanne King

Mar 9, 2017, 7:21:08 AM
to mechanical-sympathy
Hi,

We are getting the following exceptions when deserializing using SBE. We got the same issue on both 1.5.6 and 1.6.0 versions of SBE.

java.lang.IndexOutOfBoundsException: index=22, length=524370, capacity=1601
        at org.agrona.concurrent.UnsafeBuffer.boundsCheck0(UnsafeBuffer.java:1096) ~[aeron-all-1.0.4.jar:?]
        at org.agrona.concurrent.UnsafeBuffer.getBytes(UnsafeBuffer.java:823) ~[aeron-all-1.0.4.jar:?]

java.lang.IndexOutOfBoundsException: index=524396, capacity=1601
        at org.agrona.concurrent.UnsafeBuffer.boundsCheck(UnsafeBuffer.java:1087) ~[aeron-all-1.0.4.jar:?]
        at org.agrona.concurrent.UnsafeBuffer.getByte(UnsafeBuffer.java:778) ~[aeron-all-1.0.4.jar:?]

When we run unit tests on our local machine, everything serializes and deserializes properly. However, when we send the serialized message over the network, the deserializing process throws the exceptions above.

Messages are serialized and deserialized in a single thread and we are using a ThreadLocal.initialValue of new UnsafeBuffer(ByteBuffer.allocateDirect(4096*4)).

We are running on Java(TM) SE Runtime Environment (build 1.8.0_111-b14).

Thanks a lot for your help.



Martin Thompson

Mar 9, 2017, 3:42:59 PM
to mechanical-sympathy
If you can, it would be best to raise an issue on the GitHub repo with a repeatable test. Given the exception, this looks like a logic bug on your side. You should check where the length parameter comes from. This is not the best place to ask such questions.
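
As a rough illustration of checking where a length comes from, the sketch below validates a length read off the wire before using it. The layout (a 4-byte length prefix followed by that many bytes) and all names are hypothetical and are not the SBE wire format or the Agrona API; the point is only that a length larger than the remaining buffer is rejected with a clear message instead of surfacing later as an IndexOutOfBoundsException.

```java
import java.nio.ByteBuffer;

final class LengthCheck {
    // Reads a length-prefixed byte field at the given offset, validating the
    // decoded length against what the buffer can actually hold.
    static byte[] readLengthPrefixed(ByteBuffer buffer, int offset) {
        int length = buffer.getInt(offset);          // hypothetical 4-byte prefix
        int dataOffset = offset + Integer.BYTES;
        int available = buffer.limit() - dataOffset;
        if (length < 0 || length > available) {
            throw new IllegalStateException(
                "decoded length=" + length + " but only " + available
                    + " bytes available at offset " + dataOffset);
        }
        byte[] dst = new byte[length];
        ByteBuffer view = buffer.duplicate();        // leave the caller's position untouched
        view.position(dataOffset);
        view.get(dst);
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putInt(0, 3);
        buffer.put(4, (byte) 1).put(5, (byte) 2).put(6, (byte) 3);
        System.out.println(readLengthPrefixed(buffer, 0).length);
    }
}
```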