SIGSEGV calling RocksIterator.next() in the Java API

Bradley Yoo

Apr 29, 2021, 1:47:01 PM
to rocksdb
While using RocksIterator, we occasionally get a SIGSEGV with the following stack:


Stack: [0x00007fd4d21a2000,0x00007fd4d22a3000],  sp=0x00007fd4d22a0c58,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x16ac70]  __memcmp_sse4_1+0xcc0
C  [librocksdbjni788768896225669227.so+0x518c7f]  rocksdb::BlockBasedTableIterator::CheckDataBlockWithinUpperBound()+0x8f
C  [librocksdbjni788768896225669227.so+0x518ef4]  rocksdb::BlockBasedTableIterator::InitDataBlock()+0x254
C  [librocksdbjni788768896225669227.so+0x519988]  rocksdb::BlockBasedTableIterator::FindBlockForward()+0x338
C  [librocksdbjni788768896225669227.so+0x519db2]  rocksdb::BlockBasedTableIterator::Next()+0xf2
C  [librocksdbjni788768896225669227.so+0x519e36]  rocksdb::BlockBasedTableIterator::NextAndGetResult(rocksdb::IterateResult*)+0x16
C  [librocksdbjni788768896225669227.so+0x403351]
C  [librocksdbjni788768896225669227.so+0x56ee43]  rocksdb::MergingIterator::NextAndGetResult(rocksdb::IterateResult*)+0x53
C  [librocksdbjni788768896225669227.so+0x368ece]  rocksdb::DBIter::Next()+0x36e
J 15196  org.rocksdb.RocksIterator.next0(J)V (0 bytes) @ 0x00007ffaa2ecdf70 [0x00007ffaa2ecdec0+0xb0]
J 15298 C2 alluxio.master.block.DefaultBlockMaster.generateBlockInfo(J)Ljava/util/Optional; (377 bytes) @ 0x00007ffaa2fcf61c [0x00007ffaa2fcdc20+0x19fc]
J 14396 C2 alluxio.master.block.DefaultBlockMaster.getBlockInfoList(Ljava/util/List;)Ljava/util/List; (64 bytes) @ 0x00007ffaa2bc1800 [0x00007ffaa2bc1580+0x280]
J 16401 C2 alluxio.master.file.DefaultFileSystemMaster.getFileInfoInternal(Lalluxio/master/file/meta/LockedInodePath;Lcom/codahale/metrics/Counter;)Lalluxio/wire/FileInfo; (391 bytes) @ 0x00007ffaa2330008 [0x00007ffaa232f840+0x7c8]
J 15564 C2 alluxio.master.file.DefaultFileSystemMaster.getFileInfo(Lalluxio/AlluxioURI;Lalluxio/master/file/contexts/GetStatusContext;)Lalluxio/wire/FileInfo; (748 bytes) @ 0x00007ffaa30c9e1c [0x00007ffaa30c8b40+0x12dc]
J 17710 C2 alluxio.master.file.FileSystemMasterClientServiceHandler$$Lambda$354.call()Ljava/lang/Object; (20 bytes) @ 0x00007ffaa2e389f4 [0x00007ffaa2e38460+0x594]
J 17093 C2 alluxio.RpcUtils.callAndReturn(Lorg/slf4j/Logger;Lalluxio/RpcUtils$RpcCallableThrowsIOException;Ljava/lang/String;ZLjava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object; (356 bytes) @ 0x00007ffaa1ee63f0 [0x00007ffaa1ee6200+0x1f0]


The code that the stack refers to is: 


but this doesn't contain any references to RocksIterator so I'm guessing it is actually


which is what gets called with mBlockStore.getLocations

I don't see anything odd about these methods (unless, perhaps, I can't call it with RocksUtils.toByteArray(id, Long.MAX_VALUE)?).


Here is the code for the relevant method, pasted in (sorry for the lack of proper formatting):

public List<BlockLocation> getLocations(long id) {
  byte[] startKey = RocksUtils.toByteArray(id, 0);
  byte[] endKey = RocksUtils.toByteArray(id, Long.MAX_VALUE);

  try (RocksIterator iter = db().newIterator(mBlockLocationsColumn.get(),
      new ReadOptions().setIterateUpperBound(new Slice(endKey)))) {
    iter.seek(startKey);
    List<BlockLocation> locations = new ArrayList<>();
    for (; iter.isValid(); iter.next()) {
      try {
        locations.add(BlockLocation.parseFrom(iter.value()));
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
    return locations;
  }
}



Thanks for the help!

Adam Retter

Apr 29, 2021, 5:27:12 PM
to Bradley Yoo, rocksdb
Which version of RocksDB?



--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk

Bradley Yoo

Apr 30, 2021, 12:57:33 PM
to rocksdb
It's 6.15.2

Adam Retter

Apr 30, 2021, 1:02:26 PM
to Bradley Yoo, rocksdb
Is there any opportunity to try the latest version first, please?

Bradley Yoo

Apr 30, 2021, 1:09:04 PM
to Adam Retter, rocksdb
That's going to be difficult because we don't have a reliable reproduction of this issue, and it was discovered in a deployment.

We had a similar crash before, but that was with a version that was more than 36 months old, so we upgraded to what was the latest version at the time.

Siying Dong

May 4, 2021, 5:42:12 PM
to Bradley Yoo, rocksdb

I suspect that after this line:

https://github.com/Alluxio/alluxio/blob/55cc3b70278672ca0768634c9d1439350faa7cf5/core/server/master/src/main/java/alluxio/master/metastore/rocks/RocksBlockStore.java#L155

the JVM considers the object created by “new ReadOptions().setIterateUpperBound(new Slice(endKey))” eligible for destruction. RocksIterator doesn’t hold a reference to the ReadOptions object; it only points to a native resource owned by the ReadOptions. The object could therefore be destroyed while iterator operations are still executing, and we would see a segfault.

 

Can you try holding a reference to the ReadOptions for the whole duration of the function and see whether the problem goes away?
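
One way to picture the suggestion is the sketch below. It deliberately does not use org.rocksdb: NativeBacked and BorrowingIterator are hypothetical stand-ins, and an explicit close() stands in for the JVM reclaiming the options object, which in the real case would free native memory rather than throw. The point is only the shape: the bound, the options, and the iterator are all declared in one try-with-resources, so each stays reachable and open until the loop is done.

```java
import java.util.ArrayList;
import java.util.List;

class ReadOptionsLifetimeSketch {
    /** Hypothetical stand-in for a Java wrapper that owns a native resource. */
    static class NativeBacked implements AutoCloseable {
        private final String name;
        private final List<String> log;
        private boolean open = true;

        NativeBacked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
            log.add("close " + name);
        }
    }

    /** Hypothetical iterator that only borrows state owned by its options. */
    static class BorrowingIterator extends NativeBacked {
        private final NativeBacked options;
        private int remaining;

        BorrowingIterator(NativeBacked options, int entries, List<String> log) {
            super("iterator", log);
            this.options = options;
            this.remaining = entries;
        }

        boolean isValid() {
            return remaining > 0;
        }

        void next() {
            // Native code would read freed memory here; the stand-in throws instead.
            if (!options.isOpen()) {
                throw new IllegalStateException("options released while iterator in use");
            }
            remaining--;
        }
    }

    /** Declares bound, options, and iterator together; returns the number of entries visited. */
    static int scanWithScopedOptions(int entries, List<String> log) {
        try (NativeBacked upperBound = new NativeBacked("slice", log);
             NativeBacked options = new NativeBacked("readOptions", log);
             BorrowingIterator iter = new BorrowingIterator(options, entries, log)) {
            int visited = 0;
            for (; iter.isValid(); iter.next()) {
                visited++;
            }
            return visited;
        } // resources are closed in reverse order: iterator, then options, then bound
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(scanWithScopedOptions(3, log) + " entries visited, " + log);
    }
}
```

In RocksJava, Slice, ReadOptions, and RocksIterator are all AutoCloseable, so getLocations could presumably take the same shape by declaring all three in its try-with-resources header instead of creating the Slice and ReadOptions inline.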

Bradley Yoo

May 5, 2021, 12:40:30 PM
to Siying Dong, rocksdb
Thank you for the response!

It is very surprising that such behavior is possible. I'll make the update and see if the problem goes away.
(Unfortunately, it took a month or more in the past to confirm that this problem still persisted after the RocksDB upgrade, so we won't have an update on this for a while.)

Siying Dong

May 5, 2021, 1:34:54 PM
to Bradley Yoo, Adam Retter, rocksdb

Adam, regardless of whether this is the issue, given how complex this is to reason about, do you think it makes sense for Java’s RocksIterator to keep a reference to ReadOptions to simplify things?
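
As a rough illustration of what that could look like (not how RocksJava is implemented; Options and OwningIterator are hypothetical stand-ins), an iterator that stores its options in a field keeps them strongly reachable for as long as the iterator itself is in use, even when the caller drops its own reference:

```java
import java.lang.ref.WeakReference;

class OwningIteratorSketch {
    /** Hypothetical stand-in for an options object backed by native memory. */
    static final class Options {
        final long upperBound; // exclusive bound, standing in for native state

        Options(long upperBound) {
            this.upperBound = upperBound;
        }
    }

    /** Hypothetical iterator that keeps a strong reference to its options. */
    static final class OwningIterator {
        private final Options options; // reachable for as long as the iterator is
        private long position;

        OwningIterator(Options options, long start) {
            this.options = options;
            this.position = start;
        }

        boolean isValid() {
            return position < options.upperBound;
        }

        void next() {
            position++;
        }
    }

    /** Walks the iterator to the end and returns how many entries were seen. */
    static int countEntries(OwningIterator iter) {
        int count = 0;
        for (; iter.isValid(); iter.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Options options = new Options(3);
        WeakReference<Options> probe = new WeakReference<>(options);
        OwningIterator iter = new OwningIterator(options, 0);
        options = null; // the caller no longer holds the options
        System.gc();    // a collection request cannot reclaim a strongly reachable object
        System.out.println("options still reachable: " + (probe.get() != null));
        System.out.println("entries: " + countEntries(iter));
    }
}
```

With this arrangement the caller would no longer need to reason about the lifetime of the options separately from the iterator.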

梅洋

Nov 28, 2021, 6:24:59 AM
to rocksdb
I ran into the same problem with Alluxio 2.6.2, in which this problem was supposed to have been fixed. Have you encountered it since then?