transactionDisable() corrupts storage


Markus Döring

Jul 12, 2015, 2:08:01 PM
to ma...@googlegroups.com
Hi,
I am building a custom object serializer that uses Kryo under the hood. It all works nicely, but when I run a small single-threaded test that does writes with transactions disabled, I always get a DataCorruption exception:

Exception in thread "main" org.mapdb.DBException$DataCorruption: Data were read beyond record size, check your serializer. Read size:521, expected size:35
 at org.mapdb.Store.deserialize(Store.java:380)
 at org.mapdb.StoreDirect.getFromOffset(StoreDirect.java:283)
 at org.mapdb.StoreDirect.get2(StoreDirect.java:270)
 at org.mapdb.Store.get(Store.java:224)
 at org.mapdb.HTreeMap.putInner(HTreeMap.java:897)
 at org.mapdb.HTreeMap.put(HTreeMap.java:853)

I am running Java 8 on a MacBook Pro with flash storage.
Could that be related to the flash drive, or am I doing something obviously wrong?
I have tried both MapDB 1.0.8 and 2.0-beta2, and the result is the same for both.

The code can be found here:

Scott Carey

Jul 13, 2015, 12:56:36 PM
to ma...@googlegroups.com
It has nothing to do with the flash drive.

Does Kryo 'read ahead' when it deserializes? Are you properly flushing the serializer state before closing? The error suggests that either:

*  The serializer had not finished writing or flushing before the DB was closed, or the DB was not closed properly.
*  The deserializer is reading more than necessary (sometimes this is a performance optimization in deserializers that expect to keep reading from a stream).
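Both failure modes in the list above can be reproduced with plain java.io classes, without Kryo or MapDB. This is a minimal sketch under stated assumptions: BufferedOutputStream and BufferedInputStream stand in for a stream-oriented serializer's internal buffer, and the class name and the 512-byte buffer size are arbitrary. The first method shows written bytes stuck in an unflushed buffer; the second shows a buffered reader pulling many more bytes than the 4-byte value it decodes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class StreamPitfalls {

    /** Counts the bytes actually pulled from the wrapped stream. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        long count;

        CountingInputStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Bullet 1: returns {bytes in the target before flush, bytes in the target after flush}. */
    static int[] bytesVisibleBeforeAndAfterFlush() {
        try {
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(new BufferedOutputStream(target, 512));
            out.writeInt(42);             // still held in the 512-byte buffer
            int before = target.size();   // nothing has reached the target yet
            out.flush();                  // pushes the buffered bytes through
            return new int[] {before, target.size()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bullet 2: bytes pulled from `source` while decoding one 4-byte value. */
    static long bytesConsumedForOneInt(byte[] source) {
        try {
            CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(source));
            InputStream in = new BufferedInputStream(counting, 512);
            for (int i = 0; i < 4; i++) {
                in.read();                // the first read fills the whole buffer
            }
            return counting.count;        // far more than the 4 bytes the value needed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] flush = bytesVisibleBeforeAndAfterFlush();
        System.out.println("before flush: " + flush[0] + ", after flush: " + flush[1]);
        System.out.println("consumed for a 4-byte value: " + bytesConsumedForOneInt(new byte[1024]));
    }
}
```

main should print 0 and 4 for the flush case and 512 for the read-ahead case: the same kind of mismatch between bytes read and record size that the stack trace in the first message reports.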

Markus Döring

Aug 4, 2015, 5:11:05 AM
to ma...@googlegroups.com
Late thanks, Scott.

I was not handling the Kryo streams correctly, but I have since managed to incorporate a Kryo-based MapDB serializer into my main project.
It buffers the Kryo bytes, but it works fine, is thread-safe, and is very fast:

https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/main/java/org/gbif/checklistbank/cli/common/MapDbObjectSerializer.java
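The linked class is not reproduced here. As a rough sketch of the general "buffer the bytes" idea, the serializer below encodes the value into a byte array first and writes it with a length prefix, so the reader consumes exactly one record. ByteCodec is a hypothetical interface standing in for Kryo so that the example runs without dependencies; the serialize/deserialize methods take java.io DataOutput/DataInput but do not implement MapDB's Serializer interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a stream-oriented library such as Kryo. */
interface ByteCodec<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
}

/** Buffers the encoded bytes, then writes them with a length prefix. */
final class LengthPrefixedSerializer<T> {
    private final ByteCodec<T> codec;

    LengthPrefixedSerializer(ByteCodec<T> codec) { this.codec = codec; }

    void serialize(DataOutput out, T value) throws IOException {
        byte[] bytes = codec.encode(value);   // buffer the whole record first
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    T deserialize(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);                  // consumes exactly this record
        return codec.decode(bytes);           // the codec never sees the next record
    }
}

class LengthPrefixedDemo {
    static final ByteCodec<String> UTF8 = new ByteCodec<String>() {
        @Override public byte[] encode(String s) { return s.getBytes(StandardCharsets.UTF_8); }
        @Override public String decode(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    };

    /** Writes two records back to back and returns "first|second|bytesLeft". */
    static String roundTrip(String a, String b) {
        try {
            LengthPrefixedSerializer<String> s = new LengthPrefixedSerializer<>(UTF8);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            s.serialize(out, a);
            s.serialize(out, b);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            String first = s.deserialize(in);
            String second = s.deserialize(in);
            return first + "|" + second + "|" + in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Abies alba", "Picea abies"));
    }
}
```

The length prefix costs four bytes per record, in exchange for a self-delimiting format that does not depend on the store bounding the input.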

Cheers,
Markus


--
You received this message because you are subscribed to the Google Groups "MapDB" group.

Jan Kotek

Aug 7, 2015, 9:30:02 AM
to ma...@googlegroups.com

Hi Markus,

Sorry for the late response. There could be two problems:

1) The storage is not closed correctly and gets corrupted. Disabling transactions also disables crash protection, and there is a 90% chance of corruption after an unclean shutdown.

2) The Kryo serializer uses a 'stream' approach, i.e. it keeps reading as long as data are available. MapDB expects a serializer to read exactly as many bytes as it wrote, and relies on this instead of copying the data and enforcing limits. It is a performance optimization.
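The contract in point 2 can be illustrated with a self-contained sketch. It is not MapDB's Store code: the Deserializer interface, readRecord and the check are hypothetical, and the error text only imitates the exception from the first message. A store that hands the deserializer an unbounded view of its storage can detect a violation only afterwards, by comparing the bytes consumed with the record size:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class RecordContractCheck {

    /** Hypothetical deserializer shape for this sketch (not MapDB's interface). */
    interface Deserializer<T> {
        T deserialize(InputStream in) throws IOException;
    }

    /** Reads exactly four bytes, so it honors the contract. */
    static final Deserializer<Integer> EXACT = in -> new DataInputStream(in).readInt();

    /** Buffers before decoding and may pull up to 512 bytes, so it breaks the contract. */
    static final Deserializer<Integer> READ_AHEAD = source -> {
        InputStream in = new BufferedInputStream(source, 512);
        int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) throw new EOFException();
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    };

    /** Decodes the record of `size` bytes at `offset` and checks how much was consumed. */
    static <T> T readRecord(byte[] storage, int offset, int size, Deserializer<T> d) {
        try {
            int remaining = storage.length - offset;
            // The stream is deliberately NOT cut off at the end of the record.
            ByteArrayInputStream raw = new ByteArrayInputStream(storage, offset, remaining);
            T value = d.deserialize(raw);
            int consumed = remaining - raw.available();
            if (consumed != size) {
                throw new IllegalStateException(
                        "Read size:" + consumed + ", expected size:" + size);
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One 4-byte record followed by 60 bytes belonging to other records.
        byte[] storage = ByteBuffer.allocate(64).putInt(42).array();
        System.out.println(readRecord(storage, 0, 4, EXACT));
        try {
            readRecord(storage, 0, 4, READ_AHEAD);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For the 64-byte storage in main, EXACT returns 42, while READ_AHEAD fills its buffer from everything after the record and is rejected with "Read size:64, expected size:4".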

I think 2) is the most likely cause. For now, the solution is to wrap your serializer in something that copies the data on deserialization and enforces limits. No such wrapper exists yet, except Serializer.CompressionWrapper.
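A minimal sketch of such a wrapper, assuming a hypothetical StreamDeserializer interface for the stream-oriented serializer. The (DataInput in, int available) parameters are modeled loosely on MapDB's serializer methods; check the Serializer interface of your MapDB version before adapting it. The wrapper consumes exactly `available` bytes from the store and lets the inner deserializer read ahead only within its private copy:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class CopyingWrapperDemo {

    /** Hypothetical stream-oriented deserializer; it may read ahead like a buffered input. */
    interface StreamDeserializer<T> {
        T deserialize(InputStream in) throws IOException;
    }

    /** Copies exactly the record, then lets the inner deserializer read only from the copy. */
    static final class CopyingWrapper<T> {
        private final StreamDeserializer<T> inner;

        CopyingWrapper(StreamDeserializer<T> inner) { this.inner = inner; }

        T deserialize(DataInput in, int available) throws IOException {
            byte[] copy = new byte[available];
            in.readFully(copy);                                       // consume exactly `available` bytes
            return inner.deserialize(new ByteArrayInputStream(copy)); // read-ahead stops at the copy's end
        }
    }

    /** Decodes a big-endian int, buffering (reading ahead) before decoding. */
    static int readIntWithReadAhead(InputStream source) throws IOException {
        InputStream in = new BufferedInputStream(source, 512);
        int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) throw new EOFException();
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    /** Decodes two adjacent 4-byte records; returns "first,second,bytesLeftInStore". */
    static String demo() {
        try {
            byte[] storage = ByteBuffer.allocate(16).putInt(7).putInt(9).array();
            DataInputStream store = new DataInputStream(new ByteArrayInputStream(storage));
            CopyingWrapper<Integer> wrapper =
                    new CopyingWrapper<Integer>(CopyingWrapperDemo::readIntWithReadAhead);
            int first = wrapper.deserialize(store, 4);
            int second = wrapper.deserialize(store, 4);
            return first + "," + second + "," + store.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

demo() returns "7,9,8": both records decode correctly and the 8 trailing bytes in the store are left untouched. The price is one allocation and copy per read, which is the overhead the optimization avoids.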

I expect to fix this behavior before the stable 2.0 release. This performance optimization will then only be available to 'trusted' serializers that are bundled with MapDB.

Regards,

Jan Kotek

--

Dmitriy Shabanov

Aug 7, 2015, 9:47:15 AM
to ma...@googlegroups.com
Hi Jan,

On Fri, Aug 7, 2015 at 4:29 PM, Jan Kotek <j...@kotek.net> wrote:

> I expect to fix this behavior before the stable 2.0 release. This performance optimization will then only be available to 'trusted' serializers that are bundled with MapDB.


Will it be possible to flag an external serializer as 'safe' for the performance optimization?

--
Dmitriy Shabanov

Scott Carey

Aug 10, 2015, 1:05:50 PM
to MapDB

Yeah, I think this should be the case; or, better than a flag, two different types. Users can then opt in to the optimization through the type they extend, and the Javadoc on the less safe serializer subtype can clearly specify how its contract differs from that of the safer ones.

In short, these are two different types of serializers with two different contracts.
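One way to express "two types, two contracts" in Java, as a hypothetical sketch (BoundedSerializer, TrustedSerializer and read are illustrative names, not MapDB API): implementing the subtype is the opt-in, and the store picks the copy-free path from the type instead of a flag.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class TwoContractsDemo {

    /** Safe contract: receives only the record's own bytes and may read them any way it likes. */
    interface BoundedSerializer<T> {
        T deserialize(InputStream recordBytes) throws IOException;
    }

    /**
     * Opt-in contract: receives the store's raw, unbounded stream and must read
     * exactly the bytes it wrote. Implementing this subtype is the opt-in.
     */
    interface TrustedSerializer<T> extends BoundedSerializer<T> { }

    /** How a store could pick the read path from the type rather than from a flag. */
    static <T> T read(DataInputStream store, int size, BoundedSerializer<T> s) throws IOException {
        if (s instanceof TrustedSerializer) {
            return s.deserialize(store);                       // fast path: no copy
        }
        byte[] copy = new byte[size];
        store.readFully(copy);                                 // safe path: bounded copy
        return s.deserialize(new ByteArrayInputStream(copy));
    }

    /** Reads ahead through a buffer, so it only qualifies for the safe contract. */
    static int readIntWithReadAhead(InputStream source) throws IOException {
        InputStream in = new BufferedInputStream(source, 512);
        int b0 = in.read(), b1 = in.read(), b2 = in.read(), b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) throw new EOFException();
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    /** Reads three adjacent 4-byte records; returns "a,b,c,bytesLeftInStore". */
    static String demo() {
        try {
            byte[] storage = ByteBuffer.allocate(12).putInt(1).putInt(2).putInt(3).array();
            DataInputStream store = new DataInputStream(new ByteArrayInputStream(storage));
            TrustedSerializer<Integer> exact = in -> new DataInputStream(in).readInt();
            BoundedSerializer<Integer> readAhead = TwoContractsDemo::readIntWithReadAhead;
            int a = read(store, 4, exact);
            int b = read(store, 4, readAhead);
            int c = read(store, 4, exact);
            return a + "," + b + "," + c + "," + store.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the stricter rule is attached to TrustedSerializer, its Javadoc is the natural place to state that implementations must never read past their own bytes.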
