How to get rid of a document that can't be deserialized? System.IO.InvalidDataException: Properties offset not valid


Andrej Krivulčík

Nov 13, 2020, 12:40:39 PM
to RavenDB - 2nd generation document database
During replication from a 3.5 server to a 5 server, one of the documents somehow got corrupted in the target database. Any attempt to open it or to do anything with it results in the following:

System.IO.InvalidDataException: Properties offset not valid
   at Sparrow.Json.BlittableJsonReaderObject.ThrowInvalidPropertiesOffest() in C:\Builds\RavenDB-Stable-5.0\50017\src\Sparrow\Json\BlittableJsonReaderObject.cs:line 1332
   at Sparrow.Json.BlittableJsonReaderObject..ctor(Byte* mem, Int32 size, JsonOperationContext context) in C:\Builds\RavenDB-Stable-5.0\50017\src\Sparrow\Json\BlittableJsonReaderObject.cs:line 90
   at Raven.Server.Documents.DocumentsStorage.Get(DocumentsOperationContext context, Slice lowerId, DocumentFields fields, Boolean throwOnConflict, Boolean skipValidationInDebug) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\DocumentsStorage.cs:line 1043
   at Raven.Server.Documents.DocumentsStorage.Get(DocumentsOperationContext context, String id, DocumentFields fields, Boolean throwOnConflict) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\DocumentsStorage.cs:line 1039
   at Raven.Server.Documents.Handlers.DocumentHandler.GetDocumentsByIdAsync(DocumentsOperationContext context, StringValues ids, Boolean metadataOnly) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\Handlers\DocumentHandler.cs:line 230
   at Raven.Server.Documents.Handlers.DocumentHandler.Get() in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\Handlers\DocumentHandler.cs:line 122
   at Raven.Server.Routing.RequestRouter.HandlePath(RequestHandlerContext reqCtx) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Routing\RequestRouter.cs:line 196
   at Raven.Server.RavenServerStartup.RequestHandler(HttpContext context) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\RavenServerStartup.cs:line 243


It can't be opened or deleted, a new document with its ID can't be created (to replace it), and it breaks replication within the cluster.

When trying to delete the whole collection:

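// Delete every document in the 'Cache`1' collection with a delete-by-query operation.
var operation = await session.Advanced.DocumentStore.Operations.SendAsync(
    new DeleteByQueryOperation(new IndexQuery { Query = "from 'Cache`1'" }));
// Wait up to one minute for the server-side operation to finish.
await operation.WaitForCompletionAsync(TimeSpan.FromMinutes(1));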

I get a different error:

Specified argument was out of the range of valid values. (Parameter 'System.ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks. (Parameter 'ticks')
   at System.DateTime..ctor(Int64 ticks)
   at Raven.Server.Documents.DocumentsStorage.ParseDocumentPartial(JsonOperationContext context, TableValueReader& tvr, DocumentFields fields) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\DocumentsStorage.cs:line 1393
   at Raven.Server.Documents.DocumentsStorage.GetDocumentsFrom(DocumentsOperationContext context, String collection, Int64 etag, Int64 start, Int64 take, DocumentFields fields)+MoveNext() in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\DocumentsStorage.cs:line 945
   at Raven.Server.Documents.CollectionRunner.ExecuteOperation(String collectionName, Int64 start, Int64 take, CollectionOperationOptions options, DocumentsOperationContext context, Action`1 onProgress, Func`2 action, OperationCancelToken token) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\CollectionRunner.cs:line 86
   at Raven.Server.Documents.Queries.QueryRunner.ExecuteDeleteQuery(IndexQueryServerSide query, QueryOperationOptions options, QueryOperationContext queryContext, Action`1 onProgress, OperationCancelToken token) in C:\Builds\RavenDB-Stable-5.0\50017\src\Raven.Server\Documents\Queries\QueryRunner.cs:line 265
The server at https://b.fs.ravendb.community responded with status code: InternalServerError.')

The document is a cached document and is not very important to us. Even the whole collection can be deleted if necessary. It's just a cache in this case.

Is there a way to delete the document directly from Voron storage, without the need to deserialize it?

I'll try to reproduce this behavior later, but right now we want to move forward with the migration as soon as possible, so just getting rid of the document in storage should be a sufficient solution.

I have already made several attempts to reproduce the issue, without success:
  • When I take the source document from the 3.5 database and create it manually in a 5 database, it works okay.
  • When I take the source document, create it manually in an empty 3.5 database and set up replication to a 5 database, it works okay.

Egor Shamanaev

Nov 15, 2020, 3:35:53 AM
to rav...@googlegroups.com
Hi

Try a query on @all_docs, filtered by ID:
from @all_docs where id() = 'ID'

Another option is to delete it on 3.5 and replicate the data to RavenDB 5 again.
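
A minimal sketch of the first suggestion, for running it from client code. It reuses the DeleteByQueryOperation call from the original post and only swaps in the @all_docs query. The CorruptedDocumentCleanup class and its method names are made up for illustration, and the ID is assumed to contain no single quotes. Whether the server can actually complete the delete for a document it cannot deserialize is not confirmed anywhere in this thread.

```csharp
using System;
using System.Threading.Tasks;
using Raven.Client.Documents;
using Raven.Client.Documents.Operations;
using Raven.Client.Documents.Queries;

// Hypothetical helper, not part of the RavenDB client.
public static class CorruptedDocumentCleanup
{
    // Builds the RQL from the suggestion above for a single document ID.
    // Assumption: the ID contains no single quotes, so no escaping is attempted.
    public static string BuildDeleteByIdQuery(string documentId)
    {
        if (string.IsNullOrEmpty(documentId) || documentId.Contains("'"))
            throw new ArgumentException(
                "Expected a non-empty ID without single quotes.", nameof(documentId));

        return $"from @all_docs where id() = '{documentId}'";
    }

    // Sends the query as a delete-by-query operation, as in the original post.
    public static async Task DeleteByIdAsync(IDocumentStore store, string documentId)
    {
        var operation = await store.Operations.SendAsync(
            new DeleteByQueryOperation(
                new IndexQuery { Query = BuildDeleteByIdQuery(documentId) }));

        // Same one-minute timeout as in the original post.
        await operation.WaitForCompletionAsync(TimeSpan.FromMinutes(1));
    }
}
```

With an initialized document store, this would be called as `await CorruptedDocumentCleanup.DeleteByIdAsync(store, "cache/1")`, where `cache/1` is a made-up ID standing in for the real one.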

--
Egor
Developer   /   Hibernating Rhinos LTD
Support:  sup...@ravendb.net
  

Grisha Kotler

Nov 15, 2020, 4:46:05 AM
to rav...@googlegroups.com
Can you send us that document?

Grisha Kotler
Team Leader   /   Hibernating Rhinos LTD
Skype:  grisha.kotler
Support:  sup...@ravendb.net
  


Andrej Krivulčík

Nov 15, 2020, 5:22:20 AM
to RavenDB - 2nd generation document database
We can send you the whole database for inspection.

The document itself seems to be okay; the issue appears to be related to compression being enabled. I'll send a detailed report about the sequence of events that led to this situation (probably tomorrow).

Andrej Krivulčík

Nov 18, 2020, 7:07:49 AM
to RavenDB - 2nd generation document database
For investigation, here's what happened:

  • The data was stored in 3.5.10-patch-35295; we migrated to 5.0.4.
  • Replication was set up from 3.5.10 to 5.0.4 (node A).
  • During the initial replication, collection compression was enabled on several collections.
  • Compaction of the database on the target cluster was started but was aborted after 15-20 minutes: the server was swapping heavily (56 MB/s written to the pagefile, etc.) and compaction showed no progress.
  • After that, the database servers were restarted several times to tweak indexing limits; otherwise the servers filled the 64 GB of RAM and started swapping heavily.
  • I *think* that this is when I set the amount of swap to 16 MB, to prevent swapping.
  • With these settings, the replication and indexes proceeded fine.
  • After some time, I noticed that the replication from 3.5.10 to node A was not proceeding. Replication from node A to nodes B and C didn't work either. Replication from B and C to other servers worked.
  • After checking which document was the last one successfully replicated, we identified the document that caused the issues. It was a "cache" document (our own implementation) which got created on the 3.5.10 server during normal web operation.
  • Any operation on that document failed with the error in the original post: it's not possible to open it, delete it, etc. The delete-by-query operation fails too.
  • While this was going on, the replication from 3.5.10 was reconfigured to go to node B.
  • Meanwhile, the cached document on the 3.5.10 server was recreated with a higher etag, so it didn't get replicated from 3.5.10 to node B.
  • When the replication reached the same problematic document, replication from 3.5.10 stopped working, and also replication from B to other nodes stopped working.
  • Replication of the same document (created manually, including metadata etc.) from 3.5 (different instance) to 5.0 worked correctly.
  • Replication of the same document within a 5.0 cluster worked correctly.
It seems that the issue is related to the compression.

I can privately send the following:
  • The document which caused the issue.
  • The whole database from node A from after the replication got stuck.
  • Logs from the Voron.Recovery tool, which contain information about possible corruption.
    • Example:
      2020-11-14T08:51:14.7761266, 1, Operations, Voron Recovery, Voron.Recovery.Recovery, Found invalid blittable document [TableId:-1] at pos=8069726272 with key=null
      System.ArgumentException: Data size is invalid, possible corruption when parsing BlittableJsonReaderObject (Parameter 'size')
         at Raven.Server.Documents.DocumentsStorage.ParseRawDataSectionDocumentWithValidation(JsonOperationContext context, TableValueReader& tvr, Int32 expectedSize)
         at Voron.Recovery.Recovery.WriteDocument(Byte* mem, Int32 sizeInBytes, BlittableJsonTextWriter writer, JsonOperationContext context, Int64 startOffset)


Egor Shamanaev

Feb 23, 2021, 8:23:25 AM
to rav...@googlegroups.com

Andrej Krivulčík

Feb 23, 2021, 11:05:40 AM
to RavenDB - 2nd generation document database
Thanks for the information, Egor.

We don't have the database in a compressed state anymore, so we can't easily determine whether the issue was fixed. We currently don't use compression, to avoid data corruption. If we try it out in the future, we'll keep in mind that this should be resolved.