Tips on how to purge a document without compacting


Alexander Selling

Oct 15, 2015, 2:48:36 PM
to Couchbase Mobile
Hi!

Since CBL sometimes fails to process _removed from SG, I've tried my best to find a way to get rid of corrupted documents. Just one corrupted document can kill the replicator, which completely prevents that user from using the app (his data has to sync to generate the reports he needs). Purging is out of the question since it requires a compact, and after only a day or two of usage my users have more data than is reasonable to run a compact on.

If I had access to the revs table it would be super easy: grab the JSON body in the revs table and remove the underscores. However, some db-function(?) is missing so I can't query the revs table, which only leaves me with the docs table. I've tried to just rename the docid, since it points to the offending row in the revs table, thinking that it would cut off the corrupted rev and leave it hanging in empty space. However, CBL seems to go through the revs table looking for underscores without doing a lookup in the docs table, so that didn't help at all.

Do you think it would be faster to compact the database or to create a new database and write all the latest revs into the new database?

Jens Alfke

Oct 15, 2015, 4:26:10 PM
to mobile-c...@googlegroups.com
On Oct 15, 2015, at 11:48 AM, Alexander Selling <alexande...@gmail.com> wrote:

Since CBL fails to process _removed from SG sometimes I've tried my best to find a way to get rid of corrupted documents.

I think you’re referring to issue #776? That was fixed in 1.1.1.

Purging is out of the question since it requires a compact,

No, it doesn’t. You can purge without running a compact.

and only after a day or two of usage my users have more data than what is reasonable to run a compact on.

If you don’t compact, the database storage will grow continuously, which is bad. Can you describe the problem you’re having with compaction?

If I had access to the revs table it would be super easy, grab the json body in the revs table and remove the underscores. However some db-function(?) is missing so I can't query the revs table,

CBL defines some custom SQL functions like the JSON collator. If you open the database in the sqlite3 command-line tool, there are some queries that won’t run there because of course that function isn’t defined.
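The effect can be reproduced outside CBL. Below is a minimal sketch using Python's standard sqlite3 module, not CBL itself. The collation name DEMO_JSON, the items table, and the comparator are all made up for the illustration and are not CBL's real names or logic. One connection registers the collation and creates a column that uses it. A second connection to the same file, which never registered it, can still read rows but fails on any query that needs to compare values in that column.

```python
import os
import sqlite3
import tempfile


def compare_text(a, b):
    """Hypothetical stand-in comparator; returns -1, 0 or 1."""
    return (a > b) - (a < b)


def run_demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

    # "App" connection: registers the custom collation before using it.
    app = sqlite3.connect(path)
    app.create_collation("DEMO_JSON", compare_text)
    app.execute("CREATE TABLE items (key TEXT COLLATE DEMO_JSON)")
    app.executemany("INSERT INTO items VALUES (?)", [("b",), ("a",)])
    app.commit()
    app_rows = [r[0] for r in app.execute("SELECT key FROM items ORDER BY key")]
    app.close()

    # "Command-line tool" connection: same file, collation never registered.
    tool = sqlite3.connect(path)
    # A query that needs no comparison on the column still runs.
    plain_rows = sorted(r[0] for r in tool.execute("SELECT key FROM items"))
    try:
        # Sorting on the column requires the missing collation.
        tool.execute("SELECT key FROM items ORDER BY key").fetchall()
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    tool.close()
    return app_rows, plain_rows, error


if __name__ == "__main__":
    print(run_demo())
```

The same reasoning applies to any function or collation that an application registers at runtime: it belongs to the connection that registered it, not to the database file.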

I've tried to just rename the docid since it points to the offending row in the revs table. Thinking that it would cut off the corrupted rev and leave it hanging in empty space, however CBL seems to be going through the revs table looking for underscores without doing a lookup in the docs table so that didn't help at all.

Please don’t grope into the SQLite database, since there’s no limit to what could go wrong. Obviously we can’t stop you, but we’re not going to offer help. If you want to make significant changes or additions to what CBL can do, it would be better to add them as new features in CBL itself.

Do you think it would be faster to compact the database or to create a new database and write all the latest revs into the new database?

I don’t know; you could try it easily enough. The latter is equivalent to doing a push replication into a new empty local database.

—Jens

Alexander Selling

Oct 16, 2015, 3:54:54 AM
to Couchbase Mobile


I think you’re referring to issue #776? That was fixed in 1.1.1.

I'm not sure. If you look at the JSON bodies in the revs table, lots of rows have _removed but the CBLLogger doesn't complain; it's only some documents that it fails to process. I'll give it a try though! I sent you an email a while ago with some databases that will reproduce it as soon as you start a replicator connected to them. It's the push replicator that fails, and it gives up when it detects illegal keys such as _removed (which isn't illegal, I know, but CBL thinks it is, sometimes).
 
No, it doesn’t. You can purge without running a compact.

That's pretty cool news! I'll try that right away, I've never gotten that to work before. I usually keep the revs table up and look at the json bodies and they only disappear after a purge, but I'm probably just doing something wrong.  


If you don’t compact, the database storage will grow continuously, which is bad. Can you describe the problem you’re having with compaction?

I don't really have problems with purging besides it being buggy when being initiated from a background thread via the "tellDatabaseNamed", it has to be run from the main thread to be stable. And it just takes too long to complete when the database gets large, I do run it at every login but at a certain point it just takes too long. 



Please don’t grope into the SQLite database, since there’s no limit to what could go wrong. Obviously we can’t stop you, but we’re not going to offer help. If you want to make significant changes or additions to what CBL can do, it would be better to add them as new features in CBL itself.

I won't need to once the CBLReplicator no longer dies from _removed being flagged as an illegal key. I would prefer to stay as far away as possible from the underlying storage mechanisms.

Don't even get me started on what CBL does when you insert an illegal documentID :P If you want to pull your hair out, insert a documentID with (null), e.g. project_(null)_123456789, and then have SG assign that documentID as a channel, which breaks SG's naming convention for channels.

Now there is no way you can fix this: CBL doesn't let you forget a document, and purge only removes the JSON bodies, not the documentID in the docs table. So your user is now completely unable to sync because he keeps getting HTTP 500 back from SG, and errors/warnings always kill the replicator. At the top of my wish list is a setting that makes the replicator skip that specific document but still replicate all the other changes, so that users don't lose all their data just because of one offending document.

The only way to fix this is to rename the docid in the docs table, which of course is super easy since I have write access to it, as opposed to the revs table.

It's obviously my fault that I'm inserting project_(null)_123456789 instead of project_projectName_123456789, but when things go wrong it shouldn't be impossible to fix them (which it wasn't in this case; the _removed key getting flagged as illegal, however, was).


I don’t know; you could try it easily enough. The latter is equivalent to doing a push replication into a new empty local database.

True that! But if I can get purge to work without a compact this will be a non-issue! 


Jens Alfke

Oct 16, 2015, 1:00:56 PM
to mobile-c...@googlegroups.com
On Oct 16, 2015, at 12:54 AM, Alexander Selling <alexande...@gmail.com> wrote:

It's the Push replicator that fails, and it gives up when it detects illegal keys such as _removed (which isn't illegal I know but cbl thinks it is, sometimes)

Yes, that’s issue #776 that we fixed a few months ago and released in 1.1.1. (Oops, actually #671; #776 was the pull request for the patch.)

I don't really have problems with purging besides it being buggy when being initiated from a background thread via the "tellDatabaseNamed", it has to be run from the main thread to be stable.

Have you filed a bug report?

And it just takes too long to complete when the database gets large, I do run it at every login but at a certain point it just takes too long. 

In 1.2 we’re enabling background auto-compact in ForestDB, so you won’t need to worry about compaction. (ForestDB in 1.1.x needs manual compaction, but I think it’s still faster than with SQLite.)

Don't even get me started on what CBL does when you insert an illegal documentID :P If you want to pull your hair out, insert a documentID with (null) i.e. project_(null)_123456789 and then have SG assign that documentID as a channel, which breaks SGs naming convention for channels. 

That’s not an illegal document ID. It’s an illegal channel name, but those aren’t the same thing; using a docID as a channel name unmodified must be something your sync function did.

Now there is no way you can fix this, CBL doesn't let you forget a document, purge only removes the json bodies not the documentID in the docs table.

Whether there’s an entry in the docs table is irrelevant. The docs table is only a space-saving device that lets revisions refer to docIDs using a foreign key instead of a long string.
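As a rough illustration of that layout, here is a toy sketch in Python's standard sqlite3 module. Only the table names docs and revs come from this thread. The column names, the sample document ID, and the DELETE used as a stand-in for a purge are assumptions made for the example and do not describe CBL's actual schema or purge implementation. It shows revisions pointing at a short integer key instead of repeating the long docID string, and that a leftover docs row with no revisions yields nothing when revisions are looked up.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a simplified, assumed docs/revs layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE docs (
            doc_id INTEGER PRIMARY KEY,   -- short numeric key
            docid  TEXT UNIQUE NOT NULL   -- the long string document ID
        );
        CREATE TABLE revs (
            sequence INTEGER PRIMARY KEY AUTOINCREMENT,
            doc_id   INTEGER NOT NULL REFERENCES docs(doc_id),
            revid    TEXT NOT NULL,
            json     BLOB
        );
    """)
    cur = db.execute("INSERT INTO docs (docid) VALUES (?)",
                     ("project_demo_123456789",))
    doc_key = cur.lastrowid
    db.executemany(
        "INSERT INTO revs (doc_id, revid, json) VALUES (?, ?, ?)",
        [(doc_key, "1-aaa", b'{"n": 1}'), (doc_key, "2-bbb", b'{"n": 2}')],
    )
    return db, doc_key


def revisions_of(db, docid):
    """Return the revision IDs stored for a string docID, oldest first."""
    rows = db.execute(
        "SELECT revs.revid FROM revs JOIN docs ON revs.doc_id = docs.doc_id "
        "WHERE docs.docid = ? ORDER BY revs.sequence",
        (docid,),
    )
    return [row[0] for row in rows]


db, doc_key = build_toy_db()
before = revisions_of(db, "project_demo_123456789")
# Toy stand-in for a purge: drop every revision row of the document.
db.execute("DELETE FROM revs WHERE doc_id = ?", (doc_key,))
after = revisions_of(db, "project_demo_123456789")
leftover = db.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
print(before, after, leftover)
```

In this toy model the docs row survives, but since no revision refers to it any more, nothing that walks the revisions will ever see the document.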

If there is a bug where CBL keeps trying to push a purged document, please report it; I haven’t seen any issue like that come up.

On the top of my wish list is to have a setting that makes the replicator just skip that specific document but still replicates all the other changes so that users don't lose all their data just because of one offending document.

Alex, we can’t see your wish list. The real wish list for Couchbase Lite is the issue tracker on GitHub. And I think what you’re really asking for is fixes for issues that put a doc into a state that can break replication; again, please file bug reports if you want those fixed. (IIRC we fixed something like that in the past month or two, meaning the fix is in master and will be in 1.2.)

—Jens