Deleting local data


Nithin Haridas

Oct 31, 2014, 6:42:12 AM
to mobile-c...@googlegroups.com
Hi,

We have been using Couchbase Lite on iOS for analytics, where the device pushes data to the server (one way only).

A lot of the data that gets logged is no longer useful on the device once it has been replicated to the server.
So once in a while, we want to get rid of the local data.

Some approaches we are considering:

  1. Delete the local database, then re-create it and all connected replications.

    Concerns: Will the switch be clean?
    • a) Is re-creating the replication objects after re-creating the db sufficient, or is there more cached data to clean up?
    • b) How should incoming updates be handled? Should I keep a temporary db to write to and then copy it back once the new db is set up?
    • c) Will the operation be time-consuming? Can/should it be moved to a different thread?

  2. Purge the documents once replication has stopped.

    Concerns: Time consumption.
    • a) Since it blocks the database, it could be moved to a different thread while the app keeps running.
    • b) Incoming data would need to be held in session memory or a temporary db while the purge runs.
    • c) Is a compact operation recommended after the purge?
    • d) How can I check whether the purge is still in progress?

What is the recommended course of action in this case?
Is it easier to delete the database or to purge it? The key point is that once the data has been replicated to the server, we no longer care about it being on the device.




    Jens Alfke

    Oct 31, 2014, 3:13:21 PM
    to mobile-c...@googlegroups.com
    > On Oct 31, 2014, at 3:42 AM, Nithin Haridas <nithinh...@gmail.com> wrote:
    >
    > We have been using Couchbase Lite on iOS for analytics, where the device pushes data to the server (one way only).
    > There is a lot of data that gets logged that is not going to be useful once it is replicated on the server.

    This is actually a use case that I tell people Couchbase Lite isn't very well suited for. The replication protocol is designed to handle bidirectional replication, between many peers, of mutable documents that are frequently modified. Data collection is a much simpler application that doesn't need any of those features, so replication adds a fair amount of overhead.

    On the other hand, I don't know of any off-the-shelf engine that does only what you need (the store-and-forward part is still nontrivial) so CBL may still be worth it for you.

    > Delete the local database and re-create it and all connected replications.

    That's a fine way to do it, assuming the replication really is push-only; otherwise the new database will have to pull data from the server again. One workaround is to use two databases, with one of them containing only the ephemeral docs you want to send to the server. Then you only need to delete and recreate that one.

    > Purge the documents, once replication is stopped.

    As you point out, this can take a while. Even if you run it on a background thread, it blocks access to the database file while it's going on, so you can't do anything else with that database on any thread until it finishes. (This is a limitation of SQLite, which only allows a single writer.)
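The single-writer behaviour mentioned above can be reproduced with plain SQLite. This is only a sketch using Python's standard `sqlite3` module, not Couchbase Lite itself; the `docs` table and the function name are made up for illustration. One connection holds a write transaction (standing in for a long purge) while a second connection tries to write and fails right away with "database is locked".

```python
import sqlite3


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot write while the first
    connection holds an open write transaction on the same file."""
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are explicit.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0: fail immediately instead of waiting for the lock.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)"
        )
        # Take the write lock up front, as a long-running purge would.
        first.execute("BEGIN IMMEDIATE")
        first.execute("DELETE FROM docs")
        try:
            second.execute("INSERT INTO docs VALUES ('a', '{}')")
            blocked = False
        except sqlite3.OperationalError:
            # SQLite reports "database is locked" to the second writer.
            blocked = True
        first.execute("COMMIT")
        return blocked
    finally:
        first.close()
        second.close()
```

Running the purge on a background thread does not change this: the lock is on the database file, not on the thread.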

    The more frequently you purge, though, the less stuff there is to get rid of, so the less time it takes.

    Overall it sounds like the two-databases approach, where you frequently delete one, would be the best.
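As a rough illustration of the two-database idea, here is a sketch of an ephemeral store that is deleted and recreated only once everything in it has been sent. It uses a plain SQLite file as a stand-in; `EphemeralStore`, the `events` table and the `pushed` flag are all hypothetical and are not part of the Couchbase Lite API. In a real app the "everything has been pushed" check would come from the replicator, and the long-lived database would be a separate file that this code never touches.

```python
import os
import sqlite3


class EphemeralStore:
    """Hypothetical stand-in for the push-only analytics database."""

    def __init__(self, path):
        self.path = path
        self._open()

    def _open(self):
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, body TEXT, pushed INTEGER DEFAULT 0)"
        )
        self.conn.commit()

    def log(self, body):
        self.conn.execute("INSERT INTO events (body) VALUES (?)", (body,))
        self.conn.commit()

    def mark_all_pushed(self):
        # Stand-in for "the push replication has caught up".
        self.conn.execute("UPDATE events SET pushed = 1")
        self.conn.commit()

    def pending_count(self):
        row = self.conn.execute(
            "SELECT COUNT(*) FROM events WHERE pushed = 0"
        ).fetchone()
        return row[0]

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def recreate_if_drained(self):
        """Delete and recreate the file, but only if nothing is unsent."""
        if self.pending_count() > 0:
            return False
        self.conn.close()
        os.remove(self.path)
        self._open()
        return True

    def close(self):
        self.conn.close()
```

The sketch is single-threaded and ignores events that arrive while the file is being replaced; it only shows the guard (nothing pending) and the delete-then-recreate step.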

    —Jens

    Nithin Haridas

    Nov 1, 2014, 12:05:32 PM
    to mobile-c...@googlegroups.com

    Thanks, Jens, for your input.
    As you said, data collection is indeed a non-intensive use case that doesn't really need Couchbase.
    But its off-the-shelf functionality helps.
    Also, we want to test the waters before moving more critical data to Couchbase, where bidirectional replication will be in use.

    Our database is completely push-only for analytics, though.
    So delete and recreate does look like a viable option.
    This is what I intend to do:
    1) Check the file size and delete the database once it passes a certain threshold.
    2) Recreate the database and replication objects.
    3) Add all incoming data to a temporary db while the above process is going on.
    4) Copy the data from the temporary db to the recreated db.

    Does this sound good?
    Is the delete process fast enough that there is little risk from interruptions like app backgrounding?
    Really appreciate the help.
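The size check in step 1 could look something like the sketch below. It is generic file-system code, not a Couchbase Lite call; which files and directories make up the database on disk is an assumption the caller has to supply, and `should_recreate` and `threshold_bytes` are made-up names.

```python
import os


def total_size_bytes(paths):
    """Sum the on-disk size of the given files and, recursively, directories."""
    total = 0
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                total += sum(
                    os.path.getsize(os.path.join(root, name)) for name in files
                )
        elif os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def should_recreate(paths, threshold_bytes):
    """True once the database's files have grown past the threshold."""
    return total_size_bytes(paths) >= threshold_bytes
```

Paths that do not exist are skipped, so the check is safe to run right after the database has been deleted.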

    Jens Alfke

    Nov 3, 2014, 1:25:15 PM
    to mobile-c...@googlegroups.com

    > On Nov 1, 2014, at 9:05 AM, Nithin Haridas <nit...@odinix.com> wrote:
    >
    > 3) Add all incoming data to a temporary db while the above process is going on.
    > 4) Copy the data from temporary db to the recreated db.

    I think that's going to cause problems. Copying documents from one database to another without using the replicator won't preserve the replication history (and might even alter the revision ID), and that has the potential to confuse the replicator. In the worst case those copied documents could get replicated back up to the server as false conflicts, then pulled by other clients, causing a big mess.

    It would be best to keep the incoming docs in a separate database locally. They can live in the same database on the server; just point both local databases at the same server database, but in the pull replication add a local filter that only brings in the docs that you want (i.e. not the docs that your other db is pushing).
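To make the filter idea concrete, here is a minimal sketch of the kind of predicate such a filter would apply. It is plain Python, not the Couchbase Lite filter API, and it assumes the analytics docs carry a hypothetical `"type": "analytics"` field that tells them apart from everything else.

```python
# Hypothetical marker that the push-only database puts on its documents.
EPHEMERAL_TYPE = "analytics"


def wanted_by_pull(doc):
    """Keep every document except the ones the other local db is pushing."""
    return doc.get("type") != EPHEMERAL_TYPE


def apply_filter(docs, predicate):
    """Return only the documents the predicate accepts, in order."""
    return [doc for doc in docs if predicate(doc)]
```

For example, given docs with ids `e1` (type `analytics`), `s1` (type `settings`) and `u1` (no type), only `s1` and `u1` would pass. Whatever the real marker is, it has to be set when the analytics doc is created so the filter can rely on it.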

    —Jens