> On 10 Oct 2016, at 14:59, Bogdan Andu <bog...@gmail.com> wrote:
>
> yes, I know, but the CouchDB storage engine cannot optimize
> this while operating normally; the database is only optimized
> after compaction has finished.
>
> I presume that the entire btree is traversed to detect revisions and unused
> btree nodes.
>
> I have no revisions on documents.
>
> My case clearly leans toward the unused nodes.
>
> Couldn't those nodes be detected in a timely manner while inserting
> (appending to the end of the file) documents, and be deleted
> automatically?
we could do that, but then we’d open ourselves up to database corruption
during power, hardware, or software failures. There are sophisticated
techniques to safeguard against that, but they come with their own set
of trade-offs, one of which is code complexity. Other databases have
millions of lines of code in just this area, and CouchDB is <100kLoC total.
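
To make the trade-off concrete, here is a minimal sketch of the general
append-only commit pattern (illustrative Python, not CouchDB's actual
Erlang internals; append_commit is a made-up name):

    import os

    def append_commit(f, new_nodes: bytes, new_header: bytes) -> None:
        # Append the new btree nodes; nothing already on disk is touched.
        f.seek(0, os.SEEK_END)
        f.write(new_nodes)
        f.flush()
        os.fsync(f.fileno())  # nodes are durable before anything points at them
        # Only now write the header that makes the new tree visible. A crash
        # at any earlier point leaves the previous header, and with it a
        # fully consistent tree, as the last valid state in the file.
        f.write(new_header)
        f.flush()
        os.fsync(f.fileno())

Reclaiming dead nodes on the fly would mean overwriting live regions of
that same file in place; a crash mid-overwrite can then corrupt the only
copy of the tree. Compaction sidesteps this by writing a fresh file and
swapping it in once it is complete.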
> But I assume that the btree must be traversed every time an insert is
> done (or maybe traversed from just a few nodes above the last 100 or
> 1000 new documents).
Yes, for individual docs it is traversed each time; for bulk doc requests
with somewhat sequential doc ids, it is roughly once per bulk request.
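
A worked example (the numbers are illustrative only, assuming a btree of
depth 3): writing 1000 docs one at a time rewrites a 3-node root-to-leaf
path 1000 times, i.e. ~3000 appended nodes, while the same 1000 docs with
sequential ids in a single _bulk_docs request largely share their paths,
so the cost is closer to the number of leaves touched plus a handful of
interior nodes.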
> Now the problem is: why and how do those nodes become unused?
>
> What are the conditions under which the db produces dead nodes?
As soon as a document (or a set of docs in a bulk request) is written,
we append fresh copies of the btree nodes up that particular branch of
the tree and stop referencing the existing ones.
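
Conceptually (a simplified in-memory sketch in Python, not our Erlang
code; Node and rewrite_path are invented for illustration):

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Node:
        keys: tuple
        children: tuple = ()  # empty for leaf nodes

    def rewrite_path(node: Node, path: tuple, new_leaf: Node) -> Node:
        # Returns a new root: every node on the root-to-leaf path is a
        # fresh copy (appended, in file terms); subtrees off the path
        # are shared unchanged.
        if not path:
            return new_leaf  # the old leaf is now unreferenced
        i = path[0]
        child = rewrite_path(node.children[i], path[1:], new_leaf)
        kids = node.children[:i] + (child,) + node.children[i + 1:]
        return replace(node, children=kids)  # the old interior node is now dead

Every write therefore leaves the old path nodes behind as dead space in
the append-only file, and compaction later copies just the live tree
into a fresh file.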
> If you could manage to avoid this, I think you would have a
> self-compacting database.
>
> Just my 2 cents.
Again, this is a significant engineering effort. E.g., InnoDB does what
you propose, and it took hundreds of millions of dollars and ten years
to get up to speed and reliability. CouchDB does not have these kinds of
resources.
>
> just a side question: wouldn't it be nice to have multiple storage
> engines that follow the same replication protocol, of course?
We are working on this already :)