How to properly design documents: should I make smaller docs so that I don't end up with hundreds of revisions?


Matteo Grolla

Jul 21, 2016, 8:22:18 AM
to PouchDB
I have to decide between two designs:

1) simpler: multiple entities are contained within the same PouchDB document
2) each entity has its own PouchDB document

Design 1) may lead to hundreds of revisions per document, possibly passing 1000 revs within a few years; design 2) doesn't have this problem.

Should I go straight for 2), or is 1) still a good candidate?

What are the cons of having many revisions?
I had trouble when I deleted many hundreds of revisions of a doc: the browser became unresponsive.
Assuming I have hundreds of revisions that are not in conflict, if I compact the db and then delete the last rev I shouldn't have problems, right? Or is it still inefficient?
If I have hundreds of conflicting revisions I have to delete them one by one, and I did have problems in that case (which shouldn't happen).
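
On deleting conflicting revisions one by one: a minimal sketch, assuming the document is fetched with `db.get(id, { conflicts: true })`, of batching all the deletions into a single `bulkDocs` call. `conflictTombstones` is a hypothetical helper name, and whether batching avoids the unresponsive browser described above is untested.

```javascript
// Hypothetical helper: turns the `_conflicts` array that PouchDB returns from
// db.get(id, { conflicts: true }) into one payload for db.bulkDocs().
function conflictTombstones(doc) {
  // Each entry deletes one losing leaf revision; the winning revision is untouched.
  return (doc._conflicts || []).map((rev) => ({
    _id: doc._id,
    _rev: rev,
    _deleted: true,
  }));
}

// Usage (not executed here; needs a PouchDB instance `db`):
//   const doc = await db.get("some-id", { conflicts: true });
//   await db.bulkDocs(conflictTombstones(doc));
```

This only removes the losing revisions; merging their content into the winner first, if needed, is left to the application.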


Johannes Jörg Schmidt

Jul 21, 2016, 1:10:18 PM
to pou...@googlegroups.com
I prefer to design small, mostly immutable documents and use views to efficiently aggregate data from multiple documents.

The main benefits of using many small documents are:
- fewer conflicts
- improved replication performance (a similar optimisation for docs with many revisions requires the upcoming CouchDB 2.0)

More about document modelling in CouchDB: http://ehealthafrica.github.io/couchdb-best-practices/#document-modeling

Johannes


--
You received this message because you are subscribed to the Google Groups "PouchDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pouchdb+u...@googlegroups.com.
To post to this group, send email to pou...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pouchdb/0d1e92a4-0b69-41ee-8c08-25ae813d6503%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matteo Grolla

Jul 23, 2016, 4:46:49 PM
to PouchDB
Thanks for the advice and for the best-practices link.
I like the idea of small immutable docs, and I see that docs designed to have few revisions have less chance of conflicts and perform better.
(I ran a benchmark in CouchDB where I created 1000 conflicts on the same doc: after the first few hundred conflicts performance degrades a lot, and I think the situation for PouchDB is worse given the limitations of the browser environment.)
The main drawback is that I sometimes find it difficult to come up with an id, for example if I have to save a list of child entities B of a parent entity A:
with coarse-grained docs:
   I would model the Bs as elements of an array property of A, which is modelled as a single PouchDB doc
with fine-grained docs:
   every B is a doc and I'd need to find a suitable id; its prefix would be A's id, but how should I model the suffix? (a uuid?)
In general, using a business key as the doc id is difficult, since its values may change, so I may need to use uuids and rely on secondary indexes.
Having many small docs and relying on secondary indexes means more work on index rebuilds.
Having many small docs means more HTTP requests during replication.

Should one reason case by case about the right tradeoff between small immutable docs and large docs with many edits,
or is the way of small immutable docs a generally good path?
I'd really appreciate some more advice, examples, and best practices.

Thanks

Matteo Grolla

Jul 24, 2016, 8:22:28 AM
to PouchDB
Do you rely on views to aggregate the information on CouchDB or on PouchDB?
If I understood correctly, CouchDB is much more efficient at recomputing aggregates since it keeps partial results in a B-tree, while PouchDB always recomputes the reduce value from the map outputs.


Johannes Jörg Schmidt

Jul 28, 2016, 5:15:43 AM
to pou...@googlegroups.com
It's totally OK to use random uuids for your B documents (as part of the
subresource id, e.g. `parent-type:<parent id>:subresource-type:<uuid>`).
I don't fully understand what you mean by "relying on secondary
indexes". Do you need some kind of sorting that changes over
time?
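
A minimal sketch of that id scheme, assuming ids are joined with `:` and that neither the type names nor the parent id contain a `:`. `childId` and `childRange` are hypothetical helper names; the `\ufff0` suffix is the usual high-character trick for prefix ranges on `_id`.

```javascript
const { randomUUID } = require("crypto"); // Node.js; browsers expose crypto.randomUUID()

// Hypothetical helper: builds `<parentType>:<parentId>:<childType>:<uuid>`.
function childId(parentType, parentId, childType, uuid = randomUUID()) {
  return [parentType, parentId, childType, uuid].join(":");
}

// Hypothetical helper: key range that covers every child of one parent,
// suitable for spreading into the options of db.allDocs().
function childRange(parentType, parentId, childType) {
  const prefix = `${parentType}:${parentId}:${childType}:`;
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

// Usage (not executed here; needs a PouchDB instance `db`):
//   await db.put({ _id: childId("order", "42", "item"), name: "pencil" });
//   const res = await db.allDocs({ include_docs: true, ...childRange("order", "42", "item") });
```

With ids like these, all children of one parent can be read from the primary index on `_id`, so no secondary index is needed just to list them.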

For replication on the PouchDB side you can specify a batch size [1] to
control how many docs are batched into a single request. For small docs
a value of 1000 or more is often a good fit (the default is 100).
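
A minimal sketch of passing that option. `batch_size`, `live` and `retry` are documented PouchDB replication options; the `live`/`retry` values, the helper name `replicationOptions` and the remote URL in the usage comment are assumptions made for illustration.

```javascript
// Hypothetical helper: builds an options object for PouchDB.replicate() or db.sync().
function replicationOptions(batchSize = 1000) {
  return {
    live: true,            // assumption: keep replicating as changes arrive
    retry: true,           // assumption: reconnect after network errors
    batch_size: batchSize, // docs fetched/sent per request (PouchDB default: 100)
  };
}

// Usage (not executed here; needs PouchDB and a reachable remote database):
//   PouchDB.replicate(localDb, "http://localhost:5984/mydb", replicationOptions());
```

Larger batches mean fewer requests but more memory held per batch, so the right value depends on document size.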

In my experience small docs are generally a good path. Maybe you can
tell us a bit more about the concrete data you're modelling.

Johannes

[1]: https://pouchdb.com/api.html#options-5

Johannes Jörg Schmidt

Jul 28, 2016, 5:21:03 AM
to pou...@googlegroups.com
It depends. Mostly I'm interested in the full data of every document, so
I simply use view collation to get a list of all the documents that belong
together and transform them into the structure I need later.

Don't expect to get the data from CouchDB/PouchDB in exactly the
structure you need. CouchDB is good at filtering the documents you need
(range queries) and retrieving them in the correct order. Structure
transforms can easily be done client side, often even in a
memory-efficient streaming way.
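
A minimal sketch of such a client-side transform, assuming rows come back sorted by `_id`, with parent ids of the form `<type>:<id>` and child ids of the form `<type>:<id>:<childType>:<uuid>`. `nestChildren` and the id layout are assumptions for illustration, not something PouchDB provides.

```javascript
// Hypothetical helper: folds a flat list of rows (as returned by allDocs or a
// view query with include_docs: true) into parents with a `children` array.
function nestChildren(rows) {
  const byParent = new Map();
  for (const { doc } of rows) {
    const [type, id, childType] = doc._id.split(":");
    const parentId = `${type}:${id}`;
    if (!byParent.has(parentId)) {
      // Placeholder, in case a child arrives without its parent document.
      byParent.set(parentId, { _id: parentId, children: [] });
    }
    const parent = byParent.get(parentId);
    if (childType === undefined) {
      Object.assign(parent, doc); // the parent document itself
    } else {
      parent.children.push(doc);  // one of its child documents
    }
  }
  return [...byParent.values()];
}
```

Because each row is handled once and in order, the same loop could consume rows page by page instead of holding the whole result in memory.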

Johannes
