Repeated documents returned by text-search pagination.


Luca Morandini

Sep 19, 2022, 8:24:45 PM
to us...@couchdb.apache.org
Hi,

I added a text search index to a 4-node, Kubernetes-deployed,
clustered database and started querying it.

The queries work, but I noticed that a variable proportion (say, 1%-8%)
of the document IDs returned in batches through pagination
(using bookmarks) had already been returned by previous pages. The
duplicated IDs change somewhat at every run, hence the phenomenon
seems to be random.

I did not use stale in the requests, just the query, a limit set to
200, and the bookmark returned by the previous pagination response.
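
For anyone wanting to reproduce the check, here is a minimal sketch of the pagination loop described above. It assumes the standard search endpoint layout (/{db}/_design/{ddoc}/_search/{index}) and the "rows" and "bookmark" fields of its JSON response; the URL in the docstring, the function names, and the lack of authentication handling are illustrative assumptions, not the code I actually ran.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def fetch_search_page(base_url, query, limit=200, bookmark=None):
    """Fetch one page of search results as a dict.

    base_url is a hypothetical search endpoint such as
    http://localhost:5984/mydb/_design/myddoc/_search/myindex
    (no authentication is handled here).
    """
    params = {"q": query, "limit": limit}
    if bookmark:
        params["bookmark"] = bookmark
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_repeated_ids(fetch_page):
    """Follow bookmarks until a page is empty; report IDs seen more than once.

    fetch_page(bookmark) must return a dict with "rows" (each row has an
    "id") and "bookmark". It is injected so the loop can be tested offline.
    """
    counts = Counter()
    bookmark = None
    while True:
        page = fetch_page(bookmark)
        rows = page.get("rows", [])
        if not rows:
            break
        counts.update(row["id"] for row in rows)
        bookmark = page["bookmark"]
    return {doc_id: n for doc_id, n in counts.items() if n > 1}
```

Against a live cluster this could be called as find_repeated_ids(lambda bm: fetch_search_page(url, query, 200, bm)), where url and query are placeholders for the real endpoint and search expression.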

There are no errors in the log of either CouchDB or Clouseau.

Could someone shed some light on this?

Cheers,

Luca Morandini

Robert Newson

Sep 20, 2022, 3:42:01 AM
to user
Hi,

The bookmark encodes the "order" property of the last result from each shard range, and a query with a bookmark parameter is simply retrieving matches that come after those order values. If the database changes between queries (documents added, changed or removed) such that the overall ordering of search results also changes, it is normal to see search results repeated (a database change added an item to a previous page, pushing every later change further down the list) or missing (a database change removed an item from a previous page, moving everyone "up").

B.
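
The mechanism described in the reply above can be illustrated with a toy, single-shard simulation. This is only a sketch under simplifying assumptions: each document has one numeric order value, the bookmark is just the last (order, id) pair returned, and the helper names are made up for illustration. It does not reproduce how CouchDB or Clouseau actually encode bookmarks across shard ranges.

```python
def page_after(docs, last_order, limit):
    """Return up to `limit` (order, doc_id) pairs that sort strictly after last_order."""
    ordered = sorted((order, doc_id) for doc_id, order in docs.items())
    if last_order is not None:
        ordered = [item for item in ordered if item > last_order]
    return ordered[:limit]


def simulate():
    """Page through four documents while one of them changes between requests."""
    docs = {"a": 1, "b": 2, "c": 3, "d": 4}
    seen = []

    first = page_after(docs, None, 2)
    seen += [doc_id for _, doc_id in first]

    # Between requests, document "a" is updated and its order value
    # moves past the position recorded in the bookmark.
    docs["a"] = 5

    bookmark = first[-1]
    while True:
        page = page_after(docs, bookmark, 2)
        if not page:
            break
        seen += [doc_id for _, doc_id in page]
        bookmark = page[-1]
    return seen
```

In this toy run, document "a" is returned on the first page and again on the last one, because the bookmark only remembers a position in the ordering, not which documents were already delivered.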

Luca Morandini

Sep 20, 2022, 8:27:25 AM
to us...@couchdb.apache.org
On Tue, 20 Sept 2022 at 17:41, Robert Newson <rne...@apache.org> wrote:
>
> The bookmark encodes the "order" property of the last result from each shard range, and a query with a bookmark parameter is simply retrieving matches that come after those order values. If the database changes between queries (documents added, changed or removed) such that the overall ordering of search results also changes, it is normal to see search results repeated (a database change added an item to a previous page, pushing every later change further down the list) or missing (a database change removed an item from a previous page, moving everyone "up").

I do not think this applies to my case:
I executed a query for documents in a given "created_at" date range,
hence the results are not influenced by the additions of further
documents (which by necessity have a "created_at" date outside that
range).

Cheers,

Luca Morandini