collectionGroup().onSnapshot() never completes but get() works fine

124 views
Skip to first unread message

Mehdi Mulani

unread,
Mar 12, 2023, 12:52:41 AMMar 12
to google-cloud-firestore-discuss
I want to listen to document changes across a wide range of collections in our database. Right now I'm doing ~10 collectionGroup queries and using onSnapshot, essentially something like:

```
for (const collectionName of collectionsToWatch) {
  db.collectionGroup(collectionName).onSnapshot((snapshot) => handleSnapshotUpdates(collectionName, snapshot));
}
```

Each collectionGroups has ~10-50k documents and in practice, not all of these snapshots actually return a snapshot. Instead what we see if snapshots being returned for collectionGroups with 10-20k document but the larger collectionGroup snapshots never return.

Weirdly, if we change this code to use `db.collectionGroup(collectionName).get().then((snapshot) => handleSnapshotUpdates(collectionName, snapshot))`, each snapshot returns pretty quickly.

Is this an issue with Firestore or should I be trying to listen to these types of document changes in another way?

In case it helps, I am using the Node API of @google-cloud/fire...@6.4.2.

Mark Duckworth

unread,
Mar 13, 2023, 4:37:55 PMMar 13
to google-cloud-firestore-discuss
I didn't see any published limits that may apply here. So, there could be an issue, but we will need some more information to confirm. Can you enable logging for the SDK and send us the logs when running one of the failing snapshot listeners? Use setLogFunction(...) to enable logging. Also, if you believe there is a bug, you can file it in the GitHub repo or just follow up here.

Since you asked about alternatives, perhaps you can query for documents that have been modified since a specific time? You may already have a timestamp in your documents you can query on, or you may want to look into something like `FieldValue.serverTimestamp()`. Alternatively, perhaps you can partition your larger collectionGroups into something smaller?

Mark

In case it helps, I am using the Node API of @google-cloud/firestore@6.4.2.

Mehdi Mulani

unread,
Mar 13, 2023, 4:44:08 PMMar 13
to google-cloud-firestore-discuss
Thanks for those suggestions.
I already tried using `setLogFunction` and waiting for a failure but never actually ran into a failure. Instead the logs just constantly showed messages like:

Firestore (6.4.2) 2023-03-12T05:23:30.105Z 1hVym [Watch.onData]: Processing change event
Firestore (6.4.2) 2023-03-12T05:23:30.105Z 1hVym [Watch.onData]: Received document change
Firestore (6.4.2) 2023-03-12T05:23:30.105Z 1hVym [Firestore.requestStream]: Received response: {"documentChange":{

which made me think possibly the backend is repeatedly sending the same data? Though I didn't try confirming this because of lack of time.

The timestamp idea would be good but sadly we don't store modification time yet, though we might go with that later.

Using query partitions seems like a good workaround for the time being, will try using that to see if/where this is stalling or hopefully it might just solve the problem.

Also happy to share the code we're using if that would help determine if this is a server side issue.

Mark Duckworth

unread,
Mar 14, 2023, 11:28:24 AMMar 14
to google-cloud-firestore-discuss
I attempted to reproduce this on a collection with 50k documents, but I did not reproduce it. Both methods of querying completed quickly. So, seeing your code could be helpful. Also, if you could describe your documents, or provide a mock, that could help. Do not include actual customer data.

Finally, I'm curious what happens if you only create one snapshot listener for one of the larger collections that is failing? Do you still see the failure? What do you see in the logs in this case?

Mark

Mehdi Mulani

unread,
Mar 14, 2023, 2:07:19 PMMar 14
to google-cloud-firestore-discuss
Here is the general structure of our collection and collection groups. I am querying one main collection called allDocs with onSnapshot, and then 11 different collection groups with onSnapshot. The collection groups are all subcollections of allDocs. This is the rough count of all the documents and what we are trying to query:

allDocs 50,000
collectionGroup1 50,000
collectionGroup2 20,000
collectionGroup3 20,000
collectionGroup4 10,000
collectionGroup5 10,000
collectionGroup6 5,000
collectionGroup7 5,000
collectionGroup8 5,000
collectionGroup9 3,000
collectionGroup10 3,000
collectionGroup11 3,000

Notably if I only query for some of the collection groups along with allDocs, the snapshot listeners actually complete. Possibly because we are loading less documents overall?

In terms of how we are querying the collections, it is pretty simple:
```
['collectionGroup1', ..., 'collectionGroup11'].forEach(collectionName => db.collectionGroup(collectionName).onSnapshot((snapshot) => handleSnapshotUpdates(collectionName, snapshot), error => {
    console.log(`Ran into an error with collection ${collectionName}.`, error);
    process.exit(1);
  }));
db.collection('allDocs').onSnapshot(snapshot => {
...
}, error => {
    console.log(`Ran into an error loading allDocs collection.`, error);
    process.exit(1);
  });
```

While running this script and waiting for all the collections to finish, the error handler is never called.

Sorry, I don't have an easy mock data/template to share but hopefully these details might help a bit more. I still want to try the collectionGroup partitioning idea to see if that narrows it down.

Mark Duckworth

unread,
Mar 16, 2023, 1:24:51 PMMar 16
to google-cloud-firestore-discuss
I attempted to reproduce the error with the same collection structure and document counts, however I was not able to reproduce the hang after a few attempts. In my case, all snapshots were raised successfully. As expected, the larger collections/groups were slower to raise a snapshot, so perhaps data is still loading when it appears hung? You stated that your logs show activity that look like data is still coming from the server.

Other factors that could differentiate our experiments are document size, connection speed, and the complexity of `handleSnapshotUpdates`. My snapshot handler was a very simple function that logged a count to the console.

Mark

Mehdi Mulani

unread,
May 23, 2023, 8:06:18 PMMay 23
to google-cloud-firestore-discuss
Sorry for the late followup, finally got around to looking at this again.

After investigating it further, I actually think there's a bug in the Firestore server implementation. From adding logging, I've observed that the same document is handled repeatedly and beyond a certain point, no new documents are received. (Note that the onSnapshot listener is never actually called either)

My code looks like this: (also uploaded this to https://gist.github.com/mmmulani/29dbbeb3f165b4ff992ad18492e694fe for easier reading)
```
const docsSeen = new Set<string>();
let savedBlob = undefined;
admin.firestore.setLogFunction((msg) => {
  const responseMarker = '[Firestore.requestStream]: Received response:';
  if (msg.includes('JIHdYtkYCRgs0SzCOyRe8HDID33') && msg.includes(responseMarker)) {
    const jsonText = msg.substring(msg.indexOf(responseMarker) + responseMarker.length + 1);
    const responseValue = JSON.parse(jsonText);
    console.log('Seen docs up to this point', docsSeen.size);
    if (savedBlob) {
      console.log('values match', equal(savedBlob, responseValue));
    }
    savedBlob = responseValue;
  }

  const dbMarker = '/databases/(default)/documents/';
  if (msg.includes(dbMarker)) {
    const docPath = msg.substring(msg.indexOf(dbMarker) + dbMarker.length, msg.indexOf('"', msg.indexOf(dbMarker)));
    docsSeen.add(docPath);

  }
});

db.collection('allDocs').onSnapshot(snapshot => {
  ...
}, error => {
  ...
});
```

And then in my console output:
```
Seen docs up to this point 12087
Seen docs up to this point 15934
values match true
Seen docs up to this point 15934
values match true
Seen docs up to this point 15934
values match true
```
which indicates that the Firestore backend keeps sending the same set of documents over and over to the client. In fact, when I changed the script to simply output the docPath of the received document, you can see them increment in the console until looping and starting over from the beginning.

One other oddity I noticed was that some of the documents received were not from the 'allDocs' collection but rather a subcollection, e.g. 'allDocs/<doc id>/subCollection/<sub doc id>', even though I'm only doing a .onSnapshot for allDocs.

It's pretty bold to claim that there's a bug in the backend but from what I'm looking at, it seems like the onSnapshot never completes (even after running it for 20 minutes), and from the console log, it looks like it'll never go beyond a certain point.

Mehdi Mulani

unread,
May 23, 2023, 8:15:24 PMMay 23
to google-cloud-firestore-discuss
Oh from further looking at the logs, it seems like I'm running into some stream error that happens right before @google-cloud/firestore retries the query from the start:

```
Firestore (6.4.2) 2023-05-24T00:12:10.147Z LCH6I [Firestore._initializeStream]: Received stream error: Error: 8 RESOURCE_EXHAUSTED: Resource has been exhausted (e.g. check quota).
    at callErrorFromStatus (node_modules/@grpc/grpc-js/src/call.ts:81:17)
    at Object.onReceiveStatus (node_modules/@grpc/grpc-js/src/client.ts:692:51)
    at Object.onReceiveStatus (node_modules/@grpc/grpc-js/src/client-interceptors.ts:417:48)
    at node_modules/@grpc/grpc-js/src/resolving-call.ts:111:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeBidiStreamRequest (node_modules/@grpc/grpc-js/src/client.ts:676:30)
    at ServiceClientImpl.<anonymous> (node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:225:29
    at node_modules/google-gax/src/streamingCalls/streamingApiCaller.ts:71:19
    at node_modules/google-gax/src/normalCalls/timeout.ts:54:13
    at StreamProxy.setStream (node_modules/google-gax/src/streamingCalls/streaming.ts:204:20)
    at StreamingApiCaller.call (node_modules/google-gax/src/streamingCalls/streamingApiCaller.ts:88:12)
    at node_modules/google-gax/src/createApiCall.ts:118:26 {
  code: 8,
  details: 'Resource has been exhausted (e.g. check quota).',
  metadata: Metadata {
    internalRepr: Map(1) { 'content-disposition' => [Array] },
    options: {}
  },
  progress: 'PROCESSED'
}
```

Mehdi Mulani

unread,
May 23, 2023, 10:17:58 PMMay 23
to google-cloud-firestore-discuss
Oh also filed a case with Firebase Support (#10231778) which has the relevant project ID and collection if you want to look closer.

Mark Duckworth

unread,
May 24, 2023, 12:33:58 PMMay 24
to google-cloud-firestore-discuss
Thanks for the update. Since you've filed a case with Firebase support, let's see what they are able to find in the server logs. I'm curious if they find the RESOURCE_EXHAUSTED error and can identify a cause.

Also can you clarify if the scenario producing this behavior has changed? Recently you mentioned "I'm only doing a .onSnapshot for allDocs", but previously you mentioned "I am querying one main collection called allDocs with onSnapshot, and then 11 different collection groups with onSnapshot".

Mark

Mehdi Mulani

unread,
May 24, 2023, 2:18:18 PMMay 24
to google-cloud-firestore-discuss
> Also can you clarify if the scenario producing this behavior has changed? Recently you mentioned "I'm only doing a .onSnapshot for allDocs", but previously you mentioned "I am querying one main collection called allDocs with onSnapshot, and then 11 different collection groups with onSnapshot".

I am still doing the larger query (one main collection and 11 collection groups) but I was seeing that the main collection always failed, so to make it easier to debug, I started just fetching the main collection with onSnapshot. Once we get that resolved, my end goal is to still do the larger query (one main collection and 11 different collection groups).
Reply all
Reply to author
Forward
0 new messages