Fwd: Entities not being returning in query result -- index bug?


Osvaldo Lopez Acuña

Jul 20, 2022, 3:26:06 PM
to bmu...@tethras.com, gcd-d...@googlegroups.com


---------- Forwarded message ---------
From: bmurr <bmu...@tethras.com>
Date: Friday, July 15, 2022 at 11:10:01 AM UTC-5
Subject: Entities not being returning in query result -- index bug?
To: Google App Engine <google-a...@googlegroups.com>


I was alerted by one of my users that some entries they are associated with were not appearing for them.

Indeed, when I query the datastore for these entities (with a single property filter), they are not returned. Once I write them back to the datastore, the index is updated and they are returned by the query.

Perhaps technically my query is not guaranteed to return the entities, as it is weakly consistent, but none of the entities were changed recently, and inconsistent results are usually resolved quite quickly (it has been several hours now).

So it seems like the index entries for this property on these entities were lost or damaged somehow. What to do? Wait and hope the index will be regenerated? I can write entities for this user to the datastore to regenerate the index...but doing it for all my users is not really an option.


David Gay

Jul 22, 2022, 12:17:44 PM
to Osvaldo Lopez Acuña, bmu...@tethras.com, Google Cloud Datastore
The most likely reason is that those entities stored the property used in the filter as unindexed (at which point they cannot be found via a query), and the update is storing them as indexed (so they can then be found in a query).

You can check if a given property of a particular entity is indexed or not by editing it in the Datastore entity viewer on the cloud console (the entity overview doesn't display whether properties are indexed or not, but pressing the Edit (pencil icon) button on a specific entity will show that entity in detail including the word "Indexed" beneath each property name that is indexed).
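
To illustrate why an unindexed property hides an entity from a query, here is a toy in-memory sketch. It is not the Datastore implementation: `ToyDatastore`, its methods, and the `owner` property are all hypothetical, and the `exclude_from_indexes` parameter name is only borrowed for readability. The one idea it models is that a query is answered from index entries alone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatastore:
    """Toy model (hypothetical): queries consult the index, never the entities."""
    entities: dict = field(default_factory=dict)  # key -> properties
    index: dict = field(default_factory=dict)     # (property, value) -> set of keys

    def put(self, key, props, exclude_from_indexes=()):
        # Drop index entries left over from an earlier version of this entity.
        for keys in self.index.values():
            keys.discard(key)
        self.entities[key] = dict(props)
        # Only properties not excluded from indexing get an index entry.
        for name, value in props.items():
            if name not in exclude_from_indexes:
                self.index.setdefault((name, value), set()).add(key)

    def query(self, prop, value):
        # An entity without an index entry cannot be found here.
        return sorted(self.index.get((prop, value), set()))


store = ToyDatastore()
store.put("entry-1", {"owner": "alice"}, exclude_from_indexes=("owner",))
store.put("entry-2", {"owner": "alice"})

before = store.query("owner", "alice")    # entry-1 exists but is not indexed
store.put("entry-1", {"owner": "alice"})  # rewrite, this time indexed
after = store.query("owner", "alice")
```

In this model `entry-1` is stored the whole time, but only becomes visible to the filter once the rewrite creates its index entry.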

--
David Gay


David Gay

Jul 22, 2022, 2:49:44 PM
to Ben Murray, Osvaldo Lopez Acuña, Google Cloud Datastore
Can you file a support ticket and/or provide project/query details? (but I won't be able to follow up for a week in the second case)

--
David Gay

On Fri, Jul 22, 2022 at 11:16 AM Ben Murray <bmu...@tethras.com> wrote:
Yes, the property is indexed and always has been.

Nothing has been changed on my side, yet this query now returns different results than it used to. It doesn't add up.

David Gay

Aug 2, 2022, 3:35:20 PM
to Ben Murray, Osvaldo Lopez Acuña, Google Cloud Datastore
Thanks to Ben for providing the full details to get to the bottom of this.

It turns out that the query filter was on a value of the App Engine "user" type. Sadly, while storing "user" values in the Datastore is allowed, the implementation has a number of pitfalls that often make this problematic.

In this specific case (referencing here the Python API's User type), the User.user_id() value is stored in the Datastore, and a filter on a user value requires both the email and the current user_id for that email (looked up at the time the query is run) to match what is stored (I'm ignoring auth_domain() here for simplicity).

However, these user_ids are not quite as stable as one might want. In this particular case, older entities have a missing user_id (stored as 0 in the Datastore) and the newer ones have a non-zero id, so the query finds only the latter. Rewriting the older entities also automatically updates the user_id to the current value, so those rewritten entities are then found by the query. At a guess, the user in question did not previously have a gmail account (hence the missing user_id), and now does.
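
As a rough illustration of the matching rule described above, here is a simplified model in plain Python. It is a sketch under stated assumptions, not how Datastore is implemented: `StoredUser`, `matches`, `query_by_user`, the example address, and the id `12345` are all made up, and auth_domain is ignored as in the explanation.

```python
from typing import NamedTuple


class StoredUser(NamedTuple):
    """Hypothetical stand-in for a stored user value."""
    email: str
    user_id: int  # 0 models a missing user_id


def matches(stored, email, current_user_id):
    # Both the email and the user_id looked up at query time must match.
    return stored.email == email and stored.user_id == current_user_id


def query_by_user(entities, email, current_user_id):
    return sorted(
        key for key, user in entities.items()
        if matches(user, email, current_user_id)
    )


entities = {
    "old-entry": StoredUser("user@example.com", 0),      # written before the id existed
    "new-entry": StoredUser("user@example.com", 12345),  # written with the current id
}

found_before = query_by_user(entities, "user@example.com", 12345)

# Rewriting the old entity stores the current user_id alongside the email.
entities["old-entry"] = StoredUser("user@example.com", 12345)
found_after = query_by_user(entities, "user@example.com", 12345)
```

The email is identical on both entities throughout; only the stored user_id decides whether the older one is returned.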

There isn't really a fix we can make here: any attempt to change how Datastore works will just break someone else. If you want to reliably look up users with a Datastore filter, you have to use something other than the user type (an obvious choice is the email address, but that has its own pitfalls for long-term use if particular email providers ever allow email reuse; identity is hard).
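
For what the email-address alternative might look like, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `owner_email` property name, the `owner_key` and `entries_for` helpers, and the choice to trim and lowercase the address (which may not be right for every provider). It only shows that an equality filter on a plain string does not depend on any user_id lookup; it does nothing about the email-reuse pitfall.

```python
def owner_key(email: str) -> str:
    # Assumed normalization: trim whitespace and lowercase the address.
    return email.strip().lower()


# Hypothetical entities that store the owner as a plain string property.
entries = [
    {"id": "old-entry", "owner_email": owner_key("User@Example.com")},
    {"id": "new-entry", "owner_email": owner_key("user@example.com")},
]


def entries_for(entries, email):
    # Equality on the stored string; no user_id is involved in the match.
    wanted = owner_key(email)
    return [entry["id"] for entry in entries if entry["owner_email"] == wanted]


found = entries_for(entries, " user@example.com ")
```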

Because of all this, we've effectively demoted the user type - you won't find it documented in the Cloud Datastore or Firestore APIs, though it can still be read or written if you really, really want it...

--
David Gay
