Incomplete index was built [was: Slow index build]

Eric Rannaud

Nov 30, 2009, 1:28:01 PM
to Google App Engine
After more than 12 hours of build time, an index over 500,000 entities
(200MB of data) finished building right after I submitted a message for
moderation asking for help (did somebody at Google do something?).
(App: 911pagers).

However, the index that was built is incorrect: the same GQL
query (e.g. in the admin console) returns different results depending
on whether it has an "order by" clause or not.

SELECT * FROM MessageS where number = '[004548018]'
SELECT * FROM MessageS where number = '[004548018]' order by id asc

The first query returns 4 results, the second query only 2.

The index in question is:
<datastore-index kind="MessageS" ancestor="false" source="auto">
  <property name="number" direction="asc"/>
  <property name="id" direction="asc"/>
</datastore-index>
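
For readers following along, here is a toy model of how the two queries can disagree. It assumes that the filter-only query is answered from the built-in single-property index on "number", while the ordered query is answered from the composite index above. The dictionaries, keys, and function names are made up for illustration and are not App Engine APIs.

```python
# Toy model: each index is a sorted list of rows pointing back at entity keys.
# All keys and values below are hypothetical.
entities = {
    "k1": {"number": "[004548018]", "id": 1},
    "k2": {"number": "[004548018]", "id": 2},
    "k3": {"number": "[004548018]", "id": 3},
    "k4": {"number": "[004548018]", "id": 4},
    "k5": {"number": "[000000001]", "id": 5},
}

# Single-property index on "number": complete.
number_index = sorted((e["number"], key) for key, e in entities.items())

# Composite index on (number, id): the rows for k2 and k4 are left out,
# simulating an incomplete index build.
composite_index = sorted(
    (e["number"], e["id"], key)
    for key, e in entities.items()
    if key not in ("k2", "k4")
)


def query_unordered(number):
    """Filter-only query, served from the single-property index."""
    return [key for n, key in number_index if n == number]


def query_ordered_by_id(number):
    """Filter plus 'order by id', served from the composite index."""
    return [key for n, _id, key in composite_index if n == number]


unordered = query_unordered("[004548018]")
ordered = query_ordered_by_id("[004548018]")
# Entities present in one result set but absent from the other.
missing = sorted(set(unordered) - set(ordered))
print(len(unordered), len(ordered), missing)
```

Diffing the two result sets, as in the last lines, is one way to spot which entities are absent from the composite index.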

I will delete the index and force a rebuild, but I'll give 6 hours to
Google people to look at it if they want to debug something. You can
contact me directly if needed.

Thanks.

Joshua Smith

Nov 30, 2009, 4:30:49 PM
to google-a...@googlegroups.com
Are you sure all 4 entities *have* an "id" field? I've been bitten by that when I added a new field. If you mention a field anywhere in the GQL that an entity is missing, then that entity will not show up in your query results.
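
A minimal sketch of that behavior, assuming an index only gets a row for an entity that has every property the index covers. The `build_index` helper and the sample entities are hypothetical, not part of the App Engine SDK.

```python
def build_index(entities, properties):
    """Build sorted index rows, skipping entities that lack any indexed property."""
    rows = []
    for key, entity in entities.items():
        if all(name in entity for name in properties):
            rows.append(tuple(entity[name] for name in properties) + (key,))
    return sorted(rows)


# Hypothetical data: k2 was written before the "id" field existed.
entities = {
    "k1": {"number": "[004548018]", "id": 1},
    "k2": {"number": "[004548018]"},
}

by_number = build_index(entities, ["number"])
by_number_and_id = build_index(entities, ["number", "id"])
print(len(by_number), len(by_number_and_id))
```

The entity without "id" appears in the index on "number" alone but not in the index on ("number", "id"), so only a query that mentions "id" would skip it.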

Prashant Gupta

Nov 30, 2009, 4:37:07 PM
to google-appengine
You may want to go through this discussion thread.

Eric Rannaud

Nov 30, 2009, 4:41:18 PM
to google-a...@googlegroups.com
On Mon, Nov 30, 2009 at 1:30 PM, Joshua Smith <Joshua...@charter.net> wrote:
> Are you sure all 4 entities *have* an "id" field.  I've been bitten by that when I added a new field.  if you mention a field that an entity is missing anywhere in the GQL, then that entity will not show up in your query results.

Yes, they all do have an id field. Note that the ordered query works
just fine on the development server, returning 4 results.

For now, my workaround is to retrieve the entities unordered and sort
them in the server code.
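
As an illustration only, that workaround could look roughly like the sketch below. It assumes the unordered query results are available as dictionaries with an "id" key; the sample values and the `sort_by_id` helper are hypothetical.

```python
from operator import itemgetter


def sort_by_id(entities):
    """Return the entities ordered by their "id" property, ascending."""
    return sorted(entities, key=itemgetter("id"))


# Hypothetical results of the unordered query, in arbitrary order.
unordered_results = [
    {"number": "[004548018]", "id": 30},
    {"number": "[004548018]", "id": 10},
    {"number": "[004548018]", "id": 40},
    {"number": "[004548018]", "id": 20},
]

ordered_results = sort_by_id(unordered_results)
print([e["id"] for e in ordered_results])
```

Because the sort happens after the fetch, it relies only on the filter-only query and not on the composite index.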

Eric.

Jason (Google)

Dec 4, 2009, 6:22:45 PM
to Google App Engine
Hi Eric. We're investigating this issue on our end. Looking into this
a bit deeper, I see that one of the entities that didn't appear in the
query results was written on the 27th and the second was written on
the 30th. I know that you had re-built your index a second time after
our chat on Wednesday, but when did you originally build the index?
Was it after you finished writing all of the entities or sometime
between the 27th and 30th?

Thanks,
- Jason

Eric Rannaud

Dec 4, 2009, 6:52:31 PM
to google-a...@googlegroups.com
On Fri, Dec 4, 2009 at 3:22 PM, Jason (Google) <apij...@google.com> wrote:
> Hi Eric. We're investigating this issue on our end. Looking into this
> a bit deeper, I see that one of the entities that didn't appear in the
> query results was written on the 27th and the second was written on
> the 30th. I know that you had re-built your index a second time after
> our chat on Wednesday, but when did you originally build the index?
> Was it after you finished writing all of the entities or sometime
> between the 27th and 30th?

I believe the chain of events is the following:

1- On the 27th, all MessageS entities were uploaded.
2- The first index was built.
3- Sometime after that, the MessageS class was updated to have a
"votes" field, without touching the content of the datastore itself.
4- Sometime after that (the 30th, I assume), one of the entities was voted
on, and therefore was updated.
5- First index deleted.
6- Second index built.

It's possible that step 3 happened before step 2, but I don't think so.

This is weird: running the two queries today, ordered and unordered, now
returns 3 and 4 results respectively. It used to be 2 and 4 (with both
the first index and the second index). Something changed in the past 2
days. It's still wrong, but "less" wrong.

Thanks,
Eric.

Jason (Google)

Dec 4, 2009, 7:17:22 PM
to google-a...@googlegroups.com
Hi Eric. Yes, the ordered query now returns 3 results because we re-put one of the incorrect entities. If you re-put the other (either programmatically or via the Admin Console), you should see it appear when you execute the query. We're still working on determining why they weren't returned with the original index.

It's possible that several of your other entities are affected, and we'll try to clear that up for you soon. If you can't wait, you can always write a remote_api script that queries for all entities and re-writes each one in its own transaction, or you can continue sorting in memory until we repair the index on our end.
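
The shape of such a re-put pass might look like the sketch below. It is not remote_api code: `fetch_page`, `put`, and `FakeStore` are hypothetical stand-ins for whatever paged query and write calls the real script would use, and the in-memory store exists only so the loop can be run as written.

```python
def reput_all(fetch_page, put, page_size=100):
    """Re-write every entity unchanged, one write per entity.

    fetch_page(cursor, page_size) must return (entities, next_cursor),
    with next_cursor set to None once the last page has been returned.
    """
    cursor = None
    count = 0
    while True:
        page, cursor = fetch_page(cursor, page_size)
        for entity in page:
            put(entity)  # writing the entity again regenerates its index rows
            count += 1
        if cursor is None:
            break
    return count


class FakeStore:
    """In-memory stand-in for the datastore (hypothetical, for illustration)."""

    def __init__(self, entities):
        self.entities = list(entities)
        self.writes = []

    def fetch_page(self, cursor, page_size):
        start = cursor or 0
        page = self.entities[start:start + page_size]
        end = start + page_size
        next_cursor = end if end < len(self.entities) else None
        return page, next_cursor

    def put(self, entity):
        self.writes.append(entity)


store = FakeStore({"id": i} for i in range(250))
total = reput_all(store.fetch_page, store.put, page_size=100)
print(total, len(store.writes))
```

Paging through the entities keeps each request small, which matters for a kind with 500,000 entities.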

Thanks,
- Jason

Eric Rannaud

Dec 4, 2009, 7:24:18 PM
to google-a...@googlegroups.com
On Fri, Dec 4, 2009 at 4:17 PM, Jason (Google) <apij...@google.com> wrote:
> Hi Eric. Yes, the reason why you see 3 instead of 4 was because we re-put
> one of the incorrect entities. If you re-put the other (either
> programatically or via the Admin Console), you should see it appear when you
> execute the query. We're still working on determining why they weren't
> returned with the original index.
> It's possible that several of your other entities are affected, and we'll

It's indeed likely. Since I found that example by chance, it's
unlikely these are the only 2 out of 500,000.


> try to clear that up for you soon. If you can't wait, you can always write a
> remote_api script that queries for all entities and writes each in an
> individual transaction yourself or continue filtering in memory until we
> repair the index on our end.

It's alright, I'm sorting the results manually in application code.
The overhead is negligible with my current level of activity. This
will do fine for now.

Thanks.

Jason (Google)

Dec 10, 2009, 2:52:34 PM
to google-a...@googlegroups.com
Eric, we're currently tracking the issue in http://code.google.com/p/googleappengine/issues/detail?id=2481. Please star it so you will be notified when we make status changes.

- Jason
