GQL not giving full result set


Adhi

Nov 6, 2009, 7:03:16 AM
to Google App Engine, mani.do...@orangescape.com
Hi,
Sometimes I am not getting the complete result set from GQL even though
the missing records satisfy the condition. I have the proper indexes.
The query matches around 1300 records in total, so I'm not fetching
them in a single fetch; I'm using __key__ > last_record_key to
get them in batches.


Why does this anomaly happen? Is there anything I am missing here?

Adhi

Eli Jones

Nov 6, 2009, 9:13:48 AM
to google-a...@googlegroups.com
Always post a full code snippet.

Aren't you supposed to use Order By when paging by key?

Adhi

Nov 6, 2009, 12:53:57 PM
to Google App Engine, mani.do...@orangescape.com
Yes, I've tried using order by as well, but it gives a different
result set.
With order by I got only 842 records, but without order by I
got 1251, whereas the actual number of records is more than 1260.
And when I change the fetch size, I get a different count.

Here is my code...

def get_serialized_data(entityClass, params):
    query = entityClass.all()
    query.order('__key__')

    for filterColumn, filterValue in params.iteritems():
        query.filter(filterColumn, filterValue)
    limit = 400
    offset = 0
    totalLimit = 800
    lastRecordKey = None
    n = 0
    entities = query.fetch(limit, offset)
    while entities and offset <= (totalLimit - limit):
        lastRecordKey = entities[-1].key()
        n += len(entities)
        # My serialization code here
        offset += limit
        if len(entities) == limit:
            entities = query.fetch(limit, offset)
        else:
            entities = None
    entities = None
    return (n >= totalLimit, lastRecordKey)

def download_data():
    params = {'ApplicationId': applicationId, 'Deleted': False,
              'SheetMetadataId': 'Sheet003'}
    (moreRecords, lastRecordKey) = get_serialized_data(PrimaryData, params)
    while moreRecords:
        params['__key__ >'] = lastRecordKey
        (moreRecords, lastRecordKey) = get_serialized_data(PrimaryData, params)

download_data()

Each batch fetches 800 records. If I use q.fetch(800) it gives a
Timeout, so I've used offset.
The documentation at http://code.google.com/appengine/articles/remote_api.html
does not say to add order by for __key__, so I thought it was implicit.
That's why I initially tried without order by.
Am I doing anything wrong?

Now I'm trying to delete and recreate the indexes because of this
problem, but they are still in the deleting state.

Adhi
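
For reference, a minimal sketch of paging purely by key, with no offset involved. It runs against an in-memory dict rather than the datastore, so `fetch_after` and `iterate_all` are hypothetical names, not App Engine APIs; on the db API the same pattern would presumably be built from the calls already used in the post above (`query.order('__key__')`, a `'__key__ >'` filter, and `query.fetch(limit)`). It only illustrates the paging pattern and would not compensate for an index that is itself missing rows.

```python
# Hypothetical in-memory stand-in for a datastore kind: a dict of key -> entity.
def fetch_after(store, last_key, limit):
    """Return up to `limit` (key, entity) pairs with key > last_key, in key order."""
    keys = sorted(k for k in store if last_key is None or k > last_key)
    return [(k, store[k]) for k in keys[:limit]]


def iterate_all(store, batch_size=400):
    """Yield every (key, entity) pair once by resuming after the last key seen."""
    last_key = None
    while True:
        batch = fetch_after(store, last_key, batch_size)
        if not batch:
            break
        for key, entity in batch:
            yield key, entity
        last_key = batch[-1][0]  # resume strictly after this key


# Made-up data: 1260 entities with zero-padded, sortable keys.
store = {"key%04d" % i: {"n": i} for i in range(1260)}
seen = [k for k, _ in iterate_all(store, batch_size=400)]
```

Because each batch starts strictly after the last key of the previous one, the total does not depend on the batch size as long as the underlying key ordering is complete.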



Martin Trummer

Nov 7, 2009, 9:25:21 AM
to Google App Engine
just a guess:
"An index only contains entities that have every property referred to
by the index."
http://bit.ly/qiTBk

that may be the reason why you get a different number of results

Jason Smith

Nov 9, 2009, 12:28:26 AM
to Google App Engine
I have the same problem, which I wrote about on Stack Overflow but
received no response.

http://stackoverflow.com/questions/1691792/query-gqlquery-order-restricting-the-result-set

My models require the property in question and I manually confirmed
that they are all present, so it is not an issue of queries not
returning entities with missing properties. I am stuck with this
problem, and currently I am working around it by fetching all data and
sorting in memory. Fortunately I can get away with that as it's a
small data set and an infrequent query.

Adhi

Nov 9, 2009, 5:49:34 AM
to Google App Engine, mani.do...@orangescape.com
Martin,
Thanks for the info. But I explicitly created all the necessary
indexes.

When debugging this issue I felt there might be a problem in building
the indexes: if I just open and save a missing record, then that record
is also included in the result set.

So I think updating the record has something to do with the
query execution.
Any clues?

Adhi


Martin Trummer

Nov 9, 2009, 8:31:47 AM
to Google App Engine
I can imagine just one scenario that would lead to
this behaviour:

say, i have an entity that has a property A
I create 10 of these entities

then I update my application and add a property B
now I store another 10 of those entities to the datastore

in the datastore there are now 20 of these entities
10 have only property A
10 have properties: A, B

when you now build an index on property B, it will only
include the 10 entities that actually have the property B

now a query on all entities would give 20 results
the same query ordered by B would give only 10 results

when you now edit one of the old entities that only have
property A and save this entity, the saved entity will
also include property B, so that

now a query on all entities would give 20 results
the same query ordered by B would give only 11 results
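
The scenario above can be reproduced with a small in-memory model. This is only a sketch: `build_index` is a hypothetical helper that mimics the documented rule that an index contains only entities having the indexed property, and the entity names are made up.

```python
# Hypothetical in-memory model: an index on a property holds only the
# entities that actually have that property.
def build_index(entities, prop):
    """Return the keys of entities that have `prop`, sorted by its value."""
    rows = [(e[prop], key) for key, e in entities.items() if prop in e]
    return [key for _, key in sorted(rows)]


entities = {}
for i in range(10):
    entities["old%d" % i] = {"A": i}          # stored before B existed
for i in range(10):
    entities["new%d" % i] = {"A": i, "B": i}  # stored after B was added

all_count = len(entities)                        # query on all entities
ordered_by_b = len(build_index(entities, "B"))   # same query ordered by B

# Editing and saving one old entity under the new model gives it property B.
entities["old0"]["B"] = 0
ordered_by_b_after = len(build_index(entities, "B"))

print(all_count, ordered_by_b, ordered_by_b_after)  # prints: 20 10 11
```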

mani doraisamy

Nov 13, 2009, 2:09:05 AM
to Google App Engine
Martin,
In this case, we have a property called "ModifiedAt". This property
has neither been added recently nor is it missing in one of the
entities. But it still does not return consistent results. In fact,
when an additional AND condition is added to the query (for testing),
the entity starts appearing in the result, even though these entities
satisfy both the conditions.

thanks,
mani

mani doraisamy

Nov 27, 2009, 5:21:29 AM
to Google App Engine
I am not sure why this problem is going unattended. I think it is quite
serious: "Indexing seems to fail during heavy entity writes. Thereafter
the query simply does not return correct results."

The only way we were able to get this working was to read all 1
million entities out of the datastore and write them back on a live
"production server". If someone is interested in looking into this
problem, please let us know.

thanks,
mani
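
A rough sketch of the read-and-write-back workaround described above, done in fixed-size batches. Everything here is a stand-in: `FakeStore`, `chunks` and `rewrite_all` are hypothetical names, and the model simply assumes what was reported earlier in this thread, namely that saving an entity again makes it show up in query results.

```python
def chunks(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


class FakeStore(object):
    """Hypothetical stand-in that tracks which keys currently have index rows."""

    def __init__(self, keys, indexed):
        self.keys = sorted(keys)
        self.indexed = set(indexed)

    def put(self, batch):
        # Assumption taken from the reports in this thread: saving an
        # entity again rewrites its index rows.
        self.indexed.update(batch)


def rewrite_all(store, batch_size=100):
    """Re-save every entity in key order, one batch at a time."""
    rewritten = 0
    for batch in chunks(store.keys, batch_size):
        store.put(batch)
        rewritten += len(batch)
    return rewritten


# Made-up state: 1000 entities, only every second one present in the index.
store = FakeStore(keys=range(1000), indexed=range(0, 1000, 2))
count = rewrite_all(store, batch_size=100)
```

Walking the keys directly, rather than through a filtered query, matters in this situation, since a filtered query is exactly what may be skipping entities.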


Prashant Gupta

Nov 28, 2009, 11:24:52 AM
to google-appengine
same problem here...

following is my JDO class:

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class _Contact{
    
    @Persistent(primaryKey = "true")
    private String EmailID;
    
    @Persistent
    private String Name;
    
    @Persistent
    private List<String> Groups;
}


following is my test case:


        PersistenceManager pm = pmf.getPersistenceManager();

        Query query = pm.newQuery(_Contact.class);
        
        query.setOrdering("EmailID");
        query.setFilter("Groups.contains(\"mygroup\")");
        
        int i = 1;
        for(_Contact cont : (List<_Contact>) query.execute()){
            resp.getWriter().print(i++ + " " + cont.getID() + "<br>");
        }
        
        pm.close();


The code above printed 23 contacts, and when I replaced query.setOrdering("EmailID"); with query.setOrdering("EmailID desc"); it printed only 18 contacts.


This proves that the indexes are not working properly. I am stuck in the middle of development because of this bug, and nobody seems to be listening to this problem.

Prashant Gupta

Nov 29, 2009, 6:19:10 AM
to google-appengine
What about this: I had a datastore entity with a known id, and JDOQL simply failed to retrieve it by id, throwing JDOObjectNotFoundException. I think there is some major issue with the datastore/indexes.

Ikai L (Google)

Nov 30, 2009, 2:27:35 PM
to google-a...@googlegroups.com
Prashant, do you have sample data you can provide? It's even better if it isn't real data.


--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



--
Ikai Lan
Developer Programs Engineer, Google App Engine

Prashant Gupta

Nov 30, 2009, 3:48:02 PM
to google-appengine
Sorry, no.
For my first problem, I changed the class design and moved all the data to the new class.
For the second problem, I had very few entities, so I deleted them manually. Next time, if I find some, I will remember to keep a few for reference.

Dinesh Varadharajan

Dec 1, 2009, 6:40:42 AM
to Google App Engine
Ikai,
Unfortunately we don't have dummy data to showcase this. I have a
reproducible case in production.

The app id is os-dev.appspot.com.

if you execute the query

SELECT * FROM PrimaryData where ApplicationId =
'Application_1652c875_be0f_11de_b4a5_a3c424aa5af6' and SheetMetadataId
= 'Sheet001' and Deleted=False

it returns 8 records.

and if you execute

SELECT * FROM PrimaryData where ApplicationId =
'Application_1652c875_be0f_11de_b4a5_a3c424aa5af6' and SheetMetadataId
= 'Sheet001'

it should return at least 8 records (I am only removing a condition), but
it returns only 4 records.

Please let me know if you want to be added as developer to os-dev to
be able to access the datastore.

Dinesh
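
The expectation behind the two queries above is that dropping a filter can only add results, so every key returned by the narrower query should also come back from the broader one. A small sketch of that check follows; `missing_from_broader` is a hypothetical helper and the keys are made up, not data from os-dev.

```python
def missing_from_broader(narrow_keys, broad_keys):
    """Keys returned by the narrower query but absent from the broader one."""
    return sorted(set(narrow_keys) - set(broad_keys))


# Made-up keys for illustration only: 8 results with the extra filter,
# 4 results without it, mirroring the counts reported above.
with_deleted_filter = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]
without_deleted_filter = ["k1", "k2", "k3", "k4"]

suspect = missing_from_broader(with_deleted_filter, without_deleted_filter)
```

A non-empty result points at entities that one index returns and the other does not, which are the ones worth inspecting or saving again.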



Ikai L (Google)

Dec 2, 2009, 2:00:03 PM
to google-a...@googlegroups.com
Dinesh,

I see the inconsistency. How long have you been writing to this dataset? My suspicion is that the indexes may have been updated incorrectly in a previous release due to a bug that we have since addressed. Unfortunately, the bug fix may not have retroactively addressed the incorrectly updated indexes.

How large is the dataset? For small datasets, bulk exporting and importing will address the issue, but for large datasets, we'll have to look to an alternative solution.



Dinesh Varadharajan

Dec 3, 2009, 4:44:24 AM
to Google App Engine
Ikai,
Thanks for checking that out. We have already exported and imported close
to 1 million records to make it work again. We left a sample set of
data for Google to reproduce the issue.

However, we would like confirmation that this bug has been fixed and, if
possible, to know when it was fixed, since this had been happening quite
frequently last month.

Johann Rocholl

Dec 3, 2009, 8:40:39 AM
to Google App Engine
My previous response to this thread is not showing up on Google
Groups, sorry if this is a double post.

I think I'm seeing the same problem, and I made a simple page to
reproduce it. My App ID is scoretool, and the test page is at /dns/
test/ on the appspot domain for my App ID. I'm not including the URL
to prevent this message being marked as spam.

The page shows the results of a simple query, once using the ascending
__key__ index and once the new descending __key__ index that was
created less than 4 days ago. Both queries should return the same
results, but they don't. My dataset is large (total of 2 million
items, using 15 GB including metadata). My full source code is
available at /jcrocholl/scoretool on github.com.

Please let me know if I can do anything else to help diagnose this
issue.

Johann Rocholl

Dec 3, 2009, 3:31:34 AM
to Google App Engine
I think I'm seeing the same problem. I have recently (in the last 3
days) added descending indexes on __key__ for two models, and they
should return very similar entries because for each domains_domain
model instance, there should be a dns_lookup instance. It works okay
for __key__ ascending and for a separate string property called
"backwards" both ascending and descending, but the new descending
__key__ indexes return only partial results. It seems that maybe 40%
of items are missing from the descending __key__ indexes.

I have a rather large dataset (2,203,599 items, 15.7 GB including
metadata). The App ID is scoretool. You can see the problem if you
visit http://scoretool.appspot.com/dns/cron/ and hit refresh until you
get "Selector: before [random string]" and the text "Created [number]
missing DNS lookups:" followed by "(not really)".

My complete source code is available:
http://github.com/jcrocholl/scoretool/blob/master/dns/views.py
http://github.com/jcrocholl/scoretool/blob/master/prefixes/selectors.py

Please let me know if I can do anything else to help diagnose this
problem.

Johann

mani doraisamy

Dec 4, 2009, 10:19:33 AM
to Google App Engine
Ikai,
As Dinesh mentioned, we exported and imported the data using the remote API
(using some weird query) to get this working. Unfortunately, batch
export/import did not work either. So some clarity on this
issue would help us avoid surprises in the future:
- Bulk export/import is also based on the same set of queries, which
again do not return correct results. In cases such as these, when
the query itself fails, is there any other way to get data out? Is this
not a single point of failure? Shouldn't the bulk export/import be run
from inside the datacenter, directly on the application id/tenant,
without queries, without timeouts and without the 1000 items per query
limitation?
- Does this happen because of indexing failure during heavy entity
writes? If so, are they not atomic? Why does the query return
incorrect results even when the indexes are rebuilt from Datastore
administration? Are they 2 different things?
- As you can see from this thread, there was no response for 2
weeks when our application in production went down. Is there any
escalation/support planned in the future for production systems? At least
an acknowledgement of some form would let us try other alternatives.

Although bugs are understandable, this issue in a way represents one
of the inherent risks in the cloud:
- data on servers that you do not have control over (to take a backup or
restore; not really the usual "show me my data" stuff)
- openness that hasn't yet reached maturity (at least technically, it does
not work well enough to resort to an alternative)

So please ensure that the fallback options are also reliable.

thanks,
mani

