--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
If an entity type has more than 10000 items, you need to reindex for
this result and recount each time to get the proper item.
It seems you have not encountered such a problem.
In this situation, the indexes on the fields do not help, because the
bulk of data that has to be sorted is really big.
> > > http://code.google.com/appengine/articles/paging.html
On Dec 17, 12:20 am, Andy Freeman <ana...@earthlink.net> wrote:
Here is some pseudocode:
// load and delete one batch of up to 100 items
List<Item> itemsToDelete = loadOneHundredItems();
for (Item i : itemsToDelete) {
    delete(i);
}
// re-enqueue this task until a batch comes back empty,
// which means there is nothing left to delete
if (!itemsToDelete.isEmpty()) {
    QueueFactory.getDefaultQueue().add(TaskOptions.Builder.withUrl("/admin/delete-all"));
}
What kind of reindexing are you talking about?
Global reindexing is only required when you change the index
definitions (index.yaml). It doesn't occur when you add more entities
and/or have big entities.
Of course, when you change an entity, it gets reindexed, but that's a
constant cost.
Surely you're not planning to change all your entities fairly often,
are you? (You're going to have problems if you try to maintain
sequence numbers and do insertions, but that doesn't scale anyway.)
> > It seems you have not encountered such a problem.
> In this situation, the indexes on the fields do not help, because the
> bulk of data that has to be sorted is really big.
Actually I have. I've even done difference and at-least-#
(intersection and union are special cases - at-least-# also handles
majority), at-most-# (binary xor is the only common case that I came
up with), and combinations thereof on paged queries.
Yes, I know that offset is limited to 1000 but that's irrelevant
because the paging scheme under discussion doesn't use offset. It
keeps track of where it is using __key__ and indexed data values.
But what I am concerned about is statistics:
counting different fields for different uses.
That means we must count all the data and get the statistics in a
single query.
Moreover, the statistics may cover more than one field, with
different orderings between fields.
For the time being, App Engine cannot handle this: not only can we
not count the entities of a type, we also cannot count indexed fields.
So for a real and fairly big website that needs statistics to see the
condition of the site,
how can this be achieved using App Engine?
You may find that __key__ is of no use, because the filtered data
is ordered not by key
but by field values, and for that reason you need to loop over
queries, as you suggest.
But you will hit a timeout exception before the action has really
finished.
If you have an ordering based on one or more indexed properties, you
can page efficiently wrt that ordering, regardless of the number of
data items. (For the purposes of this discussion, __key__ is an
indexed property, but you don't have to use it or can use it just to
break ties.)
If you're fetching a large number of items and sorting so you can find
a contiguous subset, you're doing it wrong.
You're claiming that one can't page through an entity type without
fetching all instances and sorting them. That claim is wrong because
the order by constraint does exactly that.
For example, suppose that you want to page through by a date/time
field named "datetime". The query for the first page uses order by
datetime, while queries for subsequent pages add a "datetime <="
clause on the last datetime value from the previous page (for a
descending order; use "datetime >=" if the order is ascending) and
continue to order by datetime.
What part of that do you think doesn't work?
Do you think that Nick was wrong when he said that the time to
execute such a query depends on the number of entities?
You can even do random access by using markers that are added/
maintained by a sequential process like the above.
How do you expect App Engine to handle this problem?
What about one request with many of these actions?
If you need aggregations (average, median, total, etc), you have to
compute them incrementally or with an off-line process.
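To make "incrementally" concrete, here is a minimal sketch, assuming you only need a count, a total and an average. RunningStats and its methods are hypothetical names, not part of any App Engine API. In a real app the two numbers would live in a datastore entity that is updated along with each write. A median cannot be maintained this way and would still need the off-line process.

```java
// Hypothetical helper, not an App Engine class: keeps aggregates up to date
// at write time so that no request ever has to scan all the items.
class RunningStats {
    private long count;
    private double total;

    // Call once for every item that is stored.
    void add(double value) {
        count++;
        total += value;
    }

    // Call once for every item that is deleted.
    void remove(double value) {
        count--;
        total -= value;
    }

    long count() {
        return count;
    }

    double total() {
        return total;
    }

    // Average of the items currently stored; 0.0 when there are none.
    double average() {
        return count == 0 ? 0.0 : total / count;
    }
}
```

Reading the statistics is then a single small fetch, no matter how many items exist.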
> When even with the "datetime <=" filter you still get a big set, how
> can you handle it?
We're talking about paging through a dataset, presenting n (for small
n) elements at a time to a user.
If we're paging through by the value of a field with distinct values and
we want to present 20 results per page, the query for the first page
is "order by field" with limit 20. That query has a "last" result.
The query for the next page is "field > {last result's field value}
order by field", again with limit 20. That query also has a last
result so the form of subsequent queries should be obvious. (If
you've got other conditions, such as user id or key, you need to add
those as well.)
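The distinct-value case can be sketched as below. This is only an illustration: a TreeSet stands in for the datastore index on "field", and DistinctValuePaging and page are made-up names, not the App Engine query API. The point is that each page starts from the last value of the previous page, and no offset is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// In-memory stand-in for an index on "field"; not the App Engine API.
class DistinctValuePaging {
    // Returns up to `limit` values in ascending order that are strictly
    // greater than `lastSeen`. Pass null as `lastSeen` for the first page.
    static List<Integer> page(TreeSet<Integer> index, Integer lastSeen, int limit) {
        Iterable<Integer> range =
                (lastSeen == null) ? index : index.tailSet(lastSeen, false);
        List<Integer> out = new ArrayList<>();
        for (Integer v : range) {
            if (out.size() == limit) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<Integer> index = new TreeSet<>();
        for (int i = 1; i <= 50; i++) {
            index.add(i * 10);
        }
        Integer last = null;
        List<Integer> p;
        while (!(p = page(index, last, 20)).isEmpty()) {
            System.out.println(p);
            last = p.get(p.size() - 1); // bookmark for the next query
        }
    }
}
```

The bookmark is just the last result's field value, so it can be passed along in the "next page" link.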
Suppose that entities can have the same field value. If you don't
care how those entities are ordered, the first query's order by clause
can be "order by field, __key__", again limit 20. The next query
tries to pick up entities with the same field as the last result from
the previous query. It looks like "field = {last result's field's
value} and __key__ > {last result's key} order by __key__" and you
keep using it until it fails. You then use a query like the "next
page" query from the previous case. (I stopped mentioning limit
because the value depends on what you need to fill the current page.)
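Here is a rough sketch of the two-query scheme for duplicate field values. Everything in it (TiePaging, Item, nextPage) is a hypothetical name, and a list sorted by (field, key) stands in for the index. The loops scan the list only to keep the example short; the datastore would seek into the index instead.

```java
import java.util.ArrayList;
import java.util.List;

class TiePaging {
    // Hypothetical entity: an indexed field value plus a unique key.
    static final class Item {
        final int field;
        final String key;

        Item(int field, String key) {
            this.field = field;
            this.key = key;
        }
    }

    // Returns the page that follows `last` (null for the first page).
    // `sorted` must already be ordered by field, then by key.
    static List<Item> nextPage(List<Item> sorted, Item last, int limit) {
        List<Item> page = new ArrayList<>();
        if (last != null) {
            // Query 1: field = last.field AND key > last.key ORDER BY key
            for (Item it : sorted) {
                if (page.size() == limit) {
                    return page;
                }
                if (it.field == last.field && it.key.compareTo(last.key) > 0) {
                    page.add(it);
                }
            }
        }
        // Query 2: field > last.field ORDER BY field, key,
        // limited to whatever is still needed to fill the page.
        for (Item it : sorted) {
            if (page.size() == limit) {
                break;
            }
            if (last == null || it.field > last.field) {
                page.add(it);
            }
        }
        return page;
    }
}
```

The bookmark is the pair (field value, key) of the last result on the page.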
And you didn't get what I mean.
Because the offset is limited to 1000,
I cannot sort data by fields in results beyond a limited number of items.
Without the offset limit, we could do it easily.
There's always going to be a limit for scalable applications -
appengine just exposes it.
> Because the offset is limited to 1000,
> I cannot sort data by fields in results beyond a limited number of items.
Don't sort. Use indices. They can handle multiple fields.
Indices are the only way to build scalable applications.