>    * GAE cannot tell you how many total entities there are of a
> specific type and it can't count more than 1000 entities.
>    * GAE limits the entities that can be processed in a single
> transaction to those in the same entity group and only one request
> processing instance can write to an entire entity group at a time.
I fail to see how either of these could be implemented in a truly
scalable and distributed system such as App Engine. The data is
potentially spread over thousands of machines, so any kind of global
co-ordination (needed for both entity counts and transactions) would
require a single machine to be "in charge", and hence the scaling
would be limited to that machine's capacity.
If you care about counting the entities of a particular kind, or of all
kinds, simply increment a sharded counter each time you add an entity.
That would scale very well, and give you quick access to totals.
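
A minimal sketch of the sharded-counter idea, in plain Python with a
dict standing in for the datastore. The names fake_datastore, increment
and get_count, and the shard count of 20, are illustrative assumptions
and not App Engine APIs; on App Engine each shard would be a separate
entity updated inside its own transaction.

```python
import random

NUM_SHARDS = 20  # assumed shard count; tune it to the expected write rate

# Stand-in for the datastore: maps a shard key to its partial count.
# (Assumption: on App Engine each shard would be its own entity.)
fake_datastore = {}


def increment(counter_name, num_shards=NUM_SHARDS):
    """Add 1 to one randomly chosen shard of the named counter."""
    shard_index = random.randint(0, num_shards - 1)
    shard_key = "%s-%d" % (counter_name, shard_index)
    fake_datastore[shard_key] = fake_datastore.get(shard_key, 0) + 1


def get_count(counter_name, num_shards=NUM_SHARDS):
    """Sum every shard to obtain the total."""
    return sum(fake_datastore.get("%s-%d" % (counter_name, i), 0)
               for i in range(num_shards))


# Simulate adding 1000 entities of kind "Greeting".
for _ in range(1000):
    increment("Greeting")
print(get_count("Greeting"))
```

Writes spread over the shards, so no single entity becomes a write
bottleneck; reading the total costs one fetch per shard, and that
result can itself be cached.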
Dave.
> As far as transactions go, the global coordination is already there at
> the entity group level (if it wasn't, then there wouldn't be any
> transactions at all).  Sure it will be more costly at the entity level
> than at the group level, but it will still be less costly than
> layering the equivalent on top of the data store API in the Python
> application code.  Allowing transactions spanning entity groups,
> doesn't mean that entity groups and non-spanning transactions need be
> eliminated.  Leave it to the application developer to choose the
> appropriate transaction type as needed.
You're right, application-wide transactions don't have to subsume
entity-group transactions; you could, after all, put every entity in
your application in a single entity group. They are, however, still
inherently unscalable. Entity groups mean that global coordination is
*not* required, only that there is a "home" for each entity group. As
long as the entity groups are small, this responsibility can be split
across machines quite easily.
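
To illustrate the "home" idea, here is a toy sketch that assigns each
entity group to one machine by hashing its root key. This is only an
assumption made for illustration; it is not a description of how the
App Engine datastore actually places data, and the machine names and
the home_for helper are hypothetical.

```python
import hashlib

MACHINES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical hosts


def home_for(group_root_key, machines=MACHINES):
    """Pick the one machine that serializes writes for an entity group."""
    digest = hashlib.md5(group_root_key.encode("utf-8")).hexdigest()
    return machines[int(digest, 16) % len(machines)]


# Many small entity groups end up spread over all the machines.
groups = ["user:%d" % i for i in range(1000)]
load = {}
for group in groups:
    home = home_for(group)
    load[home] = load.get(home, 0) + 1
print(load)
```

Each group always maps to the same machine, so transactions within a
group need no coordination with any other machine. A single
application-wide group would put all of the load on one entry of that
table.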
Dave.
ALL the quotas are there to make you efficient; that is their whole
purpose. They are not hard-coded to make your life bad; they are in
there to make the system better.
I'll point you to Guido's latest talk on the issues you raise:
http://www.youtube.com/watch?v=CmyFcChTc4M&feature
The "admin" complaint seems like a Django junkie missing it rather
than a real complaint; if you want an admin, go write it.
The only place where I agree with you is the lack of long-running
processes, although the way you put it, it sounds more like
longer-than-now than really long-running.
> Aral
> >
>
Let's look at it from a performance perspective.
1- 1 MB data structure: which unit of data (leaving files aside)
should be bigger than 1 MB? IMO that's a badly designed datastore.
2- 1000-result query limit: which user is going to want 1000 results?
3- Short CPU: it is common knowledge that a user will leave a
page after 3 seconds of loading, so to eliminate this
bottleneck you use caching; after all, if it's intensive to compute,
it's worth caching.
4- Quotas in general: not even Google has enough machines for us to waste.
5- Admin: a Django junkie complaining about the lack of a UI.
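
Point 3 can be sketched as follows. This is a minimal in-process cache
with an expiry time, written in plain Python; the _cache dict stands in
for a shared cache such as memcache, and the cached and
expensive_report names are made up for the example.

```python
import time

_cache = {}  # stand-in for a shared cache: key -> (expiry_time, value)


def cached(key, compute, ttl_seconds=60):
    """Return the cached value for key, recomputing only after expiry."""
    now = time.time()
    entry = _cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]
    value = compute()
    _cache[key] = (now + ttl_seconds, value)
    return value


calls = []


def expensive_report():
    calls.append(1)  # record every real computation
    return sum(i * i for i in range(100000))


first = cached("report", expensive_report)   # computed
second = cached("report", expensive_report)  # served from the cache
print(len(calls))
```

Only the first request pays the CPU cost; later requests within the
TTL are answered from the cache.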
The only concern where I agree is file upload; we do need a facility
to upload videos, PDFs, images or whatever we want, but that is
being worked on, same with SSL.
Funny that the OP didn't mention SSL, as that IS a showstopper for a
LOT of applications:
http://code.google.com/p/googleappengine/issues/detail?id=15
> The query limit was given specifically in combination with the lack of
> expression power to select the records. Nobody wants to return 10,000
> records to the browser, but you have to be able to get the 50 records
> you do want to return. True, some applications know upfront what the
> exact key will be, but some applications need more dynamic querying.
> Also, the limit is hurtful because currently MapReduce can only be
> implemented as a series of successive calls.
>
Will you post an example where you need 1000 results to then narrow them
down to 50? This seems to me like a "joins" design, which is something
you shouldn't be doing in the datastore; it has been discussed several
times that you shouldn't use the datastore as a relational database. You
may disagree if you are strong on SQL, but denormalization is the first
step toward scalability.
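
A toy illustration of what denormalization means here, using plain
Python dicts instead of datastore entities (all names are invented for
the example): the author's name is copied onto each post at write time,
so rendering a post needs no second lookup.

```python
# Normalized: showing a post needs an extra lookup per author (a "join").
authors = {"a1": {"name": "Ada"}}
posts_normalized = [{"title": "Hello", "author_id": "a1"}]


def render_normalized(posts):
    return ["%s by %s" % (p["title"], authors[p["author_id"]]["name"])
            for p in posts]


# Denormalized: copy the author name onto the post when it is written.
def make_post(title, author_id):
    return {"title": title,
            "author_id": author_id,
            "author_name": authors[author_id]["name"]}


posts_denormalized = [make_post("Hello", "a1")]


def render_denormalized(posts):
    return ["%s by %s" % (p["title"], p["author_name"]) for p in posts]


print(render_denormalized(posts_denormalized))
```

The trade-off is on the write side: if an author changes their name,
every post carrying the copy has to be rewritten.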
> Likewise, the CPU was not discussed in the context of rendering a
> standard user page but in the context of background processing and/or
> report building.
>
Which are facilities that aren't available on the engine, so you can't
criticize an interface that was built to serve pages for being a bad
background processing tool; that would be like complaining about how bad
an Italian restaurant is at making hamburgers.
> Quotas were discussed with a lot of understanding, but also a lot of
> nuance. How do you actually respond to the remarks made? Let's get
> down to real applications.
>
I'm a minimalist at heart, and I strongly believe that any app
that goes over its quota does so because it
isn't efficient enough, with the sole exception of the max page views,
in which case Google has raised and will raise that bar for the
applications that demand it. Again from the talk, the top 5
applications running on GAE have had their quotas raised because they
really need it, not because some script kiddie is loading half his
datastore into memory on every request.
> Admin again was not about being able to use Django, but about how to
> do data transformation on your database. Do you think Google would be
> able to rebuild its index, or do any other part of its magic, without
> MapReduce? Admin is about the ability to prepare things before the
> user needs them, so that yes you can respond in subsecond turnarounds.
>
What? The admin (in the Django sense) is a way to fix errors in the
data and sometimes to let advanced users input data; you shouldn't be
manipulating a DB structure from a GUI. That is why we have migration
scripts and so many projects dedicated to them.
>> The only concern where I agree is file upload, we do need a facility
>> to upload videos or pdf or images or whatever we want, but that is
>> being worked out, same with SSL.
>
> Look, Google has declined to provide forward looking data. Yes, I was
> at Google IO and they ARE good guys and I believe them when they
> express their best intentions to meet specific targets (back then, at
> the beginning of the year). But not providing an official calendar
> means they are not putting their foot down, essentially that they
> don't know themselves. Google is asking us to use what is there, and
> only what is there (not that I wouldn't like a calendar, and that
> would reframe this discussion completely - but it simply isn't there).
> And there is little tangible evidence that warrants the faith that they
> will in fact have a big bang improvement within what has now become a
> really very short time frame. So my guess is that we'll see
> substantial delays relative to what was said back at Google IO.
Well, that's a social issue, and I wasn't there so I don't have
first-hand experience. But do keep in mind the same thing could be
said about any product. I'm not the person to answer this; they
are.
>
> And I like Google App Engine very much. Really. And yes, I do build
> real applications on top of it. And I do believe that many of their
> limitations stem from the need to scale.
>
> I feel that still leaves plenty of room for discussion, and especially
> for this blog post that makes a lot of very good arguments. Plus the
> author clearly uses and understands GAE, uses Python, isn't
> complaining about the designers' philosophy.
Isn't this what we are doing? My point is that most of the complaints
in this group come from people who want GAE to be X, Y and Z without
any criteria, especially in the limits discussion. Most people say
"you are Google, you are huge, give me your resources for free" instead
of sitting down with their code and fixing it.
> The good news is that by the end of *2009*, the world might be really
> interesting with GAE, and with competing platforms driving each other's
> features.
>
Maybe, my crystal ball is cloudy today.
> Filip
>
>
> >
>