I planned to migrate a Python app ... but my tests on Google App Engine
were really bad compared to SQLite or MySQL.
I just made a model for "users", with 4 fields, where each user can have
zero-to-many marks in another model, "marks", with 3 fields.
Then I tried to get the highest, the lowest and the average mark for each
user.
It is really, really slow on appspot.com compared to a real DB, where
it could be done in one select (select max(mark), min(mark),
avg(mark) ...)
(I filled my DB with 40 users, each with 10 marks.)
- Is there a trick to do this with the datastore, without parsing all
the data?
OK, I'm pretty sure that a page could be served at the same speed for
an Australian boy or a French boy ... but I asked myself if I can do
better? (Indexes are well defined.) Where can I be wrong?!
On Fri, Apr 11, 2008 at 7:06 PM, manatlan <manat...@gmail.com> wrote:
> I planned to migrate a python app ... but my tests on google engine > was really bad, compare to sqlite or mysql. > I had just make a model for "users", with 4 fields, which can have > zero-to-many marks in another model "marks" with 3 fields. > And try to get the higher, the lower and the average mark for each > user. > It is really really slow, on appspot.com, compare to a real db where > it could done in one select (select max(mark), min(mark), > avg(mark) ...) > (i filled my db with 40 users which have each one 10 marks)
> - is there a trick to do this with datastore, without parsing all > data ?
> ok, i'm pretty sure that a page could be served at the same speed for > a australia boy, or a french boy ... but I asked myself if a can do > better ? (index are well defined), where can I be wrong ?!
Instead of calculating the results at query time, calculate them when you are adding the records. This means that displaying the results is just a lookup, and that the calculation costs are amortised over each record addition.
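To make the idea concrete, here is a minimal sketch in plain Python of keeping the aggregates up to date at write time. The `MarkStats` class and its field names are made up for illustration and are not part of any App Engine API; the assumption is that these four values would be stored alongside each user and updated whenever a mark is added.

```python
class MarkStats:
    """Running lowest / highest / average, updated once per new mark.

    Hypothetical helper: in a real app these fields would live on the
    stored user record rather than on an in-memory object.
    """

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.lowest = None
        self.highest = None

    def add(self, mark):
        # Constant work per insert, regardless of how many marks exist.
        self.count += 1
        self.total += mark
        self.lowest = mark if self.lowest is None else min(self.lowest, mark)
        self.highest = mark if self.highest is None else max(self.highest, mark)

    @property
    def average(self):
        # No marks yet means no average.
        return self.total / self.count if self.count else None


stats = MarkStats()
for mark in (12, 7, 15):
    stats.add(mark)
print(stats.lowest, stats.highest, stats.average)
```

Displaying the three numbers is then a single lookup. Note that if marks can be deleted or edited, the lowest and highest values would need to be recomputed from the remaining marks.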
> Instead of calculating the results at query time, calculate them when
> you are adding the records.
Of course I had thought of that ... and if I migrate my app, I will
need to do that, sure.
But I simply asked why it's so much slower than another DB for this kind
of need,
and if there are tricks to replace avg(), min(), max().
On Sat, Apr 12, 2008 at 5:43 PM, manatlan <manat...@gmail.com> wrote:
> > Instead of calculating the results at query time, calculate them when
> > you are adding the records.
> Of course I had thought of that ... and if I migrate my app, I will
> need to do that, sure.
> But I simply asked why it's so much slower than another DB for this kind
> of need,
> and if there are tricks to replace avg(), min(), max().
It might almost look like a SQL DB when you squint, but it's optimised for a totally different goal. If you consider that each entity you retrieve could be coming from a different disk block on a different machine in the cluster, then suddenly things start to make sense. avg() over a column in a SQL server makes sense, because the disk accesses are pulling consecutive blocks from the same disk (hopefully), or even better, all from the same RAM on the one computer. With DataStore, which is built on top of BigTable, which is built on top of GFS, there ain't no such promise. Each entity in DataStore is quite possibly a different file in GFS.
So if you build things such that web requests are only ever pulling a single entity from DataStore - by always precomputing everything - then your app will fly on all the read requests. In fact, if that single entity gets hot - is highly utilised across the cluster - then it will be replicated across the cluster.
Yes, this means that everything that we think we know about building web applications is suddenly wrong. But this is actually a good thing. Having been on the wrong side of trying to scale up web app code, I can honestly say it is better to push the requirements of scaling into the face of us developers so that we do the right thing from the beginning. It's easier to solve the issues at the start, than try and retrofit hacks at the end of the development cycle.
On Apr 12, 11:02 am, "Brett Morgan" <brett.mor...@gmail.com> wrote:
> Yes, this means that everything that we think we know about building
> web applications is suddenly wrong. But this is actually a good thing.
> Having been on the wrong side of trying to scale up web app code, I
> can honestly say it is better to push the requirements of scaling into
> the face of us developers so that we do the right thing from the
> beginning. It's easier to solve the issues at the start, than try and
> retrofit hacks at the end of the development cycle.
This is all very fine; however, consider that many sites are
not meant to (and most probably never will) serve more than a few
hundred hits per day, and redeveloping them with this frame of mind
is impractical. But also consider that, according
to a small test I've run, it takes GAE almost 3 seconds to save a
miserable 50 dummy records (consisting of just 2 text fields) in the
datastore. This means, for example, that my application, which
occasionally needs to create several hundred records on a single
hit, just won't work: it hits the time limit allotted to a single
request and just fails.
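As a rough back-of-the-envelope check of that claim, the sketch below extrapolates from the measurement above (50 records in about 3 seconds). It assumes the cost scales linearly with the number of records, which has not been verified, and the figure of 300 records is only a stand-in for "several hundred".

```python
def estimated_save_seconds(records, measured_seconds=3.0, measured_records=50):
    """Extrapolate save time, assuming cost grows linearly per record."""
    per_record = measured_seconds / measured_records  # about 0.06 s each
    return records * per_record


# Hypothetical workload: 300 records created during one request.
print(round(estimated_save_seconds(300), 1))
```

Under those assumptions a single hit would spend roughly 18 seconds just on writes, which is why the request runs into the time limit.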
On Sat, Apr 12, 2008 at 2:38 AM, dimrub <dim...@gmail.com> wrote:
> This is all very fine, however, consider that many of the sites are > not meant (and most probably never will) serve more than a few > hundreds hits per day, and for them to be basically redeveloped using > this frame of mind is impractical. But also consider, that, according > to a small test I've run, it takes GAE almost 3 seconds to save in the > datastore a miserable 50 of dummy records (consisting of just 2 text > fields). This means, for example, that my application, that > occasionally needs to create several hundreds of records upon a single > hit, just won't work - it hits the time limit appointed to a single > hit, and just fails.
This might be a very naive observation, but I perhaps wonder then if
GAE is the tool for you.
As I see it the App Engine is for applications that are meant to
scale, scale and really scale. Sounds like an application with a few
hundred hits daily could easily run on traditional hosting platforms.
It's a completely different mindset.
I don't think that is really the point. The overall point is the speed of the Db right? He has a couple hundred hits at the click of a button now and is complaining about the speed... What about someone who has several thousand / hundred thousand or even millions...
-Randy
On Sat, Apr 12, 2008 at 11:06 AM, barryhunter <BarryBHun...@googlemail.com> wrote:
> This might be a very naive observation, but I perhaps wonder then if > GAE is the tool for you.
> As I see it the App Engine is for applications that are meant to > scale, scale and really scale. Sounds like an application with a few > hundred hits daily could easily run on traditional hosting platforms.
> It's a completely different mindset.
Again, maybe I am missing something, but the datastore isn't designed to
be super fast at the small scale, but rather to handle large amounts of
data and be distributed. (And because it's distributed it can appear
very fast at large scale.)
So you break down your database access into very simple processes.
Assume your database access is VERY slow, and rethink how you do
things. (Of course the piece of the puzzle 'we' are missing is
MapReduce! - the 'processing' part of the BigTable mindset)
On Apr 12, 6:02 pm, "Randy Johnson" <program...@cfconcepts.com> wrote:
> I don't think that is really the point. The overall point is the speed of
> the Db right? He has a couple hundred hits at the click of a button now and
> is complaining about the speed... What about someone who has several
> thousand / hundred thousand or even millions...
Hmm, don't know that I'd go quite that far....scalability is important
but it would also be nice for our web pages to be fast.
It's nice that a page will be no slower with a million users than with
one user, but if "no slower" means "it will still take 4 seconds
serverside" that's a little less satisfying.
On Apr 12, 3:33 pm, barryhunter <BarryBHun...@googlemail.com> wrote:
> Again, maybe I am missing something, but the datastore isn't designed to
> be super fast at the small scale, but rather to handle large amounts of
> data and be distributed. (And because it's distributed it can appear
> very fast at large scale.)
> So you break down your database access into very simple processes.
> Assume your database access is VERY slow, and rethink how you do
> things. (Of course the piece of the puzzle 'we' are missing is
> MapReduce! - the 'processing' part of the BigTable mindset)
Can you come up with some questions? I don't understand what you don't understand. So if you can begin by telling me a bit of a story about where you are at, understanding wise, I'll start tailoring for you. Help me here =)
On Sat, Apr 12, 2008 at 6:36 PM, flashpad <pablono...@gmail.com> wrote:
> could you re-explain this in just a little bit simpler english?
<program...@cfconcepts.com> wrote: > I don't think that is really the point. The overall point is the speed of > the Db right? He has a couple hundred hits at the click of a button now and > is complaining about the speed... What about someone who has several > thousand / hundred thousand or even millions...
> -Randy
Remember what GFS and BigTable were originally designed for. Each BigTable entry contained a whole web page, and all the data relating to that web page as the various stages of the google processing pipeline are applied to the page. So storing two numbers in a BigTable entry is like putting a person in a 747, then complaining how long it takes to get the person 50 feet along the ground in said 747 - it would be quicker to get the person to walk.
The power of BigTable comes to the fore when you fill the 747 with people, fire up the engines, and then get the aircraft to cruising altitude. That's when you are using the tool properly.
> Again maybe I am missing something, but the datastore isnt designed to > be super fast at the small scale, but rather handle large amounts of > data, and be distributed. (and because its distributed it can appear > very fast at large scale)
The large chunks of data are the important bit. Much, much larger than traditional RDBMS rows. By several orders of magnitude.
> So you break down your database access into very simple processes. > Assume your database acess is VERY slow, and rethink how you do > things. (Of course the peice in the puzzle 'we' are missing is > MapReduce! - the 'processing' part of the BigTable mindset)
On Sun, Apr 13, 2008 at 8:52 AM, DennisP <DennisBPeter...@gmail.com> wrote:
> Hmm, don't know that I'd go quite that far....scalability is important > but it would also be nice for our web pages to be fast.
> It's nice that a page will be no slower with a million users than with > one user, but if "no slower" means "it will still take 4 seconds > serverside" that's a little less satisfying.
I know it's going to sound glib, but if it is taking four seconds to render a page, you are using the tool wrong.
What we have to do here is together, as a group, start to explore all the ways of using this toolset, and come up with best practices on how to do things. So yes, seeing things like "it takes X seconds to save Y records" is important. It starts to give us all a feel for how not to do things.
But the next step is important. Exploring how to use it, instead of getting depressed that our initial attempts are wrong. We will keep at it until we get it.
On Sun, Apr 13, 2008 at 10:15 AM, DennisP <DennisBPeter...@gmail.com> wrote:
> Definitely agree with that...a wiki with best practices might be a > good idea...
I did some tests on the speed issues you guys are talking about.
What I suggest, at first, is to read some search engine papers
on boolean operators and search algorithms.
Some things I have noticed:
* The App Engine database model seems to be based on inverted indexes
* You only have the AND operator
* Each additional query is a performance hit
The overall performance for search is fast. I am doing some local
tests to compare it to other search engines I have written/supported, and
the speed seems good. Currently doing a test with 1 million records. Will
post benchmarks later; I need some additional data first.
One thing I must say: the App Engine database model is VERY different
from conventional relational databases and SQL; you must think the way
search engines think.
For example, knowing that we don't have an "OR" operator, I
implemented it the following way:
1. Query DB with first term
2. Query DB with second term
3. Merge-sort the two result sets based on relevancy (do your own
calculations if necessary)
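The three steps above could be sketched roughly like this in plain Python. The two input lists stand in for the results of the two separate queries; the `(score, doc_id)` tuple shape, the scores themselves and the `or_query` name are assumptions made for the example, not anything the datastore provides.

```python
import heapq


def or_query(first_hits, second_hits):
    """Emulate OR: merge two result lists, each already sorted by
    descending score, and drop documents that appear in both."""
    # heapq.merge expects ascending order, so sort on the negated score.
    merged = heapq.merge(first_hits, second_hits, key=lambda hit: -hit[0])
    seen = set()
    results = []
    for score, doc_id in merged:
        if doc_id not in seen:  # keep only the best-scoring copy
            seen.add(doc_id)
            results.append((score, doc_id))
    return results


# Hypothetical results for "term one" and "term two".
first = [(0.9, "a"), (0.5, "b")]
second = [(0.8, "b"), (0.2, "c")]
combined = or_query(first, second)
print(combined)
```

Only the merge step of a merge sort is needed here, because each query is assumed to return its results already ordered by relevancy.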
I am pretty happy with the results.
But, when you create something like 'manatlan', the speed will be a
issue, because he designed the DB as a Relational Database, with joins
and on query calculations.
In general, you can explain a query in mySQL for example and
reimplement the steps in your script. Like i said, the BigTable is a
BASE model.
Brett Morgan's approach is the way you need to think, but a good
approach would be something like this:
1. Ignore your USERS model for the moment; you don't need to do
any select on that table
2. Run one query on the Marks model and get the top K (or top 100K)
records
3. In your own function, pick out the data range you need, using ifs
and merges to build a result
4. Sort the result array the way you like
5. Use it to get the top 10, 20, 30 users from the result
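A rough sketch of steps 2 to 5, assuming the Marks query has already been run and its rows reduced to `(user_id, mark)` pairs. The `top_users` function and the sample data are hypothetical; the point is only that min, max and average can be computed in one pass over the marks, without touching the users model.

```python
from collections import defaultdict


def top_users(marks, n=10):
    """marks: iterable of (user_id, mark) pairs from a single Marks query."""
    per_user = defaultdict(list)
    for user_id, mark in marks:
        per_user[user_id].append(mark)

    stats = []
    for user_id, values in per_user.items():
        stats.append({
            "user": user_id,
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / float(len(values)),
        })
    # Best average first; ties are broken by user id.
    stats.sort(key=lambda s: (-s["avg"], s["user"]))
    return stats[:n]


# Hypothetical sample data in place of real query results.
sample = [("ann", 12), ("bob", 8), ("ann", 16), ("bob", 10), ("cid", 15)]
best = top_users(sample, n=2)
```

This still reads every mark it is given, so it fits small data sets like the 40 users with 10 marks each from the original question.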
I would like to hear what you guys think, and any other ideas.
PS: Sorry for my English, Brazil here.
Patrick Negri
On Apr 12, 8:16 pm, "Brett Morgan" <brett.mor...@gmail.com> wrote:
> > > On Sun, Apr 13, 2008 at 8:52 AM, DennisP <DennisBPeter...@gmail.com> wrote:
> > > > Hmm, don't know that I'd go quite that far....scalability is important
> > > > but it would also be nice for our web pages to be fast.
> > > > It's nice that a page will be no slower with a million users than with
> > > > one user, but if "no slower" means "it will still take 4 seconds
> > > > serverside" that's a little less satisfying.
> > > I know it's going to sound glib, but if it is taking four seconds to
> > > render a page, you are using the tool wrong.
> > > What we have to do here is together, as a group, start to explore all
> > > the ways of using this toolset, and come up with best practices on how
> > > to do things. So yes, seeing things like "it takes X seconds to save Y
> > > records" is important. It starts to give us all a feel for how not to
> > > do things.
> > > But the next step is important. Exploring how to use it, instead of
> > > getting depressed that our initial attempts are wrong. We will keep at
> > > it until we get it.
Also, I must say something about insert speed.
Yes, it's slow. Very slow compared to MySQL, but you can't really
compare ;).
I don't know exactly how the indexes work behind the scenes, but I
hope they work the way Google's search indexes work (any Google
engineer here to help us on this?).
Suppose the indexes work like this:
* Each time a record is added, each new term is weighted and added
to the inverted index.
* This could explain the slow insert time, because we are
adding + indexing
* The inverted index is something like a pre-made "search".
For those who don't know what inverted indexes are:
Suppose you have 2 docs.
DocID 1 => The cat is king
DocID 2 => I have bought food for my cat today.
Now, in the index, you have a lexicon:
1. the, 2. cat, 3. is, 4. king, 5. i, 6. have, 7. bought, 8. food, 9.
for, 10. my, 11. today
Using this, we create an inverted index entry for each term in the
lexicon, for example:
Lexicon "cat" => TermID = 2
DocID(1),1 // DocID 1 has 1 hit for this
DocID(2),1 // DocID 2 has 1 hit for this
(Another question here: how is relevancy calculated? I don't think
AppEngine is an exact match of Google's indexes, but if a doc has
2 hits, does its DocID rank higher?)
Now you guys can see why search is fast. When you search, a
lookup for the termID is done, and then you can access the list of hits
instantly. That is why this is very efficient for search engines. Now
imagine the insert: it's a bit tricky, and I can imagine why it's a bit
slow.
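To make the idea concrete, here is a small Python sketch of an inverted index built from the two example docs. It only illustrates the general concept; it is not a claim about how the AppEngine datastore is implemented, and the `build_inverted_index` helper is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: hit_count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            term = word.strip(".,!?")  # drop trailing punctuation
            if term:
                index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


docs = {
    1: "The cat is king",
    2: "I have bought food for my cat today.",
}
index = build_inverted_index(docs)

# A search is a direct lookup of the term's posting list.
cat_hits = index["cat"]
# AND of two terms is the intersection of their doc ids.
cat_and_king = set(index["cat"]) & set(index["king"])
```

The lookup is one dictionary access, while every insert has to touch one posting list per term in the new doc, which is the kind of write-time cost described above.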
Any Google Engineers here to answer these questions about indexes?
On Sun, Apr 13, 2008 at 11:15 AM, PatrickNegri <patrickne...@gmail.com> wrote:
> Also, i must say something about Insert speed.
> Yes, its slow. Very slow compared to mySql, but, you cant compare ;).
> I dont know exactly how the indexes work behind the scenes, but i
> hope it works the way google indexes works (Any Google Engineer here
> to help us on this?).
> If the indexes work like this:
> * Each time a record is added, new term is weighted and added
> to inverted index.
> * This could explain the slow insert time, because we are
> adding + indexing
> * The inverted index is something like a pre-made "search".
> For those that dont know what inverted indexes are:
> Supose you have 2 docs.
> DocID 1 => The cat is king
> DocID 2 => I have bought food for my cat today.
> Now, in indexes, you have some lexicon:
> 1. the, 2. cat, 3. is, 4. king, 5. i, 6. have, 7. bought, 8. food, 9.
> for, 10. my, 11. today
> Using this, we create inverted indexes for each lexicon, example:
> Lexicon "cat" => TermID = 3
> DocID(1),1 // DocID 1 has 1 hit for this
> DocID(2),1 // DocID 2 has 1 hit for this (Also, here another question,
> how the relevancy alg is calculated, i dont think AppEngine is a
> exacly match of google indexes, but, if i have 2 hits, the DocID goes
> up?)
> Now, you guys can see why the search is faster. When you search, a
> lookup for termID is done, then you can access a list of hits,
> instantly. Because this its very efficient on searchers. Now imagine
> the insert, its a bit tricky and i would imagine why its a bit slow.
> Any Google Engineers here to answer these questions about indexes?