It might almost look like a SQL db when you squint, but it's
optimised for a totally different goal. If you think that each
different entity you retrieve could be retrieving a different disk
block from a different machine in the cluster, then suddenly things
start to make sense. avg() over a column in a sql server makes sense,
because the disk accesses are pulling blocks in a row from the same
disk (hopefully), or even better, all from the same ram on the one
computer. With DataStore, which is built on top of BigTable, which is
built on top of GFS, there ain't no such promise. Each entity in
DataStore is quite possibly a different file in GFS.
So if you build things such that web requests are only ever pulling a
single entity from DataStore - by always precomputing everything -
then your app will fly on all the read requests. In fact, if that
single entity gets hot - is highly utilised across the cluster - then
it will be replicated across the cluster.
Yes, this means that everything that we think we know about building
web applications is suddenly wrong. But this is actually a good thing.
Having been on the wrong side of trying to scale up web app code, I
can honestly say it is better to push the requirements of scaling into
the face of us developers so that we do the right thing from the
beginning. It's easier to solve the issues at the start, than try and
retrofit hacks at the end of the development cycle.
brett
Remember what GFS and BigTable were originally designed for. Each
BigTable entry contained a whole web page, and all the data relating
to that web page as the various stages of the Google processing
pipeline are applied to the page. So storing two numbers in a BigTable
entry is like putting a person in a 747, then complaining how long it
takes to get the person 50 feet along the ground in said 747 - it
would be quicker to get the person to walk.
The power of BigTable comes to the fore when you fill the 747 with
people, fire up the engines, and then get the aircraft to cruising
altitude. That's when you are using the tool properly.
The large chunks of data are the important bit. Much, much larger than
traditional RDBMS rows. By several orders of magnitude.
> So you break down your database access into very simple processes.
> Assume your database access is VERY slow, and rethink how you do
> things. (Of course the piece in the puzzle 'we' are missing is
> MapReduce! - the 'processing' part of the BigTable mindset)
Oh, what I'd do to have access to MapReduce.
I know it's going to sound glib, but if it is taking four seconds to
render a page, you are using the tool wrong.
What we have to do here, together as a group, is start to explore all
the ways of using this toolset, and come up with best practices on how
to do things. So yes, seeing things like "it takes X seconds to save Y
records" is important. It starts to give us all a feel for how not to
do things.
But the next step is important. Exploring how to use it, instead of
getting depressed that our initial attempts are wrong. We will keep at
it until we get it.
Why didn't I see that one coming? =)
And no, I'm not a Google engineer.
Also, I must say something about insert speed.
Yes, it's slow. Very slow compared to MySQL, but you can't compare ;).
I don't know exactly how the indexes work behind the scenes, but I
hope it works the way Google's own indexes work (any Google engineer here
to help us on this?).
Ok, yes, valid point. I guess the question I keep forgetting to ask
people is, do you want to write code for a platform that will
distribute your code across hundreds of nodes and replicate your data
across hundreds of nodes such that your application can scale to
millions of users, without having to pony up the money and sysadmin
effort at the start of the project to install and configure those
hundreds of nodes?
Because if your intended audience can be served from a single box
sitting in a co-lo somewhere, using the techniques and languages you
already know and love, then yeah, GAE could quite possibly be the
wrong tool for your application.
I am, however, doing this purely for the mental challenge. I like the
fact that this platform, by the very way it is implemented, is
encouraging us developers to build code in a way that will scale. Yes,
this means that just about everything we are used to doing is going to
suck. Because everything we are used to doing is optimised for the
single sql database engine behind a single webserver app, because that
is what we have been developing on for the last decade. When you go
from single webserver to multiple servers, suddenly things get nasty.
You get session replication issues. And when you have to shard your
database you wind up with data correctness issues.
Writing for scale is hard. GAE is interesting because it makes the
scaling issues obvious: the things that won't scale are either illegal
(JOINs) or exceedingly slow (pulling back hundreds of records and
computing something before rendering a single page). Because it is
obvious from the get go, we can learn how to write for scale.
On Sun, Apr 13, 2008 at 6:32 PM, Dmitry Rubinstein <dim...@gmail.com> wrote:
>
>
> On Sun, Apr 13, 2008 at 3:13 AM, Brett Morgan <brett....@gmail.com>
> wrote:
> > I know it's going to sound glib, but if it is taking four seconds to
> > render a page, you are using the tool wrong.
>
> Or, maybe, I'm using a wrong tool.
Ok, yes, valid point.
Agree to disagree?
All I'm trying to do is to encourage people to see this as an
opportunity to understand a new tool, instead of bitching that it
doesn't handle like the old tools that they have been using up until
now. Because, in all honesty, an Su-30 makes for a horrible car, with
really bad cornering, and horrible clearance, what with the wings
hanging out the side and all. But as a jet fighter? Now that's a
different question...
=)
Still planning to add some features, like embedding YouTube in a
window, but here is the basic stuff:
{ = <ul>
* item = <li>item
[ text ] = blockquote
--- = <hr>
@title@ = <h1>
===subtitle=== = <h2>
[http://www. name] => external link
{wikiobj Name} => text link (open a aewindow)
|wikiobj Name} => open a aewindow of wiki page
||wikiobj "Name" window_to_close|| => open a aewindow of wiki page,
and close window_to_close page
[IMG:url] => insert external image
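
For anyone wanting to play with this syntax, here is a rough sketch of how a few of the unambiguous rules above could be turned into HTML. It is not the author's implementation: render_line() is a hypothetical helper, it covers only `---`, `@title@`, `===subtitle===` and `[IMG:url]`, and the closing tags are my assumption since the list only names the opening ones. The list, blockquote, link and wikiobj rules are left out on purpose.

```python
import re
from html import escape


def render_line(line):
    """Convert one line of the markup to HTML (small subset only)."""
    text = line.strip()
    if text == "---":
        return "<hr>"
    m = re.fullmatch(r"@(.+)@", text)
    if m:
        return "<h1>%s</h1>" % escape(m.group(1))
    m = re.fullmatch(r"===(.+)===", text)
    if m:
        return "<h2>%s</h2>" % escape(m.group(1))
    m = re.fullmatch(r"\[IMG:(.+)\]", text)
    if m:
        return '<img src="%s">' % escape(m.group(1), quote=True)
    # Anything else is passed through as escaped plain text.
    return escape(text)


for sample in ("@Hello@", "===Details===", "---", "[IMG:http://example.com/a.png]"):
    print(render_line(sample))
```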