In-memory database (inspired by 864GB of RAM for $12,000)


Alexey Petrushin

unread,
Jan 27, 2012, 12:23:45 PM1/27/12
to nod...@googlegroups.com
Today I saw this note about 864GB of RAM for $12,000 and got a little stunned ( http://37signals.com/svn/posts/3090-basecamp-nexts-caching-hardware ).

I mean - if memory is so cheap now, maybe we can use in-memory databases in some situations? In an average project the database is a couple of tens or hundreds of GB (not talking about analytics and other data-heavy apps).

For example, let's suppose we can split (shard) our dataset into pieces of ~50MB each (online organizers, task managers, sites, and so on, splitting data by account id).

In this specific situation (a set of small independent datasets) we can ignore most complexities of DB technology:
We don't care about consistency, availability, concurrency, MVCC, locking or throughput for the 50MB (the only problem is long IO operations - but Node.JS is good with that).

Another problem is indexes, but I believe that's easy: if it's easy to build an index with CouchDB using map/reduce, with all its limitations, then it should be even easier to build an index using arbitrary code and without any limitations.
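
For illustration, a minimal sketch of such a hand-rolled in-memory index (all names here are illustrative, not from any existing module):

var tasks = {};   // id -> task object, the "table"
var byTag = {};   // tag -> set of task ids, the "index"

function saveTask(task) {
  tasks[task.id] = task;
  (task.tags || []).forEach(function (tag) {
    byTag[tag] = byTag[tag] || {};
    byTag[tag][task.id] = true;
  });
}

function findByTag(tag) {
  return Object.keys(byTag[tag] || {}).map(function (id) { return tasks[id]; });
}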

Persistence & fault-tolerance - save changes after every update to mysql and read the whole dataset in-memory after starting the app-process.

Deployment - there are N node processes with M accounts in each, and a balancing proxy sends each request to the right node by a consistent hash of the account id.
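
A simplified sketch of the routing piece (plain modulo hashing rather than a true consistent-hash ring, which would reshuffle fewer accounts when nodes are added or removed; the backend list is made up):

var crypto = require('crypto');

var backends = ['10.0.0.1:8001', '10.0.0.2:8001', '10.0.0.3:8001'];

function backendFor(accountId) {
  var hex = crypto.createHash('md5').update(String(accountId)).digest('hex');
  var n = parseInt(hex.slice(0, 8), 16) % backends.length;
  return backends[n];  // all requests for one account always hit the same node
}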

I'm tempted to try such an architecture (an in-memory, in-process, ad-hoc DB for Node.JS) and would like to hear a critique of it (I mean using it only in cases when we can split our data into small independent pieces; I don't consider this approach outside of that area).
So, what do you think?

And maybe someone is already using this, or maybe there are some such database projects on GitHub?

Thanks

Axel Kittenberger

unread,
Jan 27, 2012, 12:43:00 PM1/27/12
to nod...@googlegroups.com
Redis is a no-sql database optimized for memory-fit.


Alex Gadea

unread,
Jan 27, 2012, 1:17:18 PM1/27/12
to nod...@googlegroups.com
Most databases (NoSQL and SQL based) keep their hot data in memory anyway.  NoSQL databases are far more aggressive in how they accomplish this, and you will see a huge performance dropoff when your dataset can no longer be stored in memory and the db has to hit disk.  The relatively low cost of memory has made NoSQL databases a realistic possibility for many companies.

Having said that, creating a node-based memory database would be an interesting exercise and could serve the needs of many app-bound caching requirements that do not need to exceed the node memory boundaries.

Alex

kowsik

unread,
Jan 27, 2012, 1:45:08 PM1/27/12
to nod...@googlegroups.com
From a week of hacking (a while back): https://github.com/pcapr/memcouchd

Last I checked, most of the CouchDB tests were passing. :)

K.
---
http://blitz.io
@pcapr

Alexey Petrushin

unread,
Jan 27, 2012, 1:57:24 PM1/27/12
to nod...@googlegroups.com
> Redis is a no-sql database optimized for memory-fit.
Yes, but I mean a custom, in-process database with logic specific to and optimized for your app,
so you can simplify lots of common tasks.  For example, you can easily implement a tag cloud without using background tasks.

Or you can use unique data structures that are otherwise hard to do in SQL or NoSQL databases.

So, the advantages of such an approach are simplicity and flexibility.
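
A minimal sketch of the "tag cloud without background jobs" idea (illustrative names): counts are updated inline on every write, so the cloud is always current and never recomputed in a batch.

var tagCounts = {};

function addPost(post) {
  post.tags.forEach(function (tag) {
    tagCounts[tag] = (tagCounts[tag] || 0) + 1;
  });
}

function removePost(post) {
  post.tags.forEach(function (tag) {
    if (--tagCounts[tag] === 0) delete tagCounts[tag];
  });
}

function tagCloud() {
  return Object.keys(tagCounts).map(function (tag) {
    return { tag: tag, count: tagCounts[tag] };
  }).sort(function (a, b) { return b.count - a.count; });
}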

Tim Caswell

unread,
Jan 27, 2012, 2:16:18 PM1/27/12
to nod...@googlegroups.com
So something like this?

var db = {}
// Custom code here

Remember that javascript objects have some pretty small limits when
doing these kinds of things. The GC gets very angry when there are
millions of objects or millions of properties in an object (including
array keys).

You can pack all sorts of data into node buffers though. I once made
a database for a tile-based game in node. The world was split into
1024x1024-tile sections of map. Each square could be one of 256
different tiles. I kept each of these map sections in a node buffer
1MB in size.

These larger sections were kept in a javascript two-dimensional sparse
array. I would allocate a new 1MB section every time a value was set in
an area that had never been used before. The basic idea is something
like:


function Chunk(){}
Chunk.prototype.get = function (x, y) {

Tim Caswell

unread,
Jan 27, 2012, 2:21:44 PM1/27/12
to nod...@googlegroups.com
gah, premature send. Anyway.

function getValue(x, y) {
  // tiles[ty][tx] is the 1MB buffer for this chunk; numToTile turns the
  // stored byte back into a tile (both are defined elsewhere).
  var tx = x >> 10;
  var ty = y >> 10;
  var ix = x % 1024;
  var iy = y % 1024;
  return numToTile(tiles[ty][tx][iy * 1024 + ix]);
}

The important part is that I balance my objects so that there are both
a minimal number of objects and a minimal number of properties on each
object. I store the bulk of the data in tightly packed node buffers
(which live outside the V8 heap).
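
A rough reconstruction of that pattern (the names are mine, not Tim's actual code): a sparse JS object holds one 1MB Buffer per 1024x1024 chunk, and each tile is a single byte inside it.

var chunks = {};

function chunkFor(x, y, create) {
  var key = (x >> 10) + ',' + (y >> 10);
  if (!chunks[key] && create) {
    chunks[key] = new Buffer(1024 * 1024);  // lives outside the V8 heap
    chunks[key].fill(0);
  }
  return chunks[key];
}

function setTile(x, y, value) {
  chunkFor(x, y, true)[(y % 1024) * 1024 + (x % 1024)] = value;
}

function getTile(x, y) {
  var chunk = chunkFor(x, y, false);
  return chunk ? chunk[(y % 1024) * 1024 + (x % 1024)] : 0;
}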

I wrote a script to stress test this system and it happily used 20GB
of ram before I got tired of running my laptop on pure swap space.

On the other hand, nStore used to keep an in-memory index of the file
offsets of all the fields in its key-value database. The actual
values were stored on disk in JSON format within a single file. When
I stress tested this at a million documents, the GC in V8 fell apart
and brought my process to a crawl. I was measuring thousands of ms
for each and every new property I added to my index object.

Alexey Petrushin

unread,
Jan 28, 2012, 2:38:16 AM1/28/12
to nod...@googlegroups.com
> Remember that javascript objects have some pretty small limits when
> doing these kinds of things.  The GC gets very angry when there are
> millions of objects or millions of properties in an object (including
> array keys).
Didn't know that. But maybe it would be possible to use a B+Tree; it should balance itself automatically.
I checked this implementation http://blog.conquex.com/?p=84 (the same code but without the MooTools dependency: https://gist.github.com/1693229); the memory overhead compared to an array seems to be small.
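
One rough way to sanity-check that overhead claim (a sketch; run the same measurement again with the B+Tree from the gist in place of the array to compare):

function measure(label, fill) {
  global.gc && global.gc();   // run node with --expose-gc for steadier numbers
  var before = process.memoryUsage().heapUsed;
  var data = fill();
  global.gc && global.gc();
  var after = process.memoryUsage().heapUsed;
  console.log(label, ((after - before) / 1024 / 1024).toFixed(1), 'MB');
  return data;   // keep a reference so the GC can't reclaim it
}

var N = 1e6;
var arr = measure('plain array', function () {
  var a = [];
  for (var i = 0; i < N; i++) a.push(i);
  return a;
});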

jmartins

unread,
Jan 28, 2012, 8:00:24 AM1/28/12
to nod...@googlegroups.com

Do you know http://voltdb.com/ ? It's in memory and very fast.

regards
Joao Martins

Anshuman Ghosh

unread,
Jan 27, 2012, 12:50:04 PM1/27/12
to nod...@googlegroups.com

And if not using Redis, you can also define MySQL tables to be in-memory. And MySQL Cluster tables/indexes are in memory as well.

John Piekos

unread,
Jan 27, 2012, 1:36:58 PM1/27/12
to nodejs
VoltDB is all in memory and does a lot of what you suggest. It will
auto-partition your data set, so no need for the application to worry,
and you'd get ACID properties.
And it has a node.js client.

On github: https://github.com/VoltDB/voltdb

There's a new node.js client and sample app that will be available
soon - let me know if you're interested in trying it out.

John


Michael Pantaleo

unread,
Jan 28, 2012, 2:46:15 PM1/28/12
to nodejs
I would like to recommend that you take a look at Globals (http://GlobalsDB.org/), a NoSQL offering from InterSystems, which exposes a powerful multi-dimensional data engine that also lies at the core of their commercial product Caché (http://www.intersystems.com/cache/index.html).  With over 30 years of optimizations to the data engine that runs Globals / Caché, the database performs at speeds comparable to those of in-memory databases, while at the same time maintaining persistence (data is not lost when a machine is turned off or crashes).

Cheers,
Michael


Nuno Job

unread,
Jan 28, 2012, 4:49:33 PM1/28/12
to nod...@googlegroups.com
Volt has ACID semantics but it's based on short-lived locks. This means you can't really do mixed analytics/write kinds of scenarios, but it can work out for very fast OLTP scenarios, preserving ACID semantics.

The focus of your analysis, to see if Volt is a great fit, should be those short-lived locks. If you are fine with that, it's probably going to be super fast. If you need to do something that holds a lock for a while, well, it's kind of like a Node process doing CPU-intensive work: you will have to wait, and it will suck. Engineers at Volt would probably agree; they think of Volt as the future of super fast OLTP computation. Open source of course, super cool project.

As for integrating Volt with Node, I wonder, really. Would love to read a blog post about it from someone that tried it out.

nuno

jmartins

unread,
Jan 29, 2012, 10:16:31 AM1/29/12
to nod...@googlegroups.com

The only problem with VoltDB is that VoltDB is built in Java, not in Node.js :-)

regards
Joao Martins

John Piekos

unread,
Jan 29, 2012, 9:59:14 PM1/29/12
to nodejs
Hi Nuno,

Yes, VoltDB is very fast for OLTP and fast for real-time analytics/
leaderboard-type reporting.  It isn't fast at deep analytics; other
technology is better suited there.

Look for some node.js/voltdb blog posts in the near future.  Some
interesting performance and throughput numbers!

John

Nuno Job

unread,
Jan 29, 2012, 10:47:33 PM1/29/12
to nod...@googlegroups.com
Thanks John,

I'll make sure to stay tuned. I love how clear you guys are about what VoltDB is. It's so easy to reason about technology when people don't endorse bad usage of it, even if they have a vested interest in it.

Keep it up, super interested in those blog posts. Feel free to send me a private msg with your twitter so I make sure I don't lose out on those :)

Nuno


Alan Gutierrez

unread,
Jan 30, 2012, 1:57:07 AM1/30/12
to nod...@googlegroups.com
On 1/27/12 12:23 PM, Alexey Petrushin wrote:
> Today I saw this note about 864GB of RAM for $12,000 and got a little
> stunned (
> http://37signals.com/svn/posts/3090-basecamp-nexts-caching-hardware ).
>
> I mean - if the memory is so cheap now, maybe we can use in-memory
> databases for some situations?
>
> Another problem is indexes, but I believe that's easy: if it's easy to
> build an index with CouchDB using map/reduce, with all its limitations,
> then it should be even easier to build an index using arbitrary code
> and without any limitations.
>
> Persistence & fault-tolerance - save changes after every update to mysql
> and read the whole dataset in-memory after starting the app-process.
>
> Deployment - there are N node processes with M accounts in each and
> balancing proxy sending request to N-th node by consistent account-id hash.
>
> Tempted to try such architecture (in-memory, in-process, ad-hoc DB for
> Node.JS) and would like to hear a critique about it (I mean use it only
> in cases when we can split our data into small independent pieces, I
> don't consider this approach outside of this area).
> So, what do You think?
>
> And, maybe someone already using this, or maybe there are some such
> database projects on github?

I'm working on a Node.js implementation of a b-tree.

Indexes - It is an index.
Persistence and fault tolerance - Leaf page writes are always appends.
Branch and leaf page rewrites at split and merge create replacement
files that are linked into place.
Deployment - Written for Node.js in CoffeeScript. Deploy along with the
rest of your Node.js modules.

It's not an in-memory implementation, because it's not that much more
difficult to do the paging. The premise of app-specific database
architecture is the key premise.

My desire is to implement a networked consensus algorithm, and then
offer the community the database primitives necessary to experiment with
different database designs: b-tree index, write-ahead log, consensus.
This b-tree can be used as both an index and a write-ahead log.

Not done. Working on balancing the tree. You're welcome to watch the
repo if it interests you.

http://bigeasy.github.com/strata/
https://github.com/bigeasy/strata

--
Alan Gutierrez - http://twitter.com/bigeasy - http://github.com/bigeasy

Joshua Kehn

unread,
Jan 30, 2012, 2:12:54 AM1/30/12
to nod...@googlegroups.com
Would be interested if it wasn't in CoffeeScript.

Regards,

–Josh
____________________________________
Joshua Kehn | @joshkehn
http://joshuakehn.com

Dean Landolt

unread,
Jan 30, 2012, 9:47:52 AM1/30/12
to nod...@googlegroups.com
On Mon, Jan 30, 2012 at 2:12 AM, Joshua Kehn <josh...@gmail.com> wrote:
> Would be interested if it wasn't in CoffeeScript.

Then port it.

This comment is approximately as useful as all those "where's teh coffeescript version?" posts. Perhaps less so, because coffeescript compiles to reasonably legible javascript.

Now I'm no coffeescript fan either, but this kind of lib is awesome to see. And really impressive from what I've seen of it. Let's not piss on it because it's not in our preferred dialect.

Keep up the great work, Alan.

Alexey Petrushin

unread,
Jan 30, 2012, 1:37:54 PM1/30/12
to nod...@googlegroups.com
I would like to explain a little more exactly why I would like to use an in-memory DB.

Top speed isn't the goal in this case (usually that's the main reason for using an in-memory DB, but not here); average, MySQL-like speed is fine.

The goals are simplicity and flexibility of the domain model, not top speed.

I want to work with the domain model like with pencil and paper, like drawing a sketch, fast and freely, without any constraints - to prototype quickly and try new ideas that are otherwise hard or slow to implement in a traditional DB design.

But to be able to do that (to be able to drop all the usual limitations of DB design) you have to accept some other limitations:
- Data has to be split into small independent pieces (let's call them tenants).
- The app overall can scale, but a single tenant can't; it will always be limited and processed on only one node.
- The app overall will be available and fault tolerant, but a small percentage of tenants may sometimes fail and be unavailable.

For some applications those limitations are acceptable.

And because you need flexibility and the ability to implement custom rules and logic, it should be in JavaScript and Node.JS (in the same or another process).

P.S.
You can use it to quickly test new ideas and build prototypes. When you build a prototype you just ignore all the indexes and performance issues and query data by iterating over all of the tenant's data.
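
For example, a query while prototyping can be nothing more than a filter over the tenant's in-memory data (the names are illustrative):

function overdueTasks(tenant, now) {
  return tenant.tasks.filter(function (task) {
    return !task.done && task.dueAt < now;
  });
}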

And if it turns out that the idea is bad - just throw it out.
Most ideas are bad, and this approach helps you try them quickly and throw them out without spending much time on them.

And if it's good - then you can keep using it; you don't have to completely rebuild your prototype with MySQL or MongoDB or something else.
Put the app under load, measure the bottlenecks and start refactoring and adding indexes iteratively.

This is the main goal. 

OldMster

unread,
Jan 30, 2012, 3:46:20 PM1/30/12
to nod...@googlegroups.com
I would second Michael's recommendation of Globals.  If you prefer a pure open source product, then try GT.M (available on SourceForge).  Both are similar, and have a similar heritage.  For GT.M (available at http://sourceforge.net/projects/fis-gtm/ ), Rob Tweed has a node.js API available at http://www.mgateway.com.  InterSystems was working on a 'pure' node.js API for Globals, but I haven't checked lately to see if it is available.  Both are schemaless databases (although bolt-ons are available to add an SQL API to the data, but then you have to define the schema), and are very good for quick prototyping, while still offering production-quality data integrity, speed, and safety.
They are the only databases I use regularly.  I've looked (and continue to look) at others, but none have yet offered any feature that made the cost of the learning curve worth it.
Mark

Michael Pantaleo

unread,
Feb 1, 2012, 8:56:33 PM2/1/12
to nodejs
As Mark states in his response, Globals (http://GlobalsDB.org/) is a
pure schemaless database which is not limited to one of the known
database types, or basic data models, by which NoSQL databases are
most often defined (i.e. Key/Value, Column-Oriented, Document, and
Graph).  Globals gives you the flexibility to build whatever paradigm
suits your needs on top of the core engine (global structures).

With regard to interfaces to the Globals Database, we realized that
Node.js was/is a wildly popular technology, so we added it as an
interface to Globals early last year (2011).  We now support the
following three interfaces to Globals, with several others coming
later this year: Java, Node.js, .NET (will be released soon)

Please come and check out Globals for yourself, and download a free
copy of the software at the following location: http://GlobalsDB.org/downloads/

We are also currently hosting our 3rd Globals Programming Challenge,
which will take place on the 10th of February (next Friday) @ 18:00
EST.  Come and compete for a chance to win our Grand Prize - US
$3,500.

- Michael P.


Michael Pantaleo

unread,
Feb 1, 2012, 8:59:47 PM2/1/12
to nodejs
I forgot to add the URL to the Globals Challenge: http://GlobalsDB.org/mchallenges/

Matt

unread,
Feb 2, 2012, 10:41:22 AM2/2/12
to nod...@googlegroups.com
I keep hearing you guys post about Globals here on the list, and I assume you work there (?), but I never hear of anyone actually using it (in the node.js community, I don't know about the Java and .NET communities)... Do you have many node.js users, or any examples of things that are using it to good advantage? Did any Node users take part in last year's challenge?

(I don't mean this as a bash at Globals - I'm genuinely curious)

OldMster

unread,
Feb 2, 2012, 10:54:49 AM2/2/12
to nod...@googlegroups.com
The node.js hospital check-in kiosk that was posted here a few days ago uses Caché as the database.  Globals is a 'stripped down' version of Caché.  Michael can detail it better than I can, I'm sure, but basically Globals is Caché stripped down to the database engine, with APIs only, instead of the native scripting language.

Caché, Globals, GT.M is what the NoSQL databases will be when they grow up...  (Putting on flame resistant suit :-)  )

Mark

OldMster

unread,
Feb 2, 2012, 10:56:28 AM2/2/12
to nod...@googlegroups.com
I forgot to add that Michael may work for InterSystems, but I'm just a long time user.  Mostly I argue with them and tell them how bad they are..... :-)

Mark

Matt

unread,
Feb 2, 2012, 11:24:05 AM2/2/12
to nod...@googlegroups.com
On Thu, Feb 2, 2012 at 10:54 AM, OldMster <msi...@verizon.net> wrote:
> The node.js hospital check-in kiosk that was posted here a few days ago uses Caché as the database.  Globals is a 'stripped down' version of Caché.  Michael can detail it better than I can, I'm sure, but basically Globals is Caché stripped down to the database engine, with APIs only, instead of the native scripting language.

Ah awesome - I've posted on that thread to ask about it then.
 
> Caché, Globals, GT.M is what the NoSQL databases will be when they grow up...  (Putting on flame resistant suit :-)  )

I'd love to know why.

rtweed

unread,
Feb 2, 2012, 11:35:41 AM2/2/12
to nodejs
Michael works with InterSystems (the vendors of Globals and Cache),
but the other folks that talk about it here (including myself) are
independent users of Globals, Cache and/or GT.M

You're going to see a lot of growing use of Node.js with these
database technologies in healthcare in the coming months. I'm doing a
lot of work with the WorldVistA group with the Open Source VistA
electronic healthcare record (EHR). VistA will run on either Cache or
GT.M (though not Globals). VistA was developed at the Dept of
Veterans Affairs (VA) and is used to manage and support healthcare for
all US veterans (I believe > 4m patients), and is increasingly being
used outside the VA as a very powerful and fully functional
alternative to the eye-wateringly expensive commercial EHRs. The use
of Node.js is allowing us to massively enhance its UI and functional
capabilities - it's a very exciting time and Node is going to be
making a huge difference.

Rob


Michael Pantaleo

unread,
Feb 2, 2012, 2:06:13 PM2/2/12
to nodejs
Matt,

Yes, I work for InterSystems, and we released Globals (http://GlobalsDB.org/) less than a year ago as a free NoSQL database to
promote our technology as an alternative database.  There are several
applications that use the Node.js interface to Globals, and I'm sure
there will be many more to follow.  We only released the Node.js
interface to Globals in late August of last year (2011), and we will
be releasing the Node.js Windows version in the next few weeks.  With
less than 6 months as an interface to Globals, Node.js has become
quite popular (I use it quite a bit myself and love it).  The high-performance and highly scalable goals for Node.js are comparable to
those set forth for Globals and Caché, and thus it offers a perfect fit
as an interface to our database.  Since Globals offers speeds comparable
to those of an in-memory database, we offer synchronous Node
method invocations to the Globals Database, to optimize throughput, as
well as the traditional asynchronous methods that are most familiar to
Node/JS developers.

Several people have used the Node.js interface for the previous two
Globals Programming Challenges, and their applications were quite
nice: http://GlobalsDB.org/mchallenges/entries/ (see the application
from "robtweed" for Challenge #1, and the application from "daimor"
for Challenge #2).  We encourage anyone to participate in our
Challenges (http://globalsdb.org/mchallenges/), with a chance to win
US $3,500, and you are free to use any of the interfaces that we
support: Java, Node.js, .NET (coming very soon).

- Michael P


Juraj Vitko

unread,
Feb 2, 2012, 7:17:50 PM2/2/12
to nodejs
That's roughly $14 per GB (ignoring the cost of the host machine and operational costs for simplicity). I'd say the first question is whether the data stored in that GB of RAM can make you more than $14 (in reasonable time), and from then on you can scale out. The next limit to look out for is how much RAM you can slot into a single node. After that, it's about the throughput, latency and price of cluster interconnects, and your ability to deal with the utter ugliness of data sharding. The next limit is how processable this memory will be - i.e. how many hardware threads (and L1/2 caches and memory controllers) you can unleash to work with this huge RAM and what the cost of the thread synchronization will be. (You don't want a 2TB RAM machine capped at 1,000 reqs per sec.) But of course, we are already talking about an architecture with the potential to leave much bigger clusters of classic solutions in a pile of dust, and to enable novel business models in the process (a big forum where you don't have to wait a few seconds to post a message and another few seconds to reload the page to see others' messages - a real-time, 100k-user, socket.io, ACID pub/sub, anyone? Things like that).

But I think the most important point you are hinting at is to have a
primitive, but insanely fast, programmable and predictable database
residing in-process with Node.js (allocating data outside of the V8
heap of course). I believe this is where it's all going. If we look at
the area of database development, where is the main bottleneck? OK, I
see three.

First, we spend a quite insane amount of money in the form of CPU cycles just to have the DB separated from the application server, out of a multitude of fears, the most prominent being not letting our "savage" app coders near the holy DB code (eventually the same folks end up working on app and DB anyway); to have the DB serve multiple web servers, which historically are slow and can't handle many concurrent users when not employing evented I/O (yet nowadays the reality is that a single Node.js will be waiting for your DB server, no matter how optimized or cached); and to achieve better stability - in case your web server goes down, your DB will stay up (historically, your app server is trash hardware, and your holy DB is a rugged, expensive machine with a RAID controller with a battery-backed cache, etc., unlikely to ever go down -- well, today we prefer to have redundancy everywhere and do it on the cheap).

Second, we try to keep the DB API simple and clean to keep things
fast, maintainable and reliable (yet what eventually always happens is
that we are forced to bring some form of scripting to the database,
e.g. SQL stored procs, Lua in Redis - because it simply makes a lot of
sense).

Third, DBs evolve slowly, the reason being that C/C++, compared to other languages, is a horrid environment for the evolution of complex software. The DB is not just storing data - it contains query logic, persistence logic, cluster logic - lots of coding and lots of potential for evolution. So why not keep the core as simple and fast as possible in C/C++, and throw the rest of the DB logic into the environment where we can already observe *explosive* evolution of existing frameworks/modules, which Node.js undoubtedly is? (Fun fact - today I noticed: http://rubygems.org/gems - 7 years - 2202 packages; http://search.npmjs.org/ - 2 years - 6862 packages -- and this is just comparing Node.js with another language that is considered to be *absolutely great* for development.)

Worth mentioning here are Redis with its Lua scripting support, and the Alchemy Database, which is a SQL subset built in Lua on top of the terse Redis API: http://code.google.com/p/alchemydatabase/
(I'm not advocating SQL or NoSQL here at all.)

Another case in point appears to be the GlobalsDB just mentioned in this discussion and the Caché product built around it - but this is the first time I see these, so pardon me if I'm way off.

So, what I think we need is a Node.js native module with a perhaps even more limited/simpler API than Redis has, working in its own memory segment, perhaps with its own memory allocator, and then we can all have fun building our flavors of DB logic around that - clustering, persistence, transactions - whatever - and we may start seeing some really interesting things quite soon. Btw, if someone is intent on having the Web/App server and DB separated, they can still do it - the Node.js process running the DB module can act as a standalone server too.
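
For illustration, the kind of API surface such a module might expose - entirely hypothetical, no such module exists in this thread; persistence, clustering, transactions and query logic would then be ordinary Node.js code layered on top:

var store = require('rawstore');   // hypothetical native addon holding data outside the V8 heap

store.set('user:42', JSON.stringify({ name: 'Ada' }));
var user = JSON.parse(store.get('user:42'));
store.del('user:42');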

J.





OldMster

unread,
Feb 2, 2012, 8:10:10 PM2/2/12
to nod...@googlegroups.com
Matt,
Ok, maybe Globals was a stretch; it is extraordinarily fast, but it doesn't have all the bells and whistles of Cache and GT.M.

My 'when they grow up' comment was just because Cache and GT.M have been doing this for over 30 years, and are capable of running any enterprise of any size.  So they've got data integrity built in, 'hot' standby capabilities, 'warm' standby capabilities, the ability to have data distributed across multiple servers appear as a single database, and they can manage enormous datasets (all US Veterans Administration medical records, for example).  Most of the features, tools, etc. that the newer NoSQL databases are working on are probably already there, because someone has already needed them. I have no doubt that they (the new NoSQL databases) will come up with new features that aren't included, but that is the great thing about diversity.  And I would bet that any really great features that do appear will also eventually appear in Cache and GT.M - competition is a good thing, even (especially perhaps) in the open source world.

Mark

OldMster

unread,
Feb 2, 2012, 8:18:48 PM2/2/12
to nod...@googlegroups.com
Matt,

One other thing: perhaps the biggest item is confidence.  When something doesn't work as expected in my Cache or GT.M database, I am 99.9999% certain that it is a problem I created, not a bug in Cache or GT.M.  That confidence is based on the experience I've had using them, the years of experience they have behind them, and the large base of users beating on them every day in production environments. I don't think I'd have that confidence with the newer NoSQL databases.  That doesn't mean the problem wouldn't still be mine 99.9999% of the time!

Mark

Matt

unread,
Feb 2, 2012, 11:27:41 PM2/2/12
to nod...@googlegroups.com
I get that, to some extent, though often 30-year-old products were designed for 30-year-old networks and environments, so they don't really fit today's modern scenarios. I'm genuinely interested in whether Globals (Cache/GTM) has solved some problem that Mongo/Couch/Cassandra/Riak haven't seen, in a way that is basically ACID compliant. But I really haven't played with any of them enough.



Dobes

unread,
Feb 3, 2012, 1:03:11 AM2/3/12
to nodejs
Hi Matt,

I don't work for Globals but I think Michael Pantaleo does. It seems
pretty new as a standalone project - their commercial product Caché,
which Globals appears to be intended as a gateway drug for, may have
been around much longer. The last (and first) contest, I believe they
said, had only 2 or 3 entrants due to not being promoted enough in
advance.

Compared to other NoSQL databases, Globals seems to be focused on performance. From what I could tell (I don't mind being corrected here) it doesn't have any kind of reporting or Map/Reduce systems built in, so it's not as generally applicable as something like CouchDB or MongoDB, just as CouchDB and MongoDB lack reporting capabilities like joins and complex query support that make them more special-purpose than MySQL and PostgreSQL. I think Globals might be a good alternative to memcached or Redis, but I don't know those products that well yet, as I've yet to build an app that needed a pure key/value store; I always seem to need at least some level of reporting built into the database.

The main uses for key/value stores like these that I know of are caches and session storage, and in that case performance is key (otherwise why bother caching?). That's probably why Globals touts performance as its key feature. Although I'd be curious to hear of other common uses of GlobalsDB, memcached, or Redis.

I was interested in their contest but I unfortunately won't have time
starting Feb 10th so perhaps I'll catch the next one.

Hope that helps,

Dobes



rtweed

unread,
Feb 3, 2012, 3:22:48 AM2/3/12
to nodejs
Dobes

You may find this helpful to answer many of your questions about
Globals:

http://gradvs1.mgateway.com/docs/nosql_in_globals.pdf

This is a more specific version of a paper that George James and I
wrote a couple of years before, which is applicable to Cache, Globals
and GT.M:

http://www.mgateway.com/docs/universalNoSQL.pdf

The only reason that Map/Reduce isn't available is that nobody has
implemented it yet. There's no technical limitation.  I (and others)
have pushed InterSystems on this for some time (for Cache, though it
would be nice for Globals too), since it's a common feature of many/most
modern NoSQL databases, unfortunately to no avail.

Perhaps someone out there could implement a Javascript/Node-based Map/
Reduce for Globals :-)

Regarding Matt's comment:

"I get that, to some extent, though often 30 year old products were
designed for 30 year old networks and environments, so don't really
fit in today's modern scenarios."

That may be true of other 30-year-old products, but it hasn't been true
of Cache, GT.M or Globals, which have been continually able to embrace,
adapt to, and benefit from most of the major changes in the IT
landscape over their lifetime. Chris Casey's recent announcement of
the booking kiosks at a UK hospital is a great example - direct
integration of Cache with Node and Sencha Touch without any other
intermediate tiers or layers. In my experience, there's nothing that
more modern database technologies are designed to do today that
couldn't also be achieved (often more quickly and simply) with Cache,
Globals or GT.M.

All of which is why I've been a long-term and passionate advocate of
these technologies. If their integration with Node can give them more
of the recognition they deserve, I'll be very happy :-)

Rob

OldMster

unread,
Feb 3, 2012, 11:55:52 AM2/3/12
to nod...@googlegroups.com
Matt,
Your skepticism is healthy, and I would agree that most systems tend to ossify after 30 years.  With Caché/GT.M/Globals, the developers definitely ossified, but the systems have adapted to new capabilities quite well.  You did point out one of the things Cache/GT.M have that I omitted, ACID transaction processing.  Since both are used extensively in large financial institutions, I tend to forget that since it is so basic to the environment.  

I don't know that I could point to a specific challenge that they have 'risen to' that Mongo, CouchDB et al haven't or can't rise to also.  You ask why use Cache/GT.M instead of Mongo or CouchDB?  I tend to ask it the other way: why use Mongo, CouchDB, or any other 'new' system if it doesn't offer anything that Cache/GT.M doesn't already offer?  Maybe it's because I'm older now, and instead of mistrusting anyone over 30, I now doubt anyone under 30?  :-)

Also, don't think of Cache/GT.M/Globals as just a key/value-pair database.  They are really multi-key/value databases, and until you've used one, the difference is difficult to appreciate.  Joins really aren't necessary; the data is (for the most part - it is possible to do it wrong) naturally related.

I think it is difficult for most traditional IT folks to wrap their minds around it, because from the beginning it was targeted at a different use case than other databases.  The technology was originally developed at a health care institution to handle health care data, as opposed to the financial data that almost all systems were being developed for. Financial data tends to be modeled as nice neat tables made up of columns of numbers, with an occasional string thrown in. A lot of attention is paid to CPU and sequential-access performance, since that data is typically used for large aggregations and analysis.

Healthcare data is messy: lots of strings/text with very few numbers, inconsistent actions for the same activities, poorly defined workflows, etc.  Sequential access to healthcare data is not terribly valuable, but very fast access to any particular piece of data is - say, when a patient shows up in the ER and you need to see his/her records from their last visit 5 years ago.  So the technology behind Caché/GT.M is optimized for random access to string/textual data of inconsistent structure.  The database was designed to optimize what was then (and still is) the slowest part of any computer system, the data storage system.  That doesn't mean CPU cycles were wantonly wasted, but it wasn't the highest-priority item.

This is too long and too 'preachy', but I hope I helped you understand it better.  Sometimes my passion for the technology gets in the way.....

Mark



Dobes

unread,
Feb 5, 2012, 6:45:18 AM2/5/12
to nodejs
On Feb 3, 4:22 pm, rtweed <rob.tw...@gmail.com> wrote:
> You may find this helpful to answer many of your questions about
> Globals:
>
> http://gradvs1.mgateway.com/docs/nosql_in_globals.pdf
>
> This is a more specific version of a paper that George James and I
> wrote a couple of years before, which is applicable to Cache, Globals
> and GT.M:
>
> http://www.mgateway.com/docs/universalNoSQL.pdf
>

The papers do provide some more information about Globals, thanks.

The papers above are saying that you can model the other kinds of
NoSQL database in GlobalsDB. I suspect that since essentially all
NoSQL databases use some kind of B-Tree or hash internally, and
GlobalsDB can be used in place of a plain B-Tree or hash, this almost
goes without saying. However, one's own implementation of a given
NoSQL paradigm on top of Globals is unlikely to be as reliable as
Globals itself. So when you say Globals gives you "NoSQL
capabilities combined with the reliability and maturity needed for
business-critical applications", I feel like this isn't the whole story,
because you have to handcraft this new, immature layer of NoSQL on top
of Globals. My document store built on top of Globals is probably not
as good as CouchDB, for example.

I'm not trying to naysay GlobalsDB at all - I just want to
point out that it should be evaluated as what it is and not as what you
can build with it. I can build all kinds of NoSQL databases using the
highly mature, been-there-forever, runs-like-a-speed-demon "fs"
module, too. That doesn't mean that the "fs" module is itself a NoSQL
database of any kind.

> The only reason that Map/Reduce isn't available is because nobody has
> implemented it yet. There's no technical limitation.  I (and others)
> have pushed InterSystems on this for some time (for Cache, though for
> Globals would be nice too), since it's a common feature of many/most
> modern NoSQL databases, unfortunately to no avail.

Sure, the same could be said for sharding, automatic indexing,
reporting, even an SQL query engine. But if and when some sort of
indexing, sharding, map/reduce or query engine shows up
built using Globals, it will lack the track record and performance of
Globals itself. It may in fact turn out to be inferior to something
already built.

> That may be true of other 30-year old products, but hasn't been true
> of Cache, GT.M or Globals which have been continually able to embrace,
> adapt to, and benefit from most of the major changes in the IT
> landscape over their lifetime.

Sounds cool.

Realizing that Globals is actually quite low-level I think it would be
useful for caching data and web session storage. Since it's not a
network service it doesn't actually replace redis or memcached out of
the box but you could wrap it with your own node.js server to provide
a shared web cache or session storage, which is a typical use for
memcached.

rtweed

unread,
Feb 5, 2012, 9:26:56 AM2/5/12
to nodejs

> Realizing that Globals is actually quite low-level I think it would be
> useful for caching data and web session storage.  Since it's not a
> network service it doesn't actually replace redis or memcached out of
> the box but you could wrap it with your own node.js server to provide
> a shared web cache or session storage, which is a typical use for
> memcached.

..correct, and this is exactly how our EWD web app framework uses
global storage, and very fast and effective it is too. The ewdGateway
module provides a Node interface to a Cache or GT.M system using EWD:

https://github.com/robtweed/ewdGateway


