Intrinsic datastores for Node.js

854 views
Skip to first unread message

Juraj Vitko

unread,
Feb 21, 2012, 4:09:51 AM2/21/12
to nodejs
https://github.com/ypocat/nodejsdb (or http://nodejsdb.com)

tl;dr - There are standalone database products (free or not), and
that's perfectly cool, but we already know how that works, so let's
try something different now.

The general idea is to get Node.js and a data storage engine into a
tighter relationship, primarily to have more control of the data, but
also simpler stack, and even higher performance in accessing the data.

I'm using the name "Intrinsic" because "In-process" is not exactly
accurate. E.g. there may be a shared-memory implementation shared by
multiple Nodes, or synchronized in-process implementation shared by
different Node Isolates (if these make it into Node), etc.

I really like the base concept of Redis, because it provides simple,
reliable, predictable and fast primitive building blocks (in the form
of commands) which can support various app logic strategies, and it's
not hiding the complexities and overheads of storing and querying
data, that more complex DB's do. (So you are more likely to have more
stable production in the end, instead of fiascos with overflowing
shards etc.)

This is also a vague follow-up to this discussion (in this group)
http://goo.gl/mDWqR - although I believe we should not insist only on
in-memory implementations at this time.

As for the basic set of basic data structures and operations, that I
believe would support the above, I think we need:

1) fast unordered Hash Map (key, value) (candidate: http://code.google.com/p/sparsehash/)

2) Ordered Map (with minimal empty 'value' overhead to allow for
Ordered Set implementation if someone wants it) (candidate:
http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)

3) a list that can be used for FIFO, LIFO, stack, etc. - probably
something close STL's Deque. (http://www.cplusplus.com/reference/stl/
deque/)

I think the API for the above should be as simple as possible, so that
we can have multiple implementations and various optimizations later,
while keeping the amount of needed work down. Also, terse API is
simple to use.

From Node, we could do something like:

require('a-nodejsdb-impl').open('/path/db', function(err, db) {
var users = db.map('users');
var users_ordered_by_email = db.smap('users_by_email');
users.on('put', function(k, v) {
users_ordered_by_email.put(v.email, k);
});
users.put(1234, { fname: 'john', lname: 'smith', email: 'a@b.c' });
}

..which implements a basic User table with primary key on ID and
ordered index on Email. (The difference being that it gives you 200k
operations per second and you don't need a separate DB server.)

So if you guys have any constructive input regarding this, please post
it here.

Dave Clements

unread,
Feb 21, 2012, 7:45:17 AM2/21/12
to nodejs
I don't know if this is possible or feasible, but from a usability
perspective a fully transparent intrinsic database would be
outstanding:

require('memorymodder');
var users = { fname: 'john', lname: 'smith', email: '...@b.c' };


where memorymodder alters the underlying v8 storage structure, so
objects arrays strings etc. seamlessly rely on a Redis like db. id
assignments etc would be automated, but if we wanted to set one we
could extend the prototypes like {data: here}.id('manualid_123');

dave

On Feb 21, 9:09 am, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> https://github.com/ypocat/nodejsdb(orhttp://nodejsdb.com)
>
> tl;dr - There are standalone database products (free or not), and
> that's perfectly cool, but we already know how that works, so let's
> try something different now.
>
> The general idea is to get Node.js and a data storage engine into a
> tighter relationship, primarily to have more control of the data, but
> also simpler stack, and even higher performance in accessing the data.
>
> I'm using the name "Intrinsic" because "In-process" is not exactly
> accurate. E.g. there may be a shared-memory implementation shared by
> multiple Nodes, or synchronized in-process implementation shared by
> different Node Isolates (if these make it into Node), etc.
>
> I really like the base concept of Redis, because it provides simple,
> reliable, predictable and fast primitive building blocks (in the form
> of commands) which can support various app logic strategies, and it's
> not hiding the complexities and overheads of storing and querying
> data, that more complex DB's do. (So you are more likely to have more
> stable production in the end, instead of fiascos with overflowing
> shards etc.)
>
> This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR- although I believe we should not insist only on
> in-memory implementations at this time.
>
> As for the basic set of basic data structures and operations, that I
> believe would support the above, I think we need:
>
> 1) fast unordered Hash Map (key, value) (candidate:http://code.google.com/p/sparsehash/)
>
> 2) Ordered Map (with minimal empty 'value' overhead to allow for
> Ordered Set implementation if someone wants it) (candidate:http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> something close STL's Deque. (http://www.cplusplus.com/reference/stl/
> deque/)
>
> I think the API for the above should be as simple as possible, so that
> we can have multiple implementations and various optimizations later,
> while keeping the amount of needed work down. Also, terse API is
> simple to use.
>
> From Node, we could do something like:
>
> require('a-nodejsdb-impl').open('/path/db', function(err, db) {
>   var users = db.map('users');
>   var users_ordered_by_email = db.smap('users_by_email');
>   users.on('put', function(k, v) {
>     users_ordered_by_email.put(v.email, k);
>   });
>   users.put(1234, { fname: 'john', lname: 'smith', email: '...@b.c' });

Juraj Vitko

unread,
Feb 21, 2012, 9:31:54 AM2/21/12
to nodejs
I would see V8 as a further possible improvement, but I think that for
getting new ideas implemented quickly, native-addons + NPM is the best
way as of now.

The overhead for calling the addon doesn't seem to be that bad either
- e.g. if you look at rawhash (https://github.com/pconstr/rawhash) -
all the addon hashing methods were faster than the native v8 Object,
quoting:

150K operations:

Sparse 509 ms
Dense 330 ms
Map 463 ms
{} 754 ms
> > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-although I believe we should not insist only on

@siculars

unread,
Feb 21, 2012, 11:25:51 AM2/21/12
to nodejs
I am sure your intentions are well placed but I, for one, would
absolutely hate to see nodejs bloated with any notion of an internal
database. IMHO, this would be an absolute boondoggle existing simply
to introduce never ending bug reports, instability in the bounded
memory model and distract core contributors by, potentially, pulling
them away from their core competencies. You are correct, redis is a
great model. If you want something like it, simply endeavor to find a
way to link it and wrap a more "native" api around that.

Best,
Alexander

On Feb 21, 4:09 am, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> https://github.com/ypocat/nodejsdb(orhttp://nodejsdb.com)
>
> tl;dr - There are standalone database products (free or not), and
> that's perfectly cool, but we already know how that works, so let's
> try something different now.
>
> The general idea is to get Node.js and a data storage engine into a
> tighter relationship, primarily to have more control of the data, but
> also simpler stack, and even higher performance in accessing the data.
>
> I'm using the name "Intrinsic" because "In-process" is not exactly
> accurate. E.g. there may be a shared-memory implementation shared by
> multiple Nodes, or synchronized in-process implementation shared by
> different Node Isolates (if these make it into Node), etc.
>
> I really like the base concept of Redis, because it provides simple,
> reliable, predictable and fast primitive building blocks (in the form
> of commands) which can support various app logic strategies, and it's
> not hiding the complexities and overheads of storing and querying
> data, that more complex DB's do. (So you are more likely to have more
> stable production in the end, instead of fiascos with overflowing
> shards etc.)
>
> This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR- although I believe we should not insist only on
> in-memory implementations at this time.
>
> As for the basic set of basic data structures and operations, that I
> believe would support the above, I think we need:
>
> 1) fast unordered Hash Map (key, value) (candidate:http://code.google.com/p/sparsehash/)
>
> 2) Ordered Map (with minimal empty 'value' overhead to allow for
> Ordered Set implementation if someone wants it) (candidate:http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> something close STL's Deque. (http://www.cplusplus.com/reference/stl/
> deque/)
>
> I think the API for the above should be as simple as possible, so that
> we can have multiple implementations and various optimizations later,
> while keeping the amount of needed work down. Also, terse API is
> simple to use.
>
> From Node, we could do something like:
>
> require('a-nodejsdb-impl').open('/path/db', function(err, db) {
>   var users = db.map('users');
>   var users_ordered_by_email = db.smap('users_by_email');
>   users.on('put', function(k, v) {
>     users_ordered_by_email.put(v.email, k);
>   });
>   users.put(1234, { fname: 'john', lname: 'smith', email: '...@b.c' });

Juraj Vitko

unread,
Feb 21, 2012, 2:15:18 PM2/21/12
to nodejs
I think you have misunderstood me - this is not about extending Node
or V8 directly. It's about extending it with native addons, just like
there are hundreds in NPM already.
In fact, I'm saying that I think it would be counterproductive to hack
V8 (or Node, for that matter), as this is perfectly implementable as a
native addon.
It's also why I posted it into "nodejs", and not "nodejs-dev".
> > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-although I believe we should not insist only on

Liam

unread,
Feb 21, 2012, 2:53:17 PM2/21/12
to nodejs
Any insight on how Redis deployments enable "intrinsic" access to
cached data currently? Is there a shmem implementation of the API?

On Feb 21, 1:09 am, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> https://github.com/ypocat/nodejsdb(orhttp://nodejsdb.com)
>
> tl;dr - There are standalone database products (free or not), and
> that's perfectly cool, but we already know how that works, so let's
> try something different now.
>
> The general idea is to get Node.js and a data storage engine into a
> tighter relationship, primarily to have more control of the data, but
> also simpler stack, and even higher performance in accessing the data.
>
> I'm using the name "Intrinsic" because "In-process" is not exactly
> accurate. E.g. there may be a shared-memory implementation shared by
> multiple Nodes, or synchronized in-process implementation shared by
> different Node Isolates (if these make it into Node), etc.
>
> I really like the base concept of Redis, because it provides simple,
> reliable, predictable and fast primitive building blocks (in the form
> of commands) which can support various app logic strategies, and it's
> not hiding the complexities and overheads of storing and querying
> data, that more complex DB's do. (So you are more likely to have more
> stable production in the end, instead of fiascos with overflowing
> shards etc.)
>
> This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR- although I believe we should not insist only on
> in-memory implementations at this time.
>
> As for the basic set of basic data structures and operations, that I
> believe would support the above, I think we need:
>
> 1) fast unordered Hash Map (key, value) (candidate:http://code.google.com/p/sparsehash/)
>
> 2) Ordered Map (with minimal empty 'value' overhead to allow for
> Ordered Set implementation if someone wants it) (candidate:http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> something close STL's Deque. (http://www.cplusplus.com/reference/stl/
> deque/)
>
> I think the API for the above should be as simple as possible, so that
> we can have multiple implementations and various optimizations later,
> while keeping the amount of needed work down. Also, terse API is
> simple to use.
>
> From Node, we could do something like:
>
> require('a-nodejsdb-impl').open('/path/db', function(err, db) {
>   var users = db.map('users');
>   var users_ordered_by_email = db.smap('users_by_email');
>   users.on('put', function(k, v) {
>     users_ordered_by_email.put(v.email, k);
>   });
>   users.put(1234, { fname: 'john', lname: 'smith', email: '...@b.c' });

Juraj Vitko

unread,
Feb 21, 2012, 3:08:16 PM2/21/12
to nodejs
Redis is a standalone DB, accessed either by TCP/IP or Unix domain
sockets (I haven't seen a Node.js driver for the latter though).

Quick google revealed (glad you asked, I did not know about these):
http://code.google.com/p/redis/issues/detail?id=276 [Feature Request]
redis as embedded database [Open]
http://code.google.com/p/redis/issues/detail?id=231 Feature request:
support for unix domain sockets [Fixed]

I think it would not be entirely straightforward to use Redis as an
embedded DB, because it uses fork() to implement the snapshotting, but
that's just my guess.

Anyway, I'm kind of trying to see what would people consider a good
base API. I assume this API would be much leaner than the one in Redis
(http://redis.io/commands), as in Node one would be able to manipulate
data and/or Buffers more directly, not needing e.g. BRPOPLPUSH and the
like.



On Feb 21, 9:53 pm, Liam <networkimp...@gmail.com> wrote:
> Any insight on how Redis deployments enable "intrinsic" access to
> cached data currently? Is there a shmem implementation of the API?
>
> On Feb 21, 1:09 am, Juraj Vitko <juraj.vi...@gmail.com> wrote:
>
>
>
>
>
>
>
> >https://github.com/ypocat/nodejsdb(orhttp://nodejsdb.com)
>
> > tl;dr - There are standalone database products (free or not), and
> > that's perfectly cool, but we already know how that works, so let's
> > try something different now.
>
> > The general idea is to get Node.js and a data storage engine into a
> > tighter relationship, primarily to have more control of the data, but
> > also simpler stack, and even higher performance in accessing the data.
>
> > I'm using the name "Intrinsic" because "In-process" is not exactly
> > accurate. E.g. there may be a shared-memory implementation shared by
> > multiple Nodes, or synchronized in-process implementation shared by
> > different Node Isolates (if these make it into Node), etc.
>
> > I really like the base concept of Redis, because it provides simple,
> > reliable, predictable and fast primitive building blocks (in the form
> > of commands) which can support various app logic strategies, and it's
> > not hiding the complexities and overheads of storing and querying
> > data, that more complex DB's do. (So you are more likely to have more
> > stable production in the end, instead of fiascos with overflowing
> > shards etc.)
>
> > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-although I believe we should not insist only on

Liam

unread,
Feb 21, 2012, 3:40:53 PM2/21/12
to nodejs
For API, I'd suggest mirroring the Redis API for a first draft. Are
there particular features you have in mind which it doesn't provide?

Also, before investing a ton of time, it'd be wise to explore just how
much more performance can be obtained via shmem access to cached data
vs domain sockets.

Assuming it's a lot, then I'd guess you'd want to query the Redis guys
as to whether they've considered a shmem IPC scheme.

On Feb 21, 12:08 pm, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> Redis is a standalone DB, accessed either by TCP/IP or Unix domain
> sockets (I haven't seen a Node.js driver for the latter though).
>
> Quick google revealed (glad you asked, I did not know about these):http://code.google.com/p/redis/issues/detail?id=276[Feature Request]
> redis as embedded database [Open]http://code.google.com/p/redis/issues/detail?id=231Feature request:
> > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughI believe we should not insist only on

Joshua Holbrook

unread,
Feb 21, 2012, 4:09:53 PM2/21/12
to nod...@googlegroups.com
Have you guys heard of https://github.com/hij1nx/eventvat ? It's
supposed to be pretty redis-like.

--Josh

> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en

--
Joshua Holbrook
Engineer
Nodejitsu Inc.
jo...@nodejitsu.com

Juraj Vitko

unread,
Feb 21, 2012, 5:45:14 PM2/21/12
to nodejs
Haha, that's one of the reasons for this post - to discover more
existing implementations. I wasn't aware of this one at all, thanks, I
will add it to the nodejsdb.com page.

But in my own implementation I would probably be looking at a much
smaller set of commands, e.g. why to have .decr() command when the API
is synchronous and we can do map.put(key, --map.get(key))
(yeah .decr() is probably a tad faster but you get my point).

Funny, I was just writing config & deployment for a Redis I was going
to install on my aws site - now I don't have to do it as I can use
EventVat (funny name btw:).


On Feb 21, 11:09 pm, Joshua Holbrook <josh.holbr...@gmail.com> wrote:
> Have you guys heard ofhttps://github.com/hij1nx/eventvat? It's
> >> > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughIbelieve we should not insist only on
> j...@nodejitsu.com

Juraj Vitko

unread,
Feb 21, 2012, 5:47:50 PM2/21/12
to nodejs
Whoops, I'm too tired already - did not realize that EventVat is a JS
implementation (not a native addon), thus the V8 object limits apply,
so not really a go. But good to know about.


On Feb 22, 12:45 am, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> Haha, that's one of the reasons for this post - to discover more
> existing implementations. I wasn't aware of this one at all, thanks, I
> will add it to the nodejsdb.com page.
>
> But in my own implementation I would probably be looking at a much
> smaller set of commands, e.g. why to have .decr() command when the API
> is synchronous and we can do map.put(key, --map.get(key))
> (yeah .decr() is probably a tad faster but you get my point).
>
> Funny, I was just writing config & deployment for a Redis I was going
> to install on my aws site - now I don't have to do it as I can use
> EventVat (funny name btw:).
>
> On Feb 21, 11:09 pm, Joshua Holbrook <josh.holbr...@gmail.com> wrote:
>
>
>
>
>
>
>
> > Have you guys heard ofhttps://github.com/hij1nx/eventvat?It's
> > >> > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughIbelievewe should not insist only on

Juraj Vitko

unread,
Feb 21, 2012, 5:56:29 PM2/21/12
to nodejs
Liam, as I describe in my original post, I think Redis is too verbose
already, and many of the commands would not be necessary for an in-
process Node.js addon DB.

Having a shmem access to Redis would be interesting, as would be to
actually measure the overhead of the domain socket interface to Redis,
however even if the speed would be the same as with embedded DB -- the
speed is not everything I'm aiming for here - but things like zero-
config, the simplicity of software stack, and easy replacement of
various implementations having the same base API. (Check the
nodejsdb.com page where I write about this as well.)
> > > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughIbelieve we should not insist only on

Liam

unread,
Feb 21, 2012, 7:09:34 PM2/21/12
to nodejs
I assume you know that writing a robust, persistent datastore is a big
undertaking. If what you're aiming for here is a memory-only db, then
draft any API that suits you. (And maybe consider a synchronous API to
a memory-only SQLite db?) I'd guess you'll need a draft API to attract
feedback. Also keep in mind that there's overhead in crossing the
Javascript/C++ boundary in V8.

For a persistent db, you're more likely to make something generally
useful by building on an existing system. I'm sure you can bundle
Redis with your native module and make configuration easy.
> > > > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughIbelievewe should not insist only on

Dobes

unread,
Feb 21, 2012, 11:05:39 PM2/21/12
to nodejs
There was recently some discussion around the "globals" database which
is a native node.js module providing a sorted key/value store you
could build some other kinds of database on top of.

Berkeley DB http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html
has an embedded database that has been popular in the past. It hasn't
been ported to a node.js module yet, but it could be a good
candidate. It allows you to operate as a simple key/value store or as
a transactional, distributed database.

In response to another poster, I'd be wary of database systems that
just try to "transparently" persist objects you're changing because
you're opening up a can of worms when it comes to concurrency control,
unless you only allow single-process access to the database.

If you look to other programming languages and stack you can see a
variety of in-process databases available, but they aren't as popular
as databases that run separately and are accessed via an API. Off the
top of my head I'd guess that is because:

1. People want to run multiple applications that share the same
database, or a cluster where there are more app server instances than
database instances. In this case the database can operate more
robustly in a single process handling things like caching, concurrency
control, and so on. Even if you have only a single web server
instance you often want to run background jobs and workers that need
concurrent access.
2. When you start to split your application across multiple pieces of
hardware (or just multiple virtual machines) you're going to have to
introduce a TCP API of some sort anyway, so your in-process database
has a limited lifespan anyway

I think there are a class of applications that do make good use of
embedded databases - for example, desktop applications. When you have
a single-tenant database running on just one computer, the embedded
database makes sense. This is where you see embedded databases like
Berkeley DB and SQLite being used quite commonly. Even applications
people consider to be "document" applications like Microsoft Word are
actually now using some kind of database engine under the hood.

If an embedded database is suitable for your future plans, it's
probably best to wrap an existing battle-tested library like SQLLite
and BerkeleyDB (there are a few others, too, whose names I cannot
recall). Or use one that's already wrapped like GlobalsDB if it suits
your needs.

Cheers,

Dobes


On Feb 21, 5:09 pm, Juraj Vitko <juraj.vi...@gmail.com> wrote:
> https://github.com/ypocat/nodejsdb(orhttp://nodejsdb.com)
>
> tl;dr - There are standalone database products (free or not), and
> that's perfectly cool, but we already know how that works, so let's
> try something different now.
>
> The general idea is to get Node.js and a data storage engine into a
> tighter relationship, primarily to have more control of the data, but
> also simpler stack, and even higher performance in accessing the data.
>
> I'm using the name "Intrinsic" because "In-process" is not exactly
> accurate. E.g. there may be a shared-memory implementation shared by
> multiple Nodes, or synchronized in-process implementation shared by
> different Node Isolates (if these make it into Node), etc.
>
> I really like the base concept of Redis, because it provides simple,
> reliable, predictable and fast primitive building blocks (in the form
> of commands) which can support various app logic strategies, and it's
> not hiding the complexities and overheads of storing and querying
> data, that more complex DB's do. (So you are more likely to have more
> stable production in the end, instead of fiascos with overflowing
> shards etc.)
>
> This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR- although I believe we should not insist only on
> in-memory implementations at this time.
>
> As for the basic set of basic data structures and operations, that I
> believe would support the above, I think we need:
>
> 1) fast unordered Hash Map (key, value) (candidate:http://code.google.com/p/sparsehash/)
>
> 2) Ordered Map (with minimal empty 'value' overhead to allow for
> Ordered Set implementation if someone wants it) (candidate:http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> something close STL's Deque. (http://www.cplusplus.com/reference/stl/
> deque/)
>
> I think the API for the above should be as simple as possible, so that
> we can have multiple implementations and various optimizations later,
> while keeping the amount of needed work down. Also, terse API is
> simple to use.
>
> From Node, we could do something like:
>
> require('a-nodejsdb-impl').open('/path/db', function(err, db) {
>   var users = db.map('users');
>   var users_ordered_by_email = db.smap('users_by_email');
>   users.on('put', function(k, v) {
>     users_ordered_by_email.put(v.email, k);
>   });
>   users.put(1234, { fname: 'john', lname: 'smith', email: '...@b.c' });

Juraj Vitko

unread,
Feb 22, 2012, 5:04:26 AM2/22/12
to nodejs
Dobes, good that you brought up the Globals DB here - when compiling a
list of existing Node.js native addon DB implementations for the
nodejsdb.com page, I did not include the embedded Globals because
setting it up seemed not straightforward - e.g. you need a Linux
machine and do some setup steps outside of the `npm install` step. I
haven't used Globals yet and from their page I don't really understand
it, but I'm sure it would be good to have it as an in-process /
intrinsic option.

Re: your reasoning about standalone DBs - that reasoning is of course
sound - but what I'm proposing here is finding a basic/primitive set
of DB building blocks, which, besides allowing implementation of
simplistic intrinsic/in-process DBs with minimal config and easy
deployment etc., also allowing implementation (and more importantly,
JS-like evolution) of standalone database products based on Node.js.
One can imagine this being similar to bolting V8 on MySQL, or Lua on
Redis, but from the other way - bolting a DB on Node, in the form of a
few function in a native addon, and implementing all the complex logic
in JS. (Perhaps later implementing proven concepts in github.com/d5/
node.native).

(On the nodejsdb.com page I'm listing the implementations I'm
currently aware of. Here I'm trying to get a random input from the
Node.js community. Also, by talking about this, I'm less likely to
shelve this into the "don't have time" bucket.)

As I'm saying above, I think a lot could be done on top of a very
simple key-value + ordered-key-value + fast-list APIs. E.g. if someone
wanted to write an ACID SQL engine on top of it, they should be able
to.

It just occurred to me today that ordered map could in fact be
(internally) implemented on top of a fast unordered key-value map, by
storing the red-black tree nodes into it as key-value pairs, where the
value is [left-key,right-key,data]. This could even make a lot of
sense on cache-optimized key-value implementations like the one I
spotted today: http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf


On Feb 22, 6:05 am, Dobes <dob...@gmail.com> wrote:
> There was recently some discussion around the "globals" database which
> is a native node.js module providing a sorted key/value store you
> could build some other kinds of database on top of.
>
> Berkeley DBhttp://www.oracle.com/technetwork/database/berkeleydb/overview/index....
> > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-although I believe we should not insist only on

Juraj Vitko

unread,
Feb 22, 2012, 5:53:37 AM2/22/12
to nodejs
OK, turns out that the DB in the paper (http://read.seas.harvard.edu/
~kohler/pubs/mao12cache.pdf) actually is a tree one already (supports
range queries while being faster than memcache, mongo, redis, etc.). I
haven't read the paper fully, but I assume it will support ordered
iteration as well. Good times, great day.
> > > This is also a vague follow-up to this discussion (in this group)http://goo.gl/mDWqR-althoughI believe we should not insist only on

Alexey Petrushin

unread,
Feb 22, 2012, 1:52:33 PM2/22/12
to nod...@googlegroups.com
> Anyway, I'm kind of trying to see what would people consider a good 
base API
I believe CouchDB like API is quite satisfying. In its original form it enforces constraints like: 
- Map/Reduce as tool for building indexes
- only one document can be used at a time for building indexes
- documents are immutable

But, in our case performance requirements and the properties of data are different. We can drop these constraints and use arbitrary functions to build indexes, grant it access to multiple documents and allow it also to change existing documents. 
I believe such API will cover almost all use cases, and the code itself also should be simple.

One more interesting project - LevelDB, and its node binding

Alexey Petrushin

unread,
Feb 22, 2012, 2:05:02 PM2/22/12
to nod...@googlegroups.com
Basically to do so we need:
- an ordered b-tree, supporting sequential access and range queries
- non-blocking persistence (preserving db integrity in crush case)
- integrity of indexes (indexes also stored in b-tree, so, we basically need some sort of transactions for storing documents in multiple b-tree)
- garbage collection for indexes
- and, it would be nice to have lazy loading

Sadly, I have no idea how to do such things in Node.JS :)

David Cope

unread,
Feb 22, 2012, 2:07:24 PM2/22/12
to nod...@googlegroups.com
Someone just slashdotted this http://hyperdex.org/, first time I've seen it. No node bindings, but it looks promising.


Dean Landolt

unread,
Feb 22, 2012, 2:10:42 PM2/22/12
to nod...@googlegroups.com
What you're looking for are fingertrees [1]. They're badass. I have an implementation nearly complete in javascript (still have to figure out how to do the split function properly). I'll toss it up on github when I get a chance. 

Alexey Petrushin

unread,
Feb 22, 2012, 2:13:53 PM2/22/12
to nod...@googlegroups.com
Hmm, maybe we don't need index integrity, after all if each dataset is independent and small we can rebuild indexes it every time we loading DB in memory.

In such case non-blocking robust persistence can be achieved with storing database as append-only journal. So, it seems that we basically need:

Juraj Vitko

unread,
Feb 22, 2012, 2:37:17 PM2/22/12
to nodejs
The problem with tree (ordered) maps is that they usually take more
space and are slower than hash maps. So that's why I proposed a
separate Map and OrderedMap.

Plus, often times one needs to store a sequential structure, e.g. to
implement a Queue, or Stack, etc. It's possible to implement a list on
top of ordered map though, but one has to generate unique keys.

Doubly linked list takes up too much memory, and Array is terrible for
insertion, but C++ STL's Deque solves this by providing a list
consisting of multiple Array chunks.

HOWEVER - the paper that I noticed today (http://goo.gl/71aFc) - seems
to offer a cool ordered Map structure that also happens to be faster
than any Hash Map currently in use. It also does this in SMP way, so
multiple Node instances in local Node Cluster could even use such DB
together. Let's hope they don't slap a patent on it.

Re: the CouchDB - I'd imagine this is already a higher level
functionality - something that people should be able to implement and
evolve on top of the basic primitive operations,
like .put(k,v), .get(k), .range(k1, k2).

I'm looking for a good paper or book pondering complex schema design
on key/value stores, but haven't had much time to do a proper googling
for it yet.


On Feb 22, 9:13 pm, Alexey Petrushin <alexey.petrus...@gmail.com>
wrote:

Juraj Vitko

unread,
Feb 22, 2012, 2:42:10 PM2/22/12
to nodejs
LevelDB I already have in the links up there (http://nodejsdb.com),
this is what I got (copy&paste):

Rawhash - In-memory key:value cache where keys are binary Buffers mem,
kv
Node-LevelDB - NodeJS bindings to levelDB, with SSTable disk storage
approach disk, kv
node-cask - Bitcask clone for node, based on node-mmap disk, kv
node-gdbm - interface to GNU GDBM disk, kv
node-sqlite3 - "Asynchronous, non-blocking SQLite3 bindings for
Node.js" disk, sql
node-memcache - in-process memcached for Node.js mem, kv

If anyone knows of more native in-process datastore Node.js addons, or
even non-native ones, pls let me know.


On Feb 22, 8:52 pm, Alexey Petrushin <alexey.petrus...@gmail.com>
wrote:

Dean Landolt

unread,
Feb 22, 2012, 3:14:29 PM2/22/12
to nod...@googlegroups.com
On Wed, Feb 22, 2012 at 2:37 PM, Juraj Vitko <juraj...@gmail.com> wrote:
The problem with tree (ordered) maps is that they usually take more
space and are slower than hash maps. So that's why I proposed a
separate Map and OrderedMap.

Plus, often times one needs to store a sequential structure, e.g. to
implement a Queue, or Stack, etc. It's possible to implement a list on
top of ordered map though, but one has to generate unique keys.

Doubly linked list takes up too much memory, and Array is terrible for
insertion, but C++ STL's Deque solves this by providing a list
consisting of multiple Array chunks.

 
Yes, deques are great, but you'll probably also want fast indexed access. Then you'll find you may need (sublinear) concat, slice, splice, reverse, etc. Finger trees are a general purpose data structure that let you compose collections that do all of things quite easily. Oh, and it has this other advantage of being persistent (read: immutable), so you also get some other neat advantages (everything's a snapshot -- it can't mutate on you even if you dip into the event loop). Using structural sharing it's able to offer all of the above handy operations while returning you what appears to be a completely new list, and very quickly.

And this immutability has another neat benefit: you could pass these immutable objects around between "worker" threads within the same process freely. It's a shame node-core had to punt on threaded workers but that doesn't mean this can't be done from a module. It's possible libuv may never be threadsafe, but no bother -- being able to safely share certain types of objects between threads means  the worker environment could be limited to pure compute. Thus, you could theoretically build a kickass fibanocci benchmark, and all would finally be right with the world. 

 
HOWEVER - the paper that I noticed today (http://goo.gl/71aFc) - seems
to offer a cool ordered Map structure that also happens to be faster
than any Hash Map currently in use. It also does this in SMP way, so
multiple Node instances in local Node Cluster could even use such DB
together. Let's hope they don't slap a patent on it.
 

This looks really cool. Although it's important to remember that there are a number of ways to accomplish this. I haven't looked closely at this paper yet but I bet clojure's shallow tries have similar qualities and ought to port to javascript pretty cleanly.

But yeah, there's a ton of interesting data structure goodness out there the javascript community needs to start stealing :)

 
Re: the CouchDB - I'd imagine this is already a higher level
functionality - something that people should be able to implement and
evolve on top of the basic primitive operations,
like .put(k,v), .get(k), .range(k1, k2).

 
Much of what couch does is commodity, yes, but there is something particularly unique about their b+tree implementation. It will cache calculated reduction values in its interior nodes so they don't have to be run over and over. This can be really handy but it's not without its problems. Interestingly, fingertrees let you do the same thing in a more flexible yet correct manner (composable "meters" that measure and persist any data you want about children..."monoids" FTW). This feature is what allows you to do all the things I mentioned above.

 
I'm looking for a good paper or book pondering complex schema design
on key/value stores, but haven't had much time to do a proper googling
for it yet.


If you find anything I'd be really interested in hearing about it.

Juraj Vitko

unread,
Feb 22, 2012, 3:34:06 PM2/22/12
to nodejs
Clojure looks quite intriguing to me, I wish day had at least 48 hours
so that I can go over all the things I want:)

Have you considered quick and dirty port of the Finger Trees using
ClojureScript? Actually I've noticed quite a lot JDK dependencies in
that code, so not sure..

From a quick scan of a relevant paper on Finger Trees (http://
www.soi.city.ac.uk/~ross/papers/FingerTree.pdf):

"As search trees, finger trees seem to be 3 to 5 times slower than the
best balanced binary tree implementations. Nevertheless, they may be
competitive when a wider set of operations is required."

That said, they look very interesting, considering all the reasons you
stated. Also, the paper is from 2006, I'm sure optimizations are
constantly being added.

Pls let me know when you port it over so I can add you to the
nodejsdb.com page.

Also, given the current V8 object limits (I've seen it choke on ~40m
objs), do you consider this as a limitation for you implementation?
Are you considering to make it a native module later perhaps? Or is
this something that could be implemented on top of the ordered map API
I'm pondering here?


On Feb 22, 10:14 pm, Dean Landolt <d...@deanlandolt.com> wrote:

Juraj Vitko

unread,
Feb 22, 2012, 6:13:02 PM2/22/12
to nodejs
Very interesting. I just had a bit longer read of their website and
paper than I wanted, and it looks like they are only beating Redis/
Mongo/Cassandra in a cluster config, and only because of a new Cluster
hashing/addressing technique.

This Slashdot comment seems to suggest that as well:
http://hardware.slashdot.org/comments.pl?sid=2686615&cid=39128091

And another one that stood out to me (single point of failure):
http://hardware.slashdot.org/comments.pl?sid=2686615&cid=39128435

Anyway, the thing is pretty cool as it is.

Imagine using the Locality sensitive hashing[1] to address a
distributed cluster of Nodes (with capital N), where each Node.js (or
local cluster of Nodes.js) would be using the in-process/intrinsic
datastore based on the Fast Multicore Key-Value paper[2]. Cluster
logic, query algorithms, configuration, web-service APIs, monitoring
tools, etc. would be left as a tasty prey to the hungry packs of JS
coder wolves of the Node.js community:) I think the idea is shaping up
nicely here.

[1] http://en.wikipedia.org/wiki/Locality-sensitive_hashing
[2] http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf


On Feb 22, 9:07 pm, David Cope <davidandrewc...@gmail.com> wrote:
> Someone just slashdotted thishttp://hyperdex.org/, first time I've seen it. No node bindings, but it looks promising.

mgutz

unread,
Feb 23, 2012, 6:01:07 PM2/23/12
to nod...@googlegroups.com
Node already has a boondongle---Windows. Sorry, couldn't resist.

Juraj Vitko

unread,
Feb 23, 2012, 6:07:21 PM2/23/12
to nodejs
LOL. (But truth is, from my original post it's not clear that I did
not mean to extend Node or v8 directly, but instead via native
addons.)
Reply all
Reply to author
Forward
0 new messages