High level database abstraction


Tim Caswell

Dec 9, 2009, 1:04:56 AM12/9/09
to nod...@googlegroups.com
I've been playing around with a high-level database abstraction, something along the lines of DataMapper/MongoMapper/ActiveRecord. The backends I wish to target are PostgreSQL, SQLite, MongoDB, and a simple JSON-in-directory-of-flat-files thing I made up.

As far as the final implementation goes, I'm a ways off. The pure-JS Postgres backend is done enough to be useful in its own right, and pretty fast now that it uses the raw encoding type. I've read through the Mongo wire protocol and it doesn't seem much harder than the one for Postgres, at least for implementing the bare parts I need. SQLite seems more of a problem, since it requires writing a C extension to node, and I'm not even sure its async interface is async enough for node. So I invented a fourth type. It works on the local filesystem like SQLite, but will be a hacked-up flat-file system: basically collections of JSON documents stored in separate files. This is still in the planning stages. The advantage is super easy setup; it's written in pure JavaScript and doesn't need a separate running daemon.

For the high-level abstraction, I want something consistent across all four backends, but able to take advantage of their individual strengths. PostgreSQL and SQLite are both relational systems with static schemas; they will work quite differently internally from something document-based like MongoDB or my JsonDB. I'm wondering if having a common API for both of these backend types is even a good idea. I'm thinking either to not allow arbitrary fields in the relational ones, or to automatically add a _json_ field and serialize the extra columns there.
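The second option might look something like this sketch (the function names are invented for illustration): fields that match the static schema become real columns, and everything else round-trips through one serialized `_json_` field.

```javascript
// Split a document into real columns plus a _json_ overflow column.
function splitRow(schema, doc) {
  var row = {}, extra = {};
  for (var key in doc) {
    if (schema.indexOf(key) >= 0) row[key] = doc[key];
    else extra[key] = doc[key];
  }
  row._json_ = JSON.stringify(extra);
  return row;
}

// Reassemble the original document from a fetched row.
function mergeRow(row) {
  var doc = JSON.parse(row._json_ || '{}');
  for (var key in row) {
    if (key !== '_json_') doc[key] = row[key];
  }
  return doc;
}
```

The trade-off is that the overflow fields can't be indexed or queried by the relational engine, only stored and retrieved.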

Well, here is some sample code using my made-up API. I would love questions and/or comments.

http://gist.github.com/252299

The project will live here for now:

http://github.com/creationix/persistance

Chris Winberry

Dec 9, 2009, 2:32:08 PM12/9/09
to nod...@googlegroups.com

What about using a process to run the SQLite command line tool?


var sys = require("sys");

var dbproc = process.createChildProcess("bash");
dbproc.queue = [];
dbproc.callback = null;
dbproc.buffer = "";

dbproc.addListener("output", function (data) {
  this.buffer += data;
  if (this.buffer.indexOf("<<<<done>>>>") > -1) {
    var tmpBuffer = this.buffer.replace(/<<<<done>>>>[\r\n]*/, "");
    this.buffer = "";
    var callback = this.callback;
    this.callback = null;
    callback(tmpBuffer);
    this.processQueue();
  }
});

dbproc.addListener("error", function (data) {
  sys.puts("dbproc:error: " + data);
});

dbproc.addListener("exit", function (data) {
  sys.puts("dbproc:exit: " + data);
});

dbproc.processQueue = function processQueue () {
  if (this.callback)
    return;
  if (!this.queue.length)
    return;
  var queueItem = this.queue.shift();
  this.callback = queueItem[1];
  this.write(queueItem[0] + ";echo \"<<<<done>>>>\"\n");
}

dbproc.addCommand = function addCommand (command, callback) {
  this.queue.push([command, callback]);
  this.processQueue();
}

dbproc.addCommand("sqlite3 test.db \"create table t1 (t1key INTEGER PRIMARY KEY,data TEXT,num double,timeEnter DATE);\"", function (result) {
});
dbproc.addCommand("sqlite3 test.db \"insert into t1 (data,num) values ('This is sample data',3);\"", function (result) {
});
dbproc.addCommand("sqlite3 test.db \"insert into t1 (data,num) values ('More sample data',6);\"", function (result) {
});
dbproc.addCommand("sqlite3 test.db \"insert into t1 (data,num) values ('And a little more',9);\"", function (result) {
});
dbproc.addCommand("sqlite3 test.db \"select * from t1 limit 2;\"", function (result) {
  sys.puts(result);
});


--

You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.



Tim Caswell

Dec 9, 2009, 2:50:36 PM12/9/09
to nod...@googlegroups.com
Interesting hack, I like it.  I wonder if we could speed it up some by making sqlite3 itself be the child process instead of bash.

Tautologistics

Dec 9, 2009, 3:58:41 PM12/9/09
to nodejs
One could, but it would require parsing the sqlite interactive
shell's prompt and formatting. I only had 10 minutes of spare time to
write a quick proof of concept.

Tim Caswell

Dec 9, 2009, 4:30:43 PM12/9/09
to nod...@googlegroups.com
After some testing, I found that if you send several queries in before the output gets caught, there will be no separator between the result sets at all. So each request needs to be a new sqlite3 process. Also, the only type information is null or non-null. The only way I can seem to get the column names is by turning headers on and parsing the first line as headers.

So basically it seems to work, but not as fast as I had hoped, and all data gets converted to strings. The data types may be OK, since we can use a higher-level abstraction that knows what the types should be and converts accordingly.
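Since everything comes back from the sqlite3 command line as strings, that higher-level conversion could be as simple as this sketch (the `castRow` name and type labels are invented for illustration):

```javascript
// Cast a row of string values using a known column-type map.
function castRow(types, row) {
  var out = {};
  for (var col in row) {
    var val = row[col];
    switch (types[col]) {
      case 'integer': out[col] = parseInt(val, 10); break;
      case 'float':   out[col] = parseFloat(val);   break;
      default:        out[col] = val;               // leave as text
    }
  }
  return out;
}
```

The type map would come from the abstraction layer's schema definition, since the CLI output itself carries no type information beyond null/non-null.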

I think that if I queue up the requests and not send another till the first has returned, it will fix the speed problem too. More testing...

http://gist.github.com/252838

Also, I know there is a race condition with the callback variable. I think queueing up the requests will fix that too.

Flinn Mueller

Dec 9, 2009, 4:38:51 PM12/9/09
to nod...@googlegroups.com
What about running a binding to SQLite in a thread, like Safari does for HTML5 DB support? It's probably much more complex to implement, but proven.

Chris Winberry

Dec 9, 2009, 4:49:44 PM12/9/09
to nod...@googlegroups.com
Strange, the way I set it up is that commands get queued, and a command will not get sent to the process unless the previous one has received the end token.

Can you post your code?
--
Sent from Gmail for mobile | mobile.google.com

Tim Caswell

Dec 9, 2009, 5:00:17 PM12/9/09
to nod...@googlegroups.com
I'm all for that, but I can't make heads or tails of the sqlite docs since I'm not very experienced with C code. So as a stopgap I'm writing a wrapper around the command-line sqlite3 in pure JS.

Tim Caswell

Dec 9, 2009, 5:18:58 PM12/9/09
to nod...@googlegroups.com
OK, my proof of concept is complete. Now it's time to start refining my db APIs to match.

http://gist.github.com/252838

I benchmarked it on my laptop and got 3,571 requests per second (running 10,000 requests in a single queue).

It's much slower when I run 100,000 queries instead of 10,000 (more than a factor of 10). I'm guessing it's the shift call on the queue object. On my machine node is using 97% CPU and sqlite3 about 3%.

Is there a faster queue implementation we can use? I know that JavaScript arrays aren't really arrays, so I imagine shift is a very costly method on large lists.

Flinn Mueller

Dec 9, 2009, 5:54:50 PM12/9/09
to nod...@googlegroups.com, nod...@googlegroups.com


On Dec 9, 2009, at 5:18 PM, Tim Caswell <t...@creationix.com> wrote:

> ok, my proof of concept is complete. Now it's time to start
> refining my db apis to match.
>
> http://gist.github.com/252838
>
> I benchmarked it on my laptop and got 3,571 requests per second
> (running 10,000 requests in a single queue).
>
> It's much slower when I run 100,000 queries instead of 10,000 (more
> than a factor of 10). I'm guessing its the shift call on the queue
> object. On my machine node is using 97% cpu and sqlite3 is about 3%.

It's interesting that you mention that. I noticed a significant
performance drop-off when testing node-tyrant between 10k and 100k put
and get operations. I still haven't dug into the source of the
slowdown, but all the ops completed relatively fast, while the script
took minutes after the ops before it finished. Is that what you are
seeing?

Greg Ritter

Dec 9, 2009, 6:47:45 PM12/9/09
to nod...@googlegroups.com
Shift/unshift definitely are costly on large arrays.

I found this as an example of an allegedly faster implementation of queues:

http://safalra.com/web-design/javascript/queues/

I can't vouch for it, as I haven't used it.



--
Greg Ritter - Codespin Inc. - (780) 716-GREG

Howard Rauscher

Dec 9, 2009, 7:10:57 PM12/9/09
to nod...@googlegroups.com
This is a whole lot faster if you are just using push and unshift on an array.

http://gist.github.com/252976


Howard Rauscher
how...@gmail.com
m: 214.957.1408

Tautologistics

Dec 9, 2009, 7:54:57 PM12/9/09
to nodejs
Shift/unshift do not scale. Object creation, however, is fast. I just
whipped this up for ya:

http://github.com/tautologistics/nodemisc/blob/master/arrayspeed.js

If you need a queue for a large number of items, arrays won't cut it...

function FastQueue () {
  this.head = null;
  this.tail = null;
}
FastQueue.prototype.push = function FastQueue__push (data) {
  var item = { data: data, prev: this.tail, next: null };
  if (this.tail)
    this.tail.next = item;
  else
    this.head = item;
  this.tail = item;
}
FastQueue.prototype.pop = function FastQueue__pop () {
  if (!this.tail)
    return(null);
  var item = this.tail;
  this.tail = item.prev;
  if (this.tail)
    this.tail.next = null;
  else
    this.head = null;
  return(item.data);
}
FastQueue.prototype.unshift = function FastQueue__unshift (data) {
  var item = { data: data, prev: null, next: this.head };
  if (this.head)
    this.head.prev = item;
  else
    this.tail = item;
  this.head = item;
}
FastQueue.prototype.shift = function FastQueue__shift () {
  if (!this.head)
    return(null);
  var item = this.head;
  this.head = item.next;
  if (this.head)
    this.head.prev = null;
  else
    this.tail = null;
  return(item.data);
}

Tim Caswell

Dec 9, 2009, 9:07:56 PM12/9/09
to nod...@googlegroups.com
Thanks, but for my simple case I just inlined some delayed-shift logic. All I need is push and shift. Now I'm getting a steady 8,000 requests per second independent of the queue size.

http://gist.github.com/252838
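The delayed-shift trick being described might look like this sketch (the `DelayedQueue` name is mine; this is not the code from the gist): instead of calling `Array.shift`, which is O(n) on large arrays, keep a read offset and only compact the array occasionally, so dequeuing stays O(1) amortized.

```javascript
function DelayedQueue() {
  this.items = [];
  this.offset = 0; // index of the next item to dequeue
}
DelayedQueue.prototype.push = function (item) {
  this.items.push(item);
};
DelayedQueue.prototype.shift = function () {
  if (this.offset >= this.items.length) return undefined;
  var item = this.items[this.offset++];
  // Compact once the dead space outweighs the live items.
  if (this.offset * 2 > this.items.length) {
    this.items = this.items.slice(this.offset);
    this.offset = 0;
  }
  return item;
};
```

This keeps the array small without ever moving more than half the live elements per compaction, which would explain the throughput becoming independent of queue size.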

Tim Caswell

Dec 10, 2009, 12:32:17 AM12/10/09
to nod...@googlegroups.com
I just noticed that I had misspelled the word persistence.  Should I rename my github repo and kick off the 8 followers it already has, or keep the unique spelling as a feature to make it easy to search for in Google?

Also, I've added some docs to the project.


ps. I feel that I'm spamming the list with discussions of my project.  Is there another channel for such discussions?  Maybe a separate mailing list for library developers of node, while the main mailing list is for core features, support, and patch requests?

Tim Caswell

Dec 10, 2009, 1:03:03 AM12/10/09
to Tim Caswell
FYI, the project is now called node-persistence. Sorry for the disruption.

Aurynn Shaw

Dec 11, 2009, 2:26:44 PM12/11/09
to nodejs


On Dec 8, 10:04 pm, Tim Caswell <t...@creationix.com> wrote:
> I've been playing around with a high level database abstraction something along the lines of DataMapper/MongoMapper/ActiveRecord.  The backends I wish to target are Postgresql, Sqlite, MongoDB, and a simple JSON-in-directory-of-flat-files thing I made up.
This is looking more like an ORM layer above the database drivers (of
which postgres-js and the sqlite driver discussed elsewhere in this
thread are examples).

I've also been thinking about the actual DB API implementation, akin
to Perl's DBI, or Python's DB-API spec. Something that Persistence
sits on top of, and provides the high-level data structures.

Is that more what you had in mind? Or were you going more for the
ActiveRecord/SQLAlchemy viewpoint?

Thanks,
Aurynn

--
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

as...@commandprompt.com

Tim Caswell

Dec 11, 2009, 3:37:42 PM12/11/09
to nod...@googlegroups.com
I'm thinking more along the lines of DataMapper, which sits atop DataObjects. For now I'm going to include the four main "drivers" inside the main persistence library.

One thing I'm finding as I finalize the sqlite driver is that I can get away with more if I know more context about what the query is doing. If the low-level driver just did queries and returned results, I couldn't put in sqlite-specific code and get things like insert ids, column types, and the like. Also, it would be hard to have a single API that works for both relational databases and document databases.

In the end I think my "lower" layer drivers will have methods like "save", "remove", and "find" that take a table name and some JSON data. Then the higher-level API will be object oriented, with smart objects that have methods like "save" and "remove" that take no arguments, because all the required parameters are known internally to the object.
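A rough sketch of how those two layers could relate (all names here are hypothetical, using an in-memory driver instead of a real backend):

```javascript
// Lower layer: driver methods take a table name plus plain JSON data.
function MemoryDriver() {
  this.tables = {};
}
MemoryDriver.prototype.save = function (table, data) {
  var rows = this.tables[table] = this.tables[table] || [];
  if (!data._id) { data._id = rows.length + 1; rows.push(data); }
  return data._id;
};
MemoryDriver.prototype.find = function (table, filter) {
  return (this.tables[table] || []).filter(function (row) {
    for (var key in filter) if (row[key] !== filter[key]) return false;
    return true;
  });
};
MemoryDriver.prototype.remove = function (table, id) {
  this.tables[table] = (this.tables[table] || []).filter(function (row) {
    return row._id !== id;
  });
};

// Higher layer: a record carries its driver and table internally,
// so save/remove need no arguments.
function Record(driver, table, data) {
  this.driver = driver; this.table = table; this.data = data;
}
Record.prototype.save = function () {
  return this.driver.save(this.table, this.data);
};
Record.prototype.remove = function () {
  this.driver.remove(this.table, this.data._id);
};
```

The driver stays dumb and swappable per backend, while the object layer is where validations and relationships would live.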

I tried to document where I'm heading. Read the README.markdown for more details.

http://github.com/creationix/node-persistence

Aurynn Shaw

Dec 11, 2009, 8:01:50 PM12/11/09
to nod...@googlegroups.com
Yes, this is definitely sounding like an ORM, not the DB-API.

So we do need to talk about how the underlying drivers should be
implemented - how to add pluggable driver support into this layer and
provide a way for other people to add consistent ORM-like layers.

I think that that should be a different thread, though. :)

Thanks,
Aurynn

Tim Caswell

Dec 12, 2009, 9:39:22 AM12/12/09
to nod...@googlegroups.com
I think I've settled on a line to draw between the two layers.

The lower layer which I call `drivers` has a connection object that performs raw queries and generates `store` objects. These `store` objects have basically the same simple model as [lawnchair].

Then the higher layer that can be built on top of this works more like active record or data mapper. It has validations, relationships, and all other advanced logic that's not directly related to data manipulation of a record at a time.

My github repo is in a little flux at the moment, but the driver api in the readme is done for now.

[lawnchair]: http://brianleroux.github.com/lawnchair/