
DBMS and lisp, etc.


Chris C apel

May 18, 2004, 1:47:23 PM
Hi everyone,

I'm pretty new to lisp, and I struggled for a while to get ILISP to
work, gave up and got SLIME to work immediately, struggled to work
with CMUCL, gave up and went to SBCL, struggled to get UCW to work
with paserve *and* mod_lisp on SBCL and failed. So now I'm using TBNL.

Now, I'm planning on having a data-driven site, so at some point I'm
going to need a backend database, like MySQL or PgSQL. I installed and
configured PgSQL, and just as I was sitting down to work out some
basic schemas, I stopped myself and said "You know, one of the
advantages to lisp is the flexible prototyping. Why are you binding
yourself to a database schema so early on?"

So here's my question to the group. What's a good way to start out
using normal lisp structures in global variables as a pseudo-database
during initial development and then switch to a real RDBMS backend? I
can see two issues that need to be considered right off.

1) Data access migration. I think this is simple, though, as I can
write a data access layer specific to my application and implement it
for a RDBMS after I have a clear idea of my data structure. Are there
packages to generate such a layer based on CLOS classes, perhaps?

2) Persistence. This was the main issue I had in mind writing this
post. When my SLIME backend goes down and I need to restart emacs (not
necessary, I'm sure, but I'm no SLIME guru) I don't want to forever
lose all of my test data. Perhaps the solution is to dump a core file
with my test data in it periodically (what are the functions for that,
now?). But I'd rather persist just those objects related to my test
data, and not everything else as well.

Maybe I should princ to a file and then do a load to get it back. I'm
pretty sure the only data I would be using would have full read/print
consistency. But now that I think about it, my data's going to have
lots of circular references. That's not a problem in a DB, but you
can't just princ it. In fact, my data's going to be pretty
complicated, period. Along the lines of a wiki, except with lots more
metadata.

As you can probably see, I'm not very imaginative yet. But I hope to
improve with practice, and I hope the group can give me some good
pointers.

Chris Capel

John Thingstad

May 18, 2004, 2:05:38 PM
Google for "lisp in a box"

Yoda says:
"Not ready for the power of Lisp is he.
His eyes always in the future. Never his mind on where he was.
What he was doing."

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Will Hartung

May 18, 2004, 2:29:35 PM
"Chris C apel" <ch...@ibanktech.net> wrote in message
news:69d0f981.0405...@posting.google.com...

> So here's my question to the group. What's a good way to start out
> using normal lisp structures in global variables as a pseudo-database
> during initial development and then switch to a real RDBMS backend? I
> can see two issues that need to be considered right off.
>
> 1) Data access migration. I think this is simple, though, as I can
> write a data access layer specific to my application and implement it
> for a RDBMS after I have a clear idea of my data structure. Are there
> packages to generate such a layer based on CLOS classes, perhaps?

Well, obviously you can do whatever you want to do; I'm not aware of any
tools to help you here, though.

The other problem is essentially enforcing the Relational view of your data
in a decidedly non-relational system. The reason this is important is that
if you don't take the discipline now to view your data relationally for
eventual storage in a DBMS, then you'll have more headache later when you
only want to change the query and persistence layer.

Also, depending on your goals, you may want to pencil in any transactional
boundaries in your system even if your crude backing store does not actually
handle them. That way you can think through the basic issues on the
assumption that you will inevitably deal with a transactional store,
rather than having to retrofit that later.
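
For instance, a minimal sketch of "penciling it in" (the macro below is
hypothetical, not from any library): a WITH-TRANSACTION form that is a
no-op over the in-memory store, but marks exactly where a real
BEGIN/COMMIT/ROLLBACK will go later.

(defmacro with-transaction (&body body)
  ;; Later: have this expand into the RDBMS driver's begin/commit/rollback.
  `(progn ,@body))

;; Usage stays the same before and after the switch to a real DB:
;; (with-transaction
;;   (add-order *store* order)
;;   (decrement-stock *store* (order-items order)))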

Finally, you'll need some way to filter and order your data. As you can
imagine, it's fairly easy to simply use hash tables to get basic references,
but one of the features of the relational model is being able to query and
order on arbitrary data elements, including those that are not indexed at
all. Now, you may not feel you'll need to do that at all, but it's a shame
to toss out a real benefit of the DBMS and design around it at this point.
Kind of depends on your data model.

Also, try and think of "big" queries in your system, getting as much as
practical in a single query rather than grabbing the nodes of your tree, and
then lazily filling them in with queries to the child bits. That's fast on
in memory systems, but will destroy performance on a DBMS. Round trips to
the DBMS are Evil. SQL Joins are your friend.

You don't need a "generic OO -> Relational" mapping system. Just a layer
that separates your data from the back end, and unless you have 1000 tables,
you can write this yourself, by hand. By hand you can specialize your
queries with a lot less complexity than using an OR mapper.

> 2) Persistence. This was the main issue I had in mind writing this
> post. When my SLIME backend goes down and I need to restart emacs (not
> necessary, I'm sure, but I'm no SLIME guru) I don't want to forever
> lose all of my test data. Perhaps the solution is to dump a core file
> with my test data in it periodically (what are the functions for that,
> now?). But I'd rather persist just those objects related to my test
> data, and not everything else as well.

As you say, you can store everything in one large, single-rooted data
structure and then simply "print" to save it and "read" to load it.
That will work peachy for smaller data sets (with the definition of "small"
being related to your machine performance and your patience for waiting for
it to save and load).
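
To make that concrete, here's a minimal sketch of the print/read approach
(the *db* variable and function names are made up for illustration). It
works as long as everything reachable from *db* has a readable printed
representation (lists, strings, numbers, symbols, structs); hash tables
and CLOS instances generally need extra help.

(defvar *db* nil)

(defun save-db (pathname)
  (with-open-file (out pathname :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      ;; Rebind *print-circle* so shared and circular structure comes out
      ;; as #n=/#n# labels that READ understands.
      (let ((*print-circle* t))
        (print *db* out))))
  pathname)

(defun load-db (pathname)
  (with-open-file (in pathname)
    (with-standard-io-syntax
      (setf *db* (read in)))))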

When that seems to be too slow, and you're not ready to layer in a DBMS yet,
you may want to look into some implementation specific bits of your Lisp, as
many have the ability to dump, essentially, FASL files, which are normally
used to store compiled code, but some Lisps can also save and load FASLs of
any data structure. These load and save MUCH faster than "print" and "read",
but of course you have an enormous binary blob of data that can't be changed
outside of your image.

Regards,

Will Hartung
(wi...@msoft.com)


niko...@random-state.net

May 18, 2004, 3:18:09 PM
Chris C apel <ch...@ibanktech.net> wrote:

> pretty sure the only data I would be using would have full read/print
> consistency. But now that I think about it, my data's going to have
> lots of circular references. That's not a problem in a DB, but you
> can't just princ it. In fact, my data's going to be pretty

Yes you can. See *print-circle* in CLHS. ;-)
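
For example (a quick illustration, not from the original exchange):

(let ((cell (list 1 2 3)))
  (setf (cdddr cell) cell)        ; make the list circular
  (let ((*print-circle* t))
    (prin1-to-string cell)))      ; => "#1=(1 2 3 . #1#)"

READ understands the #n=/#n# labels, so the same data can be written out
and read back in.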

Cheers,

-- Nikodemus

Ng Pheng Siong

May 18, 2004, 8:01:19 PM
According to Chris C apel <ch...@ibanktech.net>:

> So here's my question to the group. What's a good way to start out
> using normal lisp structures in global variables as a pseudo-database
> during initial development and then switch to a real RDBMS backend?

I use SQLite, an embeddable SQL library. SQLite keeps the database in one
file. Starting over is as simple as blowing the old file away. Sometimes
I get lazy and use the SQL command line to make quick changes.
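
For example, a throwaway reset helper might look like this (SQLITE-OPEN
and SQLITE-EXEC are placeholders for whatever binding you use, not names
from a real library):

(defun reset-dev-db (path)
  "Blow away the old database file and recreate an empty schema."
  (when (probe-file path)
    (delete-file path))
  (let ((db (sqlite-open path)))   ; placeholder for your binding's open call
    (sqlite-exec db "CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    db))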

> When my SLIME backend goes down and I need to restart emacs

Maybe this will help?

http://home.comcast.net/~bc19191/blog/040304.html


--
Ng Pheng Siong <ng...@netmemetic.com>

http://firewall.rulemaker.net -+- Firewall Change Management & Version Control
http://sandbox.rulemaker.net/ngps -+- ZServerSSL/Zope Windows Installers

Chris C apel

May 19, 2004, 8:47:41 AM
> > When my SLIME backend goes down and I need to restart emacs
>
> Maybe this will help?
>
> http://home.comcast.net/~bc19191/blog/040304.html

Maybe it will, but it seems to me like half the time I've messed up lisp
somehow with stack overflows from trying to print out circular data
structures as part of building stack traces (it seems to happen a
couple of seconds after my page is displayed in the browser, and
*print-circle* doesn't seem to help here), or my lisp process
disappears when I kill emacs. I'll certainly try it if I notice my
lisp process still running and healthy.

OT: Does anyone know how to change your posting name in Google Groups?
Maybe I should get a real newsreader, but maybe later. And I have a
typo in my name.

Christopher Browne

May 20, 2004, 12:06:39 AM
After takin a swig o' Arrakan spice grog, "Will Hartung" <wi...@msoft.com> belched out:

> "Chris C apel" <ch...@ibanktech.net> wrote in message
> news:69d0f981.0405...@posting.google.com...
>> So here's my question to the group. What's a good way to start out
>> using normal lisp structures in global variables as a pseudo-database
>> during initial development and then switch to a real RDBMS backend? I
>> can see two issues that need to be considered right off.
>>
>> 1) Data access migration. I think this is simple, though, as I can
>> write a data access layer specific to my application and implement
>> it for a RDBMS after I have a clear idea of my data structure. Are
>> there packages to generate such a layer based on CLOS classes,
>> perhaps?
>
> Well, obviously you can do whatever you want to do, I'm not aware of
> any tools to help you here though.
>
> The other problem is essentially enforcing the Relational view of
> your data in a decidedly non-relational system. The reason this is
> important is that if you don't take the discipline now to view your
> data relationally for eventual storage in a DBMS, then you'll have
> more headache later when you only want to change the query and
> persistence layer.

Hear, hear!

I'm working with systems where we contend with developers (of Java,
mostly) that never bothered grasping anything relational but that need
to store data in a relational database.

There are the irritating factors that:

a) In many cases, they never bother UPDATEing data, but instead
are prone to DELETE and INSERT data, apparently because it's
easier to do that.

b) They may not grasp the implications of transactions and data
visibility. In a multiuser system, this is sure to be
troublesome.

Then there's the disaster of writing Perl code with objects where the
objects try to store themselves as SQL data. Where they could run
SELECTs that would calculate complex things for many objects en masse,
the "object-relational" mapping seems to be basically a recipe for
doing both badly :-(.

> Also, try and think of "big" queries in your system, getting as much
> as practical in a single query rather than grabbing the nodes of
> your tree, and then lazily filling them in with queries to the child
> bits. That's fast on in memory systems, but will destroy performance
> on a DBMS. Round trips to the DBMS are Evil. SQL Joins are your
> friend.

Throw in CURSORs, which allow you to grab the result in "bite-sized"
groups, and that'll add some further balance to this; a big query
doesn't HAVE to get pulled to the LISP side all at once.

A neat approach that might even be supported (I dunno) would be to
have some sort of LOOP-ON-QUERY macro that would take a SELECT and use
CURSOR inside without requiring the programmer to consciously allocate
any of the extra bits of data structure...
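
Something along these lines, perhaps (a sketch only; OPEN-CURSOR,
FETCH-ROW and CLOSE-CURSOR are hypothetical driver functions, not from a
real library):

(defmacro do-query ((row sql &rest params) &body body)
  "Run BODY once per result row, fetching through a cursor that is always closed."
  (let ((cursor (gensym "CURSOR")))
    `(let ((,cursor (open-cursor ,sql ,@params)))
       (unwind-protect
            (loop for ,row = (fetch-row ,cursor)   ; NIL at end of results
                  while ,row
                  do (progn ,@body))
         (close-cursor ,cursor)))))

;; (do-query (row "SELECT id, name FROM users WHERE active = ?" t)
;;   (process-user row))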

> As you say, you can store everything in one large, single-rooted data
> structure and then simply "print" to save it and "read" to load it.
> That will work peachy for smaller data sets (with the definition of "small"
> being related to your machine performance and your patience for waiting for
> it to save and load).

Note that this is pretty much the way that applications that serialize
their data in XML form function.
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www.ntlug.org/~cbbrowne/unix.html
It's easy to identify people who can't count to ten. They're in front
of you in the supermarket express lane. -June Henderson

Will Hartung

May 20, 2004, 3:20:06 PM

"Christopher Browne" <cbbr...@acm.org> wrote in message
news:2h2p6eF...@uni-berlin.de...

> After takin a swig o' Arrakan spice grog, "Will Hartung" <wi...@msoft.com>
belched out:
> > The other problem is essentially enforcing the Relational view of
> > your data in a decidedly non-relational system. The reason this is
> > important is that if you don't take the discipline now to view your
> > data relationally for eventual storage in a DBMS, then you'll have
> > more headache later when you only want to change the query and
> > persistence layer.
>
> Hear, hear!
>
> I'm working with systems where we contend with developers (of Java,
> mostly) that never bothered grasping anything relational but that need
> to store data in a relational database.

The mindsets are quite different. The other harder part is the debate over
where the data processing gets done (in the DB via stored procedures, or in
the host language -- triggers are a particularly cantankerous point of
contention).

> There are the irritating factors that:
>
> a) In many cases, they never bother UPDATEing data, but instead
> are prone to DELETE and INSERT data, apparently because it's
> easier to do that.

If you're persisting an object graph, it seems "easier" (particularly with
cascading deletes) to delete the root object and then save out the entire
graph again.

This becomes a PITA if you have other dependencies on the data to be deleted
(ala foreign keys in the DB). It's also inefficient if the scope of the
change is small and the size of the data is large.

On the other hand, you could read the "old" version from the DB and then
compare against the current version, but that potentially incurs the cost
of reading all of the data again and deserializing it into your object
graph, which, again, for a small change is expensive to do.

You can compare against a cached version of the old graph (cue the "Why
doesn't Lisp have a built in deep copy function" thread), but then you have
the constraint that the app has sole access to make data changes to the
database (and thus the persisted version hasn't changed behind the back of
the cached version). This of course isn't always practical. The DB is, in
fact, a data store for crying out loud, and folks get it in their silly
little heads that it's OK to actually manipulate the data there!

Ideally, you'll just track the actual changes to your graph and persist
those appropriately (through crafty use of update, insert, and delete, after
re-reading the changed rows under some kind of optimistic locking model,
just to ensure you're not smashing something...).

Aaiiiee!!

"I just want to save my @*#$*@ DATA!! How hard can this be!?!???"

It's all compromise, and what's worse is that almost none of the problems
folks encounter are seen by lone developers independently running against
50 rows of test data; all of the issues show up in production, under load,
with lots of (typically angry) users.

> Then there's the disaster of writing Perl code with objects where the
> objects try to store themselves as SQL data. Where they could run
> SELECTs that would calculate complex things for many objects en masse,
> the "object-relational" mapping seems to be basically a recipe for
> doing both badly :-(.

If you use "relational" objects, the O/R schemes aren't so bad, but it's not
clear to me that they're really worth the effort yet. The biggest thing that
the O/R tools bring to the table is portability across RDBMS's, because its
not clear to me that writing the SQL for a specific DB is no less work than
specifying all of the stuff for O/R tools. The primary complaint about
embedded SQL is that it doesn't port well (you have to rewrite to work on
different DBs).

But most projects, particularly in-house projects, simply don't have the
portability issue. It's a non-starter; they typically made their DB choice
way up front in the project, and they're not moving anytime soon.

After that commitment is made, a lot can be done with simple "wizards" to
help eliminate most of the drudgery of crafting actual SQL for the project,
and you retain all of the control over the SQL that you need to get the real
performance out of the DBMS.

> > Also, try and think of "big" queries in your system, getting as much
> > as practical in a single query rather than grabbing the nodes of
> > your tree, and then lazily filling them in with queries to the child
> > bits. That's fast on in memory systems, but will destroy performance
> > on a DBMS. Round trips to the DBMS are Evil. SQL Joins are your
> > friend.
>
> Throw in CURSORs, that allow you to grab the result in "bite-sized"
> groups, and that'll add some further balance to this; a big query
> doesn't HAVE to get pulled to the LISP side all at once.
>
> A neat approach that might even be supported (I dunno) would be to
> have some sort of LOOP-ON-QUERY macro that would take a SELECT and use
> CURSOR inside without requiring the programmer to consciously allocate
> any of the extra bits of data structure...

The hard part is keeping the cursor open for any real length of time;
cursors are real resource bottlenecks with databases. So, while you can get
things in bite-size chunks, it's best to get in, do your business, and leave.
Also, the queries (especially those with ORDER BY on non-indexed fields)
consume a lot of resources on the DB whether you want one row or all of
them.

If I have to load an object graph, or more specifically a tree (like, say,
order header, order detail, item detail, pricing, etc.), it usually works
out best to just create a big ol' SQL statement joining all of the
appropriate tables at once and slurping the whole thing, building the graph
as the rows come in. This is MUCH faster than building it piecemeal. It's
not clear to me how well O/R tools are able to construct loaders like this
automagically.

Regards,

Will Hartung
(wi...@msoft.com)


Chris C apel

May 20, 2004, 8:30:47 PM
"Will Hartung" <wi...@msoft.com> wrote in message news:<2gv1ucF...@uni-berlin.de>...

> "Chris C apel" <ch...@ibanktech.net> wrote in message
> news:69d0f981.0405...@posting.google.com...
> > 1) Data access migration. I think this is simple, though, as I can
> > write a data access layer specific to my application and implement it
> > for a RDBMS after I have a clear idea of my data structure. Are there
> > packages to generate such a layer based on CLOS classes, perhaps?
>
> The other problem is essentially enforcing the Relational view of your data
> in a decidedly non-relational system. The reason this is important is that
> if you don't take the discipline now to view your data relationally for
> eventual storage in a DBMS, then you'll have more headache later when you
> only want to change the query and persistence layer.
>
> Also, try and think of "big" queries in your system, getting as much as
> practical in a single query rather than grabbing the nodes of your tree, and
> then lazily filling them in with queries to the child bits. That's fast on
> in memory systems, but will destroy performance on a DBMS. Round trips to
> the DBMS are Evil. SQL Joins are your friend.

That's a great point. I'm imagining a sort of data access layer where
I have around one function for every different time I access my data,
tailored to that specific instance. Otherwise, it's too easy to write
a bunch of data access functions that do small pieces and then try to
combine multiple calls to these methods. That's, as you say, a *bad
thing* when dealing with RDBMS's. But with (almost) one function for
each data access point, I force myself to think about how I can get my
data over the line in the most efficient way possible. Is this a good
design principle for DAL's in general?

> You don't need a "generic OO -> Relational" mapping system. Just a layer
> that seperates your data from the back end, and unless you have 1000 tables,
> you can write this yourself, by hand. By hand you can specialize your
> queries with a lot less complexity than using an OR mapper.

Agreed. I do have a question about data representation, though. While
it's tempting to just have references to related objects as sublists
in my data structure, in the DBMS they're going to be keyed and ID'd.
Is it OK, practical, and good practice to hide this in the data layer,
with the application code treating the references as transparent? I
would tend to think so, but, as I've hinted, either SLIME or SBCL in
combination with TBNL (and maybe Emacs, too) seem to have problems
with my circular references (regardless of *print-circle*, it tries at
random to print out my structure at the SLIME REPL). The alternative
would be to use a hash table and ID generator for my "tables".

This train of thought leads me to believe it would be difficult to
write a /generic/ data access layer that would be significantly less
complicated than the equivalent SQL in any case, except for
marshalling issues, and still perform acceptably under most real-world
conditions.

And all that seems like a shame, because data access seems, on its
face, to be something that could be macro-ized and library-ized to a
huge extent. For instance, in my day job, I find myself dealing with a
huge program (for what it is) whose job is simply to manipulate the
contents of five simple tables. Though there's probably a better way
(especially in C#, with its datasets and SqlBuilders and whatnot), I
find myself writing a function that manually generates the SQL for
each changed row. And it's broken, too, because the INSERTs need to go
in order of table A, B, C, D, E, while the deletes need to go in E, D,
C, B, A, something I didn't realize until today. That means iterating
twice over each table-cache. I don't know that /I/ could do much
better if I were faced with the same task in Lisp. But faced with such
large amounts of repetition, one really does have to wonder how much
of it would be necessary with an optimal database library.

> Finally, you'll need some way to filter and order your data. As you can
> imagine, it's fairly easy to simply use hash tables to get basic references,
> but one of the features of the relational model is being able to query and
> order on arbitrary data elements, including those that are not indexed at
> all. Now, you may not feel you'll need to do that at all, but it's a shame
> to toss out a real benefit of the DBMS and design around it at this point.
> Kind of depends on your data model.

So you're saying what here? That my data access layer should include a
way to order the data that translates well into SQL?

> [..] Many [lisps] have the ability to dump, essentially, FASL files, which are
> normally used to store compiled code, but some Lisps can also save and load
> FASLs of any data structure. These load and save MUCH faster than "print" and
> "read",

Sounds like a cleaner way to do it, too.

Thanks for the points,

Chris Capel

Christopher Browne

May 20, 2004, 11:17:53 PM
Clinging to sanity, ch...@ibanktech.net (Chris C apel) mumbled into her beard:

>> Also, try and think of "big" queries in your system, getting as much as
>> practical in a single query rather than grabbing the nodes of your tree, and
>> then lazily filling them in with queries to the child bits. That's fast on
>> in memory systems, but will destroy performance on a DBMS. Round trips to
>> the DBMS are Evil. SQL Joins are your friend.
>
> That's a great point. I'm imagining a sort of data access layer where
> I have around one function for every different time I access my data,
> tailored to that specific instance. Otherwise, it's too easy to write
> a bunch of data access functions that do small pieces and then try to
> combine multiple calls to these methods. That's, as you say, a *bad
> thing* when dealing with RDBMS's. But with (almost) one function for
> each data access point, I force myself to think about how I can get my
> data over the line in the most efficient way possible. Is this a good
> design principle for DAL's in general?

For a somewhat orthogonal view on this, consider the 'functional'
abstractions of Lisp and APL.

In Lisp you can use things like MAPCAR and APPLY to do things "en
masse," and there are doubtless significant optimization opportunities
in this.

In APL, the "reduce" operator is a parallel notion that allows hefty
optimization and even parallelization, on vector systems. APL systems
have typically been interpreted, traditionally assumed to be a
"performance killer," but if the operators being interpreted are all
invoking embarrassingly parallelizable operations, working on big
vectors of data, it'll mostly be harnessing the tuned-into-assembler
vector ops that are Way Fast.

Lisp SERIES provides further sorts of entertainment in "mass
processing of sequences."

The way you get high performance out of Perl is akin to this; you can
get almost-better-than-C behaviour if you're pounding the string
processing libraries that do en-masse stuff to big streams of text.

In all of these languages, what tends to "suck" is to process things
by processing things one-by-one with some form of interpreted logic
wrapped around it.

-> (loop for i from 1 to 1000000
        do (perform-bunch-of-stuff-with-object (aref foo i)))

-> Anything where APL runs with loops where you process just a few
scalars inside the loop

-> Something in Perl that has to instantiate and 'interpret' a
million individual Perl objects.

Perl hash table with 10^6 entries: GOOD.
Set of 10^6 Perl objects each needing processing: BAD.

The Lisp example doesn't *have* to suck, but if processing the 10^6
items requires doing k * 10^6 individual I/O operations, k ops for
each item, it will quite likely "suck."
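
To put the contrast in Lisp terms (FETCH-RECORD, FETCH-RECORDS and
SUMMARIZE are hypothetical helpers, not from any particular library):

;; Likely to "suck": one I/O round trip per element.
(defun summarize-slowly (ids)
  (loop for id in ids
        collect (summarize (fetch-record id))))

;; Better: one bulk fetch, then en-masse in-memory processing.
(defun summarize-quickly (ids)
  (mapcar #'summarize (fetch-records ids)))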


--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me

http://cbbrowne.com/info/linuxdistributions.html
"...It is meaningless to anyone unwilling to commit to forever using a
single vendor's operating system. Historically that seems to have
been a bad choice. Are you convinced that times have changed?"
-- Les Mikesell <l...@mcs.com>

Will Hartung

May 21, 2004, 5:08:09 PM

"Chris C apel" <ch...@ibanktech.net> wrote in message
news:69d0f981.04052...@posting.google.com...

> That's a great point. I'm imagining a sort of data access layer where
> I have around one function for every different time I access my data,
> tailored to that specific instance. Otherwise, it's too easy to write
> a bunch of data access functions that do small pieces and then try to
> combine multiple calls to these methods. That's, as you say, a *bad
> thing* when dealing with RDBMS's. But with (almost) one function for
> each data access point, I force myself to think about how I can get my
> data over the line in the most efficient way possible. Is this a good
> design principle for DAL's in general?

First, ANY DAL is better than none. It's one thing to have SQL bundled in
your code, but it's another to have it bundled in your logic. Specifically,
having at least 1 level of abstraction between how you get your data, and
where you use your data is, I think, important, as it gives you that much
more of an opportunity to refactor things later compared to having the
access directly in your logic.

i.e.

(let ((orders (sql-results
               (format nil "SELECT * FROM ORDERTABLE WHERE CUSTCODE='~A'"
                       custcode))))
  ...)

is worse than

(let ((orders (get-orders-for-cust custcode)))
  ...)

Even if get-orders-for-cust is a simple wrapper.

In Days Of Yore, with ISAM style databases, this was the typical paradigm
for processing.

use OrderCustomerIndex
Find Order 'CUST001'
Fetch Order
while(order.custcode = 'CUST001')
    use OrderDetailOrderIndex
    Find OrderDetail order.orderno
    Fetch OrderDetail
    while(orderdetail.orderno = order.orderno)
        use ItemItemCodeIndex
        Find Item orderdetail.itemcode
        Fetch Item
        ...
        Fetch Next OrderDetail
    end while
    Fetch Next Order
end while

So, if you had 10 orders with 5 line items each, this code would "hit" the
DB library 100 times (10 hits per order: for each of its 5 line items, one
for the OrderDetail and one for the Item data).

When DB calls are "cheap", this kind of thing isn't that expensive (since
every DBMS (Relational and Object) on the planet essentially has an ISAM
(B-Tree) core, they do this kind of thing all day long).

So, if you wrote your code with a DAL, you could easily come up with
something like this:

(defun process-orders (cust-code)
  (let ((orders (find-orders cust-code)))
    (dolist (order orders)
      (let ((details (find-order-details (order-no order))))
        (dolist (detail details)
          (let ((item (find-item (detail-item-code detail))))
            (do-stuff order detail item)))))))

But this is a trap, and this is where THE problem occurs with most any
relational mapping process. Naively implemented with SQL, again for 10
orders with 5 line items each, you end up with SIXTY-ONE(!!!) SQL hits to the
database. (1 query for the initial orders, 1 query for each order for its
detail, 1 query for each detail for the item data). This is Bad. Round trips
to the DB are SLLOOWWW, and this will KILL your performance.

Instead, you want to suck in the entire graph with a single query:

SELECT orders.*, orderdetails.*, items.*
FROM orders, orderdetails, items
WHERE orders.custcode = ?
AND orderdetails.orderno = orders.orderno
AND items.itemcode = orderdetails.itemcode

Then, you use that to build your graph so you can do this:

(defun process-orders (cust-code)
  (let ((orders (find-orders cust-code)))
    (dolist (order orders)
      (let ((details (order-details order)))
        (dolist (detail details)
          (let ((item (detail-item detail)))
            (do-stuff order detail item)))))))

And THAT's all well and good, but say you wanted to get the total of all
canceled orders:

(defun canceled-orders-total (cust-code)
  (let ((orders (find-orders cust-code))
        (total 0))
    (dolist (order orders)
      (when (equal (order-status order) "C")
        (setf total (+ total (order-total order)))))
    total))

This is fine, but you just sucked the bulk of ALL of the orders, including
their detail and item information, to get at a simple header field.

Now, certainly this could have been done with a simple query as well:

SELECT sum(total) FROM orders WHERE status = 'C' AND custcode=?

The point is that balancing the granularity of what your DAL does, and how
it does it, needs to be determined up front, at least in a rough edged way.
This is what I mean by being aware that an RDBMS is going to be your
eventual persistent store. If you don't assume that, you may very well end
up with a lot of code like the first example that will immediately suck as
soon as you upgrade to the SQL database.

But it's all a trade-off. Take the canceled-orders-total function: sure, it
slurps in far more information than it needs, but on the other hand, it's
called once a night, at 2am, for a status report, so... who cares? Cue the
"Curse of early optimization" thread.

True story, we had a client that would run a Account Receivables Aging
report as part of their End Of Day processing. They watch it run for 45
minutes, then wait for the 150 page printout, grab it off the printer, go to
the LAST page and grab the total outstanding dollar amount, then toss the
report. That's an extreme example of this kind of balance. All they wanted
was that last number (which was a 10 second query to the DB, vs 45 minutes
and a small rainforest). It's also the kind of thing that runs great with 10
rows in the DB, but is just horrible in production. This was simple misuse
of the report, but I can guarantee you that if this was run nightly, and
then consulted in the morning (vs a barrier to closing the daily
operations), they'd have never complained about this, and we would have
never given them a better option.

That's the game of balancing abstraction, functionality and performance in
the world of data persistence.

> > You don't need a "generic OO -> Relational" mapping system. Just a layer

> > that separates your data from the back end, and unless you have 1000


tables,
> > you can write this yourself, by hand. By hand you can specialize your
> > queries with a lot less complexity than using an OR mapper.
>
> Agreed. I do have a question about data representation, though. While
> it's tempting to just have references to related objects as sublists
> in my data structure, in the DBMS they're going to be keyed and ID'd.
> Is it OK, practical, and good practice to hide this in the data layer,
> with the application code treating the references as transparent? I
> would tend to think so, but, as I've hinted, either SLIME or SBCL in
> combination with TBNL (and maybe Emacs, too) seem to have problems
> with my circular references (regardless of *print-circle*, it tries at
> random to print out my structure at the SLIME REPL). The alternative
> would be to use a hash table and ID generator for my "tables".

I like Object Graphs that are as pre-populated as necessary for the task,
and part of the DAL is to manage those relationships for me when the data is
persisted.

Depending on the system and the platform, object sharing of data with the
same identity may or may not be an issue.

If I have two complete Order objects, both for the same Customer, but
fetched at a different time:

(let ((orderA (find-order 1234))
      (orderB (find-order 3456)))
  ...)

The question is how important this concern is:
(eq (order-customer orderA) (order-customer orderB)) vs
(equal (order-customer orderA) (order-customer orderB))

Most folks don't really have an issue with the two Customers in this case
having the same data, but not sharing identity. On the other hand, some
systems want to make sure that if someone updates the Customer Address,
every instance is updated as well. Depends on the application of course.

If you use actual CLOS objects vs plain lists, you can control how your
items print out, and how much detail you'll see for any particular instance.
There are also means to not print entire huge lists as well.
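
For instance (a sketch; the class and accessors are made up), a
PRINT-OBJECT method keeps instances terse at the REPL, and *PRINT-LENGTH*
/ *PRINT-LEVEL* keep big or deeply nested lists from flooding the output:

(defclass customer ()
  ((id   :initarg :id   :reader customer-id)
   (name :initarg :name :reader customer-name)))

(defmethod print-object ((c customer) stream)
  (print-unreadable-object (c stream :type t :identity t)
    (format stream "~A ~S" (customer-id c) (customer-name c))))

;; (make-instance 'customer :id 42 :name "Acme")
;; => #<CUSTOMER 42 "Acme" {1002A3B1}>   ; exact form varies by implementation

;; Truncate huge or deep structures instead of printing them whole:
;; (setf *print-length* 20
;;       *print-level* 6)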

> This train of thought leads me to believe it would be difficult to
> write a /generic/ data access layer that would be significantly less
> complicated than the equivalent SQL in any case, except for
> marshalling issues, and still perform acceptably under most real-world
> conditions.
>
> And all that seems like a shame, because data access seems, on its
> face, to be something that could be macro-ized and library-ized to a
> huge extent.

Yes, you'd like to think so. To a certain level, it can be macro'd and
library'd to make it easier, but at that point you're still dealing with
SQL, and not necessarily your internal application datastructures.

Managing that interface between application data and the database is the
trick, and you'd like to think that IT could be macro'd and library'd
easily. But the dark side is the nature of SQL databases really throws a
wrench in the works.

A simple example from our current application.

We store a lot of tagged data in our DB, identified by a string Alias. The
detail is that we keep the entire history of that data in the DB. So, if you
update a value 5 times, there's 5 rows in the database.

Typically we need "the latest" value for several of these aliases. As a
generic SQL query, getting "the latest" is, essentially, a horrible query:

SELECT * FROM Table WHERE alias = 'Alias' AND date = (SELECT max(date) FROM
Table WHERE alias = 'Alias')

But, we also tend to need "the previous value" as well (which would be the
one before the latest). Also, in one part of our system, we'll want several
hundred of these values. Which pretty much means:

SELECT * FROM Table WHERE alias = 'Alias' ORDER BY date DESC

You'll note that this in fact potentially reads ALL of the historical data,
even though in fact we only want the second row, particularly if DATE is not
indexed properly. Not a problem with 5 rows. A real problem with 5000. So,
somewhere between 5 and 5000 Happiness converts to Blood Boiling Rage.

With our current data load, it's basically "cheaper" for us to load up
essentially the entire history for each data item, and toss most of it away
in the program.

e.g. SELECT * FROM Table where alias in ('Alias1','Alias2', ..., 'Alias100')

Even though we may well be only interested in 10% of the final data, it is
cheaper for us to throw data away than to send multiple queries to the
database. You can imagine how unintuitive that seems, but when a large part
of the cost of getting the data is the query itself, moreso than the actual
data it returns, you can see how this complicates the problem significantly.
As we get more and more data, though, we'll need to be smarter about this.

So, you have O/R tools that spend a lot of time on transactions, caching,
pre-fetching, lazy loading, backfilling writes, stale data, etc. Add in the
need to Join tables, and the problem space becomes very large. Dealing with
managing the granularity of the data access to the back end (as coarse as
possible) with the granularity of the data access at the front end (as fine
as possible) is quite a challenge.

> For instance, in my day job, I find myself dealing with a
> huge program (for what it is) whose job is simply to manipulate the
> contents of five simple tables. Though there's probably a better way,
> (especially in C#, with its datasets and SqlBuilders and whatnot) I
> find myself writing a function that manually generates the SQL for
> each changed row. And it's broken, too, because the INSERTs need to go
> in order of table A, B, C, D, E, while the deletes need to go in E, D,
> C, B, A, something I didn't realize until today. That means iterating
> twice over each table-cache. I don't know that /I/ could do much
> better if I were faced with the same task in Lisp. But faced with such
> large amounts of repetition, one really does have to wonder how much
> of it would be necessary with an optimal database library.

The key here is to realize that you consider things like iterating over your
table-cache "expensive". Compared to a hit against the DB, odds are pretty
good they're not even in the same order of magnitude. Iterate away, and make
it perfect for the DB.

> > Finally, you'll need some way to filter and order your data. As you can
> > imagine, it's fairly easy to simply use hash tables to get basic references,
> > but one of the features of the relational model is being able to query and
> > order on arbitrary data elements, including those that are not indexed at
> > all. Now, you may not feel you'll need to do that at all, but it's a shame
> > to toss out a real benefit of the DBMS and design around it at this point.
> > Kind of depends on your data model.
>
> So you're saying what here? That my data access layer should include a
> way to order the data that translates well into SQL?

If you want to leverage that capability from the DB, filtering and
sequencing, absolutely. Mind, that doesn't necessarily mean you need to
expose the filtering and ordering to your DAL clients; you can use simple
application-specific options instead.

(find-orders-for-cust-code cust-code :canceled t)

vs

(find-orders-for-cust-code cust-code :with-filter '(= order.status "C"))

The first lets the DAL interpret "canceled = T" however it wishes, whereas
the second is more "generic", and perhaps exposes more about the data model
that may well change later.

The key is that the DB doesn't have to marshal the data in order to filter
and order it, whereas you do. Since the cost is typically in the query
itself and the marshalling, you may as well let the DB do what it does best
before you incur that cost. Doing work in your code that the DB does better
and cheaper is excessive, even though pushing it to the DB certainly makes
your queries more complicated. Odds are that for many queries the runtime
will be similar with or without filtering and ordering (of course there are
exceptions, but that's the dark art of SQL optimization), so since you "get
it for free", there's no reason to do it yourself.

It's a real puzzle. On the one hand I say that "the DB is evil. Treat it like
City Hall. A nice thing to have, but don't go there if you don't have to",
but then I assert that you should let the DB do what it does best.

If your DB is small enough (and will ALWAYS be small enough) that it can fit
into RAM, is cheap enough to load on startup, and you simply need to persist
the changes (a la the "Prevalence" concept), then you can approach your data
model in a much more "native" fashion (lists of maps of lists). The downside
is that as soon as you run out of RAM, all of your models break, and break
hard. That's why the assumption about using an RDBMS needs to be made up
front, and kept in your mind throughout the entire project. If you're NEVER
going to do that (the data will never be that big, and you'll basically
never have to worry about querying the database dynamically on the fly, vs.,
say, simply using it as a persistent store incrementally and then loading
from it at startup), then you can ignore the way an RDBMS works. But if
having more data than RAM is a very real possibility, design up front with
this in mind.
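
A bare-bones sketch of that "Prevalence" style, purely for illustration
(all names here are hypothetical): keep everything in RAM, append each
change to a log as a readable form, and replay the log at startup.

(defvar *log-path* #p"changes.log")

(defun log-change (form)
  "Append FORM, e.g. (ADD-PAGE \"title\" \"body\"), to the change log."
  (with-open-file (out *log-path* :direction :output
                       :if-exists :append
                       :if-does-not-exist :create)
    (with-standard-io-syntax
      (print form out))))

(defun replay-log ()
  "Re-apply every logged change against the in-memory store."
  (with-open-file (in *log-path* :if-does-not-exist nil)
    (when in
      (loop for form = (read in nil nil)
            while form
            do (apply (symbol-function (first form)) (rest form))))))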

So, after all that rambling, the goal of the DAL is to enable you as a data
user to have as fine and simple access to the data as practical.

I'd love to be able to do stuff like this all day long:

(defun update-order-total (orderno)
  (let ((order (find-order orderno)))
    (when order
      (setf (order-total order) 0)
      (dolist (detail (order-details order))
        (incf (order-total order) (detail-line-total detail))))))

If that was all in memory, that's all I would need to do and I'd be done.
But think through all that is going on here, in this simple example, and
toss in persistence, locking, transaction boundaries, minimizing database
access yet ensure data integrity and cache coherency, and this problem
becomes MUCH more difficult.

How much of the order do I really load in find-order? When do I load the
order-details? How do I manage the change to order-total? What if this order
is locked? If this loop fails half way through, what is the actual value of
the order-total? Was it stored in the DB?

Consider the (incf (order-total order) (detail-line-total detail))
expression. I can easily see a data layer hitting a db, or other persistent
change mechanism, for each change to order-total. And we'd all agree that
would be terrible (so, Don't Do That, use a temp variable, whatever). The
point being that there is nothing wrong with this code, but depending on how
your DAL works, it may be good, bad or simply indifferent code.

The trick is finding the happy medium for your project, something that lets
you start now and move forward later. As long as you keep the "limitations"
of an RDBMS in the back of your mind while developing, you shouldn't have
too much of a problem when you actually do switch over. But if you disregard
those limitations, thinking you'll "burn that bridge when you come to it",
there will be more than that bridge ablaze in the end I fear.

Regards,

Will Hartung
(wi...@msoft.com)


Rob Warnock

May 22, 2004, 9:11:13 PM
Will Hartung <wi...@msoft.com> wrote:
+---------------

| Typically we need "the latest" value for several of these aliases. As a
| generic SQL query, getting "the latest" is, essentially, a horrible query:
|
| SELECT * FROM Table WHERE alias = 'Alias' AND date = (SELECT max(date) FROM
| Table WHERE alias = 'Alias')
|
| But, we also tend to need "the previous value" as well (which would be the
| one before the latest). Also, in one part of our system, we'll want several
| hundred of these values. Which pretty much means:
|
| SELECT * FROM Table WHERE alias = 'Alias' ORDER BY date DESC
|
| You'll note that this in fact potentially reads ALL of the historical data,
| even though in fact we only want the second row, particularly if DATE is not
| indexed properly. Not a problem with 5 rows. A real problem with 5000. So,
| somewhere between 5 and 5000 Happiness converts to Blood Boiling Rage.
+---------------

PostgreSQL supports the LIMIT qualifier, doesn't DpANS SQL?

SELECT * FROM Table WHERE alias = 'Alias' ORDER BY date DESC LIMIT 10

That would get you just the 10 latest values, no matter how many rows
in the table matched.


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

David Steuber

May 22, 2004, 11:41:37 PM
rp...@rpw3.org (Rob Warnock) writes:

Oracle also supports LIMIT. The DB itself may internally still go
through the entire table to do the sort even though it only returns up
to LIMIT results.

--
I wouldn't mind the rat race so much if it wasn't for all the damn cats.

Rob Warnock

May 23, 2004, 5:54:52 AM
David Steuber <da...@david-steuber.com> wrote:
+---------------
| rp...@rpw3.org (Rob Warnock) writes:
| > [LIMIT 10] would get you just the 10 latest values, no matter
| > how many rows in the table matched.
|
| Oracle also supports LIMIT. The DB itself may internaly still go
| through the entire table to do the sort even though it only returns
| up to LIMIT results.
+---------------

Possibly, though after a number of runs perhaps the query optimizer
might be able to do something better. At least you'd be *telling* it
in a declarative way what you really want, whether or not a particular
query optimizer contains the mythical "sufficiently-smart compiler"
to make use of the information...

Dmitri Ivanov

May 24, 2004, 6:29:33 AM
Hello David,
"David Steuber" <da...@david-steuber.com> wrote:

DS>> PostgreSQL supports the LIMIT qualifier, doesn't DpANS SQL?
DS>>
DS>> SELECT * FROM Table WHERE alias = 'Alias' ORDER BY date DESC LIMIT 10
DS>> That would get you just the 10 latest values, no matter
DS>> how many rows in the table matched.
DS>
DS> Oracle also supports LIMIT. The DB itself may internaly still go
DS> through the entire table to do the sort even though it only returns
DS> up to LIMIT results.

FYG, ODBC also has a similar possibility by means of
SQLSetStmtAttr (... SQL_ATTR_MAX_ROWS ...)
--
Sincerely,
Dmitri Ivanov
lisp.ystok.ru

