
Relational database & OO


ajsp...@gmail.com

Nov 1, 2006, 8:52:37 PM
My question relates to good design when creating an OO application that
uses a relational database to store most of its data. How do people go
about designing and storing the data once it has been retrieved from the
database? Do you keep it as records, or do you create collections of
objects that represent the data you have obtained from your database?

I want to create a time management application, with most of the data
stored in my web server's database, but in creating my OO 'client' I
really want to arrive at a good design.

Anthony

Frans Bouma

Nov 2, 2006, 3:37:03 AM
ajsp...@gmail.com wrote:

It's IMHO always a good idea to start with a design for your abstract
data model. This is a level above E/R, so it can have inheritance etc.
For the technique, please check http://www.orm.net (Object Role
Modelling, formerly known as NIAM).

This gives you a good idea which entities are identifiable within your
system, their attributes, and their relations with other entities. Once
you've modelled this out, you can go in several directions: you can
take this model and turn it into a domain model (thus a class model) and
simply get an O/R mapper of your liking and store the entities in a
physical data model matching your domain model.

You can also use it to create a physical data model in a database, and
use that as the basis for your entity classes, which are then consumed
by your application logic.

Please read my essay about this here:
http://weblogs.asp.net/fbouma/archive/2006/08/23/Essay_3A00_-The-Database-Model-is-the-Domain-Model.aspx

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

topmind

Nov 2, 2006, 11:44:07 AM

This is an age-old question. OO and relational tend to have different
philosophical tilts, such that there is no easy or universal way to meld
them.

-T-
oop.ismad.com

H. S. Lahman

Nov 2, 2006, 12:21:51 PM
Responding to Ajspowart...

I assume the time management application does something complex rather
than simply acting as a pipeline between RDb and UI (i.e., CRUD/USER
processing). If not, you can stop reading.

Solving a time management problem is a quite different subject matter
from persisting the relevant data. Solve the time management problem in
your application first. Then, when you know what object attributes need
to be persisted, define a suitable Data Model for the database. The RDB
Data Model and the solution's Class Model will typically be different
for non-CRUD/USER applications because they need to be optimized
differently.

Finally, provide a subsystem for the application that maps between the
two views. The interface to the DB Access subsystem is defined by the
time management solution's need to obtain and store data to initialize
and save objects. The DB Access subsystem then maps those requests into
SQL queries (or whatever else is appropriate for the Data Model).

Typically, on the solution side one will have factory objects to
instantiate the solution's objects. Those factory objects will request
the appropriate data from DB Access to do the instantiation and
initialization. Conversely, when it is time to save attributes from the
solution objects, some object extracts the relevant attribute values and
sends them off to the DB Access subsystem to be stored.
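
As a rough sketch of that factory/DB Access split (all names here are
hypothetical, and the in-memory store merely stands in for whatever SQL
mapping a real DB Access subsystem would do), the shape might look like:

```java
import java.util.HashMap;
import java.util.Map;

// Solution-side domain object for a time management app.
class Task {
    final String name;
    int minutesSpent;
    Task(String name, int minutesSpent) { this.name = name; this.minutesSpent = minutesSpent; }
}

// Interface to the DB Access subsystem, phrased in the solution's own terms.
interface TaskStore {
    Map<String, Integer> loadTaskData();          // attribute values keyed by task name
    void saveTaskData(String name, int minutes);
}

// A stand-in implementation; a real one would map these calls to SQL.
class InMemoryTaskStore implements TaskStore {
    private final Map<String, Integer> rows = new HashMap<>();
    public Map<String, Integer> loadTaskData() { return new HashMap<>(rows); }
    public void saveTaskData(String name, int minutes) { rows.put(name, minutes); }
}

// Solution-side factory: asks DB Access for data, instantiates and
// initializes domain objects from it, and hands attributes back on save.
class TaskFactory {
    private final TaskStore store;
    TaskFactory(TaskStore store) { this.store = store; }
    Map<String, Task> instantiateAll() {
        Map<String, Task> tasks = new HashMap<>();
        store.loadTaskData().forEach((n, m) -> tasks.put(n, new Task(n, m)));
        return tasks;
    }
    void save(Task t) { store.saveTaskData(t.name, t.minutesSpent); }
}

public class FactoryDemo {
    public static void main(String[] args) {
        TaskFactory factory = new TaskFactory(new InMemoryTaskStore());
        factory.save(new Task("write report", 90));
        Task loaded = factory.instantiateAll().get("write report");
        System.out.println(loaded.name + ": " + loaded.minutesSpent + " min");
    }
}
```

Swapping InMemoryTaskStore for a JDBC-backed implementation would not
touch Task or TaskFactory, which is the point of the separation.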


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
in...@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH

Doug Pardee

Nov 2, 2006, 4:00:27 PM
ajsp...@gmail.com wrote:
> when it comes to creating an OO application that uses a relational
> database to store most of its data, how do people go about designing
> and storing the data once it has been retrieved from the database?
> Do you keep it as records, or when you get the data, do you create
> collections of objects which represent the data which you have
> obtained from your database?

Martin Fowler's book "Patterns of Enterprise Application Architecture"
covers this territory. It would be a good starting point.

aloha.kakuikanu

Nov 2, 2006, 4:17:45 PM

Oh yeah. A tiny time management application certainly needs to be
hammered with an "Enterprise strength" approach.

Keep it simple. Write a minimal amount of code that transfers the
information from the database to the GUI. Don't create classes unless
you're really forced to. Ignore all the temptations to design for
"future extensibility" -- just admit you don't have a crystal ball to
know the future. Embed all the SQL in Java code, because that is the
simplest, most readable, and most easily maintainable way to program a
database application.

AndyW

Nov 2, 2006, 5:40:38 PM
On Thu, 02 Nov 2006 17:21:51 GMT, "H. S. Lahman"
<h.la...@verizon.net> wrote:

>Responding to Ajspowart...
>
>> My question relates to good design when it comes to creating an OO
>> application that uses a relational database to store most of its data,
>> how do people go about designing and storing the data once it has been
>> retrieved from the database? Do you keep it as records, or when you get
>> the data, do you create collections of objects which represent the data
>> which you have obtained from your database?
>>
>> I am wanting to create an application that is for time management and I
>> want most of the data to be on my webservers database, but in creating
>> my OO 'client' I really want to create a good design.
>
>I assume the time management application does something complex rather
>than simply acting as a pipeline between RDb and UI (i.e., CRUD/USER
>processing). If not, you can stop reading.
>
>Solving a time management problem is a quite different subject matter
>than persisting the relevant data. Solve the time management problem in
>you application first. Then, when you know what object attributes need
>to be persisted, define a suitable Data Model for the database. The RDB
>Data Model and the solution's Class Model will typically be different
>for non-CRUD/USER applications because they need to be optimized
>differently.

I agree with the solution, but I really think one is talking about a
Persistent Storage Engine (PSE) rather than an OO database proper. A
PSE can often be implemented using an object schema or an RDB model.

The difference between what I think should be a true OO database and a
PSE is that, to me, the former has no separation of state between what
is executing and what is stored. There shouldn't be any concept of
storage or transience or separation of application and data - just
"IS-ness of the whole", as I call it.

I made the comment in another thread that OO does not have a concept
of transience - it's a programmer's abstraction, and probably the cause
of many RDB vs OO DB debates.

Once a developer starts talking about persistence and related terms,
they are talking about separation of data and code, and to be honest,
they may as well use an RDB.

I've always felt that a good way to understand OO DB concepts is to do
some CORBA development. One calls an interface on an object in one's
client application; the fact that the behind-the-scenes execution is
done in a different country on the other side of the world, and may or
may not involve a cluster of database servers (object or relational),
is completely transparent to the client developer. To the client,
things just happen.

Andy

fre...@gmail.com

Nov 3, 2006, 12:53:45 AM
> Once a developer starts talking about persistance and those related
> terms they are talking about separation of data and code and to be
> honest, they may as well use an RDB.

If the developer uses an RDB, he will not have to care about the
persistence aspect at all. Persistence is a feature at the bottom of
the RDB stack. The user of an RDB doesn't know whether the data he is
working with is cached or actually read from disk. Even if persistence
is not needed, one would probably need an RDB for all the other features.

> I've always felt that a good way to understand OO DB concepts is to do
> some CORBA development. One calls an interface on an object in ones
> client application, the fact that the behind the scenes execution is
> done in a different country on the other side of the world and may or
> may not invovle a cluster of database servers (object or relational)
> is completely transparent to the client developer. To the client -
> things just happen.

The same could be said about SQL and ODBC/JDBC/ADO. The fact that
behind-the-scenes execution is done using an all-in-memory lightweight
database or a monster database like DB2 is completely transparent to
the client developer.

Fredrik Bertilsson

Matt McGill

Nov 3, 2006, 7:54:57 AM
fre...@gmail.com wrote:
> > Once a developer starts talking about persistance and those related
> > terms they are talking about separation of data and code and to be
> > honest, they may as well use an RDB.
>
> If the developer uses a RDB, he will not have to care about the
> persistance aspect at all. Persistence is a feature at the bottom of
> the RDB stack. The user of a RDB doesn't know of the data he is working
> with is cached or actually read from disk. Even if persistence is not
> needed, one would probably need a RDB for all other features.

An RDBMS does a great job of handling all of that mess, and more (ACID
transactions are the real kicker - thank goodness we don't have to
re-implement /that/ wheel) for _data_, but the OO programmer who wants
her _objects_ (data married to related behavior) to 'persist' across
multiple user sessions/program executions/phases of the moon is working
at a somewhat higher level. The problem then becomes one of
instantiating relationships - if an object is persistent, it needs to
be loaded from the storage mechanism at some point. Who does that? Who
loads other persistent objects associated with the first? Where are the
transaction boundaries? If these persistence-related concerns are
sprinkled about your business logic, your code will be much harder to
adapt to changing requirements.

Quoting AndyW:
>> The difference between what I think should be a true OO database and a
>> PSE is that to me the former has no separation of state between what
>> is executing and what is stored. There shouldnt be any concept of
>> storage or transience or separation of application and data - just
>> "IS-ness of the whole" as I call it.
>>
>> I made the comment in another thread that OO does not have a concept
>> of transience - its a programmers abstraction and probably the cause
>> of many RDB vs OO DB debates..

I agree - unfortunately, for those of us still doing ORM, the
frameworks available still require you to do a lot of grunt work to
maintain your objects. Hibernate, for example, is designed for fairly
short 'sessions' of work which will be persisted. When a Session has
closed, the objects loaded in that session are considered 'detached,'
and changes to their internal state will not be persisted unless they
are merge()ed with another Session. Worse, accessing a lazily-loaded
collection on a detached object throws an exception, because the
Session is closed and the collection proxy has no way of obtaining its
objects.

I've lately been relying on a combination of AOP and transitive
persistence to try and separate these sorts of persistence concerns
from the rest of my code entirely. Spring has some useful AOP helper
classes which allow for declarative transaction demarcation, which is
one piece of the puzzle. I've written some of my own AOP wrappers for
re-attaching detached objects to new sessions automagically. Making use
of Hibernate's transitive persistence mechanism to transparently
cascade re-attachments when necessary completes the picture.
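
(None of this is Spring or Hibernate itself, but the interception idea
behind those AOP wrappers can be sketched with nothing more than
java.lang.reflect.Proxy; the repository and "session" below are made-up
stand-ins.)

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// A persistence-free interface the business logic programs against.
interface NoteRepository {
    String find(int id);
}

class SimpleNoteRepository implements NoteRepository {
    public String find(int id) { return "note-" + id; }
}

// Cross-cutting concern woven in around every call, AOP-style: a real
// version would open a session/transaction and re-attach detached objects.
class SessionInterceptor implements InvocationHandler {
    private final Object target;
    SessionInterceptor(Object target) { this.target = target; }
    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
        System.out.println("open session");       // stand-in for session setup
        try {
            return m.invoke(target, args);        // the actual business call
        } finally {
            System.out.println("close session");  // stand-in for commit/close
        }
    }
}

public class AopSketch {
    static NoteRepository wrap(NoteRepository target) {
        return (NoteRepository) Proxy.newProxyInstance(
                NoteRepository.class.getClassLoader(),
                new Class<?>[] { NoteRepository.class },
                new SessionInterceptor(target));
    }

    public static void main(String[] args) {
        NoteRepository repo = wrap(new SimpleNoteRepository());
        System.out.println(repo.find(7));  // session messages bracket the call
    }
}
```

The business code only ever sees NoteRepository; the session bookkeeping
lives entirely in the interceptor.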

Using these techniques, the high-level application and business logic
for our current app has stayed *very* clean and simple - significant
program changes have been comparatively easy because there is no
persistence-related code which needs to change when the business logic
needs to change.

What's been /really/ interesting has been trying to allow for
relationships between objects which are persisted to entirely different
data stores =) I haven't pinned down a good way to do this with
Hibernate just yet (assuming one can be found).

fre...@gmail.com

Nov 3, 2006, 10:12:41 AM
> but the OO programmer who wants
> her _objects_ (data married to related behavior) to 'persist' across
> multiple user sessions/program executions/phases of the moon is working
> at a somewhat higher level.

If the OO programmer only needs persistence, an RDBMS is huge
overkill. A much simpler tool would do.

> The problem then becomes one of
> instantiating relationships - if an object is persistent, it needs to
> be loaded from the storage mechanism at some point. Who does that? Who
> loads other persistent objects associated with the first? Where are the
> transaction boundaries?

An RDBMS is not a storage mechanism. A storage mechanism is much simpler.
Transactions and persistence are two orthogonal features.

> If these persistence-related concerns are
> sprinkled about your business logic, your code will be much harder to
> adapt to changing requirements.

If you use SQL in your application, your application will contain no
persistence-related logic at all.

Fredrik Bertilsson

Matt McGill

Nov 3, 2006, 11:26:56 AM
fre...@gmail.com wrote:
> If the OO programmer only needs persistence, and RDBMS is a huge
> overkill. A much simpler tool would do.

What would you suggest as a simpler tool? In Java the serialization
mechanism is the first thing that comes to my mind, but I don't see how
that would be anywhere near sufficient. It's sort of all-or-nothing.
Sure, you can serialize your entire object graph to a file when your
application shuts down, and then pull it all back in, if you want. But
that might not be such a hot idea if you've got 4 million customer
objects, and only 200 customers are ever using your application at a
time. Having the ability to be very selective about which objects you
actually have in memory at any given time is sort of nice, and the
querying capabilities built into an RDBMS give you the ability to be
selective.

>
> > The problem then becomes one of
> > instantiating relationships - if an object is persistent, it needs to
> > be loaded from the storage mechanism at some point. Who does that? Who
> > loads other persistent objects associated with the first? Where are the
> > transaction boundaries?
>
> A RDBMS is not a storage mechanism. A storage mechnism is much simpler.
> Transactions and persistence are two orthogonal features.

I grant you that transactions and persistence are separate concerns,
and that an RDBMS does a lot more than just store data. But it /does/
store data. What do you mean, "not a storage mechanism"? Maybe the
better question is, "What did you think I meant by 'storage
mechanism'?" because we clearly aren't on the same page.

> > If these persistence-related concerns are
> > sprinkled about your business logic, your code will be much harder to
> > adapt to changing requirements.
>
> If you use SQL in your application, your application will contain no
> persistence related logic at all.

How can you consider 'INSERT INTO table_name VALUES (1, 2, 3)' to be
anything /other/ than persistence logic? It's an imperative statement
that persists data in just about as explicit a way as I can think of.
When you write an application using straight SQL all over the place,
there is /no/ separation of application flow or business logic from
persistence logic at all - it's just a big mishmash.

As transactionality and persistence are orthogonal, so are business
logic and the means by which the data that logic operates on is stored.
Maybe it's in a database. Maybe it's in flat files. Maybe there /is/ no
database, and everything is 'persistent' because it's all in memory all
the time. It shouldn't matter to your business logic.
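
That separation can be sketched in a few lines (the names and the
negative-balance rule are invented for illustration): the business rule
sees only an interface, and the in-memory source below could be replaced
by one backed by JDBC or flat files without the rule changing.

```java
import java.util.Arrays;
import java.util.List;

// The only thing the business logic knows about its data.
interface BalanceSource {
    List<Integer> balances();
}

// Business rule: count customers in the red. No storage knowledge here.
class OverdueCheck {
    private final BalanceSource source;
    OverdueCheck(BalanceSource source) { this.source = source; }
    long count() {
        return source.balances().stream().filter(b -> b < 0).count();
    }
}

public class SeparationDemo {
    public static void main(String[] args) {
        // One possible source: a fixed in-memory list. A JDBC-backed or
        // flat-file-backed BalanceSource would plug in the same way.
        BalanceSource inMemory = () -> Arrays.asList(120, -30, 0, -5);
        System.out.println(new OverdueCheck(inMemory).count());  // prints 2
    }
}
```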

-Matt McGill

PS

Nov 3, 2006, 12:20:30 PM

"aloha.kakuikanu" <aloha.k...@yahoo.com> wrote in message
news:1162502265.4...@b28g2000cwb.googlegroups.com...

>
> Doug Pardee wrote:
>> ajsp...@gmail.com wrote:
>> > when it comes to creating an OO application that uses a relational
>> > database to store most of its data, how do people go about designing
>> > and storing the data once it has been retrieved from the database?
>> > Do you keep it as records, or when you get the data, do you create
>> > collections of objects which represent the data which you have
>> > obtained from your database?
>>
>> Martin Fowler's book "Patterns of Enterprise Application Architecture"
>> covers this territory. It would be a good starting point.
>
> Or yeah. A tiny time management application certainly needs to be
> hammered with "Enterprise strength" approach.

My newsreader must show different posts than yours does. I didn't see the
"tiny application" mention.

>
> Keep it simple. Write a minimal amount of code that transfers the
> information from the database to the GUI. Don't create classes unless
> you really forced to. Ignore all the temptations to design for "future
> extensibilty" -- just admit you don't have crystall ball to know the
> future. Embed all the SQL in java code, because it is the most simple,
> readable and easily maintainable way to program database application.

Again I didn't see where the Java application was mentioned.

I need to read more carefully from now on.

Must go. Got to get over to the database newsgroups to do some trolling.

PS

fre...@gmail.com

Nov 3, 2006, 12:47:23 PM
> > If the OO programmer only needs persistence, and RDBMS is a huge
> > overkill. A much simpler tool would do.
>
> What would you suggest as a simpler tool?

Berkeley DB, for example.

> Sure, you can serialize your entire object graph to a file when your
> application shuts down, and then pull it all back in, if you want. But
> that might not be such a hot idea if you've got 4 million customer
> objects, and only 200 customers are ever using your application at a
> time.

You just proved that persistence is not the only feature we use in a
database. Concurrency is another important feature. Otherwise you have
to implement that in your application.

> Having the ability to be very selective about which objects you
> actually have in memory at any given time is sort of nice, and the
> querying capabilities built into an RDBMS give you the ability to be
> selective.

Very useful, but completely orthogonal to persistence.


> > A RDBMS is not a storage mechanism. A storage mechnism is much simpler.
> > Transactions and persistence are two orthogonal features.
>
> I grant you that transactions and persistence are seperate concerns,
> and that an RDBMS does a lot more than just store data. But it /does/
> store data.

Not necessarily. You might use an all-in-RAM RDBMS.

> > > If these persistence-related concerns are
> > > sprinkled about your business logic, your code will be much harder to
> > > adapt to changing requirements.
> >
> > If you use SQL in your application, your application will contain no
> > persistence related logic at all.
>
> How can you consider 'INSERT INTO table_name VALUES (1, 2, 3)' to be
> anything /other/ than persistence logic?

An insert statement may not write anything to disk at all. It might
only put the data into the cache. An all-in-RAM database would not write
anything to disk.

> When you write an application using straight SQL all over the place,
> there is /no/ seperation of application flow or business logic from
> persistence logic at all - it's just a big mishmash.

SQL is not about persistence, so your conclusion is false. SQL is
about data management.


> As transactionality and persistence are orthogonal, so is business
> logic and the means by which the data that logic operates on is stored.

How data is stored is hidden deep inside the DBMS. When you use Oracle,
you have no idea how the data is stored.

> Maybe it's in a database. Maybe it's in flat files. Maybe there /is/ no
> database, and verything is 'persistence' because it's all in memory all
> the time. It shouldn't matter to your business logic.

If your application only needs flat files, a database is overkill. If
your application needs a RDBMS, flat files are never an option. If your
application operates all in memory, you would probably still need a
RDBMS.

Fredrik Bertilsson

Matt McGill

Nov 3, 2006, 1:42:05 PM
fre...@gmail.com wrote:
> > I grant you that transactions and persistence are seperate concerns,
> > and that an RDBMS does a lot more than just store data. But it /does/
> > store data.
>
> Not necessarly. You might use a all-in-RAM RDBMS.

Oy. I know this is bad form, but I can't help it. From dictionary.com:
store  /stɔr, stoʊr/ [stawr, stohr] –verb (used with object)
8. to supply or stock with something, as for future use.
9. to accumulate or put away, for future use (usually fol. by up or
away).
10. to deposit in a storehouse, warehouse, or other place for keeping.
11. Computers. to put or retain (data) in a memory unit.

Last time I checked, RAM was considered to be a memory unit. And I'm
pretty sure I've read plenty of talk about 'storing' data in a stack,
list, or some other ADT. If the word 'store' is no longer allowed to
refer to anything other than disk writes, I never got the memo.

We're both nitpicking each other to death here and apparently mapping
entirely different meanings to the same set of words. Evidently
'persistence' evokes very different concepts in my mind and in yours,
perhaps as a result of different development backgrounds? Anyway, I'm
going to stop cluttering this thread.

Berkeley DB would indeed be a good choice of persistence back-end for
certain types of applications - thank you for mentioning it.

H. S. Lahman

Nov 3, 2006, 4:16:59 PM
Responding to AndyW...

>>Solving a time management problem is a quite different subject matter
>>than persisting the relevant data. Solve the time management problem in
>>you application first. Then, when you know what object attributes need
>>to be persisted, define a suitable Data Model for the database. The RDB
>>Data Model and the solution's Class Model will typically be different
>>for non-CRUD/USER applications because they need to be optimized
>>differently.
>
>
> I agree with the solution, but I really think one is talking about a
> Persistant Storage Engine (PSE) rather than perhaps an OO database. A
> PSE can often be implemented using an object schema or RDB model.

That's fine. In fact, it is one of the advantages of the approach I
proposed. It really doesn't matter to the time management problem
solution whether the data is persisted in an RDB, an OODB, flat files,
or clay tablets. The persistence mechanisms are fully encapsulated in
the DB Access subsystem. So one could regard the DB Access subsystem I
proposed as being a PSE from the perspective of the problem solution.

Among other things, that allows one to change one's mind about how the
data is stored without having to touch the problem solution in any way.
All one needs to do is replace the DB Access subsystem underneath the
same subsystem interface.

Another advantage is reuse. Once one starts to abstract a DB Access
subsystem for, say, the RDB paradigm, one quickly realizes that the
abstractions are things like Schema, Table, Tuple, Query, etc. that are
quite generic. So one can reuse the DB Access subsystem across
applications with relatively little extra effort. (All one needs is an
identity mapping between the problem solution and the RDB artifacts,
which is easily described in configuration data for a particular
application.) In fact, RAD IDEs are based upon exactly that sort of
problem-independent reuse.
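
As a toy illustration of how generic those abstractions get (this is an
in-memory stand-in, not a real DB Access subsystem, and the column names
are invented), a reusable Table/Tuple pair knows nothing about any
particular application:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Application-independent abstractions of the kind described above.
class Tuple {
    private final Map<String, Object> values = new HashMap<>();
    Tuple set(String column, Object value) { values.put(column, value); return this; }
    Object get(String column) { return values.get(column); }
}

class Table {
    private final List<Tuple> tuples = new ArrayList<>();
    void insert(Tuple t) { tuples.add(t); }
    // A Query here is just a predicate over tuples.
    List<Tuple> select(Predicate<Tuple> where) {
        List<Tuple> result = new ArrayList<>();
        for (Tuple t : tuples) if (where.test(t)) result.add(t);
        return result;
    }
}

public class GenericTableDemo {
    public static void main(String[] args) {
        // The same Table class serves any application; only the identity
        // mapping (column names, predicates) is application-specific.
        Table tasks = new Table();
        tasks.insert(new Tuple().set("name", "email").set("minutes", 15));
        tasks.insert(new Tuple().set("name", "design review").set("minutes", 60));
        List<Tuple> longTasks = tasks.select(t -> (Integer) t.get("minutes") > 30);
        System.out.println(longTasks.get(0).get("name"));  // prints design review
    }
}
```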

<aside>
People tend to discount swapping paradigms as an advantage because RDBs
have dominated for 2+ decades. However, I've personally watched the
paradigm for persistence changing from paper tape/punched cards to flat
sequential disk to ISAM to CODASYL to RDBs to OODBs (in at least some
niches). Each shift required major surgery for huge amounts of legacy
code because the persistence mechanisms were not encapsulated. And each
paradigm shift was regarded as the last possible -- until some new
technology appeared. So I wouldn't bet against another paradigm shift
in the future just because RDBs have been around awhile.

I would also argue that modern RDB or OODB engines are not as portable
as the vendors would have us believe. Each vendor has their own SQL
extensions or whatever. And they each have their own unique quirks that
need to be addressed in the way the data is accessed for optimal
performance. So even when one just changes vendors within the RDB
paradigm or the OODB paradigm, one may have to do some tweaking and that
should be isolated from the problem solution code.
</aside>

aloha.kakuikanu

Nov 3, 2006, 5:14:29 PM
Matt McGill wrote:
> Evidently
> 'persistence' evokes very different concepts in my mind and in yours,
> perhaps as a result of different development backgrounds? Anyway, I'm
> going to stop cluttering this thread.

Persistence is a significant idea from the perspective of a programmer
who is unfamiliar with database management fundamentals. "You can save a
subgraph of your program's object spaghetti into a file and it can
outlive the program!" Wow, what a big deal.

In the relational world you deal with logical concepts, and lifetime
doesn't apply to logical entities. The predicate "John was married to
Kathy in 2005" may be a valid statement or not, but either way its
validity is independent of the puny artifact of whether your application
is currently running or not. Therefore, all a database system does is
maintain an environment where the customer can query such predicates.
Now, it is up to database vendors how faithfully they do this job. If
customers care about the quality of their data, e.g. minimal chance of
data loss, no phantom data, etc., then they would likely choose a
traditional database vendor. If they have lesser requirements, they
could choose a vendor who sacrifices data quality in favor of other
features (cost, speed, etc.).

topmind

Nov 3, 2006, 5:32:28 PM

That is more or less the "Big Iron" view of RDBMS. I used to use a lot
of XBase (a dBase derivative), and it made creating, managing, and using
app-level tables quite easy. I didn't have to worry much about RAM/disk
dichotomies: it cached what it could automatically. Unfortunately,
tools moved away from "nimble tables" because OO was supposed to
replace all that. Well, OO stunk at it and I want my nimble tables
back. Radio Killed the Video Show. It changed the way I think about
programming.

Tables are not something to hide/wrap away in a formal closet, but
fantastic tools when used and implemented right.

-T-

AndyW

Nov 3, 2006, 7:06:57 PM

I would agree, but again, to me it's still really, at the end of the
day, about relational issues and not OO ones. Persistence to me
immediately evokes the need to be storing data - once that separation
is made, then one is, I think, in the relational world, no matter what
else they choose to believe (although this is not a bad thing).

>
>I would also argue that modern RDB or OODB engines are not as portable
>as the vendors would have us believe. Each vendor has their own SQL
>extensions or whatever. And they each have their own unique quirks that
>need to be addressed in the way the data is accessed for optimal
>performance. So even when one just changes vendors within the RDB
>paradigm or the OODB paradigm, one may have to do some tweaking and that
>should be isolated from the problem solution code.
></aside>

I agree with the portability thing; to be honest, unless the database
is kept very simple and generic, it's often better to just have a
simple API that sits on top of a plug-in architecture.

I sometimes use CORBA IDL to describe a problem because I can look at
it from the client side and see the OO and business mechanism being
expressed or I can look at it from the implementation side (server
side if you will) and see the often non-OO details.

Here is a basic example of the kind of thing you might find in a
customer care & billing system.

module CustomerAccount {
    typedef float Account_balance;
    typedef float Amount_value;

    interface Account {
        exception NoFunds { ... };
        exception TransactionError { ... };

        void Withdraw( in Amount_value someAmount )
            raises (NoFunds);

        void Deposit( in Amount_value someAmount )
            raises (TransactionError);
    };

    interface ServiceAccount : Account { ... };
    interface ConsumerAccount : Account { ... };

    struct AccountDetail {
        Customer customer;
        AccountType typeOfAccount;
        Account account;
        // etc.
    };
    typedef sequence<AccountDetail> customerAccounts;
};

I've left out the customer definition, but for all general purposes it's
a list of client accounts, each of which contains a customer (billing
and contact details) and an account (which can be of any type depending
on your business structure).

Now, from the implementation side there is no persistence; that's
where the OO side of things really shines through. The developer in
America or the one in Japan who may be implementing this has no
knowledge of the technical detail behind the interface they are
looking at. All they see is business functions (in their programming
language of choice).

This is the level that I prefer to work at - in other words, I sweat
the detail later, and to me that is a benefit of OO.


However, the developer who is working on the implementation side
(unfortunately and confusingly called a server in CORBA terms), who
may be located in the UK, gets to see the detail behind the scenes.
They may be using a persistent storage engine, it may be a
relational database, or, in a telco solution, it may actually be an
interface to a real-time billing system and may not have a database at
all (that's in this context).

Now, from the client-side point of view, they may be just trying to get
a list of customers who belong to a business entity, display the list
on the screen, then allow the CSR to see that customer's real-time
billing detail. The programmer may figure that the customer data has
come from a local database, but might not be aware that the $59.50
that the customer currently owes is produced by a real-time billing
system over in Hong Kong (OK, I know it will be slow - this is just an
example).

From the server side, it's time to sweat the detail - this is where the
pluggable architecture works for the server-side stuff: what brand
of RDB/ODB cluster is used and how it's used, or whether another system
is being interconnected (and requires an interface, in SOA terms). It
doesn't even have to be an OO system at the back end. Procedural or
RPC functions may be good enough.

To answer the original question: to me, working with an OO database is
like working as the client programmer in the example above. Using a
relational database to me is like being the server-side programmer in
the above example. Two conceptually different ways of thinking about
the same problem - neither is wrong, but each has its time and place.

It's for that reason that I'm not keen on going down the "OODB is
better than RDB" path (although I do like that debate for fun). To be
honest, it's important to select the correct technology for the problem
being solved, and to do that one has to be aware of the strengths and
weaknesses of each - perhaps by doing some product evaluation first or
by doing some small pilot projects, or even using a mix of both. I
would suggest there is bias if one goes down the "x is better than y"
path, and that isn't always what's best for the customer.

As to why these technologies are not as pervasive as they perhaps
should be: I would suggest that it has more to do with the number of
people who develop large-scale customer care and billing systems vs
the number of people who develop single-user applications that run on
PCs. There isn't much of a demand for enterprise-level architecture on
the desktop, so the latter group of developers are not really going to
generate as much demand as they do for the simpler-to-understand
technologies.

Andy


Robert Martin

Nov 3, 2006, 9:35:18 PM
to

The answer depends a great deal on the application itself. There is
nothing intrinsically wrong with dealing directly with table data;
indeed, that is often the simplest solution. On the other hand,
complex applications often need to have complex behavior associated
with the data, and so objects become important. The objects are
sometimes related to the relational tables, but often they are not.

Keep this in mind. Relational tables contain data in a behavior
independent way. Objects express behavior in a data independent way.
This is not to say that there isn't behavior implied by the relational
tables, or data implied by the objects. Rather those implications are
kept abstract. The behavior implied by the relational tables could be
written in any language and in any form. The data implied by the
objects could be expressed in any of a menagerie of different forms.

So, here's the trick. Start out simple and just use table data.
Create objects only when you need to. For example, you may need an
object to help you isolate your application to make it testable. Or
you may need an object to make your application more generic.
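Robert's "start with table data, add objects when you need to" progression can be sketched concretely. The following is an illustrative sketch only (Python with the stdlib sqlite3 module, chosen so it actually runs; the table, column, and class names are invented, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT, minutes INTEGER)")
conn.executemany("INSERT INTO task (title, minutes) VALUES (?, ?)",
                 [("write report", 90), ("review code", 45)])

# Simplest thing that works: hand the raw rows straight to the caller.
rows = conn.execute("SELECT title, minutes FROM task ORDER BY minutes").fetchall()

# Introduce a class only once real behavior accumulates.
class Task:
    def __init__(self, title, minutes):
        self.title, self.minutes = title, minutes

    def summary(self):  # the behavior that justifies having an object at all
        return f"{self.title} ({self.minutes} min)"

tasks = [Task(*r) for r in rows]
```

The point is the ordering: the raw rows were good enough right up until summary() gave the class a reason to exist.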
--
Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
Object Mentor Inc.            | blog:  www.butunclebob.com
The Agile Transition Experts  | web:   www.objectmentor.com
800-338-6716                  |

Robert Martin

Nov 3, 2006, 9:39:49 PM
to
On 2006-11-02 15:17:45 -0600, "aloha.kakuikanu"
<aloha.k...@yahoo.com> said:

> Keep it simple. Write a minimal amount of code that transfers the
> information from the database to the GUI. Don't create classes unless
> you really forced to. Ignore all the temptations to design for "future
> extensibilty" -- just admit you don't have crystall ball to know the
> future. Embed all the SQL in java code, because it is the most simple,
> readable and easily maintainable way to program database application.

This is very good advice. I would add the DRY principle. "Don't
Repeat Yourself" (See "The Pragmatic Programmer" by Dave Thomas and
Andy Hunt).

For example, if you find yourself building the same SQL query in more
than one place, or a similar query, then consolidate that code into a
single module, which will probably be a class.
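As a runnable sketch of that consolidation (Python with the stdlib sqlite3 module rather than Java/JDBC, purely so it executes; the table and function names are invented): the query and its glue code live in one function that both call sites share.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER, description TEXT, city TEXT)")
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)",
                     [(1, "Acme", "London"), (2, "Globex", "Paris"),
                      (3, "Initech", "London")])
    return conn

def companies_in(conn, city):
    """The single home for this query: no call site repeats the SQL or the glue."""
    return conn.execute(
        "SELECT id, description FROM company WHERE city = ? ORDER BY id",
        (city,)).fetchall()

conn = open_db()
report_rows = companies_in(conn, "London")   # call site 1, e.g. a printed report
screen_rows = companies_in(conn, "London")   # call site 2, e.g. a GUI list
```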

aloha.kakuikanu

Nov 3, 2006, 10:18:09 PM
to

Robert Martin wrote:
> On 2006-11-02 15:17:45 -0600, "aloha.kakuikanu"
> <aloha.k...@yahoo.com> said:
>
> > Keep it simple. Write a minimal amount of code that transfers the
> > information from the database to the GUI. Don't create classes unless
> > you really forced to. Ignore all the temptations to design for "future
> > extensibilty" -- just admit you don't have crystall ball to know the
> > future. Embed all the SQL in java code, because it is the most simple,
> > readable and easily maintainable way to program database application.
>
> This is very good advice. I would add the DRY principle. "Don't
> Repeat Yourself" (See "The Pragmatic Programmer" by Dave Thomas and
> Andy Hunt).
>
> For example, if you find yourself building the same SQL query in more
> than one place, or a similar query, then consolidate that code into a
> single module, which will probably be a class.

Yes, if a SQL query is repeated in two places, then the surrounding JDBC
glue code is repeated as well. It should be refactored into a single
function. This doesn't really change the fact that SQL remains inlined
in the Java code, so the code is still very readable.

It is not uncommon for OOP folks to extend your idea and insist that
one should always refactor SQL code into a dedicated module, even if
you don't have to, just in case some other part of the system may
require the same query. I suggest that the cases where you need the
same query are the exception rather than the norm. If you have a
duplicated query, perhaps you should reexamine your functionality?
Chances are it is a design flaw.

fre...@gmail.com

Nov 4, 2006, 2:07:10 AM
to
> Yes, if a SQL query is repeated in 2 places, then the surrounding JDBC
> glue code is repeated as well. It should be refactored into a single
> function. This doesn't really change the fact that SQL remains inlined
> into java code, so that the code is still very readable.

It also depends on the size and complexity of the SQL statement. If the
SQL statements are simple, like
select id, description from company where city=? or
update employee set status=? where id=?
there is still no reason to put them into a separate function, even if
they are called from multiple points. The function call itself will have
the same verbosity as the SQL statement.

Fredrik Bertilsson
http://frebe.php0h.com

fre...@gmail.com

Nov 4, 2006, 2:30:16 AM
to
> Oy. I know this is bad form, but I can't help it. From dictionary.com:
> store  /stɔr, stoʊr/ [stawr, stohr] –verb (used with object)
> 8. to supply or stock with something, as for future use.
> 9. to accumulate or put away, for future use (usually fol. by up or
> away).
> 10. to deposit in a storehouse, warehouse, or other place for keeping.
> 11. Computers. to put or retain (data) in a memory unit.
>
> Last time I checked, RAM was considered to be a memory unit. And I'm
> pretty sure I've read plenty of talk about 'storing' data in a stack,
> list, or some other ADT. If the word 'store' is no longer allowed to
> refer to anything other than disk writes, I never got the memo.

In that case, calling object.setSomething(data) is also about
"storing". Actually a program is all about storing, because you store
data in different variables all the time. In other words, "store" is
not a very useful word in this context. I didn't introduce the word
"store" in this thread. I was talking about "persistence" and the
common misconception that an RDBMS is mainly used for persistence, and
that using SQL would reveal anything about how data is persisted.

> Evidently
> 'persistence' evokes very different concepts in my mind and in yours,
> perhaps as a result of different development backgrounds?

Everybody with a solid background using RDBMSs knows that an RDBMS is
about much more than persistence, and would still use an RDBMS even if
persistence were not needed. Many people from OO-land implement a lot
of data management features themselves in every application, instead
of using the features already provided by the RDBMS, and use the RDBMS
only for persistence. If you want, I can give you real-world examples
of the various downsides of this approach.

Fredrik Bertilsson
http://frebe.php0h.com

sjde...@yahoo.com

Nov 4, 2006, 5:00:56 AM
to
aloha.kakuikanu wrote:
> Or yeah. A tiny time management application certainly needs to be
> hammered with "Enterprise strength" approach.
>
> Keep it simple. Write a minimal amount of code that transfers the
> information from the database to the GUI. Don't create classes unless
> you really forced to. Ignore all the temptations to design for "future
> extensibilty" -- just admit you don't have crystall ball to know the
> future.

All excellent advice.

> Embed all the SQL in java code, because it is the most simple,
> readable and easily maintainable way to program database application.

This doesn't make sense to me. If you're going to take the time to
force all your database calls through one language, you're better off
just doing a full mandatory database layer from the start without
embedded SQL calls. In limited cases, mandating only one language for
DB access may be desirable (e.g. to ease the burden of installing
appropriate database connectors for multiple libraries).

But in general, if you do mandate one language then you might as well
mandate a full-featured DB layer--much of the benefit of allowing
ad-hoc SQL calls is lost if you can't do it in the same language as the
rest of the application.

Whatever you choose, if you do use embedded SQL (which I think is
usually a good idea in practice) you _must_ be rigorous about factoring
out any nontrivial queries that are repeated more than once.

fre...@gmail.com

Nov 4, 2006, 7:03:04 AM
to
> > Embed all the SQL in java code, because it is the most simple,
> > readable and easily maintainable way to program database application.
>
> This doesn't make sense to me. If you're going to take the time to
> force all your database calls through one language, you're better off
> just doing an full mandatory database layer from the start without
> embedded SQL calls.

This doesn't make sense to me. What different languages are you
talking about? Who is taking the time to force anything? The original
statement was to use SQL wherever it is appropriate, not to force it
into a special layer.

> In limited cases, mandating only one language for
> DB access may be desirable (e.g to ease the burden of installing
> appropriate database connectors for multiple libraries).

All languages should be allowed to embed SQL.

> But in general, if you do mandate one language then you might as well
> mandate a full-featured DB layer

Nobody mandated only one language.

> Whatever you choose, if you do use embedded SQL (which I think is
> usually a good idea in practice) you _must_ be rigorous about factoring
> out any nontrivial queries that are repeated more than once.

Of course. But this is the case with all code fragments. As soon as
you have a nontrivial fragment of code that is repeated more than
once, you should factor it out into a function. That is the art of
programming.

Fredrik Bertilsson
http://frebe.php0h.com

Dmitry A. Kazakov

Nov 4, 2006, 9:42:35 AM
to
On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:

> Matt McGill wrote:
>> Evidently
>> 'persistence' evokes very different concepts in my mind and in yours,
>> perhaps as a result of different development backgrounds? Anyway, I'm
>> going to stop cluttering this thread.
>
> Persistence is a significant idea from a programmer perspective who is
> unfamiliar with database management fundamentals. "You can save a
> subgraph of your program's object spaghetty into a file and it can
> outlive the program!" Wow, what a big deal.
>
> In relational world you deal with logical concepts, and lifetime
> doesn't apply to logical entities.

This is of course wrong. Persistence addresses not time, but scope. If the
scope is bound to time, that makes the thing "real-time," which alone is
unrelated to persistence.

> The predicate "John was married to
> Kathy in 2005" may be a valid statement or not, but anyhow its validity
> is independent of the puny artifact whether your application is
> currently running or not.

That depends on the scope again. In a different scope the predicate might
turn worthless, wrong or illegal.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

H. S. Lahman

Nov 4, 2006, 11:10:23 AM
to
Responding to AndyW...

I argue that the DB Access subsystem needs to deal with the particular
persistence mechanisms that are at hand. The abstractions one needs to
do that will be different for flat files (file/Line), RDBs
(Table/Tuple), OODBs (Object/Relationship), or whatever. The situation
is analogous to a UI where one uses Window/Control abstractions for a
GUI and Page/Section abstractions for a browser. In addition, one needs
to optimize differently for each paradigm and appropriate abstractions
facilitate that.

However, within the context of a particular persistence paradigm, one
should be able to provide subsystem reuse across all applications that
happen to use that particular persistence paradigm. OTOH,...

That's fine too. Note that the DB Access subsystem is free to use such
generic mechanisms. That would make the DB Access subsystem even more
reusable -- so long as one doesn't run into a situation where CORBA
isn't available. B-) The advantage still remains that the subsystem
interface is defined in terms of the particular application solution's
needs and doesn't even have this level of dependence on the persistence
access mechanisms.

> However, the developer who is working on the implementation side
> (unfortunately and confusingly called a server in corba terms) which
> may be located in the UK, gets to see the detail behind the scenes.
> Now they may be using a persistance storage engine, it may be a
> relational database, or, if a telco solution, it actually may be an
> interface to a real time billing system and may not have a database at
> all (thats in this context).
>
> Now from the client side point of view, they may be just trying to get
> a list of customers who belong to a business entity, display the list
> on the screen, then allow the CSR to see that customers real time
> billing detail. The programmer may figure that the customer data has
> come from a local database, but might not be aware that the $59.50c
> that the customer currently owes is produced by a real time billing
> sytem over in Hong Kong (ok, I know it will be slow - this is just an
> example).
>
> From the server side, its time to sweat the detail - this is where the
> plug-innable architecture works for the server side stuff. What brand
> of RDB/ODB cluster is used and how its used, or if another system is
> being interconnected to (and requires an interface in SOA terms). It
> doesnt even have to be an OO system at the back end. Procedural or
> RPC functions may be good enough.

But that performance hit is important, especially on the server side.
There is no free lunch and the more one abstracts things the more
overhead one pays for dealing with all possible contexts. That's why
full code generators for OOA models usually produce C code rather than
OOPL code and why a simple message-based interface will run rings around
the sort of remote object access one can get with interoperability
infrastructures like CORBA. In the end one needs to make informed
trade-offs between developer convenience and overhead. Often Moore's
Law pushes those trade-offs towards the convenience side, but not always.

> To answer the orginal question, to me, working with an OO database is
> like working as the client programmer in the example above. Using a
> relational database to me is like being the server side programmer in
> the above example. Two conceptually different ways of thinking about
> the same problem - neither is wrong, but each has its time and place.
>
> Its for that reason I'm not keen on really going down the OODB is
> better than RDB etc(although I do like that debate for fun). To be
> honest, its important to select the correct technology for the problem
> being solved and to do that, one has to be aware of the strengths and
> weaknesses of each - perhaps by doing some product evaluation first or
> by doing some small pilot projects, or even using a mix of both. I
> would suggest bias if one is going down the x is better than y path,
> and that isnt all thats best for the customer.

I think RDBs and OODBs have different niches. One should use the tool
that is best suited to the application's needs. However,...

> As to why these technologies are not as pervasive as they perhaps
> should be. I would suggest that it would be more to do with the
> number of people who develop large scale customer care and billing
> systems vs the number of people who develop single user applications
> that run on PCs. There isnt much of a demand for enterprise level
> architecture on the desktop so the latter group of developers are not
> really going to generate that much demand as they would for the more
> simple to understand technologies.

I agree that RDBs dominate because of the nature of the IT market. The
same data is commonly used in multiple problem contexts, which makes
RDBs a good choice since they shine for ad hoc queries that are
independent of any particular problem context.

sjde...@yahoo.com

Nov 4, 2006, 12:56:41 PM
to
fre...@gmail.com wrote:
> > > Embed all the SQL in java code, because it is the most simple,
> > > readable and easily maintainable way to program database application.
> >
> > This doesn't make sense to me. If you're going to take the time to
> > force all your database calls through one language, you're better off
> > just doing an full mandatory database layer from the start without
> > embedded SQL calls.
>
> This doesn't make sense to me. What different languages are you
> talkning about. Who is talking time to forcing anything.

I'm talking about whatever language(s) your application is written in.

I think there was some confusion here. If you substitute "application
code" for "java code" in your statement above then I'm in agreement
with you.

I have, however, worked at places that mandated all database access go
through Java code (even access from non-Java programs); while there are
sometimes reasons for such a stipulation, I certainly wouldn't say it's
the easiest, most readable, most maintainable way of doing things in
general.

Matt McGill

Nov 4, 2006, 1:59:10 PM
to
fre...@gmail.com wrote:
> In that case, calling object.setSomething(data) is also about
> "storing". Actually a program is all about storing, because you store
> data in different variables all the time. In other words, "store" is
> not a very useful word in this context. I didn't introduce the word
> "store" in this thread.

That's kind of what I had in mind - I meant 'storage mechanism' to mean
something which can be used to selectively obtain references to objects
when they need to be used, and to which objects can be tucked away
until they're needed again. An ORM framework sitting on top of an RDBMS
often does that job. But you're right on both counts - you didn't use
the term initially, and it probably wasn't a good choice on my part.

I haven't, in general, done such a hot job of communicating clearly in
this thread, have I? Communicating well is a skill, and I'm still
learning it. Thanks for your patience.

Just so I have things straight in the future: when you're talking about
persistence, you're talking about the means by which data is moved to a
persistent storage medium, like a hard disk?

I was talking about a way of treating object instances as if they exist
independent of any particular running process. So they would be
'persistent' in the sense that they stick around (conceptually, at
least) until they are explicitly destroyed (and terminating the process
which created them does not count as explicit destruction).

> Everybody with a solid background using RDBMS knows that a RDBMS is
> about much more than persistence, and would still use a RDBMS even if
> persistence is not needed. Many people from OO-land implements a lot of
> data management features by them self in every application, instead of
> using the features already provided by the RDBMS, and uses the RDBMS
> only for persistence. If you want to, I can give you real-world
> examples with the various downsides with this approach.

I would certainly agree that re-implementing RDBMS features in
application-level code would be A Bad Thing. But it seems to me that if
those of us in OO-land can avoid making that mistake and instead rely
on those features (here I'm thinking particularly of ACID transactions
and query capabilities), we can achieve useful results.

In the interest of identifying things which OO people should /not/ do,
can you post what you would consider to be a particularly pathological
example of inappropriately re-implementing the data management features
an RDBMS provides for free?

-Matt McGill

topmind

Nov 4, 2006, 11:51:19 PM
to
Robert Martin wrote:
> On 2006-11-01 19:52:37 -0600, ajsp...@gmail.com said:
>
> > My question relates to good design when it comes to creating an OO
> > application that uses a relational database to store most of its data,
> > how do people go about designing and storing the data once it has been
> > retrieved from the database? Do you keep it as records, or when you get
> > the data, do you create collections of objects which represent the data
> > which you have obtained from your database?
> >
> > I am wanting to create an application that is for time management and I
> > want most of the data to be on my webservers database, but in creating
> > my OO 'client' I really want to create a good design.
> >
> > Anthony
>
> The answer depends a great deal on the application itself. There is
> nothing intrinsically wrong with dealing directly with table data;
> indeed, that is often the simplest solution. On the other hand,
> complex applications often need to have complex behavior associated
> with the data, and so objects become important. The objects are
> sometimes related to the relational tables, but often they are not.

This is an OOP myth. Often much or most of the behavior CAN be
converted into a declarative form (data).

For an extreme example, a brain could be more or less modeled with a
schema such as:

table: Links
=================
sourceNode_ID
destinationNode_ID
weight // weighting factor, can be negative in some models

table: Node
===============
node_ID
activationFuncIndicator // see note
activationWeight // the "volume" given to activation function

There are about 5 activation functions in common use: unit_step,
sigmoid, piecewise_linear, gaussian, and identity. (I haven't reviewed
my schema model closely, so buyer beware. This model allows "Y splits",
which real neurons don't directly allow IIRC, but can be modeled with
explicit neurons such that they are still interchangeable.)
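A minimal executable version of this schema (Python with the stdlib sqlite3 module; the column names are the ones above, but the activation dispatch table, the piecewise-linear/gaussian formulas, and the tiny two-link network are illustrative assumptions):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (sourceNode_ID INTEGER, destinationNode_ID INTEGER, weight REAL);
CREATE TABLE node  (node_ID INTEGER PRIMARY KEY,
                    activationFuncIndicator TEXT,
                    activationWeight REAL);
""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(1, "identity", 1.0), (2, "identity", 1.0), (3, "sigmoid", 1.0)])
conn.executemany("INSERT INTO links VALUES (?, ?, ?)",
                 [(1, 3, 0.5), (2, 3, -0.5)])   # two links feeding node 3

# The five functions named above; keeping them in code is the only
# non-declarative part (the exact formulas here are assumed, not standard).
ACTIVATIONS = {
    "unit_step": lambda x: 1.0 if x >= 0 else 0.0,
    "sigmoid":   lambda x: 1.0 / (1.0 + math.exp(-x)),
    "piecewise_linear": lambda x: max(0.0, min(1.0, x + 0.5)),
    "gaussian":  lambda x: math.exp(-x * x),
    "identity":  lambda x: x,
}

def activate(conn, node_id, inputs):
    """inputs maps a source node_ID to its output value."""
    net = sum(inputs[src] * w for src, w in conn.execute(
        "SELECT sourceNode_ID, weight FROM links WHERE destinationNode_ID = ?",
        (node_id,)))
    func, volume = conn.execute(
        "SELECT activationFuncIndicator, activationWeight FROM node WHERE node_ID = ?",
        (node_id,)).fetchone()
    return volume * ACTIVATIONS[func](net)

out = activate(conn, 3, {1: 1.0, 2: 1.0})   # net = 0.5 - 0.5 = 0, sigmoid(0) = 0.5
```

Almost all of the "behavior" lives in the two tables; the code merely interprets them.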

>
> Keep this in mind. Relational tables contain data in a behavior
> independent way. Objects express behavior in a data independent way.
> This is not to say that there isn't behavior implied by the relational
> tables, or data implied by the objects. Rather those implications are
> kept abstract. The behavior implied by the relational tables could be
> written in any language an in any form. The data implied by the
> objects could be expressed in any of a menagerie of different forms.
>
> So, here's the trick. Start out simple and just use table data.
> Create objects only when you need to. For example, you may need an
> object to help you isolate your application to make it testable. Or
> you may need an object to make your application more generic.
> --
> Robert C. Martin (Uncle Bob) | email: uncl...@objectmentor.com

-T-

fre...@gmail.com

Nov 5, 2006, 1:44:23 AM
to
> Just so I have things straight in the future: when you're talking about
> persistence, you're talking about the means by which data is moved to a
> persistent storage medium, like a hard disk?

Yes.

> I was talking about a way of treating object instances as if they exist
> independent of any particular running process. So they would be
> 'persistent' in the sense that they stick around (conceptually, at
> least) until they are explicitly destroyed (and terminating the process
> which created them does not count as explicit destruction).

The most common (or maybe the only) way to achieve that is to move the
data to a persistent storage medium.

> > Everybody with a solid background using RDBMS knows that a RDBMS is
> > about much more than persistence, and would still use a RDBMS even if
> > persistence is not needed. Many people from OO-land implements a lot of
> > data management features by them self in every application, instead of
> > using the features already provided by the RDBMS, and uses the RDBMS
> > only for persistence. If you want to, I can give you real-world
> > examples with the various downsides with this approach.
>
> I would certainly agree that re-implementing RDBMS features in
> application-level code would be A Bad Thing. But it seems to me that if
> those of us in OO-land can avoid making that mistake and instead rely
> on those features (here I'm thinking particularly of ACID transactions
> and query capabilities), we can achieve useful results.
>
> In the interest of identifying things which OO people should /not/ do,
> can you post what you would consider to be a particularly pathological
> example of inappropriately re-implementing the data management features
> an RDBMS provides for free?

Queries (or predicate logic) are the first obvious one. When OO people
separate SQL statements into a dedicated layer, they also try to limit
the number of distinct SQL statements because of the burden of modifying
interfaces for every new statement. The consequence is that rather
simple select statements are used and additional filtering is done in
the application. This hits performance and reduces the maintainability
of the application.
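That trade-off can be shown concretely (Python with the stdlib sqlite3 module, purely for illustration; the orders table and its numbers are invented): the "simple select plus in-application filtering" version drags every row across the interface before discarding most of them, while the one-statement version lets the database apply the predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, "open" if i % 10 == 0 else "closed", i * 1.5)
                  for i in range(1000)])

# "Simple" select, then filtering in the application: every row crosses
# the interface and most are thrown away.
all_rows = conn.execute(
    "SELECT id, status, total FROM orders ORDER BY id").fetchall()
open_in_app = [r for r in all_rows if r[1] == "open" and r[2] > 100]

# Letting the database apply the predicate: one statement, 93 rows moved
# instead of 1000.
open_in_sql = conn.execute(
    "SELECT id, status, total FROM orders "
    "WHERE status = 'open' AND total > 100 ORDER BY id").fetchall()
```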

Caching is the second issue. Because OO people want to play with an
object graph instead of predicate logic, they need the graph or parts
of it to be virtually in memory all the time. This will very quickly
lead to huge RAM consumption, unless you use caching in your
application. The DBMS already does caching for you, synchronizing the
cache with transactions and handling all concurrency issues. But if
you try to do application caching, the reliability of the cached data
will be rather low.
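A toy illustration of that staleness (Python with the stdlib sqlite3 module; the dict-as-cache and the account table are deliberately naive inventions): once anything else updates the row, the application-level copy is silently wrong, while reading through the DBMS stays current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")

def balance(conn, acct_id):
    return conn.execute("SELECT balance FROM account WHERE id = ?",
                        (acct_id,)).fetchone()[0]

cache = {1: balance(conn, 1)}   # application-level copy of the row

# Meanwhile some other code path (or another client) changes the database...
conn.execute("UPDATE account SET balance = 40.0 WHERE id = 1")

stale = cache[1]           # the cache never hears about the update
fresh = balance(conn, 1)   # reading through the DBMS stays current
```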

Transactions and concurrency are the third issue. Because OO
applications like to have state that is shared between different
threads (client calls), you end up having to solve concurrency issues
in the application. In Java it is done using "synchronized". As soon
as you are locking resources, you have the risk of deadlock, and an
RDBMS is much better at detecting deadlocks than, for example, the
JVM. Emulating rollback in your application is also a very tricky task.
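For contrast, the DBMS's own rollback is a single call (sketched in Python with the stdlib sqlite3 module; the booking table and the simulated error are invented) - emulating the same undo by hand over shared in-memory object state is where the trickiness comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booking (id INTEGER PRIMARY KEY, room TEXT)")
conn.commit()

try:
    conn.execute("INSERT INTO booking VALUES (1, 'A101')")
    raise RuntimeError("validation failed halfway through")  # simulated error
except RuntimeError:
    conn.rollback()   # one call undoes everything since the last commit

rows = conn.execute("SELECT * FROM booking").fetchall()   # the insert is gone
```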

There is also a common misconception that databases should only be
used for "permanent" data, and that other data should be handled using
low-level collection features in the application. But temporary tables
are very useful if you need features like sorting and searching but
don't need persistence.
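A small sketch of that use (Python with the stdlib sqlite3 module; the data is invented): a temp table gets the full sorting and searching machinery but disappears with the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # TEMP tables vanish when the connection does
conn.execute("CREATE TEMP TABLE scratch (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scratch VALUES (?, ?)",
                 [("carol", 70), ("alice", 90), ("bob", 80)])

# Sorting and searching come for free; nothing is ever persisted.
top = conn.execute(
    "SELECT name FROM scratch ORDER BY score DESC LIMIT 1").fetchone()[0]
found = conn.execute(
    "SELECT COUNT(*) FROM scratch WHERE score >= 80").fetchone()[0]
```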

Views are also very underused. As soon as you have an identical select
statement that is called from multiple points in your application, a
view should be created. A view can also contain a considerable amount
of business logic that can be reused in an efficient way, and across
different programming languages.
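Sketched the same way (Python with the stdlib sqlite3 module; the invoice schema and the "overdue" rule are invented): the predicate lives once in the view, and every call site - in any language - just selects from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (id INTEGER, due_day INTEGER, paid INTEGER);
INSERT INTO invoice VALUES (1, 10, 0), (2, 40, 0), (3, 5, 1);

-- the shared business rule, defined exactly once:
CREATE VIEW overdue_invoice AS
    SELECT id FROM invoice WHERE paid = 0 AND due_day < 30;
""")

# Two independent call sites reuse the view instead of repeating the predicate.
billing_list = conn.execute(
    "SELECT id FROM overdue_invoice ORDER BY id").fetchall()
dunning_count = conn.execute(
    "SELECT COUNT(*) FROM overdue_invoice").fetchone()[0]
```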

The main cause of all these problems is the fact that OO people like
to use objects as data structures, creating a domain model. According
to Ted Codd and Chris Date, the table (relation) is the only
(high-level) data structure. Using other data structures will cause an
impedance mismatch. But objects are still very useful for other
purposes. As a matter of fact, the relational model needs
classes/objects for defining data types other than the existing ones
like strings and dates.

Fredrik Bertilsson
http://frebe.php0h.com

AndyW

Nov 5, 2006, 6:36:43 AM
to
On Sat, 4 Nov 2006 15:42:35 +0100, "Dmitry A. Kazakov"
<mai...@dmitry-kazakov.de> wrote:

>On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:
>
>> Matt McGill wrote:
>>> Evidently
>>> 'persistence' evokes very different concepts in my mind and in yours,
>>> perhaps as a result of different development backgrounds? Anyway, I'm
>>> going to stop cluttering this thread.
>>
>> Persistence is a significant idea from a programmer perspective who is
>> unfamiliar with database management fundamentals. "You can save a
>> subgraph of your program's object spaghetty into a file and it can
>> outlive the program!" Wow, what a big deal.
>>
>> In relational world you deal with logical concepts, and lifetime
>> doesn't apply to logical entities.
>
>This is of course wrong. Persistence addresses not time, but scope. If the
>scope is bound to time, that makes the thing "real-time," which alone is
>unrelated to persistence.

Not sure where you get your definition from, but persistence refers to
something lasting over time. An item is transient if it exists for a
shorter timeframe than its creator, and persistent if it lasts longer.

You may need to explain what you mean by the word 'scope'.

Dmitry A. Kazakov

Nov 5, 2006, 9:15:24 AM
to

Scope = frame, {} brackets in C. It is not necessarily a time frame. When X
is persistent relative to the program A, that merely means that the scope
where X exists exceeds that of A. It does not mean that X exists outside of
any scope. There must always be a larger scope of some other program B, where
X does exist. After all, X is a program artefact; it can't live in a vacuum.
In the case of a database, B could be the DBMS. Within the scope of B, X is
no longer persistent. Obviously, persistence is relative to the scope of the
beholder. There is no absolutely persistent thing.

Now time is an orthogonal issue. You can associate some execution time with
scopes, but it would be an artificial association, because the same program
might run on different computers at different times and in different places.
Time and space are abstracted away. This is not the case for real-time
programs, which are called *real* time for exactly that reason.

------
There are two logical errors made in both camps. One is to bind things to
absolute time. Another is to claim that things themselves are absolute. In
reality contradictory things both co- and not exist. There is no
distinguished "set of facts" one could put into a DB and then deduce
everything else...

Matt McGill

Nov 5, 2006, 11:23:24 PM
to
I'll preface my replies by saying that almost all of my experience with
persistent objects has involved Hibernate, one of a number of ORM
frameworks out there for Java (and now .NET, I believe) programmers to
use. My aim here is not so much to focus on the strengths and
weaknesses of various ORM implementations (I think TopLink was already
mentioned, in a none-too-appreciative way), but rather to explore how
well an ORM framework, when used properly, might be able to address the
examples you've brought up. Naturally, /mis/using an ORM framework, or
using it naively, will not end well.

> Queries (or predicate logic) is the first obvious one. When OO people
> separates SQL statements into a dedicated layer, they also try to limit
> the number of distinct SQL statements because the burden or modifying
> interfaces for every new statement. The consequence is that rather
> simple select statements are used and additional filtering is done in
> the application. This will hit performance and reduce maintainability
> of the application.

So the problem here would be that SQL statements are being separated
out into a layer or particular module, and that the task of maintaining
the interface for that layer or module causes OO programmers to
artificially limit themselves to simple queries. Being thus limited,
they start performing joins and selections in-memory on the query
results, when more complicated queries could have done those things
much more efficiently. Correct?

I can see how maintaining such a data access layer by hand might
predispose OO developers to use SQL in a naive way, or to implement
functionality in-memory that a query could have given for free (doing
selections or joins this way could end up being particularly
disastrous). But when making use of an ORM framework, you don't need a
hand-created data access layer at all. Moreover, you get (at least with
Hibernate) an object-aware query language which, while limiting in
some ways (you can only join on mapped relationships, for example), is
actually more flexible than straight SQL in other ways (polymorphic
queries).

I'm not trying to claim HQL (Hibernate's query language) is the
greatest thing since sliced bread - it has to be used with care. But
when it /is/ used correctly I think there are some benefits over SQL. I
hope to post some code examples that I think can illustrate these
benefits as soon as I get a chance.
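
In the meantime, here is a sketch of the polymorphic-query point (the
Payment hierarchy and table names are invented for illustration, not
taken from any real mapping):

```java
public class PolymorphicQueryExample {
    // HQL addresses the mapped class; Hibernate resolves every mapped
    // subclass (CreditCardPayment, ChequePayment, ...) behind the scenes.
    static final String HQL = "from Payment p where p.amount > :min";

    // Hand-written SQL against a table-per-subclass mapping has to spell
    // out one branch per concrete table:
    static final String SQL =
        "SELECT id, amount FROM credit_card_payment WHERE amount > ? "
      + "UNION ALL "
      + "SELECT id, amount FROM cheque_payment WHERE amount > ?";

    public static void main(String[] args) {
        System.out.println(HQL);
        System.out.println(SQL);
    }
}
```

Adding a new Payment subclass changes the HQL not at all, while the
hand-written SQL grows another UNION branch.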

> Caching is the second issue. Because OO people want to play with an
> object graph instead of predicate logic, they need the graph or parts
> of it to be virtually in memory all the time. This will very quickly
> lead to huge RAM consumption, unless you use caching in your
> application. The DBMS already does caching for you, which synchronizes
> the cache with transactions and handles all concurrency issues. But if
> you try to do application caching, the reliability of the cached data
> will be rather low.

I'm not sure I follow. My experience has been with multi-user web
applications, so the goal is typically to pull only those parts of the
object graph into memory that are actually necessary, and to keep them
there only during the processing of a request. If you let the instances
(which are /not/ shared between threads, as you indicate below) hang
around in memory for long periods of time, the data goes stale.

My understanding is that for many (most?) database-backed applications,
the database resides on a different server than the client of the
database (whether that client is a web/application server, or
individual client machines). The database's caching helps to mitigate
disk access times, but there's still the network overhead of
transferring the data to the client of the database. Unless I'm missing
something, you've got that problem to some extent regardless of whether
you're using OO/ORM on top of an RDBMS, or straight SQL. Sure, if most
of your business logic is implemented in stored procedures on the
database, then the only data going over the wire is the data to
display. But for the applications I've worked on thus far most of the
data requested ends up being displayed in some form.

Hibernate, on the other hand, has some sophisticated caching mechanisms
built in. The actual cache functionality is supplied by third-party
cache libraries, and you can take your pick (there are a bunch). I
agree that rolling your own would be a dumb idea, but there's no reason
to.

Hibernate's cache obviously can't always be made use of. Rich clients
all connecting to a shared database would not be able to make use of
the cache, for obvious reasons. But in the event that an application
server is hosting a multi-user application, and that application is the
only one which writes to a particular set of tables, Hibernate's cache
is simply a tool which can give significant performance improvement in
the form of saved round-trips to the database, and which is not to my
knowledge readily available to those using a SQL-only approach. Memory
consumption is obviously a concern, but the cache can be configured on
a per-object basis, with maximum sizes, object counts, and expiration
times. There is even support for caching the results of queries (this
only makes sense in a small set of situations, but for those situations
it's a huge help).
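
For what it's worth, the per-object tuning I'm describing looks roughly
like this with the ehcache provider (the class name is a made-up
example): the class is marked cacheable in its Hibernate mapping with a
cache element like <cache usage="read-write"/>, and the region gets its
limits in ehcache.xml:

```xml
<!-- ehcache.xml: one region per cached class, each with its own limits -->
<cache name="com.example.timetracker.Project"
       maxElementsInMemory="500"
       eternal="false"
       timeToLiveSeconds="300"
       overflowToDisk="false"/>
```

Objects that change rarely get long time-to-live values and big regions;
volatile ones get small regions or no caching at all.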

> Transactions and concurrency is the third issue. Because OO
> applications like to have state that are shared between different
> threads (client calls), you end up with having to solve concurrency
> issues in the application. In Java it is done using "synchronized". As
> soon as you are locking resources, you have the risk of deadlock. But a
> RDBMS is much better detecting deadlocks, than for example the JVM.
> Emulating rollback in your application is also a very tricky task.

ACID transactions are my favorite RDBMS function, for this very reason
=) I'd prefer not to think about the complexity that rolling my own
transaction support would add to even the simplest of situations, but
this is not necessary. The transaction support provided by the
underlying RDBMS is more than sufficient.

Even better, transaction demarcation can be done declaratively rather
than programmatically, with a little AOP. If I want some class's
saveChanges() method (which will modify 'persistent' objects) to be
transactional, I don't even have to begin/commit the transaction in the
code, and remember to catch exceptions and roll back. Frameworks like
Spring provide helpful proxy classes which can be configured to
intercept method calls and do such things for you. You actually get
much more than this - you can state that certain methods may only be
called /within existing transactions/ for example, or vice versa.
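
As a sketch of what that configuration can look like in Spring 1.x (the
bean and method names are hypothetical), a TransactionProxyFactoryBean
wraps the target and applies per-method transaction attributes,
including PROPAGATION_MANDATORY for methods that may only run inside an
existing transaction:

```xml
<bean id="timeSheetService"
      class="org.springframework.transaction.interceptor.TransactionProxyFactoryBean">
  <property name="transactionManager" ref="transactionManager"/>
  <property name="target" ref="timeSheetServiceTarget"/>
  <property name="transactionAttributes">
    <props>
      <!-- saveChanges() runs in a transaction, joining one if present -->
      <prop key="saveChanges">PROPAGATION_REQUIRED</prop>
      <!-- audit* methods may only be called within an existing transaction -->
      <prop key="audit*">PROPAGATION_MANDATORY</prop>
    </props>
  </property>
</bean>
```

The service code itself contains no begin/commit/rollback calls; the
proxy handles demarcation and rolls back on runtime exceptions.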

I disagree that OO applications like to share state between threads.
Hibernate is designed specifically /not/ to do this (a Hibernate
Session is not a thread-safe object - each thread creates its own
Session, those Sessions are independent, and they would return
different instances when queried for the same sets of objects). Do you
have a specific example that led you to conclude this?

> There is also a common misconception that databases should only be used
> for "permanent" data. Other data should be handled using low-level
> collection features in the applications. But temporary tables are
> very useful if you need features like sorting and searching, but don't
> need persistence.

This is just a matter of where your code lies and what data structures
are available to you. If your code is in stored procedures sitting on
the DB, you will naturally use temp tables for collections of data. If
your code is running on an application server and your collection of
data is sitting on the application server already (because a user put
it there, for example), you are naturally going to use whatever data
structure best suits your purposes for sorting or searching in whatever
programming language you've chosen. If the data came from a query, why
didn't you order the query in the first place?
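
A minimal illustration of that division of labor (the names are
invented): data born on the application server is sorted with the
language's collection tools, while data coming out of the database
should be ordered by the query itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortingDemo {
    // Data already sitting on the application server (e.g. typed in by a
    // user): sort it with the collection library, no temp table needed.
    static List<String> sortInApp(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortInApp(Arrays.asList("Codd", "Bachman", "Date")));
        // Data coming from the database should be ordered at the source:
        //   select t.name from Task t order by t.name
        // rather than fetched unordered and re-sorted here.
    }
}
```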

> Views are also very underused. As soon as you have an identical select
> statement that is called from multiple points in your application, a
> view should be created. A view can also contain a considerable amount of
> business logic that can be reused in an efficient way, and across
> different programming languages.

You've got me here, I'm guilty on this one. I'm sure I could be taking
advantage of views, and our applications typically are not at this
point. Plenty of room for improvement here.

A view every time a SQL statement is repeated more than once, though? I
understand the bit about encapsulating certain types of business logic,
but I don't see why a view should immediately be created over a table
because the same set of fields are selected in two places. Wouldn't
that instead be an indicator that you need to eliminate the redundancy
in your code by introducing a function/method/class/whatever which can
be called from both places? Can you give specific examples to
illustrate?

> The main cause for all these problems is the fact that OO people like
> to use objects as data structures and to create a domain model.
> According to Ted Codd and Chris Date, the table (relation) is the only
> (high-level) data structure. Using other data structures will cause an
> impedance mismatch. But objects are still very useful for other
> purposes. As a matter of fact, the relational model needs
> classes/objects for defining data types other than the existing ones
> like strings and dates.

Objects are not data structures in that sense of the word. They contain
data structures, and associate behavior with them. They also have
relationships with other objects. As H. S. Lahman has helped me to see,
the difference between containing a data structure and having a
relationship is not always immediately obvious due to the nature of
today's 3GL OO languages. The impedance mismatch is a result of trying
to map object relationships which do not conform to the
relational model onto the relational model anyway.

I'm having a hard time with the 'table is the only high-level data
structure' statement. Do you recall in which paper(s) Codd and Date put
forward that view? (I'm not insinuating they didn't, but reading their
statements in the context of the rest of their work might help me out
here - I know who they are, but have not read any of their
publications).

Again, thanks for bearing with me until I sorted my terminology out,
this is a much more interesting line of thought now =) Examples would
be helpful on both sides of the fence for all of the potential pitfalls
you have highlighted. I'll see what I can come up with, so others can
poke holes.

-Matt McGill

Matt McGill

unread,
Nov 5, 2006, 11:40:05 PM11/5/06
to

All you've done is represent the relationships between the neurons and
a couple of weights. Suppose you had inserted the data for every neuron
in your brain, and each interconnection, into your tables. Would the
RDBMS suddenly start establishing network connections and posting
inflammatory comments to usenet? Or would it just sit there, waiting to
be queried?

The data about connections and weights doesn't represent behavior, it
assumes its existence elsewhere. If you don't write some code to
activate some neurons, run queries to find connected neurons, apply the
activation functions (which would themselves need to be implemented in
code), and repeat, then nothing happens. You've not made any behavior
declarative. In fact, you've effectively provided an example for
Martin's argument - the logic which would be necessary to drive a brain
simulation off of the data in these tables could be encapsulated in
Neuron and NeuronLink objects, which get their data (weights and
function indicator) from the tables and then do the right things with
them. Those objects could then be used as part of a larger simulation
involving sensory organs, nerves, muscle tissue, etc. You don't /have/
to use objects, naturally. You could use structured programming just as
well.

-Matt McGill

topmind

unread,
Nov 6, 2006, 12:43:37 PM11/6/06
to

You are correct that the RDBMS (or DBMS) would not "run" the actual
progression of activity. A separate "propagation engine" would be
needed. But that is generally treated as an implementation detail
separate from the data modeling and/or data filling (configuration)
step. This allows for a nice *separation of concerns*.

A more practical example may be a GUI and GUI event engine. The app
developer does not have to concern themselves with the implementation
details of the GUI event engine; they just set the appropriate attributes
(assuming the IDE does not do it). One can do this with OO also, but it
is not encouraged and gets confusing because the separation of data and
behavior is often fuzzy, and often turns OOP into a roll-your-own
database-like thing anyhow (a navigational one) where one has to invent
query languages etc. to inspect stuff.

I liken attribute-driven programming to a player-piano roll: punching
the holes in the roll for a particular song and implementing the roll
reader and player system are separate concerns. It allows one to
mentally split the problem into fairly clean partitions.

The one filling in the brain info above for a particular personality
does not have to concern themselves with implementing neurons. And, I
would hate to have to track all that stuff without a DBMS of some sort
(relational or otherwise).
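
A toy version of that split, with invented numbers: the network lives
entirely in a data table (the piano roll), and a small engine (the roll
reader) interprets it. Filling in a different personality means
changing rows, not code.

```java
import java.util.HashMap;
import java.util.Map;

public class NeuronDemo {
    // The "piano roll": pure data, shaped like a link table.
    // Each row is {fromNeuron, toNeuron, weight}.
    static final double[][] LINKS = {
        {1, 3, 0.5},
        {2, 3, 0.9},
    };

    // The "roll reader": a separate propagation engine that interprets
    // the data. It sums weighted inputs into the target neuron and
    // applies a step activation function with threshold 1.0.
    static double activate(Map<Integer, Double> inputs, int target) {
        double sum = 0.0;
        for (double[] link : LINKS) {
            Double in = inputs.get((int) link[0]);
            if ((int) link[1] == target && in != null) {
                sum += in * link[2];
            }
        }
        return sum >= 1.0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        Map<Integer, Double> inputs = new HashMap<Integer, Double>();
        inputs.put(1, 1.0);
        inputs.put(2, 1.0);
        // 1.0*0.5 + 1.0*0.9 = 1.4 >= 1.0, so neuron 3 fires
        System.out.println(activate(inputs, 3));
    }
}
```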

There are standardized, well-understood tools to manage attributes and
collections, but the same is NOT true of behavior. This is probably the
biggest flaw of OOP in my opinion. In order to mix behavior and
attributes (or hide the difference), OO has to take itself down to the
lowest-common-denominator between behavior and data handling.
Attribute-centric programming, on the other hand, converts much more of
the app into attributes so that off-the-shelf attribute management
systems (DBMS) can be used for a larger part, saving a lot of
wheel-reinventing.

This also provides consistency over the roll-your-own designs found in OO
classes. Collection handling and attribute handling are largely built in
to a DBMS such that one does not have to implement getX, deleteX, findX,
sortX, sumX, saveX, findXwhereYisPinkOnEvenSundays, etc. for each and
every entity. This comes out-of-the-box. OO does not factor
commonly-needed collection and attribute handling issues properly:
every shop reinvents the wheel, and differently. OO results in messy
shanty-town designs with cables, wires, and corridors going every which
way, depending on the mood of the shanty'd family.
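
A sketch of the boilerplate in question (Task and the method names are
invented): every entity tends to grow its own copy of this interface,
while the DBMS answers each ad-hoc question with one declarative
statement.

```java
import java.util.List;

public class DaoBoilerplate {
    static class Task { long id; }

    // Hand-rolled per-entity collection handling; the same shape gets
    // re-implemented for Project, User, Invoice, ...
    interface TaskDao {
        Task findById(long id);
        List<Task> findByProject(long projectId);
        void save(Task t);
        void delete(Task t);
        // ...plus one new method per ad-hoc question:
        List<Task> findOverdueForActiveOwners();
    }

    public static void main(String[] args) {
        // In SQL the last question is just another query, no new method:
        //   select * from task
        //   where due_date < current_date and owner_status = 'ACTIVE'
        System.out.println(TaskDao.class.isInterface());
    }
}
```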

Centralized governed city planning may be boring, but it works!

(I agree that existing DBMS could use improvements. I wish there were
more experiments in that area. But even with their flaws, they usually
beat OO.)


>
> The data about connections and weights doesn't represent behavior, it
> assumes its existence elsewhere. If you don't write some code to
> activate some neurons, run queries to find connected neurons, apply the
> activation functions (which would themselves need to be implemented in
> code), and repeat, then nothing happens. You've not made any behavior
> declarative. In fact, you've effectively provided an example for
> Martin's argument - the logic which would be necessary to drive a brain
> simulation off of the data in these tables could be encapsulated in
> Neuron and NeuronLink objects, which get their data (weights and
> function indicator) from the tables and then do the right things with
> them. Those objects could then be used as part of a larger simulation
> involving sensory organs, nerves, muscle tissue, etc. You don't /have/
> to use objects, naturally. You could use structured programming just as
> well.
>
> -Matt McGill

-T-

Matt McGill

unread,
Nov 6, 2006, 4:58:01 PM11/6/06
to

topmind wrote:
> You are correct that the RDBMS (or DBMS) would not "run" the actual
> progression of activity. A separate "propagation engine" would be
> needed. But that is generally treated as an implementation detail
> separate from the data modeling and/or data filling (configuration)
> step. This allows for a nice *separation of concerns*.

You, sir, are a genius. OO is obviously ridiculous because it entails
messy details like 'behavior,' which can be really hard to express
clearly. If those silly OO guys would just stick to relational models,
things would be so much simpler! Thank you for making this clear.

The future of software development is handing some intern an enormous
ERD and saying: "Take care of those implementation details for me. And
make me some coffee."

-Matt McGill

Robert Martin

unread,
Nov 6, 2006, 6:49:39 PM11/6/06
to
On 2006-11-06 15:58:01 -0600, "Matt McGill" <matt....@gmail.com> said:

> The future of software development is handing some intern an enormous
> ERD and saying: "Take care of those implementation details for me. And
> make me some coffee."

Cream and sugar? Or just black?


--
Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com

Patrick May

unread,
Nov 6, 2006, 9:30:41 PM11/6/06
to
Robert Martin <uncl...@objectmentor.com> writes:
> On 2006-11-06 15:58:01 -0600, "Matt McGill" <matt....@gmail.com> said:
> > The future of software development is handing some intern an
> > enormous ERD and saying: "Take care of those implementation
> > details for me. And make me some coffee."
>
> Cream and sugar? Or just black?

I like my interns the way I like my coffee. Ground up and in the
freezer.

Regards,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | Large scale, mission-critical, distributed OO
| systems design and implementation.
p...@spe.com | (C++, Java, Common Lisp, Jini, middleware, SOA)

fre...@gmail.com

unread,
Nov 7, 2006, 12:14:16 AM11/7/06
to
> > Queries (or predicate logic) is the first obvious one. When OO people
> > separate SQL statements into a dedicated layer, they also try to limit
> > the number of distinct SQL statements because of the burden of modifying
> > interfaces for every new statement. The consequence is that rather
> > simple select statements are used and additional filtering is done in
> > the application. This will hit performance and reduce maintainability
> > of the application.
>
> So the problem here would be that SQL statements are being separated
> out into a layer or particular module, and that the task of maintaining
> the interface for that layer or module causes OO programmers to
> artificially limit themselves to simple queries. Being thus limited,
> they start performing joins and selections in-memory on the query
> results, when more complicated queries could have done those things
> much more efficiently. Correct?

Yes.

> I can see how maintaining such a data access layer by hand might
> predispose OO developers to use SQL in a naive way, or implement
> functionality in-memory that a query could have given for free (doing
> selections or joins this way could end up being particularly
> disastrous). But when making use of an ORM framework, you don't need a
> hand-created data access layer at all. Moreover, you get (at least with
> Hibernate) an object-aware query language which, while limiting in
> some ways (you can only join on mapped relationships, for example), is
> actually more flexible than straight SQL in other ways (polymorphic
> queries).

What is the difference in using HQL strings instead of SQL strings?
While I am not an expert in HQL I think most people agree that HQL is
more limited than SQL. All modifications in the HQL language seem to
make it more like SQL. Besides SQL has a rather high degree of
standardization. The level of standardization for HQL is zero. You are
entirely limited to one product.

> I'm not trying to claim HQL (Hibernate's query language) is the
> greatest thing since sliced bread - it has to be used with care. But
> when it /is/ used correctly I think there are some benefits over SQL.

SQL is not the best possible relational language. I am hoping that some
day database vendors will agree upon a more modern language. The
strange thing is that HQL copies all the bad syntax from SQL. Why don't
the Hibernate guys make something new and modern instead? But my main
objection against HQL is that it is not a relational language. Instead
it is a strange mix between a network database language and a relational
language.

> > Caching is the second issue. Because OO people want to play with an
> > object graph instead of predicate logic, they need the graph or parts
> > of it to be virtually in memory all the time. This will very quickly
> > lead to huge RAM consumption, unless you use caching in your
> > application. The DBMS already does caching for you, which synchronizes
> > the cache with transactions and handles all concurrency issues. But if
> > you try to do application caching, the reliability of the cached data
> > will be rather low.
>
> I'm not sure I follow. My experience has been with multi-user web
> applications, so the goal is typically to pull only those parts of the
> object graph into memory that are actually necessary, and to keep them
> there only during the processing of a request.

That is a good design rule. But I have seen examples of the opposite
too, motivated by the claim that it is "more OO".

> If you let the instances
> (which are /not/ shared between threads, as you indicate below) hang
> around in memory for long periods of time, the data goes stale.

That is the problem, but that doesn't stop OO people from doing it.

> My understanding is that for many (most?) database-backed applications,
> the database resides on a different server than the client of the
> database (whether that client is a web/application server, or
> individual client machines). The database's caching helps to mitigate
> disk access times, but there's still the network overhead of
> transferring the data to the client of the database.

If the database and application/web server reside on different boxes, a
fast network connection between them is very important. Application
caching has so many disadvantages that it has to be avoided. I can't
see any real reason why the application and database server should be
connected by a slow network; normally the boxes are standing next to
each other.

My recommendation is to have the DBMS and application server in the
same box or even in the same process.

> Hibernate, on the other hand, has some sophisticated caching mechanisms
> built in. The actual cache functionality is supplied by third-party
> cache libraries, and you can take your pick (there are a bunch). I
> agree that rolling your own would be a dumb idea, but there's no reason
> to.

The RDBMS already has sophisticated caching mechanisms built in.
Hibernate cannot beat it.

> Even better, transaction demarcation can be done declaratively rather
> than programmatically, with a little AOP. If I want some class's
> saveChanges() method (which will modify 'persistent' objects) to be
> transactional, I don't even have to begin/commit the transaction in the
> code, and remember to catch exceptions and roll back. Frameworks like
> Spring provide helpful proxy classes which can be configured to
> intercept method calls and do such things for you. You actually get
> much more than this - you can state that certain methods may only be
> called /within existing transactions/ for example, or vice versa.

I also use implicit transactions. My SQL-based approach does not stop
you from using spring beans at all. (Another solution is to put
transaction start and ending in a web filter.)

> I disagree that OO applications like to share state between threads.
> Hibernate is designed specifically /not/ to do this (a Hibernate
> Session is not a thread-safe object - each thread creates its own
> Session, those Sessions are independent, and they would return
> different instances when queried for the same sets of objects). Do you
> have a specific example that led you to conclude this?

As soon as you use caching, you share state between threads.

> A view every time a SQL statement is repeated more than once, though? I
> understand the bit about encapsulating certain types of business logic,
> but I don't see why a view should immediately be created over a table
> because the same set of fields are selected in two places. Wouldn't
> that instead be an indicator that you need to eliminate the redundancy
> in your code by introducing a function/method/class/whatever which can
> be called from both places? Can you give specific examples to
> illustrate?

In some cases it is probably an indicator of redundancy, in the
same way as a DAO method used from multiple points is an indicator of
redundancy in the calling code. And I don't suggest defining views for
trivial select statements. But there are still plenty of scenarios
where views are a suitable solution.
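
For instance (the schema is invented for illustration), a view can
encapsulate a join, a filter, and an aggregate in one reusable,
language-neutral place:

```sql
-- Hypothetical schema: orders(id, customer_id, status)
-- and order_lines(order_id, qty, unit_price).
CREATE VIEW open_order_totals AS
SELECT o.id, o.customer_id,
       SUM(l.qty * l.unit_price) AS total
FROM orders o
JOIN order_lines l ON l.order_id = o.id
WHERE o.status = 'OPEN'
GROUP BY o.id, o.customer_id;
```

Every application, report, or ad-hoc query that needs open-order totals
then selects from the view instead of repeating the join and aggregate.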

> > The main cause for all these problems is the fact that OO people like
> > to use objects as data structures and to create a domain model.
> > According to Ted Codd and Chris Date, the table (relation) is the only
> > (high-level) data structure. Using other data structures will cause an
> > impedance mismatch. But objects are still very useful for other
> > purposes. As a matter of fact, the relational model needs
> > classes/objects for defining data types other than the existing ones
> > like strings and dates.
>
> Objects are not data structures in that sense of the word. They contain
> data structures, and associate behavior with them. They also have
> relationships with other objects. As H. S. Lahman has helped me to see,
> the difference between containing a data structure and having a
> relationship is not always immediately obvious due to the nature of
> today's 3GL OO languages. The impedance mismatch is a result of trying
> to map object relationships which do not conform to the
> relational model onto the relational model anyway.

Object relationships are pointers. The purpose of the relational
model is to avoid the use of pointers.

> I'm having a hard time with the 'table is the only high-level data
> structure' statement. Do you recall in which paper(s) Codd and Date put
> forward that view? (I'm not insinuating they didn't, but reading their
> statements in the context of the rest of their work might help me out
> here - I know who they are, but have not read any of their
> publications).

http://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-TheAskewWall-printable.pdf
http://www.dbmsmag.com/int9410.html

You can start by reading these documents. Here you get a pretty good
motivation why objects should not be mapped to classes, but can
successfully be mapped to data types. To find the exact quote, I am
afraid that you have to buy the book "The Third Manifesto",
http://www.aw-bc.com/catalog/academic/product/0,1144,0321399420,00.html.
At www.thethirdmanifesto.com you can also find other material for
deeper understanding of the relational model. At www.dbdebunk.com you
can also find useful reading about the relational model.

Fredrik Bertilsson

topmind

unread,
Nov 7, 2006, 1:34:15 AM11/7/06
to
Matt McGill wrote:
> topmind wrote:
> > You are correct that the RDBMS (or DBMS) would not "run" the actual
> > progression of activity. A separate "propagation engine" would be
> > needed. But that is generally treated as an implementation detail
> > separate from the data modeling and/or data filling (configuration)
> > step. This allows for a nice *separation of concerns*.
>
> You, sir, are a genious. OO is obviously ridiculous because it entails
> messy details like 'behavior,' which can be really hard to express
> clearly.

Two-thirds right. The trick is to minimize behavior by using heavier
declarative approaches so that we can use declarative engines, which
are more evolved than "behavioral engines" at this stage in history.
Maybe "behavior maths" will someday be invented, but until then,
relational has a leg up.

Behavior and data are pretty much the same thing, just different views
of the same thing. To a compiler/interpreter, machine language is data,
for example. Building attribute-centric applications in many ways is
like building a domain-specific interpreter.

> If those silly OO guys would just stick to relational models,
> things would be so much simpler! Thank you for making this clear.
>
> The future of software development is handing some intern an enormous
> ERD and saying: "Take care of those implementation details for me. And
> make me some coffee."

I'll take an "enormous ERD" over an enormous class diagram any day.
Relational is far more consistent than OO connections. A relational
mess is a 10-fold improvement over the equivalent navigational-slash-OO
mess. Dr. Codd was an IT genius who slayed the navigational dragons of
the 60's. But the OO Smeagols tried to bring navies back because
building shanty town random-class apps made them feel empowered to
create messes and gain job protection instead of sticking with sound
relational modeling.

And relational does not rule out putting functions inside of table
cells. However, I've found that it does not add much to the mix except
in specific circumstances.

>
> -Matt McGill

-T-

topmind

unread,
Nov 7, 2006, 1:35:28 AM11/7/06
to

Patrick May wrote:
> Robert Martin <uncl...@objectmentor.com> writes:
> > On 2006-11-06 15:58:01 -0600, "Matt McGill" <matt....@gmail.com> said:
> > > The future of software development is handing some intern an
> > > enormous ERD and saying: "Take care of those implementation
> > > details for me. And make me some coffee."
> >
> > Cream and sugar? Or just black?
>
> I like my interns the way I like my coffee. Ground up and in the
> freezer.

Bill Clinton merged with Jeffrey Dahmer?

Thomas Gagne

unread,
Nov 10, 2006, 9:53:04 AM11/10/06
to

H. S. Lahman wrote:
> <snip>


> Solving a time management problem is a quite different subject matter
> than persisting the relevant data. Solve the time management problem
> in your application first. Then, when you know what object attributes
> need to be persisted, define a suitable Data Model for the database.
> The RDB Data Model and the solution's Class Model will typically be
> different for non-CRUD/USER applications because they need to be
> optimized differently.

That's one way of doing it. My experience shows that 70-80% of a system
is queries. Whether inquiry transactions or reports, all the business
systems I've participated in coding or designing spent little production
time changing data. In the online systems I'm familiar with, update and
change transactions were preceded and followed by queries. Long after
transactions are done posting, management looks at reports to see how
their business is doing. Given statistics like these, it makes little
sense to design your application or OO model before designing your database.

Additionally, relational data models can be more easily proven
correct--or correct enough--before an investment is made in coding.

Lastly, your database is language-neutral. It shouldn't matter what
language the application sitting in front of the database is written in,
or even what paradigm it's born from. Flexibility starts with a good
database design and extends through the application--not the other way
around.

<http://bogs.in-streamco.com/anything.php>

Thomas Gagne

unread,
Nov 10, 2006, 10:09:46 AM11/10/06
to
AndyW wrote:
> <snip>
>
> The difference between what I think should be a true OO database and a
> PSE is that to me the former has no separation of state between what
> is executing and what is stored. There shouldn't be any concept of
> storage or transience or separation of application and data - just
> "IS-ness of the whole" as I call it.
>
"Just be the ball, be the ball, be the ball. You're not being the ball
Danny."

Pretending the database doesn't exist is a recipe for pain. Too many OO
programmers, enamored with the benefits of OO design inside an
application, try imposing object graphs onto the database, thinking that if
they could only become one, their wonder twin powers would activate.

For OO beginners, the database is the elephant in the room. It's the
800 lb. gorilla they think can be negotiated with. What we forget is
the database lives independently of our application. Long after our
application has stopped running and users have logged off, the database
is still there and has other potential customers--including managers,
programmers, and everyone's favorite: auditors.

If the only way to make sense of your database is to look at it through the
binoculars of an OO-designed application, then developers had better be
prepared to multiply any of their time estimates by 5, because most of
their time will be consumed writing reports. As I mentioned in an
earlier post, 70-80% of a system's utility is in its reports. People
are always looking for new ways to look at data, but rarely do they look
for new ways of posting or changing data.

--
Visit <http://blogs.instreamfinancial.com/anything.php>
to read my rants on technology and the finance industry.

Thomas Gagne

unread,
Nov 10, 2006, 10:23:32 AM11/10/06
to
fre...@gmail.com wrote:
> <snip>
>> The problem then becomes one of
>> instantiating relationships - if an object is persistent, it needs to
>> be loaded from the storage mechanism at some point. Who does that? Who
>> loads other persistent objects associated with the first? Where are the
>> transaction boundaries?
>>
>
> A RDBMS is not a storage mechanism. A storage mechnism is much simpler.
> Transactions and persistence are two orthogonal features.
>
Will you explain what you mean by "Transactions and persistence are two
orthogonal features?"

Thomas Gagne

unread,
Nov 10, 2006, 10:36:19 AM11/10/06
to
H. S. Lahman wrote:
> <snip>
> People tend to discount swapping paradigms as an advantage because
> RDBs have dominated for 2+ decades. However, I've personally watched
> the paradigm for persistence changing from paper tape/punched cards to
> flat sequential disk to ISAM to CODASYL to RDBs to OODBs (in at least
> some niches). Each shift required major surgery for huge amounts of
> legacy code because the persistence mechanisms were not encapsulated.
> And each paradigm shift was regarded as the last possible -- until
> some new technology appeared. So I wouldn't bet against another
> paradigm shift in the future just because RDBs have been around awhile.
Can you be more precise about what you mean when you say, "Each shift
required major surgery ... because the persistence mechanisms were not
encapsulated."

If we were to look back at those transitions with an eye towards
designing the original applications again so the surgery was more
cosmetic, how would you have implemented the application's database
interface differently and are you doing that today?

H. S. Lahman

unread,
Nov 10, 2006, 12:03:41 PM11/10/06
to
Responding to Gagne...

>> People tend to discount swapping paradigms as an advantage because
>> RDBs have dominated for 2+ decades. However, I've personally watched
>> the paradigm for persistence changing from paper tape/punched cards to
>> flat sequential disk to ISAM to CODASYL to RDBs to OODBs (in at least
>> some niches). Each shift required major surgery for huge amounts of
>> legacy code because the persistence mechanisms were not encapsulated.
>> And each paradigm shift was regarded as the last possible -- until
>> some new technology appeared. So I wouldn't bet against another
>> paradigm shift in the future just because RDBs have been around awhile.
>
> Can you be more precise about what you mean when you say, "Each shift
> required major surgery ... because the persistence mechanisms were not
> encapsulated."

At each shift there were huge amounts of legacy code around that used
the old paradigm for persistence and needed to be upgraded. Typically
the persistence mechanisms were not encapsulated in a single application
subsystem. Thus the direct reads and writes were sprinkled ubiquitously
throughout the code.

Worse, the application processing was often structured around the
preferred organization for the paradigm. So it was not simply a matter
of 1:1 statement replacement. Often one had to modify the basic flow of
control of the application. For example, the way one collects related
data from multiple ISAM files is quite different than the way one
employs SQL table joins.
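To illustrate that flow-of-control difference, here is a sketch (the schema, table names, and data are invented; SQLite stands in for both the keyed-file store and the RDB):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# ISAM-style flow of control: iterate one "file", then do a keyed
# read into the other file for each record.
names = []
for order_id, cust_id in conn.execute(
        "SELECT order_id, customer_id FROM orders ORDER BY order_id"):
    (name,) = conn.execute(
        "SELECT name FROM customer WHERE customer_id = ?",
        (cust_id,)).fetchone()
    names.append(name)

# SQL-style flow of control: one declarative join replaces the loop.
joined = [row[0] for row in conn.execute(
    "SELECT c.name FROM orders o JOIN customer c USING (customer_id) "
    "ORDER BY o.order_id")]

print(names == joined)  # True -- same data, structurally different code
```

The two versions produce the same data, but the surrounding application code is organized quite differently, which is why a 1:1 statement swap is rarely enough.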

>
> If we were to look back at those transitions with an eye towards
> designing the original applications again so the surgery was more
> cosmetic, how would you have implemented the application's database
> interface differently and are you doing that today?

Encapsulate the persistence mechanism behind a single subsystem
interface (an API in the Procedural Days). Design the subsystem
interface in terms of what the problem solution's needs for data are,
which will be independent of how the data is stored. Then let the
subsystem provide the mapping of that interface into the persistence
paradigm du jour.

Thus the application solution always requests, "Save this pile of data I
call X" and "Give me the pile of data I saved before as X." The
persistence access subsystem maps the X identity and the pile of data
into records in ISAM files, RDB tables, clay tablets, or whatever. Now
one can substitute the persistence paradigms by replacing one subsystem
implementation without touching either the interface or the problem
solution.
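A minimal sketch of that kind of subsystem boundary (Python used for illustration; the class and method names are my own invention, not Lahman's):

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class PersistenceSubsystem(ABC):
    """The problem solution only knows 'save pile X' / 'load pile X'."""

    @abstractmethod
    def save(self, identity: str, pile: dict) -> None: ...

    @abstractmethod
    def load(self, identity: str) -> dict: ...


class SqlitePersistence(PersistenceSubsystem):
    """One swappable implementation; a flat-file or OODB version could
    replace it without touching the problem solution."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS piles (id TEXT PRIMARY KEY, data TEXT)")

    def save(self, identity, pile):
        self.conn.execute(
            "INSERT OR REPLACE INTO piles VALUES (?, ?)",
            (identity, json.dumps(pile)))
        self.conn.commit()

    def load(self, identity):
        # Sketch only: assumes the pile was saved earlier.
        row = self.conn.execute(
            "SELECT data FROM piles WHERE id = ?", (identity,)).fetchone()
        return json.loads(row[0])


store = SqlitePersistence()
store.save("X", {"hours": 8, "task": "design"})
print(store.load("X"))  # the solution never sees tables or SQL
```

The interface speaks only of identities and piles of data, so swapping the paradigm means writing one new subclass.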

fre...@gmail.com

unread,
Nov 10, 2006, 12:33:37 PM11/10/06
to
> Worse, the application processing was often structured around the
> preferred organization for the paradigm.

How could we do it differently now, if we don't know anything about the
future paradigm?

> Encapsulate the persistence mechanism behind a single subsystem
> interface (an API in the Procedural Days). Design the subsystem
> interface in terms of what the problem solution's needs for data are,
> which will be independent of how the data is stored.

What is your definition of "store"? Store to a persistent medium, store
into a variable, store into another process, or?

> Then let the
> subsystem provide the mapping of that interface into the persistence
> paradigm du jour.

If we use an RDBMS, the persistence part is already separated. The
application has no idea whether, when, or how data is persisted.

> Thus the application solution always requests, "Save this pile of data I
> call X" and "Give me the pile of data I saved before as X." The
> persistence access subsystem maps the X identity and the pile of data
> into records in ISAM files, RDB tables, clay tablets, or whatever.

Is X always an identifier? Should you be allowed to use any predicate
logic in this interface ("give me the pile of data I saved before
having X=5 or Y=6")?

Fredrik Bertilsson

H. S. Lahman

unread,
Nov 10, 2006, 12:37:07 PM11/10/06
to
Responding to Gagne...

>> Solving a time management problem is a quite different subject matter
>> than persisting the relevant data. Solve the time management problem
>> in you application first. Then, when you know what object attributes
>> need to be persisted, define a suitable Data Model for the database.
>> The RDB Data Model and the solution's Class Model will typically be
>> different for non-CRUD/USER applications because they need to be
>> optimized differently.
>
> That's one way of doing it. My experience shows that 70-80% of a system
> is queries. Whether inquiry transactions or reports, all the business
> systems I've participated coding or designing spent little production
> time changing data. In the online systems I'm familiar with, update and
> change transactions were preceded and followed with queries. Long after
> transactions are done posting, management looks at reports to see how
> their business is doing. Given statistics like these it makes little
> sense to design your application or OO model before designing your
> database.

Note that I was careful to qualify with "non-CRUD/USER".

That sort of proportion is a symptom that the application is USER/CRUD
processing. The application is basically a pipeline between the
database and the UI and its main purpose in life is to convert between
the two views. For that sort of situation the RAD layered model
infrastructures already provide lots of automation so an OO approach
would probably be overkill by reinventing that wheel.

> Additionally, relational data models can be more easily proven
> correct--or correct enough--before an investment is made in coding.

I'm not sure I buy that. More easily than what? The RDM normalization
can be applied beyond the RDB's table/tuple paradigm. ISAM files,
CODASYL files, and other data representations can be normalized using
the same basic rules. And OO Class Models are routinely normalized as
part of the basic paradigm methodology.

However, I don't see that as being very relevant. My point is that the
application's problem solution doesn't care how the data is stored. If
it doesn't care how it is stored, it certainly doesn't care how the
storage mechanism is validated.

> Lastly, your database is language-neutral. It shouldn't matter what
> language the application sitting in front of the database is written in,
> or even what paradigm it's born from. Flexibility starts with a good
> database design and extends through the application--not the other way
> around.

That's true enough but I would make it even stronger. RDBs are designed
to be problem-independent, not just language independent, which is
pretty much my point.

The data structures one needs to optimize the solution to a /particular/
problem in an application are often quite different than the structures
best suited to optimizing generic, ad hoc access of the same data. So
if one is solving a non-CRUD/USER problem where special optimization is
usually required, one wants to separate the views of the solution from
those of the RDB.

In addition, the OO construction paradigm for solving individual
problems is quite different. For example, in an OO solution there is no
construct remotely similar to an RDB join. That's because OO
relationships are instantiated at the object level rather than the table
level, relationships are navigated differently, and the paradigm employs
peer-to-peer communications.

fre...@gmail.com

unread,
Nov 10, 2006, 2:12:08 PM11/10/06
to
> > Additionally, relational data models can be more easily proven
> > correct--or correct enough--before an investment is made in coding.
>
> I'm not sure I buy that. More easily than what? The RDM normalization
> can be applied beyond the RDB's table/tuple paradigm.

What is the "RDB's table/tuple paradigm"?

> And OO Class Models are routinely normalized as
> part of the basic paradigm methodology.

Many class diagrams would break 1NF. I also see a problem with applying
2NF and 3NF, because the id of an object is not a value itself, but a
pointer. Because objects may be easily cloned, I suppose that would
break 2NF.

> However, I don't see that as being very relevant. My point is that the
> application's problem solution doesn't care how the data is stored.

Neither do the relational model or SQL.

> If it doesn't care how it is stored, it certainly doesn't care how the
> storage mechanism is validated.

I guess Thomas is talking about how the business rules are validated.

> > Lastly, your database is language-neutral. It shouldn't matter what
> > language the application sitting in front of the database is written in,
> > or even what paradigm it's born from. Flexibility starts with a good
> > database design and extends through the application--not the other way
> > around.
>
> That's true enough but I would make it even stronger. RDBs are designed
> to be problem-independent, not just language independent, which is
> pretty much my point.

The relational model is used for modelling data, problem-independent or
not. Just because some data could be considered "problem-dependent", it
may very well be modelled using the RM.

> The data structures one needs to optimize the solution to a /particular/
> problem in an application are often quite different than the structures
> best suited to optimizing generic, ad hoc access of the same data.

In some special scenarios, B-trees might not be the best solution and
arrays or hashtables might be better choices. But I think that is
low-level optimization without significant impact on average enterprise
applications.

> So
> if one is solving a non-CRUD/USER problem where special optimization is
> usually required, one wants to separate the views of the solution from
> those of the RDB.

Using low-level collection classes is not a good idea for modern
enterprise applications. There are a lot of issues, like concurrency or
transactions, that you have to solve yourself in that case.

Fredrik Bertilsson

Matt McGill

unread,
Nov 10, 2006, 5:44:19 PM11/10/06
to

I think you might be misunderstanding. As you've already pointed out to
me, 'store' is a rather overloaded word, which can be applied at many
different conceptual levels. In this case, I think there are at least
three distinct conceptual levels:

1. What the data /is/

This would be the highest level of abstraction - information systems
deal with /information/, which is data interpreted in a particular
context. The application is typically dealing with things at this
level. IOW, the application is presenting the user with a timesheet
(information), as opposed to just a collection of numbers (data).

2. How the data is /represented/

We can represent time sheet data in a number of different ways - the
relational model is a particularly good one, but other models were used
before it existed.

3. How is the data /saved to a persistent storage medium/

This is, as you pointed out, entirely encapsulated by an RDBMS, or by a
file system for example.

I think Lahman is talking about abstracting an interface (making an
API, if you will) between levels one and two. So your application would
have a generic way of saying "give me time sheet X" or "give me all
time sheets for user Y". If flat files were being used at the time the
application was written, some implementation of the API would be
written which uses flat files. Perhaps one file is named after each
user, and stores a series of time sheets. When the RDBMS came along, it
would then be far easier to convert the legacy data, and very little
(no?) application code would need to be rewritten.

Note that I'm not trying to imply that the above API has to have
anything to do with objects. It could just as easily been procedural in
nature and achieved the same result.
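For instance, a flat-file implementation of such an API might look like the sketch below (all names are invented for illustration; an RDBMS-backed class exposing the same two methods could replace it without touching callers):

```python
import json
import os
import tempfile


class FlatFileTimeSheetStore:
    """Hypothetical flat-file implementation: one JSON-lines file per user."""

    def __init__(self, directory: str):
        self.directory = directory

    def add_time_sheet(self, user: str, sheet: dict) -> None:
        # Append one time sheet as a single line in the user's file.
        with open(os.path.join(self.directory, user + ".txt"), "a") as f:
            f.write(json.dumps(sheet) + "\n")

    def time_sheets_for_user(self, user: str) -> list:
        path = os.path.join(self.directory, user + ".txt")
        if not os.path.exists(path):
            return []
        with open(path) as f:
            return [json.loads(line) for line in f]


# Callers code against the two method names only; when the RDBMS comes
# along, the legacy files can be migrated by reading them back through
# this very API and writing them out through the new implementation.
with tempfile.TemporaryDirectory() as d:
    store = FlatFileTimeSheetStore(d)
    store.add_time_sheet("alice", {"week": 45, "hours": 40})
    store.add_time_sheet("alice", {"week": 46, "hours": 36})
    sheets = store.time_sheets_for_user("alice")
    print(len(sheets))  # 2
```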

-Matt McGill

fre...@gmail.com

unread,
Nov 11, 2006, 12:37:26 AM11/11/06
to
> Will you explain what you mean by "Transactions and persistence are two
> orthogonal features?"

Persistence is about storing something to a persistent medium.
Transactions are about letting either all operations complete or none.
A subset of operations in a transaction can never complete unless all
operations complete. In many cases a transaction ends with writing the
change log to a persistent medium, but that is not necessary. For
all-in-RAM databases, data is never written to a persistent medium, but
you might still benefit from transactions.
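A small demonstration of that orthogonality, using SQLite's in-memory mode (nothing here ever reaches a persistent medium, yet the all-or-nothing property still holds):

```python
import sqlite3

# An all-in-RAM database: no file, no durable medium.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    # First half of a transfer: debit account 'a'.
    conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'a'")
    # Simulate a failure before the matching credit is applied:
    raise RuntimeError("crash between the two operations")
except RuntimeError:
    conn.rollback()  # neither half of the transfer takes effect

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'a': 100, 'b': 0}
```

The rollback restores both rows even though no data was ever persisted, which is the sense in which the two features are orthogonal.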

Fredrik Bertilsson

fre...@gmail.com

unread,
Nov 11, 2006, 1:06:17 AM11/11/06
to
> 1. What the data /is/
>
> This would be the highest level of abstraction - information systems
> deal with /information/, which is data interpreted in a particular
> context. The application is typically dealing with things at this
> level. IOW, the application is presenting the user with a timesheet
> (information), as opposed to just a collection of numbers (data).

Can you give some examples of data in this level?

> 2. How the data is /represented/
>
> We can represent time sheet data in a number of different ways - the
> relational model is a particularly good one, but other models were used
> before it existed.

Classes are a very good tool for representing data. Numbers, strings
and dates are built into most databases; other custom data types might
be represented by custom classes registered with the database.

> 3. How is the data /saved to a persistent storage medium/
>
> This is, as you pointed out, entirely encapsulated by an RDBMS, or by a
> file system for example.

The strange thing is that you don't mention data structures at all. I
think we agree that classes are a good tool for representing data
(level 2), but the disagreement is about how to represent data
structures.

> I think Lahman is talking about abstracting an interface (making an
> API, if you will) between levels one and two. So your application would
> have a generic way of saying "give me time sheet X" or "give me all
> time sheets for user Y".

That means that predicate logic should be included in the interface?

> If flat files were being used at the time the
> application was written, some implementation of the API would be
> written which uses flat files. Perhaps one file is named after each
> user, and stores a series of time sheets. When the RDBMS came along, it
> would then be far easier to convert the legacy data, and very little
> (no?) application code would need to be rewritten.

What if I want to know who is working a particular day? Using flat
files, I would use the existing API function "give me all time sheets
for user Y", parsing the time sheets to find out whether the person is
working a particular day or not. All this processing would be done in
the layer on top of the "persistence layer".

Later when RDBMS came along, I would have to write a SQL query to join
all tables that form a time sheet for user Y. In the layer above I
would have to parse and process this verbose data in the same way as
before.

But the best way using an RDBMS would be to write a new API function
returning the users working a particular day, supported by a SQL
select statement. But before SQL was introduced, we would never have
realized that the API should have this function, because it contains
non-persistence processing.
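A sketch of that new API function (the schema, table names, and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_sheet (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE time_entry (sheet_id INTEGER REFERENCES time_sheet(id),
                             work_date TEXT, hours INTEGER);
    INSERT INTO time_sheet VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO time_entry VALUES (1, '2006-11-10', 8), (2, '2006-11-11', 4);
""")


def users_working_on(day: str) -> list:
    """The 'new' API function: answered directly by a SELECT instead of
    fetching whole time sheets and parsing them in the caller."""
    rows = conn.execute(
        "SELECT DISTINCT s.user_name FROM time_sheet s"
        " JOIN time_entry e ON e.sheet_id = s.id"
        " WHERE e.work_date = ?", (day,))
    return [r[0] for r in rows]


print(users_working_on("2006-11-10"))  # ['alice']
```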

The problem is that if you want to prepare for a future "paradigm", you
have to know something about it, which we don't. Otherwise we will use
the next-generation database through the previous generation's
interface. In every database generation shift, the border between what
we consider "persistence logic" and "business logic" has moved. That is
likely to happen again in the future, which is why it is impossible to
define an interface that will withstand future changes.

Fredrik Bertilsson

H. S. Lahman

unread,
Nov 11, 2006, 3:40:16 PM11/11/06
to
Responding to Frebe73...

>>Worse, the application processing was often structured around the
>>preferred organization for the paradigm.
>
>
> How could we do it differently now, if we don't know anything about the
> future paradigm?

The rest of the message explains exactly that.

>>Encapsulate the persistence mechanism behind a single subsystem
>>interface (an API in the Procedural Days). Design the subsystem
>>interface in terms of what the problem solution's needs for data are,
>>which will be independent of how the data is stored.
>
>
> What is your definition of "store"? Store to a persistent medium, store
> into a variable, store into another process, or?

The application solution doesn't care if the data is stored in an RDB,
flat files, an OODB, shared memory, or on clay tablets.

>> Then let the
>>subsystem provide the mapping of that interface into the persistence
>>paradigm du jour.
>
>
> If we use an RDBMS, the persistence part is already separated. The
> application has no idea whether, when, or how data is persisted.

And if you decide to use an OODBMS? Or flat files? An RDBMS is a very
particular, albeit currently very common, persistence mechanism. The
application solution needs to be decoupled from particular persistence
mechanisms.

>>Thus the application solution always requests, "Save this pile of data I
>>call X" and "Give me the pile of data I saved before as X." The
>>persistence access subsystem maps the X identity and the pile of data
>>into records in ISAM files, RDB tables, clay tablets, or whatever.
>
>
> Is X always an identifier? Should you be allowed to use any predicate
> logic in this interface ("give me the pile of data I saved before
> having X=5 or Y=6")?

X=5 is just another way of defining identity.

H. S. Lahman

unread,
Nov 11, 2006, 4:28:29 PM11/11/06
to
Responding to Frebe73...

>>>Additionally, relational data models can be more easily proven
>>>correct--or correct enough--before an investment is made in coding.
>>
>>I'm not sure I buy that. More easily than what? The RDM normalization
>>can be applied beyond the RDB's table/tuple paradigm.
>
>
> What is the "RDB's table/tuple paradigm"?

Say, what?!? Are you saying you don't know what an RDB table is or what
a tuple is within the table? Or that the tables, keyed tuples, and
relationships in an RDB represent a specific implementation of the
relational data model?

>>And OO Class Models are routinely normalized as
>>part of the basic paradigm methodology.
>
>
> Many class diagrams would break 1NF. I also see a problem with applying
> to 2 & 3 NF because the id of the object is not a value itself, but a
> pointer. Because object may be easily cloned, I suppose that would
> break 2NF.

Actually, 1NF is much more commonly broken in RDBs than in Class Models.
A classic example is a telephone number, which will almost always be
stored in the RDB as a single number; but if the elements of the number
(e.g., area code) are important to the problem at hand, they will always
be broken out as distinct attributes in a Class.

Objects abstract uniquely identifiable problem space entities. An
address in process memory is unique, so that satisfies the mapping. It
is actually more versatile than the RDB paradigm. Consider 6-32 screws
in an inventory. They are effectively clones without explicit identity
values but they are still uniquely identifiable in the problem space.
So long as the object corresponding to each screw has a unique address,
it is identifiable in the same sense that the physical screws in the
problem space are. The only way you can avoid 2/3NF problems for that
situation in an RDB is by adding an artificial explicit identity (e.g.,
autonumber) to the tuple itself.
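A sketch of that contrast (SQLite and Python used purely for illustration; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Interchangeable 6-32 screws have no natural key, so the relational
# representation needs an artificial one (an autonumber) to keep each
# row uniquely identifiable.
conn.execute("""CREATE TABLE screw (
    screw_id INTEGER PRIMARY KEY AUTOINCREMENT,
    size TEXT NOT NULL)""")
for _ in range(3):
    conn.execute("INSERT INTO screw (size) VALUES ('6-32')")

ids = [row[0] for row in conn.execute("SELECT screw_id FROM screw")]
print(ids)  # [1, 2, 3] -- identical rows, distinct artificial identities


# In the OO view, each object's memory address already supplies identity:
class Screw:
    def __init__(self, size):
        self.size = size


a, b = Screw("6-32"), Screw("6-32")
print(a is b)  # False -- two clones, still distinguishable
```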

>>However, I don't see that as being very relevant. My point is that the
>>application's problem solution doesn't care how the data is stored.
>
>
> Neither do the relational model or SQL.

The RDM, yes. But try using SQL on flat sequential files or an OODB.
Note that RDM and RDB are not synonyms. An RDB is a special case of the
RDM.

>>If it doesn't care how it is stored, it certainly doesn't care how the
>>storage mechanism is validated.
>
>
> I guess Thomas is talking about how the business rules are validated.

You are pulling sentences out of context and responding to them out of
context.

>>>Lastly, your database is language-neutral. It shouldn't matter what
>>>language the application sitting in front of the database is written in,
>>>or even what paradigm it's born from. Flexibility starts with a good
>>>database design and extends through the application--not the other way
>>>around.
>>
>>That's true enough but I would make it even stronger. RDBs are designed
>>to be problem-independent, not just language independent, which is
>>pretty much my point.
>
>
> The relational model is used for modelling data, problem-independent or
> not. Just because some data could be considered "problem-dependent", it
> may very well be modelled using the RM.

I said RDBs, not the RDM. An RDB is one of many possible
implementations of the RDM.

>>The data structures one needs to optimize the solution to a /particular/
>>problem in an application are often quite different than the structures
>>best suited to optimizing generic, ad hoc access of the same data.
>
>
> In some special scenarios, B-trees might not be the best solution and
> arrays or hashtables might be better choices. But I think that is
> low-level optimization without significant impact on average enterprise
> applications.

I guess you don't do a lot of high performance applications.

>>So
>>if one is solving a non-CRUD/USER problem where special optimization is
>>usually required, one wants to separate the views of the solution from
>>those of the RDB.
>
>
> Using low-level collection classes is not a good idea for modern
> enterprise applications. There are a lot of issues, like concurrency or
> transactions, that you have to solve yourself in that case.

By "enterprise applications" do you mean server-side layers? I am
talking about client-side applications solving a particular business
problem. Any concurrency relevant to client-side applications is
completely different than the concurrency related to processing parallel
transactions in the DBMS. The tuple-based relationships of the OO
paradigm work quite well in concurrent environments.

Thomas Gagne

unread,
Nov 11, 2006, 9:21:17 PM11/11/06
to
fre...@gmail.com wrote:
> <snip>

> Caching is the second issue. Because OO people want to play with an
> object graph instead of predicate logic, they need the graph or parts
> of it to be virtually in memory all the time. This will very quickly
> lead to huge RAM consumption, unless you use caching in your
> application. The DBMS already does caching for you, which synchronizes
> the cache with transactions and handles all concurrency issues. But if
> you try to do application caching, the reliability of the cached data
> will be rather low.
>
But to what ends? Does the application need to do that, or has it
simply substituted its own set processing for the RDB's--which is
designed to do such things?

I guess the question is--why is data being cached? How much data is
being cached? Is data being cached to accelerate DB transactions
(eliminating joins to lookup tables) or as an egotistical indulgence in
OO purism? Students of logic might recognize that as a complex question
(repeat the questions with your best Barbara Walters impersonation for
greatest effect).

Are there reasons to pull entire tables into memory? Perhaps--but it's
not an especially practical solution if the amount of data exceeds what
can be resident in RAM. Even were RAM limits expanded they would not
compete with what can be saved to disk--or a network--or the internet.
It is very few applications indeed that need RAM-resident memory to
compete with those resources (nuclear explosion simulations come to mind).

Anyway, the warning here is that when an application is pulling large
amounts of data into memory at once and attempting or planning to
operate on the resulting data graph, there is likely a better solution.
> <snip>


>
> The main cause for all these problems is the fact that OO people like
> to use objects as data structures and creating a domain model.
> According to Ted Codd and Chris Date, the table (relation) is the only
> (high-level) data structure. Using other data structures will cause an
> impedance mismatch. But objects are still very useful for other
> purposes. As a matter of fact, the relational model needs
> classes/objects for defining data types other than the existing ones,
> like strings and dates.
>
>

Interesting food for thought.

Thomas Gagne

unread,
Nov 11, 2006, 9:35:38 PM11/11/06
to
H. S. Lahman wrote:
> Responding to Gagne...
>
>>> <snip>

>>
>> Can you be more precise about what you mean when you say, "Each shift
>> required major surgery ... because the persistence mechanisms were
>> not encapsulated."
>
> At each shift there were huge amounts of legacy code around that used
> the old paradigm for persistence and needed to be upgraded. Typically
> the persistence mechanisms were not encapsulated in a single
> application subsystem. Thus the direct reads and writes were
> sprinkled ubiquitously throughout the code.
Excellent. Thanks.

Is that different from sprinkling SQL throughout code, and binding a
database's schema throughout the code with table names, column names,
and relationships?


>
> Worse, the application processing was often structured around the
> preferred organization for the paradigm. So it was not simply a
> matter of 1:1 statement replacement. Often one had to modify the
> basic flow of control of the application. For example, the way one
> collects related data from multiple ISAM files is quite different than
> the way one employs SQL table joins.
>
>>
>> If we were to look back at those transitions with an eye towards
>> designing the original applications again so the surgery was more
>> cosmetic, how would you have implemented the application's database
>> interface differently and are you doing that today?
>
> Encapsulate the persistence mechanism behind a single subsystem
> interface (an API in the Procedural Days). Design the subsystem
> interface in terms of what the problem solution's needs for data are,
> which will be independent of how the data is stored. Then let the
> subsystem provide the mapping of that interface into the persistence
> paradigm du jour.
>
> Thus the application solution always requests, "Save this pile of data
> I call X" and "Give me the pile of data I saved before as X." The
> persistence access subsystem maps the X identity and the pile of data
> into records in ISAM files, RDB tables, clay tablets, or whatever.
> Now one can substitute the persistence paradigms by replacing one
> subsystem implementation without touching either the interface or the
> problem solution.

Is the application's idea of X necessarily reflected in the persistence
paradigm du jour? What I mean to ask is: are you deliberately or
accidentally assuming a customer in the application is mirrored in the
database as a customer?

Thomas Gagne

unread,
Nov 11, 2006, 9:39:58 PM11/11/06
to
fre...@gmail.com wrote:
>> Worse, the application processing was often structured around the
>> preferred organization for the paradigm.
>>
>
> How could we do it differently now, if we don't know anything about the
> future paradigm?
>
Leave SQL out of the application. Push it as close to the database as
possible.
> <snip>

>
> If we use an RDBMS, the persistence part is already separated. The
> application has no idea whether, when, or how data is persisted.
>
Only when the application designer/programmer resists sprinkling SQL
around like garlic salt.

AndyW

unread,
Nov 11, 2006, 9:48:51 PM11/11/06
to

It is likely that, if it's a real OO program anyhow, it will be a
distributed application rather than partially cached.

Thomas Gagne

unread,
Nov 11, 2006, 9:54:46 PM11/11/06
to

That seems an oversimplification. Would you characterize an online
trading system as CRUD? For every single trade how many queries are
performed before and after? For any single banking transaction how
many queries before and after? I suppose that when humans are involved
they tend to look at what things look like before and after
transactions--just to make sure. Eliminate humans and you may still
have a 10-1 ratio of logical DB reads to writes.


>
>> Additionally, relational data models can be more easily proven
>> correct--or correct enough--before an investment is made in coding.
>
> I'm not sure I buy that. More easily than what? The RDM
> normalization can be applied beyond the RDB's table/tuple paradigm.
> ISAM files, CODASYL files, and other data representations can be
> normalized using the same basic rules. And OO Class Models are
> routinely normalized as part of the basic paradigm methodology.
>
> However, I don't see that as being very relevant.

I guess the point was that relational DBs have at least relational
calculus behind them, and we can always ask of a design: a) is all the
necessary data saved, and b) are all the relationships represented?

Granted this is anecdotal, but the experienced DB designers I've known would all
end up at nearly identical DB designs given the same requirements. I've
not seen that consistency with OO designers.
> <snip>


>
>> Lastly, your database is language-neutral. It shouldn't matter what
>> language the application sitting in front of the database is written
>> in, or even what paradigm it's born from. Flexibility starts with a
>> good database design and extends through the application--not the
>> other way around.
>
> That's true enough but I would make it even stronger. RDBs are
> designed to be problem-independent, not just language independent,
> which is pretty much my point.

I missed that, but now I'm interested in what you mean by
"problem-independent." If you're referring to the RDB before a DB
design is given it, then I think I understand what you're talking about.


>
> The data structures one needs to optimize the solution to a
> /particular/ problem in an application are often quite different than
> the structures best suited to optimizing generic, ad hoc access of the
> same data. So if one is solving a non-CRUD/USER problem where special
> optimization is usually required, one wants to separate the views of
> the solution from those of the RDB.

We'd better pick an application or problem domain and skip the
non-CRUD/USER descriptions. They confuse me 'cause not everyone may use
the same definition.

AndyW

unread,
Nov 11, 2006, 9:57:58 PM11/11/06
to

I do not believe there is any normalisation in the OO paradigm, nor is
there, as far as I can see, any need for it, since technically there is
no separation of behaviour from data.

OO however isn't notable for its conduciveness towards high performance
(it rather focuses on natural performance). Hence people will often
re-organise their data to eke out some more performance. How they mess
up or organise their data isn't, as far as I am aware, part of the OO
paradigm.

One common way, though, is to organise the meta-data instead so that the
actual objects can be located and accessed faster.


topmind

unread,
Nov 11, 2006, 10:23:29 PM11/11/06
to


There is a long discussion on C2 that ponders such issues:

http://www.c2.com/cgi/wiki?ResultSetSizeIssues


>
> Anyway, the warning here is when an application is pulling large amounts
> of data into memory at once and attempting or planning to operate on the
> resulting data graph it is likely there is a better solution.
> > <snip>
> >
> > The main cause for all these problems is the fact that OO people like
> > to use objects as data structures and creating a domain model.
> > According to Ted Codd and Chris Date, the table (relation) is the only
> > (high-level) data structure. Using other data structures will cause an
> > impedance mismatch. But objects are still very useful for other
> > purposes. As a matter of fact, the relational model needs
> > classes/objects for defining data types other than the existing ones
> > like strings and dates.
> >
> >
> Interesting food for thought.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.

-T-

fre...@gmail.com

unread,
Nov 12, 2006, 1:34:58 AM11/12/06
to
> >>Encapsulate the persistence mechanism behind a single subsystem
> >>interface (an API in the Procedural Days). Design the subsystem
> >>interface in terms of what the problem solution's needs for data are,
> >>which will be independent of how the data is stored.
> >
> >
> > What is your definition of "store"? Store to a persistent medium, store
> > into a variable, store into another process, or?
>
> The application solution doesn't care if the data is stored in an RDB,
> flat files, an OODB, shared memory, or on clay tablets.

Your definition of "store" is pretty wide; even variable assignments
are obviously included in it. I can't see how the
problem solution could be independent from how the data is stored. But
the problem solution can be independent from how data is persisted.

> >> Then let the
> >>subsystem provide the mapping of that interface into the persistence
> >>paradigm de jour.
> >
> > If we use a RDBMS, the persistence part is already separated. The
> > application has no idea about if, when or how data is persisted.
>
> And if you decide to use an OODBMS? Or flat files? An RDBMS is a very
> particular, albeit currently very common, persistence mechanism. The
> application solution needs to be decoupled from particular persistence
> mechanisms.

If an RDBMS was mainly used for persistence, your statement would be
true. But switching from an RDBMS to an OODBMS does not only mean
switching persistence mechanisms. You also switch data models, drop
relational calculus, drop predicate logic for data retrieval, and many
other things.

Why does the application need to be decoupled from the type of database
used? If you use an OODBMS, I suppose you want a much tighter
integration between the objects in the application and the database. If
we had a working OODBMS in wide use, nobody would suggest it be
separated. The only reason you want to separate is the
impedance mismatch. If the mismatch does not exist, there is no need
for separation.

Do you seriously suggest that switching from an RDBMS to flat files is a
realistic task? You would basically have to implement features like
transactions, caching, and concurrency all by yourself.

What is your definition of "persistence"? My definition is: storing to
a persistent medium.

> >>Thus the application solution always requests, "Save this pile of data I
> >>call X" and "Give me the pile of data I saved before as X." The
> >>persistence access subsystem maps the X identity and the pile of data
> >>into records in ISAM files, RDB tables, clay tablets, or whatever.
> >
> >
> > Is X always an identifier? Should you be allowed to use any predicate
> > logic in this interface ("give me the pile of data I saved before
> > having X=5 or Y=6")?
>
> X=5 is just another way of defining identity.

Does this answer imply that the "persistence" subsystem can only
deliver data belonging to a given identity? Predicate logic cannot be
a part of this interface? If you have a SQL database, you can only use
select statements with primary key columns in the where condition? Why
do you need a SQL database at all in that case? Something like Berkeley
DB would do the job much better.

Fredrik Bertilsson

fre...@gmail.com

unread,
Nov 12, 2006, 2:03:28 AM11/12/06
to
> >>The RDM normalization
> >>can be applied beyond the RDB's table/tuple paradigm.
> >
> > What is the "RDB's table/tuple paradigm"?
>
> Say, what?!? Are you saying you don't know what an RDB table is or what
> a tuple is within the table? Or that the tables, keyed tuples, and
> relationships in an RDB represent a specific implementation of the
> relational data model?

Tuples are a fundamental part of the relational model. Table is another
word for relation. Relations are also a fundamental part of the
relational model. These two concepts do NOT represent a specific
implementation of the relational model. If tuples or relations do not
exist, neither relational calculus nor normalization is possible.

But no existing production database qualifies as a relational database.
That is why some people prefer using the name "SQL database" instead of
RDB.

> >>And OO Class Models are routinely normalized as
> >>part of the basic paradigm methodology.
> >
> > Many class diagrams would break 1NF. I also see a problem with applying
> > 2NF and 3NF because the id of the object is not a value itself, but a
> > pointer. Because objects may be easily cloned, I suppose that would
> > break 2NF.
>
> Actually, 1NF is much more commonly broken in RDBs than in Class Models.
> A classic example is a telephone number, which will almost always be
> stored in the RDB as a single number but if the elements of the number
> (e.g., area code) are important to the problem in hand, they will always
> be broken out as distinct attributes in a Class.

That is why you should not save a telephone number as a single value.
But you can make a view that concatenates the full number.
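To make that concrete, here is a minimal sketch using Python's built-in
sqlite3 module (the table, column, and view names are all invented for
illustration): store the parts atomically, and let a view produce the
concatenated number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keep each element of the number atomic (1NF)...
conn.execute(
    "CREATE TABLE phone "
    "(id INTEGER PRIMARY KEY, area_code TEXT, local_number TEXT)")

# ...and derive the concatenated form through a view.
conn.execute(
    "CREATE VIEW phone_full AS "
    "SELECT id, area_code || '-' || local_number AS full_number "
    "FROM phone")

conn.execute(
    "INSERT INTO phone (area_code, local_number) VALUES ('617', '5551234')")

row = conn.execute("SELECT full_number FROM phone_full").fetchone()
print(row[0])  # 617-5551234
```

Queries that care about the area code hit the base table; queries that
just display the number hit the view.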

> Objects abstract uniquely identifiable problem space entities. An
> address in process memory is unique, so that satisfies the mapping.

What mapping?

> It is actually more versatile than the RDB paradigm.

What are the difference between the RDB paradigm and the RM paradigm?

> Consider 6-32 screws
> in an inventory. They are effectively clones without explicit identity
> values but they are still uniquely identifiable in the problem space.
> So long as the object corresponding to each screw has a unique address,
> it is identifiable in the same sense that the physical screws in the
> problem space are. The only way you can avoid 2/3NF problems for that
> situation in an RDB is by adding an artificial explicit identity (e.g.,
> autonumber) to the tuple itself.

And what is the normalization problem with that?

> >>However, I don't see that as being very relevant. My point is that the
> >>application's problem solution doesn't care how the data is stored.
> >
> > Neither do the relational model or SQL.
>
> The RDM, yes. But try using SQL on flat sequential files or an OODB.

SQL on flat files would be possible and is done all the time. Most
SQL databases use flat files for persistence. Actually, JDO claims that
a hybrid SQL language (JDO-SQL) can be used on non-relational
databases. Obviously the underlying database needs to support relational
calculus, or the JDO product has to implement it on top.

> Note that RDM and RDB are not synonyms. An RDB is a special case of the
> RDM.

In fact, no current production database qualifies as an RDB. But if we
had a database qualifying as a relational database, it would be an
implementation of the relational model, not a special case.

> >>>Lastly, your database is language-neutral. It shouldn't matter what
> >>>language the application sitting in front of the database is written in,
> >>>or even what paradigm it's born from. Flexibility starts with a good
> >>>database design and extends through the application--not the other way
> >>>around.
> >>
> >>That's true enough but I would make it even stronger. RDBs are designed
> >>to be problem-independent, not just language independent, which is
> >>pretty much my point.
> >
> > The relational model is used for modelling data, problem-independent or
> > not. Just because some data could be considered "problem-dependent", it
> > may very well be modelled using the RM.
>
> I said RDBs, not the RDM. An RDB is one of many possible
> implementations of the RDM.

Current SQL databases have some limitations that keep them from
qualifying as relational databases. But in what way do these limitations
force them to be problem-independent?

> >>The data structures one needs to optimize the solution to a /particular/
> >>problem in an application are often quite different than the structures
> >>best suited to optimizing generic, ad hoc access of the same data.
> >
> > In some special scenarios, B-trees might not be the best solution and
> > arrays or hashtables might be better choices. But I think that is
> > low-level optimization without significant impact on average enterprise
> > applications.
>
> I guess you don't do a lot of high performance applications.

That depends on your definition of "high performance". But my main area
is "enterprise applications", where optimization by using low-level
collection classes does not play a very important role. Current SQL
databases provide enough performance in my area.

> >>So
> >>if one is solving a non-CRUD/USER problem where special optimization is
> >>usually required, one wants to separate the views of the solution from
> >>those of the RDB.
> >
> > Using low-level collection classes is not a good idea for modern
> > enterprise applications. There are a lot of issues like concurrency or
> > transactions, that you have to solve by yourself in that case.
>
> By "enterprise applications" do you mean server-side layers?

I mean applications for accounting, payroll processing, logistics,
requirement management, etc. Not necessarily on the server-side.

> I am talking about client-side applications solving a particular business
> problem.

I am talking about applications solving a particular business problem,
but not necessarily on the client-side. I think the client-side should
be focused on presentation logic.

> Any concurrency relevant to client-side applications is
> completely different than the concurrency related to processing parallel
> transactions in the DBMS.

I guess the client-side of an application does not have very much
concurrency to deal with at all.

> The tuple-based relationships of the OO
> paradigm work quite well in concurrent environments.

Do you have any pointers to some website explaining "tuple-based
relationships of the OO paradigm"?

Fredrik Bertilsson

Matt McGill

unread,
Nov 12, 2006, 8:40:55 AM11/12/06
to
> I do not believe there is any normalisation in the OO paradigm, nor is
> there as far as I can see any need for it since technically there is
> no separation of behavour from data.

My understanding was that normalization had to do with removing
redundancy. My 'understandings' haven't been so hot lately, so here's a
quote from a set of pages on the relational model from the University
of Texas: "Simply stated, normalization is the process of removing
redundant data from relational tables by decomposing (splitting) a
relational table into smaller tables by projection."
(http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html)
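A tiny sketch of that decomposition-by-projection idea, in Python with
the built-in sqlite3 module (schema and data invented for illustration):
the repeated customer/city pair is projected out into its own table, and
a join recovers the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A denormalized table: the customer's city is repeated on every order.
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, 'Ann', 'Boston'), (2, 'Ann', 'Boston'), (3, 'Bob', 'Austin')])

# Decompose by projection into two smaller tables.
conn.execute(
    "CREATE TABLE customers AS SELECT DISTINCT customer, city FROM orders_flat")
conn.execute(
    "CREATE TABLE orders AS SELECT order_id, customer FROM orders_flat")

# The city is now stored once per customer, and a join recovers the
# original relation without loss.
rows = conn.execute(
    "SELECT o.order_id, o.customer, c.city "
    "FROM orders o JOIN customers c ON o.customer = c.customer "
    "ORDER BY o.order_id").fetchall()
print(rows)
```

The decomposition is lossless: the join reproduces exactly the three
original rows, but an update to Ann's city now touches one row instead
of two.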

Why does the packaging of behavior with data remove the need to
eliminate redundancy?

-Matt McGill

fre...@gmail.com

unread,
Nov 12, 2006, 10:04:54 AM11/12/06
to
> > If we use a RDBMS, the persistence part is already separated. The
> > application has no idea about if, when or how data is persisted.
> >
> Only when the application designer/programmer resists sprinkling SQL
> around like garlic salt.

SQL is about data management, relational calculus and predicate logic.
Persistence is something else.

What is your definition of "persistence"?

Fredrik Bertilsson

Matt McGill

unread,
Nov 12, 2006, 10:22:11 AM11/12/06
to
fre...@gmail.com wrote:
> What is the difference in using HQL strings instead of SQL strings?
> While I am not an expert in HQL, I think most people agree that HQL is
> more limited than SQL. All modifications to the HQL language seem to
> make it more like SQL. Besides, SQL has a rather high degree of
> standardization. The level of standardization for HQL is zero. You are
> entirely limited to one product.

I agree that the lack of standardization is a significant drawback. The
problem can be made tolerable by encapsulating your queries behind
interfaces which specify the intent of each query. That's not so
foreign a concept even for strict RM guys, right? This would be
something like encapsulating your commonly-used SQL in stored
procedures, callable from your application. You could use
vendor-specific extensions in those stored procedures without *too*
much trepidation because you could at least provide different
implementations of the same stored procedures if you were ever forced
to switch DB vendors (which could happen if your primary vendor makes
some dramatic change to its licensing scheme).
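A rough sketch of that encapsulation idea in Python (the class, table,
and query names are hypothetical): the caller sees only the intent of
the query, and any vendor-specific SQL lives in exactly one method body.

```python
import sqlite3

class OrderQueries:
    """Names the *intent* of each query; the SQL text lives only here."""

    def __init__(self, conn):
        self.conn = conn

    def open_orders_for(self, customer_id):
        # Switching DB vendors means rewriting only this body, not the
        # callers scattered through the application.
        return self.conn.execute(
            "SELECT id FROM orders WHERE customer_id = ? AND status = 'open'",
            (customer_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 7, 'open'), (2, 7, 'shipped'), (3, 8, 'open')])

print(OrderQueries(conn).open_orders_for(7))  # [(1,)]
```

This is the application-side analogue of the stored-procedure approach:
the interface is stable, the implementation behind it is swappable.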

I do think there are benefits to object-aware query languages like HQL,
but the benefits are OO-specific (polymorphic queries, for example). I
concede that HQL doesn't stack up very well against SQL in the areas
where SQL shines the brightest (do I sound 'vociferous' yet? =] More on
that later).

> If the database and application/web server resides on different boxes,
> it is very important with a fast network connection between.
> Application caching has so many disadvantages that this has to be
> avoided. I can't see any real reasons why the application and database
> server should be connected by a slow network, normally the boxes are
> standing next to each other.
>
> My recommendation is to have the DBMS and application server in the
> same box or even in the same process.

I'm not sure this would be a good idea, but I'm no expert on this. The
organization I work for is fairly small, and our applications are not
accessed all that heavily, so we might be able to get away with having
the DBMS and application server co-located. But for larger operations,
where both the app server and the DBMS are kept extremely busy, I would
think that you'd need to keep them on different machines so that each
can take full advantage of the CPU(s) when necessary. And what about
clustering?

>
> > Hibernate, on the other hand, has some sophisticated caching mechanisms
> > built in. The actual cache functionality is supplied by third-party
> > cache libraries, and you can take your pick (there are a bunch). I
> > agree that rolling your own would be a dumb idea, but there's no reason
> > to.
>
> The RDBMS already have sophisticated caching mechanisms built in.
> Hibernate can not beat it.

Again, the purpose of Hibernate's cache is in eliminating round-trips
to the database. It has nothing to do with RDBMS caching.

>
> > Even better, transaction demarcation can be done declaratively rather
> > than programmatically, with a little AOP. If I want some class's
> > saveChanges() method (which will modify 'persistent' objects) to be
> > transactional, I don't even have to begin/commit the transaction in the
> > code, and remember to catch exceptions and roll back. Frameworks like
> > Spring provide helpful proxy classes which can be configured to
> > intercept method calls and do such things for you. You actually get
> > much more than this - you can state that certain methods may only be
> > called /within existing transactions/ for example, or vice versa.
>
> I also use implicit transactions. My SQL-based approach does not stop
> you from using spring beans at all. (Another solution is to put
> transaction start and ending in a web filter.)

Good point =)

> > I disagree that OO applications like to share state between threads.
> > Hibernate is designed specifically /not/ to do this (a Hibernate
> > Session is not a thread-safe object - each thread creates its own
> > Session, those Sessions are independent, and they would return
> > different instances when queried for the same sets of objects). Do you
> > have a specific example that led you to conclude this?
>
> As soon as you use caching, you share state between threads.

Are we perhaps using different definitions again? I am referring to
multiple threads accessing the same location in memory. Using
Hibernate, this does not occur even with application-level caching.
Hibernate caches the data that comprises the state of an object. When
you ask for that object in a thread, you get a new instance of the
object bound to that thread containing a deep copy of the state. No two
threads will be sharing a reference to the same object. So unless by
'share state between threads' you mean 'multiple threads operating on
different copies of the same conceptual entity,' you are incorrect.

If that *is* what you mean, then you do it all the time in your own
applications; it just isn't as explicit. A PHP page which allows users
to view/modify database records is creating multiple temporary 'copies'
of the data in the DB (the copies just happen to reside in the client's
web browser once the page is rendered). This isn't caching, but it
certainly does force you to deal with concurrency in your application
code. If users A and B both view the page at roughly the same time,
make overlapping changes, and submit their changes at roughly the same
time, whose changes end up reflected in the DB? And does the loser know
that his changes were lost? The DB won't save you here, you need some
sort of explicit versioning scheme to fix this problem.
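One common such scheme is optimistic locking with a version column.
A minimal sketch in Python/sqlite3 (schema and names invented for
illustration): the update only succeeds if the row still has the version
the user originally read, so the second, conflicting save is detected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO page VALUES (1, 'original', 1)")

def save(conn, page_id, new_body, version_read):
    # Succeeds only if nobody else bumped the version since we read it.
    cur = conn.execute(
        "UPDATE page SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, page_id, version_read))
    return cur.rowcount == 1  # False => lost update detected; tell the user

# Users A and B both read version 1; A saves first, B's save is rejected.
print(save(conn, 1, "A's edit", 1))  # True
print(save(conn, 1, "B's edit", 1))  # False
```

The DB enforces the atomicity of the compare-and-update, but deciding to
carry the version number through the round-trip is application logic.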

>
> > A view every time a SQL statement is repeated more than once, though? I
> > understand the bit about encapsulating certain types of business logic,
> > but I don't see why a view should immediately be created over a table
> > because the same set of fields are selected in two places. Wouldn't
> > that instead be an indicator that you need to eliminate the redundancy
> > in your code by introducing a function/method/class/whatever which can
> > be called from both places? Can you give specific examples to
> > illustrate?
>
> In some cases it is probably an indicator of redundancy, in the
> same way as a DAO method used from multiple points is an indicator of
> redundancy in the calling code. And I don't suggest defining views for
> trivial select statements. But there are still plenty of scenarios
> where views are a suitable solution.

A function/procedure/method being called in multiple places is not by
itself an indicator of redundancy. Consider the strlen() function.
Obtaining the length of a string is a step in any number of totally
different operations. Similarly, obtaining a particular entity (whether
record, or object) might be a step in any number of totally unrelated
operations. So a DAO method used from multiple points is a /good/ sign
(if it will only ever be used in one place, why was the DAO created in
the first place?).

Views are great - they can encapsulate business logic, and they can
isolate you from schema changes, and I don't use them enough. But I
don't think they have all that much to do with eliminating redundancy
in your application code.

>
> > > The main cause for all these problems is the fact that OO people like
> > > to use objects as data structures and creating a domain model.
> > > According to Ted Codd and Chris Date, the table (relation) is the only
> > > (high-level) data structure. Using other data structures will cause an
> > > impedance mismatch. But objects are still very useful for other
> > > purposes. As a matter of fact, the relational model needs
> > > classes/objects for defining data types other than the existing ones
> > > like strings and dates.
> >

> > Objects are not data structures in that sense of the word. They contain
> > data structures, and associate behavior with them. They also have
> > relationships with other objects. As H. S. Lahman has helped me to see,
> > the difference between containing a data structure and having a
> > relationship is not always immediately obvious due to the nature of
> > today's 3GL OO languages. The impedance mismatch is a result of trying
> > to map object relationships which which do not conform to the
> > relational model onto the relational model anyway.
>
> Object relationships are pointers. The purpose of the relational
> model is to avoid the use of pointers.

Right. I don't take it as a given that pointers are inherently evil,
which is why I don't think the relational model is the end-all and
be-all of the software development universe =) However, I need to save
the data in my objects somehow, and I don't feel like re-implementing
all the great features in an RDBMS.

Now, regarding 'vociferous ignorance' - I think a lot of it gets thrown
around on usenet all the time, on /both/ sides of any debate.
Ultimately, if our assertions cannot be verified experimentally, they
carry no weight.

Could we perhaps pick a realistic problem domain, and then start to
explore various approaches to modeling and implementation in enough
detail to draw some conclusions? Obviously a reasonable problem would
have to be agreed upon which does not obviously favor a particular
approach, and is both non-trivial enough to be meaningful and simple
enough to be implemented in one's spare time.

What about forum software? Certainly a problem domain we all understand
well enough.

H. S. Lahman

unread,
Nov 12, 2006, 11:07:00 AM11/12/06
to
Responding to Frebe73...

>>>What is your definition of "store"? Store to a persistent medium, store
>>>into a variable, store into another process, or?
>>
>>The application solution doesn't care if the data is stored in an RDB,
>>flat files, an OODB, shared memory, or on clay tablets.
>
>
> Your definition of "store" is pretty wide, even variable assignments
> are obviously included in your definition of store. I can't see how the
> problem solution could be independent from how the data is stored. But
> the problem solution can be independent from how data is persisted.

That is pretty much the point. All the solution cares about is storing
a pile of data in terms of identity that it will use later to access
that same pile of data. How the data is actually stored is not relevant
to the solution so one can use a very broad definition of data storage.

>>>>Then let the
>>>>subsystem provide the mapping of that interface into the persistence
>>>>paradigm de jour.
>>>
>>>If we use a RDBMS, the persistence part is already separated. The
>>>application has no idea about if, when or how data is persisted.
>>
>>And if you decide to use an OODBMS? Or flat files? An RDBMS is a very
>>particular, albeit currently very common, persistence mechanism. The
>>application solution needs to be decoupled from particular persistence
>>mechanisms.
>
>
> If an RDBMS was mainly used for persistence, your statement would be
> true. But switching from an RDBMS to an OODBMS does not only mean
> switching persistence mechanisms. You also switch data models, drop
> relational calculus, drop predicate logic for data retrieval, and many
> other things.

Exactly my point -- the Data Model changes, not the Class Model for the
<OO> problem solution. It is up to the application's persistence access
subsystem to map the solution Class Model to the persistence Data Model
du jour. IOW, the subsystem interface decoupling is provided so that
only the persistence access subsystem needs to understand the Data Model
used to actually store the data.

>>>>Thus the application solution always requests, "Save this pile of data I
>>>>call X" and "Give me the pile of data I saved before as X." The
>>>>persistence access subsystem maps the X identity and the pile of data
>>>>into records in ISAM files, RDB tables, clay tablets, or whatever.
>>>
>>>
>>>Is X always an identifier? Should you be allowed to use any predicate
>>>logic in this interface ("give me the pile of data I saved before
>>>having X=5 or Y=6")?
>>
>>X=5 is just another way of defining identity.
>
>
> Does this answer imply that the "persistence" subsystem can only
> deliver data belonging to a given identity? Predicate logic can not be
> a part of this interface? If you have a SQL database, you can only use
> select statements using primary key columns in the where condition? Why
> do you need a SQL database at all in that case? Something like Berkeley
> DB would do the job much better.

The decoupling is provided by a pure message-based interface -- {message
ID, <data packet>}. The problem solution uses the message ID to
determine how to map the data packet into its object attributes. The
persistence access subsystem uses the message ID to determine how to map
the data packet into ISAM tables, RDB records, SQL queries, or whatever.
Each has its own unique view of the data in the data packet based upon
the identity in the message.
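A toy sketch of that {message ID, &lt;data packet&gt;} decoupling in Python
(all names are hypothetical, and a plain dict stands in for the actual
storage mechanism): the subsystem dispatches on the message ID, and only
its handler bodies know how the data is really stored.

```python
class PersistenceSubsystem:
    """Maps {message ID, data packet} pairs onto a storage mechanism."""

    def __init__(self):
        self._store = {}  # stands in for ISAM files, RDB tables, clay tablets...
        self._handlers = {
            "SAVE_CUSTOMER": self._save_customer,
            "LOAD_CUSTOMER": self._load_customer,
        }

    def dispatch(self, message_id, packet):
        # The interface is purely message-based; callers never see the store.
        return self._handlers[message_id](packet)

    def _save_customer(self, packet):
        # A real subsystem would build INSERT/UPDATE statements here.
        self._store[packet["id"]] = dict(packet)

    def _load_customer(self, packet):
        # A real subsystem would run a SELECT keyed on identity here.
        return self._store[packet["id"]]

p = PersistenceSubsystem()
p.dispatch("SAVE_CUSTOMER", {"id": 42, "name": "Acme"})
print(p.dispatch("LOAD_CUSTOMER", {"id": 42})["name"])  # Acme
```

Replacing the dict with SQL queries (or flat-file records) changes only
the two handler bodies; the problem solution's side of the interface is
untouched.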

fre...@gmail.com

unread,
Nov 12, 2006, 11:08:24 AM11/12/06
to

First Normal Form: All values of the columns are atomic.

How does that apply to objects, if any object may contain an arbitrary
hierarchy of other objects?

Second Normal Form: Every non-key column is fully dependent upon the
primary key.

If you don't have keys, how can an object graph be in 2NF?

After carefully reading the normal forms, it is hard to understand how
they could be applied to an object graph. The foundations are simply too
different.

Fredrik Bertilsson

fre...@gmail.com

unread,
Nov 12, 2006, 11:26:59 AM11/12/06
to
> > Your definition of "store" is pretty wide, even variable assignments
> > are obviously included in your definition of store. I can't see how the
> > problem slolution could be independent from how the data is stored. But
> > the problem solution can be independent from how data is persisted.
>
> That is pretty much the point. All the solution cares about is storing
> a pile of data in terms of identity that it will use later to access
> that same pile of data. How the data is actually stored is not relevant
> to the solution so one can use a very broad definition of data storage.

The solution cares about data and how it is structured. The relational
model is one way of structuring data. The OO model (or the network
model) is another way. The mechanism for retrieving data, relational
calculus or OO pointer traversal, is very important for the solution.
According to your argumentation, a map is the only data structure
needed.

> > If a RDBMS was mainly used for persistence, your statement would be
> > true. But switching from a RDBMS to a OODBMS does not only include
> > switching persistence mechanism. You also switch data model, drops
> > relational calculus, drops predicate logic for data retrieval, and many
> > other things.
>
> Exactly my point -- the Data Model changes, not the Class Model for the
> <OO> problem solution. It is up to the application's persistence access
> subsystem to map the solution Class Model to the persistence Data Model
> de jour. IOW, the subsystem interface decoupling is provided so that
> only the persistence access subsystem needs to understand the Data Model
> used to actually store the data.

The Class Model for the problem solution does not change? I am
currently rewriting the class model in an application (because someone
decided to load the entire object graph at startup from the database,
which caused terrible performance after a while).

Classes should be used for defining data types, not for creating data
structures. The latter is already done by relations. If you don't have
competing models for data structure, mapping is not necessary.

> >>>>Thus the application solution always requests, "Save this pile of data I
> >>>>call X" and "Give me the pile of data I saved before as X." The
> >>>>persistence access subsystem maps the X identity and the pile of data
> >>>>into records in ISAM files, RDB tables, clay tablets, or whatever.
> >>>
> >>>
> >>>Is X always an identifier? Should you be allowed to use any predicate
> >>>logic in this interface ("give me the pile of data I saved before
> >>>having X=5 or Y=6")?
> >>
> >>X=5 is just another way of defining identity.
> >
> >
> > Does this answer imply that the "persistence" subsystem can only
> > deliver data belonging to a given identity? Predicate logic can not be
> > a part of this interface? If you have a SQL database, you can only use
> > select statements using primary key columns in the where condition? Why
> > do you need a SQL database at all in that case? Something like Berkeley
> > DB would do the job much better.
>
> The decoupling is provided by a pure message-based interface -- {message
> ID, <data packet>}. The problem solution uses the message ID to
> determine how to map the data packet into its object attributes. The
> persistence access subsystem uses the message ID to determine how to map
> the data packet into ISAM tables, RDB records, SQL queries, or whatever.
> Each has its own unique view of the data in the data packet based upon
> the identity in message.

Ok, my original assumption was obviously true. All you need is a
persistent map with two operators, put and get. Using a SQL database
in this context is just stupid.
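For what it's worth, such a two-operator persistent map is trivial to
sketch; here in Python, with sqlite3 as just one possible backing store
(all names invented; pass a file path instead of ":memory:" for actual
durability):

```python
import json
import sqlite3

class PersistentMap:
    """The two-operator interface: put(key, pile-of-data) and get(key)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        # The pile of data is opaque to the store; JSON is one encoding.
        self.conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                          (key, json.dumps(value)))
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

m = PersistentMap()
m.put("X", {"pile": "of data"})
print(m.get("X"))  # {'pile': 'of data'}
```

Note how little of SQL this interface can exploit: no predicates, no
joins, just identity lookup, which is the point of the objection above.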

Do you by any chance have pointers to open source sample
applications that are built using your principles? I would recommend a
look at the source code of http://www.oscommerce.com/ to see a
successful application that doesn't try to hide SQL statements in
separate layers.

Fredrik Bertilsson

H. S. Lahman

unread,
Nov 12, 2006, 11:34:26 AM11/12/06
to
Responding to Gagne...

>> At each shift there were huge amounts of legacy code around that used
>> the old paradigm for persistence and needed to be upgraded. Typically
>> the persistence mechanisms were not encapsulated in a single
>> application subsystem. Thus the direct reads and writes were
>> sprinkled ubiquitously throughout the code.
>
> Excellent. Thanks.
>
> Is that different from sprinkling SQL throughout code, and binding a
> database's schema throughout the code with table names, column names,
> and relationships?

It is pretty much the same. The SQL statements marry the application to
a particular flavor of persistence. (In practice, they marry the
application to a particular RDB vendor.) That's why I advocate
isolating the SQL to a single persistence access subsystem.

However, there are other reasons. Often one needs to optimize the DB
access in the way one constructs joins, uses caches, etc. because the DB
access is almost always the performance bottleneck on the application
side. One also wants to isolate those optimizations so that the problem
solution is unaffected. IOW, one does not want to build the solution to
the customer's problem around computing space issues if it can be avoided.

Another is simply paradigm mismatch. For example, the OO paradigm uses
relationships in a fundamentally different way than an RDB does. As a
result, selection joins are a mainstay of RDB access, but there is no
equivalent construct in an OO application because all collaborations are
peer-to-peer.

>>> If we were to look back at those transitions with an eye towards
>>> designing the original applications again so the surgery was more
>>> cosmetic, how would you have implemented the application's database
>>> interface differently and are you doing that today?
>>
>>
>> Encapsulate the persistence mechanism behind a single subsystem
>> interface (an API in the Procedural Days). Design the subsystem
>> interface in terms of what the problem solution's needs for data are,
>> which will be independent of how the data is stored. Then let the
>> subsystem provide the mapping of that interface into the persistence
>> paradigm de jour.
>>
>> Thus the application solution always requests, "Save this pile of data
>> I call X" and "Give me the pile of data I saved before as X." The
>> persistence access subsystem maps the X identity and the pile of data
>> into records in ISAM files, RDB tables, clay tablets, or whatever.
>> Now one can substitute the persistence paradigms by replacing one
>> subsystem implementation without touching either the interface or the
>> problem solution.
>
> Is the application's idea of X necessarily reflected in the persistence
> paradigm de jour? What I mean to ask, is are you deliberately or
> accidentally assuming a customer in the application is mirrored in the
> database as a customer?

There has to be an unambiguous mapping between an OO Class Model and the
persistence Data Model because both are ultimately abstracting the same
customer problem space. However, the mapping is often not 1:1 once one
is outside the realm of CRUD/USER processing.

Basically one has a pure message-based subsystem interface of {message
ID, <data packet>}. The problem solution has its own unique view of how
to map the data in the data packet into its object attributes based upon
the message ID. Similarly, the persistence access subsystem has its own
unique view of how to map the data in the data packet into things like
SQL queries or ISAM records.

[The interface to the persistence access subsystem is designed to
resolve the problem solution's needs. Sometimes it is convenient for
the problem solution to store/access data in bulk. Consequently a
single request may contain data from multiple objects that will map into
multiple tables, records, or clay tablets on the persistence side.
Therefore the notion of identity will tend to be a bit more complex
(i.e., message ID is really a compound identity whose elements apply to
different parts of the data packet). But the notion of each side
uniquely mapping data packet to message identity still applies.]

fre...@gmail.com

Nov 12, 2006, 11:49:26 AM
> > What is the difference in using HQL strings instead of SQL strings?
> > While I am not an expert in HQL I think most people agree that HQL is
> > more limited than SQL. All modifications in the HQL language seem to
> > make it more like SQL. Besides SQL has a rather high degree of
> > standardization. The level of standardization for HQL is zero. You are
> > entirely limited to one product.
>
> I agree that the lack of standardization is a significant drawback. The
> problem can be made tolerable by encapsulating your queries behind
> interfaces which specify the intent of each query. That's not so
> foreign a concept even for strict RM guys, right?

I assume that many Hibernate users would hate to hide all their HQL
statements in the database layer. After all, Hibernate is supposed to be
the replacement for the database layer.

> This would be
> something like encapsulating your commonly-used SQL in stored
> procedures, callable from your application.

Stored procedures should only be used when the problem cannot be
solved by using a view. Trivial SQL statements should NOT be hidden
inside stored procedures.

> You could use
> vendor-specific extensions in those stored procedures without *too*
> much trepidation because you could at least provide different
> implementations of the same stored procedures if you were ever forced
> to switch DB vendors (which could happen if your primary vendor makes
> some dramatic change to its licensing scheme).

Yes, stored procedures may be used for some vendor-specific stuff,
like retrieving the last generated id, and vendor migration would be easier.

> > My recommendation is to have the DBMS and application server in the
> > same box or even in the same process.
>
> I'm not sure this would be a good idea, but I'm no expert on this. The
> organization I work for is fairly small, and our applications are not
> accessed all that heavily, so we might be able to get away with having
> the DBMS and application server co-located. But for larger operations,
> where both the app server and the DBMS are kept extremely busy, I would
> think that you'd need to keep them on different machines so that each
> can take full advantage of the CPU(s) when necessary. And what about
> clustering?

If you don't keep them on the same box, make sure the network between
them is fast. Clustering is another reason for not reimplementing data
management in the application server. Some databases have built-in
clustering support; doing it yourself is not trivial. Application
caching is a very big problem in clusters, so it is very important to
follow the rule of a stateless application server.

> A PHP page which allows users
> to view/modify database records is creating multiple temporary 'copies'
> of the data in the DB (the copies just happen to reside in the client's
> web browser once the page is rendered). This isn't caching, but it
> certainly does force you to deal with concurrency in your application
> code. If users A and B both view the page at roughly the same time,
> make overlapping changes, and submit their changes at roughly the same
> time, whose changes end up reflected in the DB? And does the loser know
> that his changes were lost? The DB won't save you here, you need some
> sort of explicit versioning scheme to fix this problem.

This is normally solved by optimistic locking, which is trivial to implement.
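A minimal sketch of that optimistic-locking scheme (the table and column names are invented for illustration): each row carries a version number, and an update only succeeds if the version is still the one the user originally read.

```python
# Optimistic locking sketch: an UPDATE guarded by the version read earlier.
# If another user updated the row in the meantime, rowcount is 0 and the
# loser knows the write was lost.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)")
conn.execute("INSERT INTO task VALUES (1, 'draft', 0)")


def read(task_id):
    row = conn.execute("SELECT title, version FROM task WHERE id = ?",
                       (task_id,)).fetchone()
    return {"title": row[0], "version": row[1]}


def update(task_id, new_title, seen_version):
    cur = conn.execute(
        "UPDATE task SET title = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_title, task_id, seen_version),
    )
    return cur.rowcount == 1  # False => someone else got there first


# Users A and B read the same row, then both try to write.
a, b = read(1), read(1)
a_won = update(1, "A's title", a["version"])  # succeeds
b_won = update(1, "B's title", b["version"])  # fails: version moved on
```

The losing client can then be told explicitly that its changes were rejected and shown the current state, which addresses the lost-update scenario described above.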

> > > A view every time a SQL statement is repeated more than once, though? I
> > > understand the bit about encapsulating certain types of business logic,
> > > but I don't see why a view should immediately be created over a table
> > > because the same set of fields are selected in two places. Wouldn't
> > > that instead be an indicator that you need to eliminate the redundancy
> > > in your code by introducing a function/method/class/whatever which can
> > > be called from both places? Can you give specific examples to
> > > illustrate?
> >
> > In some cases it is probably an indicator of redundancy, in the
> > same way as a DAO method used from multiple points is an indicator of
> > redundancy in the calling code. And I don't suggest defining views for
> > trivial select statements. But there are still plenty of scenarios
> > where views are a suitable solution.
>
> A function/procedure/method being called in multiple places is not by
> itself an indicator of redundancy. Consider the strlen() function.
> Obtaining the length of a string is a step in any number of totally
> different operations. Similarly, obtaining a particular entity (whether
> record, or object) might be a step in any number of totally unrelated
> operations. So a DAO method used from multiple points is a /good/ sign
> (if it will only ever be used in one place, why was the DAO created in
> the first place?).

A view being called in multiple places is not by itself an indicator
of redundancy, according to the argumentation above.

> Could we perhaps pick a realistic problem domain, and then start to
> explore various approaches to modeling and implementation in enough
> detail to draw some conclusions? Obviously a reasonable problem would
> have to be agreed upon which does not obviously favor a particular
> approach, and is both non-trivial enough to be meaningful and simple
> enough to be implemented in one's spare time.
>
> What about forum software? Certainly a problem domain we all understand
> well enough.

Sure. Forum software sounds OK; my suggestion would be an online (bus
trip) reservation application, but it is up to you. Put the
requirements on a website somewhere and I will give you the link to my
running demo and the sources. I will implement it using LAMP.

Fredrik Bertilsson

H. S. Lahman

Nov 12, 2006, 12:50:53 PM
Responding to Frebe73...

>>>>The RDM normalization
>>>>can be applied beyond the RDB's table/tuple paradigm.
>>>
>>>What is the "RDB's table/tuple paradigm"?
>>
>>Say, what?!? Are you saying you don't know what an RDB table is or what
>>a tuple is within the table? Or that the tables, keyed tuples, and
>>relationships in an RDB represent a specific implementation of the
>>relational data model?
>
>
> Tuples are a fundamental part of the relational model. Table is another
> word for relation. Relations are also a fundamental part of the
> relational model. These two concepts do NOT represent a specific
> implementation of the relational model. If tuples or relations did not
> exist, relational calculus would not be possible, nor normalization.
>
> But no existing production database qualifies as a relational database.
> That is why some people prefer using the name "SQL database" instead of
> RDB.

Which is pretty much my point. RDM and RDB are not synonyms and an RDB
is a specific implementation of the RDM (however imperfect some people
may view it).

>>>>And OO Class Models are routinely normalized as
>>>>part of the basic paradigm methodology.
>>>
>>>Many class diagrams would break 1NF. I also see a problem with applying
>>>2NF & 3NF because the id of the object is not a value itself, but a
>>>pointer. Because objects may be easily cloned, I suppose that would
>>>break 2NF.
>>
>>Actually, 1NF is much more commonly broken in RDBs than in Class Models.
>> A classic example is a telephone number, which will almost always be
>>stored in the RDB as a single number but if the elements of the number
>>(e.g., area code) are important to the problem in hand, they will always
>>be broken out as distinct attributes in a Class.
>
>
> That is why you should not save a telephone number as a single number.

And when was the last time you saw an RDB that stored a telephone number
as individual fields? If the fields are separated they need to be in
their own table to avoid 3NF problems. That's because the simple
domains are individually dependent solely on the identity of the
telephone number, which /is/ the telephone number. When was the last
time you saw a table where every field was part of a compound key (i.e.,
no non-key attributes)? That is inherently inefficient and most DBAs
will deliberately denormalize to avoid that inefficiency.

>>Objects abstract uniquely identifiable problem space entities. An
>>address in process memory is unique, so that satisfies the mapping.
>
>
> What mapping?

The mapping of object abstractions to identifiable problem space entities.

>>It is actually more versatile that the RDB paradigm.
>
>
> What are the difference between the RDB paradigm and the RM paradigm?

The relational data model is a mathematical model. The RDB paradigm is
a way of applying that model to practical data storage.

>>Consider 6-32 screws
>>in an inventory. They are effectively clones without explicit identity
>>values but they are still uniquely identifiable in the problem space.
>>So long as the object corresponding to each screw has a unique address,
>>it is identifiable in the same sense that the physical screws in the
>>problem space are. The only way you can avoid 2/3NF problems for that
>>situation in an RDB is by adding an artificial explicit identity (e.g.,
>>autonumber) to the tuple itself.
>
>
> And what is the normalization problem with that?

I said specifically that one /avoids/ the normalization problem through
the kludge of providing an explicit tuple identity that does not exist
in the problem space.

>>>>However, I don't see that as being very relevant. My point is that the
>>>>application's problem solution doesn't care how the data is stored.
>>>
>>>Neither do the relational model or SQL.
>>
>>The RDM, yes. But try using SQL on flat sequential files or an OODB.
>
>
> SQL using flat files would be possible and is done all the time. Most
> SQL databases use flat files for persistence. Actually JDO claims that
> a hybrid SQL language (JDO-SQL) can be used on non-relational
> databases. Obviously the underlying database needs to support relational
> calculus or the JDO product has to implement it on top.

I'm talking about SQL and flat sequential files where there is no
embedded tuple identity and the files are read by character or by block.
SQL is meaningless in that context.

>>Note that RDM and RDB are not synonyms. An RDB is a special case of the
>>RDM.
>
>
> In fact, no current production database qualifies as an RDB. But if we
> had a database qualifying as a relational database, that would be an
> implementation of the relational model, not a special case.

I don't agree with that assertion, but let's not go there; this is an OO
forum.

>>>>>Lastly, your database is language-neutral. It shouldn't matter what
>>>>>language the application sitting in front of the database is written in,
>>>>>or even what paradigm it's born from. Flexibility starts with a good
>>>>>database design and extends through the application--not the other way
>>>>>around.
>>>>
>>>>That's true enough but I would make it even stronger. RDBs are designed
>>>>to be problem-independent, not just language independent, which is
>>>>pretty much my point.
>>>
>>>The relational model is used for modelling data, problem-independent or
>>>not. Just because some data could be considered "problem-dependent", it
>>>may very well be modelled using the RM.
>>
>>I said RDBs, not the RDM. An RDB is one of many possible
>>implementations of the RDM.
>
>
> Current SQL databases have some limitations that make them not qualify
> as relational databases. But in what way do these limitations force
> them to be problem-independent?

RDBs are designed for optimization in a multi-client environment around
ad hoc queries where one cannot anticipate why the data is being
accessed (i.e., what problem a particular client is solving).

>>>>So
>>>>if one is solving a non-CRUD/USER problem where special optimization is
>>>>usually required, one wants to separate the views of the solution from
>>>>those of the RDB.
>>>
>>>Using low-level collection classes is not a good idea for modern
>>>enterprise applications. There are a lot of issues like concurrency or
>>>transactions, that you have to solve by yourself in that case.
>>
>>By "enterprise applications" do you mean server-side layers?
>
>
> I mean applications for accounting, payroll processing, logistics,
> requirement management, etc. Not necessarily on the server-side.

OK; been there and done that. I've never seen one where relationships
instantiated at the object level would not be more efficient than
relationships instantiated at the class level. In effect one completely
eliminates an index search in almost all situations. No matter how you
gussy up the index one is looking at at least O(NlnN) overhead for
class-based indexes. That's because the OO relationships are
instantiated to optimize the specific problem in hand rather than ad hoc
queries.

>>Any concurrency relevant to client-side applications is
>>completely different than the concurrency related to processing parallel
>>transactions in the DBMS.
>
>
> I guess the client-side of an application does not have very much
> concurrency to deal with at all.

Only if the application is employed in batch mode. Client applications
like accounts payable, accounts receivable, inventory control, and GL
are usually available interactively for multiple <programmatic> users
nowadays.

>>The tuple-based relationships of the OO
>>paradigm work quite well in concurrent environments.
>
>
> Do you have any pointers to some website explaining "tuple-based
> relationships of the OO paradigm"?

Any OOA/D book will do. In an OO context an object maps relationally to
a tuple in a class set relation. Relationships are defined at the class
level but they are instantiated at the object level.

 1        R1        *
[A] --------------- [B]

If we have

A1 related to B2, B3
A2 related to B1
A3 related to B4, B5, B6

then A1 has a relationship collection of {B2, B3}; A2 has a relationship
collection of {B1}; and A3 has a relationship collection of {B4, B5,
B6}. When the relationship is navigated one only accesses the members
of the [B] set that are specifically related to the A in hand. IOW, the
navigation always accesses /only/ the relevant members of [B] for the A
in hand.

That contrasts with the RDB-style class relationship where one
potentially accesses every member of [B] to locate the members of [B]
related to the A in hand. One can reduce the exhaustive search by
providing a special index on the [B] set that is ordered for the
specific query, but one still has an O(ln N) search. One could also
provide a "custom" index for every member of the [A] set just like the
OO paradigm does routinely, but that would run out of resources pretty
quickly.

When designing a large OO application one /always/ looks for ways to
eliminate searches, especially class-based searches. Using a
class-based search without very good justification is a good way to get
burned at the stake by OO reviewers.

[Apocryphal anecdote. Once upon a time we were tasked with speeding up
an application that was horrendously slow. Eventually we improved
performance by more than three orders of magnitude. However, the single
biggest improvement was to eliminate searches of a single class index to
find individual objects. When replaced with proper relationship
instantiation there was improvement in overall performance of nearly two
orders of magnitude.]

H. S. Lahman

Nov 12, 2006, 2:04:47 PM
Responding to Gagne...

>>> That's one way of doing it. My experience shows that 70-80% of a
>>> system is queries. Whether inquiry transactions or reports, all the
>>> business systems I've participated coding or designing spent little
>>> production time changing data. In the online systems I'm familiar
>>> with, update and change transactions were preceded and followed with
>>> queries. Long after transactions are done posting, management looks
>>> at reports to see how their business is doing. Given statistics like
>>> these it makes little sense to design your application or OO model
>>> before designing your database.
>>
>>
>> Note that I was careful to qualify with "non-CRUD/USER".
>>
>> That sort of proportion is a symptom that the application is USER/CRUD
>> processing. The application is basically a pipeline between the
>> database and the UI and its main purpose in life is to convert between
>> the two views. For that sort of situation the RAD layered model
>> infrastructures already provide lots of automation so an OO approach
>> would probably be overkill by reinventing that wheel.
>
> That seems an over-simplification. Would you characterize an online
> trading system as CRUD? For every single trade how many queries are
> performed before and after? For any single banking transactions how
> many queries before and after? I suppose that when humans are involved
> they tend to look at what things look like before and after
> transactions--just to make sure. Eliminate humans and you may still
> have a 10-1 ratio of logical DB reads to writes.

If 70-80% of the code in the system was related to processing queries, I
would have to say that sure sounds like CRUD/USER. But that seems high
even for CRUD/USER. Typically there is as much code devoted to
presentation of the query data in the UI as there is for formulating the
DB queries. For a distributed system one needs networking
infrastructure as well (though that is sometimes provided transparently
by RAD infrastructures). And typically there are at least some relevant
business rules and policies between the DB and UI, if for no other
reason than error checking.

In terms of the classic layered models used for CRUD/USER
infrastructures, the real issue is what goes on in the Business layer.
If the code for the Presentation and Data layers dominate the
application, then one is in the CRUD/USER realm. Then the Business
layer is usually just providing a mapping function from one view to the
other.

However, once the code in the Business layer dominates the application,
one is out of the realm of CRUD/USER. At that point the layered model
breaks down and the Presentation and Data layers just become low level
service subsystems. IOW, the model changes to

     [Business Problem]
        /         \
       /           \
    [UI]      [Data Access]

A priori I would expect an online trading application to be outside the
realm of CRUD/USER because the rules and policies driving the various
types of transactions are complicated. In fact, I would not be
surprised if there were multiple subsystems involved in the
Business "layer". So I have a problem understanding why so much of the
application would be tied up in query processing.

There is also an issue of how important the queries are. In typical IT
applications most, if not all, of the business objects needed to solve a
given problem are initialized with data from a DB. To do that
initialization one needs to get the data from the DB. But is getting
the data from the DB really important to solving the business problem?

I submit that it is not if the business objects exist primarily to apply
business rules in some complex manner. That is, simply initializing the
business objects is not what the application is about. In an extreme
case, suppose one has a single factory object to instantiate objects of
each class and each factory has a unique query it uses to acquire the
initialization data from the DB. Stepping back a bit, all those factory
objects are just infrastructure for the initialization of the objects
that actually solve the problem. From that perspective the query
processing is quite peripheral to the problem solution even though there
is quite a lot of it.

Now take this one step further. If the business objects don't "see" the
queries because of decoupling through the factory objects, why not go
the next step and decouple the factory objects from the DB queries? The
data the factory needs to initialize an object of class [X] doesn't
depend on the storage mechanisms; the factory just needs the values. If
the factory just asks for the data through a generic subsystem
interface, then one can probably implement query construction much more
generically.

That is, in the DB Access subsystem one has a very different problem in
hand. Now one must map the interface to the persistence paradigm and
its Data Model. Often one can come up with much more efficient
representations that simplify things.

For example, in the DB Access subsystem one might have a single Query
object that knew how to format SQL queries given a set of identifiers
and data from the interface and some sort of external configuration data
(perhaps in the DB itself) for mapping data. Basically that single
object would just plug in values while concatenating a string. Now the
problem solution AND the DB access are both simpler. [This is one of
the benefits of isolating the DB Access that we discussed in the other
subthread.]
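A minimal sketch of such a Query object (the mapping table and names are invented for illustration): one generic object concatenates the SQL from external configuration data, so neither the factories nor the problem solution contain query strings.

```python
# Generic query-construction sketch: message IDs map to table/column
# configuration, and one object formats every SELECT by concatenation.

# Configuration data -- in practice this might live in the DB itself.
MAPPING = {
    "customer": {"table": "CUSTOMER", "columns": ["ID", "NAME", "BALANCE"]},
    "invoice":  {"table": "INVOICE",  "columns": ["ID", "CUSTOMER_ID", "TOTAL"]},
}


class Query:
    """Formats SQL for any message ID given the external mapping data."""

    def __init__(self, mapping):
        self.mapping = mapping

    def select_by_id(self, message_id, key):
        m = self.mapping[message_id]
        cols = ", ".join(m["columns"])
        # Placeholder-style parameter; real code would bind `key` safely
        # through the driver rather than splicing it into the string.
        return f"SELECT {cols} FROM {m['table']} WHERE ID = ?", (key,)


q = Query(MAPPING)
sql, params = q.select_by_id("customer", 42)
```

Adding a new entity then means adding a row of configuration rather than another hand-written query, which is how the volume of query-processing code can drop even while the number of distinct DB queries stays large.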

IOW, the volume of code tied up in query processing drops dramatically
because of abstraction in the DB Access subsystem even though the number
and variety of actual DB queries is large. Meanwhile the data access of
the problem solution's factories is reduced to a set of simple interface
calls with no special knowledge needed.

>>> Additionally, relational data models can be more easily proven
>>> correct--or correct enough--before an investment is made in coding.
>>
>>
>> I'm not sure I buy that. More easily than what? The RDM
>> normalization can be applied beyond the RDB's table/tuple paradigm.
>> ISAM files, CODASYL files, and other data representations can be
>> normalized using the same basic rules. And OO Class Models are
>> routinely normalized as part of the basic paradigm methodology.
>>
>> However, I don't see that as being very relevant.
>
> I guess the point was relational DBs have at least relational calculus
> behind them, and we can always ask a design a) is all the necessary data
> saved and b) are all the relationships represented?
>
> Granted this is anecdotal, but the experienced DBs I've known would all
> end up at nearly identical DB designs given the same requirements. I've
> not seen that consistency with OO designers.

That's true, but the OO designer is dealing with a much bigger problem
because of the need for behaviors. The Data Model only needs to worry
about data while the Class Model is just one <static> piece of a much
larger, largely dynamic structure.

>>> Lastly, your database is language-neutral. It shouldn't matter what
>>> language the application sitting in front of the database is written
>>> in, or even what paradigm it's born from. Flexibility starts with a
>>> good database design and extends through the application--not the
>>> other way around.
>>
>>
>> That's true enough but I would make it even stronger. RDBs are
>> designed to be problem-independent, not just language independent,
>> which is pretty much my point.
>
> I missed that, but now I'm interested in what you mean by
> "problem-independent." If you're referring to the RDB before a DB
> design is given it, then I think I understand what you're talking about.

The RDB paradigm optimizes for ad hoc queries. That is, the RDB needs
to provide access to the same data for a lot of different purposes
(i.e., different problems being solved by different client
applications). To date the RDB is the most efficient means of providing
that sort of very generic access.

fre...@gmail.com

Nov 12, 2006, 2:16:02 PM
> > But no existing production database qualify as a relational database.
> > That is why some people prefer using the name "SQL database" instead of
> > RDB.
>
> Which is pretty much my point. RDM and RDB are not synonyms and an RDB
> is a specific implementation of the RDM (however imperfect some people
> may view it).

So if there existed a relational database that fulfills Codd's 12 rules
and a standardized language for true relational calculus, would it be
OK to use it without hiding it from the rest of the application?

> >>Actually, 1NF is much more commonly broken in RDBs than in Class Models.
> >> A classic example is a telephone number, which will almost always be
> >>stored in the RDB as a single number but if the elements of the number
> >>(e.g., area code) are important to the problem in hand, they will always
> >>be broken out as distinct attributes in a Class.
> >
> > That is why you should not save a telephone number as a single number.
>
> And when was the last time you saw an RDB that stored a telephone number
> as individual fields? If the fields are separated they need to be in
> their own table to avoid 3NF problems. That's because the simple
> domains are individually dependent solely on the identity of the
> telephone number, which /is/ the telephone number.

I have never seen any application that extracts parts of the telephone
number, which is why you can use the full number without violating 1NF.
But if parts of the number were extracted, they should be in different
columns instead.

> >>It is actually more versatile that the RDB paradigm.
> >
> > What are the difference between the RDB paradigm and the RM paradigm?
>
> The relational data model is a mathematical model. The RDB paradigm is
> a way of applying that model to practical data storage.

There are different vendors trying to implement the RM without success,
but there is no RDB paradigm definition. If you have pointers to that
definition, you are welcome to share them.

> >>Consider 6-32 screws
> >>in an inventory. They are effectively clones without explicit identity
> >>values but they are still uniquely identifiable in the problem space.
> >>So long as the object corresponding to each screw has a unique address,
> >>it is identifiable in the same sense that the physical screws in the
> >>problem space are. The only way you can avoid 2/3NF problems for that
> >>situation in an RDB is by adding an artificial explicit identity (e.g.,
> >>autonumber) to the tuple itself.
> >
> > And what is the normalization problem with that?
>
> I said specifically that one /avoids/ the normalization problem through
> the kludge of providing an explicit tuple identity that does not exist
> in the problem space.

Didn't you say that class models were normalized in a routine manner?
Now you avoid normalization? If I remember correctly, there is
nothing in the RM or a SQL database that forces you to define primary
keys either.

> >>I said RDBs, not the RDM. An RDB is one of many possible
> >>implementations of the RDM.
> >
> > Current SQL databases has some limitations that make them not qualify
> > as relational databases. But in what way does these limitations force
> > them to be problem-independent?
>
> RDBs are designed for optimization in a multi-client environment around
> ad hoc queries where one cannot anticipate why the data is being
> accessed (i.e., what problem a particular client is solving).

HSQLDB and Apache Derby are examples of SQL databases not optimized for
a multi-client environment. But in most cases you obviously want a
database optimized for a multi-client environment. When you say "ad hoc
queries", are you talking about relational calculus? Relational
calculus is useful regardless of whether one can anticipate why the data
is being accessed. It is a powerful way of describing the data you want.

> > I mean applications for accounting, payroll processing, logistics,
> > requirement management, etc. Not necessary on the server-side.
>
> OK; been there and done that. I've never seen one where relationships
> instantiated at the object level would not be more efficient than
> relationships instantiated at the class level. In effect one completely
> eliminates an index search in almost all situations. No matter how you
> gussy up the index one is looking at at least O(NlnN) overhead for
> class-based indexes. That's because the OO relationships are
> instantiated to optimize the specific problem in hand rather than ad hoc
> queries.

There is no question about the fact that network databases using
pointers provided better performance compared to RDBs in some
scenarios. But one of the reasons why pointer-based databases were
abandoned is the fact that the number of different ways you might access
the same data is high in enterprise applications. Using the
pointer-based approach, you need to define and maintain new pointer
sets for every way you might want to access data. This approach is also
very inflexible with respect to changes.

> Any OOA/D book will do. In an OO context an object maps relationally to
> a tuple in a class set relation. Relationships are defined at the class
> level but they are instantiated at the object level.
>
> 1 R1 *
> [A] --------------- [B]
>
> If we have
>
> A1 related to B2, B3
> A2 related to B1
> A3 related to B4, B5, B6
>
> then A1 has a relationship collection of {B2, B3}; A2 has a relationship
> collection of {B1}; and A3 has a relationship collection of {B4, B5,
> B6}. When the relationship is navigated one only accesses the members
> of the [B] set that are specifically related to the A in hand. IOW, the
> navigation always accesses /only/ the relevant members of [B] for the A
> in hand.
>
> That contrasts with the RDB-style class relationship where one
> potentially accesses every member of [B] to locate the members of [B]
> related to the A in hand. One can reduce the exhaustive search by
> providing a special index on the [B] set that is ordered for the
> specific query, but one still has an O(ln N) search. One could also
> provide a "custom" index for every member of the [A] set just like the
> OO paradigm does routinely, but that would run out of resources pretty
> quickly.

But still nobody wants to go back to network databases. In an average
enterprise application there are simply too many ways of searching the
data to maintain a pointer set for every criterion. When you search on
multiple criteria (using predicate logic), the pointer-based approach
provides little help.
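
The multiple-criteria point can be made concrete. A minimal sketch (the table and column names are invented, not from the thread): one generic SQL engine answers an ad hoc predicate combining several criteria, whereas a pointer-based design would need a pre-built pointer set for each anticipated access path.

```python
import sqlite3

# One generic engine serves predicates that were never anticipated;
# a network DB would need a maintained pointer set per access path.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, owner TEXT, hours REAL)")
con.executemany("INSERT INTO task VALUES (?, ?, ?)",
                [(1, "ann", 2.0), (2, "bob", 5.0), (3, "ann", 7.5)])

# Ad hoc query combining two criteria with predicate logic.
rows = con.execute(
    "SELECT id FROM task WHERE owner = ? OR hours > ? ORDER BY id",
    ("ann", 6.0)).fetchall()
```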

> When designing a large OO application one /always/ looks for ways to
> eliminate searches, especially class-based searches. Using a
> class-based search without very good justification is a good way to get
> burned at the stake by OO reviewers.

I have seen class-based searches in OO applications: traversing all
objects to find the matching ones. It worked as long as all the data
could be kept in memory. But when 16GB wasn't enough, we had to rewrite
the application to use select statements instead.

O(ln N) is still much better than O(N).
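
For what it's worth, the complexity gap is easy to demonstrate. A sketch (the data is made up): an unindexed class-based search must scan all N objects, while a search against a sorted index is logarithmic.

```python
import bisect

# A sorted "index" over some attribute of 10,000 stored objects.
keys = [3 * i for i in range(10_000)]

def linear_search(keys, key):
    """O(N): traverse every object, as in an in-memory class-based search."""
    for i, k in enumerate(keys):
        if k == key:
            return i
    return -1

def indexed_search(keys, key):
    """O(log N): binary search against the sorted index."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1

position = indexed_search(keys, 3 * 1234)
```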

Fredrik Bertilsson

H. S. Lahman

unread,
Nov 12, 2006, 2:35:58 PM11/12/06
to
Responding to AndyW...

>>I'm not sure I buy that. More easily than what? The RDM normalization
>>can be applied beyond the RDB's table/tuple paradigm. ISAM files,
>>CODASYL files, and other data representations can be normalized using
>>the same basic rules. And OO Class Models are routinely normalized as
>>part of the basic paradigm methodology.
>
>
> I do not believe there is any normalisation in the OO paradigm, nor is
> there as far as I can see any need for it since technically there is
> no separation of behaviour from data.

Actually there is. Every Class Model should be normalized. Most OOA/D
authors only mention this in passing because it is complicated to talk
about 3NF when objects do not usually have explicit identity. Talking
about 3NF in the OO context is further complicated by the use of ADTs
and presence of behaviors. Nonetheless all OOA/D authors provide a
suite of rules for identifying classes and responsibilities that
essentially distill 3NF. The objects /do/ have identity, albeit often
implicit, and their properties depend on that identity in the same
manner that Data Model properties depend on explicit tuple identity.

In a Data Model there is only one sort of property: data. In a Class
Model there happen to be two sorts of properties: knowledge
responsibilities and behavior responsibilities. But they are still
properties and subject to normalization because they are related to
object identity. Thus one cannot have exactly the same behavior in
objects of two different classes that are unrelated by generalization in
a well-formed OO application.

IOW, the same set theory underlies both the Class Model and the Data Model.
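
The analogy can be sketched in code (the class names are invented for illustration): duplicating the same behavior in two unrelated classes is the OO counterpart of repeating a fact in two tables, and the fix in both cases is to factor the shared property out, here into a generalization.

```python
class Account:
    """Generalization: the shared responsibility appears exactly once."""
    def __init__(self, balance: float) -> None:
        self.balance = balance

    def apply_interest(self, rate: float) -> None:
        self.balance *= 1 + rate

class Savings(Account):
    """Inherits the shared behavior instead of cloning it."""

class MoneyMarket(Account):
    """May refine the behavior, but does not duplicate it."""
    def apply_interest(self, rate: float) -> None:
        super().apply_interest(rate + 0.001)  # hypothetical rate premium

s, m = Savings(100.0), MoneyMarket(100.0)
s.apply_interest(0.05)
m.apply_interest(0.05)
```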

[BTW, there is a lot of distinction between knowledge and behavior
properties in OOA/D. Among other things, knowledge access is assumed to
be synchronous while behavior access is assumed to be asynchronous. One
reason is that the asynchronous model is more general than the 3GL
synchronous model (procedural message passing). Another is that
separation of knowledge and behavior access is crucial to ensuring data
integrity in an OO context.]

> OO however isn't notable for its conduciveness towards high performance
> (it rather focuses on natural performance). Hence people will often
> re-organise their data to eke some more performance out of it. How
> they mess up or organise their data isn't, as far as I am aware, part
> of the OO paradigm.

OOPLs tend to carry inherent performance penalties because of their high
level of abstraction. For exactly this reason, full code generators for
OOA models typically target straight C for the implementation. Once one
does that, there is no intrinsic performance penalty.

However, the way one optimizes for nonfunctional requirements usually
differs between the problem solution and access to stored data. That's
because the paradigm for storing the data requires its own optimization.
So when accessing, say, an RDB, one optimizes joins and provides caching
in a quite different way than one does for, say, flat ISAM files.

In addition, 3GLs allow one to tailor solutions to specific problems.
So one will almost always have better performance in a 3GL program than
one would if one used, say, RDB-style relationships. That's because the
RDB paradigm is optimized for generic, ad hoc access to the data by many
different clients solving different problems. That is an important
reason why one needs to isolate the persistence paradigm in a subsystem
for non-CRUD/USER applications. That separation of concerns allows one
to optimize both problems efficiently.

NENASHI, Tegiri

unread,
Nov 12, 2006, 4:28:56 PM11/12/06
to
"H. S. Lahman" <h.la...@verizon.net> wrote in
news:1aJ5h.2229$bj1.1326@trndny05:

>
> And when was the last time you saw an RDB that stored a telephone
> number as individual fields?

Pardon me, but the example of the phone number is silly.

Modern relational theory permits storing values of a user-defined type:
it does not limit itself to primitive data types. Several modern
databases implement the user-defined type. One can utilize the
user-defined type and extract the area code in a fashion that is
theoretically pure. If the database does not offer the user-defined
type, one can emulate the type with a string datatype of the proper
format: that is what one does in practice. What is there to discuss?

> If the fields are separated they need to
> be in their own table to avoid 3NF problems. That's because the
> simple domains are individually dependent solely on the identity of
> the telephone number, which /is/ the telephone number.

You must make yourself familiar with modern relational theory before
you make such ridiculous declarations. The user-defined datatype has
been utilized in modern databases for a decade.

One of the examples of C.J. Date:

TYPE POINT { X NUMERIC, Y NUMERIC } ;

One can create a phone datatype: TYPE PHONE { AREA INTEGER,
LOCAL_OFFICE INTEGER, EXTENSION INTEGER }, or an analogue that suits
the requirements.
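
In a host language the same idea looks roughly like this (a sketch, not Tutorial D): the phone number is a single value of a user-defined type whose components remain reachable through operators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phone:
    """User-defined type in the spirit of Date's TYPE POINT example."""
    area: int
    local_office: int
    extension: int

    def formatted(self) -> str:
        # Component access via an operator, not by parsing a bare string.
        return f"({self.area}) {self.local_office}-{self.extension:04d}"

p = Phone(area=617, local_office=555, extension=123)
```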


...

>
> The relational data model is a mathematical model. The RDB paradigm
> is a way of applying that model to practical data storage.

There is the relational data model, which is founded on rigorous
mathematics; there is no "RDB paradigm". One must read the works of
C.J. Date to comprehend the relational model. The real SQL database,
one regrets to say, is an imperfect "way of applying that model to
practical data storage", but that is not a "paradigm": it is called an
imperfect realization of the relational model.


>
>>>Consider 6-32 screws
>>>in an inventory. They are effectively clones without explicit
>>>identity values but they are still uniquely identifiable in the
>>>problem space. So long as the object corresponding to each screw has
>>>a unique address, it is identifiable in the same sense that the
>>>physical screws in the problem space are. The only way you can avoid
>>>2/3NF problems for that situation in an RDB is by adding an
>>>artificial explicit identity (e.g., autonumber) to the tuple itself.
>>
>>
>> And what is the normalization problem with that?
>
> I said specifically that one /avoids/ the normalization problem
> through the kludge of providing an explicit tuple identity that does
> not exist in the problem space.

Why is it that one needs the identity of the screw? When one needs an
identity, as for a car, the identity is the candidate key, which can be
the VIN or some other attribute or set of attributes.

>
>>>>>However, I don't see that as being very relevant. My point is that
>>>>>the application's problem solution doesn't care how the data is
>>>>>stored.
>>>>
>>>>Neither do the relational model or SQL.
>>>
>>>The RDM, yes. But try using SQL on flat sequential files or an OODB.
>>
>>
>> SQL using flat files would be possible and is done all the time. Most
>> SQL databases uses flat files for persistence. Actually JDO claim
>> that an hybrid SQL language (JDO-SQL) can be used on non-relational
>> databases. Obviously the underlying database need to support
>> relational calculus or the JDO product has to implement it on top.
>
> I'm talking about SQL and flat sequential files where there is no
> embedded tuple identity and the files are read by character or by
> block.
> SQL is meaningless in that context.

I cannot comprehend this: why is it that one needs to read the flat
file with SQL? It is without reason. The flat file and the relational
model are like apples and oranges.

....

Your comprehension of SQL database performance is not good. There are
lots of ways to join relational tables in addition to row-by-row nested
loops: hash join, merge join, semi-join, et cetera. You would benefit
from making yourself familiar with, for example, Oracle or SQL Server.
The modern SQL database operates on sets of rows very effectively using
multiblock disk reads, while your network datamodel traverses the graph
node by node or block by block: that is not efficient.
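
As an illustration of the first of those strategies (a toy in-memory sketch, not how a real optimizer does it): a hash join builds a table on one input and probes it with the other, roughly O(N + M) where a naive nested loop is O(N * M).

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Toy hash join: hash one input (build phase), stream the other (probe)."""
    table = defaultdict(list)
    for row in build:                       # build phase
        table[row[build_key]].append(row)
    return [{**b, **p}                      # probe phase: merge matching rows
            for p in probe
            for b in table.get(p[probe_key], [])]

depts = [{"dept_id": 1, "dept": "ops"}, {"dept_id": 2, "dept": "dev"}]
emps = [{"emp": "a", "dept_id": 1}, {"emp": "b", "dept_id": 2}]
joined = hash_join(depts, emps, "dept_id", "dept_id")
```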

In sum, your comprehension of the relational model and the SQL database
is not adequate to critique either one.

--
Tegi

H. S. Lahman

unread,
Nov 13, 2006, 9:32:57 AM11/13/06
to
Responding to Frebe73...

>>>Your definition of "store" is pretty wide, even variable assignments
>>>are obviously included in your definition of store. I can't see how the
>>>problem solution could be independent from how the data is stored. But
>>>the problem solution can be independent from how data is persisted.
>>
>>That is pretty much the point. All the solution cares about is storing
>>a pile of data in terms of identity that it will use later to access
>>that same pile of data. How the data is actually stored is not relevant
>>to the solution so one can use a very broad definition of data storage.
>
>
> The solution cares about data and how it is structured. The relational
> model is one way of structuring data. The OO model (or the network
> model) is another way. The mechanism for retrieving data, relational
> calculus or OO pointer traversal, is very important for the solution.
> According to your argumentation, a map is the only data structure
> needed.

All very true. But the solution (outside CRUD/USER) usually needs the
data structured in a different way for optimization than the storage
paradigm.

>>>If a RDBMS was mainly used for persistence, your statement would be
>>>true. But switching from a RDBMS to a OODBMS does not only include
>>>switching persistence mechanism. You also switch data model, drops
>>>relational calculus, drops predicate logic for data retrieval, and many
>>>other things.
>>
>>Exactly my point -- the Data Model changes, not the Class Model for the
>><OO> problem solution. It is up to the application's persistence access
>>subsystem to map the solution Class Model to the persistence Data Model
>>de jour. IOW, the subsystem interface decoupling is provided so that
>>only the persistence access subsystem needs to understand the Data Model
>>used to actually store the data.
>
>
> The Class Model for the problem solution does not change? I am
> currently rewriting the class model in an application (because someone
> decided to load the entire object graph at startup from the database,
> which caused terrible performance after a while).

Huh? Usually one looks for ways to read/write as much as possible in a
transaction because the DB is the performance bottleneck for
non-CRUD/USER applications. That's why one has read and write caches on
the client side.

Note that memory-mapped OODBs are /designed/ to make that optimization
the default.

> Classes should be used for defining data types, not for creating data
> structures. The latter is already done by relations. If you don't have
> competing models for data structure, mapping is not necessary.

In the OO paradigm classes describe problem space entities with both
knowledge and behavior. That alone is quite different from a Data Model
that only describes data. Because of the need to resolve dynamics one
needs to tailor the classes to the specific problem in hand.

So recasting the Class Model to make it look like a storage Data Model
is a serious mistake in an OO context.

>>>>>>Thus the application solution always requests, "Save this pile of data I
>>>>>>call X" and "Give me the pile of data I saved before as X." The
>>>>>>persistence access subsystem maps the X identity and the pile of data
>>>>>>into records in ISAM files, RDB tables, clay tablets, or whatever.
>>>>>
>>>>>
>>>>>Is X always an identifier? Should you be allowed to use any predicate
>>>>>logic in this interface ("give me the pile of data I saved before
>>>>>having X=5 or Y=6")?
>>>>
>>>>X=5 is just another way of defining identity.
>>>
>>>
>>>Does this answer imply that the "persistence" subsystem can only
>>>deliver data belonging to a given identity? Predicate logic can not be
>>>a part of this interface? If you have a SQL database, you can only use
>>>select statements using primary key columns in the where condition? Why
>>>do you need a SQL database at all in that case? Something like Berkeley
>>>DB would do the job much better.
>>
>>The decoupling is provided by a pure message-based interface -- {message
>>ID, <data packet>}. The problem solution uses the message ID to
>>determine how to map the data packet into its object attributes. The
>>persistence access subsystem uses the message ID to determine how to map
>>the data packet into ISAM tables, RDB records, SQL queries, or whatever.
>> Each has its own unique view of the data in the data packet based upon
>>the identity in message.
>
>
> Ok, my original assumption was obviously true. All you need is a
> persistent map, with two operators, put and get. Using a SQL database
> in this context is just stupid.

You finally seem to Get It. The interface to the persistence access
subsystem that the problem solution needs is basically a simple put/get.
But why does that make actually storing the data in an RDB and using
SQL to access it stupid?
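
Lahman's interface can be sketched like this (the names are mine, not from the thread): the solution only ever says put/get with an identity, and only the subsystem behind the interface knows what the storage actually is. Here the backend is an in-memory dict; it could equally map the packet to RDB rows or ISAM records.

```python
class PersistenceAccess:
    """Decoupling subsystem: {message_id, data_packet} in, data_packet out."""
    def __init__(self, backend=None):
        # The backend is the only place that knows the storage paradigm.
        self._backend = {} if backend is None else backend

    def put(self, message_id, packet):
        """'Save this pile of data I call X.'"""
        self._backend[message_id] = dict(packet)

    def get(self, message_id):
        """'Give me the pile of data I saved before as X.'"""
        return self._backend.get(message_id)

store = PersistenceAccess()
store.put(("Customer", 42), {"name": "Ada", "balance": 10.0})
found = store.get(("Customer", 42))
```

The design point is that swapping the backend changes nothing on the solution side of the interface.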

H. S. Lahman

unread,
Nov 13, 2006, 10:35:30 AM11/13/06
to
Responding to Frebe73...

>>>But no existing production database qualify as a relational database.
>>>That is my some people prefer using the name "SQL database" instead of
>>>RDB.
>>
>>Which is pretty much my point. RDM and RDB are not synonyms and an RDB
>>is a specific implementation of the RDM (however imperfect some people
>>may view it).
>
>
> So if there existed a relational database that fulfills Codd's 12
> rules and a standardized language for true relational calculus, would
> it be OK to use it without hiding it from the rest of the application?

No, it depends upon what the implementation of it was. As I've pointed
out several times, Codd's rules underlie both Class Models and Data
Models but those models are constructed quite differently. It is the
implementation of those rules in particular database that creates the
conflict with the problem solution view.

>>>>Actually, 1NF is much more commonly broken in RDBs than in Class Models.
>>>> A classic example is a telephone number, which will almost always be
>>>>stored in the RDB as a single number but if the elements of the number
>>>>(e.g., area code) are important to the problem in hand, they will always
>>>>be broken out as distinct attributes in a Class.
>>>
>>>What is why you should not save telephone number as a single number.
>>
>>And when was the last time you saw an RDB that stored a telephone number
>>as individual fields? If the fields are separated they need to be in
>>their own table to avoid 3NF problems. That's because the simple
>>domains are individually dependent solely on the identity of the
>>telephone number, which /is/ the telephone number.
>
>
I have never seen any application that extracts parts of the telephone
number, which is why you can use the full number without violating 1NF.
But if parts of the number were extracted, they should be in separate
columns instead.

And that breaks 1NF because a telephone number is not a simple domain
_in the problem space_. The DBA is deliberately denormalizing by
choosing a level of abstraction that is higher than the problem space
where the data exists.

Contrary to your original assertion, that sort of 1NF denormalization
is much more common in a database than in an OO application. That's
because the OO application abstracts to the problem in hand, so if the
telephone number's elements are needed to solve the problem they will
/always/ be abstracted as separate simple domains.

>>>>It is actually more versatile that the RDB paradigm.
>>>
>>>What are the difference between the RDB paradigm and the RM paradigm?
>>
>>The relational data model is a mathematical model. The RDB paradigm is
>>a way of applying that model to practical data storage.
>
>
> There are different vendors trying to implement the RM without
> success. But there is no RDB paradigm definition. If you have pointers
> to that definition, you are welcome.

Spare me the Topmind forensic ploy of being deliberately obtuse. You
don't need a documentation pointer to know that an RDB stores data in
tuples within tables subject to Normal Form rules, defines relationships
between tables, and implements those relationships at the table level.

>>>>Consider 6-32 screws
>>>>in an inventory. They are effectively clones without explicit identity
>>>>values but they are still uniquely identifiable in the problem space.
>>>>So long as the object corresponding to each screw has a unique address,
>>>>it is identifiable in the same sense that the physical screws in the
>>>>problem space are. The only way you can avoid 2/3NF problems for that
>>>>situation in an RDB is by adding an artificial explicit identity (e.g.,
>>>>autonumber) to the tuple itself.
>>>
>>>And what is the normalization problem with that?
>>
>>I said specifically that one /avoids/ the normalization problem through
>>the kludge of providing an explicit tuple identity that does not exist
>>in the problem space.
>
>
> Didn't you say that class models were normalized in a routine manner?
> Now you avoid normalization? If I don't remember wrongly, there is
> nothing in the RM or a SQL database that forces you to define primary
> keys either.

You have brought out three Topmind-style forensic ploys at once here.

First, you deflect the discussion from clones and identity to the
details of autonumber keys.

Second, you put words in my mouth to the effect that using an autonumber
key was a normalization problem.

Third, you respond to my clarification that it was not with a completely
non sequitur assertion that I am avoiding normalization in Class Models.

I don't play those games with Bryce and I won't play them with you. Ta-ta.

fre...@gmail.com

unread,
Nov 13, 2006, 12:46:21 PM11/13/06
to
> > The Class Model for the problem solution does not change? I am
> > currently rewriting the class model in an application (because someone
> > decided to load the entire objects graph at startup from the database
> > which caused terrible performance after a while).
>
> Huh? Usually one looks for ways to read/write as much as possible in a
> transaction because the DB is the performance bottleneck for
> non-CRUD/USER applications.

Read/write as much as possible in a transaction? Does this mean that
you think it is a good idea to load the entire object graph at
application startup? The main reason the DB might be a performance
bottleneck is that the network connection between the application and
the database server is slow. This can be avoided either by using stored
procedures or by using an application server. Caching on the client is
always a bad idea.

> That's why one has read and write caches on
> the client side.

Caching is one of the features a DBMS does much better than you can
manage yourself in your application. Cache products normally only allow
you to find data by id. Caches need to be replicated in case of
clustering. Caches outside the DBMS cannot be fully synchronized with
database transactions.

> Note that memory-mapped OODBs are /designed/ to make that optimization
> the default.

The same applies to SQL databases.

> > Ok, my original assumption was obviously true. All you need is a
> > persistent map, with two operators, put and get. Using a SQL database
> > in this context is just stupid.
>
> You finally seem to Get It. The interface to the persistence access
> subsystem that the problem solution needs is basically a simple put/get.
> But why does that make actually storing the data in an RDB and using
> SQL to access it stupid?

You don't need relational calculus or predicate logic to find something
by an id. A much simpler database would be enough.
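
In Python terms, something as small as the standard library's dbm-backed shelve (a Berkeley-DB-style key/value store) covers pure find-by-id. This is only a sketch of the point, with made-up keys and values.

```python
import os
import shelve
import tempfile

# A persistent map is all that find-by-id requires: put and get by key.
path = os.path.join(tempfile.mkdtemp(), "entries")
with shelve.open(path) as db:
    db["entry:42"] = {"task": "design", "hours": 3.0}

with shelve.open(path) as db:          # reopen: the data persisted
    entry = db["entry:42"]
```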

Fredrik Bertilsson

Thomas Gagne

unread,
Nov 13, 2006, 1:08:34 PM11/13/06
to
fre...@gmail.com wrote:
> SQL is about data management, relational calculus and predicate logic.
> Persistence is something else.
>
> What is your definition of "persistence"?
>
Are you asking me? I think I may have poorly <snip>ped the message I
was responding to; I intended to respond to Lahman responding to you. I
was trying to add to the question of how you create an application that
is insulated from the advances of DBMSes. If RDBs aren't the final word,
what should we be doing today as coders to make the next leap less
painful? I suggest leaving SQL out of the application as much as
possible.

If you still want my definition of persistence let me know. I'll work
on something worth quoting. :-)

H. S. Lahman

unread,
Nov 13, 2006, 1:10:40 PM11/13/06
to
Responding to Tegiri...

>>And when was the last time you saw an RDB that stored a telephone
>>number as individual fields?
>
>
> Pardon me but the example of the phone number is silly.
>
> The modern relational theory permits to store values of user-defined type:
> it does not limit itself by primitive data types. Several modern databases
> instrument the user-defined type. One can utilize the user-defined type
> and extract the area code in the fashion that is theoretically pure. If
> the database does not offer the user-defined type, one can emulate the
> type with the string datatype of the proper format: that is what one
> does in practice. What is there to discuss ?
>
>>If the fields are separated they need to
>>be in their own table to avoid 3NF problems. That's because the
>>simple domains are individually dependent solely on the identity of
>>the telephone number, which /is/ the telephone number.
>
>
> You must make yourself familiar with modern relational theory before
> you make such ridiculous declarations. The user-defined datatype has
> been utilized in modern databases for a decade.
>
> One of the examples of C.J. Date:
>
> TYPE POINT { X NUMERIC, Y NUMERIC } ;
>
> One can create the phone datatype: type phone {area integer, local_office,
> extension integer} or analogue that suits the requirements.

The RDB schema is supposed to reflect the nature of the data _in the
problem space_. A telephone number is not a simple domain in the
problem space. The fact that SQL and RAD tools provide mechanisms for
extracting the elements of the phone number doesn't change that.

Defining an ADT is fine. But if one does that, then the RDB must
support ADTs and that ADT needs to be defined in the database schema.
In most databases the DBA doesn't do that because SQL already provides a
mechanism for extracting the elements (so long as the DBA stores the
telephone number as a text string). IOW, the ADT is usually defined by
the client at the SQL access level, not in the RDB schema.

Note that the point in this subthread context is simply that 1NF is
often violated in RDB schemas.

>>The relational data model is a mathematical model. The RDB paradigm
>>is a way of applying that model to practical data storage.
>
>
> There is the relational data model that is founded on the rigorous
> mathematiques; there is no "RDB paradigm". One must read the works of C.J.
> Date to comprehend the relational model. The real SQL database, one
> regrets to say, is an imperfect "way of applying that model to practical
> data storage" but that is not a "paradigm": it is called an imperfect
> realization of the relational model.

I gave up on reading Date after the Manifesto. I don't want to go down
that rabbit hole.

The implementation of the RDM in RDBs in terms of keyed tuples, tables,
and table relationships is quite well-defined. Organizing and storing
data in such implementations is a paradigm, Date notwithstanding.

>>>>Consider 6-32 screws
>>>>in an inventory. They are effectively clones without explicit
>>>>identity values but they are still uniquely identifiable in the
>>>>problem space. So long as the object corresponding to each screw has
>>>>a unique address, it is identifiable in the same sense that the
>>>>physical screws in the problem space are. The only way you can avoid
>>>>2/3NF problems for that situation in an RDB is by adding an
>>>>artificial explicit identity (e.g., autonumber) to the tuple itself.
>>>
>>>
>>>And what is the normalization problem with that?
>>
>>I said specifically that one /avoids/ the normalization problem
>>through the kludge of providing an explicit tuple identity that does
>>not exist in the problem space.
>
>
> Why is it that one needs to have the identity of the screw ? When one
> needs to have the identity like of a car, the identity is the candidate
> key that can be the VIN or one other attribute or set of attributes.

Because each screw in the inventory bin is uniquely identifiable in the
problem space even if it does not have an explicit and unique name.

>>>>>>However, I don't see that as being very relevant. My point is that
>>>>>>the application's problem solution doesn't care how the data is
>>>>>>stored.
>>>>>
>>>>>Neither do the relational model or SQL.
>>>>
>>>>The RDM, yes. But try using SQL on flat sequential files or an OODB.
>>>
>>>
>>>SQL using flag files would be possible and is done all the time. Most
>>>SQL databases uses flat files for persistence. Actually JDO claim
>>>that an hybrid SQL language (JDO-SQL) can be used on non-relational
>>>databases. Obviously the underlying database need to support
>>>relational calculus or the JDO product has to implement it on top.
>>
>>I'm talking about SQL and flat sequential files where there is no
>>embedded tuple identity and the files are read by character or by
>>block.
>> SQL is meaningless in that context.
>
>
> I can not comprehend this: why is it that one needs to read the flat file
> with the SQL ? It is without reason. The flat file and the relational
> model is like the apple and the orange.

Please explain that to Bertilsson. That is exactly my point. Forcing
the problem solution to employ SQL constructs binds the solution to a
particular storage implementation.

I am only comparing RDB mechanisms, particularly commercial RDB
mechanisms, to OO processing. And then I am focusing only on the way
relationships are instantiated. As I pointed out, there are mechanisms
for emulating OO tuple-based relationship instantiation in RDBs. (The
join examples you cite are what I was referring to in the last quoted
sentence.) However, techniques like "canned" joins defeat the purpose
of supporting ad hoc queries because they incorporate problem-specific
solutions in the database structure. As a result, doing so too often
just bloats the database, can lead to performance problems, and can
become a maintenance nightmare on the application side.

Thomas Gagne

unread,
Nov 13, 2006, 1:28:30 PM11/13/06
to
H. S. Lahman wrote:
> Responding to Gagne...
>
> <snip>

>
> In terms of the classic layered models used for CRUD/USER
> infrastructures, the real issue is what goes on in the Business
> layer. If the code for the Presentation and Data layers dominate the
> application, then one is in the CRUD/USER realm. Then the Business
> layer is usually just providing a mapping function from one view to
> the other.
>
> However, once the code in the Business layer dominates the
> application, one is out of the realm of CRUD/USER.
It's not that 70-80% of the code is for queries, it's that 70-80% of its
utilization is queries. Even if you ignore queries and look at the
update transactions themselves through your favorite statistics
gatherer--the ratio of logical reads to writes can exceed 10-1.

To be clear, I'm actually not talking about the code implementing a
presentation layer. I'm thinking only of what's implemented in the
business layer. Unless the presentation layer is allowed direct access
to the database, it must go through another layer--typically the
business layer. Regardless the presentation (ATM, POS, Teller,
Internet, batch program) transactions are implemented in the business
layer (or at least they are here--YMMV).

So it's less how the presentation layer is coded as it is how the
business layer is utilized.

> <snip>


>
>>>> Additionally, relational data models can be more easily proven
>>>> correct--or correct enough--before an investment is made in coding.
>>>
>>>
>>> I'm not sure I buy that. More easily than what? The RDM
>>> normalization can be applied beyond the RDB's table/tuple paradigm.
>>> ISAM files, CODASYL files, and other data representations can be
>>> normalized using the same basic rules. And OO Class Models are
>>> routinely normalized as part of the basic paradigm methodology.
>>>
>>> However, I don't see that as being very relevant.
>>
>> I guess the point was relational DBs have at least relational
>> calculus behind them, and we can always ask a design a) is all the
>> necessary data saved and b) are all the relationships represented?
>>
>> Granted this is anecdotal, but the experienced DBs I've known would
>> all end up at nearly identical DB designs given the same
>> requirements. I've not seen that consistency with OO designers.
>
> That's true, but the OO designer is dealing with a much bigger problem
> because of the need for behaviors. The Data Model only needs to worry
> about data while the Class Model is just one <static> piece of a much
> larger, largely dynamic structure.

Agreed, so why not exploit the data model's ability to be proven correct
and work up from there?

fre...@gmail.com

unread,
Nov 13, 2006, 1:28:01 PM11/13/06
to
> > So if there existed a relational database that fulfills Codd's 12
> > rules and a standardized language for true relational calculus,
> > would it be OK to use it without hiding it from the rest of the
> > application?
>
> No, it depends upon what the implementation of it was. As I've pointed
> out several times, Codd's rules underlie both Class Models and Data
> Models but those models are constructed quite differently. It is the
> implementation of those rules in particular database that creates the
> conflict with the problem solution view.

Rule 1 : The information Rule.

"All information in a relational data base is represented explicitly at
the logical level and in exactly one way - by values in tables."

This rule underlies class models?

> >>>>It is actually more versatile that the RDB paradigm.
> >>>
> >>>What are the difference between the RDB paradigm and the RM paradigm?
> >>
> >>The relational data model is a mathematical model. The RDB paradigm is
> >>a way of applying that model to practical data storage.
> >
> > There are different vendors trying to implement RM without success. But
> > there are no RDB paradigm definition. If you have pointers to that
> > definition, you are welcome.
>
> Spare me the Topmind forensic ploy of being deliberately obtuse. You
> don't need a documentation pointer to know that an RDB stores data in
> tuples within tables subject to Normal Form rules, defines relationships
> between tables, implements those relationships at the table level.

The concepts you describe here are parts of the relational model. There
are other (minor) issues that make real databases break some of Codd's
12 rules. There is no RDB paradigm that is different from the RM
paradigm. You just invented it in a desperate attempt to hijack the
relational model. The funny thing is that you pretend that your pointer
mess follows the relational model, while at the same time strongly
arguing against fundamental parts of it.

> >>>>Consider 6-32 screws
> >>>>in an inventory. They are effectively clones without explicit identity
> >>>>values but they are still uniquely identifiable in the problem space.
> >>>>So long as the object corresponding to each screw has a unique address,
> >>>>it is identifiable in the same sense that the physical screws in the
> >>>>problem space are. The only way you can avoid 2/3NF problems for that
> >>>>situation in an RDB is by adding an artificial explicit identity (e.g.,
> >>>>autonumber) to the tuple itself.
> >>>
> >>>And what is the normalization problem with that?
> >>
> >>I said specifically that one /avoids/ the normalization problem through
> >>the kludge of providing an explicit tuple identity that does not exist
> >>in the problem space.
> >
> > Didn't you say that class models were normalized in a routine manner?
> > Now you avoid normalization? If I remember correctly, there is
> > nothing in the RM or an SQL database that forces you to define primary
> > keys either.
>
> You have brought out three Topmind-style forensic ploys at once here.
>
> First, you deflect the discussion from clones and identity to the
> details of autonumber keys.
>
> Second, you put words in my mouth to the effect that using an autonumber
> key was a normalization problem.
>
> Third, you respond to my clarification that it was not with a complete
> non sequitur: the assertion that I am avoiding normalization in Class Models.

You said: "one /avoids/ the normalization problem".

You have not been able to show in any way how 1NF, 2NF, or 3NF can be
applied to class models. The only thing you can do is to give examples
of relational data models that don't satisfy the different normal
forms.
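[The screw-inventory example from earlier in this exchange can be made concrete. A minimal sketch, assuming Python objects stand in for problem-space screws (the `Screw` class and the row tuples are invented for illustration): object identity distinguishes clones without any identity attribute, whereas a set-oriented representation collapses identical rows unless a surrogate key is added.]

```python
# Two physically identical 6-32 screws as objects: no identity
# attribute, yet each instance is uniquely identifiable by address.
class Screw:
    def __init__(self, size):
        self.size = size

a = Screw("6-32")
b = Screw("6-32")
assert a is not b          # distinct object identities
assert vars(a) == vars(b)  # identical attribute values (clones)

# The relational analogue: identical tuples merge in a set-based
# model, so a surrogate (autonumber) key is added to keep them apart.
rows_without_key = {("6-32",), ("6-32",)}
rows_with_key = {(1, "6-32"), (2, "6-32")}
assert len(rows_without_key) == 1  # the two clones collapse into one
assert len(rows_with_key) == 2     # the surrogate key preserves both
```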

Fredrik Bertilsson

H. S. Lahman

unread,
Nov 14, 2006, 5:40:31 PM11/14/06
to
Responding to Gagne...

>> In terms of the classic layered models used for CRUD/USER
>> infrastructures, the real issue is what goes on the the Business
>> layer. If the code for the Presentation and Data layers dominate the
>> application, then one is in the CRUD/USER realm. Then the Business
>> layer is usually just providing a mapping function from one view to
>> the other.
>>
>> However, once the code in the Business layer dominates the
>> application, one is out of the realm of CRUD/USER.
>
> It's not that 70-80% of the code is for queries, it's that 70-80% of its
> utilization is queries. Even if you ignore queries and look at the
> update transactions themselves through your favorite statistics
> gatherer--the ratio of logical reads to writes can exceed 10-1.
>
> To be clear, I'm actually not talking about the code implementing a
> presentation layer. I'm thinking only of what's implemented in the
> business layer. Unless the presentation layer is allowed direct access
> to the database, it must go through another layer--typically the
> business layer. Regardless the presentation (ATM, POS, Teller,
> Internet, batch program) transactions are implemented in the business
> layer (or at least they are here--YMMV).

Actually, I'm talking about the Business layer as well. If a large
fraction of that code in that layer is just constructing queries,
managing DB transactions, and decoding datasets, then I would say that
is definitely CRUD/USER.

OTOH, if the central objects of the Business layer have a bunch of
responsibilities that capture business rules and policies and those
objects need to collaborate in complex ways to solve some problem, then
one is moving out of the CRUD/USER realm. At that point the DB
activities become peripheral (i.e., just initialize the objects that do
the real solution work).
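[The contrast drawn above can be caricatured in a few lines. A hypothetical sketch, not anyone's production design -- the class names, the `db.query` call, and the overdraft policy are all invented: in the first shape the "business" object mostly shuttles data to and from the store; in the second it owns rules and the store only initializes it.]

```python
class AccountCrud:
    """CRUD/USER shape: mostly builds queries and decodes result sets."""
    def __init__(self, db):
        self.db = db

    def balance(self, acct_id):
        row = self.db.query("SELECT balance FROM account WHERE id = ?", acct_id)
        return row[0]


class Account:
    """Rule-heavy shape: the object owns business policy; persistence
    merely initializes balance and overdraft_limit."""
    def __init__(self, balance, overdraft_limit):
        self.balance = balance
        self.overdraft_limit = overdraft_limit

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance - amount < -self.overdraft_limit:
            raise ValueError("overdraft limit exceeded")
        self.balance -= amount
        return self.balance
```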

I'm just sort of surprised that fraction would be so high for an online
trading system. I would expect the <trade> transaction rules to be
fairly complicated (split trades, posting tax info, audit trails, etc.).
OTOH, if one had a front end that was essentially an order entry
system that effectively batched orders for some other software to
process, there wouldn't be much besides input checking and data integrity
validation.

Also, as I indicated, once one isolates the persistence mechanisms, it
is fairly common to find that one can greatly simplify the query
processing because one can abstract the invariants of the persistence
mechanisms themselves without being distracted by problem-specific issues.

>>>>> Additionally, relational data models can be more easily proven
>>>>> correct--or correct enough--before an investment is made in coding.
>>>>
>>>>
>>>>
>>>> I'm not sure I buy that. More easily than what? The RDM
>>>> normalization can be applied beyond the RDB's table/tuple paradigm.
>>>> ISAM files, CODASYL files, and other data representations can be
>>>> normalized using the same basic rules. And OO Class Models are
>>>> routinely normalized as part of the basic paradigm methodology.
>>>>
>>>> However, I don't see that as being very relevant.
>>>
>>>
>>> I guess the point was relational DBs have at least relational
>>> calculus behind them, and we can always ask a design a) is all the
>>> necessary data saved and b) are all the relationships represented?
>>>
>>> Granted this is anecdotal, but the experienced DBs I've known would
>>> all end up at nearly identical DB designs given the same
>>> requirements. I've not seen that consistency with OO designers.
>>
>>
>> That's true, but the OO designer is dealing with a much bigger problem
>> because of the need for behaviors. The Data Model only needs to worry
>> about data while the Class Model is just one <static> piece of a much
>> larger, largely dynamic structure.
>
> Agreed, so why not exploit the data model's ability to be proven correct
> and work up from there?

That's what I don't buy -- that the Data Model is somehow uniquely
validatable.

An OO Class Model is routinely normalized using the RDM and can be
validated in the same manner as a Data Model. And -- if one is a
translationist like me B-) -- the entire OOA model is itself executable.
[It is routine to run exactly the same test suite for functional
requirements against both the model and the generated executable; all
that changes is the test harness. When one runs the test suite against
the model, one is validating the correctness of the solution. When one
runs the test suite against the executable, one is just validating the
transformation process.]

Thomas Gagne

unread,
Nov 15, 2006, 9:53:49 AM11/15/06
to
H. S. Lahman wrote:
> <snip>

> That's what I don't buy -- that the Data Model is somehow uniquely
> validatable.
>
> An OO Class Model is routinely normalized using the RDM and can be
> validated in the same manner as a Data Model.
Are you saying your OO models are normalized the same way your RDB
models are? I guess I'm wondering if you're thinking of systems with
1:1 ratio of tables to classes.

Relational Set Theory has calculus. The model can be tested against the
normal forms (as deep as you want), and finally it's far easier to have
a complete test showing all the data is stored rather than having some
missing.

Object models can be infinitely complex and there's no single yardstick
of correctness. Some designers create deep class hierarchies, others
create wide ones. Some model collections, and others don't. Most
languages thought to be OO offer varying degrees of support for OO.
Unless you're using Smalltalk or another of the /purer/ OO languages the
graphs created often include numerous compromises, work-arounds, and
sub-optimal compositions.

Just as complex systems are more difficult to predict than static
systems, object graphs modeling behavior (especially those handicapped
with only partial support for OO) are more difficult to cover 100% with
unit tests than a database, which doesn't change with every clock cycle.

I feel better moving forward knowing


> And -- if one is a translationist like me B-) -- the entire OOA
> model is itself executable. [It is routine to run exactly the same
> test suite for functional requirements against both the model and the
> generated executable; all that changes is the test harness. When one
> runs the test suite against the model, one is validating the
> correctness of the solution. When one runs the test suite against the
> executable, one is just validating the transformation process.]

The way we develop and deploy Smalltalk there's no difference between
what's tested and what's delivered. But I realize that may be a luxury
with Smalltalk.

H. S. Lahman

unread,
Nov 15, 2006, 1:24:27 PM11/15/06
to
Responding to Gagne...

>> That's what I don't buy -- that the Data Model is somehow uniquely
>> validatable.
>>
>> An OO Class Model is routinely normalized using the RDM and can be
>> validated in the same manner as a Data Model.
>
> Are you saying your OO models are normalized the same way your RDB
> models are? I guess I'm wondering if you're thinking of systems with
> 1:1 ratio of tables to classes.

Yes. In a well-formed OO application the Class Model should fully
conform to at least 3NF. [A number of the higher NFs are about how
identity is described (e.g., compound keys) and those tend not to be
relevant in a Class Model without explicit identity. They do become
relevant, though, if one provides explicit identity attributes.] There
are some exceptions, such as dependent knowledge attributes, but those
are highlighted by unique notation in UML. OOA/D also has additional
rules about things like generalization (e.g., the union of subset
members must be a complete set of parent members).
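[The generalization rule just mentioned (the union of subclass members must be a complete set of parent members) can be sketched mechanically. A hypothetical Python illustration, not from any poster's code: the parent is abstract, and the check verifies that every member of the parent set belongs to some subclass.]

```python
# Hypothetical generalization: Account is abstract; Checking and
# Savings are its subclasses.
class Account:
    pass

class Checking(Account):
    pass

class Savings(Account):
    pass

accounts = [Checking(), Savings(), Checking()]

# Completeness: no member of the parent set is a bare Account --
# every instance belongs to some subclass.
assert all(type(a) is not Account for a in accounts)

# The union of the subclass member sets equals the parent set.
subclass_members = [a for a in accounts if isinstance(a, (Checking, Savings))]
assert len(subclass_members) == len(accounts)
```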

[In practice many OOA/D authors don't describe NF explicitly because it
gets complicated to talk about NF when one doesn't have explicit
identity attributes. Instead they simply provide a suite of
methodological rules and guidelines for forming a Class Model using
Sermon On The Mount Mode. But those rules and guidelines are just
rephrasing NF.]

Typically the Class Model will not map 1:1 to a storage Data Model
outside of CRUD/USER processing. (That's the main reason why I advocate
isolating persistence access mechanisms in a subsystem; such isolation
encapsulates the conversion in views.) Two models of the problem space
can be in 3NF without being exactly the same.

For example, in an OO application the developer can abstract problem
space delegation so that properties are "owned" by quite different
objects than in the Data Model, which is common when abstracting
entities that are conceptual rather than concrete. In addition,
behavior properties are largely anthropomorphized at the developer's
discretion. IOW, the developer has a lot of leeway in allocating which
abstractions own particular properties. But ensuring NF compliance is
not optional.

> Relational Set Theory has calculus. The model can be tested against the
> normal forms (as deep as you want), and finally it's far easier to have
> a complete test showing all the data is stored rather than having some
> missing.

One can apply the calculus against NF in exactly the same way. Even
when developers just validate that the Sermon On The Mount rules have
been followed, they are essentially doing that, albeit quite indirectly.
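[Applying the calculus against NF can indeed be made mechanical. A toy illustration (the `holds` helper and the sample rows are hypothetical, not a real tool): a relation satisfies the functional dependency X -> Y exactly when no two tuples agree on X but differ on Y, and 2NF/3NF violations are just dependencies on something other than the full identity.]

```python
def holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds,
    i.e. tuples agreeing on lhs never disagree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    {"part": "6-32", "supplier": "Acme", "city": "Boston"},
    {"part": "8-32", "supplier": "Acme", "city": "Boston"},
    {"part": "M3",   "supplier": "Bolt", "city": "Denver"},
]
assert holds(rows, ["supplier"], ["city"])   # supplier -> city holds
assert not holds(rows, ["city"], ["part"])   # city does not determine part
```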

<Hot Button>
However, the indirection also entails risk that it won't be done right.
That's why I would rather explain NF to a novice developer than try to
hide it behind a bunch of methodological guidelines. How many software
developers today don't understand NF? RAD tools have pretty much killed
the notion that a DBA is one of the Anointed Few who understands such
mysteries. There is a lot of condescension implicit in the notion that
NF will just confuse the OO developer.
</Hot Button>

>
> Object models can be infinitely complex and there's no single yardstick
> of correctness. Some designers create deep class hierarchies, others
> create wide ones. Some model collections, and others don't. Most
> languages thought to be OO offer varying degrees of support for OO.
> Unless you're using Smalltalk or another of the /purer/ OO languages the
> graphs created often include numerous compromises, work-arounds, and
> sub-optimal compositions.

Which is why quite different models of the same problem space can still
be in NF.

Also, note that I am talking about a Class Model in UML, which describes
static structure at a 4GL level of abstraction. By the time one gets to
the OOPL level, the OOA/D should have already taken care of things like NF.

> Just as complex systems are more difficult to predict than static
> systems, object graphs modeling behavior (especially those handicapped
> with only partial support for OO) are more difficult to cover 100% with
> unit tests than a database, which doesn't change with every clock cycle.

But that doesn't prevent one from isolating the static structure in an
abstract notion like a Class Model and ensuring that at least that part
of the model is rigorously constructed. (And if the developer doesn't
bother with OOA/D bubbles & arrows, the developer is still obligated to
define objects as if they had been used.)

> I feel better moving forward knowing
>
>> And -- if one is a translationist like me B-) -- the entire OOA
>> model is itself executable. [It is routine to run exactly the same
>> test suite for functional requirements against both the model and the
>> generated executable; all that changes is the test harness. When one
>> runs the test suite against the model, one is validating the
>> correctness of the solution. When one runs the test suite against the
>> executable, one is just validating the transformation process.]
>
> The way we develop and deploy Smalltalk there's no difference between
> what's tested and what's delivered. But I realize that may be a luxury
> with Smalltalk.

I think there is a big difference. There is one Smalltalk source being
built and tested. There is only a single level of abstraction involved
in what is being built and tested. That is quite different from
providing an abstract solution model at the 4GL level, validating it,
and then producing a 3GL (or Assembly) model and testing it.

Among other things the OOA models built in translation only address
functional requirements. The transformation engine optimizes for
nonfunctional requirements specified in things like MDA Marking and
Transformation Models. That provides a lot of separation of concerns
between the 4GL and 3GL models.

topmind

unread,
Nov 15, 2006, 1:40:59 PM11/15/06
to

H. S. Lahman wrote:
> Responding to Gagne...

>


> Typically the Class Model will not map 1:1 to a storage Data Model

Us table fans cringe at calling it "storage model". It is already an
abstraction, not a hardware model.

> outside of CRUD/USER processing. (That's the main reason why I advocate
> isolating persistence access mechanisms in a subsystem; such isolation
> encapsulates the conversion in views.) Two models of the problem space
> can be in 3NF without being exactly the same.
>
> For example, in an OO application the developer can abstract problem
> space delegation so that properties are "owned" by quite different
> objects than in the Data Model, which is common when abstracting
> entities that are conceptual rather than concrete. In addition,
> behavior properties are largely anthropomorphized at the developer's
> discretion. IOW, the developer has a lot of leeway in allocating which
> abstractions own particular properties. But ensuring NF compliance is
> not an option.

Note that I used to use local temporary (app-level) tables to do
similar things. However, the tools stopped readily supporting this when
vendors switched to OOP-centric designs to fit the fad. It was sort of
a self-fullfilling prophecy. I am still pissed at OOP for killing
nimble tables.

-T-

AndyW

unread,
Nov 15, 2006, 10:28:49 PM11/15/06
to
On Wed, 15 Nov 2006 18:24:27 GMT, "H. S. Lahman"
<h.la...@verizon.net> wrote:

>Responding to Gagne...
>
>>> That's what I don't buy -- that the Data Model is somehow uniquely
>>> validatable.
>>>
>>> An OO Class Model is routinely normalized using the RDM and can be
>>> validated in the same manner as a Data Model.
>>
>> Are you saying your OO models are normalized the same way your RDB
>> models are? I guess I'm wondering if you're thinking of systems with
>> 1:1 ratio of tables to classes.
>
>Yes. In a well-formed OO application the Class Model should fully
>conform to at least 3NF.

*cough* *splutter* *choke*

How do you normalise a 'fish'?


H. S. Lahman

unread,
Nov 16, 2006, 10:33:52 AM11/16/06
to
Responding to AndyW...

>>>Are you saying your OO models are normalized the same way your RDB
>>>models are? I guess I'm wondering if you're thinking of systems with
>>>1:1 ratio of tables to classes.
>>
>>Yes. In a well-formed OO application the Class Model should fully
>>conform to at least 3NF.
>
>
> *cough* *splutter* *choke*
>
> How do you normalise a 'fish'.

One normalizes a fish class the same way one normalizes a fish table; by
properties and their relationship to fish identity.
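[Taken at face value, "normalizing a fish" might look like the following hypothetical sketch (the `Species` and `Fish` classes and their attributes are invented): an attribute that depends on the species rather than on the individual fish migrates to a Species class -- by properties and their relationship to fish identity, as the reply says.]

```python
# latin_name depends on the species, not on this particular fish,
# so it lives on Species; Fish keeps only what depends on its own
# identity and delegates the rest.
class Species:
    def __init__(self, common_name, latin_name):
        self.common_name = common_name
        self.latin_name = latin_name

class Fish:
    def __init__(self, tag, weight_kg, species):
        self.tag = tag            # depends on this fish
        self.weight_kg = weight_kg
        self.species = species    # delegation, not duplication

cod = Species("cod", "Gadus morhua")
f1 = Fish("A-001", 3.2, cod)
f2 = Fish("A-002", 4.1, cod)

# One update to the species, no update anomaly across fish.
assert f1.species.latin_name == f2.species.latin_name
```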

H. S. Lahman

unread,
Nov 16, 2006, 10:49:13 AM11/16/06
to
Responding to Jacobs...

>>Typically the Class Model will not map 1:1 to a storage Data Model
>
>
> Us table fans cringe at calling it "storage model". It is already an
> abstraction, not a hardware model.

How is an ERD Data Model not abstract? The issue is what it models. A
Data Model models the way data will be organized statically in the
persistent repository. A Class Model models the way properties will be
used dynamically in the solution to some problem. The point is that
those representations are not necessarily the same once one is outside
CRUD/USER processing.

>>outside of CRUD/USER processing. (That's the main reason why I advocate
>>isolating persistence access mechanisms in a subsystem; such isolation
>>encapsulates the conversion in views.) Two models of the problem space
>>can be in 3NF without being exactly the same.
>>
>>For example, in an OO application the developer can abstract problem
>>space delegation so that properties are "owned" by quite different
>>objects than in the Data Model, which is common when abstracting
>>entities that are conceptual rather than concrete. In addition,
>>behavior properties are largely anthropomorphized at the developer's
>>discretion. IOW, the developer has a lot of leeway in allocating which
>>abstractions own particular properties. But ensuring NF compliance is
>>not an option.
>
>
> Note that I used to use local temporary (app-level) tables to do
> similar things. However, the tools stopped readily supporting this when
> vendors switched to OOP-centric designs to fit the fad. It was sort of
> a self-fullfilling prophecy. I am still pissed at OOP for killing
> nimble tables.

They are still there. The mapping of object/tuple and class/table is
pretty straightforward, which is why normalization can work on a Class
Model. Check out Leon Starr's "Executable UML: How to Build Class
Models". It is the most comprehensive book on OO class modeling
available and almost every example is explained in terms of the table
analogy.

I don't have any problem with using tables. In fact, I use lookup
tables extensively in parametric polymorphism. What I have a problem
with is using tables for a specific problem solution that have the exact
same schema as tables in the database (once one is outside the realm of
CRUD/USER).
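[The lookup-table remark can be sketched as data-driven dispatch. A hypothetical example (the table contents and rate figures are invented): behavior is selected by a table keyed on a parameter, so adding a case is a data change rather than a new subclass.]

```python
# Hypothetical lookup table used for parametric polymorphism:
# the rate policy is selected by account type at run time.
RATE_TABLE = {
    "checking": 0.001,
    "savings":  0.02,
    "premium":  0.035,
}

def monthly_interest(account_type, balance):
    """Dispatch on the parameter via the table, not a class hierarchy."""
    return balance * RATE_TABLE[account_type] / 12

assert abs(monthly_interest("savings", 1200.0) - 2.0) < 1e-9
```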

Dmitry A. Kazakov

unread,
Nov 16, 2006, 10:59:14 AM11/16/06
to
On Thu, 16 Nov 2006 15:33:52 GMT, H. S. Lahman wrote:

> Responding to AndyW...
>
>>>>Are you saying your OO models are normalized the same way your RDB
>>>>models are? I guess I'm wondering if you're thinking of systems with
>>>>1:1 ratio of tables to classes.
>>>
>>>Yes. In a well-formed OO application the Class Model should fully
>>>conform to at least 3NF.
>>
>> *cough* *splutter* *choke*
>>
>> How do you normalise a 'fish'.
>
> One normalizes a fish class the same way one normalizes a fish table; by
> properties and their relationship to fish identity.

Yep, fishsticks is normalized fish. A normalized fish table must be built
out of normalized fishsticks... I believe. I am not sure about fish
identity in those sticks... (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
