<http://blogs.in-streamco.com/anything.php?title=transaction_oriented_architecture_aka_th>
In the article I propose the impedance mismatch between Object Oriented
Languages can Relational Databases can be easily eliminated. All we
have to do (as OO programmers and designers) is stop pretending the
database doesn't exist.
--
Visit <http://blogs.instreamfinancial.com/anything.php>
to read my rants on technology and the finance industry.
> I was going to call it Transaction Oriented Processing, but didn't want
> its acronym confused with another, already more popular, in comp.object.
>
> <http://blogs.in-streamco.com/anything.php?title=transaction_oriented_architecture_aka_th>
I would point out that your notion of 'transaction' is different than
that of the typical RDBMS. The need for, content of, and timing of
audit transactions is driven solely by the problem domain. OTOH, the
notion of 'transaction' in the database is related to providing scope
for data integrity _within the database_ when queries are processed.
(Alas, 'transaction' is one of those terminally overloaded words in the
computing space.)
One of the problems of synchronizing problem solutions to databases is
providing mapping between such disparate notions, which segues to...
> In the article I propose the impedance mismatch between Object Oriented
> Languages can Relational Databases can be easily eliminated. All we
> have to do (as OO programmers and designers) is stop pretending the
> database doesn't exist.
I don't think the issue is ignoring the database; it is recognizing that
the database is a different subject matter applying different business
rules than the problem solution. Enter stage left, dancing: separation
of concerns and decoupling through subsystem encapsulation.
There is nothing to prevent abstracting the database subject matter in a
classic OO manner with objects like Schema, Table, Tuple, and Query.
One can provide behaviors to deal with database-specific mechanics like
the construction of SQL strings. One can easily provide optimizations
tailored to the database paradigm de jour. One just has to realize that
problem one is solving in accessing a database is different than the
problem the application is solving for the customer (at least outside
the CRUD/USER arena).
It seems to me the article was most of the way there when it proposes
things like isolating SQL so it is not embedded in the problem solution
and emphasizing low coupling between modules. It just didn't say
specifically that one needs to encapsulate database access in a
subsystem behind an interface to do that. B-)
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
in...@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
I hit the wrong keyboard combination and my original reply disappeared.
It hasn't appeared on the newsgroup yet, so I assume it went to the
Great Bit Dump in the Sky. If it does show up it might be amusing to
compare the answers.
>>> In the article I propose the impedance mismatch between Object
>>> Oriented Languages can Relational Databases can be easily
>>> eliminated. All we have to do (as OO programmers and designers) is
>>> stop pretending the database doesn't exist.
>>
>>
>> I don't think the issue is ignoring the database; it is recognizing
>> that the database is a different subject matter applying different
>> business rules than the problem solution. Enter stage left, dancing:
>> separation of concerns and decoupling through subsystem encapsulation.
>>
>> There is nothing to prevent abstracting the database subject matter in
>> a classic OO manner with objects like Schema, Table, Tuple, and Query.
>> One can provide behaviors to deal with database-specific mechanics
>> like the construction of SQL strings. One can easily provide
>> optimizations tailored to the database paradigm de jour. One just has
>> to realize that problem one is solving in accessing a database is
>> different than the problem the application is solving for the customer
>> (at least outside the CRUD/USER arena).
>
> But that abstraction is exactly the busywork I think that can most-often
> be avoided. Admittedly, in our own application (and other's I've been a
> party to) we have abstracted tuples--but only as a mechanism for making
> an RDB's answer (projection) easily accessible by making its
> manipulation idiomatic to the programming language. For example, using
> structures in C or instances in Smalltalk.
The basic problem is that the RDB paradigm is designed for generic, ad
hoc access of data. IOW, it is optimized to be independent of what
problem is being solved with the data. Once one is outside the
CRUD/USER arena, applications are optimized to solve very specific
problems. That optimization involves different approaches to things
like relationship navigation (e.g., object-level instantiation vs.
table-level). So the views of the data should be /expected/ to be
different. Therefore one solves the business problem first and then
worries about how one maps that solution view into the DB view.
In addition, a big advantage of encapsulating in a subsystem is large
scale reuse. Once one abstracts the invariants of the DB paradigm de
jour properly, then the subsystem can be reused across applications.
Thus the RDB paradigm is abstracted exactly the same way regardless of
the semantics of the data. That is, one encodes the invariants of the
paradigm and leaves the details to data (e.g., mapping of table names,
field names, etc.). To do that one just applies basic problem space
abstraction to the RDB subject matter, so there it really costs very
little extra to achieve that reuse.
However, for that to work...
>> It seems to me the article was most of the way there when it proposes
>> things like isolating SQL so it is not embedded in the problem
>> solution and emphasizing low coupling between modules. It just didn't
>> say specifically that one needs to encapsulate database access in a
>> subsystem behind an interface to do that. B-)
>
> It did quit a little early (more is planned--specifically what it looks
> like in code) but the primary abstractions it will propose have to do
> with making procedure calls idiomatic and explicit rather than
> abstracted and implicit (hidden). I'm of the opinion that the more
> obvious the database (or at least its interface) is the more easily
> maintainable an application becomes. I've nothing against creating
> frameworks and patterns to facilitate those programming activities, but
> prefer the concept of a problem domain transaction to language-specific
> expressions mapping 1:1 (or some other low-ratio) with anything
> physically present in the database (like tables, views, or the columns
> in each). From this perspective, the only mapping TOA endorses is
> between a procedure's answer and the language's idiomatic expression of it.
One needs to design the interface to the problem solution's needs. Thus
it is the interface that needs to be replaced during large scale reuse
(think: Facade pattern). To be successful the interface needs to be a
pure message-based interface so that each side can map the message ID
and data packet into its own unique view. That, in turn, means a
consistent mapping of identity on each side of the interface.
Fortunately, that is usually easy for data, especially for paradigms
like the RDM that are designed to provide generic identity. Half the
work is done if the DBMS schema is available to the application. So one
just needs a bunch of table lookups to map data packet elements into
Table/Field identifiers to construct SQL strings. Those lookup tables
get defined from external configuration data that maps the interface
messages into the RDB schema.
The price of this is encode/decode of message data packets on each side
of the interface. For a SQL RDB, that effectively means duplicated
dataset encode/decode. Fortunately DB access is in milliseconds while
encoding/decoding data packets is in microseconds so nobody is likely to
notice anything except the developer's extra keystrokes, which don't
count in the overall scheme of things.
When there actually is a 1:1 mapping between the solution and DB views,
that message indirection is redundant. But that decoupling pays huge
bonuses when the mapping is not 1:1, when the mapping changes in the
future, and for large scale reuse across applications. The thing is
that once one is outside the CRUD/USER realm when solving some customer
problem, one can't know when the mapping will be 1:1 or when it might
change.
So one should solve the customer problem without worrying about what the
DB looks like. If some objects happen to map 1:1 that is transparent
because the interface is designed to the solution's data needs (e.g.,
"Save this pile of data I call 'X' and give it back to me when I ask for
'X' later"). IOW, one should be able to solve the customer problem and
design the DB access interface without knowing or caring how the stored
data is ultimately organized.
At another level, I don't see how you can advise not embedding SQL in
the problem solution without providing the indirection of a
message-based interface. Creating surrogate objects for the RDB that do
construct SQL queries and letting other objects talk to them to do the
grunt work doesn't really reduce the maintenance risk very much. When
the schema changes you are still touching the solution subject matter's
code and that implies risk of breaking the solution over a pure DB issue.
It will also be tempting to bend the solution around those surrogates,
which can lead to a less optimal and maintainable solution. That is, if
one has a preconceived notion of what the data looks like, one will be
tempted to build the solution around that view. So the generic, ad hoc
tail is wagging the solution dog.
IOW, I think maintainability will be maximized when the subject matter
concerns are clearly decoupled. The strongest way to do that is by
encapsulating the DB access in a subsystem behind a pure message-based
interface.
> It seems to me the article was most of the way there when it proposes
> things like isolating SQL so it is not embedded in the problem solution
I have to disagree with such advice. Isolating SQL is oversold. One
does not swap database vendors often enough to make it worthwhile.
Maybe if you sell a product that is designed to be used by multiple
DB's, such makes sense; but not for most custom apps. It creates more
work and more red-tape. (We have spent gazillion messages on this topic
before.)
> and emphasizing low coupling between modules.
I've found "coupling" to be a rather subjective property such that it
is a nearly useless catch-phrase these days. (That is until the point
somebody defines it clearly and not tied to subjective criteria, such
as what app part is more likely to change.)
> It just didn't say
> specifically that one needs to encapsulate database access in a
> subsystem behind an interface to do that. B-)
>
>
> *************
> There is nothing wrong with me that could
> not be cured by a capful of Drano.
>
> H. S. Lahman
-T-
Bullhonk!
I used to use custom, local, and/or temporary tables all the time to
solve app-specific problems, but it unfortunately fell out of style for
tool vendors when everybody thought OOP would replace such techniques.
The "Big-Iron" view of relational is an Oracle invention more or less.
In desk-top-based app software kits such as FoxPro and Paradox,
querying from the server and chomping tables for local needs was common
and easy. And you get query ability out-of-the-box instead of
roll-your-own set/get/find/look-up methods that OO always reinvents
bloatingly and inconsistently.
> IOW, it is optimized to be independent of what
> problem is being solved with the data. Once one is outside the
> CRUD/USER arena, applications are optimized to solve very specific
> problems. That optimization involves different approaches to things
> like relationship navigation (e.g., object-level instantiation vs.
> table-level). So the views of the data should be /expected/ to be
> different. Therefore one solves the business problem first and then
> worries about how one maps that solution view into the DB view.
Example? There is no objective evidence that OOP better fits the
"problem space" than tables.
>
> In addition, a big advantage of encapsulating in a subsystem is large
> scale reuse.
Reuse is an out-of-date OOP claim. Many OO'ers don't even bother
defending reuse anymore.
Otherwise show me something that OOP makes reusable that
procedural/relational can't. I will not take your word for it.
If you mean things like SQL generators for INSERT/UPDATE statements,
I've built and used procedural versions of those also.
> Once one abstracts the invariants of the DB paradigm de
> jour properly, then the subsystem can be reused across applications.
> Thus the RDB paradigm is abstracted exactly the same way regardless of
> the semantics of the data.
The "base" tables might be, but custom/local/temp tables were very
common and easy in pre-OO-fad tools, as described above.
> That is, one encodes the invariants of the
> paradigm and leaves the details to data (e.g., mapping of table names,
> field names, etc.). To do that one just applies basic problem space
> abstraction to the RDB subject matter, so there it really costs very
> little extra to achieve that reuse.
>
> However, for that to work...
>
[snip]
Please stop spreading falsehoods about relational.
-T-
I sense, perhaps incorrectly, a disparaging view of CRUD/USER
applications. You comments in both this thread and the thread
"Relational Database & OO" seem to indicate an opinion that CRUD/USER
applications are too simple to be representative of sophisticated OO
designs.
Is that true, or am I reading something in your comments that isn't there?
>
> In addition, a big advantage of encapsulating in a subsystem is large
> scale reuse. Once one abstracts the invariants of the DB paradigm de
> jour properly, then the subsystem can be reused across applications.
> Thus the RDB paradigm is abstracted exactly the same way regardless of
> the semantics of the data. That is, one encodes the invariants of the
> paradigm and leaves the details to data (e.g., mapping of table names,
> field names, etc.). To do that one just applies basic problem space
> abstraction to the RDB subject matter, so there it really costs very
> little extra to achieve that reuse.
Reuse is valuable and admirable, but the creation of an "encapsulating
.. subsystem" may be unnecessary, and touting the benefits of reuse
sounds like making virtue of necessity--except it may not be necessary
in the first place.
A database API is reused everywhere applications interface with DBs.
The mapping of the DB (like VW Smalltalk's EXDI or Java's JDBC) is
reused--it makes the DB's API more language-malleable, and that's reused
everywhere--regardless the application domain (I'm not aware of many
people that have created their own replacements for either EXDI or JDBC).
You seem to be advocating the creation of a layer between the
application the the language-DB interface that maps the application
domain objects to the DB--I'm guessing classes to tables, instances to
rows, and fields to columns. The /infrastructure/ you're describing
(I'm guessing) gives OO instances behaviors that allow them to either
instantiate themselves from or /persist/ themselves to the DB. It is
this behavior, the endowing of DB-awareness (or persistence--a rose is a
rose...) into our classes and their instances, which is characteristic
of the denial I was talking about. It is the premise of object
databases and their lesser incarnations of object-relational products
that the boundary between application and DB is better buried than
exposed. The simple fact that it /can/ be done proves to many that it
/should/ be done. This is a Jedi mind trick.
If I'm right in understanding what you're saying (and what countless
vendors, analysts, and pundits sell, present, and report), this is
exactly where our designs part company and where Transaction Oriented
Processing (or TOA--but I dislike the word architecture when used with
software) proposes an alternative, thinner, and simpler model.
>
> However, for that to work...
>
>> <snip>
>
> One needs to design the interface to the problem solution's needs.
> Thus it is the interface that needs to be replaced during large scale
> reuse (think: Facade pattern). To be successful the interface needs
> to be a pure message-based interface so that each side can map the
> message ID and data packet into its own unique view. That, in turn,
> means a consistent mapping of identity on each side of the interface.
The belief that there needs to be symmetrical mappings on either side of
the interface assumes their needs to be a mapping in the first place.
That's surprisingly similar to petitio principii, the fallacy of
assumption more commonly known as "begging the question." TOA/TOP
proposes (and I know I haven't gotten that far in the article) the
database and its application domain stored procedures are the only
persistence mechanism necessary, and that the benefits of a focused,
single, data-permeable gateway between application and database far
exceed the benefits of O/R mappings--regardless of abstraction--and that
its lightweight appearance shouldn't be dismissed as missing heavyweight
kick.
>
>
> Fortunately, that is usually easy for data, especially for paradigms
> like the RDM that are designed to provide generic identity. Half the
> work is done if the DBMS schema is available to the application. So
> one just needs a bunch of table lookups to map data packet elements
> into Table/Field identifiers to construct SQL strings. Those lookup
> tables get defined from external configuration data that maps the
> interface messages into the RDB schema.
>
> The price of this is encode/decode of message data packets on each
> side of the interface. For a SQL RDB, that effectively means
> duplicated dataset encode/decode. Fortunately DB access is in
> milliseconds while encoding/decoding data packets is in microseconds
> so nobody is likely to notice anything except the developer's extra
> keystrokes, which don't count in the overall scheme of things.
It may be early in the AM (for me) but I'm not following what your
talking about above.
>
> When there actually is a 1:1 mapping between the solution and DB
> views, that message indirection is redundant. But that decoupling
> pays huge bonuses when the mapping is not 1:1, when the mapping
> changes in the future, and for large scale reuse across applications.
> The thing is that once one is outside the CRUD/USER realm when solving
> some customer problem, one can't know when the mapping will be 1:1 or
> when it might change.
>
> So one should solve the customer problem without worrying about what
> the DB looks like. If some objects happen to map 1:1 that is
> transparent because the interface is designed to the solution's data
> needs (e.g., "Save this pile of data I call 'X' and give it back to me
> when I ask for 'X' later"). IOW, one should be able to solve the
> customer problem and design the DB access interface without knowing or
> caring how the stored data is ultimately organized.
But the customer problem can be solved without using OO. It can be
solved in FORTRAN, COBOL, C, LISP, or JavaWOO (much of the Java I've
seen isn't very OO at all so I wanted to include it in my list of non-OO
languages--Java Without Object Orientation). The customer's problem and
the DB are constants. Only languages and idioms lie between and may be
interchangeable. In fact, if the DB domain requirements are consistent
(its domain-API expressed through procedures) then suitable applications
can be written in multiple languages without compromising the DB's design.
>
> At another level, I don't see how you can advise not embedding SQL in
> the problem solution without providing the indirection of a
> message-based interface.
Use stored procedures. No mapping necessary.
> Creating surrogate objects for the RDB that do construct SQL queries
> and letting other objects talk to them to do the grunt work doesn't
> really reduce the maintenance risk very much.
No, but it sprinkles SQL throughout the application. SQL is necessarily
DB-design aware. If the DB design changes the SQL must necessarily
change as well. I don't want to be in the SQL maintenance business.
> When the schema changes you are still touching the solution subject
> matter's code and that implies risk of breaking the solution over a
> pure DB issue.
Not at all. We either believe complexity is hidden behind our
interfaces or we don't. As proof, our application (commercial finance
and paper trading) has regularly changed table designs without affecting
stored procedures. The application talks to the DB through stored
procedures to affect data changes. The application needn't change
unless the stored procedures do.
This is identical to programming functions. I can change the
implementation of a method, and as long as the method interface (name
and arguments) don't change then no one calling it needs to change.
>
> It will also be tempting to bend the solution around those surrogates,
> which can lead to a less optimal and maintainable solution. That is,
> if one has a preconceived notion of what the data looks like, one will
> be tempted to build the solution around that view. So the generic, ad
> hoc tail is wagging the solution dog.
But preconceived notions exist on both sides of the interface. As the
saying (almost) goes, two preconceived notions don't make a right
(notion). The database must be language neutral--but it can't be domain
neutral. If it were domain neutral then we'd be implying its design
didn't support the domain--which would defeat the purpose, no?
>
> IOW, I think maintainability will be maximized when the subject matter
> concerns are clearly decoupled. The strongest way to do that is by
> encapsulating the DB access in a subsystem behind a pure message-based
> interface.
If you believe message-based interfaces are valuable, I propose TOA/TOP
is a more faithful realization of it.
topmind wrote:
> H. S. Lahman wrote:
>
>> Responding to Gagne...
>>
>
>
>> It seems to me the article was most of the way there when it proposes
>> things like isolating SQL so it is not embedded in the problem solution
>>
>
>
> I have to disagree with such advice. Isolating SQL is oversold. One
> does not swap database vendors often enough to make it worthwhile.
> Maybe if you sell a product that is designed to be used by multiple
> DB's, such makes sense; but not for most custom apps. It creates more
> work and more red-tape. (We have spent gazillion messages on this topic
> before.)
>
Eliminating SQL in the application in TOA/TOP isn't about database
vendor independence, its about good design and maintainability.
>
>
>> and emphasizing low coupling between modules.
>>
>
> I've found "coupling" to be a rather subjective property such that it
> is a nearly useless catch-phrase these days. (That is until the point
> somebody defines it clearly and not tied to subjective criteria, such
> as what app part is more likely to change.)
>
Coupling is a relative term, not subjective--similar to a noise ratio.
Coupling describes the interdependence between two related objects. Low
coupling is realized when significant changes can be made to one object
without affecting the other. High coupling indicates changes can not be
made to one without requiring changes to another. In the case of OO and
DB, when a mapping exists between the two, the nearer that mapping
approaches 1:1 the more tightly coupled the application and DB are. In
a 1:1 mapping nothing in the DB can be changed without affecting the
application, nor can anything be changed in the application data model
without affecting change in the DB.
Object transaction engines exist.
See COM+..component services....I'm sure there's others.
Normally I don't feed the trolls, but this post had me LOL. You've
really outdone yourself this time.
>>The basic problem is that the RDB paradigm is designed for generic, ad
>>hoc access of data.
>
>
> Bullhonk!
>
> I used to use custom, local, and/or temporary tables all the time to
> solve app-specific problems, but it unfortunately fell out of style for
> tool vendors when everybody thought OOP would replace such techniques.
> The "Big-Iron" view of relational is an Oracle invention more or less.
> In desk-top-based app software kits such as FoxPro and Paradox,
> querying from the server and chomping tables for local needs was common
> and easy. And you get query ability out-of-the-box instead of
> roll-your-own set/get/find/look-up methods that OO always reinvents
> bloatingly and inconsistently.
Let me see if I understand this. You need a temporary table to solve
some problem _in your application_. So you add the table and fields to
the RDB schema, populate it with tuples, access it with queries, and
then delete it when the application is done. You do this where every
operation involves milliseconds for physical read/write operations (not
to mention the DBMS overhead of modifying the schema and contention for
DB resources in a multi-user environment). And somehow you feel this is
a better solution than providing a table in memory where operations are
measured in microseconds or nanoseconds and you don't load the server
unnecessarily?
>>IOW, it is optimized to be independent of what
>>problem is being solved with the data. Once one is outside the
>>CRUD/USER arena, applications are optimized to solve very specific
>>problems. That optimization involves different approaches to things
>>like relationship navigation (e.g., object-level instantiation vs.
>>table-level). So the views of the data should be /expected/ to be
>>different. Therefore one solves the business problem first and then
>>worries about how one maps that solution view into the DB view.
>
>
> Example? There is no objective evidence that OOP better fits the
> "problem space" than tables.
Yet another of your stock-in-trade forensic ploys: you are being
deliberately obtuse.
I know you know better because I've already pointed you at the most
comprehensive book on OO Class Modeling available, Leon Starr's
"Executable UML: How to Build Class Models", and pointed out that Leon
uses tables as an analogy for almost all of his examples. I've also
pointed out repeatedly that Class Models are normalized using the same
basic RDM rules as RDB schemas. IOW, tables per se have nothing to do
with the points I was making. In fact, in the quoted paragraph I
explicitly identified the main paradigm difference (i.e., the way
relationships are instantiated and navigated) -- which you chose to
ignore as yet another forensic ploy to pull the opponent's chain.
>>In addition, a big advantage of encapsulating in a subsystem is large
>>scale reuse.
>
>
> Reuse is an out-of-date OOP claim. Many OO'ers don't even bother
> defending reuse anymore.
>
> Otherwise show me something that OOP makes reusable that
> procedural/relational can't. I will not take your word for it.
The post was largely about such an example: reusing a subsystem across
applications for accessing /any/ RDB in /any/ problem context. You just
chose to ignore that.
In addition, you pull out yet another tired forensic ploy and try to put
words in my mouth to the effect that one cannot have reuse with P/R.
Where did anyone say that one could not encapsulate subsystems behind
APIs and reuse them using P/R? Modularity has been around every since
installable device drivers in Multics in the '60s. The OO paradigm just
does a better job of emphasizing and formalizing notions like
implementation hiding.
>>Once one abstracts the invariants of the DB paradigm de
>>jour properly, then the subsystem can be reused across applications.
>>Thus the RDB paradigm is abstracted exactly the same way regardless of
>>the semantics of the data.
>
>
> The "base" tables might be, but custom/local/temp tables were very
> common and easy in pre-OO-fad tools, as described above.
You are being deliberately obtuse again and focusing on your own
mantras. How does the RDB tell the difference between a temporary table
for a specific application and an enterprise table? It can't; it is
just a table.
So even if one were silly enough to store temporary tables unique to the
solution in the RDB rather than memory, one would still access them
through the DB access subsystem with exactly the same identity mapping
as any other table. So the DB access subsystem's <invariant>
abstractions are exactly the same.
Give it up, Bryce. You've been using these same forensic ploys for so
long they are transparent to the point of being comical. We all realize
now you are trolling here just to amuse yourself by pulling OO people's
chains. So now we are the ones laughing. Ta-ta.
Did you read the paper or was its use of the over-used term,
transaction, distracting? It is for me, but I'm unsure what else to
call it.
Here's a couple definitions:
* Database transaction - that which can be committed or rolled-back
in its entirety without harming a DB's integrity.
* Financial transaction - records who did what and when to whom and
for how much or even why
* Data-change transaction - records who did what and when to whom
and sometimes why
The last two, which I'll jointly combine and call application
transactions, are important features that facilitate auditing a system.
Additionally, embracing them helps maintain a system's integrity by
allowing error corrections to be recorded as error-correcting
transactions rather than DB transactions that simply remove the errant
transactions.
Application transactions are the kinds of things you see on your banking
statements and shopping receipts. They're familiar concepts we see all
around us, but they're oddly ignored as a mechanism to bridge the chasm
between the database and its business-rules.
I had a manager that once told me that people who know how things are
done will always work for people who know why things are done. This is
how I view the relationship between business rules (application
transaction rules) and database procedures. The latter knows how to
affect changes to the DB, but the former knows why and when to do them.
Need? No. They are just often simpler than arrays and OOP pointer pasta
(if the language or libraries support them).
> So you add the table and fields to
> the RDB schema, populate it with tuples, access it with queries, and
> then delete it when the application is done.
More or less. Note FoxPro and other dBASE-derived kits optionally
allowed a table to be a single file without any forced attachment to a
"schema". Some dialects even deleted the files for you when the app was
finished by declaring it temporary. This is different than the "big
iron" view of schemas.
> You do this where every
> operation involves milliseconds for physical read/write operations (not
> to mention the DBMS overhead of modifying the schema and contention for
> DB resources in a multi-user environment). And somehow you feel this is
> a better solution than providing a table in memory where operations are
> measured in microseconds or nanoseconds and you don't load the server
> unnecessarily?
The tools I used cached to RAM if available. This is good because if
they grow large they automatically use disk as needed. I didn't have to
worry about a RAM/disk dichotomy. I even tested it multiple times and
saw how it used RAM caching until RAM was full, and then it used disk
without the programmer having to flip any switch. It abstracted away
the RAM/disk boundary.
And performance was satisfactory even on a 286, long before Pentiums.
Your perception of tables and relational is too heavily shaped by "big
iron" RDBMS brands. You OO'ers need to get out more.
And what is this about a "multi-user environment"? In the desktop days
they were kept in the temp folder. As far as something like a web app,
one can still use tables with userID as part of a compound key to
emulate temp tables. It would be nice if vendors offered better temp
table features, but if they don't it is not a show-stopper in most
cases. There is a distinction between bad vendors/products and bad
paradigms.
>
> >>IOW, it is optimized to be independent of what
> >>problem is being solved with the data. Once one is outside the
> >>CRUD/USER arena, applications are optimized to solve very specific
> >>problems. That optimization involves different approaches to things
> >>like relationship navigation (e.g., object-level instantiation vs.
> >>table-level). So the views of the data should be /expected/ to be
> >>different. Therefore one solves the business problem first and then
> >>worries about how one maps that solution view into the DB view.
> >
> >
> > Example? There is no objective evidence that OOP better fits the
> > "problem space" than tables.
>
> Yet another of your stock-in-trade forensic ploys: you are being
> deliberately obtuse.
It was your claim. If you didn't mean it "better fits problem space",
then the obtusness is perhaps on your end.
>
> I know you know better because I've already pointed you at the most
> comprehensive book on OO Class Modeling available, Leon Starr's
> "Executable UML: How to Build Class Models", and pointed out that Leon
> uses tables as an analogy for almost all of his examples. I've also
> pointed out repeatedly that Class Models are normalized using the same
> basic RDM rules as RDB schemas. IOW, tables per se have nothing to do
> with the points I was making. In fact, in the quoted paragraph I
> explicitly identified the main paradigm difference (i.e., the way
> relationships are instantiated and navigated) -- which you chose to
> ignore as yet another forensic ploy to pull the opponent's chain.
I don't know what you are refering too here. How about some scenarios
or code that show OOP being better than relational.
>
> >>In addition, a big advantage of encapsulating in a subsystem is large
> >>scale reuse.
> >
> >
> > Reuse is an out-of-date OOP claim. Many OO'ers don't even bother
> > defending reuse anymore.
> >
> > Otherwise show me something that OOP makes reusable that
> > procedural/relational can't. I will not take your word for it.
>
> The post was largely about such an example: reusing a subsystem across
It is also about whether tables are/can be app-specific.
> applications for accessing /any/ RDB in /any/ problem context. You just
> chose to ignore that.
I have yet to see a *good* API that does this. There are too many
vendor-specific implementations and vendor-specific nice features that
save a lot of SQL if they exist. It is less trouble to talk to the DB
directly in most cases and give up trying to wrap everything you hate
or don't understand behind verbose classes.
Even if OO was by chance better for building DB wrappers, that does not
necessarily imply that it is better at building custom biz apps. The
more something resembles a device-driver, the better OOP will be at it.
Unfortunately, the device-driver view does not scale to every domain.
>
> In addition, you pull out yet another tired forensic ploy and try to put
> words in my mouth to the effect that one cannot have reuse with P/R.
> Where did anyone say that one could not encapsulate subsystems behind
> APIs and reuse them using P/R? Modularity has been around every since
> installable device drivers in Multics in the '60s. The OO paradigm just
> does a better job of emphasizing and formalizing notions like
> implementation hiding.
Prove it with code instead of brochure-talk.
>
> >>Once one abstracts the invariants of the DB paradigm de
> >>jour properly, then the subsystem can be reused across applications.
> >>Thus the RDB paradigm is abstracted exactly the same way regardless of
> >>the semantics of the data.
> >
> >
> > The "base" tables might be, but custom/local/temp tables were very
> > common and easy in pre-OO-fad tools, as described above.
>
> You are being deliberately obtuse again and focusing on your own
> mantras. How does the RDB tell the difference between a temporary table
> for a specific application and an enterprise table? It can't; it is
> just a table.
You were suggesting that tables cannot be app-specific. I am simply
demolishing that myth. App-specific temporary tables are a powerful
and useful modeling tool. And easy to sift and debug than
pointer-centric pasta OO classes.
>
> So even if one were silly enough to store temporary tables unique to the
> solution in the RDB rather than memory,
The disk/RAM dichotomy is an artificial seperation that table systems
don't necessarily have to care about.
> one would still access them
> through the DB access subsystem with exactly the same identity mapping
> as any other table. So the DB access subsystem's <invariant>
> abstractions are exactly the same.
>
> Give it up, [you]. You've been using these same forensic ploys for so
> long they are transparent to the point of being comical. We all realize
> now you are trolling here just to amuse yourself by pulling OO people's
> chains. So now we are the ones laughing. Ta-ta.
Youz is Projecting. Your characterization of tables is simply WRONG.
They can be app-specific and they are not inharently slow nor
cumbersome. Accept your wrongitude like a man instead of a weasOOL.
> H. S. Lahman
-T-
>> The basic problem is that the RDB paradigm is designed for generic, ad
>> hoc access of data. IOW, it is optimized to be independent of what
>> problem is being solved with the data. Once one is outside the
>> CRUD/USER arena, applications are optimized to solve very specific
>> problems. That optimization involves different approaches to things
>> like relationship navigation (e.g., object-level instantiation vs.
>> table-level). So the views of the data should be /expected/ to be
>> different. Therefore one solves the business problem first and then
>> worries about how one maps that solution view into the DB view.
>
> So you're agreeing the data structures are different in the application
> than in the DB? The biggest difference between what you say above and
> the article is that the article recommends starting with the DB, proving
> it correct, then developing the application.
That is where we differ. Solve the problem first and then figure out
how to talk to the database. If one builds the application around the
database view, one risks a suboptimal solution. [I would also argue
that one risks maintainability problems simply because the DB view is
necessarily static while the solution view needs to be both static and
dynamic. But that is a more complex argument and we probably shouldn't
go there.]
To put it another way, the database is there and there is nothing you
can do about it. The problem in hand, though, is entirely under your
control and you have an opportunity for creative design. Do you want
the DBA to design your application instead of you?
> I sense, perhaps incorrectly, a disparaging view of CRUD/USER
> applications. You comments in both this thread and the thread
> "Relational Database & OO" seem to indicate an opinion that CRUD/USER
> applications are too simple to be representative of sophisticated OO
> designs.
Not disparaging. CRUD/USER processing is a major segment of IT and it
is going to be around for as long as /people/ analyze gobs of data
through pattern recognition. But CRUD/USER processing is not a good
application for the OO paradigm because there is really only one problem
being solved: converting back and forth between the database and UI
views of the data. IOW, CRUD/USER applications are pipelines between
the DB and the UI while the software user is solving a particular
problem externally. The UI and RDB paradigms are very well defined so
that pipelining has already been largely automated in RAD IDEs and
canned layered model infrastructures. [BTW, I also feel there is a lot
of excellent design in the automation for CRUD/USER processing. Some
very smart people put a lot of work into coming up with that stuff.]
IMO, though, OO is overkill in that context. The OO paradigm is focused
on providing maintainability in large, complex applications. When most
of the necessary software comes from third party automation, the
solution is no longer large and complex. Thus, by its very nature RAD
is reducing large, complex problems into smaller, simpler ones through
automation. [That's not to say that one would not use the OO paradigm
to, say, construct a RAD IDE. B-)]
Bottom line: I use CRUD/USER to simply identify a class of applications
where I don't think the OO paradigm is very useful. (OTOH, I see basic
application partitioning as being the main justification for separating
the concerns of the problem solution from the concerns of persistence.)
>> In addition, a big advantage of encapsulating in a subsystem is large
>> scale reuse. Once one abstracts the invariants of the DB paradigm de
>> jour properly, then the subsystem can be reused across applications.
>> Thus the RDB paradigm is abstracted exactly the same way regardless of
>> the semantics of the data. That is, one encodes the invariants of the
>> paradigm and leaves the details to data (e.g., mapping of table names,
>> field names, etc.). To do that one just applies basic problem space
>> abstraction to the RDB subject matter, so there it really costs very
>> little extra to achieve that reuse.
>
> Reuse is valuable and admirable, but the creation of an "encapsulating
> .. subsystem" may be unnecessary, and touting the benefits of reuse
> sounds like making virtue of necessity--except it may not be necessary
> in the first place.
I submit that encapsulation of the database mechanisms is always
necessary except for trivial applications. It is basic separation of
concerns. The problem solution doesn't care if the data is stored in
flat files, an RDB, an OODB, or clay tablets. The problem solution
should not have to know about mechanisms like SQL query construction,
optimizations like anticipatory caches, or encoding/decoding of dataset
formats.
Note that the CRUD/USER environments already provide exactly that
encapsulation by providing a Data Layer that is isolated from the rest
of the application through an interface. IOW, providing that
encapsulation is a fundamental element of CRUD/USER structure. I'm just
generalizing beyond CRUD/USER. It is just basic modularization that
really doesn't have much to do with the OO paradigm. The OO paradigm
only enters the picture through specific mechanisms for the interface
like DAO and inheritance composition.
> A database API is reused everywhere applications interface with DBs.
> The mapping of the DB (like VW Smalltalk's EXDI or Java's JDBC) is
> reused--it makes the DB's API more language-malleable, and that's reused
> everywhere--regardless the application domain (I'm not aware of many
> people that have created their own replacements for either EXDI or JDBC).
You are talking about reuse _of the database_ across applications. IOW,
any application talks to the given database using the same interface.
But it is a particular database paradigm (e.g., RDB vs. OODB vs.
ISAM...) using particular computing environment technologies (JDBC,
etc.). Thus that API is only reusable across applications if the same
database and supporting technologies are available in a particular
computing environment.
I submit that the problem solution should be exactly the same regardless
of what computing environment one is in. For example, at the OOA/D
level the design should be implementable without change on any platform
with any storage mechanisms. That portability can only be achieved if
one separates the persistence access concerns from the problem solution
and decouples them through an interface. Only then can one substitute a
new environment without touching the problem solution in any way
> You seem to be advocating the creation of a layer between the
> application the the language-DB interface that maps the application
> domain objects to the DB--I'm guessing classes to tables, instances to
> rows, and fields to columns. The /infrastructure/ you're describing
> (I'm guessing) gives OO instances behaviors that allow them to either
> instantiate themselves from or /persist/ themselves to the DB. It is
> this behavior, the endowing of DB-awareness (or persistence--a rose is a
> rose...) into our classes and their instances, which is characteristic
> of the denial I was talking about. It is the premise of object
> databases and their lesser incarnations of object-relational products
> that the boundary between application and DB is better buried than
> exposed. The simple fact that it /can/ be done proves to many that it
> /should/ be done. This is a Jedi mind trick.
I am advocating inserting an interface that separates the problem
solution from the persistence access mechanisms. There is nothing new
in this. The existing RAD layered models already do exactly what I am
advocating. Consider the classic model:
Presentation
--------------
Business
--------------
Data
In the RAD world the Data Layer actually has two pieces; one on the
client side and the rest on the server side. That division just gets
hidden because all the networking is hidden. At a minimum one has to
link in a bunch of infrastructure modules into the application for the
Business objects to be able to talk to the server. When the computing
environment technologies change, one just links in a new set of
infrastructure modules.
Now some layered models take this a step further and have more layers.
In such models the client-side and server-side Data Layers are
explicitly separated. In that case the client-side Data Layer has the
responsibility for mundane tasks like forming SQL queries and
encoding/decoding SQL datasets. What I am talking about is analogous to
that layer that decouples the Business Layer through an interface so the
Business objects don't have to know about the specific mechanisms.
(I use subsystems rather than layers simply because once one is out of
the pipeline business the layers are more complicated and one needs to
partition laterally as well as vertically.)
> If I'm right in understanding what you're saying (and what countless
> vendors, analysts, and pundits sell, present, and report), this is
> exactly where our designs part company and where Transaction Oriented
> Processing (or TOA--but I dislike the word architecture when used with
> software) proposes an alternative, thinner, and simpler model.
It's thinner and simpler because it doesn't separate concerns. Failing
to separate concerns means that one must touch the solution logic
whenever something changes in the persistence realm and that is a
potential maintainability problem.
[Though I would bet that the total executable code size will be less if
one does encapsulate the persistence access concerns because of encoding
invariants. That's because the same executable code works for all
queries. So the more tables and joins needed, the more embedded code is
littered throughout the solution if one does queries on a
context-by-context basis.]
>> One needs to design the interface to the problem solution's needs.
>> Thus it is the interface that needs to be replaced during large scale
>> reuse (think: Facade pattern). To be successful the interface needs
>> to be a pure message-based interface so that each side can map the
>> message ID and data packet into its own unique view. That, in turn,
>> means a consistent mapping of identity on each side of the interface.
>
> The belief that there needs to be symmetrical mappings on either side of
> the interface assumes their needs to be a mapping in the first place.
> That's surprisingly similar to petitio principii, the fallacy of
> assumption more commonly known as "begging the question." TOA/TOP
> proposes (and I know I haven't gotten that far in the article) the
> database and its application domain stored procedures are the only
> persistence mechanism necessary, and that the benefits of a focused,
> single, data-permeable gateway between application and database far
> exceed the benefits of O/R mappings--regardless of abstraction--and that
> its lightweight appearance shouldn't be dismissed as missing heavyweight
> kick.
Sorry, but I don't follow this. What other persistence mechanism do you
think is necessary in the approach I advocate? All I am doing is
decoupling the problem solution from the persistence mechanism de jour.
>> Fortunately, that is usually easy for data, especially for paradigms
>> like the RDM that are designed to provide generic identity. Half the
>> work is done if the DBMS schema is available to the application. So
>> one just needs a bunch of table lookups to map data packet elements
>> into Table/Field identifiers to construct SQL strings. Those lookup
>> tables get defined from external configuration data that maps the
>> interface messages into the RDB schema.
>>
>> The price of this is encode/decode of message data packets on each
>> side of the interface. For a SQL RDB, that effectively means
>> duplicated dataset encode/decode. Fortunately DB access is in
>> milliseconds while encoding/decoding data packets is in microseconds
>> so nobody is likely to notice anything except the developer's extra
>> keystrokes, which don't count in the overall scheme of things.
>
> It may be early in the AM (for me) but I'm not following what your
> talking about above.
Which paragraph is the problem? (I don't want to elaborate on both if
only one is a problem.)
>> When there actually is a 1:1 mapping between the solution and DB
>> views, that message indirection is redundant. But that decoupling
>> pays huge bonuses when the mapping is not 1:1, when the mapping
>> changes in the future, and for large scale reuse across applications.
>> The thing is that once one is outside the CRUD/USER realm when solving
>> some customer problem, one can't know when the mapping will be 1:1 or
>> when it might change.
>>
>> So one should solve the customer problem without worrying about what
>> the DB looks like. If some objects happen to map 1:1 that is
>> transparent because the interface is designed to the solution's data
>> needs (e.g., "Save this pile of data I call 'X' and give it back to me
>> when I ask for 'X' later"). IOW, one should be able to solve the
>> customer problem and design the DB access interface without knowing or
>> caring how the stored data is ultimately organized.
>
> But the customer problem can be solved without using OO. It can be
> solved in FORTRAN, COBOL, C, LISP, or JavaWOO (much of the Java I've
> seen isn't very OO at all so I wanted to include it in my list of non-OO
> languages--Java Without Object Orientation). The customer's problem and
> the DB are constants. Only languages and idioms lie between and may be
> interchangeable. In fact, if the DB domain requirements are consistent
> (its domain-API expressed through procedures) then suitable applications
> can be written in multiple languages without compromising the DB's design.
The first sentence is why I don't think encapsulating persistence access
in a subsystem is an OO issue. OO just provides useful conceptual
terminology like 'encapsulation' and 'decoupling'. It is basic
modularization and I've done the same thing in FORTRAN and PL/I long
before OO. [Tougher to do in COBOL because the notion of 'record'
pretty much married data structures to the database view. B-)]
For the rest, the customer's problem requirements don't say anything
about persistence mechanisms; they just define what data needs to be
persisted, when it should be stored, and what the data integrity rules
are. Persistence paradigms, technologies, and mechanisms are pure
computing space issues and those <nonfunctional> requirements are
defined by Systems Engineering. Which is another reason for separating
the concerns. B-) Similarly, the Data Modeling the DBA uses to define
RDB schemas is an exercise in database design that is quite distinct
from a particular application's problem solution.
>> At another level, I don't see how you can advise not embedding SQL in
>> the problem solution without providing the indirection of a
>> message-based interface.
>
> Use stored procedures. No mapping necessary.
Alas, this is a Major Hot Button for me. I think stored procedures are
one of the most abused mechanisms in IT. They are a maintenance
nightmare if they are triggered by the DBMS or call one another. (I
assume in your case they are triggered only by the application, but I
think there are still potential problems.)
It depends on what is in your stored procedures. If they are devoted
solely to accessing the database, then fine. The stored procedures are
effectively providing a generic API to the persistence mechanisms. IOW,
when your application invokes the stored procedure via a method call, it
is just sending a message to the Data Layer (or my DB access subsystem).
However, I think that is a slippery slope because it is tough to provide
an interface at that level of abstraction that does not reflect the DB
view of data. That is, it will be very tempting to make the granularity
of the interface map into individual queries or table accesses. As soon
as the interface maps the DB view rather than the solution's needs, I
think one gets into trouble with the DB driving the solution.
OTOH, if you define a stored procedure like getDataPileX where the
actual data collection returned can come from an arbitrary number of
tables (i.e., the stored procedure defines the necessary join) and the
returned values can be distributed to an arbitrary set of objects'
attributes on the solution side, then that is pretty much what I am
advocating. That is, the interface is designed to the problem
solution's needs.
However, when each stored procedure accesses the DB directly I still
think it would sacrifice benefits. One is decoding the returned values.
If they come back as a predefined server dataset, then one is not
fully decoupled from the mechanisms because one must decode that
particular format _within the problem solution_.
Another problem I think would be redundancy. If each stored procedure
constructs its own unique joins and SQL queries, one is repeating a lot
of low level processing. If that same interface were to a subsystem,
then one would only need one code set in the subsystem implementation to
process any query or join. IOW, at the subject matter's level of
abstraction, all queries and joins are processed the same way through
the same executable statements
Yet another problem is global optimization. With stored procedures at
the getDataPileX level that talked directly to the DB, it would be
tougher to provide global optimizations like anticipatory caching. But
that is relative easy to do within a subsystem where one has local state
variables and other infrastructures.
>> It will also be tempting to bend the solution around those surrogates,
>> which can lead to a less optimal and maintainable solution. That is,
>> if one has a preconceived notion of what the data looks like, one will
>> be tempted to build the solution around that view. So the generic, ad
>> hoc tail is wagging the solution dog.
>
> But preconceived notions exist on both sides of the interface. As the
> saying (almost) goes, two preconceived notions don't make a right
> (notion). The database must be language neutral--but it can't be domain
> neutral. If it were domain neutral then we'd be implying its design
> didn't support the domain--which would defeat the purpose, no?
I think this comes down to the difference between a customer business
domain and a problem space.
The views on each side of the interface are tailored to their specific
subject matters. IOW, one is abstracting solutions from different
problem spaces. Each subject matter has its own unique functional
requirements that do not overlap. Those solutions need to be abstracted
differently to provide proper optimization in each problem space.
While the problem solution and the data model both have their roots in
the same customer domain, they are quite different. The data model is
restricted to be a generic static view that is suitable for ad hoc
access (i.e., problem-independent access). The problem solution is
inherently dynamic in nature and needs to be tailored to a specific
problem context.
>> IOW, I think maintainability will be maximized when the subject matter
>> concerns are clearly decoupled. The strongest way to do that is by
>> encapsulating the DB access in a subsystem behind a pure message-based
>> interface.
>
> If you believe message-based interfaces are valuable, I propose TOA/TOP
> is a more faithful realization of it.
Message-based interfaces are a necessary condition for decoupling
subject matters. I think the real issue here is separating subject
matters in the first place.
In that case then SQL separation is "bad" because it often forces you
to visit 2 different places to make changes: the app code and the SQL
area. If they are together in the same module, then you don't have to
do the Separation Texas Two Step. No hopping around = saved time.
What is the alleged bad scenario that separation protects us from?
> >
> >
> >> and emphasizing low coupling between modules.
> >>
> >
> > I've found "coupling" to be a rather subjective property such that it
> > is a nearly useless catch-phrase these days. (That is until the point
> > somebody defines it clearly and not tied to subjective criteria, such
> > as what app part is more likely to change.)
> >
> Coupling is a relative term, not subjective--similar to a noise ratio.
> Coupling describes the interdependence between two related objects. Low
> coupling is realized when significant changes can be made to one object
> without affecting the other. High coupling indicates changes can not be
> made to one without requiring changes to another.
Okay, but often there are multiple "axises" by which something can
change and not all changes are necessarily an equal frequency. It
usually turns into a case of balancing and weighing alternatives, not a
slam-dunk single right choice. OO fans have a distorted view of change
probabilities in my observation, driven by books that keep pointing out
change patterns that OO is good at, ignoring those it is bad at.
> In the case of OO and
> DB, when a mapping exists between the two, the nearer that mapping
> approaches 1:1 the more tightly coupled the application and DB are. In
> a 1:1 mapping nothing in the DB can be changed without affecting the
> application, nor can anything be changed in the application data model
> without affecting change in the DB.
Can you present a specific scenario? If you don't meta-tize the
elements (such as in a data or field dictionary table, or the ugly fad
of XML configs), then such would often be needed if one is dealing with
the same thing. If the Social Security Number (SSN) is stored in the
database, but the app also uses the SSN, then there is likely to be an
unavoidable coupling there.
>
> --
-T-
Technically you are correct. However, a database is already a detailed
model of the domain at hand. I often learn more about a domain from
looking the schema and samples of its data than from apps or other
means. If you ignore this instead of piggy-back on it, then you will
end up duplicating a lot of effort. By staying close to the DB you gain
a lot of domain abstractions that were discovered the hard way over the
years.
Plus the DB can offer abstraction tools for specific apps also, such as
views.
Most of the complexity of a biz can be modelled in the DB if you know
how. Behavior then becomes the real "implementation detail", not the
data, as you view it. Most behavior can be factored into attributes
and relational operations on these attributes.
RDBMS are not "storage"; they are domain abstraction tools. They are
"domain attribute managers", not "storage". Disks are for storage and
DB != disk. Storage is just one of many features a DB usually offers.
Until you realize this, you will always keep searching for ways to wrap
away relational into a dark corner and reinvent its features in OO the
hard way.
>
> Note that the CRUD/USER environments already provide exactly that
> encapsulation by providing a Data Layer that is isolated from the rest
> of the application through an interface. IOW, providing that
> encapsulation is a fundamental element of CRUD/USER structure. I'm just
> generalizing beyond CRUD/USER. It is just basic modularization that
> really doesn't have much to do with the OO paradigm. The OO paradigm
> only enters the picture through specific mechanisms for the interface
> like DAO and inheritance composition.
Large hiearchical taxonomies are nearly useless in my domain. Set
Theory is a much better tool for classifying loads of stuff in my
domain, and OO is anti-Set because it is pro-pointer and pro-tree
(navigational). This is not really a battle over databases versus OO,
but of sets versus navigational. Navigational results in shanty-town
messes. Perhaps you are "free from the DBA", as you describe above, but
you are not free from the mess you created. Some DBA's are pain in the
bleep, but usually they are just trying to enforce consistency and
system integrity rather than have wild cowboy OO-class picasso's like
you F things up.
OO is jungle modeling, void of civilization.
(I snipped the rest because it probably could not be settled without
studying specific scenarios.)
>
> H. S. Lahman
-T-
I guess I'm thinking we should design as simply as possible. For OO to
shine does it really require something more flashy than CRUD/USER apps?
If those apps are so simple to write then we should all strive to make
all our apps CRUD/USER apps.
I can imagine darn few applications that aren't CRUD/USER. But perhaps
that's because I think of everything as being a transaction. Before you
do anything to /my/ database you had better have logged in first and had
a valid session. Want to query the DB? You'd better have a valid
session so I can record what you did and that query had better have a
stored procedure to implement it. Want to change something? You'd
better have a valid session so I can record that, too, with a procedure
to match. What are you trying to do? It had better be a transaction I
already know about so I can record it properly and make sure your
session gets credit (or blame).
When my auditors show up they love me. I can tell them who did what
when. My users love me 'cause I can tell them what something looked
like before they made the change, last week, last year, or from epoch.
If they need to correct something it requires another transaction--also
attributed to the user that made the fix, what they did to fix it, and
when they fixed it. If there isn't a transaction to fix what got
screwed up we create one ('cause that won't be the last time that
happened) or on rare occasions (and critical) my programmers fix it (but
leave an audit trail in our bug-tracking system).
Our batch programs use the same transactions as our users do. My
security system is implemented at the transaction level. A transaction
processor cares little where a transaction comes from. It can be batch,
web, ATM, phone, fat client, web service, or script. Doesn't matter.
Embedded applications may not fit this model, but I'm unsure 'cause I've
never written one. Maybe they do and I'm thinking too provincially.
>
> I submit that encapsulation of the database mechanisms is always
> necessary except for trivial applications. It is basic separation of
> concerns. The problem solution doesn't care if the data is stored in
> flat files, an RDB, an OODB, or clay tablets. The problem solution
> should not have to know about mechanisms like SQL query construction,
> optimizations like anticipatory caches, or encoding/decoding of
> dataset formats.
My goal is to make every program as trivial as possible--but not too
trivial.
<http://blogs.in-streamco.com/anything.php?title=programming_rules>
That may or may not be significant. Perhaps you can describe a
non-trivial example?
>
> <snip>
> I am advocating inserting an interface that separates the problem
> solution from the persistence access mechanisms. There is nothing new
> in this. The existing RAD layered models already do exactly what I am
> advocating. Consider the classic model:
>
> Presentation
> --------------
> Business
> --------------
> Data
>
> In the RAD world the Data Layer actually has two pieces; one on the
> client side and the rest on the server side. That division just gets
> hidden because all the networking is hidden. At a minimum one has to
> link in a bunch of infrastructure modules into the application for the
> Business objects to be able to talk to the server. When the computing
> environment technologies change, one just links in a new set of
> infrastructure modules.
Our goal is to have something like this:
Presentation (where)
-------------
Business (why)
-------------
Transaction server/Security (who & what)
-------------
Stored Procedures (how)
-------------
DB (don't know what to put here)
>
> Now some layered models take this a step further and have more layers.
> In such models the client-side and server-side Data Layers are
> explicitly separated. In that case the client-side Data Layer has the
> responsibility for mundane tasks like forming SQL queries and
> encoding/decoding SQL datasets. What I am talking about is analogous
> to that layer that decouples the Business Layer through an interface
> so the Business objects don't have to know about the specific mechanisms.
That's where I won't go with you. We had a little bit of SQL that
snuck-through into production and it assuredly bit us in the butt as
soon as we needed to make a minor change to the DB and discovered I need
to ship a new program along with it because its SQL require changing.
For topmind, that's too-tight coupling.
Life has improved since that SQL was replaced with a stored procedure
call whose implementation can change independent of the programs that
use it.
>
> <snip>
>
>> If I'm right in understanding what you're saying (and what countless
>> vendors, analysts, and pundits sell, present, and report), this is
>> exactly where our designs part company and where Transaction Oriented
>> Processing (or TOA--but I dislike the word architecture when used
>> with software) proposes an alternative, thinner, and simpler model.
>
> It's thinner and simpler because it doesn't separate concerns.
> Failing to separate concerns means that one must touch the solution
> logic whenever something changes in the persistence realm and that is
> a potential maintainability problem.
You're not including your abstraction layer as part of your "solution
logic." I'm curious if your abstraction layer is independent enough
that when you change tables or column names in the DB there's nothing
outside the DB to change?
We're both trying to do that, aren't we?
When I wrote that paragraph it made perfect sense to me. :-)
Weren't you advocating a 1:1 mapping between the DB and the application
(even if through another layer)? That's the symmetry I was referring
to. Believing that mapping between the DB and any abstraction layer be
at or near 1:1 assumes first that a mapping exist and second that a 1:1
mapping is some kind of goal. The second goal assumes that first, and I
believe the first is ill-founded.
>
>>> Fortunately, that is usually easy for data, especially for paradigms
>>> like the RDM that are designed to provide generic identity. Half
>>> the work is done if the DBMS schema is available to the
>>> application. So one just needs a bunch of table lookups to map data
>>> packet elements into Table/Field identifiers to construct SQL
>>> strings. Those lookup tables get defined from external
>>> configuration data that maps the interface messages into the RDB
>>> schema.
>>>
>>> The price of this is encode/decode of message data packets on each
>>> side of the interface. For a SQL RDB, that effectively means
>>> duplicated dataset encode/decode. Fortunately DB access is in
>>> milliseconds while encoding/decoding data packets is in microseconds
>>> so nobody is likely to notice anything except the developer's extra
>>> keystrokes, which don't count in the overall scheme of things.
>>
>> It may be early in the AM (for me) but I'm not following what your
>> talking about above.
>
> Which paragraph is the problem? (I don't want to elaborate on both if
> only one is a problem.)
Actually, both (sorry). I don't understand what DB identity has to do
with half the work or look-up tables. Going from that to
encoding/decoding messages left me in the dust.
>
> <snip>
>
>>> At another level, I don't see how you can advise not embedding SQL
>>> in the problem solution without providing the indirection of a
>>> message-based interface.
>>
>> Use stored procedures. No mapping necessary.
>
> Alas, this is a Major Hot Button for me. I think stored procedures
> are one of the most abused mechanisms in IT. They are a maintenance
> nightmare if they are triggered by the DBMS or call one another. (I
> assume in your case they are triggered only by the application, but I
> think there are still potential problems.)
True--they are called by the application. Our system has only a single
trigger. I don't like them--not explicit enough.
>
> It depends on what is in your stored procedures. If they are devoted
> solely to accessing the database, then fine. The stored procedures
> are effectively providing a generic API to the persistence
> mechanisms. IOW, when your application invokes the stored procedure
> via a method call, it is just sending a message to the Data Layer (or
> my DB access subsystem).
Our procedures (I'm unsure if you'll think this is good or bad) are the
lowest-level of our business logic. To update an account a procedure
must be called with your current session ID, an account number, a
transaction type (ie deposit, withdrawal), and a few other params. The
procedure knows how to update the account table, the daily account
balance table, the transaction history tables, validate the tran type,
and a few other miscellaneous housekeeping functions. Things that any
application that may want to execute an account transaction would have
to do--regardless where they came from or what language they came from.
Our procedures, our about /how/ things happen--not why.
Not necessarily. I tend to use a fair amount of SELECT asterisks if
there is not a large volume of result records. Plus, often the app will
change also if the schema changes such that we will be visiting the app
code such that we only have one module to change. Otherwise, you don't
need to change the SQL if the apps aren't using a new column. See
below.
> > <snip>
> >> In the case of OO and
> >> DB, when a mapping exists between the two, the nearer that mapping
> >> approaches 1:1 the more tightly coupled the application and DB are. In
> >> a 1:1 mapping nothing in the DB can be changed without affecting the
> >> application, nor can anything be changed in the application data model
> >> without affecting change in the DB.
> >>
> >
> > Can you present a specific scenario? If you don't meta-tize the
> > elements (such as in a data or field dictionary table, or the ugly fad
> > of XML configs), then such would often be needed if one is dealing with
> > the same thing. If the Social Security Number (SSN) is stored in the
> > database, but the app also uses the SSN, then there is likely to be an
> > unavoidable coupling there.
> >
> I'm not concerned about single fields like SSN. I'm more concerned with
> a mapping of an Account class to an account table and their fields. If
> the account table changes or some of its lesser-used columns moved to
> another table then I have a maintenance problem. It matters little to
> me if I'm editing XML configs, Java code, or anything else. Any update
> that can be localized to the DB only, the happier I am--especially if I
> don't have to build/ship/system-test a bunch of software.
I am still not understanding your scenario and why or why not it would
require an app change. Since I don't use classes, if a DB change does
not affect a field that an app or module uses, I don't have anything to
change. Perhaps Classes are your problem, not databases. blOOt at work.
If your classes have to mirror the DB out of OOP purity or obsession
with compile-time checking, then you have committed yourself to a
wasteful dance. My app usually does not change if it does not actually
*use* a given field.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
Every select from your account table is now broken.
Now, some people may think the change unwarranted or not worth the cost
of updating the applications's SQL. Of course, if everything were
behind a stored procedure it wouldn't be a problem cause an procedure's
implementation can change without changing its interface (calling
parameters or result columns)
Others may think a view is just as handy--which is almost true. Except
that if I want to record what everyone does in a transaction history
table I need to do other MACRO things a view doesn't allow for. Which
is fine 'cause I can use stored procedures.
I am not sure why a view would not work.
Let me see if I got this strait: You removed the "current" summary
table because we decided to sum as-needed instead, using a new daily
table (a more fine-grained summary). Correct?
Then the summary table can be turned into a view with the same columns
as the no-longer-used Current_Summary table. Recording every
transaction is a separate issue, isn't it? The daily table gets its
info from per-day sums of individual transactions, but the changed
queries don't have to know about that, correct? If a few queries need
direct access to the daily or individual transaction table, then they
do. Just because a few stragglers cannot use views does not mean the
majority can't. I've noticed a roughly 9-to-1 ratio of reading queries
to writing queries in most shops. The drawbacks of views will thus only
extend to about 10% of all queries that use the affected tables on
average.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
Additionally, there's no transaction history on SELECT statements
(unless you wanted to enable security auditing.. but that's nasty). How
do you make sure the requesting user has permission to see the view?
How do you record their query? How do you make sure they only see
accounts their allowed to see? Those features aren't present in views
'cause views don't have parameters (like user ID etc--unless you've
added application user IDs to the database).
In our system's we'd have something like p_account(@sessionKey,
@accountKey). The procedure would be able to verify permissions through
the sessionKey (find the user, which party they belong to, which
accounts belong to them, and if they have READ access), can record the
transaction (sessionKey, transaction type, accountNumber), AND bury its
implementation behind its interface (procedure name and parameters) so
that changes to the account table, security, or transaction logging
won't affect application code.
Implementing our system with a TOA/TOP design has afforded us unusual
responsiveness to fixes and enhancements--many able to wind their way
through development, QA, and into production within hours of the
request--with minimal changes to applications. The fewer changes we
make the faster and more confidently we can move to production.
However, relational *does* allow something to be two "places" at once;
it transcends physical relationships and 3D space. As long as the
attributes or relationships are there to *identify* different aspects,
different aspects can be searched, sorted, or presented however the
h3ll one pleases. I am not stuck with Bill Gate's physical grouping or
Scott McNeally's.
Thus, if you "mark off" where SQL is to make it easier to find and
change, you could potentially group and edit all SQL together on your
screen if needed. (Or perhaps all SQL matching keywords sought.)
The problem is that OOP and files don't easily supply this point of
view. In the future hopefully code will be managed in RDBMS at a
function/method granularity, if not smaller, instead of hierarchical
directories and files at a module level.
But first Pinky and I have to convince the world that tables are
better....
-T-
Johny Appleseed was a "troll" too.
Why? What is wrong with using the same name? Some DBA's say that one
should put "_VW" in all view names, but that is the same debate as the
hungarian notation debate, and that is a long on-going debate that I
don't want to get into again. I am generally against hungarian
notation, but I am not zealotic against mild usage.
A global search-and-replace of just table names is not that difficult
anyhow. If your app is so important/fragile that such might brake
something, then consider unit testing. Unit testing is generally
recommended for big financial apps anyhow.
>
> Additionally, there's no transaction history on SELECT statements
> (unless you wanted to enable security auditing.. but that's nasty).
Are you saying that your DBMS cannot log view-based query access? If
so, that is a flaw/shortcomming of the DB product, not of relational in
general. Some OO'ers suggest they like OO because even though one ends
up reinventing a lot of DB-like features in OO app code, at least they
have control over it such that they can add missing features. But that
is almost like saying one should write their own word-processor because
MS-Word has bugs. There are open-source RDBMS such as PostGre that
permit one to alter the source code if needed, if you want to go that
route.
> How
> do you make sure the requesting user has permission to see the view?
> How do you record their query? How do you make sure they only see
> accounts their allowed to see? Those features aren't present in views
> 'cause views don't have parameters (like user ID etc--unless you've
> added application user IDs to the database).
My comments above apply. If you need that tight of control, perhaps
replace the SQL of the affected queries with stored procedures, which
allow custom coding of logging and access control. I generally don't
recommend stored procedures, but for tighter control (at the expense of
more busy-work) they may work better for you.
>
> In our system's we'd have something like p_account(@sessionKey,
> @accountKey). The procedure would be able to verify permissions through
> the sessionKey (find the user, which party they belong to, which
> accounts belong to them, and if they have READ access), can record the
> transaction (sessionKey, transaction type, accountNumber), AND bury its
> implementation behind its interface (procedure name and parameters) so
> that changes to the account table, security, or transaction logging
> won't affect application code.
>
> Implementing our system with a TOA/TOP design has afforded us unusual
> responsiveness to fixes and enhancements--many able to wind their way
> through development, QA, and into production within hours of the
> request--with minimal changes to applications. The fewer changes we
> make the faster and more confidently we can move to production.
I see no reason why a flexible RDBMS cannot do the same. Switch to
PostGre, the most powerful open-source RDBMS, and add features that it
does not have that you need.
Further, whatever wrapping you suggest for SQL, procedures can do the
same. OOP is not required to wrap SQL. I do function-based wrapping all
the time for commonly-used SQL activities. Thus the "should wrap SQL"
and "OOP wraps better" issues/claims are generally different issues.
I'll believe OOP wraps better than procedural only when I see
side-by-side comparisions of typical, common, and representative
scenarios.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
I'm a little confused about what aspect of the OP's posted article you
disagree with. The article mentions OO design principles, but does not
appear to advocate wrapping SQL in OO constructs, or even using any OO
constructs at all. What the article /does/ suggest is the use of a set
of stored procedures to act as an API for application programmers
writing code against the database. With a well-defined API in place,
you can change the applications and the database independently of one
another as long as the API stays the same.
That sounds like a solid approach to me by itself. The article does go
farther than that of course, leveraging the stored procedures to
implement detailed auditing within the database. But I don't understand
your objection to keeping an audit history in database tables either.
If you need an audit trail, and all access to the DB is through the
stored procedure API, adding audit functionality would be rather easy
to do. Are you suggesting that it would be /less/ work to compile a
custom version of Postgres?
- Matt McGill
Which are subject to debate and/or overlap with non-OO-specific design
principles also.
> but does not
> appear to advocate wrapping SQL in OO constructs, or even using any OO
> constructs at all. What the article /does/ suggest is the use of a set
> of stored procedures to act as an API for application programmers
> writing code against the database.
This is still "wrapping". Whether that is an "OO design principle"
depends on who you ask. Perhaps this is a disagreement over semantics.
OOP/OOA/OOD is not defined very consistently. However, it appears to
be a general "OOP habit" to wrap anything that is not OOP behind
classes, or perhaps stored procedures as a consolation.
> With a well-defined API in place,
> you can change the applications and the database independently of one
> another as long as the API stays the same.
Yes, but it is not a free lunch, as already described. I weigh the
maintenance trade-offs based on past experience, and often decide that
stored procedures (or wrapping SQL in general) is not warrented per
case or per app.
Experienced developers weigh the trade-offs, not stick to rigid dogma
that X must *always* have Y, etc. (Some call me dogmatic against OOP,
but it is just that OO being the best solution is too small a
percentage of most apps to bother introducing yet another paradigm into
an app, complicating staffing issues.)
>
> That sounds like a solid approach to me by itself. The article does go
> farther than that of course, leveraging the stored procedures to
> implement detailed auditing within the database. But I don't understand
> your objection to keeping an audit history in database tables either.
Where did I object to that?
> If you need an audit trail, and all access to the DB is through the
> stored procedure API, adding audit functionality would be rather easy
> to do. Are you suggesting that it would be /less/ work to compile a
> custom version of Postgres?
>
> - Matt McGill
-T-
Why sprinkle SQL around, especially if its redundant? That coding style
shows no appreciation for the lessons learned designing relational
databases (unless those, too, were rife with redundancy)
I thought of another reason overnight not to sprinkle SQL around --
performance. I can tune a procedure and everyone using it benefits.
How can I make sure everyone's SQL is efficient? If table statistics
change and old SQL sucks wind, am I go to hunting for it? No, I think
the value the burying or generating SQL hither and yon throughout an
application is sufficiently discredited.
What maintenance trade-offs? The only trade-off that seems to apply
here is time vs. flexibility. Introducing stored procedures gives you
more flexibility, and adds the time overhead of actually creating the
stored procedure. Except that I'm not convinced there's any time
overhead for anything more complicated than an isolated select, insert,
or update. A database transaction (not a transaction as described in
the article being discussed) often involves multiple selects and
inserts/updates, with various sanity checks mixed in. Not all
programming languages make DB access easy, and not all DB apis allow
you to make use of all the vendor-specific functionality present in any
given database product. The time you'd waste writing DB access code in
C will more than account for the time it takes to define a stored
procedure, and once you have the procedure, calling it from your
language of choice is simple.
>
> Experienced developers weigh the trade-offs, not stick to rigid dogma
> that X must *always* have Y, etc. (Some call me dogmatic against OOP,
> but it is just that OO being the best solution is too small a
> percentage of most apps to bother introducing yet another paradigm into
> an app, complicating staffing issues.)
>
> >
> > That sounds like a solid approach to me by itself. The article does go
> > farther than that of course, leveraging the stored procedures to
> > implement detailed auditing within the database. But I don't understand
> > your objection to keeping an audit history in database tables either.
>
> Where did I object to that?
Here:
>> Additionally, there's no transaction history on SELECT statements
>> (unless you wanted to enable security auditing.. but that's nasty).
>
> Are you saying that your DBMS cannot log view-based query access? If
> so, that is a flaw/shortcomming of the DB product, not of relational in
> general. Some OO'ers suggest they like OO because even though one ends
> up reinventing a lot of DB-like features in OO app code, at least they
> have control over it such that they can add missing features. But that
> is almost like saying one should write their own word-processor because
> MS-Word has bugs. There are open-source RDBMS such as PostGre that
> permit one to alter the source code if needed, if you want to go that
> route.
You are shadow boxing. The OP's need for logging has nothing to do with
OO, and OO techniques were not proposed as the solution. The RDBMS
likely can log query access, but who's to say that the logs it produces
are easily manipulatable programmatically? Modifying the source code of
an RDBMS to add functionality that can be easily obtained via stored
procedures is ridiculous, and casts a dubious light on all of your
conclusions regarding 'maintenance trade-offs.' I can think of few
maintenance nightmares quite like maintaining your own special branch
of Postgres for a few app-specific requirements. That's a way to get
yourself fired.
-Matt McGill
>>> I sense, perhaps incorrectly, a disparaging view of CRUD/USER
>>> applications. You comments in both this thread and the thread
>>> "Relational Database & OO" seem to indicate an opinion that CRUD/USER
>>> applications are too simple to be representative of sophisticated OO
>>> designs.
>>
>>
>> Not disparaging. CRUD/USER processing is a major segment of IT and it
>> is going to be around for as long as /people/ analyze gobs of data
>> through pattern recognition. But CRUD/USER processing is not a good
>> application for the OO paradigm because ...
>
> <snip>
>
> I guess I'm thinking we should design as simply as possible. For OO to
> shine does it really require something more flashy than CRUD/USER apps?
> If those apps are so simple to write then we should all strive to make
> all our apps CRUD/USER apps.
Remember that for CRUD/USER applications the real customer problem is
solved by the software user who analyzes the presented data, not the
software. The CRUD/USER pipeline application just allows the data to be
presented in a manner that makes that analysis easier for the user.
OTOH, once one is outside the realm of CRUD/USER the software itself
must solve business problems. IOW, the user is presented with the
/results/ of analysis rather than just the <reorganzied> data that
supports analysis.
So long as the business rules are simple, one can tweak CRUD/USER
pipelines to accommodate them. Most of the classic accounting systems
(GL, accounts payable, accounts receivable, etc.) were done pretty much
this way. One can argue that the entire notion of transaction
processing in MIS was the result of casting financial problems into
CRUD/USER data management pipelines. That's because the heaviest duty
business rules are things like allocations. IOW, one performs a few
very simple operations repeatedly one vast gobs of data.
You will note, though, that today all those systems can be bought OTS
because they /do/ have simple business rules, those rules were highly
standardized, and they tended to be independent. So they were easy to
automate in a generic fashion.
But that doesn't work when the solution is big and complex. Allocating
an advertising budget optimally among national, regional, and local
media markets based upon complex cost functions and demographics is not
the kind of problem one can solve with transactions. That kind of
optimization problem lives in the Operations Research realm.
As an example from my IT Days, inventory control for a grocery chain is
actually a very tough problem because they run on razor thin margins.
The difference between turning over inventory in 8 days vs. 9 days is
bankruptcy. B-) When forecasting one has to account for things like
increased product volume due to promotional discounts and the legendary
September Bulge in New England. [Grocery sales in New England for the
two weeks following Labor Day are almost twice as great as any other
time of the year. The chains actually pay railroad demurrage to store
goods in freight cars just before Labor Day because they don't have
enough room in their warehouses.]
But forecasting demand using highly sophisticated algorithms is just one
part of the problem. The inventory control application also has the
optimize shipments to individual stores where each shipment involves
hundreds of distinct item types. Just the problem of figuring out what
will fit into the trucks is nontrivial. Every day a Fortune 500 food
retailer needs to solve logistic problems equivalent to those for the
amphibious invasion of Inchon during the Korean War.
Sure there is a mongo database and every day a gazillion transactions
are run against it. But that is secondary to the real problems that the
IC application is actually solving. Those problems are not solved by
somebody communing with data on a GUI display or in a 4' thick report.
The data entry (e.g., reading cash register OCRs or buyers entering this
week's discounts) is handled pretty much like typical CRUD/USER
processing, but that is where the similarity ends. The IC application
has to figure out when vendor orders are produced, what is ordered,
where it is put in each warehouse, what is allocated to each store, how
to load/route trucks, and when to ship to the stores. When all that is
done, it has to crunch numbers so Management will know how much money
they lost that day. There is just no way to do all that within the
constraints of a CRUD/USER layered model.
>
> I can imagine darn few applications that aren't CRUD/USER. But perhaps
> that's because I think of everything as being a transaction. Before you
> do anything to /my/ database you had better have logged in first and had
> a valid session. Want to query the DB? You'd better have a valid
> session so I can record what you did and that query had better have a
> stored procedure to implement it. Want to change something? You'd
> better have a valid session so I can record that, too, with a procedure
> to match. What are you trying to do? It had better be a transaction I
> already know about so I can record it properly and make sure your
> session gets credit (or blame).
Fine. But this sounds like all you are doing is providing the user with
data. What problem is solved in the Business Layer?
If all one is providing are data entry validations or convenient
displays of the database data, one is in the CRUD/USER realm. But if
whoever is encoding the Business Layer solution has so much to worry
about that they don't even want to think about how the data is stored or
how one talks to the user (e.g., browser vs. GUI), then one isn't in
Kansas anymore.
>
> When my auditors show up they love me. I can tell them who did what
> when. My users love me 'cause I can tell them what something looked
> like before they made the change, last week, last year, or from epoch.
> If they need to correct something it requires another transaction--also
> attributed to the user that made the fix, what they did to fix it, and
> when they fixed it. If there isn't a transaction to fix what got
> screwed up we create one ('cause that won't be the last time that
> happened) or on rare occasions (and critical) my programmers fix it (but
> leave an audit trail in our bug-tracking system).
That seems quite reasonable for a data entry system. But if that is all
you need to describe what the user was doing to solve some problem, then
you are in the CRUD/USER realm because the only problem being solved
/is/ data entry. IOW, the user already solved the business problem by
providing the right data.
> Our batch programs use the same transactions as our users do. My
> security system is implemented at the transaction level. A transaction
> processor cares little where a transaction comes from. It can be batch,
> web, ATM, phone, fat client, web service, or script. Doesn't matter.
That's fine, but it is just basic application partitioning. What is a
transaction? It is basically a message with {message ID, <data packet>}
where the data packet contains data values. And that sort of
message-based interface is exactly what I advocate for decoupling
subsystems.
My concern here lie in maximizing the decoupling by making sure the
interface transactions are based on the solution's needs rather than the
DB's needs. IOW, I want the subsystem interface to reflect the abstract
semantics of the service in the business context, not its mechanisms and
internal structure.
> Embedded applications may not fit this model, but I'm unsure 'cause I've
> never written one. Maybe they do and I'm thinking too provincially.
Almost all R-T/E applications employ event-based interfaces. An event
is just an asynchronous message with a by-value data packet. Aka, a
transaction. Remember that modularization via event-based interfaces
grew up in R-T/E in the early '50s. So one can argue that the modern
view of MIS transaction processing just co-opts the original R-T/E view
of modularization from long before disks, much less RDBs.
>> I submit that encapsulation of the database mechanisms is always
>> necessary except for trivial applications. It is basic separation of
>> concerns. The problem solution doesn't care if the data is stored in
>> flat files, an RDB, an OODB, or clay tablets. The problem solution
>> should not have to know about mechanisms like SQL query construction,
>> optimizations like anticipatory caches, or encoding/decoding of
>> dataset formats.
>
> My goal is to make every program as trivial as possible--but not too
> trivial.
> <http://blogs.in-streamco.com/anything.php?title=programming_rules>
> That may or may not be significant. Perhaps you can describe a
> non-trivial example?
I buy KISS as a goal, but not a mandate. If simplicity (i.e., compact,
elegant, quickly developed) was the only goal we would all be doing
functional programming. The OO paradigm itself intrinsically results in
verbose programs because of the heavy reliance on static structure.
I don't have any substantive argument with the referenced blog posting.
[I could argue that eliminating parsing is tough in a world full of
interoperability standards like XML and that polling, while avoided by
R-T/E people, pretty much comes with the territory in some form whenever
processing is inherently asynchronous. With distributed processing,
interoperability, multi-tasking, multi-user systems, and whatnot IT is
beginning to look more and more like R-T/E every day.]
As far as an example is concerned, I am not sure what you want an
example of. That one can specify a non-SQL interface for a DB access
subsystem? Or that one can define the invariants of SQL query
construction within such a subsystem? Of the mapping between the
interface and those invariant abstractions. In any event, I got an
example below that may be relevant.
It seems to me this is more of a deployment model than a logical model.
That is, any business rules and policies in the stored procedures
should be logically associated with the Business layer while any DBMS
access rules and policies should logically be associated with the server
layer. IOW, it really doesn't matter if one stores a procedure in a
DBMS, a DLL, function library, or whatever; that seems like a pure
deployment issue.
My real issue with the CRUD/USER layered models is that they were
devised as a convenient framework for CRUD/USER processing where one has
1:1 mapping between the UI and the RDB. They tend to break down when
most of what the application does lives in the Business layer so that
one wants to view the UI and persistence as services to the solution
(i.e., Presentation is a low-level peer of the server). IOW, the
standard model outside the CRUD/USER realm is:
----------------------
Application subsystems -- high level control specific to the problem
----------------------
Service subsystems -- intermediate, reusable business logic
----------------------
Implementation subsystems -- those specific to the computing environment
----------------------
Architectural subsystems -- specific technology support
----------------------
Abstraction increases upwards while large scale reuse increases
downwards. Each subsystem represents a unique, well-bounded subject
matter from some problem space and it will address a unique suite of
functional requirements. In addition, all of the subsystems are related
through a directed acyclic graph of dependencies where the higher level
subsystem defines requirements on the lower level subsystems. IOW,
dependencies are defined in terms of requirements flows rather than
communication flows (another big difference with the CRUD/USER models).
All interfaces between subsystems are pure message-based data transfer
interfaces (aka transactions).
Typically UI and persistence access subsystems would be peers somewhere
within the implementation "layer". Architectural subsystems would
provide stuff like network protocols, hardware control, OS services, etc.
Again, let me emphasize that my problem with CRUD/USER organization is
not with it per se; it is about its inherent limitation to certain kinds
of processing. Within that niche it provides an excellent basis for
automation. And to do that the CRUD/USER layered infrastructures employ
very similar encapsulation and decoupling to what I advocate.
>> Now some layered models take this a step further and have more layers.
>> In such models the client-side and server-side Data Layers are
>> explicitly separated. In that case the client-side Data Layer has the
>> responsibility for mundane tasks like forming SQL queries and
>> encoding/decoding SQL datasets. What I am talking about is analogous
>> to that layer that decouples the Business Layer through an interface
>> so the Business objects don't have to know about the specific mechanisms.
>
> That's where I won't go with you. We had a little bit of SQL that
> snuck-through into production and it assuredly bit us in the butt as
> soon as we needed to make a minor change to the DB and discovered I need
> to ship a new program along with it because its SQL require changing.
But it seems to me you /are/ going with me on this. I am arguing that
one should /never/ have to touch the problem solution for any change in
the database details (so long as the data semantics remains the same).
If your problem is just about having to ship a new client executable, I
think that is a somewhat different problem than maintainability of the
problem solution itself. That is, it is a deployment problem. I think
we both want to avoid touching the client solution logic because of DB
changes. But the client still needs to access the DB correctly so
something has to change in the way that is done.
You deploy that access in the form of stored procedures. I would do it
with configuration data (probably in the DB itself) or as a separate DLL
for the DB access subsystem. Either way the client solution logic is
not touched or re-deployed.
>>> If I'm right in understanding what you're saying (and what countless
>>> vendors, analysts, and pundits sell, present, and report), this is
>>> exactly where our designs part company and where Transaction Oriented
>>> Processing (or TOA--but I dislike the word architecture when used
>>> with software) proposes an alternative, thinner, and simpler model.
>>
>>
>> It's thinner and simpler because it doesn't separate concerns.
>> Failing to separate concerns means that one must touch the solution
>> logic whenever something changes in the persistence realm and that is
>> a potential maintainability problem.
>
> You're not including your abstraction layer as part of your "solution
> logic." I'm curious if your abstraction layer is independent enough
> that when you change tables or column names in the DB there's nothing
> outside the DB to change?
Basically my model is:
[Problem Solution]
| |
+-----+ +-------+
| |
V V
[UI] [DB Access]
If the [Problem Solution] subsystem needs data or wants to save results,
it sends a message to the [DB Access] subsystem. The [DB Access]
subsystem recognizes the request and converts it into a SQL Query or
whatever.
When the [Problem Solution] decides it is time to save some data it
sends a message to [DB Access] like {saveX, data1, data2, data3, ...}.
The [Problem Solution] internally has some semantics it finds convenient
for 'X' so that it can map data1, data2, etc. to attributes in various
object abstractions, which it extracts to form the message.
When the [DB Access] receives the saveX message, it has its own mapping
of that message ID to RDB tables and tuples. So it maps data1, data2,
etc. to specific table fields positionally and plugs in the right
identity strings into the SQL update query.
Note that {saveX, data1, data2, data3, ...} might well be a stored
procedure in your system. That stored procedure would provide exactly
the same mapping to RDB table fields. From the perspective of the
problem solution there is absolutely no difference if the subsystem
interface provides a Facade-like API. [Provided the API is defined
around 'X', which represents the problem solution view rather than
predefined schema structures.]
The advantages of providing a <reusable> client subsystem rather than
stored procedures lie in economies of scale and optimization. Most of
the code in constructing a SQL string and a dataset only needs to be
done once for all queries. All one needs is a bunch of lookup tables to
provide text strings for specific table names and fields. In a
subsystem one can also provide more global optimization for things like
caching.
I argue that in CRUD/USER processing it is almost always 1:1 but outside
that realm it is fairly common for it not to be 1:1. Therefore the
solution developer would not be distracted by worrying about which
abstractions map 1:1 and which don't. Solve the problem properly and
then worry about mapping to the DB view.
As a practical matter it is easier to do /all/ the mapping elsewhere
than the problem solution. If it happens to be 1:1, fine. That way one
doesn't have to touch the problem solution if some abstraction that
originally mapped 1:1 ceases to do so when requirements change. One
also gets the advantage of the problem solution viewing all data
requests exactly the same way.
>>>> Fortunately, that is usually easy for data, especially for paradigms
>>>> like the RDM that are designed to provide generic identity. Half
>>>> the work is done if the DBMS schema is available to the
>>>> application. So one just needs a bunch of table lookups to map data
>>>> packet elements into Table/Field identifiers to construct SQL
>>>> strings. Those lookup tables get defined from external
>>>> configuration data that maps the interface messages into the RDB
>>>> schema.
>>>>
>>>> The price of this is encode/decode of message data packets on each
>>>> side of the interface. For a SQL RDB, that effectively means
>>>> duplicated dataset encode/decode. Fortunately DB access is in
>>>> milliseconds while encoding/decoding data packets is in microseconds
>>>> so nobody is likely to notice anything except the developer's extra
>>>> keystrokes, which don't count in the overall scheme of things.
>>>
>>>
>>> It may be early in the AM (for me) but I'm not following what your
>>> talking about above.
>>
>>
>> Which paragraph is the problem? (I don't want to elaborate on both if
>> only one is a problem.)
>
> Actually, both (sorry). I don't understand what DB identity has to do
> with half the work or look-up tables. Going from that to
> encoding/decoding messages left me in the dust.
OK. Let's say we have some clump of data the problem solution needs to
save. The solution creates a message and data packet and sends it to
the DB access subsystem. To keep it simple, let's suppose the message
is an API signature for a synchronous call in a subsystem interface that
is a Facade pattern: Facade::saveX (data1, data2, data3, ...).
The solution has some context that determines when the save should be
done and it knows what object attributes need to be saved in that
context. Since the developer defines the context, the developer
determines that the Facade::saveX method needs to be invoked and loads
the arguments from attributes per the Facade specification. That
mapping is usually pretty easy on the solution side because it is really
driven by problem requirements.
The conversion to the RDB view is provided by the DB access subsystem
because that is what it lives to do. There will be some mapping that
exists:
data value Table name Field Name
----------- ----------- -----------
argument 1 Customer Name
argument 2 Customer Balance
argument 3 Contact PhoneNumber
There is also a relationship that says we are interested in the Contact
entry for a specific Customer, identified by Name. This semantics is
all associated with the saveX message identifier. You would "hard-wire"
all this in the saveX stored procedure. But I don't want to write the
same query generation code a gazillion times with different names
plugged into it.
So I define a multi-column lookup table:
index Table Name Field Name Ref Field Reference index
------ ---------- ---------- --------- ---------------
1 "Customer" "Name" "none"
2 "Customer" "Balance" "none"
3 "Contact" "PhoneNumber" "CustName" 1
Now I can write code for the Facade::saveX message that simply plugs in
the table and field names from the lookup table for each argument on a
positional basis. The join dependency is handled by using
Contact::CustName as a referential attribute whose value is provided by
the 1st argument value. (Joins may be trickier than shown, but this is
just an example; the principle still applies.)
Now it is trivial to write that code to "walk" an arbitrary number of
arguments. I just have a loop that does the right lookups in the body
for each iteration. Since I could supply a lookup table for each
message ID (e.g., saveX, saveY, etc.), I would reuse the exactly the
same code to construct the SQL query string for every message ID. All I
need to do is instantiate a different relationship to the right mapping
table for the particular message:
* specifies R1 1
[Query] -------------------- [QuerySpec]
+ messageID + constructSQLString()
+ argumentCount + TableName[]
+ argList[] + FieldName[]
+ RefField[]
+ RefIndex[]
Here [QuerySpec] is essentially the lookup table above. Each object
instance provides the lookup values for a specific messageID. (This
sort of object is commonly known as a "specification object" -- a dumb
data holder that influences processing parametrically.)
One instantiates the R1 relationship so that a Query is associated with
the correct specification for the request in hand. The [QuerySpec]
objects are instantiated at startup from external configuration data.
(The R1 relationship can be defined by external configuration data for
the factory object that creates [Query] objects based on the
Query.messageID using a lookup table as well.)
When a SQL string needs to be formed, a Query navigates R1 and invokes
constructSQLString(). constructSQLString navigates back to the Query in
hand and "walks" the argList in lockstep with its internal tables to
incrementally construct the SQL string. (If things get exotic for joins
and whatnot, one can subclass [QuerySpec] to handle those situations.)
One key point is that all the details in the mapping are defined by the
[QuerySpec] attribute values combined with a mapping between messageID
and a particular [QuerySpec] instance. That can all be defined in
external configuration data, possibly stored in the DB itself. That
enables the second key point: one only needs to implement executable
code for the constructSQLString() method to handle all possible queries.
As a bonus, one may be able to make significant changes to the schema
and handle them completely in the configuration data without touching
any code at all.
[Caveat. Life is rarely simple and this was just an example to
demonstrate the basic ideas. In practice a truly reusable DB access
subsystem must address other issues and will probably abstract the RDB
view of things like relationships between tables differently. So things
are unlikely to be quite so simply in practice. However, I will bet
that the total executable statements will be substantially fewer than
those in your stored procedures for comparable applications.]
That is what I suspected. I would not regard that as business logic. I
would regard it as data access logic.
One can argue that updating a T-Table in a General Ledger by adding or
subtracting a value from the current value is implementing the business
rules of double entry bookkeeping for debits and credits. That is, the
operation itself implements business policy. However, I would counter
that the business logic of account debits and credits was actually
handled by whoever decided to send that particular transaction to the
DB. IOW, the mapping of SQL transactions to business logic was done at
a much higher level of abstraction.
To put it another way, by the time one is down to the level of
constructing a SQL query to particular tables and fields, the semantics
of the problem have been reduced to text values for table/field names.
Thus all one is providing is a mapping function for data values into a
specific storage organization. All the business smarts was done in
designing the what transactions to make available and when to use them.
It is not "abuses" but due to the fact that navigational is too
difficult for humans to navigate unless they reflect a familiar
physical environment or arrangement. This is why trees, physical
decomposition, and "pointer" path walking is popular with navigational
pushers.
> For 99.999% of items I'll stick with things should be
> recorded once and only once. Everything else is redundant.
>
> Why sprinkle SQL around, especially if its redundant?
Nobody here promoted redundant SQL. If there is redundancy, then
factor it to subroutines or views. Where, praytell, did you get that
notion?
> That coding style
> shows no appreciation for the lessons learned designing relational
> databases (unless those, too, were rife with redundancy)
>
> I thought of another reason overnight not to sprinkle SQL around --
> performance. I can tune a procedure and everyone using it benefits.
> How can I make sure everyone's SQL is efficient?
Performance versus developer convenience will always be a trade-off. C
is still popular not because of its software engineering properties,
but because it is fast for embedded apps and action games.
Except in extreme cases, it usually only makes sense to put the most
heavily-used portions into stored procedures rather than all of them.
> If table statistics
> change and old SQL sucks wind, am I go to hunting for it? No, I think
> the value the burying or generating SQL hither and yon throughout an
> application is sufficiently discredited.
Bullsh8t. The change pattern profiles I see do not warrent separation.
Separation is an evidence-free fad, promoted by those who hate and/or
don't understand relational philosophy.
Show some common, representative change impact profiles/scenarios if
you disagree. The last try here was "my DB vendor does not support
view logging", which is a vendor fault and not an inborn fault of
relational.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
Prove it with code scenarios.
> and adds the time overhead of actually creating the
> stored procedure.
Bull. Suppose you need to add a new field to a report. If the SQL is
in the report module, you only have to visit the one report module: a
single file/unit. If the SQL is in an SP or SQL-central area, then you
have to visit 2 places: the app module ANNNNDDD the "SQL place", not
one.
2 > 1
Similar applies to inspecting the SQL related to a module. I don't like
having to jump around to find corresponding SQL for a task. It wastes
time. It is red-tape busy-work.
> Except that I'm not convinced there's any time
> overhead for anything more complicated than an isolated select, insert,
> or update. A database transaction (not a transaction as described in
> the article being discussed) often involves multiple selects and
> inserts/updates, with various sanity checks mixed in. Not all
> programming languages make DB access easy,
Then dump F-ing Java. "SQL is difficult to use with stupid languages"
is actually something I might agree with.
> and not all DB apis allow
> you to make use of all the vendor-specific functionality present in any
> given database product.
Please clarify. If the app is only passing the SQL string onto the DB
as-is, when how can it "read" the SQL and stop stuff it does not like?
AI?
> The time you'd waste writing DB access code in
> C will more than account for the time it takes to define a stored
> procedure, and once you have the procedure, calling it from your
> language of choice is simple.
C sucks too. People use it for execution speed, not because it is
convenient for developers. It is sometimes called "the new assembler".
Example? I find it a lot easier to manipulate logs in a DB or tables
than in arrays, etc., (Assuming proper normalization.) especially with
table-friendly tools/languages.
> Modifying the source code of
> an RDBMS to add functionality that can be easily obtained via stored
> procedures is ridiculous, and casts a dubious light on all of your
> conclusions regarding 'maintenance trade-offs.' I can think of few
> maintenance nightmares quite like maintaining your own special branch
> of Postgres for a few app-specific requirements. That's a way to get
> yourself fired.
The Postgres suggestion is only one of several suggestions. You word it
as if it is the only approach I suggested. Some shops roll their own
OODMBS of sorts. Why is Postgres worse than such a sin?
Overall, you have not offered enough specifics for me to see what
actual alternatives are available. If all you have are un-inspectable
anecdotes, those are a dime a dozen and point both ways, and are thus
nearly useless here.
And again, where did I object to "keeping an audit history in database
tables"?
>
> -Matt McGill
-T-
These are generally called "resource optimization problems" if I am not
mistaking. I don't know if relational is inharently "bad" at those
because I have not done enough to compare.
But one thing you should consider is that the programming for such is
often done by programmers trained in C and Fortran, not RDBMS. Thus,
their solutions will tend *not* to take advantage of RDBMS features,
but be tilted toward imperative-only solutions.
Nor do I propose that the entire process be done in SQL. Solutions can
be a healthy mix of SQL and imperative programming. The more
experience you have with RDBMS, the more of the effort you can farm off
to them. (SQL is not the ideal relational language, I would note.)
In short, just because *you* have not found a way to use RDBMS
effectively with resource optimization problems does not mean it cannot
be done. The bottleneck could perhaps be you, not relational.
-T-
Views are fun, and we use them as well, but their simplicity makes it
difficult to tune for performance. What a view can do in a single
SELECT statement can sometimes be more efficiently done with multiple
statements and/or temporary tables.
> <snip>
>
> Performance versus developer convenience will always be a trade-off.
Not necessarily. As CPUs have gotten faster and memory cheaper hardware
has done more to promote developer convenience than methodologies have.
A well crafted and easily maintained system written 18 months ago can
now be twice as fast as then--without doing anything but moving to
faster hardware. Hardware may have been the most expensive component
for computing's first 30 years, but the tables have turned and now it is
the least expensive. Better than optimizing hardware utilization is
optimizing human utilization. Using more productive languages and
processes like Smalltalk on modern hardware has never been as
productive--and can only become more so in the future.
> <snip>
> The change pattern profiles I see do not warrent separation.
> Separation is an evidence-free fad, promoted by those who hate and/or
> don't understand relational philosophy.
>
Denying separation's evidence does not make separation evidence-free.
Given two applications, one with embedded SQL (static or automagically
generated) and the other using DB procedures, my ability modify the DB
and load a new fixed procedure that maintains its interface without
having to ship a new application (and everything that entails) is proof
of loose coupling's (separation's) value over tight coupling.
I suppose it depends on how you define "navigation". Generally I
picture it representing or mirroring some kind of physical
interpretation of the model. It is a technique that has its place, but
it is less general-purpose than "querying", at least in a usable sense.
Thinking of things in terms of sets instead of locations takes a
different mindset and some practice. However, I believe it to be
superior when dealing with more than about 3 dimensions/factors. Dr.
Codd and Buchman (sp?) faught this battle in the early 70's. Buchman
generally lost, until OOP started to try to resurrect the idea under
new names.
> >
> >> For 99.999% of items I'll stick with things should be
> >> recorded once and only once. Everything else is redundant.
> >>
> >> Why sprinkle SQL around, especially if its redundant?
> >>
> >
> > Nobody here promoted redundant SQL. If there is redundancy, then
> > factor it to subroutines or views. Where, praytell, did you get that
> > notion?
> >
> If you factor it into subroutines or then you've pretty much created a
> flavor of stored-procedure--but without any of its benefits. You've
> satisfied structured programming, but have not mitigated any maintenance
> or deployment challenges.
Again, I would have to see specific scenarios to compare the
cost-vs-benefits. I don't develop the way I do for the h8ll of it; I do
it because I weigh the pros and cons of the choices.
Stored procedures are in the database, *not* in the source code files.
Thus, if you have to add a new field to screen X, for example, you are
likely to have to visit both the database's stored procedure repository
AND the source-code files. This is a double effort and wastes time and
makes the company less profitable and thus more likely to fire you
during a downturn and replace you with $3/hr programmers in Timbuktu.
If the related SQL is in the same source module, then you only have to
visit one spot.
>
> Views are fun, and we use them as well, but their simplicity makes it
> difficult to tune for performance. What a view can do in a single
> SELECT statement can sometimes be more efficiently done with multiple
> statements and/or temporary tables.
I don't see any inharent reason why this has to be the case. If your
particular vendor makes views slow, then you are complaining about a
*product* and not a technique.
> > <snip>
> >
> > Performance versus developer convenience will always be a trade-off.
> Not necessarily. As CPUs have gotten faster and memory cheaper hardware
> has done more to promote developer convenience than methodologies have.
> A well crafted and easily maintained system written 18 months ago can
> now be twice as fast as then--without doing anything but moving to
> faster hardware. Hardware may have been the most expensive component
> for computing's first 30 years, but the tables have turned and now it is
> the least expensive. Better than optimizing hardware utilization is
> optimizing human utilization. Using more productive languages and
> processes like Smalltalk on modern hardware has never been as
> productive--and can only become more so in the future.
I don't see how this contradicts what I said. You are arguing that the
tradeoff favors tuning for developers instead of machines, not that
there is no tradeoff.
> > <snip>
> > The change pattern profiles I see do not warrent separation.
> > Separation is an evidence-free fad, promoted by those who hate and/or
> > don't understand relational philosophy.
> >
> Denying separation's evidence does not make separation evidence-free.
I am not denying it, only claiming I have not seen it. Why should
people blindly take your word for it? I gave the "two hop" scenario as
an example downside.
> Given two applications, one with embedded SQL (static or automagically
> generated) and the other using DB procedures, my ability modify the DB
> and load a new fixed procedure that maintains its interface without
> having to ship a new application (and everything that entails) is proof
> of loose coupling's (separation's) value over tight coupling.
No it is not "proof". I cannot inspect what you are doing at all. It is
not a repeatable nor observable "experiment" at this stage, and thus
not scientific. Maybe you are lying or failed to see a better
alternative that I could point out, for example, if I could see it.
Again, why should people blindly take your word for it? Hidden
anecdotes are not very useful around here because there are too many of
them and they all point different ways.
Don't take it personally; we just want to see the
examples/code/scenarios for ourselves rather than have to trust
somebody.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
I have to disagree. Features it lacks:
* Dynamic typing and dynamic module loading
* Native string handling
* Named parameters
* Native associative arrays
* "Eval"-like features such that code can be stored in databases, etc.
Lack of native string handling requires one to remember to put an "end
of string" marker all the time or programs crash wildly or display very
pecular behavior. That is nitty-gritty detail that I don't want to have
to worry about. If I spend all my time babysitting strings, then I
don't have time to focus on solving business problems, and my ass will
be offshored to some dude in Timbuktu earning $3/hr. Thus, the less
one has to focus on technology, the more their face can be involved in
local business and customer issues.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
And as far as dynamic module loading, you must be referring to something
shared libraries don't provide?
> * Native string handling
>
I thought that an advantage to C in the sense C's syntax and semantics
remained consistent and wasn't cluttered with a bunch of invented types
that meant more to humans than chips.
> * Named parameters
>
Not many have those, but DB stored procedures come to mind...
> * Native associative arrays
>
Are you trying to build an in-memory RDB?
> * "Eval"-like features such that code can be stored in databases, etc.
>
Are you complaining it's not interpreted?
Yes, but you end up reinventing an interpreter from scratch more or
less.
>
> And as far as dynamic module loading, you must be referring to something
> shared libraries don't provide?
No, the ability to call "new" code at run-time. For example, if I store
a function definition, function call, or expression to be evaluated in
a databases. At run-time a query is done to grab the code from the
database, and then execute it.
> > * Native string handling
> >
> I thought that an advantage to C in the sense C's syntax and semantics
> remained consistent and wasn't cluttered with a bunch of invented types
> that meant more to humans than chips.
Then why does it have native math? I use strings more often than math
actually. Why does math get special treatment but not strings? If you
force one to roll-their-own string handling, then why not do the same
with math and have:
a = add(b, c);
Instead of
a = b + c;
> > * Named parameters
> >
> Not many have those, but DB stored procedures come to mind...
I use them. And what do SP's have to do with this? We are talking about
C, no?
> > * Native associative arrays
> >
> Are you trying to build an in-memory RDB?
No, just associative arrays. They are good, for example, for grabbing a
record from a query result set where we don't want to hard-wire the
structure before run-time.
> > * "Eval"-like features such that code can be stored in databases, etc.
> >
> Are you complaining it's not interpreted?
Uh, yes.
Anyhow, we are off-topic here. I don't want to fight over programming
languages. I just don't want you to argue that embedded SQL is bad
because C cannot handle it. If C is not good with embedded SQL but
other languages don't have the same problem, then the problem is with
C, not embedded SQL. I don't care if C cannot do X. Toss C if that is
the problem, not X.
Another nice feature of some languages is the ability to have
multi-line strings to assist with editing and pasting SQL. Without
that, you have to quote every SQL line, which can be a pain in the
zooper.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
Which is only gonna bet better with time. Yes, C has its uses, but
generally if you are using a RDBMS, you are not in a situation where
C's advantages will show anyhow.
> > actually. Why does math get special treatment but not strings? If you
> > force one to roll-their-own string handling, then why not do the same
> > with math and have:
> >
> > a = add(b, c);
> >
> > Instead of
> >
> > a = b + c;
> >
> CPUs have accumulators, registers, and addition/subtraction operations.
If C's focus is mostly about mirroring the hardware, then I suppose you
are right. However, it is hard to call it a "high-level language" of
any sorts then. String processing is a common need, and C's syntax and
standard libraries appear to utterly ignore this fact, regardless of
whether chips natively support strings for not.
> Perhaps you'd be more impressed with Smalltalk for which /everything/ is
> an object and addition is performed by sending a message. In your
> example above b is sent the + message with the object c as an argument.
> The object returned is stored in a.
I am not that fond of OOP. I'd probably lean more towards Lisp than
Smalltalk if I wanted "atom-level purity". But, that is another topic.
> > I use them. And what do SP's have to do with this? We are talking about
> > C, no?
> >
> We are talking about C, but you're comparing it to something else (you
> haven't named what) so I just thought I'd throw-in the fact stored
> procedures have named parameters (as does Python).
SP's named parameter supports tends to be vendor-specific, if I am not
mistaken.
> > Anyhow, we are off-topic here. I don't want to fight over programming
> > languages. I just don't want you to argue that embedded SQL is bad
> > because C cannot handle it.
> My argument is embedded SQL is bad because it tightly couples an
> application to the DB's structure--which should be avoided.
You have not shown this, only claimed it. Either approach is not a free
lunch and exchanges one kind of "coupling" for another. The changes
that I see most often in the field favor mostly embedding, for reasons
already described.
>
> --
> Visit <http://blogs.instreamfinancial.com/anything.php>
> to read my rants on technology and the finance industry.
-T-
topmind wrote:
> <snip>
> You have not shown this, only claimed it. Either approach is not a free
> lunch and exchanges one kind of "coupling" for another. The changes
> that I see most often in the field favor mostly embedding, for reasons
> already described.
>
Your burden of proof isn't pragmatic--it seems more like willed denial.
Anyway:
Given embedded SQL mapping an account /thing/ to the DB as:
select * from account where accountKey='...'
then the SQL is dependent on: an account table/view, its columns, and
the explicit reference to accountKey.
Alternatively, and what the paper recommends, is using procedures.
Procedure names and their arguments should be considered (and
designed/implemented) as an API to the DB's solution domain. Instead of
SQL with both explicit and implicit dependencies the API can be used:
exec getAccount @accountKey='...'
True, there's still a dependency on the return columns that remains
explicit, but now the application is insulated from changes in the DB.
I can rename the account table, I can rename the accountKey column, I
can include another table in the procedure's SELECT, or add additional
DB-level business rules or performance tuning. Even the explicit
dependencies can be feigned by aliasing projected columns.
For a transaction processing system another parameter is added, like a
session key, the procedure can use to validate the session has adequate
privileges to retrieve that specific account's information and I can log
the query to a query history table so I can track everything that
session did.
If a wholesale change to the DB is made I can preserve the older,
obsolescent, procedures for backward compatibility and create a new
procedure supporting the new and extended DB design's functionality.
exec getAccountKey2 @sessionId=... @accountKey=...
Which can not be done as easily with embedded SQL. SQL-embedded
applications would require being
recompiled/packaged/tested/integrated/deployed which is time consuming,
even using automated test tools. The fewer entities that need to be
touched for any given fix/enhancement the better off everyone is.
For security reasons I could go a step further, and deny any user SELECT
permission on tables and only grant them access to the DB through its
API by only allowing users EXECUTE permission on procedures. Of course,
this kind of interface (access to methods but not data) is common among
object oriented languages and isn't a foreign concept to OO programmers.
In fact, after inspecting multiple C APIs for LDAP, iCAL, and other
libraries it appears it's not even foreign to C programmers. Often a
structure is used to hold state information for a connection to a
resources, but the format of that information isn't exposed to API-users
except through an API. Even when a method may only be returning a value
from the structure, API programmers know that level of indirection
affords them flexibility inside the structure they may use without
negatively impacting all the API's users.
So a common pattern is used by both C and OO programmers. What my paper
is promoting (and I'll try to do a better job explaining) is that the
same pattern be applied to how OO programs access the DB.
> Given embedded SQL mapping an account /thing/ to the DB as:
>
> select * from account where accountKey='...'
>
> then the SQL is dependent on: an account table/view, its columns, and
> the explicit reference to accountKey.
Dependant on a "view"? How many layers of indirection do we need?
Indirection is not a free lunch. Views are already a layer of
indirection. Wrapping a view is two levels of indirection. Too many
layers of indirection makes for muddy, bloaty beurocratic code.
>
> Alternatively, and what the paper recommends, is using procedures.
What paper? The link given earlier? It did not given enough specifics.
> Procedure names and their arguments should be considered (and
> designed/implemented) as an API to the DB's solution domain. Instead of
> SQL with both explicit and implicit dependencies the API can be used:
>
> exec getAccount @accountKey='...'
>
> True, there's still a dependency on the return columns that remains
> explicit, but now the application is insulated from changes in the DB.
No, because it is still tied to columns, like you mentioned.
Column-oriented changes will require visiting 2 places instead of one.
Even if I entirely accepted your "different view" scenario above (I
don't), column changes are roughly about 5-to-20 times more likely than
table/view-related changes. Thus, cost/benefit comparison still does
not favor it.
> I can rename the account table, I can rename the accountKey column, I
> can include another table in the procedure's SELECT, or add additional
> DB-level business rules or performance tuning. Even the explicit
> dependencies can be feigned by aliasing projected columns.
I won't deny that, but those are *not* common enough to justify the
cost of separation in my opinion. I rarely see key name changes. And
views do not necessarily prevent performance tuning within the view
(but it may depend on the DB vendor).
>
> For a transaction processing system another parameter is added, like a
> session key, the procedure can use to validate the session has adequate
> privileges to retrieve that specific account's information and I can log
> the query to a query history table so I can track everything that
> session did.
That almost sounds like it is outside the scope of querying anyhow.
But, I would have to see the details.
>
> If a wholesale change to the DB is made I can preserve the older,
> obsolescent, procedures for backward compatibility and create a new
> procedure supporting the new and extended DB design's functionality.
>
> exec getAccountKey2 @sessionId=... @accountKey=...
>
> Which can not be done as easily with embedded SQL. SQL-embedded
> applications would require being
> recompiled/packaged/tested/integrated/deployed which is time consuming,
> even using automated test tools. The fewer entities that need to be
> touched for any given fix/enhancement the better off everyone is.
Again, this sounds like it is outside the scope of querying. You are
perhaps doing application-oriented stuff in SP's. You are putting
app-specific biz logic in SPs and they are now procedural models
instead of mere queries.
Listen, if SP's make your shop's app easier to maintain, then go with
it. I am just disputing that separation is a *universal* benefit for
all shops and apps.
It sounds like you spend more time tweaking performance than you do
adding and changing features. SP favor the first but not the second in
my observation. If your shop is peformance-intensive, then SP's may be
the way to go. I usually see feature-change intensity over performance
fiddling intensity, but every place is different. (Often performance
problems are due to bad programmers or DBA's in my observation, but
that is another topic.)
-T-