<http://blogs.in-streamco.com/anything.php?title=the_rdb_is_the_biggest_object_in_my_syst>
What I realized while trying to describe my preference for using DB
procedures as the primary (read: only) interface between my applications
and the database is that I believe my DB's physical representation of
data belongs to it alone, and that customers of the DB oughtn't be
permitted to directly manipulate (change or query) its data. I realized
this is exactly what data hiding is all about, and why expert
object-oriented designers and programmers emphasize the importance of
interfaces over direct data manipulation.
I thought more about this and posted a second article, Databases as
Objects: My schema is my class, which explored more similarities between
databases and objects and their classes.
<http://blogs.in-streamco.com/anything.php?title=my_schema_is_an_class>
I intend next to explore various design patterns from GoF and Smalltalk:
Best Practice Patterns to see if the similarities persist or where they
break down, and what can be learned from both about designing and
implementing OO systems with relational data bases.
If you agree there's such a thing as an object-relational impedance
mismatch, then perhaps it's because you're witnessing the negative
consequences of tightly coupling objects that shouldn't be tightly coupled.
There's a hypothesis in there somewhere.
As always, if you know of existing research on the subject I'm anxious
to read about it.
--
Visit <http://blogs.instreamfinancial.com/anything.php>
to read my rants on technology and the finance industry.
An Introduction to Database Systems, C.J. Date.
Database in Depth, C.J. Date.
Hope this helps...
> An unexpected thing happened while debating topmind (over in
> comp.object): I had an epiphany.
> Instead of responding to the news group I thought about it for a short
> bit (very short) and posted an article to my blog titled, "The RDB is
> the biggest object in my system."
>
> <http://blogs.in-streamco.com/anything.php?title=the_rdb_is_the_biggest_object_in_my_syst>
>
>
> What I realized while trying to describe my preference to use DB
> procedures as the primary (re: only) interface between my applications
> and the database is because I believe my DB's physical representation of
> data belongs to it alone and that customers of the DB oughtn't be
> permitted to directly manipulate (change or query) its data. I realized
> this is exactly what data-hiding is all about and why expert object
> oriented designers and programmers emphasize the importance of
> interfaces to direct data manipulation.
>
> I thought more about this and posted a second article, Databases as
> Objects: My schema is my class, which explored more similarities between
> databases and objects and their classes.
>
> <http://blogs.in-streamco.com/anything.php?title=my_schema_is_an_class>
>
> I intend next to explore various design patterns from GoF and Smalltalk:
> Best Practice Patterns to see if the similarities persist or where they
> break down, and what can be learned from both about designing and
> implementing OO systems with relational data bases.
Oh dear. I want to say that's a "Great Blunder" Squared; however, I
suspect it requires a higher order exponent to do justice to the blunder.
> If you agree there's such a thing as an object-relational impedance
> mismatch, then perhaps its because you're witnessing the negative
> consequences of tightly coupling objects that shouldn't be tightly coupled.
Do you realise what you are saying has a direct analogy to noting a
mismatch between assembler and Java and then concluding one has to wrap
one's Java code in assembler?
> There's a hypothesis in there somewhere.
No doubt. But does it have any real use or predictive value?
> As always, if you know of existing research on the subject I'm anxious
> to read about it.
If the existing research were water, you would have already drowned
while asking for a sip. You could start with Frege circa 1879. Or you
could cut to the chase: http://en.wikipedia.org/wiki/First-order_logic
http://en.wikipedia.org/wiki/Edgar_F._Codd
Usually one prefers to use high-level interfaces. Below are some
reasons one might utilize a lower-level interface:
1) Impossible or impractical to do via high-level interface.
2) Lack of performance.
In general, I think you are correct in thinking of "database as one big
object with all the rules for data hiding and interfaces OO [that]
programmers are already acquainted with".
It is also a source of conflict between OO and Relational acolytes.
Numerous frameworks and "patterns" exist that try to ease
object-relational mappings and IO, but they ignore the 800-lb gorilla in
the room: the database exists and exists separately from applications.
A database's model is static--it doesn't change for each of its
consumers. An object model doesn't have to be static--it can change to
be whatever is optimal for a specific task.
Why don't we exploit that?
In the systems I've written there has never been a single
program--there are usually dozens. Each one has the possibility to define
the best object model to suit the program's task. Neither are the
programs written in a single language. At InStream both Smalltalk and
PHP are currently used--and only a few years ago Python was also used.
Should the object model have been the same for all three? Should the
high-level interface have been the same? Should a separate framework have
been used for each?
Programmers and designers shouldn't ignore the potential variety of
their client programs, nor restrict each program's language
unnecessarily. Investing heavily in a framework can do just that. Even
if PHP is better for something than Java, a heavy investment in Java
frameworks may force square programs into round infrastructure.
When I look at the concepts of high cohesion within an object and low
coupling between objects I think I see remedies to both your points
above. Things that are impossible or impractical to do in a high-level
language may be better implemented inside the DB using set operations.
Why update each row individually when I can update them all at once?
That which is impractical may also suffer performance problems, but
performance alone isn't a reason to do anything (unless that performance
is so poor as to make something unfeasible). However, being able to
improve performance in one place (like a procedure or view) is preferable
to having to edit SQL sprinkled throughout an application or toolkit.
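The set-versus-row contrast above is easy to demonstrate. Below is a minimal sketch using Python's sqlite3 module as a stand-in engine; the `invoice` table and its columns are hypothetical, invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
rows = [(1, 100.0, "open"), (2, 250.0, "open"), (3, 75.0, "paid")]
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", rows)

# Row-at-a-time: the application drives a loop, one UPDATE per open invoice.
for (invoice_id,) in list(conn.execute("SELECT id FROM invoice WHERE status = 'open'")):
    conn.execute("UPDATE invoice SET amount = amount * 1.05 WHERE id = ?", (invoice_id,))
looped = sorted(conn.execute("SELECT id, amount FROM invoice"))

# Reset the data, then do the same work as a single set operation:
# one statement, and the DBMS chooses how to iterate.
conn.executemany("UPDATE invoice SET amount = ? WHERE id = ?",
                 [(amount, invoice_id) for invoice_id, amount, _ in rows])
conn.execute("UPDATE invoice SET amount = amount * 1.05 WHERE status = 'open'")
set_based = sorted(conn.execute("SELECT id, amount FROM invoice"))

assert looped == set_based
```

The looped version issues N statements and keeps the iteration logic in the application; the set-based version issues one statement and leaves the iteration strategy to the engine.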
Rather than waste effort trying to make either an extension of the
other, I'm trying to demonstrate, using OO terms for OO people, that the
database is an object, which should hide its data and only be used
through its interface, just like they do with other objects in their
models. In the same way OO programmers send messages to objects to have
them do things programmers should send messages to their database to
have it do things. Those things the DB might do can be fairly
sophisticated and domain specific if they express themselves as
procedures. Using procedures, applications can query and manipulate
data in a uniform and predictable manner regardless of what programming
language is used or what paradigm(s) it supports.
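To sketch what such a message-style interface could look like: SQLite has no stored procedures, so in the toy below ordinary host functions stand in for them, and every name (`open_account`, `deposit`, `balance_of`, the `account` schema) is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Hypothetical "procedures": the only sanctioned way to touch account data.
# In a server DBMS these would be stored procedures; here host functions
# stand in for the idea.
def open_account(owner, opening_balance=0.0):
    cur = conn.execute("INSERT INTO account (owner, balance) VALUES (?, ?)",
                       (owner, opening_balance))
    return cur.lastrowid

def deposit(account_id, amount):
    if amount <= 0:
        raise ValueError("deposit must be positive")  # domain rule lives here
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                 (amount, account_id))

def balance_of(account_id):
    (bal,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                          (account_id,)).fetchone()
    return bal

acct = open_account("john", 100.0)
deposit(acct, 50.0)
```

Any client language that can invoke these three entry points gets the same behaviour, which is the uniformity the paragraph above argues for.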
[snip]
>. . . However, being able to
>improve performance in one place (like a procedure or view) is preferred
>to needing editing SQL sprinkled throughout an application or toolkit.
What sprinkling? Most of my SQL code is a very small part of an
app. If I needed to, I could isolate it. Often, one SQL statement
does a lot of work. It is all the other manipulation of data for the
UI that I worry about.
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.
> Neo wrote:
>
>>> What I realized while trying to describe my preference to use DB
>>> procedures as the primary (re: only) interface between my
>>> applications and the database is because I believe my DB's physical
>>> representation of data belongs to it alone and that customers of the
>>> DB oughtn't be permitted to directly manipulate (change or query) its
>>> data.
>>
>> Usually one prefers to use high-level interfaces. Below might be some
>> reasons one utilizes a lower-level interface:
>> 1) Impossible or impractical to do via high-level interface.
>> 2) Lack of performance.
>
> I agree programmers usually prefer higher-level interfaces.
Quite the contrary: You demonstrate exactly the opposite.
[meaningless commentary snipped]
> It is also a source of conflict between OO and Relational acolytes.
Actually, the source of conflict is the ubiquitous (and frankly
obstinate) ignorance among the OO crowd.
[more snippage]
> A database' model is static--it doesn't change for each of its
> consumers.
Here you model profound ignorance of data independence and logical
independence in particular.
> An object model doesn't have to be static--it can change to
> be whatever is optimal for a specific task.
Good luck with that.
> Why don't we exploit that?
Because the rest of us are not as stupid as you are.
[boring anecdotes, straw men and various handwaving snipped]
> That which is impractical may also suffer performance problems, but
> performance alone isn't a reason to do anything (unless that performance
> is so poor as to make something unfeasible).
Whether performance alone is or is not reason depends entirely on your
value proposition. If one's product has the word 'engine' in its
description, performance very often is paramount.
> However, being able to
> improve performance in one place (like a procedure or view) is preferred
> to needing editing SQL sprinkled throughout an application or toolkit.
The solution to your problem is to avoid hiding your higher-level
abstractions by sprinkling them in a mountain of generally useless lower
level clutter in the first place.
This is twice you have denied what you are obviously doing. I conclude
you lack intellectual honesty, and I won't bother wasting any more of my
valuable time on you.
> [The data management system] should hide its data and only be used
> through its interface, just like they do with other objects in their
> models.
I think you will find the intelligent and informed folks around here
will find your pursuit perverse.
> In the same way OO programmers send messages to objects to have
> them do things programmers should send messages to their database to
> have it do things.
In other words, piss away two or three millennia of advances in our
understanding of logic in favour of a retarded physical protocol. Sounds
genius. ::rolls eyes::
[snippage]
The interface to the database is the schema and SQL.
What do you think the interface is?
> Using procedures, applications can query and manipulate
> data in a uniform and predictable manner regardless what programming
> language is used or what paradigm(s) it supports.
Using SQL, you get all those things you just mentioned.
I don't see any need to complexify it up.
Marshall
Would it be more precise to say the interface is purely relations
(however they may be formed or expressed) or maybe relations as
described by Codd?
If the programmers out there want to add certain well-formed logical
operators to relations, I guess I wouldn't object, but I think
that's implicit.
My two cents.
p
Suppose we have a simple application that deals with 5 persons with
some attributes: name, age, ss#.
One way to implement this is to create 5 instances of a Person class
in C++ where each instance has the desired attributes. The programmer
needn't worry about how the objects are stored and recalled, as some
mapping software takes care of that behind the scenes. A line of code
might look like this:
print (john.age, john.ss#);
Another way is for a C++ application to connect to an RDBMS and create a
relation named T_Person with attributes name, age and ss#. Similar code
might look like this:
rs = ado.execute("SELECT * FROM T_Person WHERE name='john'");
print (rs.field("age"), rs.field("ss#"));
Each of the above methods has advantages and disadvantages that make
it more appropriate for a particular requirement. When you have two
systems/methodologies at work, there is likely to be some type of
impedance mismatch in features, flexibility, performance, etc.
<snip>
<amphiboly snipped>
<snip>
<ad hominem circumstantial snipped>
<amphiboly? snipped>
<snip>
<composition, accident and ad-hominem snipped>
<snip>
<agreement snipped>
<snip>
<another agreement snipped>
Thanks, Bob!
INSERT INTO catalog (title, author)
VALUES ("Complexity", "M. Mitchel")
Or I can create a higher-level interface that knows something about my
bookstore
exec p_addBook @title="Complexity", @author="M. Mitchel"
Additionally, I can ask the database to give me a list of its procedures
(its interface), but without them all I have is its tables. True, I can
tell a lot from the tables and a nice diagram, but I can tell a lot more
from a list of procedures and their arguments.
That list of procedures is the interface I'm talking about. After
populating a database with tables, views, and triggers, why keep talking
to it using SQL when I can improve my application's interface to it by
creating a domain-aware interface with intention-revealing named procedures?
True, I can do everything in SQL that I can do in procedures (they're
written in SQL after all), but the procedures have an improved interface
and protect the integrity of my model. I could also write whatever
bytecodes SQL is compiled into rather than SQL, but SQL is an improved
interface and protects the integrity of the relational engine. I could
even access the database's data directly on disk, but SQL is an improved
interface over the bytecodes, which are improved over doing my own IO.
I could write an entire program with inlined C code and not use macros
or functions. But functions improve modularity, readability,
reliability, and reusability. Why wouldn't SQL benefit from the same
organization?
> <snip>
>
> Another way is for a C++ application to connect to an RMDB and create a
> relation named T_Person with attributes name, age and ss#. Similar code
> might look like this:
>
> rs = ado.execute("SELECT * FROM T_Person WHERE name='john'");
> Print (rs.field("age"), rs.field("ss#"));
>
Here is what I think the first problem is: now the application has
become dependent on the physical design of the database. It's dependent
on the existence of a table named T_Person and that the table has the
columns name, age, and ss#.
If this were an object, OO people would stand up and object to the C++
code's improper use of another object's private data. OO purists might
insist that instead of violating the object's private data that getter
functions should be used. But for some reason when that object is a
relational database too many OOers silently violate the tenets of loose
coupling and go right for the data with nary a twinge in their necks.
> Each of the above methods have advantages and disadvantages that make
> it more appropriate for a particular requirement. When you have two
> systems/methodologies at work, there is likey to be some type of
> impedance mismatch in feature, flexibility, performance, etc.
>
>
I don't see an advantage in the second example. The fact it works is
its only benefit. If instead the code were modified ever so slightly to:
rs = ado.execute("exec getPerson @name='john'");
print (rs.field("age"), rs.field("ss#"));
Now we're using our application model's interface. As long as the
interface remains the same our DBAs can go about tuning and designing
without fear of breaking (at least) this line of code. Now it doesn't
matter whether T_person exists or how getPerson is implemented--only
that its interface remains consistent.
Does anyone really care how ado.execute() is implemented? Not as long as
its interface is unchanged. Change the interface of ado.execute() and
now everyone's interested.
That's what I'm trying to promote to OO people. The list of benefits is
longer than that, but just as simple.
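The insulation being claimed here can be shown in a toy form, again with sqlite3 and with a host function standing in for the stored procedure (the `T_Person` columns follow the earlier example; the redesigned `person_v2` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (name TEXT, age INTEGER, ssn TEXT)")
conn.execute("INSERT INTO T_Person VALUES ('john', 42, '000-00-0000')")

# The application model's interface: callers depend only on this signature.
def get_person(name):
    age, ssn = conn.execute(
        "SELECT age, ssn FROM T_Person WHERE name = ?", (name,)).fetchone()
    return {"age": age, "ssn": ssn}

before = get_person("john")

# The DBAs redesign the storage: a new table with different names.
conn.executescript("""
    CREATE TABLE person_v2 (full_name TEXT, age INTEGER, ssn TEXT);
    INSERT INTO person_v2 SELECT name, age, ssn FROM T_Person;
    DROP TABLE T_Person;
""")

# Only the procedure body changes; its interface, and therefore every
# caller, stays the same.
def get_person(name):
    age, ssn = conn.execute(
        "SELECT age, ssn FROM person_v2 WHERE full_name = ?", (name,)).fetchone()
    return {"age": age, "ssn": ssn}

after = get_person("john")
assert before == after  # callers never noticed the redesign
```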
> I could write an entire program with inlined C code and not use macros
> or functions. But functions improve modularity, readability,
> reliability, and reusability. Why wouldn't SQL benefit from the same
> organization?
Because relational algebra is rather different from your typical
procedural host language: it is a high level, declarative language, it
deals with entire sets of things at a time, and it's highly productive
(whatever you get back from a query can be used freely in formulating
new queries). When you wrap it up in a number of procedures -- in most
cases a much lower level abstraction -- you usually end up losing most
of the expressivity and compositionality that made the relational model
attractive for data management in the first place.
For example, you might package the lookup of a single customer's details
as a procedure, but if you then wanted to do something to all customers,
the best that a procedural host language is usually able to offer you is
a cursor loop. Using something like that is obviously a bad idea,
because this sort of thing is much more naturally and efficiently
implemented as a set update, optimized by the DBMS. The problem is that
after you've expressed that set update as a low level construct like an
explicit loop, the compiler cannot be expected to be intelligent enough
to figure out that you actually wanted to quantify over a set; going
from high level abstractions to low level details is much simpler than
the converse. Of course you could then tell the compiler what it
doesn't understand: just implement a new method which encapsulates the
set update. But how is it cleaner or more productive to have umpteen
methods to do various specialized things to your database, than to have
the small set of closed, general, high level, declarative primitives
that the relational algebra represents?
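The closure property described above can be sketched in a few lines (hypothetical schema, sqlite3 as a stand-in DBMS): the result of one query is itself a relation, so it drops straight into the next query with no cursor loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (1, 10.0), (1, 20.0), (2, 99.0), (3, 5.0);
""")

# Closure: a query's result is a relation, so it can be named and reused.
conn.execute("""
    CREATE VIEW eu_customers AS
        SELECT * FROM customer WHERE region = 'EU'
""")

# The view composes directly into a new query; no loop in the host language.
rows = conn.execute("""
    SELECT c.id, SUM(o.total)
    FROM eu_customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY c.id
""").fetchall()
```

A procedure returning the same customers would hand the host language an opaque result set, and the join would have to be re-expressed as iteration.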
So, I'd argue that you have it backwards. You don't want to wrap higher
level abstractions in lower level ones, even if it seemingly makes
things more uniform, because the uniformity comes at the price of
reducing the expressivity of the higher level language to that of the
lower level one. In reality you'll want the highest level abstraction to
guide the design of the rest of your language, because powerful
abstractions are where productivity, possibilities for automated program
transformation and comprehensible semantics come from. In the above
example, you'd ideally want to declare some operation by quantifying
another one over the set of all customers, and have the compiler and the
DBMS work out the precise low level implementation of your intended
semantics automagically. You can't do that sort of thing in a procedural
language, while in relational algebra and, say, most functional
languages, you can. What this tells us is that the procedural host
language is not as powerful as the alternatives. It should be amended or
traded in for something better; we *definitely* shouldn't try to force
the higher level primitives into the lower level mold.
--
Sampo Syreeni, aka decoy - mailto:de...@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
I introduced myself to this group with a little parody
http://groups.google.com/group/comp.databases.theory/tree/browse_frm/thread/16f8e2a48bf7cdee/c6afb5776e727eca?rnum=11&q=kvnkrkptrck&_done=%2Fgroup%2Fcomp.databases.theory%2Fbrowse_frm%2Fthread%2F16f8e2a48bf7cdee%2F5a41db9357aef1b7%3Flnk%3Dst%26q%3Dkvnkrkptrck%26rnum%3D3%26#doc_d0674565740f9e99
At the time, I had a hard time understanding how certain members of CDT
failed to see right through it. In my mind, it was utterly transparent
baloney - how could it possibly be mistaken for anything but parody?
Having read this response, I feel obliged to offer Bob Badour and other
misled CDT contributors a second apology.
> Thomas Gagne wrote:
>
>>Marshall wrote:
>>
>>>On Dec 21, 1:33 pm, Thomas Gagne <tga...@wide-open-west.com> wrote:
>>>
>>>>... the
>>>>database is an object, which should hide its data and only be used
>>>>through its interface, just like they do with other objects in their
>>>>models.
>>>
>>>The interface to the database is the schema and SQL.
>>>What do you think the interface is?
>>
>>After I've created a database for my application's data (let's pretend
>>its a bookstore) I can either talk to it with low-level DML from
>>applications
>>
>> INSERT (title, author)
>> VALUES ("Complexity", "M. Mitchel")
>> INTO catalog
>>
>>Or I can create a higher-level interface that knows something about my
>>bookstore
>>
>> exec p_addBook @title="Complexity", @author="M. Mitchel"
One wonders exactly how Mr. Gagne considers a simple imperative
statement involving a single value 'higher-level' than the simplest
version of a rich imperative statement involving sets of propositions
and first-order logic in all its glory. By choosing the least among data
statements to compare against the best among OO statements, Mr. Gagne
showed that the best of OO can match the least of SQL--itself the least
among decent data languages.
Without even getting into a good data language but sticking with SQL,
speculate on the OO equivalent of the following ad hoc program:
INSERT INTO catalog(title, author)
SELECT library.title AS title, guild.membername AS author
FROM congress_books book JOIN congress_titles library USING (isbn)
JOIN guild_pubs pub USING (isbn)
JOIN guild_members guild ON (guild.member_id = pub.member_id)
WHERE library.language = 'en_US'
AND book.publication_date BETWEEN '20060101' AND '20061231'
AND book.type = 'book'
AND pub.role IN ( 'author', 'coauthor', 'co-author' )
AND book.isbn != ALL ( SELECT isbn FROM blacklist )
AND NOT EXISTS (
SELECT 1 FROM catalog cat
WHERE cat.title = library.title
AND cat.author = guild.membername
)
UNION
SELECT bnb.title AS title, bguild.membername AS author
FROM brit_natnl_biblio bnb JOIN guild_pubs bpub USING (isbn)
JOIN guild_members bguild ON (bguild.member_id = bpub.member_id)
WHERE bnb.pub_date BETWEEN '20060101' AND '20061231'
AND bnb.serial = false
AND bpub.role IN ( 'author', 'coauthor', 'co-author' )
AND bnb.isbn != ALL ( SELECT isbn FROM blacklist )
AND NOT EXISTS (
SELECT 1 FROM catalog c
WHERE c.title = bnb.title
AND c.author = bguild.membername
)
;
>>Additionally, I can ask the database to give me a list of its procedures
>>(its interface), but without them all I have is its tables. True, I can
>>tell a lot from the tables and a nice diagram, but I can tell a lot more
>>from a list of procedures and their arguments.
A database is a set of facts or propositions. Set algebra and predicate
calculus are the interfaces to sets and propositions. That a dbms
provides a facility to create, catalog and run programs as data in no
way alters the interface to the database.
>>That list of procedures is the interface I'm talking about. After
>>populating a database with tables, views, and triggers, why keep talking
>>to it using SQL when I can improve my application's interface to it by
>>creating a domain-aware interface with intention revealing named procedures?
Why indeed. Because the intelligent and informed among us consider your
so-called 'improvement' perverse. After representing our data as
relations with minimal encumbrances and maximal data management
facilities, we would have to be incredibly stupid to cripple the dbms
behind some retardate procedure names.
One would have to suffer a cognitive deficit even to prefer the 'list'
of procedures over a relation.
[additional nonsense snipped]
> I introduced myself to this group with a little parody
> http://groups.google.com/group/comp.databases.theory/tree/browse_frm/thread/16f8e2a48bf7cdee/c6afb5776e727eca?rnum=11&q=kvnkrkptrck&_done=%2Fgroup%2Fcomp.databases.theory%2Fbrowse_frm%2Fthread%2F16f8e2a48bf7cdee%2F5a41db9357aef1b7%3Flnk%3Dst%26q%3Dkvnkrkptrck%26rnum%3D3%26#doc_d0674565740f9e99
> At the time, I had a hard time understanding how certain members of CDT
> failed to see right through it. In my mind, it was utterly transparent
> baloney - how could it possibly be mistaken for anything but parody?
>
> Having read this response, I feel obliged to offer Bob Badour and other
> misled CDT contributors a second apology.
Thank you, Kevin. I appreciate your apology, and I long ago forgave your
parody. If you feel yourself compelled to further acts of contrition,
please choose a small excerpt of nonsense to avoid subjecting the rest
of us to the full brunt of the nauseating tedium.
Are we still in disagreement?
At some point, all that expression and composition has an ingress and
egress, and what lies between them is reusable, repeatable code. No
one really wants to write all that SQL by hand every time, so it's put
inside a macro. To increase the macro's value, parameters are applied to
make it a function. Now we're cooking with fire.
But now I'm confused why the value of evolving loose SQL into functions
seems to escape c.d.t. Perhaps we're talking past each other?
>
> For example, you might package the lookup of a single customer's
> details as a procedure, but if you then wanted to do something to all
> customers, the best that a procedural host language is usually able to
> offer you is a cursor loop.
Why wouldn't I use another procedure that either uses the results of the
first, or includes the salient SQL of the first so it may perform a
subsequent relational operation?
> Using something like that is obviously a bad idea, because this sort
> of thing is much more naturally and efficiently implemented as a set
> update, optimized by the DBMS.
Agreed.
> The problem is that after you've expressed that set update as a low
> level construct like an explicit loop, the compiler cannot be expected
> to be intelligent enough to figure out that you actually wanted to
> quantify over a set; going from high level abstractions to low level
> details is much simpler than the converse.
Agreed again. That's why I wouldn't do it that way. Besides, involving
the application in that kind of looping seems to violate the
cohesiveness of the DB--it can do it much more efficiently. One reason
I can think of NOT to do it inside the database is if there were another
parallel operation that must be performed with each update. For
instance, our commercial finance system doesn't update anything without
leaving an elaborate audit trail behind so we can produce the equivalent
of your bank statement. The business reasons for when, what, and how
don't exist inside the DB. Triggers, though useful for single table
updates, are all thumbs for multi-table updates.
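A sketch of the audit-trail case just described, assuming sqlite3 and invented table names: the set update and its audit records travel together in one transaction, with no per-row trigger involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (id INTEGER PRIMARY KEY, rate REAL);
    CREATE TABLE audit (loan_id INTEGER, old_rate REAL, new_rate REAL, reason TEXT);
    INSERT INTO loan VALUES (1, 0.05), (2, 0.07);
""")

def reprice_loans(delta, reason):
    """Set-based update plus its multi-table audit trail, atomically."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute("""
            INSERT INTO audit (loan_id, old_rate, new_rate, reason)
            SELECT id, rate, rate + ?, ? FROM loan
        """, (delta, reason))
        conn.execute("UPDATE loan SET rate = rate + ?", (delta,))

reprice_loans(0.01, "annual adjustment")
```

The business reason ("annual adjustment") arrives from outside the DB, as the paragraph argues, yet both tables are still written in a single set-oriented unit of work.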
> Of course you could then tell to the compiler what it doesn't
> understand: just implement a new method which encapsulates the set
> update. But how is it cleaner or more productive to have umpteen
> methods to do various specialized things to your database, than to
> have the small set of closed, general, high level, declarative
> primitives that the relational algebra represents?
Except for, well, exceptions to things which are at best awkwardly
implemented with set operations, I think we agree.
Heh. I did the same thing on sci.math. There are so many nutjobs
over there; it's amazing! So I wrote up a post in which I "proved"
that every odd number except two and "infinity minus one" was
prime. I thought it was a fairly clever use of absurdity, but my
responses were mostly angry debunking. Turns out that when
you're defending yourself from a tidal force of nonsense,
parody becomes harder to recognize.
Marshall
Those two things you gave are just different syntax for the same
operation. I can parse the one and generate the other in a fairly
small amount of code. In fact I might be able to do it with
a sed script. There is no difference in the "level" of the two.
On the other hand if you make 50 additional such domain
specific procedures and compare them with full SQL,
you will see the SQL interface is hugely more powerful,
flexible, and yes, higher level.
> Additionally, I can ask the database to give me a list of its procedures
> (its interface), but without them all I have is its tables.
The tables (+ SQL) are substantially more powerful than any list
of procedures you're going to come up with, until and unless
the list of procedures expands in power and generality until
it becomes the same as SQL. And then what was the point?
> True, I can
> tell a lot from the tables and a nice diagram, but I can tell a lot more
> from a list of procedures and their arguments.
A table has a name/predicate and a list of attributes.
A procedure has a name/predicate and a list of parameters.
Through what mechanism are you able to "tell a lot more"
from the latter than the former? Isn't it possible that what
you're experiencing is simply the comfort and familiarity
of working with something you're used to instead of
something you're not used to?
> That list of procedures is the interface I'm talking about. After
> populating a database with tables, views, and triggers, why keep talking
> to it using SQL when I can improve my application's interface to it by
> creating a domain-aware interface with intention revealing named procedures?
Since your premise is false, your conclusion is invalid. These
"intention-revealing named procedures" are *restricted versions* of
what you can do with SQL. And what is the database schema
if not "domain aware"?
> True, I can do everything in SQL that I can do in procedures (they're
> written in SQL after all), but the procedures have an improved interface
> and protect the integrity of my model.
The procedural interface is necessarily crippled compared to
the SQL interface. (Again, unless your procedural interface
expands to be as large as SQL.)
You only think "improved" because you don't understand the superior
mechanisms by which SQL protects the integrity of the model.
> I could also write whatever
> bytecodes SQL is compiled into rather than SQL, but SQL is an improved
> interface and protects the integrity of the relational engine. I could
> even access the database's data directly on disk, but SQL is an improved
> interface over the bytecodes which are improved over doing my IO.
Sure. That's the thing the OO crowd is unable to comprehend;
that the OO computational model lies downward from SQL in
the direction of raw bytes and assembly.
Marshall
> On Dec 21, 1:33 pm, Thomas Gagne <tga...@wide-open-west.com> wrote:
>
>>... the
>>database is an object, which should hide its data and only be used
>>through its interface, just like they do with other objects in their
>>models.
>
> The interface to the database is the schema and SQL.
> What do you think the interface is?
Mathematics. Anything else is just clutter.
>>Using procedures, applications can query and manipulate
>>data in a uniform and predictable manner regardless what programming
>>language is used or what paradigm(s) it supports.
>
> Using SQL, you get all those things you just mentioned.
>
> I don't see any need to complexify it up.
Indeed. Mr. Gagne referred to 'data hiding' in an earlier post.
Somewhere along the line the OO folks lost touch with the engineering
principle of "information hiding" by confusing it with something else.
Ask any three OO folks what that something else is and I am sure you
will get 7 contradictory answers, but everyone will agree that it is
called 'data hiding' and will assure us that everyone agrees on exactly
what it is and why it is important.
Programmers who are familiar with data management will immediately
recognize that information hiding and data independence are equivalent
concepts. (Or at least direct analogues.)
Physical independence hides all of the physical design decisions that
trade off various performance and resource criteria. Logical
independence hides the design decisions of each application from all
other applications and vice versa. Transactions and concurrency models
seek to maximize temporal independence. etc.
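Logical independence can be made concrete (hypothetical schema, sqlite3 as the engine): after a redesign splits a table in two, a view keeps the old name denoting the same relation, so the application's query is untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO emp VALUES ('ann', 'eng', 90.0), ('bob', 'ops', 70.0)")

# The application's SQL, written against the original design.
app_query = "SELECT name, salary FROM emp WHERE dept = 'eng'"
before = conn.execute(app_query).fetchall()

# Logical redesign: departments move to their own table; a view keeps
# the old name 'emp' meaning the same relation.
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE employee (name TEXT, dept_id INTEGER, salary REAL);
    INSERT INTO department VALUES (1, 'eng'), (2, 'ops');
    INSERT INTO employee
        SELECT name, (SELECT id FROM department WHERE code = emp.dept), salary
        FROM emp;
    DROP TABLE emp;
    CREATE VIEW emp (name, dept, salary) AS
        SELECT e.name, d.code, e.salary
        FROM employee e JOIN department d ON d.id = e.dept_id;
""")

after = conn.execute(app_query).fetchall()
assert before == after  # logical independence: same interface, new design
```

This is the mechanism Marshall and Bob appeal to: the view, not a procedure layer, is what hides the design decision from applications.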
We programmers are addicts. We love to get lost in the details, "enter
the zone", and churn out code while mentally juggling complex concepts.
It's a high. The first step is admitting to ourselves...
"I think there are fundamental design principles, for example structured
programming principles, the good ideas in "Object Oriented" programming,
etc."
http://en.wikipedia.org/wiki/David_Parnas
Presumably, unlike many OO proponents, Parnas thinks not all ideas in OO
are good ideas.
Can your proposed procedural interface change its signature
without impacting applications? No it cannot; the way
decoupling works is we couple to an interface instead
of to an implementation, and the schema is the interface
to the database.
For two modules to work together they have to agree
on *something* and that thing they agree on can't
change without both of them changing.
> Here is what I think the first problem is: now the application has
> become dependent on the physical design of the database. It's dependent
> on the existence of a table named T_Person and that the table has the
> columns name, age, and ss#.
The database schema is *logical* design, not physical. The physical
design of the database is the in-memory and on-disk data structures
the dbms uses. It would indeed be a bad idea for an application
to access those directly. Fortunately the dbms provides us with
an abstract logical interface to these: SQL.
Again, if an application uses your procedural interface,
it has become dependent on the existence of a
procedure named add_book and on that procedure
having parameters "book" and "author". Etc.
Marshall
[snip]
>At some point, all that expression and composition has an ingress and
>egress, and what lies between them is a reusable, repeatable code. No
>one really wants to write all that SQL by hand every time so it's put
>inside a macro. To increase the macro's value parameters are applied to
>make it a function. Now we're cooking with fire.
I think you are going in the wrong direction.
All what SQL? The SQL in my app is a very small part of it. The
presentation code is much, much bigger. It is the presentation code
that I would like to cut down.
[snip]
It is a matter of competition in a tight market. The nonsense is
another form of parody, unrecognised by the artist(e)s themselves.
Humour is hard.
Hear hear! Now I'm imagining a set-based declarative UI construction
kit.
Marshall
In either case, the app is still dependent on T_Person, name, age, and
ss# but stored procedures can hide this better. If multiple users need
to access the same code/queries, centralizing it in stored procs or
DLLs makes sense.
[snip]
>Ask any three OO folks what that something else is and I am sure you
>will get 7 contradictory answers, but everyone will agree that it is
>called 'data hiding' and will assure us that everyone agrees on exactly
Except when it is called "encapsulation".
>what it is and why it is important.
OO is useful, but some of the larding is less than appropriate.
[snip]
>We programmers are addicts. We love to get lost in the details, "enter
>the zone", and churn out code while mentally juggling complex concepts.
>It's a high. The first step is admitting to ourselves...
Hello, my name is Gene, and I am a detailaholic.
No, we understand you just fine. You're just proposing something
that's not as good as what we already have, so we don't agree.
(By the way, there have been various other sorties from comp.object
proposing exactly the same thing.)
What with firefox crashing I've lost track of what it was in response
to, but I wanted to make a few more points:
The idea that various applications in diverse languages will need
to use the exact same query, and that when query instance 1
in application 1 changes, query instance 2 in application 2 will
change in the same way, does not occur in the real world in
my experience. Applications are written for different purposes,
and hence they don't use the same queries except in the
most trivial cases. A trivially reusable query might be
select * from Customers where CustomerId = ?
but there's no point in wrapping such a query in a procedure:
selectStarFromCustomersWhereCustomerIdIs(int customerId)
(or you could give it a shorter name.)
But supposing we did have a query that was so ...
something that we just had to abstract the query itself
and make it available to diverse applications. We could
1) write it as a procedure in every host language,
2) use a stored procedure or
3) make a view
Using 3 means that application code can query it
exactly as it would any other table, so the mechanisms
for access remain uniform. And in the unlikely event
that we want to change its semantics in a way that
doesn't change its type, (which is the most we could
do with a procedural interface) we can do so without
modifying any client code.
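As a concrete sketch of option 3, a view keeps the access mechanism uniform: clients query it exactly like a table, and its definition can change underneath them. This is illustrated with Python's standard sqlite3 module; the `Deleted` flag and the `ActiveCustomers` view are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers "
             "(CustomerId INTEGER, Name TEXT, Deleted INTEGER DEFAULT 0)")
conn.execute("INSERT INTO Customers VALUES (1, 'Ada', 0), (2, 'Bob', 1)")

# The "abstracted query" becomes a view; clients query it like any table.
conn.execute("CREATE VIEW ActiveCustomers AS "
             "SELECT CustomerId, Name FROM Customers WHERE Deleted = 0")

rows = conn.execute("SELECT Name FROM ActiveCustomers WHERE CustomerId = ?",
                    (1,)).fetchall()
print(rows)  # [('Ada',)]
```

Redefining the view (say, a different test for "active") requires no change to the client SELECT above, which is the point being made.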
An enterprise database is much like a library interface,
and application programmers don't have much experience
with many-client library interfaces. They see that
interface modifications are difficult in an enterprise
database and assume that that must somehow be the
fault of SQL rather than an attribute of library interfaces
in general. However I have spoken with e.g. some
of the team that wrote the Java core API and they report
exactly the same difficulties with interface changes. In
fact they tend to have a more extreme take, in that
they say that once an interface is released you pretty
much can *never* modify it.
Marshall
Sorry for the duplicate posts. I don't know whether to blame
firefox or Google Groups, but upon receiving various weird
error messages I hit "send" again.
Marshall
> Thomas Gagne <tga...@wide-open-west.com> wrote:
>
> [snip]
>
>>At some point, all that expression and composition has an ingress and
>>egress, and what lies between them is a reusable, repeatable code. No
>>one really wants to write all that SQL by hand every time so it's put
>>inside a macro. To increase the macro's value parameters are applied to
>>make it a function. Now we're cooking with fire.
>
> I think you are going in the wrong direction.
>
> All what SQL? The SQL in my app is a very small part of it. The
> presentation code is much, much bigger. It is the presentation code
> that I would like to cut down.
>
> [snip]
UI code and exception-handling code account for the bulk of most
applications (and a disproportionate amount of bugs.)
> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
> [snip]
>
>>Ask any three OO folks what that something else is and I am sure you
>>will get 7 contradictory answers, but everyone will agree that it is
>>called 'data hiding' and will assure us that everyone agrees on exactly
>
> Except when it is called "encapsulation".
Unless it is called polymorphism or inheritance or friend or protected
or private or interface or class or my amp goes to eleven.
And it is dependent on the limitations of the procedure. The procedure
can only add a single book at a time, for example. And it is dependent
on the limits of the language which calls the procedure.
The funny thing is, I actually like OO, such as it is. Java is a better
language than I otherwise would get to use.
I think it's worthwhile to identify what exactly the good ideas in OO
are.
I would propose the classic troika:
inheritance
encapsulation
polymorphism
Now it turns out that "encapsulation" has two related, widely used
meanings: information hiding, and scoping together functions and the
data they operate on, which I don't have a good name for besides
"encapsulation." And there are many kinds of polymorphism, but only
one is emphasized in OO. So the list becomes:
inheritance
information hiding
encapsulation
subtyping polymorphism
Inheritance is a mechanism for code reuse, and that's a good thing.
It is however a somewhat brittle mechanism, and over time the
OO consensus has been moving away from inheritance.
Information hiding in the OO sense is a good idea because it permits
limiting the scope of updates, which allows some centralization in
the manual enforcement of data integrity. Once a field is private,
it can only be modified within a single class, and so you only have
one class's worth of code to look at to see all the ways that
field can be modified. However, a distinctly superior mechanism
exists in the form of declarative integrity constraints. The reason
this is superior is that being declarative rather than procedural,
it is easier to get right: you only have to state what must not
happen, rather than think of all the ways and all the places that
might happen. Once you have declarative integrity constraints,
the whole concept of public vs. private seems quaint. However,
until you've figured this out, SQL looks like it's moving backwards
because it doesn't have private.
I have nothing much to say about bundling types and methods.
Now, polymorphism is simply great. However subtyping is
perhaps the least interesting kind of polymorphism, with
parametric polymorphism being more powerful and general.
Interfaces in Java are quite useful; however, a number
of authors have commented on their limitations, notably
the lack of retroactive abstraction mechanisms. I think
this can be particularly well addressed by the use of
a structural type system such as SQL has, rather than
a nominal one such as most OO languages have. (OCaml
being I believe the notable counterexample.) This
also gets in to all kinds of issues around union types
that I'm still thinking about.
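To make the parametric/subtyping distinction concrete, here is a hedged sketch in Python's type notation (nothing here is from the posts above): `first` works for any element type T, with no subtype relationship required between the T's.

```python
from typing import TypeVar, Sequence

T = TypeVar("T")

def first(xs: Sequence[T]) -> T:
    # Parametric polymorphism: one definition, uniform behaviour at every T.
    return xs[0]

print(first([1, 2, 3]))   # 1
print(first(["a", "b"]))  # a
```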
Anyway, just some thoughts.
Marshall
Moving SQL to centralized stored procs or DLLs makes sense in some
cases such as:
1) Supporting multiple users
2) Increasing performance
3) Securing underlying data/structure
> On Dec 22, 9:58 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
>>Presumably, unlike many OO proponents, Parnas thinks not all ideas in OO
>>are good ideas.
>
>
> The funny thing is, I actually like OO, such as it is. Java is a better
> language than I otherwise would get to use.
>
> I think it's worthwhile to identify what exactly the good ideas in OO
> are.
> I would propose the classic troika:
>
> inheritance
Eww! Yuck! What's good about inheritance? Reuse is good. Inheritance has
to be the worst way ever proposed to achieve reuse, though. Generics
provide greater reuse without introducing inappropriate coupling.
> encapsulation
That depends on what one means by encapsulation.
> polymorphism
Again, I suppose that depends on what one means by polymorphism. Quite
a few things count as polymorphic in my view.
> Now it turns out that "encapsulation" has two related, widely used
> meanings:
> information hiding, and scoping together functions and the data they
> operate on, which I don't have a good name for besides "encapsulation."
Information hiding is good. Scoping together functions and data does
exactly the opposite of information hiding by introducing an arbitrary
design decision the user is forced to contend with.
Take the classic example of a hiring function. Does that scope with the
employer or with the employee? Either, neither and both are all equally
arguable in my mind.
> And there are many kinds of polymorphism, but only one is emphasized
> in OO. So the list becomes:
>
> inheritance
Yuck!
> information hiding
Yum!
> encapsulation
Ugh!
> subtyping polymorphism
Tasty!
> Inheritance is a mechanism for code reuse, and that's a good thing.
Code reuse is good. Yes. Inheritance? Not so good.
> It is however a somewhat brittle mechanism, and over time the
> OO consensus has been moving away from inheritance.
That's a good thing. Eventually, they will invent relations.
> Information hiding in the OO sense is a good idea because it permits
> limiting the scope of updates, which allows some centralization in
> the manual enforcement of data integrity.
I disagree. Information hiding necessarily hides the locus of effect.
Limiting the locus of effect is a different design goal that sometimes
conflicts with information hiding.
> Once a field is private,
> it can only be modified within a single class, and so you only have
> one class's worth of code to look at to see all the ways that
> field can be modified. However, a distinctly superior mechanism
> exists in the form of declarative integrity constraints. The reason
> this is superior is that being declarative rather than procedural,
> it is easier to get right: you only have to state what must not
> happen, rather than think of all the ways and all the places that
> might happen.
I am not sure I follow you here. When declarative, the only difference
between stating what must happen or what must not happen is simply the
negation operator. From a security perspective, it is generally better
to state what is allowed than try to think up all the things one must
prohibit. Constraints generally must evaluate to true.
Declarative is superior because it effectively separates concerns. And
by 'separation of concerns', I refer to Dijkstra's austere mental
discipline and not to the trite buzzword the aspect-oriented idiots
would make it.
When one knows what one wants, it is easier and less error-prone to
state what one wants than it is to state how to achieve it.
> Once you have declarative integrity constraints,
> the whole concept of public vs. private seems quaint. However,
> until you've figured this out, SQL looks like it's moving backwards
> because it doesn't have private.
SQL's lack of support for data types makes it backward. Separating data
type from possible representation makes public vs. private seem quaint.
> I have nothing much to say about bundling types and methods.
I will give you the benefit of the doubt and assume you have nothing
good to say about it.
> Now, polymorphism is simply great. However subtyping is
> perhaps the least interesting kind of polymorphism, with
> parametric polymorphism being more powerful and general.
Indeed.
> Interfaces in Java are quite useful, however a number
> of authors have commented on its limitations. Notably
> the lack of retroactive abstraction mechanisms. I think
> this can be particularly well addressed by the use of
> a structural type system such as SQL has, rather than
> a nominal one such as most OO languages have. (OCaml
> being I believe the notable counterexample.) This
> also gets in to all kinds of issues around union types
> that I'm still thinking about.
>
> Anyway, just some thoughts.
So much to learn about and so little time.
Speaking of which, does anybody know of any good online references that
teach useful statistics? As opposed to the useless crap they teach
undergraduate engineers?
Below is an approximate equivalent based on what I could understand of
the SQL.
(; Following add book3 in catalog since
its pub_date is between 19990705 and 20030105
and its author or co_author is a member of britNatBio
and it is an item of libraryOfCongress
and it is not on the blackList)
(set catalog
item
(!= (and (get book instance *)
(get * pub_date (<= (>= (get date instance *)
19990705)
20030105))
(get * (or author co_author) (get britNatBio member *))
(get libraryOfCongress item *)
)
(get blackList item *)
)
)
My message should be read as an explanation of the evolution of
my thoughts on OOP. There was a time when I thought "inheritance,
encapsulation, polymorphism" was a tremendous achievement.
Nowadays I see better abstractions. Fortran is a big step up from
assembly, and that fact doesn't change when we introduce SML
and observe that SML is a big step up from Fortran. If one is
interested in programming language theory (PLT) then it behooves
one to understand exactly what Fortran brought to the table, so
that even though one does not wish to use Fortran any longer,
one does not accidentally discard some of its benefits when one
is moving on to the next higher level.
Inheritance has value. Consider that if I were given a choice
between a fairly primitive language that did not support it and
an otherwise identical language that did, I would choose
the language with inheritance. The fact that we now understand
that inheritance has problems and limitations that better
mechanisms don't have doesn't change the fact that inheritance
is better than nothing.
Does the idealized language I have in my head have direct
support for inheritance? No it does not. But I still ... honor
it the way I honor the invention of Fortran. That I do so
may appear somewhat perverse in light of how many people
still think OO is the way, the truth, and the light. But I
would be as much mistaken to take a reactionary position
against them as I would be to join them. I aspire to a
deeper, more nuanced understanding.
> > encapsulation
> That depends on what one means by encapsulation.
>
> > polymorphism
> Again, I suppose that depends on what one means by polymorphism.
> Quite a few things count as polymorphic in my view.
Sure. The interesting one seems to me to be parametric polymorphism.
Natural join for example is delightfully polymorphic. It takes two
operands of the abstract type called "relation" regardless of
what the concrete type of those relations are. One also
wants a mechanism to write such operators oneself.
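As a sketch of writing such an operator oneself, here is a small parametrically polymorphic natural join over relations represented as lists of dicts (my own illustration, not from the thread):

```python
def natural_join(r, s):
    # Join on all attribute names the two relations share, regardless
    # of the relations' concrete headings.
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[k] == b[k] for k in common)
    ]

emp = [{"dept": 1, "name": "Ada"}, {"dept": 2, "name": "Bob"}]
dept = [{"dept": 1, "dname": "Math"}]
print(natural_join(emp, dept))
# [{'dept': 1, 'name': 'Ada', 'dname': 'Math'}]
```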
> > Now it turns out that "encapsulation" has two related, widely
> > used meanings: information hiding, and scoping together functions
> > and the data they operate on, which I don't have a good name
> > for besides "encapsulation."
>
> Information hiding is good. Scoping together functions and data does
> exactly the opposite of information hiding by introducing an arbitrary
> design decision the user is forced to contend with.
>
> Take the classic example of a hiring function. Does that scope with the
> employer or with the employee? Either, neither and both are all equally
> arguable in my mind.
Sure. In fact this strikes me as the usual many-to-many case,
which relational handles so well and OO doesn't.
But again, this is another case where it seems to me a step
was taken to solve a real problem. The solution isn't the
right one, but it did solve some problems.
The better solution I think is some kind of module system,
perhaps like SML modules or Haskell typeclasses. In any
event this sort of modularity question is not one I understand
at all deeply.
> > And there
> > are many kinds of polymorphism, but only one is emphasized in OO. So
> > the list becomes:
>
> > inheritance
> Yuck!
>
> > information hiding
> Yum!
>
> > encapsulation
> Ugh!
>
> > subtyping polymorphism
> Tasty!
While subtyping polymorphism is the best of the list, I still
think it's not the ideal solution. In fact, for all the convenience
of subtyping, I've decided it's largely to be avoided. It's
something you sort of have to have if you're stuck with a
nominal type system, but these days I feel nominalism
is best kept to a minimum.
Amusingly, I see relational projection as a more general
mechanism. If we want to construct a supertype instance
from a subtype, simply project only those parts that one
wishes to keep. Is-a colored point a point? Is-a it a
color? I think "is-a" is the wrong question! I can get a
color and a point out of a colored point with simple
projection. I can get three different 2D points from
a 3D point with projection. Etc. You can do this easily
with a structural type system, but not with a
nominal one.
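The structural version of this can be sketched with Python's `typing.Protocol`, which matches by shape rather than by declared inheritance (all class and attribute names below are mine): a colored 3D point is usable wherever a 2D point is wanted, and projection is just keeping the parts you want.

```python
from typing import Protocol

class Point2D(Protocol):
    x: float
    y: float

def manhattan(p: Point2D) -> float:
    return abs(p.x) + abs(p.y)

class ColoredPoint3D:
    def __init__(self, x, y, z, color):
        self.x, self.y, self.z, self.color = x, y, z, color

cp = ColoredPoint3D(3, -4, 5, "red")
print(manhattan(cp))  # 7 -- accepted structurally, no subclassing declared

# "Projection" as a value-level operation: keep only the attributes we want.
point = {k: getattr(cp, k) for k in ("x", "y")}
print(point)  # {'x': 3, 'y': -4}
```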
> > Inheritance is a mechanism for code reuse, and that's a good thing.
> Code reuse is good. Yes. Inheritance? Not so good.
Again, I think inheritance is worth looking at to see what
works and what doesn't work about it, but not as an
example of something that was "done right."
Consider the Template pattern. An abstract base class
with a variety of final methods, parameterized by a
small number of abstract protected methods. Each
derived class fills in only the abstract methods.
I find it is often the case that the majority of the code
can go in the base class, and I get a fair degree of
reuse thereby.
Now, I think there are higher-order functional techniques
that achieve the same thing in a better way, but
I still consider the template pattern a success.
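The Template pattern as described can be sketched like so (a generic example of mine, not code from the thread): the base class holds the bulk of the logic, parameterized by a couple of abstract hooks.

```python
from abc import ABC, abstractmethod

class Report(ABC):
    def render(self) -> str:
        # The "final" template method: the majority of the code lives here.
        return f"== {self.title()} ==\n" + "\n".join(self.rows())

    @abstractmethod
    def title(self) -> str: ...

    @abstractmethod
    def rows(self) -> list: ...

class SalesReport(Report):
    # Each derived class fills in only the abstract methods.
    def title(self):
        return "Sales"

    def rows(self):
        return ["north: 10", "south: 7"]

print(SalesReport().render())
```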
> > It is however a somewhat brittle mechanism, and over time the
> > OO consensus has been moving away from inheritance.
> That's a good thing. Eventually, they will invent relations.
Ha ha! No doubt. And they will probably call it something like
"extreme collections."
> > Information hiding in the OO sense is a good idea because it permits
> > limiting the scope of updates, which allows some centralization in
> > the manual enforcement of data integrity.
> I disagree. Information hiding necessarily hides the locus of effect.
> Limiting the locus of effect is a different design goal that sometimes
> conflicts with information hiding.
I can't say I particularly understood that.
> > Once a field is private,
> > it can only be modified within a single class, and so you only have
> > one class's worth of code to look at to see all the ways that
> > field can be modified. However, a distinctly superior mechanism
> > exists in the form of declarative integrity constraints. The reason
> > this is superior is that being declarative rather than procedural,
> > it is easier to get right: you only have to state what must not
> > happen, rather than think of all the ways and all the places that
> > might happen.
>
> I am not sure I follow you here. When declarative, the only difference
> between stating what must happen or what must not happen is simply the
> negation operator. From a security perspective, it is generally better
> to state what is allowed than try to think up all the things one must
> prohibit. Constraints generally must evaluate to true.
My use of the negative in the above was a mistake.
We can consider three points in a continuum of integrity
protection for a given variable.
1) any code any where has access and can modify it (traditional)
2) the variable is private to a given class and only the code
in that class can modify it (oo)
3) the variable exists inside a managed container of some kind
that supports declarative integrity constraints. (declarative)
Points 1 and 3 represent the endpoints of this continuum.
Consider some arbitrary constraint f on x.
In case 1, at each and every update site in all application code,
when modifying x, we have to calculate a new_x, check f(new_x),
and abort the modify if it is false. When writing future code, we
have to know this, and maintain this discipline.
In case 2, the scope of application code for which we need to
manually insert the checks is limited to the class in which x
is a private field, and when writing future code, the scope is
likewise limited.
In case 3, application code doesn't have to do anything,
now or in the future.
My point is:
1 < 2 < 3
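Case 2 of the continuum can be sketched like this (a hypothetical Account class of mine; the constraint f is balance >= 0): the check is still written and maintained by hand, but only within the one class that owns the field.

```python
class Account:
    def __init__(self, balance):
        self._balance = 0
        self.deposit(balance)

    def deposit(self, amount):
        new_balance = self._balance + amount
        # Manual enforcement of the constraint f at every update site --
        # but only the sites inside this class exist.
        if new_balance < 0:
            raise ValueError("constraint violated: balance >= 0")
        self._balance = new_balance

a = Account(100)
try:
    a.deposit(-150)
except ValueError as e:
    print(e)
print(a._balance)  # 100
```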
As an aside:
Many of the bogus claims of the OO crowd against SQL stem
from confusing cases 1 and 3. If the only equation you know is
1 < 2
and 3 looks like 1 to you, you're going to come in here and
start proposing unfortunate things like saying the database
is a variable that should be encapsulated so that application
code has no direct access.
> Declarative is superior because it effectively separates concerns. And
> by 'separation of concerns', I refer to Dijkstra's austere mental
> discipline and not to the trite buzzword the aspect-oriented idiots
> would make it.
>
> When one knows what one wants, it is easier and less error-prone to
> state what one wants than it is to state how to achieve it.
Just so. This to me is one of the few areas of PLT where The
Right Answer is pretty clear and well understood. By a few
folks, anyway.
> > Once you have declarative integrity constraints,
> > the whole concept of public vs. private seems quaint. However,
> > until you've figured this out, SQL looks like it's moving backwards
> > because it doesn't have private.
>
> SQL's lack of support for data types makes it backward. Separating data
> type from possible representation makes public vs. private seem quaint.
Absolutely.
> > I have nothing much to say about bundling types and methods.
> I will give you the benefit of the doubt and assume you have nothing
> good to say about it.
Just so.
> > Now, polymorphism is simply great. However subtyping is
> > perhaps the least interesting kind of polymorphism, with
> > parametric polymorphism being more powerful and general.
>
> Indeed.
>
> > Interfaces in Java are quite useful, however a number
> > of authors have commented on its limitations. Notably
> > the lack of retroactive abstraction mechanisms. I think
> > this can be particularly well addressed by the use of
> > a structural type system such as SQL has, rather than
> > a nominal one such as most OO languages have. (OCaml
> > being I believe the notable counterexample.) This
> > also gets in to all kinds of issues around union types
> > that I'm still thinking about.
>
> > Anyway, just some thoughts.
> So much to learn about and so little time.
Alas.
> Speaking of which, does anybody know of any good online references that
> teach useful statistics? As opposed to the useless crap they teach
> undergraduate engineers?
I can ask around. I know some statisticians.
Marshall
I dunno about Bob, but I disagree. Your suggestion would remove
perhaps the most powerful feature of a SQL DBMS - dynamic SQL. Most of
the non-trivial systems I've worked on have supported--to a greater or
lesser extent--mechanisms whereby users could manipulate the UI to
generate SQL queries 'on the fly'.
If you want to pull that into a 'stored procedure', feel free to do
so, but then you're just using the modern SQL-DBMS as a kind of
tp-monitor or application server, only one that shares an address space
with the data manager. All of which prompts a gigantic shrug.
If you want to 'think different' about this kind of thing, then
consider, instead of putting the SQL inside the procedural code, the
potential of putting the procedural code inside the SQL.
CREATE TABLE MyTable (
    ID      SomeIdentifierDomain PRIMARY KEY,
    Data    SomeApplicationSpecificDomain NOT NULL,
    Related AnotherIdentifierDomain REFERENCES AnotherTable
);

WITH AnalyticResult ( Label, DisplayData ) AS (
    SELECT A.SomeData,
           SomeAggregate( T.Data )
    FROM   MyTable T, AnotherTable A
    WHERE  T.Related = A.ID
    AND    ComplexDomainSpecificPredicate( A.MoreData,
                                           'Do Re Me Fa Lo Sa Te Do' )
    GROUP BY A.SomeData
)
SELECT OpenGraphWindow ( R.Label, R.DisplayData, :UI_Handle )
FROM AnalyticResult R;
But then, you can do this already. No one does, is all.
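For what it's worth, one readily available way to do it is a user-defined function registered with the database engine. This sketch uses sqlite3's `create_function`; the predicate body is invented, standing in for the `ComplexDomainSpecificPredicate` of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Data TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(1, "do re mi"), (2, "fa so la")])

def complex_predicate(data, pattern):
    # Procedural code that now runs *inside* the SQL evaluation.
    return pattern in data

conn.create_function("ComplexDomainSpecificPredicate", 2, complex_predicate)

rows = conn.execute(
    "SELECT ID FROM MyTable WHERE ComplexDomainSpecificPredicate(Data, 're')"
).fetchall()
print(rows)  # [(1,)]
```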
Many people agreed with the comment about Fortran. Still, there were
some things that assembly could do that Fortran couldn't. Seems true of
most abstractions I'm aware of which I guess is necessarily a result of
the very definition of abstraction which narrows one's view. This is a
great advantage for focusing on problems rather than machines. But I
wonder if things get out of control, e.g., go too far when languages like
Java start trying to model the very machines they run on, e.g., IO classes.
Today, practically no one chooses a machine based on its instruction
set. I wonder if a machine that had relational operators might change
this (I'm not very familiar with the failed 'database machines' of
twenty years ago).
Now I will go to the Alphora site mentioned today and try to find out
how it applies relational ideas to display interfaces. Like Gene,
maybe, I have never understood why a mundane commercial application
can't present the database using relational operators. In fact the
ordinary single-user, single database situation is a better target for
discussion of how two databases ought to cooperate than distributed
ones. I'd go so far as to call it a canonical case. That's because I
think there are logically two machines even if the system happens to
have only one. Many users know this even if many developers forget it.
Apologies if I've mentioned this story before - when I was helping a
well-known airline go bankrupt, I was asked to make an interface from a
somewhat relational flight cargo system to process control machines that
ran big robotic machines in an automated warehouse. The pc machine
software required about a dozen messages, all to do with various flight
states, eg., departed, landed and so forth, but these were full of
holes, such as not being able to recognize that a flight had been
re-directed after takeoff. Each time I brought up one of these cases,
the robot vendor responded by adding a new message type. After much
head-banging, I finally got them to acknowledge that the pc machines used
some DEC db software or other. I proposed that both systems ought to
communicate with only three kinds of messages, Insert, Replace and
Delete since the logical interface was the table/view data both db's
unwittingly shared. This is one of many technical arguments I've seen
that was nixed for political reasons. I suspect that many Java
developments have gone astray from original intent for similar reasons.
p
Fair enough.
>>> encapsulation
>>
>>That depends on what one means by encapsulation.
>>
>>
>>> polymorphism
>>
>>Again, I suppose that depends one what one means by polymorphism.
>>Quite a few things count as polymorphic in my view.
>
>
> Sure. The interesting one seems to me to be parametric polymorphism.
>
> Natural join for example is delightfully polymorphic. It takes two
> operands of the abstract type called "relation" regardless of
> what the concrete type of those relations are. One also
> wants a mechanism to write such operators oneself.
I agree. I want very much a language that allows one to express type
generators and generics very easily and well.
>>>Now it turns out that "encapsulation" has two related, widely
>>>used meanings: information hiding, and scoping together functions
>>>and the data they operate on, which I don't have a good name
>>>for besides "encapsulation."
>>
>>Information hiding is good. Scoping together functions and data does
>>exactly the opposite of information hiding by introducing an arbitrary
>>design decision the user is forced to contend with.
>>
>>Take the classic example of a hiring function. Does that scope with the
>>employer or with the employee? Either, neither and both are all equally
>>arguable in my mind.
>
> Sure. In fact this strikes me as the usual many-to-many case,
> which relational handles so well and OO doesn't.
Even for a one:many or a zero to one:zero to one relationship, the
problem exists. It only gets worse for n-adic operations.
> But again, this is another case where it seems to me a step
> was taken to solve a real problem. The solution isn't the
> right one, but it did solve some problems.
I suggest it was a step that ignored fundamentals in favour of features.
I further suggest you will find that focus on fundamentals at the
expense of features does a whole lot better.
> The better solution I think is some kind of module system,
> perhaps like SML modules or Haskell typeclasses. In any
> event this sort of modularity question is not one I understand
> at all deeply.
I need to learn more about SML and Haskell. So much to learn, so little
time.
>>>And there
>>>are many kinds of polymorphism, but only one is emphasized in OO. So
>>>the list becomes:
>>
>>> inheritance
>>
>>Yuck!
>>
>>
>>> information hiding
>>
>>Yum!
>>
>>
>>> encapsulation
>>
>>Ugh!
>>
>>
>>> subtyping polymorphism
>>
>>Tasty!
>
>
> While subtyping polymorphism is the best of the list, I still
> think it's not the ideal solution. In fact, for all the convenience
> of subtyping, I've decided it's largely to be avoided. It's
> something you sort of have to have if you're stuck with a
> nominal type system, but these days I feel nominalism
> is best kept to a minimum.
Many generic operations depend on properties unrelated to subtyping.
Interval generators require a type that is a total order. It doesn't
matter whether the type is integer or "third thursday of the month". I
am not sure what sort of type system captures the necessary traits.
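One partial answer, sketched structurally (the names here are mine): a protocol can require the *operation* that a total order needs, so an interval test works for any ordered type, whether integer or "third thursday of the month".

```python
from typing import Protocol, TypeVar

class Ordered(Protocol):
    def __lt__(self, other) -> bool: ...

T = TypeVar("T", bound=Ordered)

def within(lo: T, x: T, hi: T) -> bool:
    # Needs only a total order on T: ints, strings, dates, ...
    return not (x < lo) and not (hi < x)

print(within(1, 5, 10))       # True
print(within("a", "m", "z"))  # True
```

Note this only captures the operation's *existence*, not its semantics (totality, transitivity), which is exactly the open question about capturing the necessary traits.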
> Amusingly, I see relational projection as a more general
> mechanism. If we want to construct a supertype instance
> from a subtype, simply project only those parts that one
> wishes to keep.
Projection does not affect types. Even coercing a value to a specific
supertype leaves the value's most specific type unaltered.
> Is-a colored point a point? Is-a it a
> color? I think "is-a" is the wrong question!
The questions are fine. The answers are "no" and "no".
> I can get a
> color and a point out of a colored point with simple
> projection.
Is that projection? I am ambivalent, and I have to think about it more.
> I can get three different 2D points from
> a 3D point with projection. Etc. You can do this
> easily with a structural type system, but not with a
> nominal one.
I guess I don't understand the difference between structural types and
nominal types that would make a difference for the above.
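A sketch of the difference, with all names invented for illustration: under structural typing, a "colored point" is just a record, and projection alone yields a "point" or a "color" -- no declared subtype relationship is needed. A nominal system would demand that ColoredPoint be declared a subclass of Point up front for the same values to be usable as points.

```python
def project(record: dict, *attrs: str) -> dict:
    """Relational-style projection: keep only the named attributes."""
    return {a: record[a] for a in attrs}

colored_point = {"x": 3, "y": 4, "hue": "red"}

# Getting a point and a color out of a colored point is just projection;
# no "is-a" question ever has to be asked or answered.
point = project(colored_point, "x", "y")
color = project(colored_point, "hue")

print(point)   # {'x': 3, 'y': 4}
print(color)   # {'hue': 'red'}
```

The caveat raised above still applies: projecting the value does not change any declared type -- it constructs a new value whose structure happens to match.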
>>>Inheritance is a mechanism for code reuse, and that's a good thing.
>>
>>Code reuse is good. Yes. Inheritance? Not so good.
>
> Again, I think inheritance is worth looking at to see what
> works and what doesn't work about it, but not as an
> example of something that was "done right."
Looking without concluding anything is not particularly useful. I look
and I conclude inheritance is not so good.
> Consider the Template pattern. An abstract base class
> with a variety of final methods, parameterized by a
> small number of abstract protected methods. Each
> derived class fills in only the abstract methods.
> I find it is often the case that the majority of the code
> can go in the base class, and I get a fair degree of
> reuse thereby.
Your example presupposes the mechanism. Consider the above with generic
operations parameterized by a small number of operations. The question
becomes: how to enforce semantics? For example, how do we know that a
given operation makes a type a total order?
> Now, I think there are higher-order functional techniques
> that achieve the same thing in a better way, but
> I still consider the template pattern a success.
Given a specific tool, one has to use the mechanisms the tool provides.
Given pieces of wood to fasten, a hammer, and a box of screws, the
outcome is only natural. That doesn't make a hammer a good way to drive
screws.
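The exchange above can be made concrete with a sketch (invented example, not from the thread): the Template pattern puts the shared logic in a base class parameterized by abstract methods, while the higher-order alternative passes the varying parts in as plain function arguments.

```python
from abc import ABC, abstractmethod

class Report(ABC):
    """Template pattern: render() is fixed; the two hooks vary per subclass."""
    def render(self) -> str:
        return f"{self.header()}\n{self.body()}"

    @abstractmethod
    def header(self) -> str: ...
    @abstractmethod
    def body(self) -> str: ...

class SalesReport(Report):
    def header(self) -> str: return "Sales"
    def body(self) -> str: return "total: 42"

# Higher-order equivalent: no class, no inheritance -- the "abstract
# methods" become ordinary parameters of a generic operation.
def render(header, body) -> str:
    return f"{header()}\n{body()}"

print(SalesReport().render())
print(render(lambda: "Sales", lambda: "total: 42"))
```

Both produce the same result; the difference is only in which mechanism carries the parameterization, which is the point about tools and hammers.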
>>>It is however a somewhat brittle mechanism, and over time the
>>>OO consensus has been moving away from inheritance.
>>
>>That's a good thing. Eventually, they will invent relations.
>
>
> Ha ha! No doubt. And they will probably call it something like
> "extreme collections."
Nah! Collections will seem trite by then. Don't worry though: someone
will coin a bright, shiny new word for them. When agile and extreme
finally fall out of favour, the pendulum will probably swing toward some
other anthropomorphic trait. Nimble or smart or sane or some such.
>>>Information hiding in the OO sense is a good idea because it permits
>>>limiting the scope of updates, which allows some centralization in
>>>the manual enforcement of data integrity.
>>
>>I disagree. Information hiding necessarily hides the locus of effect.
>>Limiting the locus of effect is a different design goal that sometimes
>>conflicts with information hiding.
>
> I can't say I particularly understood that.
Information hiding hides design decisions. It might hide the decision to
alter the state of the hard disk. Or it might hide the decision to
maintain an index. Or it might hide the audit trail. Those decisions do
not limit the locus of effect much.
Similarly, application views hide the design of the base relations from
applications. However, that does not reduce the locus of effect. The
locus of effect is controlled and maintained at a different level of
abstraction from the application views--namely at the level of the base
relations.
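A sketch of that separation (SQLite used for illustration; the employees/staff schema is invented): the application sees only the view, while the locus of effect -- the integrity constraint -- lives with the base relation underneath it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation, with its integrity constraint, one level of
# abstraction down from what applications see.
conn.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary INTEGER NOT NULL CHECK (salary >= 0))""")

# The view hides the design of the base relation from applications...
conn.execute("CREATE VIEW staff AS SELECT emp_id, name FROM employees")
conn.execute("INSERT INTO employees VALUES (1, 'Gray', 50000)")

print(conn.execute("SELECT name FROM staff").fetchall())   # [('Gray',)]

# ...but the constraint is enforced at the base relation, not at the
# view: the view hid a design decision without limiting the locus
# of effect.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Nock', -1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```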
I certainly agree with that.
[more agreed parts snipped]
>>Speaking of which, does anybody know of any good online references that
>>teach useful statistics? As opposed to the useless crap they teach
>>undergraduate engineers?
>
> I can ask around. I know some statisticians.
Please do. It's an area that is becoming increasingly important for our
field. Statistics, machine learning and gaming look like the big things
in the immediate future.
>Gene Wirchenko wrote:
>> Thomas Gagne <tga...@wide-open-west.com> wrote:
>>
>> [snip]
>>> At some point, all that expression and composition has an ingress and
>>> egress, and what lies between them is a reusable, repeatable code. No
>>> one really wants to write all that SQL by hand every time so it's put
inside a macro. To increase the macro's value, parameters are applied to
>>> make it a function. Now we're cooking with fire.
>>>
>>
>> I think you are going in the wrong direction.
>>
>> All what SQL? The SQL in my app is a very small part of it. The
>> presentation code is much, much bigger. It is the presentation code
>> that I would like to cut down.
>>
>In my commercial finance system, 37% of the source code is SQL. 43% is
>presentation (PHP). When you say your SQL is a small amount, what does
that mean? Are you able to count it?
That it is a very small percentage. Certainly under 10%.
> Thomas Gagne wrote:
>
>>Bob, you're misunderstanding me.
First, I understand Thomas just fine. As others here have noted, they
understand him just fine too.
Second, it makes little sense for Thomas to address me directly. Having
noted Thomas' lack of intellectual honesty, I already added him to my
killfile, and I will only see those things others reply to.
Third, by denying his objective reality, Thomas renders himself unlikely
to learn much of anything and incapable of understanding his own lack of
knowledge.
>>I do not want to replace SQLese with
>>OOese.
We understand you want to replace SQL with procedure calls as a proxy
for object methods. How exactly is that not OOese?
>>I'm suggesting that the wonderful SQL you posted can be made
>>into a DB-stored procedure to facilitate reuse,
It was an ad hoc program. It was written to perform a single task one
time. While one might need to perform a similar task in 12 months' time,
the import schema will very likely change in the interim rendering the
stored procedure useless. Whether one keeps the source code in the
database or in some other repository, one will need to change it before
using it again.
Given that it only took a few moments to write, it will only take a few
moments to write the appropriate program again when the time comes. In
fact, trying to re-use the program may cause one to overlook subtle
differences in next year's import schema, and I suggest it might be
better to start from scratch.
>>and that it is better
>>for multiple programs to call the procedure than for each to embed the SQL.
One will invoke the program exactly once. Thus, one has no need for
multiple invocations of the program from other programs.
>>Are we still in disagreement?
Yes, very much so. I added Thomas to my killfile because he is unlikely
to ever understand the disagreement. See here:
http://www.apa.org/journals/features/psp7761121.pdf or (oh! the irony!)
here: http://gagne.homedns.org/~tgagne/contrib/unskilled.html
> I dunno about Bob, but I disagree. Your suggestion would remove
> perhaps the most powerful feature of a SQL DBMS - dynamic SQL. Most of
> the non-trivial systems I've worked on have supported--to a greater or
> lesser extent--mechanisms whereby users could manipulate the UI to
> generate SQL queries 'on the fly'.
While certainly powerful, I am not sure I would call that the most
powerful of the features removed. One loses so many.
> If you want to pull that into a 'stored procedure', feel free to do
> so, but then you're just using the modern SQL-DBMS as a kind of
> tp-monitor or application server, only one that shares an address space
> with the data manager. All of which prompts a gigantic shrug.
I suspect Thomas exemplifies the pitfalls of knowing products without
fundamentals.
I want to use procedure calls as a proxy for SQL, not object methods.
What I propose to do, I can do without an OOPL. I can do it without any
programming language: I could do it using only SQL stored procedures
inside an RDB. Sitting in front of my interactive SQL session, instead
of typing the SQL for every business transaction I needed to repeatedly
perform, I would enter the name of a stored procedure. Programming
languages add little to that except automation.
There's no OO programming involved. Zip. Zilch.
>
>
>>> I'm suggesting that the wonderful SQL you posted can be made
>>> into a DB-stored procedure to facilitate reuse,
>
> It was an ad hoc program. It was written to perform a single task one
> time. While one might need to perform a similar task in 12 months'
> time, the import schema will very likely change in the interim
> rendering the stored procedure useless. Whether one keeps the source
> code in the database or in some other repository, one will need to
> change it before using it again.
>
> Given that it only took a few moments to write, it will only take a
> few moments to write the appropriate program again when the time
> comes. In fact, trying to re-use the program may cause one to overlook
> subtle differences in next year's import schema, and I suggest it
> might be better to start from scratch.
I'm not thinking of operations performed once/year. I'm thinking of
operations performed thousands of times every minute, or even thousands
of times every day. You don't see value in creating a procedure for that?
Okay, so it's procedural.
> I'm not thinking of operations performed once/year. I'm thinking of
> operations performed thousands of times every minute, or even thousands
> of times every day. You don't see value in creating a procedure for that?
I don't. And if a given abstraction is valueless, it will be valueless
no matter how frequently or infrequently it is used.
Thomas, are you familiar with the evolution of a programmer's mental
model as he moves from a simple procedural model to an OO one?
As in the transition from C to C++? You know how, when folks first
start down that path, they are simply adapting their old models to
the new constructs? Instead of coherent object abstractions, they
simply use classes as a kind of grouping for procedures, etc. etc.
"You can write fortran in any language." And in general, as long
as people stay in that phase, they don't see a lot of value to OO
(which is not surprising, since they're incapable of deriving much
benefit at that point.) And they insist on viewing every new thing
through their old lens, and interpreting what they see according
to what works and doesn't work in their former way of doing
things.
That's where you are with SQL. You're thinking about SQL
in OO terms. And as long as you do so, as long as you continue
to view SQL through the values and techniques you learned
with OOP, your understanding will remain shallow. And the
stuff you'll propose to those of us who do think relationally
natively will be seen to be primitive and regressive. I'm
sorry to have to put it so baldly, but I don't know any other
way to say it. Note that you will *never* make this leap if
you simply continue to think of data management in OO
terms, if you only hang out on comp.object where everyone
does that, if you think a foreign key is a pointer and a
SQL table is physical.
To make the leap, you have to set down (temporarily!)
your old ways of thinking and pick up the new stuff,
and try it on in good faith. Learn the viewpoint that
it comes from. Learn the techniques it has for doing
what it does. SQL and OOP are almost exactly the
same age, and each has spent a long time adapting
to its area. SQL's area is data management, and it
does it much, much better than any general purpose
programming language does.
Marshall
#include <shrug.h>
There is a schema. There are queries. There is 'business logic'. There
is a UI.
Whether you embed the queries into procedural code that runs as a DBMS
stored procedure or into procedural code that runs in an application
server/tp monitor or into the code that controls the UI (all three
approaches have pedigree) is logically irrelevant.
The 'interface' to the schema is a declarative language. The box you
put it in matters not one whit to the database developers, or to the
DBMS engineers.
> I'm not advocating objects over relations. In fact, through the
> preference of procedures I'm enabling (nay, promoting) the exploitation
> of relational operations.
I disagree. (I should mention in passing that I agree with much of what
you've been saying in the thread.)
In addition to the Object Relational mismatch, there is another mismatch
that is possibly older and more pervasive than it.
It's the data-centric versus the process-centric world view. For my first
twenty years in computing, I was firmly entrenched in the process-centric
world view, without even being aware that I had made a choice. My
transition to a data-centric world view was one of the rockiest transitions
I ever made in my professional life. It started out by being transferred
from a boss that was techno-centric to a boss that was business-centric. It
proceeded through a little language called Datatrieve (DEC, late 1970s and
early 1980s), and on into relational DBMSes with SQL interfaces.
But the real shock for me, the real difference between the data-centric
and process-centric world views, was the way you go about analyzing the
requirements for a proposed system based on the current way of doing
business. I can't do justice to the topic. It's amazing.
The object-oriented paradigm actually has room in it for both a data-centric
world view and a process-centric world view. But the majority of
object-oriented SW designers focus on object analysis and design at the
expense of message analysis and design. They tend to treat the design of
messages as a relatively trivial portion of overall system design. Nothing
could be further from the truth.
Yes, but what does all of this have to do with the quote I was responding
to?
Just this: when you use procedures as the primary interface to the database,
you have committed yourself to the process-centric world view at the expense
of the data-centric world view. When you call a procedure, you invoke a
process, by name. When you invoke a view, you invoke some data by name. I
have more to say, but I need to stop and think.
If we instead respect the boundary I believe there's a place to cross
into the DB and a place to return from it. Whether that place is raw
SQL or a SQL macro (procedure) makes a difference to how easily a DB's
referential and semantic integrity is maintained, how reusable it is,
and how much frustration programmers create for themselves.
Thomas,
Perhaps the following anecdote will shed some light on the vantage
point from which many of your responses are coming. Two years ago I
joined a team which was very liberal with SQL-wrapping - using small
procedures to "name" simple queries, updates, inserts, etc.;
"enhancing" the database with a procedural interface. The team
developed and maintained larger processes that "reused" these simple
procedures to the greatest extent possible, using embedded one-time SQL
queries as a last resort (and in many cases, still wrapping these niche
SQL queries into reusable procedures).
One of these large processes, a weekly "PRO" process, was having some
performance troubles. It was running from Friday night into Sunday,
with a few runtimes approaching 48 hours. There would have been real
problems if it were not to complete by 6:00 a.m. on a Monday morning,
so it needed to be looked into. The process itself was a
divide-and-conquer program, splitting the workload into ranges and
spawning 140 child processes to work on each range (140 was based on
dozens of trial-and-error optimization runs done over the years - yes, we
have a workhorse of a production system). The PRO process had 6 input
parameters which could be used to limit the workload by time frames,
financial status, etc. Each was added to speed things up at the
relatively minor risk of mishandling the "non-critical" or "old" data.
The weekly settings limited the workload to approximately 85% of the
data from the prior year of activity, in a database that held 7 years
of activity - literally, trade-offs had been made for performance over
correctness. The program had a changelog that amounted to 23
modifications over the prior 3 years - a quick skim-through makes it
appear that 8 were for tuning efforts, 10 for bug fixes, 3 to address
schema changes, and a couple of miscellaneous changes. Here are the
stats from the code as it was:
*** Module Analysis
83 Declarations
577 Statements
20 DML statements
1396 Code lines
1490 Blank or comment lines
The whole team was familiar with and despised the weekly PRO process.
After I'd worked on the team a few months, my supervisor decided it was
"my turn" to tune it - kind of an "induction" thing.
I spent three weeks mapping out every twist and turn of the whole
process, and ultimately translated it into a single 300-line SQL
"MERGE" statement (a DML combo of INSERT and UPDATE). This MERGE
statement ran (and still runs today) in under 45 minutes, boosted with
a "sip" of degree-8 concurrency. It reproduced the results of the old
process with only 6 differences (each of which was an undetected bug in
the old code). The only other difference was that it processed all
data in the database, not just one year's worth; so in 45 minutes it was
actually doing what would have required about 300 hours for the old
process to run.
Here are the stats of my replacement module:
*** Module Analysis
0 Declarations
1 Statements
1 DML statements
318 Code lines
35 Blank or comment lines
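The MERGE statement the anecdote describes is a single DML statement combining INSERT and UPDATE. SQLite has no MERGE, so its upsert syntax stands in below, and the stock schema is invented for illustration; the point is that one set-oriented statement replaces a per-row "if exists then update else insert" procedure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item_id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [(1, 10), (2, 5)])

# One statement handles both the "already present" and "new" cases
# for the whole incoming set -- no procedural branching per row.
conn.executemany("""
    INSERT INTO stock (item_id, qty) VALUES (?, ?)
    ON CONFLICT (item_id) DO UPDATE SET qty = qty + excluded.qty
""", [(2, 3), (7, 1)])

print(sorted(conn.execute("SELECT * FROM stock").fetchall()))
# [(1, 10), (2, 8), (7, 1)]
```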
I put it in place over 2 years ago (amusingly, I had to fight quite a
battle to get it in place: nobody "trusted" it because it was so
utterly different than the "easy to read" procedural code).
To date, only one modification has occurred, and this was to accommodate
an additional field added to one of the underlying tables.
The new code is 0% reusable - you cannot take any part of it and use it
for anything other than the PRO process. In contrast, though the old
PRO process is no longer used, some of the old PRO code had to be left
in place because it contained about a dozen procedures (with signatures
resembling: "validate_project_has_lines (project_name IN,
date_range_start IN, date_range_end IN, proj_has_lines_flag OUT,
error_flag OUT, error_message OUT)") that *had* been reused by other
processes.
The two solutions exemplify the difference made by viewing an
application's requirements in a process-oriented light instead of a
data-oriented light. If you have any sense that the two approaches are
comparable and ought to be evaluated against one another in terms of
pros-cons (e.g. the pure SQL approach runs faster but is in some sense
inferior due to lack of code reuse), then I can see how you'd interpret
as "rude" the ease and cavalierness with which certain cdt posters
dismiss ignorant proposals to treat databases as objects.
Nice post! Fascinating story.
This illustrates the oft-misunderstood fact that
one SQL statement is often more like an entire program
than it is like a procedural statement.
Marshall
We have no disagreement and our approaches are not in conflict with each
other. Our system has similar jobs that run entirely inside the
database. We still create procedures for them, but they are not called
by application programs. They are relatively large SET processing tasks
that can only efficiently be executed on sets. They are not
transactions. They have nothing to do with OLTP.
From what you described, your job didn't really have anything to do
with an application trying to hydrate or dehydrate objects from the
DB, or execute transactions either (something that must be done
1000/minute), or require an external stimulus, like a customer, to
initiate it.
I understand what you are saying. I've done it before and I do it
still. But I'm still having a communication problem in c.d.t. I
actually think it's a form of stereotyping: no matter what I say, because
it uses the term "object", prejudice assumes I want to replace set
processing with Java and then take your little sister to the wrong side
of the tracks.
I'm going to have to work on some examples so c.d.t. can see that what I
propose to do doesn't require OOPLs or abandoning set processing.
I can see how it would appear that way, but I don't think this
explanation is correct. Note that most of what I do is OOPL; I've been
doing it for 20 years. Every time I write application code, it's in a
pure OO style.
I've tried quite a variety of ORMs; some public ones, some proprietary
ones. Wrote two myself. Wrote many by-hand OR classes. I have
lots of experience with huge codebases written this way. And after
many years of trying, I became convinced that this is *not* the way
to go. The OO layering techniques, applied over SQL, have the effect
of obscuring what's happening, and crippling the expressive power
of the relational model, just when you need it the most. This has
inevitable disastrous effects on performance. Ultimately it
becomes very difficult to determine what queries happen in a given
codepath without exhaustive tracing. That's something that should
be immediately apparent, not obfuscated.
Furthermore, these problems aren't incidental or the result of
a lack of discipline or undertrained programmers; I work with
some of the best programmers in the industry. It is instead a
direct consequence of the technique of encapsulating and
abstracting the dbms and the relational model. You *cannot*
wrap a high level abstraction inside a low level abstraction
without crippling and obscuring the high level abstraction;
it is fundamentally impossible.
Lately I've been writing applications in a style which makes
no effort to hide the relational part. SQL is embedded directly where
it is used. Function nesting is kept to a minimum: when you want to
generate a page, you first get the data using various queries, then
process it. The two are not interleaved willy-nilly. It works great
and it achieves performance that is often described as impossible
before the fact.
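A sketch of the style described above (SQLite and an invented orders schema, purely for illustration): the queries run up front, in plain sight at the call site, and the rendering step touches no SQL at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 300), (2, "Acme", 200), (3, "Zenith", 50)])

def order_page(conn, customer: str) -> str:
    # Step 1: get the data. The SQL is embedded right where it is used,
    # so what queries run on this codepath is immediately apparent.
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id",
        (customer,)).fetchall()
    # Step 2: process and render, with no further database access --
    # the two phases are not interleaved.
    lines = [f"order {oid}: {total}" for oid, total in rows]
    lines.append(f"grand total: {sum(t for _, t in rows)}")
    return "\n".join(lines)

print(order_page(conn, "Acme"))
```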
> I'm going to have to work on some examples so c.d.t. can see that what I
> propose to do doesn't require OOPLs or abandoning set processing.
By all means! However, if you put a lot of effort into coming up with an
example that demonstrates your point clearly but ultimately fail to do
so, please be willing to consider the possibility that your hypothesis
is invalid.
Marshall
This is a recurring theme on thedailywtf.com. Recent example
Set rsStock = objConn.Execute("SELECT * FROM [Inventory]")
While Not rsStock.EOF
If rsStock("ItemId") = intItemId Then
Call SendItemToTransmitter(rsStock)
End If
rsStock.MoveNext
Wend
This raises the question of what kinds of questions application
programmers are asked at interviews these days. I bet explaining what
the following line of code means
Inventory |><| `ItemId=value`
is not among them.
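For contrast with the quoted snippet, which drags the whole Inventory table across the wire and filters it client-side row by row: the restriction belongs in the query. A sketch with SQLite (table and column names mirror the snippet; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (ItemId INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Inventory VALUES (?, ?)",
                 [(1, "widget"), (2, "sprocket"), (3, "cog")])

item_id = 2
# One parameterized query lets the DBMS do the restriction, instead of
# a full table scan in application code:
rows = conn.execute("SELECT * FROM Inventory WHERE ItemId = ?",
                    (item_id,)).fetchall()
print(rows)   # [(2, 'sprocket')]
```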
Thomas,
I believe Marshall's response clearly expresses my overall sentiments.
But to address your OLTP comment: I mentioned a function that was
originally part of the PRO process but remains as a stored procedure
that is called elsewhere in our application code:
validate_project_has_lines (project_name IN,
date_range_start IN, date_range_end IN, proj_has_lines_flag OUT,
error_flag OUT, error_message OUT);
This function is currently being used by one of our online forms (our
application codebase is about 40% OLTP). If you think of the function
as an interface to the database, consider the following questions that
the form developer may want answered when he sees the form make the
call:
* What tables are queried?
* How are the date ranges applied? What if the end-date field of the
form is not populated?
* Is there an efficient execution plan for resolving the query?
* Speaking of efficiency, does the query make use of bind variables, or
does it use dynamic SQL that will be parsed with each new call?
* Are there any "hidden" filters (i.e. does the function look at
approved lines only, or all lines)?
* Is the returned "proj_has_lines_flag" going to be "Y/N", "Yes/No",
"X/<null>"?
* What types of exceptions are captured, and what exceptions will
propagate?
* Is the returned error_flag going to be "Y/N", "Yes/No", "X/<null>"?
Now, consider the following "pesky SQL" code snippet as an alternative
interface to the database:
SELECT CASE WHEN COUNT(*) > 0 THEN 'Y' ELSE 'N' END
INTO :PROJECT_HAS_LINES
FROM PROJECTS P, PROJECT_LINES L
WHERE P.PROJECT_ID = L.PROJECT_ID
AND PROJECT_NAME = :P_NAME
AND L.STATUS = 'APPROVED'
AND L.START_DATE <= NVL(:START_DATE, L.START_DATE)
AND L.END_DATE >= NVL(:END_DATE, L.END_DATE)
This is slightly but not substantially more complex than the above
function call (at least, not to anyone familiar with SQL). But consider
each of the above questions from the perspective of the forms developer
who sees this SQL as the interface to the database. Consider the ease
with which a client request: "Hey, the form should actually check if
there are APPROVED or SUBMITTED lines" could be handled. How would
such a request be handled with the function interface (careful of side
effects to other processes)?
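A sketch of that client request (SQLite and an invented project_lines schema, for illustration only): with the SQL visible at the call site, widening the status filter is a one-line, locally visible change with no risk of side effects on other callers of a shared procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_lines (project_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO project_lines VALUES (?, ?)",
                 [(1, "APPROVED"), (1, "SUBMITTED"), (2, "DRAFT")])

def project_has_lines(conn, project_id: int) -> str:
    # Before the request: AND status = 'APPROVED'
    # After: the edit is right here, affecting only this form --
    # no shared procedure to modify, no hidden callers to audit.
    (n,) = conn.execute(
        """SELECT COUNT(*) FROM project_lines
           WHERE project_id = ? AND status IN ('APPROVED', 'SUBMITTED')""",
        (project_id,)).fetchone()
    return "Y" if n > 0 else "N"

print(project_has_lines(conn, 1), project_has_lines(conn, 2))  # Y N
```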
In my experience, the approach you advocate is rotten through and
through - I have only ever seen it degrade applications, be they batch
or OLTP. Though I routinely see the opposite, I have yet to encounter
a situation where an application is improved by any metric (even lines
of code) by abstracting away embedded SQL.
There is no stopping the invincibly ignorant. - DT
>>--
>>Visit <http://spam removed>
>>to read my rants on technology and the finance industry.
Kevin,
Why are you advertising this moron's site?
QFT
'Think Relational in the design of your MIS' is the best advice -
indeed about the only good advice - I ever heard from a consultant.