Maybe there is only a 1-in-15 chance of having to switch DBMS vendors.
However, the cost of the layers happens 100% of the time if you build
them. It is usually bad economics; probability-based cost-benefit
analysis is against it. Only put wrappers around commonly-used stuff
that varies only by a parameter or 2.  Let repetition of use be your
guide, not dogma.
[1] Although sql is not the best possible relational language.
     
-T-
oop.ismad.com
The answer may depend on what is SQL to you. Or more generally, what is the
role of DB in your design. Is it just an implementation detail of
persistence? Is it a legacy system?
Anyway, I guess it might be difficult to prepare statements without knowing
the objects they are going to work with. Then there is the issue of
transactions, connections, environments etc. So "hash" is probably
Transaction which refers to a Connection etc. You should also consider the
issue of life times. Some sort of GC might be required to ensure that
objects local to transactions don't outlive them. For my [admittedly
limited] purposes, I tend to use a persistence layer which hides DB away.
-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Your application (and classes) will be bound to the local database
schema anyway. This mapping has to be done somewhere. Your classes "on
top" of the SQL statements will not survive any major schema changes
just because you separate the SQL strings.
> But I am not sure
> what the best way is to decouple them. The method that I am using right
> now is to create a hash to store all the SQL statements and prepare
> them in the main program, then I will pass this hash as a variable to
> all the classes that are going to use it.
This design will probably just cause you and your colleagues extra
work. When you add functions to your application you both have to edit
you classes AND edit the SQL statement list. When someone tries to read
and understand your application, he has to read both the classes and
the SQL statement list simultaneously to be able to understand what the
application actually do.
Fredrik Bertilsson
http://butler.sourceforge.net
     A "good" OO design will typically focus more on behavior than on
state.  Before deciding on a persistence solution, you need the
answers to a number of questions, including:
     - How much information do you need to persist?
     - What are your performance requirements?
     - Do you control the database schema or does the database support
       many systems across the enterprise?
     - How do you see the persistence requirements changing over the
       anticipated lifetime of your system?
> The method that I am using right now is to create a hash to store
> all the SQL statements and prepare them in the main program, then I
> will pass this hash as a variable to all the classes that are going
> to use it.
     Have you looked at any existing OO-RDBMS mapping tools
appropriate for your combination of implementation language(s), OS,
and DBMS?  Even if your requirements don't currently justify the cost
and expense of such a solution, you will learn how the problem has
been solved by others and can avoid reinventing the wheel.
Regards,
Patrick
------------------------------------------------------------------------
S P Engineering, Inc.    | The experts in large scale distributed OO
                         | systems design and implementation.
          p...@spe.com    | (C++, Java, Common Lisp, Jini, CORBA, UML)
What you want to do is to shields client classes from the
implementation details of querying. You do that by defining an
interface which exposes querying behaviour in a form which is
independent from SQL. Then you implement this interface in the most
efficient way - namely, classes that handle their own SQL.
These classes can in turn be defined so that certain parts are
configurable (say, the database schema, the name of the tables etc) but
the structure that is *fixed* at any point isn't. Beyond a certain
limit, attempting to configure everything becomes overdesign - since
you pay as much for changing configurations than for changing code, and
usually without all the checks and balances provided by the compiler
and (depending on opinion ;-) static type checks.
You are correct that one wants to decouple the problem solution logic 
from the database.  (CRUD/USER processing is an exception.)  Typically 
that is done by encapsulating the DB access in a subsystem that has a 
generic data transfer interface based on the problem solution's data 
access needs.
Within that subsystem one provides SQL command construction.  To do that 
one has to have a mapping of data identity between the subsystem 
interface messages and the RDB schemas.  So if one has an interface 
message like:
saveAccount (accountID, value1, value2, ... valueN)
The subsystem will have to have a mapping to know that saveAccount maps 
to the "Account" table, accountID maps to the "Account Name" field, 
value1 maps to the "current balance" field, and so on.  (Essentially one 
identifies the individual tuple fields positionally in the message.)
There are various mechanisms based on table lookups to do this, of which 
your hash scheme is one (though the granularity is usually at the field 
level). This allows a SQL string to be incrementally built up for the 
saveAccount interface method by plugging in text snippets to a standard 
format.
Such lookup tables can be hard-wired in the subsystem.  Alternatively, 
one can initialize them from external configuration files.  That's more 
infrastructure work but it allows the entire RDB access subsystem to be 
reused as-is across applications by simply substituting the 
configuration file.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions  -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
Hongyu
The crux of the problem is what is considered the higher abstraction,
SQL or OOP? Depending how you answer this quesion your design
gravitates to be SQL centric or OOP centric. From SQL centric
perspective, it's OK for high level SQL queries to sail in the mess of
low level procedural code (OOP notwithstanding). A developer with
rudimentary grasp of relational databases would look for the source
code in search for those islands of high level SQL code, because they
contain the logic he can easily understand, versus the ad-hock spagetti
of classes and methods. Therefore, by isolating those SQL queries into
a place they don't belong you are doing a great disservice to a
programmer that will maintain your code after.
I understand your point, but here are a couple of problems that I faced
when mixing SQL queries directly with the client classes. I think they
are typical to lots of people.
Some of my database tables are not stable during the development
procedure, for example, their names or structures can be changed, or
the tables could be moved from one database to another (from
development database to production database, for example). Once that
happens, I have to comb through every class file that used this table,
modify each of them and recompile the whole package. It could be an
exhausting process if the table is spreaded in lots of classes.
Another problem for me is that the DBA in my company limited the number
of cursors that each user can use. In my Java code, some objects are
within loops and can be created and destroyed in large numbers. If
those objects contain SQL queries, the program essentially has to
prepare and close the same SQL statements many many times and sometimes
it still overflows the cursor limit for some reason. Even without the
cursor limit, the create() and close() statements within each class
still created a lot of unnecessary burden and are a waste of time.
Therefore, I used a hash to store all the prepared SQL statements and
passed it to the client class, so that I only need to prepare and
destroy them once. Admittedly, it's not an elegant solution, but it
solved both problems.
Based on the suggestions in this thread, I plan to wrap the same hash
in a central "database service" class and move all the database related
operation to this class, so that in future any client class with
database need just need to use a specific interface of this "database
service" class to interact with the database. I think that would be a
nicer OOP structure.
Well, our mileage varies. Changing table names and database schema
doesn't sound like the right database environment. Typically schema
evolves to some stable state, where fewer changes are expected after
the system becomes mature.
I don't understand the issue with production versus test. That is
changing just one variable -- database connection. Database connection,
of course should be refactored into a single entity.
> Another problem for me is that the DBA in my company limited the number
> of cursors that each user can use. In my Java code, some objects are
> within loops and can be created and destroyed in large numbers. If
> those objects contain SQL queries, the program essentially has to
> prepare and close the same SQL statements many many times and sometimes
> it still overflows the cursor limit for some reason.
That is done for a reason. If your program opens cursor irresponsibly
in a loop, or leaks cursors this is the way to catch it. (Cursor in a
loop is just a standard anti-pattern). There is no reason for an
application program to demand more than a fixed number of cursors.
Cursors iterate sets, so that by just making your query more
sophisticated it is always possible to accomodate a wildest possible
requirement.
Certainly I just gave a simplified example. The real situation could
be, say, one SQL query needs to join several tables from multiple
schemas belonging to multiple databases. If you need to move some (not
all) of the tables from one schema or database to the other (say, from
test to production) at a certain stage of the development cycle, it's
not as simple as just modifying the connection variable. There are
issues that you need to take care of, for example, remote LOB access
etc.
> That is done for a reason. If your program opens cursor irresponsibly
> in a loop, or leaks cursors this is the way to catch it. (Cursor in a
> loop is just a standard anti-pattern). There is no reason for an
I agree this is related to design pattern. It can be solved by creating
better code design. The solution that I have conceived and am still
contemplating are just examples to reach a good solution to my
problems, which not only can solve the problems, but also will keep a
better OO taste. And certainly, there are more than one way to achieve
that. That's why I am glad to see all the great suggestions here.
I seriously challenge the idea that there is ever a need to write an
application that would fetch some set of data from the test database
instances, and the other part of the data from production. Ad hock
queries, yes, e.g. comparing the data in the two instances: production
and test. But as as a part of an application?
Nope. You create your sandbox database that contains everything you
need when developing or enhancing your application. If it means that
you need to (partially) copy some data from production, so be it. Then,
after your application and development/test database development is
finished, you upgrade production in a single step.
I can imagine situation like you describe, but those are very rare
exceptions, not the rule.
Mike Feathers teaches his students to write unit tests that access code that 
does not touch a short list of no-nos:
 - file system
 - hardware
 - database
Using such tests to grow code forces it to strongly decouple. That means you 
write tests that fail, then write code to pass the tests, so tests drive the 
design.
It's very easy to pass a test by messing with a database. Naturally, you 
often should. Mike rewards his students when they don't. This technique 
often enables the decoupling systems that others here have doubtless 
presented.
-- 
  Phlip
  http://www.greencheese.org/ZeekLand  <-- NOT a blog!!! 
But are your class structure stable during the development procedure?
Changing the table structure is comparable to changing the class
structure. If you change the interface of a class, used by other
classes, you might also have to modify and recompile the whole package.
But this is part of life when doing non-waterfall development.
Extracting the SQL statements would only help you if you are doing pure
name changes in table and column names. Doing other things such as
changing primary keys, deleting tables, adding columns, will maka a
huge impact on the non-SQL part of your application too. If you need to
isolate your application from pure table and column name changes, you
should find an existing generic SQL framework that can translate from
the application name to the db name. Doing a custom subsystem only for
this task, is overkill.
Fredrik Bertilsson
http://butler.sourceforge.net
Are there any particular reason not allowing your unit tests touching
the file system? I think that the program/classes has to be loaded from
the file system anyway. I seem to me that a unit test not touching the
file system or the database is a little bit incomplete. Your code may
pass all unit tests, but still not work.
Fredrik Bertilsson
http://butler.sourceforge.net
Speed of test execution - the faster they run, the more likely we run 
them, the more likely we write them, the more code coverage happens.
Unit tests aren't scenario tests - we use (automated) acceptance tests 
for these.
The two type of tests together create the highest (economical) coverage.
Andy
When a test starts, I assume you in some way have to load the test data
from disk into memory. The only difference is that the file system is
touched at test startup. So I don't think there are very much speed to
gain. And I am not sure that speed of test execution is so important
that it should force us to do decoupling in the application. If
decoupling should be done, there must be more significant arguments but
speed of test execution.
Fredrik Bertilsson
http://butler.sourceforge.net
>> Mike Feathers teaches his students to write unit tests that access code 
>> that
>> does not touch a short list of no-nos:
>>  - file system
>>  - hardware
>>  - database
>
> Are there any particular reason not allowing your unit tests touching
> the file system?
You snipped it.
Suppose I write tests that exercise their tested code without touching 
classes whose names begin with O thru Z.
Disregarding the effect this might have on class names, look what happens. 
To get things done, functions that should access Oatmeal or Pinenuts will 
instead access a base implementation, Food. So (sometimes) concrete things 
depend on abstract things, and changes don't easily ripple through the 
system.
This is a game, not a development rule.
However, the database, GUI, hardware drivers, and file system are 
deceptively easy to couple to. The problems with coupling to them are often 
more severe than coupling to Oatmeal. Not just at port time or upgrade time.
Off-the-shelf libraries invariably bring their own design. Libraries that 
access hardware bring designs that obey the needs of hardware. So forcing 
code to strongly decouple from these libraries isolates and improves your 
own design.
> I think that the program/classes has to be loaded from
> the file system anyway. I seem to me that a unit test not touching the
> file system or the database is a little bit incomplete. Your code may
> pass all unit tests, but still not work.
Uh, if the code doesn't work but has lots of tests, the odds the code does 
not work are very low. Then, identifying the problem will be easy, and 
adding more tests that force the code to comply will be easy too.
But Mike's obviously talking about fopen() and similar to read a file, to 
pass the test. Of course the test and code themselves load from disk, and of 
course sometimes the tested code calls other code that accesses the 
database.
The only thing loading from disk at startup is executable and even that 
is conditional upon where the tests are run from.  As most of the time 
they are run from within the IDE, they themselves are also in memory.
Test data is all in memory - not stored in any file format.
Andy
> The only thing loading from disk at startup is executable and even that
> is conditional upon where the tests are run from.  As most of the time
> they are run from within the IDE, they themselves are also in memory.
>
> Test data is  all in memory - not stored in any file format.
But none of that is a rule. That should be a by-product of other
principles. And quite a few tests will ignore the rule.
-- 
  Phlip
I assume the test data has to survive a system restart, so in some way
they need to be in some file format.
Fredrik Bertilsson
http://butler.sourceforge.net
Maybe you could try to describe the problems with coupling too, not
just telling that they are severe?
> Uh, if the code doesn't work but has lots of tests, the odds the code does
> not work are very low.
If you design your tests to not test the real thing, the probability
that the code works is still low.
> and of course sometimes the tested code calls other code that accesses the
> database.
Good. But if the application heavily depend on the database, it has to
be more often than "sometimes". A better solution would be to use a
light-weight SQL database such as Hypersonic or MySQL for testing, if
test execution time is important.
Fredrik Bertilsson
http://butler.sourceforge.net
For Acceptance test yes - for unit tests no.
Some tests need the data to survive the  re-construction of the class 
under test.
> Mike Feathers teaches his students to write unit tests that access code
> that does not touch a short list of no-nos:
>
> - file system
> - hardware
> - database
> Using such tests to grow code forces it to strongly decouple.
Would you define decouple? From what I have read, so far, a Unit Test in TDD 
is not a Unit Test in the traditional sense. To differentiate an attempt has 
been made to call the TDD Unit Test - a Programmer Test.  Does the same hold 
for decoupling? Is there a TDD version?
Does it mean that there are dependencies but those dependencies can be
removed by instaniating a 'mocked up' (simply has a noop implementation of
the base class' interface) object?
Or is it truly decoupled, no dependencies on user-defined classes?
If it means the former, I don't see what is gained? I can take a poorly
designed class and then create an abstract interface or base class to
"decouple" dependencies and then test the containing class in isolation. So
again what does that gain?
Thanks,Mike
----== Posted via Newsfeeds.Com - Unlimited-Unrestricted-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
>To my knowledge, in OOP design it's not good to mix SQL queries with
>application codes in classes, because it limits the usability of the
>classes and bound them to the local database schema. 
Indeed, it is not good to mix SQL with business rules in *any*
paradigm.  You'd like your business rules to know nothing of SQL.
You'd also like your business rules to know nothing of the relational
schema.  The business rules want to use a convenient set of data
structures (or objects).  
>But I am not sure
>what the best way is to decouple them. The method that I am using right
>now is to create a hash to store all the SQL statements and prepare
>them in the main program, then I will pass this hash as a variable to
>all the classes that are going to use it. I am not sure whether there
>is any better design. Thanks in advance!
I would not do this.  Rather I would create a set classes that know
how to query the database and convert the data into the form that the
business rules would like to see it.
-----
Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
Object Mentor Inc.            | blog:  www.butunclebob.com
The Agile Transition Experts  | web:   www.objectmentor.com
800-338-6716   
"The aim of science is not to open the door to infinite wisdom, 
 but to set a limit to infinite error."
    -- Bertolt Brecht, Life of Galileo
>SQL in code is not evil[1].
Agreed.  SQL is not evil.  But combining it with business rules is
inadvisable.
Just a thought.
It might be helpful to define decoupling in terms of degree of coupling.
For example, weakly coupled vs strongly coupled since coupling cannot
be avoided entirely.
>> Mike Feathers teaches his students to write unit tests that access code
>> that does not touch a short list of no-nos:
>>
>> - file system
>> - hardware
>> - database
>
>> Using such tests to grow code forces it to strongly decouple.
>
> Would you define decouple?
"Couple" means at change time, A must change only because B changed. Not 
because A shares new features with B.
Note that "cohere" is not the opposite, it's simply orthogonal. A and B 
might share legitimate reasons to change. Some changes reveal coupling and 
some reveal coherency.
For example, today I wrapped up a small amount of data in this pattern:
http://c2.com/cgi/wiki?InspectEvalFileFormat
That means one module has a simple data structure, and uses the language's 
.inspect() feature to write the source code required to reconstitute the 
structure. The module passes this as a string to another module, which calls 
eval() to evaluate the string and use the data structure.
When finished, I could change the data structure in two places - directly in 
a user interface as YAML:
 option:
    option 1: yes
    option 2: maybe
    option 3: nope
You can add option 4 there, and add code to the module using the data to 
pull option 4. All the transport between the two modules does not care about 
the shape of the data structure. So the YAML and the back-end module now 
cohere, even though they are not close to each other.
> From what I have read, so far, a Unit Test in TDD is not a Unit Test in 
> the traditional sense.
That's why the "movement" is struggling to rebrand them "developer test". 
Per the D in TDD. The last D.
> To differentiate an attempt has been made to call the TDD Unit Test - a 
> Programmer Test.  Does the same hold for decoupling? Is there a TDD 
> version?
What other versions were there?
In the case of "unit test", that has a specific meaning that should be 
defended. The failure of such a test must implicate only one unit.
> Does it mean that there are dependencies but those dependencies can be
> removed by instaniating a 'mocked up' (simply has a noop implementation of
> the base class' interface) object?
Sometimes yes. The effect gets more profound as mocks are _not_ used.
When tests re-use production modules, they come from a much narrower 
environment than other production modules. A running program has many 
modules and many opportunities for coupling. Tests should not turn on all 
the modules just to test one function. They also should not mock those 
modules.
> Or is it truly decoupled, no dependencies on user-defined classes?
Unit tests often require tested classes to work without any other classes.
Code produced by TDD often reuses other modules. You can depend on something 
without coupling to it.
TDD is a dynamic process of small adjustments. When a new test A'' 
challenges A to improve its behavior, A might force B to change, even if B 
should not participate in the new behavior. We have an opportunity to 
inspect this, and refactor B a little.
But in a pure TDD system, A and B could only exist because tests A' and B' 
generated them, using very few other modules. So the odds that A and B 
couple start low before A'' arrives.
> If it means the former, I don't see what is gained? I can take a poorly
> designed class and then create an abstract interface or base class to
> "decouple" dependencies and then test the containing class in isolation. 
> So
> again what does that gain?
We typically arrive at "just enough" dependencies at test time. TDD forces 
clear and narrow dependencies between modules, essentially as a byproduct of 
rapidly writing tests.
Nobody is saying "mock everything". Mike Feathers advises his students to 
not couple to that short list because those are the things that most 
powerfully lure us to couple to them.
Cool. When I forget my badge and show
up at the front door and give them my employee ID, it is sure nice not
to have an
EMP table and query it directly,  but send a message to each and every
DEPT
instance to ask "does emp 1234 work for you?"
This funny interface separation idea is one of the reasons why data
mangement folks call OOP crowd incompetent.
The example that you described is too simple to be meanful in this
argument. In simple situations like yours, you don't need to bother
using advanced pattern design, not even OOP because a simple C style
code would be enough. Here we are discussing large and complex software
package developments, which involve lots of database tables and
multiple database resources.
This is might be true. What I wrote is a test. Simplicity aside, do you
see the difference between the first and the second one? If you can
answer this correctly, then you pass data management test, and
therefore are qualified to write database applications. Without knowing
your skills, I can't trust your judgement and your design decisions.
Eric
In a good design, most of the code should be business rules. If it is
GUI management, for example, then you are not using the appropriate
libraries/tools/languages. You are not coding to the domain but wasting
code on technlogy issues or the like rather than spending code on
solving the problem at hand.
>
> -----
> Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
-T-
False Dichotomy.
Take care, Ilja
Is there any reason the OO code should do such a stupid thing (apart 
from having been designed by a stupid designer and coded by a stupid 
coder) ?
Anyway, I wouldn't let the guys at the front door have to manually issue 
SQL queries - I would provide them with a nicer user interface !-)
What is so stupid about it? Does the application meet the required
functionality? Yes it does. Yet, why almost every application
performance group meeting discusses a case like this?
> Anyway, I wouldn't let the guys at the front door have to manually issue
> SQL queries - I would provide them with a nicer user interface !-)
This is why application programming is not a fad. Still, I hardly see
any objects involved:
boolean isValidEmployee( int empNo ) {
    //wrap around "select count(*) from emp where emp# = :empNo"
}
OK, boolean more realistically is a record with picture, first and last
names fields, but I digress. Tell me a single reason why this SQL query
shouldn't be embedded in this function.
     If the data is owned by the application, embedding SQL might be
justified.  In most environments of even slightly above trivial levels
of complexity, information such as this is maintained in a database
that is shared across the enterprise.  The reasons why the application
should not embed SQL in this case include:
     - It couples the application to the database schema.  Changes to
       the schema (due, for example, to requirements of completely
       different applications or database administrator decisions)
       could break the application.
     - The "validate employee ID" functionality is likely to be used
       in more than one application.  Encapsulating it as a reusable
       component or service reduces the complexity of each application
       and provides more consistency across the enterprise.
     - Embedded SQL requires more than a simple SQL statement.  The
       application must deal with database connections, transactions,
       cursors, etc.  Encapsulating this overhead and allowing it to
       be reused across multiple applications reduces the complexity
       of each application and the opportunity for bugs to be
       introduced.
     - Encapsulating SQL in components or services that expose
       required functionality instead of the underlying database
       schema makes impact analysis much more straightforward and
       reliable.  When SQL is embedded in applications across the
       enterprise, it becomes virtually impossible to determine the
       effect of any proposed change to the database schema.
Note that these issues are independent of the implementation
approach.  The value of decoupling the application from the database
schema exists if your implementation uses procedural, OO, functional,
or any other paradigm.
Regards,
Patrick
------------------------------------------------------------------------
S P Engineering, Inc.    | The experts in large scale distributed OO
                         | systems design and implementation.
          p...@spe.com    | (C++, Java, Common Lisp, Jini, CORBA, UML)
In a god architecture, the database is owned by the application.
Applications should interact using messages. For example, a common
pattern in many enterprise applications is to have separate production
(transactional) database and datawarehouse databases. A data mining
software does not interact directly with the production databas.
>  It couples the application to the database schema.  Changes to
>  the schema (due, for example, to requirements of completely
> different applications or database administrator decisions)
> could break the application.
Can you please some example of this? Besides, your application will
break anyway.
>  The "validate employee ID" functionality is likely to be used
> in more than one application.  Encapsulating it as a reusable
> component or service reduces the complexity of each application
> and provides more consistency across the enterprise.
Why would this not be possible if you are emedding SQL in your
application?
> Embedded SQL requires more than a simple SQL statement.  The
> application must deal with database connections, transactions,
> cursors, etc
Connection, transaction and cursor handling can still be done by
pre-made generic class libraries. This has nothing to do with embedding
SQL.
> Encapsulating SQL in components or services that expose
> required functionality instead of the underlying database
> schema makes impact analysis much more straightforward and
> reliable.  When SQL is embedded in applications across the
> enterprise, it becomes virtually impossible to determine the
> effect of any proposed change to the database schema.
Adding tables and columns will not effect existing clients. Removal of
tables or columns will, but it will most likely also change the service
API and give you the same problem with finding all clients to the
service API methods.
Fredrik Bertilsson
http://butler.sourcefore.net
     That absolute assertion is incorrect.  Many systems that provide
significant, measurable business value are based on architectures that
include an enterprise or otherwise shared database.
> Applications should interact using messages.
     This is another absolute assertion that is not correct in all
cases.  Message passing is one valuable technique, but it is not
universally applicable.
> >  It couples the application to the database schema.  Changes to
> >  the schema (due, for example, to requirements of completely
> >  different applications or database administrator decisions) could
> >  break the application.
> 
> Can you please some example of this? Besides, your application will
> break anyway.
     I'm surprised you would need examples.  Database schemas change
for a wide variety of reasons, including support of new applications,
support of modifications to existing applications, and optimizations
made by DBAs, to name just a few.  If the application is coupled to
the schema then yes, changes to the schema could definitely impact the
application.  If the application is properly decoupled, it is less
likely to "break anyway".
> >  The "validate employee ID" functionality is likely to be used in
> > more than one application.  Encapsulating it as a reusable
> > component or service reduces the complexity of each application
> > and provides more consistency across the enterprise.
> 
> Why would this not be possible if you are emedding SQL in your
> application?
     Because SQL embedded in one application is not reusable in
another.  That seems obvious.
> > Embedded SQL requires more than a simple SQL statement.  The
> > application must deal with database connections, transactions,
> > cursors, etc
> 
> Connection, transaction and cursor handling can still be done by
> pre-made generic class libraries. This has nothing to do with
> embedding SQL.
     The point is that decoupling the application from the database
allows the application developer to focus on _why_ he or she is using
the "validate employee ID" functionality rather than on the mechanics
of how to validate an employee ID.  Encapsulating the overhead
associated with using the DBMS is necessary to provide this
decoupling.
> > Encapsulating SQL in components or services that expose required
> > functionality instead of the underlying database schema makes
> > impact analysis much more straightforward and reliable.  When SQL
> > is embedded in applications across the enterprise, it becomes
> > virtually impossible to determine the effect of any proposed
> > change to the database schema.
> 
> Adding tables and columns will not effect existing clients. Removal
> of tables or columns will, but it will most likely also change the
> service API and give you the same problem with finding all clients
> to the service API methods.
     Not all schema changes will affect the functionality supported by
the database.  Decoupling the provision of that functionality from the
details of how it is implemented allows the same functionality to be
provided by a wide variety of different underlying implementations.
Coupling an application to a database schema makes the application far
less resilient to change.
     Frankly, I'm don't understand your purpose in replying to my
original post.  Are you seriously suggesting that tightly coupling an
application to a database schema is good practice?
Sincerely,
Of course there are existing applications that uses many different
architectures and still provides measurable business value. But we are
talking about how to develop new applications, right? I think most
architects agree that not every enterprise application should access
every database directly. If application A want to access data from the
database belonging to application B, application A has to interact with
application B in some other way but directly access the database
belonging to application B.
>  I'm surprised you would need examples.
So why don't you give any?
> Database schemas change
> for a wide variety of reasons, including support of new applications,
> support of modifications to existing applications,
As I pointed out before, new tables or columns will not break any
existing application. But I would like to see an example of some other
kind of schema change that would be possible to encapsulate in the
persistence layer. If you remove tables or columns, I can guarantee you
that you have to rewrite your business object model too.
> and optimizations made by DBAs
Changes in indexes, caches, etc will not break any existing SQL
statemenets.
>  Because SQL embedded in one application is not reusable in
> another.
If they are using the same database schema, it is obviously reusable.
> Decoupling the provision of that functionality from the
> details of how it is implemented allows the same functionality to be
> provided by a wide variety of different underlying implementations.
A database schema is not an implementation. It is an interface. The
RDBMS is an implementation. By using ANSI SQL, you decouple the
application from any specific implementation.
> Are you seriously suggesting that tightly coupling an
> application to a database schema is good practice?
Of course. Unnecessary decoupling will only cause you a code bloat. The
database schema implements a lot of business logic, and the database
schema only change when the business rules change. Still you have not
provided any example of a schema change that would only effect the
persistence layer and not the rest of the application.
Fredrik Bertilsson
http://butler.sourceforge.net
     Even if we limit the scope of the discussion to new applications,
there are still environments in which databases are shared by multiple
applications.  New applications must deal with this reality.
> I think most architects agree that not every enterprise application
> should access every database directly.
     We seem to be in partial agreement.  I would go further and say
that all but trivial CRUD applications should be decoupled from the
schemas of any databases used.
> If application A want to access data from the database belonging to
> application B, application A has to interact with application B in
> some other way but directly access the database belonging to
> application B.
     Again, not all databases have a single owner.  Shared and
enterprise databases are quite common.
> > Database schemas change for a wide variety of reasons, including
> > support of new applications, support of modifications to existing
> > applications,
>
> As I pointed out before, new tables or columns will not break any
> existing application. But I would like to see an example of some
> other kind of schema change that would be possible to encapsulate in
> the persistence layer.
     Denormalization for performance is a change I've seen on more
than one occasion.  There is no change to the functionality supported
by the database, but the schema changes.  If the application is
coupled to the schema, it too would need to change for no other reason
than the coupling.
> If you remove tables or columns, I can guarantee you that you have
> to rewrite your business object model too.
     Not necessarily.  If the new schema provides the same information
in a different way, only the implementation of the component or
service providing the business functionality need change.
Applications that are clients of that functionality are insulated.
> > and optimizations made by DBAs
>
> Changes in indexes, caches, etc will not break any existing SQL
> statemenets.
     Denormalization for performance, as noted above, will definitely
break existing SQL.  DBAs may also change the schema to support other
applications, for example by moving some columns from one table to a
new table in order to change a 1:1 relationship to a 1:M.  Such
changes are not uncommon in large integrated systems.
> > Because SQL embedded in one application is not reusable in
> > another.
>
> If they are using the same database schema, it is obviously
> reusable.
     Not unless you consider copy-and-paste to be reuse.  If you are
suggesting instead that one application expose the functionality that
is implemented internally as embedded SQL then we seem to be agreeing
that the embedded SQL should be encapsulated and exposed as a service.
> > Decoupling the provision of that functionality from the details of
> > how it is implemented allows the same functionality to be provided
> > by a wide variety of different underlying implementations.
>
> A database schema is not an implementation. It is an interface. The
> RDBMS is an implementation. By using ANSI SQL, you decouple the
> application from any specific implementation.
     A particular schema is an implementation detail.  The same
functionality can be supported by a variety of schemas, more than one
of which can be in third normal form.  Applications that depend on
functionality rather than schemas are more resilient to change than
those that are coupled to the implementation represented by one
particular schema.
> > Are you seriously suggesting that tightly coupling an application
> > to a database schema is good practice?
>
> Of course. Unnecessary decoupling will only cause you a code bloat.
     I never suggested "unnecessary" decoupling.  For simple CRUD and
reporting applications, direct use of SQL is probably a good choice.
> The database schema implements a lot of business logic, and the
> database schema only change when the business rules change.
     That is simply not the case.  I've provided a few examples of the
types of schema changes that occur in real systems.  Changes to
business rules are far from the only reason for schema changes.
Anyone architecting a system of even moderate complexity must
recognize that such changes can impact the stability and resiliency of
components that use persistent information and address the risk
accordingly.
Wrong. The database schema outlives the application. I just went to DMV
where the clerk was still using character based entry screen. Now tell
me, which of the following is likely to happen on the next step
1. DMV would throw away the database of all vehiles/drivers/ etc in
california and write a completely new application with new persistence
store. (Persistence is just implementation detail, you know).
2. They write GUI based application that will work with existing
database. Both applications would coexist at some point.
>      - The "validate employee ID" functionality is likely to be used
>        in more than one application.  Encapsulating it as a reusable
>        component or service reduces the complexity of each application
>        and provides more consistency across the enterprise.
Aha. In case #2 of my new example tell me what components of the old
application do you think would be reusable for the new application.
>      - Embedded SQL requires more than a simple SQL statement.  The
>        application must deal with database connections, transactions,
>        cursors, etc.  Encapsulating this overhead and allowing it to
>        be reused across multiple applications reduces the complexity
>        of each application and the opportunity for bugs to be
>        introduced.
If you have any difficulty opening database connection, or iterating a
cursor, you may think refreshing your programming skills. Those are not
the kind things that are worth abstracting.
>      - Encapsulating SQL in components or services that expose
>        required functionality instead of the underlying database
>        schema makes impact analysis much more straightforward and
>        reliable.  When SQL is embedded in applications across the
>        enterprise, it becomes virtually impossible to determine the
>        effect of any proposed change to the database schema.
If the schema change is nontrivial (because just adding a column
wouldn't break anything), then you make backward compatible views. Is
this kind of obvious?
     That statement does not contradict what I wrote.  In fact, it
demonstrates yet another reason why applications should be decoupled
from database schemas:  the schemas are not owned by the application.
> I just went to DMV where the clerk was still using character based
> entry screen. Now tell me, which of the following is likely to
> happen on the next step
> 
> 1. DMV would throw away the database of all vehiles/drivers/ etc in
> california and write a completely new application with new
> persistence store. (Persistence is just implementation detail, you
> know).
> 
> 2. They write GUI based application that will work with existing
> database. Both applications would coexist at some point.
     How about 3:  For a sufficiently important application, the
existing database schema will be modified.  Existing applications that
are not decoupled from the schema will be impacted.
> >      - The "validate employee ID" functionality is likely to be
> >        used in more than one application.  Encapsulating it as a
> >        reusable component or service reduces the complexity of each
> >        application and provides more consistency across the
> >        enterprise.
> 
> Aha. In case #2 of my new example tell me what components of the old
> application do you think would be reusable for the new application.
     I don't understand the point of your question.  Are you
suggesting that there is no potential for reuse of business
functionality across different applications?
> >      - Embedded SQL requires more than a simple SQL statement.
> >        The application must deal with database connections,
> >        transactions, cursors, etc.  Encapsulating this overhead and
> >        allowing it to be reused across multiple applications reduces
> >        the complexity of each application and the opportunity for
> >        bugs to be introduced.
> 
> If you have any difficulty opening database connection, or iterating
> a cursor, you may think refreshing your programming skills. Those
> are not the kind things that are worth abstracting.
     The application should focus on _why_ it needs particular
functionality and _when_ to invoke that functionality.  Mixing
database access details and schema-specific logic in with application
logic reduces maintainability.  Further, duplicating that logic in
multiple applications contributes further to maintenance problems and
introduces more opportunities for bugs.
> >      - Encapsulating SQL in components or services that expose
> >        required functionality instead of the underlying database
> >        schema makes impact analysis much more straightforward and
> >        reliable.  When SQL is embedded in applications across the
> >        enterprise, it becomes virtually impossible to determine the
> >        effect of any proposed change to the database schema.
> 
> If the schema change is nontrivial (because just adding a column
> wouldn't break anything), then you make backward compatible
> views. Is this kind of obvious?
     And for the next change?  And the next?  Are you suggesting that
adding an arbitrary number of new views with names like EMPLOYEE_V1,
EMPLOYEE_V2, EMPLOYEE_AS_OF_20051122, etc. is better practice than
encapsulating the business functionality required in one place?  What
about changes that are more complex than simple additions?  Those
views are going to start to affect performance.
Sincerely,
> Patrick May wrote:
> 
>>      - Embedded SQL requires more than a simple SQL statement.  The
>>        application must deal with database connections, transactions,
>>        cursors, etc.  Encapsulating this overhead and allowing it to
>>        be reused across multiple applications reduces the complexity
>>        of each application and the opportunity for bugs to be
>>        introduced.
> 
> If you have any difficulty opening database connection, or iterating a
> cursor, you may think refreshing your programming skills. Those are not
> the kind things that are worth abstracting.
Are applications built around DBs or vice versa?
Not only the cursor but DB as a whole is a subject of abstraction. The very
idea of separation of data (whether it be physically, logically or
descriptively on the language level) is not conform to the view on software
design principles people like me are counting for modern. It is a clash of
civilizations, if you want. Arguments don't work here much. (:-))
-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Component reuse is a fiction rather than reality
http://jdj.sys-con.com/read/152251.htm
It is difficult to justify the need in complex component in a trivial
world of
"Fetch the data from the database, massage it, display it on the
screen"
> > >      - Embedded SQL requires more than a simple SQL statement.
> > >        The application must deal with database connections,
> > >        transactions, cursors, etc.  Encapsulating this overhead and
> > >        allowing it to be reused across multiple applications reduces
> > >        the complexity of each application and the opportunity for
> > >        bugs to be introduced.
> >
> > If you have any difficulty opening database connection, or iterating
> > a cursor, you may think refreshing your programming skills. Those
> > are not the kind things that are worth abstracting.
I forgot transactions, that you mentioned. Are you aware that SQL
handles transactions as transparently as it possibly can? Are you
suggesting that you can raise application abstraction level to the
point when application programmer don't have to be aware of the
difference between optimistic and pessimistic locking?
>      The application should focus on _why_ it needs particular
> functionality and _when_ to invoke that functionality.  Mixing
> database access details and schema-specific logic in with application
> logic reduces maintainability.  Further, duplicating that logic in
> multiple applications contributes further to maintenance problems and
> introduces more opportunities for bugs.
What maintainability? If I change the schema, does it affect the
connection? Nope. If I add a column to the table, the application would
still be happily iterate the cursor, and get all the required fields
from the record.
> > >      - Encapsulating SQL in components or services that expose
> > >        required functionality instead of the underlying database
> > >        schema makes impact analysis much more straightforward and
> > >        reliable.  When SQL is embedded in applications across the
> > >        enterprise, it becomes virtually impossible to determine the
> > >        effect of any proposed change to the database schema.
> >
> > If the schema change is nontrivial (because just adding a column
> > wouldn't break anything), then you make backward compatible
> > views. Is this kind of obvious?
>
>      And for the next change?  And the next?  Are you suggesting that
> adding an arbitrary number of new views with names like EMPLOYEE_V1,
> EMPLOYEE_V2, EMPLOYEE_AS_OF_20051122, etc. is better practice than
> encapsulating the business functionality required in one place?  What
> about changes that are more complex than simple additions?  Those
> views are going to start to affect performance.
You are constantly repeating the point about changing the schema, and
yet give no example of such a change. Once again, most schema changes
don't affect the application. Columns are never deleted from existing
tables. New columns and table addition won't affect any application.
The only significant change that could break the existing application
is a pretty serios schema reorganization.
The naming you use is completely wrong. I don't know how you name your
classes and methods, but it is possible to name the tables in a
meaningful way. If you have EMPLOYEE_V1, and EMPLOYEE_V2, then it is
very likely that they don't differ in a few extra columns which doesn't
justify introducing a new name at all.
Essentially the debate reduces to how do we specify the interface
between application and the database. One thing that your community
fails to accept is that there is already a pretty good rotocol of
exchange between the DB and app -- SQL. Somehow they are distracted by
the fact that SQL wrapped some weeny auxiliary protocol, like JDBC,
which functioning requres few rather trivial concepts like connection
and cursor.  Remember the SQL is high abstraction language, as poweful
as Prolog, and much more sucessful than Prolog. It just can't be
compared to those puny object interfaces that java entrprise folks are
trying to invent.
Common, but stupid.
> Denormalization for performance is a change I've seen on more
> than one occasion.
Of you denormalize a set of tables for performance, I assume that the
original normalized tables are still there and are the master for the
data. Otherwise you have created yourself a hell. If you don't realize
this I can give you some examples. Because the original tables still
exists, you are just adding new tables, which does not break any
existing SQL statements. Normally denormalization are only done in data
mining databases. The production database are still there and the data
is replicated from the production database to the datawarehouse
database.
> If the new schema provides the same information
> in a different way, only the implementation of the component or
> service providing the business functionality need change.
Please give some example. Why would you change the schema if it still
provides the same information? It like rewriting your classes, but the
still do the same thing.
> DBAs may also change the schema to support other
> applications, for example by moving some columns from one table to a
> new table in order to change a 1:1 relationship to a 1:M.  Such
> changes are not uncommon in large integrated systems
Chaning a 1:1 relationship to a 1:M will not change your business
logic?? Please give some example.
>  A particular schema is an implementation detail.
Everything may be considered as an "implementation detail" in some
context. But in the context we are talking about, the database schema
is an interface.
> The same functionality can be supported by a variety of schemas, more than one
> of which can be in third normal form.
The same fact applies to your service interfaces. Two different
interfaces may provide the same functionality. But we still not need to
decouple our application from the service interface.
> I've provided a few examples of the
> types of schema changes that occur in real systems.
You have still not provided a single example.
> Changes to business rules are far from the only reason for schema changes.
Changes to business rules are by far the most common reason for schema
changes.
Fredrik Bertilsson
http://butler.sourceforge.net
Ahem? If you denormalize for performance, then you have to provide
backward compatible views. Besides, are you aware of materialized
views?
>      Denormalization for performance, as noted above, will definitely
> break existing SQL.  DBAs may also change the schema to support other
> applications, for example by moving some columns from one table to a
> new table in order to change a 1:1 relationship to a 1:M.  Such
> changes are not uncommon in large integrated systems.
If somebody uses denormalization for performance too often, I would
serioulsly chllenge his SQL tuning abilities.
> > > Because SQL embedded in one application is not reusable in
> > > another.
> >
> > If they are using the same database schema, it is obviously
> > reusable.
>
>      Not unless you consider copy-and-paste to be reuse.  If you are
> suggesting instead that one application expose the functionality that
> is implemented internally as embedded SQL then we seem to be agreeing
> that the embedded SQL should be encapsulated and exposed as a service.
Yes copy-and-paste is a reuse. The second code instance could be easily
tuned to seve a slightly different pupose. Oh, modifying the code is
horrible, I forgot. You have to subclass and redefine the method.
BTW, leveraging common SQL quieries via database views is also reuse,
which you seems prefer to totally ignore.
>      A particular schema is an implementation detail.  The same
> functionality can be supported by a variety of schemas, more than one
> of which can be in third normal form.  Applications that depend on
> functionality rather than schemas are more resilient to change than
> those that are coupled to the implementation represented by one
> particular schema.
If the database schema is "an implementation detail", then the
conversation is over. Any good book on fundamentals of data managenet
could help.
     That view is . . . exceedingly uncommon in my experience.  Large,
distributed, enterprise systems must share information among thier
components.  A significant portion of the effort expended in a typical
EAI project deals with enabling the sharing of data.  Are you really
asserting that a database instance should have only one client?
> > Denormalization for performance is a change I've seen on more than
> > one occasion.
> 
> Of you denormalize a set of tables for performance, I assume that
> the original normalized tables are still there and are the master
> for the data.
     Not necessarily.  Certainly not in the situations where I've seen
it done.
> Otherwise you have created yourself a hell.
     The tradeoff was a significant performance improvement in
exchange for more complexity in maintaining referential integrity.  I
wasn't the one who made the call and I may even have chosen
differently had it been my responsibility, but the decision was
logically supportable.
> > If the new schema provides the same information in a different
> > way, only the implementation of the component or service providing
> > the business functionality need change.
>
> Please give some example. Why would you change the schema if it
> still provides the same information? It like rewriting your classes,
> but the still do the same thing.
     That's called refactoring and it is done to improve the
maintainability and extensibility of the code.  The same can apply to
database schemas.  The most recent time I've seen it done was to
better support a planned system that could benefit from inheritance
being modeled differently.
> > DBAs may also change the schema to support other applications, for
> > example by moving some columns from one table to a new table in
> > order to change a 1:1 relationship to a 1:M.  Such changes are not
> > uncommon in large integrated systems
>
> Chaning a 1:1 relationship to a 1:M will not change your business
> logic?? Please give some example.
     Consider something as simple as allowing a customer to have more
than one phone number.  If the existing applications don't need more
than one, having them decoupled from the schema means that the change
can be made in exactly one location.  Even if the existing
applications must be extended to include support for multiple phone
numbers, having them decoupled from the schema allows them to be
migrated incrementally rather than in one high risk, resource
intensive big bang.
> >  A particular schema is an implementation detail.
>
> Everything may be considered as an "implementation detail" in some
> context. But in the context we are talking about, the database
> schema is an interface.
     We may be differing on definition.  What do you mean by
"interface"?  The schema and SQL are a means for accessing the data,
subject to some integrity rules enforced by the DBMS.  Interpretation
of that data is up to the application.  An interface, in this context,
would provide additional semantic abstraction.
> > The same functionality can be supported by a variety of schemas,
> > more than one of which can be in third normal form.
>
> The same fact applies to your service interfaces. Two different
> interfaces may provide the same functionality. But we still not need
> to decouple our application from the service interface.
     Actually, I spend a lot of my time doing just that.  Most of the
systems I've worked on in the past three years are based on
service-oriented architectures (SOAs).  Jini, for example, decouples
clients and services, and allows services to be accessed through
multiple interfaces, if appropriate.  It makes for very scalable,
resilient, and extensible systems.
> > I've provided a few examples of the types of schema changes that
> > occur in real systems.
>
> You have still not provided a single example.
     That claim is more than a little disingenuous, considering you
just replied to two of my examples.
> > Changes to business rules are far from the only reason for schema
> > changes.
>
> Changes to business rules are by far the most common reason for
> schema changes.
     That may be the case in your experience, but large enterprise
databases are subject to a wide variety of pressures.  In any case,
your assertion does not contradict mine.
     As noted in another part of this thread, you'll have a problem
with the proliferation of views over time.
> > > > Because SQL embedded in one application is not reusable in
> > > > another.
> > >
> > > If they are using the same database schema, it is obviously
> > > reusable.
> >
> >      Not unless you consider copy-and-paste to be reuse.  If you
> > are suggesting instead that one application expose the
> > functionality that is implemented internally as embedded SQL then
> > we seem to be agreeing that the embedded SQL should be
> > encapsulated and exposed as a service.
> 
> Yes copy-and-paste is a reuse.
     That is not compatible with the usual definitions of "reuse" with
which I am familiar.  The problems related to copy-and-paste are
well-known to seasoned software developers.  Suggesting it as a good
practice flies in the face of a great deal of experience.
> BTW, leveraging common SQL quieries via database views is also
> reuse, which you seems prefer to totally ignore.
     I have shown no preference.  I am well aware of the capabilities
of views and stored procedures in this area.  The issue I'm
discussing, though, is the problems that arise from coupling an
application to a database schema.
> >      A particular schema is an implementation detail.  The same
> > functionality can be supported by a variety of schemas, more than
> > one of which can be in third normal form.  Applications that
> > depend on functionality rather than schemas are more resilient to
> > change than those that are coupled to the implementation
> > represented by one particular schema.
> 
> If the database schema is "an implementation detail", then the
> conversation is over. Any good book on fundamentals of data
> managenet could help.
     Alternatively, you could refrain from your implied insults and
explain why you disagree that the particulars of a schema are an
implementation detail.  I think I've made it clear why I hold that
view.
The rule is one database <-> one application. I don't know your
definition of "client", but an application may have many windows,
frames, etc.
> The tradeoff was a significant performance improvement in
> exchange for more complexity in maintaining referential integrity.
Denormalizations are normally done by people who don't know how to join
tables. They claim that this action give them "significant performance
improvement", without providing any proof. Denomalization without
keeping the original tables as masters, is very bad software
engineering.
> That's called refactoring and it is done to improve the
> maintainability and extensibility of the code.
Maybe you want to refactor the interface to your persistence layer
too...
> Consider something as simple as allowing a customer to have more
> than one phone number.
I assume that you originally have a table like this.
customer(customerid, ....., telephoneno)
If I need extra telephone numbers for a customer I would create a new
table.
customer_extra_telephone(customerid, telephoneno)
But I would still keep the telephoneno in the customer table, becuase
you still need a "main" telephoneno. In this case we have not broken
any existing SQL statements, only added a new table.
> What do you mean by "interface"?
An interface is something that let you access a service without having
to know how the service is implemented. A database schema is a very
good example of this, because when you are making a SQL call, you can
be accessing anything from a flat file driver, to big DBMS monster like
DB/2.
> Actually, I spend a lot of my time doing just that.
And are you happy about it? Wasting your time with decoupling
applications.
>> You have still not provided a single example.
> That claim is more than a little disingenuous, considering you just replied to two of my > examples.
When I ask you for an example of a schema change I expect to get: (1)
schema structure before the change and (2) schema structure after the
change. You have not provided any schema structure examples whatsoever.
Fredrik Bertilsson
http://butler.sourceforge.net
     I have never seen this "rule" used in any but either very simple
applications or when a database is embedded in a commercial
application and explicitly not intended to be exposed.  Shared
databases are the norm in enterprise systems.
> > The tradeoff was a significant performance improvement in exchange
> > for more complexity in maintaining referential integrity.
> 
> Denormalizations are normally done by people who don't know how to
> join tables.
     Not in this case.  The decision was made by experienced DBAs at a
major European bank.
> They claim that this action give them "significant performance
> improvement", without providing any proof.
     They carefully analyzed the situation, identified the potential
improvements and risks, considered the costs and benefits, and made a
rationally defensible decision.  Your assertion is baseless.
> Denomalization without keeping the original tables as masters, is
> very bad software engineering.
     Clearly these very experienced DBAs thought otherwise, and were
able to justify their decisions.  Your absolute rule is not
universally recognized, it would seem.
> > That's called refactoring and it is done to improve the
> > maintainability and extensibility of the code.
>
> Maybe you want to refactor the interface to your persistence layer
> too...
That's exactly my point. What is yours?
> > Consider something as simple as allowing a customer to have more
> > than one phone number.
>
> I assume that you originally have a table like this.
> customer(customerid, ....., telephoneno) If I need extra telephone
> numbers for a customer I would create a new table.
> customer_extra_telephone(customerid, telephoneno) But I would still
> keep the telephoneno in the customer table, becuase you still need a
> "main" telephoneno. In this case we have not broken any existing SQL
> statements, only added a new table.
     This is an excellent example of why applications should be
decoupled from database schemas.  You've got the tail wagging the dog
here by littering your schema with vestiges of previous versions
solely to avoid impacting applications that use it.  Your proposed
approach will accumulate cruft continuously over the lifetime of the
database.
> > What do you mean by "interface"?
>
> An interface is something that let you access a service without
> having to know how the service is implemented.
Fair enough, we can work with this definition.
> A database schema is a very good example of this, because when you
> are making a SQL call, you can be accessing anything from a flat
> file driver, to big DBMS monster like DB/2.
     You've dropped to too low a level.  A database schema is not a
particularly good example of an interface because it exposes
unnecessary details (table names, column names, etc.) to its clients.
Changes to the schema that do not affect the supported functionality
can nonetheless impact the clients.  Further, each client must
interpret the data exposed by the schema in order to provide the
business functionality of interest.  Decoupling the application from
the schema allows the functionality to be provided explicitly, in a
single, simpler to maintain artifact.
> > Actually, I spend a lot of my time doing just that.
>
> And are you happy about it? Wasting your time with decoupling
> applications.
     We've been building some of the most flexible, scalable,
resilient, and extensible large distributed enterprise systems I've
seen.  Our customers certainly don't see it as a waste of time.
Perhaps you should get a better understanding of SOA before
criticizing it.
> > > You have still not provided a single example.
>
> > That claim is more than a little disingenuous, considering you
> > just replied to two of my > examples.
>
> When I ask you for an example of a schema change I expect to get:
> (1) schema structure before the change and (2) schema structure
> after the change. You have not provided any schema structure
> examples whatsoever.
     First of all, the phone number example meets your criteria, as
does the change made to better support inheritance, if you have any
experience with O-R mapping tools at all.  More importantly, this
discussion is about the benefits of decoupling applications from
database schemas.  Going into detail about particular schemas is
likely to result in a loss of focus on the larger issues.  Anyone who
has developed enterprise systems will have plenty of personal
experience to draw on with regard to schema changes and the associated
ripple effects.
> Dmitry A. Kazakov wrote:
>> Are applications built around DBs or vice versa?
>>
>> Not only the cursor but DB as a whole is a subject of abstraction. The very
>> idea of separation of data (whether it be physically, logically or
>> descriptively on the language level) is not conform to the view on software
>> design principles people like me are counting for modern. It is a clash of
>> civilizations, if you want. Arguments don't work here much. (:-))
> 
> Essentially the debate reduces to how do we specify the interface
> between application and the database.
That's a DB-oriented point of view. In my world decomposition just does not
go along this line.
> One thing that your community
> fails to accept is that there is already a pretty good rotocol of
> exchange between the DB and app -- SQL.
Harness and spurs for steering cars...
> Somehow they are distracted by
> the fact that SQL wrapped some weeny auxiliary protocol, like JDBC,
> which functioning requres few rather trivial concepts like connection
> and cursor.  Remember the SQL is high abstraction language, as poweful
> as Prolog, and much more sucessful than Prolog.
Ah, it is so easy to be more successful than Prolog! (:-)) Just in case you
have missed our lengthy debate with H.S. Lahman, I don't buy 4-5GLs, so
this argument cannot work on me.
> The decision was made by experienced DBAs at a major European bank.
What makes it a good decision, because the DBA is experienced, it was
made by a DBA, the bank is major, the bank is European or it was made
at a bank??
I am a "experienced software engineer working for a major European
software vendor" and I have never even heard anyone suggesting to
denormalize a schema, except from datawarehouse solutions. Nobody wants
to denormalize a production database.
>> Maybe you want to refactor the interface to your persistence layer
>> too...
> That's exactly my point.  What is yours?
The purpose with interfaces is to give the user of the interfaces a
stable way to access the provided features. How the features are
provided is hidden behind the interface and may change regulary. But
changing the interface should only be done when absolutely necessary.
My point is: Be careful when you design an interface, and don't break
it unless you really have to. Create a new interface instead.
> You've got the tail wagging the dog
> here by littering your schema with vestiges of previous versions
> solely to avoid impacting applications that use it.  Your proposed
> approach will accumulate cruft continuously over the lifetime of the
> database.
The same will happen with your persistence layer. It will be a real
mess (been there myself) impossible to maintain.
How would your schema change look like? I can't see it being very
different. You also claimed that you were changing a 1:1 relation to
1:M. But this showed not to be true, because you still had a telephone
number that is the "main" number. So what you did, was to add a 1:M
relation, keeping the original 1:1 relation. That type of changes does
not break any existing code.
> We've been building some of the most flexible, scalable,
> resilient, and extensible large distributed enterprise systems I've
> seen.
Just bragging without giving any possibility for external verification.
> Our customers certainly don't see it as a waste of time.
Most customers have no idea about the concept of decoupling.  You can
create flexible, scalable, resilient, and extensible large distributed
enterprise systems with embedded SQL, too.
Fredrik Bertilsson
http://butler.sourceforge.net
     You snipped the portion of my reply where I noted an exception
for products that embed a database that is explicitly not intended to
be exposed.
> Every enterprise application vendor (including my own company) are
> very restrictive with letting other applications accessing their
> databases.
Which is why I noted that common exception.
> The solutions is almost always to create export/import utilities or
> creating services that the other applications might access.
     That is indeed the most common approach used in EAI efforts when
dealing with proprietary applications.  In-house, bespoke systems are
a different matter.
> > The decision was made by experienced DBAs at a major European
> > bank.
>
> What makes it a good decision, because the DBA is experienced, it
> was made by a DBA, the bank is major, the bank is European or it was
> made at a bank??
     You snipped the portion of my reply where I described the process
used to make the decision.  The fact that it was well analyzed and
rationally defensible is what makes it an acceptable decision.
> I am a "experienced software engineer working for a major European
> software vendor" and I have never even heard anyone suggesting to
> denormalize a schema, except from datawarehouse solutions. Nobody
> wants to denormalize a production database.
     No one wants to do so simply for the sake of denormalizing.  As I
described, the decision was made based on an understanding of the
pressures on their enterprise database and an explicit cost benefit
analysis.
> > > Maybe you want to refactor the interface to your persistence layer
> > > too...
> >
> > That's exactly my point.  What is yours?
> 
> The purpose with interfaces is to give the user of the interfaces a
> stable way to access the provided features. How the features are
> provided is hidden behind the interface and may change regulary. But
> changing the interface should only be done when absolutely
> necessary.  My point is: Be careful when you design an interface,
> and don't break it unless you really have to. Create a new interface
> instead.
     I agree.  This is exactly why database schemas make poor
interfaces.  They expose too much of the implementation, so that
changes to the implementation can impact clients of the schema even if
no functional change is required by that client.  Coupling an
application or component of a system to a database schema is generally
not a best practice.
> > You've got the tail wagging the dog here by littering your schema
> > with vestiges of previous versions solely to avoid impacting
> > applications that use it.  Your proposed approach will accumulate
> > cruft continuously over the lifetime of the database.
>
> The same will happen with your persistence layer. It will be a real
> mess (been there myself) impossible to maintain.
     If the interfaces presented to the clients provide business
functionality and semantics, the clients are insulated from more
changes than if the clients are coupled to the database schema.
> > We've been building some of the most flexible, scalable,
> > resilient, and extensible large distributed enterprise systems
> > I've seen.
>
> Just bragging without giving any possibility for external
> verification.
     I'm just describing personal experience.  Look into technologies
like Jini and other SOA approaches for yourself if you don't believe
me.
> > Our customers certainly don't see it as a waste of time.
>
> Most customers have no idea about the concept of decoupling.  You
> can create flexible, scalable, resilient, and extensible large
> distributed enterprise systems with embedded SQL, too.
     My basic point in this thread is that it is far easier to create
systems with these properties, resiliency and extensibility in
particular, if care is taken not to couple components of the system to
database schemas.
Or between application and data
>
> That's a DB-oriented point of view. In my world decomposition just does not
> go along this line.
>
Or rather, a monad point of view...
> > One thing that your community
> > fails to accept is that there is already a pretty good rotocol of
> > exchange between the DB and app -- SQL.
>
> Harness and spurs for steering cars...
A standard way of querying collections, with coherent algebraic
properties, is a good thing.  That's a level above, two levels above,
Robert Martin's simple data structures.
>
> > Somehow they are distracted by
> > the fact that SQL wrapped some weeny auxiliary protocol, like JDBC,
> > which functioning requres few rather trivial concepts like connection
> > and cursor.  Remember the SQL is high abstraction language, as poweful
> > as Prolog, and much more sucessful than Prolog.
>
> Ah, it is so easy to be more successful than Prolog! (:-)) Just in case you
> have missed our lengthy debate with H.S. Lahman, I don't buy 4-5GLs, so
> this argument cannot work on me.
>
Oh, I think you won that debate.  The enterprise world doesn't care
about executable UML.  Some architectural groups care, but nobody cares
about them.
But the world, the enterprise world, is moving rapidly towards a data
centric view.  Distributed objects are out, web services are in, query
languages are in, vertical standardization on schemas is in.  I know,
this isn't your world, but it's bigger than your world.
-- Daniel Parker
www.servingxml.com
> A standard way of querying collections, with coherent algebraic
> properties, is a good thing.  That's a level above, two levels above,
> Robert Martin's simple data structures.
Good point. I would be ready to embrace this if SQL really described these
algebraic properties in an abstract universal way, rather than prescribed
some. My feeling is that it is below Martin's structures. Accusations that
OO people are reinventing the wheel are all true, they do, but they do it
on a higher abstraction level.
> But the world, the enterprise world, is moving rapidly towards a data
> centric view.  Distributed objects are out, web services are in, query
> languages are in, vertical standardization on schemas is in.  I know,
> this isn't your world, but it's bigger than your world.
Maybe you are right, but the question is whether it will be a long term
trend. The future world will be massively distributed. Swarm computing,
nano-computing are the buzz words to come. I don't think that data centric
view will prevail then.
That's a bad solution. My company has customers. We have many applications - 
and most of them them need access to the customer table. Exports/imports are 
fine for data warehousing, but in OLTP, that just doesn't work.
>
>I am a "experienced software engineer working for a major European
>software vendor" and I have never even heard anyone suggesting to
>denormalize a schema, except from datawarehouse solutions. Nobody wants
>to denormalize a production database.
Nobody should want to denormalize a database. But sometimes performance 
dictates denormalization.
I had a contract position with a major telco. One of the application had 
terrible performance. There was a query that joined about 25 tables. Duh! I 
denormalized the database and ended up with a join of ... maybe 5 to 7 tables. 
The performance was quite acceptable. (Turns out my manager designed the 
database but that's another story. Let's just say I wasn't there long ;-)
>Most customers have no idea about the concept of decoupling.  You can
>create flexible, scalable, resilient, and extensible large distributed
>enterprise systems with embedded SQL, too.
Sure. But you can't reuse any of the data access code. How many times do we 
need to write select .... from customer - when a simple getCustomer method can 
be called?
Eric
The customer table should be replicated to the applications that need
to read the customer table. If you let every application access the one
and only customer table directly, you will create a big monolitic
enterprise application.
> One of the application had terrible performance. There was a
> query that joined about 25 tables. Duh! I denormalized the
> database and ended up with a join of ... maybe 5 to 7 tables.
The question is not about denormalization or not, the question is about
denormalize and remove the original normalized tables. The best
solution is to keep the normalized tables as master tables which is
used for updates. The data in the master tables should be replicated to
the denormalized slave tables, which can be used by functions that has
performance problems. Doing this, you will not break any existing SQL
statements.
> How many times do we need to write select .... from customer - > when a simple getCustomer method can be called?
Of course you should not write identical SQL statements twice in your
code. If you want to call the same SQL statements from multiple
locations in your application, you just write a function/method that
can reused.
But this is not the same as decoupling. If you are decoupling SQL
statements from the rest of your application, you are forcing every SQL
statement in a separate layer regardsless if it is likely to be reused
or not.
If you are decoupling your application, you will end up with about 20
different getCustomerByXXX-methods. Different joins and different
where-criterias will give you many many combinations.
Using decoupling will in practice force programmers to use predefined
getCustomerByXXX-methods that are not suited for their current needs,
because creating a new getCustomerByXXX method takes too much work.
Instread they will do post-filtering and extra loading of related
tables, after the call to getCustomerByXXX.
Fredrik Bertilsson
http://butler.sourceforge.net
     Replication introduces a number of issues (how often, how much
data, what mechanism, recovery, etc.) and is not always the best
option.  If multiple components of the system need to update the
database rather than just reading it, the problems are multiplied.
> If you are decoupling your application, you will end up with about
> 20 different getCustomerByXXX-methods. Different joins and different
> where-criterias will give you many many combinations.
     This may be the case for CRUD and reporting applications, in
which case direct access to the database is probably justified.  In
more complex systems, the data is being used to support some behaviors
of interest to the organization, so the applications will be better
served by providing interfaces that reflect those behaviors.
> Using decoupling will in practice force programmers to use
> predefined getCustomerByXXX-methods that are not suited for their
> current needs, because creating a new getCustomerByXXX method takes
> too much work.  Instread they will do post-filtering and extra
> loading of related tables, after the call to getCustomerByXXX.
     Again, this is only the case for CRUD and reporting
applications.  Large, distributed enterprise systems are moving
towards service-oriented architectures in order to focus more on the
behaviors of interest.  Instead of calling a lot of get methods on
what are essentially simple data structures, clients invoke behaviors
on richer interfaces.  A proliferation of get*By methods such as you
describe is an indication that some refactoring is in order.
>
>Robert C. Martin wrote:
>> On 9 Nov 2005 21:59:18 -0800, "topmind" <top...@technologist.com>
>> wrote:
>>
>> >SQL in code is not evil[1].
>>
>> Agreed.  SQL is not evil.  But combining it with business rules is
>> inadvisable.
>
>In a good design, most of the code should be business rules. If it is
>GUI management, for example, then you are not using the appropriate
>libraries/tools/languages. You are not coding to the domain but wasting
>code on technlogy issues or the like rather than spending code on
>solving the problem at hand.
A huge amount of effort goes into mapping business data into
presentation formats.  This mapping is not a business rule, per se;
but is still complicated and important code that needs to be written.
It should be separate from the business rules.
Similarly, there is often a large amount of effort that goes into
mapping the business data into a persistent repository (i.e. a
database) that is accessible and usable by many different
applications.  Once again this mapping is not a business rule, per se;
but is still complicated and important code that needs to be written.
It should be separate from the business rules.
-----
Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
Object Mentor Inc.            | blog:  www.butunclebob.com
The Agile Transition Experts  | web:   www.objectmentor.com
800-338-6716   
"The aim of science is not to open the door to infinite wisdom, 
 but to set a limit to infinite error."
    -- Bertolt Brecht, Life of Galileo
Fine by me.
Many places seek to decouple things further at the database end by
wrapping all SQL in stored procedures.
Which brings up another issue.  Would your code snippet be from the
client tier or the middle tier?  Judgements might be different
depending.
J.
The number of joins is not a problem. Siebel queries, for example are
often join more than 30 tables. They execute in milliseconds. I've seen
45 table join in Oracle Apps. Complex query doesn't have to be slow. It
is the optimization of complex query that is challenging.
I wrote many times that those who use stored procedures as an interface
to the database are morons who are clueless about data fundamentals.
Tell me a single reason why wrap
insert into T (a,b,c,d,e) values (:1, :2, :3, :4, :5);
into a stored procedure
function insert_into_T (
    a integer,
    b integer,
    c integer,
    d integer,
    e integer
)
Worse yet, they don't stop at data modification. They expand their
silly interface idea to queries!
> Which brings up another issue.  Would your code snippet be from the
> client tier or the middle tier?  Judgements might be different
> depending.
I really don't see any difference.
Yes, but that that doesn't mean that it's true :-)
> Tell me a single reason why wrap
>
> insert into T (a,b,c,d,e) values (:1, :2, :3, :4, :5);
>
> into a stored procedure
>
> function insert_into_T (
>     a integer,
>     b integer,
>     c integer,
>     d integer,
>     e integer
> )
>
Several reasons:
First, it allows you to drastically reduce the amount of code in your
application.  If you have an object A that exposes property names that
match the parameter names of an update stored procedure, the entire
update operation can be performed with a single method call using pull,
along the lines of
dbAccessor.update(A,storedProcName);
Second, it supports separation of concerns.  An update operation may
involve updates to several tables, "if exists then update else insert"
logic, checks on other tables, etc. and this logic can be conveniently
collected in the stored procedure.
> Worse yet, they don't stop at data modification. They expand their
> silly interface idea to queries!
>
Not so silly, but the problem here is that the ANSII standard dosesn't
support returning arbitrary numbers of rows through SELECT statements.
SYBASE does support that, and with SYBASE you have a lot of room to
play with how the query is implemented without affecting the client
application, assuming, of course, some simple database accessor layer
in the middle.  For example, you can implement the stored procedure
with one select statement, then change to two select statements, then
add some new attributes, change an attribute name in a table but alias
it to the old name, use some secret Sybase techniques to force an index
if the query takes forever - you can do all of these things without
having to recompile your C, C++ or Java client app.  But ANSII and
Oracle don't support this, and SYBASE has almost disappeared off the
database landscape, so I don't do this anymore.
I rarely use stored procedures anymore, partly because of the lack of
sensible standardization, partly because developers are sometimes
locked out of databases by DBS's, partly because it is less necessary
now to use secret optimization hooks.  But there are benefits.
Regards,
Daniel Parker
You don't get the point. There does not need to be a 1 to 1 correspondence 
between a database and an application. RDB is all about applications using a 
common repository - this is what is known as enterprise data. If you are 
replicating the customer database, you do not have current data. Things like 
new customers, address changes, bad customers (fraud), need to be available ot 
all OLTP apps.
>The question is not about denormalization or not, the question is about
>denormalize and remove the original normalized tables. The best
>solution is to keep the normalized tables as master tables which is
>used for updates. The data in the master tables should be replicated to
>the denormalized slave tables, which can be used by functions that has
>performance problems. Doing this, you will not break any existing SQL
>statements.
I see your point, though I probably wouldn't want that much redundancy.
>> How many times do we need to write select .... from customer - > when a
> simple getCustomer method can be called?
>
>Of course you should not write identical SQL statements twice in your
>code. If you want to call the same SQL statements from multiple
>locations in your application, you just write a function/method that
>can reused.
>
>But this is not the same as decoupling. If you are decoupling SQL
>statements from the rest of your application, you are forcing every SQL
>statement in a separate layer regardsless if it is likely to be reused
>or not.
True.
>If you are decoupling your application, you will end up with about 20
>different getCustomerByXXX-methods. Different joins and different
>where-criterias will give you many many combinations.
Also true, though one may be clever and reduce the number of methods.
>Using decoupling will in practice force programmers to use predefined
>getCustomerByXXX-methods that are not suited for their current needs,
>because creating a new getCustomerByXXX method takes too much work.
>Instread they will do post-filtering and extra loading of related
>tables, after the call to getCustomerByXXX.
That's an organizational issue.
If you have inline SQL, you probably have developers rewriting SQL that has 
been written elsewhere, but who can find it? Decoupling will insulate the rest 
of the application to some extent where there are changes to the DB schema. It 
also makes it a lot easier to change the DBMS. One of my former customers went 
from Sybase to Informix to Oracle in 3 years. JDBC helps a lot but there are 
real differences between the vendor's SQL.
Eric
The front end was a web app and the SQL was dynamic. That certainly slowed it 
down. But there was not a lot of data - I think the largest table had less 
than 500 rows. But it's possible the query had problems - it's real easy to 
get a catersian product with a lot of tables. 
Eric 
>> Speed of test execution - the faster they run, the more likely we run
>> them, the more likely we write them, the more code coverage happens.
>
>When a test starts, I assume you in some way have to load the test data
>from disk into memory. The only difference is that the file system is
>touched at test startup. So I don't think there are very much speed to
>gain. And I am not sure that speed of test execution is so important
>that it should force us to do decoupling in the application. If
>decoupling should be done, there must be more significant arguments but
>speed of test execution.
1. Don't underestimate the power of fast tests.  When tests are fast,
people will run them.  When they are slow, they won't.  When people
run the tests frequently, fewer errors creep into the system.
Developer feel free to make changes and improve the code.  The sun
shines, and life is beautiful.
2. Forcing the tests to run decoupled from the database has the
secondary effect of forcing you to decouple your application from the
database.  This is just good design.  However, when coupled with the
unit tests, it is good design that is tested and enforced.
>> Test data is  all in memory - not stored in any file format.
>
>I assume the test data has to survive a system restart, so in some way
>they need to be in some file format.
Typicality in FitNesse pages, or Unit test constants.  Sometimes it
will be in some XML test-configuration file, or some binary image file
that gets loaded by the tests.
One of the goals of testing is to keep your test data under control at
all times.  One of the keys to doing this is to keep the test data as
small and focussed as possible.  Therefore, instead of having one
large test database that you run all your tests on, you have many tiny
little sets of test data.
>Robert C. Martin wrote:
>> On 9 Nov 2005 17:35:50 -0800, "Hongyu" <for...@hongyu.org> wrote:
>>
>> >To my knowledge, in OOP design it's not good to mix SQL queries with
>> >application codes in classes, because it limits the usability of the
>> >classes and bound them to the local database schema.
>>
>> Indeed, it is not good to mix SQL with business rules in *any*
>> paradigm.  You'd like your business rules to know nothing of SQL.
>> You'd also like your business rules to know nothing of the relational
>> schema.  The business rules want to use a convenient set of data
>> structures (or objects).
>
>Cool. When I forget my badge and show
>up at the front door and give them my employee ID, it is sure nice not
>to have an
>EMP table and query it directly,  but send a message to each and every
>DEPT
>instance to ask "does emp 1234 work for you?"
Which is, of course, NOT what you would do.  A good OO program does
not ignore the fact that a database exists.  What it tries to ignore
are the details of the interface; in particular the schema, the API,
and the query language (SQL).  
In the case you cited, the program would send a message to a query
object asking for employee# 1234.  That query object would then
generate the appropriate SQL, and use the database API appropriately.
>This funny interface separation idea is one of the reasons why data
>mangement folks call OOP crowd incompetent.
Which is an indication of their own ignorance of OO.  OO is not ANTI
database.  OO design is about decoupling, not about de-USING.  OO
designers don't avoid using a database, but they do avoid letting the
knowledge of the details permeate the business rules.
>boolean isValidEmployee( int empNo ) {
>    //wrap around "select count(*) from emp where emp# = :empNo"
>}
>
>OK, boolean more realistically is a record with picture, first and last
>names fields, but I digress. Tell me a single reason why this SQL query
>shouldn't be embedded in this function.
I think the SQL *should* be embedded in that function.  The question
is, where is that function?
If that function is in a class that looks something like this:
public class EmployeeGatewayImplementation extends EmployeeGateway {
  public boolean isValidEmployee( int empNo ) {...}
  ...
}
And if there exists an interface that looks like this:
public interface EmployeeGateway extends {
  public boolean isValidEmployee( int empNo );
  ...
}  
Then the application is decoupled from the SQL simply by using the
gateway:
EmployeeGateway eg = ...;
if (eg.isValidEmployee(1234)) {...}
Now I can replace the Gateway with any kind of implementation I like.
If I want my tests to use in in-RAM database, I can.  If I want to
swap out Oracle for Sybase, I can.  If I want to save all the
employees in flat files, I can.  
However, if I am less disciplined about where I put that function.  If
I allow the application code to build it's own SQL.  Then none of
these options are open to me.  I will have bound my application to the
API and query language of a particular database vendor and paradigm.
And that would be foolish.
No, it's not. When there is a lot of tables, optimizer search space is
huge. It can't simply afford to investigate all join order
permutations, in general, and join orders with cartesian products, in
particular. Db implementations differ, so I can't speak for all of
them, of course. The big three would give you some minimal quality
optimiser, at least.
Merge operation? (Often called upsert).
As for complex insert operation, why there might be more than one place
in the program where I store the shopping cart?  OK, suppose there is,
then the arguments to the storeCart stored procedure have to be rather
complex -- array of items.
The point of creating a function is to abstract some nontrivial
behaviour in it. Inserting a master record and a set of child records
is simple. For me it would take the same amount of time learning how to
invoke the function properly, versus calling the DML statements
directly.  I have to be familiar with database schema, of course. The
idea that database application programminng can be schema agnostic is
nonsence.
> Not so silly, but the problem here is that the ANSII standard dosesn't
> support returning arbitrary numbers of rows through SELECT statements.
> SYBASE does support that, and with SYBASE you have a lot of room to
> play with how the query is implemented without affecting the client
> application, assuming, of course, some simple database accessor layer
> in the middle.  For example, you can implement the stored procedure
> with one select statement, then change to two select statements, then
> add some new attributes, change an attribute name in a table but alias
> it to the old name, use some secret Sybase techniques to force an index
> if the query takes forever - you can do all of these things without
> having to recompile your C, C++ or Java client app.  But ANSII and
> Oracle don't support this, and SYBASE has almost disappeared off the
> database landscape, so I don't do this anymore.
If that feature mattered, the big three would give no second thought
implementing it. There is not a lot of new development in the database
landscape happening nowadays.
> There is not a lot of new development in the database
> landscape happening nowadays.
>
I think there is. Have a look at this.
Regards,
Daniel Parker
     While I tend not to use stored procedures because of the lack of
standardization across RDBMS implementations and the potential for
blurring the layers of abstraction in a system, if I were to use them
it would not be in this manner.  One of the purposes of decoupling
applications from the database schema is to provide a more appropriate
level of abstraction.  Excepting, as usual, simple CRUD and reporting,
applications are not interested in inserting data into a table.  A
more appropriate abstraction, whether implemented as a stored
procedure, C function, Java method, or Lisp form would be along the
lines of "create new customer" or "add item to shopping cart".
     Decoupling the application from the database schema supports the
use of more appropriate interfaces which leads to increased
maintainability and extensibility.  Applications should be focused on
_what_ to do rather than the mechanics of _how_ to do it at the
database level.
The dividing line between such is rather blurry in practice.
For example, we may start out with:
  if threshold greaterThan 100 then display_as_red
  else if threshold greaterThan 50 then display_as_yellow
  else display_as_white
This is as much a "business rule" as any.  But it is also a display
issue.
Now suppose that different managers want different threashold colors
and ranges and want to self-tweak them. At such a point we may
"meta-tize" the logic by having a table such as:
  table: thresholdColors
  ----------------------
  mgrRef  (manager reference ID)
  amount  (upper limit range, double or float)
  color      (web color format rrggbb)
This is more or less the same thing as before, but represented
declaratively instead.
Back to the display issue. If this was recorded in a log instead of
just displayed, would that change its classification in your mind? In
otherwords, the dichotomy between "display" and "process" is flimsey in
the biz world because whether something is display data or "kept" data
(processing) is only a small difference. There have been many times
where someone will ask, "Can we keep these report results every week
instead of just display them so that we can study trends and changes?"
One may suggest that one can simply use the raw data instead to
regenerate as needed, but what if they want to see the threasholds as
set in the past rather than see past data with current threadholds?
(The peskiest managers will want both views.) This is fairly common
with rate information: they may want to see old data with the old
rates, not the current rates. Either we store historical rates, or we
just store a snap-shot of the calculations at a given point.
>
> Similarly, there is often a large amount of effort that goes into
> mapping the business data into a persistent repository (i.e. a
> database) that is accessible and usable by many different
> applications.  Once again this mapping is not a business rule, per se;
> but is still complicated and important code that needs to be written.
> It should be separate from the business rules.
>
>
I view a RDBMS as a powerful processing tool, not JUST "storage". One
can often convert a lot of code into a declarative format, somewhat
like above. In fact, one can convert *most* of the code into such if
you really wanted to. However, the practicality of the extreme
declarative approach is questionable (at least with today's tools).
In summary, I don't think clear lines can be drawn. (Unless perhaps
your methodology dictates up-front what to declaratize and what not
to.)
>
>
> -----
> Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
-T-
Hm, this sounds like just a display issue, unless your business is in
traffic lights.
Business rules alter the state of a (business) object. Everything else
isn't a business rule - but a display/reporting/postprocessing issue.
A rule like
if threshold greaterThan 100 then state=alert
else if threshold greaterThan 50 then state=warning
else state=ok
is a business rule. Then everyone can display the state with the colors
they want.
>Now suppose that different managers want different threshold colors
>and ranges and want to self-tweak them.
You first have to know how your business looks like, then you decide
what's a business rule is about.
If you are *not* doing traffic lights,  and you are in a business where
the threshold can have three states only, the managers have no..
business :) trying to put ranges in - *unless they cosciently decide to
alter the business*. So long they deal with the three states, they of
of course can change the display colors as it suits them.
The reason for this is that changing the business rules has usually
vastly larger implications than "just" a display issue - and such
changes should be therefore considered attentively or at least taking a
_coscient_ risk.
If you're doing traffic lights, then the display colour *is* business.
Again, there's no manager in the world that can decide that suddendly a
traffic light must have also the pink color - that would defy the
business of traffic lights, which is to indicate to stop, go ahead or
warning by colour codes understood by everybody.
Your rule mixed the two.
>Back to the display issue. If this was recorded in a log instead of
>just displayed, would that change its classification in your mind? In
>otherwords, the dichotomy between "display" and "process" is flimsey in
>the biz world because whether something is display data or "kept" data
>(processing) is only a small difference.
The biz world doesn't need to have clear ideas because the bottom line,
not consistency, matters. And this can be achieved, for simple
businesses, without particular care for consistency - things are simple
enough that things can be fixed on the fly if so necessary. At the
opposite side, software systems need to be built along clear ideas -
*which can change* - but should  be clear at any given moment.
The two things - bottom line and consistency - converge when businesses
become more complex. Then the additional cost for consistency becomes
justified.
The fact that there's lots of people that doesn't get this in the biz
world doesn't make it less true - just makes lots of business run down
or with big inefficiencies (which in the long run amounts usually to
the same). The usual path is that - since most company start small -
their people does not realize that the rules of the game change under
their feet while the company's growing - and there you have the
periodic "reorganizations", "
Incidentally, everything else being equal, that cost lowers with how
much easier it is to perform changes, and there's where OO - or in
general what I'd call "change-oriented systems and engineering" (that's
a nice name, isn't it?) - come into play.
>One may suggest that one can simply use the raw data instead to
>regenerate as needed, but what if they want to see the threasholds as
>set in the past rather than see past data with current threadholds?
That's interesting - because if you take this "change-oriented"
perspective, the idea that you may want to track history of things
(business rules, for example) becomes completely natural. In software
engineering terms, for example, becomes a no-brainer to have some
version control system, defect tracking etc (which god knows how many
software teams do not use, in 2005). The limits are due only to
cost/benefit tradeoffs.
>I view a RDBMS as a powerful processing tool, not JUST "storage".
They are, no doubt, but they'll work as good or as bad as the design
that is governing they use..
Actually, the color is irrelevant in traffic light business :)
Regards,
Goran
Well, it matters to driver, however, traffic box has no influence on color
displayed as it fixed (lamp with filter). It is possible for maintanance to
connect the outputs to wrong lamps. In that case, driver will be freed in
court (if he/she survives possible crash). Bad things will happen to one who
did such mistake, though.
Traffic box, however, has extremly strict business rules/requirements that
it has to meet. Though it looks like simple task, it's actually complex
software and hardware that is controlling few simple lamps. But this is,
however, out of topic for this thread.
Regards,
Goran
> Business rules alter the state of a (business) object. Everything else
> isn't a business rule
>
I don't think so.  The term "business rules" refers to declarative
statements that define or constrain facts about the business.  Nothing
to do with altering the state of objects, per se.  Business rules may
place constraints on what state changes are valid state changes.
Regards,
Daniel Parker
There is no logic there -- neither business logic, nor logic in your
message. An active database (this is a valid term BTW that can be
googled) can register an interest in any specific event, being that
hreshold > 100
or any other complex condition. A SQL language allows formulating
predicates that can have insane level of complexity (e.g. nested
subqueries). Comprehending predicates let alone subqueries, though,
seems to be a way over the head of a OO propellerheads. Traffic light,
that's such a challenging problem to model! Airplane.fly! Give me a
break.
Sometimes biz managers ask for things that seem illogical or poorly
coordinated. Sometimes I might politely (as much as an geek can) point
out cleaner alternatives. However, they pull the purse strings so what
they want is what they get. If each manager wants different levels
values, different number of levels, and defferent colors, then that is
that. My job is to deliver wishes, not argue about business practices
(beyond politely suggesting alternatives). That is the kind of stuff I
tried when I was younger and got slammed by the reality paddle.
[....]
> Incidentally, everything else being equal, that cost lowers with how
> much easier it is to perform changes, and there's where OO - or in
> general what I'd call "change-oriented systems and engineering" (that's
> a nice name, isn't it?) - come into play.
I'll believe it when I see it in actual (or at least realistic) code.
The textbOOks assume certain change patterns that I just don't find
very realistic.
-T-
> Sometimes biz managers ask for things that seem illogical or poorly
> coordinated. Sometimes I might politely (as much as an geek can) point
> out cleaner alternatives. However, they pull the purse strings 
Another thing you might want to do is ask questions: "Can you please 
tell me more about why you need this in red if it's over 100, and yellow 
if it's under 50 ? What do you do when you see a red one, that you don't 
do for a yellow one ?" These questions might well lead naturally to 
terms like "alert" and "warning", as Cristiano suggests.
There's not much risk of offending, as you are asking questions to 
further your own understanding of the business - which cannot be assumed 
to be as extensive as the manager's. Quite different from arguing.
The important things is to keep asking questions until we've drilled 
down to the real requirements.
It's possible to write software without doing that: we end up with code 
that coincidentally happens to do the right thing (show red and yellow 
at appropriate times). But we have failed to identify the reasons why 
those are the right things to do, record those reasons in the code, and 
make those reasons a part of the conversations about the code. We're 
missing the key terms "alert" and "warning". Our failure is not an 
engineering failure so much as a failure to learn.
These debates about "decoupling" are mostly about that, I thhink: about 
how effective we are at learning things about various domains, using 
ever more precise language to talk about these domains, and how well our 
notations (source code, etc.) reflect such increasing precision.
Laurent
This is why it is so important to have a business analyst involved in a
project.  As a business analyst myself, however, I may be bias.  Too
often a business manager will dictate what they want in a piece of
software and the developer will be required to put those features in.
If your lucky you may have a requirements analyst that can sift through
those requests and provide a bit of a buffer for technically unwise
choices or contradictory requests.  However when it comes down to the
business applicability of a request you really need someone that has
enough domain knowledge (or access to it), software engineering
knowledge, process design knowlege, and authority to bridge the gap.
Unfortunately most business analysts grow from the business side and
think they understand more about software engineering than they really
do.  Occasionally you'll have a software engineer or architect that
will move into this arena and make decisions based more on software
related issues than business issues.  What is needed is a truly
balanced representation of these two interests.  Somtimes this takes
two analysts (one from each side), other times you can find a truly
balanced analyst (I like to think I am such an analyst).
Though its becoming less common, many managers still expect to hire a
software engineer(s) and have all the skillsets necessary. Busninesses
are only just beginning to recognize the different divisions of skills
to develop software (programmer, architect, project manager,
requirements analayst, etc.) and that not ever programmer is skilled in
all these areas just because they can write code.  Many companies are
also recognizing the need for business analysts but they are usually
coming from the accounting or engineering diciplines instead of
computer science or software engineering.  But I think that is slowly
changing.
-Scott Frye
Scott...@aol.com
-------------------------------------------------
Panton in temperantia, comprehendo temperantia.
     -unknown
-----------------------------------------------
> This is why it is so important to have a business analyst involved in a
> project.  As a business analyst myself, however, I may be bias.
As my own posting might suggest, I think you have a point. However, I 
find that role labels (such as "business analyst" or "developer") can be 
dangerous because we lack common understanding of what skills and 
responsibilities we refer to when discussing these roles.
What does a business analyst do, in your view - that is, what do *you* 
do on a project ?
> any managers still expect to hire a software engineer(s) and have all
> the skillsets necessary. 
Agreed - just as I said, role labels can be dangerous. A role is only a 
shorthand for one way to package a set of skills. What is important is 
the skills themselves, not the details of how they are grouped together. 
We put too much trust in the labels, and fail to properly focus on the 
skills.
This is why one might find, for instance, that "software developer" in 
an agile development context means a different set of skills than is 
meant in other contexts. (A developer on an agile team is expected to 
perform activities that one would expect of an analyst in other 
contexts.)
Laurent
That is a good suggestion.  If you are in a position to posit extended
questions, by all means, go for it.
To be fair, SQL is not meant for large statements and can be difficult
if not impossible to factor out repetition in many cases. However, SQL
having flaws does not necessarily mean OO is the only alternative.
-T-
Yeesh.
I totally don't get what the game is supposed to be, putting XML and
XPath and whatever inside a relational framework.
Can you make any sense of this paper? At a quick scan, I cannot.
What *is* new in database?  We're all still working out the surrogate
key thingy.  Still trying to work out the proper OO/relational
coupling.  The real new in database is having it find its proper place
in a larger architectural world, with sets, set relations,
self-describing data (an attribute of relational models before anyone
tried to use XML for general purposes!).  Maybe that's what the XPath
etc nonsense is really all about, but it seems to be mostly hacked at
and not really thought through at a fundamental level.
J.
ps - good responses by you and Patrick re stored procedures.
Apparently there's a large culture gap between many OO practitioners
and even the most basic principles and practices of RDBMS, but that's
not exactly news.
Just on a practical basis, when I want to make a schema change for,
say, performance reasons, it's simply convenient to have everything
relevant available as SPs.  Especially more convenient than it might
be in an app that synthesized SQL strings at runtime so they would be
that much harder to find and coordinate with database changes.
Database tier needs to encapsulate its functions and deal with other
tiers only by contract and signature, and SQL, as much as we love it,
couples things too closely.
I agree!
>
> What does a business analyst do, in your view - that is, what do *you*
> do on a project ?
see below...
>
>> any managers still expect to hire a software engineer(s) and have all
>> the skillsets necessary.
>
> Agreed - just as I said, role labels can be dangerous. A role is only a
> shorthand for one way to package a set of skills. What is important is
> the skills themselves, not the details of how they are grouped together.
> We put too much trust in the labels, and fail to properly focus on the
> skills.
I agree.  In a perfect world, I think it is important to enumerate the
skills and let the project managers decide the best way to group those
skills into roles, the roles into people's jobs and the fill those jobs
with the people they see fit.  Of course, as you say, roles are a
shorthand to group those skills together and certain groupings might be
recognized as best practice.
>
> This is why one might find, for instance, that "software developer" in
> an agile development context means a different set of skills than is
> meant in other contexts. (A developer on an agile team is expected to
> perform activities that one would expect of an analyst in other
> contexts.)
And this is great.  If the skills are clearly emnumerated then
different development methodologies can be defined by the set up skills
applied, their order, and these can be labeled with role names.
You asked what role I do on a project?  Well I work in a manufacturing
IT department but I am also have advanced degrees in computer science.
My role differs from project to project.  On small projects I may do it
all other times I architect a solution and pass the requirements and
domain design on to a coder.  On large projects I will lead or
facilitate a team that starts from conception end ends with delivery.
There are projects where I am the project lead and there are others
that I've just been the archiect or just a consultant.  There are
projects we develop in house and others we have done by outside
consultants.  Basically my role varies constantly and I try to manage
this by keeping a lifecycle in my head and determining exactly which
roles I'm filling in any given project and who is filling the other
roles.  I make sure that if there is a role missing, it is pointed out
to the appropriate people quickly.
But I think your question was about the role of Business Systems
Analyst.  I am responsible for identifying areas in our various
processes (usually manual but frequently mixed) that could be improved
upon and applying computer resources.  The first step is to become
intimate with the process I am analyzing and determining if it best
accomplishes its goal.  I apply Six Sigma techniques to do this as I am
a six sigma black belt.  It is not uncommon for me to find several
steps in a business process that are redundant or whose need went away
years ago.  You'd be surprised how often I find someone spending hours
a day to gather data and generate reports that are never read because a
new report replaced it 5 years ago!  After analyzing and optimizing the
process itself, I will be ready to suggest a software solution to the
team (or several to be selected from) and translate these into
requirements for development.  That is where the typical software
development process begins and various paths may be taken to coding
depending on the complexity of the process and how much of it can be
done with custom coding vs. integration of existing or new canned
systems, decisions for development in house or by a contractor, etc.
I think in a general sense, the role of a Business Systems Analyst
should be to consider the entire process and act as a process expert
that translates the needs of management, into requirements for software
development, while keeping the big picture busniness picture in mind
because managers are usually only worried about their corner of the
business.  Basically the Business Analyst is a interface between what
the customer thinks he wants, and what the best solution really is.
-Scott Frye
Scott...@aol.com
-------------------------------------------------
Panton in temperantia , comprehendo temperantia.
     -unknown
-------------------------------------------------
> Basically the Business Analyst is a interface between what
> the customer thinks he wants, and what the best solution really is.
Thanks for the thoughtful response. Quite useful to me.
Which Six Sigma techniques do you find most immediately useful, in 
practice, at the start of a project ? (That is, for becoming intimate 
with a process and determining if it accomplishes its goal ?)
Laurent
It depends greatly on the situation.  The methodology is more important
to me than the specific tools.  To me Six Sigma has always been about
the methodology.  The tools where around a long time before the
methodogy and I used many of them before I knew what six sigma was.
Additionally, I am always on the look out for new tools to add to my
"Problem Solving Skills" toolbox.
In general the entire DMAIC loop is the most useful to me in all
aspects of systems development and any problem that I'm analyzing it
ANY context.  Analyzing a system to see what is wrong with it,
pareto-ing the problems and doing a cost analysis greatly assists in
defining what parts of the systems could be improved upon and how much
time we should spend fixing a specific problem.  Developing a plan that
is supported by numbers helps me seperate the would-like-to-haves from
the must-haves and helps me prioritize the requirements.  Metrics is
the way to go.  This entire process has the side effect of making you
an expert about the system.  The overall picture that develops often
allows me to see relationships around it that the people executing the
system may miss (can't see the forest for the trees).
Pareto charts, Cause and effect diagrams, FMEA, Brian storming diagrams
all are useful to me.  I sometimes make process maps and standard flow
diagrams but these usually look more like UML class diagrams and
interaction diagrams because I think in terms of software objects.
Analyzing the defect database and determining which products have the
most variation on their control charts would be useful as a method for
looking for issues to address, but so far the issues have always found
me and haven't been so subtle as to need these techniques...still
working on the low hanging fruit I guess.  Tools like GAGE analysis,
control charts and DOE's are more useful to the quality engineers than
me though being able to interpret these work products sometimes helps
me by allowing me to converse about the process with the experts.
I have yet to apply 6 sigma technique to the actual development
process.  Mostly because our custom development is so limited that any
benefit gained would not justify the cost of collecting and analyzing
the data yet.  Maybe someday.
I believe that most developers only try to find out as much about the
system as they absolutely need for writing code.  The goal is usually
to implement a piece of software that meets the requirements instead of
developing a software that represents an optimal solution for the
problem space.  Many developers, after working on a project become
domain experts and upon completion of the project would do it all
differently if they were to start over.  Applying solid techniques to
learn the problem space and design it (in my case 6 sigma) before
developing the software not only makes the process easier, but it
reduces the number of changes that can be expected in the system
shortly after its deployed.  Most of those changes, in my experience,
are a result of a poor understanding of the problem space by those that
wrote the requirements.
Proponents of agile development techniques SEEM to agree that you have
to become a domain expert systematically and their technique is about
remaining nimble enough with your code to be able to handle
mis-understanding about the system and turn on a dime as your
understanding changes.  The evidence seems to suggest this is a
technique that works but I think the jury is still out and they seem to
ignore that there are other techniques (like I've described) to gain
that knowledge.  There are some systems, where the problem space is
rapidly changing that these agile techniques my be the only answer for.
As I write this it occurs to me that the agilist will argue that this
is Big Up Front design.  I would disagree.  I think it is always good
to analyze a system and get a good understanding of it before
proceeeding but I was taught you always have to guard against
Analysis-Parallysis.  Six Sigma is also about iteration.  That's why
DMAIC is a loop.
>
>Robert C. Martin wrote:
>> A huge amount of effort goes into mapping business data into
>> presentation formats.  This mapping is not a business rule, per se;
>> but is still complicated and important code that needs to be written.
>> It should be separate from the business rules.
>
>The dividing line between such is rather blurry in practice.
It often seems blurry.  It is a goal of software design to sharpen the
focus and discover the true separation between business rules and
presentation rules.
>
>For example, we may start out with:
>
>  if threshold greaterThan 100 then display_as_red
>  else if threshold greaterThan 50 then display_as_yellow
>  else display_as_white
>
>This is as much a "business rule" as any.  But it is also a display
>issue.
Correct, the rules are blurry.  We need to separate them.  And in this
case the first level of separation is simply between a state, a cause
for that state, and a way to render that state.  We might map it as:
  Cause           State        Render
------------+---------------+--------------
  t <= 50   |    Nominal    |   White
  50<t<=100 |    Warning    |   Yellow
  t > 100   |    Alarm      |   Red
This mapping allows us to express the situation as two ordered pairs.
The first (Cause,State) is the business rule.  The second
(State,Render) is the presentation rule.  
>Now suppose that different managers want different threashold colors
>and ranges and want to self-tweak them. At such a point we may
>"meta-tize" the logic by having a table such as:
>
>  table: thresholdColors
>  ----------------------
>  mgrRef  (manager reference ID)
>  amount  (upper limit range, double or float)
>  color      (web color format rrggbb)
>
>This is more or less the same thing as before, but represented
>declaratively instead.
Yes, this adds another column to the mapping.   Actually, it should
add two different columns.  The two ordered pairs now become ordered
triples: (Mgr, Cause, State) and (Mgr, State, Render)
>Back to the display issue. If this was recorded in a log instead of
>just displayed, would that change its classification in your mind? 
No, it's still the same issue. The "render" action simply changes.
>In
>otherwords, the dichotomy between "display" and "process" is flimsey in
>the biz world because whether something is display data or "kept" data
>(processing) is only a small difference. 
What you are saying is that user interface is volatile.  The way we
display things is subject to capricious change.  I agree.  That's why
it's useful to separate the things that do change from the things that
don't.  Our ordered triples do this nicely.
The point is that separations like this are a major objective of
software design, and in particular of Object Oriented software design.
>> Similarly, there is often a large amount of effort that goes into
>> mapping the business data into a persistent repository (i.e. a
>> database) that is accessible and usable by many different
>> applications.  Once again this mapping is not a business rule, per se;
>> but is still complicated and important code that needs to be written.
>> It should be separate from the business rules.
>I view a RDBMS as a powerful processing tool, not JUST "storage".
Again, the rule of separation comes into play.  An RDBMS is blurry.
It is simultaneously a storage mechanism and a processing tool.  We
need to keep these two aspects separate within our software design for
the same reason that we need to keep Cause, State, and Render separate
in the example above.
>
>In summary, I don't think clear lines can be drawn. 
Drawing those lines is a major part of what software design -
especially object oriented design - is all about.
-----
Robert C. Martin (Uncle Bob)  | email: uncl...@objectmentor.com
... AND....
>
> The point is that separations like this are a major objective of
> software design, and in particular of Object Oriented software design.
>
I agree with the the concept but the wording is too far reaching.
"A goal" of software design is to provide easily manintainable code
that scales well.  "A technique" to do this, used by OO software
design, is to sharpen the distinction between business rules and
presentation rules.
Does anyone really wonder why 70% of software projects are late, over 
budget, or both?
Software developers are too often asked to indulge the arrogance, 
incompetence, and laziness of managers and staff.  In some cases 
programmers (or business analysts, or software engineers, or project 
managers or whomever's role it is to interrogate users) need to find out 
what the user really needs.  To discover a total they'll ask for a 
report.  To find a list of at-risk accounts they'll ask for color-coded 
account numbers.  User's aren't in the habit of being precise about what 
they need, they're more accustomed to asking for what they're used to 
getting and perhaps some way of enhancing what they're accustomed to, 
rather than asking the software to do something more specific.
Is this feature really qualify for an 80/20 rule of any sort?  Is the 
software usable without it?  Does it deliver value without it?  Is this 
a feature they really want implemented before another feature that 
delivers greater value to the business?  Does the customer really want 
programmers worrying about how to color-code text to each user's taste 
and proclivities?
How often do we quite our longtime physician simply because they refuse 
to prescribe for us something we ask for?  We trust our doctors to do 
what's best for us even though they aren't us.  We feel the discomfort, 
but accept their treatment.
Programmers need to be a little more assertive in how they treat their 
clients business ailments, and do a better job of treating the patient 
than indulging their cosmetic vanities.  If the developers (individually 
or collectively) aren't in a place to offer expert business-relative 
advice, then perhaps they shouldn't be writing that specific application 
in the first place.
How many of us hire a massage therapist when a urologist is needed?  Why 
hire programmers with financial application backgrounds to write a 
workflow system?
One of the best ways to begin differentiating business rules from 
presentation rules is to get rid of as many of both as possible.  This 
is what XP, speed-to-value, and other release-early-and-often practices 
are all about.  This is true for procedural, object oriented, aspect 
oriented, functional, and even table-oriented programming.