
From algorithms to patterns


learningOO

Jul 2, 2005, 9:36:04 AM
I have problems thinking in OO (patterns) terms. I am quite good at
building useful (bottom-up) classes, like a library, and then using them
to raise the abstraction level (though I use lots of encapsulation and
clever operator-overloading tricks, not much inheritance), but the whole
application (the main function or equivalent) still looks like an old
C-style program with

do A
do B
for loop: {do C,D ...}

even if it is at a higher abstraction level due to my utility classes
(which basically just increase the conceptual density of the code).

I would appreciate a hint for getting out of this way of thinking, if
there is one.


Also, I will give a specific example and ask you how I should make this
one pattern-oriented:

<<<
A- Get data out of a database, grouped by the value of a column.
B- For each group:
{
B1- Dump such data to disk and launch an existing executable which will
perform computation and transform the data.
B2- Read the new data from disk and perform some analysis (coded here,
by me)
B3- Put the result of analysis in the database, in another table which
associates "value of the column" -> "result of analysis".
}
>>>

I have trouble coding this stuff in any shape other than the obvious
one: three global functions B1, B2 and B3, with the main() function
doing A and the B loop that calls them.
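In code, the "obvious" procedural shape I mean is roughly this (a Python sketch; the function bodies and data are placeholders, not my real program):

```python
# The procedural shape I keep falling into: three global functions
# plus a main() that does A and the B loop. All details are stand-ins.
def B1(group):
    # dump the group's data to disk and run the external executable
    pass

def B2(group):
    # read the transformed data back and analyze it (stand-in analysis)
    return {"result": len(group)}

def B3(key, result):
    # write the analysis result back to the database
    pass

def main():
    groups = {"x": [1, 2], "y": [3]}   # stand-in for step A (the query)
    for key, group in groups.items():  # step B
        B1(group)
        result = B2(group)
        B3(key, result)

main()
```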

Also, I have commandline parameters which set a few global parameters e.g.
P1- username and password to access the database (for functions A and
B3), the table name and the column name for A, and the table name for B3
P2- the level of output verbosity
P3- the behaviour of B3, deciding if it only should do an "insert into"
(if data already existing, skip) or a subsequent "update" (if data
already existing, replace)

It is annoying to pass all those parameters around, especially the
verbosity level P2, which every function needs to know... and also P3,
which has to be passed both to A (at the beginning of the program) and
to B3 (at the end of the program). I am tempted to make those parameters
global instances, which is considered bad practice, or to put them in a
singleton, which is not much of an improvement because it again breaks
encapsulation (*)

(*) It breaks encapsulation because the code inside classes and
functions relies on there being a singleton outside with that name, with
a verbosity level written inside it, and these requirements appear
neither in the interfaces of the objects nor in the signatures of the
functions... and if I code in a scripting language they are not even
enforced at compile time; it just breaks at runtime. Am I right that
singletons are almost as bad as global instances?

Thanks for your help
learningOO

H. S. Lahman

Jul 2, 2005, 3:10:00 PM
Responding to LearningOO...

> I have problems thinking in OO (patterns) terms. I am quite good at
> building useful (bottom-up) classes, like a library, and then using them
> to raise the abstraction level (though I use lots of encapsulation and
> clever operator-overloading tricks, not much inheritance), but the whole
> application (the main function or equivalent) still looks like an old
> C-style program with
>
> do A
> do B
> for loop: {do C,D ...}
>
> even if it is at a higher abstraction level due to my utility classes
> (which basically just increase the conceptual density of the code).
>
> I would appreciate a hint for getting out of this way of thinking, if
> there is one.

The problem here is not about patterns per se. It is about basic OOA/D.
The example above is a classic procedural approach where all one has
done is organize a function library into classes. IOW, it is a C or
FORTRAN program with strong typing.

For example, the classic main() processing in an OO application would be
more like:

Factory.instantiate() // create an initial set of objects
A.start(); // kick off processing

That is, one instantiates an initial set of objects and then sends a
message to one of them. That object responds and sends a message to the
next object, which responds and sends a message to another object, and
so on. IOW, the flow of control is captured in a daisy chain of direct,
peer-to-peer collaborations between objects after the initial "primer"
message in main().
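As an illustration only (a minimal Python sketch; Step, Factory, and the A/B/C names are invented here, not a real API), that style of main() might look like:

```python
# Sketch of the OO-style main(): build the object graph, then send one
# "primer" message and let the objects collaborate peer-to-peer.
class Step:
    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor       # next peer in the daisy chain

    def start(self, log):
        log.append(self.name)            # this object's contribution
        if self.successor:
            self.successor.start(log)    # hand control to the next peer

class Factory:
    @staticmethod
    def instantiate():
        # wire up the initial set of collaborating objects
        return Step("A", Step("B", Step("C")))

def main():
    first = Factory.instantiate()   # create an initial set of objects
    log = []
    first.start(log)                # kick off processing with one message
    return log

main()  # control then flows A -> B -> C through the collaborations
```

Note there is no loop in main(); the sequencing lives entirely in the objects' successor links.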

A second, implied, problem is that the objects exist to organize
individual functions rather than to abstract identifiable problem space
entities. Abstracting entities from a problem space is the core of
OOA/D. So you need to switch from the bottom-up, class-oriented view to
the Big Picture view represented by OOA/D. My advice is to find a couple
of good books on OOA/D and read them. [The Books section of my blog has
some suggestions.]

>
>
> Also, I will make a specific example and ask you how I should make this
> one pattern oriented:
>
> <<<
> A- Get data out of a database, grouped by the value of a column.
> B- For each group:
> {
> B1- Dump such data to disk and launch an existing executable which
> will perform computation and transform the data.
> B2- Read the new data from disk and perform some analysis (coded
> here, by me)
> B3- Put the result of analysis in the database, in another table
> which associates "value of the column" -> "result of analysis".
> }
> >>>

A, B1, and B3 are primarily concerned with persistence. Only B2 and the
external executable actually solve a problem for the customer.
Conversely, the B2 processing doesn't care whether the data it processes
is stored in an RDB, flat files, or clay tablets.

So the first insight is to isolate the persistence access mechanisms so
that B2 does not have to know anything about them. Typically that is
done by having a dedicated subsystem or layer that encapsulates the
persistence mechanisms. For a simple example like this that may end up
being just a single class, but conceptually it is an entirely different
subject matter.

The interface to that persistence subsystem should be designed around
the problem's needs. For example, one would expect an interface method
like getNextGroup to get the A data.
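A minimal sketch of such a boundary (Python; the method names adapt the thread's getNextGroup/saveResults, and the backing store is faked with an in-memory list):

```python
# Persistence-access boundary: B2 talks only to this interface and
# never sees SQL, files, or table names behind it.
class PersistenceSubsystem:
    def __init__(self, rows):
        # 'rows' stands in for whatever the real backend is
        # (an RDB cursor, flat files, ... clay tablets).
        self._groups = iter(rows)
        self.saved = {}

    def get_next_group(self):
        """Return the next group of data, or None when exhausted."""
        return next(self._groups, None)

    def save_results(self, key, result):
        # A real implementation would map 'result' onto columns and
        # issue the INSERT/UPDATE; here we just record it.
        self.saved[key] = result

store = PersistenceSubsystem([("x", [1, 2]), ("y", [3])])
group = store.get_next_group()          # ("x", [1, 2])
store.save_results(group[0], sum(group[1]))
```

The point of the sketch is that swapping the backend changes only the class internals, never its callers.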

The processing in B2 will have business semantics. One is manipulating
entities that live in the customer's problem space (e.g., Customers,
Orders, Invoices, whatever). Solve the analysis problem in B2 by
abstracting objects for those entities. (Note that B2 only needs to
understand a single group.) Instantiate those objects from a factory
object that invokes getNextGroup to obtain the values to assign to
object attributes. Once the group objects are created, kick off
processing by sending a message to one of them. Daisy-chain their
collaborations until one has the results.

Then extract the results values and invoke the saveResults (or whatever)
persistence interface method to let the persistence access subsystem
handle the actual database writes (B3). It will understand how to map
the arguments into columns or whatever and invoke a SQL driver (or
whatever).

The key idea here is to solve the B2 customer problem first using
problem space abstractions. Then worry about how to get the relevant
attribute data in/out of the database. Let the persistence access
subsystem handle the mapping of data values to specific RDB (or
whatever) formats.

>
> I have trouble coding this stuff in any shape other than the obvious
> one: three global functions B1, B2 and B3, with the main() function
> doing A and the B loop that calls them.

Somebody needs to understand the context of multiple groups because B2
only works on one group at a time. Examine the problem space for an
entity that would naturally understand multiple groups. Usually that
will be some conceptual entity that would be a container for the groups
in the business context (e.g., a sales region if the groups are salesman
statistics). Let the entity have the responsibility of iterating over
getNextGroup, B2, and saveResults.

With peer-to-peer collaborations that entity probably doesn't even need
a loop. For example, assume an object, Region, that has several
Salesmen, which for this example I will use as a surrogate for the set
of analysis objects that B2 processes. Assume a SalesmanFactory that
creates Salesman objects from database data. We might have the
following messages:

[Region] [SalesmanFactory] [Salesman]
| | |
| M1:newRegion() | |
|------------------------>| |
| | |
| M2:ready() | |
|<------------------------| |
| | |
| M3:process() | |
|------------------------>| |
| | |
| M4:noMore() | |
|<------------------------| |
| | M5:create() |
| |------------------------>|
| | |
| M6:created() | |
|<------------------------| |
| | M7:analyze() |
|-------------------------------------------------->|
| | |
| M8:done() | |
|<--------------------------------------------------|
| | |

The M1 and M2 messages just initialize [SalesmanFactory] for a new
region so it will start reading the database with the first [Salesman].
M3 causes [SalesmanFactory] to invoke getNextSalesman from the
persistence interface, and it uses that data to instantiate a
[Salesman]. (M5 would be the constructor at the OOP level.) If there
are no more, it notifies [Region] with the M4 message; otherwise, it
notifies [Region] when the [Salesman] is ready via M6. Then [Region]
issues the M7 message to [Salesman], who responds with the M8 results.
[Region] responds by invoking the saveSalesman() persistence interface
method and issues another M3 message. So the loop is now implicit in
the {M3, M5, M6, M7, M8} sequence of collaborations, which continues
until an M4 is issued.

This approach requires more interactions among objects than an embedded
loop would, but it is far more robust, for two reasons. First,
behaviors are broken down into logically indivisible chunks that will be
easier to manage and modify. Second, [Region] doesn't really need to
know much about the overall processing. It just responds in a very
limited way to the M2, M6, and M8 messages without any real
understanding of how salesmen are related to regions. The rules, like
cleaning up when all the salesmen for a region have been processed, are
isolated and independent of any other processing or sequences of
processing.
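The message sequence above can be sketched directly as method calls (Python; the class and message names follow the diagram, while the data and the stand-in analysis are invented for illustration):

```python
# The M1..M8 collaboration as plain method calls. The loop is implicit:
# each done() triggers the next process() until no_more() arrives.
class Salesman:
    def __init__(self, data):
        self.data = data

    def analyze(self):                       # M7
        return sum(self.data)                # stand-in for the analysis

class SalesmanFactory:
    def __init__(self, groups):
        self._source = groups                # stand-in for the persistence interface
        self._groups = None

    def new_region(self, region):            # M1
        self._groups = iter(self._source)
        region.ready()                       # M2

    def process(self, region):               # M3
        group = next(self._groups, None)     # getNextSalesman, conceptually
        if group is None:
            region.no_more()                 # M4
        else:
            region.created(Salesman(group))  # M5 + M6

class Region:
    def __init__(self, factory):
        self.factory = factory
        self.results = []

    def run(self):
        self.factory.new_region(self)        # prime the factory (M1)

    def ready(self):                         # M2
        self.factory.process(self)           # first M3

    def created(self, salesman):             # M6
        self.done(salesman.analyze())        # M7 -> M8

    def done(self, result):                  # M8
        self.results.append(result)          # stand-in for saveSalesman()
        self.factory.process(self)           # next M3: the implicit loop

    def no_more(self):                       # M4
        pass                                 # region is complete

region = Region(SalesmanFactory([[1, 2], [3, 4, 5]]))
region.run()
# region.results is now [3, 12]
```

One design note on the sketch: the implicit loop costs call-stack depth proportional to the number of groups, which is fine conceptually but something a production version would handle with an event queue or an explicit driver.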

>
> Also, I have commandline parameters which set a few global parameters e.g.
> P1- username and password to access the database (for functions A and
> B3), the table name and the column name for A, and the table name for B3
> P2- the level of output verbosity
> P3- the behaviour of B3, deciding if it only should do an "insert into"
> (if data already existing, skip) or a subsequent "update" (if data
> already existing, replace)

Note that in the above example, only the persistence subsystem needs to
know about table names or how groups are ordered in tables. However,
that requires a mapping between the data identity in the B2 view and the
table/tuple identity in the RDB. Pure data transfer interfaces are good
for that because the only thing shared across the boundary is the data
packet definition. That is, each subsystem has its own unique view of
the semantics of the data.

In the simple example above that identity was implicit in the
getNextSalesman interface access. In more complicated situations it can
be done via table lookups where the tables are instantiated at startup.
In any event, that mapping is local to the persistence access
subsystem and the B2 processing knows nothing about it. Conversely, the
mapping of the message data to B2's object attributes is a personal
matter for B2 and the persistence access subsystem needs to know nothing
about it.

[Often the B2 and RDB views will map 1:1 because both model the same
basic problem space, especially in a simple example like this. Then the
encode/decode of messages on each side of the subsystem boundary seems
like unnecessary complexity. However, what the encode and decode are
really doing is decoupling the implementations so that they have no
knowledge of each other at all. That will greatly improve
maintainability in the future if the views start to diverge. If one is
solving a problem for the user that is beyond traditional CRUD/USER
processing, the problem and database views will almost always have
significant differences.]

In particular, your data seems to be column-first. That becomes
completely hidden from the B2 processing. IOW, the query or join can be
arbitrarily complex, and it is known only within the persistence access
subsystem. B2 just requests, "I want the data I call 'X'," and the
persistence access subsystem knows how to map X to the RDB.
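A small sketch of such a pure data-transfer packet (Python; the GroupPacket fields and the fake row format are invented; only the idea of sharing nothing but the packet definition comes from the thread):

```python
# The only thing shared across the subsystem boundary is this packet
# definition; neither side knows the other's internal representation.
from dataclasses import dataclass

@dataclass(frozen=True)
class GroupPacket:
    key: str        # B2's identity for the data ("the data I call 'X'")
    values: tuple   # the group's raw values

# Persistence side: decode an RDB row (faked here as a tuple) into a packet.
def decode_row(row):
    key, csv = row
    return GroupPacket(key, tuple(int(v) for v in csv.split(",")))

# B2 side: knows only the packet, nothing about rows, columns, or joins.
def analyze(packet):
    return packet.key, sum(packet.values)

packet = decode_row(("x", "1,2,3"))
analyze(packet)  # ('x', 6)
```

Either side can change its internal representation without touching the other, as long as the packet definition holds.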

>
> It is annoying to pass all those parameters around, especially the
> verbosity level P2, which every function needs to know... and also P3,
> which has to be passed both to A (at the beginning of the program) and
> to B3 (at the end of the program). I am tempted to make those parameters
> global instances, which is considered bad practice, or to put them in a
> singleton, which is not much of an improvement because it again breaks
> encapsulation (*)
>
> (*) It breaks encapsulation because the code inside classes and
> functions relies on there being a singleton outside with that name, with
> a verbosity level written inside it, and these requirements appear
> neither in the interfaces of the objects nor in the signatures of the
> functions... and if I code in a scripting language they are not even
> enforced at compile time; it just breaks at runtime. Am I right that
> singletons are almost as bad as global instances?

In theory any public attribute is accessible by any other object that
can reach it through some set of relationship paths. Since all objects
in a subsystem are usually somehow connectable, all attributes are
potentially global. The way the OO paradigm manages that is that all
classes are accessible but not all instances. That is, relationship
participation at the instance level severely restricts accessibility via
relationship navigation in practice.

So, since there is only one instance of a Singleton class, everyone who
can navigate relationships to the class will always access the same
instance. In that sense the instance data is global. However, there is
an additional consideration: the reason one uses Singleton in the first
place is that the problem space /demands/ that there be only one source
of the data. Problem space rules and policies always trump OOA/D/P
practice. [OTOH, IMO Singleton is one of the most overused patterns.
Too often the problem does not justify the additional complexity
compared to a more natural abstraction.]
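For illustration, here is a minimal Singleton sketch next to the explicit alternative (Python; the Config class, its verbosity field, and the report function are all invented for this example):

```python
# Minimal Singleton: every Config() call yields the same instance,
# so its data is effectively global.
class Config:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.verbosity = 0
        return cls._instance

a = Config()
b = Config()
a.verbosity = 2
# a and b are the same object, so b.verbosity is also 2

# The explicit alternative: the dependency on the config is visible
# in the signature rather than hidden behind a global access point.
def report(data, config):
    if config.verbosity > 1:
        return f"analyzed {len(data)} items"
    return ""
```

The explicit version addresses the encapsulation complaint in the question: a reader of report()'s signature can see exactly what the function depends on.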


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
