OOD and Normalisation

v4vijayakumar

May 3, 2007, 5:21:12 AM
How can an object-oriented design be normalized? Is normalization
related to OOD at all?

H. S. Lahman

May 3, 2007, 2:55:08 PM
Responding to V4vijayakumar...

I really hope this was not a homework question... B-)

> How can an object-oriented design be normalized? Is normalization
> related to OOD at all?

The Class Model in UML is underlain by the same relational data model
branch of set theory that underlies DBMSes. As a result the Class Model
needs to be normalized to Third Normal Form just like an RDB schema.
[Normal Forms above third are rarely relevant because they mostly deal
with identifier conventions and we rarely use explicit identifiers for
objects.]

Most OOA/D authors don't talk explicitly about normalization. However,
every OOA/D author will provide a suite of rules for constructing Class
Models that essentially ensure Third Normal Form under the guise of
things like one-fact-one-place. For example, the most comprehensive book
available on Class Modeling is Leon Starr's "Executable UML: How to
Build Class Models". Leon doesn't even mention Normal Form as far as I
recall, yet he provides the most comprehensive set of guidelines for
normalization that I have seen.

In a nutshell we have:

1NF: all responsibilities must be a simple domain. For knowledge
attributes this means that the attribute must be described in terms of
an abstract data type (ADT) that can be manipulated as if it were a
scalar. For attributes that can be expressed in terms of fundamental
values, the domain of data values must have a single semantics. So a
domain of {UNSPECIFIED, 5, 6, 7} is invalid because it captures two
separate semantics: valid data values of 5, 6, 7 and whether or not the
data is specified at all.
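
A minimal Python sketch of the point about single semantics, not taken from any particular methodology. The names Setting and Reading are hypothetical, and the values 5, 6, 7 are simply the ones from the example above. The idea is that "is the value specified?" becomes its own knowledge attribute instead of an extra member of the value domain. (Python's None still marks "no value yet" at the implementation level; the Setting domain itself stays pure.)

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Setting(Enum):
    """Domain with a single semantics: only the valid data values."""
    FIVE = 5
    SIX = 6
    SEVEN = 7


@dataclass
class Reading:
    """Hypothetical class used only for illustration."""
    # Whether the data has been specified is a separate attribute,
    # not an extra value smuggled into the Setting domain.
    is_specified: bool = False
    setting: Optional[Setting] = None

    def specify(self, setting: Setting) -> None:
        self.setting = setting
        self.is_specified = True


reading = Reading()
unspecified_before = reading.is_specified  # nothing specified yet
reading.specify(Setting.SIX)
```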

For behaviors this means that the behavior responsibility must be
cohesive and self-contained. Self-contained means it can depend on
knowledge attributes but it can't depend upon other objects' behavior
responsibilities. (Note that this comes for free if one follows the
methodology's dictums about encapsulation.)

2NF: all responsibilities are fully dependent on the object identity.
Typically objects do not have explicit identity attributes but they do
have an unambiguous mapping to some uniquely identifiable problem
space entity. This means the "value" of the property depends solely on
what problem space entity is abstracted in the object.

As a practical matter 2NF is not very relevant to OO development because
it is really about compound identifiers (i.e., multiple attributes
combine to define the object identity). What 2NF is saying is that if
there are multiple explicit identity attributes, then the "value" of a
non-identity attribute must be dependent on /all/ of the identity
attributes, not just some of them. A classic example of this is:

[Housing Development]
+ developmentID // identifier
...

[Subdivision]
+ developmentID // identifier
+ subdivisionID // identifier
...

[House]
+ developmentID // identifier
+ subdivisionID // identifier
+ houseID // identifier
+ style
+ builder

The style attribute is clearly dependent on the particular House
identity, which must be fully specified. The same thing seems true for
the 'builder' attribute since each House is built by one builder. But
suppose construction policy is that a builder builds all the houses in a
particular subdivision. Now the 'builder' value is fully specified if
one only knows {developmentID, subdivisionID}. So the 'builder'
attribute really belongs in the [Subdivision] class. [Note that if the
development is seriously homogenized, all Houses in the same subdivision
might have the same style. In that case, style also belongs in
[Subdivision].]
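
As a rough Python sketch of the corrected model, assuming the one-builder-per-subdivision policy described above: 'builder' lives in Subdivision, and a House reaches it by navigating through the part of its compound identifier that it shares with Subdivision. The function builder_of, the dictionary lookup, and all sample values are hypothetical illustrations, not part of the original example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Subdivision:
    development_id: str  # identifier
    subdivision_id: str  # identifier
    builder: str         # depends only on {development_id, subdivision_id}


@dataclass(frozen=True)
class House:
    development_id: str  # identifier
    subdivision_id: str  # identifier
    house_id: str        # identifier
    style: str           # depends on the full House identity


def builder_of(house: House,
               subdivisions: Dict[Tuple[str, str], Subdivision]) -> str:
    """Navigate from a House to its Subdivision to find the builder."""
    key = (house.development_id, house.subdivision_id)
    return subdivisions[key].builder


# Made-up sample data, for illustration only.
subdivisions = {
    ("D1", "S1"): Subdivision("D1", "S1", builder="Acme Homes"),
}
house = House("D1", "S1", "H7", style="colonial")
```

With this arrangement the builder is recorded once per subdivision, so changing it cannot leave individual houses out of step with one another.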

<aside>
Even though explicit identity attributes are not required in OO
development, I strongly recommend providing them during an initial cut
at a Class Model. It makes problems like the 'builder' attribute above
much more obvious and it also helps to define constraints on
relationship navigation when there are relationship loops. That is, when
there are alternate paths to get from object A to object B, must one get
the same set of Bs via both paths and, if not, which set is the right
one for a given collaboration? Derived identifiers as in the example are
excellent for sorting out such issues, even if they are never
implemented during OOP.
</aside>

3NF: all responsibilities depend upon nothing but the object identity.
Essentially this means that the "value" of a responsibility cannot
depend upon knowledge attributes that are not explicit identity
attributes. A classic example of this problem is:

[House]
+ address // identifier
+ builder
+ style
+ cost
...

The problem here is that it is highly unlikely that 'cost' is only
dependent on the House identity. In fact, it is probably dependent on
the style or on the combination of {builder, style}. IOW, only the
/combination/ of {builder, style, and cost} is dependent solely on the
identity of House, not the individual values. So, assuming cost is
solely dependent on style, we need:

[House]
+ address
+ builder
+ style // referential attribute
...

[Style]
+ style
+ cost

where the unique combination of values is captured indirectly through
the relationship to [Style].
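
A small Python sketch of the same split, assuming (as above) that cost depends solely on style. The cost property on House and the sample values are hypothetical additions for illustration; the point is only that cost is stored once, in Style, and reached through the relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    name: str  # identifier
    cost: int  # depends on nothing but the Style identity


@dataclass
class House:
    address: str  # identifier
    builder: str
    style: Style  # referential attribute: the relationship to [Style]

    @property
    def cost(self) -> int:
        # Not stored in House; obtained by navigating to Style.
        return self.style.cost


# Made-up sample data, for illustration only.
ranch = Style("ranch", cost=250_000)
h1 = House("1 Elm St", "Acme Homes", ranch)
h2 = House("2 Elm St", "Best Build", ranch)
```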

---

One must be careful not to confuse coincidental values or data domains
with dependency. Consider Washing Machine and Refrigerator objects that
both have a 'color' attribute. If the colors are designed to be color
coordinated from the same manufacturer, they will have identical data
domains for 'color'. It is quite possible that an object from both sets
may be colored chartreuse. Nonetheless they are quite different things.
How is it that the 'color' attribute doesn't violate 1NF (same data
domain semantics) and 3NF (both have the same color)?

The trick is to think of such generic qualities in terms of 'color of'.
IOW, the color of a Washing Machine is chartreuse and the color of a
Refrigerator is chartreuse. Thus the color of a Washing Machine is not
semantically the same as the color of a Refrigerator even though the
value is the same. Similarly, we have:

            [Appliance]
            + color
                 A
                 |
        +--------+-------------+
        |                      |
 [Refrigerator]        [Washing Machine]

The notion of the color of an Appliance is something shared; it raises
the level of abstraction of color of a Refrigerator and color of a
Washing Machine to a common ground for both. Since a Refrigerator is an
Appliance, it has the color attribute.

This example, though, underscores an important difference between
normalization applied to OO Class Models and the Data Models used for
RDB Schemas. In the OO case Refrigerator can implement a different data
domain than a Washing Machine for the 'color' attribute (i.e., they
aren't required to be color coordinated). That is not true in Data
Models where there will be exactly one data domain for 'color'.

The reason is that in Data Modeling the [Appliance] table is
instantiated separately from the subclass tables and the subclass tables
do not have a 'color' attribute. So there is only one attribute in one
table with one data domain.

In contrast, in the OO Class Model the superclasses cannot be
instantiated separately so an object resolves the entire tree. Thus the
object identity /includes/ the [Appliance] properties. In addition, the
Class Model only identifies the responsibility (i.e., What it is); it
does not define its implementation (i.e., How it does it).

So Refrigerator and Washing Machine can provide different
implementations of the 'color' attribute, such as different data
domains. IOW, we resolve object identity at the leaf level of the tree
through inheritance. Thus unique data domains can be associated with
subclasses even though the responsibility is identified in a superclass.
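
One way this could look in Python, as a sketch only: the superclass names the 'color' responsibility, and each leaf subclass supplies its own data domain. The enum names, the color_domain attribute, and the specific colors are all hypothetical choices made for the illustration.

```python
from enum import Enum


class FridgeColor(Enum):
    WHITE = "white"
    STEEL = "steel"


class WasherColor(Enum):
    WHITE = "white"
    CHARTREUSE = "chartreuse"


class Appliance:
    """Superclass identifies the responsibility: color of an Appliance."""
    color_domain: type  # each leaf subclass supplies its own data domain

    def __init__(self, color: Enum) -> None:
        # The superclass is not instantiated separately from a leaf.
        if type(self) is Appliance:
            raise TypeError("Appliance is not instantiated on its own")
        if not isinstance(color, self.color_domain):
            raise ValueError(
                f"{color!r} is not a valid {type(self).__name__} color")
        self.color = color


class Refrigerator(Appliance):
    color_domain = FridgeColor


class WashingMachine(Appliance):
    color_domain = WasherColor


fridge = Refrigerator(FridgeColor.STEEL)
washer = WashingMachine(WasherColor.CHARTREUSE)
```

Here a chartreuse Refrigerator is rejected while a chartreuse Washing Machine is accepted, even though both inherit the same 'color' responsibility from Appliance.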


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
in...@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH

v4vijayakumar

May 7, 2007, 1:01:59 AM
On May 3, 11:55 pm, "H. S. Lahman" <h.lah...@verizon.net> wrote:
> Responding to V4vijayakumar...
>
> I really hope this was not a homework question... B-)
>

no.

> > How can an object-oriented design be normalized? Is normalization
> > related to OOD at all?
>
> The Class Model in UML is underlain by the same relational data model
> branch of set theory that underlies DBMSes. As a result the Class Model
> needs to be normalized to Third Normal Form just like an RDB schema.
> [Normal Forms above third are rarely relevant because they mostly deal
> with identifier conventions and we rarely use explicit identifiers for
> objects.]
>
> Most OOA/D authors don't talk explicitly about normalization. However,
> every OOA/D author will provide a suite of rules for constructing Class
> Models that essentially ensure Third Normal Form under the guise of
> things like one-fact-one-place. For example, the most comprehensive book
> available on Class Modeling is Leon Starr's "Executable UML: How to
> Build Class Models". Leon doesn't even mention Normal Form as far as I
> recall, yet he provides the most comprehensive set of guidelines for
> normalization that I have seen.
>

...

It seems difficult to visualize _normal_ OOD. In a relational context,
we only have data, but in an OOD/A context we have many constructs to
consider, like data, operation, object, responsibility, etc.

I don't know whether this would make any difference; anyhow, not all
data inside objects is going to be stored in a persistent store, like
relational data.

H. S. Lahman

May 7, 2007, 11:26:24 AM
Responding to V4vijayakumar...

> It seems difficult to visualize _normal_ OOD. In a relational context,
> we only have data, but in an OOD/A context we have many constructs to
> consider, like data, operation, object, responsibility, etc.

An object is defined in terms of the properties that it has. The valid
properties happen to be responsibilities to know something and
responsibilities to do something. In an RDB, the only properties are to
know something.

However, the relational model still works quite well if one abstracts to
the level of /properties/ rather than data values. What one normalizes
in an RDB schema are the semantics of properties that happen to be data
values. The same notion of normalization can be applied to the semantics
of a more general notion of 'property' that happens to include ADTs and
behaviors. In the end Normal Form simply describes dependencies of
properties on identity and each other.

> I don't know whether this would make any difference; anyhow, not all
> data inside objects is going to be stored in a persistent store, like
> relational data.

That's true, but it just underscores the notion that one should solve
the customer problem first and then worry about how one gets/saves the
solution's data as a separate problem.

Phlip

May 8, 2007, 11:44:01 PM
v4vijayakumar wrote:

> I don't know whether this would make any difference; anyhow, not all
> data inside objects is going to be stored in a persistent store, like
> relational data.

There is a blazingly simple metric here, which some people are too smart to
bring themselves to utter:

a normalized OOD should not duplicate
any definitions of behavior

That's all there is to it.

In the book /Design Patterns/, for example, you can explain each pattern as a
way to put the common behavior into the controlling, mediating, or abstract
objects, and put all the diverse behavior into the leaf or concrete classes.
Each pattern can be seen as a way to apportion behavior such that all the
decisions and calculations are each unique throughout a design.
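
As one hedged illustration, here is a tiny Template Method sketch in Python. The class names Report, SalesReport, and StockReport and their contents are hypothetical; the point is that the shared decision (how a report is assembled) is written once in the abstract class, and only the parts that differ live in the concrete classes.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    """Abstract class: holds the one definition of the common behavior."""

    def render(self) -> str:
        # The assembly rule is defined here and nowhere else.
        return f"{self.title()}\n{self.body()}"

    @abstractmethod
    def title(self) -> str:
        """Supplied by each concrete class."""

    @abstractmethod
    def body(self) -> str:
        """Supplied by each concrete class."""


class SalesReport(Report):
    def title(self) -> str:
        return "Sales"

    def body(self) -> str:
        return "units sold: 3"


class StockReport(Report):
    def title(self) -> str:
        return "Stock"

    def body(self) -> str:
        return "units on hand: 12"


sales_text = SalesReport().render()
stock_text = StockReport().render()
```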

When you ask about data motion, the motion itself is behavior, and our
database wrappers tend to take care of it. The data is state, not behavior,
so it duplicates freely.

--
Phlip
http://flea.sourceforge.net/PiglegToo_1.html

