
Help! Difficulty understanding DB -> Object mapping


usenet.ne...@gmail.com

May 7, 2005, 3:48:26 PM
Hi, I am hoping you can help.

I have a simple system which allows orders to be created for customers.
I have studied various patterns and the like, and I have the following
data structure in the DB:

Order
=====
Order_ID
Customer_ID
Customer_Address_ID
...

OrderLineItem
=============
Order_ID
Product_ID
Qty
Price
Tax
TotalPrice

I have a number of questions. Is it normal to:

1. Store the full chosen billing and shipping address with the Order,
though the addresses are stored in a separate table?

2. Store the full product details with a Line Item? I currently do not
store the title of the product, just its id.

If I store all of these details in the DB structure, I must still load
them into objects. My problem currently is in loading a customer's
order. I don't know where to join the product in the line items to the
actual product descriptions. Do I do a DB join to get the ProductTitle
back, and put the product title in my OrderLineItem object? Or do I get
back the order line items, and do a secondary search for the product
based on its ID, load the product, and associate the product to the
order object? I'm confused, and concerned about performance problems.
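The two loading strategies being asked about can be sketched side by side. This is a minimal illustration using an in-memory SQLite database; the sample data and the Title column are invented for the example, and only the table names follow the structure above:

```python
import sqlite3

# Hypothetical miniature schema mirroring the tables described above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Product_ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE OrderLineItem (Order_ID INTEGER, Product_ID INTEGER, Qty INTEGER);
INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO OrderLineItem VALUES (10, 1, 2), (10, 2, 1);
""")

# Strategy 1: a single join returns the line items with titles in one query.
rows = con.execute("""
    SELECT oli.Order_ID, oli.Product_ID, oli.Qty, p.Title
    FROM OrderLineItem oli
    JOIN Product p ON p.Product_ID = oli.Product_ID
    WHERE oli.Order_ID = ?
""", (10,)).fetchall()

# Strategy 2: load the line items, then look up each product separately
# (one extra round trip per line item, often called the "N+1" pattern).
items = con.execute(
    "SELECT Product_ID, Qty FROM OrderLineItem WHERE Order_ID = ?", (10,)
).fetchall()
titles = [
    con.execute("SELECT Title FROM Product WHERE Product_ID = ?",
                (pid,)).fetchone()[0]
    for pid, _qty in items
]

print(rows)
print(titles)
```

Both strategies recover the same information; the difference is the number of round trips to the database, which is the performance concern raised here.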

Any advice?


Mike Austin

May 8, 2005, 3:57:00 AM
usenet.ne...@gmail.com wrote:
> Hi, I am hoping you can help.
>
> I have a simple system which allows orders to be created for customers.
> I have studied various patterns and the like, and I have the following
> data structure in the DB:
>
> Order
> =====
> Order_ID
> Customer_ID
> Customer_Address_ID
> ...
>
> OrderLineItem
> =============
> Order_ID
> Product_ID
> Qty
> Price
> Tax
> TotalPrice
>
> I have a number of questions. Is it normal to:
>
> 1. Store the full chosen billing and shipping address with the Order,
> though the addresses are stored in a separate table?

In situations where you need historical accuracy, you need to copy the
data to the order tables. Once an order is processed, changing a
product's title or price, or even worse, deleting products, should not
harm the orders.

> 2. Store the full product details with a Line Item? I currently do not
> store the title of the product, just its id.

Copy anything that may be changed. You can still keep the IDs for reference.

> If I store all of these details in the DB structure, I must still load
> them into objects. My problem currently is in loading a customer's
> order. I don't know where to join the product in the line items to the
> actual product descriptions. Do I do a DB join to get the ProductTitle
> back, and put the product title in my OrderLineItem object? Or do I get
> back the order line items, and do a secondary search for the product
> based on its ID, load the product, and associate the product to the
> order object? I'm confused, and concerned about performance problems.

The fewer the queries the better, of course. You should be able to do
that in one swoop. I'm a little rusty on my SQL, but something like this:

select p.Product_Desc, o.Line_Item
from PRODUCTS as p, ORDER_ITEMS as o
where p.Product_ID = o.Product_ID
and o.Order_ID = <order#>

But since you will be copying data into the orders' table, you won't
need this.

Mike

frebe

May 8, 2005, 4:15:02 AM
> 1. Store the full chosen billing and shipping address with the Order,
> though the addresses are stored in a separate table?
> 2. Store the full product details with a Line Item? I currently do not
> store the title of the product, just its id.

"Storing" is only done in the tables, and the table structure must be
normalized. If you are suggesting the line item object should have a
product title property, it could have one, but it should delegate the
call to the product object.

> If I store all of these details in the DB structure, I must still load
> them into objects. My problem currently is in loading a customer's
> order. I don't know where to join the product in the line items to the
> actual product descriptions.

Whether to join the product table or not depends on the context: do
you need the product information or not? In a client scenario you
probably need it, but in a batch scenario you probably don't. It is
impossible to have only one load method for the line item object.

> Do I do a DB join to get the ProductTitle
> back, and put the product title in my OrderLineItem object? Or do I get
> back the order line items, and do a secondary search for the product
> based on its ID, load the product, and associate the product to the
> order object?

This kind of problem is one of the main reasons for the failure of
entity beans in EJB.

My advice is one of two options. The first is to skip your domain
objects such as line item and product: when you need a line item, write
SQL that selects the information you need, and process the query result
directly.

The other solution is to use a framework that allows you to define your
query in an OO way, and is able to automatically transform the query
result (with or without joins) into a hierarchical object set.

I don't consider writing multiple load and find methods an option;
it is real code bloat. If you want to have domain objects and don't
use a competent framework, you have to sacrifice performance.

Fredrik Bertilsson
http://butler.sourceforge.net

usenet.ne...@gmail.com

May 8, 2005, 5:55:57 AM

> harm the orders.
>

I realised this shortly after posting this message. But then I am
confused about how this data is used with a Product: do I still link
up a Product object in the domain?

At the domain level, how do I store this information? Do I link to a
Product object? I am doing something similar to the above, but
OrderLineItem doesn't have a Product link. This seems odd, as it makes
the Order/OrderLineItem work less like domain objects; this could be
my lack of experience.

> Mike

usenet.ne...@gmail.com

May 8, 2005, 6:05:47 AM

frebe wrote:
> > 1. Store the full chosen billing and shipping address with the Order,
> > though the addresses are stored in a separate table?
> > 2. Store the full product details with a Line Item? I currently do not
> > store the title of the product, just its id.
>
> "Storing" is only done in the tables, and the table structure must be
> normalized. If you are suggesting the line item object should have a
> product title property, it could have one, but it should delegate the
> call to the product object.
>

Mike's suggestion earlier states this too.

> > If I store all of these details in the DB structure, I must still load
> > them into objects. My problem currently is in loading a customer's
> > order. I don't know where to join the product in the line items to the
> > actual product descriptions.
>
> Whether to join the product table or not depends on the context: do
> you need the product information or not? In a client scenario you
> probably need it, but in a batch scenario you probably don't. It is
> impossible to have only one load method for the line item object.
>

I do need to record the title of the product with the line item, and
it seems the price too, for sure. When the customer accesses orders
from his history, he can view the line items of orders, select the
products to go view them, and even download parts of the product
(online). It seems that, from the application, it is necessary to load
the Product object from somewhere. Do I do that within my
repository/mapper layer (Fowler) by calling into the Product mapper,
passing an ID? This seems like too many queries...

> > Do I do a DB join to get the ProductTitle
> > back, and put the product title in my OrderLineItem object? Or do I get
> > back the order line items, and do a secondary search for the product
> > based on its ID, load the product, and associate the product to the
> > order object?
>
> This kind of problem is one of the main reasons for the failure of
> entity beans in EJB.
>
> My advice is to skip your domain objects such as
> line item and product. When you need a line item, write SQL that
> selects the information you need, and process the query result
> directly.
>

1. Use Order/OrderLineItems as stated, with data that may change copied
into the LineItem.
2. To view Order contents in the UI, have a different structure to
bring back the necessary data to be viewed, be that a resultset or
another information object?

Is this what you mean?

> The other solution is to use a framework that allows you to define your
> query in an OO way, and is able to automatically transform the query
> result (with or without joins) into a hierarchical object set.
>

I am using .NET - could you recommend something?

> I don't consider writing multiple load and find methods as an option.
> It is real code bloat. If you want to have domain objects and don't
> use a competent framework, you have to sacrifice performance.
>

This is what I am realising, with some difficulty and pain.

> Fredrik Bertilsson
> http://butler.sourceforge.net

frebe

May 8, 2005, 7:16:28 AM
> I do need to record the title of the product with the line item, that
> seems for sure price too.

Denormalizing data might only be a good idea in data warehouse
solutions. In your transactional database, the schema must be
normalized. The product title (and price) can be retrieved by a very
simple join.

> with data that may change copied into the LineItem.

If you need history in your database, you change the schema to this:
OrderLineItem(Order_ID, Product_ID, ProductVersion,...)
Product(Product_ID, ....)
ProductVersion(ProductVersion, title, price, ....)

Copying all product data into every order line would cause data bloat.
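Joining an order line to its historical title and price under this versioned schema is a single join on the composite key. A minimal sketch in SQLite; the Qty column and all sample data are invented for illustration:

```python
import sqlite3

# Sketch of the versioned schema outlined above (names from the outline,
# extra columns assumed for the example).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Product_ID INTEGER PRIMARY KEY);
CREATE TABLE ProductVersion (
    Product_ID INTEGER, ProductVersion INTEGER, Title TEXT, Price REAL,
    PRIMARY KEY (Product_ID, ProductVersion));
CREATE TABLE OrderLineItem (
    Order_ID INTEGER, Product_ID INTEGER, ProductVersion INTEGER, Qty INTEGER);
INSERT INTO Product VALUES (1);
INSERT INTO ProductVersion VALUES (1, 1, 'Widget', 9.99),
                                  (1, 2, 'Widget Pro', 12.99);
INSERT INTO OrderLineItem VALUES (10, 1, 1, 3);  -- placed against version 1
""")

# The order line keeps showing the title and price as they were when ordered,
# even though a newer version of the product now exists.
row = con.execute("""
    SELECT pv.Title, pv.Price
    FROM OrderLineItem oli
    JOIN ProductVersion pv
      ON pv.Product_ID = oli.Product_ID
     AND pv.ProductVersion = oli.ProductVersion
    WHERE oli.Order_ID = ?
""", (10,)).fetchone()
print(row)
```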

> It seems that from the application, it is necessary to load the Product
> object from somewhere

From the database, directly (using ADO) or indirectly.

> This seems like too many queries...

What is the problem with this? Queries are a good way to fetch the data
you want for the current context.

> To view Order contents at the UI, have a different structure to
> bring back the necessary data to be viewed, be that a resultset or
> other information object?

Using resultsets is the simplest solution, but it is not type-safe and
is error-prone (column names are typed as strings). The domain object
solution will force you to write many find and load methods, unless
you use a competent O/R mapper.

> I am using .NET - could you recommend somehting?

No, I am using Java. But I have heard that ADO is more capable than
JDBC, so I would probably use ADO directly.

Fredrik Bertilsson
http://butler.sourceforge.net

usenet.ne...@gmail.com

May 8, 2005, 9:26:29 AM

frebe wrote:
> > I do need to record the title of the product with the line item, that
> > seems for sure price too.
>
> Denormalizing data might only be a good idea in datawarehouse
> solutions. In your transactional database, the schema must be
> normalized. The product title (and price) can be retrieved by a very
> simple join.
>
> > with data that may change copied into the LineItem.
>
> If you need history in your database, you change the schema to this:
> OrderLineItem(Order_ID, Product_ID, ProductVersion,...)
> Product(Product_ID, ....)
> ProductVersion(ProductVersion, title, price, ....)
>
> Copying all product data into every order line would cause you a data
> bloat.
>

This sounds like a reasonable idea. Are you putting the main
changeable details in ProductVersion? (It seems that way, but I want
to check that I am reading it properly.)

> > It seems that from the application, it is necessary to load the Product
> > object from somewhere
>
> From the database, directly (using ADO) or indirectly.
>
> > This seems like too many queries...
>
> What is the problem with this? Queries are a good way to fetch the data
> you want for the current context.
>

The issue with queries, which I thought you were referencing, was that
I will need to keep going back to the database to load up all of the
different objects referenced by the Order/OrderLineItem. From my
computer science course I have learnt that many journeys to the
database are a performance issue. Would you agree?

> > To view Order contents at the UI, have a different structure to
> > bring back the necessary data to be viewed, be that a resultset or
> > other information object?
>
> Using resultsets is the simplest solution, but it is not type-safe and
> is error-prone (column names are typed as strings). The domain object
> solution will force you to write many find and load methods, unless
> you use a competent O/R mapper.
>

Yes, this is what I have.

> > I am using .NET - could you recommend something?
>
> No, I am using java. But I have heard that ADO is more capable than
> JDBC, so I would probably use ADO directly.
>

Fredrik, thank you.

> Fredrik Bertilsson
> http://butler.sourceforge.net

frebe

May 8, 2005, 10:50:40 AM
>> If you need history in your database, you change the schema to this:
>> OrderLineItem(Order_ID, Product_ID, ProductVersion,...)
>> Product(Product_ID, ....)
>> ProductVersion(ProductVersion, title, price, ....)
> This sounds like a reasonable idea. Are you putting the main
> changeable details in ProductVersion (it seems that way but want to
> check I am reading properly).

Yes, every attribute to a product that you want to track history
changes for, should be in the ProductVersion table.

> The issue with queries, which I thought you were referencing, was that
> I will need to keep going back to the database to load up all of the
> different objects referenced by the Order/OrderLineItem. From my
> computer science course I have learnt that many journeys to the
> database are a performance issue. Would you agree?

Yes, I agree fully. Joining all data is much better.

Fredrik Bertilsson
http://butler.sourceforge.net

Mike Austin

May 8, 2005, 3:29:09 PM
frebe wrote:
>>I do need to record the title of the product with the line item, that
>>seems for sure price too.
>
>
> Denormalizing data might only be a good idea in datawarehouse
> solutions. In your transactional database, the schema must be
> normalized. The product title (and price) can be retrieved by a very
> simple join.
>
>
>>with data that may change copied into the LineItem.
>
>
> If you need history in your database, you change the schema to this:
> OrderLineItem(Order_ID, Product_ID, ProductVersion,...)
> Product(Product_ID, ....)
> ProductVersion(ProductVersion, title, price, ....)
>
> Copying all product data into every order line would cause you a data
> bloat.

I see your point here, but orders may be volatile - you may want to
change a description or some other data per customer per order - maybe a
customized version of a product. You don't want to add a new product
for every small one-off change.

Also, if you basically move everything to ProductVersion, you might as
well consolidate the versions into one table, no?

PRODUCT: Product_ID, Product_Version, Title, Price
ORDER_ITEM: Order_ID, Product_ID, Product_Version

Mike

H. S. Lahman

May 8, 2005, 4:52:52 PM
Responding to Usenet.news.account...

My response assumes that your application does something beyond basic
USER/CRUD pipeline processing. If that's all it does, then I am not
sure an OO approach is the best one. A RAD solution or some pseudo-OO
layered model infrastructure would probably be a better choice and you
can ignore this response.

> I have a simple system which allows orders to be created for customers.
> I have studied various patterns and the like, and I have the following
> data structure in the DB:
>
> Order
> =====
> Order_ID
> Customer_ID
> Customer_Address_ID
> ...
>
> OrderLineItem
> =============
> Order_ID
> Product_ID
> Qty
> Price
> Tax
> TotalPrice
>
> I have a number of questions. Is it normal to:

The first thing to do is forget about what the RDB looks like. That is
only relevant to accessing persistence. Solve the customer's problem
first and then worry about how objects in that solution will map into
tables in the RDB.

>
> 1. Store the full chosen billing and shipping address with the Order,
> though the addresses are stored in a separate table?

The RDB schema you suggest above violates 3NF. The customer address
depends on the Customer identity, not the Order identity. If addresses
are stored in a separate table, then the Customer_Address_ID should be
in the Customer table, not the Order table.

However, this illustrates why one should solve the software problem
first. One can conceive of contexts where Customer and Address are not
separate objects. That is, if one processes one order at a time in a
POS client, then the customer name and address could be unique to the
Order and both are just attributes of that one entity. That's because
the fact that the customer may have 14 different shipping addresses in
10 states doesn't matter; the POS system only "sees" one shipping
address for the order.

>
> 2. Store the full product details with a Line Item? I currently do not
> store the title of the product, just its id.

In the RDB there will undoubtedly be a separate table for Product.
That's because things like product descriptions are unique to the
Product, not the OrderLineItem. Some Product attributes, like
description, would appear redundantly in OrderLineItems from different
Orders while other Product attributes, like screw thread size, might be
unique to a particular Product or class of products and wouldn't appear
at all in some OrderLineItems.

OTOH, in the client application there might be good reason for having at
least some of the Product information in OrderLineItem. For example,
for display in a Shopping Cart one would want the details and they would
be associated uniquely with the OrderLineItem. That's OK because the
Shopping Cart abstraction <usually> constrains the problem so that there
would be only one OrderLineItem per Product (i.e., multiples are handled
solely through Quantity). IOW, {Order_ID, Product_ID} is equivalent to
{Order_ID, Order_Item_ID}.

>
> If I store all of these details in the DB structure, I must still load
> them into objects. My problem currently is in loading a customers
> order. I dont know where to join the product in the line items to the
> actual product descriptions. Do I do a DB join to get the ProductTitle
> back, and put the product title in my OrderLineItem object? Or do I get
> back the order line items, and do a secondary search for the product
> based on its ID, load the product, and associate teh product to the
> order obejct? I'm confused, and concerned about performances problem.

Deal with persistence in a subsystem or layer that understands the
mapping rules (i.e., the mapping of object/attribute identity to
table/field identity). That allows you to address performance problems
in RDB access in that subsystem or layer so that it does not affect your
problem solution logic.

The interface to the subsystem is designed around the problem solution's
view of the data. So if the problem solution has a single object for
Order with attributes for {customer_address, customer_city, etc.} it
would request, say, saving that data via saveOrder (order_ID,
customer_name, customer_address, ....) with all the attributes in the
Order object. The persistence access subsystem needs to map that notion
of Order into the actual tables in the RDB.

To do that the persistence access subsystem needs to know more about
identity. For example, it needs a Customer_ID. One way to do that is
via a table lookup in the persistence access subsystem between
customer_name and customer_ID. Such a table could be created
efficiently at startup with a single RDB query. That would save asking
the RDB for the customer_ID given the customer_name just so that one can
write a SQL query using customer_ID.
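That startup lookup might be sketched as follows; the table layout, the `save_order` entry point, and the sample data are all hypothetical illustrations of the idea, not anything from the original posts:

```python
import sqlite3

# Hypothetical customer table standing in for the RDB.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob');
""")

# One query at startup builds the name -> ID lookup held inside the
# persistence access subsystem, so later saves need not hit the RDB
# just to resolve a Customer_ID.
id_by_name = dict(
    con.execute("SELECT Customer_Name, Customer_ID FROM Customer").fetchall()
)

def save_order(order_id, customer_name, address):
    """Hypothetical subsystem entry point: maps the problem solution's
    view of an Order onto table identities without an extra round trip."""
    customer_id = id_by_name[customer_name]
    # In a real subsystem this tuple would become an INSERT into the
    # Order table; here we just return it for inspection.
    return (order_id, customer_id, address)

print(save_order(10, 'Alice', '1 Main St'))
```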

Alternatively, if the customer data usually comes from the RDB
originally (e.g., the RDB is read to get the default customer
information to populate the form), then it might be more convenient for
the Order object to store customer_ID when the data is read. In that
case it is a handle just like a Window handle in a GUI -- just something
one hands back to the persistence access subsystem to keep identity
synchronized.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
h...@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH

frebe

May 9, 2005, 2:39:34 AM
> I see your point here, but orders may be volatile - you may want to
> change a description or some other data per customer per order -
> maybe a customized version of a product.

Yes, if you are producing customized products, this is probably a good
design.

> Also, if you basically move everything to ProductVersion, you might as
> well consolidate the versions into one table, no?
> PRODUCT: Product_ID, Product_Version, Title, Price
> ORDER_ITEM: Order_ID, Product_ID, Product_Version

I would still have a product table with only Product_ID as the primary
key. There might be information for which you don't need to track
history changes. And when a customer selects a product, he/she selects
a Product_ID; the customer cannot select a version. He/she will always
get the latest version.
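That latest-version lookup can be sketched as a simple query against the ProductVersion outline from earlier in the thread (SQLite, invented sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ProductVersion (
    Product_ID INTEGER, ProductVersion INTEGER, Title TEXT, Price REAL,
    PRIMARY KEY (Product_ID, ProductVersion));
INSERT INTO ProductVersion VALUES
    (1, 1, 'Widget', 9.99), (1, 2, 'Widget Pro', 12.99),
    (2, 1, 'Gadget', 4.50);
""")

# The storefront always resolves a Product_ID to its highest version
# number, i.e. the latest version of the product.
latest = con.execute("""
    SELECT Title, Price
    FROM ProductVersion
    WHERE Product_ID = ?
    ORDER BY ProductVersion DESC
    LIMIT 1
""", (1,)).fetchone()
print(latest)
```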

Fredrik Bertilsson
http://butler.sourceforge.net

usenet.ne...@gmail.com

May 9, 2005, 10:13:15 AM
Hello.

H. S. Lahman wrote:
> Responding to Usenet.news.account...
>
> My response assumes that your application does something beyond basic
> USER/CRUD pipeline processing. If that's all it does, then I am not
> sure an OO approach is the best one. A RAD solution or some pseudo-OO
> layered model infrastructure would probably be a better choice and you
> can ignore this response.
>

CRUD is certainly part of it, not all of it however.

I asked my boss about this. Billing address details are stored against
the customer order, in case the customer deletes addresses from their
account. Reviewing old orders would still require the address to be
shown, and so linking would not be valid. From other mailings it would
seem that maybe an Order-to-OrderShipping table could be used to
remove the duplication (Order_Id, Customer_ID, Customer_Address_ID)?

> However, this illustrates why one should solve the software problem
> first. One can conceive of contexts where Customer and Address are not
> separate objects. That is, if one processes one order at a time in a
> POS client, then the customer name and address could be unique to the
> Order and both are just attributes of that one entity. That's because
> the fact that the customer may have 14 different shipping addresses in
> 10 states doesn't matter; the POS system only "sees" one shipping
> address for the order.
>

This is an eCommerce site using object orientation.

> >
> > 2. Store the full product details with a Line Item? I currently do not
> > store the title of the product, just its id.
>
> In the RDB there will undoubtedly be a separate table for Product.
> That's because things like product descriptions are unique to the
> Product, not the OrderLineItem. Some Product attributes, like
> description, would appear redundantly in OrderLineItems from different
> Orders while other Product attributes, like screw thread size, might be
> unique to a particular Product or class of products and wouldn't appear
> at all in some OrderLineItems.
>

Yes, this is the case. The products are simple, and there is no class
of products in this system. If the order references a product whose
name changes, the old orders need to show the actual name used when
ordered. The version concept by Fredrik sounds good, do you think?

> OTOH, in the client application there might be good reason for having at
> least some of the Product information in OrderLineItem. For example,
> for display in a Shopping Cart one would want the details and they would
> be associated uniquely with the OrderLineItem. That's OK because the
> Shopping Cart abstraction <usually> constrains the problem so that there
> would be only one OrderLineItem per Product (i.e., multiples are handled
> solely through Quantity). IOW, {Order_ID, Product_ID} is equivalent to
> {Order_ID, Order_Item_ID}.
>

My Cart has CartItem objects; a CartItem has a Product object and a
quantity. This is the same but a different problem. Viewing old
orders, I must produce product-related data for the order. If I begin
believing I don't need to store the product items in the order, I must
hook up OrderLineItem objects to Product objects, and this is now the
essence of my earlier posting. For performance, how do I do this as
well as possible using objects?

> >
> > If I store all of these details in the DB structure, I must still load
> > them into objects. My problem currently is in loading a customer's
> > order. I don't know where to join the product in the line items to the
> > actual product descriptions. Do I do a DB join to get the ProductTitle
> > back, and put the product title in my OrderLineItem object? Or do I get
> > back the order line items, and do a secondary search for the product
> > based on its ID, load the product, and associate the product to the
> > order object? I'm confused, and concerned about performance problems.
>
> Deal with persistence in a subsystem or layer that understands the
> mapping rules (i.e., the mapping of object/attribute identity to
> table/field identity). That allows you to address performance problems
> in RDB access in that subsystem or layer so that it does not affect
> your problem solution logic.
>

OK, fine. The default today is that datasets come back mixed with
OrderLineItem info and Product info. The ProductMapper is requested to
return the Product object for the Order, but my fear is that the
ProductMapper gets a resultset full of OrderLineItem and Product data,
and the SQL query for Orders has to know all about Products. I
realized that if I change the Product table, I need to change
Order-related queries! Does this sound at all fine?

> The interface to the subsystem is designed around the problem solution's
> view of the data. So if the problem solution has a single object for
> Order with attributes for {customer_address, customer_city, etc.} it
> would request, say, saving that data via saveOrder (order_ID,
> customer_name, customer_address, ....) with all the attributes in the
> Order object. The persistence access subsystem needs to map that notion
> of Order into the actual tables in the RDB.
>

Objects in the upper system, other data (and IDs) below this system?
What I am keen on is not returning to the DB more than once, even
though other data might be needed in this lower system. How does this
work?

> To do that the persistence access subsystem needs to know more about
> identity. For example, it needs a Customer_ID. One way to do that is
> via a table lookup in the persistence access subsystem between
> customer_name and customer_ID. Such a table could be created
> efficiently at startup with a single RDB query. That would save asking
> the RDB for the customer_ID given the customer_name just so that one can
> write a SQL query using customer_ID.

In this subsystem, are you in the DB or in a lower part of the OO side
of the design? I am confused.


>
> Alternatively, if the customer data usually comes from the RDB
> originally (e.g., the RDB is read to get the default customer
> information to populate the form), then it might be more convenient for
> the Order object to store customer_ID when the data is read. In that
> case it is a handle just like a Window handle in a GUI -- just something
> one hands back to the persistence access subsystem to keep identity
> synchronized.
>

Order is linked to Account. Order goes down into OrderMapper, and
OrderMapper knows Account.ID is the customer's ID to be used for the
table linkage.

Does what I say make sense? Am I being stupid?

H. S. Lahman

May 9, 2005, 2:25:41 PM
Responding to Usenet.news.account...

It still violates 3NF. I also don't buy the address-deleted argument.
Presumably the address is deleted because it is no longer valid, so why
would one want to use that rather than the address currently associated
with the Customer?

[However, one could make a good case for a Shipping Address being
associated with the Order because the Customer might want the Order
shipped to some place special. But the implication here was that this
was the Customer's business address.]

Note that reviewing orders is not relevant. Typically one needs more
information to provide context for such reviews so one will be accessing
the RDB using table joins anyway. The issue here is how to store the
data in the RDB for general access unambiguously and in a manner best
suited to ensuring data integrity, not how to display it for special
purposes.

>>However, this illustrates why one should solve the software problem
>>first. One can conceive of contexts where Customer and Address are not
>>separate objects. That is, if one processes one order at a time in a
>>POS client, then the customer name and address could be unique to the
>>Order and both are just attributes of that one entity. That's because
>>the fact that the customer may have 14 different shipping addresses in
>>10 states doesn't matter; the POS system only "sees" one shipping
>>address for the order.
>>
>
>
> This is eCommerce site using object orientation.

OK. But the number of objects still depends on how one needs to
abstract the notion of Order for the particular business context. The
problem in hand may only need a simplistic view of Order data compared
to the view of an enterprise database. If the problem solution never
has any need to think of "a customer" as anything more than a name and
address on an Order, then one does not need to abstract a Customer
entity; there is no ambiguity if the name and address are regarded as
fully dependent on the Order_ID. That's because at that level of
abstraction they are.

[Or it may need a more complex view. A telephone number is a classic
example. Most DBAs will define the level of abstraction of the RDB
schema so that a telephone number is a simple data domain for 1NF.
However, some applications need to view the number in terms of {area
code, exchange, number}. In that case the number is not a simple domain
and must be expressed as multiple attributes in the application.]

>>>2. Store the full product details with a Line Item? I currently do not
>>>store the title of the product, just its id.
>>
>>In the RDB there will undoubtedly be a separate table for Product.
>>That's because things like product descriptions are unique to the
>>Product, not the OrderLineItem. Some Product attributes, like
>>description, would appear redundantly in OrderLineItems from different
>>Orders while other Product attributes, like screw thread size, might be
>>unique to a particular Product or class of products and wouldn't appear
>>at all in some OrderLineItems.
>>
>
>
> Yes, this is the case. The products are simple, and there are no class
> of products in this system. If the order references a product whose
> name changes, the old orders need to show the actual name used when
> ordered. The Version concept by Fredrick sounds good do you think?

I think Bertilsson was addressing a different problem. If things like
the Product description change over time AND you have a need to keep
track of that when referring back to Orders, then some kind of
versioning will be necessary. However, that seems unusual; usually
such basic changes would be reflected in a new Product_ID with some
linkage for equivalent Products that may be substituted when
fulfilling an Order. That is, one would have

Product:
Product_ID
Description
...
Equivalent_Product_ID // or Obsolete_Product_Id or whatever

That is, one would have a conditional reflexive relationship on the
Product table to find equivalents, replacements, etc.
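
As a rough sketch of that conditional reflexive relationship, here is how
it might look using Python's sqlite3 (the table and column names follow the
schema above; the data and the resolution logic are my own assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        Product_ID            INTEGER PRIMARY KEY,
        Description           TEXT,
        Equivalent_Product_ID INTEGER REFERENCES Product(Product_ID)
    );
    -- Product 1 was superseded by Product 2
    INSERT INTO Product VALUES (1, 'Widget v1', 2);
    INSERT INTO Product VALUES (2, 'Widget v2', NULL);
""")

def current_product(product_id):
    """Follow the reflexive relationship until no replacement exists."""
    query = ("SELECT Product_ID, Description, Equivalent_Product_ID "
             "FROM Product WHERE Product_ID = ?")
    row = conn.execute(query, (product_id,)).fetchone()
    while row[2] is not None:
        row = conn.execute(query, (row[2],)).fetchone()
    return row

# An old OrderLineItem still stores Product_ID 1; we resolve the
# current equivalent when fulfilling.
print(current_product(1)[:2])   # (2, 'Widget v2')
```

The OrderLineItem keeps its original foreign key, so old orders remain
accurate, and the chain is only walked when a substitute is needed.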

In your context here the correct RDB approach is to just provide the
foreign key for Product_ID in the OrderLineItem table. But apropos of
my original point, in the /application/ one may need the joined view
with information from both tables. Whether one expresses that in the
application with separate, related objects or a single object reflecting
the join depends on what problem one is trying to solve.

>>OTOH, in the client application there might be good reason for having at
>>least some of the Product information in OrderLineItem. For example,
>>for display in a Shopping Cart one would want the details and they would
>>be associated uniquely with the OrderLineItem. That's OK because the
>>Shopping Cart abstraction <usually> constrains the problem so that there
>>would be only one OrderLineItem per Product (i.e., multiples are handled
>>solely through Quantity). IOW, {Order_ID, Product_ID} is equivalent to
>>{Order_ID, Order_Item_ID}.
>>
>
>
> My Cart has CartItem objects. CartItem has Product object, and
> quantity. This is the same but a different problem. Viewing old orders, I
> must produce product-related data for the order. If I begin believing
> I don't need to store the product items in the order, I must hook up
> OrderLineItem objects to Product objects, and this is now the essence
> of my earlier posting. For performance, how do I do this as best as
> possible using Objects?

This is the problem Bertilsson was addressing. Once again, the way one
handles it (e.g., versioning vs. reflexive relationship) in the RDB and
the way one handles it in the application may be different. The RDB
view will be created <hopefully> to optimize the RDB data storage and
general purpose access. The application view, though, will be created
to optimize the solution to the problem in hand. One must then provide
a mapping between them that allows the RDB to be accessed efficiently,
given its view. Solve the application problem first, then worry about
the mapping, and finally worry about efficient RDB access from the
subsystem.

Note that your application may not need to know anything about changes
to the database. From its viewpoint one may only want to display what
the customer /originally/ ordered. Then it would be up to the RDB
access subsystem to properly access the RDB to obtain the information,
regardless of how the RDB actually stored it.

OTOH, it seems more likely that one needs both the original and current
views because, in effect, a substitution has been made from the
customer's viewpoint and that is probably of interest when communicating
with the customer. In that case the application will have some way to
organize both views _that is convenient for the processing in hand_.
The job of the DB access subsystem is to map those views into whatever
views the DBA decided to use in the RDB schema.

>>>If I store all of these details in the DB structure, I must still load
>>>them into objects. My problem currently is in loading a customer's
>>>order. I don't know where to join the product in the line items to the
>>>actual product descriptions. Do I do a DB join to get the ProductTitle
>>>back, and put the product title in my OrderLineItem object? Or do I get
>>>back the order line items, and do a secondary search for the product
>>>based on its ID, load the product, and associate the product to the
>>>order object? I'm confused, and concerned about performance problems.
>
>>Deal with persistence in a subsystem or layer that understands the
>>mapping rules (i.e., the mapping of object/attribute identity to
>>table/field identity). That allows you to address performance problems
>>in RDB access in that subsystem or layer so that it does not affect your
>>problem solution logic.
>>
>
>
> OK... Fine. The default today is that datasets come back
> mixed with OrderLineItem info and Product info. The ProductMapper is
> requested to return the Product object for the Order, but the fear I
> have is that ProductMapper gets a resultset full of OrderLineItem and
> Product data, and the SQL query for Orders has to know all about
> Products. I realized that if I change the Product table, I need to
> change Order-related queries! Does this at all sound fine?

Optimizing things like SQL queries depends upon the specific RDB schemas
and the way the particular applications needs to access data. (The
design issues for this could occupy an entire college-level course.) So
there is no single magic answer. However, one can go a long way by
realizing a few simple things:

(1) Fewer DB accesses are better than many.

(2) Virtual memory is cheap.

(3) DB access time will usually be the major bottleneck in the
application unless it is a scientific application or does dazzling graphics.

(4) DBAs hate long transactions (multiple reads/writes while locking the
data) because they tie up resources.

(5) Accesses using joins should be minimized.

Basically what this means is that one should grab as much data as one is
/likely/ to need all at once even if sometimes it isn't all used. The
bottleneck is getting it into memory as datasets; extracting particular
information from the datasets in memory is the fast part. Prefer single
complex queries to multiple simple queries. If possible, create
specialized indices and store "compiled" joins for queries that are
commonly used. When mapping to the problem solution's needs when
performance is a big problem, look for ways to use write caching or
anticipatory reads. Cache requests rather than opening long
transactions whenever possible, especially if the data is from user
keyboard entry.
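
To illustrate the "grab it all at once" point, here is a sketch using
Python's sqlite3: one complex join pulls an order's line items and their
product descriptions in a single access, instead of one query per line
item (the column names beyond the schema shown earlier are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product       (Product_ID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE OrderLineItem (Order_ID INTEGER, Product_ID INTEGER,
                                Qty INTEGER, Price REAL);
    INSERT INTO Product VALUES (10, 'Gizmo'), (11, 'Gadget');
    INSERT INTO OrderLineItem VALUES (1, 10, 2, 4.50), (1, 11, 1, 9.00);
""")

# One complex query instead of N simple ones: the join runs once and the
# whole order lands in memory as a single dataset.
rows = conn.execute("""
    SELECT li.Order_ID, li.Product_ID, p.Description, li.Qty, li.Price
    FROM   OrderLineItem li
    JOIN   Product p ON p.Product_ID = li.Product_ID
    WHERE  li.Order_ID = ?
    ORDER BY li.Product_ID
""", (1,)).fetchall()

for _, _, desc, qty, price in rows:
    print(f"{qty} x {desc} @ {price}")
```

Extracting fields from `rows` in memory is then the fast part; the single
round trip to the DB was the expensive one.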

>>The interface to the subsystem is designed around the problem solution's
>>view of the data. So if the problem solution has a single object for
>>Order with attributes for {customer_address, customer_city, etc.) it
>>would request, say, saving that data via saveOrder (order_ID,
>>customer_name, customer_address, ....) with all the attributes in the
>>Order object. The persistence access subsystem needs to map that notion
>>of Order into the actual tables in the RDB.
>>
>
>
> Objects in upper system, other data (and IDs) below this system? What
> I am hoping for is not to return to the DB more than once, even though
> other data might be needed in this lower system. How does this work?

What I am talking about here is a variation on the layered models used
for RAD processing. One deals with accessing the RDB in a dedicated
layer or subsystem that is encapsulated behind a generic data transfer
interface. One way to think of the interface is that it is a GoF Facade
pattern. That interface class takes the problem solution's requests and
dispatches them to the objects in the subsystem/layer that model the RDB
view.
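
A minimal sketch of such a Facade, with stub mapper objects standing in
for the classes that would actually model the RDB view (all names here
are hypothetical, not from the original posts):

```python
class CustomerMapper:
    """Stub standing in for the object that models the Customer table;
    in the real subsystem its methods would issue SQL."""
    def __init__(self):
        self._ids = {"Alice": 42}   # would be loaded from the RDB

    def id_for(self, name):
        return self._ids[name]


class OrderMapper:
    """Stub standing in for the object that writes Order/OrderLineItem rows."""
    def __init__(self):
        self.saved = []

    def insert(self, order_id, customer_id, address, line_items):
        self.saved.append((order_id, customer_id, address, line_items))


class PersistenceFacade:
    """GoF Facade: the problem solution's single entry point to the RDB
    access subsystem.  Its method names mirror the solution's view of an
    Order (customer fields embedded), not the schema's tables."""
    def __init__(self, orders, customers):
        self._orders = orders
        self._customers = customers

    def save_order(self, order_id, customer_name, address, line_items):
        # Map the solution's identity (name) to the RDB's identity (ID),
        # then dispatch; the caller never sees tables, tuples, or SQL.
        customer_id = self._customers.id_for(customer_name)
        self._orders.insert(order_id, customer_id, address, line_items)


facade = PersistenceFacade(OrderMapper(), CustomerMapper())
facade.save_order(1, "Alice", "1 Main St", [(10, 2)])
```

The point of the pattern is that the rest of the application talks only
to the Facade, so the RDB schema can change without touching the
problem solution.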

>>To do that the persistence access subsystem needs to know more about
>>identity. For example, it needs a Customer_ID. One way to do that is
>>via a table lookup in the persistence access subsystem between
>>customer_name and customer_ID. Such a table could be created
>>efficiently at startup with a single RDB query. That would save asking
>>the RDB for the customer_ID given the customer_name just so that one can
>>write a SQL query using customer_ID.
>
>
> In this subsystem are you in the DB or in a lower part of the OO side
> of the design? Confusion.

The subsystem is part of the application. (When done properly, such
subsystems are often reusable across applications.) However, its
subject matter is the RDB paradigm rather than the business paradigm
being applied in the rest of the application. So while the problem
solution in the rest of the application abstracts in terms of Customer,
Order, and Product objects, the RDB access subject matter abstracts in
terms of Table, Tuple, and Dataset objects. IOW, though the subsystem
lives in the application, its mission in life is to understand the RDB
paradigm and efficiently map the application's business semantics into
that view.
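
The identity lookup table mentioned in the quoted passage (customer_name
to customer_ID, built once at startup with a single query) might look
like this as a sketch with sqlite3; the column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob');
""")

# Built once at startup with one query; afterwards the subsystem can
# write SQL using Customer_ID without a round trip per name lookup.
customer_id_by_name = dict(
    conn.execute("SELECT Name, Customer_ID FROM Customer"))

print(customer_id_by_name['Alice'])   # 1
```

The trade-off, of course, is that the table must be refreshed if
customers are added while the application runs.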

The notion of "lower" is not the same as in the RAD layered models.
Here it really reflects a level of service. The RDB access subsystem
provides a low-level service for the problem solution: talking to the
database. Generally such subsystems will be at the same level of
abstraction as UI subsystems, whose mission is the low-level service of
talking to the user through the hardware. So in a RAD layered model the
UI and the RDB would be on opposite ends of the model; in more complex
applications they are on roughly the same level. Both encapsulate very
unique, low level computing space paradigms (e.g., GUI/browser/smoke
signals or RDB/OODB/flat files) so that the actual mechanisms are
completely transparent to the application solution.


>
>>Alternatively, if the customer data usually comes from the RDB
>>originally (e.g., the RDB is read to get the default customer
>>information to populate the form), then it might be more convenient for
>>the Order object to store customer_ID when the data is read. In that
>>case it is a handle just like a Window handle in a GUI -- just something
>>one hands back to the persistence access subsystem to keep identity
>>synchronized.
>>
>
>
> Order is linked to Account. Order goes down into OrderMapper and
> OrderMapper knows Account.ID is the customer's ID to be used for the
> table linkage.

Alas, OrderMapper sounds suspicious to me. It seems to be some sort of
god object that coordinates other objects. If so, that is not a very OO
way to do things. Objects should abstract intrinsic knowledge and
behaviors from identifiable problem space (business) entities. One then
connects the dots for the overall problem solution by connecting those
responsibilities with peer-to-peer messages.

OrderMapper sounds like a role that a person would play if there was no
software rather than an identifiable business entity. But the software
is replacing the person by distributing what the person does among the
objects we abstract for the solution. When we abstract business
entities that are inanimate we anthropomorphize them with the behaviors
a person would have executed.

If we simply create an object to do everything the person would do we
concentrate the entire solution in that object. A major goal of OO
development is to manage complexity by distributing and encapsulating
the behaviors across many objects.

OTOH, the Facade pattern I suggested above does a very similar thing as
what you describe. The Facade would provide the mapping that knows that
an Account_ID in the Order is mapped into a Customer_ID in the Customer
table of the RDB when doing joins and whatnot. That is, the Facade
provides a mapping between identity in the application solution
semantics and the RDB semantics. So if that is all OrderMapper is
doing, then I applaud it. B-)
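
If that is indeed all OrderMapper does, its essence can be shown in a
few lines: the application speaks Account.ID, and only the mapper knows
that it maps onto Customer_ID in the schema. (A sqlite3 sketch; the
table contents are invented for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER);
    INSERT INTO "Order" VALUES (100, 42), (101, 42), (102, 7);
""")

def orders_for_account(account_id):
    """The rest of the application identifies a customer by Account.ID;
    only this mapping knows it corresponds to Customer_ID in the RDB."""
    sql = 'SELECT Order_ID FROM "Order" WHERE Customer_ID = ? ORDER BY Order_ID'
    return [row[0] for row in conn.execute(sql, (account_id,))]

print(orders_for_account(42))   # [100, 101]
```

Kept this narrow, the mapper is a translation between identity schemes
rather than a god object coordinating the whole solution.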

Daniel Parker

May 10, 2005, 1:29:05 AM
"H. S. Lahman" <h.la...@verizon.net> wrote in message
news:FWNfe.6048$EC6.2329@trndny06...

>
> Optimizing things like SQL queries depends upon the specific RDB schemas
> and the way the particular applications needs to access data. (The design
> issues for this could occupy an entire college-level course.) So there is
> no single magic answer. However, one can go a long way by realizing a few
> simple things:
>
> (1) Fewer DB accesses are better than many.
>
> (2) Virtual memory is cheap.
>
> (3) DB access time will usually be the major bottleneck in the application
> unless it is a scientific application or does dazzling graphics.
>
> (4) DBAs hate long transactions (multiple reads/writes while locking the
> data) because they tie up resources.

Right. Keep transactions short, where possible.


>
> (5) Accesses using joins should be minimized.
>
> Basically what this means is that one should grab as much data as one is
> /likely/ to need all at once even if sometimes it isn't all used. The
> bottleneck is getting it into memory as datasets; extracting particular
> information from the datasets in memory is the fast part.

The problem here is that data loaded into memory has to be managed: it has
to be invalidated if data on the server has changed, and this is generally
not trivial. There is a strong case to be made for getting the data from
the DBMS each time, for simplicity. The performance is usually fine; DBMS's
are fast, and they cache too. Also there is the issue of data volume:
granted, memory is cheap, but with data the only numbers that count are
zero, one, and as many as you like.

> Prefer single complex queries to multiple simple queries.

Yes, but doesn't that contradict your advice to minimize joins? Most DBMS's
have been able to give good performance with joins for some time, although
in the past you had to have a pretty good knowledge of how the query was
executed in order to achieve that performance. When I worked with Sybase
many years ago, we fondly referred to the "query optimizer" as a "query
pessimizer", and the idea was to work around its shortcomings. But I think
most DBMS vendors today have pretty good support for joins.

> If possible, create specialized indices and store "compiled" joins for
> queries that are commonly used.

Not necessarily. "Compiled" queries, say in stored procedures, are
typically tokenized on the first invocation, and a query plan is computed
based on the passed parameters for that invocation. If those parameter
values are atypical, the query plan may be off. Besides, processors are so
fast these days that the time to compile a query is minuscule compared to
the time to retrieve the data.

> When mapping to the problem solution's needs when performance is a big
> problem, look for ways to use write caching or anticipatory reads. Cache
> requests rather than opening long transactions whenever possible,
> especially if the data is from user keyboard entry.
>

The problem with delayed updates is that the cached data may become stale.
Suppose you read a record, cache it, somebody else reads a record, changes a
field and saves it, then you make your save, and overwrite the other user's
change. A common solution to that is optimistic locking. Basically, when
you read your cached data, you also read a last updated timestamp on the
record, and if you try to update when that timestamp has changed, your
update fails.
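
The optimistic-locking scheme described above can be sketched with
sqlite3; a Version counter stands in for the last-updated timestamp,
and all names here are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        Customer_ID INTEGER PRIMARY KEY,
        Address     TEXT,
        Version     INTEGER   -- stands in for the last-updated timestamp
    );
    INSERT INTO Customer VALUES (1, '1 Main St', 1);
""")

def update_address(customer_id, new_address, version_read):
    """Optimistic update: succeed only if nobody changed the row since
    we read it (the WHERE clause compares the version we cached)."""
    cur = conn.execute(
        "UPDATE Customer SET Address = ?, Version = Version + 1 "
        "WHERE Customer_ID = ? AND Version = ?",
        (new_address, customer_id, version_read))
    return cur.rowcount == 1   # False means: stale read, retry

# We cached version 1; another session sneaks in an update first...
conn.execute("UPDATE Customer SET Address = '2 Oak Ave', Version = 2 "
             "WHERE Customer_ID = 1")
print(update_address(1, '3 Elm Rd', version_read=1))  # False: stale read
print(update_address(1, '3 Elm Rd', version_read=2))  # True
```

No locks are held between the read and the write; the cost is that the
loser of the race must re-read and retry.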

Regards,
Daniel Parker


H. S. Lahman

May 10, 2005, 1:54:44 PM
Responding to Parker...

>>Basically what this means is that one should grab as much data as one is
>>/likely/ to need all at once even if sometimes it isn't all used. The
>>bottleneck is getting it into memory as datasets; extracting particular
>>information from the datasets in memory is the fast part.
>
>
> The problem here is that data loaded into memory has to be managed, it has
> to be invalidated if data on the server has changed, and this is generally
> not trivial. There is a strong case to be made for getting the data from
> the DBMS each time, for simplicity. The performance is usually fine, DBMS's
> are fast, and they cache too. Also there is the issue of volumes of data,
> granted, memory is cheap, but with data the only numbers that count are
> zero, one, and as many as you like.

I agree there is no free lunch. Achieving good RDB access performance
requires extra work in a particular application. But the OP expressed a
concern over performance, so I took that as a given. [I would add that
whoever writes the subsystem needs to understand the real trade-offs,
which are not always obvious. Fortunately such subsystems tend to be
highly reusable.]

However, the main thing I had in mind in the quoted paragraph is the
notion that if one has to access a Customer, say to verify the Customer
name provided in the UI, it will usually be more efficient to go ahead
and read all the Customer fields that /might/ be of interest later plus
any data in relevant related tables at that time rather than doing so
with separate accesses on an as-needed basis later. That sort of poor
man's caching is almost always worthwhile.
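
That poor man's caching amounts to this kind of thing (a sqlite3 sketch;
the Customer columns beyond the earlier schema are my assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Name TEXT,
                           Address TEXT, Phone TEXT);
    INSERT INTO Customer VALUES (1, 'Alice', '1 Main St', '555-0100');
""")

def verify_customer(name):
    """We only need to verify that the name exists, but we read every
    field that might be wanted later: one access instead of several."""
    row = conn.execute(
        "SELECT * FROM Customer WHERE Name = ?", (name,)).fetchone()
    return dict(row) if row else None

cached = verify_customer('Alice')
print(cached['Address'])   # already in memory; no second trip to the DB
```

The extra fields ride along nearly for free, since the round trip to the
DB dominates the cost of the access.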

>>Prefer single complex queries to multiple simple queries.
>
>
> Yes, but doesn't that contradict your advice to minimize joins? Most DBMS's
> have been able to give good performance with joins for some time, although
> in the past you had to have a pretty good knowledge of how the query was
> executed in order to achieve that performance. When I worked with Sybase
> many years ago, we fondly referred to the "query optimizer" as a "query
> pessimizer", and the idea was to work around its shortcomings. But I think
> most DBMS vendors today have pretty good support for joins.

Yes, there is a contradiction (aka trade-off). But if one needs the
data, the join is going to have to be done anyway so the issue is when
and how. I also agree that nowadays any problems are more likely to be
a side-effect of the schema than any inherent problems in query
processing. But there is still a fixed overhead in simply processing a
multi-table query, which segues to...

>>If possible, create specialized indices and store "compiled" joins for
>>queries that are commonly used.
>
>
> Not necessarily. "Compiled" queries, say in stored procedures, are
> typically tokenized on the first invocation, and a query plan is computed
> based on the passed parameters for that invocation. If those parameter
> values are atypical, the query plan may be off. Besides, processors are so
> fast these days that the time to compile a query is miniscule compared to
> the time to retrieve the data.

True. But this just comes back to the notion that optimizing RDB access
is a unique subject matter and whoever implements the subsystem had
better be familiar with the problem domain. That's why I told the OP
that there was no single magic solution.

The performance issue you cite here is more of an issue for the RDB
server than the application. The RDB may be processing thousands of
queries a second so those "minuscule" differences tend to add up and the
DBA will thank the application developers for being well behaved. In
addition, depending on how smart the DBMS is, storing the query could
result in permanent indices being set up to service it, which could
significantly improve performance. But, again, the subsystem developer
needs to know about that.

>>When mapping to the problem solution's needs when performance is a big
>>problem, look for ways to use write caching or anticipatory reads. Cache
>>requests rather than opening long transactions whenever possible,
>>especially if the data is from user keyboard entry.
>>
>
> The problem with delayed updates is that the cached data may become stale.
> Suppose you read a record, cache it, somebody else reads a record, changes a
> field and saves it, then you make your save, and overwrite the other users
> change. A common solution to that is optimistic locking. Basically, when
> you read your cached data, you also read a last updated timestamp on the
> record, and if you try to update when that timestamp has changed, your
> update fails.

Sure. Any time the application does its own caching, read or write, one
has to be aware of the risks and implement solutions to mitigate them.
