design question

Chris Seidel

unread,

Sep 5, 2008, 6:03:09 AM9/5/08

to

Hi,

I need to store java objects in my oracle database. These objects have
serveral fields, e.g.

Object1: String f1; String f2; Date f3
Object1: String f1; BigDecimal f2; Boolean f3

e.g.
Object1("Hello", "World", 2008-09-05)
Object2("Foo", 100.20, true)

The database shall scale up at least to 100 million objects.

How to map this to the database?

A)

create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname
varchar2(50), Fieldvalue varchar2(4000));

So each object is stored into multiple rows in this table, one row per
field.
Each value is converted to varchar.

1, 100, Hello
2, 100, World
3, 100, 2008-09-05
4, 101, Foo
5, 101, 100.20
6, 101, 1

B)

create table Object(ObjectID numeric(10), o1_f1 varchar(4000), o1_f2
varchar(4000), o1_f3 date, o2_f1 varchar(4000), o2_f2 numeric(30,2), o2_f3
numeric(1));

1, Hello, World, 2008-09-05, null, null, null
2, null, null, null, Foo, 100.20, 1

So each object is store in a single row.
Values are store in there native format.

This could be optimized that Object1 and Object2 share fields of the same
type.

What design is better A oder B and why?

Thank you!

Mark D Powell

unread,

Sep 5, 2008, 9:25:57 AM9/5/08

to

A relational database is best used to store data in a relational
format with attributes stored as using the appropriate data type so
design B is better because you are storing each attribute in its own
designated column of the correct database data type, that is, a
numeric value in a number data type, a date value in a date data type,
etc ... rather than storing everything as character as the first
design would require.

Storing data this way will make the data available via other means/
methods than just the java application layer with its object building
and handling logic.

IMHO -- Mark D Powell --

Chris Seidel

unread,

Sep 5, 2008, 9:54:07 AM9/5/08

to

Mark D Powell wrote:

> Storing data this way will make the data available via other means/
> methods than just the java application layer with its object building
> and handling logic.

It is not necessary to access the data otherwise than via the java
application.

What kinds of pros and cons have A and B in respect of performance and
scalability?
Are there any queries which can't be done well with A (range, sorting etc)?

Thank you

DA Morgan

unread,

Sep 5, 2008, 10:16:30 AM9/5/08

to

"Best" depends on what you are going to do with it other than store it.

"Best" depends on whether you require it reconstituted as XML at some
point in the future.

"Best" depends, to a lesser extent, on your version which you don't
state (3 decimal places).

My preference for reasons that range from performance to reporting to
storage to relational theory is to shred XML into its constituent
elements storing each with its proper data type.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

gym dot scuba dot kennedy at gmail

unread,

Sep 5, 2008, 10:26:34 AM9/5/08

to

"Chris Seidel" <cse...@arcor.de> wrote in message
news:48c139ff$0$11091$9b4e...@newsspool1.arcor-online.net...

That's what most programmers say. Allow for the flexibility of the data
getting accessed by means other than your program. It will add value to
your efforts.
Jim

Chris Seidel

unread,

Sep 5, 2008, 10:44:59 AM9/5/08

to

DA Morgan wrote:
> "Best" depends on what you are going to do with it other than store
> it.

Query it. I have to find objects fast based on the field data (=, >, <,
between, like)

> "Best" depends on whether you require it reconstituted as XML at some
> point in the future.

No XML needed.

> "Best" depends, to a lesser extent, on your version which you don't
> state (3 decimal places).

A stores all as string, no problem.
For B a new column will be there numeric(x,y) with y >= 3.
But for 99 % of the objects 2 decimal places will be ok.

> My preference for reasons that range from performance to reporting to
> storage to relational theory is to shred XML into its constituent
> elements storing each with its proper data type.

So you would prefer A?
What about the many null columns in this approach?

gym dot scuba dot kennedy at gmail

unread,

Sep 5, 2008, 10:38:52 PM9/5/08

to

"Chris Seidel" <cse...@arcor.de> wrote in message

news:48c145ea$0$11093$9b4e...@newsspool1.arcor-online.net...

Store numbers as numbers, dates as dates, strings as strings etc. Not doing
so will cause you a lot of grief later. Also the optimizer can do things
for proper storage that improper storage won't allow it to do. For example,
with dates the optimizer knows how many possible entries are between two
dates if the dates are represented as dates. It cannot tell if the dates
are represented as strings (since there are a huge number of entries between
two string values).Same with integers and if you have a constraint on the
field the optimizer can take that into account. If you want to do
comparisons between date fields you are going to have to compare strings
converted to dates which can cause a full table scan. For example show me
the results where x rows are within three days of each other. (also you
have the potential of a date stored as a string being an improper date same
with number, eg November 31st, 2008. -0. etc. ) Your model A looks a lot
like the one table EVA model which is sub-optimal. (sucks).

Relational theory has been around for a long time. (over 25 years) It has
been highly effective for large and small businesses. It isa well known
pattern. Use what other people have learned to your advantage. Use the
correct tool for the correct job. Making the db a string bit bucket isn't
going to help your efforts.

Jim

DA Morgan

unread,

Sep 5, 2008, 10:51:58 PM9/5/08

to

Most things have NULL columns ... what's the issue?

If you don't store numbers as numbers you can not do math. If you do
not store dates as dates you can not do date math. Storing everything
as strings made sense on mainframes 20 years ago. It has no place in a
modern RDBMS.

Ana C. Dent

unread,

Sep 5, 2008, 11:19:29 PM9/5/08

to

"Chris Seidel" <cse...@arcor.de> wrote in

news:48c139ff$0$11091$9b4e...@newsspool1.arcor-online.net:

> It is not necessary to access the data otherwise than via the java
> application.

A more correct statement would be as follows:

I don't now think it necessary to access the data otherwise than via the
java application, but who knows what the future may require.

Palooka

unread,

Sep 6, 2008, 4:34:27 PM9/6/08

to

DA Morgan wrote:
> If you don't store numbers as numbers you can not do math. If you do
> not store dates as dates you can not do date math. Storing everything
> as strings made sense on mainframes 20 years ago. It has no place in a
> modern RDBMS.

Oh yes it does. They want to use the database as a bit bucket and write
a million lines of Java. Might as well use flat files for the data, and
save on the database licences.

Sorry, I am in a bad mood.

Palooka

Chris Seidel

unread,

Sep 7, 2008, 7:34:54 AM9/7/08

to

DA Morgan wrote:

> Most things have NULL columns ... what's the issue?

A table with lets say 100 columns and 90 % NULLs in each row is no problem
for the database performance and storage?

> If you don't store numbers as numbers you can not do math. If you do
> not store dates as dates you can not do date math. Storing everything
> as strings made sense on mainframes 20 years ago. It has no place in a
> modern RDBMS.

OK, one could change A in a way that the values are stored natively:

create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname

varchar2(50), FieldvalueV varchar2(4000), FieldvalueN numeric(30,2),
FieldvalueD date);

So the question is basicly sould I use one row per value (A) or one row per
object (B)?

I think B is better for the database but A is more flexible if new object
types have to be stored.

Chris Seidel

unread,

Sep 7, 2008, 7:38:01 AM9/7/08

to

Palooka wrote:

> They want to use the database as a bit bucket and
> write a million lines of Java.

A and B do not really differ in lines of java code.

hpuxrac

unread,

Sep 7, 2008, 1:14:56 PM9/7/08

to

On Sep 7, 7:34 am, "Chris Seidel" <csei...@arcor.de> wrote:

snip

> OK, one could change A in a way that the values are stored natively:
>
> create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname
> varchar2(50), FieldvalueV varchar2(4000), FieldvalueN numeric(30,2),
> FieldvalueD date);

You might want to see what Tom and others think of varchar2(4000) ...
it ain't pretty.

http://asktom.oracle.com and do a search is one way

Several other recent discussions here on cdos of varchar2(4000) if you
use the google interface the newsgroup is pretty searchable.

William Robertson

unread,

Sep 7, 2008, 1:54:39 PM9/7/08

to

No offence, but option A is not only an astonishingly terrible idea,
but a well known astonishingly terrible idea. It's actually one of the
classic astonishingly terrible ideas. For example, see
http://oracle-wtf.blogspot.com/2006/02/eav-returns-concrete-elephant-approach.html

What if you want all rows containing both "Hello" and "Foo"? What if
you want unique or foreign key constraints? It's a fact of life that
unconstrained data gathers incorrect values as time goes on. Along
with all the date-range and integer arithmetic queries already
mentioned, that's a whole world of flexibility you are sacrificing in
the name of flexibility. Even the queries that remain possible become
complex and inefficient. My advice would always be to do the data
modelling properly and design an interface such that the model is
separate from the client application and presentation logic.

DA Morgan

unread,

Sep 7, 2008, 2:20:00 PM9/7/08

to

Chris Seidel wrote:
> DA Morgan wrote:
>
>> Most things have NULL columns ... what's the issue?
>
> A table with lets say 100 columns and 90 % NULLs in each row is no problem
> for the database performance and storage?

You are thinking horizontally when you should be thinking vertically.

Consider the following two tables.

CREATE TABLE t1 (
rid NUMBER,
attr001 VARCHAR2(5),
attr002 VARCHAR2(5),
attr003 VARCHAR2(5),
... VARCHAR2(5),
attr100 VARCHAR2(5));

CREATE TABLE t2 (
rid NUMBER,
attrnum NUMBER(3),
attrval VARCHAR2(5));

One is well designed ... the other a nightmare. And has nothing to do
the question of NULLs. Reconsider the way you are approaching the problem.

And I am in complete agreement with hpuxrac. VARCHAR2(4000) does not
belong in any database for the purpose you seem to be intending.

Ana C. Dent

unread,

Sep 7, 2008, 3:32:36 PM9/7/08

to

"Chris Seidel" <cse...@arcor.de> wrote in

news:48c3bc5c$0$3548$9b4e...@newsspool3.arcor-online.net:

The BEST solution is to design the tables to conform to Third Normal Form

Mark D Powell

unread,

Sep 8, 2008, 8:39:25 AM9/8/08

to

> damor...@x.washington.edu (replace x with u to respond)
> Puget Sound Oracle Users Groupwww.psoug.org- Hide quoted text -
>
> - Show quoted text -

>> Storing everything as strings made sense on mainframes 20 years ago <<

I suspect Daniel was just trying to make a point but I worked on
mainframes 20 years ago and when it came to storing numeric data in
your ISAM/VSAM data files or IMS or IDMS database you wanted to use a
numeric storage definition for numeric data: packed decimal, integer,
or float.

Dates were another issue as every shop had its own date formatting and
manipulation routines. We mostly used what was called Julian date
format, but which was really YYDDD.

Input files were generally all character data likely with over-punches
for numeric characters but otherwise you still tried to match the data
to the type.

Chris Seidel

unread,

Sep 8, 2008, 9:14:55 AM9/8/08

to

William Robertson wrote:

> No offence, but option A is not only an astonishingly terrible idea,
> but a well known astonishingly terrible idea. It's actually one of the
> classic astonishingly terrible ideas. For example, see
> http://oracle-wtf.blogspot.com/2006/02/eav-returns-concrete-elephant-approach.html

Oh nice. I was not the first with this problem ;)
But this link describes B as an "improvement" of A.
Thus both A and B are bad?

> What if you want all rows containing both "Hello" and "Foo"? What if
> you want unique or foreign key constraints?

Ok, these are complex queries.

> It's a fact of life that
> unconstrained data gathers incorrect values as time goes on.

OK.

> My advice would always be to do the data
> modelling properly and design an interface such that the model is
> separate from the client application and presentation logic.

This would mean at least one table for each type of object, doesn't it?

Chris Seidel

unread,

Sep 8, 2008, 9:33:26 AM9/8/08

to

hpuxrac wrote:

> http://asktom.oracle.com and do a search is one way
>
> Several other recent discussions here on cdos of varchar2(4000) if you
> use the google interface the newsgroup is pretty searchable.

Thank you.
It's all about data integrity, isn't it?

DA Morgan

unread,

Sep 8, 2008, 11:21:26 AM9/8/08

to

Except in politics, law, and golf.

DA Morgan

unread,

Sep 8, 2008, 11:23:49 AM9/8/08

to

Sounds like you missed the class on normalization. Here it is in short form.
http://www.psoug.org/reference/normalization.html
and read this too.
http://www.psoug.org/reference/codds_rules.html

Ideally you should be somewhere between at a minimum of 3NF and 5NF
may be better from an academic standpoint but not performance so don't
go there.

Helma

unread,

Sep 8, 2008, 2:21:55 PM9/8/08

to

Hello Chris,

To understand the DBA point of view, consider these two statements you
made:

1) " It is not necessary to access the data otherwise than via the
java application. "

The developer looks at the requirements of the specific application he
needs to build, and makes choices that seem like a shortcut to him
(e.g. let's create a table with only varchar2(4000) columns). I even
witnessed a developers team who refused to create primary and foreign
keys in the database " in order to keep flexibility".... They saw the
database as a necessary part of their application, but choosing
correct datatypes, primary keys, and go-with-the-flow of the RDBMS was
discarded as too much trouble, with no visible gain to them.

The DBA has a different view: applications ( and languages) come and
go, but the database (with it's RDBMS features and the data) is here
to stay. The lifespan of the data is often much longer that the
initial application.

2) "It's all about data integrity, isn't it? "

Data integrity, performanceproblems , problems when converting the
data into a new schema (new application?/ ETL ), scalability etc.

All the subjects that developers normally don't encounter - for some
an argument not to care.

the above is just IMHO, i'm not an expert as some others here.

hpuxrac

unread,

Sep 8, 2008, 3:45:07 PM9/8/08

to

On Sep 8, 9:33 am, "Chris Seidel" <csei...@arcor.de> wrote:

snip

> Thank you.

> It's all about data integrity, isn't it?

Scalable well performing applications using oracle as a back end are
"normally" done using a 3NF+ design with some possible performance
based de-normalization.

The varchar2(4000) tends to be a sign that possibly no database design
what-so-ever may be in play ... a warning signal.

You are welcome.

Palooka

unread,

Sep 8, 2008, 4:45:28 PM9/8/08

to

Mark D Powell wrote:
>
> Dates were another issue as every shop had its own date formatting and
> manipulation routines. We mostly used what was called Julian date
> format, but which was really YYDDD.
>
> Input files were generally all character data likely with over-punches
> for numeric characters but otherwise you still tried to match the data
> to the type.
>

Absolutely true. And sadly, pretty well every shop ended up with
non-valid dates in their data; even leaving aside the Y2K problem.

Nowadays we have decent relational databases, which simply reject
numbers which aren't, dates which aren't, variable length strings which
are longer than expected, child rows without parents, nulls where there
shouldn't be, and values not in the allowed list.

And without any coding required, other than some straightforward error
handling on the client side.

And can handle row level locking, multi versioning and read consistency
without any special code whatsoever.

And that's only Oracle7. We are four more major releases on from there.

Yet some developers want to treat the database as a bit bucket, and
forget all the lessons learned over the last 30 years.

Palooka

gym dot scuba dot kennedy at gmail

unread,

Sep 8, 2008, 8:34:08 PM9/8/08

to

"Chris Seidel" <cse...@arcor.de> wrote in message
news:48c529a1$0$7479$9b4e...@newsspool4.arcor-online.net...

Without the data you wouldn't have an application. Without data integrity
you end up with a heaping pile of bits. Without data integrity you have
performance problems. (eg optimizer has to make unwarrented assumptions
about your data) Tell me, do you just use one data type in Java, Strings?
Of course not. So why would you do the same in the db? Do you only have 1
class in Java, class generic? Of course not. So why would you do the same
in the db?
Jim

DA Morgan

unread,

Sep 8, 2008, 10:39:02 PM9/8/08

to

Palooka wrote:

> Yet some developers want to treat the database as a bit bucket, and
> forget all the lessons learned over the last 30 years.

There is a special name for those people ... They call them Java developers.

--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington

damo...@x.washington.edu (replace x with u to respond)

yossarian

unread,

Sep 9, 2008, 3:39:08 AM9/9/08

to

DA Morgan wrote:

>> Yet some developers want to treat the database as a bit bucket, and
>> forget all the lessons learned over the last 30 years.
>
> There is a special name for those people ... They call them Java
> developers.

I followed a thread yesterday on it.comp.java about database independent
apps. My point was that the only way to achieve that is to move SQL
statements inside stored procedures and do only JDBC "{call ...}"
statements on the Java side. They completely missed the point and
suggested to use JPA to do the job, so they don't have to write a line
of SQL (!). I'm wondering what is using JPA to interact with the database...

Y.

Chris Seidel

unread,

Sep 9, 2008, 5:33:41 AM9/9/08

to

gym dot scuba dot kennedy at gmail wrote:

> Tell me, do you just use
> one data type in Java, Strings? Of course not. So why would you do
> the same in the db? Do you only have 1 class in Java, class generic?
> Of course not. So why would you do the same in the db?

This is a normal use case in document management systems:

You have a class Document with some fixed fields like createDate, author
etc.
And you have arbitrary metadata like invoiceNo and customer for an invoice
or employee for a payslip.

These arbitrary metadata maybe stored as a Map-Object which is nothing more
than a key-value-pair.
This has to be stored in the database.

It must be possible to create "on-the-fly" new subtypes of Documents without
writing new classes or creating new tables.

This is where the problem comes from.

Mark D Powell

unread,

Sep 9, 2008, 4:57:54 PM9/9/08

to

On Sep 8, 10:39 pm, DA Morgan <damor...@psoug.org> wrote:
> Palooka wrote:
> > Yet some developers want to treat the database as a bit bucket, and
> > forget all the lessons learned over the last 30 years.
>
> There is a special name for those people ... They call them Java developers.
> --
> Daniel A. Morgan
> Oracle Ace Director & Instructor
> University of Washington

> damor...@x.washington.edu (replace x with u to respond)

> Puget Sound Oracle Users Groupwww.psoug.org

LOL. Unfortunately true.

Lars Tetzlaff

unread,

Sep 10, 2008, 3:31:59 PM9/10/08

to

Chris Seidel schrieb:

> Hi,
>
> I need to store java objects in my oracle database. These objects have
> serveral fields, e.g.
>
> Object1: String f1; String f2; Date f3
> Object1: String f1; BigDecimal f2; Boolean f3
>
> e.g.
> Object1("Hello", "World", 2008-09-05)
> Object2("Foo", 100.20, true)
>
> The database shall scale up at least to 100 million objects.
>
> How to map this to the database?
>

Maybe you could use the object-relational features in oracle. They allow
you to create object type hierarchies and store them inside object
tables. Look at

http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28371/toc.htm

for an introduction.

Lars