I need to store java objects in my oracle database. These objects have
serveral fields, e.g.
Object1: String f1; String f2; Date f3
Object1: String f1; BigDecimal f2; Boolean f3
e.g.
Object1("Hello", "World", 2008-09-05)
Object2("Foo", 100.20, true)
The database shall scale up at least to 100 million objects.
How to map this to the database?
A)
create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname
varchar2(50), Fieldvalue varchar2(4000));
So each object is stored into multiple rows in this table, one row per
field.
Each value is converted to varchar.
1, 100, Hello
2, 100, World
3, 100, 2008-09-05
4, 101, Foo
5, 101, 100.20
6, 101, 1
B)
create table Object(ObjectID numeric(10), o1_f1 varchar(4000), o1_f2
varchar(4000), o1_f3 date, o2_f1 varchar(4000), o2_f2 numeric(30,2), o2_f3
numeric(1));
1, Hello, World, 2008-09-05, null, null, null
2, null, null, null, Foo, 100.20, 1
So each object is store in a single row.
Values are store in there native format.
This could be optimized that Object1 and Object2 share fields of the same
type.
What design is better A oder B and why?
Thank you!
A relational database is best used to store data in a relational
format with attributes stored as using the appropriate data type so
design B is better because you are storing each attribute in its own
designated column of the correct database data type, that is, a
numeric value in a number data type, a date value in a date data type,
etc ... rather than storing everything as character as the first
design would require.
Storing data this way will make the data available via other means/
methods than just the java application layer with its object building
and handling logic.
IMHO -- Mark D Powell --
> Storing data this way will make the data available via other means/
> methods than just the java application layer with its object building
> and handling logic.
It is not necessary to access the data otherwise than via the java
application.
What kinds of pros and cons have A and B in respect of performance and
scalability?
Are there any queries which can't be done well with A (range, sorting etc)?
Thank you
"Best" depends on what you are going to do with it other than store it.
"Best" depends on whether you require it reconstituted as XML at some
point in the future.
"Best" depends, to a lesser extent, on your version which you don't
state (3 decimal places).
My preference for reasons that range from performance to reporting to
storage to relational theory is to shred XML into its constituent
elements storing each with its proper data type.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Query it. I have to find objects fast based on the field data (=, >, <,
between, like)
> "Best" depends on whether you require it reconstituted as XML at some
> point in the future.
No XML needed.
> "Best" depends, to a lesser extent, on your version which you don't
> state (3 decimal places).
A stores all as string, no problem.
For B a new column will be there numeric(x,y) with y >= 3.
But for 99 % of the objects 2 decimal places will be ok.
> My preference for reasons that range from performance to reporting to
> storage to relational theory is to shred XML into its constituent
> elements storing each with its proper data type.
So you would prefer A?
What about the many null columns in this approach?
Relational theory has been around for a long time. (over 25 years) It has
been highly effective for large and small businesses. It isa well known
pattern. Use what other people have learned to your advantage. Use the
correct tool for the correct job. Making the db a string bit bucket isn't
going to help your efforts.
Jim
Most things have NULL columns ... what's the issue?
If you don't store numbers as numbers you can not do math. If you do
not store dates as dates you can not do date math. Storing everything
as strings made sense on mainframes 20 years ago. It has no place in a
modern RDBMS.
> It is not necessary to access the data otherwise than via the java
> application.
A more correct statement would be as follows:
I don't now think it necessary to access the data otherwise than via the
java application, but who knows what the future may require.
Oh yes it does. They want to use the database as a bit bucket and write
a million lines of Java. Might as well use flat files for the data, and
save on the database licences.
Sorry, I am in a bad mood.
Palooka
> Most things have NULL columns ... what's the issue?
A table with lets say 100 columns and 90 % NULLs in each row is no problem
for the database performance and storage?
> If you don't store numbers as numbers you can not do math. If you do
> not store dates as dates you can not do date math. Storing everything
> as strings made sense on mainframes 20 years ago. It has no place in a
> modern RDBMS.
OK, one could change A in a way that the values are stored natively:
create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname
varchar2(50), FieldvalueV varchar2(4000), FieldvalueN numeric(30,2),
FieldvalueD date);
So the question is basicly sould I use one row per value (A) or one row per
object (B)?
I think B is better for the database but A is more flexible if new object
types have to be stored.
> They want to use the database as a bit bucket and
> write a million lines of Java.
A and B do not really differ in lines of java code.
snip
> OK, one could change A in a way that the values are stored natively:
>
> create table ObjectProperty (ID numeric(10), ObjectID numeric(10), Fieldname
> varchar2(50), FieldvalueV varchar2(4000), FieldvalueN numeric(30,2),
> FieldvalueD date);
You might want to see what Tom and others think of varchar2(4000) ...
it ain't pretty.
http://asktom.oracle.com and do a search is one way
Several other recent discussions here on cdos of varchar2(4000) if you
use the google interface the newsgroup is pretty searchable.
No offence, but option A is not only an astonishingly terrible idea,
but a well known astonishingly terrible idea. It's actually one of the
classic astonishingly terrible ideas. For example, see
http://oracle-wtf.blogspot.com/2006/02/eav-returns-concrete-elephant-approach.html
What if you want all rows containing both "Hello" and "Foo"? What if
you want unique or foreign key constraints? It's a fact of life that
unconstrained data gathers incorrect values as time goes on. Along
with all the date-range and integer arithmetic queries already
mentioned, that's a whole world of flexibility you are sacrificing in
the name of flexibility. Even the queries that remain possible become
complex and inefficient. My advice would always be to do the data
modelling properly and design an interface such that the model is
separate from the client application and presentation logic.
You are thinking horizontally when you should be thinking vertically.
Consider the following two tables.
CREATE TABLE t1 (
rid NUMBER,
attr001 VARCHAR2(5),
attr002 VARCHAR2(5),
attr003 VARCHAR2(5),
... VARCHAR2(5),
attr100 VARCHAR2(5));
CREATE TABLE t2 (
rid NUMBER,
attrnum NUMBER(3),
attrval VARCHAR2(5));
One is well designed ... the other a nightmare. And has nothing to do
the question of NULLs. Reconsider the way you are approaching the problem.
And I am in complete agreement with hpuxrac. VARCHAR2(4000) does not
belong in any database for the purpose you seem to be intending.
The BEST solution is to design the tables to conform to Third Normal Form
>> Storing everything as strings made sense on mainframes 20 years ago <<
I suspect Daniel was just trying to make a point but I worked on
mainframes 20 years ago and when it came to storing numeric data in
your ISAM/VSAM data files or IMS or IDMS database you wanted to use a
numeric storage definition for numeric data: packed decimal, integer,
or float.
Dates were another issue as every shop had its own date formatting and
manipulation routines. We mostly used what was called Julian date
format, but which was really YYDDD.
Input files were generally all character data likely with over-punches
for numeric characters but otherwise you still tried to match the data
to the type.
> No offence, but option A is not only an astonishingly terrible idea,
> but a well known astonishingly terrible idea. It's actually one of the
> classic astonishingly terrible ideas. For example, see
> http://oracle-wtf.blogspot.com/2006/02/eav-returns-concrete-elephant-approach.html
Oh nice. I was not the first with this problem ;)
But this link describes B as an "improvement" of A.
Thus both A and B are bad?
> What if you want all rows containing both "Hello" and "Foo"? What if
> you want unique or foreign key constraints?
Ok, these are complex queries.
> It's a fact of life that
> unconstrained data gathers incorrect values as time goes on.
OK.
> My advice would always be to do the data
> modelling properly and design an interface such that the model is
> separate from the client application and presentation logic.
This would mean at least one table for each type of object, doesn't it?
> http://asktom.oracle.com and do a search is one way
>
> Several other recent discussions here on cdos of varchar2(4000) if you
> use the google interface the newsgroup is pretty searchable.
Thank you.
It's all about data integrity, isn't it?
Except in politics, law, and golf.
Sounds like you missed the class on normalization. Here it is in short form.
http://www.psoug.org/reference/normalization.html
and read this too.
http://www.psoug.org/reference/codds_rules.html
Ideally you should be somewhere between at a minimum of 3NF and 5NF
may be better from an academic standpoint but not performance so don't
go there.
To understand the DBA point of view, consider these two statements you
made:
1) " It is not necessary to access the data otherwise than via the
java application. "
The developer looks at the requirements of the specific application he
needs to build, and makes choices that seem like a shortcut to him
(e.g. let's create a table with only varchar2(4000) columns). I even
witnessed a developers team who refused to create primary and foreign
keys in the database " in order to keep flexibility".... They saw the
database as a necessary part of their application, but choosing
correct datatypes, primary keys, and go-with-the-flow of the RDBMS was
discarded as too much trouble, with no visible gain to them.
The DBA has a different view: applications ( and languages) come and
go, but the database (with it's RDBMS features and the data) is here
to stay. The lifespan of the data is often much longer that the
initial application.
2) "It's all about data integrity, isn't it? "
Data integrity, performanceproblems , problems when converting the
data into a new schema (new application?/ ETL ), scalability etc.
All the subjects that developers normally don't encounter - for some
an argument not to care.
the above is just IMHO, i'm not an expert as some others here.
snip
> Thank you.
> It's all about data integrity, isn't it?
Scalable well performing applications using oracle as a back end are
"normally" done using a 3NF+ design with some possible performance
based de-normalization.
The varchar2(4000) tends to be a sign that possibly no database design
what-so-ever may be in play ... a warning signal.
You are welcome.
Nowadays we have decent relational databases, which simply reject
numbers which aren't, dates which aren't, variable length strings which
are longer than expected, child rows without parents, nulls where there
shouldn't be, and values not in the allowed list.
And without any coding required, other than some straightforward error
handling on the client side.
And can handle row level locking, multi versioning and read consistency
without any special code whatsoever.
And that's only Oracle7. We are four more major releases on from there.
Yet some developers want to treat the database as a bit bucket, and
forget all the lessons learned over the last 30 years.
Palooka
> Yet some developers want to treat the database as a bit bucket, and
> forget all the lessons learned over the last 30 years.
There is a special name for those people ... They call them Java developers.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
>> Yet some developers want to treat the database as a bit bucket, and
>> forget all the lessons learned over the last 30 years.
>
> There is a special name for those people ... They call them Java
> developers.
I followed a thread yesterday on it.comp.java about database independent
apps. My point was that the only way to achieve that is to move SQL
statements inside stored procedures and do only JDBC "{call ...}"
statements on the Java side. They completely missed the point and
suggested to use JPA to do the job, so they don't have to write a line
of SQL (!). I'm wondering what is using JPA to interact with the database...
Y.
> Tell me, do you just use
> one data type in Java, Strings? Of course not. So why would you do
> the same in the db? Do you only have 1 class in Java, class generic?
> Of course not. So why would you do the same in the db?
This is a normal use case in document management systems:
You have a class Document with some fixed fields like createDate, author
etc.
And you have arbitrary metadata like invoiceNo and customer for an invoice
or employee for a payslip.
These arbitrary metadata maybe stored as a Map-Object which is nothing more
than a key-value-pair.
This has to be stored in the database.
It must be possible to create "on-the-fly" new subtypes of Documents without
writing new classes or creating new tables.
This is where the problem comes from.
LOL. Unfortunately true.
Maybe you could use the object-relational features in oracle. They allow
you to create object type hierarchies and store them inside object
tables. Look at
http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28371/toc.htm
for an introduction.
Lars