I'm working on a database which contains a big table TD (10 millions
lines) with many columns (180)
I wonder if the performance won t be better if i normalize some
columns (about 20).
So this is the new table NORM_FIELD :
ID_DATA (ID reference from original table)
FIELD_ID
FIELD_DATA
+ 3 - 4 fields about updating (date, user, ...)
TD is partitionning case 2 columns.
Is it possible to partitionning NORM_FIELD which the same critery of
TD ?
Even ifnot, what the better solution between :
- copy data of partitioning field in NORM_FIELD to store in the same
partition (and have this critery for the join condition)
- create a new partitioning critery for NORM_FIELD (like the last
digit for having the best distribution)
I don t want to change the application code, so i will rename my table
and create a view with this name which will be a join with a another
view build from this method :
http://oracle.ittoolbox.com/groups/technical-functional/oracle-db-l/denormalizing-data-898552
Will performance be ok if i do this change ?
What s about your experience ?
Thanks for answers.
Chmanu
PS : it is possible i've not be clear cause my poor english, in this
case ask me to explain.
Performance may or may not improve but maintainability and the
ability to use it for something other than a data-dumpster will.
10M rows is not a particularly large table. But 180 columns is,
in the vast majority of cases, a really bad idea.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Daniel
> 10M rows is not a particularly large table. But 180 columns is,
> in the vast majority of cases, a really bad idea.
180 columns smacks of bad design.
In a really horrible design, 180 indexes on these 180 columns would
exist as well.
To explore this argument further, if you did have this sort of bad
structure, would the Oracle CBO still choose the index that would give
the best execution plan to satisfy the query? Would it still choose
the index that would give the best execution plan for a table with
1000 columns (the limit in 10.2)?
I don't know the answer. Does anyone? (at some point, surely the cost
of determining the cost could be larger that the cost for executing
the query)
Barry
While I agree with your conclusion ... "smacks of bad design" ... I am
not at all sure I agree with how you got there. Why does 180 columns
add up to 180 indexes? More than a handful on a table too "smacks
of bad design."
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
I don't know the answer either, but I'll make a couple of
observations.
You can adjust how deep the optimizer will look for various plans.
With lots of indices, there may be some issue with wasting time
evaluating those that will never be right. Since the cost is what
finally determines which plan, it becomes possible to have way-off
costs coupled with perhaps never getting to the correct plan.
http://jonathanlewis.wordpress.com/2006/12/09/how-many-ways/
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/d91f5c65ee5e2355/91b269133fa1ce18?lnk=st&q=#91b269133fa1ce18
Investigate histograms. With a properly normalized system, you can
say "the more info you give to the optimizer, the better it is likely
to make a correct decision for costing plans." With too much data,
you can't say that. The general purpose of histograms is to give the
optimizer more information on data distribution. So what happens when
there are too many irrelevant histograms?
Even if it does generally work, performance problems arise where it
doesn't. This isn't anything different, I just think it could be much
worse. http://oracledoug.com/serendipity/index.php?/archives/820-Topsy-Turvy.html
**(number of denormalizations)
"...it's hard to know if there are even enough hints (yet) to describe
every path we might want." - Jonathan Lewis on his Scratchpad.
If you find something wrong, you have to give a simple case to Oracle
support to get it fixed. It may go to the bottom of the pile if it is
way out in left field.
As far as performance with the view, note that recent and even
unreleased patches are still dealing with performance and wrong
results issues for certain obscure view constructs. So test and let
us know.
jg
--
@home.com is bogus.
http://www.popjudaica.com/chosen-product.php?model=doyaandta
> > > Performance may or may not improve but maintainability and the
> > > ability to use it for something other than a data-dumpster will.
> > > 10M rows is not a particularly large table. But 180 columns is,
> > > in the vast majority of cases, a really bad idea.
> > > 10M rows is not a particularly large table. But 180 columns is,
> > > in the vast majority of cases, a really bad idea.
> > 180 columns smacks of bad design.
> > In a really horrible design, 180 indexes on these 180 columns would
> > exist as well.
I know and be conscient of that, i m agree with you.
For the context, this table is just for select. The only field
destined to be modified (just a bit) are them i want to normalize.
In fact the 180 columns contains the 20 i want to store in the second
table but there also are date, user and another information associated
to it (so 20*4 = 80 => the mainly table will reduce to 100 columns)
> > To explore this argument further, if you did have this sort of bad
> > structure, would the Oracle CBO still choose the index that would give
> > the best execution plan to satisfy the query? Would it still choose
> > the index that would give the best execution plan for a table with
> > 1000 columns (the limit in 10.2)?
>
> > I don't know the answer. Does anyone? (at some point, surely the cost
> > of determining the cost could be larger that the cost for executing
> > the query)
>
> I don't know the answer either, but I'll make a couple of
> observations.
>
> You can adjust how deep the optimizer will look for various plans.
> With lots of indices, there may be some issue with wasting time
> evaluating those that will never be right. Since the cost is what
> finally determines which plan, it becomes possible to have way-off
> costs coupled with perhaps never getting to the correct plan.
>
Is it possible to store the execution plan in a persistent mode or i
need to hint the view ?
> Investigate histograms. With a properly normalized system, you can
> say "the more info you give to the optimizer, the better it is likely
> to make a correct decision for costing plans." With too much data,
> you can't say that. The general purpose of histograms is to give the
> optimizer more information on data distribution. So what happens when
> there are too many irrelevant histograms?
>
> Even if it does generally work, performance problems arise where it
> doesn't. This isn't anything different, I just think it could be much
> worse. http://oracledoug.com/serendipity/index.php?/archives/820-Topsy-Turvy...
> **(number of denormalizations)
>
> "...it's hard to know if there are even enough hints (yet) to describe
> every path we might want." - Jonathan Lewis on his Scratchpad.
>
> If you find something wrong, you have to give a simple case to Oracle
> support to get it fixed. It may go to the bottom of the pile if it is
> way out in left field.
>
> As far as performance with the view, note that recent and even
> unreleased patches are still dealing with performance and wrong
> results issues for certain obscure view constructs. So test and let
> us know.
thank you for the advise, i will test with some data.
I havent ever read all your links (some word i don t understand i need
a good translator colleague)
Another possibility is to create more tables to join to the first (but
i don't like)
Regards,
Chmanu
Search the docs for plan stability. Which is implented as hints.
Don't be afraid of normalizing tables!!! If you follow the theory,
you work with the optimizer, rather than against it. You might search
wikipedia.org for normalization in your native language, maybe you'll
get lucky.
I don't know about your specific case, but generally when I see this
sort of denormalizing, eventually someone wants to put together the
disparate parts of the table with some absurd self-join. In some
cases, it becomes the standard operating code. You've seen by the
link in your original post how quickly decodes can become
unmanageable.
jg
--
@home.com is bogus.
Auditors do exactly what, again? http://www.signonsandiego.com/news/metro/pension/20080408-9999-1n8sec.html
I do some tests and i hope having good advice to tune :
The tests are not complete because i only create the normalize table
without alter the first table by dropping the unusable columns (i need
to develop a migrate script).
and the new table contain no data in the data field.
My first table (TD) : 947409 rows
The normalize table (NT) : 2490368 rows
The normalize table is a IOT (PK on ID_DATA ; FIELD_ID) and i add a
index on ID_DATA (for the group by)
If i create the view (V_NT) the plan cost show me a proportionnal
growth with number of NT rows inserted. (for select * ..where id=...)
If i ask to oracle the same query as the view but precise inside the
from of the view my ID, it fall down (1992 to 3) because use the index
to do the view treatment only on the id i m interresting of.
So the question is :
Is it possible to say to oracle to do the treatment only on the id is
return ?
------ TRACE -----
select *
from V_NT
where id = 17259478
Plan
SELECT STATEMENT ALL_ROWSCost : 1 992 Bytes : 277 585 605
Cardinality : 1 814 285
3 HASH GROUP BY Cost : 1 992 Bytes : 277 585 605 Cardinality : 1 814
285
2 FILTER
1 INDEX FAST FULL SCAN TD_PK Cost : 1 879 Bytes : 277 585 605
Cardinality : 1 814 285
SELECT ACP_ID,
MAX(DECODE(ID_FIELD, 1, DATA)) AS ded,
MAX(DECODE(ID_FIELD, 2, DATA)) AS qsqd,
MAX(DECODE(ID_FIELD, 3, DATA)) AS dsv,
MAX(DECODE(ID_FIELD, 4, DATA)) AS ntrg,
MAX(DECODE(ID_FIELD, 5, DATA)) AS yj,
MAX(DECODE(ID_FIELD, 6, DATA)) AS dsq,
MAX(DECODE(ID_FIELD, 7, DATA)) AS df,
MAX(DECODE(ID_FIELD, 8, DATA)) AS ee,
MAX(DECODE(ID_FIELD, 9, DATA)) AS er,
MAX(DECODE(ID_FIELD, 10, DATA)) AS gf,
MAX(DECODE(ID_FIELD, 11, DATA)) AS rezg,
MAX(DECODE(ID_FIELD, 12, DATA)) AS f,
MAX(DECODE(ID_FIELD, 13, DATA)) AS fds,
MAX(DECODE(ID_FIELD, 14, DATA)) AS fdsv,
MAX(DECODE(ID_FIELD, 15, DATA)) AS vf,
MAX(DECODE(ID_FIELD, 16, DATA)) AS zz,
MAX(DECODE(ID_FIELD, 17, DATA)) AS k_INF,
MAX(DECODE(ID_FIELD, 18, DATA)) AS bvr,
MAX(DECODE(ID_FIELD, 19, DATA)) AS d
FROM (SELECT *
FROM NT
WHERE id = 17259478) tb
where id = 17259478
GROUP BY ID
Plan
SELECT STATEMENT ALL_ROWSCost : 3 Bytes : 2 907 Cardinality : 19
3 SORT GROUP BY NOSORT Cost : 3 Bytes : 2 907 Cardinality : 19
2 INDEX UNIQUE SCAN TD_PK Cost : 3 Bytes : 2 907 Cardinality : 19
1 INDEX RANGE SCAN TD.IDX Cost : 3 Cardinality : 19
Thanks for your help.
Chmanu
the execution plan is ok.
The problem appear when i want to use a view which not including the
condition on id.
If this problem can be solves, what about my initial question :
Is it possible to create a partitionnal table on fields coming from a
join table ?
Chmanu
I've solved the problem and use a classic table (not iot).
The cost and the respond time are correct (6 with the join to TD and <
200ms), partitionning is perhaps not important (test with about 5M
rows but all of them are null).
Chmanu