transaction tables consistent reads - undo records applied

452 views
Skip to first unread message

Steve Howard

unread,
Jun 24, 2010, 10:20:20 PM6/24/10
to
Hi All,

10.2.0.4 three node cluster EE on SLES 10

Can someone give me a good definition of *exactly* what this means and
what causes it (mainly the latter). The documentation is not very
descriptive, with "Number of undo records applied to transaction
tables that have been rolled back for consistent read purposes".

It sounds like undo on undo, but we don't have any larger number for
rollbacks (or commits) when this happens than we do at any other time.

We have been plagued by this for over a year, and after multilpe SR's
where the support analyst just reads us the documentation, I am at my
wits end.

We have a fairly large table (almost 1TB with about 300 million rows)
with a large XMLTYPE column. Once a day, a job scans this table for
records added that day for propagation to an external system. The
longer the query runs, the more we see the session doing single block
reads against the undo tablespace, with the stat in the subject
climbing into the millions. Eventually, after several hours, an
ORA-01555 is thrown.

I even grabbed one of the P1/P2 parameters for the session querying
and dumped the undo block in the P2 value. While it was a second or
two after the event was posted, the block itself didn't even contain
any references to the table being queried!

Can anyone shed some light?

Thanks,

Steve

Steve Howard

unread,
Jun 24, 2010, 10:25:10 PM6/24/10
to
On Jun 24, 10:20 pm, Steve Howard <stevedhow...@gmail.com> wrote:
> Hi All,
>
> 10.2.0.4 three node cluster EE on SLES 10


...I should also mention the explain plan for the query is OK. It is
a range scan for the time period being retrieved, which may be two or
three hours, representing 300,000 rows (out of 300 million).

Also, when I look at "table fetch by rowid" for the querying session,
it will periodically just stall. It is as if the session is reading
thousands upon thousands of undo blocks for that next row. The
arraysize is 15 (standard SQL*Plus)

Sometimes this happens, sometimes it doesn't.

We did apply patch 7527908

joel garry

unread,
Jun 25, 2010, 12:01:38 PM6/25/10
to

I have no idea, but I speculate you can use Tanel Poders' latchprofx
and poke around on his site about in memory undo to figure this out.
Does your plan show lots of recursion?

jg
--
@home.com is bogus.
http://www.businessweek.com/news/2010-06-25/oracle-rises-after-sun-acquisition-fuels-profit-gain.html

Jonathan Lewis

unread,
Jun 25, 2010, 12:37:06 PM6/25/10
to

This happens when your query hits a block that was updated
"a long time" in the past by a transaction that has committed
but not been cleaned out.

Your query can tell that it is a committed transaction because
the ITL entry for the transaction points to transaction table slot
(in an undo segment header block) that has been re-used for
a newer transaction. (Part of the transaction id is the "transaction
sequence number", which is counting the number of times a transaction
slot has been used).

Your query therefore needs to know WHEN the transaction committed,
so that it can decide whether or not it's supposed to see the new version
or the old version of the row. (If the transaction committed before the
query then the query doesn't need to know exactly when the transaction
committed, if it started after the query then it has to be rolled back -
and it's possible that the "snapshot too old" is the result of the data
rollback
than the transaction table rollback.)

To find out when the transaction committed, your query copies the undo
segment header block and starts rolling it back. The number of times this
happens is recorded as:
"transaction tables consistent read rollbacks"

To perform the rollback, your query will read the transaction control block
(another part of the undo segment header) which contains a number of
important
details - including the first undo block address of the most recent
transaction
to use that undo segment header. This undo block address will hold the
first
record of that transaction *** - which include information about the
PREVIOUS
state of the transaction control block. By using this undo record your
query
can take the undo segment header block backwards in time by one step -
at which point it reads the older version of the transaction control block
and
repeats the process until it reaches the point where the transaction slot
it's
interested in has been taken back to the correct sequence number (or a
change
has taken the undo segment header block back to a point in time before the
start of the query). Each record it reads in this process is counted in
the
"transaction tables consistent reads - undo records applied"


(*** This is why the block you dumped had nothing to do with your table.)

The trouble with your requirement is that we really need to do a backwards
tablescan - because it's probably the data near the end of the table that
is
changing while you are "wasting" time reading all the data from the start
of
the table.

Unfortunately there is no such hint - but if it's really critical, you
could write
some code to scan the table one extent at a time in reverse order.


--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


"Steve Howard" <steved...@gmail.com> wrote in message
news:82fa462e-574c-461d...@d37g2000yqm.googlegroups.com...

joel garry

unread,
Jun 25, 2010, 4:31:45 PM6/25/10
to
On Jun 25, 9:37 am, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:

Excellent explanation, but I lost you here. He says plan says doing a
range scan, for 1% of the table? (Maybe you hadn't seen subsequent
post yet, where he mentions a fetch suddenly exhibiting the
characteristics you describe.)

>
> Unfortunately there is no such hint - but if it's really critical, you
> could write
> some code to scan the table one extent at a time in reverse order.

This cleaning makes perfect sense, but I'm wondering if there is some
administrative tuning like adjusting undo size or retention or some
fiddling with initrans? Sounds critical if it's interrupting data
extraction. I'm wondering if the mysterious translation of xmltype
from a column might be a problem here. Steve, how exactly are you
inserting and accessing this column?

jg
--
@home.com is bogus.

snake oil 2.0 http://www.gapingvoidgallery.com/product_info.php?products_id=1614

Jonathan Lewis

unread,
Jun 26, 2010, 2:05:31 AM6/26/10
to
"joel garry" <joel-...@home.com> wrote in message
news:29c017c9-7c3a-40db...@i9g2000prn.googlegroups.com...
]On Jun 25, 9:37 am, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
]wrote:
]>
]> The trouble with your requirement is that we really need to do a
backwards
]> tablescan - because it's probably the data near the end of the table
that
]> is
]> changing while you are "wasting" time reading all the data from the
start
]> of
]> the table.
]
]Excellent explanation, but I lost you here. He says plan says doing a
]range scan, for 1% of the table? (Maybe you hadn't seen subsequent
]post yet, where he mentions a fetch suddenly exhibiting the
]characteristics you describe.)
]

By the time I'd written this much, I'd forgotten that he'd added the note
about the index - but it doesn't really make any difference (a) to the
explanation or (b) to the concept in the solution - except that you
can put in an "index_desc()" hint and that might be enough to help.
It depends on the how the query is written, what index it uses, and
the distribution of the changed data.

]>
]> Unfortunately there is no such hint - but if it's really critical, you


]> could write
]> some code to scan the table one extent at a time in reverse order.
]
]This cleaning makes perfect sense, but I'm wondering if there is some
]administrative tuning like adjusting undo size or retention or some
]fiddling with initrans? Sounds critical if it's interrupting data
]extraction.

The error is "just" the same as a traditional 1555 problem when it gets
that far so a "large enough" undo retention should stop the 1555 - but
that won't stop the amount of work it takes. Thinking about initrans is
a good idea - but that won't have any effect either because the problem
is the number of backward steps that have to be taken and the value of
initrans only eliminates the first few (i.e. a few relating to the size of
INITRANS).

Steve Howard

unread,
Jun 26, 2010, 9:02:29 AM6/26/10
to
> @home.com is bogus.http://www.businessweek.com/news/2010-06-25/oracle-rises-after-sun-ac...

Hi Joel,

I got a tip from Dion Cho regarding this that was also suggesting
dc_rollbacks as latching issue, but I could never prove any similarity
between his test case and our situation.

http://dioncho.wordpress.com/2009/04/04/rethinking-on-ora-01555-error/

Thanks,

Steve

Steve Howard

unread,
Jun 26, 2010, 9:10:30 AM6/26/10
to
On Jun 26, 2:05 am, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:
> "joel garry" <joel-ga...@home.com> wrote in message

Thanks much , Jonathan (and Joel).

I found a comment by you several months ago similar to this that I
always go back to. I can't tell you how many times I would go for a
cup of coffee with someone and mention that link and the fact that I
would like to send you a note for clarification, but just never did.

http://www.orafaq.com/usenet/comp.databases.oracle.server/2008/02/01/0022.htm

I really appreciate you taking the time to respond, as this has been
driving me nutty.

Let me take some time to digest what you posted and then I will
respond.

Thanks,

Steve

Steve Howard

unread,
Jun 28, 2010, 9:37:32 AM6/28/10
to
On Jun 26, 2:05 am, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:
> "joel garry" <joel-ga...@home.com> wrote in message

What is really odd about this is that several months ago, I started
running a job to “pre-scan” all the rows we would need, before ‘the
“real” job got there. My assumption was this had something to do with
block cleanout, even though none of the cleanout statistics were
incremented like the “transaction tables consistent reads – undo
records applied” counter was.

This doesn’t seem to help, though. My “pre-scan” job never has an a
issue, but I run one hour windows for the range to scan.

A little more background. This is a “transaction history” table of
sorts. It is partitioned by month, and records are only added, never
updated.

SQL> desc big_table
Name Null? Type
----------------------------------------- --------
----------------------------
PK NOT NULL NUMBER
FK NOT NULL NUMBER
COL3 NOT NULL NUMBER(3)
CREATE_TIME TIMESTAMP(6)
COL5 NOT NULL VARCHAR2(50)
COL6 VARCHAR2(50)
COL7 XMLTYPE

SQL>

We query as follows:

SELECT concatenated_xml_string_of_columns_from_big_table,
a.xml_col.getClobVal()
FROM big_table a
WHERE create_time between trunc(sysdate) + (:1 / 1440) and
trunc(sysdate) + (:2 / 1440)

…where the window is three hours. This does a range scan on the
create_time column, which is good as it is by far the most selective
filter.

The selected records are retrieved in PL/SQL (no bulk collect), and
run through a few more XML tagging operations and written to a file.
They are then propagated to a mainframe for additional business usage
to which I am not privy.

If the query runs “fast enough” (less than 30 minutes or so), we don’t
see the issue. If it starts to “get slow” for whatever reason, we
start reading tons of undo.

Based on what you wrote, and the fact that I “pre-scan” the rows,
shouldn’t I pay the price for the cleanout? Or could it be we *do*
have other transactions hitting this table of which I am not aware?
In other words,

* I pre-scan
* A row *is* changed after my query finishes
* They run the “real” query

Thanks,

Steve

joel garry

unread,
Jun 28, 2010, 1:15:35 PM6/28/10
to

Something old but new to me I learned today (from Lob retention not
changing when undo_retention is changed [ID 563470.1]):

"...It is assumed that when UNDO_RETENTION is changed the lobs
connected to that retention are also changed which is not the case .

If a lob is modified from RETENTION to PCTVERSION and back to
RETENTION again then the lob retention is updated. ..."

Of course I have no idea if it is related to your problem, unless you
say something like you've changed your undo retention from 30 minutes
or so and didn't know about this...

A bit more of a reach, maybe Bug 2931779 - False ORA-1555 accessing
"cache read" LOBs in RAC [ID 2931779.8] or related has reanimated in
some form.

I guess you need to start tracing and digging deep to figure this one
out. Those mysterious xml packages may be doing something strange...
(I've run into 3rd party app code at times that does stuff like update
and rollback, unexpectedly).

jg
--
@home.com is bogus.

http://thehill.com/blogs/hillicon-valley/technology/105721-sen-bond-says-dhs-shouldnt-oversee-cybersecurity

Steve Howard

unread,
Jun 29, 2010, 12:21:38 PM6/29/10
to
> @home.com is bogus.http://thehill.com/blogs/hillicon-valley/technology/105721-sen-bond-s...

Hi Joel,

I have often wondered about the black magic xmltype's as well. I can
say that by using logminer as well as dba_hist_sqltext I have verified
there are zero updates to these rows after they are inserted.

I am watching it happen as I type this, as I fend off arguments of
Oracle being "expensive, slow, and bloated"...yeah, a real fun day.

The session I am watching is selecting records inserted between 6AM
and 9AM this morning. It has been running for about 80 minutes, and
fetched a total of about 250,000 rows. For the last 30 minutes it has
fetched less than 2.000 while incrementing the "transaction tables
consistent reads - undo records applied" by almost 2 million. The
session has 256 counters for "transaction tables consistent read
rollbacks".

Once again, range scan of the create_time index. It will eventually
(if it doesn't ORA-01555) fetch about 300,000 rows in total.

I am at a loss.

Thanks,

Steve

Steve Howard

unread,
Jun 29, 2010, 12:53:58 PM6/29/10
to
On Jun 25, 12:37 pm, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:
> "Steve Howard" <stevedhow...@gmail.com> wrote in message

Jonathan,

I am just re-reading your post.

<<
The trouble with your requirement is that we really need to do a
backwards tablescan - because it's probably the data near the end of
the table that is changing>>

This is true.

<<
while you are "wasting" time reading all the data from the start of
the table. Unfortunately there is no such hint - but if it's really
critical, you could write some code to scan the table one extent at a
time in reverse order.
>>

Are you suggesting that if I were to scan the *newest* rows prior to
the "real" job reading older rows, that may help? If so, that would
be fairly easy, as I can identify them by create_time.

Thanks,

Steve

Reply all
Reply to author
Forward
0 new messages