In the thread about LOB storage, Jonathan objected that I haven't really shown any numbers, which is true. I decided to fix my laziness and wrote a Perl script to load 614 files with approximately 2.3 GB of into a simple table having 3 columns: fileno, filename and content of type BLOB. While loading, I was also monitoring the script execution and wait events. The outcome is more than interesting:
1 select event,time_waited,average_wait
2 from v$session_event
3 where sid=204
4* order by time_waited
SQL> /
EVENT TIME_WAITED AVERAGE_WAIT
-------------------- ----------- ------------
direct path write 18902 .67
flashback log file s 29509 4.53
ync
direct path read 35703 1.29
11 rows selected.
The event which caused the most time to be spent waiting was, as is to be expected, "direct path read". After all, the script was loading BLOB objects into the database. The second most time consuming event was, quite unexpectedly, "flashback log file sync". Much, much more time was spent on writing flashback logs than on writing the redo logs. As opposed to redo log, flashback is a horrific monster. This data is for a user session, which means that flashback log file sync is happening synchronously. In other words, there is a very hefty performance price to be paid for flashback, much much more than for redo logs and recoverability. The mechanisms and parameters to control flashback are not exposed which means that there is no control over that mechanism. The only possibility is to turn it on and off. Apparently, flashback will exact a heavy performance penalty on the DML intensive applications.
The loading script worked well:
SQL> select bytes/1048576 from dba_segments
2 where owner='SCOTT' and 3 segment_type='LOBSEGMENT' and
4 segment_name='SYS_LOB0000087552C00003$$';
BYTES/1048576
-------------
2304
Elapsed: 00:00:00.05
SQL> select count(*) from scott.bin_files;
COUNT(*)
----------
614
Elapsed: 00:00:00.00
SQL>
This is the original:
[mgogala@medo tmp]$ du -sh ~/mp3/misc
2.3G /home/mgogala/mp3/misc
[mgogala@medo tmp]$ ls /home/mgogala/mp3/misc|wc -l
614
This is the "business end" of the script:
my $dbh = db_connect("scott", "tiger");
my @files = glob("/home/mgogala/mp3/misc/*");
my $cnt = 0;
my $ins = "insert into bin_files values(?,?,?)";
my $sth=$dbh->prepare($ins);
foreach my $f (@files) {
my $cont=slurp($f);
$sth->bind_param(1,$cnt++);
$sth->bind_param(2,$f);
$sth->bind_param(3,$cont,{ora_type=>SQLT_BIN});
$sth->execute();
}
$dbh->commit();
This is the target table:
SQL> desc scott.bin_files
Name Null? Type
----------------------------------------- -------- ----------------------------
FILENO NUMBER(4)
FILENAME VARCHAR2(256)
CONTENT BLOB
SQL>
Version is 11.2.0.3, 64 bit on Linux, storage in row is disabled.
I will do some tests with deleting rows and re-loading the files and see whether the LOB segment will grow, but that will be another thread. This is only to show what a hefty performance penalty is exacted by having flashback turned on.
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.05.37.02@gmail.com...
| In the thread about LOB storage, Jonathan objected that I haven't really
|
| The event which caused the most time to be spent waiting was, as is to be
| expected, "direct path read".
Why do you expect to read when all you're doing is writing - (I think I know the answer to that question):
It would be interesting to see the session stats to see what the reads were, and how big the writes were.
I think the reads may have been "for flashback new" - to modify a block Oracle may have to read it and put it in the flashback log for writing before modifying it. In the case of very high volume the I/O for a LOB insert turns from "write" to "read / write (flashback) / write (lob)". (You may find that if you have a new tablespace the flashback activity is hugely reduced)
What's the full DDL for the table (use dbms_metadata.get_ddl) - are the LOBs nocache nologging, for example, and can this affect the flashback rate.
Mladen Gogala wrote,on my timestamp of 12/05/2012 3:37 PM:
> This is only to show what a hefty performance penalty is exacted by
> having flashback turned on.
A few weeks ago an Oracle "expert" insinuated I was a "bad dba" for refusing to use flashback in our production DW while testing new BI code releases.
I gave up on it a long time ago after watching the performance hit on some typical loads. Instead, I provide an exact copy of production - down to the same SAN LUN types and numbers - to break as they please. Fun and games to keep the optimizer behaving the same in both instances, but I can live with that.
Flashback has to read and then write the data somewhere, somehow. And when it is written in the flashback area, it is potentially from multiple tablespaces into one set of devices. So much for I/O spreading. Not easily tunable - I don't think using "hidden" parameters is a reliable long term tuning idea - and I/O to flashback includes data and indexes.
Of course: there are special cases where it is useful and doesn't terribly harm performance. On standby dbs for testing DR, for example. But on a DW? And with large loads and data cleansing - typical ETL cycle - happening all day?
No, flashback can't possibly, ever, cause a performance hit under typical DW process loads!... Ah well, how "bad dba" of me to actually care about performance instead of features. What can I say.
On Sat, 12 May 2012 07:26:46 +0100, Jonathan Lewis wrote:
> What's the full DDL for the table (use dbms_metadata.get_ddl) - are the
> LOBs nocache nologging, for example, and can this affect the flashback
> rate.
Jonathan, no need for DBMS_METADATA. I have the DDL:
create table bin_files(
fileno number(4,0),
filename varchar2(256),
content BLOB)
LOB(content) store as (tablespace test_lob disable storage in row nocache)
tablespace test_tbs
/
It was late when I started working on the test, I forgot the nologging clause. I re-created the table, the DDL now looks like this:
On Sat, 12 May 2012 07:26:46 +0100, Jonathan Lewis wrote:
> Why do you expect to read when all you're doing is writing - (I think I
> know the answer to that question):
When writing LOB data, oracle first has to prepare the space. It first has to read the "chunk" into the PGA. I proved this by increasing chunk to 32k. I could probably eliminate the direct reads completely, by changing the lob to be cached (blocks read during into the SGA). Of course, I would then get db file scattered reads instead.
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.18.08.31@gmail.com...
| On Sat, 12 May 2012 07:26:46 +0100, Jonathan Lewis wrote:
|
| > What's the full DDL for the table (use dbms_metadata.get_ddl) - are the
| > LOBs nocache nologging, for example, and can this affect the flashback
| > rate.
|
| Jonathan, no need for DBMS_METADATA. I have the DDL:
|
When puzzled I always look at the output from dbms_metadata in case something I had forgotten about has sneaked into the definition.
LOB ("CONTENT") STORE AS BASICFILE (
| TABLESPACE "TEST_LOB" DISABLE STORAGE IN ROW CHUNK 8192 RETENTION
| NOCACHE NOLOGGING
| STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
| PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
| BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) ;
|
As a test, create the LOB as CACHE.
Did you check the session stats to see what your reads were ?
On Sat, 12 May 2012 19:49:36 +0100, Jonathan Lewis wrote:
> "Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
> news:pan.2012.05.12.18.08.31@gmail.com...
> | On Sat, 12 May 2012 07:26:46 +0100, Jonathan Lewis wrote:
> |
> | > What's the full DDL for the table (use dbms_metadata.get_ddl) - are
> the | > LOBs nocache nologging, for example, and can this affect the
> flashback | > rate.
> |
> | Jonathan, no need for DBMS_METADATA. I have the DDL:
> |
> When puzzled I always look at the output from dbms_metadata in case
> something I had forgotten about has sneaked into the definition.
It seems that flashback operations are synchronous only for the "direct" operations, which bypass the SGA. When I come to think of that, it is a logical behavior of a sort. Also, Oracle prevents me from specifying "NOLOGGING" with CACHE, which is sort of strange. The script now completes in less than 9 minutes, which is excellent, after having started with 20 minutes.
mgogala@medo tmp]$ time ./test_lob
real 8m59.722s
user 0m6.372s
sys 0m7.414s
Here is the entire script, if someone else wants to get involved. One only needs to modify the directory and the maximum LOB size in the connection (LongReadLen) if the files to be loaded can exceed 10M in size.
#!/usr/bin/env perl
use warnings;
use strict;
use DBI;
use DBD::Oracle qw(:ora_types);
my $dbh = db_connect("scott", "tiger");
$dbh->do("alter session set events='10046 trace name context forever, level 12'");
my @files = glob("/home/mgogala/mp3/misc/*");
my $cnt = 0;
my $ins = "insert into bin_files values(?,?,?)";
my $sth=$dbh->prepare($ins);
foreach my $f (@files) {
my $cont=slurp($f);
$sth->bind_param(1,$cnt++);
$sth->bind_param(2,$f);
$sth->bind_param(3,$cont,{ora_type=>SQLT_BIN});
$sth->execute();
sub slurp {
my $name=shift;
local $/=undef;
my $fd;
open($fd," < ",$name) or die(" cannot open $name for reading : $! \n ");
my $cont=<$fd>;
close($fd);
return($cont);
On Sat, 12 May 2012 23:41:21 +1000, Noons wrote:
> No, flashback can't possibly, ever, cause a performance hit under
> typical DW process loads!... Ah well, how "bad dba" of me to actually
> care about performance instead of features. What can I say.
Flashback records almost identical information as redo. The main difference between the two is that redo logs are not supposed to be read, except in case of instance recovery. Redo logs are the main mechanism that provides the "D" in ACID.
Flashback logs, just like their redo counterparts, record change vectors, in order to be able to reconstruct the data, up to specific point in time. The default is 24 hours, controlled by the
"db_flashback_retention_target" parameter.
If the block being changed is in SGA, that is being done by a background process named "FBDA". If the block is not in the SGA, as in the case when LOB column is created with "NOCACHE" option, that must be done synchronously, by invoking a library function, because background processes cannot access PGA of another session. In other words, one must wait for flashback information to be written. When doing a direct load with NOLOGGING clause in effect, one avoids waiting for the redo to be written to disk but it doesn't avoid flashback.
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.19.46.20@gmail.com...
| On Sat, 12 May 2012 19:49:36 +0100, Jonathan Lewis wrote:
|
| > "Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
| > news:pan.2012.05.12.18.08.31@gmail.com...
| > | On Sat, 12 May 2012 07:26:46 +0100, Jonathan Lewis wrote:
| > |
| > | > What's the full DDL for the table (use dbms_metadata.get_ddl) - are
| > the | > LOBs nocache nologging, for example, and can this affect the
| > flashback | > rate.
| > |
| > | Jonathan, no need for DBMS_METADATA. I have the DDL:
| > |
| >
| > When puzzled I always look at the output from dbms_metadata in case
| > something I had forgotten about has sneaked into the definition.
| >
| > LOB ("CONTENT") STORE AS BASICFILE (
| > | TABLESPACE "TEST_LOB" DISABLE STORAGE IN ROW CHUNK 8192 RETENTION |
| > NOCACHE NOLOGGING | STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1
| > MAXEXTENTS 2147483645 | PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 |
| > BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) ;
| > |
| >
| > As a test, create the LOB as CACHE.
| >
| > Did you check the session stats to see what your reads were ?
|
| The plot thickens. I re-created the LOB as cache and flashback waits went
| away:
|
| EVENT TIME_WAITED AVERAGE_WAIT
| ------------------------------ ----------- ------------
| Disk file operations I/O 0 .02
| log buffer space 0 0
| log file sync 1 .77
| SQL*Net message to client 1 0
| log file switch completion 23 5.77
| SQL*Net message from client 523 1.02
| SQL*Net more data from client 1323 .01
| db file sequential read 33915 .14
|
|
| The DDL is now this:
|
|
| CREATE TABLE "SCOTT"."BIN_FILES"
| ( "FILENO" NUMBER(4,0),
| "FILENAME" VARCHAR2(256),
| "CONTENT" BLOB
| ) SEGMENT CREATION IMMEDIATE
| PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
| NOCOMPRESS LOGGING
| STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
| PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
| BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
| TABLESPACE "TEST_TBS"
| LOB ("CONTENT") STORE AS BASICFILE "MUSIC"(
| TABLESPACE "TEST_LOB" DISABLE STORAGE IN ROW CHUNK 32768 RETENTION
| CACHE
| STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
| PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
| BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) ;
|
| It seems that flashback operations are synchronous only for the "direct"
| operations, which bypass the SGA. When I come to think of that, it is a
| logical behavior of a sort. Also, Oracle prevents me from specifying
| "NOLOGGING" with CACHE, which is sort of strange. The script now
| completes in less than 9 minutes, which is excellent, after having
| started with 20 minutes.
|
| mgogala@medo tmp]$ time ./test_lob
|
| real 8m59.722s
| user 0m6.372s
| sys 0m7.414s
|
| Here is the entire script, if someone else wants to get involved. One
| only needs to modify the directory and the maximum LOB size in the
| connection (LongReadLen) if the files to be loaded can exceed 10M in
| size.
|
| #!/usr/bin/env perl
| use warnings;
| use strict;
| use DBI;
| use DBD::Oracle qw(:ora_types);
|
| my $dbh = db_connect("scott", "tiger");
| $dbh->do("alter session set events='10046 trace name context forever,
| level 12'");
| my @files = glob("/home/mgogala/mp3/misc/*");
| my $cnt = 0;
| my $ins = "insert into bin_files values(?,?,?)";
| my $sth=$dbh->prepare($ins);
|
| foreach my $f (@files) {
| my $cont=slurp($f);
| $sth->bind_param(1,$cnt++);
| $sth->bind_param(2,$f);
| $sth->bind_param(3,$cont,{ora_type=>SQLT_BIN});
| $sth->execute();
| }
|
| $dbh->commit();
|
| END {
| $dbh->disconnect() if defined($dbh);
| }
|
|
| sub db_connect {
| my ( $username, $passwd, $db ) = ( @_, $ENV{"TWO_TASK"} );
| if ( $username =~ m|/| ) {
| my @tmp = split( /[\/\@]+/, $username );
| $username = $tmp[0];
| $passwd = $tmp[1];
| $db = $tmp[2] || $ENV{"TWO_TASK"} ";
| }
|
| $db ||= "";
| my $dbh = DBI->connect( "dbi:Oracle:$db", $username, $passwd )
| || die( $DBI::errstr . " \n " );
| $dbh->{AutoCommit} = 0;
| $dbh->{RaiseError} = 1;
| $dbh->{ora_check_sql} = 0;
| $dbh->{RowCacheSize} = 1024;
| $dbh->{ora_array_chunk_size} = 1024;
| $dbh->{LongReadLen} = 10000000;
| return ($dbh);
| }
|
| sub slurp {
| my $name=shift;
| local $/=undef;
| my $fd;
| open($fd," < ",$name) or die(" cannot open $name for reading : $! \n
| ");
| my $cont=<$fd>;
| close($fd);
| return($cont);
| }
|
|
| -- | http://mgogala.byethost5.com
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.19.03.30@gmail.com...
| On Sat, 12 May 2012 19:49:36 +0100, Jonathan Lewis wrote:
|
|
| > Did you check the session stats to see what your reads were ?
|
| I turned on 10046 tracing. Reads were from the TEST_LOB tablespace, the
| LOB segmentL
|
|
The 10046 trace tells us what they were, not why they happened.
It's important to see (for example) things like "undo blocks read for flashback new".
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.19.46.20@gmail.com...
| On Sat, 12 May 2012 19:49:36 +0100, Jonathan Lewis wrote:
|
|
| The plot thickens. I re-created the LOB as cache and flashback waits went
| away:
|
| EVENT TIME_WAITED AVERAGE_WAIT
| ------------------------------ ----------- ------------
| Disk file operations I/O 0 .02
| log buffer space 0 0
| log file sync 1 .77
| SQL*Net message to client 1 0
| log file switch completion 23 5.77
| SQL*Net message from client 523 1.02
| SQL*Net more data from client 1323 .01
| db file sequential read 33915 .14
|
I'm not entirely surprised. The algorithm for when a block should be written to the flashback log relates to when it was last changed, when it last went into the flashback, and when it was last written. If your LOBs are direct path reads and writes all the time it's possible that the default action is to write the block into the flashback log because the information that is needed to minimise flashback logging doesn't get created on direct path actions.
When you can get LOBs over-written, though (i.e. LOB space re-used rather than constantly "newing" previously unused block), I think you may start to see flashback log writes re-appearing
Sorry about the empty previous post - finger trouble.
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.12.21.10.26@gmail.com...
|
| Flashback logs, just like their redo counterparts, record change vectors,
| in order to be able to reconstruct the data, up to specific point in
| time. The default is 24 hours, controlled by the
| "db_flashback_retention_target" parameter.
|
Flashback logs hold entire Oracle blocks (think Oracle 5 and the BI file).
However they don't (normally) hold a copy of every version of the block that has every existed; you should get something like one copy per flashback log file to cover all the changes that have happened to a block since it was most recently loaded into the buffer cache (compare redo logging full blocks when tablespaces are in backup mode)
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.13.07.43.08@gmail.com...
| On Sun, 13 May 2012 07:54:15 +0100, Jonathan Lewis wrote:
|
| > Flashback logs hold entire Oracle blocks (think Oracle 5 and the BI
| > file).
|
| Wow! Not that I don't believe you, but that is really wild. I'll try
| checking it by doing a hex dump of some .fb files and confirm this.
|
On Sun, 13 May 2012 09:56:11 +0100, Jonathan Lewis wrote:
> | Wow! Not that I don't believe you, but that is really wild. I'll try |
> checking it by doing a hex dump of some .fb files and confirm this. |
> I wouldn't want you to do otherwise.
And you are right. Current record: block image. I still don't understand why did they do it like that.
Trace file /oracle/diag/rdbms/o11/O11/trace/O11_ora_2567.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.2.0/11.2.0.3
System name: Linux
Node name: medo.home.com
Release: 3.3.5-2.fc16.x86_64
Version: #1 SMP Tue May 8 11:24:50 UTC 2012
Machine: x86_64
Instance name: O11
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 2567, image: ora...@medo.home.com
"Mladen Gogala" <gogala.mla...@gmail.com> wrote in message
news:pan.2012.05.14.03.40.04@gmail.com...
| On Sun, 13 May 2012 09:56:11 +0100, Jonathan Lewis wrote:
|
| > | Wow! Not that I don't believe you, but that is really wild. I'll try |
| > checking it by doing a hex dump of some .fb files and confirm this. |
| >
| > I wouldn't want you to do otherwise.
|
| And you are right. Current record: block image. I still don't understand
| why did they do it like that.
|