500 million rows every (8 hour) day

Richard Last

unread,

Nov 23, 2009, 9:48:53 PM11/23/09

to

I've been asked to investigate how to best build an Oracle DB that
will be able to handle the insertion of 500 million rows every 8 hour
work day. The data will be batch inserted via sql loader every few
seconds, into two main tables, distributed about 8:1. Each table is
about 30 columns wide. Around 6 months of data will be retained.

At the end of the day, a few stored procedures will have to be run
that produce 4 or 5 reports within a few minutes. Each of the reports
will contain about a million rows. Afterward, about 50 other reports
will be run. All of the reports will run against the data inserted
during the day, with the exception of maybe .1% of the data which will
reference data from a previous day.

I fully recognise i'm not providing sufficient detail for anyone to
provide very accurate recommendations. What I'm curious to know if
it's too tall an order for Oracle, or if it will require hadoop, or
similar technology. I've done quite a bit of SQL Server work, but not
nearly to this scale.

thx

Jack

unread,

Nov 24, 2009, 2:21:16 AM11/24/09

to

"Richard Last" <richa...@yahoo.com> wrote in message
news:884bd094-e613-47fe...@e7g2000vbi.googlegroups.com...

You are in rigth path. Go ahead, just find rigth questions.
Maybe you would use partitioned table.
http://www.psoug.org/reference/partitions.html

Arne Ortlinghaus

unread,

Nov 24, 2009, 3:27:47 AM11/24/09

to

Surely it will depend much on quick hard drives especially for writing and
enough main memory for the SGA to hold the indexes needed and much of the
data of one day.

Arne Ortlinghaus
ACS Data Systems

Jack

unread,

Nov 24, 2009, 3:33:10 AM11/24/09

to

"Richard Last" <richa...@yahoo.com> wrote in message
news:884bd094-e613-47fe...@e7g2000vbi.googlegroups.com...

Some relevant questions:
Data amount in TB?
Which hardware brand you wish to use?
Contact oracle and ask "Oracle Optimized Warehouse"

You will get that optimized system running, delivered and cost efective very
fast.

Have fun!

Richard Last

unread,

Nov 24, 2009, 8:55:01 AM11/24/09

to

On Nov 24, 3:27 am, "Arne Ortlinghaus" <Arne.Ortlingh...@acs.it>
wrote:

What has been mentioned by others... Oracle RAC, 96GB RAM, several
hundred TB of SAN storage. The hardware budget is bigger than the GDP
of some countries!!!!!

Shakespeare

unread,

Nov 24, 2009, 10:02:19 AM11/24/09

to

Richard Last schreef:

If the budget is high enough, I'd consider a full rack Exadata Server.

Shakespeare

Mark D Powell

unread,

Nov 24, 2009, 10:25:07 AM11/24/09

to

Richard, I think the answer depends on what and how the data will be
used for. That is how will the data be queried? Will it permanently
reside in the database or is the database just a holding point till
the data is filtered and transferred to its pemanent home? Is the
initial insert done to the data's final store within the database or
is it moved within the database?

I am not a big fan of using sqlldr direct path loads in a production
environment. If a direct path load job fails then the indexes become
unusable and require rebuilding. Having to rebuild large indexes on a
massive single point of insertion table pretty much brings the system
to a halt. If there will be multiple, concurrent sources of data
intput I would insist on using convention load or using programs
written to use bulk inserts or the older array insert feature of pro*
languages.

HTH -- Mark D Powell --

hpuxrac

unread,

Nov 24, 2009, 7:31:40 PM11/24/09

to

On Nov 24, 10:25 am, Mark D Powell <Mark.Powe...@hp.com> wrote:

snip

> Richard, I think the answer depends on what and how the data will be
> used for. That is how will the data be queried? Will it permanently
> reside in the database or is the database just a holding point till
> the data is filtered and transferred to its pemanent home? Is the
> initial insert done to the data's final store within the database or
> is it moved within the database?
>
> I am not a big fan of using sqlldr direct path loads in a production
> environment. If a direct path load job fails then the indexes become
> unusable and require rebuilding. Having to rebuild large indexes on a
> massive single point of insertion table pretty much brings the system
> to a halt. If there will be multiple, concurrent sources of data
> intput I would insist on using convention load or using programs
> written to use bulk inserts or the older array insert feature of pro*
> languages.

What the OP seemed to describe is a daily load cycle and if I
understood it correctly I would seriously think about using some kind
of daily staging table that accumulates the input. It possibly could
have no real useful indexes on it ... just have data dumped into it
( depending on what and where validation is needed ) ... then near the
end of the cycle use it to go elsewhere.

Partitioning might be a possibility. Inserting 500 million rows over
8 hours is pretty trivial on even most modest servers.

Designing properly the storage structures to handle 6 months of this
stuff then ditch it ( perhaps ) in some fashion or move it out
elsewhere to me sounds like where the business rules would really need
to be understood and clarified.

Carlos

unread,

Nov 25, 2009, 3:13:48 AM11/25/09

to

On Nov 24, 4:02 pm, Shakespeare <what...@xs4all.nl> wrote:
>
> If the budget is high enough, I'd consider a full rack Exadata Server.
>
> Shakespeare

Or even better, move to Teradata ;-)

Cheers.

Carlos.

Noons

unread,

Nov 25, 2009, 6:53:14 AM11/25/09

to

Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:

>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.

Here are my not very accurate recommendations, for the reasons you pointed out.

Partition by day. Make the day's tables separate, manipulate them for the daily
reporting as needed. Add them to partitioned tables at end of day and run
reports across days then. At end of 6 months, drop partitions off the start of
the table.
Make sure no one comes up with some nonsense of storing LOBs at those volumes!

No matter what database technology is used, you'll need some serious disk
capacity. As well as memory and CPU. Look at compression at database layer.
You're definitely looking at 11gr1 or even r2: some improvements in partitioning
in those releases that would definitely help.

At these sort of volumes, the secret is always the same: storage architecture
must avoid having to traverse data not immediately needed. Whatever the db.

zifn...@gmail.com

unread,

Nov 25, 2009, 7:45:09 AM11/25/09

to

In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.

We partitioned by day and subpartitioned by 'loader'. We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.

Kai Rode

unread,

Nov 25, 2009, 1:14:04 PM11/25/09

to

>> If the budget is high enough, I'd consider a full rack Exadata Server.

>Or even better, move to Teradata ;-)

But...Exa is a million times better than Tera! ;-)

I wish I had such a beast of a database server under my control
though...heck even the minimal configuration of an Oracle Database Machine
is far more powerful than most customers' setups.

Shakespeare

unread,

Nov 25, 2009, 3:57:06 PM11/25/09

to

Kai Rode schreef:

And by far more expensive too.... still would love to have a full
rack... If I were half as rich as Larry, I'd buy one.

Shakespeare
(What's in a buck?)

Jack

unread,

Nov 26, 2009, 2:25:22 AM11/26/09

to

"scott...@comcast.net" <zifn...@gmail.com> wrote in message
news:83513236-dd64-4273...@m16g2000yqc.googlegroups.com...

-------------------
Nowadays it would be couple of (ten) thousands rows.

Advanced compression will save some TB of storage
http://www.oracle.com/database/advanced-compression.html
Also disk I/O will be faster

Noons

unread,

Nov 28, 2009, 7:57:49 AM11/28/09

to

Jack wrote,on my timestamp of 26/11/2009 6:25 PM:

> Advanced compression will save some TB of storage
> http://www.oracle.com/database/advanced-compression.html
> Also disk I/O will be faster

Not really: compression does not change the speed of disk I/O one atom.
It just makes less use of it.