At the end of the day, a few stored procedures will have to be run
that produce 4 or 5 reports within a few minutes. Each of the reports
will contain about a million rows. Afterward, about 50 other reports
will be run. All of the reports will run against the data inserted
during the day, with the exception of maybe .1% of the data which will
reference data from a previous day.
I fully recognise i'm not providing sufficient detail for anyone to
provide very accurate recommendations. What I'm curious to know if
it's too tall an order for Oracle, or if it will require hadoop, or
similar technology. I've done quite a bit of SQL Server work, but not
nearly to this scale.
thx
You are in rigth path. Go ahead, just find rigth questions.
Maybe you would use partitioned table.
http://www.psoug.org/reference/partitions.html
Arne Ortlinghaus
ACS Data Systems
Some relevant questions:
Data amount in TB?
Which hardware brand you wish to use?
Contact oracle and ask "Oracle Optimized Warehouse"
You will get that optimized system running, delivered and cost efective very
fast.
Have fun!
What has been mentioned by others... Oracle RAC, 96GB RAM, several
hundred TB of SAN storage. The hardware budget is bigger than the GDP
of some countries!!!!!
If the budget is high enough, I'd consider a full rack Exadata Server.
Shakespeare
Richard, I think the answer depends on what and how the data will be
used for. That is how will the data be queried? Will it permanently
reside in the database or is the database just a holding point till
the data is filtered and transferred to its pemanent home? Is the
initial insert done to the data's final store within the database or
is it moved within the database?
I am not a big fan of using sqlldr direct path loads in a production
environment. If a direct path load job fails then the indexes become
unusable and require rebuilding. Having to rebuild large indexes on a
massive single point of insertion table pretty much brings the system
to a halt. If there will be multiple, concurrent sources of data
intput I would insist on using convention load or using programs
written to use bulk inserts or the older array insert feature of pro*
languages.
HTH -- Mark D Powell --
snip
> Richard, I think the answer depends on what and how the data will be
> used for. That is how will the data be queried? Will it permanently
> reside in the database or is the database just a holding point till
> the data is filtered and transferred to its pemanent home? Is the
> initial insert done to the data's final store within the database or
> is it moved within the database?
>
> I am not a big fan of using sqlldr direct path loads in a production
> environment. If a direct path load job fails then the indexes become
> unusable and require rebuilding. Having to rebuild large indexes on a
> massive single point of insertion table pretty much brings the system
> to a halt. If there will be multiple, concurrent sources of data
> intput I would insist on using convention load or using programs
> written to use bulk inserts or the older array insert feature of pro*
> languages.
What the OP seemed to describe is a daily load cycle and if I
understood it correctly I would seriously think about using some kind
of daily staging table that accumulates the input. It possibly could
have no real useful indexes on it ... just have data dumped into it
( depending on what and where validation is needed ) ... then near the
end of the cycle use it to go elsewhere.
Partitioning might be a possibility. Inserting 500 million rows over
8 hours is pretty trivial on even most modest servers.
Designing properly the storage structures to handle 6 months of this
stuff then ditch it ( perhaps ) in some fashion or move it out
elsewhere to me sounds like where the business rules would really need
to be understood and clarified.
Or even better, move to Teradata ;-)
Cheers.
Carlos.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.
Here are my not very accurate recommendations, for the reasons you pointed out.
Partition by day. Make the day's tables separate, manipulate them for the daily
reporting as needed. Add them to partitioned tables at end of day and run
reports across days then. At end of 6 months, drop partitions off the start of
the table.
Make sure no one comes up with some nonsense of storing LOBs at those volumes!
No matter what database technology is used, you'll need some serious disk
capacity. As well as memory and CPU. Look at compression at database layer.
You're definitely looking at 11gr1 or even r2: some improvements in partitioning
in those releases that would definitely help.
At these sort of volumes, the secret is always the same: storage architecture
must avoid having to traverse data not immediately needed. Whatever the db.
In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.
We partitioned by day and subpartitioned by 'loader'. We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.
>Or even better, move to Teradata ;-)
But...Exa is a million times better than Tera! ;-)
I wish I had such a beast of a database server under my control
though...heck even the minimal configuration of an Oracle Database Machine
is far more powerful than most customers' setups.
And by far more expensive too.... still would love to have a full
rack... If I were half as rich as Larry, I'd buy one.
Shakespeare
(What's in a buck?)
-------------------
Nowadays it would be couple of (ten) thousands rows.
Advanced compression will save some TB of storage
http://www.oracle.com/database/advanced-compression.html
Also disk I/O will be faster
> Advanced compression will save some TB of storage
> http://www.oracle.com/database/advanced-compression.html
> Also disk I/O will be faster
Not really: compression does not change the speed of disk I/O one atom.
It just makes less use of it.