I have the following situation; please give me some ideas on how to
solve or work around it.
Current situation: Every day, about 10 million records are processed
and bulk inserted into individual tables, around 20K records at a
time. At the end of the day, a clustered index is created on the
table. The field used for indexing is not a primary key; in fact, the
table has no primary key. After indexing, read access is granted and
users run reports from the frontend using the clustered index field.
Hope to achieve: The table will have a clustered index from the start,
and the bulk insert (10 million records/day) will be done on this
indexed table. At the same time, users will have access to the table
to run reports.
Now, we all know that inserting data into an indexed table (especially
a clustered one) is a bad idea. So is it completely impossible? I have
seen the same thing done in an Oracle database. Is it possible in SQL
Server?
From this blog: http://www.sqljunkies.com/WebLog/aferrari/archive/2007/07/02/41931.aspx
it seems that if I pre-sort the data before inserting, I can overcome
this performance issue. But then bulk insert would be useless here.
Please help.
Thanks in advance.
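For reference, the current end-of-day process could be sketched roughly as follows (the table name, column name, and file path are made-up placeholders, not the poster's actual script):

```sql
-- Sketch of the current daily workflow (all names are hypothetical).
-- 1. Load the day's data into the unindexed (heap) table,
--    committing in batches of 20K rows.
BULK INSERT dbo.DailyRecords
FROM 'D:\feeds\records.dat'
WITH (BATCHSIZE = 20000, TABLOCK);

-- 2. At the end of the day, build the clustered index
--    so that reports can seek on it.
CREATE CLUSTERED INDEX IX_DailyRecords_ReportKey
ON dbo.DailyRecords (ReportKey);
```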
Whether it is a bad idea to insert a lot of rows into a clustered
index depends on the key values already present in the table and on
the key values of the rows being inserted.
If they intermix, you will get page splits, which can take
considerable time, and for reporting purposes (assuming a lot of
sequential I/O) the table will not perform as well as a defragmented
one.
If the new data always goes at the end, for example if the clustered
key is an order date and only new orders are added, then it is a pure
waste of time to drop the clustered index and recreate it after the
insert operation. Depending on your CREATE INDEX settings
(specifically the fill factor), you might actually waste space (and
therefore performance) by recreating the clustered index.
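If the key really is ever-increasing, a one-time index build with a full fill factor avoids both the nightly rebuild and the wasted page space. A minimal sketch, assuming a hypothetical table with an ascending OrderDate key:

```sql
-- Build the clustered index once, before any data is loaded
-- (table and column names are hypothetical).
-- FILLFACTOR = 100 leaves no free space on leaf pages, which is
-- fine when new rows always sort to the end of the index, so no
-- page splits occur in the middle.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH (FILLFACTOR = 100);
```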
In short: it depends on your data, and on whether the processing time
of the insert plus the index creation matters more to you than the
performance of the report generation.
--
Gert-Jan
SQL Server MVP
The sort optimises disk access.
If you've got users querying the table at the same time, make your
batches small, say 1000 rows. If the sort has done its job, there
won't be many dirty pages to flush when the checkpoint runs, but it
might be worth trying different batch sizes to see the effect.
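Putting those two suggestions together, a sketch of a sorted, small-batch load into the already-indexed table (file path, table, and column names are assumptions): BULK INSERT's ORDER hint declares that the data file is already sorted on the clustered key, and BATCHSIZE commits every 1000 rows so concurrent readers are not blocked for long.

```sql
-- Load pre-sorted data in small batches into the clustered table
-- (all names here are hypothetical).
BULK INSERT dbo.Orders
FROM 'D:\feeds\orders.dat'
WITH (
    ORDER (OrderDate ASC),  -- file is pre-sorted on the clustered key
    BATCHSIZE = 1000        -- commit every 1000 rows; keeps locks short
);
```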