
Performance Issues for Huge Data Import/Insert


Permood

Jun 23, 2005, 1:54:01 PM
Hi Experts,
I need to insert 60 million records, or a 6 GB fixed-width text file, into SQL
Server 2000. There are some problems with the data; for example, the date
columns are 6 characters (mmddyy format) and need to be converted into
mm/dd/yyyy format.

Currently we have SQL scripts which import all the data into a ONE-COLUMN table
using a BULK INSERT statement, and from there we use the SUBSTRING function to
split the data out into our main tables. Now we are developing a VB.NET
application which will insert the data directly into the main tables.
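
For illustration, here is a simplified sketch of what our current script does
(the table, column, file names, and column widths below are made up for this
example):

-- One-column staging table that receives the raw fixed-width lines.
CREATE TABLE StageRaw (RawLine CHAR(100) NOT NULL)

BULK INSERT StageRaw
FROM 'D:\feeds\customers_fixed.txt'
WITH (ROWTERMINATOR = '\n', TABLOCK)

-- Carve each line into real columns with SUBSTRING; the 6-char mmddyy
-- date is rebuilt as mm/dd/yyyy (century assumed to be 20xx).
INSERT INTO Customers (CustomerId, CustomerName, SignupDate)
SELECT
    CAST(SUBSTRING(RawLine, 1, 10) AS INT),
    RTRIM(SUBSTRING(RawLine, 11, 40)),
    SUBSTRING(RawLine, 51, 2) + '/' + SUBSTRING(RawLine, 53, 2)
        + '/20' + SUBSTRING(RawLine, 55, 2)
FROM StageRaw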

Performance is the main concern now, because the VB.NET application takes twice
the amount of time the SQL script used to take.

I have the following questions:

1- Is there any way I can stop transaction logging (we are using the simple
recovery model) so the data can be inserted more rapidly?
2- What should be the ideal size of tempdb and of the data and log files so
they don't have to grow over and over during the insert? (A total of 30 GB of
data needs to be inserted.)
3- Can I use a DTS package and combine the transform and bulk insert operations
in one step instead of two?
4- Is there anything else I can do to reduce the insert/import time? (Currently
it takes 90 minutes when we use SQL scripts for the 6 GB file and 200 minutes
when we use the VB.NET application.)
5- What would be a reasonable time for this kind of huge import?


Thank you for your help.

Regards,
-Permood

JT

Jun 23, 2005, 2:11:55 PM
When importing large numbers of records, consider setting the database
"recovery model" setting to "simple". If this is also an OLTP database, reset
it back to "full" when you are finished. I don't know what you mean by using a
VB.NET application to insert the data directly. Inserting records one at a
time, or even in batch mode via datasets, is going to be slow. Stick with BCP
for the best performance, or DTS if you want to perform transformations as you
import. Another important consideration that many forget is to drop indexes
from the target table before the process begins and re-create them afterward.
This cuts the amount of CPU and I/O time needed to update indexes during bulk
inserts and also helps minimize transaction logging. It could also help to
alter the database to RESTRICTED_USER mode (only dbo may log in) and back to
MULTI_USER when finished.
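
Something along these lines, with placeholder database, table, and index names:

-- Adjust the names to your own objects; this is only a sketch.
ALTER DATABASE ImportDB SET RECOVERY SIMPLE
ALTER DATABASE ImportDB SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE

-- Drop indexes on the target table before the load...
DROP INDEX Customers.IX_Customers_SignupDate

-- ...run the bulk load here...

-- ...then rebuild them and reopen the database when finished.
CREATE INDEX IX_Customers_SignupDate ON Customers (SignupDate)
ALTER DATABASE ImportDB SET MULTI_USER
-- For an OLTP database, also: ALTER DATABASE ImportDB SET RECOVERY FULL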

"Permood" <Per...@discussions.microsoft.com> wrote in message
news:9A78306D-2C7F-46B2...@microsoft.com...

Alejandro Mesa

Jun 23, 2005, 2:23:02 PM
> I need to insert 60 million records, or a 6 GB fixed-width text file, into SQL
> Server 2000. There are some problems with the data; for example, the date
> columns are 6 characters (mmddyy format) and need to be converted into
> mm/dd/yyyy format.

Better to use yyyymmdd (see CONVERT in BOL, style 112); SQL Server will
understand it no matter what "SET DATEFORMAT" and language settings are
currently in use.
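
For example, a small sketch of turning the 6-character mmddyy field into the
unambiguous yyyymmdd form and then into DATETIME (the column and table names
are hypothetical, and the century is assumed to be 20xx):

-- RawDate is a hypothetical CHAR(6) column in mmddyy order, e.g. '062305'.
SELECT CONVERT(DATETIME,
               '20' + SUBSTRING(RawDate, 5, 2)    -- yyyy (century assumed)
                    + SUBSTRING(RawDate, 1, 2)    -- mm
                    + SUBSTRING(RawDate, 3, 2),   -- dd
               112) AS LoadDate
FROM StageRaw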

> Currently we have SQL scripts which import all the data into a ONE-COLUMN table

Why are you importing the data into just one column? You can use DTS to load a
fixed-width file format and map the data into different columns.

> 1- Is there any way I can stop transaction logging (we are using the simple
> recovery model) so the data can be inserted more rapidly?

Whether you use DTS, BCP, or BULK INSERT to import the data, the idea is to do
it in batches rather than in one huge batch. For example, if you are using
BULK INSERT, take a look at the ROWS_PER_BATCH argument.
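
A minimal BULK INSERT sketch along those lines (the table name and file path
are placeholders):

BULK INSERT StageRaw
FROM 'D:\feeds\customers_fixed.txt'
WITH (
    DATAFILETYPE  = 'char',
    ROWTERMINATOR = '\n',
    BATCHSIZE     = 100000,   -- commit every 100,000 rows instead of one huge transaction
    TABLOCK                   -- table lock helps the load qualify for minimal logging
)
-- ROWS_PER_BATCH (used without BATCHSIZE) is instead a hint of the total row
-- count that helps the optimizer plan the load; see BOL for the details.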

> 3- Can I use a DTS package and combine the transform and bulk insert operations
> in one step instead of two?

If you follow this approach, then it will be like processing the batch one row
at a time (not exactly, but similar, I think). If you load the data into an
intermediate table, you can then use set-based operations to route the data to
the corresponding tables. Either way, you should test both approaches to see
which one gives better performance.


AMB

Permood

Jun 23, 2005, 2:41:08 PM
JT & Alejandro Mesa,
Thanks for your remarks.
We are using the simple recovery model. These databases are used for analysis,
so I have no need for the full recovery model, and we destroy this data every
month. Also, I don't have any indexes on these tables when I am importing the
data. What I meant by "VB.NET" is that the application uses the ADO.NET 2.0
SqlBulkCopy method with a 100,000-row batch size; I have tried batch sizes from
10,000 to 100,000. Still, this import process is slow (or so I believe), even
though the server machine has 4 GB of RAM. The only thing I haven't tried is a
DTS package. One reason I haven't tried DTS yet is that SQL Server 2000 DTS
packages are not compatible with SQL Server 2005, and if down the road we move
to SQL Server 2005 I would have to recreate all of the DTS packages, and there
are more than 60 of them (currently SQL scripts).
How about the tempdb size and the database's LDF and MDF file sizes: would
increasing them help performance?

Is there anything else I can do to make this process faster?

Once again, thanks for your time and help.

JT

Jun 23, 2005, 3:14:33 PM
If you import 6 GB over a one-hour period and the log grows in 100 MB
increments, that is about one log extension every minute. Therefore, check the
autogrow properties of your data and transaction log files and ensure they grow
by a percentage of at least 10%, or by 1000 MB (1 GB) at a time. You can always
truncate the log after the import process is over. This reduces the number of
times that SQL Server has to re-adjust the file size while importing. It also
helps if the transaction log is physically located on a separate hard drive
from the data files, because that allows the hardware devices to operate
independently.
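
A sketch of pre-sizing the files and adjusting autogrow so they do not grow in
small increments during the load (the database name, logical file names, and
sizes are placeholders; check yours with sp_helpfile):

ALTER DATABASE ImportDB
MODIFY FILE (NAME = ImportDB_Data, SIZE = 35000MB, FILEGROWTH = 10%)

ALTER DATABASE ImportDB
MODIFY FILE (NAME = ImportDB_Log, SIZE = 4000MB, FILEGROWTH = 1000MB)

-- After the import is over, the log can be truncated (SQL Server 2000 syntax):
BACKUP LOG ImportDB WITH TRUNCATE_ONLY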
You could also try splitting the text file into several smaller files instead
of one large file and then running two bulk copy processes simultaneously. I
have found that two concurrent BCP processes yield better performance than one,
or even more than two, because BCP alternates between reading a batch from the
source file and inserting into the target table, leaving idle bandwidth for
another process. With more than two processes, the third always seems to be
blocked by the first two. However, running multiple BCP processes will bring
the server to a crawl for any other users.
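
The same idea, sketched here as two BULK INSERT statements against a pre-split
file rather than two BCP command lines (paths and names are hypothetical); run
each from its own connection so the two loads overlap:

-- Connection/session 1:
BULK INSERT StageRaw FROM 'D:\feeds\customers_part1.txt'
WITH (DATAFILETYPE = 'char', ROWTERMINATOR = '\n', BATCHSIZE = 100000)

-- Connection/session 2:
BULK INSERT StageRaw FROM 'D:\feeds\customers_part2.txt'
WITH (DATAFILETYPE = 'char', ROWTERMINATOR = '\n', BATCHSIZE = 100000)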


"Permood" <Per...@discussions.microsoft.com> wrote in message

news:6AF25AFC-8C20-477D...@microsoft.com...
