Google Groups Home
Help | Sign in
Performance Issues for Huge Data import/insert
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  5 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Permood  
View profile  
 More options Jun 23 2005, 1:54 pm
Newsgroups: microsoft.public.sqlserver.programming
From: "Permood" <Perm...@discussions.microsoft.com>
Date: Thu, 23 Jun 2005 10:54:01 -0700
Local: Thurs, Jun 23 2005 1:54 pm
Subject: Performance Issues for Huge Data import/insert
Hi Experts,
I need to insert 60 millions records or 6GB Fixed width text File into SQL
Server 2000. There are some problems with data, like date columns are in 6
char (mmddyy format) which needs to be convert into mm/dd/yyyy format and
etc.

Currently we have SQL scripts which import all data into ONE COLUMN Table
using bulk Insert statement and then from there we use SUBSTRING Function to
sort data into our main tables. Now we are developing VB.NET application
which will insert data directly into main Tables.

Performance is the main concern now, because VB.Net application taken twice
the amount time as SQL Script used to take.

I have following questions:-

1-      Is there any way I can stop Transaction log (we are using simple
recovery) so data could be inserted more rapidly?
2-      What should be the idea size of Tempdb, data and log file so its don’t
have to grow over and over during the insert? (for 30 GB total data needs to
insert)
3-      Can I use DTS package and combine Transform and bulk insert operation
together in one step instead of two steps?
4-      Is there any thing else I can do to reduce the insert or import time?
(currently its taking 90 minutes when we use SQL Scripts for 6 GB file and
200 minutes when we use VB.NET application)
5-      What will be the ideal time for such kind of huge import?

Thank you for your help.

Regards,
-Permood


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
JT  
View profile  
 More options Jun 23 2005, 2:11 pm
Newsgroups: microsoft.public.sqlserver.programming
From: "JT" <some...@microsoft.com>
Date: Thu, 23 Jun 2005 14:11:55 -0400
Local: Thurs, Jun 23 2005 2:11 pm
Subject: Re: Performance Issues for Huge Data import/insert
When importing large amounts of records, consider setting the database
"recovery model" setting to "simple". If this is also a OLTP database, then
reset it back to "full" when you are finished. I don't know what you mean by
using a VB.NET application to insert the data directly. Insering record one
at a time or even in batch mode via datasets is going to be slow. Stick with
BCP for the best performance, or DTS if you want to perform transformations
as you import. Another important consideration that many forget is to drop
indexes from the target table before the process begins and then re-create
them afterward. This will cut the amount of CPU and IO time needed to update
indexes during bulk inserts and also help minimize transaction logging. It
could also help to alter the database to RESTRICTED_USER mode (only DBO may
login) and back to MULTI_USER when finished.

"Permood" <Perm...@discussions.microsoft.com> wrote in message

news:9A78306D-2C7F-46B2-9DF7-0AB8BC891412@microsoft.com...


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alejandro Mesa  
View profile  
 More options Jun 23 2005, 2:23 pm
Newsgroups: microsoft.public.sqlserver.programming
From: "Alejandro Mesa" <AlejandroM...@discussions.microsoft.com>
Date: Thu, 23 Jun 2005 11:23:02 -0700
Local: Thurs, Jun 23 2005 2:23 pm
Subject: RE: Performance Issues for Huge Data import/insert

> I need to insert 60 millions records or 6GB Fixed width text File into SQL
> Server 2000. There are some problems with data, like date columns are in 6
> char (mmddyy format) which needs to be convert into mm/dd/yyyy format and
> etc.

Better to use yyyymmdd (see convert in BOL, style 112), and sql server will
understand them no matter the settings of "SET DATEFORMAT" and language
currently being used.

> I need to insert 60 millions records or 6GB Fixed width text File into SQL

Why are you importing the data into just one column?, you can use DTS to
load fixed width file format an map the data into diff columns.

> 1- Is there any way I can stop Transaction log (we are using simple
> recovery) so data could be inserted more rapidly?

No matter if you use DTS, BCP or BULK INSERT to import the data, the idea is
to do it in batches and not in one huge batch. For example, if you are using
BULK INSERT then take a look to read about ROWS_PER_BATCH argument.

> 3- Can I use DTS package and combine Transform and bulk insert operation
> together in one step instead of two steps?

If you follow this approach then it will be like processing the batch one
row at a time (not exactly but similar, I think). If you upload the data into
an Intermediate table, then you can use set-based operations to route the
data to the corresponding tables. Anyway you should test both approach two
see which one give better performance.

AMB


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Permood  
View profile  
 More options Jun 23 2005, 2:41 pm
Newsgroups: microsoft.public.sqlserver.programming
From: "Permood" <Perm...@discussions.microsoft.com>
Date: Thu, 23 Jun 2005 11:41:08 -0700
Local: Thurs, Jun 23 2005 2:41 pm
Subject: RE: Performance Issues for Huge Data import/insert
JT & Alejandro Mesa,
Thanks for your remarks.
We are you using simple recovery model. These databases are use for analysis
so i no need full recovery model. and we destray this data every month. also
i don't have any index(s) on these tables when i am importing the data. I
meant to say by using VB.NET word that, VB.NET use ADO.NET 2.0 SQLBulkCopy
method with 100,000 batch size. I tried 10,000 to 100,000 batch sizes.
still this import process is slow (as believe so). Even server machine have
4 GB RAM. the only thing which I didn't try is DTS package, one reason i
didn't try DTS package yet, SQL Server 2000 DTS packages are not comptiable
with SQL Server 2005 DTS packages, and I was thinking if down the road we
move to SQL Server 2005 i have to reCreate all DTS packages, which is more
then 60 of them (currently SQL Scripts).
How about the tempdb Size and database's LDF and MDF file size, would
increase in the size will help to get better performance.

Is there any thing else i can do to make this process faster?

once again thanks for your time and help.  


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
JT  
View profile  
 More options Jun 23 2005, 3:14 pm
Newsgroups: microsoft.public.sqlserver.programming
From: "JT" <some...@microsoft.com>
Date: Thu, 23 Jun 2005 15:14:33 -0400
Local: Thurs, Jun 23 2005 3:14 pm
Subject: Re: Performance Issues for Huge Data import/insert
    If you import 6 GB over a one hour period, and the log grows by 100mb
increments, then that is about 1 log extention every minute. Therefore,
check the autogrow properties of your data and trasactions files and insure
they are growing by a percentage of at least 10% or 1000 mb (1 GB). You can
always truncate the logs after the import process is over. This reduces the
number of times that SQL Server has to re-adjust the size while importing.
It also helps if the physical location of the transaction logs on a seperate
HD from the data files, becuase it allows the hardware devices to operate
independently.
    You could also try extracting the text files into several smaller files
instead of one large file and then have 2 bulk copy processes running
simultaneously. I have found that 2 concurrent BCP processes yield the best
performance than 1 or even > 2, becuase BCP alternates between reading a
batch from the source file and inserting to the target table, leaving idle
bandwidth for another process. With more than 2 processes, the 3rd seems to
always be blocked by the first 2. However, running multiple BCP processes
will bring the server to a crawl for any other users.

"Permood" <Perm...@discussions.microsoft.com> wrote in message

news:6AF25AFC-8C20-477D-B88D-408BF5183E9E@microsoft.com...


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google