Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

perl and mysql: slow inserts with innodb

7 views
Skip to first unread message

rick...@gmail.com

unread,
Oct 4, 2006, 11:37:50 AM10/4/06
to
i'm trying to insert 500k rows from a text file into a innodb table and
it's taking a little over 4 hours.

would someone tell me what is wrong with this code? would someone show
me how to do a multi-value input in perl (e.g. 10k rows with single
insert statment)?

thanks,

-ricky

my $select_handle =
$dbh->prepare('SELECT agent_id, agent_fullname FROM agents WHERE
agent_shortname = ? AND RDEOUTPUT_rde_id = ?');
my $insert_handle =
$dbh->prepare('INSERT INTO actual VALUES
(DEFAULT,?,?,?,?,?,?,?,?,?)');

while (<$GTRUTH>) {
chomp;
my @gtruth = split;

# get FK: AGENTS_agent_id
$select_handle->execute($gtruth[3],$rde_id);
my ($AGENTS_agent_id, $agent_fullname) = $select_handle->fetchrow;

$insert_handle->execute(
$AGENTS_agent_id, # FK from agents
$rde_id, # FK from rdeoutput
$gtruth_line[0], # ground truth time
0, # target is DEAD
$gtruth_line[7], # move status
$gtruth_line[8] # action
);
}

Tad McClellan

unread,
Oct 4, 2006, 2:04:32 PM10/4/06
to
rick...@gmail.com <rick...@gmail.com> wrote:
> i'm trying to insert 500k rows from a text file into a innodb table and
> it's taking a little over 4 hours.

> my $select_handle =


> $dbh->prepare('SELECT agent_id, agent_fullname FROM agents WHERE
> agent_shortname = ? AND RDEOUTPUT_rde_id = ?');


Does your DB have an index on the agent_shortname and RDEOUTPUT_rde_id columns?


--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas

rick...@gmail.com

unread,
Oct 4, 2006, 3:28:10 PM10/4/06
to
hi tad,

yes, the table has an index for both columns.

rick...@gmail.com

unread,
Oct 4, 2006, 3:33:39 PM10/4/06
to
hi tad,

yes, the table has an index for both columns.

xho...@gmail.com

unread,
Oct 4, 2006, 3:53:27 PM10/4/06
to
rick...@gmail.com wrote:
> i'm trying to insert 500k rows from a text file into a innodb table and
> it's taking a little over 4 hours.

How is the CPU usage divided between mysql and perl during that time?
Where/how are you committing?

>
> would someone tell me what is wrong with this code?

There isn't necessarily anything wrong with your code. Some things
are just slow.

> would someone show
> me how to do a multi-value input in perl (e.g. 10k rows with single
> insert statment)?

Do you know that mysql syntax for doing that? What part of that are you
having trouble with? This is probably one place where I would not use
placeholders, but rather use $dbh->quote($data). If you want to do it in
one command, you could use (from right to left) a map to escape and quote
the data, a join to put commas in, a map to wrap those in paranthesis, and
a join to put commas between them. And with a
while (my @chunk = splice @data,0,10_000) {
around all of it to break it into chunks of 10k.

> my $select_handle =
> $dbh->prepare('SELECT agent_id, agent_fullname FROM agents WHERE
> agent_shortname = ? AND RDEOUTPUT_rde_id = ?');
> my $insert_handle =
> $dbh->prepare('INSERT INTO actual VALUES
> (DEFAULT,?,?,?,?,?,?,?,?,?)');
>
> while (<$GTRUTH>) {
> chomp;
> my @gtruth = split;
>
> # get FK: AGENTS_agent_id
> $select_handle->execute($gtruth[3],$rde_id);
> my ($AGENTS_agent_id, $agent_fullname) =
> $select_handle->fetchrow;
>
> $insert_handle->execute(
> $AGENTS_agent_id, # FK from agents
> $rde_id, # FK from
> rdeoutput $gtruth_line[0], #
> ground truth time 0,
> # target is DEAD $gtruth_line[7],
> # move status $gtruth_line[8] #
> action );

How long does it take if you comment out the $insert_handle->execute and
only do the $select_handle part? How about just printing the formatted
data into a text file (rather than inserting it) then using mysql LOAD DATA
command?

> }

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB

rick...@gmail.com

unread,
Oct 4, 2006, 4:04:36 PM10/4/06
to
> How is the CPU usage divided between mysql and perl during that time?
> Where/how are you committing?

almost no CPU usage and small memory footprint (6MB). the database is
on a computer in another city, but we have T1 lines.

> There isn't necessarily anything wrong with your code. Some things
> are just slow.

i'm starting to figure this out.

> How long does it take if you comment out the $insert_handle->execute and
> only do the $select_handle part? How about just printing the formatted

the select handle roughly halves the run time.

> data into a text file (rather than inserting it) then using mysql LOAD DATA
> command?

this is finally what i resorted to. i can parse the files to CSV text
documents in < 1 min. using LOAD DATA i can fill the table in 40
seconds. i would really like to know why my code is ~400 time slower
than LOAD DATA.

Brian Wakem

unread,
Oct 4, 2006, 4:20:03 PM10/4/06
to
rick...@gmail.com wrote:

>> How is the CPU usage divided between mysql and perl during that time?
>> Where/how are you committing?
>
> almost no CPU usage and small memory footprint (6MB). the database is
> on a computer in another city, but we have T1 lines.


There's your bottleneck. I'd say 500k rows in 4 hours (34.7/secs) is
acceptable under the circumstances. Even on a T1 it will be vastly slower
than if running on a local network or the same machine.

If you can run the script on the same machine as the db I expect you will
cut the 4hrs to nearer 4 minutes.

--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png

xho...@gmail.com

unread,
Oct 4, 2006, 4:46:11 PM10/4/06
to
rick...@gmail.com wrote:
> > How is the CPU usage divided between mysql and perl during that time?
> > Where/how are you committing?
>
> almost no CPU usage and small memory footprint (6MB). the database is
> on a computer in another city, but we have T1 lines.

You are probably limited by latency, not by throughput. A wide area T1
line may have high throughput, but I would guess the latency is still quite
poor.

Can you just ship the file over to the remote computer and run your
script over there?

> > There isn't necessarily anything wrong with your code. Some things
> > are just slow.
>
> i'm starting to figure this out.
>
> > How long does it take if you comment out the $insert_handle->execute
> > and only do the $select_handle part? How about just printing the
> > formatted
>
> the select handle roughly halves the run time.

So even if you used multi-valued insert statements, it could only halve
the time. Can you just pull the entire selected table into a look-up
hash rather than making a round-trip each time?

> > data into a text file (rather than inserting it) then using mysql LOAD
> > DATA command?
>
> this is finally what i resorted to. i can parse the files to CSV text
> documents in < 1 min. using LOAD DATA i can fill the table in 40
> seconds. i would really like to know why my code is ~400 time slower
> than LOAD DATA.

It's just the way things are. When you do it row by row, you send a small
message to the server and can't move on to the next one until you receive
the mysql-level acknowledgement/results for the last one. With LOAD DATA
you can send big messages and send many at once without waiting for
acknowledgements.

rick...@gmail.com

unread,
Oct 4, 2006, 5:38:33 PM10/4/06
to
Everyone, thank you for your comments and suggestions. I ended up
storing foreign keys into hashes and then writing out my data to a CSV
file. Afterward LOAD DATA filled my tables in < 1 minute. Needless to
say we're happy (although I'm not to sure about the aesthetics of the
solution).

-ricky

0 new messages