
Efficient way to delete a large amount of records from a big table


Ran Margaliot

Mar 25, 2011, 11:06:43 AM
Hello,

I want to delete a large number of records (~200K) from a large table
(~500K records) in my MySQL DB.

I want to make this call as efficient as possible, because I don't want
the DB to become "unresponsive" while executing it. I need to delete
records that are "older" than 10 days (according to the created_at
column). Currently I use:

DELETE FROM table_name
WHERE created_at < DATE_SUB(CURDATE(), INTERVAL 10 DAY);

The table also has a primary key id, if that helps.

Any thoughts?

Jerry Stuckle

Mar 25, 2011, 11:11:42 AM

Your primary key would have no effect in a case like this.

First thing to do is to EXPLAIN your query and see if there are any
indexes being used (i.e. is there an index on created_at?).

Also: How often are you running this query? How long does it take to
run currently? What is the age range of the rows you want to delete?
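Jerry's EXPLAIN check can be sketched end to end. The sketch below uses SQLite (via Python's stdlib sqlite3 module) as a stand-in for MySQL, and SQLite's EXPLAIN QUERY PLAN in place of MySQL's EXPLAIN; the table and column names come from the thread, while the index name is made up for the example.

```python
import sqlite3

# SQLite standing in for MySQL; on MySQL you would run
# EXPLAIN on the DELETE (or on the equivalent SELECT) instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, created_at TEXT)")

def delete_plan():
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of MySQL's EXPLAIN;
    # the human-readable plan text is the last column of each row.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "DELETE FROM table_name WHERE created_at < date('now', '-10 days')"
    ).fetchall()
    return " ".join(row[-1] for row in rows)

print(delete_plan())  # a full table scan: no index on created_at yet
conn.execute("CREATE INDEX idx_created_at ON table_name (created_at)")
print(delete_plan())  # the plan now names idx_created_at
```

The same before/after comparison on MySQL would show the DELETE's row estimate drop once created_at is indexed.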


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

Captain Paralytic

Mar 25, 2011, 11:32:01 AM

Depending on what the data in the table looks like, you could use an
index on the created_at column. That may (or may not) improve the
efficiency.

One thing to bear in mind is that keeping the database from becoming
"noticeably" unresponsive is not necessarily a matter of query
efficiency. You could run multiple queries over time with a LIMIT
clause. This would not make the query itself any more efficient
(indeed, you might say it makes the overall process less efficient),
but it would prevent the database from being tied up with this one
task for too long.
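The batched-delete idea can be sketched concretely. This uses SQLite (via Python's sqlite3) as a stand-in for MySQL: on MySQL you can write DELETE ... WHERE ... LIMIT 1000 directly, whereas stock SQLite has no DELETE ... LIMIT, so the batch is picked via a rowid subquery. The table and column names follow the thread; the fixed "today" just keeps the sketch deterministic.

```python
import sqlite3

TODAY = "2011-03-25"  # fixed reference date, purely for reproducibility

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_created_at ON table_name (created_at)")
conn.executemany(
    "INSERT INTO table_name (created_at) VALUES (date(?, ?))",
    [(TODAY, f"-{d} days") for d in range(30)] * 100,  # 3000 rows, 0-29 days old
)

BATCH = 250
deleted = 0
while True:
    cur = conn.execute(
        "DELETE FROM table_name WHERE rowid IN ("
        "  SELECT rowid FROM table_name"
        "  WHERE created_at < date(?, '-10 days') LIMIT ?)",
        (TODAY, BATCH),
    )
    conn.commit()  # end the transaction so other clients get a turn
    deleted += cur.rowcount
    if cur.rowcount < BATCH:
        break  # last, short batch: nothing older than 10 days remains

print(deleted)  # 1900: the rows that are 11-29 days old
```

In production each iteration would typically be a separate scheduled run (or at least sleep between batches), so readers and writers are only ever blocked for one small batch at a time.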

John Nagle

Mar 28, 2011, 1:12:36 AM
Right. created_at should be an indexed field to speed this up.

Also, if you're using InnoDB, and created_at is an indexed
field, the operation which is purging old records generally shouldn't
interfere with operations that are adding or working on new
records.

500K records isn't a large database.

John Nagle

"Álvaro G. Vicario"

Mar 28, 2011, 3:11:22 AM

The DELETE statement has LOW_PRIORITY and QUICK switches. They look
like the first thing to test if you are using MyISAM:

«If you specify LOW_PRIORITY, the server delays execution of the DELETE
until no other clients are reading from the table. This affects only
storage engines that use only table-level locking (such as MyISAM,
MEMORY, and MERGE).

For MyISAM tables, if you use the QUICK keyword, the storage engine does
not merge index leaves during delete, which may speed up some kinds of
delete operations.»

http://dev.mysql.com/doc/refman/5.1/en/delete.html

See also http://dev.mysql.com/doc/refman/5.1/en/delete-speed.html


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

Gordon Burditt

Mar 28, 2011, 9:36:21 PM
> I want to delete a large number of records (~200K) from a large table
> (~500K records) in my MySQL DB.

For "large" read "small".

> I want to make this call as efficient as possible, because I don't want
> the DB to become "unresponsive" while executing it. I need to

Efficiency may not be the issue here. If the DELETE query keeps the
table locked, it may block queries that modify that table.

I recall having a problem with this on an older system, expiring
old login records. (Perhaps deleting 20 million records out of 250
million daily). Initially, with once-a-day expiration, it locked
up the table for 45 minutes. Not good. As this was a 24x7 operation,
doing it overnight still caused problems. Some caveats on this example:

1) It used 10-year-old hardware.
2) It used MyISAM tables (InnoDB wasn't around or not mature enough then).
InnoDB I believe uses more selective locking.
3) Finding the records to delete wasn't the issue (with an appropriate
index); it seemed that actually deleting the records was taking the time.
4) MySQL has improved a lot since then.
5) Running the query more often, and with LIMIT, divided the lockout
into shorter lockouts that were more tolerable. However, from the
point of view of server load, this was a LOT less efficient, as it
was finding the records 100 times a day instead of once.


DELETE LOW_PRIORITY wouldn't help much, as once it started deleting, it
still held a lock for 45 minutes.

> delete records that are "older" than 10 days (according to the created_at
> column). Currently I use:
>
> DELETE FROM table_name
> WHERE created_at < DATE_SUB(CURDATE(), INTERVAL 10 DAY);

I presume you have an index on created_at.

> The table also has a primary key id, if that helps.

I expect the primary key will not help for this query. Actually,
it will hurt (minimally) because it has to delete entries out of
the primary key. Don't remove your primary key because of this,
though. Presumably you actually use this data with other queries
rather than just expiring it.

John Nagle

Mar 30, 2011, 1:53:44 PM
On 3/28/2011 6:36 PM, Gordon Burditt wrote:
> I recall having a problem with this on an older system, expiring
> old login records. (Perhaps deleting 20 million records out of 250
> million daily). Initially, with once-a-day expiration, it locked
> up the table for 45 minutes. Not good.
[snip]
> 5) Running the query more often, and with LIMIT, divided the lockout
> into shorter lockouts that were more tolerable. However, from the
> point of view of server load, this was a LOT less efficient, as it
> was finding the records 100 times a day instead of once.

That's a big delete load. You could measure the expiration time in
seconds, and spread out the deletion load that way, with about 2 minutes
of deletion per hour. With an index on the timestamp, InnoDB's row
locking, and a 30-second deletion cycle every 15 minutes, the impact
shouldn't be all that bad.
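The "spread the load" idea amounts to a short, time-boxed purge pass that a scheduler (say cron, every 15 minutes) invokes, instead of one huge DELETE. Below is a sketch with SQLite (via Python's sqlite3) standing in for MySQL; the time budget, batch size, and fixed "today" are illustrative assumptions, not values from the thread.

```python
import sqlite3
import time

TODAY = "2011-03-30"  # fixed reference date, purely for reproducibility

def purge_pass(conn, budget_seconds=30.0, batch=1000):
    """Delete expired rows in small batches until the time budget runs out."""
    deadline = time.monotonic() + budget_seconds
    total = 0
    while time.monotonic() < deadline:
        cur = conn.execute(
            "DELETE FROM table_name WHERE rowid IN ("
            "  SELECT rowid FROM table_name"
            "  WHERE created_at < date(?, '-10 days') LIMIT ?)",
            (TODAY, batch),
        )
        conn.commit()  # keep each lock-holding transaction short
        total += cur.rowcount
        if cur.rowcount < batch:
            break  # table is clean; stop early, well under budget
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_created_at ON table_name (created_at)")
conn.executemany(
    "INSERT INTO table_name (created_at) VALUES (date(?, ?))",
    [(TODAY, "-20 days")] * 500 + [(TODAY, "-1 days")] * 500,
)
print(purge_pass(conn, budget_seconds=1.0))  # 500: only the 20-day-old rows go
```

Because each pass stops when either the budget or the backlog runs out, a backlog spike just means a few consecutive passes do full 30-second stints, rather than one pass locking the table for 45 minutes.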

John Nagle
