I want to delete a large amount of records (~200K) from a large table
(~500K records) in my MySql DB.
I want to make this call as efficient as possible because i dont want
the DB to become "unresponsive" while executing the call. I need to
delete records that are "older" than 10 days (according to created_at
column), currently I use:
delete from table_name where created_at < DATE_SUB(CURDATE(),INTERVAL
10 DAY)
The table also have a primary key id if it helps.
Any thoughts?
Your primary key would have no effect in a case like this.
First thing to do is to EXPLAIN your query and see if there are any
indexes being used (i.e. is there an index on created_at?).
Also: How often are you running this query? How long does it take to
run currently? What is the age range of the rows you want to delete?
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================
Depending on what the data in the table looks like you could use an
index on the created_at column. The may (or may not) improve the
efficiency.
One thing to bear in mind is that ensuring that the database does not
become "noticeably" unresponsive is not necessarily dependent on the
efficiency of the query. You could run multiple queries over time with
a LIMIT clause. This would not make the query any more efficient
(indeed you might say that it makes the "process" more in-efficient),
but it would prevent the database being tied up with this one task for
too long.
Also, if you're using InnoDB, and created_at is an indexed
field, the operation which is purging old records generally shouldn't
interfere with operations that are adding or working on new
records.
500K records isn't a large database.
John Nagle
The DELETE statement has LOW_PRIORITY and QUICK switches. They look like
the first approach to test if you are using MyISAM:
«If you specify LOW_PRIORITY, the server delays execution of the DELETE
until no other clients are reading from the table. This affects only
storage engines that use only table-level locking (such as MyISAM,
MEMORY, and MERGE).
For MyISAM tables, if you use the QUICK keyword, the storage engine does
not merge index leaves during delete, which may speed up some kinds of
delete operations.»
http://dev.mysql.com/doc/refman/5.1/en/delete.html
See also http://dev.mysql.com/doc/refman/5.1/en/delete-speed.html
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--
For "large" read "small".
> I want to make this call as efficient as possible because i dont want
> the DB to become "unresponsive" while executing the call. I need to
Efficiency may not be the issue here. If the DELETE query keeps the
table locked, it may block queries that modify that table.
I recall having a problem with this on an older system, expiring
old login records. (Perhaps deleting 20 million records out of 250
million daily). Initially, with once-a-day expiration, it locked
up the table for 45 minutes. Not good. As this was a 24x7 operation,
doing it overnight still caused problems. Some caveats on this example:
1) It used 10-year-old hardware.
2) It used MyISAM tables (InnoDB wasn't around or not mature enough then).
InnoDB I believe uses more selective locking.
3) Finding the records to delete wasn't the issue (with an appropriate
index); it seemed that actually deleting the records was taking the time.
4) MySQL has improved a lot since then.
5) Running the query more often, and with LIMIT, divided the lockout
into shorter lockouts that were more tolerable. However, from the
point of view of server load, this was a LOT less efficient, as it
was finding the records 100 times a day instead of once.
DELETE LOW_PRIORITY wouldn't help much, as once it started deleting, it
still held a lock for 45 minutes.
> delete records that are "older" than 10 days (according to created_at
> column), currently I use:
>
> delete from table_name where created_at < DATE_SUB(CURDATE(),INTERVAL
> 10 DAY)
I presume you have an index on created_at.
> The table also have a primary key id if it helps.
I expect the primary key will not help for this query. Actually,
it will hurt (minimally) because it has to delete entries out of
the primary key. Don't remove your primary key because of this,
though. Presumably you actually use this data with other queries
rather than just expiring it.
That's a big delete load. You could measure the expiration time in
seconds, and spread out the deletion load that way, with about 2 minutes
of deletion per hour. With an index on the timestamp, InnoDB's row
locking, and a 30-second deletion cycle every 15 minutes, the impact
shouldn't be all that bad.
John Nagle