I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them (still in the millions). I can easily find these duplicate rows and can run a single delete query to kill them all. The problem is that trying to delete this many rows in one shot locks up the table for a long time, which I would like to avoid if possible. The only ways I can see to get rid of these rows, without taking down the site (by locking up the table) are:
I was just wondering if anyone else has had this problem before and, if so, how you dealt with it without taking down the site and, hopefully, with minimal if any interruption to the users? If I go with number 2, or a different, similar, approach, I can schedule the stuff to run late at night and do the merge early the next morning and just let the users know ahead of time, so that's not a huge deal. I'm just looking to see if anyone has any ideas for a better, or easier, way to do the cleanup.
I think the slowness is due to MySQl's "clustered index" where the actual records are stored within the primary key index - in the order of the primary key index. This means access to a record via the primary key is extremely fast because it only requires one disk fetch because the record on the disk is right there where it found the correct primary key in the index.
In other databases without clustered indexes the index itself does not hold the record but just an "offset" or "location" indicating where the record is located in the table file and then a second fetch must be made in that file to retrieve the actual data.
You can imagine that when deleting a record in a clustered index (like MySQL uses) all records above that record in the index (=table) must be moved downwards to avoid massive holes being created in the index (well that is what I recall from a few years ago at least - version 8.x may have improved this issue).
Armed with knowledge of the above 'under the hood' operations, what we discovered that really sped up deletes in MySQL 5.x was to perform the deletes in reverse order. This produces the least amount of record movement because you are deleting records from the end first, meaning that subsequent deletes have less records to relocate - logical right?!
I'd also recommend adding some constraints to your table to make sure that this doesn't happen to you again. A million rows, at 1000 per shot, will take 1000 repetitions of a script to complete. If the script runs once every 3.6 seconds you'll be done in an hour. No worries. Your clients are unlikely to notice.
Deleting 10,000 rows at a time is typically a large enough task tomake each query efficient, and a short enough task to minimize theimpact on the server (transactional storage engines might benefitfrom smaller transactions). It might also be a good idea to add somesleep time between the DELETE statements to spread the load over timeand reduce the amount of time locks are held.
I faced a similar problem. We had a really big table, about 500 GB in size with no partitioning and one only one index on the primary_key column. Our master was a hulk of a machine, 128 cores and 512 Gigs of RAM and we had multiple slaves too. We tried a few techniques to tackle the large-scale deletion of rows. I will list them all here from worst to best that we found-
I'd use mk-archiver from the excellent Maatkit utilities package (a bunch of Perl scripts for MySQL management) Maatkit is from Baron Schwartz, the author of the O'Reilly "High Performance MySQL" book.
The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it's just an incremental DELETE.
I had a really loaded base that needed to delete some older entries all the time.Some of the delete queries started to hang so I needed to kill them,and if there are too many deletes the whole base become unresponsive so I needed to restrict the parallel runs.So I've created a cron job running every minute starting this script:
I have not scripted anything to do this, and doing it properly would absolutely require a script, but another option is to create a new, duplicate table and select all the rows you want to keep into it. Use a trigger to keep it up-to-date while this process completes. When it is in sync (minus the rows you want to drop), rename both tables in a transaction, so that the new one takes the place of the old. Drop the old table, and voila!
Just like your car, your PC needs a tune-up from time to time to keep it in top working order. Many of us use our computers on a daily basis, and increasing numbers rely their PCs for work tasks or running a business. In other words, lack of maintenance can lead to a lack of productivity, longer loading times, and if nothing else, increased frustration.
Thankfully, there are a ton of options available to help perform regular maintenance. These tools often work with little user input but can have a dramatic impact on the health and performance of your PC.
While there are a ton of options available, not all PC tune-up software are worth your time, with services differing in a variety of factors. We based our selections for the top PC tune-up software on the following criteria:
Iolo System Mechanic is a reputable tune-up app that is known for its excellent performance and ease of use. It can detect and repair over 30,000 issues and will automatically start working whenever your computer is idle. Better still, Iolo Labs is constantly pushing out updates to help keep things running as smoothly as possible.
System Mechanic will defragment your hard drive, remove useless files and unwanted startup programs, control CPU and RAM usage, and more. One neat feature is NetBooster which will optimize your internet settings to improve download speeds and page load times. Privacy and security protections include the wiping of browsing history and patching of Windows vulnerabilities.
A huge bonus with this tool is that you pay one price to cover all PCs in your home, whereas some other utilities require you to purchase a separate license for each computer or limit the number of PCs you can cover. This makes Iolo System Mechanic an ideal choice for a family or a group of housemates.
Norton has several tune-up products on offer including Computer Tune Up and Norton Utilities Ultimate (previously called Norton Utilities Premium). However, the Tune Up is actually a one-time remote access service whereby a technician takes over your computer to analyze and optimize it.
Ashampoo WinOptimizer offers both a free and paid version. The no-cost option is fully functional and cleans your system, improving performance. Several privacy and security features are bundled in, including AntiSpy (deals with items such as remote access and folder sharing), a privacy manager, cookie manager, and a file wiper tool (ensures deleted files are irrecoverable). Ashampoo WinOptimizer free will also clear browser junk files and cache and provides configuration options for browser and cookies settings.
Price:The free version of WinOptimizer is robust and comes with a decent selection of features. You can pay a one-time fee of $50 for a one-device, lifetime license or $14 per year to use the service on three systems simultaneously. Ashampoo offers a 30-day money-back guarantee.
However, if your system still has a hard disk (one with moving parts that occasionally whirs or clicks when accessing data), defragmentation can really help. Generally speaking, we recommend that you defragment your hard drive once a month. Alternatively, you should defragment your drive when it has become 10% or more fragmented.
The good news is that Windows has a built-in defrag tool that you can use to keep your hard drive optimized. This ensures that you can keep your hard drive healthy without the need for any third-party tools.
The best tune-up programs take care of these issues for you using features such as disk defragmentation, browser extension management, benchmark testing, and automatic cleaning (removal of unwanted or unused files).
b1e95dc632