I have a table system_user with 10 million records whose MYI and
MYD files take up about 2.7 GB combined:
-rw-rw---- 1 mysql mysql 15842 Mar 29 06:15 system_user.frm
-rw-rw---- 1 mysql mysql 1206225532 Mar 30 02:20 system_user.MYD
-rw-rw---- 1 mysql mysql 1501195264 Mar 30 02:20 system_user.MYI
When I try to add an index on a field with the command
alter table system_user add index(school);
the "show processlist;" command shows that the command runs for about
half an hour in the state "copy to tmp table". During this time, in
the database directory, three files are created named:
#sql-269c_4.frm
#sql-269c_4.MYD
#sql-269c_4.MYI
which increase steadily in size over the first 30 minutes, so I assume
MySQL is copying the original table.
Then, abruptly, the "#sql-269c" files stop growing, and "show
processlist;" now lists the command in state "Repair with keycache".
On my last attempt it remained in that state for about two hours, with
the temp files not being modified, so I assumed MySQL ran into an
error somewhere. After I abort the "alter table" command, the temp
files get deleted, and now the original table still works fine and
myisamchk reports no errors on that table, although, of course, the
index did not get added.
The server has about 7.2 GB of disk space free, which should be enough
to make a complete copy of the table. A Google search for "Repair
with keycache" turns up many people complaining about the message but
no threads clearly saying what to do.
How do I avoid this roadblock and complete the process of adding the
index?
Bennett
Was the MySQL process using a lot of CPU? If so, it very well could
have been sorting keys; depending on what you're indexing on, this could
take a long time.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================
Yes, it turned out that if I just cancelled the command and ran it
again, it ran to completion after about five hours.
I think the reason it takes so long to modify the table is probably
that I created some indexes without really knowing how to optimize
index creation properly. The MYD file containing the data is 1.2 GB,
but the MYI index file was 1.5 GB before adding index(school) and 2.5 GB
afterwards. "school" is of type varchar(255). Now I understand
better why you should create an index on a prefix at the beginning of
a long varchar field instead of on the entire value.
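For what it's worth, a prefix index in MySQL looks something like this (the prefix length of 20 and the index name are just illustrative guesses — the right length depends on how selective the leading characters of your data actually are):

```sql
-- Index only the first 20 characters of the varchar(255) column
-- instead of the full value, which keeps index entries small.
-- The prefix length here is a guess for illustration; check how
-- selective it is before committing to it, e.g. by comparing
-- COUNT(DISTINCT school) with COUNT(DISTINCT LEFT(school, 20)).
ALTER TABLE system_user ADD INDEX idx_school_prefix (school(20));
```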
Is there a command or a tool to tell which of the indexes in the .MYI
file are taking up the most space, so that you could save the most
space by removing or optimizing them? Besides, of course, looking at
the definitions of the indexes and figuring out, from rules of thumb
about how much space each kind of index takes, which ones were created
sloppily and are taking up too much space.
Bennett
Indexes are not about space. They are about access speed. If you need
more space, get a larger (or another) disk.
Every time you add an index, you potentially speed up access for SELECT
statements, but always slow access for INSERT, UPDATE and DELETE statements.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================
This is absolutely incorrect.
Indexes can and do enormously speed up access for UPDATE and DELETE
statements. An excellent example from a previous employer, about ten years
ago, was a query similar to this:
DELETE FROM claim_table WHERE claim_number = %hostvariable;
Inexplicably, the original developer had failed to build an index on
claim_number. The table contained some 1.5 million rows; a single DELETE took
approximately 25 minutes, because the query processor had to make a full table
scan to find the row(s) to be deleted. Adding an index on claim_number reduced
this time to less than one second.
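In MySQL syntax, the fix would have been something along these lines (the table and column names are from the example above; the index name is my own, and the literal claim number just stands in for the host variable):

```sql
-- With this index, the DELETE can locate matching rows via an
-- index lookup instead of scanning all ~1.5 million rows.
CREATE INDEX idx_claim_number ON claim_table (claim_number);

-- The same statement that took ~25 minutes now finds its rows
-- through the index.
DELETE FROM claim_table WHERE claim_number = 12345;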
The same principle applies to UPDATEs as well.
Yes, you can find times when indexes will speed up UPDATE and DELETE
statements. But that's only for statements where the index is being used.
For instance, adding another index on 'first_name' to this table will slow
down both UPDATE and DELETE statements.
Well, duh!
The point is that the blanket statement that adding an index will "always slow
access for INSERT, UPDATE and DELETE statements" is untrue.
And the point you make that that statement is false is at a minimum
misleading.
Overall, an index will ALWAYS slow down INSERT, UPDATE and DELETE
statements. Of course, there are always exceptions to the rules, as
assholes will try to point out.
Nonsense. It *is* false that adding an index will "always slow access" for
those operations.
>
>Overall, an index will ALWAYS slow down INSERT, UPDATE and DELETE
>statements.
Repeating a false statement does not make it true.
>Of course, there are always exceptions to the rules,
If there are exceptions, then the assertion "always" is false. Had you said
"sometimes" or "usually" I would not disagree.
> as assholes will try to point out.
And then there are assholes that can't admit to being wrong.
But PROBABLY true for INSERT.
I think I'd agree that it's *always* true for INSERT -- which is of course the
reason that I specifically discussed DELETE and UPDATE in my response.
But it's only rarely true for UPDATE. An index degrades the performance of
UPDATE *only* if the UPDATE operation changes an indexed column.
In both UPDATE and DELETE operations, using an index on a column that
participates in a WHERE clause can improve performance by many orders of
magnitude, by avoiding full table scans. The benefit of doing so will, in most
cases, greatly outweigh the tiny added cost of maintaining the index.
It's a tradeoff: you choose between having a continual but very very small
degradation in performance (with an index) or having sporadic enormous
degradations (without one).
In the example I cited, I was asked to determine why a nightly batch job that
usually took only about ten minutes to run would occasionally take six or
eight hours. The problem was, as I said, that there was no index on a column
referenced in the WHERE clause of a particular DELETE statement, used only
when an insurance claim was cancelled. When the daily transaction mix did not
include any claim cancellations, the job would be finished in minutes. Even
*with* claim cancellations, usually only one or two rows would need to be
deleted. At 25 minutes apiece, even one or two rows adds significantly to the
time required to run the job -- but that was never noticed, because the job
was still finished long before the day shift arrived at 0730. Once in a while,
though, a dozen or more rows would be deleted. Without an index, at 25 minutes
each, that's an additional six hours or so. With an index, at < 1 second each,
it becomes irrelevant.
Well yes of course indexes are not "about" space, since they always
result in a table taking up more space, never less :) But having
large indexes seems to slow down the speed at which MySQL is able to
do other operations that change the table definition, like adding
other new indexes to the table, or dropping them.
e.g. when I add a new index to this table, MySQL appears to make a
complete temporary copy of the .MYI file, copying the original file to
the destination file at a rate of about 10 MB per minute, and then
replacing the original with the copy.
When my *really* inefficient index was still present (an index on the
"school" field, which was of type varchar(255)), that increased the MYI
file size from 1.5 GB to 2.5 GB. That means that the process of
adding any new index after that would take about 2 extra hours
because of the extra time to copy the MYI file at 10 MB per minute.
Dropping the index on the "school" field shrank the MYI file from 2.5
GB back down to 1.5 GB.
Yes, it's unusual to care more about the "performance" of ADD INDEX
and DROP INDEX and ADD COLUMN statements than about the usual SELECT,
DELETE, INSERT statements, but that is the situation I'm in.
So again, is there a tool to tell which of my other indexes are taking
up the most space in the MYI file?
Bennett
Yes.
> Then, abruptly, the "#sql-269c" files stop growing, and "show
> processlist;" now lists the command in state "Repair with keycache".
Yes. That's how it is implemented.
Except that you would like to see "Repair with sorting". Read on below.
> On my last attempt it remained in that state for about two hours, with
> the temp files not being modified, so I assumed MySQL ran into an
> error somewhere.
Wrong assumption. MySQL creates the new index in the key cache (look at
the process state again). The created index pages are dumped to disk
only occasionally - when the index is complete or when the key cache
runs full.
> How do I avoid this roadblock and complete the process of adding the
> index?
By not aborting ALTER TABLE (now that was easy :)
You should configure:
http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_myisam_max_sort_file_size
in order to allow MyISAM to use the faster "repair by sort" method to
create the new index. The size of the sort file should be at least:
2* #rows * (row pointer size + sum of max sizes of fields in the index)
Row pointer defaults to 6 bytes in recent versions. So for, e.g., 10
million rows and an index on (VARCHAR(100) CHARSET utf8) you would need
2 * 10,000,000 * (6 + 300) ~= 6 GB
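Plugging the numbers in looks like this (the SELECT is just the arithmetic; myisam_max_sort_file_size is a real server variable, but whether you can SET it at runtime, and with what privileges, depends on your server version - otherwise put it in my.cnf and restart):

```sql
-- 2 * rows * (row pointer size + max index entry size)
-- VARCHAR(100) in utf8 is at most 3 bytes/char = 300 bytes.
SELECT 2 * 10000000 * (6 + 300) AS sort_file_bytes;
-- 6,120,000,000 bytes, i.e. roughly 6 GB

-- Allow MyISAM sort files up to that size (value in bytes):
SET GLOBAL myisam_max_sort_file_size = 6120000000;
```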
Sort files are created in tmpdir (not in the datadir, where the
temporary copy of the table is created).
XL