HOT MySQL InnoDB Sorted Index Builds

0 views

Skip to first unread message

Message has been deleted

Leana Eckes

unread,

Jul 9, 2024, 7:32:30 PM7/9/24

to comalrabar

I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS but the table does not have any foreign or unique keys). But I'm wondering if it's taking so long because of the several indexes in the table.

But why couldn't mysqldump not include the keys with the CREATE TABLE statement for InnoDB tables, then do an ALTER TABLE after all the data is inserted? Or does InnoDB work differently, and there is no speed difference?

HOT MySQL InnoDB Sorted Index Builds

Download File https://ssurll.com/2yM5oN

There is indeed a performance overhead when you insert to tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update secondary indexes. The more indexes a table has, the more overhead it causes for inserts.

Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. When you use this option, it changes the output of mysqldump to have CREATE TABLE with no indexes, then INSERT all data, then ALTER TABLE to add the indexes after the data is loaded. See -server/LATEST/management/innodb_expanded_fast_index_creation.html

But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even for tables with no indexes. Then the restore needs to run an ALTER TABLE to build the indexes. This takes a while for a large table. When you count the time of INSERTs plus the extra time to build indexes, it's only a few (low single-digit) percents faster than inserting the traditional way, into a table with indexes.

Another strategy is to use mysqldump --tab to dump CSV files instead of SQL scripts. Bulk-loading CSV files is much faster than executing SQL scripts to restore the data. Well, it dumps an SQL file for the table definition, and a CSV for the data to import. It creates separate files for each table. You have to manually recreate the tables by loading all the SQL files (this is quick), and then use mysqlimport to load the CSV data files. The mysqlimport tool even has a --use-threads option for parallel execution.

Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.

The fastest restore method of all is when you use a physical backup tool, the most popular is Percona XtraBackup. This allows for fast backups and even faster restores. The backed up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL Server to perform the restore.

System variables that are true or false can be enabled at server startup by naming them, or disabled by using a --skip- prefix. For example, to enable or disable the InnoDB adaptive hash index, you can use --innodb-adaptive-hash-index or --skip-innodb-adaptive-hash-index on the command line, or innodb_adaptive_hash_index or skip_innodb_adaptive_hash_index in an option file.

The --innodb-status-file startup option controls whether InnoDB creates a file named innodb_status.pid in the data directory and writes SHOW ENGINE INNODB STATUS output to it every 15 seconds, approximately.

The innodb_status.pid file is not created by default. To create it, start mysqld with the --innodb-status-file option. InnoDB removes the file when the server is shut down normally. If an abnormal shutdown occurs, the status file may have to be removed manually.

The --innodb-status-file option is intended for temporary use, as SHOW ENGINE INNODB STATUS output generation can affect performance, and the innodb_status.pid file can become quite large over time.

Disabling the adaptive hash index empties the hash table immediately. Normal operations can continue while the hash table is emptied, and executing queries that were using the hash table access the index B-trees directly instead. When the adaptive hash index is re-enabled, the hash table is populated again during normal operation.

Permits InnoDB to automatically adjust the value of innodb_thread_sleep_delay up or down according to the current workload. Any nonzero value enables automated, dynamic adjustment of the innodb_thread_sleep_delay value, up to the maximum value specified in the innodb_adaptive_max_sleep_delay option. The value represents the number of microseconds. This option can be useful in busy systems, with greater than 16 InnoDB threads. (In practice, it is most valuable for MySQL systems with hundreds or thousands of simultaneous connections.)

The increment size (in megabytes) for extending the size of an auto-extending InnoDB system tablespace file when it becomes full. The default value is 64. For related information, see System Tablespace Data File Configuration, and Resizing the System Tablespace.

The innodb_autoextend_increment setting does not affect file-per-table tablespace files or general tablespace files. These files are auto-extending regardless of the innodb_autoextend_increment setting. The initial extensions are by small amounts, after which extensions occur in increments of 4MB.

Enabling the innodb_background_drop_list_empty debug option helps avoid test case failures by delaying table creation until the background drop list is empty. For example, if test case A places table t1 on the background drop list, test case B waits until the background drop list is empty before creating table t1.

Buffer pool size must always be equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. If you alter innodb_buffer_pool_chunk_size, innodb_buffer_pool_size is automatically rounded to a value that is equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. The adjustment occurs when the buffer pool is initialized.

Care should be taken when changing innodb_buffer_pool_chunk_size, as changing this value can automatically increase the size of the buffer pool. Before changing innodb_buffer_pool_chunk_size, calculate its effect on innodb_buffer_pool_size to ensure that the resulting buffer pool size is acceptable.

The innodb_buffer_pool_size variable is dynamic, which permits resizing the buffer pool while the server is online. However, the buffer pool size must be equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances, and changing either of those variable settings requires restarting the server.

Enabling this option permits multiple buffer pool instances when the buffer pool is less than 1GB in size, ignoring the 1GB minimum buffer pool size constraint imposed on innodb_buffer_pool_instances. The innodb_buffer_pool_debug option is only available if debugging support is compiled in using the WITH_DEBUG CMake option.

Specifies whether to record the pages cached in the InnoDB buffer pool when the MySQL server is shut down, to shorten the warmup process at the next restart. Typically used in combination with innodb_buffer_pool_load_at_startup. The innodb_buffer_pool_dump_pct option defines the percentage of most recently used buffer pool pages to dump.

Enabling innodb_buffer_pool_dump_now triggers the recording action but does not alter the variable setting, which always remains OFF or 0. To view buffer pool dump status after triggering a dump, query the Innodb_buffer_pool_dump_status variable.

Enabling innodb_buffer_pool_dump_now triggers the dump action but does not alter the variable setting, which always remains OFF or 0. To view buffer pool dump status after triggering a dump, query the Innodb_buffer_pool_dump_status variable.

Specifies the percentage of the most recently used pages for each buffer pool to read out and dump. The range is 1 to 100. The default value is 25. For example, if there are 4 buffer pools with 100 pages each, and innodb_buffer_pool_dump_pct is set to 25, the 25 most recently used pages from each buffer pool are dumped.

Specifies the name of the file that holds the list of tablespace IDs and page IDs produced by innodb_buffer_pool_dump_at_shutdown or innodb_buffer_pool_dump_now. Tablespace IDs and page IDs are saved in the following format: space, page_id. By default, the file is named ib_buffer_pool and is located in the InnoDB data directory. A non-default location must be specified relative to the data directory.

You can also specify a file name at startup, in a startup string or MySQL configuration file. When specifying a file name at startup, the file must exist or InnoDB returns a startup error indicating that there is no such file or directory.

The number of regions that the InnoDB buffer pool is divided into. For systems with buffer pools in the multi-gigabyte range, dividing the buffer pool into separate instances can improve concurrency, by reducing contention as different threads read and write to cached pages. Each page that is stored in or read from the buffer pool is assigned to one of the buffer pool instances randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRUs, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex.

The total buffer pool size is divided among all the buffer pools. For best efficiency, specify a combination of innodb_buffer_pool_instances and innodb_buffer_pool_size so that each buffer pool instance is at least 1GB.

Enabling innodb_buffer_pool_load_abort triggers the abort action but does not alter the variable setting, which always remains OFF or 0. To view buffer pool load status after triggering an abort action, query the Innodb_buffer_pool_load_status variable.

Specifies that, on MySQL server startup, the InnoDB buffer pool is automatically warmed up by loading the same pages it held at an earlier time. Typically used in combination with innodb_buffer_pool_dump_at_shutdown.

Immediately warms up the InnoDB buffer pool by loading data pages without waiting for a server restart. Can be useful to bring cache memory back to a known state during benchmarking or to ready the MySQL server to resume its normal workload after running queries for reports or maintenance.