Performance optimization: MySQL usage optimization (1)

References:

“Explain | Do you really know how to use this great sword of index optimization?” 》

“Completely understand MySQL’s explanation with one picture”

“MySQL Performance Optimization Artifact Explain Usage Analysis”

“MySQL Index Application: The correct approach to index creation and the best guide to using indexes!” 》

“MySQL index failure”

related articles:

“MySQL: Infrastructure and Storage Engine”

“MySQL: Index (1) Principle and Underlying Structure”

“MySQL: Index (2) usage and related suggestions”

“mysql’s slow sql and pt-query-digest”

“Performance Optimization: MySQL usage optimization (2)”

Written at the beginning: This article is a summary after study. There may be some shortcomings and mistakes. You are welcome to correct me.

Preface

In the previous article of the MySQL column, we introduced MySQL’s master-slave replication and high-availability architecture MHA, etc. These are MySQL optimization solutions at the architectural level (including the sharding of databases and tables that are still planned). Today we will start again. Let’s look at how to optimize our SQL from the most basic perspective of SQL usage.

There are two main situations for optimizing SQL itself: one is the table-related design during the project development phase, and the other is optimizing the exposed problems after the project goes online. In the former case, you need to design the fields and indexes of the table, which paradigms need to be followed, and whether you need to redundant fields to reduce the frequency of table connections. In the latter case, you need to count slow SQL (for statistics and analysis of slow SQL, you can see This article of mine“mysql’s slow sql and pt-query-digest”), and then conduct systematic analysis and optimization.

In this article, we mainly explain the optimization of indexes and sql statements. In-depth understanding of this article requires a basic understanding of the index structure of MySQL. If you are not familiar with it, you can start with my previous article (“Index (1)”“Index (2)”), this article can also be seen as some supplements and improvements to these two articles. (This article uses the InnoDB storage engine by default)

Table of contents

Preface

1. Explain analysis statement

        1、id

same id

ids are different

Both IDs are the same and different.

        2、select_type

        SIMPLE

PRIMARY and SUBQUERY

        DERIVED

UNION and UNION RESULT

        3、table

        4、type

        const

        eq_ref

        ref

        range

        index

        ALL

        5、key

        6、key_len

        7、Extra

        Using filesort

        Using index

        Using temporary

        Using where

        using index condition

2. Points to note about various indexes

1. Primary key index

2. Composite index

3. Prefix index

4. Unique index

3. Index failure situation

1. Conditions for or separation

2. In fuzzy query, like starts with %, causing index failure.

3. Participate in function operations or type conversions

3.1. Implicit type conversion

3.2. Inconsistent character sets

3.3. Participate in calculations or built-in functions

4. Violates the leftmost prefix principle

5. MySQL thinks that the whole table is faster

5.1. is null and is not null

5.2. in and not in


1. Explain analysis statement

The explain keyword can simulate the execution of SQL statements by the MySQL optimizer, and can well analyze the performance bottlenecks of SQL statements or table structures. The usage of explain is very simple, just add explain before the statement, for example:

explain select * from test where  id = 1;

There are 12 columns in explain. The specific column names are as follows:

Below we will give some explanations about the more important columns. The following mainly refers to the“Explain | Do you really know how to use this great sword of index optimization?” 》

        1、id

The value of this column is the serial number in the select query. A simple SQL will only have 1 id, and this id is 1. butWhen there are complex queries such as subqueries or connection queries, the SQL will internally split them into multiple SQLs for execution. Due to the different execution orders, multiple serial numbers will appear, such as: 1, 2, 3, 4, etc. It determines the execution order of the tables.There are generally three situations in the execution plan of a certain SQL: the same id, different id, and both the same and different id.

same id

For example, in the connection query below, we see that the two data IDs in the execution results are both 1, which is the same. In this case, the execution order of the tables is from top to bottom, first executing table t1, and then executing table t2 (the table involved in each record is represented by table).

explain select * from test1 t1 inner join test1 t2 on t1.id=t2.id

ids are different

In the following subquery, we see that the IDs of the two pieces of data in the execution result are different. The first piece of data is 1, and the second piece of data is 2. At this time, the one with the larger sequence number will be executed first. Here, it will be executed from bottom to top. Execute table t2, and then execute table t1.

explain select * from test1 t1 where t1.id = 
(select id from  test1 t2 where  t2.id=2);

Both IDs are the same and different.

When the SQL statement becomes more complex, there will be situations where the IDs are the same or different. In this case, the one with the larger sequence number will be executed first, and then executed from bottom to top. When the sequence numbers are the same, they are executed from top to bottom. So the order of the tables in this example is: test1, t1.As for<derived2> , this is the subquery in the from clause, in this example it is t2

explain
select t1.* from test1 t1
inner join (select max(id) mid from test1 group by id) t2
on t1.id=t2.mid


        2、select_type

select_type indicates the type of query, the common ones are as follows

typemeaning
SIMPLESimple SELECT query, excluding subqueries and UNION
PRIMARYThe outermost query in a complex query, representing the main query
SUBQUERYThe SELECT or WHERE list contains a subquery
DERIVEDThe subquery contained in the FROM list, that is, the derivative
UNIONQuery after UNION keyword
UNION RESULTGet result set from table after UNION

        SIMPLE

There are no subqueries and union queries (but connections can be used). The SQL with the same ID introduced above is reused here.

explain select * from test1 t1 inner join test1 t2 on t1.id=t2.id

PRIMARY and SUBQUERY

When a subquery occurs,PRIMARY and SUBQUERY are used to represent the outer complex query and the inner subquery respectively.. SQL with different IDs is reused here. According to the order of IDs, it can be seen that the subquery is executed first and the parent query is executed later.

explain select * from test1 t1 where t1.id = 
(select id from  test1 t2 where  t2.id=2);

        DERIVED

When a from subquery occurs (that is, the result of a SQL query is used as a table).

explain
select t1.* from test1 t1
inner join (select max(id) mid from test1 group by id) t2
on t1.id=t2.mid

UNION and UNION RESULT

When union is used to combine the query results of two tables, the query type of the SQL followed by union is union, and the previous table is similar to a subquery and is marked as PRIMARY.<union1,2> Represents the table union of id=1 and id=2, and its result is marked as UNION RESULT. (Note that the id column of union result is empty)

explain
select * from test1
union
select* from test2

         3、table

The value of this column indicates the name of the table referenced by the output row, such as the previous: test1, test2, etc. However, there are also special cases, such as from subquery, etc.

  • <unionM,N>: Union of M rows with and id values ​​N.
  • <derivedN>: The value N used for the derived table result id associated with this row. The derived table may come from (for example) a subquery in the from clause.
  • <subqueryN>: The result of the subquery, its id value is N.

        4、type

The type field is more important. It provides an important basis for judging whether the query is efficient. Through the type field, we judge whether the query is a full table scan or an index scan, etc.

The order of execution results from best to worst is from top to bottom. Generally, you can master the following: const > eq_ref > ref > range > index > ALL. (The system type requires only one piece of data in the database table, which is a special case of the const type and will not appear under normal circumstances)

Suppose we now have a table test2 containing three columns (id, code, name), with a primary key index id and a common index code.

        const

It means that it can be found through the index once.const is used to compare primary key or uique index, because only one row of data is matched, so very quickly, if the primary key is placed in the where list, MySQL can convert the query into a constant.

explain select * from test2 where id=1;

        eq_ref

This type usually appears in join queries of multiple tables.Indicates that each result in the previous table can only match one row of results in the latter table (generally seen in unique index scans). And the comparison operation for a query is usually=, the query efficiency is higher.

explain select * from test2 t1 inner join test2 t2 on t1.id=t2.id;

        ref

Often used in non-(primary key and unique) index scans. Although the index column is hit, due to non-uniqueness, the search still needs to be continued after the required data is found.

explain select * from test2 where code = '001';

         range

It is often used for range queries on index columns, such as between … and or In operations. The sql is executed as follows:

explain select * from test2 where id between 1 and 2;

        index

        When you do not use where to limit the query conditions, and the queried column is only the index column, the entire index tree will be scanned (but because only the index column is queried, the table will not be returned), and all the indexes for this column will be taken out.

explain select code from test2;

        ALL

Full table scan, the worst case scenario.

        

        5、key

key indicates that this column represents the actually used index (the previous column possible_keys represents the index that may be used).

explain select * from test1 t1 where t1.id = 
(select id from  test1 t2 where  t2.id=2);

        6、key_len

Indicates the number of bytes of the index used by the query optimizer. This field can evaluate whether the combined index is fully used, or only the leftmost field is used. When we use a composite index, the key_len is different when matching only the first n and matching the first n+1 columns.

Here is an example. As shown in the figure below, there is a composite index involving three fields: name, status, and address. We test hitting all indexes and hitting only the first index. We can find that the length of key_len is different.

        7、Extra

This field contains additional information about how MySQL parses the query.

        Using filesort

When there is Using filesort in Extra, it means that MySQL requires additional sorting operations, and the sorting effect cannot be achieved through index order. Generally, using filesort is recommended to be optimized and removed, because such queries consume a lot of CPU resources.

        When the test1 table has a joint index like (code, name), we use a full scan index and only query the code, but we reverse sort the name. At this time, because the order of the name in the composite index is different, we can only sort it first. The data is queried into memory and then sorted.

explain select code  from test1 order by name desc;

        Using index

Indicates whether a covering index is used, that is, a composite index is used and no table return is required. I won’t give a separate example here. The SQL in the Using filesort example satisfies this condition.

        Using temporary

Temporary tables are used to store intermediate results. MySQL uses temporary tables when sorting results, which is common in order by and group by queries.

        Using where

Indicates that a where clause query is used, usually indicating that no index is used.

        Using index condition

Index pushdown is used, which we will introduce below.

2. Points to note about various indexes

1. Primary key index

When the primary key index stores data, table data and index data are stored together. At the same time, the default index structure of MySQL is B+Tree, which means that the data of the index node is ordered.

        If you use UUID or other string type fields as primary keys, then every time a new piece of data is inserted, it may destroy the original tree structure, and may even cause the data in the disk block to move, such as paging, which seriously affects performance. . Therefore, we generally recommend using auto-incrementing ID as the primary key. When there is new data to be inserted, the data will be placed at the end.

2. Composite index

Since the composite index is stored internally in the order of column 1, column 2, and column 3, if a jump occurs when using it, such as using the first and third index columns, then only the first column will take effect. (The order of the index columns in where will not affect whether it takes effect, because the optimizer will adjust it by itself).Therefore, when we build a composite index, we need to confirm the query frequency and try to put the columns with high frequency in the front, so that the composite index can be utilized to the maximum extent.

                       

        In addition, since the non-primary key index only stores the value of the primary key, if a column that is not in the composite index appears in the query column, you need to use the value of the primary key index to search for the data row on the primary key index tree. This process is Return to table. 

Therefore, in order to reduce table backing, try to use covering indexes (index columns completely contain query columns), and try not to use select *​. At the same time, we can add columns with higher query frequency to the composite index columns to improve efficiency.

3. Prefix index

The characteristic of prefix index is that it is short and concise. We can use the first N characters of a field to create an index. The index created in this form is also called a prefix index. Compared with using a complete field to create an index, a prefix index can save more money. Storage space brings more obvious advantages when there is more data.

However, although the prefix index brings the benefit of saving space, because the complete value of a field is not stored in its index node, MySQL cannot complete grouping sorting work such as ORDER BY and GROUP BY through the prefix index. At the same time, Operations such as overlay scanning cannot be completed.

4. Unique index

A great advantage of a unique index is that it is more efficient than a normal index when querying data, because data is queried based on the fields of a normal index, for example:

explain select * from test2 where code = '001';

Assume that a normal index is established on the code field. When querying data based on this field, when a piece of data with code = “001” is queried, the entire index tree will continue to be walked because there may be multiple field values. Same data. However, if a unique index is established on the code field, the search will stop immediately when a piece of data is found, so the field value of the unique index itself will be unique.

       Therefore, when querying data with a unique index, it will be much faster than a normal index, but it is different when inserting data. Because it is necessary to ensure that the data is not repeated, it will be checked whether the same data exists in the table before inserting. But ordinary indexes do not need to consider this issue, so the data insertion of ordinary indexes will be faster.

3. Index failure situation

Note that because the judgment of the MySQL optimizer is completely different when the number is different, the following content takes the case where the data volume is no less than a few thousand records as an example. (If the amount of data is too small and the index cannot be used, the efficiency will not be affected much. Sometimes even a full table scan will be faster)

Create a user table

CREATE DATABASE IF NOT EXISTS `test` DEFAULT CHARACTER SET utf8;
USE `test`;

DROP TABLE IF EXISTS `t_user`;
CREATE TABLE `t_user`  (
  `id` bigint NOT NULL DEFAULT 0 COMMENT 'Primary key, user's unique id',
  `user_name` varchar(32) NOT NULL DEFAULT '' COMMENT 'username',
  `password` varchar(64) NOT NULL DEFAULT '' COMMENT 'password',
  `email` varchar(32) NOT NULL DEFAULT '' COMMENT 'Mailbox',
  `phone_number` varchar(16) NOT NULL DEFAULT '' COMMENT 'phone number',
  `avatar` varchar(256) NOT NULL DEFAULT '' COMMENT 'avatar',
  `create_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'User account creation time',
  `update_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last updated record time',
  `last_login_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'last login time',
  `status` int(2) NOT NULL DEFAULT 0 COMMENT 'User status 0-normal 1-banned',
  PRIMARY KEY (`id`)
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_bin COMMENT = 'User information table';

Create a stored procedure and insert 100,000 pieces of test data

DROP PROCEDURE if exists insert_t_user_test;

DELIMITER $$
CREATE PROCEDURE insert_t_user_test(IN loop_times INT)
BEGIN
    DECLARE var INT DEFAULT 0;
    WHILE var < loop_times DO
        SET var = var + 1;
    	  INSERT INTO `t_user` VALUES (var, CONCAT('rkyao-', var), '123456', '[email protected]', '15251831704', 'avatar.jpg', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 0);
    END WHILE;
		COMMIT;
END $$


CALL insert_t_user_test(100000);

1. Conditions for or separation

Conditions separated by or, if one of the columns before and after or does not have an index, then even if the other column has an index, it will not be hit. In the following example, if only the user_name or email column is indexed, it will not be indexed.

If both columns have indexes, you can go

Let me add here that both columns have indexes, but when using and, only one index tree will be selected for query.

2. In fuzzy query, like starts with %, causing index failure.

Since index matching is similar to binary tree search (B+ trees are also evolved from binary trees), for example, if you want to find ‘abc’, if it is %bc, the original root will not be found, so naturally there is no way to use the index tree. However, those ending with % can still use the index tree.

3. Participate in function operations or type conversions

3.1. Implicit type conversion

The index field is of string type. Since single quotes are not added to the string during query, MySQL’s query optimizer will automatically perform type conversion, causing the index to become invalid.

3.2. Inconsistent character sets

There was a special explanation about this before, you can read it here“Index failure problem caused by inconsistent mysql character set”

3.3. Participate in calculations or built-in functions

Using a function in a field will make it difficult for the optimizer to start. The value in the B-tree may not match the result of the function, so the index will not be used, that is, the index will be invalid.

4. Violates the leftmost prefix principle

This is for the composite index, because the physical structure of the composite index is as follows. We can see that the order of the leaf nodes is according to column 1 and column 2. Therefore, when we skip column 1, naturally it cannot be used. Now it’s time to column 2. However, this situation has changed in 8.0. We will introduce skip scanning in the following content.

                 

        Supplement: What needs to be added here is that within the restrictions of the leftmost prefix principle, the use of range queries is not allowed, that is, if the first index is used such as <, > or like ending with %, the subsequent indexes of the column will be invalid. . As shown in the second example in the figure below, the address in the third column will not take effect.

The query process in the second example is as follows:

  • Use the name and status fields in the joint index to find the leaf nodes in all index trees that compound this condition.
  • Return the value stored in the index node to the server layer, and then scan the table one by one.
  • In the Server layer, the conditions of address=”Beijing City” are judged one by one, and finally the data that meets the conditions are filtered out.

It can be seen that the reason why the third column index cannot be used here is because the engine layer does not judge the data, but gives the results of the query based on the first two column indexes to the server layer for judgment. So the third column does not use the index.

Some friends may think that if they are judged at the engine layer, can’t we use the index? In fact, MySQL has also thought of this, so after 5.6, a new mechanism called index pushdown was added. We will Introduced in subsequent articles.

Here is a reference example that can be compared for in-depth understanding.

5. MySQL thinks that the whole table is faster

        The basis for deciding not to use an index to execute a query in MySQL is whether the cost is small enough. I asked my company’s DBA about whether to use an index, and the conclusion I got was only a detailed analysis of specific scenarios, and there is no specific statement of invalidity or effectiveness. . (Note that the analysis here is a scenario with more data. The performance of the optimizer is different when there is less data, but this scenario will not be analyzed here because it is of little significance). For detailed analysis, please read this article“Can indexes be used for IS NULL, IS NOT NULL, and not equal to MySQL?” 》

5.1. is null and is not null

5.2. in and not in

In this article, we introduce how to analyze the execution of SQL statements (through the explain statement), as well as the precautions and failure scenarios when using common indexes. Next, we will introduce MySQL’s self-optimization for queries and how we adjust SQL statements for optimization.

Related Posts

How to start Redis

St. Regis takeaway project 1 + source code

The difference between operating Pandas and Excel tables

[C#] Use C# to connect to SQL Server

Dameng database learning summary

Elasticsearch read and write delay problem, how to set the index refresh_interval refresh interval

How to completely uninstall the database sql server and “SQL Server 2019 cannot download the required files. This may mean that the version of the installer is no longer supported.” Installation issues

Analysis and practical use of database SQLite in Android Studio App development (including creating databases, adding, deleting, modifying, checking, remembering passwords, etc. The source code must be read)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*