옵티마이저(Optimizer)의 데이터 처리

Created

March 17, 2024

Created by

DaEun Kim

Mysql 서버의 쿼리 실행 과정

순서	실행 단계	실행 주체
1	SQL 문장의 syntax 를 검사하고, 문장을 쪼개서 파스 트리(parse tree) 를 만든다.	SQL 파서 (Mysql 엔진)
2	파스 트리를 가지고 아래 내용들을 결정하여 쿼리 실행 계획을 수립한다. 1. 불필요한 조건 제거 & 복잡한 연산 단순화 2. 테이블을 여러개 조인해야한다면 조인 순서를 어떻게 할지 결정 3. 각 테이블에 걸리는 조건과 인덱스 통계 정보를 가지고 탐색할 인덱스 결정 4. 임시테이블을 써야 하는지 여부 결정	옵티마이저 (Mysql 엔진)
3	2번에서 결정 실행 계획대로 테이블에서 데이터를 가져온다.	스토리지 엔진
4	3번의 결과를 가지고 조인 및 정렬(ORDER BY), 그루핑(GROUP BY) 수행한다.	Mysql 엔진

옵티마이저가 풀 테이블 스캔(full table scan)을 선택하는 기준

테이블 내에 레코드 개수가 적어서 인덱스를 통해 random I/O 하는 것보다 순차 I/O 하는 게 더 빠른 경우
WHERE / ON 절에 인덱스를 탐색할 수 있는 조건이 없는 경우
인덱스 레인지 스캔 (index range scan) 이 가능하다고 하더라도, 인덱스 B-Tree 를 샘플링해서 통계를 내봤을 때 조건에 일치하는 레코드 건수가 너무 많은 경우

병렬 처리

innodb_parallel_read_threads 라는 시스템 변수를 통해 하나의 쿼리를 몇 개의 스레드가 처리할 지 설정할 수 있다.

mysql> select count(*) from employee;
+----------+
| count(*) |
+----------+
|   313950 |
+----------+
1 row in set (0.12 sec)

mysql> set session innodb_parallel_read_threads=2;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from employee;
+----------+
| count(*) |
+----------+
|   313950 |
+----------+
1 row in set (0.08 sec)

mysql> set session innodb_parallel_read_threads=4;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from employee;
+----------+
| count(*) |
+----------+
|   313950 |
+----------+
1 row in set (0.07 sec)

병렬 처리하는 스레드의 갯수를 늘릴수록 집계 쿼리의 수행 시간이 짧아지는 것을 확인할 수 있다.

[주의] 서버의 CPU 코어 갯수보다 많게 설정하면 오히려 성능이 떨어질 수 있다.

정렬 (ORDER BY) 처리

소트 버퍼

정렬을 수행하기 위해 할당받는 메모리 공간.
쿼리가 끝나면 시스템으로 반납한다.
sort_buffer_size 라는 시스템변수로 최대 할당 용량을 설정한다.
커낵션 당 할당 받는 소트 버퍼가 크면 트래픽 많은 시간대에 OOM 문제 생길 수 있다.

정렬 알고리즘

옵티마이저가 어떤 정렬 알고리즘을 사용하는지 트레이스를 활성화해서 확인 가능하다.

mysql> SET OPTIMIZER_TRACE='enabled=on', END_MARKERS_IN_JSON=on;

mysql> select * from employee order by last_name limit 300000, 1;
+--------+----------------------+-------------------+--------+
| emp_no | last_name            | first_name        | dep_no |
+--------+----------------------+-------------------+--------+
| 203307 | 'abcmdmdlfsdd'.      | 'Choi'            |     16 |
+--------+----------------------+-------------------+--------+
1 row in set (5.87 sec)

mysql> select TRACE from information_schema.optimizer_trace;
...
            "sorting_table": "employee",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`employee`.`last_name`"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "limit": 300001
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "memory_available": 262144,
              "key_size": 1025,
              "row_size": 1554,
              "max_rows_per_buffer": 168,
              "num_rows_estimate": 296326,
              "num_rows_found": 313950,
              "num_initial_chunks_spilled_to_disk": 1309,
              "peak_memory_used": 262144,
              "sort_algorithm": "std::stable_sort",   <<<--- 정렬 알고리즘 (C++ STL 에서 제공하는 stable_sort())
              "sort_mode": "<fixed_sort_key, packed_additional_fields>" <<<--- 정렬 모드
            } /* filesort_summary */

sort_mode	의미
`<sort_key, rowid>`	정렬 키 & 레코드 rowID 만 소트 버퍼에 가져와서 정렬. 정렬 후 rowID 는 디스크에서 데이터 레코드를 읽는 데 사용한다.
`<sort_key, additional_fields>`	정렬 키 & SELECT 대상이 되는 컬럼들까지 소트 버퍼에 가져와서 정렬. 정렬 후 테이블에서 데이터 레코드를 읽을 필요 없이 소트 버퍼에 있는 레코드들을 읽어간다. 각 컬럼은 고정 길이로 소트 버퍼에 저장한다.
`<sort_key, packed_additional_fields>`	위와 동일하지만 SELECT 대상이 되는 각 컬럼은 가변 길이로 소트 버퍼에 저장한다.

옵티마이저는 정렬할 레코드의 갯수가 얼마나 되는지, 각 레코드의 크기가 얼마나 되는지에 따라 sort_mode 를 가변적으로 선택하여 정렬한다.

정렬 대상

인덱스를 사용한 정렬

정렬에 인덱스를 사용하는 케이스 : https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html#order-by-index-use
정렬에 인덱스를 사용하는 경우, 인덱스가 이미 정렬되어 있기 때문에 인덱스 순서대로 읽기만 하면 된다. 하지만 대부분의 쿼리가 아래 조건들을 만족해야 인덱스를 사용한 정렬이 가능하다.

ORDER BY col1[, col2] 에서 col1[, col2] 순서로 만들어진 인덱스가 있어야 한다.

[employee 테이블에 있는 인덱스 목록]

mysql> show index from employee ;
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          0 | PRIMARY       |            1 | emp_no      | A         |      296325 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            1 | first_name  | A         |         955 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            2 | last_name   | A         |        3581 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
3 rows in set (0.04 sec)

[복합인덱스 내 컬럼 순서대로 정렬 후 조회하는 쿼리 실행계획]

first_name, last_name 순서로 이미 인덱스가 정렬되어 있으므로 인덱스 풀 스캔만 하면 된다.

mysql> explain select first_name, last_name from employee order by first_name, last_name;
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key           | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employee | NULL       | index | NULL          | idx_full_name | 518     | NULL | 296326 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain analyze select first_name, last_name from employee order by first_name, last_name;
+------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------+
| -> Index scan on employee using idx_full_name  (cost=29977.35 rows=296326) (actual time=0.517..510.996 rows=313950 loops=1)
 |
+------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.64 sec)

[복합인덱스 내 컬럼 순서에 맞지 않게 정렬 후 조회하는 쿼리 실행계획]

인덱스 풀 스캔을 한 뒤, 스캔한 결과를 가지고 last_name, first_name 순서로 정렬하는 절차(filesort)가 부가적으로 발생한다.

mysql> explain  select first_name, last_name from employee order by last_name, first_name;
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-----------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key           | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | employee | NULL       | index | NULL          | idx_full_name | 518     | NULL | 296326 |   100.00 | Using index; Using filesort |
+----+-------------+----------+------------+-------+---------------+---------------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain analyze select first_name, last_name from employee order by last_name, first_name;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: employee.last_name, employee.first_name  (cost=29977.35 rows=296326) (actual time=8692.764..8859.743 rows=313950 loops=1)
    -> Index scan on employee using idx_full_name  (actual time=0.269..578.901 rows=313950 loops=1)
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (8.99 sec)

WHERE 절에 있는 컬럼과 ORDER BY 에 있는 컬럼으로 동일한 인덱스를 탐색할 수 있어야 한다.

[employee 테이블에 있는 인덱스 목록]

mysql> show index from employee;
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          0 | PRIMARY       |            1 | emp_no      | A         |      296325 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            1 | first_name  | A         |         955 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            2 | last_name   | A         |        3581 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_dep_no    |            1 | dep_no      | A         |          52 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
4 rows in set (0.02 sec)

[WHERE 절에서 조건을 거는 컬럼과 ORDER BY 절에서 정렬 대상이 되는 컬럼이 같은 인덱스를 타는 경우]

first_name, last_name 복합인덱스가 이미 존재하므로, 정렬할 필요 없이 복합인덱스를 스캔한 순서대로 데이터 레코드를 찾으면 된다.

mysql> explain select * from employee where first_name = 'DaEun' order by first_name, last_name;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | employee | NULL       | ref  | idx_full_name | idx_full_name | 259     | const |    1 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)


mysql> explain analyze select * from employee where first_name = 'DaEun' order by first_name, last_name;
+----------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on employee using idx_full_name (first_name='user')  (cost=1964.45 rows=9302) (actual time=8.725..88.308 rows=5096 loops=1)
+----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.10 sec)

[WHERE 절에서 조건을 거는 컬럼과 ORDER BY 절에서 정렬 대상이 되는 컬럼이 다른 인덱스를 타는 경우]

idx_dep_no 인덱스를 탐색한 후에 first_name, last_name 으로 정렬하기 위해 부가적인 정렬 절차(filesort)를 수행한다.

mysql> explain select * from employee where dep_no = 62 order by first_name, last_name;
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+----------------+
| id | select_type | table    | partitions | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra          |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+----------------+
|  1 | SIMPLE      | employee | NULL       | ref  | idx_dep_no    | idx_dep_no | 5       | const |   91 |   100.00 | Using filesort |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)


mysql> explain analyze select * from employee where dep_no = 62 order by first_name, last_name;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: employee.first_name, employee.last_name  (cost=31.85 rows=91) (actual time=7.017..7.132 rows=91 loops=1)
    -> Index lookup on employee using idx_dep_no (dep_no=62)  (actual time=0.540..3.815 rows=91 loops=1)
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

조인 시 드라이빙 테이블만을 사용한 정렬

일반적으로 조인을 수행하면 SELECT 대상이 되는 데이터 레코드의 갯수가 늘어나고, 데이터 레코드의 크기도 늘어난다.

그래서 조인 후 데이터 레코드들을 정렬하는 것 보다 드라이빙 테이블의 레코드를 먼저 정렬한 후 조인을 실행하는 게 더 낫다.

이 방법으로 정렬을 하기 위해서는 ORDER BY 절에 드라이빙 테이블의 컬럼만 존재해야 한다.

예)

[employee 테이블 정보]

mysql> show columns from employee;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| emp_no     | int         | NO   | PRI | NULL    | auto_increment |
| last_name  | varchar(64) | YES  |     | NULL    |                |
| first_name | varchar(64) | YES  | MUL | NULL    |                |
| dep_no     | int         | YES  | MUL | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)

mysql> show index from employee;
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          0 | PRIMARY       |            1 | emp_no      | A         |      296325 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            1 | first_name  | A         |         955 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_full_name |            2 | last_name   | A         |        3581 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_dep_no    |            1 | dep_no      | A         |          52 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
4 rows in set (0.02 sec)

[salary 테이블 정보]

mysql> show columns from salary;
+--------+------+------+-----+---------+----------------+
| Field  | Type | Null | Key | Default | Extra          |
+--------+------+------+-----+---------+----------------+
| id     | int  | NO   | PRI | NULL    | auto_increment |
| emp_no | int  | NO   | MUL | 0       |                |
| amount | int  | NO   |     | 0       |                |
+--------+------+------+-----+---------+----------------+
3 rows in set (0.01 sec)

mysql> show index from salary;
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| salary |          0 | PRIMARY    |            1 | id          | A         |      313740 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| salary |          1 | idx_emp_no |            1 | emp_no      | A         |      313722 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.02 sec)

[ employee 테이블과 salary 테이블을 조인하고, employee 테이블의 컬럼으로 정렬하는 쿼리의 실행계획]

mysql>
explain select * 
from employee as e join salary as s
on s.emp_no = e.emp_no 
where e.emp_no between 2000 and 2010 
order by e.last_name;
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref                 | rows | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+-----------------------------+
|  1 | SIMPLE      | e     | NULL       | range | PRIMARY       | PRIMARY    | 4       | NULL                |   11 |   100.00 | Using where; Using filesort |
|  1 | SIMPLE      | s     | NULL       | ref   | idx_emp_no    | idx_emp_no | 4       | dispatchyo.e.emp_no |    1 |   100.00 | NULL                        |
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+-----------------------------+
2 rows in set, 1 warning (0.02 sec)

mysql>
explain select * 
from employee as e join salary as s
on s.emp_no = e.emp_no 
where e.emp_no between 2000 and 2010 
order by e.last_name;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=6.31 rows=11) (actual time=0.661..1.018 rows=11 loops=1)
    -> Sort: e.last_name  (cost=2.46 rows=11) (actual time=0.521..0.531 rows=11 loops=1)
        -> Filter: (e.emp_no between 2000 and 2010)  (cost=2.46 rows=11) (actual time=0.245..0.291 rows=11 loops=1)
            -> Index range scan on e using PRIMARY  (cost=2.46 rows=11) (actual time=0.229..0.267 rows=11 loops=1)
    -> Index lookup on s using idx_emp_no (emp_no=e.emp_no)  (cost=0.26 rows=1) (actual time=0.036..0.040 rows=1 loops=11)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

WHERE 절이 아래 2가지 조건을 갖추고 있기 때문에 옵티마이저는 employee 테이블을 드라이빙 테이블로 선택한다. (??)

emp_no between 2000 and 2010 → PK 를 가지고 검색 가능
salary 테이블의 emp_no 컬럼에 인덱스가 있음

ORDER BY 절에서는 인덱스가 없는 컬럼으로 정렬하기 때문에 인덱스 레인지 스캔이 불가능하다.

그러나 employee 테이블에 있는 컬럼이기 때문에 employee 테이블만을 대상으로 검색 & 정렬한 후 그 결과를 salary 테이블과 조인한다.

[참고]

filesort 를 사용하는 다른 케이스 : https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html#order-by-filesort

임시 테이블을 사용한 정렬

임시 테이블을 사용하는 케이스 https://dev.mysql.com/doc/refman/8.3/en/internal-temporary-tables.html
임시 테이블을 사용해야 하는 상황 중에 하나로 ORDER BY 절에 드라이빙 테이블 컬럼이 아닌 드리븐 테이블 컬럼이 있는 경우가 있다.

mysql>
explain select * 
from employee as e join salary as s
on s.emp_no = e.emp_no 
where e.emp_no between 2000 and 2010 
order by s.amount;
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref                 | rows | filtered | Extra                                        |
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | range | PRIMARY       | PRIMARY    | 4       | NULL                |   11 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | s     | NULL       | ref   | idx_emp_no    | idx_emp_no | 4       | dispatchyo.e.emp_no |    1 |   100.00 | NULL                                         |
+----+-------------+-------+------------+-------+---------------+------------+---------+---------------------+------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql>
explain select * 
from employee as e join salary as s
on s.emp_no = e.emp_no 
where e.emp_no between 2000 and 2010 
order by s.amount;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: s.amount  (actual time=1.152..1.169 rows=11 loops=1)
    -> Stream results  (cost=6.31 rows=11) (actual time=0.250..0.954 rows=11 loops=1)
        -> Nested loop inner join  (cost=6.31 rows=11) (actual time=0.220..0.851 rows=11 loops=1)
            -> Filter: (e.emp_no between 2000 and 2010)  (cost=2.46 rows=11) (actual time=0.139..0.225 rows=11 loops=1)
                -> Index range scan on e using PRIMARY  (cost=2.46 rows=11) (actual time=0.129..0.197 rows=11 loops=1)
            -> Index lookup on s using idx_emp_no (emp_no=e.emp_no)  (cost=0.26 rows=1) (actual time=0.046..0.052 rows=1 loops=11)
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

여기 에서 설명한 것 처럼 employee 가 드라이빙 테이블이 되고, salary 는 드리븐 테이블이지만 ORDER BY 절에 salary 의 컬럼이 있다.

→ salary 테이블과 조인을 먼저 수행하고, 조인된 데이터 레코드들을 salary.amount 컬럼 기준으로 정렬해야 하므로 임시 테이블을 쓰게 된다.

💡

LIMIT 가 성능에 주는 효과

인덱스를 사용한 정렬을 할 때 LIMIT 는 읽어들일 인덱스 레코드의 갯수를 제한하므로 성능에 영향을 준다.

하지만 인덱스를 사용하지 않는 정렬을 할 때, 정렬이 모두 끝난 후에야 최상위/최하위 데이터 레코드 대상으로 갯수를 제한하므로 성능에 영향을 주지 않는다.

GROUP BY 처리

인덱스를 사용하는 경우와 그렇지 않은 경우로 나뉜다.
인덱스를 사용하여 GROUP BY 를 처리하는 경우 아래의 2가지 스캔 방식이 있다.

인덱스를 사용하지 않고 GROUP BY 를 처리하는 경우 임시 테이블을 사용한다.

tight index scan

인덱스를 스캔하면서 그루핑을 수행한다.
조인이 있다면 그루핑 수행한 결과를 가지고 조인한다.
GROUP BY col ORDER BY col 과 같이 정렬을 요구한다면, 이미 인덱스가 정렬되어 있었기 때문에 추가 정렬이 필요하지 않다.

예)

[dep_no 컬럼에 인덱스 있음]

mysql> show index from employee where column_name = 'dep_no';
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          1 | idx_dep_no |            1 | dep_no      | A         |          52 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
1 row in set (0.01 sec)

[dep_no 컬럼 대상으로 그루핑하고, dep_no 컬럼 기준으로 정렬]

→ 임시테이블이나 추가 정렬 작업을 수행하지 않기 때문에 Extra 항목에 인덱스 스캔에 관한 것 외에 표기되는 게 없음

mysql> explain select count(*) from employee group by dep_no order by dep_no;
+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key        | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employee | NULL       | index | idx_dep_no    | idx_dep_no | 5       | NULL | 296326 |   100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

loose index scan

Extra 항목에 Using index for group-by 가 표기된다.

[employe 테이블에 있는 인덱스]

mysql> show index from employee where key_name = 'idx_dep_last';
+----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee |          1 | idx_dep_last |            1 | dep_no      | A         |          66 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| employee |          1 | idx_dep_last |            2 | last_name   | A         |        3580 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)

[복합인덱스 내 후행 컬럼으로 조건을 걸었을 때 실행 계획]

mysql> explain select dep_no, last_name from employee where last_name = 'Kim' group by dep_no;
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table    | partitions | type  | possible_keys                         | key          | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employee | NULL       | range | idx_dep_no,idx_full_name,idx_dep_last | idx_dep_last | 264     | NULL |   67 |   100.00 | Using where; Using index for group-by |
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.01 sec)

DISTINCT 처리

아래 쿼리처럼 유니크한 레코드만 가져온다면, GROUP BY 와 동일한 방식으로 처리한다.

mysql> explain SELECT DISTINCT dep_no FROM employee;
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | employee | NULL       | range | idx_dep_no    | idx_dep_no | 5       | NULL |   52 |   100.00 | Using index for group-by |
+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.02 sec)

 mysql> explain select distinct dep_no from employee where last_name = 'Kim';
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table    | partitions | type  | possible_keys                         | key          | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employee | NULL       | range | idx_dep_no,idx_full_name,idx_dep_last | idx_dep_last | 264     | NULL |   67 |   100.00 | Using where; Using index for group-by |
+----+-------------+----------+------------+-------+---------------------------------------+--------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.01 sec)

임시 테이블 (internal temporary table)

처음에는 메모리에 생성하지만 테이블의 크기가 커지면 디스크로 옮겨진다.
Mysql 8.0버전 부터는 메모리에서 임시 테이블을 만들 때 기본적으로 TempTable 이라는 스토리지 엔진을 사용하고, 디스크에 만들 때는 InnoDB 스토리지 엔진을 사용한다. (internal_tmp_mem_storage_engine 이라는 시스템 변수로 메모리 스토리지 엔진을 선택할 수는 있으나, MEMORY 엔진은 트랜잭션을 지원하지 않기 때문에 왠만하면 사용하지 않는 듯.)
TempTable 엔진이 메모리에서 사용할 수 있는 최대 용량은 temptable_max_ram 이라는 시스템 변수로 제어한다.
임시 테이블의 크기가 temptable_max_ram 을 넘어가면 디스크로 옮기는데, temptable_use_mmap 시스템 변수에 따라 아래 2가지 디스크 저장 방식 중 하나를 선택한다.

MMAP 파일에 기록 (MMAP: 파일을 가상 메모리에 매핑?하는 것인데.. InnoDB 테이블에 기록하는 것에 비해 상대적으로 오버헤드가 적다.)
InnoDB 테이블에 기록

임시테이블이 생성되는 경우

ORDER BY 와 GROUP BY 에 명시된 컬럼이 서로 다른 쿼리
ORDER BY 또는 GROUP BY 에 명시된 컬럼이 조인의 순서 상 첫번째 테이블이 아닌 쿼리
DISTINCT 와 ORDER BY 가 둘 다 있는 쿼리 또는 DISTINCT 가 인덱스로 처리되지 못하는 쿼리
UNION 또는 UNION DISTINCT 가 사용된 쿼리 (실행계획표에서 select_type 컬럼이 UNION RESULT 인 경우)
쿼리 실행 계획표에서 select_type 이 DERIVED 인 경우

임시테이블이 디스크에 생성되는 경우

UNION 또는 UNION ALL 에서 SELECT 대상이 되는 컬럼 중 길이가 512바이트 이상인 컬럼이 있는 경우
GROUP BY 또는 DISTINCT 대상이 되는 컬럼 중 길이가 512바이트 이상인 컬럼이 있는 경우
메모리 스토리지 엔진으로 TempTable 을 사용할 때 메모리에서 생성하려는 임시테이블 크기가 temptable_max_ram 보다 큰 경우

임시테이블이 메모리에 생성됐는지, 디스크에 생성됐는지 확인하려면

실행계획표의 Extra 항목에 표시되는 Using temporary 만 봐서는 임시테이블이 메모리에 생성됐는지, 디스크에 생성됐는지 알 수 없다.

→ 상태 변수를 확인하면 된다.

mysql> select * from employee order by last_name desc limit 2000;
+--------+----------------------------------+---------------+--------+
| emp_no | last_name                        | first_name    | dep_no |
...
2000 rows in set (4.85 sec)

mysql> show session status like 'Created_tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 4     |
| Created_tmp_tables      | 0     |
+-------------------------+-------+
3 rows in set (0.03 sec)

mysql> select * from employee order by last_name desc limit 200000;
+--------+----------------+---------------+--------+
| emp_no | last_name      | first_name    | dep_no |
+--------+----------------+---------------+--------+
...
1 row in set (5.57 sec)

mysql> show session status like 'Created_tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 12    |
| Created_tmp_tables      | 1     |
+-------------------------+-------+
3 rows in set (0.01 sec)

status variable	meaning
Created_tmp_tables	세션 내에서 수행한 쿼리들이 만든 임시 테이블의 개수 (메모리, 디스크 구분하지 않음)
Created_tmp_disk_tables	세션 내에서 수행한 쿼리들이 디스크에 만든 임시 테이블의 개수
Created_tmp_files	mysql 서버가 tmp 디렉터리 내에서 만든 임시 파일의 개수

메모리에 테이블을 만들었는지 확인하려면 Created_tmp_tables 과 Created_tmp_disk_tables 를 비교하면 된다.

[참고] Created_tmp_disk_tables 에 MMAP 파일에 만들어진 테이블은 반영하지 않는다.