DBT3 Benchmark Queries

Known things about DBT-3 benchmark and its queries

Contents

  1. Q1
  2. Q2
  3. Q3
  4. Q5
  5. Q8
  6. Q9
  7. Q10
  8. Q13
  9. Q15
  10. Q17
  11. Q20

Q1

A simple, one-table query.

select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
from
        lineitem
where
        l_shipdate <= date_sub('1998-12-01', interval 79 day)
group by
        l_returnflag,
        l_linestatus
order by
        l_returnflag,
        l_linestatus;

Query plan:

+------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+
| id   | select_type | table    | type | possible_keys | key  | key_len | ref  | rows     | Extra                                        |
+------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+
|    1 | SIMPLE      | lineitem | ALL  | i_l_shipdate  | NULL | NULL    | NULL | 59711977 | Using where; Using temporary; Using filesort |
+------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+
  • l_shipdate < date_sub('1998-12-01', interval 79 day) is satisifed by 59,334,576 rows.
  • The table has 59,986,052 rows in total.
  • There are a total of 4 different values of (l_returnflag,l_linestatus). This means, sorting doesn't matter, and temporary table is a very small heap table.

Q2

plans starting with table "part" ... - 8sec on cold cache. scale=10 (scale=30 ETA 1min)

Q3

select
       l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue,
       o_orderdate, o_shippriority
from 
      customer, 
      orders, 
      lineitem
where 
      c_mktsegment = 'BUILDING' and c_custkey = o_custkey
      and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15'
      and l_shipdate > date '1995-03-15'
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;

There seems to be an improvement in mysql-5.6: http://jorgenloland.blogspot.ru/2013/02/dbt-3-q3-6-x-performance-in-mysql-5610.html

(speedup can be observed only when the query is in the form like the above (TODO: figure out where do different forms of queries come from?))

EXPLAINs (scale=10):

+------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                           | key     | key_len | ref                        | rows     | Extra                                        |
+------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+
|    1 | SIMPLE      | orders   | ALL    | NULL                                                    | NULL    | NULL    | NULL                       | 15115145 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY                                                 | PRIMARY | 4       | dbt3sf10.orders.o_custkey  |        1 | Using where                                  |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | dbt3sf10.orders.o_orderkey |        2 | Using where                                  |
+------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+
+------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                           | key            | key_len | ref                        | rows    | Extra                                                               |
+------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+
|    1 | SIMPLE      | orders   | range  | PRIMARY,i_o_date_clerk                                  | i_o_date_clerk | 4       | NULL                       | 7557572 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY                                                 | PRIMARY        | 4       | dbt3sf10.orders.o_custkey  |       1 | Using where                                                         |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity | PRIMARY        | 4       | dbt3sf10.orders.o_orderkey |       2 | Using where                                                         |
+------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+

With statistics on 'building': we get a plan of customer, orders, lineitem. It is 5% worse. There seems to be no other possibilities.

Q5

Nothing good so far.

watch the "c_nationkey = s_nationkey" condition. It is a "side" condition (ie it is not from the "natural" relationships between tables). It is not clear whether accounting for its selectivity will give anything)

Q8

Timour is analyzing this query.

Q9

Timour is analyzing this query.
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                                                          | key                 | key_len | ref                                              | rows | filtered | Extra                                                                     |
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+
|    1 | SIMPLE      | nation   | ALL    | PRIMARY                                                                                | NULL                | NULL    | NULL                                             |   25 |   100.00 | Using temporary; Using filesort                                           |
|    1 | SIMPLE      | supplier | ref    | PRIMARY,i_s_nationkey                                                                  | i_s_nationkey       | 5       | dbt3.nation.n_nationkey                          | 4077 |   100.00 | Using index                                                               |
|    1 | SIMPLE      | partsupp | ref    | PRIMARY,i_ps_partkey,i_ps_suppkey                                                      | i_ps_suppkey        | 4       | dbt3.supplier.s_suppkey                          |   38 |   100.00 | Using join buffer (flat, BKA join); Key-ordered Rowid-ordered scan        |
|    1 | SIMPLE      | part     | eq_ref | PRIMARY                                                                                | PRIMARY             | 4       | dbt3.partsupp.ps_partkey                         |    1 |   100.00 | Using where; Using join buffer (incremental, BKA join); Key-ordered scan  |
|    1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey_partkey | 10      | dbt3.partsupp.ps_partkey,dbt3.supplier.s_suppkey |    3 |   100.00 | Using join buffer (incremental, BKA join); Key-ordered Rowid-ordered scan |
|    1 | SIMPLE      | orders   | eq_ref | PRIMARY                                                                                | PRIMARY             | 4       | dbt3.lineitem.l_orderkey                         |    1 |   100.00 | Using join buffer (incremental, BKA join); Key-ordered scan               |
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+

Watch for "p_name like ..." condition. What if we force table part to be the 1st.

Q10

Q13

SergeyP is analyzing this query.
select
	c_count,
	count(*) as custdist
from
	(
		select
			c_custkey,
			count(o_orderkey) as c_count
		from
			customer left outer join orders on
				c_custkey = o_custkey
				and o_comment not like '%express%requests%'
		group by
			c_custkey
	) as c_orders
group by
	c_count
order by
	custdist desc,
	c_count desc;

Q15

SergeyP is analyzing this query.

Q17

Timour is analyzing this query.
  • Q17 cannot benefit from MDEV-89 because the '<' predicate depends on both query tables. No matter what is the join order, the subquery has to be attached to the last table in the plan.
  • There are no sargable conditions for 'lineitem', and it is bigger than 'part', and selectivity(p_brand = 'Brand#43' and p_container = 'WRAP DRUM') = 0.1%. Therefore table part should be first in the join plan.
  • selectivity(p_partkey = l_partkey) << selectivity(l_quantity < (select ...)), therefore the expensive subquery will be placed correctly after the cheap join condition.
  • One possible improvement for this query is to stop the execution of the subquery as soon as the selected expression becomes false. In this case stop when (l_quantity >= 0.2 * avg(l_quantity)).
select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where
  p_partkey = l_partkey
  and p_brand = 'Brand#43'
  and p_container = 'WRAP DRUM'
  and l_quantity < (select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey);
+------+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+----------+------------------------------------------------------------------------------------------------+
| id   | select_type        | table    | type | possible_keys                   | key                 | key_len | ref                 | rows    | filtered | Extra                                                                                          |
+------+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+----------+------------------------------------------------------------------------------------------------+
|    1 | PRIMARY            | part     | ALL  | PRIMARY                         | NULL                | NULL    | NULL                | 6000000 |     1.39 | Using where                                                                                    |
|    1 | PRIMARY            | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      29 |   100.00 | Using where; Subqueries: 2; Using join buffer (flat, BKA join); Key-ordered Rowid-ordered scan |
|    2 | DEPENDENT SUBQUERY | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      29 |   100.00 |                                                                                                |
+------+--------------------+----------+------+---------------------------------+---------------------+---------+---------------------+---------+----------+------------------------------------------------------------------------------------------------+

Q20

  • This query will benefit from MWL#253 (Exact index stats), and MDEV-83 (Cost-based choice for the pushdown of subqueries to joined tables).
explain extended
select sql_calc_found_rows 
  s_name, s_address
from 
  supplier, nation
where s_suppkey in (select ps_suppkey from partsupp
                    where ps_partkey in (select p_partkey from part where p_name like 'g%')
                          and ps_availqty >
                              (select 0.5 * sum(l_quantity)
                               from lineitem
                               where l_partkey = ps_partkey and l_suppkey = ps_suppkey
                                     and l_shipdate >= date('1993-01-01') and l_shipdate < date('1993-01-01') + interval '1' year ))
and s_nationkey = n_nationkey
and n_name = 'UNITED STATES'
order by s_name
limit 10;

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.