DBT3 Benchmark Queries
Known things about DBT-3 benchmark and its queries
Q1
A simple, one-table query.
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date_sub('1998-12-01', interval 79 day) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
Query plan:
+------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+ | 1 | SIMPLE | lineitem | ALL | i_l_shipdate | NULL | NULL | NULL | 59711977 | Using where; Using temporary; Using filesort | +------+-------------+----------+------+---------------+------+---------+------+----------+----------------------------------------------+
- l_shipdate < date_sub('1998-12-01', interval 79 day) is satisifed by 59,334,576 rows.
- The table has 59,986,052 rows in total.
- There are a total of 4 different values of
(l_returnflag,l_linestatus)
. This means, sorting doesn't matter, and temporary table is a very small heap table.
Q2
plans starting with table "part" ... - 8sec on cold cache. scale=10 (scale=30 ETA 1min)
Q3
select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
There seems to be an improvement in mysql-5.6: http://jorgenloland.blogspot.ru/2013/02/dbt-3-q3-6-x-performance-in-mysql-5610.html
(speedup can be observed only when the query is in the form like the above (TODO: figure out where do different forms of queries come from?))
EXPLAINs (scale=10):
+------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15115145 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3sf10.orders.o_custkey | 1 | Using where | | 1 | SIMPLE | lineitem | ref | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | dbt3sf10.orders.o_orderkey | 2 | Using where | +------+-------------+----------+--------+---------------------------------------------------------+---------+---------+----------------------------+----------+----------------------------------------------+
+------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+ | 1 | SIMPLE | orders | range | PRIMARY,i_o_date_clerk | i_o_date_clerk | 4 | NULL | 7557572 | Using index condition; Using where; Using temporary; Using filesort | | 1 | SIMPLE | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3sf10.orders.o_custkey | 1 | Using where | | 1 | SIMPLE | lineitem | ref | PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | dbt3sf10.orders.o_orderkey | 2 | Using where | +------+-------------+----------+--------+---------------------------------------------------------+----------------+---------+----------------------------+---------+---------------------------------------------------------------------+
With statistics on 'building': we get a plan of customer, orders, lineitem. It is 5% worse. There seems to be no other possibilities.
Q5
Nothing good so far.
watch the "c_nationkey = s_nationkey" condition. It is a "side" condition (ie it is not from the "natural" relationships between tables). It is not clear whether accounting for its selectivity will give anything)
Q8
Timour is analyzing this query.
Q9
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+ | 1 | SIMPLE | nation | ALL | PRIMARY | NULL | NULL | NULL | 25 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | supplier | ref | PRIMARY,i_s_nationkey | i_s_nationkey | 5 | dbt3.nation.n_nationkey | 4077 | 100.00 | Using index | | 1 | SIMPLE | partsupp | ref | PRIMARY,i_ps_partkey,i_ps_suppkey | i_ps_suppkey | 4 | dbt3.supplier.s_suppkey | 38 | 100.00 | Using join buffer (flat, BKA join); Key-ordered Rowid-ordered scan | | 1 | SIMPLE | part | eq_ref | PRIMARY | PRIMARY | 4 | dbt3.partsupp.ps_partkey | 1 | 100.00 | Using where; Using join buffer (incremental, BKA join); Key-ordered scan | | 1 | SIMPLE | lineitem | ref | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey_partkey | 10 | dbt3.partsupp.ps_partkey,dbt3.supplier.s_suppkey | 3 | 100.00 | Using join buffer (incremental, BKA join); Key-ordered Rowid-ordered scan | | 1 | SIMPLE | orders | eq_ref | PRIMARY | PRIMARY | 4 | dbt3.lineitem.l_orderkey | 1 | 100.00 | Using join buffer (incremental, BKA join); Key-ordered scan | +------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------+------+----------+---------------------------------------------------------------------------+
Watch for "p_name like ..." condition. What if we force table part to be the 1st.
Q20
- This query will benefit from MWL#253 (Exact index stats), and MDEV-83 (Cost-based choice for the pushdown of subqueries to joined tables).
explain extended select sql_calc_found_rows s_name, s_address from supplier, nation where s_suppkey in (select ps_suppkey from partsupp where ps_partkey in (select p_partkey from part where p_name like 'g%') and ps_availqty > (select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('1993-01-01') and l_shipdate < date('1993-01-01') + interval '1' year )) and s_nationkey = n_nationkey and n_name = 'UNITED STATES' order by s_name limit 10;