Analyzing Queries in ColumnStore

You are viewing an old version of this article. View the current version here.

Active SQL statement

show processlist

The MariaDB show processlist command may be used to see a list of active queries.

show processlist;
+----+------+-----------+-------+---------+------+-------+--------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+-------+---------+------+-------+--------------+
| 73 | root | localhost | ssb10 | Query | 0 | NULL | show processlist
+----+------+-----------+-------+---------+------+-------+--------------+
1 row in set (0.01 sec)

getActiveSQLStatements

getActiveSQLStatements is a mcsdmin command that shows which SQL statements are currently being executed on the database:

mcsadmin> getActiveSQLStatements
getactivesqlstatements Wed Oct 7 08:38:32 2015
Get List of Active SQL Statements
=================================
Start Time    Time (hh:mm:ss) Session ID SQL Statement
---------------- ---------------- -------------------- ------------------------------------------------------------
Oct 7 08:38:30    00:00:03       73 select c_name,sum(lo_revenue) from customer, lineorder where lo_custkey = c_custkey and c_custkey = 6 group by c_name

Single query statistics

calgetstats

This SQL stored function provides statistics about resources used on the User Module (UM) node, PM node, and network by the last run query. Example:

select count(*) from wide2;
+----------+                                       
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (0.22 sec)

select calgetstats();
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| calgetstats()                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-1931; CacheI/O-2446; BlocksTouched-2443; PartitionBlocksEliminated-0; MsgBytesIn-73KB; MsgBytesOut-1KB; Mode-Distributed |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

The output contains information on:

  • MaxMemPct - Peak memory utilization on the User Module, likely in support of a large (User Module) based hash join operation.
  • NumTempFiles - Report on any temporary files created in support of query operations larger than available memory, typically for unusual join operations where the smaller table join cardinality exceeds some configurable threshold.
  • TempFileSpace - Report on space used by temporary files created in support of query operations larger than available memory, typically for unusual join operations where the smaller table join cardinality exceeds some configurable threshold.
  • PhyI/O - Number of 8k blocks read from disk, SSD, or other persistent storage.
  • CacheI/O - Approximate number of 8k blocks processed in memory, adjusted down by the number of discrete PhyI/O calls required.
  • BlocksTouched - Approximate number of 8k blocks processed in memory.
  • PartitionBlocksEliminated - The number of block touches eliminated via the Extent Map elimination behavior.
  • MsgBytesIn, MsgByteOut - Message size in MB sent between nodes in support of the query.

Query plan

While the MariaDB Server's EXPLAIN utility can be used to look at the query plan, it is somewhat less helpful for ColumnStore tables as ColumnStore does not use indexes or make use of MariaDB I/O functionality. The execution plan for a query on a ColumnStore table is made up of multiple steps. Each step in the query plan performs a set of operations that are issued from the User Module to the set of Performance Modules in support of a given step in a query.

  • Full Column Scan - an operation that scans each entry in a column using all available threads on the Performance Modules. Speed of operation is generally related to the size of the data type and the total number of rows in the column. The closest analogy for a traditional system is an index scan operation.
  • Partitioned Column Scan - an operation that uses the Extent Map to identify that certain portions of the column do not contain any matching values for a given set of filters. The closest analogy for a traditional row-based DBMS is a partitioned index scan, or partitioned table scan operation.
  • Column lookup by row offset - once the set of matching filters have been applied and the minimal set of rows have been identified, additional blocks are requested using a calculation that determines exactly which block is required. The closest analogy for a traditional system is a lookup by rowid.

These operations are automatically executed together in order to execute appropriate filters and column lookup by row offset.

Viewing the ColumnStore query plan

In MariaDB ColumnStore there is a set of SQL tracing stored functions provided to see the distributed query execution plan between the UM and the PM.

The basic steps to using these SQL tracing stored functions are:

  1. Start the trace for the particular session.
  2. Execute the SQL statement in question.
  3. Review the trace collected for the statement. As an example, the following session starts a trace, issues a query against a 6 million row fact table and 300,000 row dimension table, and then reviews the output from the trace:
select calsettrace(1);
+----------------+
| calsettrace(1) |
+----------------+
| 1              |
+----------------+
1 row in set (0.04 sec)

select c_name, sum(lo_revenue)
-> from customer, lineorder
-> where lo_custkey = c_custkey and
-> c_custkey = 5
-> group by c_name;
+--------------------+-----------------+
| c_name | sum(lo_revenue) |
+--------------------+-----------------+
| Customer#000000005 | 552483078.00 |
+--------------------+-----------------+
1 row in set, 0 warning (1.23 sec)

select calgettrace();
+-----------------------------------------------------------------------------------+
| calgettrace()
|+------------------------------------------------------------------------------------+
| 
Desc Mode Table TableOID ReferencedOIDs PIO LIO PBE Elapsed Rows
BPS  PM   customer  3215  (3328,3336)        0   150 0   0.017   1  
BPS  PM   lineorder 3278  (3281,3291)    4096   29808 0  0 0.766 63  
HJS  PM   lineorder  3278   -        -         -   -   0.000 -   -  
ADS  UM       -          -        -        -   -    -   0.724    -   
|
+-----------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

Query statistics history

MariaDB ColumnStore query statistics history can be retrieved for analysis. By default the query stats collection is disabled. To enable the collection of query stats, the <QueryStats><Enabled> element in the ColumnStore.XML configuration file should be set to Y (default is N).

<QueryStats>
<Enabled>Y</Enabled>
</QueryStats>

Cross Engine Support must also be enabled before enabling. See the Cross Engine Configuration section.

When enabled the history of query statistics across all sessions along with execution time, and those stats provided by calgetstats() is stored in a table in the infinidb_querystats schema. Only queries in the following ColumnStore syntax are available for statistics monitoring:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • INSERT SELECT
  • LOAD DATA INFILE

Query statistics table

When QueryStats is enabled, the query statistics history is collected in the querystats table in the infinidb_querystats schema.

The columns of this table are:

  • Session ID (sessionID) The session number that executed the statement.
  • Host (host) The host that executed the statement.
  • User ID (user) The user that executed the statement.
  • Priority (priority) The priority the user has for this statement.
  • Query Execution Times (startTime, endTime) Calculated as end time – start time. start time: the time that the query gets to ExeMgr, DDLProc, or DMLProc end time: the time that the last result packet exits ExeMgr, DDLProc or DMLProc
  • Rows returned or affected (rows) The number of rows returned for SELECT queries, or the number of rows affected by DML queries. Not valid for DDL and other query types.
  • Error Number (errNo) The IDB error number if this query failed, 0 if it succeeded.
  • Physical I/O (phyIO) The number of blocks that the query accessed from the disk, including the pre-fetch blocks. This statistic is only valid for the queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
  • Cache I/O (cacheIO) The number of blocks that the query accessed from the cache. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
  • Blocks Touched (blocksTouched) The total number of blocks that the query accessed physically and from the cache. This should be equal or less than the sum of physical I/O and cache I/O. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
  • Partition Blocks Eliminated (CPBlocksSkipped) The number of blocks being eliminated by the extent map casual partition. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
  • Messages from UM to PM (msgOutUM) The number of messages in bytes that ExeMgr sends to the PrimProc. If a message needs to be distributed to all the PMs, the sum of all the distributed messages will be counted. Only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
  • Messages from PM to UM (msgInUM) The number of messages in bytes that PrimProc sends to the ExeMgr. Only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with where clause, and INSERT SELECT.
  • Memory Utilization (maxMemPct) This field shows memory utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.
  • Blocks Changed (blocksChanged) Total number of blocks that queries physically changed on disk. This is only for delete/update statements.
  • Temp Files (numTempFiles) This field shows any temporary file utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.
  • Temp File Space (tempFileSpace) This shows the size of any temporary file utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.

Query statistics viewing

Users can view the query statistics by selecting the rows from the query stats table in the infinidb_querystats schema. Examples listed below:

  • Example 1: List execution time, rows returned for all the select queries within the past 12 hours:
select queryid, query, endtime-starttime, rows from querystats 
where starttime >= now() - interval 12 hour and querytype = 'SELECT'; 
  • Example 2: List the three slowest running select queries of session 2 within the past 12 hours:
select a.* from (select endtime-starttime execTime, query from queryStats 
where sessionid = 2 and querytype = 'SELECT' and starttime >= now()-interval 12 hour
order by 1 limit 3) a;
  • Example 3: List the average, min and max running time of all the INSERT SELECT queries within the past 12 hours:
select min(endtime-starttime), max(endtime-starttime), avg(endtime-starttime) from querystats 
where querytype='INSERT SELECT' and starttime >= now() - interval 12 hour;

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.