MariaDB ColumnStore distributed aggregate functions usable in SELECT, HAVING, and ORDER BY: AVG, COUNT, MAX, MIN, CORR, COVAR_POP, COVAR_SAMP, and other statistical aggregates.
Functions
MariaDB ColumnStore supports the following aggregate functions, these can be used in the SELECT, HAVING, and ORDER BY clauses of the SQL statement.
Function
Description
AVG([DISTINCT] column)
Average value of a numeric (INT variations, NUMERIC, DECIMAL) column
CORR (ColumnY, ColumnX)
The correlation coefficient for non-null pairs in a group.
COUNT (*, [DISTINCT] column)
The number of rows returned by a query or grouping. All datatypes are supported
COVAR_POP (ColumnY, ColumnX)
The population covariance for non-null pairs in a group.
COVAR_SAMP (ColumnY, ColumnX)
The sample covariance for non-null pairs in a group.
MAX ([DISTINCT] column)
The maximum value of a column. All datatypes are supported.
MIN ([DISTINCT] column)
The maximum value of a column. All datatypes are supported.
REGR_AVGX (ColumnY, ColumnX)
Average of the independent variable (sum(ColumnX)/N), where N is number of rows processed by the query
REGR_AVGY (ColumnY, ColumnX)
Average of the dependent variable (sum(ColumnY)/N), where N is number of rows processed by the query
REGR_COUNT (ColumnY, ColumnX)
The total number of input rows in which both column Y and column X are nonnull
REGR_INTERCEPT (ColumnY, ColumnX)
The y-intercept of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs
REGR_R2(ColumnY, ColumnX)
Square of the correlation coefficient. correlation coefficient is the regr_intercept(ColumnY, ColumnX) for linear model
REGR_SLOPE(ColumnY, ColumnX)
The slope of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs
REGR_SXX(ColumnY, ColumnX)
REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs.
REGR_SXY(ColumnY, ColumnX)
REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs.
REGR_SYY(ColumnY, ColumnX)
REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs.
STD(), STDDEV(), STDDEV_POP()
The population standard deviation of a numeric (INT variations, NUMERIC, DECIMAL) column
STDDEV_SAMP()
The sample standard deviation of a numeric (INT variations, NUMERIC, DECIMAL) column
SUM([DISTINCT] column)
The sum of a numeric (INT variations, NUMERIC, DECIMAL) column
VARIANCE(), VAR_POP()
The population standard variance of a numeric (INT variations, NUMERIC, DECIMAL) column
VAR_SAMP()
The population standard variance of a numeric (INT variations, NUMERIC, DECIMAL) column
Regression functions (REGR_AVGX to REGR_YY), CORR, COVAR_POP and COVAR_SAMP are supported for version 1.2.0 and higher
Example
An example group by query using aggregate functions is:
SELECT year(o_orderdate) order_year,
AVG(o_totalprice) avg_totalprice,
MAX(o_totalprice) max_totalprice,
COUNT(*) order_count
FROM orders
GROUP BY order_year
ORDER BY order_year;