ColumnStore Window Functions

Review window function support in ColumnStore. This page details the specific window functions and limitations applicable when using the ColumnStore storage engine.

Introduction

MariaDB ColumnStore supports window functions, broadly following the SQL:2003 specification. A window function performs a calculation over a window of rows surrounding the current row in a result set. This simplifies queries that answer common business questions such as cumulative totals, rolling averages, and top-10 lists.

Aggregate functions can be used as window functions; however, they behave differently than in a group by query because the rows remain ungrouped. This makes calculations such as cumulative sums and rolling averages possible.
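The difference between a grouped aggregate and a windowed aggregate can be sketched as follows. This is a minimal Python sketch that runs the SQL against SQLite's in-memory engine (version 3.25 or later) as a stand-in for ColumnStore; the sales table and its rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (owner TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", 100), ("ann", 50), ("bob", 70)],
)

# GROUP BY collapses each owner's rows into a single summary row.
grouped = conn.execute(
    "SELECT owner, SUM(amount) FROM sales GROUP BY owner ORDER BY owner"
).fetchall()

# The same aggregate with OVER keeps every detail row and repeats the total on each.
windowed = conn.execute(
    """
    SELECT owner, amount, SUM(amount) OVER (PARTITION BY owner) AS ownerTotal
    FROM sales
    ORDER BY owner, amount
    """
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns one row per owner, while the windowed query returns all three detail rows, each carrying its owner's total.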

Two key concepts for window functions are Partition and Frame:

A Partition is a group of rows, or window, that share the same value for a specific column or expression. For example, a Partition can be created over a time period such as a quarter, or over lookup values.
The Frame for each row is a subset of the row's Partition. The Frame is typically dynamic, allowing a sliding frame of rows within the Partition, and it determines the range of rows the window function operates on. A Frame can be anything from the previous X rows and next Y rows up to the entire Partition.
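To make the two concepts concrete, the sketch below computes one sum over a sliding Frame and one over the whole Partition. It assumes SQLite 3.25 or later as a stand-in for ColumnStore, and the daily_sales table, its column names (qtr, saleDay), and its rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (qtr TEXT, saleDay INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [("Q1", 1, 10), ("Q1", 2, 20), ("Q1", 3, 30), ("Q2", 1, 5), ("Q2", 2, 15)],
)

rows = conn.execute(
    """
    SELECT qtr, saleDay, amount,
           -- Frame: previous row, current row, and next row within the quarter.
           SUM(amount) OVER (PARTITION BY qtr ORDER BY saleDay
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS slidingSum,
           -- Frame: the entire Partition.
           SUM(amount) OVER (PARTITION BY qtr ORDER BY saleDay
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND UNBOUNDED FOLLOWING) AS qtrTotal
    FROM daily_sales
    ORDER BY qtr, saleDay
    """
).fetchall()

for row in rows:
    print(row)
```

The sliding sum never crosses from Q1 into Q2, because a Frame is always confined to the current row's Partition.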

Window functions are applied after joins, group by, and having clauses are calculated.

Syntax

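A window function is applied in the select clause using the following syntax:

    function_name ([expression [, expression ...]]) OVER (window_definition)

where window_definition is defined as:

    [PARTITION BY expression [, ...]]
    [ORDER BY expression [ASC | DESC] [, ...]]
    [frame_clause]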

PARTITION BY:

Divides the window result set into groups based on one or more expressions.
An expression may be a constant, a column, or a non-window-function expression.
A query is not limited to a single partition by clause. Different partition clauses can be used across different window function applications.
The partition by columns do not need to be in the select list but do need to be available from the query result set.
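The points above can be illustrated with a query that uses two different partition by clauses, neither of whose columns appears in the select list. This is a sketch only: SQLite 3.25 or later stands in for ColumnStore, and the rows and stage names are invented.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (name TEXT, owner TEXT, stageName TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?)",
    [
        ("opp1", "ann", "Won", 100),
        ("opp2", "ann", "Lost", 40),
        ("opp3", "bob", "Won", 60),
    ],
)

# owner and stageName drive the partitions but are not selected themselves.
rows = conn.execute(
    """
    SELECT name, amount,
           SUM(amount) OVER (PARTITION BY owner)     AS ownerTotal,
           SUM(amount) OVER (PARTITION BY stageName) AS stageTotal
    FROM opportunities
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```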

ORDER BY:

Defines the ordering of values within the partition.
Can be ordered by multiple keys, each of which may be a constant, a column, or a non-window-function expression.
The order by columns do not need to be in the select list but need to be available from the query result set.
Use of a select column alias from the query is not supported.

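and the optional frame_clause is defined as:

    {RANGE | ROWS} frame_start
    {RANGE | ROWS} BETWEEN frame_start AND frame_end

and the optional frame_start and frame_end are defined as (value being a numeric expression):

    UNBOUNDED PRECEDING
    value PRECEDING
    CURRENT ROW
    value FOLLOWING
    UNBOUNDED FOLLOWING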

RANGE/ROWS:

Defines the set of rows that the function applies to when calculating a given row's window function result.
Requires an ORDER BY clause to define the row order for the window.
ROWS specifies the window in physical units, i.e. result set rows; the offset value must be a constant or an expression evaluating to a positive numeric value.
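A ROWS frame with a constant offset is what drives a rolling average. The sketch below computes a 3-row rolling average in two ways: with the short form that names only a frame start, and with the explicit BETWEEN form. In standard SQL the short form implies an end of CURRENT ROW; whether ColumnStore treats both forms identically is an assumption here, since the sketch runs on SQLite 3.25 or later with an invented table.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (saleDay INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

rows = conn.execute(
    """
    SELECT saleDay,
           -- Short form: frame start only, frame end defaults to CURRENT ROW.
           AVG(amount) OVER (ORDER BY saleDay ROWS 2 PRECEDING) AS shortForm,
           -- Explicit form of the same frame.
           AVG(amount) OVER (ORDER BY saleDay
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS longForm
    FROM daily_sales
    ORDER BY saleDay
    """
).fetchall()

for row in rows:
    print(row)
```

For the first two rows fewer than two preceding rows exist, so the average is taken over the rows that are available.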

Supported Functions

Function | Description

Examples

Example Schema

The examples are all based on the following simplified sales opportunity table:

Some example values are (thanks to for sample data generation):

accountName | name | owner | amount | closeDate | stageName

The schema, sample data, and queries are available as an attachment to this article.
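For readers without the attachment, a table with the six columns listed above can be sketched as follows. The column names come from this page; the column types and the sample rows are assumptions made for illustration, and SQLite is used as a stand-in, so the ENGINE=ColumnStore table option that a MariaDB table definition would carry is absent.

```python
import sqlite3

# Assumption: SQLite stands in for ColumnStore; types and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE opportunities (
        accountName TEXT,
        name        TEXT,
        owner       TEXT,
        amount      NUMERIC,
        closeDate   TEXT,     -- ISO-8601 dates sort correctly as text
        stageName   TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Acme", "Acme renewal", "bill", 500, "2016-10-02", "ClosedWon"),
        ("Globex", "Globex upsell", "ann", 300, "2016-10-03", "ClosedWon"),
        ("Initech", "Initech pilot", "bill", 900, "2016-10-05", "ClosedLost"),
    ],
)

# Read the column names back from the catalog (field 1 of each row is the name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(opportunities)")]
row_count = conn.execute("SELECT COUNT(*) FROM opportunities").fetchone()[0]
print(columns, row_count)
```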

Cumulative Sum and Running Max Example

Window functions can be used to add cumulative (running) calculations to a detail report. In this case, a report of won opportunities for a 7-day period adds columns showing the accumulated won amount and the highest opportunity amount seen so far.

with example results:

owner | accountName | CloseDate | amount | cumeWon | runningMax
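A query of the kind described in this section might look like the sketch below. It is not the page's original query: SQLite 3.25 or later stands in for ColumnStore, and the rows, the 'ClosedWon' stage value, and the date range are invented. Only the output column names follow the result header above.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; rows and stage values are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities "
    "(owner TEXT, accountName TEXT, closeDate TEXT, amount INTEGER, stageName TEXT)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?, ?)",
    [
        ("bill", "Acme", "2016-10-02", 500, "ClosedWon"),
        ("ann", "Globex", "2016-10-03", 300, "ClosedWon"),
        ("ann", "Umbrella", "2016-10-04", 700, "ClosedLost"),  # not won, filtered out
        ("bill", "Initech", "2016-10-05", 900, "ClosedWon"),
        ("ann", "Hooli", "2016-10-20", 400, "ClosedWon"),      # outside the 7 days
    ],
)

rows = conn.execute(
    """
    SELECT owner, accountName, closeDate, amount,
           SUM(amount) OVER (ORDER BY closeDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumeWon,
           MAX(amount) OVER (ORDER BY closeDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runningMax
    FROM opportunities
    WHERE stageName = 'ClosedWon'
      AND closeDate BETWEEN '2016-10-02' AND '2016-10-08'
    ORDER BY closeDate
    """
).fetchall()

for row in rows:
    print(row)
```

Because the where clause is evaluated before the window functions, the lost and out-of-range opportunities never contribute to cumeWon or runningMax.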

Partitioned Cumulative Sum and Running Max Example

The above example can be partitioned, so that the window functions are over a particular field grouping such as owner and accumulate within that grouping. This is achieved by adding the syntax "partition by" in the window function clause.

With example results:

owner | accountName | CloseDate | amount | cumeWon | runningMax
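Adding partition by owner restarts the running calculations for each owner. The following is a sketch under the same assumptions as elsewhere on this page's added examples: SQLite 3.25 or later as a stand-in for ColumnStore, with invented rows.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities "
    "(owner TEXT, accountName TEXT, closeDate TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?)",
    [
        ("bill", "Acme", "2016-10-02", 500),
        ("ann", "Globex", "2016-10-03", 300),
        ("bill", "Initech", "2016-10-05", 900),
        ("ann", "Hooli", "2016-10-07", 400),
    ],
)

rows = conn.execute(
    """
    SELECT owner, accountName, closeDate, amount,
           SUM(amount) OVER (PARTITION BY owner ORDER BY closeDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumeWon,
           MAX(amount) OVER (PARTITION BY owner ORDER BY closeDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runningMax
    FROM opportunities
    ORDER BY owner, closeDate
    """
).fetchall()

for row in rows:
    print(row)
```

Each owner's first row starts again from its own amount, rather than continuing the previous owner's total.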

Ranking / Top Results

The rank window functions assign a numeric order value to each row based on the window definition. Rank() gives tied (equal) values the same rank and skips the rank values that follow the tie. Dense_Rank() behaves similarly, except that the next consecutive number is used after a tie rather than skipped. Row_Number() provides a unique ordering value for every row. The example query applies the Rank() function to rank sales reps by the number of won opportunities for Q4 2016.

with example results (note that the query is technically incorrect because it uses closeDate < '2016-12-31'; however, this creates a tie scenario for illustrative purposes):

owner | wonCount | rank

If the dense_rank function is used the rank values would be 1,2,3,3,4 and for the row_number function the values would be 1,2,3,4,5.
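The three functions can be compared side by side on a data set with one tie. The sketch below uses SQLite 3.25 or later as a stand-in for ColumnStore; the owners and counts are invented, and the counts are produced in a derived table so that the window order by refers to a real column rather than a select alias.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; owners and counts are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE opportunities (owner TEXT)")
won = [("ann",)] * 4 + [("bob",)] * 3 + [("cat",)] * 2 + [("dan",)] * 2 + [("eve",)]
conn.executemany("INSERT INTO opportunities VALUES (?)", won)

rows = conn.execute(
    """
    SELECT owner, wonCount,
           RANK()       OVER (ORDER BY wonCount DESC)        AS rnk,
           DENSE_RANK() OVER (ORDER BY wonCount DESC)        AS denseRnk,
           -- owner is added as a tie-breaker so the numbering is deterministic.
           ROW_NUMBER() OVER (ORDER BY wonCount DESC, owner) AS rowNum
    FROM (SELECT owner, COUNT(*) AS wonCount
          FROM opportunities
          GROUP BY owner) AS counts
    ORDER BY wonCount DESC, owner
    """
).fetchall()

for row in rows:
    print(row)
```

With cat and dan tied on two wins, Rank() yields 1,2,3,3,5, Dense_Rank() yields 1,2,3,3,4, and Row_Number() yields 1,2,3,4,5.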

First and Last Values

The first_value and last_value functions allow determining the first and last values of a given range. Combined with a group by this allows summarizing opening and closing values. The example shows a more complex case where detailed information is presented for first and last opportunity by quarter.

with example results:

year | quarter | firstAccountName | firstOwner | firstAmount | lastAccountName | lastOwner | lastAmount
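A simplified version of a first/last report by quarter is sketched below; it is not the page's original query. SQLite 3.25 or later stands in for ColumnStore, the rows are invented, and the quarter is stored in a hypothetical qtr column rather than derived from closeDate. The frame is widened to the whole Partition explicitly, because with the default frame last_value would only see rows up to the current one.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; rows and the qtr column are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities "
    "(qtr INTEGER, closeDate TEXT, accountName TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?, ?)",
    [
        (1, "2016-01-05", "Acme", 100),
        (1, "2016-02-10", "Globex", 250),
        (1, "2016-03-20", "Initech", 75),
        (2, "2016-04-02", "Umbrella", 300),
        (2, "2016-06-28", "Hooli", 120),
    ],
)

# One window definition, reused for every function via string formatting.
over = (
    "OVER (PARTITION BY qtr ORDER BY closeDate "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)"
)
rows = conn.execute(
    f"""
    SELECT DISTINCT qtr,
           FIRST_VALUE(accountName) {over} AS firstAccountName,
           FIRST_VALUE(amount)      {over} AS firstAmount,
           LAST_VALUE(accountName)  {over} AS lastAccountName,
           LAST_VALUE(amount)       {over} AS lastAmount
    FROM opportunities
    ORDER BY qtr
    """
).fetchall()

for row in rows:
    print(row)
```

Every row in a quarter carries the same first and last values, so DISTINCT collapses the output to one row per quarter.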

Prior and Next Example

Sometimes it is useful to know the previous and next values in the context of a given row. The lag and lead window functions provide this capability. By default the offset is one, giving the prior or next value, but a larger offset can be supplied as an argument. The example query is a report of opportunities by account name showing the opportunity amount, and the prior and next opportunity amount for that account by close date.

With example results:

accountName | closeDate | currentOppAmount | priorAmount | nextAmount
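The sketch below shows lag and lead with the default offset, plus lag with an explicit offset of 2. It assumes SQLite 3.25 or later as a stand-in for ColumnStore; the rows and the twoBack column are invented for illustration.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (accountName TEXT, closeDate TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?)",
    [
        ("Acme", "2016-10-01", 100),
        ("Acme", "2016-10-15", 200),
        ("Acme", "2016-11-01", 150),
        ("Globex", "2016-10-05", 80),
    ],
)

rows = conn.execute(
    """
    SELECT accountName, closeDate, amount AS currentOppAmount,
           LAG(amount)    OVER (PARTITION BY accountName ORDER BY closeDate) AS priorAmount,
           LEAD(amount)   OVER (PARTITION BY accountName ORDER BY closeDate) AS nextAmount,
           LAG(amount, 2) OVER (PARTITION BY accountName ORDER BY closeDate) AS twoBack
    FROM opportunities
    ORDER BY accountName, closeDate
    """
).fetchall()

for row in rows:
    print(row)
```

Where no prior or next row exists within the account's Partition, the function returns NULL, which Python shows as None.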

Quartiles Example

The NTile window function breaks a data set into a given number of portions and assigns a numeric value to each portion of the range. NTile(4) breaks the data into quartiles (4 sets). The example query produces a report over all opportunities summarizing the quartile boundaries of the amount values.

With example results:

quartile | min | max
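Quartile boundaries can be summarized by assigning NTile(4) in a derived table and then grouping on the result. This is a sketch on SQLite 3.25 or later as a stand-in for ColumnStore, with eight invented amounts; the minAmount and maxAmount aliases are chosen here for illustration.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for ColumnStore; the amounts are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE opportunities (amount INTEGER)")
conn.executemany(
    "INSERT INTO opportunities VALUES (?)",
    [(a,) for a in (10, 20, 30, 40, 50, 60, 70, 80)],
)

rows = conn.execute(
    """
    SELECT quartile, MIN(amount) AS minAmount, MAX(amount) AS maxAmount
    FROM (SELECT amount,
                 NTILE(4) OVER (ORDER BY amount) AS quartile
          FROM opportunities) AS tiles
    GROUP BY quartile
    ORDER BY quartile
    """
).fetchall()

for row in rows:
    print(row)
```

With eight rows, each quartile receives exactly two amounts, so the minimum and maximum of each pair form the boundaries.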

Percentile Example

The percentile functions have a slightly different syntax from other window functions, as can be seen in the example below. These functions can only be applied to numeric values. The argument to the function is the percentile to evaluate. Following 'within group' is the sort expression, which indicates the sort column and, optionally, the order. Finally, after 'over' comes an optional partition by clause; if no partition clause is needed, use 'over ()'. The example below uses the value 0.5 to calculate the median opportunity amount in the rows. The values sometimes differ because percentile_cont returns the average of the 2 middle rows for an even-sized data set, while percentile_disc returns the first one encountered in the sort.

With example results:

owner | accountName | CloseDate | amount | pct_cont | pct_disc
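The difference between the two percentile functions can be worked through in plain Python. The helpers below are hypothetical and follow the usual SQL definitions (linear interpolation for percentile_cont, first value whose cumulative distribution reaches the percentile for percentile_disc); that ColumnStore matches these definitions in every edge case is an assumption. A query of the form percentile_cont(0.5) within group (order by amount) over (partition by owner) would apply the same calculation per owner.

```python
import math

def percentile_cont(values, p):
    """Interpolated percentile: averages neighbours when p falls between rows."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # fractional zero-based row position
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def percentile_disc(values, p):
    """Discrete percentile: first actual value whose cumulative share is >= p."""
    ordered = sorted(values)
    idx = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[idx]

# Invented amounts: one even-sized set and one odd-sized set.
even_amounts = [400, 100, 300, 200]
odd_amounts = [30, 10, 20]

even_cont = percentile_cont(even_amounts, 0.5)   # average of the two middle rows
even_disc = percentile_disc(even_amounts, 0.5)   # first of the two middle rows
odd_cont = percentile_cont(odd_amounts, 0.5)
odd_disc = percentile_disc(odd_amounts, 0.5)
print(even_cont, even_disc, odd_cont, odd_disc)
```

For the even-sized set the two medians differ (250 versus 200), while for the odd-sized set both land on the single middle value.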

This page is licensed: CC BY-SA / Gnu FDL
