CONNECT PIVOT Table Type

This table type can be used to transform the result of another table or view (called the source table) into a pivoted table along “pivot” and “facts” columns. A pivot table is a great reporting tool that sorts and sums (by default) independently of the original data layout in the source table.

For example, let us suppose you have the following “Expenses” table:

Who    Week  What  Amount
Joe    3     Beer  18.00
Beth   4     Food  17.00
Janet  5     Beer  14.00
Joe    3     Food  12.00
Joe    4     Beer  19.00
Janet  5     Car   12.00
Joe    3     Food  19.00
Beth   4     Beer  15.00
Janet  5     Beer  19.00
Joe    3     Car   20.00
Joe    4     Beer  16.00
Beth   5     Food  12.00
Beth   3     Beer  16.00
Joe    4     Food  17.00
Joe    5     Beer  14.00
Janet  3     Car   19.00
Joe    4     Food  17.00
Beth   5     Beer  20.00
Janet  3     Food  18.00
Joe    4     Beer  14.00
Joe    5     Food  12.00
Janet  3     Beer  18.00
Janet  4     Car   17.00
Janet  5     Food  12.00

Pivoting the table contents using the 'Who' and 'Week' fields for the left columns and the 'What' field for the top heading, and summing the 'Amount' field for each cell of the new table, gives the following desired result:

Who    Week  Beer   Car    Food
Beth   3     16.00  0.00   0.00
Beth   4     15.00  0.00   17.00
Beth   5     20.00  0.00   12.00
Janet  3     18.00  19.00  18.00
Janet  4     0.00   17.00  0.00
Janet  5     33.00  12.00  12.00
Joe    3     18.00  20.00  31.00
Joe    4     49.00  0.00   34.00
Joe    5     14.00  0.00   12.00

Note that SQL enables you to get the same result presented differently by using the “group by” clause, namely:

select who, week, what, sum(amount) from expenses
       group by who, week, what;

However, there is no way to get the pivoted layout shown above automatically using only SQL. Even with embedded SQL programming in some DBMSs, it is neither simple nor automatic.
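To make the limitation concrete, the Python sketch below loads the Expenses rows into an in-memory SQLite database (SQLite only stands in for MariaDB here, as an illustration) and runs the GROUP BY query above, followed by a hand-written pivot that uses one conditional SUM per product. The second query does produce the pivoted layout, but only because the values Beer, Car and Food were typed in by hand; any new product would need a new column expression.

```python
import sqlite3

# The 24 rows of the "Expenses" example: (who, week, what, amount).
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table expenses (who text, week integer, what text, amount real)")
conn.executemany("insert into expenses values (?, ?, ?, ?)", EXPENSES)

# The GROUP BY query from the text: one row per (who, week, what).
grouped = conn.execute(
    "select who, week, what, sum(amount) from expenses "
    "group by who, week, what order by who, week, what").fetchall()

# A hand-written pivot: one conditional SUM per pivot value. Beer, Car and
# Food must be known in advance; a new product needs a new expression.
pivoted = conn.execute(
    "select who, week,"
    " sum(case when what = 'Beer' then amount else 0.0 end),"
    " sum(case when what = 'Car' then amount else 0.0 end),"
    " sum(case when what = 'Food' then amount else 0.0 end)"
    " from expenses group by who, week order by who, week").fetchall()

print(len(grouped), "grouped rows")
for row in pivoted:
    print(row)
```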

The Pivot table type of CONNECT makes doing this much simpler.

Using the PIVOT Table Type

To get the result shown in the example above, just define it as a new table with the statement:

create table pivex
engine=connect table_type=pivot tabname=expenses;

You can now use it like any other table. For instance, to display the result shown above, just say:

select * from pivex;

The CONNECT implementation of the PIVOT table type does much of the work required to transform the source table:

  1. Finding the “Facts” column, by default the last column of the source table. Finding the “Facts” or “Pivot” columns automatically works only for table-based pivot tables; for view- or SrcDef-based pivot tables, these columns must be specified explicitly.
  2. Finding the “Pivot” column, by default the last remaining column.
  3. Choosing the aggregate function to use, “SUM” by default.
  4. Constructing and executing the “Group By” on the “Facts” column, getting its result in memory.
  5. Getting all the distinct values in the “Pivot” column and defining a “Data” column for each.
  6. Spreading the result of the intermediate memory table into the final table.
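The six steps above can be modelled in a few lines of plain Python. This is a simplified sketch of the described behaviour, not CONNECT's actual code: the function name `pivot` and its parameters are hypothetical, the aggregate is any Python callable (`sum` standing in for SUM), and the data columns are sorted by value, which matches the example output but is otherwise an assumption.

```python
from collections import defaultdict

COLUMNS = ["who", "week", "what", "amount"]
# The 24 rows of the "Expenses" example.
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

def pivot(columns, rows, facts=None, pivot_col=None, func=sum):
    """Hypothetical, simplified model of the pivot steps listed above."""
    facts = facts or columns[-1]                    # 1. default: last column
    remaining = [c for c in columns if c != facts]
    pivot_col = pivot_col or remaining[-1]          # 2. default: last remaining
    keys = [c for c in remaining if c != pivot_col]
    fi, pi = columns.index(facts), columns.index(pivot_col)
    ki = [columns.index(c) for c in keys]
    # 4. Group the facts by (key values, pivot value), in memory.
    groups = defaultdict(list)
    for row in rows:
        groups[(tuple(row[i] for i in ki), row[pi])].append(row[fi])
    # 3. Apply the aggregate function (sum by default) to each group.
    cells = {group: func(values) for group, values in groups.items()}
    # 5. One data column per distinct pivot value, sorted.
    data_cols = sorted({p for _, p in cells})
    # 6. Spread: one output row per key, 0 where a combination is missing.
    table = [key + tuple(cells.get((key, p), 0.0) for p in data_cols)
             for key in sorted({k for k, _ in cells})]
    return keys + data_cols, table

header, table = pivot(COLUMNS, EXPENSES)
print(*header)
for row in table:
    print(*row)
```

Passing `pivot_col="week"` to this sketch swaps the roles of week and what, which is what the PivotCol option does in the next example.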

The “Pivot” column of the source table must not be nullable (a null value could not be used as the name of a “data” column). Table creation is refused even if this nullable column does not actually contain any null values.

If a different result is desired, Create Table options are available to change the defaults used by Pivot. For instance, if we want to display the average expense for each person and product, spread into one column per week, use the following statement:

create table pivex2
engine=connect table_type=pivot tabname=expenses
option_list='PivotCol=Week,Function=AVG';

Now saying:

select * from pivex2;

will display the resulting table:

Who    What  3      4      5
Beth   Beer  16.00  15.00  20.00
Beth   Food  0.00   17.00  12.00
Janet  Beer  18.00  0.00   16.50
Janet  Car   19.00  17.00  12.00
Janet  Food  18.00  0.00   12.00
Joe    Beer  18.00  16.33  14.00
Joe    Car   20.00  0.00   0.00
Joe    Food  15.50  17.00  12.00
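The averages above can be checked with a short Python sketch (an illustration, not CONNECT code). With PivotCol=Week, the remaining columns who and what become the row keys, and each cell is the mean of the matching amounts, for example (19 + 16 + 14) / 3 ≈ 16.33 for Joe's beer in week 4. Empty cells are shown as 0 as in the table, and the two-decimal rounding only mimics the displayed values.

```python
from collections import defaultdict
from statistics import mean

# The 24 rows of the "Expenses" example: (who, week, what, amount).
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

# PivotCol=Week: who and what become the row keys, week values the columns.
cells = defaultdict(lambda: defaultdict(list))
for who, week, what, amount in EXPENSES:
    cells[(who, what)][week].append(amount)

weeks = sorted({week for _, week, _, _ in EXPENSES})  # data columns 3, 4, 5
table = {}
for key in sorted(cells):
    by_week = cells[key]
    # Function=AVG: mean of the amounts in each cell; 0 where there are none.
    table[key] = [round(mean(by_week[w]), 2) if w in by_week else 0.0
                  for w in weeks]

for (who, what), row in table.items():
    print(who, what, *("%.2f" % v for v in row))
```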

Restricting the Columns in a Pivot Table

Let us suppose that we want a Pivot table from expenses summing the expenses for all people and products, whatever week they were bought in. We can do this simply by dropping the week column from the pivex table:

alter table pivex drop column week;

The result we get from the new table is:

Who    Beer   Car    Food
Beth   51.00  0.00   29.00
Janet  51.00  48.00  30.00
Joe    81.00  20.00  77.00
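Dropping week shrinks the grouping key to who alone, so each cell now sums one person's spending on one product across all weeks. A minimal Python check of the table above, under that reading (an illustration, not CONNECT code):

```python
from collections import defaultdict

# The 24 rows of the "Expenses" example: (who, week, what, amount).
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

# With week dropped, "who" is the only key column left.
sums = defaultdict(float)
for who, _week, what, amount in EXPENSES:
    sums[(who, what)] += amount  # SUM over all weeks

products = sorted({what for _, _, what, _ in EXPENSES})  # Beer, Car, Food
people = sorted({who for who, _ in sums})
for who in people:
    # .get() avoids inserting empty cells into the defaultdict.
    print(who, *("%.2f" % sums.get((who, p), 0.0) for p in products))
```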

Note: Restricting columns is also needed when the source table contains extra columns that should not be part of the pivot table. This is true in particular for key columns that prevent a proper grouping.

PIVOT Create Table Syntax

The Create Table statement for PIVOT tables uses the following syntax:

create table pivot_table_name
[(column_definition)]
engine=CONNECT table_type=PIVOT
{tabname='source_table_name' | srcdef='source_table_def'}
[option_list='pivot_table_option_list'];

The column definition has two sets of columns:

  1. A set of columns belonging to the source table, not including the “facts” and “pivot” columns.
  2. “Data” columns, named after the values of the “pivot” column, that receive the aggregated values of the “facts” column. They are indicated by the “flag” option.

The options and sub-options available for Pivot tables are:

Option     Type            Description
Tabname    [DB.]Name       The name of the table to “pivot”. If not set, SrcDef must be specified.
SrcDef     SQL_statement   The statement used to generate the intermediate MySQL table.
DBname     name            The name of the database containing the source table. Defaults to the current database.
Function*  name            The name of the aggregate function used for the data columns, SUM by default.
PivotCol*  name            Specifies the name of the Pivot column whose values are used to fill the “data” columns having the flag option.
FncCol*    [func(]name[)]  Specifies the name of the data “Facts” column. If the form func(name) is used, the aggregate function name is set to func.
Groupby*   Boolean         Set it to True (1 or Yes) if the table already has a GROUP BY format.
Accept*    Boolean         To accept non-matching Pivot column values.

* These options must be specified in the OPTION_LIST.

Additional Access Options

There are four cases where pivot must call the server containing the source table or on which the SrcDef statement must be executed:

  1. The source table is not a CONNECT table.
  2. The SrcDef option is specified.
  3. The source table is on another server.
  4. The columns are not specified.

By default, pivot tries to call the currently used server using host=localhost, user=root with no password, and port=3306. However, this may not be what is needed, in particular if the local root user has a password, in which case you can get an “access denied” error message when creating or using the pivot table.

Specify the host, user, password and/or port options in the option_list to override the default connection options used to access the source table, get column specifications, and execute the generated group by or SrcDef query.

Defining a Pivot Table

There are two main ways to define a PIVOT table:

  1. From an existing table or view.
  2. Directly giving the SQL statement returning the result to pivot.

Defining a Pivot Table from a Source Table

The tabname standard table option is used to give the name of the source table or view.

For tables, the Group By is generated internally, except when the GROUPBY option is set to true. Do this only when the table or view already has a valid GROUP BY format.

Directly Defining the Source of a Pivot Table in SQL

Alternatively, the internal source can be defined directly using the SrcDef option, whose statement must have the proper group by format.

As we have seen above, a proper Pivot Table is made from an internal intermediate table resulting from the execution of a GROUP BY statement. In many cases, it is simpler or desirable to directly specify this when creating the pivot table. This may be because the source is the result of a complex process including filtering and/or joining tables.

To do this, use the SrcDef option, often replacing all other options. For instance, suppose that in the first example we are only interested in weeks 4 and 5. We could of course display it by:

select * from pivex where week in (4,5);

However, what if this is a huge table? In that case, the correct way to do it is to define the pivot table like this:

create table pivex4
engine=connect table_type=pivot
option_list='PivotCol=what,FncCol=amount'
SrcDef='select who, week, what, sum(amount) from expenses
where week in (4,5) group by who, week, what';

If your source table has millions of records and you plan to pivot only a small subset of it, doing so makes a big difference performance-wise. In addition, you are free to use expressions, scalar functions, aliases, joins, and where and having clauses in your SQL statement. The only constraint is that you are responsible for making sure the result of this statement has the correct format for pivot processing.
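The following Python sketch, with an in-memory SQLite database standing in for the source server (an assumption for illustration), compares filtering after grouping with filtering inside the SrcDef query, as pivex4 does. Because week is one of the grouping columns, both give the same groups; the difference is that in the second case only the rows of weeks 4 and 5 ever reach the GROUP BY.

```python
import sqlite3

# The 24 rows of the "Expenses" example: (who, week, what, amount).
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table expenses (who text, week integer, what text, amount real)")
conn.executemany("insert into expenses values (?, ?, ?, ?)", EXPENSES)

# Filter after grouping: group every row, then keep weeks 4 and 5.
late = conn.execute(
    "select * from (select who, week, what, sum(amount) as amount from expenses"
    " group by who, week, what) where week in (4, 5)"
    " order by who, week, what").fetchall()

# Filter before grouping, like the SrcDef of pivex4.
early = conn.execute(
    "select who, week, what, sum(amount) from expenses"
    " where week in (4, 5) group by who, week, what"
    " order by who, week, what").fetchall()

# Same groups either way, because week is one of the grouping columns.
assert early == late
scanned = conn.execute(
    "select count(*) from expenses where week in (4, 5)").fetchone()[0]
print(len(early), "groups built from", scanned, "of", len(EXPENSES), "rows")
```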

SrcDef also lets you use expressions and/or scalar functions. For instance:

create table xpivot (
Who char(10) not null,
What char(12) not null,
First double(8,2) flag=1,
Middle double(8,2) flag=1,
Last double(8,2) flag=1)
engine=connect table_type=PIVOT
option_list='PivotCol=wk,FncCol=amnt'
Srcdef='select who, what, case when week=3 then ''First'' when
week=5 then ''Last'' else ''Middle'' end as wk, sum(amount) *
6.56 as amnt from expenses group by who, what, wk';

Now the statement:

select * from xpivot;

will display the result:

Who    What  First   Middle  Last
Beth   Beer  104.96  98.40   131.20
Beth   Food  0.00    111.52  78.72
Janet  Beer  118.08  0.00    216.48
Janet  Car   124.64  111.52  78.72
Janet  Food  118.08  0.00    78.72
Joe    Beer  118.08  321.44  91.84
Joe    Car   131.20  0.00    0.00
Joe    Food  203.36  223.04  78.72
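To see where these numbers come from, the sketch below runs the same SrcDef query on the Expenses rows in SQLite (standing in for MariaDB; inside a Python string the inner quotes need no doubling) and spreads the grouped rows into the First, Middle and Last columns in their declared order. It models the result only, not how CONNECT computes it; 6.56 is simply the multiplier used in the SrcDef.

```python
import sqlite3

# The 24 rows of the "Expenses" example: (who, week, what, amount).
EXPENSES = [
    ("Joe", 3, "Beer", 18.0), ("Beth", 4, "Food", 17.0), ("Janet", 5, "Beer", 14.0),
    ("Joe", 3, "Food", 12.0), ("Joe", 4, "Beer", 19.0), ("Janet", 5, "Car", 12.0),
    ("Joe", 3, "Food", 19.0), ("Beth", 4, "Beer", 15.0), ("Janet", 5, "Beer", 19.0),
    ("Joe", 3, "Car", 20.0), ("Joe", 4, "Beer", 16.0), ("Beth", 5, "Food", 12.0),
    ("Beth", 3, "Beer", 16.0), ("Joe", 4, "Food", 17.0), ("Joe", 5, "Beer", 14.0),
    ("Janet", 3, "Car", 19.0), ("Joe", 4, "Food", 17.0), ("Beth", 5, "Beer", 20.0),
    ("Janet", 3, "Food", 18.0), ("Joe", 4, "Beer", 14.0), ("Joe", 5, "Food", 12.0),
    ("Janet", 3, "Beer", 18.0), ("Janet", 4, "Car", 17.0), ("Janet", 5, "Food", 12.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table expenses (who text, week integer, what text, amount real)")
conn.executemany("insert into expenses values (?, ?, ?, ?)", EXPENSES)

# The SrcDef of xpivot (SQLite also accepts the alias wk in GROUP BY).
SRCDEF = ("select who, what, case when week = 3 then 'First' when week = 5"
          " then 'Last' else 'Middle' end as wk, sum(amount) * 6.56 as amnt"
          " from expenses group by who, what, wk")

DATA_COLS = ["First", "Middle", "Last"]  # the flag=1 columns, declared order
table = {}
for who, what, wk, amnt in conn.execute(SRCDEF + " order by who, what"):
    # One line per (who, what); cells with no group stay at 0.
    line = table.setdefault((who, what), dict.fromkeys(DATA_COLS, 0.0))
    line[wk] = round(amnt, 2)

for (who, what), line in table.items():
    print(who, what, *("%.2f" % line[c] for c in DATA_COLS))
```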

Note 1: to avoid multiple lines having the same fixed column values, it is mandatory in SrcDef to place the pivot column at the end of the group by list.

Note 2: in the SrcDef of the create statement, it is mandatory to give aliases to the columns containing expressions so they are recognized by the other options.

Note 3: in the SrcDef select statement, quotes must be escaped because the entire statement is passed to MariaDB between quotes. Alternatively, specify it between double quotes.

Note 4: We could have let CONNECT do the column definitions. However, because they are defined from the sorted names, the Middle column would have been placed last.
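Note 1 can be illustrated with a simplified model of the spreading step. The sketch assumes, for illustration only, that the grouped rows arrive sorted by the GROUP BY list and that a new output line is started whenever the fixed column values (who, what) change; it is not CONNECT's implementation. The rows are Beth's grouped amounts from the Expenses example, before the 6.56 factor, and the helper name `spread` is hypothetical. With the pivot column last, each product gets one line; with wk moved to the middle of the list, the same fixed values come back in separate runs and Beth's beer appears on two lines.

```python
from itertools import groupby

# Beth's grouped rows from the xpivot example: (who, what, wk, amount).
ROWS = [("Beth", "Beer", "First", 16.0), ("Beth", "Beer", "Middle", 15.0),
        ("Beth", "Beer", "Last", 20.0), ("Beth", "Food", "Middle", 17.0),
        ("Beth", "Food", "Last", 12.0)]

def spread(rows, sort_key):
    """Emit one output line per run of equal fixed values (who, what)."""
    ordered = sorted(rows, key=sort_key)  # stands in for the GROUP BY order
    lines = []
    for fixed, run in groupby(ordered, key=lambda r: r[:2]):
        cells = {wk: amount for _, _, wk, amount in run}
        lines.append(fixed + tuple(cells.get(c, 0.0)
                                   for c in ("First", "Middle", "Last")))
    return lines

good = spread(ROWS, sort_key=lambda r: (r[0], r[1], r[2]))  # group by who, what, wk
bad = spread(ROWS, sort_key=lambda r: (r[0], r[2], r[1]))   # group by who, wk, what
print(good)
print(bad)
```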

Specifying the Columns Corresponding to the Pivot Column

These columns must be named after the values existing in the “pivot” column. For instance, suppose we have the following pet table:

name     race    number
John     dog     2
Bill     cat     1
Mary     dog     1
Mary     cat     1
Lisbeth  rabbit  2
Kevin    cat     2
Kevin    bird    6
Donald   dog     1
Donald   fish    3

Pivoting it using race as the pivot column is done with:

create table pivet
engine=connect table_type=pivot tabname=pet
option_list='PivotCol=race,groupby=1';

This gives the result:

name     dog  cat  rabbit  bird  fish
John     2    0    0       0     0
Bill     0    1    0       0     0
Mary     1    1    0       0     0
Lisbeth  0    0    2       0     0
Kevin    0    2    0       6     0
Donald   1    0    0       0     3

By the way, does this ring a bell? It shows that in a way PIVOT tables are doing the opposite of what OCCUR tables do.

Alternatively, we can define the table columns explicitly. But what happens if the Pivot column contains values that do not match any “data” column? There are three cases, depending on the specified options and flags.

First case: If no specific option is specified, this is an error: when trying to display the table, the query aborts with an error message stating that a non-matching value was met. Note that because the column list is established when the table is created, this is likely to occur if rows containing new values for the pivot column are inserted into the source table. If this happens, you should re-create the table or manually add the new columns to the pivot table.

Second case: The accept option was specified. For instance:

create table xpivet2 (
name varchar(12) not null,
dog int not null default 0 flag=1,
cat int not null default 0 flag=1)
engine=connect table_type=pivot tabname=pet
option_list='PivotCol=race,groupby=1,Accept=1';

No error will be raised and the non-matching values will be ignored. This table will be displayed as:

name     dog  cat
John     2    0
Bill     0    1
Mary     1    1
Lisbeth  0    0
Kevin    0    2
Donald   1    0

Third case: A “dump” column was specified with the flag value equal to 2. All non-matching values will be added in this column. For instance:

create table xpivet (
name varchar(12) not null,
dog int not null default 0 flag=1,
cat int not null default 0 flag=1,
other int not null default 0 flag=2)
engine=connect table_type=pivot tabname=pet
option_list='PivotCol=race,groupby=1';

This table will be displayed as:

name     dog  cat  other
John     2    0    0
Bill     0    1    0
Mary     1    1    0
Lisbeth  0    0    2
Kevin    0    2    6
Donald   1    0    3

It is a good idea to provide such a “dump” column if the source table is likely to receive new rows with pivot column values that did not exist when the pivot table was created.
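The three cases can be summarized with a small Python model of the spreading step for the pet table. The function `pivet` and its `accept` and `dump` parameters are hypothetical stand-ins for the Accept option and the flag=2 column; since the source is already grouped (groupby=1), each name and race pair appears only once. The text does not say what happens when both Accept and a dump column are given, so this sketch simply lets the dump column take precedence (an assumption).

```python
# The pet table: (name, race, number), already in GROUP BY format.
PETS = [("John", "dog", 2), ("Bill", "cat", 1), ("Mary", "dog", 1),
        ("Mary", "cat", 1), ("Lisbeth", "rabbit", 2), ("Kevin", "cat", 2),
        ("Kevin", "bird", 6), ("Donald", "dog", 1), ("Donald", "fish", 3)]

def pivet(rows, data_cols, accept=False, dump=None):
    """Hypothetical model: spread rows into flag=1 columns plus a dump column."""
    cols = list(data_cols) + ([dump] if dump else [])
    table = {}  # one line per name, in order of first appearance
    for name, race, number in rows:
        line = table.setdefault(name, dict.fromkeys(cols, 0))
        if race in data_cols:
            line[race] += number
        elif dump:      # third case: add non-matching values to the dump column
            line[dump] += number
        elif accept:    # second case: silently ignore the non-matching value
            continue
        else:           # first case: abort on the first non-matching value
            raise ValueError(f"non-matching pivot value: {race!r}")
    return [(name,) + tuple(line[c] for c in cols) for name, line in table.items()]

print(pivet(PETS, ["dog", "cat"], accept=True))
print(pivet(PETS, ["dog", "cat"], dump="other"))
```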

Pivoting Big Source Tables

This may sometimes be risky. If the pivot column contains too many distinct values, the resulting table may have too many columns. In all cases, the processes involved (finding distinct values when creating the table, or doing the group by when using it) can take a very long time and can sometimes fail because of exhausted memory.

Restrictions by a where clause should be applied to the source table when creating the pivot table rather than to the pivot table itself. This can be done by creating an intermediate table, or by using a view or the SrcDef option as the source.

All PIVOT tables are read only.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.