What is Flexviews?

Flexviews is a materialized views implementation for MariaDB, MySQL and its forks. It includes a simple SQL API that is used to create materialized views and to refresh them. The advantage of using Flexviews is that the materialized views are incrementally refreshed, that is, the views are updated efficiently by using special logs which record the changes to database tables. Flexviews includes tools which create and maintain these logs. The views created by Flexviews include support for JOINs and for all major aggregate functions.

Flexviews works well with Shard-Query, that can be used to shard the source data, making the refresh process faster.

What is a materialized view?

A "regular" view is a virtual table representing the result of a database query. Each time the view is accessed, the RDBMS must run the query to produce the result set for the view.

A materialized view is similar to regular view, except that the results are stored into an actual database table, not a virtual one. The result set is effectively cached for a period of time. When the underlying data changes the view becomes stale. Because of this, materialized views must be frequently "refreshed" to bring them up-to-date.

By caching result set data into a real table, the data can be accessed much more quickly, but at the cost of some data being potentially out-of-date. It is most useful in data warehousing scenarios, where frequent queries of the actual base tables can be extremely expensive. Since the materialized view is stored as a real table, you can build indexes on any combination of columns of the view, enabling drastic speedups in response time for queries that access the view.

MariaDB and MySQL do not support materialized views natively. Flexviews creates them as regular tables and keeps them up to date.

Flexviews resources

The swanhart-tools repository, on GitHub, contains tools for MariaDB and MySQL from Justin Swanhart. Flexviews is located in the flexviews directory. The whole project can be downloaded via the GitHub's Download button. It is possible to report bugs using the Issues tab.

A project on Google Code also exists, but it is obsolete. All bugs from Google Code and all downloads from Google Code are deprecated and will be removed.

A Facebook page is also available.

The following articles are also useful:

Requirements

  • MariaDB or MySQL, version 5.1+
  • PHP 5.2+ required, 5.3+ is recommended (for FlexCDC)
    • pcntl extension

MariaDB required settings:

MariaDB suggested settings:

Refresh Methods

Materialized views support two refresh methods: INCREMENTAL or COMPLETE.

INCREMENTAL refresh is preferable when it can be used. Incrementally refreshing the view requires examining only the rows which changed since the view was last refreshed. Flexviews must be instructed about the calculations that it must perform on the data, and this can be done via the SQL_API or via convert.php (see below). However, not all SQL expressions are supported. INCREMENTAL refresh requires the FlexCDC utility, that examines the binary log and records the changes into tables called change logs (one for each materialized view). An SQL event called apply_views periodically applies these changes to the materialized views. This can also be applied manually by calling the refresh_all() stored procedure, or refresh().

COMPLETE refresh is slower, because it rebuild the data in the materialized view each time. The main reason for using this method is that the materialized views can be based on any SQL statement. Also, FlexCDC is not needed for COMPLETE refresh. An SQL event periodically rebuilds the materialized views.

Supported SQL

The materialized views that use the COMPLETE refresh method can be based on any SELECT statement, as long as it is deterministic. However, for the INCREMENTAL method, materialized view's definitions only support the following clauses:

  • COUNT/SUM/AVG/MIN/MAX()
  • COUNT(DISTINCT)
  • GROUP BY
  • INNER JOIN
  • WHERE
  • All non-deterministic expressions.

Unsupported clauses are:

  • UNION
  • Subqueries/Derived tables
  • AVG(DISTINCT)
  • SUM(DISTINCT)
  • GROUP_CONCAT
  • HAVING
  • ORDER BY
  • LEFT/RIGHT/CROSS JOIN
  • All deterministic expressions

Instead of HAVING, you can use a WHERE when querying the materialize view. And since ORDER BY cannot be used in a materialized view's definition, it should be used while querying the view.

Interfaces and tools

SQL API

An SQL API provides the stored routines can be used to create, define and enable materialized views.

The API has a documentation automatically-generated with ROBODoc, located in the manual.html file.

Many routines produce a warning for non-fatal errors. If the @fv_force user-variable is set exactly to TRUE, such warnings are turned into errors.

MariaDB starting with 5.5

On MariaDB 5.5 and above, errors have a 45000 SQLSTATE value, and the property CLASS_ORIGIN (only readable via GET DIAGNOSTICS)is set to flexviews.

MariaDB until 5.5

On older versions, the only significative property is the error message.

convert.php

The convert.php script can be used via the command line to create and enable INCREMENTAL materialized views without directly calling the SQL API. The syntax is:

php convert.php [schema] < file_name

schema, if specified, is the name of the default database. The specified schema is used for:

  • The location into which to place the materialized view(s).
  • The location to use for tables that are not explicitly prefixed with a schema name in the FROM clause of the SELECT statement.

file_name is a file which contains CREATE TABLE ... SELECT and INSERT ... SELECT statements, which will be translated to SQL API calls. Each statement must be terminated with a semicolon (;).

FlexCDC

Flexviews includes FlexCDC - a Change Data Capture utility which makes captures table changes to table change logs, making incremental refreshing of views possible. It is also called the consumer.

Data changes are written by the server to the binary log. FlexCDC detects such changes and writes them to log tables. Log tables are stored in the flexviews database and contain the changes that need to be applied to bring the materialized views up to date.

This image shows how FlexCDC works:

FlexCDC

  1. FlexCDC invokes mysqlbinlog in an external process with the commandline options --base64-output=decode-rows -v. This instructs the utility to present RBR base64 entries as an easily readable SBR notation.
  2. mysqlbinlog connects to MySQL and asks for binary logs. The output from mysqlbinlog is captured by FlexCDC and processed further.
  3. Instead of applying the actual changes, FlexCDC records the changes into log tables. FlexCDC assigns a unique monotonically increasing transaction id to each set of changes. FlexCDC inserts each set changes into one or more table changelogs (one per changed table) in a single transaction.

FlexCDC is written in PHP and is located in the consumer directory. By default, it reads a configuration file (consumer.ini) located in the working directory. There are three required .ini sections: [flexcdc], [source], [dest]. See the consumer.ini.example file for an example and settings descriptions.

After changing the configuration file, setup_flexcdc.php can be used to create the metadata tables. If the script detects a problems, it exists with an error.

FlexCDC can be executed via run_consumer.php. A shell script called consumer_safe.sh is also avaible. It restarts FlexCDC in case it crashes. It is still possible to stop FlexCDC by sending a HUP signal. consumer_safe.sh creates a .pid file, which tells the number of the thread that can receive the HUP.

FlexCDC only works on Unix-like systems.

Comments

Comments loading...
Loading