Understanding MariaDB architecture
MariaDB architecture is partly different from the architecture of traditional DBMSs, like SQL Server. Here we will examine the main components that a new MariaDB DBA needs to know. We will also discuss a bit of history, because this may help understand MariaDB philosophy and certain design choices.
This section is an overview of the most important components. More information is included in specific section of this migration guide, or in other pages of the MariaDB KnowledgeBase (see the links).
MariaDB was born from the source code of MySQL, in 2008. Therefore, its history begins with MySQL.
MySQL was born at the beginning of the 90s. Back in the days, if compared to its existing competitors, MySQL was lightweight, simple to install, easy to learn. While it had a very limited set of features, it was also fast in certain common operations. And it was open source. These characteristics made it suitable to back the simple websites that existed at that time.
The web evolved rapidly, and the same happened to MySQL. Being open source helped a lot in this respect. Because the community needed functionalities that weren’t supported at that time.
MySQL was probably the first database system to support a pluggable storage engine architecture. Basically, this means that MySQL knows very little about creating or populating a table, reading from it, building proper indexes and caches. It just delegated all these operations to a special plugin type called storage engines.
One of the first plugins developed by third parties was InnoDB. It is very fast, and it adds two important features that are not supported otherwise: transactions and foreign keys. Nowadays InnoDB is the default MariaDB storage engine, and it is the best choice for most use cases. But for particular needs, sometimes using a different storage engine is desirable.
Note that when MariaDB asks a storage engine to write or read a row, the storage engine could theoretically do anything. This led to the creation of very interesting alternative engines, like BLACKHOLE (which doesn’t write or read any data, acting like the /dev/null file in Linux), or CONNECT (which can read and write to files written in many different formats, or remote DBMSs, or some other special data sources).
When we create a table, we specify its storage engine. It is possible to convert an existing table to another storage engine, thought this is a blocking operation which requires a complete table copy. 3rd party storage engines can also be installed while MariaDB is running.
Note that it is perfectly possible to use tables with different storage engines in the same transaction (even if some engines are not transactional). It is even possible to use different engines in the same query, for example with JOINs and subqueries.
The binary log
As we mentioned, different tables can be built using different storage engines. It is important to notice that not all engines are transactional, and that different engines implement the transaction logs in different ways. For this reason, MariaDB cannot replicate data from a master to a slave using an equivalent of SQL Server transactional replication.
Instead, it needs a global mechanism to log the changes that are applied to data. This mechanism is the binary log, often abbreviated to binlog.
The binary log can be written in two formats:
- STATEMENT logs SQL statements that modify data;
- ROW logs a reference to the rows that has been modified, if any (usually it’s the primary key), and the new values that have been added or modified, in a binary format.
In most cases, STATEMENT is slower because the SQL statement needs to be re-executed by the slave, and because certain statements may produce a different result in the slave (think about queries that user LIMIT without ORDER BY, or the CURRENT_TIMESTAMP() function). But there are exceptions, and anyway DDL statements are always logged as STATEMENT to avoid flooding the binary log. Therefore, the binary log may well contain both ROW and STATEMENT entries. One can even set binlog_format=MIXED to log changes as STATEMENT, except when they may produce different results on a slave.
The binary log allows:
- incremental backups;
- flashback, which means, seeing data as they were in a point of time in the past;
- restore a backup and re-apply the binary log, with the exception of a data change which caused problems (human mistake, application bug, SQL injection);
- Capture Data Changes (CDC), by streaming the binary log to technologies like Apache Kafka.
The binary log can be inspected using the mysqlbinlog utility, which comes with MariaDB.
As mentioned, storage engines are a special type of plugins. But others exist. For example plugins can add authentication methods, new features, SQL syntax, functions, informative tables, and so on.
Many plugins are installed by default, or available but not installed by default. They can be installed or uninstalled at runtime with SQL statements (INSTALL PLUGIN, INSTALL SONAME, UNINSTALL PLUGIN). 3rd party plugins can be made available for installation by simply copying them to the plugin_dir.
It is important to note that different plugins may have different maturity levels. It is possible to prevent the installation of plugins we don’t consider production-ready by setting the plugin_maturity system variable. For plugins that are distributed with MariaDB, the maturity level is determined by the MariaDB team based on the bugs reported and fixed.
Some plugins are developed by 3rd parties. Even some 3rd party plugins are included in MariaDB official distributions - the ones available in mariadb.org. All plugins distributed with MariaDB are maintained by the MariaDB company or the MariaDB Foundation.
In MariaDB every authorization method (including the default one) is provided by a plugin. A user can be required to use a certain authentication plugin. This gives us a big flexibility and control. Windows users may be interested in gsapi (which supports Windows authentication, Kerberos and NTLM) and name_pipe (which uses named pipe impersonation).
Other plugins that can be very useful include userstat, which includes statistics about resources and table usage, and METADATA_LOCK_INFO, which provides information about metadata locks.
MariaDB supports thread pool. It works differently on UNIX and on Windows. On Windows, it is enabled by default and its implementation is quite similar to SQL Server. It uses the Windows native CreateThreadpool API.