Announcing MariaDB Community Server 11.7 RC with Vector Search and 11.6 GA

We are pleased to announce the general availability (GA) of MariaDB Community Server 11.6 and the release candidate (RC) of MariaDB Community Server 11.7. Both continue on our commitment to delivering fast innovation to the community.

MariaDB Community Server 11.6 now GA

MariaDB Community Server 11.6 is now available as GA. Starting with this version UNICODE is now the default character set. A new, detailed replication lag representation has been added and the replication status is available as a table in the information schema.

The new PARSEC authentication plugin improves security by introducing salted passwords. Our newest connector release series supports the new authentication.

MariaDB Galera Cluster now has a built-in SST user account management.

More details about the new features available with MariaDB Community Server 11.6 are included in the 11.6 RC blog post.

MariaDB Community Server 11.7 RC with Vector Search

With our next release, MariaDB Community Server 11.7 is a release candidate (RC), with vector search as the cornerstone of the release.

Vector Search

After announcing the preview of vector search in MariaDB Server, the vector search capability has now been added to the MariaDB Community Server 11.7 release. The linked blog post explains the idea of the new vector search feature in detail, based on the implementation of the preview release.

Vector Embeddings

Vector embeddings are numerical representations [0.2, -0.5, 0.8, 0.1, …] that capture semantic meaning or features of data in a multi-dimensional space. The embedding transforms simple to complex data such as text, images, audio, or video into a series of numbers (a vector) where similar items are positioned together in the multi-dimensional vector space.

For example, a word embedding of the word “dog” would be close in a vector embedding space to the word “puppy”, whereas “dog” would not be close to the word “airplane”. The embedding representation can be used in similarity search, recommendation systems, or more generally in traditional AI/ML systems and GenAI systems.

New Data Type VECTOR()

With the RC release a new data type VECTOR(N) has been added to represent the vector column.

Example:

CREATE TABLE myVectorSearch (
    id INT PRIMARY KEY,
    v VECTOR(3) NOT NULL,
    VECTOR INDEX (v)
);

Conversion Functions

The two conversion functions VEC_FromText and VEC_ToText can be used to:

Convert a binary vector (little-endian IEEE float format) into a json array of numbers
Convert a json array of numbers into a little-endian IEEE float sequence of bytes (4 bytes per float)

Example:

INSERT INTO myVectorSearch VALUES (1, Vec_FromText('[1,2,3]'));
SELECT Vec_ToText(v) from myVectorSearch;

Comparison Functions

Comparing vectors, calculating how close they are, is the key functionality needed by an application working with vector search. MariaDB Community Server 11.7 provides two functions for calculating the distance between vectors. Which one to use depends on the application and on how the vectors were generated.

VEC_DISTANCE_EUCLIDEAN() Takes two vectors and computes the straight line (L2) Euclidean distance between two multi-dimensional points in vector space
VEC_DISTANCE_COSINE() Measures the cosine distance between two vectors in a multi-dimensional vector space

Example:

SELECT id FROM myVectorSearch ORDER BY VEC_DISTANCE__EUCLIDEAN(v, Vec_FromText('[2,4,6]')) LIMIT 5;t
SELECT id FROM myVectorSearch ORDER BY VEC_DISTANCE_COSINE(v, Vec_FromText('[2,4,6]')) LIMIT 5;

Configuration Options

The vector search feature requires some new system variables for controlling the general behavior. Four new system variables have been added

mhnsw_max_cache_size Upper limit for one MHNSW vector index cache
mhnsw_default_distance Default value for the DISTANCE vector index option
mhnsw_default_m Default value for the M vector index option
mhnsw_ef_search Defines the minimal number of result candidates to look for in the vector index for ORDER BY … LIMIT N queries.

If you are interested in initial benchmark results of vector search done with the preview release, this can be found in this blog. Blog posts covering new benchmarks and AI Frameworks like LLamaIndex and Spring AI will be published in the coming weeks.

New SQL Functions UUID_v4() and UUID_v7()

The new SQL functions UUID_v4() and UUID_v7() are now available to generate a UUID with different generation methods than the function UUID(), which exists already and represents UUID version 1. Depending on the use case one can be better than the other.

The versions differ in the way the UUID is generated:

UUID v1 uses a combination of a MAC address and a unix timestamp with a 100ns precision. UUID v1 is not sorted in a chronological order when using ORDER BY.
For UUID v4 is the bits a UUID v4 consists of are generated randomly, with no underlying logic. It’s randomness has a disadvantage when using it as a primary key
UUID v7 is similar to UUID v1, it uses a combination of random bytes and a unix timestamp with a 250ns precision. Unlike UUID v1, performing ORDER BY on UUID v7 values will return them in chronological order.

Enhancements to System Versioned Tables

System Versioned Tables is a powerful feature for auditing changes to data. Enabling System Versioned Tables is as easy as creating a table by using

create table contracts (...) with system versioning;

Or enabling the feature for an existing table by using

ALTER TABLE contracts ADD SYSTEM VERSIONING;

In both cases there will be invisible fields created in the table to track the timestamps and period for which the data is valid. A DBA/DevOps might want these fields to be visible, which before MariaDB Community Server 11.7 is only possible by creating the table and mentioning these fields explicitly in the CREATE TABLE statement. In 11.7 it is possible to change such implicit fields to explicit ones by the statement

set @@system_versioning_alter_history= keep;

ALTER TABLE contracts ADD COLUMN rs timestamp(6) AS ROW START, ADD COLUMN re timestamp(6) AS ROW END, ADD PERIOD FOR SYSTEM_TIME (rs,re);

Compatibility Enhancements

ROW Data Type As Return Value For Stored Functions

To enhance our compatibility with other database vendors like Oracle, explicit and anchored ROW data types are now supported as return values for stored functions for both, sql_mode = “Oracle” and sql_mode = DEFAULT.

Explicit ROW(..members…) for both sql_mode=DEFAULT and sql_mode=ORACLE
CREATE FUNCTION f1() RETURNS ROW(a INT, b VARCHAR(32)) …
Anchored “ROW TYPE OF [db1.]table1” declarations for sql_mode=DEFAULT
CREATE FUNCTION f1() RETURNS ROW TYPE OF test.t1 …
Anchored “[db1.]table1%ROWTYPE” declarations for sql_mode=ORACLE
CREATE FUNCTION f1() RETURN test.t1%ROWTYPE ...
Anchored scalar data types in RETURNS clause “TYPE OF [db1.]table1.column1” for sql_mode=DEFAULT
CREATE FUNCTION f1() RETURNS TYPE OF test.t1.column1;
Anchored scalar data types in RETURNS clause “[db1.]table1.column1” for sql_mode=ORACLE
CREATE FUNCTION f1() RETURN test.t1.column1%TYPE;

Miscellaneous

New SHOW CREATE SERVER statement

To show the create statement to be used to re-create a server like it is possible for a table, the new statement SHOW CREATE SERVER <server_name> can now be used.

Example

SHOW CREATE SERVER srv1\G
*************************** 1. row ***************************
       Server: srv1
Create Server: CREATE SERVER `srv1` FOREIGN DATA WRAPPER mysql 
  OPTIONS (HOST '172.30.0.58', DATABASE 'db1', USER 'maxscale', PASSWORD 'password');

New Slow Query Log Option log_slow_always_query_time

Logging of slow queries can be controlled by various options, log_slow_rate_limit and log_slow_min_examined_row_limit are two of them, both used to reduce the number of slow queries to be logged even if long_query_time was exceeded. The disadvantage here is that you might miss some queries which exceed long_query_time by far. Such queries should always be logged.

The new option log_slow_always_query_time now allows to define a threshold which, when exceeded by a query, will assure that the query is getting logged although it would have been filtered out by log_slow_rate_limit or log_slow_min_examined_row_limit.

New Replication Option –slave-abort-blocking-timeout

A typical case, where replication lags are encountered is when long running queries are executed on the replica by a non-replication thread, which can block replication threads from applying changes.

The new option --slave-abort-blocking-timeout can now be used to define a timeout which, when reached, will trigger to kill the blocking non-replication query to assure that the replicated event can be applied.

Resources

MariaDB Community Server 11.6 RC (Blog Post)

MariaDB Community Server 11.6 release (Release Notes)

MariaDB Community Server 11.7 release (Release Notes)

MariaDB Vector preview is out (Blog Post)

How Fast Is MariaDB Vector? (Blog Post)

Download MariaDB Community Server 11.6 GA or 11.7 RC at mariadb.com/downloads.