Vector Overview

You are viewing an old version of this article. View the current version here.
MariaDB starting with 11.6

Vectors were introduced in MariaDB 11.6 Vector Preview.

MariaDB Vector is a feature that allows MariaDB Server to perform as a relational vector database. Vectors generated by your favourite AI model can be stored and searched for in MariaDB.

It is only available in the Vector Preview release, for source download only, or through the the pre-built docker image on quay.io.

In this initial preview, there is not yet a dedicated VECTOR data type, as we've focused on implementing a usable indexed search.

Any data type that can store a sequence of bytes can be used, for example VARBINARY or BLOB. The index is implemented in a way that supports different storage engines, and InnoDB and MyISAM at least have been tested and known to work.

The initial implementation uses the Modified HNSW algorithm for searching in the vector index (to solve the so-called Approximate Nearest Neighbor problem), and defaults to Euclidean distance.

Creating

Vectors are defined using VECTOR INDEX for the index definition in the CREATE TABLE statement. For now, for testing purposes, avoid the CREATE VECTOR INDEX statement, which hasn't been fully implemented. Declare the index as a part of your CREATE TABLE statement:

CREATE TABLE v (
     id INT PRIMARY KEY,
     v BLOB NOT NULL,
     VECTOR INDEX (v)
);

Inserting

Vector columns store 32-bit IEEE 754 floating point numbers.

INSERT INTO v VALUES 
     (1, x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
     (2, x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
     (3,x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
     (4,x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
     (5,x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
     (6,x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
     (7,x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
     (8,x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
     (9,x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
     (10,x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');

Querying

VEC_DISTANCE_EUCLIDEAN is a function that calculates a Euclidean (L2) distance between two points.

SELECT id FROM v ORDER BY 
  VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
|  7 |
|  3 |
|  9 |
|  2 |
|  1 |
|  5 |
|  4 |
|  6 |
|  8 |
+----+

Most commonly, this kind of query is done with a limit, for example to return vectors that are closest to a given vector, such as from a user search query, image or a song fragment.

SELECT id FROM v 
  ORDER BY VEC_Distance(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e') LIMIT 2;
+----+
| id |
+----+
| 10 |
|  7 |
+----+

System Variables

There are a number of system variables used for vectors:

  • mhnsw_cache_size - size of the cache for the MHNSW vector index.
  • mhnsw_distance_function - distance function to build the vector index for.
  • mhnsw_max_edges_per_node - Larger values means slower INSERT, larger index size and higher memory consumption, but better search results. Should not be changed once vector indexes have been created.
  • mhnsw_min_limit - Defines the minimal number of result candidates to look for in the vector index for ORDER BY ... LIMIT N queries. The search will never search for less rows than that, even if LIMIT is smaller.

See Also

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.