MariaDB Vector is a feature that allows MariaDB Server to perform as a relational vector database. Vectors generated by an AI model can be stored and searched in MariaDB.
The initial implementation uses the modified HNSW algorithm for searching in the vector index (to solve the so-called Approximate Nearest Neighbor problem), and defaults to Euclidean distance. Concurrent reads/writes and all transaction isolation levels are supported.
MariaDB uses int16 for index storage, which gives 15 bits to store the value, rather than the 10 bits that float16 provides.
Vectors can be defined using VECTOR INDEX for the index definition, and using the VECTOR data type in the CREATE TABLE statement.
The distance function used to build the vector index can be euclidean (the default) or cosine. An additional option, M, can be used to configure the vector index. Larger values mean slower SELECT and INSERT statements, larger index size and higher memory consumption but more accurate results. The valid range is from 3 to 200.
Vector columns store vectors in a binary format, with each dimension encoded as a 4-byte IEEE 754 float, so values can be inserted directly as hex string literals:
Alternatively, you can use the VEC_FromText() function:
For vector indexes built with the euclidean distance function, VEC_DISTANCE_EUCLIDEAN() can be used. It calculates the Euclidean (L2) distance between two points:
Most commonly, this kind of query is done with a limit, for example to return the vectors closest to a given vector, such as one derived from a user search query, an image, or a song fragment:
For vector indexes built with the cosine function, VEC_DISTANCE_COSINE() can be used. It calculates the cosine distance between two vectors:
The VEC_DISTANCE() function is a generic function that behaves as either VEC_DISTANCE_EUCLIDEAN() or VEC_DISTANCE_COSINE(), depending on the underlying index type:
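For reference, the two distance measures can be sketched in plain Python. This is an illustration of the formulas only, not MariaDB's implementation; the operands in the last line match the VEC_DISTANCE_COSINE example on this page.

```python
import math

def vec_distance_euclidean(a, b):
    # Straight-line (L2) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vec_distance_cosine(a, b):
    # Cosine distance = 1 - cosine similarity of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

# Same operands as SELECT VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,2,3]'), VEC_FROMTEXT('[3,5,7]')):
print(round(vec_distance_cosine([1, 2, 3], [3, 5, 7]), 6))
```

Euclidean distance depends on vector magnitude; cosine distance depends only on direction, which is why the two index types answer different questions.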
There are a number of system variables used for vectors; see the mhnsw_* system variables documented below.
MariaDB Vector is integrated with several AI frameworks; see the integrations listed below.
This page is licensed: CC BY-SA / Gnu FDL
CREATE TABLE v (
id INT PRIMARY KEY,
v VECTOR(5) NOT NULL,
VECTOR INDEX (v)
);

CREATE TABLE embeddings (
doc_id BIGINT UNSIGNED PRIMARY KEY,
embedding VECTOR(1536) NOT NULL,
VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);

INSERT INTO v VALUES
(1, x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(2, x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(3, x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(4, x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(5, x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(6, x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(7, x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(8, x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(9, x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(10, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');

INSERT INTO v VALUES
(1,Vec_FromText('[0.418708,0.809902,0.823193,0.598179,0.0332549]')),
(2,Vec_FromText('[0.687774,0.789588,0.496138,0.57487,0.917617]')),
(3,Vec_FromText('[0.333221,0.962687,0.467263,0.448235,0.475671]')),
(4,Vec_FromText('[0.822185,0.185643,0.683452,0.211072,0.554056]')),
(5,Vec_FromText('[0.437057,0.167281,0.0770977,0.428638,0.241591]')),
(6,Vec_FromText('[0.76956,0.926895,0.803376,0.0157961,0.589042]')),
(7,Vec_FromText('[0.493999,0.641957,0.761598,0.94276,0.425865]')),
(8,Vec_FromText('[0.924108,0.275466,0.0543329,0.0731585,0.136344]')),
(9,Vec_FromText('[0.186956,0.69666,0.0356002,0.668875,0.84722]')),
(10,Vec_FromText('[0.415294,0.609278,0.426765,0.988832,0.475556]'));

SELECT id FROM v ORDER BY
VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
| 7 |
| 3 |
| 9 |
| 2 |
| 1 |
| 5 |
| 4 |
| 6 |
| 8 |
+----+

SELECT id FROM v
ORDER BY VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')
LIMIT 2;
+----+
| id |
+----+
| 10 |
| 7 |
+----+

SELECT VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,2,3]'), VEC_FROMTEXT('[3,5,7]'));

SELECT id FROM v
ORDER BY VEC_DISTANCE(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
| 7 |
| 3 |
| 9 |
| 2 |
| 1 |
| 5 |
| 4 |
| 6 |
| 8 |
+----+

Explore vector data types. This section details how to store and manage numerical arrays, enabling efficient vector similarity search and machine learning applications within your database.
VECTOR(N)

The VECTOR data type was added as part of the MariaDB Vector feature, which permits MariaDB Server to perform as a relational vector database. N represents the fixed number of dimensions of the vector, up to a maximum of 16383. N is determined by the embedding algorithm used to generate the vectors.
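The hex literals used in the INSERT examples on this page are simply the vector's dimensions packed as little-endian 32-bit IEEE 754 floats, 4 bytes per dimension. A small Python sketch of that encoding (an illustration of the byte layout; in SQL you would use hex literals or VEC_FromText()):

```python
import struct

def vec_to_hex(values):
    # Pack each dimension as a little-endian 32-bit IEEE 754 float.
    return struct.pack(f'<{len(values)}f', *values).hex()

def hex_to_vec(hex_str):
    # Inverse: every 4 bytes become one dimension.
    data = bytes.fromhex(hex_str)
    return list(struct.unpack(f'<{len(data) // 4}f', data))

# First row of the hex INSERT example on this page (a 5-dimensional vector, 20 bytes):
h = 'e360d63ebe554f3fcdbc523f4522193f5236083d'
print(hex_to_vec(h))   # ~[0.418708, 0.809902, 0.823193, 0.598179, 0.033255]
```

Decoding that literal reproduces the values shown in the Vec_FromText() variant of the same INSERT, confirming the two forms are equivalent.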
Create tables optimized for vector storage. Learn to define columns with the VECTOR data type and configure vector indexes for similarity search.
MariaDB has a dedicated VECTOR(N) data type with built-in data validation. N is the number of dimensions that all vector values in the column must have.
Inserting vectors with different dimensionalities will result in an error.
Consider the following table:
To have a fast vector search, you have to index the vector column, creating a VECTOR index:
Note that there can be only one vector index in the table, and the indexed vector column must be NOT NULL.
There are two options that can be used to configure the vector index:
M: Larger values mean slower SELECT and INSERT statements, larger index size, and higher memory consumption, but more accurate results. The valid range is from 3 to 200.
DISTANCE: The distance function the vector index is built for. Searches using a different distance function will not be able to use the vector index. Valid values are cosine and euclidean (the default).
CREATE TABLE embeddings (
doc_id BIGINT UNSIGNED PRIMARY KEY,
embedding VECTOR(1536)
);

LlamaIndex, MariaDB Vector Store - Python
VectorDBBench - benchmarking for vector databases
AutoGen - Agent to agent, Python
DB-GPT - private LLM, vector search and text2sql, see integration docs, Python
DSPy - Workflow, not accepting external integrations anymore, Python
Feast - machine learning (not GenAI), Python
- Agentic workflow, Python
- Text-to-SQL RAG Library simplifying database interactions, Python
- AI Interface, Python & Javascript
- Vector search and text2sql, Python
For further alternatives, see Qdrant's list of framework integrations.
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, v VECTOR(5) NOT NULL,
VECTOR INDEX (v));

CREATE TABLE embeddings (
doc_id BIGINT UNSIGNED PRIMARY KEY,
embedding VECTOR(1536) NOT NULL,
VECTOR INDEX (embedding)
);

CREATE TABLE embeddings (
doc_id BIGINT UNSIGNED PRIMARY KEY,
embedding VECTOR(1536) NOT NULL,
VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);

mhnsw_default_distance

Description: The default distance function for MHNSW vector indexing, used when the DISTANCE option is not explicitly defined during index creation.
Command line: --mhnsw-default-distance=val
Scope: Global, Session
Dynamic: Yes
Data Type: enum
Default Value: euclidean
Valid Values:
euclidean: Calculates the straight-line distance between vectors. Best for spatial data, images, and other cases where absolute magnitude matters.
cosine: Measures directional similarity between vectors. Ideal for text embeddings and semantic search, where vector magnitude is less important.
Introduced: MariaDB 11.7.1
mhnsw_default_m

Description: Defines the default value for the M parameter in MHNSW vector indexing. The M parameter controls the number of connections per layer in the graph structure, influencing the balance between search performance and index size.
Larger M → Better search accuracy, but larger index size and slower updates and searches.
Smaller M → Faster updates and searches, smaller index, but potentially less accurate search.
Command line: --mhnsw-default-m=#
Scope: Global, Session
Dynamic: Yes
Data Type: int unsigned
Default Value: 6
Range: 3 to 200
Introduced: MariaDB 11.7.1
mhnsw_ef_search

Description: Defines the minimal number of result candidates to look for in the vector index for ORDER BY ... LIMIT N queries. The search will never consider fewer rows than this, even if LIMIT is smaller. This notably improves search quality at low LIMIT values, at the expense of search time. Higher values may increase search quality but will also impact query performance.
Command line: --mhnsw-ef-search=#
Scope: Global, Session
Dynamic: Yes
Data Type: int unsigned
Default Value: 20
Range: 1 to 10000
Introduced: MariaDB 11.7.1
mhnsw_max_cache_size

Description: Upper limit for one MHNSW vector index cache. This limits the amount of memory that can be used for caching the index, ensuring efficient memory utilization.
Command line: --mhnsw-max-cache-size=#
Scope: Global
Dynamic: Yes
Data Type: bigint unsigned
Default Value: 16777216 (16 MB)
Range: 1048576 to 18446744073709551615
Introduced: MariaDB 11.7.1
This page is licensed: CC BY-SA / Gnu FDL
Hybrid search combines the keyword precision of full-text search with the conceptual understanding of vector search to produce a single, superior set of results.
Full-text search is the traditional keyword-based search, excelling at finding documents that contain the exact words from your query. Behind the scenes, it relies on a data structure called an inverted index: a dictionary that maps each word to a list of documents it appears in, allowing for very fast lookups. For instance, a search for 'apple pie recipe' will instantly find all documents indexed under those three words: 'apple', 'pie' and 'recipe'.
Vector search is a modern search method based on meaning. It finds documents that are conceptually similar to your query, even if they don't share any keywords. It works by converting both the query and the documents into numerical representations called "vector embeddings." These vectors exist as points in a high-dimensional conceptual space. A search then finds the "nearest neighbors"—the document points that are closest to the query point. For instance, a vector search for "apple pie recipe" might also return a document titled "how to bake a Granny Smith tart," because the model understands that "tart" is similar to "pie" and "Granny Smith" is a type of "apple."
Full-text search offers precise keyword matching, while vector search provides nuanced understanding of concepts. Together, they handle ambiguity and ensure critical queries aren't missed, resulting in a robust, intelligent search experience.
When a user searches for "sustainable coffee pods," the two search systems return the following ranked lists.
Full-Text Search Results
Vector Search Results
When two search systems produce distinct lists of titles, a traditional merging approach would prioritize titles that appear in both lists, similar to an INNER JOIN operation.
An INNER JOIN only returns results found by both search systems, discarding valuable items that appear in only one list.
If we were to INNER JOIN the two lists from our "sustainable coffee pods" search, the result would be:
This result is severely incomplete. It correctly finds the one common item but completely discards the #1 ranked result from both searches ('Eco Coffee Pods' and 'Compostable Espresso Pods') simply because they were specialists found by only one system.
To solve the problem of discarded results, we use Reciprocal Rank Fusion (RRF). The power of RRF lies in its simplicity: it operates on the rank of an item (1st, 2nd, 3rd) in a list, not its raw, non-comparable score. This makes it highly effective for merging lists from different systems without needing to normalize their scores.
A helpful way to think of this is to imagine your different search systems as a "panel of expert advisors"; RRF intelligently combines their opinions using the formula:

RRF_score(d) = sum over every result list of 1 / (k + rank(d))

where rank(d) is the position of document d in a list (a list in which the document does not appear contributes nothing), and k is a tuning constant that moderates the influence of that rank.
A key advantage of RRF is that it ignores the raw scores from the search systems and uses only the rank. The table below shows the partial RRF score for the top items in our "sustainable coffee pods" search, calculated with k=60.
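The partial scores for the k=60 example can be reproduced with a few lines of Python (illustrative only; the rank placements are taken from the "sustainable coffee pods" example described on this page):

```python
def partial_rrf(rank, k=60):
    # A single list's contribution to a document's RRF score: 1 / (k + rank).
    return 1 / (k + rank)

# 'Recyclable Coffee Capsules': ranked #2 by both systems.
consensus = partial_rrf(2) + partial_rrf(2)
# 'Eco Coffee Pods' / 'Compostable Espresso Pods': ranked #1 by one system only.
specialist = partial_rrf(1)

print(round(consensus, 5), round(specialist, 5))   # 0.03226 0.01639
```

Two second-place finishes (2/62 = 0.03226) beat a single first-place finish (1/61 = 0.01639), which is exactly how the consensus item rises to the top without discarding the specialists.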
To build the model, our query uses the following Common Table Expressions (CTEs).
The annotated SQL query uses the CTEs to perform the hybrid search:
With k=60, the final CTE merges and sums the partial scores. The consensus item, found in both lists, rises to the top.
As introduced earlier, the k parameter can be fine-tuned for specific situations. Here are three common scenarios to consider.
Combining Diverse, Specialist Systems
For merging results from different methods like keyword and vector search, a higher k (such as 60) is ideal to balance their contributions. Our main "sustainable coffee pods" search is a perfect illustration of this, where the consensus item wins but the high-quality specialists are ranked immediately after.
Handling Mixed-Quality or "Noisy" Systems
If you are fusing results from reliable systems and one experimental, less predictable system, a higher k (60 or more) is the safest choice to prevent an outlier from disproportionately influencing the final rank.
Consider our search with a third, "noisy" system that incorrectly ranks 'Eco-Friendly Car Wax' (ID 8) at #1. A high k value minimizes the impact of this error.
The high k value correctly ensures that the consensus result from the two reliable systems easily beats the single, erroneous result from the noisy system.
Fusing High-Quality, Similar Systems
If you are combining lists from two very similar, high-performing algorithms, you can experiment with a slightly lower k (30-50) to give more weight to a top-ranked document.
Consider a case where we fuse two similar vector models (Vector_A, Vector_B) that both rank 'Compostable Espresso Pods' as #1 and 'Recyclable Coffee Capsules' as #2. A lower k makes the winner more decisive.
With k=30, the score separation between the #1 and #2 results is more than three times larger, showing higher confidence in the top result, which is desirable when you trust both systems.
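The arithmetic behind this comparison is easy to check (a quick Python illustration of the gap between the #1 and #2 results at the two k values):

```python
def rrf(ranks, k):
    # Total RRF score for a document appearing at the given ranks.
    return sum(1 / (k + r) for r in ranks)

for k in (30, 60):
    top = rrf([1, 1], k)      # 'Compostable Pods': ranked #1 by both models
    second = rrf([2, 2], k)   # 'Recyclable Capsules': ranked #2 by both models
    print(f"k={k}: #1 scores {top:.4f}, #2 scores {second:.4f}, gap {top - second:.4f}")
```

The k=30 gap (about 0.0020) is nearly four times the k=60 gap (about 0.0005), consistent with the "more than three times larger" separation described above; small differences from the tabulated figures come from rounding the scores before subtracting.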
A formal, 3-step process can be used to scientifically determine the best k value for your data through offline evaluation. To illustrate this process, which requires a pre-judged "ground truth" set with graded relevance, we will use a separate, self-contained case study.
Gather Prerequisites
You need: Multiple Ranked Lists, a "Ground Truth" Set, and an Evaluation Metric (like NDCG). With a query like "healthy breakfast", your ground truth might look like this:
And your raw ranked lists might be:
Full-Text Results: [D, A, E]
Vector Results: [B, C, A]
Run the Experiment
You iterate through k values, applying the RRF formula to the raw lists to generate a final ranking for each k.
Final Ranked List (k=10):
Final Ranked List (k=60):
Analyze and Select
You use your metric to "grade" each list against the Ground Truth and choose the k with the highest score. The k=60 list is clearly better as it placed the two "Highly Relevant" documents (A and B) at the top.
In this experiment, k=60 is the winner. A key advantage of RRF is that its performance is "not critically sensitive to the choice of k, making it a robust and reliable method" for improving search relevance.
Full-Text Search Results (rank, id, title):
  1   1   'Eco Coffee Pods - 100 Count'
  2   3   'Recyclable Coffee Capsules'
  3   4   'Morning Roast Coffee Beans'

Vector Search Results (rank, id, title):
  1   2   'Compostable Espresso Pods'
  2   3   'Recyclable Coffee Capsules'
  3   6   'Bamboo Reusable Coffee Filter'

INNER JOIN result (id, title):
  3   'Recyclable Coffee Capsules'

Partial RRF scores, k=60 (id, title, full-text partial, vector partial, total):
  3   'Recyclable Coffee Capsules'   0.01613   0.01613   0.03226
  1   'Eco Coffee Pods'              0.01639   0         0.01639
  2   'Compostable Espresso Pods'    0         0.01639   0.01639

CTEs used by the hybrid search query:
  vector_limit_search: Runs the initial vector search to retrieve the top N conceptually similar items.
  fulltext_limit_search: Runs the initial full-text search to retrieve the top N keyword matches.
  vector_score: Calculates a partial_rrf score for each item from the vector search based on its rank.
  fulltext_score: Calculates a partial_rrf score for each item from the full-text search based on its rank.
  full_outer_join_output: Merges the two score lists and sums the partial scores into a total_rrf.

Final fused ranking, k=60 (rank, id, title, total_rrf, role):
  1   3   'Recyclable Coffee Capsules'   0.03226   Consensus Winner
  2   1   'Eco Coffee Pods'              0.01639   Specialist
  2   2   'Compostable Espresso Pods'    0.01639   Specialist

Noisy-system example, k=60 (title, reliable system ranks, noisy system rank, total RRF):
  'Recyclable Coffee Capsules'   2, 2   -   1/62 + 1/62 = 0.03226
  'Eco-Friendly Car Wax'         -      1   1/61 = 0.01639

k=30 vs k=60 comparison (title, ranks, RRF at k=30, RRF at k=60):
  'Compostable Pods'      1, 1   1/31 + 1/31 = 0.0645   1/61 + 1/61 = 0.0328
  'Recyclable Capsules'   2, 2   1/32 + 1/32 = 0.0625   1/62 + 1/62 = 0.0322
  Score Difference               0.0020                 0.0006

Ground truth for "healthy breakfast" (doc, title, relevance):
  A   'Oatmeal with Berries'   3 (High)
  B   'Green Smoothie'         3 (High)
  C   'Avocado Toast'          2 (Medium)
  D   'Bacon and Eggs'         1 (Low)
  E   'Cinnamon Roll'          0 (None)

Final Ranked List, k=10 (rank, doc, title):
  1   B   'Green Smoothie'
  2   D   'Bacon and Eggs'
  3   A   'Oatmeal with Berries'
  4   C   'Avocado Toast'
  5   E   'Cinnamon Roll'

Final Ranked List, k=60 (rank, doc, title):
  1   A   'Oatmeal with Berries'
  2   B   'Green Smoothie'
  3   C   'Avocado Toast'
  4   D   'Bacon and Eggs'
  5   E   'Cinnamon Roll'

Evaluation results (k, NDCG score):
  10   0.85
  60   0.92
-- Set the parameters for our query.
SET @k = 60; -- The RRF smoothing constant.
SET @search_term = "sustainable coffee pods";
-- The vector for the search term would be set here.
-- SET @search_term_vector = VEC_FromText("...");
-- Use CTEs to break the logic into sequential steps.
WITH
-- STEP 1: Get top results from each search method.
vector_limit_search AS (
SELECT id, title, VEC_DISTANCE_EUCLIDEAN(embedding, @search_term_vector) AS dist
FROM products ORDER BY dist ASC LIMIT 10
),
fulltext_limit_search AS (
SELECT id, title, MATCH(title) AGAINST (@search_term) AS match_score
FROM products WHERE MATCH(title) AGAINST (@search_term)
ORDER BY match_score DESC LIMIT 10
),
-- STEP 2: Calculate partial RRF scores for each list.
vector_score AS (
SELECT id, title, 1 / (@k + RANK() OVER (ORDER BY dist ASC)) AS partial_rrf
FROM vector_limit_search
),
fulltext_score AS (
SELECT id, title, 1 / (@k + RANK() OVER (ORDER BY match_score DESC)) AS partial_rrf
FROM fulltext_limit_search
),
-- STEP 3: Merge the two lists and sum the scores.
full_outer_join_output AS (
SELECT v.id, v.title, (v.partial_rrf + IFNULL(f.partial_rrf, 0)) AS total_rrf
FROM vector_score v LEFT JOIN fulltext_score f USING (id)
UNION
SELECT f.id, f.title, (IFNULL(v.partial_rrf, 0) + f.partial_rrf) AS total_rrf
FROM fulltext_score f LEFT JOIN vector_score v USING (id)
)
-- STEP 4: Select the final, unified list.
SELECT id, title, total_rrf FROM full_outer_join_output
ORDER BY total_rrf DESC
LIMIT 10;
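The same fusion logic can be sketched outside the database. This hedged Python illustration mirrors the CTE pipeline (per-list partial scores, then a full outer merge that sums them), using the result ids from the "sustainable coffee pods" example; it is a sketch of the technique, not a drop-in replacement for the SQL:

```python
def rrf_fuse(ranked_lists, k=60, limit=10):
    # Each input list is ordered best-first and holds document ids.
    # Mirrors the query: 1/(k + rank) per list, summed per document,
    # then sorted descending (the full outer merge of the score lists).
    totals = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            totals[doc_id] = totals.get(doc_id, 0.0) + 1 / (k + rank)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]

fulltext_ids = [1, 3, 4]   # ids of the full-text results, best first
vector_ids = [2, 3, 6]     # ids of the vector results, best first

for doc_id, score in rrf_fuse([fulltext_ids, vector_ids]):
    print(doc_id, round(score, 5))
```

Document 3, ranked second by both systems, comes out on top (2/62 = 0.03226), followed by the two rank-1 specialists (1/61 = 0.01639 each), matching the final fused ranking of the worked example.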