
Vector Overview

MariaDB Vector is a feature that allows MariaDB Server to perform as a relational vector database. Vectors generated by an AI model can be stored and searched in MariaDB.

The initial implementation uses a modified HNSW algorithm for searching in the vector index (to solve the so-called Approximate Nearest Neighbor problem), and defaults to Euclidean distance. Concurrent reads/writes and all transaction isolation levels are supported.

MariaDB uses int16 to store vector values in indexes, which gives 15 bits for the value, rather than the 10 bits a float16 mantissa provides.
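As an illustration of that trade-off, the following sketch (illustrative only, not MariaDB's actual storage code) compares the worst-case round-trip error of IEEE 754 float16 against a scaled 16-bit integer for values in [-1, 1]:

```python
import struct
import random

def roundtrip_f16(x):
    # Pack/unpack through IEEE 754 half precision (10-bit mantissa).
    return struct.unpack('e', struct.pack('e', x))[0]

def roundtrip_i16(x, scale=32767):
    # Quantize to a signed 16-bit integer and back (15 value bits + sign).
    return round(x * scale) / scale

random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
f16_err = max(abs(x - roundtrip_f16(x)) for x in samples)
i16_err = max(abs(x - roundtrip_i16(x)) for x in samples)

print(f16_err > i16_err)  # True: the scaled int16 representation is finer here
```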

Creating

Vector columns can be defined using the VECTOR data type in the CREATE TABLE statement, with VECTOR INDEX for the index definition:
CREATE TABLE v (
     id INT PRIMARY KEY,
     v VECTOR(5) NOT NULL,
     VECTOR INDEX (v)
);

The distance function used to build the vector index can be euclidean (the default) or cosine. An additional option, M, can be used to configure the vector index. Larger values mean slower SELECT and INSERT statements, larger index size and higher memory consumption but more accurate results. The valid range is from 3 to 200.
CREATE TABLE embeddings (
        doc_id BIGINT UNSIGNED PRIMARY KEY,
        embedding VECTOR(1536) NOT NULL,
        VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);

Inserting

Vector columns store 32-bit IEEE 754 floating point numbers, and values can be inserted in their binary form:
INSERT INTO v VALUES 
     (1, x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
     (2, x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
     (3,x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
     (4,x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
     (5,x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
     (6,x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
     (7,x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
     (8,x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
     (9,x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
     (10,x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');

Alternatively, you can use the VEC_FromText() function:
INSERT INTO v VALUES
  (1,Vec_FromText('[0.418708,0.809902,0.823193,0.598179,0.0332549]')),
  (2,Vec_FromText('[0.687774,0.789588,0.496138,0.57487,0.917617]')),
  (3,Vec_FromText('[0.333221,0.962687,0.467263,0.448235,0.475671]')),
  (4,Vec_FromText('[0.822185,0.185643,0.683452,0.211072,0.554056]')),
  (5,Vec_FromText('[0.437057,0.167281,0.0770977,0.428638,0.241591]')),
  (6,Vec_FromText('[0.76956,0.926895,0.803376,0.0157961,0.589042]')),
  (7,Vec_FromText('[0.493999,0.641957,0.761598,0.94276,0.425865]')),
  (8,Vec_FromText('[0.924108,0.275466,0.0543329,0.0731585,0.136344]')),
  (9,Vec_FromText('[0.186956,0.69666,0.0356002,0.668875,0.84722]')),
  (10,Vec_FromText('[0.415294,0.609278,0.426765,0.988832,0.475556]'));

Querying

For vector indexes built with the euclidean function, VEC_DISTANCE_EUCLIDEAN() can be used. It calculates the Euclidean (L2) distance between two points:
SELECT id FROM v ORDER BY 
  VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
|  7 |
|  3 |
|  9 |
|  2 |
|  1 |
|  5 |
|  4 |
|  6 |
|  8 |
+----+

Most commonly, this kind of query is done with a limit, for example to return vectors that are closest to a given vector, such as from a user search query, image or a song fragment:
SELECT id FROM v 
  ORDER BY VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e') 
  LIMIT 2;
+----+
| id |
+----+
| 10 |
|  7 |
+----+

For vector indexes built with the cosine function, VEC_DISTANCE_COSINE() can be used. It calculates the cosine distance between two vectors:
SELECT VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,2,3]'), VEC_FROMTEXT('[3,5,7]'));

The VEC_DISTANCE() function is generic: it behaves as either VEC_DISTANCE_EUCLIDEAN() or VEC_DISTANCE_COSINE(), depending on the underlying index type:
SELECT id FROM v 
  ORDER BY VEC_DISTANCE(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
|  7 |
|  3 |
|  9 |
|  2 |
|  1 |
|  5 |
|  4 |
|  6 |
|  8 |
+----+

System Variables

There are a number of system variables used for vectors; see Vector System Variables.

Vector Framework Integrations

MariaDB Vector is integrated into several frameworks; see Vector Framework Integrations.

See Also

  • Get to know MariaDB’s Rocket-Fast Native Vector Search - Sergei Golubchik (video)

  • MariaDB Vector, a new Open Source vector database that you are already familiar with - Sergei Golubchik (video)

  • AI first applications with MariaDB Vector - Vicentiu Ciorbaru (video)

  • MariaDB Vector: A storage engine for LLMs - Kaj Arnö and Jonah Harris (video)

This page is licensed: CC BY-SA / Gnu FDL


Vectors

Explore vector data types. This section details how to store and manage numerical arrays, enabling efficient vector similarity search and machine learning applications within your database.


VECTOR

VECTOR is available from MariaDB 11.7.1.

Syntax

VECTOR(N)

Description

The VECTOR data type was added as part of the MariaDB Vector feature, which permits MariaDB Server to perform as a relational vector database. N represents the fixed number of dimensions of the vector, up to a maximum of 16383. N is determined by the embedding algorithm.

Example

CREATE TABLE v (
     id INT PRIMARY KEY,
     v VECTOR(5) NOT NULL,
     VECTOR INDEX (v)
);

See Also

  • CREATE TABLE with Vectors

This page is licensed: CC BY-SA / Gnu FDL

CREATE TABLE with Vectors

Create tables optimized for vector storage. Learn to define columns with the VECTOR data type and configure vector indexes for similarity search.

MariaDB has a dedicated VECTOR(N) data type with built-in data validation:

VECTOR(N)

N is the number of dimensions that all vector values in the column have. Vector indexes are dimensionality-specific:

  • All vectors inserted into an indexed column must match the index's target dimensionality.

  • Inserting vectors with a different dimensionality will result in an error.

Consider the following table:

CREATE TABLE embeddings (
        doc_id BIGINT UNSIGNED PRIMARY KEY,
        embedding VECTOR(1536)
);

To have a fast vector search, you have to index the vector column, creating a VECTOR index:

CREATE TABLE embeddings (
        doc_id BIGINT UNSIGNED PRIMARY KEY,
        embedding VECTOR(1536) NOT NULL,
        VECTOR INDEX (embedding)
);

Note that there can be only one vector index in the table, and the indexed vector column must be NOT NULL. For example:

CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, v VECTOR(5) NOT NULL,
 VECTOR INDEX (v));

There are two options that can be used to configure the vector index:

  • M — Larger values mean slower SELECT and INSERT statements, larger index size and higher memory consumption, but more accurate results. The valid range is from 3 to 200.

  • DISTANCE — Distance function to build the vector index for. Searches using a different distance function will not be able to use a vector index. Valid values are cosine and euclidean (the default).

CREATE TABLE embeddings (
        doc_id BIGINT UNSIGNED PRIMARY KEY,
        embedding VECTOR(1536) NOT NULL,
        VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);

This page is licensed: CC BY-SA / Gnu FDL

Vector Framework Integrations

MariaDB Vector has integrations in several frameworks.

AI Framework Integrations

  • LangChain, MariaDB Vector Store - Python

  • LangChain.js, MariaDB Vector Store - Node.js

  • LangChain4j, MariaDB Embedding Store - Java

  • LlamaIndex, MariaDB Vector Store - Python

  • MCP (Model Context Protocol), MariaDB MCP server - Python

  • Spring AI, MariaDB Vector Store - Java

  • VectorDBBench - benchmarking for vector databases

Potential Future Vector or AI Integrations

  • AutoGen - Agent to agent, Python

  • DB-GPT - private LLM, vector search and text2sql, see integration docs, Python

  • DSPy - Workflow, not accepting external integrations anymore, Python

  • Feast - machine learning (not GenAI), Python

  • Firebase Studio template for MariaDB Vector - visit link to vote for suggestion

  • LangGraph - Agentic workflow, Python

  • MindSQL - Text-to-SQL RAG Library simplifying database interactions, Python

  • Open WebUI - AI Interface, Python & Javascript

  • Vanna AI - Vector search and text2sql, Python

For further alternatives, see Qdrant's list of framework integrations.

This page is licensed: CC BY-SA / Gnu FDL

    Vector System Variables

This page documents system variables related to vectors.

See Server System Variables for instructions on setting them.

Also see the Full list of MariaDB options, system and status variables.

    mhnsw_default_distance

    Description: Specifies the default distance metric for MHNSW vector indexing. This is used when the DISTANCE option is not explicitly defined during index creation.
  • Command line: --mhnsw-default-distance=val

  • Scope: Global, Session

  • Dynamic: Yes

  • Data Type: enum

  • Default Value: euclidean

  • Valid Values:

• euclidean Calculates straight-line distance between vectors. Best for spatial data, images, etc., when absolute magnitude matters.

    • cosine Measures directional similarity between vectors. Ideal for text embeddings, semantic search, and when vector magnitude is less important.

  • Introduced: MariaDB 11.7.1
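The practical difference between the two metrics can be sketched in a few lines (illustrative Python, not server code): vectors pointing the same way but with different magnitudes are far apart in Euclidean terms, yet identical under cosine distance.

```python
import math

def euclidean(a, b):
    # Straight-line (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # 1 - cosine similarity: 0 means "same direction".
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction, twice the magnitude

print(round(euclidean(a, b), 3))  # 3.742
print(round(cosine(a, b), 10))    # 0.0
```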

  • mhnsw_default_m

    • Description: Defines the default value for the M parameter in MHNSW vector indexing. The M parameter controls the number of connections per layer in the graph structure, influencing the balance between search performance and index size.

      • Larger M → Better search accuracy, but larger index size and slower updates and searches.

      • Smaller M → Faster updates and searches, smaller index, but potentially less accurate search.

    • Command line: --mhnsw-default-m=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: int unsigned

    • Default Value: 6

    • Range: 3 to 200

    • Introduced: MariaDB 11.7.1

    mhnsw_ef_search

• Description: Defines the minimal number of result candidates to look for in the vector index for ORDER BY ... LIMIT N queries. The search will never look for fewer rows than that, even if LIMIT is smaller. This notably improves search quality at low LIMIT values, at the expense of search time.

    • Command line: --mhnsw-ef-search=#

    • Scope: Global, Session

    • Dynamic: Yes

    • Data Type: int unsigned

    • Default Value: 20

    • Range: 1 to 10000

    • Introduced: MariaDB 11.7.1

    mhnsw_max_cache_size

    • Description: Upper limit for one MHNSW vector index cache. This limits the amount of memory that can be used for caching the index, ensuring efficient memory utilization.

    • Command line: --mhnsw-max-cache-size=#

    • Scope: Global

    • Dynamic: Yes

    • Data Type: bigint unsigned

    • Default Value: 16777216 (16 MB)

    • Range: 1048576 to 18446744073709551615

    • Introduced: MariaDB 11.7.1

    This page is licensed: CC BY-SA / Gnu FDL


    Optimizing Hybrid Search Query with Reciprocal Rank Fusion (RRF)

    Hybrid search combines the keyword precision of full-text search with the conceptual understanding of vector search to produce a single, superior set of results.

    Full-Text Search

Full-text search is the traditional keyword-based search, excelling at finding documents that contain the exact words from your query. Behind the scenes, it relies on a data structure called an inverted index: a dictionary that maps each word to a list of documents it appears in, allowing for very fast lookups. For instance, a search for 'apple pie recipe' will instantly find all documents indexed under those three words: 'apple', 'pie' and 'recipe'.
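A toy inverted index (a hypothetical sketch, not MariaDB's full-text implementation) makes the mechanism concrete:

```python
from collections import defaultdict

# Map each word to the set of document ids that contain it.
docs = {
    1: "classic apple pie recipe",
    2: "apple tart with cinnamon",
    3: "savory chicken pie recipe",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# A query matches the documents that contain all of its words.
query = "apple pie recipe".split()
hits = set.intersection(*(index[w] for w in query))
print(hits)  # {1}
```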

    Vector Search

    Vector search is a modern search method based on meaning. It finds documents that are conceptually similar to your query, even if they don't share any keywords. It works by converting both the query and the documents into numerical representations called "vector embeddings." These vectors exist as points in a high-dimensional conceptual space. A search then finds the "nearest neighbors"—the document points that are closest to the query point. For instance, a vector search for "apple pie recipe" might also return a document titled "how to bake a Granny Smith tart," because the model understands that "tart" is similar to "pie" and "Granny Smith" is a type of "apple."
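The nearest-neighbor idea can be sketched with hand-made embeddings (hypothetical 3-dimensional vectors; a real system uses an embedding model and an ANN index such as HNSW rather than this exact scan):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: smaller means "closer in meaning".
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Pretend embeddings: similar meanings get nearby vectors.
docs = {
    "apple pie recipe":       [0.90, 0.80, 0.10],
    "granny smith tart":      [0.85, 0.75, 0.15],
    "car engine maintenance": [0.05, 0.10, 0.90],
}

query = [0.88, 0.79, 0.12]  # pretend embedding of the user's query
best = min(docs, key=lambda title: cosine_distance(docs[title], query))
print(best)  # "apple pie recipe"
```

Note that "granny smith tart" comes out a close second, while the unrelated document is far away, which is exactly the behavior described above.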

    The Power of Hybrid Search

    Full-text search offers precise keyword matching, while vector search provides nuanced understanding of concepts. Together, they handle ambiguity and ensure critical queries aren't missed, resulting in a robust, intelligent search experience.

    Combining Full-Text and Vector Search

    A Search for "Sustainable Coffee Pods"

    When a user searches for "sustainable coffee pods," the two search systems return the following ranked lists.

Full-Text Search Results

| Rank | ID | Title |
|------|----|-------|
| 1 | 1 | 'Eco Coffee Pods - 100 Count' |
| 2 | 3 | 'Recyclable Coffee Capsules' |
| 3 | 4 | 'Morning Roast Coffee Beans' |

Vector Search Results

| Rank | ID | Title |
|------|----|-------|
| 1 | 2 | 'Compostable Espresso Pods' |
| 2 | 3 | 'Recyclable Coffee Capsules' |
| 3 | 6 | 'Bamboo Reusable Coffee Filter' |

    When two search systems produce distinct lists of titles, a traditional merging approach would prioritize titles that appear in both lists, similar to an INNER JOIN operation.

    Why a Simple INNER JOIN Is Insufficient

    An INNER JOIN only returns results found by both search systems, discarding valuable items that appear in only one list.

    If we were to INNER JOIN the two lists from our "sustainable coffee pods" search, the result would be:

| ID | Title |
|----|-------|
| 3 | 'Recyclable Coffee Capsules' |

This result is severely incomplete. It correctly finds the one common item but completely discards the #1-ranked result from each search ('Eco Coffee Pods' and 'Compostable Espresso Pods') simply because they were specialists found by only one system.

    Reciprocal Rank Fusion (RRF)

    To solve the problem of discarded results, we use Reciprocal Rank Fusion (RRF). The power of RRF lies in its simplicity: it operates on the rank of an item (1st, 2nd, 3rd) in a list, not its raw, non-comparable score. This makes it highly effective for merging lists from different systems without needing to normalize their scores.

A helpful way to think of this is to imagine your different search systems as a "panel of expert advisors"; RRF intelligently combines their opinions using the formula:

RRF_score = Σ_i 1 / (k + rank_i)

where rank_i is the position of a document in ranked list i, and k is a tuning constant that moderates the influence of that rank.
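As a sketch, the formula can be applied directly to the two result lists from the "sustainable coffee pods" example (document ids only, k = 60); it reproduces the partial scores used throughout this article:

```python
def rrf_fuse(ranked_lists, k=60):
    # Sum 1 / (k + rank) for every list in which a document appears.
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fulltext = [1, 3, 4]  # Eco Coffee Pods, Recyclable Capsules, Morning Roast
vector   = [2, 3, 6]  # Compostable Pods, Recyclable Capsules, Bamboo Filter

for doc_id, score in rrf_fuse([fulltext, vector]):
    print(doc_id, round(score, 5))  # doc 3 wins: 1/62 + 1/62 ≈ 0.03226
```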

    RRF's k Parameter

    The k parameter is the primary tuning knob for the RRF algorithm, acting as a smoothing factor that controls how results are weighted. A low k gives immense power to a top-ranked result, while a high k is more skeptical of a single top pick and rewards items found by multiple systems (consensus). A value of k=60 is a robust and effective baseline for general use.
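The smoothing effect is easy to see numerically; this small sketch compares the weight ratio between ranks 1 and 2 at a low and a high k:

```python
def weight(rank, k):
    # The contribution a single list makes for a document at this rank.
    return 1 / (k + rank)

for k in (1, 60):
    ratio = weight(1, k) / weight(2, k)
    print(f"k={k}: rank 1 carries {ratio:.2f}x the weight of rank 2")
```

With k=1 the top pick carries 1.5x the weight of the runner-up; with k=60 the two are nearly equal, which is why high k values reward consensus over any single top-ranked result.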

A key advantage of RRF is that it ignores the raw scores from the search systems and uses only the rank. The table below shows the partial RRF scores for the top items in our "sustainable coffee pods" search, calculated with k=60.

| Item | List | Rank | Partial RRF |
|------|------|------|-------------|
| 'Eco Coffee Pods' | Full-text | 1 | 1/61 = 0.01639 |
| 'Compostable Espresso Pods' | Vector | 1 | 1/61 = 0.01639 |
| 'Recyclable Coffee Capsules' | Full-text | 2 | 1/62 = 0.01613 |
| 'Recyclable Coffee Capsules' | Vector | 2 | 1/62 = 0.01613 |

    Building the Hybrid Query Optimization model with CTEs

    To build the model, our query uses the following Common Table Expressions (CTEs).

| CTE Name | Purpose |
|----------|---------|
| vector_limit_search | Runs the initial vector search to retrieve the top N conceptually similar items. |
| fulltext_limit_search | Runs the initial full-text search to retrieve the top N keyword matches. |
| vector_score | Calculates a partial_rrf score for each item from the vector search based on its rank. |
| fulltext_score | Calculates a partial_rrf score for each item from the full-text search based on its rank. |
| full_outer_join_output | Merges the two score lists and sums the partial scores into a total_rrf. |

    The annotated SQL query uses the CTEs to perform the hybrid search:
-- Set the parameters for our query.
SET @k = 60; -- The RRF smoothing constant.
SET @search_term = "sustainable coffee pods";
-- The vector for the search term would be set here.
-- SET @search_term_vector = VEC_FromText("...");


-- Use CTEs to break the logic into sequential steps.
WITH
-- STEP 1: Get top results from each search method.
vector_limit_search AS (
  SELECT id, title, VEC_DISTANCE_EUCLIDEAN(embedding, @search_term_vector) AS dist
  FROM products ORDER BY dist ASC LIMIT 10
),
fulltext_limit_search AS (
  SELECT id, title, MATCH(title) AGAINST (@search_term) AS match_score
  FROM products WHERE MATCH(title) AGAINST (@search_term)
  ORDER BY match_score DESC LIMIT 10
),
-- STEP 2: Calculate partial RRF scores for each list.
vector_score AS (
  SELECT id, title, 1 / (@k + RANK() OVER (ORDER BY dist ASC)) AS partial_rrf
  FROM vector_limit_search
),
fulltext_score AS (
  SELECT id, title, 1 / (@k + RANK() OVER (ORDER BY match_score DESC)) AS partial_rrf
  FROM fulltext_limit_search
),
-- STEP 3: Merge the two lists and sum the scores.
full_outer_join_output AS (
  SELECT v.id, v.title, (v.partial_rrf + IFNULL(f.partial_rrf, 0)) AS total_rrf
  FROM vector_score v LEFT JOIN fulltext_score f USING (id)
  UNION
  SELECT f.id, f.title, (IFNULL(v.partial_rrf, 0) + f.partial_rrf) AS total_rrf
  FROM fulltext_score f LEFT JOIN vector_score v USING (id)
)
-- STEP 4: Select the final, unified list.
SELECT id, title, total_rrf FROM full_outer_join_output
ORDER BY total_rrf DESC
LIMIT 10;

    Final Calculation for "Sustainable Coffee Pods"

    With k=60, the final CTE merges and sums the partial scores. The consensus item, found in both lists, rises to the top.

| ID | Title | fulltext_rrf | vector_rrf | total_rrf | Final Rank |
|----|-------|--------------|------------|-----------|------------|
| 3 | Recyclable Coffee Capsules | 0.01613 | 0.01613 | 0.03226 | 1 |
| 1 | Eco Coffee Pods | 0.01639 | 0 | 0.01639 | 2 |
| 2 | Compostable Espresso Pods | 0 | 0.01639 | 0.01639 | 2 |

    Tuning the k Parameter

    As introduced earlier, the k parameter can be fine-tuned for specific situations. Here are three common scenarios to consider.

    1. Combining Diverse, Specialist Systems

    For merging results from different methods like keyword and vector search, a higher k (such as 60) is ideal to balance their contributions. Our main "sustainable coffee pods" search is a perfect illustration of this, where the consensus item wins but the high-quality specialists are ranked immediately after.

| Final Rank | ID | Title | Total RRF | Note |
|------------|----|-------|-----------|------|
| 1 | 3 | Recyclable Coffee Capsules | 0.03226 | Consensus Winner |
| 2 | 1 | Eco Coffee Pods | 0.01639 | Specialist |
| 2 | 2 | Compostable Espresso Pods | 0.01639 | Specialist |

2. Handling Mixed-Quality or "Noisy" Systems

    If you are fusing results from reliable systems and one experimental, less predictable system, a higher k (60 or more) is the safest choice to prevent an outlier from disproportionately influencing the final rank.

    Consider our search with a third, "noisy" system that incorrectly ranks 'Eco-Friendly Car Wax' (ID 8) at #1. A high k value minimizes the impact of this error.

| Product | FT Rank | Vector Rank | Noisy Rank | Total RRF (k=60) |
|---------|---------|-------------|------------|------------------|
| Recyclable Coffee Capsules | 2 | 2 | - | 1/62 + 1/62 = 0.03226 |
| Eco-Friendly Car Wax | - | - | 1 | 1/61 = 0.01639 |

    The high k value correctly ensures that the consensus result from the two reliable systems easily beats the single, erroneous result from the noisy system.

3. Fusing High-Quality, Similar Systems

    If you are combining lists from two very similar, high-performing algorithms, you can experiment with a slightly lower k (30-50) to give more weight to a top-ranked document.

    Consider a case where we fuse two similar vector models (Vector_A, Vector_B) that both rank 'Compostable Espresso Pods' as #1 and 'Recyclable Coffee Capsules' as #2. A lower k makes the winner more decisive.

| Product | Ranks | Total RRF (k=30) | Total RRF (k=60) |
|---------|-------|------------------|------------------|
| Compostable Pods | 1, 1 | 1/31 + 1/31 = 0.0645 | 1/61 + 1/61 = 0.0328 |
| Recyclable Capsules | 2, 2 | 1/32 + 1/32 = 0.0625 | 1/62 + 1/62 = 0.0322 |
| Score Difference | | 0.0020 | 0.0006 |

    With k=30, the score separation between the #1 and #2 results is more than three times larger, showing higher confidence in the top result, which is desirable when you trust both systems.
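That separation claim can be checked directly. In this scenario both systems rank the winner 1st and the runner-up 2nd, so each document's partial score simply doubles:

```python
def fused_score(rank, k, n_lists=2):
    # Same rank in every list, so the partial scores just multiply.
    return n_lists / (k + rank)

gap30 = fused_score(1, 30) - fused_score(2, 30)  # 2/31 - 2/32
gap60 = fused_score(1, 60) - fused_score(2, 60)  # 2/61 - 2/62
print(round(gap30 / gap60, 1))  # 3.8: over three times the separation at k=30
```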

    Determining the Best k (The Experimental Method)

    A formal, 3-step process can be used to scientifically determine the best k value for your data through offline evaluation. To illustrate this process, which requires a pre-judged "ground truth" set with graded relevance, we will use a separate, self-contained case study.

    1. Gather Prerequisites

    You need: Multiple Ranked Lists, a "Ground Truth" Set, and an Evaluation Metric (like NDCG). With a query like "healthy breakfast", your ground truth might look like this:

| Doc ID | Title | Relevance |
|--------|-------|-----------|
| A | 'Oatmeal with Berries' | 3 (High) |
| B | 'Green Smoothie' | 3 (High) |
| C | 'Avocado Toast' | 2 (Medium) |
| D | 'Bacon and Eggs' | 1 (Low) |
| E | 'Cinnamon Roll' | 0 (None) |

    And your raw ranked lists might be:

    • Full-Text Results: [D, A, E]

    • Vector Results: [B, C, A]

2. Run the Experiment

    You iterate through k values, applying the RRF formula to the raw lists to generate a final ranking for each k.

• Final Ranked List (k=10):

| Rank | ID | Title |
|------|----|-------|
| 1 | B | 'Green Smoothie' |
| 2 | D | 'Bacon and Eggs' |
| 3 | A | 'Oatmeal with Berries' |
| 4 | C | 'Avocado Toast' |
| 5 | E | 'Cinnamon Roll' |

• Final Ranked List (k=60):

| Rank | ID | Title |
|------|----|-------|
| 1 | A | 'Oatmeal with Berries' |
| 2 | B | 'Green Smoothie' |
| 3 | C | 'Avocado Toast' |
| 4 | D | 'Bacon and Eggs' |
| 5 | E | 'Cinnamon Roll' |
3. Analyze and Select

    You use your metric to "grade" each list against the Ground Truth and choose the k with the highest score. The k=60 list is clearly better as it placed the two "Highly Relevant" documents (A and B) at the top.

| k Value | Performance Score (NDCG) |
|---------|--------------------------|
| 10 | 0.85 |
| 60 | 0.92 |

    In this experiment, k=60 is the winner. A key advantage of RRF is that its performance is "not critically sensitive to the choice of k, making it a robust and reliable method" for improving search relevance.

Further Reading

  • Reciprocal Rank Fusion for IR (SIGIR '09): The original research paper that proposed the RRF method.

