Introducing Vector Search With the Latest Version of MariaDB Enterprise Platform

We are excited to announce the launch of a new version of MariaDB Enterprise Server featuring native support for vector data types, vector indexing, and vector search. These powerful AI-enabling features, which were initially offered and tested in MariaDB Community Server, are now available in an enterprise-grade environment. This marks a major step forward for organizations seeking a straightforward way to integrate AI-driven functionality directly into their core database systems.
MariaDB vs. pgvector – MariaDB Comes Out on Top with Blazing Fast Vector Performance
Recent benchmarks conducted by Small Datum LLC provide a detailed comparison between MariaDB and pgvector, a prominent vector similarity search extension for PostgreSQL. The results showcase MariaDB’s exceptional performance and ease of use:
- High QPS (Queries Per Second): MariaDB consistently outperforms pgvector, delivering up to twice the QPS for a given recall target.
- Fast Index Creation Time: MariaDB builds indexes significantly faster while achieving comparable or superior recall (the percentage of true nearest neighbors that a query actually returns).
- Ease of Tuning: MariaDB simplifies the tuning process by eliminating the need for complex parameter configurations, which are required by pgvector.
Benefits of Using Vector Capabilities with MariaDB
MariaDB gives organizations everything they need to leverage AI: an open source database that is well known, time-tested, and delivers high performance.
Benefits of MariaDB’s approach:
- Simple data stack
- No need to maintain separate vector databases, thus:
  - Reduced need for complex data synchronization processes and lower operational overhead
  - Reuse of current security and access controls
- Native use of vector embeddings, not an add-on capability
- Seamless integration with existing SQL operations
- No plug-in required, thus no additional management required
- Fully open source
Vector capabilities within MariaDB Enterprise Server empower organizations to leverage the power of AI in use cases like Retrieval Augmented Generation (RAG) while maintaining simplicity, efficiency and seamless integration with their existing database infrastructure.
The Power of Vector Databases
Vector databases significantly enhance AI-powered applications in several key ways.
Efficient Data Management and Retrieval
Vector databases excel at storing and retrieving high-dimensional data, enabling fast similarity searches on complex data types like text and images. Vector search in the context of GenAI is particularly powerful for RAG, where vector databases serve as an additional knowledge layer for large language models (LLMs). By storing documents, knowledge bases and enterprise data as vector embeddings, RAG systems can quickly retrieve relevant context to ground AI responses in accurate, authoritative and up-to-date information beyond what the LLM holds as internal knowledge. In addition, retrieving from a vector database can be more reliable than other search methods like internet search, where results may be inaccurate or themselves written by another AI. Searching the internet as part of an LLM’s process can also be slow and, depending on how much data is processed, can steer the model in a direction that is not aligned with the original question or intent.
Some RAG use cases include:
- Enterprise search and knowledge management
- Legal document analysis
- Code generation
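To make the retrieval step of such a RAG pipeline concrete, the sketch below ranks stored document chunks by semantic similarity to a query embedding. The table, column names, query vector, and tiny 4-dimensional embedding are hypothetical (real embedding models produce hundreds or thousands of dimensions), and index options may vary by version:

```sql
-- Hypothetical knowledge-base table; names and dimension are illustrative.
CREATE TABLE doc_chunks (
  id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  content   TEXT NOT NULL,
  embedding VECTOR(4) NOT NULL,            -- real models use far more dimensions
  VECTOR INDEX (embedding) DISTANCE=cosine -- index distance should match the query's
);

-- Retrieve the 5 chunks most similar to a query embedding
-- (produced by the same embedding model used at ingest time).
SELECT id, content
FROM doc_chunks
ORDER BY VEC_DISTANCE_COSINE(embedding, VEC_FromText('[0.12, -0.31, 0.88, 0.05]'))
LIMIT 5;
```

The retrieved chunks would then be passed to the LLM as grounding context alongside the user’s question.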
LLM Hallucinations
An LLM hallucination occurs when an LLM generates content that appears believable but is factually incorrect or entirely fabricated. Compounding the problem, hallucinated responses are typically delivered with complete confidence. As users place more and more trust in AI, paying close attention to this area can pay off and provide you with a competitive edge.
Some examples include:
- Making up fake sources, citations or references
- Creating false facts or statistics
- Generating fabricated information about people, places or events
For RAG-enabled systems, hallucinations are reduced by grounding LLM responses in retrieved information rather than relying solely on the model’s trained knowledge. The vector database serves as an authoritative knowledge source, helping ensure responses align with verifiable information.
Semantic Search Capabilities
Unlike traditional keyword-based searches, vector databases enable semantic search that understands the meaning behind queries, not just exact matches. This capability is essential for:
- Chatbots and code generation systems
- Cross-lingual search
- Extractive and abstractive NLP processing
- AI agents
The semantic understanding enables contextual retrieval, which is particularly valuable for RAG applications where the meaning of the underlying text is fundamental for more accurate responses.
The Hidden Cost of Separate Solutions
As vector embeddings have not been used extensively in the past, many organizations have recently invested in separate stand-alone vector database systems in addition to their traditional databases. This results in having reliable relational databases for their core business data, and a specialized vector database for AI-driven features. This separation comes with hidden costs: data synchronization overhead, increased system complexity and the operational weight of maintaining multiple database systems.
This is exactly why we have taken a different approach at MariaDB. By introducing native vector data types, HNSW indexing, and both cosine and Euclidean distance calculations directly into our enterprise database, we’re eliminating these artificial boundaries. You shouldn’t need one database for your customer records and another for your knowledge base embeddings; they can live side by side.
Technical Enablement
What are the benefits of storing vector embeddings and relational data together? When you store vectors alongside traditional data types, you unlock new possibilities. Imagine a product catalog where each item has both structured data (price, category, inventory level) and a vector representation of the product’s visual features or description. With a single SQL query, you can find similar products that are in stock and within a specific price range. This kind of hybrid search would be complex and slow if split across multiple systems.
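As a hedged sketch of that catalog example, the hypothetical table and query below combine ordinary relational filters with a vector similarity ordering in a single statement (all names, values, and the 4-dimensional embedding are illustrative):

```sql
-- Hypothetical product catalog mixing structured columns and an embedding.
CREATE TABLE products (
  id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(255) NOT NULL,
  category    VARCHAR(64) NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  stock_level INT NOT NULL,
  embedding   VECTOR(4) NOT NULL,  -- illustrative dimension
  VECTOR INDEX (embedding)
);

-- One query: relational filters plus nearest-neighbor ordering.
SELECT id, name, price
FROM products
WHERE stock_level > 0
  AND price BETWEEN 20 AND 50
ORDER BY VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[0.9, 0.1, -0.3, 0.4]'))
LIMIT 10;
```

Splitting the same operation across a relational database and a separate vector store would require fetching candidates from one system and filtering them in the other.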
The choice of similarity metrics matters too. We’ve implemented both cosine distance and L2 Euclidean distance because each has its own benefit depending on the use case:
- Cosine distance excels when you care about the direction of vectors, but not their magnitude, making it perfect for text embeddings and semantic search
- Euclidean distance works better when absolute distances matter, like in image similarity or anomaly detection
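The direction-versus-magnitude distinction can be seen by comparing the two metrics on the same pair of vectors, as in this small sketch (the `VEC_DISTANCE_*` and `VEC_FromText` functions follow the vector preview syntax):

```sql
-- Two vectors pointing the same direction but with different magnitudes:
-- cosine distance treats them as identical; Euclidean distance does not.
SELECT
  VEC_DISTANCE_COSINE(VEC_FromText('[1,0]'), VEC_FromText('[2,0]'))    AS cosine_d,
  VEC_DISTANCE_EUCLIDEAN(VEC_FromText('[1,0]'), VEC_FromText('[2,0]')) AS euclidean_d;
```

Here `cosine_d` is 0 (same direction) while `euclidean_d` is 1 (different magnitude), which is why cosine distance suits normalized text embeddings and Euclidean distance suits cases where absolute position matters.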
Combined with MHNSW (MariaDB Hierarchical Navigable Small World) indexing, which we modified slightly to deliver more consistent performance across varying dataset characteristics, vector embedding search operations become practical at scale. MHNSW creates a layered graph structure that dramatically speeds up nearest neighbor search, making it possible to find similar vectors in milliseconds, even in collections of millions of vectors.
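Declaring an MHNSW-backed index is a one-line addition to the table definition. In this minimal sketch, the `M` option (graph connectivity, which trades speed and memory against recall) and the `DISTANCE` option are based on the vector preview syntax and may differ between versions:

```sql
-- Hypothetical table; index options shown are illustrative.
CREATE TABLE image_features (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  v  VECTOR(4) NOT NULL,
  VECTOR INDEX (v) M=8 DISTANCE=euclidean  -- larger M: denser graph, higher recall, more memory
);
```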
Application Frameworks
Application frameworks like LlamaIndex, Spring AI, and LangChain (coming soon) simplify the integration of vector databases with AI applications, unlocking the full potential of MariaDB’s vector capabilities in RAG systems and beyond.
Framework Highlights
- LlamaIndex: Bridges LLMs with MariaDB for easy-to-integrate external knowledge retrieval, ideal for simplifying dynamic RAG workflows.
- Spring AI: An enterprise-focused framework leveraging the Spring ecosystem for scalable, production-capable AI applications.
- LangChain: (Upcoming) Similar to LlamaIndex, a modular framework for AI workflows with significant traction in the marketplace.
These critical frameworks streamline query generation, embedding searches and GenAI response synthesis, which enables faster development of chatbots, recommendation systems, and AI Agents.
The Enterprise Perspective
From an enterprise standpoint, this integration solves several challenges that arise in a multi-system setup:
- Simplicity: One database to manage means fewer points of failure
- Cost Efficiency: Reduced infrastructure complexity will lower cloud and datacenter costs
- Performance: Composite queries that combine traditional and vector operations deliver a better experience for your users, both human and AI
- Governance: With vector embeddings stored alongside traditional data, your existing backup, security and operational procedures cover your new AI capabilities
The Future of Databases Enabling AI with MariaDB
MariaDB’s integration of vector capabilities into MariaDB Enterprise Server marks a transformative moment for organizations seeking to leverage GenAI. By combining the power of vector embeddings with the reliability of a trusted relational database, MariaDB delivers a unified, high-performance solution that eliminates the need for separate systems and reduces operational complexity.
With native support for vector data types, efficient indexing, and seamless integration into existing SQL workflows, MariaDB empowers businesses to unlock new AI-driven possibilities, whether through enhanced RAG systems, general semantic search, or more efficient data management, all while running a single database with the power of vector search built in.
Try MariaDB Enterprise Server with Vector Search
MariaDB Enterprise Server 11.4 with vector search is available today as a tech preview. We are providing this early-access version in a stable and secure release of MariaDB Enterprise Server so our customers can start experimenting with and testing this powerful new feature. Non-customers can test vector search in the latest version of MariaDB Community Server 11.7 (RC).
Once vector search reaches GA status, we will backport the GA functionality to MariaDB Enterprise Server 11.4 so customers can start harnessing the power of vector databases without waiting for a full version upgrade.
Get started today and experience the MariaDB vector advantage! Download the MariaDB Enterprise Server 11.4 Tech Preview version to start using vector search today.