Vector Stores

MariaDBStore provides a LangChain-compatible vector store backed by MariaDB, supporting similarity search, metadata filtering, and maximal marginal relevance retrieval.

Version: langchain-mariadb v0.0.21

DistanceStrategy

Distance strategies for vector similarity.

Attributes

  • EUCLIDEAN:

  • COSINE:
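For intuition about the two strategies, the sketch below computes both distances in plain Python. It does not call the library; the helper names euclidean_distance and cosine_distance are illustrative and are not part of langchain-mariadb.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; depends on direction only, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


u, v = [1.0, 0.0], [2.0, 0.0]    # same direction, different length
print(euclidean_distance(u, v))  # 1.0
print(cosine_distance(u, v))     # 0.0
```

The two vectors point the same way, so their cosine distance is zero even though their Euclidean distance is not. This is why the choice of strategy matters when embeddings are not normalized.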


TableConfig

Configuration for database table names.

Constructor

__init__(embedding_table: Optional[str] = None, collection_table: Optional[str] = None) -> None

Initialize TableConfig with custom or default table names.

Parameters:

  • embedding_table (Optional[str]): Name for embedding table (default: langchain_embedding)

  • collection_table (Optional[str]): Name for collection table (default: langchain_collection)

Methods

default

Create TableConfig with default values.

Attributes

  • embedding_table (str):

  • collection_table (str):


ColumnConfig

Configuration for database column names.

Constructor

Initialize ColumnConfig with custom or default column names.

Parameters:

  • embedding_id (Optional[str]): Name for embedding ID column (default: id)

  • embedding (Optional[str]): Name for embedding vector column (default: embedding)

  • content (Optional[str]): Name for content column (default: content)

  • metadata (Optional[str]): Name for metadata column (default: metadata)

  • collection_id (Optional[str]): Name for collection ID column (default: id)

  • collection_label (Optional[str]): Name for collection label column (default: label)

  • collection_metadata (Optional[str]): Name for collection metadata column (default: metadata)

Methods

default

Create ColumnConfig with default values.

Attributes

  • embedding_id (str):

  • embedding (str):

  • content (str):

  • metadata (str):

  • collection_id (str):

  • collection_label (str):

  • collection_metadata (str):


MariaDBStoreSettings

Configuration for MariaDBStore.

Constructor

Initialize MariaDBStoreSettings with custom or default configurations.

Parameters:

  • tables (Optional[TableConfig]): Table configuration

  • columns (Optional[ColumnConfig]): Column configuration

  • pre_delete_collection (bool): Whether to delete an existing collection (default: False)

Methods

default

Create MariaDBStoreSettings with default values.

Attributes

  • tables (TableConfig):

  • columns (ColumnConfig):

  • pre_delete_collection (bool):

  • lazy_init (bool):
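As a rough sketch of how the three configuration classes fit together, the function below builds settings with custom table and column names. It assumes the classes are importable from the top-level langchain_mariadb package, which this page does not confirm; the table and column names are hypothetical. The import is deferred into the function so the snippet can be read and loaded without the package installed.

```python
def build_settings():
    """Build store settings with custom table and column names.

    Assumption: the three classes are importable from the top-level
    `langchain_mariadb` package; adjust the import if your version differs.
    """
    from langchain_mariadb import ColumnConfig, MariaDBStoreSettings, TableConfig

    tables = TableConfig(
        embedding_table="docs_embedding",    # hypothetical table name
        collection_table="docs_collection",  # hypothetical table name
    )
    # Only the content column is renamed; the others keep their defaults.
    columns = ColumnConfig(content="body")
    return MariaDBStoreSettings(
        tables=tables,
        columns=columns,
        pre_delete_collection=False,
    )
```

The returned object would then be passed as the config argument of the MariaDBStore constructor.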


MariaDBStore

MariaDB vector store integration for LangChain.

Constructor

Initialize the MariaDB vector store.

Parameters:

  • embeddings (Embeddings): Embeddings object for creating embeddings

  • embedding_length (Optional[int]): Length of embedding vectors (default: 1536)

  • datasource (Union[Engine, str]): Data source (connection string, SQLAlchemy engine, or MariaDB connection pool)

  • collection_name (str): Name of the collection to store vectors

  • collection_metadata (Optional[dict]): Optional metadata for the collection

  • distance_strategy (DistanceStrategy): Strategy for distances (COSINE or EUCLIDEAN)

  • config (MariaDBStoreSettings): Store configuration for tables and columns

  • logger (Optional[logging.Logger]): Optional logger instance for debugging

  • relevance_score_fn (Optional[Callable[[float], float]]): function to override relevance score calculation
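The relevance_score_fn parameter accepts any callable that maps a float to a float. As a minimal sketch, the function below converts a cosine distance into a score clamped to the range 0 to 1. The name clamped_cosine_relevance is hypothetical, and the assumption that the callable receives a raw distance should be checked against the version you use.

```python
def clamped_cosine_relevance(distance: float) -> float:
    """Map a cosine distance (0 = identical) to a relevance score in [0, 1]."""
    score = 1.0 - distance
    # Clamp so that distances above 1 (opposing vectors) never go negative.
    return max(0.0, min(1.0, score))


print(clamped_cosine_relevance(0.25))  # 0.75
print(clamped_cosine_relevance(1.5))   # 0.0
```

Such a function would be supplied as relevance_score_fn=clamped_cosine_relevance when constructing the store.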

Methods

create_tables_if_not_exists

Create the necessary database tables if they don't exist.

drop_tables

Drop all tables used by the vector store.

create_collection

Create a new collection or retrieve the existing one.

delete_collection

Delete the current collection and its associated data.

delete

Delete vectors by their IDs.

Parameters:

  • ids (Optional[List[str]]): List of IDs to delete

  • **kwargs (Any): Additional arguments (not used)

get_by_ids

Get documents by their IDs.

add_embeddings

Add embeddings to the vectorstore.

Parameters:

  • texts (Sequence[str]): Sequence of strings to add

  • embeddings (List[List[float]]): List of embedding vectors

  • metadatas (Optional[List[dict]]): Optional list of metadata dicts for each text

  • ids (Optional[List[str]]): Optional list of IDs for the documents

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[str] - List of IDs for the added documents

Raises:

  • ValueError: If any provided ID contains invalid characters
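The parameters of add_embeddings are parallel lists: one vector, one optional metadata dict, and one optional ID per text. The sketch below shows one way a caller might align these inputs before handing them over. It is an illustration only, not the library's implementation; prepare_rows is a hypothetical helper, and it does not reproduce the library's ID validation.

```python
import uuid
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, str, List[float], Dict[str, Any]]  # (id, text, vector, metadata)


def prepare_rows(
    texts: Sequence[str],
    embeddings: List[List[float]],
    metadatas: Optional[List[dict]] = None,
    ids: Optional[List[str]] = None,
) -> List[Row]:
    """Pair each text with its vector, metadata, and ID."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    if metadatas is None:
        metadatas = [{} for _ in texts]           # default: empty metadata
    if ids is None:
        ids = [str(uuid.uuid4()) for _ in texts]  # default: random UUIDs
    if not (len(metadatas) == len(ids) == len(texts)):
        raise ValueError("metadatas and ids must match texts in length")
    return list(zip(ids, texts, embeddings, metadatas))


rows = prepare_rows(["hello", "world"], [[0.1, 0.2], [0.3, 0.4]], ids=["a", "b"])
print(rows[0])  # ('a', 'hello', [0.1, 0.2], {})
```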

add_texts

Run more texts through the embeddings and add to the vectorstore.

Parameters:

  • texts (Iterable[str]): Iterable of strings to add to the vectorstore.

  • metadatas (Optional[List[dict]]): Optional list of metadatas associated with the texts.

  • ids (Optional[List[str]]): Optional list of ids for the texts. If not provided, will generate a new id for each text.

  • **kwargs (Any): Vector-store-specific parameters

Returns:

List[str] - List of ids from adding the texts into the vectorstore.

similarity_search

Run similarity search with MariaDB.

Parameters:

  • query (str): Query text to search for

  • k (int): Number of results to return (default: 4)

  • filter (Union[None, dict]): Optional filter by metadata

  • **kwargs (Any): Additional arguments passed to similarity_search_by_vector

Returns:

List[Document] - List of Documents most similar to the query
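To make the roles of k and filter concrete, here is a brute-force, in-memory sketch of a filtered nearest-neighbour search. It is not how MariaDBStore runs queries (the database does that work), and it treats the filter as plain key/value equality, which is an assumption; the filter syntax the library actually accepts is not described on this page.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple

Record = Tuple[str, List[float], Dict[str, Any]]  # (text, vector, metadata)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_search(
    query_vec: Sequence[float],
    records: List[Record],
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Return the texts of the k nearest records that pass the filter."""
    candidates = [
        r for r in records
        if filter is None or all(r[2].get(key) == val for key, val in filter.items())
    ]
    candidates.sort(key=lambda r: cosine_distance(query_vec, r[1]))
    return [r[0] for r in candidates[:k]]


records = [
    ("cats purr", [1.0, 0.0], {"topic": "pets"}),
    ("dogs bark", [0.9, 0.1], {"topic": "pets"}),
    ("stocks fell", [0.0, 1.0], {"topic": "finance"}),
]
print(brute_force_search([1.0, 0.0], records, k=2))
# ['cats purr', 'dogs bark']
print(brute_force_search([1.0, 0.0], records, k=2, filter={"topic": "finance"}))
# ['stocks fell']
```

Note that the filter narrows the candidate set before ranking, so a filtered search can return fewer than k results.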

similarity_search_with_score

Return docs most similar to query along with scores.

Parameters:

  • query (str): Text to look up documents similar to

  • k (int): Number of Documents to return (default: 4)

  • filter (Union[None, dict]): Optional filter by metadata

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, similarity_score)

similarity_search_with_score_by_vector

Return docs most similar to embedding vector along with scores.

Parameters:

  • embedding (List[float]): Embedding vector to look up documents similar to

  • k (int): Number of Documents to return (default: 4)

  • filter (Union[None, dict]): Optional filter by metadata

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, similarity_score)

similarity_search_by_vector

Return docs most similar to embedding vector.

Parameters:

  • embedding (List[float]): Embedding vector to look up documents similar to

  • k (int): Number of Documents to return (default: 4)

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[Document] - List of Documents most similar to the query vector

max_marginal_relevance_search

Return docs selected using maximal marginal relevance.

Parameters:

  • query (str): Text to look up documents similar to

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments passed to search_by_vector

Returns:

List[Document] - List of Documents selected by maximal marginal relevance
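The interplay of k, fetch_k, and lambda_mult is easier to see in code. The following is a generic, greedy maximal marginal relevance sketch in plain Python, under the assumption that cosine similarity is the similarity measure. It is not the library's implementation, and mmr_select is a hypothetical name. The candidates list plays the role of the fetch_k documents fetched before selection.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]  # index 1 nearly duplicates 0
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # [0, 2]
```

With lambda_mult=1.0 the near-duplicate is picked second because only relevance counts; lowering it to 0.3 penalizes redundancy enough that the more distinct candidate wins.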

amax_marginal_relevance_search

Return docs selected using maximal marginal relevance asynchronously.

Parameters:

  • query (str): Text to look up documents similar to

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments passed to search_by_vector

Returns:

List[Document] - List of Documents selected by maximal marginal relevance
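This page does not say whether the asynchronous variants use native async I/O or delegate to a worker thread. The sketch below only illustrates the generic pattern of awaiting a blocking search without stalling the event loop; blocking_search is a hypothetical stand-in, not a library function.

```python
import asyncio
from typing import List


def blocking_search(query: str, k: int = 4) -> List[str]:
    """Hypothetical stand-in for a synchronous search call."""
    corpus = ["alpha", "beta", "gamma", "delta", "epsilon"]
    return [doc for doc in corpus if query in doc][:k]


async def async_search(query: str, k: int = 4) -> List[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_search, query, k)


results = asyncio.run(async_search("a", k=2))
print(results)  # ['alpha', 'beta']
```

In application code the asynchronous store methods would be awaited in the same way as async_search is here.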

max_marginal_relevance_search_with_score

Return docs selected using maximal marginal relevance with scores.

Parameters:

  • query (str): Text to look up documents similar to

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments passed to search_by_vector

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, score) selected by maximal marginal relevance

amax_marginal_relevance_search_with_score

Return docs selected using maximal marginal relevance with scores asynchronously.

Parameters:

  • query (str): Text to look up documents similar to

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments passed to search_by_vector

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, score) selected by maximal marginal relevance

max_marginal_relevance_search_by_vector

Return docs selected using maximal marginal relevance.

Parameters:

  • embedding (List[float]): Query embedding vector

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[Document] - List of Documents selected by maximal marginal relevance

amax_marginal_relevance_search_by_vector

Return docs selected using maximal marginal relevance asynchronously.

Parameters:

  • embedding (List[float]): Query embedding vector

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[Document] - List of Documents selected by maximal marginal relevance

max_marginal_relevance_search_with_score_by_vector

Return docs selected using maximal marginal relevance with scores.

Parameters:

  • embedding (List[float]): Query embedding vector

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, score) selected by maximal marginal relevance

amax_marginal_relevance_search_with_score_by_vector

Return docs selected using maximal marginal relevance with scores asynchronously.

Parameters:

  • embedding (List[float]): Query embedding vector

  • k (int): Number of documents to return (default: 4)

  • fetch_k (int): Number of documents to fetch before selecting top-k (default: 20)

  • lambda_mult (float): Balance between relevance and diversity, from 0 to 1 (default: 0.5); 0 maximizes diversity, 1 maximizes relevance

  • filter (Union[None, dict]): Optional metadata filter

  • **kwargs (Any): Additional arguments (not used)

Returns:

List[Tuple[Document, float]] - List of tuples of (Document, score) selected by maximal marginal relevance

from_texts

Create a MariaDBStore instance from texts.

Parameters:

  • texts (List[str]): List of text strings to store

  • embedding (Embeddings): Embeddings object for creating embeddings

  • metadatas (Optional[List[dict]]): Optional list of metadata dicts for each text

  • ids (Optional[List[str]]): Optional list of unique IDs for each text

  • datasource (Optional[Union[Engine, str]]): Database connection (connection string or sqlalchemy engine)

  • collection_name (str): Name of the collection to store vectors

  • distance_strategy (DistanceStrategy): Strategy for distances (COSINE or EUCLIDEAN)

  • embedding_length (Optional[int]): Length of embedding vectors (default: 1536)

  • config (MariaDBStoreSettings): Store configuration for tables and columns

  • logger (Optional[logging.Logger]): Optional logger instance for debugging

  • relevance_score_fn (Optional[Callable[[float], float]]): Optional function to override relevance score calculation

  • **kwargs (Any): Additional arguments passed to add_embeddings

Returns:

MariaDBStore - MariaDBStore instance initialized with the provided texts

Raises:

  • ValueError: If datasource is not provided.
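A usage sketch for from_texts follows, using only the parameters listed above. It assumes that MariaDBStore is importable from the top-level langchain_mariadb package and uses DeterministicFakeEmbedding from langchain_core as a stand-in for a real embedding model; the collection name is hypothetical. Imports are deferred into the function so the snippet loads without either package or a database.

```python
def build_store_from_texts(datasource: str):
    """Create a store from two short texts.

    Assumptions: `MariaDBStore` is importable from `langchain_mariadb`, and
    `DeterministicFakeEmbedding` stands in for a real embedding model.
    """
    from langchain_core.embeddings import DeterministicFakeEmbedding
    from langchain_mariadb import MariaDBStore

    embedding = DeterministicFakeEmbedding(size=8)
    return MariaDBStore.from_texts(
        texts=["MariaDB supports vectors", "LangChain wraps vector stores"],
        embedding=embedding,
        metadatas=[{"source": "a"}, {"source": "b"}],
        datasource=datasource,          # required; a ValueError is raised without it
        collection_name="demo_docs",    # hypothetical collection name
        embedding_length=8,             # should match the embedding size
    )
```

The datasource argument is left to the caller because the connection string format depends on the driver in use.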

from_embeddings

Create a MariaDBStore instance from text-embedding pairs.

Parameters:

  • text_embeddings (List[Tuple[str, List[float]]]): List of (text, embedding) tuples

  • ids (Optional[List[str]]): Optional list of IDs for the documents

  • metadatas (Optional[List[dict]]): Optional list of metadata dicts

  • embedding (Embeddings): Embeddings object for creating embeddings

  • distance_strategy (DistanceStrategy): Strategy for computing distances

  • relevance_score_fn (Optional[Callable[[float], float]]): Optional function to compute relevance scores

  • config (MariaDBStoreSettings): Store configuration for tables and columns

  • **kwargs (Any): Additional arguments including datasource, collection_name

Returns:

MariaDBStore - MariaDBStore instance

from_existing_index

Create a MariaDBStore instance from an existing index.

Parameters:

  • embedding (Embeddings): Embeddings object for creating embeddings

  • collection_name (str): Name of collection (default: langchain)

  • distance_strategy (DistanceStrategy): Strategy for computing distances

  • datasource (Union[Engine, str]): Data source (connection string, SQLAlchemy engine, or MariaDB connection pool)

  • **kwargs (Any): Additional arguments passed to constructor

Returns:

MariaDBStore - MariaDBStore instance connected to existing index

from_documents

Create a MariaDBStore instance from documents.

Parameters:

  • documents (List[Document]): List of Document objects to store

  • embedding (Embeddings): Embeddings object for creating embeddings

  • ids (Optional[List[str]]): Optional list of IDs for the documents

  • datasource (Optional[Union[Engine, str]]): Database connection (connection string or sqlalchemy engine)

  • collection_name (str): Name of the collection to store vectors

  • distance_strategy (DistanceStrategy): Strategy for distances (COSINE or EUCLIDEAN)

  • embedding_length (Optional[int]): Length of embedding vectors (default: 1536)

  • config (MariaDBStoreSettings): Store configuration for tables and columns

  • logger (Optional[logging.Logger]): Optional logger instance for debugging

  • relevance_score_fn (Optional[Callable[[float], float]]): Optional function to override relevance score calculation

  • **kwargs (Any): Additional arguments passed to from_texts

Returns:

MariaDBStore - MariaDBStore instance

as_retriever

Return VectorStoreRetriever initialized from this VectorStore.

Parameters:

  • **kwargs (Any): Keyword arguments to pass to the search function. Can include:

    • search_type: Type of search the retriever should perform. Can be 'similarity' (default), 'mmr', or 'similarity_score_threshold'.

    • search_kwargs: Keyword arguments to pass to the search function. Can include:

      • k: Number of documents to return (default: 4)

      • score_threshold: Minimum relevance threshold for similarity_score_threshold

      • fetch_k: Number of documents to pass to the MMR algorithm (default: 20)

      • lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5)

      • filter: Filter by document metadata

Returns:

VectorStoreRetriever - VectorStoreRetriever instance configured for this vector store.
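The nesting of search_type and search_kwargs is easy to get wrong, so here is a small sketch of a helper that assembles the keyword arguments described above. The helper retriever_kwargs is hypothetical and does not touch the library; it only builds and checks a dictionary.

```python
from typing import Any, Dict, Optional

ALLOWED_SEARCH_TYPES = {"similarity", "mmr", "similarity_score_threshold"}


def retriever_kwargs(
    search_type: str = "similarity",
    k: int = 4,
    score_threshold: Optional[float] = None,
    filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments in the shape as_retriever expects."""
    if search_type not in ALLOWED_SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    search_kwargs: Dict[str, Any] = {"k": k}
    if search_type == "similarity_score_threshold":
        if score_threshold is None:
            raise ValueError("score_threshold is required for this search_type")
        search_kwargs["score_threshold"] = score_threshold
    if filter is not None:
        search_kwargs["filter"] = filter
    return {"search_type": search_type, "search_kwargs": search_kwargs}


kwargs = retriever_kwargs("similarity_score_threshold", k=3, score_threshold=0.8)
print(kwargs)
# {'search_type': 'similarity_score_threshold', 'search_kwargs': {'k': 3, 'score_threshold': 0.8}}
```

Given a store instance, the result would be unpacked as store.as_retriever(**kwargs).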

Attributes

  • embedding_function:

  • collection_name:

  • collection_metadata:

  • pre_delete_collection:

  • lazy_init:

  • logger:

  • override_relevance_score_fn:

  • embeddings (Embeddings):

