
Orchestration

MariaDB AI RAG orchestration endpoints coordinate multi-step RAG workflows in one request, covering ingestion, chunking, retrieval, streaming generation, and an end-to-end pipeline call.

Overview

The orchestration module provides high-level endpoints that coordinate multiple steps of the RAG (Retrieval Augmented Generation) workflow. These endpoints simplify integration by handling complex workflows in a single request.

Orchestration Endpoints

Ingestion Orchestration

POST /orchestrate/ingestion

Purpose: Runs the complete ingestion workflow (document upload, processing, chunking, and embedding) in one request, taking files all the way to searchable chunks.

Request: multipart/form-data with files and optional configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • config: JSON string with chunking configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "semantic",
  "chunk_size": 512,
  "cloud_storage_sources": [
    {
      "integration_id": "int_abc123",
      "prefix": "financial_reports/Q3/",
      "recursive": true,
      "file_extensions": [".pdf"]
    }
  ],
  "document_processing": {
    "processor_type": "layout_aware_standard",
    "enable_ocr": true,
    "ocr_provider": "rapidocr",
    "enable_table_extraction": true,
    "table_structure_mode": "accurate"
  }
}

Parameters:

  • chunking_method

    • recursive: Recursive text splitting (default)

    • sentence: Sentence-based chunking

    • token: Token-based chunking

    • semantic: Semantic similarity-based chunking (requires threshold)

  • chunk_size: Number of tokens/characters per chunk (default: 512).

  • cloud_storage_sources (optional): A JSON array of sources to ingest from an external storage integration (S3, GCS, or MinIO). Each entry requires an integration_id and a prefix; the example above also sets recursive and file_extensions.

  • document_processing (optional): A JSON object, nested in the config JSON, that defines the extraction tier:

    • processor_type: "base", "layout_aware_standard" (Docling), or "layout_aware_advanced" (LlamaParse).

    • enable_ocr: Boolean to turn on OCR for scanned documents.
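
The parameters above can be checked on the client before the config is serialized. The following is a minimal sketch, not part of the API: build_ingestion_config is a hypothetical helper, and the only rules it enforces are the ones listed in this section (the four chunking methods, the three processor types, and the two required keys of a cloud storage source).

```python
import json

# Values listed in the Parameters section above.
CHUNKING_METHODS = {"recursive", "sentence", "token", "semantic"}
PROCESSOR_TYPES = {"base", "layout_aware_standard", "layout_aware_advanced"}


def build_ingestion_config(chunking_method="recursive", chunk_size=512,
                           document_processing=None, cloud_storage_sources=None):
    """Return the `config` form field as a JSON string (hypothetical helper)."""
    if chunking_method not in CHUNKING_METHODS:
        raise ValueError(f"unknown chunking_method: {chunking_method!r}")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    config = {"chunking_method": chunking_method, "chunk_size": chunk_size}
    if document_processing is not None:
        processor = document_processing.get("processor_type")
        if processor is not None and processor not in PROCESSOR_TYPES:
            raise ValueError(f"unknown processor_type: {processor!r}")
        config["document_processing"] = dict(document_processing)
    if cloud_storage_sources is not None:
        for source in cloud_storage_sources:
            # Each source needs an integration_id and a prefix.
            missing = {"integration_id", "prefix"} - set(source)
            if missing:
                raise ValueError(f"cloud storage source is missing {sorted(missing)}")
        config["cloud_storage_sources"] = list(cloud_storage_sources)
    return json.dumps(config)


config_field = build_ingestion_config(
    "recursive", 512,
    document_processing={"processor_type": "layout_aware_standard", "enable_ocr": True},
)
print(config_field)
```

The returned string is what goes into the config form parameter.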

Response:

Usage Example: Upload and process documents in a single request.
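
As an illustration, the request could be assembled with the Python standard library as sketched below. BASE_URL, encode_multipart, and build_ingestion_request are assumptions made for this example, and authentication headers are left out because this section does not describe them. The sketch only builds the request; sending it is left to the caller.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's address


def encode_multipart(fields, files):
    """Encode text fields and (filename, bytes) uploads as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode()]
    for filename, content in files:
        # Every upload is sent under the documented `files` form parameter.
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="files"; filename="{filename}"'.encode(),
                  b"Content-Type: application/octet-stream",
                  b"", content]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_ingestion_request(files, config):
    """Build (but do not send) a POST request for /orchestrate/ingestion."""
    body, content_type = encode_multipart({"config": json.dumps(config)}, files)
    return urllib.request.Request(
        f"{BASE_URL}/orchestrate/ingestion", data=body,
        headers={"Content-Type": content_type}, method="POST")


request = build_ingestion_request(
    [("report.pdf", b"%PDF-1.4 placeholder bytes")],
    {"chunking_method": "recursive", "chunk_size": 512},
)
# To send it: urllib.request.urlopen(request), then JSON-decode the response body.
```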

Generation Orchestration

POST /orchestrate/generation

Purpose: Runs retrieval and text generation in a single request: the endpoint retrieves the relevant chunks and generates a response from them.

Request body (with a reranking object):

Request Parameters:

  • query (required): The user's question or prompt

  • document_ids (optional): Filter retrieval to specific documents (default: all documents)

  • retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)

  • model_type (optional): The reranker backend library to use. Valid values: "flashrank", "sentence-transformers", "cohere", "hybrid" (default: "flashrank").

  • model_name (optional): The specific reranker model to load (default: "ms-marco-MiniLM-L-12-v2")

  • top_k (optional): Number of chunks to retrieve (default: 5)

  • reranking (optional): A JSON object to enable a high-accuracy second pass.

    • enabled (optional): Set to true to activate reranking (default: false).

    • model_type (optional): The backend library (flashrank, sentence-transformers, cohere, hybrid) (default: "flashrank").

    • model_name (optional): The specific reranker model to load (default: "ms-marco-MiniLM-L-12-v2").

    • top_k (optional): Number of reranked results to return to the LLM.

  • llm_provider (optional): LLM provider. Valid values: openai, anthropic, gemini, cohere, ollama, azure, bedrock

  • temperature (optional): Controls randomness (0.0-2.0, default: 0.7)

  • top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)

  • max_tokens (optional): Maximum tokens to generate (1-8192, default: 500)

  • stream (optional): Enable streaming (default: false)
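
The parameters above can be combined into a request body as in the following sketch. build_generation_body is a hypothetical helper, not part of the API; the defaults and ranges it checks are the ones listed above, and the reranking object is only included when reranking is requested.

```python
import json

RETRIEVAL_METHODS = {"semantic", "fulltext", "hybrid"}


def build_generation_body(query, *, document_ids=None, retrieval_method="hybrid",
                          top_k=5, rerank=False, rerank_top_k=None,
                          llm_provider=None, temperature=0.7, top_p=0.9,
                          max_tokens=500, stream=False):
    """Build a JSON-serializable request body (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if retrieval_method not in RETRIEVAL_METHODS:
        raise ValueError(f"unknown retrieval_method: {retrieval_method!r}")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be between 1 and 8192")
    body = {"query": query, "retrieval_method": retrieval_method, "top_k": top_k,
            "temperature": temperature, "top_p": top_p,
            "max_tokens": max_tokens, "stream": stream}
    if document_ids is not None:
        body["document_ids"] = list(document_ids)
    if llm_provider is not None:
        body["llm_provider"] = llm_provider
    if rerank:
        # Documented reranking defaults, spelled out for clarity.
        reranking = {"enabled": True, "model_type": "flashrank",
                     "model_name": "ms-marco-MiniLM-L-12-v2"}
        if rerank_top_k is not None:
            reranking["top_k"] = rerank_top_k
        body["reranking"] = reranking
    return body


body = build_generation_body("What was Q3 revenue?", rerank=True, rerank_top_k=3)
print(json.dumps(body, indent=2))
```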

Response:

Response Fields:

  • query: The original query

  • response: The generated response

  • chunks_used: Number of chunks used for generation

  • retrieval_method: The retrieval method used

  • processing_time_seconds: Total processing time

  • retrieval_time_ms: Time spent on retrieval (optional)

  • generation_time_ms: Time spent on generation (optional)
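
On the client side, the fields above map naturally onto a small typed record. The sketch below assumes the response is a flat JSON object whose keys match the field names listed here; GenerationResult is a hypothetical class and the sample payload is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationResult:
    query: str
    response: str
    chunks_used: int
    retrieval_method: str
    processing_time_seconds: float
    # The two timing fields are documented as optional.
    retrieval_time_ms: Optional[float] = None
    generation_time_ms: Optional[float] = None

    @classmethod
    def from_dict(cls, payload):
        """Keep only the documented fields; unknown keys are ignored."""
        known = {name: payload[name] for name in cls.__dataclass_fields__ if name in payload}
        return cls(**known)


# Made-up payload for illustration only.
sample = {
    "query": "What was Q3 revenue?",
    "response": "Example answer text.",
    "chunks_used": 5,
    "retrieval_method": "hybrid",
    "processing_time_seconds": 1.8,
    "retrieval_time_ms": 120.0,
}
result = GenerationResult.from_dict(sample)
print(result.chunks_used, result.generation_time_ms)
```

A missing required field raises TypeError in from_dict, which surfaces an unexpected response shape early.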

Usage Example: Query documents and get an AI-generated response in one call.

Streaming Generation Orchestration

Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.

Request body: Same as /orchestrate/generation

Response: Server-Sent Events (SSE) stream with the following event types:

Usage Example: Use this endpoint for real-time user interfaces with progressive response display.
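
A client consumes such a stream by grouping lines into events. The sketch below is a minimal parser for the standard SSE wire format (event: and data: fields, with a blank line ending each event). The event names and payloads in the sample are hypothetical placeholders, since the actual event types are not listed here.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream contents for illustration only.
sample_stream = [
    "event: chunk\n",
    'data: {"text": "Hello"}\n',
    "\n",
    ": keep-alive\n",
    "event: done\n",
    "data: {}\n",
    "\n",
]
events = list(parse_sse(sample_stream))
for name, payload in events:
    print(name, payload)
```

In a real client the lines would come from the HTTP response body, read incrementally so that each event can be rendered as soon as it arrives.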

Full Pipeline

Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. Use it to process new documents and get an AI-generated answer in one call.

Request: multipart/form-data with files and configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • query: The query string (required)

  • config: JSON string with pipeline configuration (optional)
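
The text form fields for this request can be prepared as in the sketch below. build_pipeline_fields is a hypothetical helper. It reads "optional if using file_paths" as meaning that at least one of files or file_paths must be supplied, which is an interpretation rather than a documented rule, and it passes the config keys through unchecked.

```python
import json


def build_pipeline_fields(query, *, uploads=(), file_paths=(), config=None):
    """Return the text form fields for a full-pipeline request (hypothetical helper).

    `uploads` is only checked for presence here; the files themselves are sent
    as separate multipart parts under the `files` parameter.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    if not uploads and not file_paths:
        # Assumption: at least one document source must be supplied.
        raise ValueError("provide files, file_paths, or both")
    fields = {"query": query}
    if file_paths:
        fields["file_paths"] = json.dumps(list(file_paths))  # JSON array of server-side paths
    if config is not None:
        fields["config"] = json.dumps(config)  # JSON string
    return fields


fields = build_pipeline_fields(
    "Summarize the Q3 report.",
    file_paths=["/data/reports/q3.pdf"],  # placeholder path
    config={"chunk_size": 512},           # placeholder config
)
print(fields)
```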

Configuration JSON Structure:

Response:

Response Fields:

  • message: Status message

  • document_ids: IDs of ingested documents

  • chunk_count: Total chunks created

  • query: The original query

  • response: The AI-generated response

  • chunks_used: Number of chunks used for generation

  • total_processing_time_seconds: Total pipeline execution time
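
A short check that all of the fields above are present can catch an unexpected response shape before the values are used. The sketch below assumes a flat JSON object keyed by these field names; summarize_pipeline_response is a hypothetical helper and the sample values are made up.

```python
PIPELINE_FIELDS = ("message", "document_ids", "chunk_count", "query",
                   "response", "chunks_used", "total_processing_time_seconds")


def summarize_pipeline_response(payload):
    """Verify the documented fields exist and return a one-line summary."""
    missing = [name for name in PIPELINE_FIELDS if name not in payload]
    if missing:
        raise KeyError(f"response is missing fields: {missing}")
    return (f"{len(payload['document_ids'])} document(s), "
            f"{payload['chunk_count']} chunk(s) created, "
            f"{payload['chunks_used']} used, "
            f"{payload['total_processing_time_seconds']:.2f}s total")


# Made-up payload for illustration only.
sample_response = {
    "message": "ok",
    "document_ids": [101, 102],
    "chunk_count": 48,
    "query": "Summarize the Q3 report.",
    "response": "Example answer text.",
    "chunks_used": 5,
    "total_processing_time_seconds": 3.5,
}
summary = summarize_pipeline_response(sample_response)
print(summary)
```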

Usage Example: Process new documents and get an answer in one request.

This page is: Copyright © 2025 MariaDB. All rights reserved.
