Orchestration

Overview

The orchestration module provides high-level endpoints that each run several steps of the RAG (Retrieval-Augmented Generation) workflow in a single request, so a client does not have to call the upload, chunking, embedding, retrieval, and generation steps separately.

Orchestration Endpoints

Ingestion Orchestration

POST /orchestrate/ingestion

Purpose: Orchestrates the complete ingestion workflow: document upload, processing, chunking, and embedding. This endpoint handles the entire pipeline from files to searchable chunks.

Request: multipart/form-data with files and optional configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • config: JSON string with chunking configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "threshold": 0.8,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "embedding_batch_size": 32
}

Chunking Methods:

  • recursive: Recursive text splitting (default)

  • sentence: Sentence-based chunking

  • token: Token-based chunking

  • semantic: Semantic similarity-based chunking (requires threshold)
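The configuration fields and the rule that semantic chunking requires a threshold can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library. `build_ingestion_config` is a hypothetical helper and not part of the API, and the overlap check is an assumption rather than a documented server rule.

```python
import json

# Chunking methods listed in the documentation above.
CHUNKING_METHODS = {"recursive", "sentence", "token", "semantic"}


def build_ingestion_config(chunking_method="recursive", chunk_size=512,
                           chunk_overlap=128, threshold=None, **extra):
    """Return the `config` form field as a JSON string.

    Hypothetical client-side helper; the server remains the authority
    on which values are actually accepted.
    """
    if chunking_method not in CHUNKING_METHODS:
        raise ValueError(f"unknown chunking_method: {chunking_method!r}")
    if chunking_method == "semantic" and threshold is None:
        raise ValueError("semantic chunking requires a threshold")
    if chunk_overlap >= chunk_size:
        # Assumption: an overlap as large as the chunk itself is never useful.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    config = {
        "chunking_method": chunking_method,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    if threshold is not None:
        config["threshold"] = threshold
    config.update(extra)  # e.g. embedding_provider, embedding_model
    return json.dumps(config)


# Values taken from the configuration example above.
config_json = build_ingestion_config(
    chunking_method="semantic",
    threshold=0.8,
    embedding_provider="openai",
    embedding_model="text-embedding-3-small",
    embedding_batch_size=32,
)
```

The resulting string is what would be sent as the config form parameter alongside files or file_paths.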

Response:

Usage Example: Upload and process documents in a single request.

Generation Orchestration

POST /orchestrate/generation

Purpose: Orchestrates the complete RAG workflow: retrieval and text generation in a single request. Automatically retrieves relevant chunks and generates a response.

Request body:

Request Parameters:

  • query (required): The user's question or prompt

  • document_ids (optional): Filter retrieval to specific documents (default: all documents)

  • retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)

  • top_k (optional): Number of chunks to retrieve (default: 5)

  • llm_provider (optional): LLM provider; one of openai, anthropic, gemini, cohere, ollama, azure, bedrock

  • llm_model (optional): Specific model to use

  • temperature (optional): Controls randomness (0.0-2.0, default: 0.7)

  • top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)

  • max_tokens (optional): Maximum tokens to generate (1-8192, default: 1000)

  • stream (optional): Enable streaming (default: false)
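The parameter ranges and defaults listed above can be enforced before the request leaves the client. This is a minimal sketch, assuming the JSON body uses the same field names as the parameter list; `build_generation_request` is a hypothetical helper, not part of the API.

```python
# Retrieval methods listed in the parameter documentation above.
RETRIEVAL_METHODS = {"semantic", "fulltext", "hybrid"}


def build_generation_request(query, document_ids=None, retrieval_method="hybrid",
                             top_k=5, temperature=0.7, top_p=0.9,
                             max_tokens=1000, stream=False,
                             llm_provider=None, llm_model=None):
    """Build a request body dict, applying the documented defaults and ranges.

    Assumption: body keys match the documented parameter names.
    """
    if not query:
        raise ValueError("query is required")
    if retrieval_method not in RETRIEVAL_METHODS:
        raise ValueError(f"unknown retrieval_method: {retrieval_method!r}")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be between 1 and 8192")

    body = {
        "query": query,
        "retrieval_method": retrieval_method,
        "top_k": top_k,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Unset optional fields are omitted so the server-side defaults apply.
    if document_ids is not None:
        body["document_ids"] = list(document_ids)
    if llm_provider is not None:
        body["llm_provider"] = llm_provider
    if llm_model is not None:
        body["llm_model"] = llm_model
    return body


body = build_generation_request("What is a vector index?", top_k=3)
```

Leaving document_ids out of the body corresponds to the documented default of searching all documents.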

Response:

Response Fields:

  • query: The original query

  • response: The generated response

  • chunks_used: Number of chunks used for generation

  • retrieval_method: The retrieval method used

  • processing_time_seconds: Total processing time

  • retrieval_time_ms: Time spent on retrieval (optional)

  • generation_time_ms: Time spent on generation (optional)
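The response fields above map naturally onto a small typed record, with the two optional timing fields defaulting to None. The sketch below assumes the response is a flat JSON object with exactly these keys. `GenerationResult` and `parse_generation_response` are hypothetical names, and the sample payload is invented for illustration, not captured from a real server.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GenerationResult:
    query: str
    response: str
    chunks_used: int
    retrieval_method: str
    processing_time_seconds: float
    retrieval_time_ms: Optional[float] = None   # documented as optional
    generation_time_ms: Optional[float] = None  # documented as optional


def parse_generation_response(payload: str) -> GenerationResult:
    """Parse a JSON response body, ignoring any keys not documented above."""
    data = json.loads(payload)
    known = {f.name for f in fields(GenerationResult)}
    return GenerationResult(**{k: v for k, v in data.items() if k in known})


# Illustrative payload only; the values are made up.
sample = json.dumps({
    "query": "What is a vector index?",
    "response": "A vector index is ...",
    "chunks_used": 5,
    "retrieval_method": "hybrid",
    "processing_time_seconds": 1.2,
    "retrieval_time_ms": 85.0,
})
result = parse_generation_response(sample)
```

A payload that lacks one of the required fields raises a TypeError here, which surfaces a changed response shape early.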

Usage Example: Query documents and get an AI-generated response in one call.

Streaming Generation Orchestration

Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.

Request body: Same as /orchestrate/generation

Response: Server-Sent Events (SSE) stream with the following event types:

Usage Example: Use this endpoint for real-time user interfaces with progressive response display.
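A client consuming the stream has to split it into events. The sketch below parses the generic SSE wire format (event: and data: fields, with a blank line ending each event) and makes no claim about which event types this endpoint emits. `iter_sse_events` is a hypothetical helper, and the event name "chunk" in the sample is invented for illustration.

```python
def iter_sse_events(lines):
    """Yield (event_type, data) tuples from an iterable of decoded SSE lines.

    Follows the general SSE format: a blank line dispatches the pending
    event, lines starting with ':' are comments, and the event type
    defaults to "message" when no `event:` field was seen.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Invented sample stream; real event names may differ.
sample_stream = [
    "event: chunk\n",
    "data: Hello\n",
    "\n",
    ": keep-alive\n",
    "data: world\n",
    "\n",
]
events = list(iter_sse_events(sample_stream))
```

In a real client the lines would come from the HTTP response body read incrementally, so each event can be rendered as soon as it arrives.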

Full Pipeline

Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. Use it when new documents must be processed and queried in the same call.

Request: multipart/form-data with files and configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • query: The query string (required)

  • config: JSON string with pipeline configuration (optional)

Configuration JSON Structure:

Response:

Response Fields:

  • message: Status message

  • document_ids: IDs of ingested documents

  • chunk_count: Total chunks created

  • query: The original query

  • response: The AI-generated response

  • chunks_used: Number of chunks used for generation

  • total_processing_time_seconds: Total pipeline execution time

Usage Example: Process new documents and get an answer in one request.
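The form parameters listed above carry two implicit rules: query is always required, and at least one of files or file_paths must be supplied. A minimal sketch of assembling the non-file form fields follows. `build_pipeline_form` is a hypothetical helper, and the config key in the example assumes that the pipeline configuration accepts the same keys as the ingestion configuration, which the text above does not confirm.

```python
import json


def build_pipeline_form(query, file_paths=None, has_uploads=False, config=None):
    """Return the non-file form fields for a full-pipeline request.

    `has_uploads` signals that file parts will be attached separately
    by the HTTP client. Hypothetical helper, not part of the API.
    """
    if not query:
        raise ValueError("query is required")
    if not file_paths and not has_uploads:
        raise ValueError("provide uploaded files or server-side file_paths")

    form = {"query": query}
    if file_paths:
        # file_paths is documented as a JSON array of server-side paths.
        form["file_paths"] = json.dumps(list(file_paths))
    if config:
        # config is documented as a JSON string.
        form["config"] = json.dumps(config)
    return form


form = build_pipeline_form(
    "Summarize the report",
    file_paths=["/data/report.pdf"],          # illustrative path
    config={"chunking_method": "recursive"},  # assumed config key
)
```

With an HTTP client these fields would typically be sent as multipart form data, with any uploaded files attached as additional parts named files.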

This page is: Copyright © 2025 MariaDB. All rights reserved.
