Orchestration
Overview
The orchestration module provides high-level endpoints that coordinate multiple steps of the RAG (Retrieval Augmented Generation) workflow. These endpoints simplify integration by handling complex workflows in a single request.
Orchestration Endpoints
Ingestion Orchestration
POST /orchestrate/ingestion
Purpose: Orchestrates the complete ingestion workflow: document upload, processing, chunking, and embedding. This endpoint handles the entire pipeline from files to searchable chunks.
Request: multipart/form-data with files and optional configuration
Form Parameters:
files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
config: JSON string with chunking configuration (optional)
Configuration JSON Structure:
{
"chunking_method": "recursive",
"chunk_size": 512,
"chunk_overlap": 128,
"threshold": 0.8,
"embedding_provider": "openai",
"embedding_model": "text-embedding-3-small",
"embedding_batch_size": 32
}
Chunking Methods:
recursive: Recursive text splitting (default)
sentence: Sentence-based chunking
token: Token-based chunking
semantic: Semantic similarity-based chunking (requires threshold)
Response:
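The exact response schema is not reproduced on this page; by analogy with the full-pipeline response further down, a successful ingestion response likely includes fields along these lines (assumed, values illustrative):
{
  "message": "Ingestion completed successfully",
  "document_ids": [1, 2],
  "chunk_count": 48
}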
Usage Example: Upload and process documents in a single request.
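A minimal Python sketch using the requests library. The base URL, file names, and configuration values are illustrative; adjust them for your deployment.
import json
import requests

# Upload two local files and chunk them with the recursive splitter.
config = {
    "chunking_method": "recursive",
    "chunk_size": 512,
    "chunk_overlap": 128,
}

with open("report.pdf", "rb") as f1, open("notes.txt", "rb") as f2:
    resp = requests.post(
        "http://localhost:8000/orchestrate/ingestion",
        files=[("files", f1), ("files", f2)],
        data={"config": json.dumps(config)},
    )

resp.raise_for_status()
print(resp.json())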
Generation Orchestration
POST /orchestrate/generation
Purpose: Orchestrates the complete RAG workflow: retrieval and text generation in a single request. Automatically retrieves relevant chunks and generates a response.
Request body:
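For example (values illustrative; see the parameter list below):
{
  "query": "What are the key findings in the quarterly report?",
  "retrieval_method": "hybrid",
  "top_k": 5,
  "llm_provider": "openai",
  "temperature": 0.7,
  "max_tokens": 1000
}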
Request Parameters:
query (required): The user's question or prompt
document_ids (optional): Filter retrieval to specific documents (default: all documents)
retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)
top_k (optional): Number of chunks to retrieve (default: 5)
llm_provider (optional): LLM provider - openai, anthropic, gemini, cohere, ollama, azure, bedrock
llm_model (optional): Specific model to use
temperature (optional): Controls randomness (0.0-2.0, default: 0.7)
top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)
max_tokens (optional): Maximum tokens to generate (1-8192, default: 1000)
stream (optional): Enable streaming (default: false)
Response:
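An illustrative response body built from the fields described below (values are examples only):
{
  "query": "What are the key findings in the quarterly report?",
  "response": "The report highlights three findings: ...",
  "chunks_used": 5,
  "retrieval_method": "hybrid",
  "processing_time_seconds": 2.4,
  "retrieval_time_ms": 180,
  "generation_time_ms": 2100
}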
Response Fields:
query: The original query
response: The generated response
chunks_used: Number of chunks used for generation
retrieval_method: The retrieval method used
processing_time_seconds: Total processing time
retrieval_time_ms: Time spent on retrieval (optional)
generation_time_ms: Time spent on generation (optional)
Usage Example: Query documents and get an AI-generated response in one call.
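A minimal sketch with requests, assuming the same local base URL as above (query and parameter values are illustrative):
import requests

# Ask a question against all ingested documents using hybrid retrieval.
payload = {
    "query": "What are the key findings in the quarterly report?",
    "retrieval_method": "hybrid",
    "top_k": 5,
}

resp = requests.post("http://localhost:8000/orchestrate/generation", json=payload)
resp.raise_for_status()

body = resp.json()
print(body["response"])
print(f"Chunks used: {body['chunks_used']}")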
Streaming Generation Orchestration
Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.
Request body: Same as /orchestrate/generation
Response: A Server-Sent Events (SSE) stream of typed events.
Usage Example: Use this endpoint for real-time user interfaces with progressive response display.
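The exact streaming route is not named on this page; the sketch below assumes a hypothetical /orchestrate/generation/stream path and simply prints raw SSE lines as they arrive. How you parse each frame will depend on the actual event types.
import requests

payload = {"query": "Summarize the uploaded documents."}

# Hypothetical path: the streaming route is not named in this page.
with requests.post(
    "http://localhost:8000/orchestrate/generation/stream",
    json=payload,
    stream=True,
) as resp:
    resp.raise_for_status()
    # SSE frames arrive as "event: ..." / "data: ..." lines separated by blanks.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)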
Full Pipeline
Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. This is the ultimate one-call solution for processing new documents and getting an AI-generated answer.
Request: multipart/form-data with files and configuration
Form Parameters:
files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
query: The query string (required)
config: JSON string with pipeline configuration (optional)
Configuration JSON Structure:
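The exact schema is not reproduced here; a plausible structure, combining the chunking options above with the generation parameters, might look like this (assumed, not authoritative):
{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "retrieval_method": "hybrid",
  "top_k": 5,
  "llm_provider": "openai",
  "temperature": 0.7,
  "max_tokens": 1000
}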
Response:
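An illustrative response built from the fields listed below (values are examples only):
{
  "message": "Pipeline completed successfully",
  "document_ids": [3],
  "chunk_count": 24,
  "query": "What does the new contract say about renewal terms?",
  "response": "The contract specifies ...",
  "chunks_used": 5,
  "total_processing_time_seconds": 6.8
}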
Response Fields:
message: Status message
document_ids: IDs of ingested documents
chunk_count: Total chunks created
query: The original query
response: The AI-generated response
chunks_used: Number of chunks used for generation
total_processing_time_seconds: Total pipeline execution time
Usage Example: Process new documents and get an answer in one request.
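A sketch with requests. The /orchestrate/pipeline route below is hypothetical, since the exact path is not shown on this page; file names and configuration are also illustrative.
import json
import requests

config = {"chunking_method": "recursive", "chunk_size": 512}

with open("contract.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/orchestrate/pipeline",  # hypothetical route
        files=[("files", f)],
        data={
            "query": "What does the new contract say about renewal terms?",
            "config": json.dumps(config),
        },
    )

resp.raise_for_status()
print(resp.json()["response"])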