Orchestration
Overview
The orchestration module provides high-level endpoints that coordinate multiple steps of the RAG (Retrieval Augmented Generation) workflow. These endpoints simplify integration by handling complex workflows in a single request.
Orchestration Endpoints
Ingestion Orchestration
POST /orchestrate/ingestion
Purpose: Orchestrates the complete ingestion workflow: document upload, processing, chunking, and embedding. This endpoint handles the entire pipeline from files to searchable chunks.
Request: multipart/form-data with files and optional configuration
Form Parameters:
files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
config: JSON string with chunking configuration (optional)
Configuration JSON Structure:
{
"chunking_method": "recursive",
"chunk_size": 512,
"chunk_overlap": 128,
"threshold": 0.8,
"embedding_provider": "openai",
"embedding_model": "text-embedding-3-small",
"embedding_batch_size": 32
}
Chunking Methods:
recursive: Recursive text splitting (default)
sentence: Sentence-based chunking
token: Token-based chunking
semantic: Semantic similarity-based chunking (requires threshold)
Response:
{
"message": "Successfully ingested and chunked 2 documents",
"document_ids": [42, 43],
"chunk_count": 25,
"processing_time_seconds": 15.2
}
Usage Example: Upload and process documents in a single request.
# Upload files with custom chunking config
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "files=@/path/to/document1.pdf" \
-F "files=@/path/to/document2.docx" \
-F 'config={"chunking_method":"semantic","chunk_size":512,"threshold":0.8}'
# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
  -F 'config={"chunking_method":"recursive","chunk_size":512}'
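The same upload can be issued programmatically. Below is a minimal Python sketch using the requests library, mirroring the first curl example above; the file names and YOUR_TOKEN are placeholders.

import json
import requests

# Send two files plus a chunking config as multipart/form-data,
# equivalent to the curl upload above.
url = "http://localhost:8000/orchestrate/ingestion"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
files = [
    ("files", open("document1.pdf", "rb")),   # placeholder file names
    ("files", open("document2.docx", "rb")),
]
config = {"chunking_method": "semantic", "chunk_size": 512, "threshold": 0.8}

resp = requests.post(url, headers=headers, files=files,
                     data={"config": json.dumps(config)})
resp.raise_for_status()
result = resp.json()
print(result["message"], result["document_ids"])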
Generation Orchestration
POST /orchestrate/generation
Purpose: Orchestrates the complete RAG workflow: retrieval and text generation in a single request. Automatically retrieves relevant chunks and generates a response.
Request body:
{
"query": "What are the key features?",
"document_ids": [42, 43],
"retrieval_method": "hybrid",
"top_k": 5,
"llm_provider": "openai",
"llm_model": "gpt-4",
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 1000
}
Request Parameters:
query (required): The user's question or prompt
document_ids (optional): Filter retrieval to specific documents (default: all documents)
retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)
top_k (optional): Number of chunks to retrieve (default: 5)
llm_provider (optional): LLM provider - openai, anthropic, gemini, cohere, ollama, azure, bedrock
llm_model (optional): Specific model to use
temperature (optional): Controls randomness (0.0-2.0, default: 0.7)
top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)
max_tokens (optional): Maximum tokens to generate (1-8192, default: 1000)
stream (optional): Enable streaming (default: false)
Response:
{
"query": "What are the key features?",
"response": "The key features include document processing, semantic search, and AI-powered generation...",
"chunks_used": 5,
"retrieval_method": "hybrid",
"processing_time_seconds": 2.5,
"retrieval_time_ms": 150.3,
"generation_time_ms": 2350.7
}
Response Fields:
query: The original query
response: The generated response
chunks_used: Number of chunks used for generation
retrieval_method: The retrieval method used
processing_time_seconds: Total processing time
retrieval_time_ms: Time spent on retrieval (optional)
generation_time_ms: Time spent on generation (optional)
Usage Example: Query documents and get an AI-generated response in one call.
curl -X POST "http://localhost:8000/orchestrate/generation" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the key features?",
"document_ids": [42, 43],
"retrieval_method": "hybrid",
"top_k": 5,
"llm_provider": "openai",
"llm_model": "gpt-4"
}'
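The equivalent call from Python is a plain JSON POST; a minimal sketch with the requests library (the token is a placeholder):

import requests

# Same request as the curl example above, sent as JSON.
url = "http://localhost:8000/orchestrate/generation"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
payload = {
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5,
    "llm_provider": "openai",
    "llm_model": "gpt-4",
}

resp = requests.post(url, headers=headers, json=payload)
resp.raise_for_status()
data = resp.json()
print(data["response"])
print(f"used {data['chunks_used']} chunks in {data['processing_time_seconds']}s")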
Streaming Generation Orchestration
POST /orchestrate/generation-stream
Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.
Request body: Same as /orchestrate/generation
Response: Server-Sent Events (SSE) stream with the following event types:
// Start event
{"type": "start", "query": "What are the key features?"}
// Retrieval complete event
{"type": "retrieval_complete", "chunks_retrieved": 5, "retrieval_time_ms": 150.3}
// Token events (streamed as generated)
{"type": "token", "content": "The", "chunk_index": 1}
{"type": "token", "content": " key", "chunk_index": 2}
{"type": "token", "content": " features", "chunk_index": 3}
// Completion event
{"type": "complete", "total_time_seconds": 2.5, "tokens_generated": 150}
// Error event (if error occurs)
{"type": "error", "message": "Error description"}Usage Example: Use this endpoint for real-time user interfaces with progressive response display.
curl -X POST "http://localhost:8000/orchestrate/generation-stream" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the key features?",
"document_ids": [42, 43],
"retrieval_method": "hybrid",
"top_k": 5
}'
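Consuming the stream from Python can be sketched with requests in streaming mode. One framing assumption: this sketch strips an optional "data: " SSE prefix before decoding each event as JSON; adjust the prefix handling if the server emits bare JSON lines.

import json
import requests

url = "http://localhost:8000/orchestrate/generation-stream"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
payload = {"query": "What are the key features?", "document_ids": [42, 43],
           "retrieval_method": "hybrid", "top_k": 5}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue  # skip blank keep-alive lines between events
        if line.startswith("data: "):
            line = line[len("data: "):]  # strip SSE framing if present
        event = json.loads(line)
        if event["type"] == "token":
            print(event["content"], end="", flush=True)
        elif event["type"] == "complete":
            print(f"\n[{event['tokens_generated']} tokens in "
                  f"{event['total_time_seconds']}s]")
        elif event["type"] == "error":
            raise RuntimeError(event["message"])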
Full Pipeline
POST /orchestrate/full-pipeline
Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. Use it to process new documents and get an AI-generated answer in a single call.
Request: multipart/form-data with files and configuration
Form Parameters:
files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
query: The query string (required)
config: JSON string with pipeline configuration (optional)
Configuration JSON Structure:
{
"chunking_method": "recursive",
"chunk_size": 512,
"chunk_overlap": 128,
"retrieval_method": "hybrid",
"top_k": 5,
"embedding_provider": "openai",
"embedding_model": "text-embedding-3-small",
"llm_provider": "openai",
"llm_model": "gpt-4",
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 1000
}
Response:
{
"message": "Full pipeline completed successfully",
"document_ids": [42, 43],
"chunk_count": 25,
"query": "What are the key features?",
"response": "The key features include document processing, semantic search, and AI-powered generation...",
"chunks_used": 5,
"total_processing_time_seconds": 20.5
}
Response Fields:
message: Status message
document_ids: IDs of ingested documents
chunk_count: Total chunks created
query: The original query
response: The AI-generated response
chunks_used: Number of chunks used for generation
total_processing_time_seconds: Total pipeline execution time
Usage Example: Process new documents and get an answer in one request.
# Upload files and query
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "files=@/path/to/document1.pdf" \
-F "files=@/path/to/document2.docx" \
-F "query=What are the key features?" \
-F 'config={"chunking_method":"semantic","retrieval_method":"hybrid","top_k":5}'
# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
-F "query=Summarize the main points" \
  -F 'config={"chunking_method":"recursive","llm_model":"gpt-4"}'
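A minimal Python sketch of the same full-pipeline call with the requests library (file names and token are placeholders, as in the curl examples):

import json
import requests

# One call: upload, chunk, embed, retrieve, and generate,
# mirroring the first curl example above.
url = "http://localhost:8000/orchestrate/full-pipeline"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
files = [
    ("files", open("document1.pdf", "rb")),   # placeholder file names
    ("files", open("document2.docx", "rb")),
]
form = {
    "query": "What are the key features?",
    "config": json.dumps({"chunking_method": "semantic",
                          "retrieval_method": "hybrid", "top_k": 5}),
}

resp = requests.post(url, headers=headers, files=files, data=form)
resp.raise_for_status()
result = resp.json()
print(result["response"])
print(f"{result['chunk_count']} chunks from documents {result['document_ids']}")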