Orchestration

Overview

The orchestration module provides high-level endpoints that coordinate multiple steps of the RAG (Retrieval-Augmented Generation) workflow. These endpoints simplify integration by collapsing multi-step operations into a single request.

Orchestration Endpoints

Ingestion Orchestration

POST /orchestrate/ingestion

Purpose: Orchestrates the complete ingestion workflow: document upload, processing, chunking, and embedding. This endpoint handles the entire pipeline from files to searchable chunks.

Request: multipart/form-data with files and optional configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • config: JSON string with chunking configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "threshold": 0.8,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "embedding_batch_size": 32
}

Chunking Methods:

  • recursive: Recursive text splitting (default)

  • sentence: Sentence-based chunking

  • token: Token-based chunking

  • semantic: Semantic similarity-based chunking (requires threshold)
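
To make the chunk_size and chunk_overlap settings concrete, the sketch below splits text with a plain sliding window. This is only an illustration of the overlap semantics; the actual recursive splitter also honors separators such as paragraph and sentence boundaries.

def sliding_window_chunks(text, chunk_size=512, chunk_overlap=128):
    # Illustrative only: each chunk starts chunk_size - chunk_overlap
    # characters after the previous one, so consecutive chunks share
    # chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# A 1000-character document with the defaults yields chunks starting
# at offsets 0, 384, and 768.
print([len(c) for c in sliding_window_chunks("x" * 1000)])  # [512, 512, 232]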

Response:

{
  "message": "Successfully ingested and chunked 2 documents",
  "document_ids": [42, 43],
  "chunk_count": 25,
  "processing_time_seconds": 15.2
}

Usage Example: Upload and process documents in a single request.

# Upload files with custom chunking config
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx" \
  -F 'config={"chunking_method":"semantic","chunk_size":512,"threshold":0.8}'

# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
  -F 'config={"chunking_method":"recursive","chunk_size":512}'

Generation Orchestration

POST /orchestrate/generation

Purpose: Orchestrates the complete RAG workflow: retrieval and text generation in a single request. Automatically retrieves relevant chunks and generates a response.

Request body:

{
  "query": "What are the key features?",
  "document_ids": [42, 43],
  "retrieval_method": "hybrid",
  "top_k": 5,
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 1000
}

Request Parameters:

  • query (required): The user's question or prompt

  • document_ids (optional): Filter retrieval to specific documents (default: all documents)

  • retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)

  • top_k (optional): Number of chunks to retrieve (default: 5)

  • llm_provider (optional): LLM provider to use (openai, anthropic, gemini, cohere, ollama, azure, or bedrock)

  • llm_model (optional): Specific model to use

  • temperature (optional): Controls randomness (0.0-2.0, default: 0.7)

  • top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)

  • max_tokens (optional): Maximum tokens to generate (1-8192, default: 1000)

  • stream (optional): Enable streaming (default: false)

Response:

{
  "query": "What are the key features?",
  "response": "The key features include document processing, semantic search, and AI-powered generation...",
  "chunks_used": 5,
  "retrieval_method": "hybrid",
  "processing_time_seconds": 2.5,
  "retrieval_time_ms": 150.3,
  "generation_time_ms": 2350.7
}

Response Fields:

  • query: The original query

  • response: The generated response

  • chunks_used: Number of chunks used for generation

  • retrieval_method: The retrieval method used

  • processing_time_seconds: Total processing time

  • retrieval_time_ms: Time spent on retrieval, in milliseconds (may be omitted)

  • generation_time_ms: Time spent on generation, in milliseconds (may be omitted)

Usage Example: Query documents and get an AI-generated response in one call.

curl -X POST "http://localhost:8000/orchestrate/generation" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5,
    "llm_provider": "openai",
    "llm_model": "gpt-4"
  }'
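
An equivalent call from Python (base URL and token are placeholders):

import requests

payload = {
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5,
    "llm_provider": "openai",
    "llm_model": "gpt-4",
}

resp = requests.post(
    "http://localhost:8000/orchestrate/generation",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json=payload,  # sends Content-Type: application/json
)
resp.raise_for_status()
data = resp.json()
print(f"{data['chunks_used']} chunks via {data['retrieval_method']} retrieval")
print(data["response"])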

Streaming Generation Orchestration

POST /orchestrate/generation-stream

Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.

Request body: Same as /orchestrate/generation

Response: Server-Sent Events (SSE) stream with the following event types:

// Start event
{"type": "start", "query": "What are the key features?"}

// Retrieval complete event
{"type": "retrieval_complete", "chunks_retrieved": 5, "retrieval_time_ms": 150.3}

// Token events (streamed as generated)
{"type": "token", "content": "The", "chunk_index": 1}
{"type": "token", "content": " key", "chunk_index": 2}
{"type": "token", "content": " features", "chunk_index": 3}

// Completion event
{"type": "complete", "total_time_seconds": 2.5, "tokens_generated": 150}

// Error event (if error occurs)
{"type": "error", "message": "Error description"}

Usage Example: Use this endpoint for real-time user interfaces with progressive response display.

curl -X POST "http://localhost:8000/orchestrate/generation-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5
  }'
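
To consume the stream programmatically, read the response incrementally and parse each event. The sketch below assumes standard SSE framing, where each event arrives on a "data:" line; adjust the parsing if your deployment frames events differently.

import json
import requests

payload = {"query": "What are the key features?", "retrieval_method": "hybrid", "top_k": 5}

with requests.post(
    "http://localhost:8000/orchestrate/generation-stream",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json=payload,
    stream=True,  # read the body as it arrives instead of buffering it
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        event = json.loads(line[len("data:"):].strip())
        if event["type"] == "token":
            print(event["content"], end="", flush=True)  # progressive display
        elif event["type"] == "complete":
            print(f"\n[{event['tokens_generated']} tokens in "
                  f"{event['total_time_seconds']}s]")
        elif event["type"] == "error":
            raise RuntimeError(event["message"])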

Full Pipeline

POST /orchestrate/full-pipeline

Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. Use this endpoint to process new documents and get an AI-generated answer in one call.

Request: multipart/form-data with files and configuration

Form Parameters:

  • files: One or more files to upload (optional if using file_paths)

  • file_paths: JSON array of server-side file paths (optional if using files)

  • query: The query string (required)

  • config: JSON string with pipeline configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "retrieval_method": "hybrid",
  "top_k": 5,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 1000
}

Response:

{
  "message": "Full pipeline completed successfully",
  "document_ids": [42, 43],
  "chunk_count": 25,
  "query": "What are the key features?",
  "response": "The key features include document processing, semantic search, and AI-powered generation...",
  "chunks_used": 5,
  "total_processing_time_seconds": 20.5
}

Response Fields:

  • message: Status message

  • document_ids: IDs of ingested documents

  • chunk_count: Total chunks created

  • query: The original query

  • response: The AI-generated response

  • chunks_used: Number of chunks used for generation

  • total_processing_time_seconds: Total pipeline execution time

Usage Example: Process new documents and get an answer in one request.

# Upload files and query
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx" \
  -F "query=What are the key features?" \
  -F 'config={"chunking_method":"semantic","retrieval_method":"hybrid","top_k":5}'

# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
  -F "query=Summarize the main points" \
  -F 'config={"chunking_method":"recursive","llm_model":"gpt-4"}'
