Orchestration

Overview

The orchestration module provides high-level endpoints that coordinate multiple steps of the RAG (Retrieval Augmented Generation) workflow. These endpoints simplify integration by handling complex workflows in a single request.

Orchestration Endpoints

Ingestion Orchestration

POST /orchestrate/ingestion

Purpose: Orchestrates the complete ingestion workflow: document upload, processing, chunking, and embedding. This endpoint handles the entire pipeline from files to searchable chunks.

Request: multipart/form-data with files and optional configuration

Form Parameters:

files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
config: JSON string with chunking configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "threshold": 0.8,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "embedding_batch_size": 32
}

Chunking Methods:

recursive: Recursive text splitting (default)
sentence: Sentence-based chunking
token: Token-based chunking
semantic: Semantic similarity-based chunking (requires threshold)

Response:

{
  "message": "Successfully ingested and chunked 2 documents",
  "document_ids": [42, 43],
  "chunk_count": 25,
  "processing_time_seconds": 15.2
}

Usage Example: Upload and process documents in a single request.

# Upload files with custom chunking config
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx" \
  -F 'config={"chunking_method":"semantic","chunk_size":512,"threshold":0.8}'

# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/ingestion" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
  -F 'config={"chunking_method":"recursive","chunk_size":512}'

Generation Orchestration

POST /orchestrate/generation

Purpose: Orchestrates the complete RAG workflow: retrieval and text generation in a single request. Automatically retrieves relevant chunks and generates a response.

Request body:

{
  "query": "What are the key features?",
  "document_ids": [42, 43],
  "retrieval_method": "hybrid",
  "top_k": 5,
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 1000
}

Request Parameters:

query (required): The user's question or prompt
document_ids (optional): Filter retrieval to specific documents (default: all documents)
retrieval_method (optional): semantic, fulltext, or hybrid (default: hybrid)
top_k (optional): Number of chunks to retrieve (default: 5)
llm_provider (optional): LLM provider - openai, anthropic, gemini, cohere, ollama, azure, bedrock
llm_model (optional): Specific model to use
temperature (optional): Controls randomness (0.0-2.0, default: 0.7)
top_p (optional): Nucleus sampling (0.0-1.0, default: 0.9)
max_tokens (optional): Maximum tokens to generate (1-8192, default: 1000)
stream (optional): Enable streaming (default: false)

Response:

{
  "query": "What are the key features?",
  "response": "The key features include document processing, semantic search, and AI-powered generation...",
  "chunks_used": 5,
  "retrieval_method": "hybrid",
  "processing_time_seconds": 2.5,
  "retrieval_time_ms": 150.3,
  "generation_time_ms": 2350.7
}

Response Fields:

query: The original query
response: The generated response
chunks_used: Number of chunks used for generation
retrieval_method: The retrieval method used
processing_time_seconds: Total processing time
retrieval_time_ms: Time spent on retrieval (optional)
generation_time_ms: Time spent on generation (optional)

Usage Example: Query documents and get an AI-generated response in one call.

curl -X POST "http://localhost:8000/orchestrate/generation" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5,
    "llm_provider": "openai",
    "llm_model": "gpt-4"
  }'

Streaming Generation Orchestration

POST /orchestrate/generation-stream

Purpose: Same as the generation endpoint but returns the response as a Server-Sent Events (SSE) stream for real-time display.

Request body: Same as /orchestrate/generation

Response: Server-Sent Events (SSE) stream with the following event types:

// Start event
{"type": "start", "query": "What are the key features?"}

// Retrieval complete event
{"type": "retrieval_complete", "chunks_retrieved": 5, "retrieval_time_ms": 150.3}

// Token events (streamed as generated)
{"type": "token", "content": "The", "chunk_index": 1}
{"type": "token", "content": " key", "chunk_index": 2}
{"type": "token", "content": " features", "chunk_index": 3}

// Completion event
{"type": "complete", "total_time_seconds": 2.5, "tokens_generated": 150}

// Error event (if error occurs)
{"type": "error", "message": "Error description"}

Usage Example: Use this endpoint for real-time user interfaces with progressive response display.

curl -X POST "http://localhost:8000/orchestrate/generation-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key features?",
    "document_ids": [42, 43],
    "retrieval_method": "hybrid",
    "top_k": 5
  }'

Full Pipeline

POST /orchestrate/full-pipeline

Purpose: Executes the complete RAG pipeline in a single request: document ingestion, chunking, embedding, retrieval, and generation. This is the ultimate one-call solution for processing new documents and getting an AI-generated answer.

Request: multipart/form-data with files and configuration

Form Parameters:

files: One or more files to upload (optional if using file_paths)
file_paths: JSON array of server-side file paths (optional if using files)
query: The query string (required)
config: JSON string with pipeline configuration (optional)

Configuration JSON Structure:

{
  "chunking_method": "recursive",
  "chunk_size": 512,
  "chunk_overlap": 128,
  "retrieval_method": "hybrid",
  "top_k": 5,
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "llm_provider": "openai",
  "llm_model": "gpt-4",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 1000
}

Response:

{
  "message": "Full pipeline completed successfully",
  "document_ids": [42, 43],
  "chunk_count": 25,
  "query": "What are the key features?",
  "response": "The key features include document processing, semantic search, and AI-powered generation...",
  "chunks_used": 5,
  "total_processing_time_seconds": 20.5
}

Response Fields:

message: Status message
document_ids: IDs of ingested documents
chunk_count: Total chunks created
query: The original query
response: The AI-generated response
chunks_used: Number of chunks used for generation
total_processing_time_seconds: Total pipeline execution time

Usage Example: Process new documents and get an answer in one request.

# Upload files and query
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx" \
  -F "query=What are the key features?" \
  -F 'config={"chunking_method":"semantic","retrieval_method":"hybrid","top_k":5}'

# Use server-side file paths
curl -X POST "http://localhost:8000/orchestrate/full-pipeline" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F 'file_paths=["./docs/file1.pdf","./docs/file2.pdf"]' \
  -F "query=Summarize the main points" \
  -F 'config={"chunking_method":"recursive","llm_model":"gpt-4"}'

PreviousAccess Control NextPerformance & Troubleshooting

Last updated 4 months ago

Was this helpful?

Good afternoon

hashtagOverview

hashtagOrchestration Endpoints

hashtagIngestion Orchestration

hashtagGeneration Orchestration

hashtagStreaming Generation Orchestration

hashtagFull Pipeline

Overview

Orchestration Endpoints

Ingestion Orchestration

Generation Orchestration

Streaming Generation Orchestration

Full Pipeline