Technical Architecture

System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Windows Host System                         │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │              Docker Desktop (WSL 2 Backend)                 │    │
│  │                                                             │    │
│  │  ┌────────────────────────────────────────────────────────┐ │    │
│  │  │              Docker Network: ai-nexus-network          │ │    │
│  │  │                  (Bridge Driver)                       │ │    │
│  │  │                                                        │ │    │
│  │  │  ┌──────────────────────────────────────────────────┐  │ │    │
│  │  │  │      ai-nexus Container (Ubuntu 24.04)           │  │ │    │
│  │  │  │                                                  │  │ │    │
│  │  │  │  ┌────────────────────────────────────────────┐  │  │ │    │
│  │  │  │  │  Process 1: RAG API (PID: dynamic)         │  │  │ │    │
│  │  │  │  │  - Framework: FastAPI                      │  │  │ │    │
│  │  │  │  │  - Server: Uvicorn (ASGI)                  │  │  │ │    │
│  │  │  │  │  - Bind: 0.0.0.0:8000                      │  │  │ │    │
│  │  │  │  │  - Workers: 1                              │  │  │ │    │
│  │  │  │  │  - Binary: /opt/rag-in-a-box/bin/rag-api   │  │  │ │    │
│  │  │  │  └────────────────────────────────────────────┘  │  │ │    │
│  │  │  │                                                  │  │ │    │
│  │  │  │  ┌────────────────────────────────────────────┐  │  │ │    │
│  │  │  │  │  Process 2: MCP Server (PID: dynamic)      │  │  │ │    │
│  │  │  │  │  - Framework: FastAPI                      │  │  │ │    │
│  │  │  │  │  - Server: Uvicorn (ASGI)                  │  │  │ │    │
│  │  │  │  │  - Bind: 0.0.0.0:8002                      │  │  │ │    │
│  │  │  │  │  - Workers: 1                              │  │  │ │    │
│  │  │  │  │  - Binary: /opt/rag-in-a-box/bin/mcp-server│  │  │ │    │
│  │  │  │  └────────────────────────────────────────────┘  │  │ │    │
│  │  │  │                                                  │  │ │    │
│  │  │  │  Startup: start-services.sh                      │  │ │    │
│  │  │  │  Health Check: 180s timeout, 10s interval        │  │ │    │
│  │  │  └──────────────────┬────────────────────────────┘     │ │    │
│  │  │                     │                                  │ │    │
│  │  │                     │ MySQL Protocol (Port 3306)       │ │    │
│  │  │                     │                                  │ │    │
│  │  │  ┌──────────────────▼────────────────────────────┐     │ │    │
│  │  │  │      mysql-db Container (MariaDB 11)          │     │ │    │
│  │  │  │                                               │     │ │    │
│  │  │  │  ┌──────────────────────────────────────────┐ │     │ │    │
│  │  │  │  │  MariaDB Server                          │ │     │ │    │
│  │  │  │  │  - Version: 11.x                         │ │     │ │    │
│  │  │  │  │  - Storage Engine: InnoDB                │ │     │ │    │
│  │  │  │  │  - Character Set: utf8mb4                │ │     │ │    │
│  │  │  │  │  - Collation: utf8mb4_unicode_ci         │ │     │ │    │
│  │  │  │  │  - Page Size: 16KB                       │ │     │ │    │
│  │  │  │  │  - Row Format: Dynamic                   │ │     │ │    │
│  │  │  │  └──────────────────────────────────────────┘ │     │ │    │
│  │  │  │                                               │     │ │    │
│  │  │  │  ┌──────────────────────────────────────────┐ │     │ │    │
│  │  │  │  │  Persistent Volume: mysql_data           │ │     │ │    │
│  │  │  │  │  - Database: kb_chunks                   │ │     │ │    │
│  │  │  │  │  - Tables: documents_*, vdb_tbl_*        │ │     │ │    │
│  │  │  │  │  - Indexes: Vector indexes               │ │     │ │    │
│  │  │  │  └──────────────────────────────────────────┘ │     │ │    │
│  │  │  └────────────────────────────────────────────┘ │ │           │
│  │  └─────────────────────────────────────────────────┘ │           │
│  └───────────────────────────────────────────────────────┘          │
│                                                                     │
│  Port Mappings (Host → Container):                                  │
│  - 8000:8000  (RAG API)                                             │
│  - 8002:8002  (MCP Server)                                          │
│  - 3306:3306  (MariaDB)                                             │
└─────────────────────────────────────────────────────────────────────┘

External Services (Internet):
┌─────────────────────────────────────────────────┐
│  Google Generative AI API                       │
│  - Endpoint: generativelanguage.googleapis.com  │
│  - Embedding: text-embedding-004                │
│  - LLM: gemini-2.0-flash                        │
└─────────────────────────────────────────────────┘

Container Dependency Graph

Start Order:
1. mysql-db (MariaDB)
   ├─ Health Check: 30s start period, 10s interval
   └─ Condition: service_healthy

2. ai-nexus (Application)
   ├─ Depends on: mysql-db (healthy)
   ├─ Startup Script: start-services.sh
   │   ├─ Start RAG API (background)
   │   ├─ Wait for RAG API health (max 180s)
   │   └─ Start MCP Server (foreground)
   └─ Restart Policy: unless-stopped

Component Details

1. RAG API Component

Binary Location: /opt/rag-in-a-box/bin/rag-api

Responsibilities:

Document ingestion and processing
Text chunking and embedding generation
Vector storage and retrieval
Semantic search
RAG query processing
Authentication and authorization

Technology Stack:

Framework: FastAPI (Python)
ASGI Server: Uvicorn
Database Driver: PyMySQL / aiomysql
Embedding Client: Google Generative AI SDK
Document Processing: LangChain / Custom parsers

Endpoints:

Authentication:
POST   /token                 - Generate JWT token

Document Management:
POST   /documents/ingest      - Upload and process documents
GET    /documents             - List all documents
GET    /documents/{id}        - Get document details
DELETE /documents/{id}        - Delete document

RAG Operations:
POST   /generate              - Generate RAG response
POST   /search                - Semantic search
GET    /embeddings/{doc_id}   - Get document embeddings

Health & Status:
GET    /health                - Health check
GET    /                      - API info
GET    /docs                  - Swagger UI
GET    /openapi.json          - OpenAPI spec

Configuration Variables:

APP_HOST=0.0.0.0
APP_PORT=8000
DB_HOST=mysql-db
DB_PORT=3306
DB_USER=root
DB_PASSWORD=your_secure_database_password
DB_NAME=kb_chunks
GEMINI_API_KEY=your_gemini_api_key
SECRET_KEY=your_generated_secret_key
JWT_SECRET_KEY=<secret>
EMBEDDING_PROVIDER=gemini
embedding_model=text-embedding-004
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.0-flash
DOCUMENTS_TABLE=documents_DEMO_gemini
VDB_TABLE=vdb_tbl_DEMO_gemini
CHUNK_SIZE=512
CHUNK_OVERLAP=128

2. MCP Server Component

Binary Location: /opt/rag-in-a-box/bin/mcp-server

Responsibilities:

Model Context Protocol implementation
Database tool exposure
Vector store tool exposure
RAG tool exposure
Authentication and rate limiting

Technology Stack:

Framework: FastAPI (Python)
ASGI Server: Uvicorn
Protocol: MCP (Model Context Protocol)
Database Client: PyMySQL

Available Tools:

Core Tools:

health_check - Server health verification
get_server_status - Detailed server status

Database Tools:

list_databases - List all databases
list_tables - List tables in database
get_table_schema - Get table structure
execute_sql - Execute SQL queries
create_database - Create new database
drop_database - Delete database

Vector Store Tools:

create_vector_store - Create vector store
delete_vector_store - Delete vector store
list_vector_stores - List all vector stores
insert_docs_vector_store - Add documents
search_vector_store - Semantic search

RAG Tools:

ingest_documents - Ingest documents via RAG API
generate_response - Generate RAG responses

Configuration Variables:

MCP_HOST=0.0.0.0
MCP_PORT=8002
MCP_MARIADB_HOST=mysql-db
MCP_MARIADB_PORT=3306
MCP_AUTH_SECRET_KEY=<secret>
MCP_ENABLE_AUTH=true
MCP_ENABLE_VECTOR_TOOLS=true
MCP_ENABLE_DATABASE_TOOLS=true
MCP_ENABLE_RAG_TOOLS=true
MCP_READ_ONLY=false
MCP_STANDALONE_MODE=false
MCP_RAG_HEALTHCHECK_ENABLED=true
MCP_LOG_LEVEL=INFO

3. MariaDB Component

Image: mariadb:11

Configuration:

Environment:
  MYSQL_ROOT_PASSWORD: your_secure_database_password
  MYSQL_DATABASE: kb_chunks

Command:
  --character-set-server=utf8mb4
  --collation-server=utf8mb4_unicode_ci
  --innodb-page-size=16k
  --innodb-default-row-format=dynamic

Health Check:
  Test: healthcheck.sh --connect --innodb_initialized
  Interval: 10s
  Timeout: 5s
  Retries: 10
  Start Period: 30s

Volume:
  mysql_data:/var/lib/mysql (persistent)

Database Schema:

-- Documents Table
CREATE TABLE documents_DEMO_gemini (
    id INT AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    content LONGTEXT,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_filename (filename),
    INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Vector Database Table
CREATE TABLE vdb_tbl_DEMO_gemini (
    id INT AUTO_INCREMENT PRIMARY KEY,
    document_id INT NOT NULL,
    chunk_index INT NOT NULL,
    chunk_text LONGTEXT NOT NULL,
    embedding BLOB,  -- 768-dimensional vector (3072 bytes)
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (document_id) REFERENCES documents_DEMO_gemini(id) ON DELETE CASCADE,
    INDEX idx_document_id (document_id),
    INDEX idx_chunk_index (chunk_index)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Data Flow

Document Ingestion Flow

User Upload
    │
    ▼
┌───────────────────────────────────────┐
│  1. RAG API - File Reception          │
│  - Validate file type                 │
│  - Check file size (max 200MB)        │
│  - Generate unique ID                 │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  2. Document Processing               │
│  - Extract text from file             │
│  - Clean and normalize text           │
│  - Store in documents table           │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  3. Text Chunking                     │
│  - Method: Recursive character split  │
│  - Chunk size: 512 tokens             │
│  - Overlap: 128 tokens                │
│  - Generate chunk metadata            │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  4. Embedding Generation              │
│  - Batch size: 32 chunks              │
│  - Call Gemini API                    │
│  - Model: text-embedding-004          │
│  - Dimensions: 768                    │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  5. Vector Storage                    │
│  - Store in vdb_tbl_DEMO_gemini       │
│  - Link to document_id                │
│  - Store chunk text + embedding       │
│  - Create indexes                     │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  6. Response to User                  │
│  - Document ID                        │
│  - Number of chunks                   │
│  - Processing status                  │
└───────────────────────────────────────┘

RAG Query Flow

User Query
    │
    ▼
┌───────────────────────────────────────┐
│  1. Query Reception                   │
│  - Validate JWT token                 │
│  - Parse query text                   │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  2. Query Embedding                   │
│  - Call Gemini API                    │
│  - Generate 768-dim vector            │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  3. Similarity Search                 │
│  - Calculate cosine similarity        │
│  - Filter by threshold (0.8)          │
│  - Retrieve top-k chunks (default: 5) │
│  - Order by similarity score          │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  4. Context Preparation               │
│  - Combine retrieved chunks           │
│  - Add source metadata                │
│  - Format for LLM prompt              │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  5. LLM Generation                    │
│  - Construct prompt:                  │
│    "Context: {chunks}"                │
│    "Question: {query}"                │
│  - Call Gemini LLM                    │
│  - Model: gemini-2.0-flash            │
└───────────────┬───────────────────────┘
                │
                ▼
┌───────────────────────────────────────┐
│  6. Response Formatting               │
│  - AI-generated answer                │
│  - Source documents                   │
│  - Confidence scores                  │
│  - Metadata                           │
└───────────────┬───────────────────────┘
                │
                ▼
    Return to User

Security Architecture

Authentication Flow

┌─────────────────────────────────────────────────────────────┐
│  1. Token Generation                                        │
│                                                             │
│  POST /token                                                │
│  Body: {"username": "admin", "password": "password"}        │
│                                                             │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Server validates credentials                          │ │
│  │  Generates JWT with:                                   │ │
│  │  - Header: {"alg": "HS256", "typ": "JWT"}              │ │
│  │  - Payload: {"sub": "admin", "exp": <timestamp>}       │ │
│  │  - Signature: HMAC-SHA256(header.payload, SECRET_KEY)  │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                             │
│  Response: {"access_token": "eyJ...", "token_type": "bearer"}│
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  2. Authenticated Request                                   │
│                                                             │
│  GET /documents                                             │
│  Headers: {"Authorization": "Bearer eyJ..."}                │
│                                                             │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Server extracts token                                 │ │
│  │  Verifies signature with SECRET_KEY                    │ │
│  │  Checks expiration (30 minutes)                        │ │
│  │  Validates claims                                      │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                             │
│  If valid: Process request                                  │
│  If invalid: Return 401 Unauthorized                        │
└─────────────────────────────────────────────────────────────┘

Security Keys

Critical Requirement: All three keys must be identical for unified authentication:

SECRET_KEY=<same-value>
JWT_SECRET_KEY=<same-value>
MCP_AUTH_SECRET_KEY=<same-value>

Key Generation (for production):

import secrets
key = secrets.token_urlsafe(64)
# Use this key for all three variables

Security Features

JWT Authentication
- Algorithm: HS256
- Expiration: 30 minutes (configurable)
- Unified token for RAG API and MCP Server
Rate Limiting
- 100 requests per minute (default)
- Configurable per endpoint
CORS Configuration
- Allowed origins: Configurable
- Credentials: Supported
- Methods: GET, POST, PUT, DELETE, OPTIONS
File Upload Security
- Max file size: 200MB
- Allowed extensions: .pdf, .txt, .docx, .md, .html, .csv, .json, .xml
- Malware scanning: Optional
- Quarantine: Enabled for suspicious files
Database Security
- Parameterized queries (SQL injection prevention)
- Connection pooling
- Encrypted connections (optional)

Configuration Management

Configuration Modes

1. Standalone Mode

File: config.env.secure.local Usage: Direct environment variables Security: Secrets stored in file Best for: Development, single developer

2. Vault Mode

File: config.env.vault.local Usage: HashiCorp Vault integration Security: Secrets stored in Vault Best for: Team development, production-like

Vault Configuration:

VAULT_ADDR=http://rag-vault:8200
VAULT_TOKEN=rag-root-token
VAULT_SECRET_PATH=rag-in-a-box
VAULT_MOUNT_POINT=secret

3. 1Password Mode

File: config.env.1password.employee Usage: 1Password CLI references Security: Secrets in 1Password vault Best for: Enterprise with 1Password

1Password References:

GEMINI_API_KEY=op://Employee/RAG-API-Keys/gemini
DB_PASSWORD=op://Employee/RAG-Database/password

4. HCP Vault Mode

File: config.env.hcp.live Usage: HashiCorp Cloud Platform Security: Cloud-managed secrets Best for: Production cloud deployments

API Specifications

RAG API Endpoints

POST /token

Description: Generate JWT authentication token

Request:

{
  "username": "admin",
  "password": "your_password"
}

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}

POST /ingest

Description: Upload and process documents

Headers:

Authorization: Bearer <token>
Content-Type: multipart/form-data

Request:

file: <binary-file-data>

Response:

{
  "document_id": 123,
  "filename": "document.pdf",
  "chunks_created": 45,
  "status": "success"
}

POST /generate

Description: Generate RAG response

Headers:

Authorization: Bearer <token>
Content-Type: application/json

Request:

{
  "query": "What is the main topic?",
  "top_k": 5,
  "threshold": 0.8
}

Response:

{
  "answer": "The main topic is...",
  "sources": [
    {
      "document_id": 123,
      "chunk_index": 5,
      "similarity": 0.92,
      "text": "..."
    }
  ],
  "metadata": {
    "processing_time": 1.23,
    "model": "gemini-2.0-flash"
  }
}

Database Schema

Tables

documents_DEMO_gemini

CREATE TABLE documents_DEMO_gemini (
    id INT AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    content LONGTEXT,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_filename (filename),
    INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

vdb_tbl_DEMO_gemini

CREATE TABLE vdb_tbl_DEMO_gemini (
    id INT AUTO_INCREMENT PRIMARY KEY,
    document_id INT NOT NULL,
    chunk_index INT NOT NULL,
    chunk_text LONGTEXT NOT NULL,
    embedding BLOB,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (document_id) REFERENCES documents_DEMO_gemini(id) ON DELETE CASCADE,
    INDEX idx_document_id (document_id),
    INDEX idx_chunk_index (chunk_index)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Vector Storage Format

Embedding Dimensions: 768 (float32) Storage Size: 768 × 4 bytes = 3,072 bytes per vector Format: Binary BLOB Encoding: IEEE 754 single-precision floating-point

Performance Characteristics

Resource Requirements

Per Container:

ai-nexus:
  CPU: 1-2 cores
  RAM: 2-4 GB
  Disk: 1 GB (application)

mysql-db:
  CPU: 1-2 cores
  RAM: 2-4 GB
  Disk: Variable (depends on data)

Performance Metrics

Document Ingestion:

Processing speed: ~5 documents/batch
Chunking: ~100 chunks/second
Embedding generation: ~32 chunks/batch
Total time: ~30-60 seconds per document (depends on size)

Query Performance:

Embedding generation: ~100-200ms
Similarity search: ~50-100ms (depends on dataset size)
LLM generation: ~1-3 seconds
Total response time: ~2-4 seconds

Scalability

Current Limits:

Max file size: 200MB
Max concurrent requests: 100/minute
Database connections: 10 (pool size)

Scaling Options:

Horizontal: Deploy multiple ai-nexus containers
Vertical: Increase container resources
Database: Use read replicas for queries

End of Technical Architecture Document

PreviousDocker Deployment Guide NextDeployment Checklist

Last updated 4 months ago

Was this helpful?

Good afternoon

Technical Architecture

Table of Contents

System Architecture

High-Level Architecture

Container Dependency Graph

Component Details

1. RAG API Component

2. MCP Server Component

3. MariaDB Component

Data Flow

Document Ingestion Flow

RAG Query Flow

Security Architecture

Authentication Flow

Security Keys

Security Features

Configuration Management

Configuration Modes

1. Standalone Mode

2. Vault Mode

3. 1Password Mode

4. HCP Vault Mode

API Specifications

RAG API Endpoints

POST /token

POST /ingest

POST /generate

Database Schema

Tables

documents_DEMO_gemini

vdb_tbl_DEMO_gemini

Vector Storage Format

Performance Characteristics

Resource Requirements

Performance Metrics

Scalability

Good afternoon

hashtagTable of Contents

hashtagSystem Architecture

hashtagHigh-Level Architecture

hashtagContainer Dependency Graph

hashtagComponent Details

hashtag1. RAG API Component

hashtag2. MCP Server Component

hashtag3. MariaDB Component

hashtagData Flow

hashtagDocument Ingestion Flow

hashtagRAG Query Flow

hashtagSecurity Architecture

hashtagAuthentication Flow

hashtagSecurity Keys

hashtagSecurity Features

hashtagConfiguration Management

hashtagConfiguration Modes

hashtag1. Standalone Mode

hashtag2. Vault Mode

hashtag3. 1Password Mode

hashtag4. HCP Vault Mode

hashtagAPI Specifications

hashtagRAG API Endpoints

hashtagPOST /token

hashtagPOST /ingest

hashtagPOST /generate

hashtagDatabase Schema

hashtagTables

hashtagdocuments_DEMO_gemini

hashtagvdb_tbl_DEMO_gemini

hashtagVector Storage Format

hashtagPerformance Characteristics

hashtagResource Requirements

hashtagPerformance Metrics

hashtagScalability

Table of Contents

System Architecture

High-Level Architecture

Container Dependency Graph

Component Details

1. RAG API Component

2. MCP Server Component

3. MariaDB Component

Data Flow

Document Ingestion Flow

RAG Query Flow

Security Architecture

Authentication Flow

Security Keys

Security Features

Configuration Management

Configuration Modes

1. Standalone Mode

2. Vault Mode

3. 1Password Mode

4. HCP Vault Mode

API Specifications

RAG API Endpoints

POST /token

POST /ingest

POST /generate

Database Schema

Tables

documents_DEMO_gemini

vdb_tbl_DEMO_gemini

Vector Storage Format

Performance Characteristics

Resource Requirements

Performance Metrics

Scalability