
Document Management and Chunking

Document management and chunking endpoints in MariaDB AI RAG handle file upload, listing, retrieval, and deletion, and provide recursive, sentence, token, and semantic chunking options.

Document Management Endpoints

Upload Documents

POST /documents/ingest

Purpose: Uploads and processes one or more documents for ingestion into the system. Documents are processed asynchronously in the background.

Request: multipart/form-data with one or more file attachments

Request Parameters:

  • files (required): One or more files to upload.

  • document_processing (optional): Stringified JSON object for advanced processing.

    • processor_type: "base", "layout_aware_standard", or "layout_aware_advanced".
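As a rough sketch of how these parameters map onto the multipart body, the following Python builds the same kind of request with only the standard library. It reuses the `http://localhost:8000` base URL and bearer-token header from the curl example below. The helper name `build_ingest_request` is hypothetical, and only `processor_type` goes into `document_processing` because that is the only option documented here.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # base URL from the curl example
PROCESSOR_TYPES = {"base", "layout_aware_standard", "layout_aware_advanced"}


def build_ingest_request(files, token, processor_type=None):
    """Build (but do not send) a multipart POST to /documents/ingest.

    files is a list of (filename, bytes) pairs; each becomes one "files" part.
    build_ingest_request is a hypothetical helper, not part of the product.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")
    if processor_type is not None:
        if processor_type not in PROCESSOR_TYPES:
            raise ValueError(f"unknown processor_type: {processor_type}")
        # document_processing is sent as one stringified JSON form field.
        options = json.dumps({"processor_type": processor_type})
        field = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="document_processing"\r\n\r\n'
            f"{options}\r\n"
        )
        parts.append(field.encode())
    # The closing boundary marks the end of the multipart body.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/documents/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read each file's bytes (for example with `pathlib.Path(...).read_bytes()`) and pass the request to `urllib.request.urlopen`. Encoding the body by hand keeps the sketch dependency-free; the third-party `requests` library can do the same through its `files=` argument.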

Example Request:

curl -X POST "http://localhost:8000/documents/ingest" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "files=@/path/to/document.pdf" \
  -F 'document_processing={"processor_type": "layout_aware_standard", "enable_ocr": 

Response:

{
  "message": "2 documents have been queued for ingestion.",
  "documents": [
    {
      "id": 42,
      "source": "/uploaded_files/example1.pdf",
      "filename": "example1.pdf",
      "status": "pending",
      "content": null,
      "error_message": null,
      "created_at": "2025-10-20T12:00:00.123456",
      "updated_at": null
    },
    {
      "id": 43,
      "source": "/uploaded_files/example2.docx",
      "filename": "example2.docx",
      "status": "pending",
      "content": null,
      "error_message": null,
      "created_at": "2025-10-20T12:00:00.234567",
      "updated_at": null
    }
  ]
}
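The `id` values in this response are what later calls need. A small Python sketch that collects them from the JSON above, keyed by filename. The helper name `queued_documents` is hypothetical, and the sample is trimmed to the fields the helper reads.

```python
import json


def queued_documents(response_text):
    """Map filename -> id for every document the ingest call queued."""
    payload = json.loads(response_text)
    queued = {}
    for doc in payload["documents"]:
        # A freshly queued document starts in the "pending" state.
        if doc["status"] == "pending":
            queued[doc["filename"]] = doc["id"]
    return queued


sample = """{"message": "2 documents have been queued for ingestion.",
 "documents": [
  {"id": 42, "filename": "example1.pdf", "status": "pending"},
  {"id": 43, "filename": "example2.docx", "status": "pending"}]}"""
print(queued_documents(sample))  # {'example1.pdf': 42, 'example2.docx': 43}
```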

Status Values:

  • pending: Document is queued for processing

  • completed: Document has been successfully processed

  • failed: Document processing failed (check error_message)
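A minimal Python sketch of acting on these values. The function name `is_finished` is hypothetical, and any status other than the three listed is treated as still in progress rather than assumed to be an error.

```python
def is_finished(document):
    """Return True once processing has finished; raise if it failed.

    is_finished is a hypothetical helper for a document dict as returned by the API.
    """
    status = document["status"]
    if status == "failed":
        # error_message carries the reason processing failed.
        raise RuntimeError(
            f"document {document['id']} failed: {document.get('error_message')}"
        )
    # "pending" (or any undocumented value) counts as still in progress.
    return status == "completed"
```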

Usage Example: Upload one or more documents for ingestion.

The endpoint accepts both single and multiple files. Documents are processed asynchronously, so the initial status is pending. Use the returned document id with the Retrieve Document endpoint to check processing status later.
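Because ingestion is asynchronous, a client typically polls until the status leaves pending. The sketch below assumes a caller-supplied `fetch_document(doc_id)` function (hypothetical) that returns the document as a dict, for example by calling the Retrieve Document endpoint. The timeout and interval defaults are arbitrary choices, not service limits.

```python
import time


def wait_for_document(doc_id, fetch_document, timeout=300.0, interval=2.0):
    """Poll until the document leaves "pending", then return it.

    fetch_document is a hypothetical caller-supplied function returning a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        document = fetch_document(doc_id)
        if document["status"] != "pending":
            # "completed" or "failed"; inspect error_message on failure.
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {doc_id} still pending after {timeout}s")
        time.sleep(interval)
```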

List Documents

Purpose: Retrieves a paginated list of all documents uploaded by the authenticated user.

Parameters:

  • skip (optional): Number of records to skip for pagination (default: 0)

  • limit (optional): Maximum number of records to return (default: 100)

Response:

Usage Example: Use this endpoint to monitor your uploaded documents, check their processing status, or select documents for further operations.
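The `skip` and `limit` parameters support a standard offset-pagination loop. This Python sketch assumes a caller-supplied `fetch_page(skip, limit)` function (hypothetical) that calls List Documents and returns the decoded response; treating that response as a plain JSON array of documents is an assumption.

```python
def iter_documents(fetch_page, limit=100):
    """Yield every document by walking skip/limit pages.

    ASSUMPTION: fetch_page(skip, limit) is hypothetical and returns a list.
    """
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means there are no more records
        skip += limit
```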

Retrieve Document

Purpose: Retrieves detailed information about a specific document.

Response:

Usage Example: Use this endpoint to check the status of a specific document or retrieve its metadata.
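A minimal Python sketch of fetching one document. This page does not state the route, so the `/documents/{id}` path is an assumption, as are the helper names; the base URL and bearer-token header come from the ingest example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from the ingest curl example


def build_get_document_request(doc_id, token):
    # ASSUMPTION: the /documents/{id} route is a guess, not documented here.
    return urllib.request.Request(
        f"{BASE_URL}/documents/{int(doc_id)}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def get_document(doc_id, token):
    """Send the request and decode the document JSON (hypothetical helper)."""
    with urllib.request.urlopen(build_get_document_request(doc_id, token)) as resp:
        return json.load(resp)
```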

Delete Documents

Purpose: Deletes multiple documents and their associated chunks and vector embeddings.

Request body:

Response:

Usage Example: Use this endpoint to remove documents that are no longer needed. Deleting a document also removes its chunks and vector embeddings, which frees storage space and keeps its content out of search results.
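A hedged Python sketch of building the delete request. The route, HTTP method, and JSON field name are not given on this page, so `path`, `method="DELETE"`, and the `document_ids` field are hypothetical placeholders to replace with the real values. The sketch only de-duplicates the IDs and serializes them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from the ingest curl example


def build_delete_request(doc_ids, token, path, method="DELETE", field="document_ids"):
    # ASSUMPTION: path, method and field are placeholders, not documented values.
    ids = sorted({int(i) for i in doc_ids})  # drop duplicates, stable order
    if not ids:
        raise ValueError("at least one document id is required")
    body = json.dumps({field: ids}).encode()
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=body,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```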

Chunking Endpoints

Chunk Documents (Batch)

Purpose: Processes multiple documents into chunks and creates vector embeddings for semantic search. Documents are processed asynchronously in the background.

Request body:

Parameters:

  • chunking_methods:

    • recursive: Recursive text splitting (default)

    • sentence: Sentence-based chunking

    • token: Token-based chunking

    • semantic: Semantic similarity-based chunking (requires threshold)

  • chunk_size: Number of characters/tokens per chunk (default: 512).

  • chunk_overlap: Number of characters/tokens shared by adjacent chunks (default: 128).

  • threshold: Similarity threshold for merging segments (used only in semantic chunking).
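To illustrate how `chunk_size` and `chunk_overlap` interact, here is a plain character-based sliding window in Python. It is a generic illustration, not the service's recursive, sentence, token, or semantic splitter (those also respect text boundaries); it shows only that each chunk starts `chunk_size - chunk_overlap` positions after the previous one, so the defaults advance 384 positions per chunk.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=128):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 1,000-character text yields three windows starting at positions 0, 384, and 768.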

Response:

Usage Example: Use this endpoint after documents have been ingested to prepare them for semantic search. The chunking process splits each document into smaller segments using the selected method and creates a vector embedding for each chunk.
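A Python sketch of assembling a batch chunking request body from documents that finished ingestion. The parameter names `chunking_methods`, `chunk_size`, `chunk_overlap`, and `threshold` come from the list above; the `document_ids` field name, passing `chunking_methods` as a single string, and the overlap-smaller-than-size check are assumptions.

```python
METHODS = {"recursive", "sentence", "token", "semantic"}


def build_chunk_payload(documents, method="recursive", chunk_size=512,
                        chunk_overlap=128, threshold=None):
    """Build a batch chunking body for documents whose ingestion completed."""
    if method not in METHODS:
        raise ValueError(f"unknown chunking method: {method}")
    # ASSUMPTION: a client-side sanity check, not a documented server rule.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if method == "semantic" and threshold is None:
        raise ValueError("semantic chunking requires a threshold")
    payload = {
        # ASSUMPTION: hypothetical field name for the target documents.
        "document_ids": [d["id"] for d in documents if d["status"] == "completed"],
        # ASSUMPTION: sent as one method name; the API may expect a list.
        "chunking_methods": method,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    if threshold is not None:
        payload["threshold"] = threshold
    return payload
```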

For semantic chunking, the threshold parameter sets how similar adjacent segments must be before they are merged into the same chunk.
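The following Python sketch illustrates the general idea of threshold-based merging: adjacent segments are joined while their embeddings stay at least `threshold` similar. It is not the service's algorithm; cosine similarity, comparing each segment only with its predecessor, and the helper names are assumptions. A higher threshold merges less often and so produces more, smaller chunks.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_by_similarity(segments, embeddings, threshold):
    """Greedily merge each segment into the previous chunk if similar enough."""
    if not segments:
        return []
    chunks = [segments[0]]
    for prev_vec, vec, seg in zip(embeddings, embeddings[1:], segments[1:]):
        if cosine(prev_vec, vec) >= threshold:
            chunks[-1] += " " + seg  # similar to its neighbour: extend the chunk
        else:
            chunks.append(seg)  # similarity dropped: start a new chunk
    return chunks
```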

Chunk All Documents

Purpose: Processes all documents in the system into chunks. Useful for batch processing or reprocessing all documents with new chunking parameters.

Request body:

Response:

Usage Example: Use this endpoint to reprocess all documents with new chunking settings.

Filter/Retrieve Chunks

Purpose: Retrieves chunks for specific documents. Use this to check whether chunking has completed or to retrieve chunk data.

Request body:

Response: Array of chunk objects

Usage Example: Check whether documents have been chunked and retrieve their chunks.
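A small Python sketch for the completion check: given the document IDs you asked about and the returned chunk array, list the documents with no chunks yet. The `document_id` field on each chunk object is an assumption, since the chunk schema is not shown here, and the helper name is hypothetical.

```python
def documents_missing_chunks(requested_ids, chunks):
    """Return requested ids that have no chunks yet (still running or failed)."""
    # ASSUMPTION: each chunk object carries a "document_id" field.
    chunked = {chunk["document_id"] for chunk in chunks}
    return [doc_id for doc_id in requested_ids if doc_id not in chunked]
```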

Copyright © 2025 MariaDB. All rights reserved.
