BYOK (Bring Your Own Knowledge) Feature Documentation

Overview

The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to integrate their own knowledge sources into the AI system through Retrieval-Augmented Generation (RAG) functionality. This feature allows the AI to access and utilize custom knowledge bases to provide more accurate, contextual, and domain-specific responses.


Table of Contents

  • What is BYOK?
  • How BYOK Works
  • Prerequisites
  • Configuration Guide
  • Supported Vector Database Types
  • Configuration Examples
  • Conclusion

What is BYOK?

BYOK (Bring Your Own Knowledge) is Lightspeed Core’s implementation of Retrieval-Augmented Generation (RAG) that allows you to:

  • Index your own documents into a vector database
  • Have the AI retrieve relevant chunks from that knowledge base at query time
  • Get more accurate, contextual, and domain-specific answers grounded in your own content

How BYOK Works

The BYOK system operates through a chain of cooperating components:

  1. Agent Orchestrator: The AI agent acts as the central coordinator, using the LLM as its reasoning engine
  2. RAG Tool: When the agent needs external information, it queries your custom vector database
  3. Vector Database: Your indexed knowledge sources, stored as vector embeddings for semantic search
  4. Embedding Model: Converts queries and documents into vector representations for similarity matching
  5. Context Integration: Retrieved knowledge is integrated into the AI’s response generation process

This flow is summarized in the diagram below:

graph TD
    A[User Query] --> B[AI Agent]
    B --> C{Need External Knowledge?}
    C -->|Yes| D[RAG Tool]
    C -->|No| E[Generate Response]
    D --> F[Vector Database]
    F --> G[Retrieve Relevant Context]
    G --> H[Integrate Context]
    H --> E
    E --> I[Response to User]
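
The same flow can be sketched in a few lines of Python. The helpers below are illustrative stand-ins for what Llama Stack does on your behalf (vector search and LLM generation); they are not part of the Lightspeed Core API:

def retrieve_context(query: str, top_k: int = 5) -> str:
    # Stand-in for the RAG tool: embed the query, run a similarity search
    # against the vector database, and return the best-matching chunks.
    return "...relevant chunks from your knowledge base..."

def generate(prompt: str) -> str:
    # Stand-in for the configured LLM call.
    return f"Answer based on: {prompt[:60]}..."

def answer(query: str, needs_external_knowledge: bool) -> str:
    context = retrieve_context(query) if needs_external_knowledge else ""
    # Context integration: retrieved chunks are added to the prompt before
    # the LLM generates the final response.
    prompt = f"{context}\n\nQuestion: {query}" if context else query
    return generate(prompt)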

Prerequisites

Before implementing BYOK, ensure you have:

Required Tools

  • The rag-content tool (https://github.com/lightspeed-core/rag-content) for building the vector database
  • A Llama Stack deployment configured via run.yaml
  • An embedding model (for example, sentence-transformers/all-mpnet-base-v2)

System Requirements

  • A Python environment for the rag-content tooling, plus enough disk space for the embedding model and the generated vector database

Knowledge Sources

  • Documents in markdown (.md) or plain text (.txt), or in formats you can convert to those (for example, PDF or AsciiDoc)


Configuration Guide

Step 1: Prepare Your Knowledge Sources

  1. Collect your documents: Gather all knowledge sources you want to include
  2. Convert formats: Convert non-supported formats to markdown (.md) or plain text (.txt)
    • PDF conversion: Use tools like docling to convert PDFs to markdown (see the conversion sketch after this list)
    • Adoc conversion: Use custom scripts to convert AsciiDoc to plain text
  3. Organize content: Structure your converted documents for optimal indexing
  4. Format validation: Ensure all documents are in supported formats (.md or .txt)
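
As an example, a PDF-to-markdown conversion with docling might look like the sketch below (the input and output paths are placeholders; adjust them to your own document layout):

from pathlib import Path

from docling.document_converter import DocumentConverter

# Convert one PDF to markdown so it can be indexed by the rag-content tool.
converter = DocumentConverter()
result = converter.convert("docs/source/product-guide.pdf")

output = Path("docs/converted/product-guide.md")
output.parent.mkdir(parents=True, exist_ok=True)
output.write_text(result.document.export_to_markdown())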

Step 2: Create Vector Database

Use the rag-content tool to create a compatible vector database. Please refer to https://github.com/lightspeed-core/rag-content for instructions on creating your vector database.

Metadata Configuration: When using the rag-content tool, you need to create a custom_processor.py script to handle document metadata:

  1. Document URL References: Implement the url_function in your custom_processor.py to add URL metadata to each document chunk
  2. Title Extraction: The system automatically extracts the document title from the first line of each file
  3. Custom Metadata: You can add additional metadata fields as needed for your use case

Example custom_processor.py structure:

# Assumes the MetadataProcessor base class shipped with the rag-content
# package; check your rag-content version for the exact import path.
from lightspeed_rag_content.metadata_processor import MetadataProcessor


class CustomMetadataProcessor(MetadataProcessor):
    """Attach source-URL metadata to every document chunk."""

    def __init__(self, url: str):
        self.url = url

    def url_function(self, file_path: str) -> str:
        # Return a URL for the file, so it can be referenced when used
        # in an answer
        return self.url
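
In this example every chunk produced from a given file carries the same URL; if your files map to different documentation pages, derive the return value from file_path instead so answers can cite the exact source.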

Important Notes:

Step 3: Configure Embedding Model

You have two options for configuring your embedding model:

Option 1: Use the rag-content Download Script

You can use the embedding model download step described in the rag-content repo:

mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2 

Option 2: Manual Download and Configuration

Alternatively, you can download your own embedding model and update the path in your YAML configuration:

  1. Download your preferred embedding model from Hugging Face or other sources
  2. Place the model in your desired directory (e.g., /path/to/your/embedding_models/)
  3. Update the YAML configuration to point to your model path:
models:
  - model_id: sentence-transformers/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: /path/to/your/embedding_models/all-mpnet-base-v2

Note: Ensure the same embedding model is used for both vector database creation and querying.
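
As a quick sanity check before editing the configuration, you can load the downloaded model with the sentence-transformers package and confirm its embedding dimension (the path below is a placeholder):

from sentence_transformers import SentenceTransformer

# The dimension printed here must match embedding_dimension in run.yaml
# (768 for all-mpnet-base-v2).
model = SentenceTransformer("/path/to/your/embedding_models/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())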

Step 4: Configure Llama Stack

Edit your run.yaml file to include BYOK configuration:

version: 2
image_name: byok-configuration

# Required APIs for BYOK
apis:
- agents
- inference
- vector_io
- tool_runtime
- safety

providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents_state
          backend: kv_default
        responses:
          table_name: agents_responses
          backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: your-knowledge-base
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: byok_backend  # References storage.backends

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/kv_store.db
    sql_default:
      type: sql_sqlite
      db_path: ~/.llama/storage/sql_store.db
    byok_backend:
      type: kv_sqlite
      db_path: /path/to/vector_db/faiss_store.db

registered_resources:
  models:
  - model_id: your-llm-model
    provider_id: openai
    model_type: llm
    provider_model_id: gpt-4o-mini
  - model_id: sentence-transformers/all-mpnet-base-v2
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: /path/to/embedding_models/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768
  vector_stores:
  - vector_store_id: your-index-id  # ID used during index generation
    provider_id: your-knowledge-base
    embedding_model: sentence-transformers/all-mpnet-base-v2
    embedding_dimension: 768
  tool_groups:
  - toolgroup_id: builtin::rag
    provider_id: rag-runtime

⚠️ Important: The vector_store_id value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.
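
After starting the stack with this configuration, you can verify that your models and the RAG tool group were registered. A minimal sketch using the llama-stack-client Python package follows; the base URL and port are assumptions for a local deployment:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The registered LLM and embedding model should both appear here.
for model in client.models.list():
    print(model.identifier, model.model_type)

# builtin::rag should appear among the registered tool groups.
for toolgroup in client.toolgroups.list():
    print(toolgroup.identifier)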

Step 5: Enable RAG Tools

The configuration above automatically enables the RAG tools. The system will:

  1. Detect RAG availability: Automatically identify when a configured knowledge base is available
  2. Enhance prompts: Encourage the model to call the RAG tool when a query needs external knowledge
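
To confirm end to end that retrieval is happening, you can send a test question to the running service. The sketch below assumes a local deployment exposing a query endpoint such as /v1/query; check your service's API documentation for the exact path and payload:

import requests

# Endpoint, port, and payload shape are assumptions for a local deployment.
response = requests.post(
    "http://localhost:8080/v1/query",
    json={"query": "How do I configure the product's backup feature?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())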

Supported Vector Database Types

1. FAISS (Local)

providers:
  vector_io:
  - provider_id: faiss-knowledge
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: faiss_backend

storage:
  backends:
    faiss_backend:
      type: kv_sqlite
      db_path: /path/to/faiss_store.db

2. pgvector (PostgreSQL)

vector_io:
- provider_id: pgvector-knowledge
  provider_type: remote::pgvector
  config:
    host: localhost
    port: 5432
    db: knowledge_db
    user: lightspeed_user
    password: ${env.DB_PASSWORD}
    kvstore:
      type: sqlite
      db_path: .llama/distributions/pgvector/registry.db
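
Before pointing Llama Stack at PostgreSQL, it can help to confirm that the database is reachable and the pgvector extension is installed. A small sketch using psycopg2, reusing the connection details from the example above:

import os

import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="knowledge_db",
    user="lightspeed_user",
    password=os.environ["DB_PASSWORD"],
)
with conn, conn.cursor() as cur:
    # The 'vector' extension must be installed for remote::pgvector to work.
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
    row = cur.fetchone()
    print("pgvector version:", row[0] if row else "not installed")
conn.close()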

pgvector Table Schema:


Configuration Examples

Example 1: OpenAI + FAISS

Complete configuration for OpenAI LLM with local FAISS knowledge base:

version: 2
image_name: openai-faiss-byok

apis:
- agents
- inference
- vector_io
- tool_runtime
- safety

providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents_state
          backend: kv_default
        responses:
          table_name: agents_responses
          backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: company-docs
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: company_docs_backend

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/kv_store.db
    sql_default:
      type: sql_sqlite
      db_path: ~/.llama/storage/sql_store.db
    company_docs_backend:
      type: kv_sqlite
      db_path: /home/user/vector_dbs/company_docs/faiss_store.db

registered_resources:
  models:
  - model_id: gpt-4o-mini
    provider_id: openai
    model_type: llm
    provider_model_id: gpt-4o-mini
  - model_id: sentence-transformers/all-mpnet-base-v2
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: /home/user/embedding_models/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768
  vector_stores:
  - vector_store_id: company-knowledge-index
    provider_id: company-docs
    embedding_model: sentence-transformers/all-mpnet-base-v2
    embedding_dimension: 768
  tool_groups:
  - toolgroup_id: builtin::rag
    provider_id: rag-runtime

Example 2: vLLM + pgvector

Configuration for local vLLM inference with PostgreSQL knowledge base:

version: 2
image_name: vllm-pgvector-byok

apis:
- agents
- inference
- vector_io
- tool_runtime
- safety

models:
- model_id: meta-llama/Llama-3.1-8B-Instruct
  provider_id: vllm
  model_type: llm
  provider_model_id: null

- model_id: sentence-transformers/all-mpnet-base-v2
  metadata:
    embedding_dimension: 768
  model_type: embedding
  provider_id: sentence-transformers
  provider_model_id: sentence-transformers/all-mpnet-base-v2

providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1/
      api_token: your-token-here

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents_state
          backend: kv_default
        responses:
          table_name: agents_responses
          backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: enterprise-knowledge
    provider_type: remote::pgvector
    config:
      host: postgres.company.com
      port: 5432
      db: enterprise_kb
      user: rag_user
      password: ${env.POSTGRES_PASSWORD}
      kvstore:
        type: sqlite
        db_path: .llama/distributions/pgvector/registry.db

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/kv_store.db
    sql_default:
      type: sql_sqlite
      db_path: ~/.llama/storage/sql_store.db

tool_groups:
- provider_id: rag-runtime
  toolgroup_id: builtin::rag
  args: null
  mcp_endpoint: null

vector_stores:
- embedding_dimension: 768
  embedding_model: sentence-transformers/all-mpnet-base-v2
  provider_id: enterprise-knowledge
  vector_store_id: enterprise-docs

Conclusion

The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core provides powerful capabilities for integrating custom knowledge sources through RAG technology. By following this guide, you can successfully implement and configure BYOK to enhance your AI system with domain-specific knowledge.

For additional support and advanced configurations, refer to:

  • The rag-content repository: https://github.com/lightspeed-core/rag-content
  • The Llama Stack documentation for provider-specific configuration options

Remember to regularly update your knowledge sources and monitor system performance to maintain optimal BYOK functionality.