# BYOK (Bring Your Own Knowledge) Feature Documentation

## Overview
The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to integrate their own knowledge sources into the AI system through Retrieval-Augmented Generation (RAG) functionality. This feature allows the AI to access and utilize custom knowledge bases to provide more accurate, contextual, and domain-specific responses.
## Table of Contents
- What is BYOK?
- How BYOK Works
- Prerequisites
- Configuration Guide
- Supported Vector Database Types
- Configuration Examples
- Conclusion
## What is BYOK?
BYOK (Bring Your Own Knowledge) is Lightspeed Core’s implementation of Retrieval-Augmented Generation (RAG) that allows you to:
- Integrate custom knowledge sources: Add your organization’s documentation, manuals, FAQs, or any text-based knowledge
- Enhance AI responses: Provide contextual, accurate answers based on your specific domain knowledge
- Maintain data control: Keep your knowledge sources within your infrastructure
- Improve relevance: Get responses that are tailored to your organization’s context and terminology
## How BYOK Works

The BYOK system operates through a chain of components:
- Agent Orchestrator: The AI agent acts as the central coordinator, using the LLM as its reasoning engine
- RAG Tool: When the agent needs external information, it queries your custom vector database
- Vector Database: Your indexed knowledge sources, stored as vector embeddings for semantic search
- Embedding Model: Converts queries and documents into vector representations for similarity matching
- Context Integration: Retrieved knowledge is integrated into the AI’s response generation process
```mermaid
graph TD
    A[User Query] --> B[AI Agent]
    B --> C{Need External Knowledge?}
    C -->|Yes| D[RAG Tool]
    C -->|No| E[Generate Response]
    D --> F[Vector Database]
    F --> G[Retrieve Relevant Context]
    G --> H[Integrate Context]
    H --> E
    E --> I[Response to User]
```
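The sketch below illustrates the same retrieve-then-augment pattern outside of Lightspeed Core. It is not Lightspeed Core or Llama Stack code; it uses the `sentence-transformers` and `faiss` packages directly, with made-up sample documents, purely to show the steps the RAG tool performs on your behalf.

```python
# Illustrative only: a minimal retrieve-then-augment loop using
# sentence-transformers and FAISS directly. Lightspeed Core performs the
# equivalent steps internally via Llama Stack's vector_io and RAG tool.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "To reset a user password, open the admin console and select 'Reset'.",
    "Backups run nightly at 02:00 UTC and are retained for 30 days.",
]

# 1. Embed the knowledge sources (done once, at indexing time).
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Inner product on normalized vectors is equivalent to cosine similarity.
index = faiss.IndexFlatIP(int(doc_vectors.shape[1]))
index.add(np.asarray(doc_vectors, dtype="float32"))

# 2. Embed the user query with the SAME model and retrieve the closest chunk.
query = "How often are backups taken?"
query_vector = model.encode([query], normalize_embeddings=True)
_, hit_ids = index.search(np.asarray(query_vector, dtype="float32"), 1)
context = "\n".join(documents[i] for i in hit_ids[0])

# 3. The retrieved context is injected into the prompt sent to the LLM.
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)
```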
## Prerequisites

Before implementing BYOK, ensure you have:

### Required Tools
- rag-content tool: For creating compatible vector databases
  - Repository: https://github.com/lightspeed-core/rag-content
  - Used for indexing your knowledge sources
### System Requirements
- Embedding Model: Local or downloadable embedding model
- LLM Provider: OpenAI, vLLM, or other supported inference provider
### Knowledge Sources
- Directly supported: Markdown (.md) and plain text (.txt) files
- Requires conversion: PDFs, AsciiDoc, HTML, and other formats must be converted to markdown or TXT
- Documentation, manuals, FAQs, knowledge bases (after format conversion)
## Configuration Guide

### Step 1: Prepare Your Knowledge Sources

- Collect your documents: Gather all knowledge sources you want to include
- Convert formats: Convert unsupported formats to markdown (.md) or plain text (.txt)
  - PDF conversion: Use tools like docling to convert PDFs to markdown (see the sketch below)
  - AsciiDoc conversion: Use custom scripts to convert AsciiDoc to plain text
- Organize content: Structure your converted documents for optimal indexing
- Validate formats: Ensure all documents are in supported formats (.md or .txt)
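As referenced above, here is a minimal PDF-to-markdown conversion sketch assuming the `docling` Python package; the file paths are placeholders, and the exact API may differ between docling releases (a `docling` command-line tool is also available).

```python
# Minimal sketch: convert a PDF to markdown with docling so it can be indexed.
# Assumes `pip install docling`; the input and output paths are placeholders.
from pathlib import Path

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("manuals/install-guide.pdf")  # placeholder input file

# Export the parsed document as markdown and save it with your other sources.
Path("knowledge").mkdir(exist_ok=True)
markdown_text = result.document.export_to_markdown()
Path("knowledge/install-guide.md").write_text(markdown_text, encoding="utf-8")
```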
### Step 2: Create Vector Database

Use the rag-content tool to create a compatible vector database. Refer to https://github.com/lightspeed-core/rag-content for detailed instructions.
Metadata Configuration:
When using the rag-content tool, you need to create a `custom_processor.py` script to handle document metadata:
- Document URL References: Implement the `url_function` in your `custom_processor.py` to add URL metadata to each document chunk
- Title Extraction: The system automatically extracts the document title from the first line of each file
- Custom Metadata: You can add additional metadata fields as needed for your use case
Example `custom_processor.py` structure:

```python
# MetadataProcessor is provided by the rag-content project; adjust the import
# below if its module layout differs in your checkout.
from lightspeed_rag_content.metadata_processor import MetadataProcessor


class CustomMetadataProcessor(MetadataProcessor):
    def __init__(self, url: str):
        self.url = url

    def url_function(self, file_path: str) -> str:
        # Return a URL for the file, so it can be referenced when used
        # in an answer.
        return self.url
```
Important Notes:
- The vector database must be compatible with Llama Stack
- Supported formats:
  - Llama-Stack Faiss Vector-IO
  - Llama-Stack SQLite-vec Vector-IO
- The same embedding model must be used for both creation and querying
### Step 3: Configure Embedding Model

You have two options for configuring your embedding model:

#### Option 1: Use rag-content Download Script (Optional)

You can use the embedding model download script from the rag-content repo:

```bash
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```
#### Option 2: Manual Download and Configuration

Alternatively, you can download your own embedding model and update the path in your YAML configuration:

- Download your preferred embedding model from Hugging Face or other sources
- Place the model in your desired directory (e.g., `/path/to/your/embedding_models/`)
- Update the YAML configuration to point to your model path:

```yaml
models:
  - model_id: sentence-transformers/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: /path/to/your/embedding_models/all-mpnet-base-v2
```
Note: Ensure the same embedding model is used for both vector database creation and querying.
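If you are unsure of the dimension for a locally downloaded model, the snippet below prints it so you can match it against the `embedding_dimension` value in your YAML. This is a quick check assuming the `sentence-transformers` package and the placeholder model path from the example above.

```python
# Print the embedding dimension of a local model so it can be compared with
# the embedding_dimension value in run.yaml (768 for all-mpnet-base-v2).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("/path/to/your/embedding_models/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())
```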
### Step 4: Configure Llama Stack

Edit your `run.yaml` file to include BYOK configuration:
```yaml
version: 2
image_name: byok-configuration

# Required APIs for BYOK
apis:
  - agents
  - inference
  - vector_io
  - tool_runtime
  - safety

providers:
  inference:
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence:
          agent_state:
            namespace: agents_state
            backend: kv_default
          responses:
            table_name: agents_responses
            backend: sql_default
  safety:
    - provider_id: llama-guard
      provider_type: inline::llama-guard
      config:
        excluded_categories: []
  vector_io:
    - provider_id: your-knowledge-base
      provider_type: inline::faiss
      config:
        persistence:
          namespace: vector_io::faiss
          backend: byok_backend # References storage.backends
  tool_runtime:
    - provider_id: rag-runtime
      provider_type: inline::rag-runtime
      config: {}

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/kv_store.db
    sql_default:
      type: sql_sqlite
      db_path: ~/.llama/storage/sql_store.db
    byok_backend:
      type: kv_sqlite
      db_path: /path/to/vector_db/faiss_store.db

registered_resources:
  models:
    - model_id: your-llm-model
      provider_id: openai
      model_type: llm
      provider_model_id: gpt-4o-mini
    - model_id: sentence-transformers/all-mpnet-base-v2
      model_type: embedding
      provider_id: sentence-transformers
      provider_model_id: /path/to/embedding_models/all-mpnet-base-v2
      metadata:
        embedding_dimension: 768
  vector_stores:
    - vector_store_id: your-index-id # ID used during index generation
      provider_id: your-knowledge-base
      embedding_model: sentence-transformers/all-mpnet-base-v2
      embedding_dimension: 768
  tool_groups:
    - toolgroup_id: builtin::rag
      provider_id: rag-runtime
```
⚠️ Important: The vector_store_id value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.
### Step 5: Enable RAG Tools
The configuration above automatically enables the RAG tools. The system will:
- Detect RAG availability: Automatically identify when RAG is available
- Enhance prompts: Encourage the AI to use RAG tools
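Once the service is running, a query that touches your knowledge base can be issued against Lightspeed Core's REST API. The endpoint path, port, and payload below are illustrative assumptions; check the API documentation of your lightspeed-stack deployment for the exact contract.

```python
# Illustrative request against a locally running Lightspeed Core instance.
# The /v1/query path, port 8080, and payload shape are assumptions; adjust
# them to match your deployment's API documentation.
import requests

response = requests.post(
    "http://localhost:8080/v1/query",
    json={"query": "How do I rotate the service credentials?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```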
## Supported Vector Database Types

### 1. FAISS (Recommended)

- Type: Local vector database with SQLite metadata
- Best for: Small to medium-sized knowledge bases
- Configuration: `inline::faiss`
- Storage: SQLite database file
```yaml
providers:
  vector_io:
    - provider_id: faiss-knowledge
      provider_type: inline::faiss
      config:
        persistence:
          namespace: vector_io::faiss
          backend: faiss_backend

storage:
  backends:
    faiss_backend:
      type: kv_sqlite
      db_path: /path/to/faiss_store.db
```
### 2. pgvector (PostgreSQL)

- Type: PostgreSQL with pgvector extension
- Best for: Large-scale deployments, shared knowledge bases
- Configuration: `remote::pgvector`
- Requirements: PostgreSQL with pgvector extension
```yaml
vector_io:
  - provider_id: pgvector-knowledge
    provider_type: remote::pgvector
    config:
      host: localhost
      port: 5432
      db: knowledge_db
      user: lightspeed_user
      password: ${env.DB_PASSWORD}
      kvstore:
        type: sqlite
        db_path: .llama/distributions/pgvector/registry.db
```
pgvector Table Schema:
- `id` (text): UUID identifier of the chunk
- `document` (jsonb): JSON containing content and metadata
- `embedding` (vector(n)): The embedding vector (n = embedding dimension)
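Before pointing the configuration at a database, it can help to confirm that the pgvector extension is actually installed. A minimal sketch, assuming the `psycopg2` package and the connection values from the example above:

```python
# Quick sanity check: connect with the values from the pgvector config above
# and verify that the `vector` extension is installed.
import os

import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="knowledge_db",
    user="lightspeed_user",
    password=os.environ["DB_PASSWORD"],
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
    row = cur.fetchone()
    print("pgvector version:", row[0] if row else "NOT INSTALLED")
conn.close()
```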
## Configuration Examples

### Example 1: OpenAI + FAISS
Complete configuration for OpenAI LLM with local FAISS knowledge base:
```yaml
version: 2
image_name: openai-faiss-byok

apis:
  - agents
  - inference
  - vector_io
  - tool_runtime
  - safety

providers:
  inference:
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence:
          agent_state:
            namespace: agents_state
            backend: kv_default
          responses:
            table_name: agents_responses
            backend: sql_default
  safety:
    - provider_id: llama-guard
      provider_type: inline::llama-guard
      config:
        excluded_categories: []
  vector_io:
    - provider_id: company-docs
      provider_type: inline::faiss
      config:
        persistence:
          namespace: vector_io::faiss
          backend: company_docs_backend
  tool_runtime:
    - provider_id: rag-runtime
      provider_type: inline::rag-runtime
      config: {}

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/storage/kv_store.db
    sql_default:
      type: sql_sqlite
      db_path: ~/.llama/storage/sql_store.db
    company_docs_backend:
      type: kv_sqlite
      db_path: /home/user/vector_dbs/company_docs/faiss_store.db

registered_resources:
  models:
    - model_id: gpt-4o-mini
      provider_id: openai
      model_type: llm
      provider_model_id: gpt-4o-mini
    - model_id: sentence-transformers/all-mpnet-base-v2
      model_type: embedding
      provider_id: sentence-transformers
      provider_model_id: /home/user/embedding_models/all-mpnet-base-v2
      metadata:
        embedding_dimension: 768
  vector_stores:
    - vector_store_id: company-knowledge-index
      provider_id: company-docs
      embedding_model: sentence-transformers/all-mpnet-base-v2
      embedding_dimension: 768
  tool_groups:
    - toolgroup_id: builtin::rag
      provider_id: rag-runtime
```
### Example 2: vLLM + pgvector
Configuration for local vLLM inference with PostgreSQL knowledge base:
```yaml
version: 2
image_name: vllm-pgvector-byok

apis:
  - agents
  - inference
  - vector_io
  - tool_runtime
  - safety

models:
  - model_id: meta-llama/Llama-3.1-8B-Instruct
    provider_id: vllm
    model_type: llm
    provider_model_id: null
  - model_id: sentence-transformers/all-mpnet-base-v2
    metadata:
      embedding_dimension: 768
    model_type: embedding
    provider_id: sentence-transformers
    provider_model_id: sentence-transformers/all-mpnet-base-v2

providers:
  inference:
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        url: http://localhost:8000/v1/
        api_token: your-token-here
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence:
          agent_state:
            namespace: agents_state
            backend: kv_default
          responses:
            table_name: agents_responses
            backend: sql_default
  safety:
    - provider_id: llama-guard
      provider_type: inline::llama-guard
      config:
        excluded_categories: []
  vector_io:
    - provider_id: enterprise-knowledge
      provider_type: remote::pgvector
      config:
        host: postgres.company.com
        port: 5432
        db: enterprise_kb
        user: rag_user
        password: ${env.POSTGRES_PASSWORD}
        kvstore:
          type: sqlite
          db_path: .llama/distributions/pgvector/registry.db
  tool_runtime:
    - provider_id: rag-runtime
      provider_type: inline::rag-runtime
      config: {}

tool_groups:
  - provider_id: rag-runtime
    toolgroup_id: builtin::rag
    args: null
    mcp_endpoint: null

vector_stores:
  - embedding_dimension: 768
    embedding_model: sentence-transformers/all-mpnet-base-v2
    provider_id: enterprise-knowledge
    vector_store_id: enterprise-docs
```
## Conclusion

The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core provides powerful capabilities for integrating custom knowledge sources through RAG technology. By following this guide, you can implement and configure BYOK to enhance your AI system with domain-specific knowledge.

For additional support and advanced configurations, refer to the rag-content repository (https://github.com/lightspeed-core/rag-content) and the Llama Stack documentation.

Remember to regularly update your knowledge sources and monitor system performance to maintain optimal BYOK functionality.