BYOK (Bring Your Own Knowledge) Feature Documentation
Overview
The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to integrate their own knowledge sources into the AI system through Retrieval-Augmented Generation (RAG) functionality. This feature allows the AI to access and utilize custom knowledge bases to provide more accurate, contextual, and domain-specific responses.
Table of Contents
- What is BYOK?
- How BYOK Works
- Prerequisites
- Configuration Guide
- Supported Vector Database Types
- Configuration Examples
- Conclusion
What is BYOK?
BYOK (Bring Your Own Knowledge) is Lightspeed Core’s implementation of Retrieval-Augmented Generation (RAG) that allows you to:
- Integrate custom knowledge sources: Add your organization’s documentation, manuals, FAQs, or any text-based knowledge
- Enhance AI responses: Provide contextual, accurate answers based on your specific domain knowledge
- Maintain data control: Keep your knowledge sources within your infrastructure
- Improve relevance: Get responses that are tailored to your organization’s context and terminology
How BYOK Works
BYOK knowledge sources can be queried in two complementary modes, configured independently:
Inline RAG
Context is fetched from your BYOK vector stores and/or OKP and injected before the LLM request. No tool calls are required.
graph TD
A[User Query] --> B[Fetch Context]
B --> C[BYOK Vector Stores]
B --> D[OKP Vector Stores]
C --> E[Retrieved Chunks]
D --> E
E --> F[Inject Context into Prompt Context]
F --> G[LLM Generates Response]
G --> H[Response to User]
Tool RAG (on-demand retrieval)
The LLM can call the file_search tool during generation when it decides external knowledge is needed. Both BYOK vector stores and OKP are supported in Tool RAG mode.
graph TD
A[User Query] --> P{Inline RAG enabled?}
P -->|Yes| Q[Fetch Context]
Q --> R[BYOK / OKP Vector Stores]
R --> S[Inject Context into Prompt Context]
S --> B[LLM]
P -->|No| B
B --> C{Need External Knowledge?}
C -->|Yes| D[file_search Tool]
C -->|No| E[Generate Response]
D --> F[BYOK / OKP Vector Stores]
F --> G[Retrieve Relevant Context]
G --> B
E --> H[Response to User]
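For illustration, the sketch below shows how a client could expose a BYOK vector store to the model through the `file_search` tool on an OpenAI-compatible Responses endpoint. The base URL, API key, model name, and vector store ID are placeholders, not values defined by this feature; substitute the ones from your own deployment.

```python
# Sketch: letting the model call file_search against a BYOK vector store.
# The endpoint URL, API key, model, and vector store ID are assumptions;
# replace them with the values from your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed-locally")

response = client.responses.create(
    model="your-llm-model",
    input="How do I configure feature X in our product?",
    tools=[{"type": "file_search", "vector_store_ids": ["your-index-id"]}],
)
print(response.output_text)
```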
Both modes rely on:
- Vector Database: Your indexed knowledge sources stored as vector embeddings
- Embedding Model: Converts queries and documents into vector representations for similarity matching
Inline RAG additionally supports:
- Score Multiplier: Optional weight applied per BYOK vector store when mixing multiple sources. Allows custom prioritization of content.
[!NOTE] OKP and BYOK scores are not directly comparable (different scoring systems), so `score_multiplier` does not apply to OKP results. To control the amount of retrieved context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py` (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10).
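To make the weighting concrete, here is a minimal, self-contained sketch of how chunks retrieved from multiple BYOK stores might be merged using `score_multiplier` and capped by the chunk limit. The helper names and data shapes are illustrative only, not the actual Lightspeed Core implementation.

```python
# Illustrative sketch only -- not the actual Lightspeed Core code.
# Shows how per-store score multipliers and a chunk cap could combine results.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float          # similarity score from the vector store
    source: str           # rag_id of the BYOK store the chunk came from

BYOK_RAG_MAX_CHUNKS = 10   # mirrors the default in src/constants.py

def merge_byok_results(results_per_store: dict[str, list[Chunk]],
                       multipliers: dict[str, float]) -> list[Chunk]:
    """Weight each store's scores by its score_multiplier, then keep the top chunks."""
    weighted: list[Chunk] = []
    for rag_id, chunks in results_per_store.items():
        factor = multipliers.get(rag_id, 1.0)
        for chunk in chunks:
            weighted.append(Chunk(chunk.text, chunk.score * factor, rag_id))
    weighted.sort(key=lambda c: c.score, reverse=True)
    return weighted[:BYOK_RAG_MAX_CHUNKS]

# Example: boost "product-docs" over "legacy-wiki"
merged = merge_byok_results(
    {
        "product-docs": [Chunk("How to configure X", 0.82, "product-docs")],
        "legacy-wiki": [Chunk("Old notes on X", 0.85, "legacy-wiki")],
    },
    {"product-docs": 1.5, "legacy-wiki": 0.8},
)
print([c.source for c in merged])  # the product-docs chunk now ranks first
```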
Prerequisites
Before implementing BYOK, ensure you have:
Required Tools
- rag-content tool: For creating compatible vector databases
- Repository: https://github.com/lightspeed-core/rag-content
- Used for indexing your knowledge sources
System Requirements
- Embedding Model: Local or downloadable embedding model
- LLM Provider: OpenAI, vLLM, or other supported inference provider
Knowledge Sources
- Directly supported: Markdown (.md) and plain text (.txt) files
- Requires conversion: PDFs, AsciiDoc, HTML, and other formats must be converted to markdown or TXT
- Documentation, manuals, FAQs, knowledge bases (after format conversion)
Configuration Guide
Step 1: Prepare Your Knowledge Sources
- Collect your documents: Gather all knowledge sources you want to include
- Convert formats: Convert non-supported formats to markdown (.md) or plain text (.txt)
- PDF conversion: Use tools like docling to convert PDFs to markdown (see the conversion sketch after this list)
- Adoc conversion: Use custom scripts to convert AsciiDoc to plain text
- Organize content: Structure your converted documents for optimal indexing
- Format validation: Ensure all documents are in supported formats (.md or .txt)
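The conversion sketch referenced above: a minimal example of converting a PDF to markdown with docling's Python API. The file names are placeholders, and the exact API may vary by docling version, so check the docling documentation for the release you install.

```python
# Minimal sketch: convert a PDF to markdown with docling.
# File names are placeholders; adjust for your own documents.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("product-manual.pdf")   # path to your source PDF

with open("product-manual.md", "w", encoding="utf-8") as f:
    f.write(result.document.export_to_markdown())
```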
Step 2: Create Vector Database
Use the rag-content tool to create a compatible vector database:
Refer to https://github.com/lightspeed-core/rag-content for instructions on creating your vector database.
Metadata Configuration:
When using the rag-content tool, you need to create a custom_processor.py script to handle document metadata:
- Document URL References: Implement the `url_function` in your `custom_processor.py` to add URL metadata to each document chunk
- Title Extraction: The system automatically extracts the document title from the first line of each file
- Custom Metadata: You can add additional metadata fields as needed for your use case
Example custom_processor.py structure:
class CustomMetadataProcessor(MetadataProcessor):
    def __init__(self, url):
        self.url = url

    def url_function(self, file_path: str) -> str:
        # Return a URL for the file, so it can be referenced when used
        # in an answer
        return self.url
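If your documents map to individual pages rather than one shared URL, the `url_function` can instead derive a per-file link from the file path. The variant below extends the example above and is purely hypothetical: the base URL and path handling are placeholders for your own documentation site, not part of rag-content.

```python
# Hypothetical variant: derive a per-file URL from the document path.
# The base URL and suffix handling are placeholders, not part of rag-content.
import os

class PerFileMetadataProcessor(MetadataProcessor):
    def __init__(self, base_url: str, docs_root: str):
        self.base_url = base_url.rstrip("/")
        self.docs_root = docs_root

    def url_function(self, file_path: str) -> str:
        # e.g. <docs_root>/guides/install.md -> https://docs.example.com/guides/install
        relative = os.path.relpath(file_path, self.docs_root)
        return f"{self.base_url}/{os.path.splitext(relative)[0]}"
```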
Important Notes:
- The vector database must be compatible with Llama Stack
- Supported formats:
- Llama-Stack Faiss Vector-IO
- Llama-Stack SQLite-vec Vector-IO
- The same embedding model must be used for both creation and querying
Step 3: Configure Embedding Model
You have two options for configuring your embedding model:
Option 1: Use rag-content Download Script (Optional)
You can use the embedding model download step described in the rag-content repo:
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
Option 2: Manual Download and Configuration
Alternatively, you can download your own embedding model and update the path in your YAML configuration:
- Download your preferred embedding model from Hugging Face or other sources
- Place the model in your desired directory (e.g., `/path/to/your/embedding_models/`)
- Update the YAML configuration to point to your model path:
models:
- model_id: sentence-transformers/all-mpnet-base-v2
metadata:
embedding_dimension: 768
model_type: embedding
provider_id: sentence-transformers
provider_model_id: /path/to/your/embedding_models/all-mpnet-base-v2
Note: Ensure the same embedding model is used for both vector database creation and querying.
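If you are unsure of the correct `embedding_dimension` for your model, you can check it locally with the sentence-transformers library. The model path below is a placeholder; point it at your own model directory.

```python
# Quick check of the embedding dimension for the model you configured.
# The model path is a placeholder; point it at your local model directory.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("/path/to/your/embedding_models/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())  # all-mpnet-base-v2 reports 768
```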
Step 4: Configure Llama Stack
Edit your run.yaml file to include BYOK configuration:
version: 2
image_name: byok-configuration
# Required APIs for BYOK
apis:
- agents
- inference
- vector_io
- tool_runtime
- safety
providers:
inference:
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
- provider_id: openai
provider_type: remote::openai
config:
api_key: ${env.OPENAI_API_KEY}
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence:
agent_state:
namespace: agents_state
backend: kv_default
responses:
table_name: agents_responses
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
vector_io:
- provider_id: your-knowledge-base
provider_type: inline::faiss
config:
persistence:
namespace: vector_io::faiss
backend: byok_backend # References storage.backends
tool_runtime:
- provider_id: rag-runtime
provider_type: inline::rag-runtime
config: {}
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/storage/kv_store.db
sql_default:
type: sql_sqlite
db_path: ~/.llama/storage/sql_store.db
byok_backend:
type: kv_sqlite
db_path: /path/to/vector_db/faiss_store.db
registered_resources:
models:
- model_id: your-llm-model
provider_id: openai
model_type: llm
provider_model_id: gpt-4o-mini
- model_id: sentence-transformers/all-mpnet-base-v2
model_type: embedding
provider_id: sentence-transformers
provider_model_id: /path/to/embedding_models/all-mpnet-base-v2
metadata:
embedding_dimension: 768
vector_stores:
- vector_store_id: your-index-id # ID used during index generation
provider_id: your-knowledge-base
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 768
tool_groups:
- toolgroup_id: builtin::rag
provider_id: rag-runtime
⚠️ Important: The vector_store_id value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.
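Once the stack is running with this configuration, you can sanity-check that your BYOK store is registered by listing vector stores over the OpenAI-compatible API. The base URL and API key below are assumptions about a local deployment, so adjust them to your setup; older versions of the openai package expose the same call under `client.beta.vector_stores`.

```python
# Sanity check: list registered vector stores via the OpenAI-compatible API.
# Base URL and API key are assumptions for a local deployment; adjust as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed-locally")

for store in client.vector_stores.list():
    print(store.id, store.name)
```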
[!TIP] Instead of manually editing `run.yaml`, you can declare your knowledge sources in the `byok_rag` section of `lightspeed-stack.yaml`. The lightspeed-stack service automatically generates the required configuration at startup.

byok_rag:
  - rag_id: my-docs                      # Unique identifier for this knowledge source
    rag_type: inline::faiss
    embedding_model: sentence-transformers/all-mpnet-base-v2
    embedding_dimension: 768
    vector_db_id: your-index-id          # Llama Stack vector store ID (from index generation)
    db_path: /path/to/vector_db/faiss_store.db
    score_multiplier: 1.0                # Optional: weight results when mixing multiple sources

When multiple BYOK sources are configured, `score_multiplier` adjusts the relative importance of each store's results during Inline RAG retrieval. Values above 1.0 boost a store; values below 1.0 reduce it.
Step 5: Configure RAG Strategy
Add a rag section to your lightspeed-stack.yaml to choose how BYOK knowledge is used.
Each list entry is a rag_id from byok_rag, or the special value okp for OKP.
rag:
# Inline RAG: inject context before the LLM request (no tool calls needed)
inline:
- my-docs # rag_id from byok_rag
- okp # include OKP context inline
# Tool RAG: the LLM can call file_search to retrieve context on demand
# If omitted, tool RAG is disabled. If both tool and inline are omitted, all registered stores are used as fallback
tool:
- my-docs # expose this BYOK store as the file_search tool
- okp # expose OKP as the file_search tool
# OKP provider settings (only relevant when okp is listed above)
okp:
offline: true # true = use parent_id for source URLs, false = use reference_url
Both modes can be enabled simultaneously. Choose based on your latency and control preferences:
| Mode | When context is fetched | Tool call needed | score_multiplier |
|---|---|---|---|
| Inline RAG | With every query | No | Yes (BYOK only) |
| Tool RAG | On LLM demand | Yes | No |
[!TIP] A ready-to-use example combining BYOK and OKP is available at
examples/lightspeed-stack-byok-okp-rag.yaml.
Supported Vector Database Types
1. FAISS (Recommended)
- Type: Local vector database with SQLite metadata
- Best for: Small to medium-sized knowledge bases
- Configuration: `inline::faiss`
- Storage: SQLite database file
providers:
vector_io:
- provider_id: faiss-knowledge
provider_type: inline::faiss
config:
persistence:
namespace: vector_io::faiss
backend: faiss_backend
storage:
backends:
faiss_backend:
type: kv_sqlite
db_path: /path/to/faiss_store.db
2. pgvector (PostgreSQL)
- Type: PostgreSQL with pgvector extension
- Best for: Large-scale deployments, shared knowledge bases
- Configuration: `remote::pgvector`
- Requirements: PostgreSQL with pgvector extension
vector_io:
- provider_id: pgvector-knowledge
provider_type: remote::pgvector
config:
host: localhost
port: 5432
db: knowledge_db
user: lightspeed_user
password: ${env.DB_PASSWORD}
kvstore:
type: sqlite
db_path: .llama/distributions/pgvector/registry.db
pgvector Table Schema:
- `id` (text): UUID identifier of the chunk
- `document` (jsonb): JSON containing content and metadata
- `embedding` (vector(n)): The embedding vector (n = embedding dimension)
Configuration Examples
Example 1: OpenAI + FAISS
Complete configuration for OpenAI LLM with local FAISS knowledge base:
version: 2
image_name: openai-faiss-byok
apis:
- agents
- inference
- vector_io
- tool_runtime
- safety
providers:
inference:
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
- provider_id: openai
provider_type: remote::openai
config:
api_key: ${env.OPENAI_API_KEY}
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence:
agent_state:
namespace: agents_state
backend: kv_default
responses:
table_name: agents_responses
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
vector_io:
- provider_id: company-docs
provider_type: inline::faiss
config:
persistence:
namespace: vector_io::faiss
backend: company_docs_backend
tool_runtime:
- provider_id: rag-runtime
provider_type: inline::rag-runtime
config: {}
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/storage/kv_store.db
sql_default:
type: sql_sqlite
db_path: ~/.llama/storage/sql_store.db
company_docs_backend:
type: kv_sqlite
db_path: /home/user/vector_dbs/company_docs/faiss_store.db
registered_resources:
models:
- model_id: gpt-4o-mini
provider_id: openai
model_type: llm
provider_model_id: gpt-4o-mini
- model_id: sentence-transformers/all-mpnet-base-v2
model_type: embedding
provider_id: sentence-transformers
provider_model_id: /home/user/embedding_models/all-mpnet-base-v2
metadata:
embedding_dimension: 768
vector_stores:
- vector_store_id: company-knowledge-index
provider_id: company-docs
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 768
tool_groups:
- toolgroup_id: builtin::rag
provider_id: rag-runtime
Example 2: vLLM + pgvector
Configuration for local vLLM inference with PostgreSQL knowledge base:
version: 2
image_name: vllm-pgvector-byok
apis:
- agents
- inference
- vector_io
- tool_runtime
- safety
models:
- model_id: meta-llama/Llama-3.1-8B-Instruct
provider_id: vllm
model_type: llm
provider_model_id: null
- model_id: sentence-transformers/all-mpnet-base-v2
metadata:
embedding_dimension: 768
model_type: embedding
provider_id: sentence-transformers
provider_model_id: sentence-transformers/all-mpnet-base-v2
providers:
inference:
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
- provider_id: vllm
provider_type: remote::vllm
config:
url: http://localhost:8000/v1/
api_token: your-token-here
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence:
agent_state:
namespace: agents_state
backend: kv_default
responses:
table_name: agents_responses
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
vector_io:
- provider_id: enterprise-knowledge
provider_type: remote::pgvector
config:
host: postgres.company.com
port: 5432
db: enterprise_kb
user: rag_user
password: ${env.POSTGRES_PASSWORD}
kvstore:
type: sqlite
db_path: .llama/distributions/pgvector/registry.db
tool_runtime:
- provider_id: rag-runtime
provider_type: inline::rag-runtime
config: {}
tool_groups:
- provider_id: rag-runtime
toolgroup_id: builtin::rag
args: null
mcp_endpoint: null
vector_stores:
- embedding_dimension: 768
embedding_model: sentence-transformers/all-mpnet-base-v2
provider_id: enterprise-knowledge
vector_store_id: enterprise-docs
Conclusion
The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core provides powerful capabilities for integrating custom knowledge sources through RAG technology. By following this guide, you can successfully implement and configure BYOK to enhance your AI system with domain-specific knowledge.
For additional support and advanced configurations, refer to the rag-content repository (https://github.com/lightspeed-core/rag-content) and the Llama Stack documentation.
Remember to regularly update your knowledge sources and monitor system performance to maintain optimal BYOK functionality.