Lightspeed Core Stack

Llama Stack Container Orchestration

This guide explains how Lightspeed Core Stack (LCORE) manages the Llama Stack container lifecycle, including startup, teardown, customization, and troubleshooting.

Overview

When you run make run, the Makefile automatically:

  1. Builds the llama-stack container image (if not already built)
  2. Stops and removes any existing llama-stack container (ensures clean state)
  3. Starts a new llama-stack container with your configuration
  4. Waits for the container to pass health checks (up to 60 seconds)
  5. Starts the Lightspeed Core Stack service
  6. Sets up automatic cleanup on exit (Ctrl+C or kill signal)

This orchestration means you no longer start, monitor, and stop two separate processes by hand: a single command brings up both and tears both down.


Quick Start

Prerequisites

You need Podman or Docker installed. The Makefile auto-detects which container runtime is available.
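The auto-detection can be pictured as the following bash sketch. The function name is hypothetical, and preferring Podman over Docker is an assumption inferred from the "Force Docker instead of Podman" example below; the Makefile's actual logic may differ.

```shell
#!/usr/bin/env bash
# Sketch of container-runtime auto-detection (hypothetical function name).
# Assumption: podman is preferred when both runtimes are installed.
detect_container_runtime() {
  # An explicit override wins, mirroring `make run CONTAINER_RUNTIME=docker`.
  if [ -n "${CONTAINER_RUNTIME:-}" ]; then
    echo "$CONTAINER_RUNTIME"
    return 0
  fi
  # Otherwise pick the first runtime found on PATH.
  local rt
  for rt in podman docker; do
    if command -v "$rt" >/dev/null 2>&1; then
      echo "$rt"
      return 0
    fi
  done
  echo "ERROR: No container runtime found. Install podman or docker." >&2
  return 1
}
```

A caller would typically use `RUNTIME=$(detect_container_runtime) || exit 1` and then invoke `"$RUNTIME"` for every container command.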

Basic Usage

# Install dependencies
uv sync --group dev --group llslibdev

# Generate llama-stack config (run.yaml)
./scripts/generate_local_run.sh

# Set required environment variables
export OPENAI_API_KEY=sk-xxxxx

# Start everything (container + service)
make run

Stop the service: Press Ctrl+C. This will automatically stop and remove the llama-stack container.


Container Lifecycle

Startup Sequence

When you run make run, the following happens:

1. Build Container Image

Target: build-llama-stack-image

make build-llama-stack-image

2. Stop Existing Container

Target: stop-llama-stack-container

make stop-llama-stack-container
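This step is what lets repeated `make run` invocations start from a clean state. A minimal bash sketch of an equivalent idempotent step, assuming a `RUNTIME` variable that holds `podman` or `docker` (a hypothetical name, not taken from the Makefile):

```shell
#!/usr/bin/env bash
# Sketch only: RUNTIME is a hypothetical variable holding "podman" or "docker".
RUNTIME="${RUNTIME:-podman}"

stop_and_remove_container() {
  local name="$1"
  # `rm -f` force-removes the container, stopping it first if it is running.
  # `|| true` keeps the step from failing when no such container exists.
  "$RUNTIME" rm -f "$name" >/dev/null 2>&1 || true
}
```

Usage: `RUNTIME=docker stop_and_remove_container lightspeed-llama-stack`. A gentler variant runs `stop` (for example `podman stop -t 10`) before `rm`, giving the server time to shut down cleanly.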

3. Start New Container

Target: start-llama-stack-container

make start-llama-stack-container

Key features:

4. Wait for Health

Target: wait-for-llama-stack-health

make wait-for-llama-stack-health
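Waiting for health amounts to polling a probe until it succeeds or a timeout expires. A minimal bash sketch; the function name and the one-second poll interval are assumptions, and the real target may check the container's health status rather than an HTTP endpoint.

```shell
#!/usr/bin/env bash
# Sketch of a "wait until healthy" loop (hypothetical function name).
wait_for_healthy() {
  local timeout_s="$1"; shift   # give up after roughly this many seconds
  local waited=0
  # Re-run the probe command (the remaining arguments) until it exits 0.
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "ERROR: not healthy within ${timeout_s} seconds" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "healthy"
}
```

Usage: `wait_for_healthy 60 curl -fsS http://localhost:8321/v1/health`. The `-f` flag makes curl exit non-zero on HTTP error responses, so an erroring server keeps the loop polling. The timeout counts sleeps only, so the real wait can be slightly longer when probes are slow.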

5. Start Lightspeed Core Stack

Target: run-stack

make run-stack

Teardown and Cleanup

Automatic Cleanup on Exit

When you press Ctrl+C or the process receives a termination signal, the trap handler automatically runs:

trap 'echo ""; echo "Stopping services..."; $(MAKE) stop-llama-stack-container' EXIT INT TERM

This ensures the llama-stack container is always cleaned up, even if the service crashes.
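The same cleanup pattern works in a standalone script. This bash sketch is not the Makefile's code: `stop_services` is a hypothetical stand-in for `make stop-llama-stack-container`, and the guard flag is an addition of this sketch, where an interrupt runs the INT handler and then the EXIT handler.

```shell
#!/usr/bin/env bash
# Sketch of trap-based cleanup outside make (hypothetical function names).
CLEANED_UP=0

stop_services() {
  echo "Stopping services..."
}

cleanup() {
  # Guard so that INT followed by EXIT stops the services only once.
  if [ "$CLEANED_UP" -eq 1 ]; then
    return 0
  fi
  CLEANED_UP=1
  stop_services
}

install_cleanup_traps() {
  trap cleanup EXIT
  trap 'cleanup; exit 130' INT    # 128 + SIGINT (2)
  trap 'cleanup; exit 143' TERM   # 128 + SIGTERM (15)
}
```

Call `install_cleanup_traps` near the top of the script, then run the service in the foreground; when it exits or is interrupted, `stop_services` runs exactly once.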

Manual Cleanup Commands

Stop the container (keeps container for inspection):

make stop-llama-stack-container

Remove the container (saves logs first):

make remove-llama-stack-container

Full cleanup (remove container + image):

make clean-llama-stack

Customization Options

Makefile Variables

Override any of these variables when running make:

Variable | Default | Description
--- | --- | ---
LLAMA_STACK_CONTAINER_NAME | lightspeed-llama-stack | Container name
LLAMA_STACK_IMAGE | lightspeed-llama-stack:local | Container image name and tag
LLAMA_STACK_PORT | 8321 | Host port for llama-stack
LLAMA_STACK_CONFIG | run.yaml | Llama Stack config file path
CONFIG | lightspeed-stack.yaml | LCORE config file path
CONTAINER_RUNTIME | auto-detected | Force specific runtime (podman or docker)

Examples

Use custom port:

make run LLAMA_STACK_PORT=9321

Note: Also update llama_stack.url in lightspeed-stack.yaml to http://localhost:9321 so that LCORE connects to the new port.
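A quick pre-flight check can catch a mismatch between these two settings. The bash sketch below is a rough heuristic, not a YAML parser: the function name is hypothetical, and it only looks for a `url:` line ending in the expected port, so other `url:` keys elsewhere in the file could fool it.

```shell
#!/usr/bin/env bash
# Sketch: warn when the LCORE config's url does not use the given port.
# Assumption: a grep for "url: http(s)://host:PORT" is good enough here.
check_llama_stack_port() {
  local config="$1" port="$2"
  if grep -Eq "^[[:space:]]*url:[[:space:]]*https?://[^[:space:]/]+:${port}/?([[:space:]]|#|$)" "$config"; then
    echo "OK: $config points at port $port"
  else
    echo "WARNING: update llama_stack.url in $config to use port $port" >&2
    return 1
  fi
}
```

Usage: `check_llama_stack_port lightspeed-stack.yaml 9321 && make run LLAMA_STACK_PORT=9321`.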

Use custom config files:

make run CONFIG=my-config.yaml LLAMA_STACK_CONFIG=my-run.yaml

Use custom container name:

make run LLAMA_STACK_CONTAINER_NAME=my-llama-stack

Force Docker instead of Podman:

make run CONTAINER_RUNTIME=docker

Configuration Files

run.yaml (Llama Stack Configuration)

This file configures the llama-stack server itself. Generated by ./scripts/generate_local_run.sh.

Key sections:

Location: Project root (default) or custom path via LLAMA_STACK_CONFIG

Enrichment: The container automatically enriches this file with settings from lightspeed-stack.yaml at startup (see Configuration Enrichment).

lightspeed-stack.yaml (LCORE Configuration)

This file configures the Lightspeed Core Stack service.

Llama Stack connection settings:

llama_stack:
  use_as_library_client: false
  url: http://localhost:8321
  # api_key: custom-key  # Optional authentication

Location: Project root (default) or custom path via CONFIG

Environment Variables

The Makefile passes these environment variables to the llama-stack container:

Required for OpenAI

Optional Provider Credentials

Azure (Entra ID):

RHAIIS (Red Hat AI Inference Service):

RHEL AI:

Google Vertex AI:

IBM WatsonX:

AWS Bedrock:

Search Providers:

OKP/Solr RAG Configuration

For OKP (Offline Knowledge Portal) RAG:

See OKP Guide for detailed setup instructions.
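Passing a variable to a container as `-e NAME` (without a value) tells Podman or Docker to copy its value from the host environment. A bash sketch that emits such flags only for variables actually set on the host; the function name is hypothetical, and the usage line lists only variable names mentioned in this guide, not the Makefile's full list.

```shell
#!/usr/bin/env bash
# Sketch: print "-e" and NAME (one word per line) for each variable that is set.
build_env_flags() {
  local name
  for name in "$@"; do
    # ${!name+x} expands to "x" when the variable named by $name is set.
    if [ -n "${!name+x}" ]; then
      printf '%s\n' -e "$name"
    fi
  done
}
```

Usage: `mapfile -t ENV_FLAGS < <(build_env_flags OPENAI_API_KEY RHAIIS_API_KEY RHAIIS_URL RHAIIS_MODEL)`, then `podman run "${ENV_FLAGS[@]}" ...`. Because only names appear on the command line, secret values do not show up in the process list.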

Other Configuration

Setting Environment Variables

One-time:

export OPENAI_API_KEY=sk-xxxxx
export RHAIIS_API_KEY=xxxxx
make run

In a script:

#!/bin/bash
export OPENAI_API_KEY=sk-xxxxx
export RHAIIS_API_KEY=xxxxx
export RHAIIS_URL=https://rhaiis.example.com
export RHAIIS_MODEL=granite-3.1-8b-instruct
make run

Via .env file (not recommended for secrets):

# Load from file
set -a
source .env
set +a
make run
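The `set -a` / `set +a` pair matters: a plain `source .env` creates shell variables that make and its child processes cannot see, because they are not exported. A small bash sketch of the effect; `load_env_file` and `DEMO_PROVIDER_URL` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: load a shell-syntax .env file so every assignment is exported.
load_env_file() {
  local file="$1"
  set -a          # allexport: mark every variable assigned from now on for export
  # shellcheck disable=SC1090
  . "$file"
  set +a          # turn allexport back off
}
```

After `load_env_file .env`, any child process (make, podman, a nested bash) sees the variables. The file must use shell syntax, so values containing spaces need quotes.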

Health Checks and Monitoring

Container-Level Health Check

The container has a built-in Docker/Podman health check:

# Check container health status
podman inspect --format='{{.State.Health.Status}}' lightspeed-llama-stack

# Possible values:
# - starting: Container is starting, health check not yet run
# - healthy: Health check passed
# - unhealthy: Health check failed 3 times

Health check configuration:

LCORE Readiness Endpoint

The /v1/readiness endpoint checks llama-stack connectivity:

# Check LCORE readiness
curl http://localhost:8080/v1/readiness

# Response when healthy:
{
  "ready": true,
  "reason": "All providers are healthy",
  "providers": []
}

# Response when llama-stack is unreachable (HTTP 503):
{
  "ready": false,
  "reason": "Providers not healthy: unknown",
  "providers": [
    {
      "provider_id": "unknown",
      "status": "error",
      "message": "Failed to initialize health check: Connection error"
    }
  ]
}

Manual Health Checks

Test llama-stack directly:

curl http://localhost:8321/v1/health
# Expected: {"status":"OK"}

Test LCORE liveness:

curl http://localhost:8080/v1/liveness
# Expected: {"alive":true}

View container logs:

# Follow logs in real-time
podman logs -f lightspeed-llama-stack

# View last 50 lines
podman logs --tail 50 lightspeed-llama-stack

Troubleshooting

Common Issues

1. Container Fails Health Check

Symptoms:

✗ ERROR: Llama-stack did not become healthy within 60 seconds
Container logs:
[error logs shown here]

Causes:

Solutions:

  1. Check logs saved by Makefile:
    cat /tmp/llama-stack-failure.log
    
  2. Inspect container manually:
    # Container might still be running in unhealthy state
    podman logs lightspeed-llama-stack
    podman exec lightspeed-llama-stack curl http://localhost:8321/v1/health
    
  3. Test config enrichment:
    # Run enrichment script manually to check for errors
    uv run src/llama_stack_configuration.py \
      -c lightspeed-stack.yaml \
      -i run.yaml \
      -o /tmp/enriched-run.yaml
       
    # Check output for errors
    cat /tmp/enriched-run.yaml
    
  4. Check for missing environment variables:
    # Example error: "Environment variable 'OPENAI_API_KEY' not set"
    # Solution: export OPENAI_API_KEY=sk-xxxxx
    

2. Port Conflict

Symptoms:

Error: cannot listen on the TCP port: listen tcp4 0.0.0.0:8321: bind: address already in use

Solutions:

  1. Find what’s using port 8321:
    sudo lsof -i :8321
    # or
    sudo ss -tulpn | grep 8321
    
  2. Kill the process or use a different port:
    make run LLAMA_STACK_PORT=9321
    

    Don’t forget to update lightspeed-stack.yaml:

    llama_stack:
      url: http://localhost:9321
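If lsof and ss are unavailable, bash itself can test whether something is listening on a port. A sketch using bash's `/dev/tcp` redirection (a bash feature, not a real file); the function names are hypothetical, and the check only probes 127.0.0.1.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a listener on 127.0.0.1 accepts a connection on the port.
port_in_use() {
  local port="$1"
  # The subshell closes the connection immediately; errors are discarded.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Sketch: print the first port at or above the given one with no listener.
first_free_port() {
  local port="$1"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

Usage: `make run LLAMA_STACK_PORT="$(first_free_port 8321)"`, remembering to update lightspeed-stack.yaml to match. Another process could still grab the port between the check and container start, which is acceptable for local development.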
    

3. Volume Mount Permission Issues (SELinux)

Symptoms:

Error: mkdir /opt/app-root/run.yaml: permission denied

Cause: On RHEL/Fedora, SELinux blocks container access to host files that are mounted without a suitable SELinux label.

Solution: The Makefile already includes :z flags on volume mounts. If still failing:

# Temporarily set SELinux to permissive
sudo setenforce 0

# Check SELinux denials
sudo ausearch -m avc -ts recent

# Re-enable SELinux
sudo setenforce 1

4. Container Build Fails

Symptoms:

Error: building at STEP "RUN uv sync...": error running subprocess

Solutions:

  1. Check network connectivity:
    podman run --rm alpine ping -c 3 pypi.org
    
  2. Clear build cache:
    make clean-llama-stack
    podman system prune -a  # Warning: removes all stopped containers and unused images, not only llama-stack's
    make build-llama-stack-image
    
  3. Check disk space:
    df -h
    # Need several GB free for build
    

5. “No container runtime found”

Symptoms:

ERROR: No container runtime found. Install podman or docker.

Solution:

# On RHEL/Fedora
sudo dnf install podman

# On Ubuntu/Debian
sudo apt-get install podman
# or
curl -fsSL https://get.docker.com | sh

6. Container Starts But LCORE Can’t Connect

Symptoms:

Solutions:

  1. Check llama-stack URL in config:
    # lightspeed-stack.yaml
    llama_stack:
      url: http://localhost:8321  # Must match LLAMA_STACK_PORT
    
  2. Test connection manually:
    curl http://localhost:8321/v1/health
    
  3. Check firewall rules:
    sudo firewall-cmd --list-ports
    # If 8321/tcp is not listed, open it:
    sudo firewall-cmd --permanent --add-port=8321/tcp
    sudo firewall-cmd --reload
    

7. Credential File Permission Errors (VertexAI, GCP)

Symptoms:

PermissionError: [Errno 13] Permission denied: '/tmp/vertex-credentials.json'
google.auth._default.load_credentials_from_file() failed to open credentials file

Cause: The llama-stack container runs as UID 1001 (a non-root user, for security). When you mount a credentials file with restrictive permissions (600), only its owner on the host can read it, and the container user is not that owner.

Solutions:

Option 1: Use 644 permissions (Works on all platforms)

chmod 644 /path/to/vertex-credentials.json

This lets the container user (UID 1001) read the file through the “others” permission bits, while write access stays restricted to the owner.

Security note: File becomes world-readable on the host. Acceptable for development environments where access to the filesystem is already restricted to your user account.
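To confirm the change took effect, check whether the “others” read bit is set. A portable sketch using `find -perm`; the function name is hypothetical, and it inspects only the mode bits, so ACL entries such as those in Option 2 below are not taken into account.

```shell
#!/usr/bin/env bash
# Sketch: succeed if "others" (e.g. the container's UID 1001) may read the file
# according to its mode bits. 0004 is the read bit for "others".
others_can_read() {
  # find prints the path only when all bits in 0004 are set.
  [ -n "$(find "$1" -maxdepth 0 -perm -0004 2>/dev/null)" ]
}
```

Usage: `others_can_read /path/to/vertex-credentials.json || echo "container user cannot read this file"`.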

Option 2: Use ACLs (Linux only - more secure)

ACLs (Access Control Lists) allow you to grant read access to UID 1001 specifically without making the file world-readable. Note: This only works on Linux systems, not macOS.

Install ACL tools (Linux):

# RHEL/Fedora/CentOS
sudo dnf install acl

# Ubuntu/Debian
sudo apt-get install acl

Grant read access to UID 1001 (Linux only):

setfacl -m u:1001:r /path/to/vertex-credentials.json

# Verify
getfacl /path/to/vertex-credentials.json
# Output shows: user:1001:r--

This grants read-only access to UID 1001 (container user) without changing base permissions or making the file world-readable.

macOS note: macOS uses BSD ACLs and cannot assign numeric UID-based ACLs to non-existent host users. If you are testing locally on macOS, you must temporarily use chmod 644 to allow the container access, but be aware that this makes the credentials file world-readable on your host machine. Alternatively, ensure your local user matches the container’s execution environment.

Why this happens: This is expected container behavior. The container runs as a non-root user (UID 1001) for security - see USER 1001 in deploy/llama-stack/test.containerfile. Files with 600 permissions are only accessible to their owner, and the container’s UID differs from your host UID.

Production recommendation: For production deployments, avoid mounting credential files entirely. Instead use:

Debug Logs

The Makefile automatically saves logs to /tmp when issues occur:

File | Content | When Created
--- | --- | ---
/tmp/llama-stack-failure.log | Last 200 lines of logs when container fails to stop gracefully | Container stop timeout
/tmp/llama-stack-last-run.log | Full logs before container removal | make remove-llama-stack-container
(Container logs) | View with podman logs lightspeed-llama-stack | While container is running

Enable debug logging in llama-stack:

export LLAMA_STACK_LOGGING=debug
make run

Advanced Topics

Configuration Enrichment

When the llama-stack container starts, it automatically enriches the run.yaml file with settings from lightspeed-stack.yaml. This is done by the entrypoint script mounted into the container.

How It Works

  1. Entrypoint script (scripts/llama-stack-entrypoint.sh) is mounted at /opt/app-root/enrich-entrypoint.sh
  2. Script runs /opt/app-root/.venv/bin/python3 /opt/app-root/llama_stack_configuration.py
  3. Enrichment logic (src/llama_stack_configuration.py) reads both configs and merges them
  4. Output is written to /tmp/enriched-run.yaml inside the container
  5. Llama Stack starts with the enriched config

What Gets Enriched

Manual Enrichment (for debugging)

# Run enrichment locally to see output
uv run src/llama_stack_configuration.py \
  -c lightspeed-stack.yaml \
  -i run.yaml \
  -o enriched-run.yaml

# Inspect the enriched config
cat enriched-run.yaml

Volume Mounts

The container uses these volume mounts:

Host Path | Container Path | Mode | Purpose
--- | --- | --- | ---
$(PWD)/run.yaml | /opt/app-root/run.yaml | rw | Llama Stack config (enriched version written here)
$(PWD)/lightspeed-stack.yaml | /opt/app-root/lightspeed-stack.yaml | ro | LCORE config (read for enrichment)
$(PWD)/scripts/llama-stack-entrypoint.sh | /opt/app-root/enrich-entrypoint.sh | ro | Entrypoint script with enrichment logic
$(PWD)/src/llama_stack_configuration.py | /opt/app-root/llama_stack_configuration.py | ro | Python enrichment script
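The mounts listed here can be assembled into a run command and printed for review (a dry run). In this bash sketch the function and its positional parameters are hypothetical. Paths and modes come from the mount list, and the `:z` relabel option follows the Troubleshooting note that the Makefile adds it to volume mounts. Environment flags and any entrypoint override the Makefile may use are omitted.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) a run command built from the mounts.
print_llama_stack_run_cmd() {
  local runtime="${1:-podman}" name="${2:-lightspeed-llama-stack}"
  local image="${3:-lightspeed-llama-stack:local}" port="${4:-8321}"
  local root=/opt/app-root
  # Host port maps to 8321 inside the container; every mount gets :z for
  # SELinux relabeling, and all but run.yaml are read-only.
  local args=(
    run -d --name "$name"
    -p "${port}:8321"
    -v "$PWD/run.yaml:$root/run.yaml:z"
    -v "$PWD/lightspeed-stack.yaml:$root/lightspeed-stack.yaml:ro,z"
    -v "$PWD/scripts/llama-stack-entrypoint.sh:$root/enrich-entrypoint.sh:ro,z"
    -v "$PWD/src/llama_stack_configuration.py:$root/llama_stack_configuration.py:ro,z"
    "$image"
  )
  printf '%s ' "$runtime" "${args[@]}"
  echo
}
```

Running `print_llama_stack_run_cmd podman my-llama-stack my-llama-stack:custom 9000` from the project root prints a command comparable to the manual `podman run` example under Manual Container Management, with the entrypoint and enrichment-script mounts added.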

SELinux labels:

Why mount scripts instead of baking into image?

Manual Container Management

If you need more control than the Makefile provides, you can manage the container manually:

Build the Image

podman build -f deploy/llama-stack/test.containerfile -t my-llama-stack:custom .

Run the Container

podman run -d \
  --name my-llama-stack \
  -p 9000:8321 \
  -v $(pwd)/run.yaml:/opt/app-root/run.yaml:z \
  -v $(pwd)/lightspeed-stack.yaml:/opt/app-root/lightspeed-stack.yaml:ro,z \
  -e OPENAI_API_KEY \
  my-llama-stack:custom

Monitor the Container

# Follow logs
podman logs -f my-llama-stack

# Check health
podman inspect --format='{{.State.Health.Status}}' my-llama-stack

# Execute commands inside container
podman exec my-llama-stack curl http://localhost:8321/v1/health

# View container stats (CPU, memory)
podman stats my-llama-stack

Stop and Remove

# Stop gracefully
podman stop -t 10 my-llama-stack

# Remove container
podman rm my-llama-stack

# Remove image
podman rmi my-llama-stack:custom

Connect LCORE to Manual Container

Update lightspeed-stack.yaml:

llama_stack:
  use_as_library_client: false
  url: http://localhost:9000  # Use your custom port

Then start LCORE without container orchestration:

make run-stack  # Skips container startup, just runs LCORE

See Also