o8 llama - AI Workload Management

The o8 llama command group manages AI workloads in Orchestr8: scaffolding, validation, deployment, status checks, log viewing, and provider inspection.

Overview

Orchestr8 AI workloads are built on Llama-Stack, providing enterprise-ready deployment and management of LLM, RAG, and agentic applications. The o8 llama commands handle the complete lifecycle from initialization to monitoring.

Commands

o8 llama init

Initialize a new AI workload module.

o8 llama init <name> [OPTIONS]

Arguments:

  • name - AI workload name (required)

Options:

  • --template, -t - Workload template: rag, agent, inference, custom (default: rag)
  • --path, -p - Directory in which to create the workload (default: current directory)
  • --provider - AI provider: openai, anthropic, groq, local (default: openai)

Examples:

# Create a RAG application with OpenAI
o8 llama init my-rag-app --template rag --provider openai

# Create an agentic workflow with Anthropic
o8 llama init my-agent --template agent --provider anthropic

# Create a custom workload in specific directory
o8 llama init my-app --template custom --path ./projects/

Generated Structure:

my-rag-app/
├── .o8/
│ ├── module.yaml # AI workload specification
│ └── security.yaml # Security policies
├── base/ # Base Kubernetes manifests
│ ├── kustomization.yaml
│ ├── deployment.yaml
│ ├── service.yaml
│ └── ...
├── overlays/ # Environment-specific configs
│ ├── dev/
│ ├── staging/
│ └── production/
└── tests/ # Automated tests
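
A scaffolded workload can be sanity-checked against the layout above before running validation. The following is a minimal sketch, not part of o8: the helper name missing_entries is hypothetical, and the expected paths are taken only from the tree shown above (the elided "..." entries under base/ are not checked).

```python
from pathlib import Path

# Paths copied from the generated structure listed above.
EXPECTED_FILES = [
    ".o8/module.yaml",
    ".o8/security.yaml",
    "base/kustomization.yaml",
    "base/deployment.yaml",
    "base/service.yaml",
]
EXPECTED_DIRS = ["overlays/dev", "overlays/staging", "overlays/production", "tests"]


def missing_entries(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing
```

Calling missing_entries("my-rag-app") returns an empty list when every listed file and directory is present.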

o8 llama validate

Validate an AI workload configuration.

o8 llama validate [PATH] [OPTIONS]

Arguments:

  • path - Path to AI workload directory (default: current directory)

Options:

  • --verbose, -v - Show detailed validation information

Validation Checks:

  • ✅ YAML syntax and structure
  • ✅ AI-specific configuration fields
  • ✅ GPU requirements and node availability
  • ✅ Storage requirements and classes
  • ✅ Provider secrets and credentials
  • ✅ Kubernetes manifest validity
  • ✅ Kustomize build success

Examples:

# Validate current directory
o8 llama validate

# Validate specific workload
o8 llama validate ./my-rag-app --verbose

# Quick validation check
o8 llama validate ./my-app

Sample Output:

✅ AI workload validation passed!

AI Configuration:
├── Providers: openai, anthropic
├── Capabilities: inference, safety, agents
├── GPU Nodes: 2 available
└── Storage: 300Gi required, fast-ssd available

o8 llama deploy

Deploy an AI workload to the cluster.

o8 llama deploy [PATH] [OPTIONS]

Arguments:

  • path - AI workload path (default: current directory)

Options:

  • --env, -e - Target environment: dev, staging, production (default: dev)
  • --dry-run - Show what would be deployed without applying changes
  • --wait/--no-wait - Wait for deployment completion (default: --wait)

Examples:

# Deploy to development environment
o8 llama deploy

# Preview a production deployment without applying changes
o8 llama deploy --env production --dry-run

# Deploy without waiting
o8 llama deploy --env staging --no-wait

Deployment Process:

  1. 🔍 Validation - Runs configuration validation
  2. 🔧 Build - Generates Kubernetes manifests with Kustomize
  3. 🚀 Deploy - Applies manifests to cluster
  4. Wait - Monitors deployment progress (if --wait)
  5. Verify - Confirms successful deployment
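
The five steps above run in order, and a failed step stops the ones after it. A minimal sketch of that control flow follows; run_pipeline and the step names are hypothetical, and the source does not say whether verification still runs under --no-wait, so this sketch assumes only the wait step is skipped.

```python
def run_pipeline(steps, wait=True):
    """Run (name, callable) steps in order and stop at the first failure.

    Each callable returns True on success. Returns the names of the
    steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        if name == "wait" and not wait:
            continue  # assumption: --no-wait skips only progress monitoring
        if not step():
            break  # later steps never run after a failure
        completed.append(name)
    return completed
```

For example, if the build step fails, the returned list contains only "validate", and nothing is applied to the cluster.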

o8 llama status

Check the status of AI workloads.

o8 llama status [OPTIONS]

Options:

  • --namespace, -n - Kubernetes namespace (default: llama-stack)
  • --watch, -w - Watch for changes (continuous monitoring)

Status Information:

  • 📊 Pod Status - Running, pending, failed pods
  • 🔥 GPU Utilization - GPU allocation and usage
  • 🏥 Health Checks - Llama-Stack health and readiness
  • 🤖 Available Models - Configured and loaded models
  • 🔌 Providers - Connected AI providers
  • 💾 Storage - Persistent volume usage

Examples:

# Check status once
o8 llama status

# Monitor continuously
o8 llama status --watch

# Check specific namespace
o8 llama status --namespace my-ai-app

Sample Output:

✅ Llama-Stack Health: Healthy

AI Workload Pods:
NAME                          READY   STATUS    GPU   AGE
llama-stack-7d4b8f9c8-x2m9p   1/1     Running   1/1   5m

Available Models:
┌──────────────────────────────────┬────────────────┬──────┐
│ Model ID                         │ Provider       │ Type │
├──────────────────────────────────┼────────────────┼──────┤
│ meta-llama/Llama-3.2-3B-Instruct │ remote::ollama │ llm  │
│ gpt-4                            │ openai         │ llm  │
└──────────────────────────────────┴────────────────┴──────┘

Configured Providers:
• inference: 3 configured
• safety: 1 configured
• vector-io: 2 configured

o8 llama logs

View AI workload logs.

o8 llama logs [OPTIONS]

Options:

  • --namespace, -n - Kubernetes namespace (default: llama-stack)
  • --follow, -f - Follow log output (stream logs)
  • --tail - Number of lines to show (default: 100)

Examples:

# View recent logs
o8 llama logs

# Stream logs continuously
o8 llama logs --follow

# Show last 50 lines
o8 llama logs --tail 50

# View logs from specific namespace
o8 llama logs --namespace my-ai-app --follow

Log Categories:

  • 🚀 Startup - Application initialization
  • 🔧 Configuration - Provider and model loading
  • 📨 Requests - API request processing
  • ⚠️ Errors - Error conditions and warnings
  • 📊 Metrics - Performance and usage metrics

o8 llama providers

List and manage AI providers.

o8 llama providers [OPTIONS]

Options:

  • --namespace, -n - Kubernetes namespace (default: llama-stack)
  • --type, -t - Filter by provider type: inference, safety, vector-io, memory
  • --provider - Show specific provider details

Provider Types:

  • inference - LLM providers (OpenAI, Anthropic, Groq, etc.)
  • safety - Content safety and moderation
  • vector-io - Vector databases (ChromaDB, Qdrant, etc.)
  • memory - Persistent memory and context
  • telemetry - Monitoring and metrics
  • datasets - Data management

Examples:

# List all providers
o8 llama providers

# Show only inference providers
o8 llama providers --type inference

# Show specific provider details
o8 llama providers --type inference --provider openai

Sample Output:

Inference Providers:
┌────────────────┬────────────────┬────────────────────────────┐
│ Provider ID    │ Type           │ Configuration              │
├────────────────┼────────────────┼────────────────────────────┤
│ openai         │ openai         │ url: https://api.openai.c… │
│ anthropic      │ anthropic      │ url: https://api.anthropi… │
│ remote::ollama │ remote::ollama │ url: http://ollama.platfo… │
└────────────────┴────────────────┴────────────────────────────┘

Vector-Io Providers:
┌────────────────┬────────────────┬────────────────────────────┐
│ Provider ID    │ Type           │ Configuration              │
├────────────────┼────────────────┼────────────────────────────┤
│ chromadb       │ chromadb       │ host: chromadb.llama-stac… │
└────────────────┴────────────────┴────────────────────────────┘

Global Options

All o8 llama commands support these global options:

  • --help, -h - Show command help
  • --verbose, -v - Enable verbose output
  • --config - Specify custom config file path

Configuration Files

AI Workload Specification (.o8/module.yaml)

apiVersion: orchestr8.platform/v1alpha1
kind: ModuleSpecification
metadata:
  name: my-rag-app
spec:
  module:
    name: my-rag-app
    version: 0.1.0
    tier: enterprise
    description: RAG application with OpenAI integration

  requirements:
    compute:
      gpu:
        enabled: true
        type: nvidia.com/gpu
        count: 1
      memory:
        requests: 4Gi
        limits: 16Gi

    storage:
      modelCache:
        size: 100Gi
        storageClass: fast-ssd
      vectorStore:
        size: 200Gi
        storageClass: fast-ssd

  aiSpecific:
    providers:
      inference: [openai, anthropic]
      vector-io: [chromadb]
    capabilities: [inference, safety, agents, vector-io]
    modelSupport:
      formats: [huggingface, onnx]
      sizes: [small, medium, large]
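
The two storage entries in this specification add up to 300Gi, which appears to be the figure the validate sample output reports as "300Gi required". A minimal sketch of that arithmetic, assuming sizes are always given in Gi; the helper names are hypothetical and the dictionary simply mirrors the storage block above.

```python
def gibibytes(size):
    """Parse a quantity such as '100Gi'; only the Gi suffix is handled here."""
    if not size.endswith("Gi"):
        raise ValueError(f"unsupported unit in {size!r}")
    return int(size[:-2])


# Mirrors the storage block of the specification above.
storage = {
    "modelCache": {"size": "100Gi", "storageClass": "fast-ssd"},
    "vectorStore": {"size": "200Gi", "storageClass": "fast-ssd"},
}


def total_storage_gi(storage):
    """Sum the requested size of every storage entry, in Gi."""
    return sum(gibibytes(entry["size"]) for entry in storage.values())


print(f"{total_storage_gi(storage)}Gi required")  # prints "300Gi required"
```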

Environment Variables

  • KUBECONFIG - Path to kubeconfig file
  • O8_NAMESPACE - Default namespace for operations
  • O8_CONFIG_DIR - Configuration directory path
  • O8_LOG_LEVEL - Logging level (DEBUG, INFO, WARN, ERROR)
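
The list above does not say how O8_NAMESPACE interacts with the --namespace option. The sketch below assumes the common precedence of explicit flag first, then the environment variable, then the built-in default llama-stack documented for status, logs, and providers. resolve_namespace is a hypothetical helper for wrapper scripts, not part of o8.

```python
import os


def resolve_namespace(flag_value=None, environ=None):
    """Pick a namespace: flag, then O8_NAMESPACE, then 'llama-stack'.

    The precedence is an assumption; the o8 documentation does not state it.
    """
    environ = os.environ if environ is None else environ
    if flag_value:
        return flag_value
    return environ.get("O8_NAMESPACE") or "llama-stack"
```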

Exit Codes

  • 0 - Success
  • 1 - General error
  • 2 - Invalid arguments
  • 3 - Kubernetes connection error
  • 4 - Validation error
  • 5 - Deployment error
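
Scripts that wrap o8 can map these codes to readable names. A minimal sketch using the values listed above; the O8Exit enum and the describe helper are hypothetical names introduced for illustration.

```python
from enum import IntEnum


class O8Exit(IntEnum):
    """Exit codes as listed in this section."""
    SUCCESS = 0
    GENERAL_ERROR = 1
    INVALID_ARGUMENTS = 2
    KUBERNETES_CONNECTION_ERROR = 3
    VALIDATION_ERROR = 4
    DEPLOYMENT_ERROR = 5


def describe(returncode):
    """Return a lowercase label for a documented code, or a fallback message."""
    try:
        return O8Exit(returncode).name.replace("_", " ").lower()
    except ValueError:
        return f"unknown exit code {returncode}"
```

For example, describe(4) returns "validation error", and a code outside the table yields the fallback message instead of raising.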

Examples

Complete AI Workload Workflow

# 1. Initialize RAG application
o8 llama init customer-support-rag --template rag --provider openai

# 2. Navigate to workload
cd customer-support-rag

# 3. Customize configuration (edit .o8/module.yaml as needed)

# 4. Validate configuration
o8 llama validate --verbose

# 5. Deploy to development
o8 llama deploy --env dev

# 6. Monitor status
o8 llama status --watch

# 7. Check logs if needed
o8 llama logs --follow

# 8. Verify providers
o8 llama providers --type inference

Multi-Environment Deployment

# Deploy to development first
o8 llama deploy --env dev

# Verify in development
o8 llama status

# Preview production deployment
o8 llama deploy --env production --dry-run

# Deploy to production
o8 llama deploy --env production
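
The same promotion sequence can be scripted so that a failure at any stage stops the rollout. A minimal sketch, assuming every command signals failure with a non-zero exit code (the source documents the codes but not which conditions make status return non-zero); promotion_commands and promote are hypothetical helper names.

```python
import subprocess


def promotion_commands(path="."):
    """Build the dev-to-production command sequence shown above."""
    return [
        ["o8", "llama", "deploy", path, "--env", "dev"],
        ["o8", "llama", "status"],
        ["o8", "llama", "deploy", path, "--env", "production", "--dry-run"],
        ["o8", "llama", "deploy", path, "--env", "production"],
    ]


def promote(path=".", run=subprocess.call):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in promotion_commands(path):
        code = run(cmd)
        if code != 0:
            return code  # stop before touching later environments
    return 0
```

Passing a different run callable makes the sequence easy to dry-run or test without a cluster.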

See Also