o8 llama - AI Workload Management

The o8 llama command group creates, validates, deploys, and monitors AI workloads in Orchestr8.

Overview

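Orchestr8 AI workloads are built on Llama-Stack and are used to deploy and manage LLM inference, retrieval-augmented generation (RAG), and agentic applications on Kubernetes. The o8 llama commands cover the full lifecycle, from scaffolding a workload with init, through validation and deployment, to monitoring with status, logs, and providers.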

Commands

o8 llama init

Initialize a new AI workload module.

o8 llama init <name> [OPTIONS]

Arguments:

name - AI workload name (required)

Options:

--template, -t - Workload template: rag, agent, inference, custom (default: rag)
--path, -p - Directory in which to create the workload (default: current directory)
--provider - AI provider: openai, anthropic, groq, local (default: openai)

Examples:

# Create a RAG application with OpenAI
o8 llama init my-rag-app --template rag --provider openai

# Create an agentic workflow with Anthropic
o8 llama init my-agent --template agent --provider anthropic

# Create a custom workload in a specific directory
o8 llama init my-app --template custom --path ./projects/

Generated Structure:

my-rag-app/
├── .o8/
│   ├── module.yaml          # AI workload specification
│   └── security.yaml        # Security policies
├── base/                    # Base Kubernetes manifests
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ...
├── overlays/                # Environment-specific configs
│   ├── dev/
│   ├── staging/
│   └── production/
└── tests/                   # Automated tests
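To confirm that a scaffold is complete before you start editing it, a small shell check can walk the paths in the tree above. This is a minimal sketch, not part of o8: `check_workload_layout` is a hypothetical helper name, and the path list assumes the layout shown, which may differ between templates.

```shell
# Hypothetical helper (not part of o8): report any path from the documented
# `o8 llama init` layout that is missing under a workload directory.
check_workload_layout() {
  local dir="${1:-.}" rc=0 rel
  # Paths copied from the generated structure above; adjust for your template.
  for rel in .o8/module.yaml .o8/security.yaml base/kustomization.yaml \
             overlays/dev overlays/staging overlays/production tests; do
    if [ ! -e "$dir/$rel" ]; then
      echo "missing: $rel" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_workload_layout ./my-rag-app` prints one "missing:" line per absent path and returns non-zero if anything is missing.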

o8 llama validate

Validate an AI workload configuration.

o8 llama validate [PATH] [OPTIONS]

Arguments:

path - Path to AI workload directory (default: current directory)

Options:

--verbose, -v - Show detailed validation information

Validation Checks:

✅ YAML syntax and structure
✅ AI-specific configuration fields
✅ GPU requirements and node availability
✅ Storage requirements and classes
✅ Provider secrets and credentials
✅ Kubernetes manifest validity
✅ Kustomize build success
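The "Kustomize build success" item can be approximated locally before running o8, because kubectl's built-in `kubectl kustomize <dir>` renders a kustomization directory without contacting the cluster. The sketch below assumes each environment under overlays/ is a kustomization directory, as the generated structure suggests; `check_overlays_build` is a hypothetical name, and the real validator almost certainly checks more than this.

```shell
# Hypothetical pre-check: render every environment overlay with kubectl's
# built-in Kustomize. This covers only the "Kustomize build success" item;
# GPU, storage, and secret checks need cluster access and are not attempted.
check_overlays_build() {
  local dir="${1:-.}" env rc=0
  for env in dev staging production; do
    if kubectl kustomize "$dir/overlays/$env" > /dev/null; then
      echo "ok: overlays/$env"
    else
      echo "FAILED: overlays/$env" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A typical use is `check_overlays_build ./my-rag-app && o8 llama validate ./my-rag-app --verbose`, so that a broken overlay is reported before the full validation runs.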

Examples:

# Validate current directory
o8 llama validate

# Validate a specific workload
o8 llama validate ./my-rag-app --verbose

# Quick validation check
o8 llama validate ./my-app

Sample Output:

✅ AI workload validation passed!

AI Configuration:
├── Providers: openai, anthropic
├── Capabilities: inference, safety, agents
├── GPU Nodes: 2 available
└── Storage: 300Gi required, fast-ssd available

o8 llama deploy

Deploy an AI workload to the cluster.

o8 llama deploy [PATH] [OPTIONS]

Arguments:

path - AI workload path (default: current directory)

Options:

--env, -e - Target environment: dev, staging, production (default: dev)
--dry-run - Show what would be deployed without applying changes
--wait/--no-wait - Wait for deployment completion (default: --wait)

Examples:

# Deploy to development environment
o8 llama deploy

# Preview a production deployment without applying it
o8 llama deploy --env production --dry-run

# Deploy without waiting
o8 llama deploy --env staging --no-wait

Deployment Process:

🔍 Validation - Runs configuration validation
🔧 Build - Generates Kubernetes manifests with Kustomize
🚀 Deploy - Applies manifests to cluster
⏳ Wait - Monitors deployment progress (only with --wait, the default)
✅ Verify - Confirms successful deployment

o8 llama status

Check the status of AI workloads.

o8 llama status [OPTIONS]

Options:

--namespace, -n - Kubernetes namespace (default: llama-stack)
--watch, -w - Watch for changes (continuous monitoring)

Status Information:

📊 Pod Status - Running, pending, failed pods
🔥 GPU Utilization - GPU allocation and usage
🏥 Health Checks - Llama-Stack health and readiness
🤖 Available Models - Configured and loaded models
🔌 Providers - Connected AI providers
💾 Storage - Persistent volume usage

Examples:

# Check status once
o8 llama status

# Monitor continuously
o8 llama status --watch

# Check a specific namespace
o8 llama status --namespace my-ai-app

Sample Output:

✅ Llama-Stack Health: Healthy

AI Workload Pods:
NAME                          READY   STATUS    GPU    AGE
llama-stack-7d4b8f9c8-x2m9p  1/1     Running   1/1    5m

Available Models:
┌─────────────────────────────────────┬──────────────────────┬──────┐
│ Model ID                            │ Provider             │ Type │
├─────────────────────────────────────┼──────────────────────┼──────┤
│ meta-llama/Llama-3.2-3B-Instruct    │ remote::ollama       │ llm  │
│ gpt-4                               │ openai               │ llm  │
└─────────────────────────────────────┴──────────────────────┴──────┘

Configured Providers:
• inference: 3 configured
• safety: 1 configured  
• vector-io: 2 configured
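For scripts that need a pass/fail readiness signal rather than the status screen, the READY column can be checked directly with kubectl, which o8 already relies on (see KUBECONFIG). A minimal sketch, assuming the workload runs in the default llama-stack namespace; `all_pods_ready` is a hypothetical helper, not an o8 command.

```shell
# Hypothetical readiness gate: reads `kubectl get pods --no-headers` output on
# stdin and succeeds only if there is at least one pod and every pod has all
# of its containers ready (READY column "x/y" with x equal to y).
all_pods_ready() {
  awk '
    { split($2, r, "/"); n++ }
    r[1] != r[2] || r[2] == 0 { bad++; print "not ready: " $1 > "/dev/stderr" }
    END { exit (n == 0 || bad > 0) }
  '
}
```

Usage: `kubectl get pods -n llama-stack --no-headers | all_pods_ready && echo "all pods ready"`. Completed Job pods report 0/1 and would be flagged, so filter them out first if the workload runs Jobs.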

o8 llama logs

View AI workload logs.

o8 llama logs [OPTIONS]

Options:

--namespace, -n - Kubernetes namespace (default: llama-stack)
--follow, -f - Follow log output (stream logs)
--tail - Number of most recent log lines to show (default: 100)

Examples:

# View recent logs
o8 llama logs

# Stream logs continuously
o8 llama logs --follow

# Show the last 50 lines
o8 llama logs --tail 50

# View logs from a specific namespace
o8 llama logs --namespace my-ai-app --follow

Log Categories:

🚀 Startup - Application initialization
🔧 Configuration - Provider and model loading
📨 Requests - API request processing
⚠️ Errors - Error conditions and warnings
📊 Metrics - Performance and usage metrics
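To focus on the Errors category, the log stream can be piped through grep. The exact Llama-Stack log format is not documented here, so this sketch simply assumes problem lines contain "error" or "warn" in some capitalization; `problems_only` is a hypothetical helper name.

```shell
# Hypothetical filter: keep only log lines that mention an error or warning,
# case-insensitively (matches ERROR, Error, WARN, WARNING, ...).
problems_only() {
  grep -Ei 'error|warn'
}
```

For example, `o8 llama logs --tail 500 | problems_only` shows only the problem lines from the most recent 500, and `o8 llama logs --follow | problems_only` does the same while streaming.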

o8 llama providers

List and manage AI providers.

o8 llama providers [OPTIONS]

Options:

--namespace, -n - Kubernetes namespace (default: llama-stack)
--type, -t - Filter by provider type, e.g. inference, safety, vector-io, memory (see Provider Types for the full list)
--provider - Show details for a specific provider ID

Provider Types:

inference - LLM providers (OpenAI, Anthropic, Groq, etc.)
safety - Content safety and moderation
vector-io - Vector databases (ChromaDB, Qdrant, etc.)
memory - Persistent memory and context
telemetry - Monitoring and metrics
datasets - Data management

Examples:

# List all providers
o8 llama providers

# Show only inference providers
o8 llama providers --type inference

# Show details for a specific provider
o8 llama providers --type inference --provider openai

Sample Output:

Inference Providers:
┌─────────────────────┬─────────────────────┬────────────────────────────┐
│ Provider ID         │ Type                │ Configuration              │
├─────────────────────┼─────────────────────┼────────────────────────────┤
│ openai              │ openai              │ url: https://api.openai.c… │
│ anthropic           │ anthropic           │ url: https://api.anthropi… │
│ remote::ollama      │ remote::ollama      │ url: http://ollama.platfo… │
└─────────────────────┴─────────────────────┴────────────────────────────┘

Vector-Io Providers:
┌─────────────────────┬─────────────────────┬────────────────────────────┐
│ Provider ID         │ Type                │ Configuration              │
├─────────────────────┼─────────────────────┼────────────────────────────┤
│ chromadb            │ chromadb            │ host: chromadb.llama-stac… │
└─────────────────────┴─────────────────────┴────────────────────────────┘

Global Options

All o8 llama commands support these global options:

--help, -h - Show command help
--verbose, -v - Enable verbose output
--config - Specify custom config file path

Configuration Files

AI Workload Specification (.o8/module.yaml)

apiVersion: orchestr8.platform/v1alpha1
kind: ModuleSpecification
metadata:
  name: my-rag-app
spec:
  module:
    name: my-rag-app
    version: 0.1.0
    tier: enterprise
    description: RAG application with OpenAI integration
    
  requirements:
    compute:
      gpu:
        enabled: true
        type: nvidia.com/gpu
        count: 1
      memory:
        requests: 4Gi
        limits: 16Gi
        
    storage:
      modelCache:
        size: 100Gi
        storageClass: fast-ssd
      vectorStore:
        size: 200Gi
        storageClass: fast-ssd
        
  aiSpecific:
    providers:
      inference: [openai, anthropic]
      vector-io: [chromadb]
    capabilities: [inference, safety, agents, vector-io]
    modelSupport:
      formats: [huggingface, onnx]
      sizes: [small, medium, large]
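The Storage line in the validate sample output (300Gi required) appears to be the total of the two volumes requested under requirements.storage: 100Gi for modelCache plus 200Gi for vectorStore. A hypothetical helper for checking such totals by hand, assuming every size is a whole number with the Gi suffix:

```shell
# Hypothetical helper: add Kubernetes quantities that all use the Gi suffix.
# Anything else (Mi, Ti, G, fractional values, ...) is rejected, not converted.
sum_gi() {
  local q n
  for q in "$@"; do
    n=${q%Gi}
    case "$q" in *Gi) ;; *) n= ;; esac
    case "$n" in
      ''|*[!0-9]*) echo "unsupported quantity: $q" >&2; return 1 ;;
    esac
  done
  # awk reads each leading number as decimal, so "010Gi" counts as 10.
  printf '%s\n' "$@" | awk '{ total += $1 + 0 } END { print total "Gi" }'
}
```

With the sizes from the specification above, `sum_gi 100Gi 200Gi` prints `300Gi`.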

Environment Variables

KUBECONFIG - Path to kubeconfig file
O8_NAMESPACE - Default namespace for operations
O8_CONFIG_DIR - Configuration directory path
O8_LOG_LEVEL - Logging level (DEBUG, INFO, WARN, ERROR)
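How O8_NAMESPACE interacts with the --namespace flag is not spelled out above. A common convention, assumed here rather than confirmed for o8, is explicit flag first, then the environment variable, then the built-in llama-stack default. A wrapper script that calls kubectl directly can apply the same order; `resolve_namespace` is a hypothetical helper.

```shell
# Hypothetical resolution order (an assumption, not documented o8 behavior):
# explicit argument, then $O8_NAMESPACE, then the llama-stack default.
resolve_namespace() {
  if [ -n "${1:-}" ]; then
    echo "$1"
  else
    echo "${O8_NAMESPACE:-llama-stack}"
  fi
}
```

For example, `kubectl get pods -n "$(resolve_namespace "$1")"` in a wrapper script uses the script's first argument if given and otherwise falls back to O8_NAMESPACE or llama-stack.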

Exit Codes

0 - Success
1 - General error
2 - Invalid arguments
3 - Kubernetes connection error
4 - Validation error
5 - Deployment error
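In CI it helps to print what a failing exit code means. A minimal sketch that maps the codes in the table above to their descriptions; `describe_o8_exit` is a hypothetical helper name.

```shell
# Map the documented o8 exit codes to their meanings, e.g. for CI logs.
describe_o8_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Invalid arguments" ;;
    3) echo "Kubernetes connection error" ;;
    4) echo "Validation error" ;;
    5) echo "Deployment error" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}
```

In a CI job, capture `rc=$?` immediately after an `o8 llama` command, print `describe_o8_exit "$rc"` on failure, and exit with the same code, so an unreachable cluster (3) is easy to tell apart from a bad configuration (4).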

Examples

Complete AI Workload Workflow

# 1. Initialize RAG application
o8 llama init customer-support-rag --template rag --provider openai

# 2. Navigate to workload
cd customer-support-rag

# 3. Customize configuration (edit .o8/module.yaml as needed)

# 4. Validate configuration
o8 llama validate --verbose

# 5. Deploy to development
o8 llama deploy --env dev

# 6. Monitor status
o8 llama status --watch

# 7. Check logs if needed
o8 llama logs --follow

# 8. Verify providers
o8 llama providers --type inference

Multi-Environment Deployment

# Deploy to development first
o8 llama deploy --env dev

# Verify in development
o8 llama status

# Preview production deployment
o8 llama deploy --env production --dry-run

# Deploy to production
o8 llama deploy --env production
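The promotion steps above can be sketched as a single shell function that stops at the first failing step and returns that step's exit code. This is a hypothetical script built only from the documented commands and flags; it assumes `o8 llama status` exits non-zero when the workload is unhealthy, which this page does not state.

```shell
# Hypothetical promotion script using only the documented o8 llama commands.
# Each step must exit 0 before the next runs; the first failure stops the run
# and its exit code is returned (see Exit Codes).
promote_to_production() {
  local dir="${1:-.}" answer
  o8 llama validate "$dir" || return
  o8 llama deploy "$dir" --env dev || return
  # Assumption: status exits non-zero when the workload is unhealthy.
  o8 llama status || return
  o8 llama deploy "$dir" --env production --dry-run || return
  # Ask a human to review the dry-run output before anything is applied.
  printf 'Apply to production? [y/N] ' >&2
  read -r answer || return 1
  case "$answer" in
    y|Y|yes) ;;
    *) echo "Aborted." >&2; return 1 ;;
  esac
  o8 llama deploy "$dir" --env production
}
```

Run it from a terminal as `promote_to_production ./customer-support-rag`; any answer other than y, Y, or yes leaves production untouched.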
