atlas/tickets/done/TICKET-018_llm-capacity-assessment.md
ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00


Ticket: LLM Capacity Assessment

Ticket Information

  • ID: TICKET-018
  • Title: LLM Capacity Assessment
  • Type: Research
  • Priority: High
  • Status: In Progress
  • Track: LLM Infra
  • Milestone: Milestone 1 - Survey & Architecture
  • Created: 2024-01-XX

Description

Determine the maximum context window and model parameter count each GPU can support:

  • Assess the 4080's 16GB VRAM capacity (13B-24B models expected to fit comfortably with quantization)
  • Determine the max context window for the 4080
  • Assess the 1050's capacity (smaller models, limited context)
  • Document memory requirements
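
As a rough starting point for the memory requirements above, weight memory can be estimated from parameter count and bits per weight. This is a minimal sketch, not the assessment's actual method: `weight_memory_gb` is a hypothetical helper, the flat 4.0 bits per weight is an assumption (real Q4 formats typically average somewhat more once quantization scales are stored), and KV cache and runtime overhead are excluded.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Model sizes mentioned in this ticket; 4.0 bits/weight is an assumed flat rate.
    for size_b in (3.8, 13, 24, 70):
        print(f"{size_b}B @ 4-bit: ~{weight_memory_gb(size_b, 4.0):.1f} GB weights")
```

Under these assumptions a 24B model lands near 12 GB and a 70B model near 35 GB, which gives a quick sanity check against a 16GB or 4GB card before any runtime measurement.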

Acceptance Criteria

  • VRAM capacity documented for 4080
  • VRAM capacity documented for 1050
  • Max context window determined
  • Model size limits documented
  • Memory requirements recorded in architecture docs

Technical Details

Assessment should cover:

  • 4080: 16GB VRAM, Q4/Q5 quantization
  • 1050: 4GB VRAM, very small models
  • Context window: 4K, 8K, 16K, 32K options
  • Batch size and concurrency limits
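
The context window options listed above mainly affect VRAM through the KV cache, which grows linearly with context length. The sketch below uses the standard per-token accounting (one key and one value tensor per layer). The model shape in `EXAMPLE_SHAPE` is hypothetical and chosen only for illustration, and 2 bytes per value assumes an unquantized fp16 cache; substitute the real figures from the model's config.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache size in bytes for a single sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model shape (assumption, not taken from the ticket).
EXAMPLE_SHAPE = dict(n_layers=40, n_kv_heads=8, head_dim=128)


def kv_cache_gib(context_tokens):
    """KV cache size in GiB for EXAMPLE_SHAPE at the given context length."""
    return kv_cache_bytes(context_tokens=context_tokens, **EXAMPLE_SHAPE) / 2**30


if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        print(f"{ctx:>6} tokens: {kv_cache_gib(ctx):.3f} GiB")
```

Because the relationship is linear, doubling the context window doubles the cache, so the largest usable option is whatever still fits after the weights and runtime overhead are subtracted from total VRAM.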

Dependencies

  • TICKET-017 (model survey)
  • docs/LLM_CAPACITY.md (to be created)
  • ARCHITECTURE.md

Notes

Critical for model selection. Should be done early.

Progress Log

  • 2024-01-XX - Capacity assessment document created
  • 2024-01-XX - VRAM limits determined:
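    • 4080: 70B Q4 logged as fitting comfortably (~14GB), max 8K context; 70B weights at 4 bits per weight come to ~35GB, so this entry conflicts with the 13B-24B range in the Description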
    • 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
  • 2024-01-XX - Concurrency limits documented (2 requests for 4080, 1-2 for 1050)
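
A concurrency limit like the ones logged above can be reasoned about as a simple budget: whatever VRAM remains after weights and fixed overhead is divided by the KV cache one request needs. This is a sketch under stated assumptions, not the procedure used in the assessment. `max_concurrent_requests` is a hypothetical helper, and the 0.5 GB overhead and 0.6 GB per-request cache are illustrative placeholders; only the VRAM and weight figures come from the log.

```python
def max_concurrent_requests(vram_gb, weights_gb, overhead_gb, kv_per_request_gb):
    """Number of requests whose KV caches fit in the VRAM left after weights and overhead."""
    if kv_per_request_gb <= 0:
        raise ValueError("kv_per_request_gb must be positive")
    free_gb = vram_gb - weights_gb - overhead_gb
    if free_gb <= 0:
        return 0
    return int(free_gb // kv_per_request_gb)


if __name__ == "__main__":
    # VRAM and weight sizes from the progress log; overhead and per-request
    # KV cache are assumed values for illustration only.
    print("4080:", max_concurrent_requests(16, 14, 0.5, 0.6))
    print("1050:", max_concurrent_requests(4, 2.5, 0.5, 0.6))
```

Measured limits can differ from this estimate, since allocator fragmentation and activation buffers also consume memory.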