ilia/atlas

ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP

- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.

2026-01-05 23:44:16 -05:00


Ticket: LLM Capacity Assessment

Ticket Information

ID: TICKET-018
Title: LLM Capacity Assessment
Type: Research
Priority: High
Status: In Progress
Track: LLM Infra
Milestone: Milestone 1 - Survey & Architecture
Created: 2024-01-XX

Description

Determine the maximum context window and model parameter count each GPU can support:

- Assess capacity of the 16GB VRAM on the 4080 (13B-24B models expected to fit comfortably with quantization)
- Determine the maximum context window for the 4080
- Assess capacity of the 1050 (smaller models, limited context)
- Document memory requirements
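
As a rough first pass at the parameter-size question, the memory taken by the weights alone can be estimated as parameter count times bits per weight. The sketch below assumes an idealized 4.0 bits per weight for Q4; real quantized files carry extra scale metadata and usually come out somewhat larger. The function names are illustrative and not part of any existing tool in this repo.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to hold the model weights alone."""
    return n_params * bits_per_weight / 8


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Same estimate in decimal gigabytes (1 GB = 1e9 bytes)."""
    return weight_bytes(n_params, bits_per_weight) / 1e9


if __name__ == "__main__":
    # Parameter counts are the sizes named in this ticket.
    # 4.0 bits/weight is an idealized Q4 (assumption, not a measurement).
    sizes = [("3.8B", 3.8e9), ("13B", 13e9), ("24B", 24e9), ("70B", 70e9)]
    for label, n_params in sizes:
        print(f"{label}: ~{weight_gb(n_params, 4.0):.1f} GB at 4 bits/weight")
```

Under this estimate, 24B parameters at 4 bits per weight come to about 12 GB and 70B to about 35 GB, before any KV cache or runtime overhead is counted.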

Acceptance Criteria

- VRAM capacity documented for 4080
- VRAM capacity documented for 1050
- Max context window determined
- Model size limits documented
- Memory requirements added to architecture docs

Technical Details

Assessment should cover:

- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models
- Context window: 4K, 8K, 16K, 32K options
- Batch size and concurrency limits
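
The context-window and concurrency items above are linked through the KV cache, which grows linearly with context length and with the number of simultaneous requests. The following is a minimal sketch of that arithmetic, assuming a standard transformer with an fp16 KV cache (2 bytes per value). The model shape (40 layers, 8 KV heads, head dimension 128) and the 8 GiB weights and 1 GiB overhead figures are hypothetical placeholders, not measurements from this project.

```python
import math


def kv_cache_bytes(context_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence: one key and one value vector
    per KV head, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


def max_concurrent_requests(vram_bytes: float, weights_bytes: float,
                            overhead_bytes: float,
                            kv_bytes_per_request: float) -> int:
    """Number of full-length sequences that fit in the VRAM left over
    after weights and fixed runtime overhead."""
    free = vram_bytes - weights_bytes - overhead_bytes
    if free <= 0:
        return 0
    return math.floor(free / kv_bytes_per_request)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model shape, for illustration only.
    shape = dict(n_layers=40, n_kv_heads=8, head_dim=128)
    # Context options are the ones listed in this ticket.
    for ctx in (4096, 8192, 16384, 32768):
        kv = kv_cache_bytes(ctx, **shape)
        # 16 GiB card, 8 GiB weights, 1 GiB overhead: placeholder values.
        n = max_concurrent_requests(16 * GIB, 8 * GIB, 1 * GIB, kv)
        print(f"{ctx:>6} tokens: {kv / GIB:.2f} GiB per request, up to {n} concurrent")
```

Swapping in the real layer count, KV head count, and head dimension from a candidate model's config gives the per-request cost for that model. A quantized KV cache would lower `bytes_per_value` accordingly.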

Dependencies

- TICKET-017 (model survey)

Related Files

- docs/LLM_CAPACITY.md (to be created)
- ARCHITECTURE.md

Notes

Critical for model selection. Should be done early.

Progress Log

2024-01-XX - Capacity assessment document created
2024-01-XX - VRAM limits determined:
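- 4080: 70B Q4 logged as fitting comfortably (~14GB), max 8K context. Needs re-verification: 70B weights at roughly 4 bits per parameter come to about 35GB, which exceeds the 4080's 16GB and conflicts with the 13B-24B range in the Description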
- 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
2024-01-XX - Concurrency limits documented (2 requests for 4080, 1-2 for 1050)
