atlas/docs/MEMORY_DESIGN.md

# Long-Term Memory Design

This document describes the design of the long-term memory system for the Atlas voice agent.

## Overview

The memory system persistently stores facts about the user (personal details, preferences, routines, and other important information) so that they can be recalled across conversations.

## Goals

1. **Persistent Storage**: Facts survive across sessions and restarts
2. **Fast Retrieval**: Quick lookup of relevant facts during conversations
3. **Confidence Scoring**: Track how certain we are about each fact
4. **Source Tracking**: Know where each fact came from
5. **Privacy**: Memory is local-only, no external storage

## Data Model

### Memory Entry Schema

```python
{
    "id": "uuid",
    "category": "preferences",  # One of: personal, family, preferences, routines, facts
    "key": "fact_key",  # e.g., "favorite_color", "morning_routine"
    "value": "fact_value",  # e.g., "blue", "coffee at 7am"
    "confidence": 0.8,  # Float from 0.0 to 1.0: how certain we are
    "source": "inferred",  # One of: conversation, explicit, inferred, confirmed
    "timestamp": "ISO8601",
    "last_accessed": "ISO8601",
    "access_count": 0,
    "tags": ["tag1", "tag2"],  # For categorization
    "context": "additional context about the fact"
}
```
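
As an illustration, the schema above could be mirrored by a small typed record. This is a minimal sketch, not the actual Atlas implementation: the `MemoryEntry` class name, the `CATEGORIES` set, and the validation rules are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Categories listed in this document.
CATEGORIES = {"personal", "family", "preferences", "routines", "facts"}


def _now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class MemoryEntry:
    """Hypothetical in-memory representation of one memory row."""

    category: str
    key: str
    value: str
    source: str
    confidence: float = 0.5
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_now_iso)
    last_accessed: Optional[str] = None
    access_count: int = 0
    tags: List[str] = field(default_factory=list)
    context: str = ""

    def __post_init__(self) -> None:
        # Reject values the schema does not allow.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Validating at construction time keeps malformed entries out of the database, at the cost of duplicating the category list in code.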

### Categories

- **personal**: Personal facts (name, age, location, etc.)
- **family**: Family member information
- **preferences**: User preferences (favorite foods, colors, etc.)
- **routines**: Daily/weekly routines
- **facts**: General facts about the user

## Storage

### SQLite Database

**Table: `memory`**

```sql
CREATE TABLE memory (
    id TEXT PRIMARY KEY,
    category TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    confidence REAL DEFAULT 0.5,
    source TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    last_accessed TEXT,
    access_count INTEGER DEFAULT 0,
    tags TEXT,  -- JSON array
    context TEXT,
    UNIQUE(category, key)
);
```

**Indexes**:
- `(category, key)` - For fast lookups (SQLite creates this index automatically for the `UNIQUE(category, key)` constraint)
- `category` - For category-based queries
- `last_accessed` - For relevance ranking
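
The table and indexes above could be created with Python's built-in `sqlite3` module roughly as follows. This is a sketch under assumptions: the index names (`idx_memory_category`, `idx_memory_last_accessed`) and the `init_db` helper are hypothetical. No explicit index is created for `(category, key)` because SQLite already builds one for the `UNIQUE` constraint.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS memory (
    id TEXT PRIMARY KEY,
    category TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    confidence REAL DEFAULT 0.5,
    source TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    last_accessed TEXT,
    access_count INTEGER DEFAULT 0,
    tags TEXT,  -- JSON array
    context TEXT,
    UNIQUE(category, key)
);
-- Index names below are illustrative, not taken from the Atlas codebase.
CREATE INDEX IF NOT EXISTS idx_memory_category ON memory(category);
CREATE INDEX IF NOT EXISTS idx_memory_last_accessed ON memory(last_accessed);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```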

## Memory Write Policy

### When Memory Can Be Written

1. **Explicit User Statement**: "My favorite color is blue"
   - Confidence: 1.0
   - Source: "explicit"

2. **Inferred from Conversation**: "I always have coffee at 7am"
   - Confidence: 0.7-0.9
   - Source: "inferred"

3. **Confirmed Inference**: User confirms inferred fact
   - Confidence: 0.9-1.0
   - Source: "confirmed"

### When Memory Should NOT Be Written

- Uncertain information (confidence < 0.5)
- Temporary information (e.g., "I'm tired today")
- Work-related information (for family agent)
- Information from unreliable sources
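
The exclusion rules above could be expressed as a single gate that runs before any write. This is a minimal sketch: the `should_write` function and its boolean flags are hypothetical, and how the agent would actually detect temporary or work-related statements is outside the scope of this example.

```python
MIN_CONFIDENCE = 0.5  # Facts below this threshold are not stored.


def should_write(
    confidence: float,
    is_temporary: bool = False,
    is_work_related: bool = False,
    source_reliable: bool = True,
) -> bool:
    """Return True only if a candidate fact passes every write rule."""
    if confidence < MIN_CONFIDENCE:
        return False  # Uncertain information
    if is_temporary:
        return False  # e.g., "I'm tired today"
    if is_work_related:
        return False  # Work-related information (for family agent)
    if not source_reliable:
        return False  # Information from unreliable sources
    return True
```

For example, `should_write(0.8)` allows the write, while `should_write(0.8, is_temporary=True)` rejects it.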

## Retrieval Strategy

### Query Types

1. **By Key**: Direct lookup by category + key
2. **By Category**: All facts in a category
3. **By Tag**: Facts with specific tags
4. **Semantic Search**: Search by value/content (future: embeddings)
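
The four query types could map onto SQL along these lines. The sketch uses a reduced table and hypothetical helper names (`by_key`, `by_category`, `by_tag`, `search`). Tag matching relies on a crude `LIKE` against the JSON-encoded `tags` column, and the search is a plain substring match standing in for the future embedding-based search.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the memory table, enough to demonstrate the queries.
conn.execute("CREATE TABLE memory (category TEXT, key TEXT, value TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO memory VALUES (?, ?, ?, ?)",
    [
        ("preferences", "favorite_color", "blue", json.dumps(["color"])),
        ("routines", "morning_routine", "coffee at 7am", json.dumps(["morning", "coffee"])),
    ],
)


def by_key(category, key):
    """Direct lookup by category + key."""
    return conn.execute(
        "SELECT key, value FROM memory WHERE category = ? AND key = ?",
        (category, key),
    ).fetchone()


def by_category(category):
    """All facts in a category."""
    return conn.execute(
        "SELECT key, value FROM memory WHERE category = ?", (category,)
    ).fetchall()


def by_tag(tag):
    """Facts whose JSON tag array contains the quoted tag string."""
    pattern = f"%{json.dumps(tag)}%"  # e.g. %"morning"%
    return conn.execute(
        "SELECT key, value FROM memory WHERE tags LIKE ?", (pattern,)
    ).fetchall()


def search(text):
    """Substring search over values (placeholder for semantic search)."""
    return conn.execute(
        "SELECT key, value FROM memory WHERE value LIKE ?", (f"%{text}%",)
    ).fetchall()
```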

### Relevance Ranking

Facts are ranked by:
1. **Recency**: Recently accessed facts are more relevant
2. **Confidence**: Higher confidence facts preferred
3. **Access Count**: Frequently accessed facts are important
4. **Category Match**: Category relevance to query
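
One way to combine these four signals is a weighted sum of normalized components. The sketch below is illustrative only: the weights, the 30-day half-life, and the `relevance_score` function are assumptions, not values specified by this design.

```python
from datetime import datetime, timezone
from typing import Optional


def relevance_score(
    confidence: float,
    access_count: int,
    last_accessed: Optional[str],
    category_match: bool,
    now: Optional[datetime] = None,
    half_life_days: float = 30.0,
) -> float:
    """Score a fact in [0, 1]; higher means more relevant."""
    now = now or datetime.now(timezone.utc)

    # Recency: halves every `half_life_days` since the last access.
    if last_accessed is None:
        recency = 0.0
    else:
        ts = datetime.fromisoformat(last_accessed)
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # Assume UTC for naive timestamps
        age_days = max((now - ts).total_seconds() / 86400.0, 0.0)
        recency = 0.5 ** (age_days / half_life_days)

    # Frequency: saturates toward 1.0 as access_count grows.
    frequency = 1.0 - 1.0 / (1.0 + access_count)

    # Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
    return (
        0.4 * recency
        + 0.3 * confidence
        + 0.2 * frequency
        + 0.1 * (1.0 if category_match else 0.0)
    )
```

The exponential recency term and saturating frequency term keep any single signal from dominating. The weights would need tuning against real conversations.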

### Integration with LLM

Memory facts are injected into prompts as context:

```
## User Memory

Preferences:
- Favorite color: blue (confidence: 1.0, source: explicit)
- Prefers metric units (confidence: 0.9, source: explicit)

Routines:
- Morning routine: coffee at 7am (confidence: 0.8, source: inferred)
```
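
A block like the one above could be rendered from a list of fact dictionaries. This is a sketch only: the `format_memory_block` name and the rule for turning keys into labels are assumptions made for illustration.

```python
from collections import defaultdict


def format_memory_block(facts):
    """Render facts as a prompt section, grouped by category."""
    grouped = defaultdict(list)
    for fact in facts:
        grouped[fact["category"]].append(fact)

    lines = ["## User Memory", ""]
    for category, items in grouped.items():
        lines.append(f"{category.capitalize()}:")
        for fact in items:
            # "favorite_color" -> "Favorite color"
            label = fact["key"].replace("_", " ").capitalize()
            lines.append(
                f"- {label}: {fact['value']} "
                f"(confidence: {fact['confidence']}, source: {fact['source']})"
            )
        lines.append("")  # Blank line between categories
    return "\n".join(lines).rstrip()
```

Including confidence and source in each line lets the LLM weigh inferred facts more cautiously than explicit ones.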

## API Design

### Write Operations

```python
# Store explicit fact
memory.store(
    category="preferences",
    key="favorite_color",
    value="blue",
    confidence=1.0,
    source="explicit"
)

# Store inferred fact
memory.store(
    category="routines",
    key="morning_routine",
    value="coffee at 7am",
    confidence=0.8,
    source="inferred"
)
```

### Read Operations

```python
# Get specific fact
fact = memory.get(category="preferences", key="favorite_color")

# Get all facts in category
facts = memory.get_by_category("preferences")

# Search facts
facts = memory.search(query="coffee", category="routines")
```

### Update and Delete Operations

```python
# Update confidence
memory.update_confidence(id="uuid", confidence=0.9)

# Update value
memory.update_value(id="uuid", value="new_value", confidence=1.0)

# Delete fact
memory.delete(id="uuid")
```
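
To make the API above concrete, here is a minimal sketch of how `store`, `get`, and `delete` might sit on top of SQLite. It is not the actual Atlas implementation. The `MemoryStore` class is hypothetical, and two behaviors are assumptions: storing an existing `(category, key)` pair overwrites it (via `ON CONFLICT`, which needs SQLite 3.24 or newer), and `get` bumps `last_accessed` and `access_count`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


class MemoryStore:
    """Hypothetical thin wrapper around the memory table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS memory (
                   id TEXT PRIMARY KEY, category TEXT NOT NULL, key TEXT NOT NULL,
                   value TEXT NOT NULL, confidence REAL DEFAULT 0.5,
                   source TEXT NOT NULL, timestamp TEXT NOT NULL,
                   last_accessed TEXT, access_count INTEGER DEFAULT 0,
                   tags TEXT, context TEXT, UNIQUE(category, key))"""
        )

    def store(self, category, key, value, confidence=0.5, source="inferred",
              tags=None, context=None):
        """Insert a fact, or overwrite the existing one for (category, key)."""
        now = datetime.now(timezone.utc).isoformat()
        self.conn.execute(
            """INSERT INTO memory
                   (id, category, key, value, confidence, source, timestamp, tags, context)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
               ON CONFLICT(category, key) DO UPDATE SET
                   value = excluded.value, confidence = excluded.confidence,
                   source = excluded.source, timestamp = excluded.timestamp,
                   tags = excluded.tags, context = excluded.context""",
            (str(uuid.uuid4()), category, key, value, confidence, source, now,
             json.dumps(tags or []), context),
        )
        self.conn.commit()

    def get(self, category, key):
        """Return the fact as a dict (or None) and record the access."""
        now = datetime.now(timezone.utc).isoformat()
        self.conn.execute(
            "UPDATE memory SET last_accessed = ?, access_count = access_count + 1 "
            "WHERE category = ? AND key = ?",
            (now, category, key),
        )
        self.conn.commit()
        row = self.conn.execute(
            "SELECT * FROM memory WHERE category = ? AND key = ?", (category, key)
        ).fetchone()
        return dict(row) if row else None

    def delete(self, id):
        """Remove a fact by its id."""
        self.conn.execute("DELETE FROM memory WHERE id = ?", (id,))
        self.conn.commit()
```

Overwriting on conflict keeps one row per `(category, key)`, which matches the `UNIQUE` constraint in the schema. A real implementation might instead compare confidences before replacing an existing fact.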

## Privacy Considerations

1. **Local Storage Only**: All memory stored locally in SQLite
2. **No External Sync**: No cloud backup or sync
3. **User Control**: Users can view, edit, and delete all memory
4. **Category Separation**: Work and family memory are kept separate
5. **Deletion Tools**: Easy memory deletion and export

## Future Enhancements

1. **Embeddings**: Semantic search using embeddings
2. **Memory Summarization**: Compress old facts into summaries
3. **Confidence Decay**: Reduce confidence over time if not accessed
4. **Memory Conflicts**: Handle conflicting facts
5. **Memory Validation**: Periodic validation of stored facts
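
As one example of how the confidence decay idea might look, the sketch below halves a fact's confidence for every fixed period without access. The `decayed_confidence` function, the 180-day half-life, and the optional floor are all assumptions, since decay is not yet specified in this design.

```python
def decayed_confidence(
    confidence: float,
    days_since_access: float,
    half_life_days: float = 180.0,
    floor: float = 0.0,
) -> float:
    """Exponentially decay confidence based on time since last access."""
    elapsed = max(days_since_access, 0.0)  # Ignore negative (clock skew) values
    factor = 0.5 ** (elapsed / half_life_days)
    return max(confidence * factor, floor)
```

With these defaults, a fact stored at confidence 0.8 and untouched for 180 days would drop to 0.4, which falls below the 0.5 write threshold and could trigger revalidation.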

## Integration Points

1. **LLM Prompts**: Inject relevant memory into system prompts
2. **Conversation Manager**: Track when facts are mentioned
3. **Tool Calls**: Tools can read/write memory
4. **Admin UI**: View and manage memory