# Why Hugging Face? Can We Avoid It?

## Short Answer

**You don't HAVE to use Hugging Face**, but it's the easiest way. Here's why it's commonly used and what alternatives exist.

## Why Hugging Face is Used

### 1. **Model Distribution Platform**

- Hugging Face Hub is where most open-source models (Llama, etc.) are hosted
- When you specify `"meta-llama/Llama-3.1-8B-Instruct"`, AirLLM automatically downloads it from Hugging Face
- It's the standard repository that everyone uses

### 2. **Gated Models (Like Llama)**

- Llama models are "gated" - they require:
  - Accepting Meta's license terms
  - A Hugging Face account
  - A token to authenticate
- This is **Meta's requirement**, not Hugging Face's
- The token proves you've accepted the license

### 3. **Convenience**

- Automatic downloads
- Version management
- Easy model discovery

## Alternatives: How to Avoid Hugging Face

### Option 1: Use Local Model Files (No HF Token Needed!)

If you already have the model downloaded locally, you can use it directly:

**1. Download the model manually** (one-time, can use `git lfs` or `huggingface-cli`):

```bash
# Using huggingface-cli (still needs a token for gated models, but only once)
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

# Or using git lfs
git lfs clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct ~/models/llama-3.1-8b
```
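
Before pointing the config at the directory, it can help to sanity-check that the download is complete. A minimal sketch, assuming the usual transformers checkpoint layout (`config.json` plus safetensors or `.bin` weight files); `looks_like_model_dir` is a hypothetical helper, not part of AirLLM or nanobot:

```python
from pathlib import Path

def looks_like_model_dir(path: str) -> bool:
    """Rough completeness check for a transformers-style checkpoint directory:
    a config.json plus at least one weight file (safetensors or pytorch .bin)."""
    p = Path(path).expanduser()
    if not (p / "config.json").is_file():
        return False
    return any(p.glob("*.safetensors")) or any(p.glob("pytorch_model*.bin"))
```

An interrupted `huggingface-cli download` can leave a directory that exists but has no weights, which this catches early.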
**2. Use local path in config**:

```json
{
  "providers": {
    "airllm": {
      "apiKey": "/home/youruser/models/llama-3.1-8b"
    }
  },
  "agents": {
    "defaults": {
      "model": "/home/youruser/models/llama-3.1-8b"
    }
  }
}
```

**Note**: AirLLM's `AutoModel.from_pretrained()` accepts local paths! Just use the full path instead of the model ID.
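
The decision being made here - local directory vs. Hub repo ID - can be sketched with a hypothetical helper (not part of AirLLM's API):

```python
import os

def resolve_model(ref: str) -> tuple[str, str]:
    """Classify a model reference: an existing directory is used as-is,
    anything else is treated as a Hub repo ID to download."""
    expanded = os.path.expanduser(ref)
    if os.path.isdir(expanded):
        return ("local", os.path.abspath(expanded))
    return ("hub", ref)
```

So `resolve_model("/home/youruser/models/llama-3.1-8b")` short-circuits the download entirely, while `resolve_model("meta-llama/Llama-3.1-8B-Instruct")` would still go through the Hub.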

### Option 2: Use Ollama (No HF at All!)

Ollama manages models for you and doesn't require Hugging Face:

**1. Install Ollama**: https://ollama.ai

**2. Pull a model**:

```bash
ollama pull llama3.1:8b
```

**3. Configure nanobot**:

```json
{
  "providers": {
    "ollama": {
      "apiKey": "dummy",
      "apiBase": "http://localhost:11434/v1"
    }
  },
  "agents": {
    "defaults": {
      "model": "llama3.1:8b"
    }
  }
}
```
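
The `apiBase` above is an OpenAI-compatible endpoint, so any OpenAI-style client can talk to it. A stdlib-only sketch, with the URL and model name taken from the config above (the dummy key is just a placeholder - Ollama does no authentication):

```python
import json
import urllib.request

def build_chat_request(api_base: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",
        },
    )

# With a running Ollama server:
# with urllib.request.urlopen(build_chat_request(
#         "http://localhost:11434/v1", "llama3.1:8b", "Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```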

### Option 3: Use vLLM (Local Server)

**1. Download model once** (with or without HF token):

```bash
# With HF token
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

# Or manually download from other sources
```

**2. Start vLLM server**:

```bash
# --served-model-name makes the model ID match the nanobot config below;
# without it, vLLM registers the model under the path you passed
vllm serve ~/models/llama-3.1-8b --port 8000 --served-model-name llama-3.1-8b
```

**3. Configure nanobot**:

```json
{
  "providers": {
    "vllm": {
      "apiKey": "dummy",
      "apiBase": "http://localhost:8000/v1"
    }
  },
  "agents": {
    "defaults": {
      "model": "llama-3.1-8b"
    }
  }
}
```
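
If the `model` name in the config doesn't match what the server registered, the OpenAI-compatible `/v1/models` endpoint reports the exact ID being served. A stdlib sketch, with the endpoint taken from the config above:

```python
import json
import urllib.request

def list_models_request(api_base: str) -> urllib.request.Request:
    """Build a GET request for the server's model list (/v1/models)."""
    return urllib.request.Request(
        f"{api_base}/models",
        headers={"Authorization": "Bearer dummy"},
    )

# With a running vLLM server:
# with urllib.request.urlopen(list_models_request("http://localhost:8000/v1")) as resp:
#     print([m["id"] for m in json.load(resp)["data"]])
```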

## Why You Might Still Need an HF Token

Even if you want to avoid Hugging Face long-term, you might need it **once** to:

- Download the model initially
- Accept the license for gated models (Llama)

After that, you can use the local files and never touch Hugging Face again!

## Recommendation

**For Llama models specifically:**

1. **Get HF token once** (5 minutes) - just to download and accept the license
2. **Download model locally** - use `huggingface-cli` or `git lfs`
3. **Use local path** - configure nanobot to use the local directory
4. **Never need HF again** - the model runs completely offline

This gives you:

- ✅ No ongoing dependency on Hugging Face
- ✅ Faster startup (no downloads)
- ✅ Works offline
- ✅ Full control

## Summary
- **Hugging Face is required** for: Downloading models initially, accessing gated models
- **Hugging Face is NOT required** for: Running models after download, using local files, using Ollama/vLLM
- **Best approach**: Download once with HF token, then use local files forever