# Why Hugging Face? Can We Avoid It?

## Short Answer

You don't HAVE to use Hugging Face, but it's the easiest way. Here's why it's commonly used and what alternatives exist.

## Why Hugging Face Is Used
### 1. Model Distribution Platform

- Hugging Face Hub is where most open-source models (Llama, etc.) are hosted
- When you specify `meta-llama/Llama-3.1-8B-Instruct`, AirLLM automatically downloads it from Hugging Face
- It's the standard repository that everyone uses
### 2. Gated Models (Like Llama)

- Llama models are "gated" - they require:
  - Accepting Meta's license terms
  - A Hugging Face account
  - A token to authenticate
- This is Meta's requirement, not Hugging Face's
- The token proves you've accepted the license
### 3. Convenience

- Automatic downloads
- Version management
- Easy model discovery
## Alternatives: How to Avoid Hugging Face

### Option 1: Use Local Model Files (No HF Token Needed!)

If you already have the model downloaded locally, you can use it directly:

1. Download the model manually (one-time; can use `git lfs` or `huggingface-cli`):

   ```bash
   # Using huggingface-cli (still needs a token, but only once)
   huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

   # Or using git lfs
   git lfs clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct ~/models/llama-3.1-8b
   ```
2. Use the local path in your config:

   ```json
   {
     "providers": {
       "airllm": {
         "apiKey": "/home/youruser/models/llama-3.1-8b"
       }
     },
     "agents": {
       "defaults": {
         "model": "/home/youruser/models/llama-3.1-8b"
       }
     }
   }
   ```

Note: AirLLM's `AutoModel.from_pretrained()` accepts local paths! Just use the full path instead of the model ID.
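Before pointing the config at a local directory, it can help to sanity-check the path so a typo fails fast instead of triggering a Hub lookup. A minimal sketch (a hypothetical helper, not part of AirLLM; the file names are the usual Hugging Face conventions, and exact weight file names vary by model):

```python
from pathlib import Path

def looks_like_local_model(model_dir: str) -> bool:
    """Rough check that a directory holds a Hugging Face-format model:
    a config.json plus at least one weight file (safetensors shards or
    legacy pytorch_model*.bin)."""
    d = Path(model_dir).expanduser()
    if not (d / "config.json").is_file():
        return False
    # Weights are usually *.safetensors shards or pytorch_model*.bin files.
    return any(d.glob("*.safetensors")) or any(d.glob("pytorch_model*.bin"))
```

If this returns True for `/home/youruser/models/llama-3.1-8b`, passing that path to `AutoModel.from_pretrained()` should load entirely from disk.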
### Option 2: Use Ollama (No HF at All!)

Ollama manages models for you and doesn't require Hugging Face:

1. Install Ollama: https://ollama.ai
2. Pull a model:

   ```bash
   ollama pull llama3.1:8b
   ```
3. Configure nanobot:

   ```json
   {
     "providers": {
       "ollama": {
         "apiKey": "dummy",
         "apiBase": "http://localhost:11434/v1"
       }
     },
     "agents": {
       "defaults": {
         "model": "llama3.1:8b"
       }
     }
   }
   ```
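The reason the `apiBase` ends in `/v1` is that Ollama exposes an OpenAI-compatible API there, so any OpenAI-style client works against it. A stdlib-only sketch of the request such a client sends (the "dummy" key matches the config above; Ollama ignores its value):

```python
import json
from urllib import request

def chat_request(api_base: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style /chat/completions request for an
    OpenAI-compatible server such as Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OpenAI clients require a key; Ollama accepts any value.
            "Authorization": "Bearer dummy",
        },
    )

req = chat_request("http://localhost:11434/v1", "llama3.1:8b", "Hello!")
# With Ollama running, send it:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```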
### Option 3: Use vLLM (Local Server)

1. Download the model once (with or without an HF token):

   ```bash
   # With an HF token
   huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

   # Or download manually from other sources
   ```

2. Start the vLLM server:

   ```bash
   vllm serve ~/models/llama-3.1-8b --port 8000
   ```
3. Configure nanobot:

   ```json
   {
     "providers": {
       "vllm": {
         "apiKey": "dummy",
         "apiBase": "http://localhost:8000/v1"
       }
     },
     "agents": {
       "defaults": {
         "model": "llama-3.1-8b"
       }
     }
   }
   ```
## Why You Might Still Need an HF Token

Even if you want to avoid Hugging Face long-term, you might need it once to:

- Download the model initially
- Accept the license for gated models (Llama)

After that, you can use the local files and never touch Hugging Face again!
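You can even make "never touch Hugging Face again" enforceable: `huggingface_hub` and `transformers` honor offline environment variables, so any accidental Hub access raises an error instead of silently downloading. A short sketch:

```python
import os

# With these set, HF libraries refuse all network access and only read
# from local files/cache - a quick way to verify the model truly runs offline.
os.environ["HF_HUB_OFFLINE"] = "1"        # honored by huggingface_hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # honored by transformers

# Then load strictly from disk, e.g.:
# model = AutoModel.from_pretrained("/home/youruser/models/llama-3.1-8b")
```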
## Recommendation

For Llama models specifically:

1. Get an HF token once (5 minutes) - just to download the model and accept the license
2. Download the model locally - use `huggingface-cli` or `git lfs`
3. Use the local path - configure nanobot to use the local directory
4. Never need HF again - the model runs completely offline
This gives you:

- ✅ No ongoing dependency on Hugging Face
- ✅ Faster startup (no downloads)
- ✅ Works offline
- ✅ Full control
## Summary

- Hugging Face is required for: downloading models initially, accessing gated models
- Hugging Face is NOT required for: running models after download, using local files, using Ollama/vLLM
- Best approach: download once with an HF token, then use local files forever