# Why Hugging Face? Can We Avoid It?

## Short Answer

You don't HAVE to use Hugging Face, but it's the easiest way. Here's why it's commonly used and what alternatives exist.

## Why Hugging Face Is Used
### 1. Model Distribution Platform

- Hugging Face Hub is where most open-source models (Llama, etc.) are hosted
- When you specify `meta-llama/Llama-3.1-8B-Instruct`, AirLLM automatically downloads it from Hugging Face
- It's the standard repository that everyone uses
### 2. Gated Models (Like Llama)

- Llama models are "gated" - they require:
  - Accepting Meta's license terms
  - A Hugging Face account
  - A token to authenticate
- This is Meta's requirement, not Hugging Face's
- The token proves you've accepted the license
### 3. Convenience

- Automatic downloads
- Version management
- Easy model discovery
## Alternatives: How to Avoid Hugging Face

### Option 1: Use Local Model Files (No HF Token Needed!)

If you already have the model downloaded locally, you can use it directly:

1. Download the model manually (one-time; can use `git lfs` or `huggingface-cli`):

   ```bash
   # Using huggingface-cli (still needs a token, but only once)
   huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

   # Or using git lfs
   git lfs clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct ~/models/llama-3.1-8b
   ```
2. Use the local path in your config:

   ```json
   {
     "providers": {
       "airllm": {
         "apiKey": "/home/youruser/models/llama-3.1-8b"
       }
     },
     "agents": {
       "defaults": {
         "model": "/home/youruser/models/llama-3.1-8b"
       }
     }
   }
   ```

Note: AirLLM's `AutoModel.from_pretrained()` accepts local paths! Just use the full path instead of the model ID.
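Before pointing the config at a local directory, it can help to sanity-check the path so a typo fails fast instead of triggering a Hub lookup. A minimal sketch (a hypothetical helper, not part of AirLLM; the file names are the usual Hugging Face conventions, and exact weight file names vary by model):

```python
from pathlib import Path

def looks_like_local_model(model_dir: str) -> bool:
    """Rough check that a directory holds a Hugging Face-format model:
    a config.json plus at least one weight file (safetensors shards or
    legacy pytorch_model*.bin)."""
    d = Path(model_dir).expanduser()
    if not (d / "config.json").is_file():
        return False
    # Weights are usually *.safetensors shards or pytorch_model*.bin files.
    return any(d.glob("*.safetensors")) or any(d.glob("pytorch_model*.bin"))
```

If this returns True for `/home/youruser/models/llama-3.1-8b`, passing that path to `AutoModel.from_pretrained()` should load entirely from disk.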
### Option 2: Use Ollama (No HF at All!)

Ollama manages models for you and doesn't require Hugging Face:

1. Install Ollama: https://ollama.ai
2. Pull a model:

   ```bash
   ollama pull llama3.1:8b
   ```
3. Configure nanobot:

   ```json
   {
     "providers": {
       "ollama": {
         "apiKey": "dummy",
         "apiBase": "http://localhost:11434/v1"
       }
     },
     "agents": {
       "defaults": {
         "model": "llama3.1:8b"
       }
     }
   }
   ```
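The reason the `apiBase` ends in `/v1` is that Ollama exposes an OpenAI-compatible API there, so any OpenAI-style client works against it. A stdlib-only sketch of the request such a client sends (the "dummy" key matches the config above; Ollama ignores its value):

```python
import json
from urllib import request

def chat_request(api_base: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style /chat/completions request for an
    OpenAI-compatible server such as Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OpenAI clients require a key; Ollama accepts any value.
            "Authorization": "Bearer dummy",
        },
    )

req = chat_request("http://localhost:11434/v1", "llama3.1:8b", "Hello!")
# With Ollama running, send it:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```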
### Option 3: Use vLLM (Local Server)

1. Download the model once (with or without an HF token):

   ```bash
   # With an HF token
   huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/llama-3.1-8b

   # Or download manually from other sources
   ```

2. Start the vLLM server:

   ```bash
   vllm serve ~/models/llama-3.1-8b --port 8000
   ```
3. Configure nanobot:

   ```json
   {
     "providers": {
       "vllm": {
         "apiKey": "dummy",
         "apiBase": "http://localhost:8000/v1"
       }
     },
     "agents": {
       "defaults": {
         "model": "llama-3.1-8b"
       }
     }
   }
   ```
## Why You Might Still Need an HF Token

Even if you want to avoid Hugging Face long-term, you might need it once to:

- Download the model initially
- Accept the license for gated models (Llama)

After that, you can use the local files and never touch Hugging Face again!
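You can even make "never touch Hugging Face again" enforceable: `huggingface_hub` and `transformers` honor offline environment variables, so any accidental Hub access raises an error instead of silently downloading. A short sketch:

```python
import os

# With these set, HF libraries refuse all network access and only read
# from local files/cache - a quick way to verify the model truly runs offline.
os.environ["HF_HUB_OFFLINE"] = "1"        # honored by huggingface_hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # honored by transformers

# Then load strictly from disk, e.g.:
# model = AutoModel.from_pretrained("/home/youruser/models/llama-3.1-8b")
```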
## Recommendation

For Llama models specifically:

1. Get an HF token once (5 minutes) - just to download the model and accept the license
2. Download the model locally - use `huggingface-cli` or `git lfs`
3. Use the local path - configure nanobot to use the local directory
4. Never need HF again - the model runs completely offline
This gives you:

- ✅ No ongoing dependency on Hugging Face
- ✅ Faster startup (no downloads)
- ✅ Works offline
- ✅ Full control
## Summary

- Hugging Face is required for: downloading models initially, accessing gated models
- Hugging Face is NOT required for: running models after download, using local files, using Ollama/vLLM
- Best approach: download once with an HF token, then use local files forever