ansible/docs/connectivity-test.md
ilia e05b3aa0d5 Update ansible.cfg and auto-fallback script for improved connectivity handling
- Modify ansible.cfg to increase SSH connection retries from 2 to 3 and add a connection timeout setting for better reliability.
- Enhance auto-fallback.sh script to provide detailed feedback during IP connectivity tests, including clearer status messages for primary and fallback IP checks.
- Update documentation to reflect changes in connectivity testing and fallback procedures.

These updates improve the robustness of the connectivity testing process and ensure smoother operations during IP failover scenarios.
2025-09-16 23:00:32 -04:00

# Connectivity Test Documentation
## Overview
The `test_connectivity.py` script checks ping and SSH connectivity for each Ansible host, detects when a fallback IP should be used, and reports diagnostics with suggested fixes.
## Features
- **Comprehensive Testing**: Tests both ping and SSH connectivity
- **Fallback Detection**: Identifies when fallback IPs should be used
- **Smart Diagnostics**: Provides specific error messages and recommendations
- **Multiple Output Formats**: Console, quiet mode, and JSON export
- **Actionable Recommendations**: Suggests specific commands to fix issues
## Usage
### Basic Usage
```bash
# Test all hosts
make test-connectivity
# Or run directly
python3 test_connectivity.py
```
### Advanced Options
```bash
# Quiet mode (summary only)
python3 test_connectivity.py --quiet
# Export results to JSON
python3 test_connectivity.py --json results.json
# Custom hosts file
python3 test_connectivity.py --hosts-file inventories/staging/hosts
# Custom timeout
python3 test_connectivity.py --timeout 5
```
## Output Interpretation
### Status Icons
- **SUCCESS**: Host is fully accessible via primary IP
- 🔑 **SSH KEY**: SSH key authentication issue
- 🔧 **SSH SERVICE**: SSH service not running
- ⚠️ **SSH ERROR**: Other SSH-related errors
- 🔄 **USE FALLBACK**: Should switch to fallback IP
- **BOTH FAILED**: Both primary and fallback IPs failed
- 🚫 **NO FALLBACK**: Primary IP failed, no fallback available
- **UNKNOWN**: Unexpected connectivity state
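
The fallback-related statuses follow from three facts per host: whether the primary IP passed its checks, whether a fallback IP is configured, and whether the fallback passed. The sketch below shows one way that decision could be written. The `classify` helper is hypothetical, and every status string other than `"success"` is invented for illustration; the script's own names may differ. The SSH-specific statuses are left out.

```python
from typing import Optional


def classify(primary_ok: bool, fallback_ip: Optional[str], fallback_ok: bool) -> str:
    """Map check results to a status label (labels are assumptions, not the script's)."""
    if primary_ok:
        return "success"       # primary IP passed; fallback is irrelevant
    if fallback_ip is None:
        return "no_fallback"   # primary failed and nothing else to try
    if fallback_ok:
        return "use_fallback"  # primary failed, fallback works
    return "both_failed"       # neither address is reachable


print(classify(False, "10.0.10.142", True))
```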
### Common Issues and Solutions
#### SSH Key Issues
```
🔑 Fix SSH key issues (2 hosts):
make copy-ssh-key HOST=dev01
make copy-ssh-key HOST=debianDesktopVM
```
**Solution**: Run the suggested `make copy-ssh-key` commands
#### Fallback Recommendations
```
🔄 Switch to fallback IPs (1 hosts):
sed -i 's/vaultwardenVM ansible_host=100.100.19.11/vaultwardenVM ansible_host=10.0.10.142/' inventories/production/hosts
```
**Solution**: Run the suggested sed command or use `make auto-fallback`
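
For readers who prefer to see the substitution spelled out, here is a minimal Python sketch of the same inventory edit. The `switch_to_fallback` helper is hypothetical and is not part of `test_connectivity.py` or `make auto-fallback`. Unlike the sed expression, it escapes the dots in the IP address so they match literally.

```python
import re


def switch_to_fallback(inventory_text: str, host: str, primary: str, fallback: str) -> str:
    """Rewrite ansible_host=<primary> to ansible_host=<fallback> on the line for `host`."""
    pattern = re.compile(
        rf"^({re.escape(host)}\s+ansible_host=){re.escape(primary)}(?=\s|$)",
        flags=re.MULTILINE,
    )
    # A function replacement avoids backreference ambiguity with digits in the IP.
    return pattern.sub(lambda m: m.group(1) + fallback, inventory_text)


line = "vaultwardenVM ansible_host=100.100.19.11 ansible_host_fallback=10.0.10.142 ansible_user=ladmin"
print(switch_to_fallback(line, "vaultwardenVM", "100.100.19.11", "10.0.10.142"))
```

The sketch returns the new text rather than writing the file, so the result can be inspected or diffed before it replaces `inventories/production/hosts`.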
#### Critical Issues
```
🚨 Critical issues (4 hosts):
bottom: ✗ bottom: Primary IP 10.0.10.156 failed, no fallback available
```
**Solution**: Check network connectivity, host status, or add fallback IPs
## Integration with Ansible Workflow
### Before Running Ansible
```bash
# Test connectivity first
make test-connectivity
# Fix any issues, then run Ansible
make apply
```
### Automated Fallback
```bash
# Automatically switch to working IPs
make auto-fallback
# Then run your Ansible tasks
make apply
```
## Configuration
### Hosts File Format
Each host line sets `ansible_host` to the primary IP and may set `ansible_host_fallback` to a fallback IP:
```
vaultwardenVM ansible_host=100.100.19.11 ansible_host_fallback=10.0.10.142 ansible_user=ladmin
```
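
A host line in this format can be split into the fields that appear in the JSON output. The following is a rough sketch only: `parse_host_line` is a hypothetical helper, and it assumes whitespace-separated `key=value` pairs with no quoted values containing spaces. The script's actual parser may behave differently.

```python
from typing import Dict, Optional


def parse_host_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract hostname, primary IP, fallback IP, and user from one inventory line."""
    line = line.strip()
    # Skip blanks, comments, and [group] headers.
    if not line or line.startswith(("#", ";", "[")):
        return None
    name, *pairs = line.split()
    variables = dict(p.split("=", 1) for p in pairs if "=" in p)
    return {
        "hostname": name,
        "primary_ip": variables.get("ansible_host"),
        "fallback_ip": variables.get("ansible_host_fallback"),  # None when absent
        "user": variables.get("ansible_user"),
    }


print(parse_host_line(
    "vaultwardenVM ansible_host=100.100.19.11 ansible_host_fallback=10.0.10.142 ansible_user=ladmin"
))
```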
### Timeout Settings
- **Ping timeout**: 3 seconds (configurable with `--timeout`)
- **SSH timeout**: 5 seconds (hardcoded for reliability)
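
To illustrate what a bounded check looks like, the sketch below tests whether a TCP port accepts connections within a timeout. This is an assumption-laden stand-in, not the script's method: `tcp_port_open` is a hypothetical helper, and a successful TCP connect to port 22 only shows that something is listening, not that SSH authentication would succeed.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures, and timeouts.
        return False
```

A call such as `tcp_port_open("10.0.10.142", timeout=5.0)` returns within roughly the timeout even when the host silently drops packets.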
## Troubleshooting
### Common Problems
1. **"Permission denied (publickey)"**
   - Run: `make copy-ssh-key HOST=hostname`
2. **"Connection refused"**
   - Check if SSH service is running on target host
   - Verify firewall settings
3. **"Host key verification failed"**
   - Add host to known_hosts: `ssh-keyscan hostname >> ~/.ssh/known_hosts`
4. **"No route to host"**
   - Check network connectivity
   - Verify IP addresses are correct
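
The same lookup can be automated when processing SSH error output in bulk. The sketch below maps the four messages listed here to their fixes. The `suggest_fix` helper and the `HINTS` table are hypothetical and only mirror this list; they are not taken from the script.

```python
# (error substring, suggested fix) pairs, in the order listed in this section.
HINTS = [
    ("Permission denied (publickey)", "Run: make copy-ssh-key HOST=<hostname>"),
    ("Connection refused", "Check that the SSH service is running and verify firewall settings"),
    ("Host key verification failed", "Add the host to known_hosts: ssh-keyscan <hostname> >> ~/.ssh/known_hosts"),
    ("No route to host", "Check network connectivity and verify the IP addresses"),
]


def suggest_fix(ssh_stderr: str) -> str:
    """Return the first matching hint for an SSH error message (case-insensitive)."""
    lowered = ssh_stderr.lower()
    for needle, hint in HINTS:
        if needle.lower() in lowered:
            return hint
    return "No known fix; inspect the full SSH output"


print(suggest_fix("ladmin@10.0.10.142: Permission denied (publickey)."))
```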
### Debug Mode
To rule out slow responses as the cause of a failure, rerun with a longer timeout:
```bash
python3 test_connectivity.py --timeout 10
```
## JSON Output Format
When using `--json`, the output includes detailed information:
```json
[
  {
    "hostname": "vaultwardenVM",
    "group": "vaultwarden",
    "primary_ip": "100.100.19.11",
    "fallback_ip": "10.0.10.142",
    "user": "ladmin",
    "primary_ping": true,
    "primary_ssh": true,
    "fallback_ping": true,
    "fallback_ssh": true,
    "status": "success",
    "recommendation": "✓ vaultwardenVM is fully accessible via primary IP 100.100.19.11"
  }
]
```
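
A file in this shape is straightforward to post-process for logging or monitoring. The sketch below counts hosts per status and lists those that are not `"success"`. The `summarize` helper is hypothetical, and it assumes every entry carries the `hostname` and `status` keys shown in the example.

```python
import json
from collections import Counter
from typing import List, Tuple


def summarize(results_json: str) -> Tuple[Counter, List[str]]:
    """Return (status counts, hostnames whose status is not 'success')."""
    results = json.loads(results_json)
    counts = Counter(entry["status"] for entry in results)
    failing = [entry["hostname"] for entry in results if entry["status"] != "success"]
    return counts, failing


sample = '[{"hostname": "vaultwardenVM", "status": "success"}]'
counts, failing = summarize(sample)
print(dict(counts), failing)
```

To run it against an exported file, pass the file's contents, for example `summarize(open("results.json").read())`.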
## Best Practices
1. **Run before Ansible operations** to catch connectivity issues early
2. **Use quiet mode** in scripts: `python3 test_connectivity.py --quiet`
3. **Export JSON results** for logging and monitoring
4. **Fix SSH key issues** before running Ansible
5. **Use auto-fallback** for automated IP switching
## Integration with CI/CD
```bash
# In your CI pipeline: stop before applying if the connectivity test fails
if ! make test-connectivity; then
  echo "Connectivity issues detected" >&2
  exit 1
fi
make apply
```