# Connectivity Test Documentation

## Overview

The `test_connectivity.py` script checks ping and SSH connectivity for each host in an Ansible inventory, detects when a host should be switched to its fallback IP, and reports specific diagnostics with suggested fixes.

## Features

- **Comprehensive Testing**: Tests both ping and SSH connectivity
- **Fallback Detection**: Identifies when fallback IPs should be used
- **Smart Diagnostics**: Provides specific error messages and recommendations
- **Multiple Output Formats**: Console, quiet mode, and JSON export
- **Actionable Recommendations**: Suggests specific commands to fix issues

## Usage

### Basic Usage

```bash
# Test all hosts
make test-connectivity

# Or run directly
python3 test_connectivity.py
```

### Advanced Options

```bash
# Quiet mode (summary only)
python3 test_connectivity.py --quiet

# Export results to JSON
python3 test_connectivity.py --json results.json

# Custom hosts file
python3 test_connectivity.py --hosts-file inventories/staging/hosts

# Custom ping timeout in seconds (default: 3)
python3 test_connectivity.py --timeout 5
```

## Output Interpretation

### Status Icons

- ✅ **SUCCESS**: Host is fully accessible via primary IP
- 🔑 **SSH KEY**: SSH key authentication issue
- 🔧 **SSH SERVICE**: SSH service not running
- ⚠️ **SSH ERROR**: Other SSH-related errors
- 🔄 **USE FALLBACK**: Should switch to fallback IP
- ❌ **BOTH FAILED**: Both primary and fallback IPs failed
- 🚫 **NO FALLBACK**: Primary IP failed, no fallback available
- ❓ **UNKNOWN**: Unexpected connectivity state

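
One plausible decision order behind these statuses is sketched below. It is a hypothetical shell function, not the script's implementation: the name `classify_host`, its `true`/`false` arguments, and every status string except `success` are assumptions. It folds the three SSH statuses into a single `ssh_error`, because telling them apart requires the SSH error text.

```bash
# Hypothetical sketch of the decision order implied by the status list.
# Args: PRIMARY_PING PRIMARY_SSH FALLBACK_IP FALLBACK_PING FALLBACK_SSH
# (ping/ssh values are "true" or "false"; FALLBACK_IP may be empty).
# Status strings other than "success" are assumed, not taken from the script.
classify_host() {
  local p_ping="$1" p_ssh="$2" fb_ip="$3" fb_ping="$4" fb_ssh="$5"
  if [ "$p_ping" = true ] && [ "$p_ssh" = true ]; then
    echo success        # SUCCESS
  elif [ "$p_ping" = true ]; then
    echo ssh_error      # SSH KEY / SSH SERVICE / SSH ERROR, refined from ssh's stderr
  elif [ -z "$fb_ip" ]; then
    echo no_fallback    # NO FALLBACK
  elif [ "$fb_ping" = true ] && [ "$fb_ssh" = true ]; then
    echo use_fallback   # USE FALLBACK
  elif [ "$fb_ping" = false ] && [ "$fb_ssh" = false ]; then
    echo both_failed    # BOTH FAILED
  else
    echo unknown        # UNKNOWN, e.g. fallback answers ping but SSH fails
  fi
}
```

For example, `classify_host false false 10.0.10.142 true true` prints `use_fallback`. Note that this ordering treats a failed ping as a failed primary, so a host that blocks ICMP but accepts SSH would not be reported as `success`.
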
### Common Issues and Solutions

#### SSH Key Issues

```
🔑 Fix SSH key issues (2 hosts):
make copy-ssh-key HOST=dev01
make copy-ssh-key HOST=debianDesktopVM
```

**Solution**: Run the suggested `make copy-ssh-key` commands.

#### Fallback Recommendations

```
🔄 Switch to fallback IPs (1 hosts):
sed -i 's/vaultwardenVM ansible_host=100.100.19.11/vaultwardenVM ansible_host=10.0.10.142/' inventories/production/hosts
```

**Solution**: Run the suggested `sed` command or use `make auto-fallback`.

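
The suggested expression is not anchored to the start of the line, so it would also rewrite a line for a host whose name merely ends in `vaultwardenVM`, and its unescaped dots match any character. A stricter variant is sketched below as a hypothetical helper (`switch_to_fallback` is not part of the repository): it anchors on the host name, escapes the dots, requires whitespace or end of line after the IP, and keeps a `.bak` backup of the hosts file.

```bash
# Hypothetical helper: point one host's ansible_host at its fallback IP.
# Usage: switch_to_fallback HOST PRIMARY_IP FALLBACK_IP HOSTS_FILE
switch_to_fallback() {
  local host="$1" primary="$2" fallback="$3" file="$4" primary_re
  # Escape dots so sed matches the primary IP literally
  primary_re=$(printf '%s' "$primary" | sed 's/\./\\./g')
  # ^HOST<space>ansible_host=IP followed by whitespace or end of line,
  # so 100.100.19.11 does not also match 100.100.19.110.
  # -i.bak edits in place and keeps a backup (GNU and BSD sed).
  sed -i.bak -E "s/^($host[[:space:]]+ansible_host=)$primary_re([[:space:]]|\$)/\1$fallback\2/" "$file"
}
```

Usage: `switch_to_fallback vaultwardenVM 100.100.19.11 10.0.10.142 inventories/production/hosts`.
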
#### Critical Issues

```
🚨 Critical issues (4 hosts):
bottom: ✗ bottom: Primary IP 10.0.10.156 failed, no fallback available
```

**Solution**: Check network connectivity and whether the host is up, or add an `ansible_host_fallback` IP for the host.

## Integration with Ansible Workflow

### Before Running Ansible

```bash
# Test connectivity first
make test-connectivity

# Fix any issues, then run Ansible
make apply
```

### Automated Fallback

```bash
# Automatically switch to working IPs
make auto-fallback

# Then run your Ansible tasks
make apply
```

## Configuration

### Hosts File Format

Each host line sets `ansible_host` to the primary IP and may add `ansible_host_fallback` for an optional fallback IP:

```
vaultwardenVM ansible_host=100.100.19.11 ansible_host_fallback=10.0.10.142 ansible_user=ladmin
```

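
To see how these variables map onto the JSON output fields (`hostname`, `group`, `primary_ip`, `fallback_ip`, `user`), here is a hypothetical parser for a simple INI inventory. The function name `inventory_fields` is illustrative, and it does not handle `[group:vars]` sections, YAML inventories, or host ranges.

```bash
# Hypothetical sketch: print one tab-separated line per host:
#   hostname  group  primary_ip  fallback_ip  user
inventory_fields() {
  local group="" name rest field primary fallback user
  while read -r name rest; do
    case $name in
      ''|'#'*|';'*) continue ;;                                  # blank or comment
      '['*']') group=${name#\[}; group=${group%\]}; continue ;;  # [group] header
    esac
    primary="" fallback="" user=""
    # Unquoted on purpose: split the remaining key=value pairs on whitespace
    for field in $rest; do
      case $field in
        ansible_host=*)          primary=${field#ansible_host=} ;;
        ansible_host_fallback=*) fallback=${field#ansible_host_fallback=} ;;
        ansible_user=*)          user=${field#ansible_user=} ;;
      esac
    done
    printf '%s\t%s\t%s\t%s\t%s\n' "$name" "$group" "$primary" "$fallback" "$user"
  done < "$1"
}
```

Usage: `inventory_fields inventories/production/hosts`. Hosts without a fallback produce an empty `fallback_ip` column.
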
### Timeout Settings

- **Ping timeout**: 3 seconds (configurable with `--timeout`)
- **SSH timeout**: 5 seconds (hardcoded for reliability)

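
These two timeouts correspond roughly to the standard `ping` and `ssh` options used below. This is a hypothetical sketch of equivalent manual probes (`ping_ok` and `ssh_ok` are illustrative names); the script itself may invoke these tools differently.

```bash
# Hypothetical probes mirroring the documented defaults.
# Usage: ping_ok HOST [SECONDS]   (default wait: 3 s)
ping_ok() {
  # -c 1: send a single echo request
  # -W: seconds to wait for the reply (Linux iputils; macOS/BSD ping uses milliseconds)
  ping -c 1 -W "${2:-3}" "$1" >/dev/null 2>&1
}

# Usage: ssh_ok HOST USER   (gives up after 5 s)
ssh_ok() {
  # BatchMode=yes fails instead of prompting for a password or host key,
  # so a missing key shows up as an error rather than a hang.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$2@$1" true >/dev/null 2>&1
}
```

For example, `ping_ok 100.100.19.11 && ssh_ok 100.100.19.11 ladmin` succeeds only if the primary IP of `vaultwardenVM` answers both probes.
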
## Troubleshooting

### Common Problems

1. **"Permission denied (publickey)"**
   - Run: `make copy-ssh-key HOST=hostname`

2. **"Connection refused"**
   - Check whether the SSH service is running on the target host
   - Verify firewall settings

3. **"Host key verification failed"**
   - Add the host to known_hosts: `ssh-keyscan hostname >> ~/.ssh/known_hosts`

4. **"No route to host"**
   - Check network connectivity
   - Verify that the IP addresses are correct

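
The four messages above can be told apart by matching ssh's error output. A hypothetical sketch follows; `classify_ssh_error` and its category names are illustrative, not the script's own.

```bash
# Hypothetical sketch: map ssh's stderr text to a problem category.
classify_ssh_error() {
  case $1 in
    *"Permission denied (publickey"*) echo ssh_key ;;      # 1: copy the SSH key
    *"Connection refused"*)           echo ssh_service ;;  # 2: sshd or firewall
    *"Host key verification failed"*) echo host_key ;;     # 3: known_hosts
    *"No route to host"*)             echo network ;;      # 4: routing or wrong IP
    *)                                echo ssh_error ;;    # anything else
  esac
}
```

To feed it real output, capture only stderr from a non-interactive attempt: `classify_ssh_error "$(ssh -o BatchMode=yes -o ConnectTimeout=5 ladmin@10.0.10.142 true 2>&1 >/dev/null)"`. The redirection order sends stderr into the capture and discards stdout.
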
### Debug Mode

For detailed debugging, rerun with a longer timeout so that slow hosts are not reported as failures:

```bash
python3 test_connectivity.py --timeout 10
```

## JSON Output Format

When using `--json`, the output includes detailed information:

```json
[
  {
    "hostname": "vaultwardenVM",
    "group": "vaultwarden",
    "primary_ip": "100.100.19.11",
    "fallback_ip": "10.0.10.142",
    "user": "ladmin",
    "primary_ping": true,
    "primary_ssh": true,
    "fallback_ping": true,
    "fallback_ssh": true,
    "status": "success",
    "recommendation": "✓ vaultwardenVM is fully accessible via primary IP 100.100.19.11"
  }
]
```

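
For logging or monitoring, the exported file can be filtered down to the hosts that need attention. This hypothetical helper (`failing_hosts` is not part of the repository) uses `python3`, which the script already requires, so `jq` is not needed; it relies only on the `hostname` and `status` fields shown above.

```bash
# Hypothetical helper: print "hostname<TAB>status" for every host in a
# --json results file whose status is not "success".
failing_hosts() {
  python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    results = json.load(f)

for host in results:
    if host["status"] != "success":
        print(f'{host["hostname"]}\t{host["status"]}')
EOF
}
```

For example, after `python3 test_connectivity.py --json results.json`, run `failing_hosts results.json`; empty output means every host reported `success`.
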
## Best Practices

1. **Run before Ansible operations** to catch connectivity issues early
2. **Use quiet mode** in scripts: `python3 test_connectivity.py --quiet`
3. **Export JSON results** for logging and monitoring
4. **Fix SSH key issues** before running Ansible
5. **Use auto-fallback** for automated IP switching

## Integration with CI/CD

```bash
# In your CI pipeline
# (assumes `make test-connectivity` exits non-zero when any host fails)
if ! make test-connectivity; then
  echo "Connectivity issues detected" >&2
  exit 1
fi

make apply
```