ansible/docs/connectivity-test.md
ilia e05b3aa0d5 Update ansible.cfg and auto-fallback script for improved connectivity handling
2025-09-16 23:00:32 -04:00


Connectivity Test Documentation

Overview

The test_connectivity.py script checks ping and SSH connectivity for each Ansible host, detects when a host should be reached through its fallback IP, and reports the specific failure together with a suggested fix.

Features

  • Comprehensive Testing: Tests both ping and SSH connectivity
  • Fallback Detection: Identifies when fallback IPs should be used
  • Smart Diagnostics: Provides specific error messages and recommendations
  • Multiple Output Formats: Console, quiet mode, and JSON export
  • Actionable Recommendations: Suggests specific commands to fix issues

Usage

Basic Usage

# Test all hosts
make test-connectivity

# Or run directly
python3 test_connectivity.py

Advanced Options

# Quiet mode (summary only)
python3 test_connectivity.py --quiet

# Export results to JSON
python3 test_connectivity.py --json results.json

# Custom hosts file
python3 test_connectivity.py --hosts-file inventories/staging/hosts

# Custom timeout
python3 test_connectivity.py --timeout 5

Output Interpretation

Status Icons

  • SUCCESS: Host is fully accessible via primary IP
  • 🔑 SSH KEY: SSH key authentication issue
  • 🔧 SSH SERVICE: SSH service not running
  • ⚠️ SSH ERROR: Other SSH-related errors
  • 🔄 USE FALLBACK: Should switch to fallback IP
  • BOTH FAILED: Both primary and fallback IPs failed
  • 🚫 NO FALLBACK: Primary IP failed, no fallback available
  • UNKNOWN: Unexpected connectivity state
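
The statuses above can be read as a decision over four probe results. The following is a minimal sketch of such a decision, not the script's actual logic: the function name `classify` and every label except "success" (which appears in the JSON output) are hypothetical, and the three SSH cases are collapsed into a single "ssh_error" label.

```python
from typing import Optional


def classify(primary_ping: bool, primary_ssh: bool,
             fallback_ping: Optional[bool] = None,
             fallback_ssh: Optional[bool] = None) -> str:
    """Map probe results to a status label (labels are illustrative).

    fallback_ping and fallback_ssh are None when no fallback IP is configured.
    """
    if primary_ping and primary_ssh:
        return "success"
    if primary_ping:
        # Host answers ping but SSH failed: key, service, or other SSH error.
        return "ssh_error"
    # Primary IP did not answer ping; decide based on the fallback IP.
    if fallback_ping is None:
        return "no_fallback"
    if fallback_ping and fallback_ssh:
        return "use_fallback"
    return "both_failed"
```

For example, `classify(False, False, True, True)` yields the fallback recommendation, while `classify(False, False)` reports that no fallback is available.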

Common Issues and Solutions

SSH Key Issues

🔑 Fix SSH key issues (2 hosts):
  make copy-ssh-key HOST=dev01
  make copy-ssh-key HOST=debianDesktopVM

Solution: Run the suggested make copy-ssh-key commands

Fallback Recommendations

🔄 Switch to fallback IPs (1 hosts):
  sed -i 's/vaultwardenVM ansible_host=100.100.19.11/vaultwardenVM ansible_host=10.0.10.142/' inventories/production/hosts

Solution: Run the suggested sed command or use make auto-fallback
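
The same substitution can be expressed in Python when sed is not convenient. This is a sketch under the assumption that each host sits on one inventory line in the format shown under "Hosts File Format"; the function name `use_fallback` is hypothetical and is not part of the script or of make auto-fallback.

```python
import re


def use_fallback(line: str) -> str:
    """Point ansible_host at the value of ansible_host_fallback, if present."""
    fallback = re.search(r"\bansible_host_fallback=(\S+)", line)
    if fallback is None:
        return line  # nothing to switch to; leave the line untouched
    # "ansible_host=" cannot match inside "ansible_host_fallback=" because
    # the "=" must directly follow "ansible_host".
    return re.sub(r"\bansible_host=\S+",
                  lambda _match: "ansible_host=" + fallback.group(1),
                  line, count=1)
```

Unlike the sed command, which matches one specific hostname and IP, this sketch rewrites whichever line it is given, so the caller is responsible for selecting the right host.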

Critical Issues

🚨 Critical issues (4 hosts):
  bottom: ✗ bottom: Primary IP 10.0.10.156 failed, no fallback available

Solution: Check network connectivity, host status, or add fallback IPs

Integration with Ansible Workflow

Before Running Ansible

# Test connectivity first
make test-connectivity

# Fix any issues, then run Ansible
make apply

Automated Fallback

# Automatically switch to working IPs
make auto-fallback

# Then run your Ansible tasks
make apply

Configuration

Hosts File Format

The script expects hosts with optional fallback IPs:

vaultwardenVM ansible_host=100.100.19.11 ansible_host_fallback=10.0.10.142 ansible_user=ladmin
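
A line in this format can be split into the fields that later appear in the JSON output. The sketch below is illustrative only: `parse_host_line` is a hypothetical name, and it assumes a non-empty line of whitespace-separated key=value pairs without quoted values.

```python
def parse_host_line(line: str) -> dict:
    """Split 'name key=value ...' into a dict; missing keys become None."""
    name, *pairs = line.split()
    variables = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return {
        "hostname": name,
        "primary_ip": variables.get("ansible_host"),
        "fallback_ip": variables.get("ansible_host_fallback"),
        "user": variables.get("ansible_user"),
    }
```

A host without ansible_host_fallback parses with `fallback_ip` set to `None`, which corresponds to the "NO FALLBACK" case when the primary IP fails.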

Timeout Settings

  • Ping timeout: 3 seconds (configurable with --timeout)
  • SSH timeout: 5 seconds (hardcoded for reliability)
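
To make the two timeouts concrete, here is one way such probes could be built. It is a sketch, not the script's implementation: the helper names are hypothetical, the ping flags assume Linux iputils (where -W is in seconds), and the ssh options BatchMode and ConnectTimeout are standard OpenSSH client options chosen for illustration.

```python
import subprocess
from typing import List


def ping_command(ip: str, timeout: int = 3) -> List[str]:
    # -c 1 sends a single probe; -W is the reply timeout in seconds (Linux).
    return ["ping", "-c", "1", "-W", str(timeout), ip]


def ssh_command(user: str, ip: str, timeout: int = 5) -> List[str]:
    # BatchMode=yes fails instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={timeout}",
            f"{user}@{ip}", "true"]


def probe(command: List[str], hard_limit: float) -> bool:
    """Run a probe; True only if it exits with status 0 within hard_limit."""
    try:
        done = subprocess.run(command, capture_output=True, timeout=hard_limit)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return done.returncode == 0
```

A caller might run `probe(ping_command(ip), hard_limit=5)` first and only attempt the SSH probe when ping succeeds.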

Troubleshooting

Common Problems

  1. "Permission denied (publickey)"
    • Run: make copy-ssh-key HOST=hostname
  2. "Connection refused"
    • Check if SSH service is running on target host
    • Verify firewall settings
  3. "Host key verification failed"
    • Add host to known_hosts: ssh-keyscan hostname >> ~/.ssh/known_hosts
  4. "No route to host"
    • Check network connectivity
    • Verify IP addresses are correct
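
The list above is essentially a lookup from an SSH error message to a remedy. A minimal sketch of that lookup follows; the names `HINTS` and `hint_for` are hypothetical, and the matching is a plain substring test, so variants such as "Permission denied (publickey,password)" would need their own entries.

```python
# (error substring, suggested action) pairs taken from the list above.
HINTS = [
    ("Permission denied (publickey)", "Run: make copy-ssh-key HOST=<hostname>"),
    ("Connection refused", "Check that the SSH service is running and verify firewall settings"),
    ("Host key verification failed", "Add the host to known_hosts with ssh-keyscan"),
    ("No route to host", "Check network connectivity and verify the IP address"),
]


def hint_for(stderr: str) -> str:
    """Return the first matching suggestion for an SSH error message."""
    for needle, hint in HINTS:
        if needle in stderr:
            return hint
    return "Unrecognized SSH error; inspect the full output"
```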

Debug Mode

To rule out slow links while debugging, rerun with a longer ping timeout:

python3 test_connectivity.py --timeout 10

JSON Output Format

When using --json, the output includes detailed information:

[
  {
    "hostname": "vaultwardenVM",
    "group": "vaultwarden",
    "primary_ip": "100.100.19.11",
    "fallback_ip": "10.0.10.142",
    "user": "ladmin",
    "primary_ping": true,
    "primary_ssh": true,
    "fallback_ping": true,
    "fallback_ssh": true,
    "status": "success",
    "recommendation": "✓ vaultwardenVM is fully accessible via primary IP 100.100.19.11"
  }
]
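
Because the export is a plain JSON array, it is straightforward to post-process for logging or monitoring. The sketch below assumes only the "hostname" and "status" fields shown above; the function names are hypothetical, and "success" is the only status value confirmed by the sample.

```python
import json
from collections import Counter


def load_results(path: str) -> list:
    """Read the array written by --json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(results: list):
    """Return (count per status, hostnames whose status is not 'success')."""
    counts = Counter(entry["status"] for entry in results)
    failing = [entry["hostname"] for entry in results
               if entry["status"] != "success"]
    return counts, failing
```

For instance, `summarize(load_results("results.json"))` returns a counter of statuses and the list of hosts that need attention.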

Best Practices

  1. Run before Ansible operations to catch connectivity issues early
  2. Use quiet mode in scripts: python3 test_connectivity.py --quiet
  3. Export JSON results for logging and monitoring
  4. Fix SSH key issues before running Ansible
  5. Use auto-fallback for automated IP switching

Integration with CI/CD

# In your CI pipeline
# Testing the command directly also works when the shell runs with `set -e`
if ! make test-connectivity; then
    echo "Connectivity issues detected" >&2
    exit 1
fi
make apply