Compare commits

..

16 Commits

Author SHA1 Message Date
a176dd2365 Makefile: avoid vault errors when detecting current host 2026-01-01 12:19:07 -05:00
98e0fc0bed Fix lint regressions after rebase 2026-01-01 12:10:45 -05:00
baf3e3de09 Refactor playbooks: servers/workstations, split monitoring, improve shell 2026-01-01 11:35:24 -05:00
69a39e5e5b Add POTE app project support and improve IP conflict detection (#3)
## Summary

This PR adds comprehensive support for deploying the **POTE** application project via Ansible, along with improvements to IP conflict detection and a new app stack provisioning system for Proxmox-managed LXC containers.

## Key Features

### 🆕 New Roles
- **`roles/pote`**: Python/venv deployment role for POTE (PostgreSQL, cron jobs, Alembic migrations)
- **`roles/app_setup`**: Generic app deployment role (Node.js/systemd)
- **`roles/base_os`**: Base OS hardening role

### 🛡️ Safety Improvements
- IP uniqueness validation within projects
- Proxmox-side IP conflict detection
- Enhanced error messages for IP conflicts

### 📦 New Playbooks
- `playbooks/app/site.yml`: End-to-end app stack deployment
- `playbooks/app/provision_vms.yml`: Proxmox guest provisioning
- `playbooks/app/configure_app.yml`: OS + application configuration

## Security
-  All secrets stored in encrypted vault.yml
-  Deploy keys excluded via .gitignore
-  No plaintext secrets committed

## Testing
-  POTE successfully deployed to dev/qa/prod environments
-  All components validated (Git, PostgreSQL, cron, migrations)

Co-authored-by: ilia <ilia@levkin.ca>
Reviewed-on: #3
2026-01-01 11:19:54 -05:00
e897b1a027 Fix: Resolve linting errors and improve firewall configuration (#2)
Some checks failed
CI / lint-and-test (push) Successful in 1m16s
CI / ansible-validation (push) Successful in 5m49s
CI / secret-scanning (push) Successful in 1m33s
CI / dependency-scan (push) Successful in 2m48s
CI / sast-scan (push) Successful in 5m46s
CI / license-check (push) Successful in 1m11s
CI / vault-check (push) Failing after 5m25s
CI / playbook-test (push) Successful in 5m32s
CI / container-scan (push) Successful in 4m32s
CI / sonar-analysis (push) Successful in 6m53s
CI / workflow-summary (push) Successful in 1m6s
- Fix UFW firewall to allow outbound traffic (was blocking all outbound)
- Add HOST parameter support to shell Makefile target
- Fix all ansible-lint errors (trailing spaces, missing newlines, document starts)
- Add changed_when: false to check commands
- Fix variable naming (vault_devGPU -> vault_devgpu)
- Update .ansible-lint config to exclude .gitea/ and allow strategy: free
- Fix NodeSource repository GPG key handling in shell playbook
- Add missing document starts to host_vars files
- Clean up empty lines in datascience role files

Reviewed-on: #2
2025-12-25 16:47:26 -05:00
95a301ae3f Merge pull request 'Fix: Update CI workflow to use Alpine-based images, install Node.js and Trivy with improved methods, and enhance dependency scanning steps' (#1) from update-ci into master
All checks were successful
CI / lint-and-test (push) Successful in 59s
CI / ansible-validation (push) Successful in 2m14s
CI / secret-scanning (push) Successful in 57s
CI / dependency-scan (push) Successful in 1m4s
CI / sast-scan (push) Successful in 1m57s
CI / license-check (push) Successful in 57s
CI / vault-check (push) Successful in 1m53s
CI / playbook-test (push) Successful in 1m57s
CI / container-scan (push) Successful in 1m26s
CI / sonar-analysis (push) Successful in 2m1s
CI / workflow-summary (push) Successful in 55s
Reviewed-on: #1
2025-12-17 22:45:00 -05:00
ilia
c017ec6941 Fix: Update CI workflow to install a fixed version of Trivy for improved reliability and error handling during installation
All checks were successful
CI / lint-and-test (pull_request) Successful in 1m2s
CI / ansible-validation (pull_request) Successful in 3m6s
CI / secret-scanning (pull_request) Successful in 56s
CI / dependency-scan (pull_request) Successful in 1m0s
CI / sast-scan (pull_request) Successful in 2m13s
CI / license-check (pull_request) Successful in 57s
CI / vault-check (pull_request) Successful in 2m8s
CI / playbook-test (pull_request) Successful in 2m2s
CI / container-scan (pull_request) Successful in 1m26s
CI / sonar-analysis (pull_request) Successful in 2m3s
CI / workflow-summary (pull_request) Successful in 52s
2025-12-15 15:50:04 -05:00
ilia
9e7ef8159b Fix: Update CI workflow to disable SCM in SonarScanner configuration for improved analysis accuracy
Some checks failed
CI / lint-and-test (pull_request) Successful in 57s
CI / ansible-validation (pull_request) Successful in 2m20s
CI / secret-scanning (pull_request) Successful in 54s
CI / dependency-scan (pull_request) Successful in 59s
CI / sast-scan (pull_request) Successful in 2m26s
CI / license-check (pull_request) Successful in 57s
CI / vault-check (pull_request) Successful in 2m34s
CI / playbook-test (pull_request) Successful in 2m37s
CI / container-scan (pull_request) Failing after 1m42s
CI / sonar-analysis (pull_request) Successful in 2m18s
CI / workflow-summary (pull_request) Successful in 52s
2025-12-15 15:36:15 -05:00
ilia
3828e04b13 Fix: Update CI workflow to install Git alongside Node.js and enhance SonarScanner installation process with improved error handling
All checks were successful
CI / lint-and-test (pull_request) Successful in 59s
CI / ansible-validation (pull_request) Successful in 3m32s
CI / secret-scanning (pull_request) Successful in 56s
CI / dependency-scan (pull_request) Successful in 1m3s
CI / sast-scan (pull_request) Successful in 2m54s
CI / license-check (pull_request) Successful in 59s
CI / vault-check (pull_request) Successful in 2m43s
CI / playbook-test (pull_request) Successful in 3m7s
CI / container-scan (pull_request) Successful in 1m54s
CI / sonar-analysis (pull_request) Successful in 2m5s
CI / workflow-summary (pull_request) Successful in 52s
2025-12-15 15:11:36 -05:00
ilia
d6655babd9 Refactor: Simplify connectivity analysis logic by breaking down into smaller helper functions for improved readability and maintainability
All checks were successful
CI / lint-and-test (pull_request) Successful in 1m0s
CI / ansible-validation (pull_request) Successful in 2m12s
CI / secret-scanning (pull_request) Successful in 54s
CI / dependency-scan (pull_request) Successful in 58s
CI / sast-scan (pull_request) Successful in 2m58s
CI / license-check (pull_request) Successful in 59s
CI / vault-check (pull_request) Successful in 2m50s
CI / playbook-test (pull_request) Successful in 2m42s
CI / container-scan (pull_request) Successful in 1m44s
CI / sonar-analysis (pull_request) Successful in 2m12s
CI / workflow-summary (pull_request) Successful in 51s
2025-12-15 14:55:10 -05:00
ilia
dc94395bbc Fix: Enhance SonarScanner error handling in CI workflow with detailed failure messages and troubleshooting guidance
All checks were successful
CI / lint-and-test (pull_request) Successful in 57s
CI / ansible-validation (pull_request) Successful in 2m20s
CI / secret-scanning (pull_request) Successful in 53s
CI / dependency-scan (pull_request) Successful in 58s
CI / sast-scan (pull_request) Successful in 2m14s
CI / license-check (pull_request) Successful in 55s
CI / vault-check (pull_request) Successful in 2m9s
CI / playbook-test (pull_request) Successful in 2m4s
CI / container-scan (pull_request) Successful in 1m27s
CI / sonar-analysis (pull_request) Successful in 2m5s
CI / workflow-summary (pull_request) Successful in 51s
2025-12-14 21:35:52 -05:00
ilia
699aaefac3 Fix: Update CI workflow to improve SonarScanner installation process with enhanced error handling and version management
All checks were successful
CI / lint-and-test (pull_request) Successful in 57s
CI / ansible-validation (pull_request) Successful in 2m16s
CI / secret-scanning (pull_request) Successful in 53s
CI / dependency-scan (pull_request) Successful in 57s
CI / sast-scan (pull_request) Successful in 2m5s
CI / license-check (pull_request) Successful in 54s
CI / vault-check (pull_request) Successful in 1m53s
CI / playbook-test (pull_request) Successful in 2m20s
CI / container-scan (pull_request) Successful in 1m35s
CI / sonar-analysis (pull_request) Successful in 2m16s
CI / workflow-summary (pull_request) Successful in 51s
2025-12-14 21:21:26 -05:00
ilia
277a22d962 Fix: Clean up duplicate repository entries in application and development roles 2025-12-14 21:21:19 -05:00
ilia
83a5d988af Fix: Update ansible-lint configuration to exclude specific paths and skip certain rules for improved linting flexibility
Some checks failed
CI / lint-and-test (pull_request) Successful in 58s
CI / ansible-validation (pull_request) Successful in 2m17s
CI / secret-scanning (pull_request) Successful in 53s
CI / dependency-scan (pull_request) Successful in 57s
CI / sast-scan (pull_request) Successful in 2m17s
CI / license-check (pull_request) Successful in 55s
CI / vault-check (pull_request) Successful in 2m20s
CI / playbook-test (pull_request) Successful in 2m16s
CI / container-scan (pull_request) Successful in 1m25s
CI / sonar-analysis (pull_request) Failing after 1m56s
CI / workflow-summary (pull_request) Successful in 50s
2025-12-14 21:04:45 -05:00
ilia
a45ee496e4 Fix: Update CI workflow to use Ubuntu 22.04 container, install Node.js and SonarScanner with improved methods, and enhance SonarQube connectivity verification
Some checks failed
CI / lint-and-test (pull_request) Successful in 57s
CI / ansible-validation (pull_request) Successful in 2m6s
CI / secret-scanning (pull_request) Successful in 53s
CI / dependency-scan (pull_request) Successful in 57s
CI / sast-scan (pull_request) Successful in 1m55s
CI / license-check (pull_request) Successful in 54s
CI / vault-check (pull_request) Successful in 1m58s
CI / playbook-test (pull_request) Successful in 1m58s
CI / container-scan (pull_request) Successful in 1m31s
CI / sonar-analysis (pull_request) Failing after 2m36s
CI / workflow-summary (pull_request) Successful in 50s
2025-12-14 20:51:36 -05:00
ilia
e54ecfefc1 Fix: Update CI workflow to enhance playbook syntax checking and improve SonarQube connectivity verification
Some checks failed
CI / lint-and-test (pull_request) Successful in 58s
CI / ansible-validation (pull_request) Successful in 2m15s
CI / secret-scanning (pull_request) Successful in 54s
CI / dependency-scan (pull_request) Successful in 58s
CI / sast-scan (pull_request) Successful in 2m11s
CI / license-check (pull_request) Successful in 54s
CI / vault-check (pull_request) Successful in 1m54s
CI / playbook-test (pull_request) Successful in 1m52s
CI / container-scan (pull_request) Successful in 1m27s
CI / sonar-analysis (pull_request) Failing after 50s
CI / workflow-summary (pull_request) Successful in 50s
2025-12-14 20:43:28 -05:00
35 changed files with 1214 additions and 88 deletions

View File

@ -11,15 +11,19 @@ exclude_paths:
# Exclude patterns
- .cache/
- .github/
- .gitea/
- .ansible/
# Skip specific rules
skip_list:
- yaml[line-length] # Allow longer lines in some cases
- yaml[document-start] # Allow missing document start in vault files
- yaml[truthy] # Allow different truthy values in workflow files
- name[casing] # Allow mixed case in task names
- args[module] # Skip args rule that causes "file name too long" issues
- var-naming[no-role-prefix] # Allow shorter variable names for readability
- risky-shell-pipe # Allow shell pipes in maintenance scripts
- run-once[play] # Allow strategy: free for parallel execution
# Warn instead of error for these
warn_list:

View File

@ -5,6 +5,7 @@ name: CI
push:
branches: [master]
pull_request:
types: [opened, synchronize, reopened]
jobs:
# Check if CI should be skipped based on branch name or commit message
@ -63,6 +64,8 @@ jobs:
needs: skip-ci-check
if: needs.skip-ci-check.outputs.should-skip != '1'
runs-on: ubuntu-latest
# Skip push events for non-master branches (they'll be covered by PR events)
if: github.event_name == 'pull_request' || github.ref == 'refs/heads/master'
container:
image: node:20-bullseye
steps:
@ -83,12 +86,14 @@ jobs:
needs: skip-ci-check
if: needs.skip-ci-check.outputs.should-skip != '1'
runs-on: ubuntu-latest
# Skip push events for non-master branches (they'll be covered by PR events)
if: github.event_name == 'pull_request' || github.ref == 'refs/heads/master'
container:
image: ubuntu:22.04
steps:
- name: Install Node.js for checkout action
run: |
apt-get update && apt-get install -y curl
apt-get update && apt-get install -y curl git
curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt-get install -y nodejs
@ -286,8 +291,11 @@ jobs:
fi
done
if [ $failed -eq 1 ]; then
echo "Some playbooks have errors (this is expected without inventory/vault)"
exit 0
echo "❌ Some playbooks have syntax errors!"
echo "Note: This may be expected if playbooks require inventory/vault, but syntax errors should still be fixed."
exit 1
else
echo "✅ All playbooks passed syntax check"
fi
continue-on-error: true
@ -309,22 +317,43 @@ jobs:
- name: Install Trivy
run: |
set -e
apt-get update && apt-get install -y wget curl tar
# Try multiple download methods for reliability
echo "Downloading Trivy..."
if wget -q "https://github.com/aquasecurity/trivy/releases/latest/download/trivy_linux_amd64.tar.gz" -O /tmp/trivy.tar.gz 2>&1; then
echo "Downloaded tar.gz, extracting..."
tar -xzf /tmp/trivy.tar.gz -C /tmp/ trivy
mv /tmp/trivy /usr/local/bin/trivy
elif wget -q "https://github.com/aquasecurity/trivy/releases/latest/download/trivy_linux_amd64" -O /usr/local/bin/trivy 2>&1; then
echo "Downloaded binary directly"
else
echo "Failed to download Trivy, trying with version detection..."
TRIVY_VERSION=$(curl -s https://api.github.com/repos/aquasecurity/trivy/releases/latest | grep tag_name | cut -d '"' -f 4 | sed 's/v//')
wget -q "https://github.com/aquasecurity/trivy/releases/download/v${TRIVY_VERSION}/trivy_${TRIVY_VERSION}_Linux-64bit.tar.gz" -O /tmp/trivy.tar.gz
tar -xzf /tmp/trivy.tar.gz -C /tmp/ trivy
mv /tmp/trivy /usr/local/bin/trivy
# Use a fixed, known-good Trivy version to avoid URL/redirect issues
TRIVY_VERSION="0.58.2"
TRIVY_URL="https://github.com/aquasecurity/trivy/releases/download/v${TRIVY_VERSION}/trivy_${TRIVY_VERSION}_Linux-64bit.tar.gz"
echo "Installing Trivy version: ${TRIVY_VERSION}"
echo "Downloading from: ${TRIVY_URL}"
if ! wget --progress=bar:force "${TRIVY_URL}" -O /tmp/trivy.tar.gz 2>&1; then
echo "❌ Failed to download Trivy archive"
echo "Checking if file was partially downloaded:"
ls -lh /tmp/trivy.tar.gz 2>/dev/null || echo "No file found"
exit 1
fi
if [ ! -f /tmp/trivy.tar.gz ] || [ ! -s /tmp/trivy.tar.gz ]; then
echo "❌ Downloaded Trivy archive is missing or empty"
exit 1
fi
echo "Download complete. File size: $(du -h /tmp/trivy.tar.gz | cut -f1)"
echo "Extracting Trivy..."
if ! tar -xzf /tmp/trivy.tar.gz -C /tmp/ trivy; then
echo "❌ Failed to extract Trivy binary from archive"
tar -tzf /tmp/trivy.tar.gz 2>&1 | head -20 || true
exit 1
fi
if [ ! -f /tmp/trivy ]; then
echo "❌ Trivy binary not found after extraction"
ls -la /tmp/ | grep trivy || ls -la /tmp/ | head -20
exit 1
fi
mv /tmp/trivy /usr/local/bin/trivy
chmod +x /usr/local/bin/trivy
/usr/local/bin/trivy --version
trivy --version
@ -347,7 +376,7 @@ jobs:
if: needs.skip-ci-check.outputs.should-skip != '1'
runs-on: ubuntu-latest
container:
image: sonarsource/sonar-scanner-cli:latest
image: ubuntu:22.04
env:
SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
@ -355,13 +384,127 @@ jobs:
- name: Check out code
uses: actions/checkout@v4
- name: Install Java and SonarScanner
run: |
set -e
apt-get update && apt-get install -y wget curl unzip openjdk-21-jre
# Use a known working version to avoid download issues
SONAR_SCANNER_VERSION="5.0.1.3006"
SCANNER_URL="https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-${SONAR_SCANNER_VERSION}-linux.zip"
echo "Installing SonarScanner version: ${SONAR_SCANNER_VERSION}"
echo "Downloading from: ${SCANNER_URL}"
# Download with verbose error output
if ! wget --progress=bar:force "${SCANNER_URL}" -O /tmp/sonar-scanner.zip 2>&1; then
echo "❌ Failed to download SonarScanner"
echo "Checking if file was partially downloaded:"
ls -lh /tmp/sonar-scanner.zip 2>/dev/null || echo "No file found"
exit 1
fi
# Verify download
if [ ! -f /tmp/sonar-scanner.zip ] || [ ! -s /tmp/sonar-scanner.zip ]; then
echo "❌ Downloaded file is missing or empty"
exit 1
fi
echo "Download complete. File size: $(du -h /tmp/sonar-scanner.zip | cut -f1)"
echo "Extracting SonarScanner..."
if ! unzip -q /tmp/sonar-scanner.zip -d /tmp; then
echo "❌ Failed to extract SonarScanner"
echo "Archive info:"
file /tmp/sonar-scanner.zip || true
unzip -l /tmp/sonar-scanner.zip 2>&1 | head -20 || true
exit 1
fi
# Find the extracted directory (handle both naming conventions)
EXTRACTED_DIR=""
if [ -d "/tmp/sonar-scanner-${SONAR_SCANNER_VERSION}-linux" ]; then
EXTRACTED_DIR="/tmp/sonar-scanner-${SONAR_SCANNER_VERSION}-linux"
elif [ -d "/tmp/sonar-scanner-cli-${SONAR_SCANNER_VERSION}-linux" ]; then
EXTRACTED_DIR="/tmp/sonar-scanner-cli-${SONAR_SCANNER_VERSION}-linux"
else
# Try to find any sonar-scanner directory
EXTRACTED_DIR=$(find /tmp -maxdepth 1 -type d -name "*sonar-scanner*" | head -1)
fi
if [ -z "$EXTRACTED_DIR" ] || [ ! -d "$EXTRACTED_DIR" ]; then
echo "❌ SonarScanner directory not found after extraction"
echo "Contents of /tmp:"
ls -la /tmp/ | grep -E "(sonar|zip)" || ls -la /tmp/ | head -20
exit 1
fi
echo "Found extracted directory: ${EXTRACTED_DIR}"
mv "${EXTRACTED_DIR}" /opt/sonar-scanner
# Create symlink
if [ -f /opt/sonar-scanner/bin/sonar-scanner ]; then
ln -sf /opt/sonar-scanner/bin/sonar-scanner /usr/local/bin/sonar-scanner
chmod +x /opt/sonar-scanner/bin/sonar-scanner
chmod +x /usr/local/bin/sonar-scanner
else
echo "❌ sonar-scanner binary not found in /opt/sonar-scanner/bin/"
echo "Contents of /opt/sonar-scanner/bin/:"
ls -la /opt/sonar-scanner/bin/ || true
exit 1
fi
echo "Verifying installation..."
if ! sonar-scanner --version; then
echo "❌ SonarScanner verification failed"
echo "PATH: $PATH"
which sonar-scanner || echo "sonar-scanner not in PATH"
exit 1
fi
echo "✓ SonarScanner installed successfully"
- name: Verify SonarQube connection
run: |
echo "Checking SonarQube connectivity..."
if [ -z "$SONAR_HOST_URL" ] || [ -z "$SONAR_TOKEN" ]; then
echo "❌ ERROR: SONAR_HOST_URL or SONAR_TOKEN secrets are not set!"
echo "Please configure them in: Repository Settings → Actions → Secrets"
exit 1
fi
echo "✓ Secrets are configured"
echo "SonarQube URL: ${SONAR_HOST_URL}"
echo "Testing connectivity to SonarQube server..."
if curl -f -s -o /dev/null -w "%{http_code}" "${SONAR_HOST_URL}/api/system/status" | grep -q "200"; then
echo "✓ SonarQube server is reachable"
else
echo "⚠️ Warning: Could not verify SonarQube server connectivity"
fi
- name: Run SonarScanner
run: |
sonar-scanner \
-Dsonar.projectKey=ansible-infra \
echo "Starting SonarQube analysis..."
if ! sonar-scanner \
-Dsonar.projectKey=ansible \
-Dsonar.sources=. \
-Dsonar.host.url=${SONAR_HOST_URL} \
-Dsonar.login=${SONAR_TOKEN}
-Dsonar.token=${SONAR_TOKEN} \
-Dsonar.scm.disabled=true \
-Dsonar.python.version=3.10 \
-X; then
echo ""
echo "❌ SonarScanner analysis failed!"
echo ""
echo "Common issues:"
echo " 1. Project 'ansible' doesn't exist in SonarQube"
echo " → Create it manually in SonarQube UI"
echo " 2. Token doesn't have permission to analyze/create project"
echo " → Ensure token has 'Execute Analysis' permission"
echo " 3. Token doesn't have 'Create Projects' permission (if project doesn't exist)"
echo " → Grant this permission in SonarQube user settings"
echo ""
echo "Check SonarQube logs for more details."
exit 1
fi
continue-on-error: true
workflow-summary:

View File

@ -35,7 +35,8 @@ ANSIBLE_ARGS := --vault-password-file ~/.ansible-vault-pass
## Auto-detect current host to exclude from remote operations
CURRENT_IP := $(shell hostname -I | awk '{print $$1}')
CURRENT_HOST := $(shell ansible-inventory --list | jq -r '._meta.hostvars | to_entries[] | select(.value.ansible_host == "$(CURRENT_IP)") | .key' 2>/dev/null | head -1)
# NOTE: inventory parsing may require vault secrets. Keep this best-effort and silent in CI.
CURRENT_HOST := $(shell ansible-inventory --list --vault-password-file ~/.ansible-vault-pass 2>/dev/null | jq -r '._meta.hostvars | to_entries[] | select(.value.ansible_host == "$(CURRENT_IP)") | .key' 2>/dev/null | head -1)
EXCLUDE_CURRENT := $(if $(CURRENT_HOST),--limit '!$(CURRENT_HOST)',)
help: ## Show this help message
@ -276,14 +277,22 @@ workstations: ## Run workstation baseline (usage: make workstations [GROUP=dev]
fi
# Host-specific targets
dev: ## Run on specific host (usage: make dev HOST=dev01)
dev: ## Run on specific host (usage: make dev HOST=dev01 [SUDO=true] [SSH_PASS=true])
ifndef HOST
@echo "$(RED)Error: HOST parameter required$(RESET)"
@echo "Usage: make dev HOST=dev01"
@echo "Usage: make dev HOST=dev01 [SUDO=true] [SSH_PASS=true]"
@exit 1
endif
@echo "$(YELLOW)Running on host: $(HOST)$(RESET)"
$(ANSIBLE_PLAYBOOK) $(PLAYBOOK_DEV) --limit $(HOST)
@SSH_FLAGS=""; \
SUDO_FLAGS=""; \
if [ "$(SSH_PASS)" = "true" ]; then \
SSH_FLAGS="-k"; \
fi; \
if [ "$(SUDO)" = "true" ]; then \
SUDO_FLAGS="-K"; \
fi; \
$(ANSIBLE_PLAYBOOK) $(PLAYBOOK_DEV) --limit $(HOST) $(ANSIBLE_ARGS) $$SSH_FLAGS $$SUDO_FLAGS
# Data science role
datascience: ## Install data science stack (usage: make datascience HOST=server01)
@ -395,12 +404,21 @@ docker: ## Install/configure Docker only
@echo "$(YELLOW)Running Docker setup...$(RESET)"
$(ANSIBLE_PLAYBOOK) $(PLAYBOOK_DEV) --tags docker
shell: ## Configure shell only
@echo "$(YELLOW)Running shell configuration...$(RESET)"
shell: ## Configure shell (usage: make shell [HOST=dev02] [SUDO=true])
ifdef HOST
@echo "$(YELLOW)Running shell configuration on host: $(HOST)$(RESET)"
@if [ "$(SUDO)" = "true" ]; then \
$(ANSIBLE_PLAYBOOK) playbooks/shell.yml --limit $(HOST) $(ANSIBLE_ARGS) -K; \
else \
$(ANSIBLE_PLAYBOOK) playbooks/shell.yml --limit $(HOST) $(ANSIBLE_ARGS); \
fi
else
@echo "$(YELLOW)Running shell configuration on all dev hosts...$(RESET)"
$(ANSIBLE_PLAYBOOK) $(PLAYBOOK_DEV) --tags shell
endif
shell-all: ## Configure shell on all shell_hosts (usage: make shell-all)
@echo "$(YELLOW)Running shell configuration on all shell hosts...$(RESET)"
shell-all: ## Configure shell on all hosts (usage: make shell-all)
@echo "$(YELLOW)Running shell configuration on all hosts...$(RESET)"
$(ANSIBLE_PLAYBOOK) playbooks/shell.yml $(ANSIBLE_ARGS)
apps: ## Install applications only
@ -604,6 +622,45 @@ endif
@echo "$(YELLOW)Provisioning + configuring app project: $(PROJECT)$(RESET)"
$(ANSIBLE_PLAYBOOK) playbooks/app/site.yml -e app_project=$(PROJECT)
# Timeshift snapshot and rollback
timeshift-snapshot: ## Create Timeshift snapshot (usage: make timeshift-snapshot HOST=dev02)
ifndef HOST
@echo "$(RED)Error: HOST parameter required$(RESET)"
@echo "Usage: make timeshift-snapshot HOST=dev02"
@exit 1
endif
@echo "$(YELLOW)Creating Timeshift snapshot on $(HOST)...$(RESET)"
$(ANSIBLE_PLAYBOOK) $(PLAYBOOK_DEV) --limit $(HOST) --tags timeshift,snapshot
@echo "$(GREEN)✓ Snapshot created$(RESET)"
timeshift-list: ## List Timeshift snapshots (usage: make timeshift-list HOST=dev02)
ifndef HOST
@echo "$(RED)Error: HOST parameter required$(RESET)"
@echo "Usage: make timeshift-list HOST=dev02"
@exit 1
endif
@echo "$(YELLOW)Listing Timeshift snapshots on $(HOST)...$(RESET)"
@$(ANSIBLE_PLAYBOOK) playbooks/timeshift.yml --limit $(HOST) -e "timeshift_action=list" $(ANSIBLE_ARGS)
timeshift-restore: ## Restore from Timeshift snapshot (usage: make timeshift-restore HOST=dev02 SNAPSHOT=2025-12-17_21-30-00)
ifndef HOST
@echo "$(RED)Error: HOST parameter required$(RESET)"
@echo "Usage: make timeshift-restore HOST=dev02 SNAPSHOT=2025-12-17_21-30-00"
@exit 1
endif
ifndef SNAPSHOT
@echo "$(RED)Error: SNAPSHOT parameter required$(RESET)"
@echo "Usage: make timeshift-restore HOST=dev02 SNAPSHOT=2025-12-17_21-30-00"
@echo "$(YELLOW)Available snapshots:$(RESET)"
@$(MAKE) timeshift-list HOST=$(HOST)
@exit 1
endif
@echo "$(RED)WARNING: This will restore the system to snapshot $(SNAPSHOT)$(RESET)"
@echo "$(YELLOW)This action cannot be undone. Continue? [y/N]$(RESET)"
@read -r confirm && [ "$$confirm" = "y" ] || exit 1
@echo "$(YELLOW)Restoring snapshot $(SNAPSHOT) on $(HOST)...$(RESET)"
@$(ANSIBLE_PLAYBOOK) playbooks/timeshift.yml --limit $(HOST) -e "timeshift_action=restore timeshift_snapshot=$(SNAPSHOT)" $(ANSIBLE_ARGS)
@echo "$(GREEN)✓ Snapshot restored$(RESET)"
test-connectivity: ## Test host connectivity with detailed diagnostics and recommendations
@echo "$(YELLOW)Testing host connectivity...$(RESET)"

157
docs/ROADMAP.md Normal file
View File

@ -0,0 +1,157 @@
# Project Roadmap & Future Improvements
Ideas and plans for enhancing the Ansible infrastructure.
## 🚀 Quick Wins (< 30 minutes each)
### Monitoring Enhancements
- [ ] Add Grafana + Prometheus for service monitoring dashboard
- [ ] Implement health check scripts for critical services
- [ ] Create custom Ansible callback plugin for better output
### Security Improvements
- [ ] Add ClamAV antivirus scanning
- [ ] Implement Lynis security auditing
- [ ] Set up automatic security updates with unattended-upgrades
- [ ] Add SSH key rotation mechanism
- [ ] Implement connection monitoring and alerting
## 📊 Medium Projects (1-2 hours each)
### Infrastructure Services
- [ ] **Centralized Logging**: Deploy ELK stack (Elasticsearch, Logstash, Kibana)
- [ ] **Container Orchestration**: Implement Docker Swarm or K3s
- [ ] **CI/CD Pipeline**: Set up GitLab Runner or Jenkins
- [ ] **Network Storage**: Configure NFS or Samba shares
- [ ] **DNS Server**: Deploy Pi-hole for ad blocking and local DNS
### New Service VMs
- [ ] **Monitoring VM**: Dedicated Prometheus + Grafana instance
- [ ] **Media VM**: Plex/Jellyfin media server
- [ ] **Security VM**: Security scanning and vulnerability monitoring
- [ ] **Database VM**: PostgreSQL/MySQL for application data
## 🎯 Service-Specific Enhancements
### giteaVM (Alpine)
Current: Git repository hosting ✅
- [ ] Add CI/CD runners
- [ ] Implement package registry
- [ ] Set up webhook integrations
- [ ] Add code review tools
### portainerVM (Alpine)
Current: Container management ✅
- [ ] Deploy Docker registry
- [ ] Add image vulnerability scanning
- [ ] Set up container monitoring
### homepageVM (Debian)
Current: Service dashboard ✅
- [ ] Add uptime monitoring (Uptime Kuma)
- [ ] Create public status page
- [ ] Implement service dependency mapping
- [ ] Add performance metrics display
### Development VMs
Current: Development environment ✅
- [ ] Add code quality tools (SonarQube)
- [ ] Deploy testing environments
- [ ] Implement development databases
- [ ] Set up local package caching (Artifactory/Nexus)
## 🔧 Ansible Improvements
### Role Enhancements
- [ ] Create reusable database role (PostgreSQL, MySQL, Redis)
- [ ] Develop monitoring role with multiple backends
- [ ] Build certificate management role (Let's Encrypt)
- [ ] Create reverse proxy role (nginx/traefik)
### Playbook Optimization
- [ ] Implement dynamic inventory from cloud providers
- [ ] Add parallel execution strategies
- [ ] Create rollback mechanisms
- [ ] Implement blue-green deployment patterns
### Testing & Quality
- [ ] Add Molecule tests for all roles
- [ ] Implement GitHub Actions CI/CD
- [ ] Create integration test suite
- [ ] Add performance benchmarking
## 📈 Long-term Goals
### High Availability
- [ ] Implement cluster management for critical services
- [ ] Set up load balancing
- [ ] Create disaster recovery procedures
- [ ] Implement automated failover
### Observability
- [ ] Full APM (Application Performance Monitoring)
- [ ] Distributed tracing
- [ ] Log aggregation and analysis
- [ ] Custom metrics and dashboards
### Automation
- [ ] GitOps workflow implementation
- [ ] Self-healing infrastructure
- [ ] Automated scaling
- [ ] Predictive maintenance
## 📝 Documentation Improvements
- [ ] Create video tutorials
- [ ] Add architecture diagrams
- [ ] Write troubleshooting guides
- [ ] Create role development guide
- [ ] Add contribution guidelines
## Priority Matrix
### ✅ **COMPLETED (This Week)**
1. ~~Fix any existing shell issues~~ - Shell configuration working
2. ~~Complete vault setup with all secrets~~ - Tailscale auth key in vault
3. ~~Deploy monitoring basics~~ - System monitoring deployed
4. ~~Fix Tailscale handler issues~~ - Case-sensitive handlers fixed
### 🎯 **IMMEDIATE (Next)**
1. **Security hardening** - ClamAV, Lynis, vulnerability scanning
2. **Enhanced monitoring** - Add Grafana + Prometheus
3. **Security hardening** - ClamAV, Lynis auditing
4. **SSH key management** - Fix remaining connectivity issues
### Short-term (This Month)
1. Centralized logging
2. Enhanced monitoring
3. Security auditing
4. Advanced security monitoring
### Medium-term (Quarter)
1. CI/CD pipeline
2. Container orchestration
3. Service mesh
4. Advanced monitoring
### Long-term (Year)
1. Full HA implementation
2. Multi-region support
3. Complete observability
4. Full automation
## Contributing
To add new ideas:
1. Create an issue in the repository
2. Label with `enhancement` or `feature`
3. Discuss in team meetings
4. Update this roadmap when approved
## Notes
- Focus on stability over features
- Security and monitoring are top priorities
- All changes should be tested in dev first
- Document everything as you go

View File

@ -0,0 +1,205 @@
# Security Hardening Implementation Plan
## 🔒 **Security Hardening Role Structure**
### **Phase 1: Antivirus Protection (ClamAV)**
**What gets installed:**
```bash
- clamav-daemon # Background scanning service
- clamav-freshclam # Virus definition updates
- clamav-milter # Email integration
- clamdscan # Command-line scanner
```
**What gets configured:**
- **Daily scans** at 3 AM of critical directories
- **Real-time monitoring** of `/home`, `/var/www`, `/tmp`
- **Automatic updates** of virus definitions
- **Email alerts** for detected threats
- **Quarantine system** for suspicious files
**Ansible tasks:**
```yaml
- name: Install ClamAV
apt:
name: [clamav-daemon, clamav-freshclam, clamdscan]
state: present
- name: Configure daily scans
cron:
name: "Daily ClamAV scan"
job: "/usr/bin/clamscan -r /home /var/www --log=/var/log/clamav/daily.log"
hour: "3"
minute: "0"
- name: Enable real-time scanning
systemd:
name: clamav-daemon
enabled: true
state: started
```
### **Phase 2: Security Auditing (Lynis)**
**What gets installed:**
```bash
- lynis # Security auditing tool
- rkhunter # Rootkit hunter
- chkrootkit # Additional rootkit detection
```
**What gets configured:**
- **Weekly security audits** with detailed reports
- **Baseline security scoring** for comparison
- **Automated hardening** of common issues
- **Email reports** to administrators
- **Trend tracking** of security improvements
**Ansible tasks:**
```yaml
- name: Install Lynis
get_url:
url: "https://downloads.cisofy.com/lynis/lynis-3.0.8.tar.gz"
dest: "/tmp/lynis.tar.gz"
- name: Extract and install Lynis
unarchive:
src: "/tmp/lynis.tar.gz"
dest: "/opt/"
remote_src: yes
- name: Create weekly audit cron
cron:
name: "Weekly Lynis audit"
job: "/opt/lynis/lynis audit system --quick --report-file /var/log/lynis/weekly-$(date +\\%Y\\%m\\%d).log"
weekday: "0"
hour: "2"
minute: "0"
```
### **Phase 3: Advanced Security Measures**
#### **File Integrity Monitoring (AIDE)**
```yaml
# Monitors critical system files for changes
- Tracks modifications to /etc, /bin, /sbin, /usr/bin
- Alerts on unauthorized changes
- Creates cryptographic checksums
- Daily integrity checks
```
#### **Intrusion Detection (Fail2ban Enhancement)**
```yaml
# Already have basic fail2ban, enhance with:
- SSH brute force protection ✅ (already done)
- Web application attack detection
- Port scan detection
- DDoS protection rules
- Geographic IP blocking
```
#### **System Hardening**
```yaml
# Kernel security parameters
- Disable unused network protocols
- Enable ASLR (Address Space Layout Randomization)
- Configure secure memory settings
- Harden network stack parameters
# Service hardening
- Disable unnecessary services
- Secure service configurations
- Implement principle of least privilege
- Configure secure file permissions
```
## 🎯 **Implementation Strategy**
### **Week 1: Basic Antivirus**
```bash
# Create security role
mkdir -p roles/security/{tasks,templates,handlers,defaults}
# Implement ClamAV
- Install and configure ClamAV
- Set up daily scans
- Configure email alerts
- Test malware detection
```
### **Week 2: Security Auditing**
```bash
# Add Lynis auditing
- Install Lynis security scanner
- Configure weekly audits
- Create reporting dashboard
- Baseline current security score
```
### **Week 3: Advanced Hardening**
```bash
# Implement AIDE and enhanced fail2ban
- File integrity monitoring
- Enhanced intrusion detection
- System parameter hardening
- Security policy enforcement
```
## 📊 **Expected Benefits**
### **Immediate (Week 1)**
- ✅ **Malware protection** on all systems
- ✅ **Automated threat detection**
- ✅ **Real-time file monitoring**
### **Short-term (Month 1)**
- ✅ **Security baseline** established
- ✅ **Vulnerability identification**
- ✅ **Automated hardening** applied
- ✅ **Security trend tracking**
### **Long-term (Ongoing)**
- ✅ **Proactive threat detection**
- ✅ **Compliance reporting**
- ✅ **Reduced attack surface**
- ✅ **Security incident prevention**
## 🚨 **Security Alerts & Monitoring**
### **Alert Types:**
1. **Critical**: Malware detected, system compromise
2. **High**: Failed security audit, integrity violation
3. **Medium**: Suspicious activity, configuration drift
4. **Low**: Routine scan results, update notifications
### **Notification Methods:**
- **Email alerts** for critical/high priority
- **Log aggregation** in centralized system
- **Dashboard indicators** in monitoring system
- **Weekly reports** with security trends
## 🔧 **Integration with Existing Infrastructure**
### **Works with your current setup:**
- ✅ **Fail2ban** - Enhanced with more rules
- ✅ **UFW firewall** - Additional hardening rules
- ✅ **SSH hardening** - Extended with key rotation
- ✅ **Monitoring** - Security metrics integration
- ✅ **Maintenance** - Security updates automation
### **Complements Proxmox + NAS:**
- **File-level protection** vs. VM snapshots
- **Real-time detection** vs. snapshot recovery
- **Proactive prevention** vs. reactive restoration
- **Security compliance** vs. data protection
## 📋 **Next Steps**
1. **Create security role** structure
2. **Implement ClamAV** antivirus protection
3. **Add Lynis** security auditing
4. **Configure monitoring** integration
5. **Test and validate** security improvements
Would you like me to start implementing the security role?

211
docs/guides/timeshift.md Normal file
View File

@ -0,0 +1,211 @@
# Timeshift Snapshot and Rollback Guide
## Overview
Timeshift is a system restore utility that creates snapshots of your system before Ansible playbook execution.
This allows you to easily rollback if something goes wrong during configuration changes.
## How It Works
When you run a playbook, the Timeshift role automatically:
1. Checks if Timeshift is installed (installs if missing)
2. Creates a snapshot before making any changes
3. Tags the snapshot with "ansible" and "pre-playbook" for easy identification
## Usage
### Automatic Snapshots
Snapshots are created automatically when running playbooks:
```bash
# Run playbook - snapshot created automatically
make dev HOST=dev02
# Or run only snapshot creation
make timeshift-snapshot HOST=dev02
```
### List Snapshots
```bash
# List all snapshots on a host
make timeshift-list HOST=dev02
# Or manually on the host
ssh ladmin@192.168.20.28 "sudo timeshift --list"
```
### Restore from Snapshot
```bash
# Restore from a specific snapshot
make timeshift-restore HOST=dev02 SNAPSHOT=2025-12-17_21-30-00
# The command will:
# 1. Show available snapshots if SNAPSHOT is not provided
# 2. Ask for confirmation before restoring
# 3. Restore the system to that snapshot
```
### Manual Snapshot
```bash
# Create snapshot manually on host
ssh ladmin@192.168.20.28
sudo timeshift --create --comments "Manual snapshot before manual changes"
```
### Manual Restore
```bash
# SSH to host
ssh ladmin@192.168.20.28
# List snapshots
sudo timeshift --list
# Restore (interactive)
sudo timeshift --restore
# Or restore specific snapshot (non-interactive)
sudo timeshift --restore --snapshot '2025-12-17_21-30-00' --scripted
```
## Configuration
### Disable Auto-Snapshots
If you don't want automatic snapshots, disable them in `host_vars` or `group_vars`:
```yaml
# inventories/production/host_vars/dev02.yml
timeshift_auto_snapshot: false
```
### Customize Snapshot Settings
```yaml
# inventories/production/group_vars/dev/main.yml
timeshift_snapshot_description: "Pre-deployment snapshot"
timeshift_snapshot_tags: ["ansible", "deployment"]
timeshift_keep_daily: 7
timeshift_keep_weekly: 4
timeshift_keep_monthly: 6
```
## Important Notes
### Disk Space
- Snapshots require significant disk space (typically 10-50% of system size)
- RSYNC snapshots are larger but work on any filesystem
- BTRFS snapshots are smaller but require BTRFS filesystem
- Monitor disk usage: `df -h /timeshift`
### What Gets Backed Up
By default, Timeshift backs up:
- ✅ System files (`/etc`, `/usr`, `/boot`, etc.)
- ✅ System configuration
- ❌ User home directories (`/home`) - excluded by default
- ❌ User data
### Recovery Process
1. **Boot from recovery** (if system won't boot):
- Boot from live USB
- Install Timeshift: `sudo apt install timeshift`
- Run: `sudo timeshift --restore`
2. **Restore from running system**:
- SSH to host
- Run: `sudo timeshift --restore`
- Select snapshot and confirm
### Best Practices
1. **Always create snapshots before major changes**
```bash
make timeshift-snapshot HOST=dev02
make dev HOST=dev02
```
2. **Test rollback process** before you need it
```bash
# Create test snapshot
make timeshift-snapshot HOST=dev02
# Make a test change
# ...
# Practice restoring
make timeshift-list HOST=dev02
make timeshift-restore HOST=dev02 SNAPSHOT=<test-snapshot>
```
3. **Monitor snapshot disk usage**
```bash
ssh ladmin@192.168.20.28 "df -h /timeshift"
```
4. **Clean up old snapshots** if needed
```bash
ssh ladmin@192.168.20.28 "sudo timeshift --delete --snapshot 'OLD-SNAPSHOT'"
```
## Troubleshooting
### Snapshot Creation Fails
```bash
# Check Timeshift status
ssh ladmin@192.168.20.28 "sudo timeshift --list"
# Check disk space
ssh ladmin@192.168.20.28 "df -h"
# Check Timeshift logs
ssh ladmin@192.168.20.28 "sudo journalctl -u timeshift"
```
### Restore Fails
- Ensure you have enough disk space
- Check that snapshot still exists: `sudo timeshift --list`
- Try booting from recovery media if system won't boot
### Disk Full
```bash
# List snapshots
sudo timeshift --list
# Delete old snapshots
sudo timeshift --delete --snapshot 'OLD-SNAPSHOT'
# Or configure retention in group_vars
timeshift_keep_daily: 3 # Reduce from 7
timeshift_keep_weekly: 2 # Reduce from 4
```
## Integration with Ansible
The Timeshift role is automatically included in the development playbook and runs first to create snapshots before any changes are made.
This ensures you always have a restore point.
```yaml
# playbooks/development.yml
roles:
- {role: timeshift, tags: ['timeshift', 'snapshot']} # Runs first
- {role: base}
- {role: development}
# ... other roles
```
## See Also
- [Timeshift Documentation](https://github.com/teejee2008/timeshift)
- [Ansible Vault Guide](./vault.md) - For securing passwords
- [Maintenance Guide](../reference/makefile.md) - For system maintenance

View File

@ -0,0 +1,9 @@
---
# Development group overrides
# Development machines may need more permissive SSH settings
# Allow root login for initial setup (can be disabled after setup)
ssh_permit_root_login: 'yes'
# Allow password authentication for initial setup (should be disabled after SSH keys are set up)
ssh_password_authentication: 'yes'

View File

@ -0,0 +1,10 @@
---
# Host variables for KrakenMint
# Using root user directly, password will be prompted
ansible_become: true
# Configure shell for root
shell_users:
- ladmin

View File

@ -0,0 +1,8 @@
$ANSIBLE_VAULT;1.1;AES256
39353931333431383166336133363735336334376339646261353331323162343663386265393337
3761626465643830323333613065316361623839363439630a653563306462313663393432306135
61383936326637366635373563623038623866643230356164336436666535626239346163323665
6339623335643238660a303031363233396466326333613831366265363839313435366235663139
35616161333063363035326636353936633465613865313033393331313662303436646537613665
39616336363533633833383266346562373161656332363237343665316337353764386661333664
336163353333613762626533333437376637

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
31306264346663636630656534303766666564333866326139336137383339633338323834653266
6132333337363566623265303037336266646238633036390a663432623861363562386561393264

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
66633265383239626163633134656233613638643862323562373330643363323036333334646566
3439646635343533353432323064643135623532333738380a353866643461636233376432396434

View File

@ -1,5 +1,5 @@
---
ansible_become_password: root
ansible_become_password: "{{ vault_devgpu_become_password }}"
ansible_python_interpreter: /usr/bin/python3

View File

@ -0,0 +1,2 @@
---
vault_devgpu_become_password: root

View File

@ -5,4 +5,4 @@ ansible_become_exe: /usr/bin/sudo
ansible_become_method: sudo
# Alternative: if sudo is in a different location, update this
# ansible_become_exe: /usr/local/bin/sudo
# ansible_become_exe: /usr/local/bin/sudo

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
61623232353833613730343036663434633265346638366431383737623936616131356661616238
3230346138373030396336663566353433396230346434630a313633633161303539373965343466

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
31316663336338303832323464623866343366313261653536623233303466636630633235643638
3666646431323061313836333233356162643462323763380a623666663062386337393439653134

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
62356361353835643235613335613661356230666539386533383536623432316333346431343462
3265376632633731623430376333323234633962643766380a363033666334643930326636343963

View File

@ -1,3 +1,4 @@
---
$ANSIBLE_VAULT;1.1;AES256
35633833353965363964376161393730613065663236326239376562356231316166656131366263
6263363436373965316339623139353830643062393165370a643138356561613537616431316534

View File

@ -16,6 +16,8 @@ devGPU ansible_host=10.0.30.63 ansible_user=root
[qa]
git-ci-01 ansible_host=10.0.10.223 ansible_user=ladmin
sonarqube-01 ansible_host=10.0.10.54 ansible_user=ladmin
dev02 ansible_host=10.0.10.100 ansible_user=ladmin
KrakenMint ansible_host=10.0.10.120 ansible_user=ladmin
[ansible]
ansibleVM ansible_host=10.0.10.157 ansible_user=master

View File

@ -4,6 +4,7 @@
become: true
roles:
- {role: timeshift, tags: ['timeshift', 'snapshot']} # Create snapshot before changes
- {role: maintenance, tags: ['maintenance']}
- {role: base, tags: ['base', 'security']}
- {role: user, tags: ['user']}

28
playbooks/timeshift.yml Normal file
View File

@ -0,0 +1,28 @@
---
- name: Timeshift operations
hosts: all
become: true
gather_facts: false
tasks:
- name: List Timeshift snapshots
ansible.builtin.command: timeshift --list
register: timeshift_list_result
when: timeshift_action == "list"
changed_when: false
- name: Display snapshots
ansible.builtin.debug:
msg: "{{ timeshift_list_result.stdout_lines }}"
when: timeshift_action == "list"
- name: Restore from snapshot
ansible.builtin.command: timeshift --restore --snapshot "{{ timeshift_snapshot }}" --scripted # noqa command-instead-of-module
when: timeshift_action == "restore"
register: timeshift_restore_result
changed_when: false
- name: Display restore result
ansible.builtin.debug:
msg: "{{ timeshift_restore_result.stdout_lines }}"
when: timeshift_action == "restore"

View File

@ -1,2 +1,8 @@
---
# defaults file for base
# Fail2ban email configuration
# Set these in group_vars/all/main.yml or host_vars to enable email notifications
fail2ban_destemail: "" # Empty by default - no email notifications
fail2ban_sender: "" # Empty by default
fail2ban_action: "%(action_mwl)s" # Mail, whois, and log action

View File

@ -23,6 +23,7 @@
- unzip
- xclip
- tree
- copyq
# Network and admin tools
- net-tools
- ufw
@ -31,6 +32,9 @@
- jq
- ripgrep
- fd-find
# Power management (TLP for laptops)
- tlp
- tlp-rdw
state: present
- name: Install yq YAML processor
@ -74,3 +78,17 @@
community.general.locale_gen:
name: "{{ locale | default('en_US.UTF-8') }}"
state: present
- name: Gather package facts to check for TLP
ansible.builtin.package_facts:
manager: apt
when: ansible_facts.packages is not defined
- name: Enable and start TLP service
ansible.builtin.systemd:
name: tlp
enabled: true
state: started
daemon_reload: true
become: true
when: ansible_facts.packages is defined and 'tlp' in ansible_facts.packages

View File

@ -6,10 +6,14 @@ findtime = 600
# Allow 3 failures before banning
maxretry = 3
# Email notifications (uncomment and configure if needed)
destemail = idobkin@gmail.com
sender = idobkin@gmail.com
action = %(action_mwl)s
# Email notifications (configured via fail2ban_destemail variable)
{% if fail2ban_destemail | default('') | length > 0 %}
destemail = {{ fail2ban_destemail }}
sender = {{ fail2ban_sender | default(fail2ban_destemail) }}
action = {{ fail2ban_action | default('%(action_mwl)s') }}
{% else %}
# Email notifications disabled (set fail2ban_destemail in group_vars/all/main.yml to enable)
{% endif %}
[sshd]
enabled = true

View File

@ -5,7 +5,7 @@
state: present
become: true
- name: Check if NodeSource Node.js is installed
- name: Check if Node.js is installed
ansible.builtin.command: node --version
register: node_version_check
failed_when: false
@ -67,12 +67,20 @@
- node_version_check.rc != 0 or not node_version_check.stdout.startswith('v22')
- not (nodesource_key_stat.stat.exists | default(false))
- name: Add NodeSource GPG key only if needed
ansible.builtin.get_url:
url: https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key
dest: /etc/apt/keyrings/nodesource.gpg
mode: '0644'
force: true
- name: Import NodeSource GPG key into apt keyring
ansible.builtin.shell: |
# Ensure keyrings directory exists
mkdir -p /etc/apt/keyrings
# Download and convert key to binary format for signed-by
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
chmod 644 /etc/apt/keyrings/nodesource.gpg
# Verify the key file is valid
if ! file /etc/apt/keyrings/nodesource.gpg | grep -q "PGP"; then
echo "ERROR: Key file is not valid PGP format"
exit 1
fi
args:
creates: /etc/apt/keyrings/nodesource.gpg
become: true
when:
- node_version_check.rc != 0 or not node_version_check.stdout.startswith('v22')
@ -93,7 +101,8 @@
name: nodejs
state: present
become: true
when: node_version_check.rc != 0 or not node_version_check.stdout.startswith('v22')
when:
- (node_version_check.rc != 0 or not node_version_check.stdout.startswith('v22'))
- name: Verify Node.js installation
ansible.builtin.command: node --version

View File

@ -11,7 +11,6 @@
- name: Check if Docker is already installed
ansible.builtin.command: docker --version
register: docker_check
ignore_errors: true
changed_when: false
failed_when: false
no_log: true

View File

@ -33,5 +33,11 @@
ansible.builtin.apt_repository:
repo: "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ docker_ubuntu_codename }} stable"
state: present
update_cache: true
update_cache: false
when: docker_repo_check.stdout in ["not_exists", "wrong_config"]
- name: Update apt cache after adding Docker repository
ansible.builtin.apt:
update_cache: true
become: true
when: docker_repo_check.stdout in ["not_exists", "wrong_config"]

View File

@ -6,10 +6,14 @@ findtime = 600
# Allow 3 failures before banning
maxretry = 3
# Email notifications (uncomment and configure if needed)
destemail = idobkin@gmail.com
sender = idobkin@gmail.com
action = %(action_mwl)s
# Email notifications (configured via fail2ban_destemail variable)
{% if fail2ban_destemail | default('') | length > 0 %}
destemail = {{ fail2ban_destemail }}
sender = {{ fail2ban_sender | default(fail2ban_destemail) }}
action = {{ fail2ban_action | default('%(action_mwl)s') }}
{% else %}
# Email notifications disabled (set fail2ban_destemail in group_vars/all/main.yml to enable)
{% endif %}
[sshd]
enabled = true

View File

@ -143,4 +143,5 @@
- >-
Mode: {{ shell_mode | default('minimal') }} ({{ 'managed ~/.zshrc deployed' if (shell_deploy_managed_zshrc | bool) else 'aliases-only appended to ~/.zshrc' }})
- "If you want zsh as default login shell, set: shell_set_default_shell=true"
- "If zsh was set as the default shell, log out/in or run: exec zsh"
- "=========================================="

View File

@ -2,8 +2,10 @@
# SSH server configuration
ssh_port: 22
ssh_listen_addresses: ['0.0.0.0']
ssh_permit_root_login: 'yes'
ssh_password_authentication: 'yes'
# Security defaults - hardened by default
# Override in group_vars for dev/desktop machines if needed
ssh_permit_root_login: 'prohibit-password' # Allow root only with keys, not passwords
ssh_password_authentication: 'no' # Disable password auth by default (use keys)
ssh_pubkey_authentication: 'yes'
ssh_max_auth_tries: 3
ssh_client_alive_interval: 300

View File

@ -33,7 +33,16 @@
name: OpenSSH
failed_when: false
- name: Enable UFW with deny default policy
- name: Set UFW default policy for incoming (deny)
community.general.ufw:
direction: incoming
policy: deny
- name: Set UFW default policy for outgoing (allow)
community.general.ufw:
direction: outgoing
policy: allow
- name: Enable UFW firewall
community.general.ufw:
state: enabled
policy: deny

100
roles/timeshift/README.md Normal file
View File

@ -0,0 +1,100 @@
# Timeshift Role
Manages Timeshift system snapshots for backup and rollback capabilities.
## Purpose
This role installs and configures Timeshift, a system restore utility for Linux. It can automatically create snapshots before playbook execution to enable easy rollback if something goes wrong.
## Features
- Installs Timeshift package
- Creates automatic snapshots before playbook runs
- Configurable snapshot retention
- Easy rollback capability
## Variables
### Installation
- `timeshift_install` (default: `true`) - Install Timeshift package
### Snapshot Settings
- `timeshift_auto_snapshot` (default: `true`) - Automatically create snapshot before playbook execution
- `timeshift_snapshot_description` (default: `"Ansible playbook snapshot"`) - Description for snapshots
- `timeshift_snapshot_tags` (default: `["ansible", "pre-playbook"]`) - Tags for snapshots
- `timeshift_snapshot_type` (default: `"RSYNC"`) - Snapshot type: RSYNC or BTRFS
### Retention
- `timeshift_keep_daily` (default: `7`) - Keep daily snapshots for N days
- `timeshift_keep_weekly` (default: `4`) - Keep weekly snapshots for N weeks
- `timeshift_keep_monthly` (default: `6`) - Keep monthly snapshots for N months
### Location
- `timeshift_snapshot_location` (default: `"/timeshift"`) - Where to store snapshots
## Usage
### Basic Usage
Add to your playbook:
```yaml
roles:
- { role: timeshift, tags: ['timeshift', 'snapshot'] }
```
### Disable Auto-Snapshot
```yaml
roles:
- { role: timeshift, tags: ['timeshift'] }
```
In host_vars or group_vars:
```yaml
timeshift_auto_snapshot: false
```
### Manual Snapshot
```bash
# On the target host
sudo timeshift --create --comments "Manual snapshot before changes"
```
### Rollback
```bash
# List snapshots
sudo timeshift --list
# Restore from snapshot
sudo timeshift --restore --snapshot 'YYYY-MM-DD_HH-MM-SS'
# Or use the Makefile target
make timeshift-restore HOST=dev02 SNAPSHOT=2025-12-17_21-30-00
```
## Integration with Playbooks
The role is designed to be run early in playbooks to create snapshots before making changes:
```yaml
roles:
- { role: timeshift, tags: ['timeshift', 'snapshot'] } # Create snapshot first
- { role: base }
- { role: development }
# ... other roles
```
## Dependencies
- Debian/Ubuntu-based system
- Root/sudo access
## Notes
- Snapshots require significant disk space
- RSYNC snapshots are larger but work on any filesystem
- BTRFS snapshots are smaller but require BTRFS filesystem
- Snapshots exclude `/home` by default (configurable)

View File

@ -0,0 +1,21 @@
---
# Timeshift role defaults
# Install Timeshift
timeshift_install: true
# Timeshift snapshot settings
timeshift_snapshot_type: "RSYNC" # RSYNC or BTRFS
timeshift_snapshot_description: "Ansible playbook snapshot"
timeshift_snapshot_tags: ["ansible", "pre-playbook"]
# Auto-create snapshot before playbook runs
timeshift_auto_snapshot: true
# Retention settings
timeshift_keep_daily: 7
timeshift_keep_weekly: 4
timeshift_keep_monthly: 6
# Snapshot location (default: /timeshift)
timeshift_snapshot_location: "/timeshift"

View File

@ -0,0 +1,52 @@
---
- name: Check if Timeshift is installed
ansible.builtin.command: timeshift --version
register: timeshift_check
failed_when: false
changed_when: false
- name: Install Timeshift
ansible.builtin.apt:
name: timeshift
state: present
become: true
when:
- timeshift_install | default(true) | bool
- timeshift_check.rc != 0
- name: Create Timeshift snapshot directory
ansible.builtin.file:
path: "{{ timeshift_snapshot_location }}"
state: directory
mode: '0755'
become: true
when: timeshift_install | default(true) | bool
- name: Create snapshot before playbook execution
ansible.builtin.command: >
timeshift --create
--comments "{{ timeshift_snapshot_description }}"
--tags {{ timeshift_snapshot_tags | join(',') }}
--scripted
become: true
register: timeshift_snapshot_result
when:
- timeshift_auto_snapshot | default(true) | bool
- timeshift_check.rc == 0 or timeshift_install | default(true) | bool
changed_when: "'Snapshot created successfully' in timeshift_snapshot_result.stdout or 'Created snapshot' in timeshift_snapshot_result.stdout"
failed_when: >
timeshift_snapshot_result.rc != 0
and "'already exists' not in timeshift_snapshot_result.stderr | default('')"
and "'Snapshot created' not in timeshift_snapshot_result.stderr | default('')"
ignore_errors: true
- name: Display snapshot information
ansible.builtin.debug:
msg:
- "Timeshift snapshot operation completed"
- "Output: {{ timeshift_snapshot_result.stdout | default('Check with: sudo timeshift --list') }}"
- "To list snapshots: sudo timeshift --list"
- "To restore: sudo timeshift --restore --snapshot 'SNAPSHOT_NAME'"
when:
- timeshift_auto_snapshot | default(true) | bool
- timeshift_snapshot_result is defined

View File

@ -141,39 +141,91 @@ class ConnectivityTester:
return result
def _analyze_connectivity(self, result: Dict) -> Tuple[str, str]:
"""Analyze connectivity results and provide recommendations."""
hostname = result['hostname']
primary_ip = result['primary_ip']
fallback_ip = result['fallback_ip']
# Primary IP works perfectly
if result['primary_ping'] and result['primary_ssh']:
return 'success', f"{hostname} is fully accessible via primary IP {primary_ip}"
# Primary ping works but SSH fails
if result['primary_ping'] and not result['primary_ssh']:
error = result['primary_ssh_error']
if 'Permission denied' in error:
return 'ssh_key', f"{hostname}: SSH key issue on {primary_ip} - run: make copy-ssh-key HOST={hostname}"
elif 'Connection refused' in error:
return 'ssh_service', f"{hostname}: SSH service not running on {primary_ip}"
else:
return 'ssh_error', f"{hostname}: SSH error on {primary_ip} - {error}"
# Primary IP fails, test fallback
if not result['primary_ping'] and fallback_ip:
if result['fallback_ping'] and result['fallback_ssh']:
return 'use_fallback', f"{hostname}: Switch to fallback IP {fallback_ip} (primary {primary_ip} failed)"
elif result['fallback_ping'] and not result['fallback_ssh']:
return 'fallback_ssh', f"{hostname}: Fallback IP {fallback_ip} reachable but SSH failed"
else:
return 'both_failed', f"{hostname}: Both primary {primary_ip} and fallback {fallback_ip} failed"
# No fallback IP and primary failed
if not result['primary_ping'] and not fallback_ip:
return 'no_fallback', f"{hostname}: Primary IP {primary_ip} failed, no fallback available"
return 'unknown', f"? {hostname}: Unknown connectivity state"
"""Analyze connectivity results and provide recommendations.
Split into smaller helpers to keep this function's complexity low
while preserving the original decision logic.
"""
for handler in (
self._handle_primary_success,
self._handle_primary_ping_only,
self._handle_fallback_path,
self._handle_no_fallback,
):
outcome = handler(result)
if outcome is not None:
return outcome
hostname = result["hostname"]
return "unknown", f"? {hostname}: Unknown connectivity state"
def _handle_primary_success(self, result: Dict) -> Optional[Tuple[str, str]]:
"""Handle case where primary IP works perfectly."""
if result.get("primary_ping") and result.get("primary_ssh"):
hostname = result["hostname"]
primary_ip = result["primary_ip"]
return "success", f"{hostname} is fully accessible via primary IP {primary_ip}"
return None
def _handle_primary_ping_only(self, result: Dict) -> Optional[Tuple[str, str]]:
"""Handle cases where primary ping works but SSH fails."""
if result.get("primary_ping") and not result.get("primary_ssh"):
hostname = result["hostname"]
primary_ip = result["primary_ip"]
error = result.get("primary_ssh_error", "")
if "Permission denied" in error:
return (
"ssh_key",
f"{hostname}: SSH key issue on {primary_ip} - run: make copy-ssh-key HOST={hostname}",
)
if "Connection refused" in error:
return "ssh_service", f"{hostname}: SSH service not running on {primary_ip}"
return "ssh_error", f"{hostname}: SSH error on {primary_ip} - {error}"
return None
def _handle_fallback_path(self, result: Dict) -> Optional[Tuple[str, str]]:
"""Handle cases where primary fails and a fallback IP is defined."""
if result.get("primary_ping"):
return None
fallback_ip = result.get("fallback_ip")
if not fallback_ip:
return None
hostname = result["hostname"]
primary_ip = result["primary_ip"]
if result.get("fallback_ping") and result.get("fallback_ssh"):
return (
"use_fallback",
f"{hostname}: Switch to fallback IP {fallback_ip} (primary {primary_ip} failed)",
)
if result.get("fallback_ping") and not result.get("fallback_ssh"):
return (
"fallback_ssh",
f"{hostname}: Fallback IP {fallback_ip} reachable but SSH failed",
)
return (
"both_failed",
f"{hostname}: Both primary {primary_ip} and fallback {fallback_ip} failed",
)
def _handle_no_fallback(self, result: Dict) -> Optional[Tuple[str, str]]:
"""Handle cases where primary failed and no fallback IP is available."""
if result.get("primary_ping"):
return None
fallback_ip = result.get("fallback_ip")
if fallback_ip:
return None
hostname = result["hostname"]
primary_ip = result["primary_ip"]
return "no_fallback", f"{hostname}: Primary IP {primary_ip} failed, no fallback available"
def run_tests(self) -> List[Dict]:
"""Run connectivity tests for all hosts."""
@ -264,8 +316,8 @@ class ConnectivityTester:
# Auto-fallback suggestion
if fallback_needed:
print(f"\n🤖 Or run auto-fallback to fix automatically:")
print(f" make auto-fallback")
print("\n🤖 Or run auto-fallback to fix automatically:")
print(" make auto-fallback")
def export_json(self, results: List[Dict], output_file: str):
"""Export results to JSON file."""