ilia 579f0709ce Update Makefile and inventory configurations for improved task execution and organization
- Refactor Makefile to enhance command structure, including clearer descriptions and usage examples for targets related to development, inventory, and monitoring tasks.
- Update inventory files to ensure correct host configurations and user settings, including adjustments to ansible_user for specific hosts.
- Modify group_vars to streamline Tailscale configuration and ensure proper handling of authentication keys.

These changes improve the clarity and usability of the Makefile and inventory setup, facilitating smoother operations across the infrastructure.
2025-10-09 21:24:45 -04:00

Role: datascience

Description

Installs and configures a complete data science environment: Anaconda/Conda, Jupyter Notebook, and the R language for statistical computing. The role is optional. It is kept separate from the general development tools so that deployments which do not need data science capabilities stay fast.

Requirements

  • Ansible 2.9+
  • Debian/Ubuntu systems
  • At least 5GB free disk space (the Anaconda installer is a ~700MB download, and the full stack occupies ~3GB once installed)
  • Root or sudo access
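You can check the free-space requirement on the target host before applying the role. This is a minimal sketch, not part of the role. `check_free_space_gb` is a hypothetical helper name, and its 5 GB default simply mirrors the Requirements list.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does the filesystem holding DIR have at least MIN_GB free?
check_free_space_gb() {
  local dir="${1:-$HOME}"   # directory that will hold conda_install_path
  local min_gb="${2:-5}"    # threshold taken from the Requirements list
  local avail_kb
  # -P forces POSIX single-line output, -k reports 1024-byte blocks;
  # column 4 of the data row is the space available to unprivileged users.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kb" ]; then
    echo "could not read free space for $dir" >&2
    return 2
  fi
  local need_kb=$(( min_gb * 1024 * 1024 ))
  if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "OK: $(( avail_kb / 1024 / 1024 )) GB free on $dir"
    return 0
  fi
  echo "Insufficient space: $(( avail_kb / 1024 / 1024 )) GB free on $dir, need ${min_gb} GB" >&2
  return 1
}
```

Run it on the target host itself (for example over ssh), e.g. `check_free_space_gb "$HOME" 5`.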

Installed Components

Anaconda/Conda

  • Full Anaconda3 distribution (latest version)
  • Python data science packages (pandas, numpy, matplotlib, scikit-learn)
  • Conda package manager
  • Initialized for bash (zsh initialization via shell role)
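To confirm that bash initialization took effect on a host, look for the marker block that `conda init` writes into the rc file. This is a minimal sketch: `conda_init_present` is a hypothetical helper name, and it only checks for the opening marker comment.

```shell
#!/usr/bin/env bash
# Succeeds if RCFILE contains the block that `conda init` adds. conda wraps its
# shell hook between "# >>> conda initialize >>>" and "# <<< conda initialize <<<".
conda_init_present() {
  local rcfile="${1:-$HOME/.bashrc}"
  [ -r "$rcfile" ] || return 1
  grep -qF '# >>> conda initialize >>>' "$rcfile"
}
```

If it returns non-zero, re-running `~/anaconda3/bin/conda init bash` and opening a new shell should add the block. Adjust the path if conda_install_path was changed.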

Jupyter Notebook

  • Jupyter Notebook server
  • IPython kernel
  • JupyterLab interface
  • Systemd service for automatic startup
  • Web-based access on a configurable port (jupyter_port)

R Language

  • R base and development packages
  • R recommended packages
  • CRAN repository configuration
  • IRkernel for Jupyter integration
  • Common R packages

Variables

Variable                     Default                              Description
install_conda                false                                Install Anaconda/Conda
conda_install_path           {{ ansible_env.HOME }}/anaconda3     Conda installation directory
install_jupyter              false                                Install Jupyter Notebook (requires conda)
jupyter_port                 8888                                 Jupyter web server port
jupyter_bind_all_interfaces  true                                 Listen on all network interfaces
jupyter_allow_remote         true                                 Allow remote connections
install_r                    false                                Install the R language
r_packages                   [r-base, r-base-dev, r-recommended]  R packages to install

Dependencies

  • base role (for core utilities)

Example Playbook

Full Data Science Stack

- hosts: datascience_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: true

Conda + Jupyter Only

- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false

R Only (no Python)

- hosts: r_servers
  roles:
    - role: datascience
      install_conda: false
      install_r: true

Usage

Installation

# Install full data science stack
make datascience HOST=server01

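# Install on a specific host with custom vars
# (key=value pairs passed with -e are strings; the JSON form keeps true/false as booleans)
ansible-playbook playbooks/development.yml --limit server01 --tags datascience \
  -e '{"install_conda": true, "install_jupyter": true, "install_r": true}'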

Post-Installation

Set Jupyter Password

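# Run as the user the Jupyter service runs as, then restart the service to apply it
jupyter notebook password
sudo systemctl restart jupyter-notebook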

Access Jupyter

# Local
http://localhost:8888

# Remote (requires jupyter_bind_all_interfaces: true and the port open in the firewall)
http://your-server-ip:8888
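If the page does not load, a quick TCP probe helps separate "service not listening" from browser or network problems. This is a minimal sketch using bash's /dev/tcp redirection and coreutils `timeout`. `port_open` is a hypothetical helper, and a successful probe only shows that something accepts connections on the port, not that it is Jupyter.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to HOST:PORT opens within SECS seconds.
port_open() {
  local host="${1:-localhost}"
  local port="${2:-8888}"   # jupyter_port
  local secs="${3:-3}"
  # Host and port are passed as positional args to avoid quoting issues;
  # a refused or timed-out connection makes the inner bash exit non-zero.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Run `port_open localhost 8888` on the server first. If that succeeds but `port_open your-server-ip 8888` from another machine fails, the server is probably bound to localhost only, or a firewall is blocking the port.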

Verify Installations

conda --version
jupyter --version
R --version
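The version commands above fail with "command not found" in two cases: the component was skipped, or it is not on PATH yet (conda's bin directory typically appears only after the shell is re-initialized). This sketch checks several commands at once and reports which are missing. `check_tools` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print where each command resolves on PATH; return 1 if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found   %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'missing %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools conda jupyter R`, listing only the components you enabled.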

Tags

  • datascience: All data science tasks
  • conda: Conda/Anaconda installation only
  • jupyter: Jupyter Notebook installation only
  • r, rstats: R language installation only

Services

Jupyter Notebook Service

  • Service name: jupyter-notebook.service
  • Start: systemctl start jupyter-notebook
  • Status: systemctl status jupyter-notebook
  • Logs: journalctl -u jupyter-notebook
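For orientation, a service like this typically wraps `jupyter notebook` with `--ip` and `--port` derived from jupyter_bind_all_interfaces and jupyter_port. The sketch below renders such a unit to stdout. It is hypothetical: the role's actual unit may differ (inspect it with `systemctl cat jupyter-notebook`), and jupyter_allow_remote is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical: render a systemd unit resembling what such a role might install.
render_jupyter_unit() {
  local user="$1"              # user the service runs as (e.g. the ansible user)
  local conda_path="$2"        # conda_install_path
  local port="${3:-8888}"      # jupyter_port
  local bind_all="${4:-true}"  # jupyter_bind_all_interfaces
  local ip="127.0.0.1"         # loopback only unless binding to all interfaces
  if [ "$bind_all" = "true" ]; then
    ip="0.0.0.0"
  fi
  cat <<EOF
[Unit]
Description=Jupyter Notebook
After=network.target

[Service]
Type=simple
User=${user}
ExecStart=${conda_path}/bin/jupyter notebook --no-browser --ip=${ip} --port=${port}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `render_jupyter_unit "$USER" "$HOME/anaconda3" 8888 false` prints a unit that listens on 127.0.0.1 only.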

Performance Notes

Installation Times

  • Anaconda: 5-10 minutes (700MB download)
  • Jupyter: 2-5 minutes (if Anaconda already installed)
  • R: 10-30 minutes (mostly compiling CRAN packages from source)

Disk Space

  • Anaconda: ~2.5GB after installation
  • R + packages: ~500MB
  • Total: ~3GB for full stack

Security Considerations

  1. Jupyter Password: Always set a password after installation
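  2. Firewall: With the defaults (jupyter_bind_all_interfaces and jupyter_allow_remote both true), Jupyter listens on every interface, so restrict port 8888 (or your jupyter_port) with a firewall
  3. HTTPS: Use HTTPS, an SSH tunnel, or a VPN for remote access; over plain HTTP the password and notebook contents travel unencrypted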
  4. User Isolation: Jupyter runs as the ansible user by default
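You can follow points 2 and 3 without exposing the port at all. Keep Jupyter on the loopback interface (presumably what jupyter_bind_all_interfaces: false gives you) and reach it through an SSH tunnel. This is a minimal sketch: `jupyter_tunnel` is a hypothetical wrapper around standard `ssh -N -L` port forwarding.

```shell
#!/usr/bin/env bash
# Forward a local port to Jupyter on the remote host's loopback interface.
# -N: run no remote command, only forward; -L LOCAL:HOST:REMOTE defines the forward.
jupyter_tunnel() {
  local target="$1"                     # user@host or an ssh config alias
  local remote_port="${2:-8888}"        # jupyter_port on the server
  local local_port="${3:-$remote_port}" # port to use on your workstation
  ssh -N -L "${local_port}:localhost:${remote_port}" "$target"
}
```

Run `jupyter_tunnel admin@server01`, then open http://localhost:8888 on your workstation. If 8888 is already taken locally, pass a different local port as the third argument.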

Troubleshooting

Conda Not in PATH

# Add to ~/.bashrc or ~/.zshrc (adjust the path if conda_install_path was changed)
export PATH="$HOME/anaconda3/bin:$PATH"
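A plain export line adds a duplicate entry every time the rc file is sourced. Here is a sketch of an idempotent variant that works in both bash and zsh; `path_prepend_once` is a hypothetical helper name. PATH alone makes `conda`, `python` and `jupyter` callable. However, `conda activate` generally still requires the shell hook that `conda init bash` (or `conda init zsh`) installs.

```shell
#!/usr/bin/env bash
# Prepend DIR to PATH only if it is not already one of PATH's entries.
path_prepend_once() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) ;;                       # already present: leave PATH unchanged
    *) PATH="${dir}${PATH:+:${PATH}}" ;;   # avoid a trailing ':' if PATH was empty
  esac
  export PATH
}
```

In the rc file, call `path_prepend_once "$HOME/anaconda3/bin"` instead of the bare export.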

Jupyter Service Won't Start

# Check logs
journalctl -u jupyter-notebook -n 50

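# Verify the jupyter binary the service points at exists
# (by default it lives under the service user's home, e.g. ~/anaconda3, not necessarily /root)
systemctl cat jupyter-notebook | grep ExecStart
"$HOME/anaconda3/bin/jupyter" --version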
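A common cause of start failures is an ExecStart= line pointing at a jupyter binary that does not exist for the service user. This sketch extracts the executable from unit text and checks it. `unit_exec_path` and `check_unit_exec` are hypothetical helpers. The parsing is simplified: it strips systemd's `-`, `@`, `+`, `!` and `:` prefixes but ignores quoting.

```shell
#!/usr/bin/env bash
# Print the executable from the last non-empty ExecStart= line read on stdin
# (the last one wins, so drop-in overrides shown by `systemctl cat` take effect).
unit_exec_path() {
  awk '/^[ \t]*ExecStart[ \t]*=/ {
         sub(/^[^=]*=[ \t]*/, "")             # drop the "ExecStart=" key
         sub(/^[-@+!:]+/, "")                  # drop systemd special prefixes
         if (split($0, parts, " ") > 0) exe = parts[1]
       }
       END { if (exe != "") print exe }'
}

# Succeeds if that executable exists and is executable.
check_unit_exec() {
  local exe
  exe=$(unit_exec_path)
  if [ -n "$exe" ] && [ -x "$exe" ]; then
    echo "executable: $exe"
  else
    echo "missing or not executable: ${exe:-<no ExecStart>}" >&2
    return 1
  fi
}
```

Usage: `systemctl cat jupyter-notebook | check_unit_exec`. Run it as the service user if home directories are not readable by others.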

R Package Installation Fails

# Install manually in R, then register the kernel with Jupyter (jupyter must be on PATH)
R
> install.packages("IRkernel")
> IRkernel::installspec()
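Installing the IRkernel package alone does not make an R kernel appear in Jupyter. It must also be registered, which IRkernel's `installspec()` does under the kernel name `ir` by default. This sketch confirms the registration from the shell by parsing `jupyter kernelspec list` output. `has_kernelspec` is a hypothetical helper, and it assumes the usual listing: a header line followed by kernel name and path columns.

```shell
#!/usr/bin/env bash
# Read `jupyter kernelspec list` output on stdin; succeed if kernel NAME is listed.
has_kernelspec() {
  local name="${1:-ir}"   # "ir" is IRkernel's default kernel name
  # Skip the "Available kernels:" header and compare the first column.
  awk -v want="$name" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Usage: `jupyter kernelspec list | has_kernelspec ir || echo "IRkernel is not registered"`.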

Integration with Other Roles

  • Shell Role: Provides zsh with conda integration
  • Monitoring: btop/htop for resource monitoring
  • Docker: Jupyter can alternatively run in containers

Notes

  • Anaconda installer is cleaned up after installation
  • Conda init for zsh is handled by the shell role
  • IRkernel is automatically installed if both Jupyter and R are enabled
  • R packages are compiled during installation (can be slow)
  • Jupyter service starts on boot automatically