Role: datascience

Description

Installs and configures a complete data science environment: Anaconda/Conda, Jupyter Notebook, and the R statistical computing language. The role is optional and kept separate from the general development tools, so deployments that don't need data science capabilities stay fast.

Requirements

  • Ansible 2.9+
  • Debian/Ubuntu systems
  • At least 5GB free disk space (the Anaconda installer is a ~700MB download; the full stack occupies ~3GB once installed)
  • Root or sudo access

Installed Components

Anaconda/Conda

  • Full Anaconda3 distribution (latest version)
  • Python data science packages (pandas, numpy, matplotlib, scikit-learn)
  • Conda package manager
  • Initialized for bash (zsh initialization is handled by the shell role)

Jupyter Notebook

  • Jupyter Notebook server
  • IPython kernel
  • JupyterLab interface
  • Systemd service for automatic startup
  • Web-based access on a configurable port (jupyter_port)

R Language

  • R base and development packages
  • R recommended packages
  • CRAN repository configuration
  • IRkernel for Jupyter integration
  • Common R packages

Variables

Variable                      Default                              Description
install_conda                 false                                Install Anaconda/Conda
conda_install_path            {{ ansible_env.HOME }}/anaconda3     Conda installation directory
install_jupyter               false                                Install Jupyter Notebook (requires conda)
jupyter_port                  8888                                 Jupyter web server port
jupyter_bind_all_interfaces   true                                 Listen on all network interfaces
jupyter_allow_remote          true                                 Allow remote connections
install_r                     false                                Install R language
r_packages                    [r-base, r-base-dev, r-recommended]  R packages to install
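These variables can also be set per inventory group rather than inline in each playbook. A minimal sketch, assuming a hypothetical group named jupyter_servers and Ansible's standard group_vars/ directory next to the inventory; the port value is illustrative, and it is an assumption that jupyter_bind_all_interfaces: false restricts Jupyter to localhost.

```yaml
# group_vars/jupyter_servers.yml (hypothetical group name)
install_conda: true
install_jupyter: true               # requires install_conda
jupyter_port: 8890                  # illustrative non-default port
# Assumption: false makes Jupyter listen on localhost only
jupyter_bind_all_interfaces: false
jupyter_allow_remote: false
install_r: false
```

Values passed inline as role parameters in a playbook take precedence over group_vars, so keep each setting in one place.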

Dependencies

  • base role (for core utilities)

Example Playbook

Full Data Science Stack

- hosts: datascience_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: true

Conda + Jupyter Only

- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false

R Only (no Python)

- hosts: r_servers
  roles:
    - role: datascience
      install_conda: false
      install_r: true

Usage

Installation

# Install full data science stack
make datascience HOST=server01

# Install on a specific host with custom vars
# (JSON keeps the values booleans; key=value syntax passes them as strings)
ansible-playbook playbooks/development.yml --limit server01 --tags datascience \
  -e '{"install_conda": true, "install_jupyter": true, "install_r": true}'

Post-Installation

Set Jupyter Password

jupyter notebook password

Access Jupyter

# Local
http://localhost:8888

# Remote
http://your-server-ip:8888

Verify Installations

conda --version
jupyter --version
R --version
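To run the same checks across a whole group from the control node, a small ad-hoc playbook is enough. A sketch, assuming the default conda_install_path for the connecting user, a hypothetical datascience_servers group, and that both Jupyter and R were installed (each check fails if its tool is missing):

```yaml
- hosts: datascience_servers
  vars:
    # Assumption: mirrors the role default conda_install_path
    conda_bin: "{{ ansible_env.HOME }}/anaconda3/bin"
  tasks:
    - name: Collect version strings for conda, Jupyter, and R
      ansible.builtin.command: "{{ item }}"
      loop:
        - "{{ conda_bin }}/conda --version"
        - "{{ conda_bin }}/jupyter --version"
        - R --version
      register: versions
      changed_when: false   # read-only check, never reports a change

    - name: Print the first stdout line of each check
      ansible.builtin.debug:
        msg: "{{ item.item }}: {{ item.stdout_lines | first | default('(no output on stdout)') }}"
      loop: "{{ versions.results }}"
      loop_control:
        label: "{{ item.item }}"
```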

Tags

  • datascience: All data science tasks
  • conda: Conda/Anaconda installation only
  • jupyter: Jupyter Notebook installation only
  • r, rstats: R language installation only

Services

Jupyter Notebook Service

  • Service name: jupyter-notebook.service
  • Start: systemctl start jupyter-notebook
  • Status: systemctl status jupyter-notebook
  • Logs: journalctl -u jupyter-notebook
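Jupyter reads its configuration, including the password, when it starts, so the service has to be restarted after the password is set or changed. A sketch of doing that with Ansible's systemd module, assuming a hypothetical jupyter_servers group and that the connecting user can become root:

```yaml
- hosts: jupyter_servers
  become: true
  tasks:
    - name: Restart Jupyter and make sure it starts on boot
      ansible.builtin.systemd:
        name: jupyter-notebook
        state: restarted
        enabled: true
```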

Performance Notes

Installation Times

  • Anaconda: 5-10 minutes (700MB download)
  • Jupyter: 2-5 minutes (if Anaconda already installed)
  • R: 10-30 minutes (compilation of packages)

Disk Space

  • Anaconda: ~2.5GB after installation
  • R + packages: ~500MB
  • Total: ~3GB for full stack

Security Considerations

  1. Jupyter Password: Always set a password after installation
  2. Firewall: jupyter_bind_all_interfaces and jupyter_allow_remote both default to true, so make sure port 8888 (or your custom jupyter_port) is properly firewalled
  3. HTTPS: Consider using HTTPS for remote access
  4. User Isolation: Jupyter runs as the ansible user by default
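One way to restrict the port is to allow it only from a trusted network. A sketch using the community.general.ufw module, assuming ufw is installed and enabled with its default deny-incoming policy, the community.general collection is available, and a hypothetical trusted subnet of 10.0.0.0/24:

```yaml
- hosts: jupyter_servers
  become: true
  vars:
    jupyter_port: 8888            # keep in sync with the role's jupyter_port
    trusted_subnet: 10.0.0.0/24   # hypothetical admin network
  tasks:
    - name: Allow Jupyter only from the trusted subnet
      community.general.ufw:
        rule: allow
        port: "{{ jupyter_port | string }}"
        proto: tcp
        from_ip: "{{ trusted_subnet }}"
```

Connections from other sources are then dropped by ufw's default incoming policy rather than by an explicit rule.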

Troubleshooting

Conda Not in PATH

# Add to .bashrc or .zshrc (adjust the path if you changed conda_install_path)
export PATH="$HOME/anaconda3/bin:$PATH"
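To apply the same fix to many hosts idempotently, ansible.builtin.lineinfile adds the line only if it is missing. A sketch, assuming the default conda_install_path, bash, and a hypothetical datascience_servers group; point path at ~/.zshrc for zsh users:

```yaml
- hosts: datascience_servers
  tasks:
    - name: Put conda on PATH in interactive bash sessions
      ansible.builtin.lineinfile:
        path: "{{ ansible_env.HOME }}/.bashrc"
        # Single quotes keep $HOME and $PATH literal so the shell expands them when it starts
        line: 'export PATH="$HOME/anaconda3/bin:$PATH"'
        create: true
```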

Jupyter Service Won't Start

# Check logs
journalctl -u jupyter-notebook -n 50

# Verify jupyter runs from the conda install (this path assumes the role ran as root)
/root/anaconda3/bin/jupyter --version
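The same checks can be gathered from several hosts at once. A sketch, assuming a hypothetical jupyter_servers group and the default jupyter_port of 8888; ansible.builtin.wait_for fails after the timeout if nothing is listening on that port on the host itself (127.0.0.1):

```yaml
- hosts: jupyter_servers
  become: true
  tasks:
    - name: Fetch the last 50 lines of the Jupyter service log
      ansible.builtin.command: journalctl -u jupyter-notebook -n 50 --no-pager
      register: jupyter_logs
      changed_when: false

    - name: Show the log lines
      ansible.builtin.debug:
        var: jupyter_logs.stdout_lines

    - name: Fail if nothing is listening on the Jupyter port
      ansible.builtin.wait_for:
        host: 127.0.0.1
        port: 8888        # replace with your jupyter_port
        timeout: 10
```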

R Package Installation Fails

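# Install manually from the shell
Rscript -e 'install.packages("IRkernel", repos = "https://cloud.r-project.org")'

# Register the R kernel with Jupyter (jupyter must be on PATH)
Rscript -e 'IRkernel::installspec()'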

Integration with Other Roles

  • Shell Role: Provides zsh with conda integration
  • Monitoring: btop/htop for resource monitoring
  • Docker: Alternatively, Jupyter can run in containers

Notes

  • The Anaconda installer is removed after installation
  • Conda init for zsh is handled by the shell role
  • IRkernel is installed automatically when both Jupyter and R are enabled
  • R packages are compiled during installation, which can be slow
  • The Jupyter service is enabled to start automatically on boot