# Role: datascience

## Description

Installs and configures a complete data science environment including Anaconda/Conda, Jupyter Notebook, and the R statistical computing language. This role is optional and kept separate from the general development tools, so deployments that do not need data science capabilities stay fast.

## Requirements

- Ansible 2.9+
- Debian/Ubuntu systems
- At least 5GB free disk space (the Anaconda installer alone is a ~700MB download)
- Root or sudo access

## Installed Components

### Anaconda/Conda

- Full Anaconda3 distribution (latest version)
- Python data science packages (pandas, numpy, matplotlib, scikit-learn)
- Conda package manager
- Initialized for bash (zsh initialization via shell role)

### Jupyter Notebook

- Jupyter Notebook server
- IPython kernel
- JupyterLab interface
- Systemd service for automatic startup
- Web-based access on configurable port

### R Language

- R base and development packages
- R recommended packages
- CRAN repository configuration
- IRkernel for Jupyter integration
- Common R packages

## Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `install_conda` | `false` | Install Anaconda/Conda |
| `conda_install_path` | `{{ ansible_env.HOME }}/anaconda3` | Conda installation directory |
| `install_jupyter` | `false` | Install Jupyter Notebook (requires conda) |
| `jupyter_port` | `8888` | Jupyter web server port |
| `jupyter_bind_all_interfaces` | `true` | Listen on all network interfaces |
| `jupyter_allow_remote` | `true` | Allow remote connections |
| `install_r` | `false` | Install R language |
| `r_packages` | `[r-base, r-base-dev, r-recommended]` | R packages to install |

## Dependencies

- `base` role (for core utilities)

## Example Playbook

### Full Data Science Stack

```yaml
- hosts: datascience_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: true
```

### Conda + Jupyter Only

```yaml
- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false
```

### R Only (no Python)

```yaml
- hosts: r_servers
  roles:
    - role: datascience
      install_conda: false
      install_r: true
```

## Usage

### Installation

```bash
# Install full data science stack
make datascience HOST=server01

# Install on specific host with custom vars
ansible-playbook playbooks/development.yml --limit server01 --tags datascience \
  -e "install_conda=true install_jupyter=true install_r=true"
```

### Post-Installation

#### Set Jupyter Password

```bash
jupyter notebook password
```

#### Access Jupyter

```bash
# Local
http://localhost:8888

# Remote
http://your-server-ip:8888
```

#### Verify Installations

```bash
conda --version
jupyter --version
R --version
```

## Tags

- `datascience`: All data science tasks
- `conda`: Conda/Anaconda installation only
- `jupyter`: Jupyter Notebook installation only
- `r`, `rstats`: R language installation only

## Services

### Jupyter Notebook Service

- **Service name**: `jupyter-notebook.service`
- **Start**: `systemctl start jupyter-notebook`
- **Status**: `systemctl status jupyter-notebook`
- **Logs**: `journalctl -u jupyter-notebook`

## Performance Notes

### Installation Times

- **Anaconda**: 5-10 minutes (700MB download)
- **Jupyter**: 2-5 minutes (if Anaconda already installed)
- **R**: 10-30 minutes (compilation of packages)

### Disk Space

- **Anaconda**: ~2.5GB after installation
- **R + packages**: ~500MB
- **Total**: ~3GB for full stack

## Security Considerations

1. **Jupyter Password**: Always set a password after installation
2. **Firewall**: Ensure port 8888 (or custom port) is properly firewalled
3. **HTTPS**: Consider using HTTPS for remote access
4. **User Isolation**: Jupyter runs as the ansible user by default

## Troubleshooting

### Conda Not in PATH

```bash
# Add to .bashrc or .zshrc
export PATH="$HOME/anaconda3/bin:$PATH"
```

### Jupyter Service Won't Start

```bash
# Check logs
journalctl -u jupyter-notebook -n 50

# Verify the Jupyter binary is accessible
# (this path assumes conda was installed under /root; adjust to your conda_install_path)
/root/anaconda3/bin/jupyter --version
```

### R Package Installation Fails

```bash
# Install manually in R
R
> install.packages("IRkernel")
```

## Integration with Other Roles

- **Shell Role**: Provides zsh with conda integration
- **Monitoring**: btop/htop for resource monitoring
- **Docker**: Jupyter can alternatively be run in containers

## Notes

- Anaconda installer is cleaned up after installation
- Conda init for zsh is handled by the shell role
- IRkernel is automatically installed if both Jupyter and R are enabled
- R packages are compiled during installation (can be slow)
- Jupyter service starts on boot automatically
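
The defaults in the Variables table expose Jupyter on all interfaces and allow remote connections, which is what the Security Considerations warn about. As a rough sketch, a more locked-down playbook might override those defaults as shown here. It uses only variable names from the table and assumes the role honors them as described there; the host group name is a placeholder, and the exact bind address that results depends on the role's templates.

```yaml
# Sketch only: assumes the role applies these variables as documented in the
# Variables table. "jupyter_servers" is a placeholder host group.
- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false
      jupyter_port: 8888
      jupyter_bind_all_interfaces: false  # do not listen on every interface
      jupyter_allow_remote: false         # do not accept remote connections
```

With remote access disabled, one common way to reach the notebook is an SSH tunnel from a workstation, for example `ssh -L 8888:localhost:8888 user@your-server-ip`, followed by opening `http://localhost:8888` locally. Setting a Jupyter password is still advisable in this setup.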