Role: datascience
Description
Installs and configures a complete data science environment: Anaconda/Conda, Jupyter Notebook, and the R statistical computing language. The role is optional and kept separate from the general development tools, so deployments that do not need data science capabilities stay fast.
Requirements
- Ansible 2.9+
- Debian/Ubuntu systems
- At least 5GB free disk space (the Anaconda installer is a ~700MB download; the full stack uses ~3GB once installed)
- Root or sudo access
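The disk space requirement above can be checked on the target host before running the role. The following is a minimal sketch, not part of the role itself: the function name `has_free_space` is hypothetical, and it assumes Conda will be installed under the home directory of the user running the check.

```shell
#!/usr/bin/env bash
# Pre-flight check for the 5GB free-space requirement.
# has_free_space is a hypothetical helper, not something the role provides.

# has_free_space <path> <required_gb>
# Returns 0 if the filesystem holding <path> has at least <required_gb> GiB available.
has_free_space() {
  local path="$1" required_gb="$2"
  local available_kb
  # -P forces one line per filesystem; -k reports sizes in 1024-byte blocks.
  available_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$available_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Assumption: the install path lives under the current user's home directory.
target_dir="${HOME:-/}"

if has_free_space "$target_dir" 5; then
  echo "Enough free space under $target_dir for the datascience role"
else
  echo "Less than 5GB free under $target_dir" >&2
fi
```

If `conda_install_path` points somewhere else, pass that directory's parent instead of the home directory.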
Installed Components
Anaconda/Conda
- Full Anaconda3 distribution (latest version)
- Python data science packages (pandas, numpy, matplotlib, scikit-learn)
- Conda package manager
- Initialized for bash (zsh initialization via shell role)
Jupyter Notebook
- Jupyter Notebook server
- IPython kernel
- JupyterLab interface
- Systemd service for automatic startup
- Web-based access on configurable port
R Language
- R base and development packages
- R recommended packages
- CRAN repository configuration
- IRkernel for Jupyter integration
- Common R packages
Variables
| Variable | Default | Description |
|---|---|---|
| install_conda | false | Install Anaconda/Conda |
| conda_install_path | {{ ansible_env.HOME }}/anaconda3 | Conda installation directory |
| install_jupyter | false | Install Jupyter Notebook (requires conda) |
| jupyter_port | 8888 | Jupyter web server port |
| jupyter_bind_all_interfaces | true | Listen on all network interfaces |
| jupyter_allow_remote | true | Allow remote connections |
| install_r | false | Install R language |
| r_packages | [r-base, r-base-dev, r-recommended] | R packages to install |
Dependencies
- base role (for core utilities)
Example Playbook
Full Data Science Stack
- hosts: datascience_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: true
Conda + Jupyter Only
- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false
R Only (no Python)
- hosts: r_servers
  roles:
    - role: datascience
      install_conda: false
      install_r: true
Usage
Installation
# Install full data science stack
make datascience HOST=server01
# Install on specific host with custom vars
ansible-playbook playbooks/development.yml --limit server01 --tags datascience \
-e "install_conda=true install_jupyter=true install_r=true"
Post-Installation
Set Jupyter Password
jupyter notebook password
Access Jupyter
# Local
http://localhost:8888
# Remote
http://your-server-ip:8888
Verify Installations
conda --version
jupyter --version
R --version
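The three checks above can be wrapped in one loop that reports every missing tool instead of stopping at the first. This is a sketch only: `check_tools` is a hypothetical helper name, and it tests only that each command is on `PATH`, not that it runs correctly.

```shell
#!/usr/bin/env bash
# check_tools <name>...
# Prints one status line per tool and returns non-zero if any is missing from PATH.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Check the components this role can install.
check_tools conda jupyter R || echo "Some components are not on PATH" >&2
```

Components that were deliberately disabled (for example `install_r: false`) will show up as missing, so pass only the tools you expect to be present.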
Tags
- datascience: All data science tasks
- conda: Conda/Anaconda installation only
- jupyter: Jupyter Notebook installation only
- r, rstats: R language installation only
Services
Jupyter Notebook Service
- Service name: jupyter-notebook.service
- Start: systemctl start jupyter-notebook
- Status: systemctl status jupyter-notebook
- Logs: journalctl -u jupyter-notebook
Performance Notes
Installation Times
- Anaconda: 5-10 minutes (700MB download)
- Jupyter: 2-5 minutes (if Anaconda already installed)
- R: 10-30 minutes (compilation of packages)
Disk Space
- Anaconda: ~2.5GB after installation
- R + packages: ~500MB
- Total: ~3GB for full stack
Security Considerations
- Jupyter Password: Always set a password after installation
- Firewall: Ensure port 8888 (or custom port) is properly firewalled
- HTTPS: Consider using HTTPS for remote access
- User Isolation: Jupyter runs as the ansible user by default
Troubleshooting
Conda Not in PATH
# Add to .bashrc or .zshrc
export PATH="$HOME/anaconda3/bin:$PATH"
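Appending the export line by hand more than once leaves duplicate entries in the profile. One way to avoid that is sketched below: `ensure_line` is a hypothetical helper that appends a line only if an identical one is not already in the file. The call at the bottom is commented out so that nothing is modified until you choose the file.

```shell
#!/usr/bin/env bash
# ensure_line <file> <line>
# Appends <line> to <file> only if an identical line is not already present.
ensure_line() {
  local file="$1" line="$2"
  # Create the file if it does not exist yet so grep has something to read.
  touch "$file"
  # -x matches whole lines only; -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (uncomment and adjust the file for your shell):
# ensure_line "$HOME/.bashrc" 'export PATH="$HOME/anaconda3/bin:$PATH"'
```

The single quotes keep `$HOME` and `$PATH` unexpanded, so the profile stores the same text as the manual fix and the variables are resolved each time the shell starts.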
Jupyter Service Won't Start
# Check logs
journalctl -u jupyter-notebook -n 50
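# Verify the Jupyter binary from conda is accessible
# (this path assumes conda_install_path resolves to /root/anaconda3; adjust if it differs)
/root/anaconda3/bin/jupyter --version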
R Package Installation Fails
# Install manually in R
R
> install.packages("IRkernel")
Integration with Other Roles
- Shell Role: Provides zsh with conda integration
- Monitoring: btop/htop for resource monitoring
- Docker: Can run Jupyter in containers alternatively
Notes
- Anaconda installer is cleaned up after installation
- Conda init for zsh is handled by the shell role
- IRkernel is automatically installed if both Jupyter and R are enabled
- R packages are compiled during installation (can be slow)
- Jupyter service starts on boot automatically