# Role: datascience

## Description

Installs and configures a complete data science environment including Anaconda/Conda, Jupyter Notebook, and the R statistical computing language. This role is optional and kept separate from the general development tools, so deployments stay fast when data science capabilities are not needed.

## Requirements

- Ansible 2.9+
- Debian/Ubuntu systems
- At least 5GB free disk space (the Anaconda installer is a ~700MB download; the full stack uses ~3GB once installed)
- Root or sudo access

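The disk space requirement can be checked on the target host before running the role. The following is a minimal sketch, not part of the role itself: `has_free_gb` is a hypothetical helper name, and it assumes the installation lands under `$HOME`, as with the default `conda_install_path`.

```bash
# Hypothetical helper: succeeds if the filesystem holding PATH has at least MIN_GB free.
has_free_gb() {
  path="$1"
  min_gb="$2"
  # df -P prints one POSIX-format line per filesystem; field 4 is available 1K blocks.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Assumption: the default install location under $HOME is used.
if has_free_gb "${HOME:-/}" 5; then
  echo "OK: at least 5GB free"
else
  echo "WARNING: less than 5GB free"
fi
```

If `conda_install_path` points somewhere else, pass that directory (or its parent) instead of `$HOME`.
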
## Installed Components

### Anaconda/Conda

- Full Anaconda3 distribution (latest version)
- Python data science packages (pandas, numpy, matplotlib, scikit-learn)
- Conda package manager
- Initialized for bash (zsh initialization via shell role)

### Jupyter Notebook

- Jupyter Notebook server
- IPython kernel
- JupyterLab interface
- Systemd service for automatic startup
- Web-based access on configurable port

### R Language

- R base and development packages
- R recommended packages
- CRAN repository configuration
- IRkernel for Jupyter integration
- Common R packages

## Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `install_conda` | `false` | Install Anaconda/Conda |
| `conda_install_path` | `{{ ansible_env.HOME }}/anaconda3` | Conda installation directory |
| `install_jupyter` | `false` | Install Jupyter Notebook (requires conda) |
| `jupyter_port` | `8888` | Jupyter web server port |
| `jupyter_bind_all_interfaces` | `true` | Listen on all network interfaces |
| `jupyter_allow_remote` | `true` | Allow remote connections |
| `install_r` | `false` | Install R language |
| `r_packages` | `[r-base, r-base-dev, r-recommended]` | R packages to install |

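Several of these variables are booleans or integers. Ansible treats `key=value` extra vars as strings, so passing overrides as JSON is one way to keep their types intact. The sketch below is illustrative only: `EXTRA_VARS` and `run_datascience` are hypothetical names, the port `9999` is an arbitrary example, and the playbook path and tag are the ones used in the Usage section. How the role reacts to `jupyter_bind_all_interfaces: false` is assumed from the table description.

```bash
# Hypothetical overrides; JSON keeps booleans and integers typed.
EXTRA_VARS='{"install_conda": true, "install_jupyter": true, "jupyter_port": 9999, "jupyter_bind_all_interfaces": false}'

# Hypothetical wrapper around the playbook invocation used elsewhere in this README.
run_datascience() {
  host="$1"
  ansible-playbook playbooks/development.yml --limit "$host" --tags datascience -e "$EXTRA_VARS"
}
```

Calling `run_datascience server01` would then apply the role to `server01` with the overrides above.
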
## Dependencies

- `base` role (for core utilities)

## Example Playbook

### Full Data Science Stack

```yaml
- hosts: datascience_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: true
```

### Conda + Jupyter Only

```yaml
- hosts: jupyter_servers
  roles:
    - role: datascience
      install_conda: true
      install_jupyter: true
      install_r: false
```

### R Only (no Python)

```yaml
- hosts: r_servers
  roles:
    - role: datascience
      install_conda: false
      install_r: true
```

## Usage

### Installation

```bash
# Install full data science stack
make datascience HOST=server01

# Install on a specific host with custom vars
ansible-playbook playbooks/development.yml --limit server01 --tags datascience \
  -e "install_conda=true install_jupyter=true install_r=true"
```

### Post-Installation

#### Set Jupyter Password

```bash
jupyter notebook password
```

#### Access Jupyter

```text
# Local
http://localhost:8888

# Remote
http://your-server-ip:8888
```

Replace `8888` with the value of `jupyter_port` if you changed it.

#### Verify Installations

```bash
conda --version
jupyter --version
R --version
```

## Tags

- `datascience`: All data science tasks
- `conda`: Conda/Anaconda installation only
- `jupyter`: Jupyter Notebook installation only
- `r`, `rstats`: R language installation only

## Services

### Jupyter Notebook Service

- **Service name**: `jupyter-notebook.service`
- **Start**: `systemctl start jupyter-notebook`
- **Status**: `systemctl status jupyter-notebook`
- **Logs**: `journalctl -u jupyter-notebook`

## Performance Notes

### Installation Times

- **Anaconda**: 5-10 minutes (700MB download)
- **Jupyter**: 2-5 minutes (if Anaconda already installed)
- **R**: 10-30 minutes (compilation of packages)

### Disk Space

- **Anaconda**: ~2.5GB after installation
- **R + packages**: ~500MB
- **Total**: ~3GB for full stack

## Security Considerations

1. **Jupyter Password**: Always set a password after installation
2. **Firewall**: Jupyter listens on all network interfaces by default (`jupyter_bind_all_interfaces: true`), so restrict port 8888 (or your custom port) with a firewall
3. **HTTPS**: Consider using HTTPS for remote access
4. **User Isolation**: Jupyter runs as the ansible user by default

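One way to reach Jupyter remotely without opening its port to the network is an SSH tunnel. This is a sketch of that alternative, not something the role configures: `tunnel_spec` and `jupyter_tunnel` are hypothetical helper names, and the default port is assumed to match the `jupyter_port` default of `8888`.

```bash
# Hypothetical helper: build the -L forwarding spec for a given port (default 8888).
tunnel_spec() {
  port="${1:-8888}"
  printf '%s:localhost:%s\n' "$port" "$port"
}

# Hypothetical helper: forward a local port to Jupyter on the remote host over SSH.
jupyter_tunnel() {
  remote="$1"   # e.g. user@your-server-ip
  # -N: run no remote command; -L: forward the local port to the same port on the remote host
  ssh -N -L "$(tunnel_spec "$2")" "$remote"
}
```

With `jupyter_tunnel user@your-server-ip` running on your workstation, Jupyter is reachable at `http://localhost:8888` while the port stays firewalled on the server.
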
## Troubleshooting

### Conda Not in PATH

```bash
# Add to .bashrc or .zshrc
export PATH="$HOME/anaconda3/bin:$PATH"
```

### Jupyter Service Won't Start

```bash
# Check logs
journalctl -u jupyter-notebook -n 50

# Verify the Jupyter binary from the conda install is accessible
# (adjust the path if conda_install_path was changed; it is /root/anaconda3 when running as root)
"$HOME/anaconda3/bin/jupyter" --version
```

### R Package Installation Fails

```bash
# Install manually from an R session
R
> install.packages("IRkernel")
```

## Integration with Other Roles

- **Shell Role**: Provides zsh with conda integration
- **Monitoring**: btop/htop for resource monitoring
- **Docker**: Can run Jupyter in containers alternatively

## Notes

- Anaconda installer is cleaned up after installation
- Conda init for zsh is handled by the shell role
- IRkernel is automatically installed if both Jupyter and R are enabled
- R packages are compiled during installation (can be slow)
- Jupyter service starts on boot automatically