# LinkedOut - LinkedIn Posts Scraper

A Node.js application that automates LinkedIn login and scrapes posts containing specific keywords. The tool is designed to help track job market trends, layoffs, and open-to-work opportunities by monitoring LinkedIn content.

## Features

- **Automated LinkedIn Login**: Uses Playwright to automate browser interactions
- **Keyword-based Search**: Searches for posts containing keywords from CSV files or the CLI
- **Flexible Keyword Sources**: Supports multiple CSV files in `keywords/` or CLI-only mode
- **Configurable Search Parameters**: Customizable date ranges, sorting options, city, and scroll behavior
- **Duplicate Detection**: Prevents duplicate posts and profiles in results
- **Clean Text Processing**: Removes hashtags, emojis, and URLs from post content
- **Timestamped Results**: Saves results to JSON files with timestamps
- **Command-line Overrides**: Supports runtime parameter adjustments
- **Enhanced Geographic Location Validation**: Validates user locations against 200+ Canadian cities with smart matching
- **Local AI Analysis (Ollama)**: Free, private, and fast post-processing with local LLMs
- **Flexible Processing**: Disable features, run AI analysis immediately, or process results later

## Prerequisites

- Node.js (v14 or higher)
- Valid LinkedIn account credentials
- [Ollama](https://ollama.ai/) with a model (free, private, local AI)

## Installation

1. Clone the repository or download the files
2. Install dependencies:

   ```bash
   npm install
   ```

3. Copy the configuration template and customize it:

   ```bash
   cp env-config.example .env
   ```

4.
   Edit `.env` with your settings (see the Configuration section below)

## Configuration

### Environment Variables (`.env` file)

Create a `.env` file from `env-config.example`:

```env
# LinkedIn Credentials (Required)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Basic Settings
HEADLESS=true
KEYWORDS=keywords-layoff.csv  # Just the filename; always looks in keywords/ unless a path is given
DATE_POSTED=past-week
SORT_BY=date_posted
CITY=Toronto
WHEELS=5

# Enhanced Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# Local AI Analysis (Ollama)
ENABLE_LOCAL_AI=true
OLLAMA_MODEL=mistral
OLLAMA_HOST=http://localhost:11434
RUN_LOCAL_AI_AFTER_SCRAPING=false  # true = run after scraping, false = run manually
AI_CONTEXT=job layoffs and workforce reduction
AI_CONFIDENCE=0.7
AI_BATCH_SIZE=3
```

### Configuration Options

#### Required

- `LINKEDIN_USERNAME`: Your LinkedIn email/username
- `LINKEDIN_PASSWORD`: Your LinkedIn password

#### Basic Settings

- `HEADLESS`: Browser headless mode (`true`/`false`, default: `true`)
- `KEYWORDS`: CSV file name (default: `keywords-layoff.csv` in the `keywords/` folder)
- `DATE_POSTED`: Filter by date (`past-24h`, `past-week`, `past-month`, or empty)
- `SORT_BY`: Sort results (`relevance` or `date_posted`)
- `CITY`: Search location (default: `Toronto`)
- `WHEELS`: Number of scrolls used to load posts (default: `5`)

#### Enhanced Location Filtering

- `LOCATION_FILTER`: Geographic filter; supports multiple provinces/cities:
  - Single: `Ontario` or `Toronto`
  - Multiple: `Ontario,Manitoba` or `Toronto,Vancouver`
- `ENABLE_LOCATION_CHECK`: Enable location validation (`true`/`false`)

#### Local AI Analysis (Ollama)

- `ENABLE_LOCAL_AI`: Enable local AI analysis (`true`/`false`)
- `OLLAMA_MODEL`: Model to use (`mistral`, `llama2`, `codellama`)
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `RUN_LOCAL_AI_AFTER_SCRAPING`: Run AI analysis immediately after scraping (`true`/`false`)
- `AI_CONTEXT`: Context for
  analysis (e.g., `job layoffs`)
- `AI_CONFIDENCE`: Minimum confidence threshold (`0.0`-`1.0`, default: `0.7`)
- `AI_BATCH_SIZE`: Posts per batch (default: `3`)

## Usage

### Basic Commands

```bash
# Standard scraping with configured settings
node linkedout.js

# Visual mode (see the browser)
node linkedout.js --headless=false

# Use only these keywords (ignore CSV)
node linkedout.js --keyword="layoff,downsizing"

# Add extra keywords to the CSV/CLI list
node linkedout.js --add-keyword="hiring freeze,open to work"

# Override city and date
node linkedout.js --city="Vancouver" --date_posted=past-month

# Custom output file
node linkedout.js --output=results/myfile.json

# Skip location and AI filtering (fastest)
node linkedout.js --no-location --no-ai

# Run AI analysis immediately after scraping
node linkedout.js --ai-after

# Show help
node linkedout.js --help
```

### All Command-line Options

- `--headless=true|false`: Override browser headless mode
- `--keyword="kw1,kw2"`: Use only these keywords (comma-separated; overrides CSV)
- `--add-keyword="kw1,kw2"`: Add extra keywords to the CSV/CLI list
- `--city="CityName"`: Override city
- `--date_posted=VALUE`: Override date posted (`past-24h`, `past-week`, `past-month`, or empty)
- `--sort_by=VALUE`: Override sort order (`date_posted` or `relevance`)
- `--location_filter=VALUE`: Override location filter
- `--output=FILE`: Output file name
- `--no-location`: Disable location filtering
- `--no-ai`: Disable AI analysis
- `--ai-after`: Run local AI analysis after scraping
- `--help`, `-h`: Show help message

### Keyword Files

- Place all keyword CSVs in the `keywords/` folder
- Example: `keywords/keywords-layoff.csv`, `keywords/keywords-open-work.csv`
- Custom CSV format: a `keyword` header with one keyword per line

### Local AI Analysis Commands

After scraping, you can run AI analysis on the results:

```bash
# Analyze latest results
node ai-analyzer-local.js --context="job layoffs"

# Analyze a specific file
node ai-analyzer-local.js \
  --input=results/results-2024-01-15.json --context="hiring"

# Use a different model
node ai-analyzer-local.js --model=llama2 --context="remote work"

# Change confidence and batch size
node ai-analyzer-local.js --context="job layoffs" --confidence=0.8 --batch-size=5
```

## Workflow Examples

### 1. Quick Start (All Features)

```bash
node linkedout.js --ai-after
```

### 2. Fast Scraping Only

```bash
node linkedout.js --no-location --no-ai
```

### 3. Location-Only Filtering

```bash
node linkedout.js --no-ai
```

### 4. Test Different AI Contexts

```bash
node linkedout.js --no-ai
node ai-analyzer-local.js --context="job layoffs"
node ai-analyzer-local.js --context="hiring opportunities"
node ai-analyzer-local.js --context="remote work"
```

## Project Structure

```
linkedout/
├── .env                   # Your configuration (create from template)
├── env-config.example     # Configuration template
├── linkedout.js           # Main scraper
├── ai-analyzer-local.js   # Free local AI analyzer (Ollama)
├── location-utils.js      # Enhanced location utilities
├── package.json           # Dependencies
├── keywords/              # All keyword CSVs go here
│   ├── keywords-layoff.csv
│   └── keywords-open-work.csv
├── results/               # Output directory
└── README.md              # This documentation
```

## Legal & Security

- **Credentials**: Store them securely in `.env` and add it to `.gitignore`
- **LinkedIn ToS**: Respect rate limits and usage guidelines
- **Privacy**: Local AI keeps all data on your machine
- **Usage**: Educational and research purposes only

## Dependencies

- `playwright`: Browser automation
- `dotenv`: Environment variables
- `csv-parser`: CSV file reading
- Built-in: `fs`, `path`, `child_process`

## Support

For issues:

1. Check this README
2. Verify your `.env` configuration
3. Test with `--headless=false` for debugging
4. Check Ollama status: `ollama list`
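For reference, the keyword CSV format (a `keyword` header with one keyword per line) is simple enough to parse by hand. The sketch below is illustrative only: the tool itself reads these files with the `csv-parser` package, and `parseKeywordCsv` is a hypothetical helper name, not part of the codebase.

```javascript
// Illustrative parser for the keyword CSV format, using only built-ins.
// Assumption: the real tool uses csv-parser; this sketch mimics the format only.
function parseKeywordCsv(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean); // drop empty lines
  // Skip the `keyword` header row if present.
  if (lines[0] && lines[0].toLowerCase() === "keyword") lines.shift();
  // De-duplicate while preserving order.
  return [...new Set(lines)];
}

const csv = "keyword\nlayoff\ndownsizing\nlayoff\nhiring freeze\n";
console.log(parseKeywordCsv(csv)); // [ 'layoff', 'downsizing', 'hiring freeze' ]
```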
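Similarly, the clean-text step described in Features (stripping hashtags, emojis, and URLs from post content) can be sketched with Unicode-aware regular expressions. This is an assumption about the approach, not the scraper's actual implementation; `cleanPostText` is a hypothetical name, and the exact patterns the tool uses may differ.

```javascript
// Sketch of post-text cleanup: remove URLs, hashtags, and emoji,
// then collapse leftover whitespace. Patterns are assumptions, not
// the tool's real implementation. Requires a Node version with
// Unicode property escapes (Node 12+).
function cleanPostText(text) {
  return text
    .replace(/https?:\/\/\S+/g, "")             // URLs
    .replace(/#[\p{L}\p{N}_]+/gu, "")           // hashtags
    .replace(/\p{Extended_Pictographic}/gu, "") // emoji
    .replace(/\s+/g, " ")                       // collapse whitespace
    .trim();
}

console.log(cleanPostText("Big news 🚀 #layoffs at ExampleCorp https://example.com/post"));
// → "Big news at ExampleCorp"
```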