# LinkedOut - LinkedIn Posts Scraper

A Node.js application that automates LinkedIn login and scrapes posts containing specific keywords. The tool is designed to help track job market trends, layoffs, and open-to-work opportunities by monitoring LinkedIn content.

## Features

- **Automated LinkedIn Login**: Uses Playwright to automate browser interactions
- **Keyword-based Search**: Searches for posts containing keywords from CSV files or the CLI
- **Flexible Keyword Sources**: Supports multiple CSV files in `keywords/` or CLI-only mode
- **Configurable Search Parameters**: Customizable date ranges, sorting options, city, and scroll behavior
- **Duplicate Detection**: Prevents duplicate posts and profiles in results
- **Clean Text Processing**: Removes hashtags, emojis, and URLs from post content
- **Timestamped Results**: Saves results to timestamped JSON files
- **Command-line Overrides**: Supports runtime parameter adjustments
- **Enhanced Geographic Location Validation**: Validates user locations against 200+ Canadian cities with smart matching
- **Local AI Analysis (Ollama)**: Free, private, and fast post-processing with local LLMs
- **Flexible Processing**: Disable features, run AI analysis immediately, or process results later
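As an illustration of the text-cleaning feature above, a minimal sketch might look like the following (hypothetical helper name and regexes; the scraper's actual implementation may differ):

```javascript
// Hypothetical sketch of the post-text cleaning described above;
// not the scraper's actual code.
function cleanPostText(text) {
  return text
    .replace(/https?:\/\/\S+/g, '')             // strip URLs
    .replace(/#[\p{L}\p{N}_]+/gu, '')           // strip hashtags
    .replace(/\p{Extended_Pictographic}/gu, '') // strip emojis
    .replace(/\s+/g, ' ')                       // collapse leftover whitespace
    .trim();
}
```

Note that the Unicode property escapes (`\p{…}`) used here require a reasonably recent Node.js version.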
## Prerequisites

- Node.js (v14 or higher)
- Valid LinkedIn account credentials
- [Ollama](https://ollama.ai/) with at least one model pulled (free, private, local AI)
## Installation

1. Clone the repository or download the files
2. Install dependencies:

   ```bash
   npm install
   ```

3. Copy the configuration template and customize:

   ```bash
   cp env-config.example .env
   ```

4. Edit `.env` with your settings (see the Configuration section below)
## Configuration

### Environment Variables (.env file)

Create a `.env` file from `env-config.example`:

```env
# LinkedIn Credentials (Required)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Basic Settings
HEADLESS=true
KEYWORDS=keywords-layoff.csv  # Just the filename; always looks in keywords/ unless a path is given
DATE_POSTED=past-week
SORT_BY=date_posted
CITY=Toronto
WHEELS=5

# Enhanced Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# Local AI Analysis (Ollama)
ENABLE_LOCAL_AI=true
OLLAMA_MODEL=mistral
OLLAMA_HOST=http://localhost:11434
RUN_LOCAL_AI_AFTER_SCRAPING=false  # true = run after scraping, false = run manually
AI_CONTEXT=job layoffs and workforce reduction
AI_CONFIDENCE=0.7
AI_BATCH_SIZE=3
```
### Configuration Options

#### Required

- `LINKEDIN_USERNAME`: Your LinkedIn email/username
- `LINKEDIN_PASSWORD`: Your LinkedIn password

#### Basic Settings

- `HEADLESS`: Browser headless mode (`true`/`false`, default: `true`)
- `KEYWORDS`: CSV file name (default: `keywords-layoff.csv` in the `keywords/` folder)
- `DATE_POSTED`: Filter by date (`past-24h`, `past-week`, `past-month`, or empty)
- `SORT_BY`: Sort results (`relevance` or `date_posted`)
- `CITY`: Search location (default: `Toronto`)
- `WHEELS`: Number of scrolls used to load posts (default: `5`)

#### Enhanced Location Filtering

- `LOCATION_FILTER`: Geographic filter; supports multiple provinces/cities:
  - Single: `Ontario` or `Toronto`
  - Multiple: `Ontario,Manitoba` or `Toronto,Vancouver`
- `ENABLE_LOCATION_CHECK`: Enable location validation (`true`/`false`)
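To illustrate, checking a scraped profile location against the `LOCATION_FILTER` list could be sketched like this (hypothetical helper; the project's actual logic lives in `location-utils.js` and does smarter matching against its city database):

```javascript
// Illustrative only: naive substring match of a profile location
// against the comma-separated LOCATION_FILTER entries.
function matchesLocationFilter(profileLocation, filters) {
  const loc = profileLocation.toLowerCase();
  return filters.some((f) => loc.includes(f.trim().toLowerCase()));
}
```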
#### Local AI Analysis (Ollama)

- `ENABLE_LOCAL_AI`: Enable local AI analysis (`true`/`false`)
- `OLLAMA_MODEL`: Model to use (auto-detects available models: `mistral`, `llama2`, `codellama`, etc.)
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `RUN_LOCAL_AI_AFTER_SCRAPING`: Run AI analysis immediately after scraping (`true`/`false`)
- `AI_CONTEXT`: Context for the analysis (e.g., `job layoffs`)
- `AI_CONFIDENCE`: Minimum confidence threshold (`0.0`-`1.0`, default: `0.7`)
- `AI_BATCH_SIZE`: Posts per batch (default: `3`)
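The defaults listed above could be applied when reading the environment roughly as follows (illustrative sketch: `loadConfig` is a hypothetical helper, and in the real app `dotenv` first populates `process.env` from `.env`):

```javascript
// Hypothetical sketch: apply the documented defaults when reading config.
// In practice, require('dotenv').config() would populate process.env first.
function loadConfig(env = process.env) {
  return {
    headless: (env.HEADLESS ?? 'true') === 'true',
    keywordsFile: env.KEYWORDS || 'keywords-layoff.csv',
    datePosted: env.DATE_POSTED || '',
    sortBy: env.SORT_BY || 'date_posted', // default assumed from the example .env
    city: env.CITY || 'Toronto',
    wheels: parseInt(env.WHEELS || '5', 10),
    locationFilter: (env.LOCATION_FILTER || '').split(',').filter(Boolean),
    aiConfidence: parseFloat(env.AI_CONFIDENCE || '0.7'),
    aiBatchSize: parseInt(env.AI_BATCH_SIZE || '3', 10),
  };
}
```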
## Usage

### Demo Mode

For testing and demonstration purposes, you can run the interactive demo:

```bash
# Run the interactive demo (simulates scraping with fake data)
npm run demo

# Or directly:
node demo.js
```

The demo mode:

- Uses fake, anonymized data for safety
- Walks through all configuration options interactively
- Shows available Ollama models for selection
- Demonstrates the complete workflow without actually scraping LinkedIn
- Is perfect for creating documentation, GIFs, or testing configurations
### Basic Commands

```bash
# Standard scraping with the configured settings
node linkedout.js

# Visual mode (watch the browser)
node linkedout.js --headless=false

# Use only these keywords (ignore the CSV)
node linkedout.js --keyword="layoff,downsizing"

# Add extra keywords to the CSV/CLI list
node linkedout.js --add-keyword="hiring freeze,open to work"

# Override city and date
node linkedout.js --city="Vancouver" --date_posted=past-month

# Custom output file
node linkedout.js --output=results/myfile.json

# Skip location and AI filtering (fastest)
node linkedout.js --no-location --no-ai

# Run AI analysis immediately after scraping
node linkedout.js --ai-after

# Show help
node linkedout.js --help
```
### All Command-line Options

- `--headless=true|false`: Override browser headless mode
- `--keyword="kw1,kw2"`: Use only these keywords (comma-separated, overrides the CSV)
- `--add-keyword="kw1,kw2"`: Add extra keywords to the CSV/CLI list
- `--city="CityName"`: Override the city
- `--date_posted=VALUE`: Override the date filter (`past-24h`, `past-week`, `past-month`, or empty)
- `--sort_by=VALUE`: Override the sort order (`date_posted` or `relevance`)
- `--location_filter=VALUE`: Override the location filter
- `--output=FILE`: Output file name
- `--no-location`: Disable location filtering
- `--no-ai`: Disable AI analysis
- `--ai-after`: Run local AI analysis after scraping
- `--help`, `-h`: Show the help message
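Overrides in the `--key=value` / `--flag` forms above can be read from `process.argv` with a few lines (hypothetical sketch; the scraper's real parser may differ):

```javascript
// Illustrative sketch: collect --key=value pairs and bare --flags
// from a process.argv-style array into an options object.
function parseArgs(argv) {
  const opts = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue;
    const eq = arg.indexOf('=');
    if (eq === -1) {
      opts[arg.slice(2)] = true;               // boolean flags like --no-ai
    } else {
      opts[arg.slice(2, eq)] = arg.slice(eq + 1); // e.g. --city=Vancouver
    }
  }
  return opts;
}
```

In the real tool these parsed values take precedence over the corresponding `.env` settings.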
### Keyword Files

- Place all keyword CSVs in the `keywords/` folder
- Examples: `keywords/keywords-layoff.csv`, `keywords/keywords-open-work.csv`
- CSV format: a `keyword` header followed by one keyword per line
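The project reads these files with the `csv-parser` package; for illustration, an equivalent minimal reader for this one-column format could look like this (`parseKeywordCsv` is a hypothetical helper operating on the file's text):

```javascript
// Illustrative sketch: parse the one-column keyword CSV format described
// above (header `keyword`, one keyword per line), de-duplicating entries.
// The project itself uses the csv-parser package instead.
function parseKeywordCsv(text) {
  const lines = text.split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  if (lines[0] && lines[0].toLowerCase() === 'keyword') lines.shift(); // drop header
  return [...new Set(lines)];
}
```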
### Local AI Analysis Commands

After scraping, you can run AI analysis on the results:

```bash
# Analyze the latest results
node ai-analyzer-local.js --context="job layoffs"

# Analyze a specific file
node ai-analyzer-local.js --input=results/results-2024-01-15.json --context="hiring"

# Use a different model (auto-detects available models)
node ai-analyzer-local.js --model=llama2 --context="remote work"

# Change the confidence threshold and batch size
node ai-analyzer-local.js --context="job layoffs" --confidence=0.8 --batch-size=5

# Check available models
ollama list
```
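`--batch-size` (and `AI_BATCH_SIZE`) control how many posts are analyzed per round; splitting scraped results into batches can be sketched as follows (illustrative helper, not the analyzer's actual code):

```javascript
// Illustrative sketch: split an array of posts into fixed-size batches
// before sending each batch to the local model.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```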
## Workflow Examples

### 1. First-Time Setup (Demo Mode)

```bash
# Run the interactive demo to test your configuration
npm run demo
```

### 2. Quick Start (All Features)

```bash
node linkedout.js --ai-after
```

### 3. Fast Scraping Only

```bash
node linkedout.js --no-location --no-ai
```

### 4. Location-Only Filtering

```bash
node linkedout.js --no-ai
```

### 5. Test Different AI Contexts

```bash
node linkedout.js --no-ai
node ai-analyzer-local.js --context="job layoffs"
node ai-analyzer-local.js --context="hiring opportunities"
node ai-analyzer-local.js --context="remote work"
```
## Project Structure

```
linkedout/
├── .env                  # Your configuration (create from template)
├── env-config.example    # Configuration template
├── linkedout.js          # Main scraper
├── demo.js               # Interactive demo with fake data
├── ai-analyzer-local.js  # Free local AI analyzer (Ollama)
├── location-utils.js     # Enhanced location utilities
├── package.json          # Dependencies
├── keywords/             # All keyword CSVs go here
│   ├── keywords-layoff.csv
│   └── keywords-open-work.csv
├── results/              # Output directory
└── README.md             # This documentation
```
## Legal & Security

- **Credentials**: Store securely in `.env` and add it to `.gitignore`
- **LinkedIn ToS**: Respect rate limits and usage guidelines
- **Privacy**: Local AI keeps all data on your machine
- **Usage**: Educational and research purposes only
## Dependencies

- `playwright`: Browser automation
- `dotenv`: Environment variables
- `csv-parser`: CSV file reading
- Built-in: `fs`, `path`, `child_process`
## Support

For issues:

1. Check this README
2. Verify your `.env` configuration
3. Test with `--headless=false` for debugging
4. Check Ollama status: `ollama list`