# LinkedOut - LinkedIn Posts Scraper
A Node.js application that automates LinkedIn login and scrapes posts containing specific keywords. The tool is designed to help track job market trends, layoffs, and open work opportunities by monitoring LinkedIn content.
## Features
- **Automated LinkedIn Login**: Uses Playwright to automate browser interactions
- **Keyword-based Search**: Searches for posts containing keywords from CSV files or CLI
- **Flexible Keyword Sources**: Supports multiple CSV files in `keywords/` or CLI-only mode
- **Configurable Search Parameters**: Customizable date ranges, sorting options, city, and scroll behavior
- **Duplicate Detection**: Prevents duplicate posts and profiles in results
- **Clean Text Processing**: Removes hashtags, emojis, and URLs from post content
- **Timestamped Results**: Saves results to JSON files with timestamps
- **Command-line Overrides**: Support for runtime parameter adjustments
- **Enhanced Geographic Location Validation**: Validates user locations against 200+ Canadian cities with smart matching
- **Local AI Analysis (Ollama)**: Free, private, and fast post-processing with local LLMs
- **Flexible Processing**: Disable features, run AI analysis immediately, or process results later
## Prerequisites
- Node.js (v14 or higher)
- Valid LinkedIn account credentials
- [Ollama](https://ollama.ai/) with at least one model pulled, for free, private, local AI analysis (setup snippet below)
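
If you haven't used Ollama before, install it from the link above and pull a model once; `mistral` matches the default `OLLAMA_MODEL` in the configuration below:

```bash
# One-time setup: pull a local model for AI analysis
ollama pull mistral

# Confirm the model is available
ollama list
```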
## Installation
1. Clone the repository or download the files.
2. Install dependencies:
   ```bash
   npm install
   ```
3. Copy the configuration template and customize it:
   ```bash
   cp env-config.example .env
   ```
4. Edit `.env` with your settings (see the Configuration section below).
## Configuration
### Environment Variables (.env file)
Create a `.env` file from `env-config.example`:
```env
# LinkedIn Credentials (Required)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
# Basic Settings
HEADLESS=true
KEYWORDS=keywords-layoff.csv # Just the filename; always looks in keywords/ unless path is given
DATE_POSTED=past-week
SORT_BY=date_posted
CITY=Toronto
WHEELS=5
# Enhanced Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
# Local AI Analysis (Ollama)
ENABLE_LOCAL_AI=true
OLLAMA_MODEL=mistral
OLLAMA_HOST=http://localhost:11434
RUN_LOCAL_AI_AFTER_SCRAPING=false # true = run after scraping, false = run manually
AI_CONTEXT=job layoffs and workforce reduction
AI_CONFIDENCE=0.7
AI_BATCH_SIZE=3
```
### Configuration Options
#### Required
- `LINKEDIN_USERNAME`: Your LinkedIn email/username
- `LINKEDIN_PASSWORD`: Your LinkedIn password
#### Basic Settings
- `HEADLESS`: Browser headless mode (`true`/`false`, default: `true`)
- `KEYWORDS`: CSV file name (default: `keywords-layoff.csv` in `keywords/` folder)
- `DATE_POSTED`: Filter by date (`past-24h`, `past-week`, `past-month`, or empty)
- `SORT_BY`: Sort results (`relevance` or `date_posted`)
- `CITY`: Search location (default: `Toronto`)
- `WHEELS`: Number of scrolls to load posts (default: `5`)
#### Enhanced Location Filtering
- `LOCATION_FILTER`: Geographic filter; accepts one or more provinces/cities:
- Single: `Ontario` or `Toronto`
- Multiple: `Ontario,Manitoba` or `Toronto,Vancouver`
- `ENABLE_LOCATION_CHECK`: Enable location validation (`true`/`false`)
#### Local AI Analysis (Ollama)
- `ENABLE_LOCAL_AI`: Enable local AI analysis (`true`/`false`)
- `OLLAMA_MODEL`: Model to use (`mistral`, `llama2`, `codellama`)
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `RUN_LOCAL_AI_AFTER_SCRAPING`: Run AI immediately after scraping (`true`/`false`)
- `AI_CONTEXT`: Context for analysis (e.g., `job layoffs`)
- `AI_CONFIDENCE`: Minimum confidence threshold (0.0-1.0, default: 0.7)
- `AI_BATCH_SIZE`: Posts per batch (default: 3)
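
For reference, the scraper reads these values at startup with `dotenv`. Here is a minimal sketch of how such settings are typically loaded; the defaults mirror the documented ones, but the exact parsing is illustrative, not the scraper's actual code:

```javascript
// Illustrative config loading with dotenv (not the scraper's exact code).
require('dotenv').config();

const config = {
  username: process.env.LINKEDIN_USERNAME,         // required
  password: process.env.LINKEDIN_PASSWORD,         // required
  headless: process.env.HEADLESS !== 'false',      // default: true
  keywordsFile: process.env.KEYWORDS || 'keywords-layoff.csv',
  city: process.env.CITY || 'Toronto',
  wheels: parseInt(process.env.WHEELS || '5', 10), // scrolls per search
  aiConfidence: parseFloat(process.env.AI_CONFIDENCE || '0.7'),
};

if (!config.username || !config.password) {
  throw new Error('LINKEDIN_USERNAME and LINKEDIN_PASSWORD must be set in .env');
}
```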
## Usage
### Basic Commands
```bash
# Standard scraping with configured settings
node linkedout.js
# Visual mode (see browser)
node linkedout.js --headless=false
# Use only these keywords (ignore CSV)
node linkedout.js --keyword="layoff,downsizing"
# Add extra keywords to CSV/CLI list
node linkedout.js --add-keyword="hiring freeze,open to work"
# Override city and date
node linkedout.js --city="Vancouver" --date_posted=past-month
# Custom output file
node linkedout.js --output=results/myfile.json
# Skip location and AI filtering (fastest)
node linkedout.js --no-location --no-ai
# Run AI analysis immediately after scraping
node linkedout.js --ai-after
# Show help
node linkedout.js --help
```
### All Command-line Options
- `--headless=true|false`: Override browser headless mode
- `--keyword="kw1,kw2"`: Use only these keywords (comma-separated, overrides CSV)
- `--add-keyword="kw1,kw2"`: Add extra keywords to CSV/CLI list
- `--city="CityName"`: Override city
- `--date_posted=VALUE`: Override date filter (`past-24h`, `past-week`, `past-month`, or empty)
- `--sort_by=VALUE`: Override sort order (`date_posted` or `relevance`)
- `--location_filter=VALUE`: Override location filter
- `--output=FILE`: Output file name
- `--no-location`: Disable location filtering
- `--no-ai`: Disable AI analysis
- `--ai-after`: Run local AI analysis after scraping
- `--help, -h`: Show help message
### Keyword Files
- Place all keyword CSVs in the `keywords/` folder
- Example: `keywords/keywords-layoff.csv`, `keywords/keywords-open-work.csv`
- CSV format: a header row `keyword`, then one keyword per line (example below)
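
For example, a minimal `keywords/keywords-layoff.csv` might contain (the keywords themselves are illustrative):

```csv
keyword
layoff
downsizing
workforce reduction
```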
### Local AI Analysis Commands
After scraping, you can run AI analysis on the results:
```bash
# Analyze latest results
node ai-analyzer-local.js --context="job layoffs"
# Analyze specific file
node ai-analyzer-local.js --input=results/results-2024-01-15.json --context="hiring"
# Use different model
node ai-analyzer-local.js --model=llama2 --context="remote work"
# Change confidence and batch size
node ai-analyzer-local.js --context="job layoffs" --confidence=0.8 --batch-size=5
```
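
Under the hood, this kind of local analysis reduces to requests against Ollama's REST API. A minimal sketch of one such call follows; `/api/generate` is Ollama's standard endpoint, while the function name and prompt are illustrative:

```javascript
// Illustrative sketch: ask a local Ollama model about one post.
// Requires Node 18+ for the global fetch API.
async function analyzePost(postText, context) {
  const host = process.env.OLLAMA_HOST || 'http://localhost:11434';
  const model = process.env.OLLAMA_MODEL || 'mistral';

  const res = await fetch(`${host}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      prompt: `Is this post about ${context}? Answer yes or no.\n\n${postText}`,
      stream: false, // return one JSON object instead of a token stream
    }),
  });

  const data = await res.json();
  return data.response; // the model's text answer
}
```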
## Workflow Examples
### 1. Quick Start (All Features)
```bash
node linkedout.js --ai-after
```
### 2. Fast Scraping Only
```bash
node linkedout.js --no-location --no-ai
```
### 3. Location-Only Filtering
```bash
node linkedout.js --no-ai
```
### 4. Test Different AI Contexts
```bash
node linkedout.js --no-ai
node ai-analyzer-local.js --context="job layoffs"
node ai-analyzer-local.js --context="hiring opportunities"
node ai-analyzer-local.js --context="remote work"
```
## Project Structure
```
linkedout/
├── .env                  # Your configuration (create from template)
├── env-config.example    # Configuration template
├── linkedout.js          # Main scraper
├── ai-analyzer-local.js  # Free local AI analyzer (Ollama)
├── location-utils.js     # Enhanced location utilities
├── package.json          # Dependencies
├── keywords/             # All keyword CSVs go here
│   ├── keywords-layoff.csv
│   └── keywords-open-work.csv
├── results/              # Output directory
└── README.md             # This documentation
```
## Legal & Security
- **Credentials**: Store them securely in `.env` and add it to `.gitignore` (snippet after this list)
- **LinkedIn ToS**: Respect rate limits and usage guidelines
- **Privacy**: Local AI keeps all data on your machine
- **Usage**: Educational and research purposes only
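
To keep credentials and scraped data out of version control, a couple of lines like these suffice (ignoring `results/` is optional but consistent with keeping data local):

```bash
# Exclude credentials and scraped output from git
echo ".env" >> .gitignore
echo "results/" >> .gitignore
```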
## Dependencies
- `playwright`: Browser automation
- `dotenv`: Environment variables
- `csv-parser`: CSV file reading
- Built-in: `fs`, `path`, `child_process`
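
For reference, a minimal `package.json` covering these dependencies might look like the following; the version ranges are illustrative, not pinned by this project:

```json
{
  "name": "linkedout",
  "version": "1.0.0",
  "main": "linkedout.js",
  "scripts": {
    "start": "node linkedout.js"
  },
  "dependencies": {
    "playwright": "^1.40.0",
    "dotenv": "^16.3.0",
    "csv-parser": "^3.0.0"
  }
}
```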
## Support
For issues:
1. Check this README
2. Verify `.env` configuration
3. Test with `--headless=false` for debugging
4. Check Ollama status: `ollama list`