# LinkedOut - LinkedIn Posts Scraper
A Node.js application that automates LinkedIn login and scrapes posts containing specific keywords. The tool is designed to help track job market trends, layoffs, and open work opportunities by monitoring LinkedIn content.
## Features
- Automated LinkedIn Login: Uses Playwright to automate browser interactions
- Keyword-based Search: Searches for posts containing keywords from CSV files or CLI
- Flexible Keyword Sources: Supports multiple CSV files in `keywords/` or CLI-only mode
- Configurable Search Parameters: Customizable date ranges, sorting options, city, and scroll behavior
- Duplicate Detection: Prevents duplicate posts and profiles in results
- Clean Text Processing: Removes hashtags, emojis, and URLs from post content
- Timestamped Results: Saves results to JSON files with timestamps
- Command-line Overrides: Support for runtime parameter adjustments
- Enhanced Geographic Location Validation: Validates user locations against 200+ Canadian cities with smart matching
- Local AI Analysis (Ollama): Free, private, and fast post-processing with local LLMs
- Flexible Processing: Disable features, run AI analysis immediately, or process results later
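The text-cleaning and duplicate-detection features above can be sketched roughly as follows. The function names and regexes here are illustrative assumptions, not the project's actual implementation:

```javascript
// Sketch of "Clean Text Processing": strip URLs, hashtags, and emojis,
// then collapse whitespace. Regex details are assumptions.
function cleanPostText(text) {
  return text
    .replace(/https?:\/\/\S+/g, '')               // strip URLs
    .replace(/#[\p{L}\p{N}_]+/gu, '')             // strip hashtags
    .replace(/\p{Extended_Pictographic}/gu, '')   // strip emojis
    .replace(/\s+/g, ' ')                         // collapse whitespace
    .trim();
}

// Sketch of "Duplicate Detection": keep one entry per post URL.
function dedupePosts(posts) {
  const seen = new Set();
  return posts.filter((p) => {
    if (seen.has(p.url)) return false;
    seen.add(p.url);
    return true;
  });
}
```

The same `Set`-based approach would apply to profile de-duplication, keyed on the profile URL instead of the post URL.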
## Prerequisites
- Node.js (v14 or higher)
- Valid LinkedIn account credentials
- Ollama with at least one model installed (free, private, local AI; only needed for local AI analysis)
## Installation

1. Clone the repository or download the files
2. Install dependencies:

   ```bash
   npm install
   ```

3. Copy the configuration template and customize:

   ```bash
   cp env-config.example .env
   ```

4. Edit `.env` with your settings (see Configuration section below)
## Configuration

### Environment Variables (.env file)

Create a `.env` file from `env-config.example`:
```bash
# LinkedIn Credentials (Required)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Basic Settings
HEADLESS=true
KEYWORDS=keywords-layoff.csv # Just the filename; always looks in keywords/ unless path is given
DATE_POSTED=past-week
SORT_BY=date_posted
CITY=Toronto
WHEELS=5

# Enhanced Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# Local AI Analysis (Ollama)
ENABLE_LOCAL_AI=true
OLLAMA_MODEL=mistral
OLLAMA_HOST=http://localhost:11434
RUN_LOCAL_AI_AFTER_SCRAPING=false # true = run after scraping, false = run manually
AI_CONTEXT=job layoffs and workforce reduction
AI_CONFIDENCE=0.7
AI_BATCH_SIZE=3
```
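As a rough sketch of how these variables might be consumed, assuming `dotenv` has already populated `process.env` (the `loadConfig` helper and its shape are assumptions that mirror this README, not the actual source):

```javascript
// Hypothetical helper: read README-documented settings with their defaults.
function loadConfig(env = process.env) {
  return {
    headless: (env.HEADLESS ?? 'true') === 'true',
    keywordsFile: env.KEYWORDS ?? 'keywords-layoff.csv',
    datePosted: env.DATE_POSTED ?? 'past-week',
    sortBy: env.SORT_BY ?? 'date_posted',
    city: env.CITY ?? 'Toronto',
    wheels: Number(env.WHEELS ?? 5),
    // "Ontario,Manitoba" -> ['Ontario', 'Manitoba']; empty string -> []
    locationFilter: (env.LOCATION_FILTER ?? '').split(',').filter(Boolean),
    aiConfidence: Number(env.AI_CONFIDENCE ?? 0.7),
  };
}
```

Passing `env` as a parameter (defaulting to `process.env`) keeps the helper easy to unit-test without touching real environment variables.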
### Configuration Options

#### Required

- `LINKEDIN_USERNAME`: Your LinkedIn email/username
- `LINKEDIN_PASSWORD`: Your LinkedIn password

#### Basic Settings

- `HEADLESS`: Browser headless mode (`true`/`false`, default: `true`)
- `KEYWORDS`: CSV file name (default: `keywords-layoff.csv` in the `keywords/` folder)
- `DATE_POSTED`: Filter by date (`past-24h`, `past-week`, `past-month`, or empty)
- `SORT_BY`: Sort results (`relevance` or `date_posted`)
- `CITY`: Search location (default: `Toronto`)
- `WHEELS`: Number of scrolls to load posts (default: `5`)
#### Enhanced Location Filtering

- `LOCATION_FILTER`: Geographic filter - supports multiple provinces/cities:
  - Single: `Ontario` or `Toronto`
  - Multiple: `Ontario,Manitoba` or `Toronto,Vancouver`
- `ENABLE_LOCATION_CHECK`: Enable location validation (`true`/`false`)
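The filtering idea can be sketched as a case-insensitive substring match of a profile's location string against the configured provinces/cities. This is a simplified illustration; the real `location-utils.js` additionally validates against its list of 200+ Canadian cities:

```javascript
// Hypothetical matcher: a profile location passes if it mentions any
// configured filter entry. An empty filter list means "accept everything".
function matchesLocationFilter(profileLocation, filters) {
  if (filters.length === 0) return true;
  const loc = profileLocation.toLowerCase();
  return filters.some((f) => loc.includes(f.trim().toLowerCase()));
}
```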
#### Local AI Analysis (Ollama)

- `ENABLE_LOCAL_AI`: Enable local AI analysis (`true`/`false`)
- `OLLAMA_MODEL`: Model to use (auto-detects available models: `mistral`, `llama2`, `codellama`, etc.)
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `RUN_LOCAL_AI_AFTER_SCRAPING`: Run AI immediately after scraping (`true`/`false`)
- `AI_CONTEXT`: Context for analysis (e.g., `job layoffs`)
- `AI_CONFIDENCE`: Minimum confidence threshold (0.0-1.0, default: `0.7`)
- `AI_BATCH_SIZE`: Posts per batch (default: `3`)
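For orientation, a single analysis request to Ollama might look like the sketch below. Ollama's REST endpoint `POST /api/generate` with `{ model, prompt, stream }` is real; the prompt wording and the JSON relevance schema are assumptions, not what `ai-analyzer-local.js` actually sends:

```javascript
// Hypothetical request builder for one post.
function buildOllamaRequest(post, { model = 'mistral', context = 'job layoffs' } = {}) {
  return {
    model,
    stream: false, // ask for a single complete response instead of a token stream
    prompt:
      `Is the following LinkedIn post about "${context}"? ` +
      `Answer with JSON {"relevant": true|false, "confidence": 0..1}.\n\n${post}`,
  };
}

// Sending it (not executed here) would be roughly:
// const res = await fetch(`${process.env.OLLAMA_HOST}/api/generate`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildOllamaRequest(postText)),
// });
```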
## Usage

### Demo Mode

For testing and demonstration purposes, you can run the interactive demo:

```bash
# Run interactive demo (simulates scraping with fake data)
npm run demo

# Or directly:
node demo.js
```
The demo mode:
- Uses fake, anonymized data for safety
- Walks through all configuration options interactively
- Shows available Ollama models for selection
- Demonstrates the complete workflow without actual LinkedIn scraping
- Perfect for creating documentation, GIFs, or testing configurations
### Basic Commands

```bash
# Standard scraping with configured settings
node linkedout.js

# Visual mode (see browser)
node linkedout.js --headless=false

# Use only these keywords (ignore CSV)
node linkedout.js --keyword="layoff,downsizing"

# Add extra keywords to CSV/CLI list
node linkedout.js --add-keyword="hiring freeze,open to work"

# Override city and date
node linkedout.js --city="Vancouver" --date_posted=past-month

# Custom output file
node linkedout.js --output=results/myfile.json

# Skip location and AI filtering (fastest)
node linkedout.js --no-location --no-ai

# Run AI analysis immediately after scraping
node linkedout.js --ai-after

# Show help
node linkedout.js --help
```
### All Command-line Options

- `--headless=true|false`: Override browser headless mode
- `--keyword="kw1,kw2"`: Use only these keywords (comma-separated, overrides CSV)
- `--add-keyword="kw1,kw2"`: Add extra keywords to CSV/CLI list
- `--city="CityName"`: Override city
- `--date_posted=VALUE`: Override date posted (`past-24h`, `past-week`, `past-month`, or empty)
- `--sort_by=VALUE`: Override sort by (`date_posted` or `relevance`)
- `--location_filter=VALUE`: Override location filter
- `--output=FILE`: Output file name
- `--no-location`: Disable location filtering
- `--no-ai`: Disable AI analysis
- `--ai-after`: Run local AI analysis after scraping
- `--help`, `-h`: Show help message
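Flags of this shape can be parsed from `process.argv` with no dependencies, as in the sketch below (an illustration of the flag format, not the project's actual parser):

```javascript
// Hypothetical parser: handles --key=value pairs and bare boolean
// switches like --no-ai (which become `true`).
function parseArgs(argv) {
  const opts = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue;
    const [key, value] = arg.slice(2).split('=');
    opts[key] = value === undefined ? true : value;
  }
  return opts;
}
```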
### Keyword Files

- Place all keyword CSVs in the `keywords/` folder
- Example: `keywords/keywords-layoff.csv`, `keywords/keywords-open-work.csv`
- Custom CSV format: header `keyword` with one keyword per line
### Local AI Analysis Commands

After scraping, you can run AI analysis on the results:

```bash
# Analyze latest results
node ai-analyzer-local.js --context="job layoffs"

# Analyze specific file
node ai-analyzer-local.js --input=results/results-2024-01-15.json --context="hiring"

# Use different model (auto-detects available models)
node ai-analyzer-local.js --model=llama2 --context="remote work"

# Change confidence and batch size
node ai-analyzer-local.js --context="job layoffs" --confidence=0.8 --batch-size=5

# Check available models
ollama list
```
## Workflow Examples

### 1. First Time Setup (Demo Mode)

```bash
# Run interactive demo to test configuration
npm run demo
```

### 2. Quick Start (All Features)

```bash
node linkedout.js --ai-after
```

### 3. Fast Scraping Only

```bash
node linkedout.js --no-location --no-ai
```

### 4. Location-Only Filtering

```bash
node linkedout.js --no-ai
```

### 5. Test Different AI Contexts

```bash
node linkedout.js --no-ai
node ai-analyzer-local.js --context="job layoffs"
node ai-analyzer-local.js --context="hiring opportunities"
node ai-analyzer-local.js --context="remote work"
```
## Project Structure

```
linkedout/
├── .env                  # Your configuration (create from template)
├── env-config.example    # Configuration template
├── linkedout.js          # Main scraper
├── demo.js               # Interactive demo with fake data
├── ai-analyzer-local.js  # Free local AI analyzer (Ollama)
├── location-utils.js     # Enhanced location utilities
├── package.json          # Dependencies
├── keywords/             # All keyword CSVs go here
│   ├── keywords-layoff.csv
│   └── keywords-open-work.csv
├── results/              # Output directory
└── README.md             # This documentation
```
## Legal & Security

- Credentials: Store them securely in `.env` and add it to `.gitignore`
- LinkedIn ToS: Respect rate limits and usage guidelines
- Privacy: Local AI keeps all data on your machine
- Usage: Educational and research purposes only
## Dependencies

- `playwright`: Browser automation
- `dotenv`: Environment variables
- `csv-parser`: CSV file reading
- Built-in: `fs`, `path`, `child_process`
## Support

For issues:

- Check this README
- Verify your `.env` configuration
- Test with `--headless=false` for debugging
- Check Ollama status: `ollama list`