update

commit a04e9fb374 (parent b62854909b)

README.md

# LinkedOut - LinkedIn Posts Scraper

A Node.js application that automates LinkedIn login and scrapes posts containing specific keywords. It is designed to help track job market trends, layoffs, and open-to-work activity by monitoring LinkedIn content.

## Features

- **Automated LinkedIn Login**: Uses Playwright to automate browser interactions
- **Keyword-based Search**: Searches for posts containing keywords from CSV files or the CLI
- **Flexible Keyword Sources**: Supports multiple CSV files in `keywords/` or a CLI-only mode
- **Configurable Search Parameters**: Customizable date range, sort order, city, and scroll depth
- **Duplicate Detection**: Prevents duplicate posts and profiles in results
- **Clean Text Processing**: Removes hashtags, emojis, and URLs from post content
- **Timestamped Results**: Saves results to JSON files with timestamps
- **Command-line Overrides**: Supports adjusting parameters at runtime
- **Enhanced Geographic Location Validation**: Validates user locations against 200+ Canadian cities with smart matching
- **Local AI Analysis (Ollama)**: Free, private, and fast post-processing with local LLMs
- **Flexible Processing**: Disable location or AI filtering, run AI analysis immediately after scraping, or process saved results later

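
The text-cleaning and duplicate-detection features can be illustrated with two small functions. This is a sketch, not the code in `linkedout.js`: the names `cleanPostText` and `dedupePosts`, the post shape `{ profileUrl, text }`, and the rule "keep only the first post per profile and per cleaned text" are all assumptions.

```javascript
// Hypothetical helpers illustrating text cleaning and de-duplication.

// Remove URLs, hashtags, and emoji, then collapse leftover whitespace.
function cleanPostText(text) {
  return text
    .replace(/https?:\/\/\S+/g, "") // URLs first, so "#fragment" parts go with them
    .replace(/#[\p{L}\p{N}_]+/gu, "") // hashtags such as #layoffs
    .replace(/[\p{Extended_Pictographic}\u{FE0F}\u{200D}]/gu, "") // emoji, variation selectors, ZWJ
    .replace(/\s+/g, " ")
    .trim();
}

// Keep the first post seen for each profile and for each cleaned, lower-cased text.
function dedupePosts(posts) {
  const seenProfiles = new Set();
  const seenTexts = new Set();
  const unique = [];
  for (const post of posts) {
    const textKey = cleanPostText(post.text).toLowerCase();
    if (seenProfiles.has(post.profileUrl) || seenTexts.has(textKey)) continue;
    seenProfiles.add(post.profileUrl);
    seenTexts.add(textKey);
    unique.push(post);
  }
  return unique;
}
```

Cleaning before comparing means that two posts differing only in hashtags, emoji, or case count as duplicates.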
## Prerequisites

- Node.js (v14 or higher)
- Valid LinkedIn account credentials
- [Ollama](https://ollama.ai/) with at least one model pulled, e.g. `ollama pull mistral` (free, private, local AI)

## Installation

1. Clone the repository or download the files.
2. Install dependencies:

   ```bash
   npm install
   ```

3. Copy the configuration template and customize it:

   ```bash
   cp env-config.example .env
   ```

4. Edit `.env` with your settings (see the Configuration section below).

## Configuration

### Environment Variables (.env file)

The `.env` file created from `env-config.example` looks like this:

```env
# LinkedIn Credentials (Required)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Basic Settings
HEADLESS=true
KEYWORDS=keywords-layoff.csv # File name only; looked up in keywords/ unless a path is given
DATE_POSTED=past-week
SORT_BY=date_posted
CITY=Toronto
WHEELS=5

# Enhanced Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# Local AI Analysis (Ollama)
ENABLE_LOCAL_AI=true
OLLAMA_MODEL=mistral
OLLAMA_HOST=http://localhost:11434
RUN_LOCAL_AI_AFTER_SCRAPING=false # true = run after scraping, false = run manually
AI_CONTEXT=job layoffs and workforce reduction
AI_CONFIDENCE=0.7
AI_BATCH_SIZE=3
```

### Configuration Options

#### Required

- `LINKEDIN_USERNAME`: Your LinkedIn email/username
- `LINKEDIN_PASSWORD`: Your LinkedIn password

#### Basic Settings

- `HEADLESS`: Run the browser in headless mode (`true`/`false`, default: `true`)
- `KEYWORDS`: Keyword CSV file name (default: `keywords-layoff.csv` in the `keywords/` folder)
- `DATE_POSTED`: Filter by post date (`past-24h`, `past-week`, `past-month`, or empty for no date filter)
- `SORT_BY`: Sort order of results (`relevance` or `date_posted`)
- `CITY`: Search location (default: `Toronto`)
- `WHEELS`: Number of scrolls used to load more posts (default: `5`)

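
A sketch of how the basic settings above might be read and validated. It takes any object shaped like `process.env` (in this project, `dotenv` is the package listed for loading `.env`). The `loadBasicConfig` name, the keyword-path rule, and the fallbacks for `DATE_POSTED` and `SORT_BY` (which this README does not document) are assumptions.

```javascript
const path = require("path");

const DATE_OPTIONS = ["", "past-24h", "past-week", "past-month"];
const SORT_OPTIONS = ["relevance", "date_posted"];

// "true"/"false" (case-insensitive); a missing or empty value falls back to the default.
function parseBool(value, fallback) {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

// Hypothetical loader for the Basic Settings; `env` is any object shaped like process.env.
function loadBasicConfig(env = process.env) {
  const keywords = env.KEYWORDS || "keywords-layoff.csv";
  const config = {
    headless: parseBool(env.HEADLESS, true),
    // A bare file name is looked up in keywords/; anything with a directory part is used as given.
    keywordsFile: path.basename(keywords) === keywords ? path.join("keywords", keywords) : keywords,
    datePosted: (env.DATE_POSTED ?? "").trim(), // empty = no date filter (assumed fallback)
    sortBy: (env.SORT_BY || "relevance").trim(), // assumed fallback, not documented
    city: env.CITY || "Toronto",
    wheels: Number(env.WHEELS || 5),
  };

  if (!DATE_OPTIONS.includes(config.datePosted)) {
    throw new Error(`Invalid DATE_POSTED: "${config.datePosted}"`);
  }
  if (!SORT_OPTIONS.includes(config.sortBy)) {
    throw new Error(`Invalid SORT_BY: "${config.sortBy}"`);
  }
  if (!Number.isInteger(config.wheels) || config.wheels < 1) {
    throw new Error(`Invalid WHEELS: "${env.WHEELS}"`);
  }
  return config;
}
```

Validating at startup turns a typo such as `SORT_BY=newest` into an immediate error instead of a silently unfiltered search.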
#### Enhanced Location Filtering

- `LOCATION_FILTER`: Geographic filter; accepts one or more provinces or cities, comma-separated:
  - Single: `Ontario` or `Toronto`
  - Multiple: `Ontario,Manitoba` or `Toronto,Vancouver`
- `ENABLE_LOCATION_CHECK`: Enable location validation (`true`/`false`)

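
A minimal sketch of how a comma-separated `LOCATION_FILTER` could be matched against a profile's location string. The five-entry `CITY_TO_PROVINCE` table is a hypothetical stand-in for the 200+ city list (presumably in `location-utils.js`), and `matchesLocationFilter` with its plain substring matching is an assumption, not the project's "smart matching".

```javascript
// Hypothetical stand-in for the project's city list (a few entries only).
const CITY_TO_PROVINCE = {
  toronto: "ontario",
  ottawa: "ontario",
  winnipeg: "manitoba",
  vancouver: "british columbia",
  calgary: "alberta",
};

// Split "Ontario,Manitoba" into normalized, non-empty terms.
function parseLocationFilter(filter) {
  return filter
    .split(",")
    .map((term) => term.trim().toLowerCase())
    .filter(Boolean);
}

// True if the location names an allowed province or city directly,
// or names a known city that lies in an allowed province.
function matchesLocationFilter(profileLocation, filter) {
  const location = profileLocation.toLowerCase();
  const terms = parseLocationFilter(filter);
  if (terms.length === 0) return true; // no filter configured
  if (terms.some((term) => location.includes(term))) return true;
  return Object.entries(CITY_TO_PROVINCE).some(
    ([city, province]) => location.includes(city) && terms.includes(province)
  );
}
```

The city-to-province lookup is what lets `Ontario` match a profile that only says "Toronto, ON"; plain substring checks can still misfire on short or ambiguous names.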
#### Local AI Analysis (Ollama)

- `ENABLE_LOCAL_AI`: Enable local AI analysis (`true`/`false`)
- `OLLAMA_MODEL`: Model to use (e.g. `mistral`, `llama2`, `codellama`); the model must be available locally
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `RUN_LOCAL_AI_AFTER_SCRAPING`: Run AI analysis immediately after scraping (`true`/`false`)
- `AI_CONTEXT`: Topic the posts are analyzed against (e.g. `job layoffs`)
- `AI_CONFIDENCE`: Minimum confidence threshold (0.0-1.0, default: `0.7`)
- `AI_BATCH_SIZE`: Number of posts analyzed per batch (default: `3`)

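
`AI_BATCH_SIZE` and `AI_CONFIDENCE` suggest a batch-then-threshold flow. This sketch assumes the model returns one verdict per post with the shape `{ relevant, confidence }`; the helper names `chunk` and `filterByConfidence` and that verdict shape are hypothetical.

```javascript
// Split posts into groups of `size` (AI_BATCH_SIZE); the last group may be smaller.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("batch size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Keep posts the model marked relevant with confidence >= threshold (AI_CONFIDENCE).
// verdicts[i] is assumed to describe posts[i]; posts without a verdict are dropped.
function filterByConfidence(posts, verdicts, threshold) {
  return posts.filter((_, i) => {
    const verdict = verdicts[i];
    return verdict !== undefined && verdict.relevant && verdict.confidence >= threshold;
  });
}
```

A larger batch means fewer model calls; a higher threshold trades recall for precision.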
## Usage

### Basic Commands

```bash
# Standard scraping with configured settings
node linkedout.js

# Visible browser window (useful for debugging)
node linkedout.js --headless=false

# Use only these keywords (ignore CSV)
node linkedout.js --keyword="layoff,downsizing"

# Append extra keywords to the CSV or --keyword list
node linkedout.js --add-keyword="hiring freeze,open to work"

# Override city and date
node linkedout.js --city="Vancouver" --date_posted=past-month

# Custom output file
node linkedout.js --output=results/myfile.json

# Skip location and AI filtering (fastest)
node linkedout.js --no-location --no-ai

# Run AI analysis immediately after scraping
node linkedout.js --ai-after

# Show help
node linkedout.js --help
```

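
Without `--output`, results are saved to a timestamped JSON file. The exact naming scheme is not documented here, so this sketch assumes names like `results/results-2024-01-15T10-30-00.json` (UTC, colons replaced so the name is valid on Windows); `timestampedResultsPath` and `saveResults` are hypothetical helpers.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical naming scheme: results-YYYY-MM-DDTHH-MM-SS.json (UTC).
function timestampedResultsPath(dir = "results", date = new Date()) {
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, "-");
  return path.join(dir, `results-${stamp}.json`);
}

// Write results to the --output path if given, else to a timestamped file.
// Creates the target directory if needed and returns the path written.
function saveResults(results, outputPath) {
  const target = outputPath || timestampedResultsPath();
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, JSON.stringify(results, null, 2));
  return target;
}
```

Timestamped names keep earlier runs from being overwritten, which is what lets the analyzer be re-run on older result files.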
### All Command-line Options

- `--headless=true|false`: Override browser headless mode
- `--keyword="kw1,kw2"`: Use only these keywords (comma-separated; overrides the CSV)
- `--add-keyword="kw1,kw2"`: Append extra keywords to the CSV or `--keyword` list
- `--city="CityName"`: Override city
- `--date_posted=VALUE`: Override date posted (`past-24h`, `past-week`, `past-month`, or empty)
- `--sort_by=VALUE`: Override sort order (`date_posted` or `relevance`)
- `--location_filter=VALUE`: Override location filter
- `--output=FILE`: Output file path
- `--no-location`: Disable location filtering
- `--no-ai`: Disable AI analysis
- `--ai-after`: Run local AI analysis after scraping
- `--help, -h`: Show help message

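
The options above follow a `--name=value` / bare `--flag` pattern. A dependency-free parser for that pattern might look like this; `parseCliArgs`, `splitList`, and the returned object shape are assumptions, not the scraper's actual code.

```javascript
// Hypothetical parser for "--name=value", bare "--flag", and "-h" arguments.
function parseCliArgs(argv) {
  const options = {};
  for (const arg of argv) {
    if (arg === "-h" || arg === "--help") {
      options.help = true;
      continue;
    }
    if (!arg.startsWith("--")) continue; // ignore positional arguments
    const body = arg.slice(2);
    const eq = body.indexOf("=");
    if (eq === -1) {
      options[body] = true; // e.g. --no-ai, --ai-after
    } else {
      // Value may be empty, e.g. --date_posted= to clear the date filter.
      options[body.slice(0, eq)] = body.slice(eq + 1);
    }
  }
  return options;
}

// Turn "layoff, downsizing" into ["layoff", "downsizing"].
function splitList(value) {
  if (typeof value !== "string") return [];
  return value.split(",").map((item) => item.trim()).filter(Boolean);
}
```

Typical use is `parseCliArgs(process.argv.slice(2))`. On newer Node.js versions, the built-in `util.parseArgs` is an alternative, though it requires declaring every option up front.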
### Keyword Files

- Place all keyword CSVs in the `keywords/` folder
- Examples: `keywords/keywords-layoff.csv`, `keywords/keywords-open-work.csv`
- CSV format: a `keyword` header row followed by one keyword per line

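
Keyword resolution as documented: `--keyword` replaces the CSV list and `--add-keyword` appends to whichever list is active. This sketch parses the single-column CSV by hand to stay dependency-free (the project itself lists `csv-parser`); `parseKeywordCsv` and `resolveKeywords` are hypothetical names.

```javascript
// Parse a one-column CSV whose first row is the header "keyword".
function parseKeywordCsv(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  if (lines.length === 0 || lines[0].toLowerCase() !== "keyword") {
    throw new Error('Keyword CSV must start with a "keyword" header');
  }
  // Drop optional surrounding double quotes, e.g. "open to work".
  return lines.slice(1).map((line) => line.replace(/^"(.*)"$/, "$1"));
}

// --keyword replaces the CSV list; --add-keyword is appended; duplicates are removed.
function resolveKeywords(csvKeywords, cliKeywords, extraKeywords) {
  const base = cliKeywords.length > 0 ? cliKeywords : csvKeywords;
  return [...new Set([...base, ...extraKeywords])];
}
```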
### Local AI Analysis Commands

After scraping, you can run AI analysis on saved results:

```bash
# Analyze the latest results
node ai-analyzer-local.js --context="job layoffs"

# Analyze a specific file
node ai-analyzer-local.js --input=results/results-2024-01-15.json --context="hiring"

# Use a different model
node ai-analyzer-local.js --model=llama2 --context="remote work"

# Change confidence threshold and batch size
node ai-analyzer-local.js --context="job layoffs" --confidence=0.8 --batch-size=5
```

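
Given the `OLLAMA_HOST` setting, the analyzer presumably talks to Ollama over HTTP. This sketch calls Ollama's `POST /api/generate` endpoint with `stream: false` and `format: "json"` using the global `fetch` (Node.js 18+). The prompt wording, the expected reply shape `{ relevant, confidence }`, and the names `buildPrompt`, `analyzePost`, and `parseVerdict` are assumptions; this is not the prompt used by `ai-analyzer-local.js`.

```javascript
// Build a prompt asking the model to rate one post against the analysis context.
function buildPrompt(postText, context) {
  return [
    `Context: ${context}`,
    `Post: ${postText}`,
    'Reply with JSON only: {"relevant": true|false, "confidence": 0.0-1.0}',
  ].join("\n");
}

// Treat malformed model output as "not relevant" instead of crashing the run.
function parseVerdict(raw) {
  try {
    const verdict = JSON.parse(raw);
    return { relevant: Boolean(verdict.relevant), confidence: Number(verdict.confidence) || 0 };
  } catch {
    return { relevant: false, confidence: 0 };
  }
}

// Send one prompt to Ollama (non-streaming) and parse the model's reply text.
async function analyzePost(postText, { host = "http://localhost:11434", model = "mistral", context }) {
  const res = await fetch(new URL("/api/generate", host), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: buildPrompt(postText, context),
      stream: false, // return a single JSON object instead of a stream
      format: "json", // ask Ollama to constrain the reply to JSON
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return parseVerdict(data.response); // non-streaming replies carry the text in `response`
}
```

With Ollama running, a call looks like `await analyzePost(text, { model: process.env.OLLAMA_MODEL, host: process.env.OLLAMA_HOST, context: "job layoffs" })`; unset settings fall back to the defaults shown.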
## Workflow Examples

### 1. Quick Start (All Features)

```bash
node linkedout.js --ai-after
```

### 2. Fast Scraping Only

```bash
node linkedout.js --no-location --no-ai
```

### 3. Location-Only Filtering

```bash
node linkedout.js --no-ai
```

### 4. Test Different AI Contexts

Scrape once without AI, then analyze the same results with several contexts:

```bash
node linkedout.js --no-ai
node ai-analyzer-local.js --context="job layoffs"
node ai-analyzer-local.js --context="hiring opportunities"
node ai-analyzer-local.js --context="remote work"
```

## Project Structure

```
linkedout/
├── .env                    # Your configuration (create from template)
├── env-config.example      # Configuration template
├── linkedout.js            # Main scraper
├── ai-analyzer-local.js    # Free local AI analyzer (Ollama)
├── location-utils.js       # Enhanced location utilities
├── package.json            # Dependencies
├── keywords/               # All keyword CSVs go here
│   ├── keywords-layoff.csv
│   └── keywords-open-work.csv
├── results/                # Output directory
└── README.md               # This documentation
```

## Legal & Security

- **Credentials**: Keep them in `.env` and add `.env` to `.gitignore`
- **LinkedIn ToS**: Respect LinkedIn's rate limits and usage terms
- **Privacy**: AI analysis runs locally through Ollama, so post content is not sent to a third-party AI service
- **Usage**: Educational and research purposes only

## Dependencies

- `playwright`: Browser automation
- `dotenv`: Loads environment variables from `.env`
- `csv-parser`: Reads keyword CSV files
- Node.js built-ins: `fs`, `path`, `child_process`

## Support

If you run into problems:

1. Check this README
2. Verify your `.env` configuration
3. Run with `--headless=false` to watch the browser while debugging
4. Check that Ollama is running and the model is installed: `ollama list`

ai-analyzer-local.js

File diff suppressed because it is too large.

test/test.js

console.log("START!");

const { chromium } = require("playwright");

// Smoke test: check that Playwright can launch headless Chromium and load a page.
(async () => {
  console.log("browser!");

  // The sandbox flags are commonly needed when running as root (e.g. in Docker).
  const browser = await chromium.launch({
    headless: true,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  try {
    console.log("new page!");
    const page = await browser.newPage();

    console.log("GOTO!");
    await page.goto("https://example.com");

    console.log("Success!");
  } finally {
    // Close the browser even if navigation fails, so the process can exit.
    await browser.close();
  }
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});