# LinkedIn Parser

LinkedIn posts parser with **integrated AI analysis**, built on the `ai-analyzer` core package. AI analysis results are embedded directly in the results JSON file.

## 🚀 Quick Start

```bash
# Install dependencies
npm install

# Run with default settings (AI analysis integrated into results)
npm start

# Run without AI analysis
npm run start:no-ai
```

## 📋 Available Scripts

### Parser Modes

```bash
# Basic parsing with integrated AI analysis
npm start

# Parsing without AI analysis
npm run start:no-ai

# Headless browser mode
npm run start:headless

# Visible browser mode (for debugging)
npm run start:visible

# Disable location filtering
npm run start:no-location

# Custom keywords
npm run start:custom
```

### Testing

```bash
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```

### AI Analysis (CLI)

```bash
# Analyze the latest results file with the default context
npm run analyze:latest

# Analyze the latest results file for layoffs
npm run analyze:layoff

# Analyze the latest results file for job market trends
npm run analyze:trends

# Analyze a specific file (requires the --input parameter)
npm run analyze -- --input=results.json
```

### Utilities

```bash
# Show help
npm run help

# Run demo
npm run demo

# Install the Playwright browser
npm run install:playwright
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the `linkedin-parser` directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Browser Configuration
HEADLESS=true
```
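
Every value in `.env` arrives as a string, so the parser has to turn lists, booleans, and numbers into real types before it can use them. The sketch below shows one way to do that coercion. `loadConfig`, `parseBool`, and `parseList` are hypothetical helpers, not the parser's actual code. The fallback defaults (booleans on, model `mistral`) are also assumptions.

```javascript
// Hypothetical sketch: turn raw .env strings into typed settings.
// Variable names come from the README; the helpers and their fallback
// defaults are assumptions, not the parser's actual code.

// "true"/"false" (any case) become booleans; anything else uses the fallback.
function parseBool(value, fallback) {
  if (value === undefined) return fallback;
  const v = String(value).trim().toLowerCase();
  if (v === "true") return true;
  if (v === "false") return false;
  return fallback;
}

// "Ontario, Manitoba" -> ["Ontario", "Manitoba"]; empty entries are dropped.
function parseList(value) {
  if (!value) return [];
  return value
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
}

function loadConfig(env) {
  return {
    username: env.LINKEDIN_USERNAME,
    password: env.LINKEDIN_PASSWORD,
    city: env.CITY,
    datePosted: env.DATE_POSTED,
    sortBy: env.SORT_BY,
    // Parsed as an integer; NaN if missing or not numeric.
    wheels: Number.parseInt(env.WHEELS, 10),
    locationFilter: parseList(env.LOCATION_FILTER),
    enableLocationCheck: parseBool(env.ENABLE_LOCATION_CHECK, true), // assumed default
    enableAiAnalysis: parseBool(env.ENABLE_AI_ANALYSIS, true), // assumed default
    aiContext: env.AI_CONTEXT,
    ollamaModel: env.OLLAMA_MODEL || "mistral", // assumed default
    headless: parseBool(env.HEADLESS, true), // assumed default
  };
}

// Usage: const config = loadConfig(process.env);
```

Call `loadConfig(process.env)` once the variables are loaded, for example with the `dotenv` package (this project may or may not use it). Because list entries are trimmed, `Ontario, Manitoba` and `Ontario,Manitoba` produce the same filter.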

### Command Line Options

```bash
# Parser options
--headless=true|false   # Browser headless mode
--keyword="kw1,kw2"     # Use these specific keywords
--add-keyword="kw"      # Add extra keywords
--no-location           # Disable location filtering
--no-ai                 # Disable AI analysis
```
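
The flag descriptions leave some behavior implicit. The sketch below spells out one plausible reading:

- `--keyword` replaces the default keyword list.
- `--add-keyword` appends to it.
- A bare `--headless` counts as `true`.

These semantics, and the `parseCliOptions` helper itself, are assumptions and are not taken from `index.js`.

```javascript
// Hypothetical sketch of interpreting the documented flags.
// Assumption: --keyword replaces the default keywords, --add-keyword appends.
function parseCliOptions(argv, defaults) {
  const opts = {
    headless: defaults.headless,
    keywords: [...defaults.keywords],
    locationCheck: true,
    aiAnalysis: true,
  };
  const extra = []; // --add-keyword values, appended after the loop

  for (const arg of argv) {
    // Split "--name=value"; bare flags have no value.
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? undefined : arg.slice(eq + 1);

    switch (name) {
      case "--headless":
        opts.headless = value !== "false"; // bare flag means true
        break;
      case "--keyword":
        opts.keywords = splitList(value);
        break;
      case "--add-keyword":
        extra.push(...splitList(value));
        break;
      case "--no-location":
        opts.locationCheck = false;
        break;
      case "--no-ai":
        opts.aiAnalysis = false;
        break;
      default:
        break; // unknown flags are ignored in this sketch
    }
  }
  // Appending last makes the result independent of flag order.
  opts.keywords.push(...extra);
  return opts;
}

function splitList(value) {
  return (value || "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
}
```

A call such as `parseCliOptions(process.argv.slice(2), { headless: true, keywords: defaultKeywords })` would produce the effective options. Here `defaultKeywords` stands in for whatever the `keywords/` CSV files provide. The shell strips the quotes in `--keyword="kw1,kw2"`, so the process receives `--keyword=kw1,kw2`.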

## 📊 Output Files

The parser writes two files to the `results/` directory:

1. **`linkedin-results-YYYY-MM-DD-HH-MM.json`** - main results with **integrated AI analysis**
2. **`linkedin-rejected-YYYY-MM-DD-HH-MM.json`** - rejected posts, each with the reason it was rejected
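
Both names use a zero-padded timestamp, which keeps the files in chronological order when sorted by name. The sketch below builds such a name. It assumes local time (the parser may use UTC instead), and `outputFilename` is a hypothetical helper.

```javascript
// Hypothetical sketch: build the timestamped names used for output files.
// Assumption: local time is used; the parser may use UTC instead.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// kind is "results" or "rejected", matching the two files above.
function outputFilename(kind, date = new Date()) {
  const stamp = [
    date.getFullYear(),
    pad2(date.getMonth() + 1), // getMonth() is zero-based
    pad2(date.getDate()),
    pad2(date.getHours()),
    pad2(date.getMinutes()),
  ].join("-");
  return `linkedin-${kind}-${stamp}.json`;
}

console.log(outputFilename("results", new Date(2025, 6, 20, 18, 0)));
// prints "linkedin-results-2025-07-20-18-00.json"
```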

### Results Structure

Each entry in the `results` array includes an `aiAnalysis` object, and `metadata` records how the analysis was run:

```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/user",
      "location": "Toronto, Ontario",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and hiring",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```
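
Since the analysis lives inside each post, downstream scripts can filter posts directly, with no separate files to join. The sketch below keeps the posts the model marked as relevant above a confidence threshold. The field names follow the structure above. `relevantPosts`, `loadResults`, and the 0.7 threshold are illustrative choices.

```javascript
// Sketch: read a results file and keep posts the model marked as relevant.
// Field names follow the documented structure; the 0.7 threshold is an
// arbitrary example value, not a project default.
const fs = require("node:fs");

function relevantPosts(data, minConfidence = 0.7) {
  return data.results.filter(
    (post) =>
      post.aiAnalysis && // may be absent if AI analysis was disabled
      post.aiAnalysis.isRelevant === true &&
      post.aiAnalysis.confidence >= minConfidence
  );
}

function loadResults(filePath) {
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

// Usage (path is illustrative):
// const data = loadResults("results/linkedin-results-2025-07-20-18-00.json");
// if (data.metadata.aiAnalysisCompleted) {
//   for (const post of relevantPosts(data)) {
//     console.log(post.profileLink, post.aiAnalysis.reasoning);
//   }
// }
```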

## 🧠 AI Analysis Workflow

### Automatic Integration

AI analysis runs automatically after parsing completes, and its output is **embedded directly into the results JSON** (unless disabled with `--no-ai`).
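
Embedding the analysis comes down to two steps: add an `aiAnalysis` object to each post, and record the run in `metadata`. The sketch below illustrates that merge. `embedAnalysis` is hypothetical and does not reflect the `ai-analyzer` API. It also assumes `analyses[i]` belongs to `results[i]`.

```javascript
// Hypothetical sketch: attach per-post analyses to the results and update
// metadata. Assumes analyses[i] corresponds to data.results[i].
function embedAnalysis(data, analyses, { context, model }) {
  const analyzedAt = new Date().toISOString();

  const results = data.results.map((post, i) => {
    const a = analyses[i];
    if (!a) return post; // posts without an analysis are left untouched
    return {
      ...post, // original post fields are preserved
      aiAnalysis: {
        isRelevant: a.isRelevant,
        confidence: a.confidence,
        reasoning: a.reasoning,
        context,
        model,
        analyzedAt,
      },
    };
  });

  return {
    ...data,
    metadata: {
      ...data.metadata,
      aiAnalysisEnabled: true,
      // "Completed" here means every post received an analysis.
      aiAnalysisCompleted: results.every((post) => post.aiAnalysis !== undefined),
      aiContext: context,
      aiModel: model,
    },
    results,
  };
}
```

The sketch returns new objects instead of mutating `data`. That way, the parsed results stay intact if the analysis step fails partway. The source does not say whether the parser does the same.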

### Manual Re-analysis

You can re-analyze existing results with different contexts using the CLI:

```bash
# Analyze the latest results with the default context
npm run analyze:latest

# Analyze the latest results for layoffs
npm run analyze:layoff

# Analyze the latest results for job market trends
npm run analyze:trends

# Analyze a specific file with a custom context
node ../ai-analyzer/cli.js --input=results.json --context="custom analysis"
```

### CLI Options

The AI analyzer CLI (`../ai-analyzer/cli.js`) supports the following options:

```bash
--input=FILE             # Input JSON file
--output=FILE            # Output file (default: input name with an -ai suffix, e.g. results-ai.json)
--context="description"  # Analysis context
--model=MODEL            # Ollama model (default: mistral)
--latest                 # Use the latest results file
--dir=PATH               # Directory to search for results
```
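
The `--latest`, `--dir`, and `--output` options imply two small pieces of logic: picking the newest results file, and deriving the output name. The sketch below rests on two assumptions:

- Results files follow the `linkedin-results-YYYY-MM-DD-HH-MM.json` pattern.
- The default output name appends `-ai` to the input name.

`findLatestResults` and `defaultOutputPath` are hypothetical helpers.

```javascript
// Hypothetical sketch of --latest / --dir and the default --output name.
// Assumptions: results files follow linkedin-results-YYYY-MM-DD-HH-MM.json,
// and the default output inserts "-ai" before ".json".
const fs = require("node:fs");
const path = require("node:path");

function findLatestResults(dir) {
  const files = fs
    .readdirSync(dir)
    .filter(
      (f) =>
        f.startsWith("linkedin-results-") &&
        f.endsWith(".json") &&
        !f.endsWith("-ai.json") // skip earlier re-analysis outputs
    )
    .sort(); // zero-padded timestamps sort chronologically as strings
  return files.length ? path.join(dir, files[files.length - 1]) : null;
}

function defaultOutputPath(inputPath) {
  const { dir, name } = path.parse(inputPath);
  return path.join(dir, `${name}-ai.json`);
}

// Usage:
// const input = findLatestResults("results");
// if (input) console.log(input, "->", defaultOutputPath(input));
```

Excluding `-ai.json` files keeps a previous re-analysis from being picked up as the newest input. Pointing the lookup at the wrong directory returns `null`. That corresponds to the "CLI can't find results" case under Troubleshooting.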

## 🎯 Use Cases

### Basic Usage

```bash
# Run parser with integrated AI analysis
npm start
```

### Testing Different Keywords

```bash
# Test with custom keywords
npm run start:custom
```

### Debugging

```bash
# Run with visible browser
npm run start:visible

# Run without location filtering
npm run start:no-location
```
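
If posts go missing while debugging, the location filter is a common cause: posts whose location does not match `LOCATION_FILTER` land in the rejected file. The sketch below shows a rough version of such a check. It assumes a case-insensitive substring match on the post's `location` field. `checkLocation` and its rejection message are hypothetical.

```javascript
// Hypothetical sketch of a location check. Assumption: a post passes if its
// location contains any filter entry, ignoring case.
function checkLocation(post, filters, enabled = true) {
  // Filtering disabled (e.g. --no-location) or no filter configured.
  if (!enabled || filters.length === 0) return { accepted: true };

  const location = (post.location || "").toLowerCase();
  const match = filters.find((f) => location.includes(f.toLowerCase()));

  return match
    ? { accepted: true, matched: match }
    : {
        accepted: false,
        reason: `Location "${post.location || "unknown"}" not in filter`,
      };
}
```

Substring matching is lenient: `Ontario` matches any location string that contains that word. Rerunning with `npm run start:no-location` and comparing result counts is a quick way to see how much the filter removes.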

### Re-analyzing Data

```bash
# After running the parser, re-analyze with different contexts
npm run analyze:layoff
npm run analyze:trends

# Analyze a specific file
node ../ai-analyzer/cli.js --input=results/linkedin-results-2025-07-20-18-00.json
```

## 🔍 Troubleshooting

### Common Issues

1. **Missing credentials**

   ```bash
   # Check that the .env file exists and contains your credentials
   cat .env
   ```

2. **Browser issues**

   ```bash
   # Install the Playwright browser
   npm run install:playwright
   ```

3. **AI not available**

   ```bash
   # Make sure Ollama is running
   ollama list

   # Install the mistral model if needed
   ollama pull mistral
   ```

4. **No results found**

   ```bash
   # Try different keywords
   npm run start:custom
   ```

5. **CLI can't find results**

   ```bash
   # Make sure you're in the linkedin-parser directory
   cd linkedin-parser
   npm run analyze:latest
   ```
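
For issue 3, it can help to check Ollama the way a Node process sees it, rather than through the `ollama` CLI. The sketch below queries Ollama's `GET /api/tags` endpoint on the default port 11434; that endpoint lists locally pulled models. `checkOllama` and `hasModel` are hypothetical helpers, not the `checkOllamaStatus` function from `ai-analyzer`. The built-in `fetch` requires Node 18 or later.

```javascript
// Sketch: check that a local Ollama server is up and has the model pulled.
// Uses Ollama's REST endpoint GET /api/tags on the default port 11434.
// Hypothetical helpers; not ai-analyzer's checkOllamaStatus().

// True if the /api/tags payload lists the model; "mistral:latest" matches "mistral".
function hasModel(tags, model) {
  return (tags.models || []).some(
    (m) => m.name === model || m.name.startsWith(`${model}:`)
  );
}

async function checkOllama(model = "mistral", baseUrl = "http://localhost:11434") {
  try {
    const res = await fetch(`${baseUrl}/api/tags`); // built-in fetch, Node 18+
    if (!res.ok) return { running: false, error: `HTTP ${res.status}` };
    const tags = await res.json();
    return { running: true, modelAvailable: hasModel(tags, model) };
  } catch (err) {
    // A connection error usually means no Ollama server is listening there.
    return { running: false, error: err.message };
  }
}

// Usage:
// checkOllama("mistral").then((status) => console.log(status));
```

Read the result as follows:

- `{ running: false }`: the server is not running, or it is listening on a different address.
- `{ running: true, modelAvailable: false }`: the server is up, but `ollama pull mistral` is still needed.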

## 📁 Project Structure

```
linkedin-parser/
├── index.js            # Main parser with integrated AI analysis
├── package.json        # Dependencies and scripts
├── .env                # Configuration (create this)
├── keywords/           # Keyword CSV files
└── results/            # Output files (created automatically)
    ├── linkedin-results-*.json    # Results with integrated AI analysis
    └── linkedin-rejected-*.json   # Rejected posts
```

## 🤝 Integration

This parser integrates with:

- **ai-analyzer**: core AI utilities and the CLI analysis tool
- **job-search-parser**: job market intelligence (separate module)

### AI Analysis Package

The `ai-analyzer` package provides:

- **Library functions**: `analyzeBatch`, `checkOllamaStatus`, and others
- **CLI tool**: `cli.js` for standalone analysis
- **Reusable components**: shared with other parsers in the ecosystem

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: a normal run no longer produces separate AI analysis files to manage
- **Rich Context**: each post carries the AI's relevance verdict, confidence score, and reasoning
- **Flexible Re-analysis**: existing results can be re-analyzed with different contexts
- **Backward Compatible**: the original post fields are preserved alongside the new `aiAnalysis` data