# LinkedIn Parser
LinkedIn posts parser with **integrated AI analysis** using the ai-analyzer core package. AI analysis is now embedded directly into the results JSON file.
## 🚀 Quick Start
```bash
# Install dependencies
npm install
# Run with default settings (AI analysis integrated into results)
npm start
# Run without AI analysis
npm run start:no-ai
```
## 📋 Available Scripts
### Parser Modes
```bash
# Basic parsing with integrated AI analysis
npm start
# Parsing without AI analysis
npm run start:no-ai
# Headless browser mode
npm run start:headless
# Visible browser mode (for debugging)
npm run start:visible
# Disable location filtering
npm run start:no-location
# Custom keywords
npm run start:custom
```
### Testing
```bash
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
```
### AI Analysis (CLI)
```bash
# Analyze latest results file with default context
npm run analyze:latest
# Analyze latest results file for layoffs
npm run analyze:layoff
# Analyze latest results file for job market trends
npm run analyze:trends
# Analyze specific file (requires --input parameter)
npm run analyze -- --input=results.json
```
### Utilities
```bash
# Show help
npm run help
# Run demo
npm run demo
# Install Playwright browser
npm run install:playwright
```
## 🔧 Configuration
### Environment Variables
Create a `.env` file in the `linkedin-parser` directory:
```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5
# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral
# Browser Configuration
HEADLESS=true
```
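
As a rough illustration of how these variables might be interpreted, the sketch below turns an environment object into a typed config. The `loadConfig` helper, its property names, and its fallback values are assumptions made for this example, not the parser's actual code.

```javascript
// Hypothetical helper: not part of the parser, shown only to illustrate
// how the .env values above could be read. Fallback values are assumptions.
function loadConfig(env = process.env) {
  // Treat anything other than the literal string "false" as enabled.
  const flag = (value, fallback) =>
    value === undefined ? fallback : value.trim().toLowerCase() !== "false";

  return {
    city: env.CITY ?? "",
    datePosted: env.DATE_POSTED ?? "past-week",
    sortBy: env.SORT_BY ?? "date_posted",
    wheels: Number.parseInt(env.WHEELS ?? "5", 10),
    // "Ontario,Manitoba" -> ["Ontario", "Manitoba"]
    locationFilter: (env.LOCATION_FILTER ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter(Boolean),
    enableLocationCheck: flag(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: flag(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT ?? "job market analysis and trends",
    ollamaModel: env.OLLAMA_MODEL ?? "mistral",
    headless: flag(env.HEADLESS, true),
  };
}
```

Passing the environment in as an argument keeps the function easy to test without touching `process.env`.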
### Command Line Options
```bash
# Parser options
--headless=true|false # Browser headless mode
--keyword="kw1,kw2" # Specific keywords
--add-keyword="kw" # Additional keywords
--no-location # Disable location filtering
--no-ai # Disable AI analysis
```
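
The flags above follow a simple `--name=value` or bare `--name` shape. A minimal sketch of parsing them is shown here; `parseArgs` and the shape of the returned object are hypothetical, and the real `index.js` may handle arguments differently.

```javascript
// Hypothetical parser for the flags listed above; the real index.js may differ.
function parseArgs(argv) {
  const options = { headless: true, keywords: [], extraKeywords: [], location: true, ai: true };
  // "kw1, kw2" -> ["kw1", "kw2"]
  const list = (value) => value.split(",").map((s) => s.trim()).filter(Boolean);

  for (const arg of argv) {
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? "" : arg.slice(eq + 1);

    if (name === "--headless") options.headless = value !== "false";
    else if (name === "--keyword") options.keywords = list(value);
    else if (name === "--add-keyword") options.extraKeywords.push(...list(value));
    else if (name === "--no-location") options.location = false;
    else if (name === "--no-ai") options.ai = false;
  }
  return options;
}
```

In a script this would typically be called as `parseArgs(process.argv.slice(2))`. Unknown flags are silently ignored in this sketch.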
## 📊 Output Files
The parser generates two main files:
1. **`linkedin-results-YYYY-MM-DD-HH-MM.json`** - Main results with **integrated AI analysis**
2. **`linkedin-rejected-YYYY-MM-DD-HH-MM.json`** - Rejected posts with reasons
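
The `YYYY-MM-DD-HH-MM` stamp in these names can be built with plain `Date` getters. The helper below is a hypothetical sketch that assumes local time; the parser itself may use UTC or a different routine.

```javascript
// Hypothetical helper that builds a name in the documented pattern.
// Assumes local time is used; the parser may use UTC instead.
function resultsFileName(prefix, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    date.getFullYear(),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
  ].join("-");
  return `${prefix}-${stamp}.json`;
}
```

For example, `resultsFileName("linkedin-results", new Date(2025, 6, 20, 18, 0))` yields `linkedin-results-2025-07-20-18-00.json`.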
### Results Structure
Each entry in `results` includes an `aiAnalysis` object, and `metadata` records the context and model used:
```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/user",
      "location": "Toronto, Ontario",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and hiring",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```
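
Because the analysis lives next to each post, downstream scripts can filter on it directly. The sketch below keeps only posts marked relevant above a confidence threshold; `relevantPosts` and the `0.8` default are illustrative choices, not part of the parser.

```javascript
// Keeps posts the model marked relevant at or above a confidence threshold.
// The 0.8 default is an arbitrary choice for this example.
function relevantPosts(data, minConfidence = 0.8) {
  return (data.results ?? []).filter((post) => {
    const analysis = post.aiAnalysis;
    // Posts without aiAnalysis (e.g. from a --no-ai run) are skipped.
    return Boolean(analysis) &&
      analysis.isRelevant === true &&
      analysis.confidence >= minConfidence;
  });
}
```

A results file can be loaded with `JSON.parse(fs.readFileSync(file, "utf8"))` and passed straight in.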
## 🧠 AI Analysis Workflow
### Automatic Integration
AI analysis runs automatically after parsing completes and is **embedded directly into the results JSON** (unless disabled with `--no-ai`).
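
One way to picture the embedding step is as a merge of per-post analyses into the parsed data. The sketch below assumes the analyzer returns one analysis object per post, in the same order as `results`; `mergeAnalysis` is a hypothetical name, and the actual integration in `index.js` may work differently.

```javascript
// Sketch only: assumes `analyses[i]` belongs to `data.results[i]`.
// Returns a new object and leaves the input untouched.
function mergeAnalysis(data, analyses, { context, model }) {
  if (analyses.length !== data.results.length) {
    throw new Error("Expected one analysis per post");
  }
  const analyzedAt = new Date().toISOString();
  return {
    metadata: {
      ...data.metadata,
      aiAnalysisEnabled: true,
      aiAnalysisCompleted: true,
      aiContext: context,
      aiModel: model,
    },
    results: data.results.map((post, i) => ({
      ...post,
      aiAnalysis: { ...analyses[i], context, model, analyzedAt },
    })),
  };
}
```

Returning a copy rather than mutating the input means the original post fields stay intact, so a consumer that ignores `aiAnalysis` sees the same data as before.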
### Manual Re-analysis
You can re-analyze existing results with different contexts using the CLI:
```bash
# Analyze latest results with default context
npm run analyze:latest
# Analyze latest results for layoffs
npm run analyze:layoff
# Analyze latest results for job market trends
npm run analyze:trends
# Analyze specific file with custom context
node ../ai-analyzer/cli.js --input=results.json --context="custom analysis"
```
### CLI Options
The AI analyzer CLI supports:
```bash
--input=FILE # Input JSON file
--output=FILE # Output file (default: original-ai.json)
--context="description" # Analysis context
--model=MODEL # Ollama model (default: mistral)
--latest # Use latest results file
--dir=PATH # Directory to look for results
```
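
The `--latest` and `--dir` options imply a "newest results file" lookup. A minimal sketch of one way to do that is shown below; `pickLatest` and `latestResultsFile` are hypothetical names, and the CLI's own selection logic is not documented here.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Picks the newest results file by name. The YYYY-MM-DD-HH-MM stamp sorts
// lexicographically in chronological order, so no date parsing is needed.
function pickLatest(fileNames) {
  const matches = fileNames
    .filter((name) => /^linkedin-results-\d{4}(-\d{2}){4}\.json$/.test(name))
    .sort();
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

// Hypothetical wrapper: resolves the newest results file inside `dir`.
function latestResultsFile(dir = "results") {
  const latest = pickLatest(fs.readdirSync(dir));
  return latest ? path.join(dir, latest) : null;
}
```

Sorting by name avoids relying on file modification times, which can change when files are copied or restored.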
## 🎯 Use Cases
### Basic Usage
```bash
# Run parser with integrated AI analysis
npm start
```
### Testing Different Keywords
```bash
# Test with custom keywords
npm run start:custom
```
### Debugging
```bash
# Run with visible browser
npm run start:visible
# Run without location filtering
npm run start:no-location
```
### Re-analyzing Data
```bash
# After running parser, re-analyze with different contexts
npm run analyze:layoff
npm run analyze:trends
# Analyze specific file
node ../ai-analyzer/cli.js --input=results/linkedin-results-2025-07-20-18-00.json
```
## 🔍 Troubleshooting
### Common Issues
1. **Missing credentials**
```bash
# Check .env file exists and has credentials
cat .env
```
2. **Browser issues**
```bash
# Install Playwright browser
npm run install:playwright
```
3. **AI not available**
```bash
# Make sure Ollama is running
ollama list
# Install mistral model if needed
ollama pull mistral
```
4. **No results found**
```bash
# Try different keywords
npm run start:custom
```
5. **CLI can't find results**
```bash
# Make sure you're in the linkedin-parser directory
cd linkedin-parser
npm run analyze:latest
```
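
The "AI not available" check can also be done from code. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally installed models. It assumes Ollama is listening on its default address `http://localhost:11434` and that Node 18+ is available for the global `fetch`; `hasModel` and `ollamaReady` are hypothetical helpers, separate from the `checkOllamaStatus` function in `ai-analyzer`.

```javascript
// True if an Ollama /api/tags response lists `model` (with or without a tag,
// e.g. "mistral" matches "mistral:latest").
function hasModel(tagsResponse, model) {
  return (tagsResponse.models ?? []).some(
    (m) => m.name === model || m.name.startsWith(`${model}:`)
  );
}

// Assumes Ollama's default address and Node 18+ (global fetch).
async function ollamaReady(model = "mistral", baseUrl = "http://localhost:11434") {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    return res.ok && hasModel(await res.json(), model);
  } catch {
    // Any network or parse error, for example when Ollama is not running.
    return false;
  }
}
```

If `ollamaReady()` resolves to `false`, running `ollama list` and `ollama pull mistral` as shown above is the next step.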
## 📁 Project Structure
```
linkedin-parser/
├── index.js # Main parser with integrated AI analysis
├── package.json # Dependencies and scripts
├── .env # Configuration (create this)
├── keywords/ # Keyword CSV files
└── results/ # Output files (created automatically)
    ├── linkedin-results-*.json # Results with integrated AI analysis
    └── linkedin-rejected-*.json # Rejected posts
```
## 🤝 Integration
This parser integrates with:
- **ai-analyzer**: Core AI utilities and CLI analysis tool
- **job-search-parser**: Job market intelligence (separate module)
### AI Analysis Package
The `ai-analyzer` package provides:
- **Library functions**: `analyzeBatch`, `checkOllamaStatus`, etc.
- **CLI tool**: `cli.js` for standalone analysis
- **Reusable components**: For other parsers in the ecosystem
## 🆕 What's New
- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **Rich Context**: Each post includes the model's relevance verdict, confidence score, and reasoning
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved