# Job Market Intelligence Platform

A comprehensive platform for job market intelligence with **integrated AI-powered insights**. It is built with a modular architecture for extensibility and maintainability.

## 🏗️ Architecture Overview

```
job-market-intelligence/
├── ai-analyzer/          # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/      # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/    # Job search intelligence
└── docs/                 # Documentation
```

## 🚀 Quick Start

### Prerequisites

- Node.js 18+
- Playwright (browser automation)
- LinkedIn account credentials
- Optional: Ollama for local AI analysis

### Installation

```bash
npm install
npx playwright install chromium
```

### Basic Usage

```bash
# Run each command from the repository root

# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start

# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom

# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai

# Run job search parser
cd job-search-parser && npm start

# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest

# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff

# Run demo workflow
node demo.js
```

## 📦 Core Components

### 1. AI Analyzer (`ai-analyzer/`)

**Shared utilities and CLI tool used by all parsers**

- **Logger**: Consistent logging across all components
- **Text Processing**: Keyword matching, text cleaning
- **Location Validation**: Geographic filtering and validation
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers

**Key Features:**

- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- **Integrated AI analysis**: AI results embedded in data structure
- **CLI tool**: Standalone analysis with flexible options
- Comprehensive test coverage
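
The configurable log levels mentioned in the list above can be pictured with a small threshold logger. This is a minimal sketch, not the project's actual logger: `createLogger`, the level names, and the ANSI color choices are all assumptions made for illustration.

```javascript
// Hypothetical level-threshold logger; names and colors are illustrative only.
const LEVELS = { debug: 0, info: 1, warn: 2, error: 3 };
const COLORS = { debug: '\x1b[90m', info: '\x1b[36m', warn: '\x1b[33m', error: '\x1b[31m' };
const RESET = '\x1b[0m';

function createLogger(minLevel = 'info', write = console.log) {
  const threshold = LEVELS[minLevel];
  const logger = {};
  for (const level of Object.keys(LEVELS)) {
    logger[level] = (message) => {
      // Drop messages below the configured threshold.
      if (LEVELS[level] < threshold) return false;
      write(`${COLORS[level]}[${level.toUpperCase()}]${RESET} ${message}`);
      return true;
    };
  }
  return logger;
}

const log = createLogger('warn');
log.info('skipped: below the warn threshold');
log.error('printed with the error color');
```

Passing the `write` function as a parameter keeps the sketch easy to test, since output can be captured in an array instead of going to the console.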

### 2. LinkedIn Parser (`linkedin-parser/`)

**Specialized LinkedIn content scraper with integrated AI analysis**

- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- **Automatic AI analysis integrated into results**
- Configurable search parameters

**Key Features:**

- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- **Integrated AI-powered content relevance analysis**
- **Single results JSON with embedded AI insights** (no separate AI analysis file)
- **Two output files: results (with AI analysis) and rejected posts**

### 3. Job Search Parser (`job-search-parser/`)

**Job market intelligence and analysis**

- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights

**Key Features:**

- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence

### 4. AI Analysis CLI (`ai-analyzer/cli.js`)

**Command-line tool for AI analysis of any results JSON file**

- Analyze any results JSON file from LinkedIn parser or other sources
- **Integrated analysis**: AI results embedded back into original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support

**Key Features:**

- Works with any JSON results file
- **Integrated output**: AI analysis embedded in original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights
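
To make the summary statistics concrete, here is a rough sketch of how relevance and confidence figures could be derived from a results array. `summarizeAnalysis` is a hypothetical helper, not the CLI's actual code; it only assumes the `aiAnalysis.isRelevant` and `aiAnalysis.confidence` fields that appear in this README's sample output.

```javascript
// Hypothetical helper: summarize aiAnalysis fields across a results array.
function summarizeAnalysis(results) {
  // Posts without an aiAnalysis object are counted in `total` only.
  const analyzed = results.filter((post) => post.aiAnalysis);
  const relevant = analyzed.filter((post) => post.aiAnalysis.isRelevant);
  const totalConfidence = analyzed.reduce(
    (sum, post) => sum + post.aiAnalysis.confidence,
    0
  );
  return {
    total: results.length,
    analyzed: analyzed.length,
    relevant: relevant.length,
    averageConfidence: analyzed.length ? totalConfidence / analyzed.length : 0,
  };
}

const summary = summarizeAnalysis([
  { text: 'a', aiAnalysis: { isRelevant: true, confidence: 0.9 } },
  { text: 'b', aiAnalysis: { isRelevant: false, confidence: 0.5 } },
  { text: 'c' },
]);
console.log(summary);
```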

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the root directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Keywords
KEYWORDS=keywords-layoff.csv
```
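
Environment values always arrive as strings, so booleans and comma-separated lists need converting before use. The sketch below shows one way to do that; `loadConfig` and the fallback defaults are assumptions for illustration and may differ from what the parsers actually do.

```javascript
// Hypothetical config loader; the fallback defaults are illustrative assumptions.
function loadConfig(env = process.env) {
  const toBool = (value, fallback) =>
    value === undefined ? fallback : String(value).toLowerCase() === 'true';

  return {
    city: env.CITY || '',
    datePosted: env.DATE_POSTED || 'past-week',
    // "Ontario,Manitoba" becomes ["Ontario", "Manitoba"]
    locationFilter: (env.LOCATION_FILTER || '')
      .split(',')
      .map((entry) => entry.trim())
      .filter(Boolean),
    enableLocationCheck: toBool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: toBool(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT || 'job market analysis and trends',
    ollamaModel: env.OLLAMA_MODEL || 'mistral',
  };
}

const config = loadConfig({
  LOCATION_FILTER: 'Ontario, Manitoba',
  ENABLE_AI_ANALYSIS: 'false',
});
console.log(config.locationFilter, config.enableAiAnalysis);
```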

### Command Line Options

```bash
# LinkedIn Parser Options
--headless=true|false      # Browser headless mode
--keyword="kw1,kw2"        # Specific keywords
--add-keyword="kw1,kw2"    # Additional keywords
--no-location              # Disable location filtering
--no-ai                    # Disable AI analysis

# Job Search Parser Options
--help                     # Show parser-specific help

# AI Analysis CLI Options
--input=FILE               # Input JSON file
--output=FILE              # Output file
--context="description"    # Custom AI analysis context
--model=MODEL              # Ollama model
--latest                   # Use latest results file
--dir=PATH                 # Directory to look for results
```
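
The options above mix `--name=value` pairs with bare switches such as `--no-ai`. As a rough illustration of how that shape can be parsed, here is a small sketch; `parseFlags` is a hypothetical function and not necessarily how the parsers read their arguments.

```javascript
// Hypothetical flag parser for "--name=value" pairs and bare "--switch" flags.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue; // ignore positional arguments
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq === -1) {
      flags[body] = true; // bare switch such as --no-ai or --latest
    } else {
      flags[body.slice(0, eq)] = body.slice(eq + 1); // value stays a string
    }
  }
  return flags;
}

const parsed = parseFlags(['--headless=false', '--keyword=layoff,downsizing', '--no-ai']);
console.log(parsed);
```

In a real script the input would be `process.argv.slice(2)`. Values such as `headless` remain strings here, so a caller would still need to compare against `'true'` or `'false'`.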

## 📊 Output Formats

### LinkedIn Parser Output

The LinkedIn parser now generates **two main files** with **integrated AI analysis**:

#### 1. Main Results with AI Analysis (`linkedin-results-YYYY-MM-DD-HH-MM.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
```
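
Because the AI verdict sits inside each result, downstream code can filter on it directly. The sketch below assumes only the `results[].aiAnalysis` fields from the sample above; `selectRelevant` and the `0.7` threshold are hypothetical choices for illustration.

```javascript
// Hypothetical consumer: keep posts the model marked relevant with enough confidence.
function selectRelevant(data, minConfidence = 0.7) {
  return (data.results || []).filter((post) => {
    const ai = post.aiAnalysis;
    return Boolean(ai) && ai.isRelevant === true && ai.confidence >= minConfidence;
  });
}

const sampleData = {
  results: [
    { keyword: 'layoff', aiAnalysis: { isRelevant: true, confidence: 0.9 } },
    { keyword: 'layoff', aiAnalysis: { isRelevant: true, confidence: 0.4 } },
    { keyword: 'hiring', aiAnalysis: { isRelevant: false, confidence: 0.95 } },
    { keyword: 'hiring' }, // parsed with AI analysis disabled
  ],
};
console.log(selectRelevant(sampleData).length);
```

In practice `sampleData` would come from `JSON.parse` on a results file.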

#### 2. Rejected Posts (`linkedin-rejected-YYYY-MM-DD-HH-MM.json`)

```json
[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
```
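
The rejected-posts file is useful for checking whether a filter is too strict. As a small sketch, the entries can be tallied by reason; `countRejectionReasons` is hypothetical, and treating the text before the first colon as the category is an assumption based on the single sample reason shown above.

```javascript
// Hypothetical helper: tally rejected posts by the category prefix of `reason`.
function countRejectionReasons(rejected) {
  const counts = {};
  for (const post of rejected) {
    // Assumption: the text before the first ":" names the rejection category.
    const category = String(post.reason || 'unknown').split(':')[0].trim();
    counts[category] = (counts[category] || 0) + 1;
  }
  return counts;
}

const rejectedSample = [
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true },
];
console.log(countRejectionReasons(rejectedSample));
```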

### AI Analysis CLI Output

The CLI tool creates **integrated results** with AI analysis embedded:

#### Re-analyzed Results (`original-filename-ai.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
```
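
The re-analysis step amounts to copying the original structure and writing a fresh `aiAnalysis` object into each result. The sketch below illustrates that idea under stated assumptions: `embedAnalysis` is a hypothetical function, and `stubAnalyze` is a keyword-based placeholder standing in for the real Ollama call so the example runs offline.

```javascript
// Placeholder for the real model call; a keyword check keeps the sketch offline.
function stubAnalyze(text, context) {
  const isRelevant = /layoff|workforce reduction/i.test(text);
  return {
    isRelevant,
    confidence: isRelevant ? 0.9 : 0.2,
    reasoning: isRelevant ? `Mentions terms related to ${context}` : 'No related terms found',
  };
}

// Hypothetical re-analysis: returns a new object and leaves `data` unmodified.
function embedAnalysis(data, analyze, { context, model }) {
  const now = new Date().toISOString();
  return {
    ...data,
    metadata: { ...data.metadata, aiAnalysisUpdated: now, aiContext: context, aiModel: model },
    // Each post keeps its original fields; only aiAnalysis is replaced.
    results: data.results.map((post) => ({
      ...post,
      aiAnalysis: { ...analyze(post.text, context), context, model, analyzedAt: now },
    })),
  };
}

const original = {
  metadata: { timestamp: '2024-01-15T10:30:00Z', totalPosts: 1 },
  results: [{ keyword: 'layoff', text: 'Our team was hit by a layoff this week.' }],
};
const reanalyzed = embedAnalysis(original, stubAnalyze, {
  context: 'layoff analysis',
  model: 'mistral',
});
console.log(JSON.stringify(reanalyzed, null, 2));
```

Returning a copy rather than mutating the input means the same source file can be re-analyzed with several contexts without one run overwriting another in memory.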

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Run Specific Test Suites

```bash
# AI Analyzer tests
cd ai-analyzer && npm test

# LinkedIn Parser tests
cd linkedin-parser && npm test

# Job Search Parser tests
cd job-search-parser && npm test
```

## 🔒 Security & Legal

### Security Best Practices

- Store credentials in a `.env` file and never commit it
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service

### Legal Compliance

- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies

## 🚀 Advanced Features

### AI-Powered Analysis

- **Local AI**: Ollama integration for privacy
- **Integrated Analysis**: AI results embedded in data structure
- **Automatic Analysis**: Runs after parsing completes
- **Context Analysis**: Relevance scoring against a configurable analysis context
- **Confidence Scoring**: AI confidence levels for each post
- **CLI Tool**: Standalone analysis with flexible options

### Geographic Intelligence

- **Location Validation**: Profile location verification
- **Regional Filtering**: City/state/country filtering
- **Geographic Analysis**: Location-based insights
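
One simple way to picture location validation is to split a profile location on commas and look for any part that equals a filter entry. The sketch below does exactly that and returns the three location fields seen in the sample output; `validateLocation` and its exact-match rule are assumptions, and the real validator may be more lenient.

```javascript
// Hypothetical validator: exact, case-insensitive match on comma-separated parts.
function validateLocation(location, filters) {
  const parts = String(location || '')
    .toLowerCase()
    .split(',')
    .map((part) => part.trim());
  const matched = filters.find((filter) => parts.includes(filter.toLowerCase().trim()));
  return matched
    ? { locationValid: true, locationMatchedFilter: matched, locationReasoning: 'Location matches filter' }
    : { locationValid: false, locationMatchedFilter: null, locationReasoning: 'Location not in filter' };
}

const filters = ['Ontario', 'Manitoba'];
console.log(validateLocation('Toronto, Ontario, Canada', filters));
console.log(validateLocation('Vancouver, BC, Canada', filters));
```

Exact matching on parts avoids false positives from substrings, at the cost of missing abbreviations such as "ON" unless they are listed in the filter.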

### Data Processing

- **Duplicate Detection**: Intelligent deduplication
- **Content Cleaning**: Remove hashtags, URLs, emojis
- **Metadata Extraction**: Author, engagement, timing data
- **Integrated AI**: AI insights embedded in each result
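
Content cleaning and deduplication work well together: once hashtags, URLs, and emojis are stripped, posts that differ only in decoration collapse to the same text. The following is a minimal sketch of that idea; `cleanText` and `dedupePosts` are hypothetical names, and comparing normalized text is a simpler rule than whatever "intelligent" deduplication the project implements.

```javascript
// Hypothetical cleaner: strips URLs, hashtags, and emoji, then collapses whitespace.
function cleanText(text) {
  return String(text)
    .replace(/https?:\/\/\S+/g, ' ')
    .replace(/#[\p{L}\p{N}_]+/gu, ' ')
    // Pictographic characters plus the variation selector and zero-width joiner
    .replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical dedupe: the first post with a given cleaned, lowercased text wins.
function dedupePosts(posts) {
  const seen = new Set();
  return posts.filter((post) => {
    const key = cleanText(post.text).toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const posts = [
  { text: 'Big #layoff news 🚀 https://example.com/x today' },
  { text: 'big news today' },
  { text: 'Something else entirely' },
];
console.log(dedupePosts(posts).map((post) => cleanText(post.text)));
```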

## 📈 Performance Optimization

### Recommended Settings

- **Headless Mode**: Faster execution
- **Location Filtering**: Reduces false positives
- **AI Analysis**: Improves result quality (enabled by default)
- **Batch Processing**: Efficient data handling

### Monitoring

- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request

### Code Standards

- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.

## 🆘 Support

### Common Issues

- **Browser Issues**: Ensure Playwright is installed
- **Login Problems**: Check credentials in `.env`
- **Rate Limiting**: Implement delays between requests
- **Location Filtering**: Verify location filter format
- **AI Analysis**: Ensure Ollama is running for AI features
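
For the rate-limiting item, a pause between requests can be added with a short helper. This is only a sketch: `jitteredDelayMs`, `sleep`, and `forEachWithDelay` are hypothetical helpers, and the 2 to 5 second default range is an arbitrary assumption rather than a value taken from the project.

```javascript
// Hypothetical helper: pick a delay in [minMs, maxMs) so requests are spaced unevenly.
function jitteredDelayMs(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs the handler for each item in order, pausing after every call.
async function forEachWithDelay(items, handler, minMs = 2000, maxMs = 5000) {
  for (const item of items) {
    await handler(item);
    await sleep(jitteredDelayMs(minMs, maxMs));
  }
}

// Short delays here so the example finishes quickly.
forEachWithDelay(['layoff', 'hiring'], async (keyword) => console.log('searching', keyword), 10, 20);
```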

### Getting Help

- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **CLI Tool**: Standalone AI analysis with flexible options
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved

---

**Note**: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.