Job Market Intelligence Platform
A platform for job market intelligence with integrated AI-powered insights. It is built with a modular architecture so that components can be extended and maintained independently.
🏗️ Architecture Overview
job-market-intelligence/
├── ai-analyzer/ # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/ # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/ # Job search intelligence
└── docs/ # Documentation
🚀 Quick Start
Prerequisites
- Node.js 18+
- Playwright browser automation
- LinkedIn account credentials
- Optional: Ollama for local AI analysis
Installation
npm install
npx playwright install chromium
Basic Usage
# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start
# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom
# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai
# Run job search parser
cd job-search-parser && npm start
# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest
# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff
# Run demo workflow
node demo.js
📦 Core Components
1. AI Analyzer (ai-analyzer/)
Shared utilities and CLI tool used by all parsers
- Logger: Consistent logging across all components
- Text Processing: Keyword matching, text cleaning
- Location Validation: Geographic filtering and validation
- AI Integration: Local Ollama support with integrated analysis
- CLI Tool: Command-line interface for standalone AI analysis
- Test Utilities: Shared testing helpers
Key Features:
- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- Integrated AI analysis: AI results embedded in data structure
- CLI tool: Standalone analysis with flexible options
- Comprehensive test coverage
2. LinkedIn Parser (linkedin-parser/)
Specialized LinkedIn content scraper with integrated AI analysis
- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- Automatic AI analysis integrated into results
- Configurable search parameters
Key Features:
- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- Integrated AI-powered content relevance analysis
- AI insights embedded in the main results JSON (no separate analysis file)
- Two output files per run: results (with AI analysis) and rejected posts
3. Job Search Parser (job-search-parser/)
Job market intelligence and analysis
- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights
Key Features:
- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence
4. AI Analysis CLI (ai-analyzer/cli.js)
Command-line tool for AI analysis of any results JSON file
- Analyze any results JSON file from LinkedIn parser or other sources
- Integrated analysis: AI results embedded back into original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support
Key Features:
- Works with any JSON results file
- Integrated output: AI analysis embedded in original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights
🔧 Configuration
Environment Variables
Create a .env file in the root directory:
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5
# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral
# Keywords
KEYWORDS=keywords-layoff.csv
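As a rough illustration of how these variables might be consumed, the sketch below converts raw environment strings into a typed config object. The `parseConfig` helper, its property names, and its fallback values are assumptions for illustration, not the project's actual loader; it also assumes the `.env` file has already been loaded into `process.env` by whatever mechanism the project uses.

```javascript
// Hypothetical helper: turns raw environment strings into a typed config object.
// Variable names come from the .env example above; the fallback values and
// returned property names are assumptions, not documented project defaults.
function parseConfig(env) {
  const toBool = (value, fallback) =>
    value === undefined ? fallback : String(value).trim().toLowerCase() === 'true';
  const toList = (value) =>
    (value || '')
      .split(',')
      .map((item) => item.trim())
      .filter(Boolean);
  const wheels = Number.parseInt(env.WHEELS, 10);

  return {
    city: env.CITY || '',
    datePosted: env.DATE_POSTED || 'past-week',
    sortBy: env.SORT_BY || 'date_posted',
    wheels: Number.isNaN(wheels) ? 5 : wheels,
    locationFilter: toList(env.LOCATION_FILTER), // "Ontario,Manitoba" becomes two entries
    enableLocationCheck: toBool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: toBool(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT || 'job market analysis and trends',
    ollamaModel: env.OLLAMA_MODEL || 'mistral',
    keywordsFile: env.KEYWORDS || '',
  };
}

const config = parseConfig(process.env);
console.log(config.locationFilter);
```

Parsing once at startup keeps string-to-boolean and string-to-list conversions in one place, so the parsers can work with typed values.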
Command Line Options
# LinkedIn Parser Options
--headless=true|false # Browser headless mode
--keyword="kw1,kw2" # Specific keywords
--add-keyword="kw1,kw2" # Additional keywords
--no-location # Disable location filtering
--no-ai # Disable AI analysis
# Job Search Parser Options
--help # Show parser-specific help
# AI Analysis CLI Options
--input=FILE # Input JSON file
--output=FILE # Output file
--context="description" # Custom AI analysis context
--model=MODEL # Ollama model
--latest # Use latest results file
--dir=PATH # Directory to look for results
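The options above come in two shapes: `--name=value` pairs and bare switches such as `--latest` or `--no-ai`. A minimal sketch of parsing both is shown below. The `parseArgs` function is hypothetical and is not taken from the project's CLI code.

```javascript
// Hypothetical parser for the two flag styles listed above:
// "--name=value" pairs and bare switches such as "--latest" or "--no-ai".
function parseArgs(argv) {
  const options = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue; // ignore positional arguments
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq === -1) {
      options[body] = true; // bare switch
    } else {
      options[body.slice(0, eq)] = body.slice(eq + 1); // keep everything after the first "="
    }
  }
  return options;
}

const cliOptions = parseArgs(process.argv.slice(2));
// --keyword="kw1,kw2" arrives as one string; split it into a list.
const keywords = typeof cliOptions.keyword === 'string'
  ? cliOptions.keyword.split(',').map((k) => k.trim()).filter(Boolean)
  : [];
console.log({ cliOptions, keywords });
```

Values stay as strings, so a flag like `--headless=false` still needs an explicit comparison against `'false'` before use.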
📊 Output Formats
LinkedIn Parser Output
The LinkedIn parser now generates two main files with integrated AI analysis:
1. Main Results with AI Analysis (linkedin-results-YYYY-MM-DD-HH-MM.json)
{
"metadata": {
"timestamp": "2024-01-15T10:30:00Z",
"totalPosts": 45,
"rejectedPosts": 12,
"aiAnalysisEnabled": true,
"aiAnalysisCompleted": true,
"aiContext": "job market analysis and trends",
"aiModel": "mistral",
"locationFilter": "Ontario,Manitoba"
},
"results": [
{
"keyword": "layoff",
"text": "Cleaned post content...",
"profileLink": "https://linkedin.com/in/johndoe",
"location": "Toronto, Ontario, Canada",
"locationValid": true,
"locationMatchedFilter": "Ontario",
"locationReasoning": "Location matches filter",
"timestamp": "2024-01-15T10:30:00Z",
"source": "linkedin",
"parser": "linkedout-parser",
"aiAnalysis": {
"isRelevant": true,
"confidence": 0.9,
"reasoning": "Post discusses job market conditions and layoffs",
"context": "job market analysis and trends",
"model": "mistral",
"analyzedAt": "2024-01-15T10:30:00Z"
}
}
]
}
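Because the AI verdict sits inside each result, downstream filtering needs only the one file. The sketch below selects posts marked relevant above a confidence threshold; `selectRelevant` and the 0.7 threshold are illustrative assumptions, not part of the project.

```javascript
// Sketch: keep posts the AI marked relevant with at least `minConfidence`.
// The 0.7 default is an arbitrary illustration, not a project setting.
function selectRelevant(output, minConfidence = 0.7) {
  return (output.results || []).filter((post) => {
    const analysis = post.aiAnalysis;
    return Boolean(analysis)
      && analysis.isRelevant === true
      && analysis.confidence >= minConfidence;
  });
}

// Trimmed-down sample in the shape shown above.
const sampleOutput = {
  metadata: { totalPosts: 3, aiAnalysisEnabled: true },
  results: [
    { keyword: 'layoff', text: 'A', aiAnalysis: { isRelevant: true, confidence: 0.9 } },
    { keyword: 'layoff', text: 'B', aiAnalysis: { isRelevant: false, confidence: 0.8 } },
    { keyword: 'layoff', text: 'C' }, // no aiAnalysis, e.g. a run with AI disabled
  ],
};

console.log(selectRelevant(sampleOutput).map((post) => post.text));
```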
2. Rejected Posts (linkedin-rejected-YYYY-MM-DD-HH-MM.json)
[
{
"rejected": true,
"reason": "Location filter failed: Location not in filter",
"keyword": "layoff",
"text": "Post content...",
"profileLink": "https://linkedin.com/in/janedoe",
"location": "Vancouver, BC, Canada",
"timestamp": "2024-01-15T10:30:00Z"
}
]
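The rejected file is useful for checking whether filters are too strict. As a sketch, the snippet below tallies rejections by the text before the first colon in `reason`. It assumes every reason follows the `category: detail` shape of the example above; the second sample reason is hypothetical.

```javascript
// Sketch: tally rejected posts by the category before the first colon in `reason`.
function countRejections(rejected) {
  const counts = new Map();
  for (const post of rejected) {
    const reason = typeof post.reason === 'string' ? post.reason : 'unknown';
    const category = reason.split(':')[0].trim() || 'unknown';
    counts.set(category, (counts.get(category) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

const sampleRejected = [
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true, reason: 'Duplicate post: already seen' }, // hypothetical reason text
  { rejected: true }, // missing reason is counted as "unknown"
];

console.log(countRejections(sampleRejected));
```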
AI Analysis CLI Output
The CLI tool creates integrated results with AI analysis embedded:
Re-analyzed Results (original-filename-ai.json)
{
"metadata": {
"timestamp": "2024-01-15T10:30:00Z",
"totalPosts": 45,
"aiAnalysisUpdated": "2024-01-15T11:00:00Z",
"aiContext": "layoff analysis",
"aiModel": "mistral"
},
"results": [
{
"keyword": "layoff",
"text": "Post content...",
"profileLink": "https://linkedin.com/in/johndoe",
"location": "Toronto, Ontario, Canada",
"aiAnalysis": {
"isRelevant": true,
"confidence": 0.9,
"reasoning": "Post mentions layoffs and workforce reduction",
"context": "layoff analysis",
"model": "mistral",
"analyzedAt": "2024-01-15T11:00:00Z"
}
}
]
}
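The re-analysis step can be pictured as a pure transformation: copy the original object, attach a fresh `aiAnalysis` to each result, and update the metadata. The sketch below assumes a synchronous `analyze` callback standing in for the model call; `aiOutputName` and `embedAnalysis` are hypothetical names, not the CLI's actual functions.

```javascript
// Derives "original-filename-ai.json" from the input file name.
function aiOutputName(inputName) {
  return /\.json$/i.test(inputName)
    ? inputName.replace(/\.json$/i, '-ai.json')
    : `${inputName}-ai.json`;
}

// Returns a new object with AI analysis embedded; the input is left unchanged.
// `analyze` is a stand-in for the real model call (an assumption of this sketch)
// and must return { isRelevant, confidence, reasoning }.
function embedAnalysis(original, analyze, { context, model, now = new Date() }) {
  const analyzedAt = now.toISOString();
  return {
    ...original,
    metadata: {
      ...original.metadata,
      aiAnalysisUpdated: analyzedAt,
      aiContext: context,
      aiModel: model,
    },
    results: original.results.map((post) => ({
      ...post,
      aiAnalysis: { ...analyze(post), context, model, analyzedAt },
    })),
  };
}

// Usage with a trivial keyword-based stand-in for the model.
const fakeAnalyze = (post) => ({
  isRelevant: /layoff/i.test(post.text),
  confidence: 0.5,
  reasoning: 'Keyword match only (placeholder analysis)',
});
const reanalyzed = embedAnalysis(
  { metadata: { totalPosts: 1 }, results: [{ keyword: 'layoff', text: 'Layoffs announced' }] },
  fakeAnalyze,
  { context: 'layoff analysis', model: 'mistral' },
);
console.log(aiOutputName('linkedin-results-2024-01-15-10-30.json'), reanalyzed.results[0].aiAnalysis);
```

Spreading the original object first is what keeps existing fields intact, which matches the backward-compatible structure described in this README.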
🧪 Testing
Run All Tests
npm test
Run Specific Test Suites
# AI Analyzer tests
cd ai-analyzer && npm test
# LinkedIn Parser tests
cd linkedin-parser && npm test
# Job Search Parser tests
cd job-search-parser && npm test
🔒 Security & Legal
Security Best Practices
- Store credentials in .env file (never commit)
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service
Legal Compliance
- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies
🚀 Advanced Features
AI-Powered Analysis
- Local AI: Ollama integration for privacy
- Integrated Analysis: AI results embedded in data structure
- Automatic Analysis: Runs after parsing completes
- Context Analysis: Relevance scoring
- Confidence Scoring: AI confidence levels for each post
- CLI Tool: Standalone analysis with flexible options
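To make the relevance and confidence fields concrete, here is a minimal sketch of one way a post could be scored through a local Ollama server. It assumes Ollama's default local address and its `/api/generate` endpoint with a non-streaming JSON reply; the prompt wording, the `buildPrompt`, `parseReply`, and `analyzePost` names, and the reply contract are illustrative assumptions, not the project's implementation. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Assumption: Ollama is running locally on its default port.
const OLLAMA_URL = 'http://localhost:11434/api/generate';

// Illustrative prompt; the project's real prompt may differ.
function buildPrompt(text, context) {
  return [
    `You are screening social media posts for: ${context}.`,
    'Reply with JSON only, in the form',
    '{"isRelevant": true|false, "confidence": 0.0-1.0, "reasoning": "..."}',
    '',
    `Post: ${text}`,
  ].join('\n');
}

// Turns the model's raw reply into aiAnalysis fields, tolerating malformed output.
function parseReply(raw) {
  try {
    const parsed = JSON.parse(raw);
    const confidence = Number(parsed.confidence);
    return {
      isRelevant: parsed.isRelevant === true,
      confidence: Number.isFinite(confidence) ? Math.min(1, Math.max(0, confidence)) : 0,
      reasoning: typeof parsed.reasoning === 'string' ? parsed.reasoning : '',
    };
  } catch (err) {
    return { isRelevant: false, confidence: 0, reasoning: 'Unparseable model reply' };
  }
}

// Sends one post to the model and returns an object shaped like `aiAnalysis`.
async function analyzePost(text, { context, model = 'mistral' }) {
  const response = await fetch(OLLAMA_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: buildPrompt(text, context), stream: false, format: 'json' }),
  });
  if (!response.ok) throw new Error(`Ollama request failed: ${response.status}`);
  const data = await response.json();
  return { ...parseReply(data.response), context, model, analyzedAt: new Date().toISOString() };
}
```

Treating an unparseable reply as "not relevant, confidence 0" is one possible policy; rejecting or retrying the post would be equally reasonable.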
Geographic Intelligence
- Location Validation: Profile location verification
- Regional Filtering: City/state/country filtering
- Geographic Analysis: Location-based insights
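As a sketch of how location validation could produce the `locationValid`, `locationMatchedFilter`, and `locationReasoning` fields seen in the output, the snippet below does a case-insensitive substring match against the filter list. The matching strategy and the `validateLocation` name are assumptions; the project's validator may be stricter.

```javascript
// Sketch: case-insensitive substring match of a profile location against filters.
// Substring matching is an assumption and can over-match (e.g. the filter
// "Ontario" would also accept "Ontario, California").
function validateLocation(location, filters) {
  const normalized = (location || '').toLowerCase();
  const matched = filters
    .map((filter) => filter.trim())
    .filter(Boolean) // skip empty entries such as a trailing comma
    .find((filter) => normalized.includes(filter.toLowerCase()));
  return matched
    ? { locationValid: true, locationMatchedFilter: matched, locationReasoning: 'Location matches filter' }
    : { locationValid: false, locationMatchedFilter: null, locationReasoning: 'Location not in filter' };
}

const locationFilters = 'Ontario,Manitoba'.split(',');
console.log(validateLocation('Toronto, Ontario, Canada', locationFilters));
console.log(validateLocation('Vancouver, BC, Canada', locationFilters));
```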
Data Processing
- Duplicate Detection: Intelligent deduplication
- Content Cleaning: Remove hashtags, URLs, emojis
- Metadata Extraction: Author, engagement, timing data
- Integrated AI: AI insights embedded in each result
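Content cleaning and deduplication can be sketched in a few lines. The regular expressions and the deduplication key (profile link plus cleaned, lowercased text) below are assumptions for illustration; the project's own rules may differ.

```javascript
// Sketch: strip URLs, hashtags, and emoji, then collapse whitespace.
function cleanText(text) {
  return text
    .replace(/https?:\/\/\S+/g, ' ') // URLs
    .replace(/#[\p{L}\p{N}_]+/gu, ' ') // hashtags, including non-ASCII letters
    .replace(/[\p{Extended_Pictographic}\u200D\uFE0F]/gu, ' ') // emoji, joiners, variation selectors
    .replace(/\s+/g, ' ')
    .trim();
}

// Sketch: drop posts whose author link and cleaned text were already seen.
function dedupePosts(posts) {
  const seen = new Set();
  return posts.filter((post) => {
    const key = `${post.profileLink || ''}|${cleanText(post.text || '').toLowerCase()}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(cleanText('Layoffs today 😢 #layoffs https://example.com/x  sad'));
```

Deduplicating on the cleaned text means two copies of a post that differ only in hashtags or links count as the same post.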
📈 Performance Optimization
Recommended Settings
- Headless Mode: Faster execution
- Location Filtering: Reduces false positives
- AI Analysis: Improves result quality (enabled by default)
- Batch Processing: Efficient data handling
Monitoring
- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery
🤝 Contributing
Development Setup
- Fork the repository
- Create feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit pull request
Code Standards
- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation
📄 License
This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.
🆘 Support
Common Issues
- Browser Issues: Ensure Playwright is installed
- Login Problems: Check credentials in .env
- Rate Limiting: Implement delays between requests
- Location Filtering: Verify location filter format
- AI Analysis: Ensure Ollama is running for AI features
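For the rate limiting item above, one common approach is a randomized pause between sequential requests. The sketch below is a generic pattern, not the project's code; `jitteredDelayMs`, `sleep`, `runWithDelays`, and the default timings are all illustrative assumptions.

```javascript
// Base delay plus a random extra, so requests are not evenly spaced.
// `random` is injectable to make the function testable.
function jitteredDelayMs(baseMs, jitterMs, random = Math.random) {
  return Math.round(baseMs + random() * jitterMs);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `task` for each item one at a time, pausing between items
// (no pause after the last one). Default timings are arbitrary.
async function runWithDelays(items, task, { baseMs = 2000, jitterMs = 1500 } = {}) {
  const results = [];
  for (let i = 0; i < items.length; i += 1) {
    results.push(await task(items[i]));
    if (i < items.length - 1) await sleep(jitteredDelayMs(baseMs, jitterMs));
  }
  return results;
}

// Example (not executed here): search each keyword with a pause in between.
// await runWithDelays(['layoff', 'hiring freeze'], (keyword) => searchPosts(keyword));
// `searchPosts` is a hypothetical function standing in for the real search step.
```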
Getting Help
- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information
🆕 What's New
- Integrated AI Analysis: AI results are now embedded directly in the results JSON
- No Separate Files: No more separate AI analysis files to manage
- CLI Tool: Standalone AI analysis with flexible options
- Rich Context: Each post includes detailed AI insights
- Flexible Re-analysis: Easy to re-analyze with different contexts
- Backward Compatible: Original data structure preserved
Note: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.