# Job Market Intelligence Platform

A comprehensive platform for job market intelligence with **integrated AI-powered insights**. Built with modular architecture for extensibility and maintainability.

## 🏗️ Architecture Overview

```
job-market-intelligence/
├── ai-analyzer/           # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/       # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/     # Job search intelligence
└── docs/                  # Documentation
```

## 🚀 Quick Start

### Prerequisites

- Node.js 18+
- Playwright browser automation
- LinkedIn account credentials
- Optional: Ollama for local AI analysis

### Installation

```bash
npm install
npx playwright install chromium
```

### Basic Usage

```bash
# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start

# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom

# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai

# Run job search parser
cd job-search-parser && npm start

# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest

# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff

# Run demo workflow
node demo.js
```

## 📦 Core Components

### 1. AI Analyzer (`ai-analyzer/`)

**Shared utilities and CLI tool used by all parsers**

- **Logger**: Consistent logging across all components
- **Text Processing**: Keyword matching, text cleaning
- **Location Validation**: Geographic filtering and validation
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers

**Key Features:**

- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- **Integrated AI analysis**: AI results embedded in data structure
- **CLI tool**: Standalone analysis with flexible options
- Comprehensive test coverage

### 2. LinkedIn Parser (`linkedin-parser/`)

**Specialized LinkedIn content scraper with integrated AI analysis**

- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- **Automatic AI analysis integrated into results**
- Configurable search parameters

**Key Features:**

- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- **Integrated AI-powered content relevance analysis**
- **Single JSON output with embedded AI insights**
- **Two output files: results (with AI) and rejected posts**

### 3. Job Search Parser (`job-search-parser/`)

**Job market intelligence and analysis**

- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights

**Key Features:**

- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence

### 4. AI Analysis CLI (`ai-analyzer/cli.js`)

**Command-line tool for AI analysis of any results JSON file**

- Analyze any results JSON file from LinkedIn parser or other sources
- **Integrated analysis**: AI results embedded back into original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support

**Key Features:**

- Works with any JSON results file
- **Integrated output**: AI analysis embedded in original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the root directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Keywords
KEYWORDS=keywords-layoff.csv
```

### Command Line Options

```bash
# LinkedIn Parser Options
--headless=true|false       # Browser headless mode
--keyword="kw1,kw2"         # Specific keywords
--add-keyword="kw1,kw2"     # Additional keywords
--no-location               # Disable location filtering
--no-ai                     # Disable AI analysis

# Job Search Parser Options
--help                      # Show parser-specific help

# AI Analysis CLI Options
--input=FILE                # Input JSON file
--output=FILE               # Output file
--context="description"     # Custom AI analysis context
--model=MODEL               # Ollama model
--latest                    # Use latest results file
--dir=PATH                  # Directory to look for results
```

## 📊 Output Formats

### LinkedIn Parser Output

The LinkedIn parser now generates **two main files** with **integrated AI analysis**:

#### 1. Main Results with AI Analysis (`linkedin-results-YYYY-MM-DD-HH-MM.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
```

#### 2. Rejected Posts (`linkedin-rejected-YYYY-MM-DD-HH-MM.json`)

```json
[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
```

### AI Analysis CLI Output

The CLI tool creates **integrated results** with AI analysis embedded:

#### Re-analyzed Results (`original-filename-ai.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
```

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Run Specific Test Suites

```bash
# AI Analyzer tests
cd ai-analyzer && npm test

# LinkedIn Parser tests
cd linkedin-parser && npm test

# Job Search Parser tests
cd job-search-parser && npm test
```

## 🔒 Security & Legal

### Security Best Practices

- Store credentials in `.env` file (never commit)
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service

### Legal Compliance

- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies

## 🚀 Advanced Features

### AI-Powered Analysis

- **Local AI**: Ollama integration for privacy
- **Integrated Analysis**: AI results embedded in data structure
- **Automatic Analysis**: Runs after parsing completes
- **Context Analysis**: Relevance scoring
- **Confidence Scoring**: AI confidence levels for each post
- **CLI Tool**: Standalone analysis with flexible options

### Geographic Intelligence

- **Location Validation**: Profile location verification
- **Regional Filtering**: City/state/country filtering
- **Geographic Analysis**: Location-based insights

### Data Processing

- **Duplicate Detection**: Intelligent deduplication
- **Content Cleaning**: Remove hashtags, URLs, emojis
- **Metadata Extraction**: Author, engagement, timing data
- **Integrated AI**: AI insights embedded in each result

## 📈 Performance Optimization

### Recommended Settings

- **Headless Mode**: Faster execution
- **Location Filtering**: Reduces false positives
- **AI Analysis**: Improves result quality (enabled by default)
- **Batch Processing**: Efficient data handling

### Monitoring

- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request

### Code Standards

- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.

## 🆘 Support

### Common Issues

- **Browser Issues**: Ensure Playwright is installed
- **Login Problems**: Check credentials in `.env`
- **Rate Limiting**: Implement delays between requests
- **Location Filtering**: Verify location filter format
- **AI Analysis**: Ensure Ollama is running for AI features

### Getting Help

- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **CLI Tool**: Standalone AI analysis with flexible options
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved

---

**Note**: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.
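
## Example: Reading the Integrated Results

Because the AI analysis is embedded in each entry of the results JSON, downstream scripts can filter posts without a second file. The snippet below is a minimal sketch of that idea, not code from this repository: `selectRelevantPosts` and the `minConfidence` threshold are hypothetical names introduced for illustration, and the field names (`results`, `aiAnalysis.isRelevant`, `aiAnalysis.confidence`) are assumed to match the sample output shown in this README. The inline `sample` object stands in for a parsed results file; its second entry is made-up illustrative data.

```javascript
// Sketch only: field names follow the sample JSON in this README.
// selectRelevantPosts and minConfidence are hypothetical, not part of the project.
function selectRelevantPosts(resultsFile, minConfidence = 0.8) {
  const posts = Array.isArray(resultsFile.results) ? resultsFile.results : [];
  return posts.filter((post) => {
    const ai = post.aiAnalysis;
    // Posts without an aiAnalysis block (for example, a run without AI) are skipped.
    return Boolean(ai) && ai.isRelevant === true && ai.confidence >= minConfidence;
  });
}

// In-memory stand-in for a parsed results file (illustrative data).
const sample = {
  metadata: { totalPosts: 3, aiModel: "mistral" },
  results: [
    {
      keyword: "layoff",
      profileLink: "https://linkedin.com/in/johndoe",
      aiAnalysis: { isRelevant: true, confidence: 0.9 },
    },
    {
      keyword: "layoff",
      profileLink: "https://linkedin.com/in/example-low-confidence",
      aiAnalysis: { isRelevant: true, confidence: 0.4 },
    },
    {
      keyword: "layoff",
      profileLink: "https://linkedin.com/in/example-no-ai",
    },
  ],
};

const relevant = selectRelevantPosts(sample);
console.log(relevant.map((post) => post.profileLink));
```

To run this against a real results file, replace `sample` with the result of `JSON.parse(fs.readFileSync(path, "utf8"))`. The 0.8 default is an arbitrary starting point; adjust it to however strict the filtering should be.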