
Job Market Intelligence Platform

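A platform for job market intelligence: it collects LinkedIn posts and job postings and enriches them with AI-generated relevance analysis. The modular architecture (a shared core library plus source-specific parsers) keeps each component easy to extend and maintain.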

🏗️ Architecture Overview

job-market-intelligence/
├── ai-analyzer/              # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/          # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/        # Job search intelligence
└── docs/                     # Documentation

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Playwright browser automation
  • LinkedIn account credentials
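  • Ollama running locally for AI analysis (optional, but AI analysis is enabled by default; disable it with --no-ai or ENABLE_AI_ANALYSIS=false)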

Installation

npm install
npx playwright install chromium

Basic Usage

# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start

# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom

# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai

# Run job search parser
cd job-search-parser && npm start

# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest

# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff

# Run demo workflow
node demo.js

📦 Core Components

1. AI Analyzer (ai-analyzer/)

Shared utilities and CLI tool used by all parsers

  • Logger: Consistent logging across all components
  • Text Processing: Keyword matching, text cleaning
  • Location Validation: Geographic filtering and validation
  • AI Integration: Local Ollama support with integrated analysis
  • CLI Tool: Command-line interface for standalone AI analysis
  • Test Utilities: Shared testing helpers
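As a rough illustration of the keyword matching listed above, here is a minimal sketch. The `matchKeywords` and `escapeRegExp` names and the case-insensitive, whole-word rule are assumptions for this example, not the actual ai-analyzer exports.

```javascript
// Hypothetical keyword matcher (not the real ai-analyzer API): case-insensitive, whole-word.

// Escape regex metacharacters so a keyword like "c++" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the keywords that appear in `text` as whole words (not inside longer words).
function matchKeywords(text, keywords) {
  return keywords.filter((kw) => {
    const pattern = new RegExp(`(?<![\\p{L}\\p{N}_])${escapeRegExp(kw)}(?![\\p{L}\\p{N}_])`, "iu");
    return pattern.test(text);
  });
}

console.log(matchKeywords("Another layoff at Acme; more layoffs expected.", ["layoff", "hiring"]));
// prints [ 'layoff' ]
```

Whole-word matching avoids counting "layoffs" as a hit for "layoff"; a real implementation might deliberately allow plurals instead.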

Key Features:

  • Configurable log levels with color support
  • Text cleaning and keyword matching
  • Geographic location validation against filters
  • Integrated AI analysis: AI results embedded in each result object
  • CLI tool: Standalone analysis with flexible options
  • Comprehensive test coverage
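Configurable log levels with color support usually come down to a threshold check plus ANSI color codes. The sketch below assumes four levels (`debug`, `info`, `warn`, `error`) and a hypothetical `createLogger` factory; the real logger's level names and options may differ.

```javascript
// Hypothetical leveled logger with optional ANSI colors (not the real ai-analyzer logger).
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };
const COLORS = { debug: "\x1b[90m", info: "\x1b[36m", warn: "\x1b[33m", error: "\x1b[31m" };
const RESET = "\x1b[0m";

function createLogger({ level = "info", color = true, write = console.log } = {}) {
  const threshold = LEVELS[level];
  if (threshold === undefined) throw new Error(`Unknown log level: ${level}`);
  const logger = {};
  for (const name of Object.keys(LEVELS)) {
    logger[name] = (...args) => {
      if (LEVELS[name] < threshold) return; // below the configured level: drop the message
      const tag = `[${name.toUpperCase()}]`;
      write(color ? `${COLORS[name]}${tag}${RESET}` : tag, ...args);
    };
  }
  return logger;
}

// Colors only when writing to a terminal, so redirected logs stay free of escape codes.
const log = createLogger({ level: "warn", color: process.stdout.isTTY === true });
log.info("not shown: below the warn threshold");
log.warn("shown");
```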

2. LinkedIn Parser (linkedin-parser/)

Specialized LinkedIn content scraper with integrated AI analysis

  • Automated LinkedIn login and navigation
  • Keyword-based post searching
  • Profile location validation
  • Duplicate detection and filtering
  • Automatic AI analysis integrated into results
  • Configurable search parameters
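Duplicate detection matters when the same post is returned by more than one keyword search. One simple approach, sketched below, keys each post on its author's profile link plus case- and whitespace-normalized text; that key and the `dedupePosts` name are assumptions, not necessarily what the parser does.

```javascript
// Hypothetical dedup: key = profileLink + normalized text (both fields appear in the output format).
function normalizeText(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function dedupePosts(posts) {
  const seen = new Set();
  const unique = [];
  for (const post of posts) {
    const key = `${post.profileLink}\u0000${normalizeText(post.text)}`;
    if (seen.has(key)) continue; // same author and same content: keep only the first occurrence
    seen.add(key);
    unique.push(post);
  }
  return unique;
}
```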

Key Features:

  • Browser automation with Playwright
  • Geographic filtering by city/region
  • Date range filtering (24h, week, month)
  • Integrated AI-powered content relevance analysis
  • Two output files: results with AI insights embedded in each post, and rejected posts
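The parser may apply the date range through LinkedIn's own search filter; the sketch below only shows an equivalent client-side check. The range keys are assumptions modeled on `DATE_POSTED=past-week` from the configuration example, and `postedAt` is a hypothetical field (the sample output's `timestamp` may be the scrape time rather than the post time).

```javascript
// Hypothetical client-side date-range filter. "past-week" mirrors the .env example;
// "past-24h" and "past-month" are assumed names for the other two ranges.
const DAY_MS = 24 * 60 * 60 * 1000;
const RANGE_MS = {
  "past-24h": DAY_MS,
  "past-week": 7 * DAY_MS,
  "past-month": 30 * DAY_MS, // approximation: a month is treated as 30 days
};

// Keep posts whose hypothetical `postedAt` ISO timestamp falls inside the range.
// Posts with a missing or unparsable date compare as NaN and are dropped.
function filterByDateRange(posts, range, now = Date.now()) {
  const windowMs = RANGE_MS[range];
  if (windowMs === undefined) throw new Error(`Unsupported date range: ${range}`);
  const cutoff = now - windowMs;
  return posts.filter((p) => Date.parse(p.postedAt) >= cutoff);
}
```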

3. Job Search Parser (job-search-parser/)

Job market intelligence and analysis

  • Job posting aggregation
  • Role-specific keyword tracking
  • Market trend analysis
  • Salary and requirement insights

Key Features:

  • Tech role keyword tracking
  • Industry-specific analysis
  • Market demand insights
  • Competitive intelligence
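Role-specific keyword tracking can be as simple as counting how many postings mention each role. In this sketch the posting shape (`title`, `description`), the `countRoleMentions` name, and plain substring matching are all assumptions.

```javascript
// Hypothetical role tracker: count postings that mention each role (substring match, case-insensitive).
function countRoleMentions(postings, roles) {
  const counts = new Map(roles.map((r) => [r, 0]));
  for (const { title = "", description = "" } of postings) {
    const haystack = `${title} ${description}`.toLowerCase();
    for (const role of roles) {
      // Each posting counts at most once per role.
      if (haystack.includes(role.toLowerCase())) counts.set(role, counts.get(role) + 1);
    }
  }
  // Most-mentioned roles first; ties keep the order of the `roles` argument (sort is stable).
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```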

4. AI Analysis CLI (ai-analyzer/cli.js)

Command-line tool for AI analysis of any results JSON file

  • Analyze any results JSON file from the LinkedIn parser or other sources
  • Integrated analysis: AI results embedded back into original JSON
  • Custom analysis context and AI models
  • Comprehensive analysis summary and statistics
  • Flexible input format support

Key Features:

  • Works with any JSON results file
  • Integrated output: AI analysis embedded in original structure
  • Custom analysis contexts
  • Detailed relevance scoring
  • Confidence level analysis
  • Summary statistics and insights
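Summary statistics of this kind can be computed directly from the `aiAnalysis` objects. The field names below (`results[].aiAnalysis.isRelevant` and `.confidence`) come from the output format in this README; the `summarizeAnalysis` name, the summary shape, and the 0.8 "high confidence" cutoff are assumptions.

```javascript
// Hypothetical summary over a results document in the README's output format.
function summarizeAnalysis(doc) {
  const analyzed = doc.results.filter((r) => r.aiAnalysis);
  const relevant = analyzed.filter((r) => r.aiAnalysis.isRelevant);
  const avg = (xs) => (xs.length ? xs.reduce((s, x) => s + x, 0) / xs.length : null);
  return {
    total: doc.results.length,
    analyzed: analyzed.length,
    relevant: relevant.length,
    irrelevant: analyzed.length - relevant.length,
    averageConfidence: avg(analyzed.map((r) => r.aiAnalysis.confidence)),
    // Number of relevant posts with confidence >= 0.8 (an arbitrary example cutoff).
    highConfidenceRelevant: relevant.filter((r) => r.aiAnalysis.confidence >= 0.8).length,
  };
}
```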

🔧 Configuration

Environment Variables

Create a .env file in the root directory:

# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Keywords
KEYWORDS=keywords-layoff.csv
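The variables above might be read into a typed configuration object along these lines. This sketch takes an environment object (normally `process.env`, after whatever `.env` loader the project uses has run); the `loadConfig` name, the property names, and the fallback defaults (copied from the example values above) are assumptions.

```javascript
// Hypothetical config loader; defaults mirror the example .env, not documented defaults.
function parseBool(value, fallback) {
  if (value === undefined) return fallback;
  return /^(1|true|yes)$/i.test(value.trim());
}

// "Ontario, Manitoba" -> ["Ontario", "Manitoba"]; empty or missing -> [].
function parseList(value) {
  return (value ?? "").split(",").map((s) => s.trim()).filter(Boolean);
}

function loadConfig(env = process.env) {
  const wheels = Number.parseInt(env.WHEELS ?? "5", 10);
  if (!Number.isInteger(wheels) || wheels < 0) throw new Error(`Invalid WHEELS: ${env.WHEELS}`);
  return {
    username: env.LINKEDIN_USERNAME,
    password: env.LINKEDIN_PASSWORD,
    city: env.CITY ?? "Toronto",
    datePosted: env.DATE_POSTED ?? "past-week",
    sortBy: env.SORT_BY ?? "date_posted",
    wheels,
    locationFilter: parseList(env.LOCATION_FILTER),
    enableLocationCheck: parseBool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: parseBool(env.ENABLE_AI_ANALYSIS, true), // AI analysis is on by default
    aiContext: env.AI_CONTEXT ?? "job market analysis and trends",
    ollamaModel: env.OLLAMA_MODEL ?? "mistral",
    keywordsFile: env.KEYWORDS ?? "keywords-layoff.csv",
  };
}
```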

Command Line Options

# LinkedIn Parser Options
--headless=true|false         # Browser headless mode
--keyword="kw1,kw2"          # Specific keywords
--add-keyword="kw1,kw2"      # Additional keywords
--no-location                # Disable location filtering
--no-ai                      # Disable AI analysis

# Job Search Parser Options
--help                       # Show parser-specific help

# AI Analysis CLI Options
--input=FILE                 # Input JSON file
--output=FILE                # Output file
--context="description"      # Custom AI analysis context
--model=MODEL                # Ollama model
--latest                     # Use latest results file
--dir=PATH                   # Directory to look for results
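Flags in `--name=value` form map directly onto Node's built-in `util.parseArgs` (Node 18.3 and later). The sketch below covers the AI Analysis CLI options only; it is an illustration, not the actual `cli.js` code, and the rule that either `--input` or `--latest` must be given is an assumption.

```javascript
// Sketch of parsing the AI Analysis CLI flags with util.parseArgs (not the real cli.js).
const { parseArgs } = require("node:util");

function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      input: { type: "string" },
      output: { type: "string" },
      context: { type: "string" },
      model: { type: "string" },
      latest: { type: "boolean" },
      dir: { type: "string" },
    },
    strict: true, // unknown flags and positional arguments throw instead of being ignored
  });
  if (!values.input && !values.latest) {
    throw new Error("Pass --input=FILE or --latest");
  }
  // Copy into a plain object and normalize the boolean flag.
  return { ...values, latest: values.latest === true };
}

// Typical call: parseCliArgs(process.argv.slice(2))
```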

📊 Output Formats

LinkedIn Parser Output

The LinkedIn parser generates two files; AI analysis is embedded in the main results file:

1. Main Results with AI Analysis (linkedin-results-YYYY-MM-DD-HH-MM.json)

{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}

2. Rejected Posts (linkedin-rejected-YYYY-MM-DD-HH-MM.json)

[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
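The timestamped names above can be produced and compared with a few string helpers. This sketch assumes the timestamp is in UTC (the parser may use local time), and the helper names are hypothetical. Because the stamp is zero-padded and ordered from year to minute, sorting names as strings also sorts them by time, which is one way a `--latest` option could pick a file.

```javascript
// Hypothetical helpers for linkedin-results-YYYY-MM-DD-HH-MM.json style names (UTC assumed).
const pad = (n) => String(n).padStart(2, "0");

function stamp(date) {
  return [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1),
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
  ].join("-");
}

function outputFileNames(date = new Date()) {
  const s = stamp(date);
  return { results: `linkedin-results-${s}.json`, rejected: `linkedin-rejected-${s}.json` };
}

// Newest plain results file = lexicographically greatest name; "-ai.json" copies are excluded.
function latestResultsFile(fileNames) {
  const candidates = fileNames.filter((f) => /^linkedin-results-\d{4}(-\d{2}){4}\.json$/.test(f));
  return candidates.sort().at(-1) ?? null;
}
```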

AI Analysis CLI Output

The CLI tool writes a new file containing the original results, with AI analysis embedded in each post and the metadata updated:

Re-analyzed Results (original-filename-ai.json)

{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
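The re-analysis flow keeps every original field, attaches a fresh `aiAnalysis` object per post, and updates the metadata. The sketch below assumes the verdicts have already been obtained (one per result, in the same order); `embedAnalysis` and `aiOutputPath` are hypothetical names. Note that `toISOString()` includes milliseconds, unlike the sample timestamps.

```javascript
// Hypothetical "integrated" re-analysis: copy the document, embed one aiAnalysis per result.
const path = require("node:path");

// results.json -> results-ai.json (same directory)
function aiOutputPath(inputPath) {
  const { dir, name } = path.parse(inputPath);
  return path.join(dir, `${name}-ai.json`);
}

function embedAnalysis(doc, analyses, { context, model, now = new Date() }) {
  if (analyses.length !== doc.results.length) {
    throw new Error("Expected exactly one analysis per result");
  }
  const analyzedAt = now.toISOString();
  return {
    ...doc,
    metadata: { ...doc.metadata, aiAnalysisUpdated: analyzedAt, aiContext: context, aiModel: model },
    results: doc.results.map((result, i) => ({
      ...result, // original fields are preserved
      aiAnalysis: { ...analyses[i], context, model, analyzedAt }, // replaces any earlier analysis
    })),
  };
}
```

The input object is not mutated, so the original file's contents stay available if the write fails.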

🧪 Testing

Run All Tests

npm test

Run Specific Test Suites

# AI Analyzer tests
cd ai-analyzer && npm test

# LinkedIn Parser tests
cd linkedin-parser && npm test

# Job Search Parser tests
cd job-search-parser && npm test

Security Best Practices

  • Store credentials in a .env file and never commit it
  • Read sensitive data from environment variables, not from source code
  • Add delays between requests and respect LinkedIn's rate limits and usage policies
  • Respect LinkedIn's Terms of Service and monitor them for changes
  • Use for educational and research purposes only
  • Implement data retention policies for collected data
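Rate limiting can be implemented by pausing for a randomized interval between page visits. In this sketch the `politeLoop` and `randomDelayMs` names and the 2 to 5 second window are arbitrary examples, not documented LinkedIn limits; `visit` stands in for whatever async function loads a page.

```javascript
// Hypothetical pacing helpers: random pause between sequential page visits.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  if (!(minMs >= 0 && maxMs >= minMs)) throw new RangeError("Require 0 <= minMs <= maxMs");
  // Uniform integer in [minMs, maxMs]; rng() returns a value in [0, 1).
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeLoop(urls, visit, { minMs = 2000, maxMs = 5000 } = {}) {
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(randomDelayMs(minMs, maxMs)); // pause before every request after the first
    await visit(url);
  }
}
```

Passing `rng` as a parameter keeps the delay calculation deterministic in tests.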

🚀 Advanced Features

AI-Powered Analysis

  • Local AI: Ollama integration for privacy
  • Integrated Analysis: AI results embedded in data structure
  • Automatic Analysis: Runs after parsing completes
  • Context Analysis: Relevance scoring against a configurable context (AI_CONTEXT or --context)
  • Confidence Scoring: AI confidence levels for each post
  • CLI Tool: Standalone analysis with flexible options
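A single relevance check against a local Ollama server could look like the sketch below, which calls Ollama's `/api/generate` endpoint on its default port with `stream: false` and `format: "json"`. The prompt wording, the reply shape the model is asked for, and the function names are assumptions; only the output field names come from the results format in this README. It uses the global `fetch` available in Node 18.

```javascript
// Hypothetical relevance check via Ollama's /api/generate (not the real ai-analyzer code).
const OLLAMA_URL = "http://localhost:11434/api/generate";

function buildPrompt(text, context) {
  return [
    `Context: ${context}`,
    "Decide whether the following post is relevant to the context.",
    'Reply only with JSON: {"isRelevant": boolean, "confidence": number between 0 and 1, "reasoning": string}',
    `Post: ${text}`,
  ].join("\n");
}

// Convert the model's raw JSON reply into the aiAnalysis shape used in the results file.
function toAiAnalysis(rawReply, { context, model, now = new Date() }) {
  const parsed = JSON.parse(rawReply); // throws if the model did not return valid JSON
  const confidence = Math.min(1, Math.max(0, Number(parsed.confidence) || 0)); // clamp to [0, 1]
  return {
    isRelevant: parsed.isRelevant === true,
    confidence,
    reasoning: String(parsed.reasoning ?? ""),
    context,
    model,
    analyzedAt: now.toISOString(),
  };
}

async function analyzePost(text, { context, model = "mistral" }) {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false returns one JSON object; format: "json" asks Ollama to constrain output to JSON.
    body: JSON.stringify({ model, prompt: buildPrompt(text, context), stream: false, format: "json" }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: HTTP ${res.status}`);
  const { response } = await res.json(); // the generated text is in the `response` field
  return toAiAnalysis(response, { context, model });
}
```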

Geographic Intelligence

  • Location Validation: Profile location verification
  • Regional Filtering: City/state/country filtering
  • Geographic Analysis: Location-based insights
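Regional filtering against `LOCATION_FILTER` could be sketched as follows. The returned field names and messages mirror the sample output in this README; the matching rule (a filter entry must equal one comma-separated part of the profile location, ignoring case) is an assumption, and the real validator may also handle abbreviations such as "ON" or "BC".

```javascript
// Hypothetical location check producing the location* fields shown in the results format.
function validateLocation(location, filters) {
  if (!location) {
    return { locationValid: false, locationMatchedFilter: null, locationReasoning: "No location on profile" };
  }
  // "Toronto, Ontario, Canada" -> ["toronto", "ontario", "canada"]
  const parts = location.split(",").map((p) => p.trim().toLowerCase());
  const match = filters.find((f) => parts.includes(f.trim().toLowerCase()));
  return match
    ? { locationValid: true, locationMatchedFilter: match, locationReasoning: "Location matches filter" }
    : { locationValid: false, locationMatchedFilter: null, locationReasoning: "Location not in filter" };
}
```

The rejected post's `reason` in the sample output, "Location filter failed: Location not in filter", reads like this `locationReasoning` with a prefix added.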

Data Processing

  • Duplicate Detection: Intelligent deduplication
  • Content Cleaning: Remove hashtags, URLs, emojis
  • Metadata Extraction: Author, engagement, timing data
  • Integrated AI: AI insights embedded in each result
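Content cleaning of hashtags, URLs, and emojis can be done with Unicode-aware regular expressions. The rules below (for example, dropping a hashtag entirely rather than keeping its word) are assumptions, and `cleanPostText` is a hypothetical name.

```javascript
// Hypothetical post cleaner: strip URLs, hashtags and emoji, then collapse whitespace.
function cleanPostText(text) {
  return text
    .replace(/https?:\/\/\S+/g, " ") // URLs
    .replace(/#[\p{L}\p{N}_]+/gu, " ") // hashtags, including non-ASCII letters
    .replace(/[\p{Extended_Pictographic}\u{1F3FB}-\u{1F3FF}\u200D\uFE0F]/gu, " ") // emoji, skin tones, ZWJ, VS16
    .replace(/\s+/g, " ")
    .trim();
}
```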

📈 Performance Optimization

  • Headless Mode: Faster execution (--headless=true)
  • Location Filtering: Reduces false positives
  • AI Analysis: Improves result quality (enabled by default; --no-ai skips it for faster runs)
  • Batch Processing: Efficient data handling
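Batch processing typically means running work in fixed-size groups so that only a bounded number of tasks (for example, AI requests) are in flight at once. The `chunk` and `processInBatches` names and this particular strategy are assumptions for illustration.

```javascript
// Hypothetical batching helpers.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function processInBatches(items, size, task) {
  const results = [];
  for (const batch of chunk(items, size)) {
    // Tasks within a batch run concurrently; batches run one after another, preserving order.
    results.push(...(await Promise.all(batch.map((item) => task(item)))));
  }
  return results;
}
```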

Monitoring

  • Real-time progress indicators
  • Detailed logging with configurable levels
  • Performance metrics tracking
  • Error handling and recovery

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Code Standards

  • Follow existing code style
  • Add JSDoc comments
  • Maintain test coverage
  • Update documentation

📄 License

This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.

🆘 Support

Common Issues

  • Browser Issues: Ensure Playwright is installed
  • Login Problems: Check credentials in .env
  • Rate Limiting: Implement delays between requests
  • Location Filtering: Verify location filter format
  • AI Analysis: Ensure Ollama is running for AI features

Getting Help

  • Check the component-specific READMEs
  • Review the demo files for examples
  • Examine the test files for usage patterns
  • Open an issue with detailed error information

🆕 What's New

  • Integrated AI Analysis: AI results are now embedded directly in the results JSON
  • No Separate Files: No more separate AI analysis files to manage
  • CLI Tool: Standalone AI analysis with flexible options
  • Rich Context: Each post includes detailed AI insights
  • Flexible Re-analysis: Easy to re-analyze with different contexts
  • Backward Compatible: Original data structure preserved

Note: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.