
LinkedIn Parser

A parser for LinkedIn posts with integrated AI analysis, built on the ai-analyzer core package. The AI analysis is embedded directly in the results JSON file rather than written to a separate file.

🚀 Quick Start

# Install dependencies
npm install

# Run with default settings (AI analysis integrated into results)
npm start

# Run without AI analysis
npm run start:no-ai

📋 Available Scripts

Parser Modes

# Basic parsing with integrated AI analysis
npm start

# Parsing without AI analysis
npm run start:no-ai

# Headless browser mode
npm run start:headless

# Visible browser mode (for debugging)
npm run start:visible

# Disable location filtering
npm run start:no-location

# Custom keywords
npm run start:custom

Testing

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

AI Analysis (CLI)

# Analyze latest results file with default context
npm run analyze:latest

# Analyze latest results file for layoffs
npm run analyze:layoff

# Analyze latest results file for job market trends
npm run analyze:trends

# Analyze specific file (requires --input parameter)
npm run analyze -- --input=results.json

Utilities

# Show help
npm run help

# Run demo
npm run demo

# Install Playwright browser
npm run install:playwright

🔧 Configuration

Environment Variables

Create a .env file in the linkedin-parser directory:

# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Browser Configuration
HEADLESS=true
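
To illustrate how these settings might be consumed, here is a minimal sketch that turns the variables above into a typed config object. The variable names come from the example; the function name loadConfig, the defaults, and the shape of the returned object are assumptions, not the parser's actual code.

```javascript
// Sketch only: variable names come from the .env example above, but the
// defaults and the shape of the returned object are assumptions.
function loadConfig(env = process.env) {
  // Only the literal string "false" turns a flag off; unset values use the fallback.
  const flag = (value, fallback) =>
    value === undefined ? fallback : value.trim().toLowerCase() !== "false";

  return {
    username: env.LINKEDIN_USERNAME,
    password: env.LINKEDIN_PASSWORD,
    city: env.CITY,
    datePosted: env.DATE_POSTED,
    sortBy: env.SORT_BY,
    wheels: Number.parseInt(env.WHEELS ?? "5", 10),
    // "Ontario,Manitoba" -> ["Ontario", "Manitoba"]
    locationFilter: (env.LOCATION_FILTER ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter(Boolean),
    enableLocationCheck: flag(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: flag(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT,
    ollamaModel: env.OLLAMA_MODEL ?? "mistral",
    headless: flag(env.HEADLESS, true),
  };
}
```

Passing the environment in as a parameter keeps the function easy to test with a plain object instead of the real process.env.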

Command Line Options

# Parser options
--headless=true|false    # Browser headless mode
--keyword="kw1,kw2"      # Specific keywords
--add-keyword="kw"       # Additional keywords
--no-location            # Disable location filtering
--no-ai                  # Disable AI analysis
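
As a rough illustration of how the flags above could be interpreted, the following sketch parses them by hand. The flag names are taken from the list; the function name parseCliArgs, the parsing rules, and the returned object are hypothetical and may differ from what index.js does.

```javascript
// Sketch only: flag names come from the list above; the parsing rules and the
// returned object shape are assumptions, not the parser's actual code.
function parseCliArgs(argv) {
  const options = {
    headless: undefined, // undefined means "fall back to the HEADLESS env setting"
    keywords: [],
    addKeywords: [],
    location: true,
    ai: true,
  };
  const splitList = (value) =>
    value.split(",").map((s) => s.trim()).filter(Boolean);

  for (const arg of argv) {
    if (arg === "--no-location") {
      options.location = false;
    } else if (arg === "--no-ai") {
      options.ai = false;
    } else if (arg.startsWith("--headless=")) {
      options.headless = arg.slice("--headless=".length) !== "false";
    } else if (arg.startsWith("--keyword=")) {
      options.keywords = splitList(arg.slice("--keyword=".length));
    } else if (arg.startsWith("--add-keyword=")) {
      options.addKeywords.push(...splitList(arg.slice("--add-keyword=".length)));
    }
  }
  return options;
}
```

In a script this would typically be called as parseCliArgs(process.argv.slice(2)); the shell removes the surrounding quotes before the arguments reach Node.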

📊 Output Files

The parser generates two main files:

  1. linkedin-results-YYYY-MM-DD-HH-MM.json - Main results with integrated AI analysis
  2. linkedin-rejected-YYYY-MM-DD-HH-MM.json - Rejected posts with reasons
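
The file name pattern above can be reproduced with a small helper. This is a sketch that assumes the timestamp is in local time with zero-padded fields; the helper name resultsFileName is hypothetical, and the parser may build its names differently.

```javascript
// Sketch only: assumes the timestamp uses local time and zero-padded fields;
// the parser itself may format it differently.
function resultsFileName(kind, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    date.getFullYear(),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
  ].join("-");
  return `linkedin-${kind}-${stamp}.json`;
}
```

Here kind would be "results" or "rejected". Because every field is zero-padded, sorting the names alphabetically also sorts them chronologically.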

Results Structure

Each entry in the results array includes an aiAnalysis object, and the top-level metadata records the context and model that were used:

{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/user",
      "location": "Toronto, Ontario",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and hiring",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
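
Given this structure, downstream code can select the posts the model judged relevant. The sketch below relies only on the field names shown in the sample; the function names and the 0.7 confidence threshold are illustrative assumptions.

```javascript
const fs = require("node:fs");

// Sketch only: field names follow the sample JSON above; the 0.7 threshold is
// an arbitrary illustration, not a value the parser uses.
function relevantPosts(data, minConfidence = 0.7) {
  return (data.results ?? []).filter((post) => {
    const analysis = post.aiAnalysis;
    // Posts without an aiAnalysis object (e.g. from a --no-ai run) are skipped.
    return (
      Boolean(analysis) &&
      analysis.isRelevant === true &&
      analysis.confidence >= minConfidence
    );
  });
}

// Convenience wrapper: read a results file from disk and filter it.
function relevantPostsFromFile(path, minConfidence) {
  return relevantPosts(JSON.parse(fs.readFileSync(path, "utf8")), minConfidence);
}
```

For example, relevantPostsFromFile("results/linkedin-results-2025-07-20-18-00.json") would return only the posts flagged as relevant with a confidence of at least 0.7.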

🧠 AI Analysis Workflow

Automatic Integration

AI analysis runs automatically after parsing completes and is embedded directly into the results JSON (unless disabled with --no-ai).
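
Conceptually, the embedding step merges one analysis object into each post and records the run in the metadata. The sketch below shows that merge under stated assumptions: analyze is a hypothetical stand-in for the ai-analyzer call, and its signature (posts, context, model) returning one analysis per post in order is assumed, not documented here.

```javascript
// Sketch only: `analyze` is a hypothetical stand-in for the ai-analyzer call;
// its signature (posts, context, model) -> array of analyses is an assumption.
async function embedAnalysis(data, analyze, { context, model = "mistral" } = {}) {
  const analyses = await analyze(data.results, context, model);
  const analyzedAt = new Date().toISOString();

  return {
    metadata: {
      ...data.metadata,
      aiAnalysisEnabled: true,
      aiAnalysisCompleted: true,
      aiContext: context,
      aiModel: model,
    },
    // Attach each analysis to its post by position; original fields are kept as-is.
    results: data.results.map((post, i) => ({
      ...post,
      aiAnalysis: { ...analyses[i], context, model, analyzedAt },
    })),
  };
}
```

Because the function returns a new object instead of mutating its input, the original parsed data stays available if the analysis step fails partway through.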

Manual Re-analysis

You can re-analyze existing results with different contexts using the CLI:

# Analyze latest results with default context
npm run analyze:latest

# Analyze latest results for layoffs
npm run analyze:layoff

# Analyze latest results for job market trends
npm run analyze:trends

# Analyze specific file with custom context
node ../ai-analyzer/cli.js --input=results.json --context="custom analysis"

CLI Options

The AI analyzer CLI supports:

--input=FILE              # Input JSON file
--output=FILE             # Output file (default: original-ai.json)
--context="description"   # Analysis context
--model=MODEL             # Ollama model (default: mistral)
--latest                  # Use latest results file
--dir=PATH                # Directory to look for results
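
One plausible way to resolve --latest within --dir is to pick the results file with the greatest timestamp in its name. The sketch below assumes exactly that; the function names are hypothetical, and the real CLI may use a different rule, such as file modification time.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Sketch only: assumes "latest" means the greatest timestamp in the file name.
// Zero-padded YYYY-MM-DD-HH-MM stamps sort chronologically as plain strings.
function pickLatest(fileNames) {
  const matches = fileNames
    .filter((name) =>
      /^linkedin-results-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}\.json$/.test(name)
    )
    .sort();
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

// Resolve the newest results file inside a directory, or null if none exist.
function latestResultsFile(dir = "results") {
  const name = pickLatest(fs.readdirSync(dir));
  return name === null ? null : path.join(dir, name);
}
```

Rejected-post files and any other JSON in the directory are ignored by the name filter, so only main results files are candidates.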

🎯 Use Cases

Basic Usage

# Run parser with integrated AI analysis
npm start

Testing Different Keywords

# Test with custom keywords
npm run start:custom

Debugging

# Run with visible browser
npm run start:visible

# Run without location filtering
npm run start:no-location

Re-analyzing Data

# After running parser, re-analyze with different contexts
npm run analyze:layoff
npm run analyze:trends

# Analyze specific file
node ../ai-analyzer/cli.js --input=results/linkedin-results-2025-07-20-18-00.json

🔍 Troubleshooting

Common Issues

  1. Missing credentials

    # Check .env file exists and has credentials
    cat .env
    
  2. Browser issues

    # Install Playwright browser
    npm run install:playwright
    
  3. AI not available

    # Make sure Ollama is running
    ollama list
    
    # Install mistral model if needed
    ollama pull mistral
    
  4. No results found

    # Try different keywords
    npm run start:custom
    
  5. CLI can't find results

    # Make sure you're in the linkedin-parser directory
    cd linkedin-parser
    npm run analyze:latest
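
The "AI not available" check in item 3 can also be done programmatically. This sketch queries Ollama's local REST API, where GET /api/tags lists the installed models. The function name ollamaHasModel is hypothetical, the default URL assumes Ollama's standard port 11434, and the built-in fetch requires Node 18 or later.

```javascript
// Sketch only: asks a local Ollama server which models are installed.
// `fetchImpl` is injectable so the check can be exercised without a server.
async function ollamaHasModel(
  model = "mistral",
  baseUrl = "http://localhost:11434",
  fetchImpl = fetch
) {
  try {
    const response = await fetchImpl(`${baseUrl}/api/tags`);
    if (!response.ok) return false;
    const { models = [] } = await response.json();
    // Installed names carry a tag suffix such as "mistral:latest".
    return models.some(
      (m) => m.name === model || m.name.startsWith(`${model}:`)
    );
  } catch {
    // Any network or parsing error is treated as "not available".
    return false;
  }
}
```

A false result means either that Ollama is not running or that the model has not been pulled yet; running ollama list distinguishes the two cases.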
    

📁 Project Structure

linkedin-parser/
├── index.js              # Main parser with integrated AI analysis
├── package.json          # Dependencies and scripts
├── .env                  # Configuration (create this)
├── keywords/             # Keyword CSV files
└── results/              # Output files (created automatically)
    ├── linkedin-results-*.json    # Results with integrated AI analysis
    └── linkedin-rejected-*.json   # Rejected posts

🤝 Integration

This parser integrates with:

  • ai-analyzer: Core AI utilities and CLI analysis tool
  • job-search-parser: Job market intelligence (separate module)

AI Analysis Package

The ai-analyzer package provides:

  • Library functions: analyzeBatch, checkOllamaStatus, etc.
  • CLI tool: cli.js for standalone analysis
  • Reusable components: For other parsers in the ecosystem

🆕 What's New

  • Integrated AI Analysis: AI results are now embedded directly in the results JSON
  • No Separate Files: No more separate AI analysis files to manage
  • Rich Context: Each post includes detailed AI insights
  • Flexible Re-analysis: Easy to re-analyze with different contexts
  • Backward Compatible: Original data structure preserved