# Job Market Intelligence Platform

A platform for job market intelligence with **integrated AI-powered insights**. It is built on a modular architecture for extensibility and maintainability.

## 🏗️ Architecture Overview

```
job-market-intelligence/
├── ai-analyzer/              # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/          # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/        # Job search intelligence
└── docs/                     # Documentation
```

## 🚀 Quick Start

### Prerequisites

- Node.js 18+
- Playwright browser automation
- LinkedIn account credentials
- Optional: Ollama for local AI analysis

### Installation

```bash
npm install
npx playwright install chromium
```

### Basic Usage

```bash
# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start

# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom

# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai

# Run job search parser
cd job-search-parser && npm start

# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest

# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff

# Run demo workflow
node demo.js
```

## 📦 Core Components

### 1. AI Analyzer (`ai-analyzer/`)

**Shared utilities and CLI tool used by all parsers**

- **Logger**: Consistent logging across all components
- **Text Processing**: Keyword matching, text cleaning
- **Location Validation**: Geographic filtering and validation
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers

**Key Features:**

- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- **Integrated AI analysis**: AI results embedded in data structure
- **CLI tool**: Standalone analysis with flexible options
- Comprehensive test coverage

### 2. LinkedIn Parser (`linkedin-parser/`)

**Specialized LinkedIn content scraper with integrated AI analysis**

- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- **Automatic AI analysis integrated into results**
- Configurable search parameters

**Key Features:**

- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- **Integrated AI-powered content relevance analysis**
- **AI insights embedded directly in the main results JSON**
- **Two output files: results (with AI analysis) and rejected posts**

### 3. Job Search Parser (`job-search-parser/`)

**Job market intelligence and analysis**

- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights

**Key Features:**

- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence

### 4. AI Analysis CLI (`ai-analyzer/cli.js`)

**Command-line tool for AI analysis of any results JSON file**

- Analyze any results JSON file from LinkedIn parser or other sources
- **Integrated analysis**: AI results embedded back into original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support

**Key Features:**

- Works with any JSON results file
- **Integrated output**: AI analysis embedded in original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the root directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Keywords
KEYWORDS=keywords-layoff.csv
```
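
As a rough illustration of how these variables might be consumed, the sketch below reads them from an environment object and applies defaults. The `loadConfig` helper and every default value are assumptions made for this example, not the project's actual loader; only the variable names come from the sample `.env` above. `WHEELS` is left out because its meaning is not described here.

```javascript
// Hypothetical helper: turns environment variables into a config object.
// Only the variable names come from the sample .env; defaults are assumptions.
function loadConfig(env = process.env) {
  const toBool = (value, fallback) =>
    value === undefined ? fallback : String(value).toLowerCase() === 'true';
  const toList = (value) =>
    (value || '')
      .split(',')
      .map((item) => item.trim())
      .filter(Boolean);

  return {
    city: env.CITY || '',
    datePosted: env.DATE_POSTED || 'past-week',
    sortBy: env.SORT_BY || 'date_posted',
    locationFilter: toList(env.LOCATION_FILTER),
    enableLocationCheck: toBool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: toBool(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT || 'job market analysis and trends',
    ollamaModel: env.OLLAMA_MODEL || 'mistral',
    keywordsFile: env.KEYWORDS || '',
  };
}

// Passing a plain object instead of process.env keeps the example self-contained.
const config = loadConfig({
  CITY: 'Toronto',
  LOCATION_FILTER: 'Ontario, Manitoba',
  ENABLE_AI_ANALYSIS: 'false',
});
console.log(config.locationFilter, config.enableAiAnalysis);
```

Accepting the environment as a parameter makes the helper easy to exercise in tests without touching the real process environment.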

### Command Line Options

```bash
# LinkedIn Parser Options
--headless=true|false         # Browser headless mode
--keyword="kw1,kw2"          # Specific keywords
--add-keyword="kw1,kw2"      # Additional keywords
--no-location                # Disable location filtering
--no-ai                      # Disable AI analysis

# Job Search Parser Options
--help                       # Show parser-specific help

# AI Analysis CLI Options
--input=FILE                 # Input JSON file
--output=FILE                # Output file
--context="description"      # Custom AI analysis context
--model=MODEL                # Ollama model
--latest                     # Use latest results file
--dir=PATH                   # Directory to look for results
```
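
The flag shapes listed above (`--name=value` and bare `--name`) can be handled with a few lines of code. The following is a minimal sketch under that assumption; `parseFlags` is a hypothetical helper, and the parsers may well use a different approach internally.

```javascript
// Hypothetical parser for the flag shapes listed above:
// "--name=value" becomes { name: "value" }, a bare "--name" becomes { name: true }.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue; // ignore positional arguments
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq === -1) {
      flags[body] = true;
    } else {
      flags[body.slice(0, eq)] = body.slice(eq + 1);
    }
  }
  return flags;
}

// In a real script the input would be process.argv.slice(2).
const flags = parseFlags(['--headless=false', '--keyword=layoff,downsizing', '--no-ai']);
const keywords = flags.keyword ? flags.keyword.split(',').map((k) => k.trim()) : [];
console.log(flags.headless, keywords, flags['no-ai'] === true);
```

Note that values arrive as strings, so `--headless=false` yields the string `'false'` and needs an explicit comparison before it is used as a boolean.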

## 📊 Output Formats

### LinkedIn Parser Output

The LinkedIn parser now generates **two main files** with **integrated AI analysis**:

#### 1. Main Results with AI Analysis (`linkedin-results-YYYY-MM-DD-HH-MM.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
```
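
Because the AI analysis lives inside each result, downstream scripts can filter on it directly. The sketch below assumes the structure shown above; `selectRelevant` and the `0.7` threshold are illustrative choices, not part of the project.

```javascript
// Sketch: keep posts the AI marked relevant with at least `minConfidence`.
// Field names follow the sample output; the 0.7 default is an arbitrary assumption.
function selectRelevant(output, minConfidence = 0.7) {
  return output.results.filter((post) => {
    const analysis = post.aiAnalysis;
    // Posts that carry no aiAnalysis object are skipped.
    return (
      Boolean(analysis) &&
      analysis.isRelevant === true &&
      analysis.confidence >= minConfidence
    );
  });
}

const sample = {
  metadata: { totalPosts: 3 },
  results: [
    { keyword: 'layoff', aiAnalysis: { isRelevant: true, confidence: 0.9 } },
    { keyword: 'layoff', aiAnalysis: { isRelevant: false, confidence: 0.8 } },
    { keyword: 'layoff' },
  ],
};
console.log(selectRelevant(sample).length);
```

To run this against a real file, the object can be loaded with `JSON.parse(fs.readFileSync(file, 'utf8'))`.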

#### 2. Rejected Posts (`linkedin-rejected-YYYY-MM-DD-HH-MM.json`)

```json
[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
```
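
The rejected-posts file is a plain array, which makes it straightforward to see why posts were dropped. A minimal sketch, assuming the `reason` field shown above; `countByReason` is a hypothetical helper:

```javascript
// Sketch: count rejected posts per reason string.
function countByReason(rejected) {
  const counts = {};
  for (const post of rejected) {
    const reason = post.reason || 'unknown'; // guard against a missing reason
    counts[reason] = (counts[reason] || 0) + 1;
  }
  return counts;
}

const rejectedSample = [
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true, reason: 'Location filter failed: Location not in filter' },
  { rejected: true },
];
console.log(countByReason(rejectedSample));
```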

### AI Analysis CLI Output

The CLI tool creates **integrated results** with AI analysis embedded:

#### Re-analyzed Results (`original-filename-ai.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
```
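
One way to picture the "integrated" output is as a merge step that copies the original structure and attaches an `aiAnalysis` object to each result. The sketch below is an assumption about how such a merge could look, not the CLI's actual code; `embedAnalysis` and its parameters are hypothetical.

```javascript
// Sketch: merge per-post analyses back into the original results structure.
// `analyses[i]` is assumed to hold { isRelevant, confidence, reasoning } for results[i].
function embedAnalysis(output, analyses, { context, model, now = new Date() }) {
  // toISOString() includes milliseconds, e.g. "2024-01-15T11:00:00.000Z".
  const analyzedAt = now.toISOString();
  return {
    ...output,
    metadata: {
      ...output.metadata,
      aiAnalysisUpdated: analyzedAt,
      aiContext: context,
      aiModel: model,
    },
    // Build new post objects so the input is left unmodified.
    results: output.results.map((post, i) => ({
      ...post,
      aiAnalysis: { ...analyses[i], context, model, analyzedAt },
    })),
  };
}

const original = { metadata: { totalPosts: 1 }, results: [{ keyword: 'layoff', text: 'Post content...' }] };
const merged = embedAnalysis(
  original,
  [{ isRelevant: true, confidence: 0.9, reasoning: 'Post mentions layoffs' }],
  { context: 'layoff analysis', model: 'mistral' }
);
console.log(merged.results[0].aiAnalysis.context);
```

Returning a new object rather than mutating the input keeps the original data intact, which matches the idea of re-analyzing the same file with different contexts.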

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Run Specific Test Suites

```bash
# AI Analyzer tests
cd ai-analyzer && npm test

# LinkedIn Parser tests
cd linkedin-parser && npm test

# Job Search Parser tests
cd job-search-parser && npm test
```

## 🔒 Security & Legal

### Security Best Practices

- Store credentials in a `.env` file and never commit it
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service

### Legal Compliance

- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies

## 🚀 Advanced Features

### AI-Powered Analysis

- **Local AI**: Ollama integration for privacy
- **Integrated Analysis**: AI results embedded in data structure
- **Automatic Analysis**: Runs after parsing completes
- **Context Analysis**: Relevance scoring against a configurable analysis context (`AI_CONTEXT` or `--context`)
- **Confidence Scoring**: AI confidence levels for each post
- **CLI Tool**: Standalone analysis with flexible options
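
To make the local analysis step more concrete, here is a hedged sketch of a single-post request to an Ollama server. It relies on Ollama's `/api/generate` endpoint on the default port 11434 and on the global `fetch` available in Node.js 18+. The prompt wording and the `buildPrompt`, `parseAnalysis`, and `analyzePost` helpers are assumptions for illustration; the project's real prompt and parsing logic are not shown in this README.

```javascript
// Hypothetical prompt builder; the project's actual prompt may differ.
function buildPrompt(text, context) {
  return [
    `Context: ${context}`,
    `Post: ${text}`,
    'Reply with JSON only: {"isRelevant": boolean, "confidence": number between 0 and 1, "reasoning": string}',
  ].join('\n');
}

// Normalizes the model's JSON reply into the aiAnalysis shape used in the results.
function parseAnalysis(raw, context, model) {
  const parsed = JSON.parse(raw);
  // Clamp confidence into [0, 1]; non-numeric values fall back to 0.
  const confidence = Math.min(1, Math.max(0, Number(parsed.confidence) || 0));
  return {
    isRelevant: parsed.isRelevant === true,
    confidence,
    reasoning: String(parsed.reasoning || ''),
    context,
    model,
    analyzedAt: new Date().toISOString(),
  };
}

// Sends one post to a local Ollama server and returns the parsed analysis.
async function analyzePost(text, context, model = 'mistral') {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      prompt: buildPrompt(text, context),
      format: 'json', // ask Ollama to constrain the reply to JSON
      stream: false, // return a single JSON object instead of a stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const body = await res.json();
  return parseAnalysis(body.response, context, model);
}
```

Calling `analyzePost('Post content...', 'layoff analysis')` requires Ollama to be running with the chosen model already pulled.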

### Geographic Intelligence

- **Location Validation**: Profile location verification
- **Regional Filtering**: City/state/country filtering
- **Geographic Analysis**: Location-based insights
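
The location check can be sketched as a comparison between a profile location and the comma-separated `LOCATION_FILTER` value. The returned field names mirror the sample results (`locationValid`, `locationMatchedFilter`, `locationReasoning`), but the matching rule used here, a case-insensitive substring test, is an assumption, and `validateLocation` is a hypothetical helper.

```javascript
// Sketch: match a profile location against a comma-separated filter string.
// The case-insensitive substring rule is an assumption for illustration.
function validateLocation(location, filterString) {
  const filters = filterString
    .split(',')
    .map((f) => f.trim())
    .filter(Boolean);
  const haystack = (location || '').toLowerCase();
  const matched = filters.find((f) => haystack.includes(f.toLowerCase()));
  return {
    locationValid: matched !== undefined,
    locationMatchedFilter: matched ?? null,
    locationReasoning: matched ? 'Location matches filter' : 'Location not in filter',
  };
}

console.log(validateLocation('Toronto, Ontario, Canada', 'Ontario,Manitoba'));
console.log(validateLocation('Vancouver, BC, Canada', 'Ontario,Manitoba'));
```

A plain substring test is easy to reason about but can over-match: for example, a filter of `Ontario` would also accept "Ontario, California". A stricter implementation would compare individual location components.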

### Data Processing

- **Duplicate Detection**: Intelligent deduplication
- **Content Cleaning**: Remove hashtags, URLs, emojis
- **Metadata Extraction**: Author, engagement, timing data
- **Integrated AI**: AI insights embedded in each result
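
Content cleaning and duplicate detection can be illustrated with two small functions. This is a sketch under simple assumptions (regex-based stripping, exact match on cleaned text); `cleanText` and `dedupe` are hypothetical names, and the project's own deduplication may be more sophisticated.

```javascript
// Sketch: strip URLs, hashtags, and emoji, then collapse whitespace.
function cleanText(text) {
  return text
    .replace(/https?:\/\/\S+/g, ' ') // URLs
    .replace(/#[\p{L}\p{N}_]+/gu, ' ') // hashtags
    .replace(/[\p{Extended_Pictographic}\u{FE0F}\u{200D}]/gu, ' ') // emoji and joiners
    .replace(/\s+/g, ' ')
    .trim();
}

// Sketch: drop posts whose cleaned, lowercased text was already seen.
function dedupe(posts) {
  const seen = new Set();
  return posts.filter((post) => {
    const key = cleanText(post.text).toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const posts = [
  { text: 'Big layoffs today 😢 #layoffs https://example.com/x' },
  { text: 'big layoffs today' },
  { text: 'We are hiring!' },
];
console.log(cleanText(posts[0].text), dedupe(posts).length);
```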

## 📈 Performance Optimization

### Recommended Settings

- **Headless Mode**: Faster execution
- **Location Filtering**: Reduces false positives
- **AI Analysis**: Improves result quality (enabled by default)
- **Batch Processing**: Efficient data handling

### Monitoring

- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request

### Code Standards

- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.

## 🆘 Support

### Common Issues

- **Browser Issues**: Ensure Playwright and its Chromium browser are installed (`npx playwright install chromium`)
- **Login Problems**: Check credentials in `.env`
- **Rate Limiting**: Implement delays between requests
- **Location Filtering**: Verify location filter format
- **AI Analysis**: Ensure Ollama is running for AI features
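
For the rate-limiting item above, a simple approach is to pause for a randomized interval between actions. The sketch below shows one way to do that; the helper names and the 2 to 5 second range are assumptions, and appropriate values depend on the usage policies you need to respect.

```javascript
// Sketch: pick a delay in [minMs, maxMs); `random` is injectable for testing.
function randomDelayMs(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `handler` for each item in sequence, pausing between items.
async function forEachWithDelay(items, handler, minMs = 2000, maxMs = 5000) {
  for (const item of items) {
    await handler(item);
    await sleep(randomDelayMs(minMs, maxMs));
  }
}

console.log(randomDelayMs(2000, 5000, () => 0.5));
```

A call such as `await forEachWithDelay(keywords, searchKeyword)` would process keywords one at a time, where `searchKeyword` stands in for whatever per-keyword function the caller provides.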

### Getting Help

- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **CLI Tool**: Standalone AI analysis with flexible options
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved

---

**Note**: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.