# LinkedIn Parser

Parses LinkedIn posts and runs **integrated AI analysis** through the `ai-analyzer` core package. The analysis is embedded directly in the results JSON file.

## 🚀 Quick Start

```bash
# Install dependencies
npm install

# Run with default settings (AI analysis integrated into results)
npm start

# Run without AI analysis
npm run start:no-ai
```

## 📋 Available Scripts

### Parser Modes

```bash
# Basic parsing with integrated AI analysis
npm start

# Parsing without AI analysis
npm run start:no-ai

# Headless browser mode
npm run start:headless

# Visible browser mode (for debugging)
npm run start:visible

# Disable location filtering
npm run start:no-location

# Custom keywords
npm run start:custom
```

### Testing

```bash
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```

### AI Analysis (CLI)

```bash
# Analyze latest results file with default context
npm run analyze:latest

# Analyze latest results file for layoffs
npm run analyze:layoff

# Analyze latest results file for job market trends
npm run analyze:trends

# Analyze specific file (requires --input parameter)
npm run analyze -- --input=results.json
```

### Utilities

```bash
# Show help
npm run help

# Run demo
npm run demo

# Install Playwright browser
npm run install:playwright
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the `linkedin-parser` directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Browser Configuration
HEADLESS=true
```
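
The variables above arrive in Node as plain strings, so booleans, numbers, and comma-separated lists need converting before use. The following is a minimal sketch of that conversion, not the parser's actual code: `loadConfig` and the fallback values are hypothetical, and it assumes the `.env` file has already been loaded into `process.env` (for example by the `dotenv` package).

```javascript
// Hypothetical helper: turn raw environment strings into a typed config object.
// Variable names come from the .env example above; the fallbacks are assumptions.
function loadConfig(env = process.env) {
  // Treat only the string "true" (case-insensitive) as enabled.
  const toBool = (value, fallback) =>
    value === undefined ? fallback : String(value).trim().toLowerCase() === "true";

  // "Ontario,Manitoba" -> ["Ontario", "Manitoba"]; empty or missing -> [].
  const toList = (value) =>
    (value || "")
      .split(",")
      .map((item) => item.trim())
      .filter(Boolean);

  const wheels = Number.parseInt(env.WHEELS, 10);

  return {
    username: env.LINKEDIN_USERNAME || "",
    password: env.LINKEDIN_PASSWORD || "",
    city: env.CITY || "",
    datePosted: env.DATE_POSTED || "",
    sortBy: env.SORT_BY || "",
    wheels: Number.isNaN(wheels) ? 5 : wheels, // assumed fallback
    locationFilter: toList(env.LOCATION_FILTER),
    enableLocationCheck: toBool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: toBool(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT || "",
    ollamaModel: env.OLLAMA_MODEL || "mistral",
    headless: toBool(env.HEADLESS, true),
  };
}
```

Passing the environment as an argument keeps the function easy to test with a plain object instead of the real `process.env`.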

### Command Line Options

```bash
# Parser options
--headless=true|false    # Browser headless mode
--keyword="kw1,kw2"      # Specific keywords
--add-keyword="kw"       # Additional keywords
--no-location            # Disable location filtering
--no-ai                  # Disable AI analysis
```
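
To illustrate how flags of this shape can be read, here is a minimal sketch of an argument parser. `parseCliArgs` and the option names it returns are hypothetical; the parser's real implementation may handle these flags differently.

```javascript
// Hypothetical parser for the flags listed above; not the parser's actual code.
function parseCliArgs(argv) {
  const options = {
    headless: undefined, // left undefined so an environment default can apply
    keywords: [],
    addKeywords: [],
    locationCheck: true,
    aiAnalysis: true,
  };

  const splitList = (value) =>
    value.split(",").map((item) => item.trim()).filter(Boolean);

  for (const arg of argv) {
    // Split "--name=value" at the first "=" only, so values may contain "=".
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? "" : arg.slice(eq + 1);

    switch (name) {
      case "--headless":
        options.headless = value.toLowerCase() !== "false";
        break;
      case "--keyword":
        options.keywords = splitList(value);
        break;
      case "--add-keyword":
        options.addKeywords.push(...splitList(value));
        break;
      case "--no-location":
        options.locationCheck = false;
        break;
      case "--no-ai":
        options.aiAnalysis = false;
        break;
      default:
        break; // unknown flags are ignored in this sketch
    }
  }
  return options;
}

// Example: node index.js --headless=false --keyword="layoff,hiring freeze" --no-ai
// const options = parseCliArgs(process.argv.slice(2));
```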

## 📊 Output Files

The parser generates two main files:

1. **`linkedin-results-YYYY-MM-DD-HH-MM.json`** - Main results with **integrated AI analysis**
2. **`linkedin-rejected-YYYY-MM-DD-HH-MM.json`** - Rejected posts with reasons
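
The timestamp pattern in these names can be reproduced with a few lines of JavaScript. This is only a sketch: `timestampedName` is a hypothetical helper, and it assumes local time, whereas the parser may use UTC.

```javascript
// Hypothetical helper: build "<prefix>-YYYY-MM-DD-HH-MM.json" from a Date.
// Assumption: local time is used; the parser may use UTC instead.
function timestampedName(prefix, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    date.getFullYear(),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
  ].join("-");
  return `${prefix}-${stamp}.json`;
}

const when = new Date(2025, 6, 20, 18, 0); // 20 July 2025, 18:00 local time
console.log(timestampedName("linkedin-results", when));
// linkedin-results-2025-07-20-18-00.json
```

Because every field is zero-padded and ordered from year to minute, sorting these names alphabetically also sorts them chronologically.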

### Results Structure

Each entry in `results` carries an `aiAnalysis` object, and `metadata` records the context and model used for the run:

```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/user",
      "location": "Toronto, Ontario",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and hiring",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```
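
A common next step is to keep only the posts the model judged relevant. The sketch below relies on the field names shown in the sample above; `selectRelevant` and the `0.7` threshold are hypothetical choices for illustration.

```javascript
// Hypothetical helper: keep posts marked relevant with enough confidence.
// Field names follow the sample JSON above; the 0.7 default is an arbitrary example.
function selectRelevant(data, minConfidence = 0.7) {
  const results = Array.isArray(data.results) ? data.results : [];
  return results.filter((post) => {
    const analysis = post.aiAnalysis;
    // Posts lacking an aiAnalysis object are skipped.
    return (
      analysis !== undefined &&
      analysis !== null &&
      analysis.isRelevant === true &&
      typeof analysis.confidence === "number" &&
      analysis.confidence >= minConfidence
    );
  });
}

// Usage with a file on disk (the path is an example):
// const fs = require("node:fs");
// const data = JSON.parse(
//   fs.readFileSync("results/linkedin-results-2025-07-20-18-00.json", "utf8")
// );
// console.log(selectRelevant(data, 0.8).map((post) => post.profileLink));
```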

## 🧠 AI Analysis Workflow

### Automatic Integration

AI analysis runs automatically after parsing completes and is **embedded directly into the results JSON** (unless disabled with `--no-ai`).

### Manual Re-analysis

You can re-analyze existing results with different contexts using the CLI:

```bash
# Analyze latest results with default context
npm run analyze:latest

# Analyze latest results for layoffs
npm run analyze:layoff

# Analyze latest results for job market trends
npm run analyze:trends

# Analyze specific file with custom context
node ../ai-analyzer/cli.js --input=results.json --context="custom analysis"
```

### CLI Options

The AI analyzer CLI supports:

```bash
--input=FILE              # Input JSON file
--output=FILE             # Output file (default: original-ai.json)
--context="description"   # Analysis context
--model=MODEL             # Ollama model (default: mistral)
--latest                  # Use latest results file
--dir=PATH                # Directory to look for results
```
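
As an illustration of what `--latest` combined with `--dir` could do, the sketch below picks the newest results file by name. `findLatestResults` is a hypothetical helper; the real CLI may select files differently, for example by modification time.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: pick the newest results file in a directory by name.
// Assumption: names follow linkedin-results-YYYY-MM-DD-HH-MM.json, so
// alphabetical order matches chronological order.
function findLatestResults(dir) {
  const pattern = /^linkedin-results-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}\.json$/;
  const candidates = fs
    .readdirSync(dir)
    .filter((name) => pattern.test(name))
    .sort();
  if (candidates.length === 0) {
    return null; // nothing to analyze yet
  }
  return path.join(dir, candidates[candidates.length - 1]);
}

// Example: const latest = findLatestResults("results");
```

Returning `null` instead of throwing lets the caller decide how to report an empty directory.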

## 🎯 Use Cases

### Basic Usage

```bash
# Run parser with integrated AI analysis
npm start
```

### Testing Different Keywords

```bash
# Test with custom keywords
npm run start:custom
```

### Debugging

```bash
# Run with visible browser
npm run start:visible

# Run without location filtering
npm run start:no-location
```

### Re-analyzing Data

```bash
# After running parser, re-analyze with different contexts
npm run analyze:layoff
npm run analyze:trends

# Analyze specific file
node ../ai-analyzer/cli.js --input=results/linkedin-results-2025-07-20-18-00.json
```

## 🔍 Troubleshooting

### Common Issues

1. **Missing credentials**

   ```bash
   # Check .env file exists and has credentials
   cat .env
   ```

2. **Browser issues**

   ```bash
   # Install Playwright browser
   npm run install:playwright
   ```

3. **AI not available**

   ```bash
   # Make sure Ollama is running
   ollama list

   # Install mistral model if needed
   ollama pull mistral
   ```

4. **No results found**

   ```bash
   # Try different keywords
   npm run start:custom
   ```

5. **CLI can't find results**

   ```bash
   # Make sure you're in the linkedin-parser directory
   cd linkedin-parser
   npm run analyze:latest
   ```
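
The Ollama check from item 3 can also be done from Node. The sketch below talks directly to Ollama's HTTP API (`GET /api/tags` lists locally installed models) rather than to `ai-analyzer`, whose `checkOllamaStatus` signature is not documented here. `hasModel` and `checkOllama` are hypothetical names, and the code assumes Node 18+ for the global `fetch` and Ollama's default address.

```javascript
// Hypothetical helper: does an Ollama /api/tags response list the given model?
// Ollama reports names with a tag suffix such as "mistral:latest".
function hasModel(tagsResponse, model) {
  const models = Array.isArray(tagsResponse.models) ? tagsResponse.models : [];
  return models.some(
    (entry) =>
      typeof entry.name === "string" &&
      (entry.name === model || entry.name.startsWith(`${model}:`))
  );
}

// Queries a local Ollama server. Assumes Node 18+ (global fetch) and
// Ollama's default address; adjust baseUrl if your setup differs.
async function checkOllama(model = "mistral", baseUrl = "http://localhost:11434") {
  try {
    const response = await fetch(`${baseUrl}/api/tags`);
    if (!response.ok) {
      // Something answered, but not with a usable model list.
      return { reachable: true, modelInstalled: false };
    }
    return { reachable: true, modelInstalled: hasModel(await response.json(), model) };
  } catch {
    // Connection refused or similar: the server is probably not running.
    return { reachable: false, modelInstalled: false };
  }
}

// Example: checkOllama().then((status) => console.log(status));
```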

## 📁 Project Structure

```text
linkedin-parser/
├── index.js              # Main parser with integrated AI analysis
├── package.json          # Dependencies and scripts
├── .env                  # Configuration (create this)
├── keywords/             # Keyword CSV files
└── results/              # Output files (created automatically)
    ├── linkedin-results-*.json    # Results with integrated AI analysis
    └── linkedin-rejected-*.json   # Rejected posts
```

## 🤝 Integration

This parser integrates with:

- **ai-analyzer**: Core AI utilities and CLI analysis tool
- **job-search-parser**: Job market intelligence (separate module)

### AI Analysis Package

The `ai-analyzer` package provides:

- **Library functions**: `analyzeBatch`, `checkOllamaStatus`, etc.
- **CLI tool**: `cli.js` for standalone analysis
- **Reusable components**: For other parsers in the ecosystem

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **Rich Context**: Each post carries a relevance flag, confidence score, reasoning, context, model name, and analysis timestamp
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved