# AI Analyzer - Core Utilities Package
Shared utilities and core functionality used by all LinkedOut parsers. This package provides consistent logging, text processing, location validation, AI integration, and a **command-line interface for AI analysis**.
## 🎯 Purpose
The AI Analyzer serves as the foundation for all LinkedOut components, providing:
- **Consistent Logging**: Unified logging system across all parsers
- **Text Processing**: Keyword matching, content cleaning, and analysis
- **Location Validation**: Geographic filtering and location intelligence
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers and mocks
## 📦 Components
### 1. Logger (`src/logger.js`)
Configurable logging system with color support and level controls.
```javascript
const { logger } = require("ai-analyzer");
// Basic logging
logger.info("Processing started");
logger.warning("Rate limit approaching");
logger.error("Connection failed");
// Convenience methods with emoji prefixes
logger.step("🚀 Starting scrape");
logger.search("🔍 Searching for keywords");
logger.ai("🧠 Running AI analysis");
logger.location("📍 Validating location");
logger.file("📄 Saving results");
```
**Features:**
- Configurable log levels (debug, info, warning, error, success)
- Color-coded output with chalk
- Emoji prefixes for better UX
- Silent mode for production
- Timestamp formatting
### 2. Text Utilities (`src/text-utils.js`)
Text processing and keyword matching utilities.
```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");
// Clean text content
const cleaned = cleanText(
  "Check out this #awesome post! https://example.com 🚀"
);
// Result: "Check out this awesome post!"
// Check for keyword matches
const keywords = ["layoff", "downsizing", "RIF"];
const hasMatch = containsAnyKeyword(text, keywords);
```
**Features:**
- Remove hashtags, URLs, and emojis
- Case-insensitive keyword matching
- Multiple keyword detection
- Text normalization
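The cleaning and matching behavior above can be approximated with plain regular expressions. The sketch below is illustrative only, not the package's source; `cleanTextSketch` and `containsAnyKeywordSketch` are hypothetical names:

```javascript
// Hypothetical approximation of the text utilities (not the package source)
function cleanTextSketch(text) {
  return text
    .replace(/https?:\/\/\S+/g, "") // strip URLs
    .replace(/#(\w+)/g, "$1") // keep the hashtag word, drop the '#'
    .replace(/[\u{1F300}-\u{1FAFF}]/gu, "") // drop common emoji code points
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

function containsAnyKeywordSketch(text, keywords) {
  // Case-insensitive substring match against any of the keywords
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw.toLowerCase()));
}
```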
### 3. Location Utilities (`src/location-utils.js`)
Geographic location validation and filtering.
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
} = require("ai-analyzer");
// Parse location filter string
const filters = parseLocationFilters("Ontario,Manitoba,Toronto");
// Validate location against filters
const isValid = validateLocationAgainstFilters(
  "Toronto, Ontario, Canada",
  filters
);
// Extract location from profile text
const location = extractLocationFromProfile(
  "Software Engineer at Tech Corp • Toronto, Ontario"
);
```
**Features:**
- Geographic filter parsing
- Location validation against 200+ Canadian cities
- Profile location extraction
- Smart location matching
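A minimal sketch of how filter parsing and matching could work (hypothetical helper names; the real implementation also draws on its list of 200+ Canadian cities):

```javascript
// Hypothetical sketch of filter parsing and matching (not the package source)
function parseLocationFiltersSketch(filterString) {
  return filterString
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .filter(Boolean); // drop empty entries
}

function validateLocationSketch(location, filters) {
  if (filters.length === 0) return true; // no filters => accept everything
  const loc = location.toLowerCase();
  return filters.some((filter) => loc.includes(filter));
}
```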
### 4. AI Utilities (`src/ai-utils.js`)
AI-powered content analysis with **integrated results**.
```javascript
const { analyzeBatch, checkOllamaStatus } = require("ai-analyzer");
// Check AI availability
const aiAvailable = await checkOllamaStatus("mistral");
// Analyze posts with AI (returns analysis results)
const analysis = await analyzeBatch(posts, "job market analysis", "mistral");
// Integrate AI analysis into results
const resultsWithAI = posts.map((post, index) => ({
  ...post,
  aiAnalysis: {
    isRelevant: analysis[index].isRelevant,
    confidence: analysis[index].confidence,
    reasoning: analysis[index].reasoning,
    context: "job market analysis",
    model: "mistral",
    analyzedAt: new Date().toISOString(),
  },
}));
```
**Features:**
- Ollama integration for local AI
- Batch processing for efficiency
- Confidence scoring
- Context-aware analysis
- **Integrated results**: AI analysis embedded in data structure
### 5. CLI Tool (`cli.js`)
Command-line interface for standalone AI analysis.
```bash
# Analyze latest results file
node cli.js --latest --dir=results
# Analyze specific file
node cli.js --input=results.json
# Analyze with custom context
node cli.js --input=results.json --context="layoff analysis"
# Analyze with different model
node cli.js --input=results.json --model=mistral
# Show help
node cli.js --help
```
**Features:**
- **Integrated Analysis**: AI results embedded back into original JSON
- **Flexible Input**: Support for various JSON formats
- **Context Switching**: Easy re-analysis with different contexts
- **Model Selection**: Choose different Ollama models
- **Directory Support**: Specify results directory with `--dir`
### 6. Test Utilities (`src/test-utils.js`)
Shared testing helpers and mocks.
```javascript
const { createMockPost, createMockProfile } = require("ai-analyzer");
// Create test data
const mockPost = createMockPost({
  content: "Test post content",
  author: "John Doe",
  location: "Toronto, Ontario",
});
```
## 🚀 Installation
```bash
# Install dependencies
npm install
# Run tests
npm test
# Run specific test suites
npm test -- --testNamePattern="Logger"
```
## 📋 CLI Reference
### Basic Usage
```bash
# Analyze latest results file
node cli.js --latest --dir=results
# Analyze specific file
node cli.js --input=results.json
# Analyze with custom output
node cli.js --input=results.json --output=analysis.json
```
### Options
```bash
--input=FILE # Input JSON file
--output=FILE # Output file (default: original-ai.json)
--context="description" # Analysis context (default: "job market analysis and trends")
--model=MODEL # Ollama model (default: mistral)
--latest # Use latest results file from directory
--dir=PATH # Directory to look for results (default: 'results')
--help, -h # Show help
```
### Examples
```bash
# Analyze latest LinkedIn results
cd linkedin-parser
node ../ai-analyzer/cli.js --latest --dir=results
# Analyze with layoff context
node cli.js --input=results.json --context="layoff analysis"
# Analyze with different model
node cli.js --input=results.json --model=llama3
# Analyze from project root
node ai-analyzer/cli.js --latest --dir=linkedin-parser/results
```
### Output Format
The CLI integrates AI analysis directly into the original JSON structure:
```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisUpdated": "2025-07-21T02:48:42.487Z",
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```
## 📋 API Reference
### Logger Class
```javascript
const { Logger } = require("ai-analyzer");
// Create custom logger
const logger = new Logger({
  debug: false,
  colors: true,
});
// Configure levels
logger.setLevel("debug", true);
logger.silent(); // Disable all logging
logger.verbose(); // Enable all logging
```
### Text Processing
```javascript
const { cleanText, containsAnyKeyword } = require('ai-analyzer');
// Clean text
cleanText(text: string): string
// Check keywords
containsAnyKeyword(text: string, keywords: string[]): boolean
```
### Location Validation
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile
} = require('ai-analyzer');
// Parse filters
parseLocationFilters(filterString: string): string[]
// Validate location
validateLocationAgainstFilters(location: string, filters: string[]): boolean
// Extract from profile
extractLocationFromProfile(profileText: string): string | null
```
### AI Analysis
```javascript
const { analyzeBatch, checkOllamaStatus, findLatestResultsFile } = require('ai-analyzer');
// Check AI availability
checkOllamaStatus(model?: string, ollamaHost?: string): Promise<boolean>
// Analyze posts
analyzeBatch(posts: Post[], context: string, model?: string): Promise<AnalysisResult[]>
// Find latest results file
findLatestResultsFile(resultsDir?: string): string
```
## 🧪 Testing
### Run All Tests
```bash
npm test
```
### Test Coverage
```bash
npm run test:coverage
```
### Specific Test Suites
```bash
# Logger tests
npm test -- --testNamePattern="Logger"
# Text utilities tests
npm test -- --testNamePattern="Text"
# Location utilities tests
npm test -- --testNamePattern="Location"
# AI utilities tests
npm test -- --testNamePattern="AI"
```
## 🔧 Configuration
### Environment Variables
```env
# AI Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=mistral
AI_CONTEXT="job market analysis and trends"
# Logging Configuration
LOG_LEVEL=info
LOG_COLORS=true
# Location Configuration
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
```
### Logger Configuration
```javascript
const logger = new Logger({
  debug: true, // Enable debug logging
  info: true, // Enable info logging
  warning: true, // Enable warning logging
  error: true, // Enable error logging
  success: true, // Enable success logging
  colors: true, // Enable color output
});
```
## 📊 Usage Examples
### Basic Logging Setup
```javascript
const { logger } = require("ai-analyzer");
// Configure for production
if (process.env.NODE_ENV === "production") {
  logger.setLevel("debug", false);
  logger.setLevel("info", true);
}
// Use throughout your application
logger.step("Starting LinkedIn scrape");
logger.info("Found 150 posts");
logger.warning("Rate limit approaching");
logger.success("Scraping completed successfully");
```
### Text Processing Pipeline
```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");
function processPost(post) {
  // Clean the content
  const cleanedContent = cleanText(post.content);
  // Check for keywords
  const keywords = ["layoff", "downsizing", "RIF"];
  const hasKeywords = containsAnyKeyword(cleanedContent, keywords);
  return {
    ...post,
    cleanedContent,
    hasKeywords,
  };
}
```
### Location Validation
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
} = require("ai-analyzer");
// Setup location filtering
const locationFilters = parseLocationFilters("Ontario,Manitoba,Toronto");
// Validate each post
function validatePost(post) {
  const isValidLocation = validateLocationAgainstFilters(
    post.author.location,
    locationFilters
  );
  return isValidLocation ? post : null;
}
```
### AI Analysis Integration
```javascript
const { analyzeBatch, checkOllamaStatus, logger } = require("ai-analyzer");
async function analyzePosts(posts) {
  try {
    // Check AI availability
    const aiAvailable = await checkOllamaStatus("mistral");
    if (!aiAvailable) {
      logger.warning("AI not available - skipping analysis");
      return posts;
    }
    // Run AI analysis
    const analysis = await analyzeBatch(posts, "job market analysis", "mistral");
    // Integrate AI analysis into results
    const resultsWithAI = posts.map((post, index) => ({
      ...post,
      aiAnalysis: {
        isRelevant: analysis[index].isRelevant,
        confidence: analysis[index].confidence,
        reasoning: analysis[index].reasoning,
        context: "job market analysis",
        model: "mistral",
        analyzedAt: new Date().toISOString(),
      },
    }));
    return resultsWithAI;
  } catch (error) {
    logger.error("AI analysis failed:", error.message);
    return posts; // Return original posts if AI fails
  }
}
```
### CLI Integration
```javascript
// In your parser's package.json scripts
{
  "scripts": {
    "analyze:latest": "node ../ai-analyzer/cli.js --latest --dir=results",
    "analyze:layoff": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"layoff analysis\"",
    "analyze:trends": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"job market trends\""
  }
}
```
## 🔒 Security & Best Practices
### Credential Management
- Store API keys in environment variables
- Never commit sensitive data to version control
- Use `.env` files for local development
### Rate Limiting
- Implement delays between AI API calls
- Respect service provider rate limits
- Use batch processing to minimize requests
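The chunk-and-delay pattern might look like the sketch below; `analyzeInChunks` is a hypothetical wrapper, `analyzeFn` stands in for `analyzeBatch`, and the chunk size and delay are illustrative values, not package defaults:

```javascript
// Hypothetical rate-limited wrapper; chunk size and delay are assumptions
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function analyzeInChunks(analyzeFn, posts, context, model, chunkSize = 10, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < posts.length; i += chunkSize) {
    const chunk = posts.slice(i, i + chunkSize);
    results.push(...(await analyzeFn(chunk, context, model)));
    // Pause between batches to stay under rate limits
    if (i + chunkSize < posts.length) await sleep(delayMs);
  }
  return results;
}
```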
### Error Handling
- Always wrap AI calls in try-catch blocks
- Provide fallback behavior when services fail
- Log errors with appropriate detail levels
## 🤝 Contributing
### Development Setup
1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request
### Code Standards
- Follow existing code style
- Add JSDoc comments for all functions
- Maintain test coverage above 90%
- Update documentation for new features
## 📄 License
This package is part of the LinkedOut platform and follows the same licensing terms.
---
**Note**: This package is designed to be used as a dependency by other LinkedOut components. Apart from the CLI tool, which can be run on its own, its utilities are not intended for standalone use.