# AI Analyzer - Core Utilities Package
Shared utilities and core functionality used by all LinkedOut parsers. This package provides consistent logging, text processing, location validation, AI integration, and a **command-line interface for AI analysis**.
## 🎯 Purpose
The AI Analyzer serves as the foundation for all LinkedOut components, providing:
- **Consistent Logging**: Unified logging system across all parsers
- **Text Processing**: Keyword matching, content cleaning, and analysis
- **Location Validation**: Geographic filtering and location intelligence
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers and mocks
## 📦 Components
### 1. Logger (`src/logger.js`)
Configurable logging system with color support and level controls.
```javascript
const { logger } = require("ai-analyzer");
// Basic logging
logger.info("Processing started");
logger.warning("Rate limit approaching");
logger.error("Connection failed");
// Convenience methods with emoji prefixes
logger.step("🚀 Starting scrape");
logger.search("🔍 Searching for keywords");
logger.ai("🧠 Running AI analysis");
logger.location("📍 Validating location");
logger.file("📄 Saving results");
```
**Features:**
- Configurable log levels (debug, info, warning, error, success)
- Color-coded output with chalk
- Emoji prefixes for better UX
- Silent mode for production
- Timestamp formatting
### 2. Text Utilities (`src/text-utils.js`)
Text processing and keyword matching utilities.
```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");
// Clean text content
const cleaned = cleanText(
  "Check out this #awesome post! https://example.com 🚀"
);
// Result: "Check out this awesome post!"
// Check for keyword matches
const keywords = ["layoff", "downsizing", "RIF"];
const hasMatch = containsAnyKeyword(text, keywords);
```
**Features:**
- Remove hashtags, URLs, and emojis
- Case-insensitive keyword matching
- Multiple keyword detection
- Text normalization
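The cleaning and matching behavior above can be approximated with plain regular expressions. The sketch below is illustrative only, not the package's source; `cleanTextSketch` and `containsAnyKeywordSketch` are hypothetical names:

```javascript
// Hypothetical approximation of the text utilities (not the package source)
function cleanTextSketch(text) {
  return text
    .replace(/https?:\/\/\S+/g, "") // strip URLs
    .replace(/#(\w+)/g, "$1") // keep the hashtag word, drop the '#'
    .replace(/[\u{1F300}-\u{1FAFF}]/gu, "") // drop common emoji code points
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

function containsAnyKeywordSketch(text, keywords) {
  // Case-insensitive substring match against any of the keywords
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw.toLowerCase()));
}
```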
### 3. Location Utilities (`src/location-utils.js`)
Geographic location validation and filtering.
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
} = require("ai-analyzer");
// Parse location filter string
const filters = parseLocationFilters("Ontario,Manitoba,Toronto");
// Validate location against filters
const isValid = validateLocationAgainstFilters(
  "Toronto, Ontario, Canada",
  filters
);
// Extract location from profile text
const location = extractLocationFromProfile(
  "Software Engineer at Tech Corp • Toronto, Ontario"
);
```
**Features:**
- Geographic filter parsing
- Location validation against 200+ Canadian cities
- Profile location extraction
- Smart location matching
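A minimal sketch of how filter parsing and matching could work (hypothetical helper names; the real implementation also draws on its list of 200+ Canadian cities):

```javascript
// Hypothetical sketch of filter parsing and matching (not the package source)
function parseLocationFiltersSketch(filterString) {
  return filterString
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .filter(Boolean); // drop empty entries
}

function validateLocationSketch(location, filters) {
  if (filters.length === 0) return true; // no filters => accept everything
  const loc = location.toLowerCase();
  return filters.some((filter) => loc.includes(filter));
}
```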
### 4. AI Utilities (`src/ai-utils.js`)
AI-powered content analysis with **integrated results**.
```javascript
const { analyzeBatch, checkOllamaStatus } = require("ai-analyzer");
// Check AI availability
const aiAvailable = await checkOllamaStatus("mistral");
// Analyze posts with AI (returns analysis results)
const analysis = await analyzeBatch(posts, "job market analysis", "mistral");
// Integrate AI analysis into results
const resultsWithAI = posts.map((post, index) => ({
  ...post,
  aiAnalysis: {
    isRelevant: analysis[index].isRelevant,
    confidence: analysis[index].confidence,
    reasoning: analysis[index].reasoning,
    context: "job market analysis",
    model: "mistral",
    analyzedAt: new Date().toISOString(),
  },
}));
```
**Features:**
- Ollama integration for local AI
- Batch processing for efficiency
- Confidence scoring
- Context-aware analysis
- **Integrated results**: AI analysis embedded in data structure
### 5. CLI Tool (`cli.js`)
Command-line interface for standalone AI analysis.
```bash
# Analyze latest results file
node cli.js --latest --dir=results
# Analyze specific file
node cli.js --input=results.json
# Analyze with custom context
node cli.js --input=results.json --context="layoff analysis"
# Analyze with different model
node cli.js --input=results.json --model=mistral
# Show help
node cli.js --help
```
**Features:**
- **Integrated Analysis**: AI results embedded back into original JSON
- **Flexible Input**: Support for various JSON formats
- **Context Switching**: Easy re-analysis with different contexts
- **Model Selection**: Choose different Ollama models
- **Directory Support**: Specify results directory with `--dir`
### 6. Test Utilities (`src/test-utils.js`)
Shared testing helpers and mocks.
```javascript
const { createMockPost, createMockProfile } = require("ai-analyzer");
// Create test data
const mockPost = createMockPost({
  content: "Test post content",
  author: "John Doe",
  location: "Toronto, Ontario",
});
```
## 🚀 Installation
```bash
# Install dependencies
npm install
# Run tests
npm test
# Run specific test suites
npm test -- --testNamePattern="Logger"
```
## 📋 CLI Reference
### Basic Usage
```bash
# Analyze latest results file
node cli.js --latest --dir=results
# Analyze specific file
node cli.js --input=results.json
# Analyze with custom output
node cli.js --input=results.json --output=analysis.json
```
### Options
```bash
--input=FILE # Input JSON file
--output=FILE # Output file (default: original-ai.json)
--context="description" # Analysis context (default: "job market analysis and trends")
--model=MODEL # Ollama model (default: mistral)
--latest # Use latest results file from directory
--dir=PATH # Directory to look for results (default: 'results')
--help, -h # Show help
```
### Examples
```bash
# Analyze latest LinkedIn results
cd linkedin-parser
node ../ai-analyzer/cli.js --latest --dir=results
# Analyze with layoff context
node cli.js --input=results.json --context="layoff analysis"
# Analyze with different model
node cli.js --input=results.json --model=llama3
# Analyze from project root
node ai-analyzer/cli.js --latest --dir=linkedin-parser/results
```
### Output Format
The CLI integrates AI analysis directly into the original JSON structure:
```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisUpdated": "2025-07-21T02:48:42.487Z",
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```
## 📋 API Reference
### Logger Class
```javascript
const { Logger } = require("ai-analyzer");
// Create custom logger
const logger = new Logger({
  debug: false,
  colors: true,
});
// Configure levels
logger.setLevel("debug", true);
logger.silent(); // Disable all logging
logger.verbose(); // Enable all logging
```
### Text Processing
```javascript
const { cleanText, containsAnyKeyword } = require('ai-analyzer');
// Clean text
cleanText(text: string): string
// Check keywords
containsAnyKeyword(text: string, keywords: string[]): boolean
```
### Location Validation
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile
} = require('ai-analyzer');
// Parse filters
parseLocationFilters(filterString: string): string[]
// Validate location
validateLocationAgainstFilters(location: string, filters: string[]): boolean
// Extract from profile
extractLocationFromProfile(profileText: string): string | null
```
### AI Analysis
```javascript
const { analyzeBatch, checkOllamaStatus, findLatestResultsFile } = require('ai-analyzer');
// Check AI availability
checkOllamaStatus(model?: string, ollamaHost?: string): Promise<boolean>
// Analyze posts
analyzeBatch(posts: Post[], context: string, model?: string): Promise<AnalysisResult[]>
// Find latest results file
findLatestResultsFile(resultsDir?: string): string
```
## 🧪 Testing
### Run All Tests
```bash
npm test
```
### Test Coverage
```bash
npm run test:coverage
```
### Specific Test Suites
```bash
# Logger tests
npm test -- --testNamePattern="Logger"
# Text utilities tests
npm test -- --testNamePattern="Text"
# Location utilities tests
npm test -- --testNamePattern="Location"
# AI utilities tests
npm test -- --testNamePattern="AI"
```
## 🔧 Configuration
### Environment Variables
```env
# AI Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=mistral
AI_CONTEXT="job market analysis and trends"
# Logging Configuration
LOG_LEVEL=info
LOG_COLORS=true
# Location Configuration
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
```
### Logger Configuration
```javascript
const logger = new Logger({
  debug: true, // Enable debug logging
  info: true, // Enable info logging
  warning: true, // Enable warning logging
  error: true, // Enable error logging
  success: true, // Enable success logging
  colors: true, // Enable color output
});
```
## 📊 Usage Examples
### Basic Logging Setup
```javascript
const { logger } = require("ai-analyzer");
// Configure for production
if (process.env.NODE_ENV === "production") {
  logger.setLevel("debug", false);
  logger.setLevel("info", true);
}
// Use throughout your application
logger.step("Starting LinkedIn scrape");
logger.info("Found 150 posts");
logger.warning("Rate limit approaching");
logger.success("Scraping completed successfully");
```
### Text Processing Pipeline
```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");
function processPost(post) {
  // Clean the content
  const cleanedContent = cleanText(post.content);
  // Check for keywords
  const keywords = ["layoff", "downsizing", "RIF"];
  const hasKeywords = containsAnyKeyword(cleanedContent, keywords);
  return {
    ...post,
    cleanedContent,
    hasKeywords,
  };
}
```
### Location Validation
```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
} = require("ai-analyzer");
// Setup location filtering
const locationFilters = parseLocationFilters("Ontario,Manitoba,Toronto");
// Validate each post
function validatePost(post) {
  const isValidLocation = validateLocationAgainstFilters(
    post.author.location,
    locationFilters
  );
  return isValidLocation ? post : null;
}
```
### AI Analysis Integration
```javascript
const { analyzeBatch, checkOllamaStatus, logger } = require("ai-analyzer");
async function analyzePosts(posts) {
  try {
    // Check AI availability
    const aiAvailable = await checkOllamaStatus("mistral");
    if (!aiAvailable) {
      logger.warning("AI not available - skipping analysis");
      return posts;
    }
    // Run AI analysis
    const analysis = await analyzeBatch(posts, "job market analysis", "mistral");
    // Integrate AI analysis into results
    const resultsWithAI = posts.map((post, index) => ({
      ...post,
      aiAnalysis: {
        isRelevant: analysis[index].isRelevant,
        confidence: analysis[index].confidence,
        reasoning: analysis[index].reasoning,
        context: "job market analysis",
        model: "mistral",
        analyzedAt: new Date().toISOString(),
      },
    }));
    return resultsWithAI;
  } catch (error) {
    logger.error("AI analysis failed:", error.message);
    return posts; // Return original posts if AI fails
  }
}
```
### CLI Integration
```javascript
// In your parser's package.json scripts
{
  "scripts": {
    "analyze:latest": "node ../ai-analyzer/cli.js --latest --dir=results",
    "analyze:layoff": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"layoff analysis\"",
    "analyze:trends": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"job market trends\""
  }
}
```
## 🔒 Security & Best Practices
### Credential Management
- Store API keys in environment variables
- Never commit sensitive data to version control
- Use `.env` files for local development
### Rate Limiting
- Implement delays between AI API calls
- Respect service provider rate limits
- Use batch processing to minimize requests
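The chunk-and-delay pattern might look like the sketch below; `analyzeInChunks` is a hypothetical wrapper, `analyzeFn` stands in for `analyzeBatch`, and the chunk size and delay are illustrative values, not package defaults:

```javascript
// Hypothetical rate-limited wrapper; chunk size and delay are assumptions
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function analyzeInChunks(analyzeFn, posts, context, model, chunkSize = 10, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < posts.length; i += chunkSize) {
    const chunk = posts.slice(i, i + chunkSize);
    results.push(...(await analyzeFn(chunk, context, model)));
    // Pause between batches to stay under rate limits
    if (i + chunkSize < posts.length) await sleep(delayMs);
  }
  return results;
}
```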
### Error Handling
- Always wrap AI calls in try-catch blocks
- Provide fallback behavior when services fail
- Log errors with appropriate detail levels
## 🤝 Contributing
### Development Setup
1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request
### Code Standards
- Follow existing code style
- Add JSDoc comments for all functions
- Maintain test coverage above 90%
- Update documentation for new features
## 📄 License
This package is part of the LinkedOut platform and follows the same licensing terms.
---
**Note**: This package is designed to be used as a dependency by other LinkedOut components. Apart from the CLI tool, which can be run on its own, its utilities are not intended for standalone use.