# AI Analyzer - Core Utilities Package

Shared utilities and core functionality used by all LinkedOut parsers. This package provides consistent logging, text processing, location validation, AI integration, and a **command-line interface for AI analysis**.

## 🎯 Purpose

The AI Analyzer serves as the foundation for all LinkedOut components, providing:

- **Consistent Logging**: Unified logging system across all parsers
- **Text Processing**: Keyword matching, content cleaning, and analysis
- **Location Validation**: Geographic filtering and location intelligence
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers and mocks

## 📦 Components

### 1. Logger (`src/logger.js`)

Configurable logging system with color support and level controls.

```javascript
const { logger } = require("ai-analyzer");

// Basic logging
logger.info("Processing started");
logger.warning("Rate limit approaching");
logger.error("Connection failed");

// Convenience methods with emoji prefixes
logger.step("🚀 Starting scrape");
logger.search("🔍 Searching for keywords");
logger.ai("🧠 Running AI analysis");
logger.location("📍 Validating location");
logger.file("📄 Saving results");
```

**Features:**

- Configurable log levels (debug, info, warning, error, success)
- Color-coded output with chalk
- Emoji prefixes for better UX
- Silent mode for production
- Timestamp formatting

### 2. Text Utilities (`src/text-utils.js`)

Text processing and keyword matching utilities.

```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");

// Clean text content
const cleaned = cleanText(
  "Check out this #awesome post! https://example.com 🚀"
);
// Result: "Check out this awesome post!"

// Check for keyword matches
const keywords = ["layoff", "downsizing", "RIF"];
const hasMatch = containsAnyKeyword(text, keywords);
```

**Features:**

- Remove hashtags, URLs, and emojis
- Case-insensitive keyword matching
- Multiple keyword detection
- Text normalization

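The matching semantics can be illustrated with a minimal self-contained sketch. This is an approximation for illustration only, not the package's actual implementation in `src/text-utils.js`:

```javascript
// Sketch of case-insensitive multi-keyword matching
// (approximates containsAnyKeyword; the real implementation may differ).
function containsAnyKeywordSketch(text, keywords) {
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw.toLowerCase()));
}

console.log(containsAnyKeywordSketch("Major RIF announced today", ["layoff", "rif"])); // true
```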
### 3. Location Utilities (`src/location-utils.js`)

Geographic location validation and filtering.

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
} = require("ai-analyzer");

// Parse location filter string
const filters = parseLocationFilters("Ontario,Manitoba,Toronto");

// Validate location against filters
const isValid = validateLocationAgainstFilters(
  "Toronto, Ontario, Canada",
  filters
);

// Extract location from profile text
const location = extractLocationFromProfile(
  "Software Engineer at Tech Corp • Toronto, Ontario"
);
```

**Features:**

- Geographic filter parsing
- Location validation against 200+ Canadian cities
- Profile location extraction
- Smart location matching

### 4. AI Utilities (`src/ai-utils.js`)

AI-powered content analysis with **integrated results**.

```javascript
const { analyzeBatch, checkOllamaStatus } = require("ai-analyzer");

// Check AI availability
const aiAvailable = await checkOllamaStatus("mistral");

// Analyze posts with AI (returns analysis results)
const analysis = await analyzeBatch(posts, "job market analysis", "mistral");

// Integrate AI analysis into results
const resultsWithAI = posts.map((post, index) => ({
  ...post,
  aiAnalysis: {
    isRelevant: analysis[index].isRelevant,
    confidence: analysis[index].confidence,
    reasoning: analysis[index].reasoning,
    context: "job market analysis",
    model: "mistral",
    analyzedAt: new Date().toISOString(),
  },
}));
```

**Features:**

- Ollama integration for local AI
- Batch processing for efficiency
- Confidence scoring
- Context-aware analysis
- **Integrated results**: AI analysis embedded in the data structure

### 5. CLI Tool (`cli.js`)

Command-line interface for standalone AI analysis.

```bash
# Analyze latest results file
node cli.js --latest --dir=results

# Analyze specific file
node cli.js --input=results.json

# Analyze with custom context
node cli.js --input=results.json --context="layoff analysis"

# Analyze with a different model
node cli.js --input=results.json --model=mistral

# Show help
node cli.js --help
```

**Features:**

- **Integrated Analysis**: AI results embedded back into the original JSON
- **Flexible Input**: Support for various JSON formats
- **Context Switching**: Easy re-analysis with different contexts
- **Model Selection**: Choose different Ollama models
- **Directory Support**: Specify the results directory with `--dir`

### 6. Test Utilities (`src/test-utils.js`)

Shared testing helpers and mocks.

```javascript
const { createMockPost, createMockProfile } = require("ai-analyzer");

// Create test data
const mockPost = createMockPost({
  content: "Test post content",
  author: "John Doe",
  location: "Toronto, Ontario",
});
```

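Mock factories like this typically merge caller overrides into a set of defaults. A self-contained sketch of that pattern follows; the field names and default values here are assumptions for illustration, not the package's actual defaults:

```javascript
// Sketch of a defaults-plus-overrides mock factory
// (illustrative only; createMockPost's real defaults may differ).
function createMockPostSketch(overrides = {}) {
  const defaults = {
    content: "Sample post content",
    author: "Test User",
    location: "Toronto, Ontario",
    timestamp: new Date(0).toISOString(),
  };
  // Overrides win; untouched fields keep their defaults.
  return { ...defaults, ...overrides };
}

const post = createMockPostSketch({ author: "John Doe" });
// post.author is "John Doe"; post.location keeps its default
```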
## 🚀 Installation

```bash
# Install dependencies
npm install

# Run tests
npm test

# Run specific test suites
npm test -- --testNamePattern="Logger"
```

## 📋 CLI Reference

### Basic Usage

```bash
# Analyze latest results file
node cli.js --latest --dir=results

# Analyze specific file
node cli.js --input=results.json

# Analyze with custom output
node cli.js --input=results.json --output=analysis.json
```

### Options

```bash
--input=FILE            # Input JSON file
--output=FILE           # Output file (default: original-ai.json)
--context="description" # Analysis context (default: "job market analysis and trends")
--model=MODEL           # Ollama model (default: mistral)
--latest                # Use latest results file from directory
--dir=PATH              # Directory to look for results (default: 'results')
--help, -h              # Show help
```

### Examples

```bash
# Analyze latest LinkedIn results
cd linkedin-parser
node ../ai-analyzer/cli.js --latest --dir=results

# Analyze with layoff context
node cli.js --input=results.json --context="layoff analysis"

# Analyze with a different model
node cli.js --input=results.json --model=llama3

# Analyze from the project root
node ai-analyzer/cli.js --latest --dir=linkedin-parser/results
```

### Output Format

The CLI integrates AI analysis directly into the original JSON structure:

```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisUpdated": "2025-07-21T02:48:42.487Z",
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```

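Downstream code can filter the integrated structure by relevance and confidence. A sketch against the shape shown above, using a hard-coded sample object; the 0.8 cutoff is an arbitrary example, not a package default:

```javascript
// Filter integrated results to high-confidence relevant posts.
const data = {
  results: [
    { keyword: "layoff", aiAnalysis: { isRelevant: true, confidence: 0.9 } },
    { keyword: "hiring", aiAnalysis: { isRelevant: false, confidence: 0.7 } },
  ],
};

const relevant = data.results.filter(
  (r) => r.aiAnalysis && r.aiAnalysis.isRelevant && r.aiAnalysis.confidence >= 0.8
);

console.log(relevant.length); // 1
```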
## 📋 API Reference

### Logger Class

```javascript
const { Logger } = require("ai-analyzer");

// Create custom logger
const logger = new Logger({
  debug: false,
  colors: true,
});

// Configure levels
logger.setLevel("debug", true);
logger.silent(); // Disable all logging
logger.verbose(); // Enable all logging
```

### Text Processing

```javascript
const { cleanText, containsAnyKeyword } = require('ai-analyzer');

// Clean text
cleanText(text: string): string

// Check keywords
containsAnyKeyword(text: string, keywords: string[]): boolean
```

### Location Validation

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile
} = require('ai-analyzer');

// Parse filters
parseLocationFilters(filterString: string): string[]

// Validate location
validateLocationAgainstFilters(location: string, filters: string[]): boolean

// Extract from profile
extractLocationFromProfile(profileText: string): string | null
```

### AI Analysis

```javascript
const { analyzeBatch, checkOllamaStatus, findLatestResultsFile } = require('ai-analyzer');

// Check AI availability
checkOllamaStatus(model?: string, ollamaHost?: string): Promise<boolean>

// Analyze posts
analyzeBatch(posts: Post[], context: string, model?: string): Promise<AnalysisResult[]>

// Find latest results file
findLatestResultsFile(resultsDir?: string): string
```

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Test Coverage

```bash
npm run test:coverage
```

### Specific Test Suites

```bash
# Logger tests
npm test -- --testNamePattern="Logger"

# Text utilities tests
npm test -- --testNamePattern="Text"

# Location utilities tests
npm test -- --testNamePattern="Location"

# AI utilities tests
npm test -- --testNamePattern="AI"
```

## 🔧 Configuration

### Environment Variables

```env
# AI Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=mistral
AI_CONTEXT="job market analysis and trends"

# Logging Configuration
LOG_LEVEL=info
LOG_COLORS=true

# Location Configuration
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
```
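Consumers would typically read these variables through `process.env`, falling back to the defaults shown above. A sketch of that resolution; the config object shape is an assumption, and the package may resolve these differently internally:

```javascript
// Sketch: resolve configuration from the environment with documented defaults.
const config = {
  ollamaHost: process.env.OLLAMA_HOST || "http://localhost:11434",
  ollamaModel: process.env.OLLAMA_MODEL || "mistral",
  aiContext: process.env.AI_CONTEXT || "job market analysis and trends",
  // "Ontario,Manitoba" becomes ["Ontario", "Manitoba"]
  locationFilters: (process.env.LOCATION_FILTER || "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean),
  // Location check stays on unless explicitly disabled.
  locationCheck: process.env.ENABLE_LOCATION_CHECK !== "false",
};
```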

### Logger Configuration

```javascript
const { Logger } = require("ai-analyzer");

const logger = new Logger({
  debug: true, // Enable debug logging
  info: true, // Enable info logging
  warning: true, // Enable warning logging
  error: true, // Enable error logging
  success: true, // Enable success logging
  colors: true, // Enable color output
});
```

## 📊 Usage Examples

### Basic Logging Setup

```javascript
const { logger } = require("ai-analyzer");

// Configure for production
if (process.env.NODE_ENV === "production") {
  logger.setLevel("debug", false);
  logger.setLevel("info", true);
}

// Use throughout your application
logger.step("Starting LinkedIn scrape");
logger.info("Found 150 posts");
logger.warning("Rate limit approaching");
logger.success("Scraping completed successfully");
```

### Text Processing Pipeline

```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");

function processPost(post) {
  // Clean the content
  const cleanedContent = cleanText(post.content);

  // Check for keywords
  const keywords = ["layoff", "downsizing", "RIF"];
  const hasKeywords = containsAnyKeyword(cleanedContent, keywords);

  return {
    ...post,
    cleanedContent,
    hasKeywords,
  };
}
```

### Location Validation

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
} = require("ai-analyzer");

// Set up location filtering
const locationFilters = parseLocationFilters("Ontario,Manitoba,Toronto");

// Validate each post
function validatePost(post) {
  const isValidLocation = validateLocationAgainstFilters(
    post.author.location,
    locationFilters
  );

  return isValidLocation ? post : null;
}
```

### AI Analysis Integration

```javascript
const { analyzeBatch, checkOllamaStatus, logger } = require("ai-analyzer");

async function analyzePosts(posts) {
  try {
    // Check AI availability
    const aiAvailable = await checkOllamaStatus("mistral");
    if (!aiAvailable) {
      logger.warning("AI not available - skipping analysis");
      return posts;
    }

    // Run AI analysis
    const analysis = await analyzeBatch(
      posts,
      "job market analysis",
      "mistral"
    );

    // Integrate AI analysis into results
    const resultsWithAI = posts.map((post, index) => ({
      ...post,
      aiAnalysis: {
        isRelevant: analysis[index].isRelevant,
        confidence: analysis[index].confidence,
        reasoning: analysis[index].reasoning,
        context: "job market analysis",
        model: "mistral",
        analyzedAt: new Date().toISOString(),
      },
    }));

    return resultsWithAI;
  } catch (error) {
    logger.error("AI analysis failed:", error.message);
    return posts; // Return original posts if AI fails
  }
}
```

### CLI Integration

Add the CLI to your parser's `package.json` scripts:

```json
{
  "scripts": {
    "analyze:latest": "node ../ai-analyzer/cli.js --latest --dir=results",
    "analyze:layoff": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"layoff analysis\"",
    "analyze:trends": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"job market trends\""
  }
}
```

## 🔒 Security & Best Practices

### Credential Management

- Store API keys in environment variables
- Never commit sensitive data to version control
- Use `.env` files for local development

### Rate Limiting

- Implement delays between AI API calls
- Respect service provider rate limits
- Use batch processing to minimize requests
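The delay-between-calls advice can be implemented with a simple sleep between batches. A generic sketch, not the package's internal throttling; the batch size and delay are arbitrary parameters:

```javascript
// Generic sketch: process items in batches with a pause between AI calls.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Split an array into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `handler` over each batch, pausing between batches to respect rate limits.
async function processWithDelay(items, size, delayMs, handler) {
  const out = [];
  for (const batch of chunk(items, size)) {
    out.push(...(await handler(batch)));
    await sleep(delayMs);
  }
  return out;
}
```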

### Error Handling

- Always wrap AI calls in try-catch blocks
- Provide fallback behavior when services fail
- Log errors with appropriate detail levels

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Code Standards

- Follow the existing code style
- Add JSDoc comments for all functions
- Maintain test coverage above 90%
- Update documentation for new features

## 📄 License

This package is part of the LinkedOut platform and follows the same licensing terms.

---

**Note**: This package is designed to be used as a dependency by other LinkedOut components. It provides the core utilities and the CLI tool; aside from the CLI, it is not intended to be used standalone.