Add initial project structure for Job Market Intelligence platform
- Created core modules: `ai-analyzer`, `core-parser`, and `job-search-parser`.
- Implemented LinkedIn and job search parsers with integrated AI analysis.
- Added CLI tools for AI analysis and job parsing.
- Included comprehensive README files for each module detailing usage and features.
- Established a `.gitignore` file to exclude unnecessary files.
- Introduced sample data for testing and demonstration purposes.
- Set up package.json files for dependency management across modules.
- Implemented logging and error handling utilities for better debugging and user feedback.
commit 8de65bc04c

**.gitignore** (new file, 17 lines, vendored)
```
.vscode/
*.md
!README.md
node_modules/
.env
results/
zip*
*.7z
*obfuscated.js
.history
# Debug files
debug-*.js
debug-*.png
*.png
*.log
# Install scripts (optional - remove if you want to commit)
install-ollama.sh
```

**README.md** (new file, 406 lines)

# Job Market Intelligence Platform

A comprehensive platform for job market intelligence with **integrated AI-powered insights**. Built with a modular architecture for extensibility and maintainability.

## 🏗️ Architecture Overview

```
job-market-intelligence/
├── ai-analyzer/        # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/    # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/  # Job search intelligence
└── docs/               # Documentation
```

## 🚀 Quick Start

### Prerequisites

- Node.js 18+
- Playwright browser automation
- LinkedIn account credentials
- Optional: Ollama for local AI analysis

### Installation

```bash
npm install
npx playwright install chromium
```

### Basic Usage

```bash
# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start

# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom

# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai

# Run job search parser
cd job-search-parser && npm start

# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest

# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff

# Run demo workflow
node demo.js
```

## 📦 Core Components

### 1. AI Analyzer (`ai-analyzer/`)

**Shared utilities and CLI tool used by all parsers**

- **Logger**: Consistent logging across all components
- **Text Processing**: Keyword matching, text cleaning
- **Location Validation**: Geographic filtering and validation
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers

**Key Features:**

- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- **Integrated AI analysis**: AI results embedded in the data structure
- **CLI tool**: Standalone analysis with flexible options
- Comprehensive test coverage

### 2. LinkedIn Parser (`linkedin-parser/`)

**Specialized LinkedIn content scraper with integrated AI analysis**

- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- **Automatic AI analysis integrated into results**
- Configurable search parameters

**Key Features:**

- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- **Integrated AI-powered content relevance analysis**
- **Single JSON output with embedded AI insights**
- **Two output files: results (with AI) and rejected posts**

### 3. Job Search Parser (`job-search-parser/`)

**Job market intelligence and analysis**

- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights

**Key Features:**

- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence

### 4. AI Analysis CLI (`ai-analyzer/cli.js`)

**Command-line tool for AI analysis of any results JSON file**

- Analyze any results JSON file from the LinkedIn parser or other sources
- **Integrated analysis**: AI results embedded back into the original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support

**Key Features:**

- Works with any JSON results file
- **Integrated output**: AI analysis embedded in the original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the root directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Keywords
KEYWORDS=keywords-layoff.csv
```

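These are plain environment variables, so loading them needs no special machinery. A minimal sketch of reading the search settings with sensible defaults — the actual parsers may load them differently (e.g. via `dotenv`), and the helper name here is illustrative:

```javascript
// Hypothetical config loader: reads the variables from the .env above
// (exposed through process.env) and applies defaults where values are unset.
function loadConfig(env = process.env) {
  return {
    city: env.CITY || "Toronto",
    datePosted: env.DATE_POSTED || "past-week",
    sortBy: env.SORT_BY || "date_posted",
    // Comma-separated filter string -> array of trimmed names
    locationFilter: (env.LOCATION_FILTER || "")
      .split(",")
      .map((s) => s.trim())
      .filter(Boolean),
    enableLocationCheck: env.ENABLE_LOCATION_CHECK !== "false",
    enableAI: env.ENABLE_AI_ANALYSIS !== "false",
    aiContext: env.AI_CONTEXT || "job market analysis and trends",
    ollamaModel: env.OLLAMA_MODEL || "mistral",
  };
}

module.exports = { loadConfig };
```
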
### Command Line Options

```bash
# LinkedIn Parser Options
--headless=true|false    # Browser headless mode
--keyword="kw1,kw2"      # Specific keywords
--add-keyword="kw1,kw2"  # Additional keywords
--no-location            # Disable location filtering
--no-ai                  # Disable AI analysis

# Job Search Parser Options
--help                   # Show parser-specific help

# AI Analysis CLI Options
--input=FILE             # Input JSON file
--output=FILE            # Output file
--context="description"  # Custom AI analysis context
--model=MODEL            # Ollama model
--latest                 # Use latest results file
--dir=PATH               # Directory to look for results
```

## 📊 Output Formats

### LinkedIn Parser Output

The LinkedIn parser now generates **two main files** with **integrated AI analysis**:

#### 1. Main Results with AI Analysis (`linkedin-results-YYYY-MM-DD-HH-MM.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
```

#### 2. Rejected Posts (`linkedin-rejected-YYYY-MM-DD-HH-MM.json`)

```json
[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
```

### AI Analysis CLI Output

The CLI tool creates **integrated results** with AI analysis embedded:

#### Re-analyzed Results (`original-filename-ai.json`)

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
```

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Run Specific Test Suites

```bash
# AI Analyzer tests
cd ai-analyzer && npm test

# LinkedIn Parser tests
cd linkedin-parser && npm test

# Job Search Parser tests
cd job-search-parser && npm test
```

## 🔒 Security & Legal

### Security Best Practices

- Store credentials in a `.env` file (never commit it)
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service

### Legal Compliance

- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies

## 🚀 Advanced Features

### AI-Powered Analysis

- **Local AI**: Ollama integration for privacy
- **Integrated Analysis**: AI results embedded in the data structure
- **Automatic Analysis**: Runs after parsing completes
- **Context Analysis**: Relevance scoring
- **Confidence Scoring**: AI confidence levels for each post
- **CLI Tool**: Standalone analysis with flexible options

### Geographic Intelligence

- **Location Validation**: Profile location verification
- **Regional Filtering**: City/state/country filtering
- **Geographic Analysis**: Location-based insights

### Data Processing

- **Duplicate Detection**: Intelligent deduplication
- **Content Cleaning**: Remove hashtags, URLs, emojis
- **Metadata Extraction**: Author, engagement, timing data
- **Integrated AI**: AI insights embedded in each result

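As an illustration of the duplicate-detection step, here is a minimal sketch; the shipped implementation may key posts differently, but the idea is to collapse reposts by profile link plus normalized text:

```javascript
// Hedged sketch of deduplication: posts with the same profile link and the
// same whitespace/case-normalized text are treated as duplicates.
function dedupePosts(posts) {
  const seen = new Set();
  const unique = [];
  for (const post of posts) {
    const normalizedText = (post.text || "")
      .toLowerCase()
      .replace(/\s+/g, " ")
      .trim();
    const key = `${post.profileLink}|${normalizedText}`;
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(post); // first occurrence wins
    }
  }
  return unique;
}

module.exports = { dedupePosts };
```
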
## 📈 Performance Optimization

### Recommended Settings

- **Headless Mode**: Faster execution
- **Location Filtering**: Reduces false positives
- **AI Analysis**: Improves result quality (enabled by default)
- **Batch Processing**: Efficient data handling

### Monitoring

- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Code Standards

- Follow the existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use it responsibly.

## 🆘 Support

### Common Issues

- **Browser Issues**: Ensure Playwright is installed
- **Login Problems**: Check credentials in `.env`
- **Rate Limiting**: Implement delays between requests
- **Location Filtering**: Verify the location filter format
- **AI Analysis**: Ensure Ollama is running for AI features

### Getting Help

- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **CLI Tool**: Standalone AI analysis with flexible options
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved

---

**Note**: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.

**ai-analyzer/README.md** (new file, 558 lines)

# AI Analyzer - Core Utilities Package

Shared utilities and core functionality used by all LinkedOut parsers. This package provides consistent logging, text processing, location validation, AI integration, and a **command-line interface for AI analysis**.

## 🎯 Purpose

The AI Analyzer serves as the foundation for all LinkedOut components, providing:

- **Consistent Logging**: Unified logging system across all parsers
- **Text Processing**: Keyword matching, content cleaning, and analysis
- **Location Validation**: Geographic filtering and location intelligence
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers and mocks

## 📦 Components

### 1. Logger (`src/logger.js`)

Configurable logging system with color support and level controls.

```javascript
const { logger } = require("ai-analyzer");

// Basic logging
logger.info("Processing started");
logger.warning("Rate limit approaching");
logger.error("Connection failed");

// Convenience methods with emoji prefixes
logger.step("🚀 Starting scrape");
logger.search("🔍 Searching for keywords");
logger.ai("🧠 Running AI analysis");
logger.location("📍 Validating location");
logger.file("📄 Saving results");
```

**Features:**

- Configurable log levels (debug, info, warning, error, success)
- Color-coded output with chalk
- Emoji prefixes for better UX
- Silent mode for production
- Timestamp formatting

### 2. Text Utilities (`src/text-utils.js`)

Text processing and keyword matching utilities.

```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");

// Clean text content
const cleaned = cleanText(
  "Check out this #awesome post! https://example.com 🚀"
);
// Result: "Check out this awesome post!"

// Check for keyword matches
const keywords = ["layoff", "downsizing", "RIF"];
const hasMatch = containsAnyKeyword(text, keywords);
```

**Features:**

- Remove hashtags, URLs, and emojis
- Case-insensitive keyword matching
- Multiple keyword detection
- Text normalization

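For readers curious what these helpers might look like internally, here is a hedged sketch; the shipped `src/text-utils.js` may differ in its regexes and edge-case handling:

```javascript
// Illustrative implementations only, not the package source.
function cleanText(text) {
  return text
    .replace(/https?:\/\/\S+/g, "") // strip URLs
    .replace(/#(\w+)/g, "$1") // drop the '#' but keep the word
    .replace(/[\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]/gu, "") // strip common emoji
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

function containsAnyKeyword(text, keywords) {
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw.toLowerCase()));
}

module.exports = { cleanText, containsAnyKeyword };
```
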
### 3. Location Utilities (`src/location-utils.js`)

Geographic location validation and filtering.

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
} = require("ai-analyzer");

// Parse location filter string
const filters = parseLocationFilters("Ontario,Manitoba,Toronto");

// Validate location against filters
const isValid = validateLocationAgainstFilters(
  "Toronto, Ontario, Canada",
  filters
);

// Extract location from profile text
const location = extractLocationFromProfile(
  "Software Engineer at Tech Corp • Toronto, Ontario"
);
```

**Features:**

- Geographic filter parsing
- Location validation against 200+ Canadian cities
- Profile location extraction
- Smart location matching

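A hedged sketch of the filter-parsing and validation logic; the shipped `src/location-utils.js` additionally matches against its built-in city list, which is omitted here:

```javascript
// Illustrative only: parse a comma-separated filter string and check whether
// a free-text location mentions any of the filters (case-insensitive).
function parseLocationFilters(filterString) {
  return (filterString || "")
    .split(",")
    .map((f) => f.trim().toLowerCase())
    .filter(Boolean);
}

function validateLocationAgainstFilters(location, filters) {
  if (!filters.length) return true; // no filters configured -> everything passes
  const loc = (location || "").toLowerCase();
  return filters.some((f) => loc.includes(f));
}

module.exports = { parseLocationFilters, validateLocationAgainstFilters };
```
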
### 4. AI Utilities (`src/ai-utils.js`)

AI-powered content analysis with **integrated results**.

```javascript
const { analyzeBatch, checkOllamaStatus } = require("ai-analyzer");

// Check AI availability
const aiAvailable = await checkOllamaStatus("mistral");

// Analyze posts with AI (returns analysis results)
const analysis = await analyzeBatch(posts, "job market analysis", "mistral");

// Integrate AI analysis into results
const resultsWithAI = posts.map((post, index) => ({
  ...post,
  aiAnalysis: {
    isRelevant: analysis[index].isRelevant,
    confidence: analysis[index].confidence,
    reasoning: analysis[index].reasoning,
    context: "job market analysis",
    model: "mistral",
    analyzedAt: new Date().toISOString(),
  },
}));
```

**Features:**

- Ollama integration for local AI
- Batch processing for efficiency
- Confidence scoring
- Context-aware analysis
- **Integrated results**: AI analysis embedded in the data structure

### 5. CLI Tool (`cli.js`)

Command-line interface for standalone AI analysis.

```bash
# Analyze latest results file
node cli.js --latest --dir=results

# Analyze specific file
node cli.js --input=results.json

# Analyze with custom context
node cli.js --input=results.json --context="layoff analysis"

# Analyze with a different model
node cli.js --input=results.json --model=mistral

# Show help
node cli.js --help
```

**Features:**

- **Integrated Analysis**: AI results embedded back into the original JSON
- **Flexible Input**: Support for various JSON formats
- **Context Switching**: Easy re-analysis with different contexts
- **Model Selection**: Choose different Ollama models
- **Directory Support**: Specify the results directory with `--dir`

### 6. Test Utilities (`src/test-utils.js`)

Shared testing helpers and mocks.

```javascript
const { createMockPost, createMockProfile } = require("ai-analyzer");

// Create test data
const mockPost = createMockPost({
  content: "Test post content",
  author: "John Doe",
  location: "Toronto, Ontario",
});
```

## 🚀 Installation

```bash
# Install dependencies
npm install

# Run tests
npm test

# Run specific test suites
npm test -- --testNamePattern="Logger"
```

## 📋 CLI Reference

### Basic Usage

```bash
# Analyze latest results file
node cli.js --latest --dir=results

# Analyze specific file
node cli.js --input=results.json

# Analyze with custom output
node cli.js --input=results.json --output=analysis.json
```

### Options

```bash
--input=FILE             # Input JSON file
--output=FILE            # Output file (default: original-ai.json)
--context="description"  # Analysis context (default: "job market analysis and trends")
--model=MODEL            # Ollama model (default: mistral)
--latest                 # Use latest results file from directory
--dir=PATH               # Directory to look for results (default: 'results')
--help, -h               # Show help
```

### Examples

```bash
# Analyze latest LinkedIn results
cd linkedin-parser
node ../ai-analyzer/cli.js --latest --dir=results

# Analyze with layoff context
node cli.js --input=results.json --context="layoff analysis"

# Analyze with a different model
node cli.js --input=results.json --model=llama3

# Analyze from the project root
node ai-analyzer/cli.js --latest --dir=linkedin-parser/results
```

### Output Format

The CLI integrates AI analysis directly into the original JSON structure:

```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisUpdated": "2025-07-21T02:48:42.487Z",
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```

## 📋 API Reference

### Logger Class

```javascript
const { Logger } = require("ai-analyzer");

// Create custom logger
const logger = new Logger({
  debug: false,
  colors: true,
});

// Configure levels
logger.setLevel("debug", true);
logger.silent(); // Disable all logging
logger.verbose(); // Enable all logging
```

### Text Processing

```javascript
const { cleanText, containsAnyKeyword } = require('ai-analyzer');

// Clean text
cleanText(text: string): string

// Check keywords
containsAnyKeyword(text: string, keywords: string[]): boolean
```

### Location Validation

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile
} = require('ai-analyzer');

// Parse filters
parseLocationFilters(filterString: string): string[]

// Validate location
validateLocationAgainstFilters(location: string, filters: string[]): boolean

// Extract from profile
extractLocationFromProfile(profileText: string): string | null
```

### AI Analysis

```javascript
const { analyzeBatch, checkOllamaStatus, findLatestResultsFile } = require('ai-analyzer');

// Check AI availability
checkOllamaStatus(model?: string, ollamaHost?: string): Promise<boolean>

// Analyze posts
analyzeBatch(posts: Post[], context: string, model?: string): Promise<AnalysisResult[]>

// Find latest results file
findLatestResultsFile(resultsDir?: string): string
```

## 🧪 Testing

### Run All Tests

```bash
npm test
```

### Test Coverage

```bash
npm run test:coverage
```

### Specific Test Suites

```bash
# Logger tests
npm test -- --testNamePattern="Logger"

# Text utilities tests
npm test -- --testNamePattern="Text"

# Location utilities tests
npm test -- --testNamePattern="Location"

# AI utilities tests
npm test -- --testNamePattern="AI"
```

## 🔧 Configuration

### Environment Variables

```env
# AI Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=mistral
AI_CONTEXT="job market analysis and trends"

# Logging Configuration
LOG_LEVEL=info
LOG_COLORS=true

# Location Configuration
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
```

### Logger Configuration

```javascript
const logger = new Logger({
  debug: true,    // Enable debug logging
  info: true,     // Enable info logging
  warning: true,  // Enable warning logging
  error: true,    // Enable error logging
  success: true,  // Enable success logging
  colors: true,   // Enable color output
});
```

## 📊 Usage Examples

### Basic Logging Setup

```javascript
const { logger } = require("ai-analyzer");

// Configure for production
if (process.env.NODE_ENV === "production") {
  logger.setLevel("debug", false);
  logger.setLevel("info", true);
}

// Use throughout your application
logger.step("Starting LinkedIn scrape");
logger.info("Found 150 posts");
logger.warning("Rate limit approaching");
logger.success("Scraping completed successfully");
```

### Text Processing Pipeline

```javascript
const { cleanText, containsAnyKeyword } = require("ai-analyzer");

function processPost(post) {
  // Clean the content
  const cleanedContent = cleanText(post.content);

  // Check for keywords
  const keywords = ["layoff", "downsizing", "RIF"];
  const hasKeywords = containsAnyKeyword(cleanedContent, keywords);

  return {
    ...post,
    cleanedContent,
    hasKeywords,
  };
}
```

### Location Validation

```javascript
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
} = require("ai-analyzer");

// Setup location filtering
const locationFilters = parseLocationFilters("Ontario,Manitoba,Toronto");

// Validate each post
function validatePost(post) {
  const isValidLocation = validateLocationAgainstFilters(
    post.author.location,
    locationFilters
  );

  return isValidLocation ? post : null;
}
```

### AI Analysis Integration

```javascript
const { analyzeBatch, checkOllamaStatus } = require("ai-analyzer");

async function analyzePosts(posts) {
  try {
    // Check AI availability
    const aiAvailable = await checkOllamaStatus("mistral");
    if (!aiAvailable) {
      logger.warning("AI not available - skipping analysis");
      return posts;
    }

    // Run AI analysis
    const analysis = await analyzeBatch(
      posts,
      "job market analysis",
      "mistral"
    );

    // Integrate AI analysis into results
    const resultsWithAI = posts.map((post, index) => ({
      ...post,
      aiAnalysis: {
        isRelevant: analysis[index].isRelevant,
        confidence: analysis[index].confidence,
        reasoning: analysis[index].reasoning,
        context: "job market analysis",
        model: "mistral",
        analyzedAt: new Date().toISOString(),
      },
    }));

    return resultsWithAI;
  } catch (error) {
    logger.error("AI analysis failed:", error.message);
    return posts; // Return original posts if AI fails
  }
}
```

### CLI Integration

```javascript
// In your parser's package.json scripts
{
  "scripts": {
    "analyze:latest": "node ../ai-analyzer/cli.js --latest --dir=results",
    "analyze:layoff": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"layoff analysis\"",
    "analyze:trends": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"job market trends\""
  }
}
```

## 🔒 Security & Best Practices

### Credential Management

- Store API keys in environment variables
- Never commit sensitive data to version control
- Use `.env` files for local development

### Rate Limiting

- Implement delays between AI API calls
- Respect service provider rate limits
- Use batch processing to minimize requests

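The batching-plus-delay pattern above can be sketched as follows; the batch size and delay values are illustrative, not the package's actual defaults:

```javascript
// Hedged sketch: process items in small batches with a pause between batches
// so downstream services (e.g. a local Ollama instance) are not flooded.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processInBatches(items, batchSize, delayMs, handler) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Run one batch concurrently, then wait before starting the next.
    results.push(...(await Promise.all(batch.map(handler))));
    if (i + batchSize < items.length) await sleep(delayMs);
  }
  return results;
}

module.exports = { processInBatches };
```
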
### Error Handling

- Always wrap AI calls in try-catch blocks
- Provide fallback behavior when services fail
- Log errors with appropriate detail levels

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Code Standards

- Follow the existing code style
- Add JSDoc comments for all functions
- Maintain test coverage above 90%
- Update documentation for new features

## 📄 License

This package is part of the LinkedOut platform and follows the same licensing terms.

---

**Note**: This package is designed to be used as a dependency by other LinkedOut components. It provides the core utilities and CLI tool and is not intended for standalone use.

**ai-analyzer/cli.js** (new executable file, 250 lines)

#!/usr/bin/env node

/**
 * AI Analyzer CLI
 *
 * Command-line interface for the ai-analyzer package
 * Can be used by any parser to analyze JSON files
 */

const fs = require("fs");
const path = require("path");

// Import AI utilities from this package
const {
  logger,
  analyzeBatch,
  checkOllamaStatus,
  findLatestResultsFile,
} = require("./index");

// Default configuration
const DEFAULT_CONTEXT =
  process.env.AI_CONTEXT || "job market analysis and trends";
const DEFAULT_MODEL = process.env.OLLAMA_MODEL || "mistral";
const DEFAULT_RESULTS_DIR = "results";

// Parse command line arguments
const args = process.argv.slice(2);
let inputFile = null;
let outputFile = null;
let context = DEFAULT_CONTEXT;
let model = DEFAULT_MODEL;
let findLatest = false;
let resultsDir = DEFAULT_RESULTS_DIR;

for (const arg of args) {
  if (arg.startsWith("--input=")) {
    inputFile = arg.split("=")[1];
  } else if (arg.startsWith("--output=")) {
    outputFile = arg.split("=")[1];
  } else if (arg.startsWith("--context=")) {
    context = arg.split("=")[1];
  } else if (arg.startsWith("--model=")) {
    model = arg.split("=")[1];
  } else if (arg.startsWith("--dir=")) {
    resultsDir = arg.split("=")[1];
  } else if (arg === "--latest") {
    findLatest = true;
  } else if (arg === "--help" || arg === "-h") {
    console.log(`
AI Analyzer CLI

Usage: node cli.js [options]

Options:
  --input=FILE             Input JSON file
  --output=FILE            Output file (default: <input>-ai.json next to the input)
  --context="description"  Analysis context (default: "${DEFAULT_CONTEXT}")
  --model=MODEL            Ollama model (default: ${DEFAULT_MODEL})
  --latest                 Use latest results file from results directory
  --dir=PATH               Directory to look for results (default: 'results')
  --help, -h               Show this help

Examples:
  node cli.js --input=results.json
  node cli.js --latest --dir=results
  node cli.js --input=results.json --context="job trends" --model=mistral

Environment Variables:
  AI_CONTEXT    Default analysis context
  OLLAMA_MODEL  Default Ollama model
`);
    process.exit(0);
  }
}

async function main() {
  try {
    // Determine input file
    if (findLatest) {
      try {
        inputFile = findLatestResultsFile(resultsDir);
        logger.info(`Found latest results file: ${inputFile}`);
      } catch (error) {
        logger.error(
          `❌ No results files found in '${resultsDir}': ${error.message}`
        );
        logger.info(`💡 To create results files:`);
        logger.info(
          `   1. Run a parser first (e.g., npm start in linkedin-parser)`
        );
        logger.info(`   2. Or provide a specific file with --input=FILE`);
        logger.info(`   3. Or create a sample JSON file to test with`);
        process.exit(1);
      }
    }

    // If inputFile is a relative path and --dir is set, resolve it
    if (inputFile && !path.isAbsolute(inputFile) && !fs.existsSync(inputFile)) {
      const candidate = path.join(resultsDir, inputFile);
      if (fs.existsSync(candidate)) {
        inputFile = candidate;
      }
    }

    if (!inputFile) {
      logger.error("❌ Input file required. Use --input=FILE or --latest");
      logger.info(`💡 Examples:`);
      logger.info(`   node cli.js --input=results.json`);
      logger.info(`   node cli.js --latest --dir=results`);
      logger.info(`   node cli.js --help`);
      process.exit(1);
    }

    // Load input file
    logger.step(`Loading input file: ${inputFile}`);

    if (!fs.existsSync(inputFile)) {
      throw new Error(`Input file not found: ${inputFile}`);
    }

    // `let`, not `const`: data may be rewrapped below when the input is a bare array
    let data = JSON.parse(fs.readFileSync(inputFile, "utf-8"));

    // Extract posts from different formats
    let posts = [];
    if (data.results && Array.isArray(data.results)) {
      posts = data.results;
      logger.info(`Found ${posts.length} items in results array`);
    } else if (Array.isArray(data)) {
      posts = data;
      logger.info(`Found ${posts.length} items in array`);
    } else {
      throw new Error("Invalid JSON format - need array or {results: [...]}");
    }

    if (posts.length === 0) {
      throw new Error("No items found to analyze");
    }

    // Check AI availability
    logger.step("Checking AI availability");
    const aiAvailable = await checkOllamaStatus(model);
    if (!aiAvailable) {
      throw new Error(
        `AI not available. Make sure Ollama is running and model '${model}' is installed.`
      );
    }

    // Check if results already have AI analysis
    const hasExistingAI = posts.some((post) => post.aiAnalysis);
    if (hasExistingAI) {
      logger.info(
        `📋 Results already contain AI analysis - will update with new context`
      );
    }

    // Prepare data for analysis
    const analysisData = posts.map((post) => ({
      text: post.text || post.content || post.post || "",
      location: post.location || "Unknown",
      keyword: post.keyword || "Unknown",
      timestamp: post.timestamp || new Date().toISOString(),
    }));

    // Run analysis
    logger.step(`Running AI analysis with context: "${context}"`);
    const analysis = await analyzeBatch(analysisData, context, model);

    // Integrate AI analysis back into the original results
    const updatedPosts = posts.map((post, index) => {
      const aiResult = analysis[index];
      return {
        ...post,
        aiAnalysis: {
          isRelevant: aiResult.isRelevant,
          confidence: aiResult.confidence,
          reasoning: aiResult.reasoning,
          context: context,
          model: model,
          analyzedAt: new Date().toISOString(),
        },
      };
    });

    // Update the original data structure
    if (data.results && Array.isArray(data.results)) {
      data.results = updatedPosts;
      // Update metadata
      data.metadata = data.metadata || {};
      data.metadata.aiAnalysisUpdated = new Date().toISOString();
      data.metadata.aiContext = context;
      data.metadata.aiModel = model;
    } else {
      // If it's a simple array, create a proper structure
      data = {
        metadata: {
          timestamp: new Date().toISOString(),
          totalItems: updatedPosts.length,
          aiContext: context,
          aiModel: model,
          analysisType: "cli",
        },
        results: updatedPosts,
      };
    }

    // Generate output filename if not provided
    if (!outputFile) {
      // Use the original filename with -ai suffix
      const originalName = path.basename(inputFile, path.extname(inputFile));
      outputFile = path.join(
        path.dirname(inputFile),
        `${originalName}-ai.json`
      );
    }

    // Save updated results back to file
    fs.writeFileSync(outputFile, JSON.stringify(data, null, 2));

    // Show summary
    const relevant = analysis.filter((a) => a.isRelevant).length;
    const irrelevant = analysis.filter((a) => !a.isRelevant).length;
    const avgConfidence =
      analysis.reduce((sum, a) => sum + a.confidence, 0) / analysis.length;

    logger.success("✅ AI analysis completed and integrated");
    logger.info(`📊 Context: "${context}"`);
    logger.info(`📈 Total items analyzed: ${analysis.length}`);
    logger.info(
      `✅ Relevant items: ${relevant} (${(
        (relevant / analysis.length) *
        100
      ).toFixed(1)}%)`
    );
    logger.info(
      `❌ Irrelevant items: ${irrelevant} (${(
        (irrelevant / analysis.length) *
        100
      ).toFixed(1)}%)`
    );
    logger.info(`🎯 Average confidence: ${avgConfidence.toFixed(2)}`);
    logger.file(`🧠 Updated results saved to: ${outputFile}`);
  } catch (error) {
    logger.error(`❌ Analysis failed: ${error.message}`);
    process.exit(1);
  }
}

// Run the CLI
main();
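The flag loop in `cli.js` follows a simple `--key=value` / bare-flag convention. That convention can be sketched as a standalone helper (hypothetical, not part of the commit). One deliberate difference: this sketch splits on the first `=` only, so values may themselves contain `=`; the `arg.split("=")[1]` approach in the file truncates such values.

```javascript
// Hypothetical helper mirroring cli.js's flag convention:
//   --key=value  -> { key: "value" }
//   --flag       -> { flag: true }
function parseFlags(argv) {
  const opts = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue;
    const eq = arg.indexOf("=");
    if (eq === -1) {
      opts[arg.slice(2)] = true; // bare flag, e.g. --latest
    } else {
      // split on the FIRST "=" only, so "--context=a=b" keeps "a=b"
      opts[arg.slice(2, eq)] = arg.slice(eq + 1);
    }
  }
  return opts;
}

// Usage mirroring the CLI's examples:
const opts = parseFlags(["--input=results.json", "--latest"]);
```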
ai-analyzer/demo.js (new file, 346 lines)

/**
 * AI Analyzer Demo
 *
 * Demonstrates all the core utilities provided by the ai-analyzer package:
 * - Logger functionality
 * - Text processing utilities
 * - Location validation
 * - AI analysis capabilities
 * - Test utilities
 */

const {
  logger,
  Logger,
  cleanText,
  containsAnyKeyword,
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
  analyzeBatch,
} = require("./index");

// Terminal colors for demo output
const colors = {
  reset: "\x1b[0m",
  bright: "\x1b[1m",
  cyan: "\x1b[36m",
  green: "\x1b[32m",
  yellow: "\x1b[33m",
  blue: "\x1b[34m",
  magenta: "\x1b[35m",
  red: "\x1b[31m",
};

const demo = {
  title: (text) =>
    console.log(`\n${colors.bright}${colors.cyan}${text}${colors.reset}`),
  section: (text) =>
    console.log(`\n${colors.bright}${colors.magenta}${text}${colors.reset}`),
  success: (text) => console.log(`${colors.green}✅ ${text}${colors.reset}`),
  info: (text) => console.log(`${colors.blue}ℹ️ ${text}${colors.reset}`),
  warning: (text) => console.log(`${colors.yellow}⚠️ ${text}${colors.reset}`),
  error: (text) => console.log(`${colors.red}❌ ${text}${colors.reset}`),
  code: (text) => console.log(`${colors.cyan}${text}${colors.reset}`),
};

async function runDemo() {
  demo.title("=== AI Analyzer Demo ===");
  demo.info(
    "This demo showcases all the core utilities provided by the ai-analyzer package."
  );
  demo.info("Press Enter to continue through each section...\n");

  await waitForEnter();

  // 1. Logger Demo
  await demonstrateLogger();

  // 2. Text Processing Demo
  await demonstrateTextProcessing();

  // 3. Location Validation Demo
  await demonstrateLocationValidation();

  // 4. AI Analysis Demo
  await demonstrateAIAnalysis();

  // 5. Integration Demo
  await demonstrateIntegration();

  demo.title("=== Demo Complete ===");
  demo.success("All ai-analyzer utilities demonstrated successfully!");
  demo.info("Check the README.md for detailed API documentation.");
}

async function demonstrateLogger() {
  demo.section("1. Logger Utilities");
  demo.info(
    "The logger provides consistent logging across all parsers with configurable levels and color support."
  );

  demo.code("// Using default logger");
  logger.info("This is an info message");
  logger.warning("This is a warning message");
  logger.error("This is an error message");
  logger.success("This is a success message");
  logger.debug("This is a debug message (if enabled)");

  demo.code("// Convenience methods with emoji prefixes");
  logger.step("Starting demo process");
  logger.search("Searching for keywords");
  logger.ai("Running AI analysis");
  logger.location("Validating location");
  logger.file("Saving results");

  demo.code("// Custom logger configuration");
  const customLogger = new Logger({
    debug: false,
    colors: true,
  });
  customLogger.info("Custom logger with debug disabled");
  customLogger.debug("This won't show");

  demo.code("// Silent mode");
  const silentLogger = new Logger();
  silentLogger.silent();
  silentLogger.info("This won't show");
  silentLogger.verbose(); // Re-enable all levels

  await waitForEnter();
}

async function demonstrateTextProcessing() {
  demo.section("2. Text Processing Utilities");
  demo.info(
    "Text utilities provide content cleaning and keyword matching capabilities."
  );

  const sampleTexts = [
    "Check out this #awesome post! https://example.com 🚀",
    "Just got #laidoff from my job. Looking for new opportunities!",
    "Company is #downsizing and I'm affected. #RIF #layoff",
    "Great news! We're #hiring new developers! 🎉",
  ];

  demo.code("// Text cleaning examples:");
  sampleTexts.forEach((text) => {
    const cleaned = cleanText(text);
    demo.info(`Original: ${text}`);
    demo.success(`Cleaned: ${cleaned}`);
    console.log();
  });

  demo.code("// Keyword matching:");
  const keywords = ["layoff", "downsizing", "RIF", "hiring"];

  sampleTexts.forEach((text, index) => {
    const hasMatch = containsAnyKeyword(text, keywords);
    const matchedKeywords = keywords.filter((keyword) =>
      text.toLowerCase().includes(keyword.toLowerCase())
    );

    demo.info(
      `Text ${index + 1}: ${hasMatch ? "✅" : "❌"} ${
        matchedKeywords.join(", ") || "No matches"
      }`
    );
  });

  await waitForEnter();
}

async function demonstrateLocationValidation() {
  demo.section("3. Location Validation Utilities");
  demo.info(
    "Location utilities provide geographic filtering and validation capabilities."
  );

  demo.code("// Location filter parsing:");
  const filterStrings = [
    "Ontario,Manitoba",
    "Toronto,Vancouver",
    "British Columbia,Alberta",
    "Canada",
  ];

  filterStrings.forEach((filterString) => {
    const filters = parseLocationFilters(filterString);
    demo.info(`Filter: "${filterString}"`);
    demo.success(`Parsed: [${filters.join(", ")}]`);
    console.log();
  });

  demo.code("// Location validation examples:");
  const testLocations = [
    { location: "Toronto, Ontario, Canada", filters: ["Ontario"] },
    { location: "Vancouver, BC", filters: ["British Columbia"] },
    { location: "Calgary, Alberta", filters: ["Ontario"] },
    { location: "Montreal, Quebec", filters: ["Ontario", "Manitoba"] },
    { location: "New York, NY", filters: ["Ontario"] },
  ];

  testLocations.forEach(({ location, filters }) => {
    const isValid = validateLocationAgainstFilters(location, filters);
    demo.info(`Location: "${location}"`);
    demo.info(`Filters: [${filters.join(", ")}]`);
    demo.success(`Valid: ${isValid ? "✅ Yes" : "❌ No"}`);
    console.log();
  });

  demo.code("// Profile location extraction:");
  const profileTexts = [
    "Software Engineer at Tech Corp • Toronto, Ontario",
    "Product Manager • Vancouver, BC",
    "Data Scientist • Remote",
    "CEO at Startup Inc • Montreal, Quebec, Canada",
  ];

  profileTexts.forEach((profileText) => {
    const location = extractLocationFromProfile(profileText);
    demo.info(`Profile: "${profileText}"`);
    demo.success(`Extracted: "${location || "No location found"}"`);
    console.log();
  });

  await waitForEnter();
}

async function demonstrateAIAnalysis() {
  demo.section("4. AI Analysis Utilities");
  demo.info(
    "AI utilities provide content analysis using OpenAI or local Ollama models."
  );

  // Mock posts for demo
  const mockPosts = [
    {
      id: "1",
      content:
        "Just got laid off from my software engineering role. Looking for new opportunities in Toronto.",
      author: "John Doe",
      location: "Toronto, Ontario",
    },
    {
      id: "2",
      content:
        "Our company is downsizing and I'm affected. This is really tough news.",
      author: "Jane Smith",
      location: "Vancouver, BC",
    },
    {
      id: "3",
      content:
        "We're hiring! Looking for talented developers to join our team.",
      author: "Bob Wilson",
      location: "Calgary, Alberta",
    },
  ];

  demo.code("// Mock AI analysis (simulated):");
  demo.info("In a real scenario, this would call the Ollama or OpenAI API");

  mockPosts.forEach((post, index) => {
    demo.info(`Post ${index + 1}: ${post.content.substring(0, 50)}...`);
    demo.success(
      `Analysis: Relevant to job layoffs (confidence: 0.${85 + index * 5})`
    );
    console.log();
  });

  demo.code("// Batch analysis simulation:");
  demo.info("Processing batch of 3 posts...");
  await simulateProcessing();
  demo.success("Batch analysis completed!");

  await waitForEnter();
}

async function demonstrateIntegration() {
  demo.section("5. Integration Example");
  demo.info("Here's how all utilities work together in a real scenario:");

  const samplePost = {
    id: "demo-1",
    content:
      "Just got #laidoff from my job at TechCorp! Looking for new opportunities in #Toronto. This is really tough but I'm staying positive! 🚀",
    author: "Demo User",
    location: "Toronto, Ontario, Canada",
  };

  demo.code("// Processing pipeline:");

  // 1. Log the start
  logger.step("Processing new post");

  // 2. Clean the text
  const cleanedContent = cleanText(samplePost.content);
  logger.info(`Cleaned content: ${cleanedContent}`);

  // 3. Check for keywords
  const keywords = ["layoff", "downsizing", "RIF"];
  const hasKeywords = containsAnyKeyword(cleanedContent, keywords);
  logger.search(`Keyword match: ${hasKeywords ? "Found" : "Not found"}`);

  // 4. Validate location
  const locationFilters = parseLocationFilters("Ontario,Manitoba");
  const isValidLocation = validateLocationAgainstFilters(
    samplePost.location,
    locationFilters
  );
  logger.location(`Location valid: ${isValidLocation ? "Yes" : "No"}`);

  // 5. Simulate AI analysis
  if (hasKeywords && isValidLocation) {
    logger.ai("Running AI analysis...");
    await simulateProcessing();
    logger.success("Post accepted and analyzed!");
  } else {
    logger.warning("Post rejected - doesn't meet criteria");
  }

  await waitForEnter();
}

// Helper functions
function waitForEnter() {
  return new Promise((resolve) => {
    const readline = require("readline");
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });

    rl.question("\nPress Enter to continue...", () => {
      rl.close();
      resolve();
    });
  });
}

async function simulateProcessing() {
  return new Promise((resolve) => {
    const dots = [".", "..", "..."];
    let i = 0;
    const interval = setInterval(() => {
      process.stdout.write(`\rProcessing${dots[i]}`);
      i = (i + 1) % dots.length;
    }, 500);

    setTimeout(() => {
      clearInterval(interval);
      process.stdout.write("\r");
      resolve();
    }, 2000);
  });
}

// Run the demo if this file is executed directly
if (require.main === module) {
  runDemo().catch((error) => {
    demo.error(`Demo failed: ${error.message}`);
    process.exit(1);
  });
}

module.exports = { runDemo };
ai-analyzer/index.js (new file, 22 lines)

/**
 * ai-analyzer - Core utilities for parsers
 * Main entry point that exports all modules
 */

// Export all utilities with clean namespace
module.exports = {
  // Logger utilities
  ...require("./src/logger"),

  // AI analysis utilities
  ...require("./src/ai-utils"),

  // Text processing utilities
  ...require("./src/text-utils"),

  // Location validation utilities
  ...require("./src/location-utils"),

  // Test utilities
  ...require("./src/test-utils"),
};
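One property of the spread re-export pattern in `index.js` is worth noting: if two `src/` modules ever export the same name, the module spread later silently wins, so export names across the submodules must stay unique. A minimal illustration, using stand-in objects rather than the real modules:

```javascript
// Stand-ins for two submodules that accidentally share an export name.
const loggerModule = { logger: "logger-instance", cleanText: "from-logger" };
const textModule = { cleanText: "from-text-utils" };

// Same pattern as index.js: later spread overwrites earlier keys.
const api = { ...loggerModule, ...textModule };

console.log(api.cleanText); // prints "from-text-utils"
```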
ai-analyzer/package-lock.json (generated, 3714 lines; diff suppressed because it is too large)

ai-analyzer/package.json (new file, 32 lines)

{
  "name": "ai-analyzer",
  "version": "1.0.0",
  "description": "Reusable core utilities for parsers: AI analysis, location validation, logging, and text processing",
  "main": "index.js",
  "bin": {
    "ai-analyzer": "./cli.js"
  },
  "scripts": {
    "test": "jest",
    "cli": "node cli.js"
  },
  "keywords": [
    "parser",
    "ai",
    "location",
    "logging",
    "scraper",
    "ollama"
  ],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "dependencies": {
    "chalk": "^4.1.2",
    "csv-parser": "^3.2.0",
    "dotenv": "^17.0.0"
  },
  "devDependencies": {
    "jest": "^29.7.0"
  }
}
ai-analyzer/src/ai-utils.js (new file, 305 lines)

const { logger } = require("./logger");

/**
 * AI analysis utilities for post processing with Ollama
 * Extracted from ai-analyzer-local.js for reuse across parsers
 */

// Default model from environment variable or fallback to "mistral"
const DEFAULT_MODEL = process.env.OLLAMA_MODEL || "mistral";

/**
 * Check if Ollama is running and the model is available
 */
async function checkOllamaStatus(
  model = DEFAULT_MODEL,
  ollamaHost = "http://localhost:11434"
) {
  try {
    // Check if Ollama is running
    const response = await fetch(`${ollamaHost}/api/tags`);
    if (!response.ok) {
      throw new Error(`Ollama not running on ${ollamaHost}`);
    }

    const data = await response.json();
    const availableModels = data.models.map((m) => m.name);

    logger.ai("Ollama is running");
    logger.info(
      `📦 Available models: ${availableModels
        .map((m) => m.split(":")[0])
        .join(", ")}`
    );

    // Check if requested model is available
    const modelExists = availableModels.some((m) => m.startsWith(model));
    if (!modelExists) {
      logger.error(`Model "${model}" not found`);
      logger.error(`💡 Install it with: ollama pull ${model}`);
      logger.error(
        `💡 Or choose from: ${availableModels
          .map((m) => m.split(":")[0])
          .join(", ")}`
      );
      return false;
    }

    logger.success(`Using model: ${model}`);
    return true;
  } catch (error) {
    logger.error(`Error connecting to Ollama: ${error.message}`);
    logger.error("💡 Make sure Ollama is installed and running:");
    logger.error("   1. Install: https://ollama.ai/");
    logger.error("   2. Start: ollama serve");
    logger.error(`   3. Install model: ollama pull ${model}`);
    return false;
  }
}

/**
 * Analyze multiple posts using local Ollama
 */
async function analyzeBatch(
  posts,
  context,
  model = DEFAULT_MODEL,
  ollamaHost = "http://localhost:11434"
) {
  logger.ai(`Analyzing batch of ${posts.length} posts with ${model}...`);

  try {
    const prompt = `You are an expert at analyzing LinkedIn posts for relevance to specific contexts.

CONTEXT TO MATCH: "${context}"

Analyze these ${
      posts.length
    } LinkedIn posts and determine if each relates to the context above.

POSTS:
${posts
  .map(
    (post, i) => `
POST ${i + 1}:
"${post.text.substring(0, 400)}${post.text.length > 400 ? "..." : ""}"
`
  )
  .join("")}

For each post, provide:
- Is it relevant to "${context}"? (YES/NO)
- Confidence level (0.0 to 1.0)
- Brief reasoning

Respond in this EXACT format for each post:
POST 1: YES/NO | 0.X | brief reason
POST 2: YES/NO | 0.X | brief reason
POST 3: YES/NO | 0.X | brief reason

Examples:
- For layoff context: "laid off 50 employees" = YES | 0.9 | mentions layoffs
- For hiring context: "we're hiring developers" = YES | 0.8 | job posting
- Unrelated content = NO | 0.1 | not relevant to context`;

    const response = await fetch(`${ollamaHost}/api/generate`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: model,
        prompt: prompt,
        stream: false,
        options: {
          temperature: 0.3,
          top_p: 0.9,
        },
      }),
    });

    if (!response.ok) {
      throw new Error(
        `Ollama API error: ${response.status} ${response.statusText}`
      );
    }

    const data = await response.json();
    const aiResponse = data.response.trim();

    // Parse the response
    const analyses = [];
    const lines = aiResponse.split("\n").filter((line) => line.trim());

    for (let i = 0; i < posts.length; i++) {
      let analysis = {
        postIndex: i + 1,
        isRelevant: false,
        confidence: 0.5,
        reasoning: "Could not parse AI response",
      };

      // Look for lines that match "POST X:" pattern
      const postPattern = new RegExp(`POST\\s*${i + 1}:?\\s*(.+)`, "i");

      for (const line of lines) {
        const match = line.match(postPattern);
        if (match) {
          const content = match[1].trim();

          // Parse: YES/NO | 0.X | reasoning
          const parts = content.split("|").map((p) => p.trim());

          if (parts.length >= 3) {
            analysis.isRelevant = parts[0].toUpperCase().includes("YES");
            analysis.confidence = Math.max(
              0,
              Math.min(1, parseFloat(parts[1]) || 0.5)
            );
            analysis.reasoning = parts[2] || "No reasoning provided";
          } else {
            // Fallback parsing
            analysis.isRelevant =
              content.toUpperCase().includes("YES") ||
              content.toLowerCase().includes("relevant");
            analysis.confidence = 0.6;
            analysis.reasoning = content.substring(0, 100);
          }
          break;
        }
      }

      analyses.push(analysis);
    }

    // If we didn't get enough analyses, fill in defaults
    while (analyses.length < posts.length) {
      analyses.push({
        postIndex: analyses.length + 1,
        isRelevant: false,
        confidence: 0.3,
        reasoning: "AI response parsing failed",
      });
    }

    return analyses;
  } catch (error) {
    logger.error(`Error in batch AI analysis: ${error.message}`);

    // Fallback: mark all as relevant with low confidence
    return posts.map((_, i) => ({
      postIndex: i + 1,
      isRelevant: true,
      confidence: 0.3,
      reasoning: `Analysis failed: ${error.message}`,
    }));
  }
}

/**
 * Analyze a single post using local Ollama (fallback)
 */
async function analyzeSinglePost(
  text,
  context,
  model = DEFAULT_MODEL,
  ollamaHost = "http://localhost:11434"
) {
  const prompt = `Analyze this LinkedIn post for relevance to: "${context}"

Post: "${text}"

Is this post relevant to "${context}"? Provide:
1. YES or NO
2. Confidence (0.0 to 1.0)
3. Brief reason

Format: YES/NO | 0.X | reason`;

  try {
    const response = await fetch(`${ollamaHost}/api/generate`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: model,
        prompt: prompt,
        stream: false,
        options: {
          temperature: 0.3,
        },
      }),
    });

    if (!response.ok) {
      throw new Error(`Ollama API error: ${response.status}`);
    }

    const data = await response.json();
    const aiResponse = data.response.trim();

    // Parse response
    const parts = aiResponse.split("|").map((p) => p.trim());

    if (parts.length >= 3) {
      return {
        isRelevant: parts[0].toUpperCase().includes("YES"),
        confidence: Math.max(0, Math.min(1, parseFloat(parts[1]) || 0.5)),
        reasoning: parts[2],
      };
    } else {
      // Fallback parsing
      return {
        isRelevant:
          aiResponse.toLowerCase().includes("yes") ||
          aiResponse.toLowerCase().includes("relevant"),
        confidence: 0.6,
        reasoning: aiResponse.substring(0, 100),
      };
    }
  } catch (error) {
    return {
      isRelevant: true, // Default to include on error
      confidence: 0.3,
      reasoning: `Analysis failed: ${error.message}`,
    };
  }
}

/**
 * Find the most recent results file if none specified
 */
function findLatestResultsFile(resultsDir = "results") {
  const fs = require("fs");
  const path = require("path");

  if (!fs.existsSync(resultsDir)) {
    throw new Error("Results directory not found. Run the scraper first.");
  }

  const files = fs
    .readdirSync(resultsDir)
    .filter(
      (f) =>
        (f.startsWith("results-") || f.startsWith("linkedin-results-")) &&
        f.endsWith(".json") &&
        !f.includes("-ai-")
    )
    .sort()
    .reverse();

  if (files.length === 0) {
    throw new Error("No results files found. Run the scraper first.");
  }

  return path.join(resultsDir, files[0]);
}

module.exports = {
  checkOllamaStatus,
  analyzeBatch,
  analyzeSinglePost,
  findLatestResultsFile,
  DEFAULT_MODEL, // Export so other modules can use it
};
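`analyzeBatch` asks the model to answer one `POST N: YES/NO | 0.X | reason` line per post and then parses each line by splitting on `|`. That per-line step can be sketched as a standalone helper (hypothetical, not in the commit; the `NaN` fallback is made explicit here, where the file's `parseFloat(...) || 0.5` also remaps a literal `0`):

```javascript
// Hypothetical sketch of the "YES/NO | 0.X | reason" line format
// analyzeBatch expects back from the model.
function parseAnalysisLine(line) {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length < 3) return null; // caller falls back to heuristics

  const parsed = parseFloat(parts[1]);
  return {
    isRelevant: parts[0].toUpperCase().includes("YES"),
    // clamp to [0, 1]; default to 0.5 only when the number is unparseable
    confidence: Math.max(0, Math.min(1, Number.isNaN(parsed) ? 0.5 : parsed)),
    reasoning: parts[2] || "No reasoning provided",
  };
}

// Usage mirroring the prompt's own example line:
const result = parseAnalysisLine("YES | 0.9 | mentions layoffs");
```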
ai-analyzer/src/location-utils.js (new file, 1123 lines; diff suppressed because it is too large)

ai-analyzer/src/logger.js (new file, 123 lines)

const chalk = require("chalk");
|
||||
|
||||
/**
|
||||
 * Configurable logger with color support and level controls
 * Can enable/disable different log levels: debug, info, warning, error, success
 */

class Logger {
  constructor(options = {}) {
    this.levels = {
      debug: options.debug !== false,
      info: options.info !== false,
      warning: options.warning !== false,
      error: options.error !== false,
      success: options.success !== false,
    };
    this.colors = options.colors !== false;
  }

  _formatMessage(level, message, prefix = "") {
    const timestamp = new Date().toLocaleTimeString();
    const fullMessage = `${prefix}${message}`;

    if (!this.colors) {
      return `[${timestamp}] [${level.toUpperCase()}] ${fullMessage}`;
    }

    switch (level) {
      case "debug":
        return chalk.gray(`[${timestamp}] [DEBUG] ${fullMessage}`);
      case "info":
        return chalk.blue(`[${timestamp}] [INFO] ${fullMessage}`);
      case "warning":
        return chalk.yellow(`[${timestamp}] [WARNING] ${fullMessage}`);
      case "error":
        return chalk.red(`[${timestamp}] [ERROR] ${fullMessage}`);
      case "success":
        return chalk.green(`[${timestamp}] [SUCCESS] ${fullMessage}`);
      default:
        return `[${timestamp}] [${level.toUpperCase()}] ${fullMessage}`;
    }
  }

  debug(message) {
    if (this.levels.debug) {
      console.log(this._formatMessage("debug", message));
    }
  }

  info(message) {
    if (this.levels.info) {
      console.log(this._formatMessage("info", message));
    }
  }

  warning(message) {
    if (this.levels.warning) {
      console.warn(this._formatMessage("warning", message));
    }
  }

  error(message) {
    if (this.levels.error) {
      console.error(this._formatMessage("error", message));
    }
  }

  success(message) {
    if (this.levels.success) {
      console.log(this._formatMessage("success", message));
    }
  }

  // Convenience methods with emoji prefixes for better UX
  step(message) {
    this.info(`🚀 ${message}`);
  }

  search(message) {
    this.info(`🔍 ${message}`);
  }

  ai(message) {
    this.info(`🧠 ${message}`);
  }

  location(message) {
    this.info(`📍 ${message}`);
  }

  file(message) {
    this.info(`📄 ${message}`);
  }

  // Configure logger levels at runtime
  setLevel(level, enabled) {
    if (this.levels.hasOwnProperty(level)) {
      this.levels[level] = enabled;
    }
  }

  // Disable all logging
  silent() {
    Object.keys(this.levels).forEach((level) => {
      this.levels[level] = false;
    });
  }

  // Enable all logging
  verbose() {
    Object.keys(this.levels).forEach((level) => {
      this.levels[level] = true;
    });
  }
}

// Create default logger instance
const logger = new Logger();

// Export both the class and default instance
module.exports = {
  Logger,
  logger,
};
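A quick usage sketch of the class and default instance above (the require path assumes the module lives at `ai-analyzer/src/logger.js`; it also needs `chalk` installed):

```javascript
// Usage sketch — path and output shapes are illustrative
const { Logger, logger } = require("./ai-analyzer/src/logger");

logger.step("Starting scrape"); // e.g. [10:30:00] [INFO] 🚀 Starting scrape

const quiet = new Logger({ debug: false, colors: false });
quiet.debug("never printed");   // suppressed by the disabled level
quiet.error("still printed");   // e.g. [10:30:00] [ERROR] still printed
```

Because `colors: false` bypasses chalk entirely, the plain format is the one to match against in tests.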
124
ai-analyzer/src/test-utils.js
Normal file
@ -0,0 +1,124 @@
/**
 * Shared test utilities for parsers
 * Common mocks, helpers, and test data
 */

/**
 * Mock Playwright page object for testing
 */
function createMockPage() {
  return {
    goto: jest.fn().mockResolvedValue(undefined),
    waitForSelector: jest.fn().mockResolvedValue(undefined),
    $$: jest.fn().mockResolvedValue([]),
    $: jest.fn().mockResolvedValue(null),
    textContent: jest.fn().mockResolvedValue(""),
    close: jest.fn().mockResolvedValue(undefined),
  };
}

/**
 * Mock fetch for AI API calls
 */
function createMockFetch(response = {}) {
  return jest.fn().mockResolvedValue({
    ok: true,
    status: 200,
    json: jest.fn().mockResolvedValue(response),
    ...response,
  });
}

/**
 * Sample test data for posts
 */
const samplePosts = [
  {
    text: "We are laying off 100 employees due to economic downturn.",
    keyword: "layoff",
    profileLink: "https://linkedin.com/in/test-user-1",
  },
  {
    text: "Exciting opportunity! We are hiring senior developers for our team.",
    keyword: "hiring",
    profileLink: "https://linkedin.com/in/test-user-2",
  },
];

/**
 * Sample location test data
 */
const sampleLocations = [
  "Toronto, Ontario, Canada",
  "Vancouver, BC",
  "Calgary, Alberta",
  "Montreal, Quebec",
  "Halifax, Nova Scotia",
];

/**
 * Common test assertions
 */
function expectValidPost(post) {
  expect(post).toHaveProperty("text");
  expect(post).toHaveProperty("keyword");
  expect(post).toHaveProperty("profileLink");
  expect(typeof post.text).toBe("string");
  expect(post.text.length).toBeGreaterThan(0);
}

function expectValidAIAnalysis(analysis) {
  expect(analysis).toHaveProperty("isRelevant");
  expect(analysis).toHaveProperty("confidence");
  expect(analysis).toHaveProperty("reasoning");
  expect(typeof analysis.isRelevant).toBe("boolean");
  expect(analysis.confidence).toBeGreaterThanOrEqual(0);
  expect(analysis.confidence).toBeLessThanOrEqual(1);
}

function expectValidLocation(location) {
  expect(typeof location).toBe("string");
  expect(location.length).toBeGreaterThan(0);
}

/**
 * Test environment setup
 */
function setupTestEnv() {
  // Mock environment variables
  process.env.NODE_ENV = "test";
  process.env.OLLAMA_HOST = "http://localhost:11434";
  process.env.AI_CONTEXT = "test context";

  // Suppress console output during tests
  jest.spyOn(console, "log").mockImplementation(() => {});
  jest.spyOn(console, "error").mockImplementation(() => {});
  jest.spyOn(console, "warn").mockImplementation(() => {});
}

/**
 * Clean up test environment
 */
function teardownTestEnv() {
  // Restore console
  console.log.mockRestore();
  console.error.mockRestore();
  console.warn.mockRestore();

  // Clear environment
  delete process.env.NODE_ENV;
  delete process.env.OLLAMA_HOST;
  delete process.env.AI_CONTEXT;
}

module.exports = {
  createMockPage,
  createMockFetch,
  samplePosts,
  sampleLocations,
  expectValidPost,
  expectValidAIAnalysis,
  expectValidLocation,
  setupTestEnv,
  teardownTestEnv,
};
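In a Jest spec, these helpers are typically wired into the lifecycle hooks; a minimal sketch (spec filename and require path are illustrative, and the `jest` globals come from the test runner):

```javascript
// example.test.js — illustrative spec using the shared test utilities
const {
  setupTestEnv,
  teardownTestEnv,
  samplePosts,
  expectValidPost,
} = require("./ai-analyzer/src/test-utils");

beforeAll(setupTestEnv);
afterAll(teardownTestEnv);

test("sample posts are well-formed", () => {
  samplePosts.forEach(expectValidPost);
});
```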
107
ai-analyzer/src/text-utils.js
Normal file
@ -0,0 +1,107 @@
/**
 * Text processing utilities for cleaning and validating content
 * Extracted from linkedout.js for reuse across parsers
 */

/**
 * Clean text by removing hashtags, URLs, emojis, and normalizing whitespace
 */
function cleanText(text) {
  if (!text || typeof text !== "string") {
    return "";
  }

  // Remove hashtags
  text = text.replace(/#\w+/g, "");

  // Remove hashtag mentions
  text = text.replace(/\bhashtag\b/gi, "");
  text = text.replace(/hashtag-\w+/gi, "");

  // Remove URLs
  text = text.replace(/https?:\/\/[^\s]+/g, "");

  // Remove emojis (Unicode ranges for common emoji)
  text = text.replace(
    /[\u{1F600}-\u{1F64F}\u{1F300}-\u{1F5FF}\u{1F680}-\u{1F6FF}\u{1F1E0}-\u{1F1FF}]/gu,
    ""
  );

  // Normalize whitespace
  text = text.replace(/\s+/g, " ").trim();

  return text;
}

/**
 * Check if text contains any of the specified keywords (case insensitive)
 */
function containsAnyKeyword(text, keywords) {
  if (!text || !Array.isArray(keywords)) {
    return false;
  }

  const lowerText = text.toLowerCase();
  return keywords.some((keyword) => lowerText.includes(keyword.toLowerCase()));
}

/**
 * Validate if text meets basic quality criteria
 */
function isValidText(text, minLength = 30) {
  if (!text || typeof text !== "string") {
    return false;
  }

  // Check minimum length
  if (text.length < minLength) {
    return false;
  }

  // Check if text contains alphanumeric characters
  if (!/[a-zA-Z0-9]/.test(text)) {
    return false;
  }

  return true;
}

/**
 * Extract domain from URL
 */
function extractDomain(url) {
  if (!url || typeof url !== "string") {
    return null;
  }

  try {
    const urlObj = new URL(url);
    return urlObj.hostname;
  } catch (error) {
    return null;
  }
}

/**
 * Normalize URL by removing query parameters and fragments
 */
function normalizeUrl(url) {
  if (!url || typeof url !== "string") {
    return "";
  }

  try {
    const urlObj = new URL(url);
    return `${urlObj.protocol}//${urlObj.hostname}${urlObj.pathname}`;
  } catch (error) {
    return url;
  }
}

module.exports = {
  cleanText,
  containsAnyKeyword,
  isValidText,
  extractDomain,
  normalizeUrl,
};
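`extractDomain` and `normalizeUrl` delegate to the WHATWG `URL` class built into Node, and the pieces they recombine can be checked in isolation:

```javascript
// Built-in URL parsing that extractDomain/normalizeUrl rely on:
// hostname drops the scheme and path; protocol + hostname + pathname
// drops the query string and fragment.
const u = new URL("https://example.com/jobs/123?utm_source=feed#apply");

console.log(u.hostname);                                  // "example.com"
console.log(`${u.protocol}//${u.hostname}${u.pathname}`); // "https://example.com/jobs/123"
```

Note that an invalid URL makes the `URL` constructor throw, which is why both helpers wrap it in `try/catch`.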
194
ai-analyzer/test/logger.test.js
Normal file
@ -0,0 +1,194 @@
/**
 * Test file for logger functionality
 */

const { Logger, logger } = require("../src/logger");

describe("Logger", () => {
  let consoleSpy;
  let consoleWarnSpy;
  let consoleErrorSpy;

  beforeEach(() => {
    consoleSpy = jest.spyOn(console, "log").mockImplementation();
    consoleWarnSpy = jest.spyOn(console, "warn").mockImplementation();
    consoleErrorSpy = jest.spyOn(console, "error").mockImplementation();
  });

  afterEach(() => {
    consoleSpy.mockRestore();
    consoleWarnSpy.mockRestore();
    consoleErrorSpy.mockRestore();
  });

  test("should create default logger instance", () => {
    expect(logger).toBeDefined();
    expect(logger).toBeInstanceOf(Logger);
  });

  test("should log info messages", () => {
    logger.info("Test message");
    expect(consoleSpy).toHaveBeenCalled();
  });

  test("should create custom logger with disabled levels", () => {
    const customLogger = new Logger({ debug: false });
    customLogger.debug("This should not log");
    expect(consoleSpy).not.toHaveBeenCalled();
  });

  test("should use emoji prefixes for convenience methods", () => {
    logger.step("Test step");
    logger.ai("Test AI");
    logger.location("Test location");
    expect(consoleSpy).toHaveBeenCalledTimes(3);
  });

  test("should configure levels at runtime", () => {
    const customLogger = new Logger();
    customLogger.setLevel("debug", false);
    customLogger.debug("This should not log");
    expect(consoleSpy).not.toHaveBeenCalled();
  });

  test("should go silent when requested", () => {
    const customLogger = new Logger();
    customLogger.silent();
    customLogger.info("This should not log");
    customLogger.error("This should not log");
    expect(consoleSpy).not.toHaveBeenCalled();
    expect(consoleErrorSpy).not.toHaveBeenCalled();
  });

  // Additional test cases for comprehensive coverage

  test("should log warning messages", () => {
    logger.warning("Test warning");
    expect(consoleWarnSpy).toHaveBeenCalled();
  });

  test("should log error messages", () => {
    logger.error("Test error");
    expect(consoleErrorSpy).toHaveBeenCalled();
  });

  test("should log success messages", () => {
    logger.success("Test success");
    expect(consoleSpy).toHaveBeenCalled();
  });

  test("should log debug messages", () => {
    logger.debug("Test debug");
    expect(consoleSpy).toHaveBeenCalled();
  });

  test("should respect disabled warning level", () => {
    const customLogger = new Logger({ warning: false });
    customLogger.warning("This should not log");
    expect(consoleWarnSpy).not.toHaveBeenCalled();
  });

  test("should respect disabled error level", () => {
    const customLogger = new Logger({ error: false });
    customLogger.error("This should not log");
    expect(consoleErrorSpy).not.toHaveBeenCalled();
  });

  test("should respect disabled success level", () => {
    const customLogger = new Logger({ success: false });
    customLogger.success("This should not log");
    expect(consoleSpy).not.toHaveBeenCalled();
  });

  test("should respect disabled info level", () => {
    const customLogger = new Logger({ info: false });
    customLogger.info("This should not log");
    expect(consoleSpy).not.toHaveBeenCalled();
  });

  test("should test all convenience methods", () => {
    logger.step("Test step");
    logger.search("Test search");
    logger.ai("Test AI");
    logger.location("Test location");
    logger.file("Test file");
    expect(consoleSpy).toHaveBeenCalledTimes(5);
  });

  test("should enable all levels with verbose method", () => {
    const customLogger = new Logger({ debug: false, info: false });
    customLogger.verbose();
    customLogger.debug("This should log");
    customLogger.info("This should log");
    expect(consoleSpy).toHaveBeenCalledTimes(2);
  });

  test("should handle setLevel with invalid level gracefully", () => {
    const customLogger = new Logger();
    expect(() => {
      customLogger.setLevel("invalid", false);
    }).not.toThrow();
  });

  test("should format messages with timestamps", () => {
    logger.info("Test message");
    const loggedMessage = consoleSpy.mock.calls[0][0];
    expect(loggedMessage).toMatch(/\[\d{1,2}:\d{2}:\d{2}\]/);
  });

  test("should include level in formatted messages", () => {
    logger.info("Test message");
    const loggedMessage = consoleSpy.mock.calls[0][0];
    expect(loggedMessage).toContain("[INFO]");
  });

  test("should disable colors when colors option is false", () => {
    const customLogger = new Logger({ colors: false });
    customLogger.info("Test message");
    const loggedMessage = consoleSpy.mock.calls[0][0];
    // Should not contain ANSI color codes
    expect(loggedMessage).not.toMatch(/\u001b\[/);
  });

  test("should enable colors by default", () => {
    logger.info("Test message");
    const loggedMessage = consoleSpy.mock.calls[0][0];
    // Should contain ANSI color codes
    expect(loggedMessage).toMatch(/\u001b\[/);
  });

  test("should handle multiple level configurations", () => {
    const customLogger = new Logger({
      debug: false,
      info: true,
      warning: false,
      error: true,
      success: false,
    });

    customLogger.debug("Should not log");
    customLogger.info("Should log");
    customLogger.warning("Should not log");
    customLogger.error("Should log");
    customLogger.success("Should not log");

    expect(consoleSpy).toHaveBeenCalledTimes(1);
    expect(consoleErrorSpy).toHaveBeenCalledTimes(1);
    expect(consoleWarnSpy).not.toHaveBeenCalled();
  });

  test("should handle empty or undefined messages", () => {
    expect(() => {
      logger.info("");
      logger.info(undefined);
      logger.info(null);
    }).not.toThrow();
  });

  test("should handle complex message objects", () => {
    const testObj = { key: "value", nested: { data: "test" } };
    expect(() => {
      logger.info(testObj);
    }).not.toThrow();
  });
});
94
core-parser/auth-manager.js
Normal file
@ -0,0 +1,94 @@
/**
 * Authentication Manager
 *
 * Handles login/authentication for different sites
 */

class AuthManager {
  constructor(coreParser) {
    this.coreParser = coreParser;
  }

  /**
   * Authenticate to a specific site
   */
  async authenticate(site, credentials, pageId = "default") {
    const strategies = {
      linkedin: this.authenticateLinkedIn.bind(this),
      // Add more auth strategies as needed
    };

    const strategy = strategies[site.toLowerCase()];
    if (!strategy) {
      throw new Error(`No authentication strategy found for site: ${site}`);
    }

    return await strategy(credentials, pageId);
  }

  /**
   * LinkedIn authentication strategy
   */
  async authenticateLinkedIn(credentials, pageId = "default") {
    const { username, password } = credentials;
    if (!username || !password) {
      throw new Error("LinkedIn authentication requires username and password");
    }

    const page = this.coreParser.getPage(pageId);
    if (!page) {
      throw new Error(`Page with ID '${pageId}' not found`);
    }

    try {
      // Navigate to LinkedIn login
      await this.coreParser.navigateTo("https://www.linkedin.com/login", {
        pageId,
      });

      // Fill credentials
      await page.fill('input[name="session_key"]', username);
      await page.fill('input[name="session_password"]', password);

      // Submit form
      await page.click('button[type="submit"]');

      // Wait for successful login (profile image appears)
      await page.waitForSelector("img.global-nav__me-photo", {
        timeout: 15000,
      });

      return true;
    } catch (error) {
      throw new Error(`LinkedIn authentication failed: ${error.message}`);
    }
  }

  /**
   * Check if currently authenticated to a site
   */
  async isAuthenticated(site, pageId = "default") {
    const page = this.coreParser.getPage(pageId);
    if (!page) {
      return false;
    }

    const checkers = {
      linkedin: async () => {
        try {
          await page.waitForSelector("img.global-nav__me-photo", {
            timeout: 2000,
          });
          return true;
        } catch {
          return false;
        }
      },
    };

    const checker = checkers[site.toLowerCase()];
    return checker ? await checker() : false;
  }
}

module.exports = AuthManager;
64
core-parser/index.js
Normal file
@ -0,0 +1,64 @@
const playwright = require('playwright');
const AuthManager = require('./auth-manager');
const NavigationManager = require('./navigation');

class CoreParser {
  constructor(config = {}) {
    this.config = {
      headless: true,
      timeout: 60000, // Increased default timeout
      ...config
    };
    this.browser = null;
    this.context = null;
    this.pages = {};
    this.authManager = new AuthManager(this);
    this.navigationManager = new NavigationManager(this);
  }

  async init() {
    this.browser = await playwright.chromium.launch({
      headless: this.config.headless
    });
    this.context = await this.browser.newContext();
  }

  async createPage(id) {
    if (!this.browser) await this.init();
    const page = await this.context.newPage();
    this.pages[id] = page;
    return page;
  }

  getPage(id) {
    return this.pages[id];
  }

  async authenticate(site, credentials, pageId) {
    return this.authManager.authenticate(site, credentials, pageId);
  }

  async navigateTo(url, options = {}) {
    // Default waitUntil to "networkidle" here; NavigationManager itself
    // falls back to "domcontentloaded" when called directly.
    return this.navigationManager.navigateTo(url, {
      waitUntil: "networkidle",
      ...options,
    });
  }

  async cleanup() {
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
      this.context = null;
      this.pages = {};
    }
  }
}

module.exports = CoreParser;
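Putting the pieces together, a typical CoreParser session looks roughly like this (require path, env variable names, and the target URL are placeholders; it needs a working Playwright install):

```javascript
// End-to-end sketch: browser setup, auth, navigation, teardown.
const CoreParser = require("./core-parser");

(async () => {
  const parser = new CoreParser({ headless: true });
  await parser.createPage("default"); // lazily calls init()

  await parser.authenticate("linkedin", {
    username: process.env.LINKEDIN_USER,
    password: process.env.LINKEDIN_PASS,
  });

  await parser.navigateTo("https://www.linkedin.com/feed/");
  await parser.cleanup();
})();
```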
131
core-parser/navigation.js
Normal file
@ -0,0 +1,131 @@
/**
 * Navigation Manager
 *
 * Handles page navigation with error handling, retries, and logging
 */

class NavigationManager {
  constructor(coreParser) {
    this.coreParser = coreParser;
  }

  /**
   * Navigate to URL with comprehensive error handling
   */
  async navigateTo(url, options = {}) {
    const {
      pageId = "default",
      waitUntil = "domcontentloaded",
      retries = 1,
      retryDelay = 2000,
      timeout = this.coreParser.config.timeout,
    } = options;

    const page = this.coreParser.getPage(pageId);
    if (!page) {
      throw new Error(`Page with ID '${pageId}' not found`);
    }

    let lastError;

    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        console.log(
          `🌐 Navigating to: ${url} (attempt ${attempt + 1}/${retries + 1})`
        );

        await page.goto(url, {
          waitUntil,
          timeout,
        });

        console.log(`✅ Navigation successful: ${url}`);
        return true;
      } catch (error) {
        lastError = error;
        console.warn(
          `⚠️ Navigation attempt ${attempt + 1} failed: ${error.message}`
        );

        if (attempt < retries) {
          console.log(`🔄 Retrying in ${retryDelay}ms...`);
          await this.delay(retryDelay);
        }
      }
    }

    // All attempts failed
    const errorMessage = `Navigation failed after ${retries + 1} attempts: ${
      lastError.message
    }`;
    console.error(`❌ ${errorMessage}`);
    throw new Error(errorMessage);
  }

  /**
   * Navigate and wait for specific selector
   */
  async navigateAndWaitFor(url, selector, options = {}) {
    await this.navigateTo(url, options);

    const { pageId = "default", timeout = this.coreParser.config.timeout } =
      options;
    const page = this.coreParser.getPage(pageId);

    try {
      await page.waitForSelector(selector, { timeout });
      console.log(`✅ Selector found: ${selector}`);
      return true;
    } catch (error) {
      console.warn(`⚠️ Selector not found: ${selector} - ${error.message}`);
      return false;
    }
  }

  /**
   * Check if current page has specific content
   */
  async hasContent(content, options = {}) {
    const { pageId = "default", timeout = 5000 } = options;
    const page = this.coreParser.getPage(pageId);

    try {
      await page.waitForFunction(
        (text) => document.body.innerText.includes(text),
        content,
        { timeout }
      );
      return true;
    } catch {
      return false;
    }
  }

  /**
   * Utility delay function
   */
  async delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }

  /**
   * Get current page URL
   */
  getCurrentUrl(pageId = "default") {
    const page = this.coreParser.getPage(pageId);
    return page ? page.url() : null;
  }

  /**
   * Take screenshot for debugging
   */
  async screenshot(filepath, pageId = "default") {
    const page = this.coreParser.getPage(pageId);
    if (page) {
      await page.screenshot({ path: filepath });
      console.log(`📸 Screenshot saved: ${filepath}`);
    }
  }
}

module.exports = NavigationManager;
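The retry loop in `navigateTo` follows a generic try/delay/retry shape that can be lifted out and exercised without a browser; a minimal standalone sketch (`withRetries` is illustrative, not part of the module above):

```javascript
// Generic retry-with-delay loop mirroring NavigationManager.navigateTo:
// run the attempt, on failure wait retryDelay ms and try again, and
// rethrow the last error once all attempts are exhausted.
async function withRetries(fn, { retries = 1, retryDelay = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, retryDelay));
      }
    }
  }
  throw lastError;
}
```

With `retries: 1`, this matches the manager's default of two total attempts.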
9
core-parser/package.json
Normal file
@ -0,0 +1,9 @@
{
  "name": "core-parser",
  "version": "1.0.0",
  "main": "index.js",
  "description": "Core parser utilities for browser management",
  "dependencies": {
    "playwright": "^1.40.0"
  }
}
497
job-search-parser/README.md
Normal file
@ -0,0 +1,497 @@
# Job Search Parser - Job Market Intelligence

Specialized parser for job market intelligence, tracking job postings, market trends, and competitive analysis. Focuses on tech roles and industry insights.

## 🎯 Purpose

The Job Search Parser is designed to:

- **Track Job Market Trends**: Monitor demand for specific roles and skills
- **Competitive Intelligence**: Analyze salary ranges and requirements
- **Industry Insights**: Track hiring patterns across different sectors
- **Skill Gap Analysis**: Identify in-demand technologies and frameworks
- **Market Demand Forecasting**: Predict job market trends

## 🚀 Features

### Core Functionality

- **Multi-Source Aggregation**: Collect job data from multiple platforms
- **Role-Specific Tracking**: Focus on tech roles and emerging positions
- **Skill Analysis**: Extract and categorize required skills
- **Salary Intelligence**: Track compensation ranges and trends
- **Company Intelligence**: Monitor hiring companies and patterns

### Advanced Features

- **Market Trend Analysis**: Identify growing and declining job categories
- **Geographic Distribution**: Track job distribution by location
- **Experience Level Analysis**: Entry, mid, senior level tracking
- **Remote Work Trends**: Monitor remote/hybrid work patterns
- **Technology Stack Tracking**: Framework and tool popularity

## 🌐 Supported Job Sites

### ✅ Implemented Parsers

#### SkipTheDrive Parser

Remote job board specializing in work-from-home positions.

**Features:**

- Keyword-based job search with relevance sorting
- Job type filtering (full-time, part-time, contract)
- Multi-page result parsing with pagination
- Featured/sponsored job identification
- AI-powered job relevance analysis
- Automatic duplicate detection

**Usage:**

```bash
# Parse SkipTheDrive for QA automation jobs
node index.js --sites=skipthedrive --keywords="automation qa,qa engineer"

# Filter by job type
JOB_TYPES="full time,contract" node index.js --sites=skipthedrive

# Run demo with limited results
node index.js --sites=skipthedrive --demo
```

### 🚧 Planned Parsers

- **Indeed**: Comprehensive job aggregator
- **Glassdoor**: Jobs with company reviews and salary data
- **Monster**: Traditional job board
- **SimplyHired**: Job aggregator with salary estimates
- **LinkedIn Jobs**: Professional network job postings
- **AngelList**: Startup and tech jobs
- **Remote.co**: Dedicated remote work jobs
- **FlexJobs**: Flexible and remote positions

## 📦 Installation

```bash
# Install dependencies
npm install

# Run tests
npm test

# Run demo
node demo.js
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the parser directory:

```env
# Job Search Configuration
SEARCH_SOURCES=linkedin,indeed,glassdoor
TARGET_ROLES=software engineer,data scientist,product manager
LOCATION_FILTER=Toronto,Vancouver,Calgary
EXPERIENCE_LEVELS=entry,mid,senior
REMOTE_PREFERENCE=remote,hybrid,onsite

# Analysis Configuration
ENABLE_SALARY_ANALYSIS=true
ENABLE_SKILL_ANALYSIS=true
ENABLE_TREND_ANALYSIS=true
MIN_SALARY=50000
MAX_SALARY=200000

# Output Configuration
OUTPUT_FORMAT=json,csv
SAVE_RAW_DATA=true
ANALYSIS_INTERVAL=daily
```

### Command Line Options

```bash
# Basic usage
node index.js

# Specific roles
node index.js --roles="frontend developer,backend developer"

# Geographic focus
node index.js --locations="Toronto,Vancouver"

# Experience level
node index.js --experience="senior"

# Output format
node index.js --output=results/job-market-analysis.json
```

**Available Options:**

- `--roles="role1,role2"`: Target job roles
- `--locations="city1,city2"`: Geographic focus
- `--experience="entry|mid|senior"`: Experience level
- `--remote="remote|hybrid|onsite"`: Remote work preference
- `--salary-min=NUMBER`: Minimum salary filter
- `--salary-max=NUMBER`: Maximum salary filter
- `--output=FILE`: Output filename
- `--format=json|csv`: Output format
- `--trends`: Enable trend analysis
- `--skills`: Enable skill analysis

## 📊 Keywords

### Role-Specific Keywords

Place keyword CSV files in the `keywords/` directory:

```
job-search-parser/
├── keywords/
│   ├── job-search-keywords.csv   # General job search terms
│   ├── tech-roles.csv            # Technology roles
│   ├── data-roles.csv            # Data science roles
│   ├── management-roles.csv      # Management positions
│   └── emerging-roles.csv        # Emerging job categories
└── index.js
```

### Tech Roles Keywords

```csv
keyword
software engineer
frontend developer
backend developer
full stack developer
data scientist
machine learning engineer
devops engineer
site reliability engineer
cloud architect
security engineer
mobile developer
iOS developer
Android developer
react developer
vue developer
angular developer
node.js developer
python developer
java developer
golang developer
rust developer
data engineer
analytics engineer
```

### Data Science Keywords

```csv
keyword
data scientist
machine learning engineer
data analyst
business analyst
data engineer
analytics engineer
ML engineer
AI engineer
statistician
quantitative analyst
research scientist
data architect
BI developer
ETL developer
```
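A minimal sketch of turning one of these single-column keyword files into a list of search terms (`parseKeywordCsv` is illustrative, not part of the parser's API; file reading is left out for brevity):

```javascript
// Split a one-column CSV (header row "keyword") into trimmed search terms,
// skipping the header and any blank lines.
function parseKeywordCsv(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && line.toLowerCase() !== "keyword");
}

const keywords = parseKeywordCsv("keyword\nsoftware engineer\ndata scientist\n");
console.log(keywords); // [ 'software engineer', 'data scientist' ]
```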
|
||||
## 📈 Usage Examples
|
||||
|
||||
### Basic Job Search
|
||||
|
||||
```bash
|
||||
# Standard job market analysis
|
||||
node index.js
|
||||
|
||||
# Specific tech roles
|
||||
node index.js --roles="software engineer,data scientist"
|
||||
|
||||
# Geographic focus
|
||||
node index.js --locations="Toronto,Vancouver,Calgary"
|
||||
```
|
||||
|
||||
### Advanced Analysis
|
||||
|
||||
```bash
|
||||
# Senior level positions
|
||||
node index.js --experience="senior" --salary-min=100000
|
||||
|
||||
# Remote work opportunities
|
||||
node index.js --remote="remote" --roles="frontend developer"
|
||||
|
||||
# Trend analysis
|
||||
node index.js --trends --skills --output=results/trends.json
|
||||
```
|
||||
|
||||
### Market Intelligence
|
||||
|
||||
```bash
|
||||
# Salary analysis
|
||||
node index.js --salary-min=80000 --salary-max=150000
|
||||
|
||||
# Skill gap analysis
|
||||
node index.js --skills --roles="machine learning engineer"
|
||||
|
||||
# Competitive intelligence
|
||||
node index.js --companies="Google,Microsoft,Amazon"
|
||||
```
## 📊 Output Format

### JSON Structure

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "search_parameters": {
      "roles": ["software engineer", "data scientist"],
      "locations": ["Toronto", "Vancouver"],
      "experience_levels": ["mid", "senior"],
      "remote_preference": ["remote", "hybrid"]
    },
    "total_jobs_found": 1250,
    "analysis_duration_seconds": 45
  },
  "market_overview": {
    "total_jobs": 1250,
    "average_salary": 95000,
    "salary_range": {
      "min": 65000,
      "max": 180000,
      "median": 92000
    },
    "remote_distribution": {
      "remote": 45,
      "hybrid": 35,
      "onsite": 20
    },
    "experience_distribution": {
      "entry": 15,
      "mid": 45,
      "senior": 40
    }
  },
  "trends": {
    "growing_skills": [
      { "skill": "React", "growth_rate": 25 },
      { "skill": "Python", "growth_rate": 18 },
      { "skill": "AWS", "growth_rate": 22 }
    ],
    "declining_skills": [
      { "skill": "jQuery", "growth_rate": -12 },
      { "skill": "PHP", "growth_rate": -8 }
    ],
    "emerging_roles": ["AI Engineer", "DevSecOps Engineer", "Data Engineer"]
  },
  "jobs": [
    {
      "id": "job_1",
      "title": "Senior Software Engineer",
      "company": "TechCorp",
      "location": "Toronto, Ontario",
      "remote_type": "hybrid",
      "salary": {
        "min": 100000,
        "max": 140000,
        "currency": "CAD"
      },
      "required_skills": ["React", "Node.js", "TypeScript", "AWS"],
      "preferred_skills": ["GraphQL", "Docker", "Kubernetes"],
      "experience_level": "senior",
      "job_url": "https://example.com/job/1",
      "posted_date": "2024-01-10T09:00:00Z",
      "scraped_at": "2024-01-15T10:30:00Z"
    }
  ],
  "analysis": {
    "skill_demand": {
      "React": { "count": 45, "avg_salary": 98000 },
      "Python": { "count": 38, "avg_salary": 102000 },
      "AWS": { "count": 32, "avg_salary": 105000 }
    },
    "company_insights": {
      "top_hirers": [
        { "company": "TechCorp", "jobs": 25 },
        { "company": "StartupXYZ", "jobs": 18 }
      ],
      "salary_leaders": [
        { "company": "BigTech", "avg_salary": 120000 },
        { "company": "FinTech", "avg_salary": 115000 }
      ]
    }
  }
}
```

### CSV Output

The parser can also generate CSV files for easy analysis:

```csv
job_id,title,company,location,remote_type,salary_min,salary_max,required_skills,experience_level,posted_date
job_1,Senior Software Engineer,TechCorp,Toronto,hybrid,100000,140000,"React,Node.js,TypeScript",senior,2024-01-10
job_2,Data Scientist,DataCorp,Vancouver,remote,90000,130000,"Python,SQL,ML",mid,2024-01-09
```
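For quick post-processing, CSV rows like these can be consumed with plain Node.js. The sketch below is illustrative (column names are taken from the header above); the quoted `required_skills` field is why a naive `split(",")` is not enough, and the bundled `csv-parser` dependency is a safer choice in production.

```javascript
// Split one CSV line, honoring double-quoted fields that contain commas.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === "," && !inQuotes) {
      fields.push(current);
      current = "";
    } else current += ch;
  }
  fields.push(current);
  return fields;
}

// Compute each job's salary midpoint from the CSV text shown above.
function salaryMidpoints(csvText) {
  const [header, ...rows] = csvText.trim().split("\n");
  const cols = header.split(",");
  const min = cols.indexOf("salary_min");
  const max = cols.indexOf("salary_max");
  const title = cols.indexOf("title");
  return rows.map(splitCsvLine).map((f) => ({
    title: f[title],
    midpoint: (Number(f[min]) + Number(f[max])) / 2,
  }));
}
```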
## 🔒 Security & Best Practices

### Data Privacy

- Respect job site terms of service
- Implement appropriate rate limiting
- Store data securely and responsibly
- Anonymize sensitive information

### Rate Limiting

- Implement delays between requests
- Respect API rate limits
- Use multiple data sources
- Monitor for blocking/detection

### Legal Compliance

- Educational and research purposes only
- Respect website terms of service
- Implement data retention policies
- Monitor for legal changes
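The delay-between-requests practice can be sketched as a promise-based sleep with random jitter. This is a hypothetical helper, not part of the shipped parsers; the base delay might come from an environment variable such as `REQUEST_DELAY`.

```javascript
// Promise-based sleep.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch a list of URLs sequentially, waiting between 1x and 1.5x the base
// delay after each request so the timing is less detectable.
async function politeFetch(urls, fetchOne, baseDelayMs = 2000) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchOne(url));
    await sleep(baseDelayMs + Math.random() * baseDelayMs * 0.5);
  }
  return results;
}
```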
## 🧪 Testing

### Run Tests

```bash
# All tests
npm test

# Specific test suites
npm test -- --testNamePattern="JobSearch"
npm test -- --testNamePattern="Analysis"
npm test -- --testNamePattern="Trends"
```

### Test Coverage

```bash
npm run test:coverage
```
## 🚀 Performance Optimization

### Recommended Settings

#### Fast Analysis

```bash
node index.js --roles="software engineer" --locations="Toronto"
```

#### Comprehensive Analysis

```bash
node index.js --trends --skills --experience="all"
```

#### Focused Intelligence

```bash
node index.js --salary-min=80000 --remote="remote" --trends
```

### Performance Tips

- Use specific role filters to reduce data volume
- Implement caching for repeated searches
- Use parallel processing for multiple sources
- Optimize data storage and retrieval
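The caching tip can be sketched as an in-memory store keyed by the normalized search parameters, with a TTL so stale market data expires. This is a hypothetical helper; a real setup might persist entries to disk instead.

```javascript
// In-memory search cache with a time-to-live (default 15 minutes).
function createSearchCache(ttlMs = 15 * 60 * 1000) {
  const store = new Map();
  return {
    // Sorted-key serialization so parameter order does not matter.
    key: (params) => JSON.stringify(params, Object.keys(params).sort()),
    get(params) {
      const entry = store.get(this.key(params));
      if (!entry || Date.now() - entry.at > ttlMs) return undefined;
      return entry.value;
    },
    set(params, value) {
      store.set(this.key(params), { value, at: Date.now() });
    },
  };
}
```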
## 🔧 Troubleshooting

### Common Issues

#### Rate Limiting

```bash
# Reduce request frequency
export REQUEST_DELAY=2000
node index.js
```

#### Data Source Issues

```bash
# Use specific sources
node index.js --sources="linkedin,indeed"

# Check source availability
node index.js --test-sources
```

#### Output Issues

```bash
# Check output directory
mkdir -p results
node index.js --output=results/analysis.json

# Verify file permissions
chmod 755 results/
```
## 📈 Monitoring & Analytics

### Key Metrics

- **Job Volume**: Total jobs found per search
- **Salary Trends**: Average and median salary changes
- **Skill Demand**: Most requested skills
- **Remote Adoption**: Remote work trend analysis
- **Market Velocity**: Job posting frequency

### Dashboard Integration

- Real-time market monitoring
- Trend visualization
- Salary benchmarking
- Skill gap analysis
- Competitive intelligence
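The salary metrics above boil down to simple aggregation over salary midpoints. A sketch, where the job shape matches the JSON output format (not part of the shipped code):

```javascript
// Median and average over the salary midpoints of a batch of job postings.
function salaryStats(jobs) {
  const mids = jobs
    .map((j) => (j.salary.min + j.salary.max) / 2)
    .sort((a, b) => a - b);
  const half = Math.floor(mids.length / 2);
  const median =
    mids.length % 2 ? mids[half] : (mids[half - 1] + mids[half]) / 2;
  const average = mids.reduce((sum, v) => sum + v, 0) / mids.length;
  return { median, average };
}
```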
## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Code Standards

- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This parser is part of the LinkedOut platform and follows the same licensing terms.

---

**Note**: This tool is designed for educational and research purposes. Always respect website terms of service and implement appropriate rate limiting and ethical usage practices.
543
job-search-parser/demo.js
Normal file
@ -0,0 +1,543 @@
/**
 * Job Search Parser Demo
 *
 * Demonstrates the Job Search Parser's capabilities for job market intelligence,
 * trend analysis, and competitive insights.
 *
 * This demo uses simulated data for demonstration purposes.
 */

const { logger } = require("../ai-analyzer");
const fs = require("fs");
const path = require("path");

// Terminal colors for demo output
const colors = {
  reset: "\x1b[0m",
  bright: "\x1b[1m",
  cyan: "\x1b[36m",
  green: "\x1b[32m",
  yellow: "\x1b[33m",
  blue: "\x1b[34m",
  magenta: "\x1b[35m",
  red: "\x1b[31m",
};

const demo = {
  title: (text) =>
    console.log(`\n${colors.bright}${colors.cyan}${text}${colors.reset}`),
  section: (text) =>
    console.log(`\n${colors.bright}${colors.magenta}${text}${colors.reset}`),
  success: (text) => console.log(`${colors.green}✅ ${text}${colors.reset}`),
  info: (text) => console.log(`${colors.blue}ℹ️ ${text}${colors.reset}`),
  warning: (text) => console.log(`${colors.yellow}⚠️ ${text}${colors.reset}`),
  error: (text) => console.log(`${colors.red}❌ ${text}${colors.reset}`),
  code: (text) => console.log(`${colors.cyan}${text}${colors.reset}`),
};

// Mock job data for demonstration
const mockJobs = [
  {
    id: "job_1",
    title: "Senior Software Engineer",
    company: "TechCorp",
    location: "Toronto, Ontario",
    remote_type: "hybrid",
    salary: { min: 100000, max: 140000, currency: "CAD" },
    required_skills: ["React", "Node.js", "TypeScript", "AWS"],
    preferred_skills: ["GraphQL", "Docker", "Kubernetes"],
    experience_level: "senior",
    job_url: "https://example.com/job/1",
    posted_date: "2024-01-10T09:00:00Z",
    scraped_at: "2024-01-15T10:30:00Z",
  },
  {
    id: "job_2",
    title: "Data Scientist",
    company: "DataCorp",
    location: "Vancouver, British Columbia",
    remote_type: "remote",
    salary: { min: 90000, max: 130000, currency: "CAD" },
    required_skills: ["Python", "SQL", "Machine Learning", "Statistics"],
    preferred_skills: ["TensorFlow", "PyTorch", "AWS"],
    experience_level: "mid",
    job_url: "https://example.com/job/2",
    posted_date: "2024-01-09T14:30:00Z",
    scraped_at: "2024-01-15T10:30:00Z",
  },
  {
    id: "job_3",
    title: "Frontend Developer",
    company: "StartupXYZ",
    location: "Calgary, Alberta",
    remote_type: "onsite",
    salary: { min: 70000, max: 95000, currency: "CAD" },
    required_skills: ["React", "JavaScript", "CSS", "HTML"],
    preferred_skills: ["Vue.js", "TypeScript", "Webpack"],
    experience_level: "entry",
    job_url: "https://example.com/job/3",
    posted_date: "2024-01-08T11:15:00Z",
    scraped_at: "2024-01-15T10:30:00Z",
  },
  {
    id: "job_4",
    title: "DevOps Engineer",
    company: "CloudTech",
    location: "Toronto, Ontario",
    remote_type: "hybrid",
    salary: { min: 95000, max: 125000, currency: "CAD" },
    required_skills: ["Docker", "Kubernetes", "AWS", "Linux"],
    preferred_skills: ["Terraform", "Jenkins", "Prometheus"],
    experience_level: "senior",
    job_url: "https://example.com/job/4",
    posted_date: "2024-01-07T16:45:00Z",
    scraped_at: "2024-01-15T10:30:00Z",
  },
  {
    id: "job_5",
    title: "Machine Learning Engineer",
    company: "AI Solutions",
    location: "Vancouver, British Columbia",
    remote_type: "remote",
    salary: { min: 110000, max: 150000, currency: "CAD" },
    required_skills: ["Python", "TensorFlow", "PyTorch", "ML"],
    preferred_skills: ["AWS", "Docker", "Kubernetes", "Spark"],
    experience_level: "senior",
    job_url: "https://example.com/job/5",
    posted_date: "2024-01-06T10:20:00Z",
    scraped_at: "2024-01-15T10:30:00Z",
  },
];
async function runDemo() {
  demo.title("=== Job Search Parser Demo ===");
  demo.info(
    "This demo showcases the Job Search Parser's capabilities for job market intelligence."
  );
  demo.info("All data shown is simulated for demonstration purposes.");
  demo.info("Press Enter to continue through each section...\n");

  await waitForEnter();

  // 1. Configuration Demo
  await demonstrateConfiguration();

  // 2. Job Search Process Demo
  await demonstrateJobSearch();

  // 3. Market Analysis Demo
  await demonstrateMarketAnalysis();

  // 4. Trend Analysis Demo
  await demonstrateTrendAnalysis();

  // 5. Skill Analysis Demo
  await demonstrateSkillAnalysis();

  // 6. Competitive Intelligence Demo
  await demonstrateCompetitiveIntelligence();

  // 7. Output Generation Demo
  await demonstrateOutputGeneration();

  demo.title("=== Demo Complete ===");
  demo.success("Job Search Parser demo completed successfully!");
  demo.info("Check the README.md for detailed usage instructions.");
}

async function demonstrateConfiguration() {
  demo.section("1. Configuration Setup");
  demo.info(
    "The Job Search Parser uses environment variables and command-line options for configuration."
  );

  demo.code("// Environment Variables (.env file)");
  demo.info("SEARCH_SOURCES=linkedin,indeed,glassdoor");
  demo.info("TARGET_ROLES=software engineer,data scientist,product manager");
  demo.info("LOCATION_FILTER=Toronto,Vancouver,Calgary");
  demo.info("EXPERIENCE_LEVELS=entry,mid,senior");
  demo.info("REMOTE_PREFERENCE=remote,hybrid,onsite");
  demo.info("ENABLE_SALARY_ANALYSIS=true");
  demo.info("ENABLE_SKILL_ANALYSIS=true");
  demo.info("ENABLE_TREND_ANALYSIS=true");

  demo.code("// Command Line Options");
  demo.info('node index.js --roles="frontend developer,backend developer"');
  demo.info('node index.js --locations="Toronto,Vancouver"');
  demo.info('node index.js --experience="senior" --salary-min=100000');
  demo.info('node index.js --remote="remote" --trends --skills');

  await waitForEnter();
}
async function demonstrateJobSearch() {
  demo.section("2. Job Search Process");
  demo.info(
    "The parser searches multiple job platforms for relevant positions."
  );

  const searchSources = ["LinkedIn", "Indeed", "Glassdoor"];
  const targetRoles = [
    "Software Engineer",
    "Data Scientist",
    "Frontend Developer",
  ];

  demo.code("// Multi-source job search");
  for (const source of searchSources) {
    logger.search(`Searching ${source} for job postings...`);
    await simulateSearch();

    const jobsFound = Math.floor(Math.random() * 200) + 50;
    logger.success(`Found ${jobsFound} jobs on ${source}`);
  }

  demo.code("// Role-specific filtering");
  for (const role of targetRoles) {
    logger.info(`Filtering for ${role} positions...`);
    await simulateProcessing();

    const roleJobs = Math.floor(Math.random() * 30) + 10;
    logger.success(`Found ${roleJobs} ${role} positions`);
  }

  await waitForEnter();
}

async function demonstrateMarketAnalysis() {
  demo.section("3. Market Analysis");
  demo.info(
    "The parser analyzes market trends, salary ranges, and job distribution."
  );

  demo.code("// Market overview analysis");
  logger.info("Analyzing market overview...");
  await simulateProcessing();

  const marketOverview = {
    total_jobs: 1250,
    average_salary: 95000,
    salary_range: { min: 65000, max: 180000, median: 92000 },
    remote_distribution: { remote: 45, hybrid: 35, onsite: 20 },
    experience_distribution: { entry: 15, mid: 45, senior: 40 },
  };

  demo.success(`Total jobs found: ${marketOverview.total_jobs}`);
  demo.info(
    `Average salary: $${marketOverview.average_salary.toLocaleString()}`
  );
  demo.info(
    `Salary range: $${marketOverview.salary_range.min.toLocaleString()} - $${marketOverview.salary_range.max.toLocaleString()}`
  );
  demo.info(
    `Remote work: ${marketOverview.remote_distribution.remote}% remote, ${marketOverview.remote_distribution.hybrid}% hybrid`
  );

  demo.code("// Geographic distribution");
  const locations = {
    Toronto: 45,
    Vancouver: 30,
    Calgary: 15,
    Other: 10,
  };

  Object.entries(locations).forEach(([location, percentage]) => {
    demo.info(`${location}: ${percentage}% of jobs`);
  });

  await waitForEnter();
}

async function demonstrateTrendAnalysis() {
  demo.section("4. Trend Analysis");
  demo.info(
    "The parser identifies growing and declining skills and emerging roles."
  );

  demo.code("// Skill trend analysis");
  logger.info("Analyzing skill trends...");
  await simulateProcessing();

  const growingSkills = [
    { skill: "React", growth_rate: 25 },
    { skill: "Python", growth_rate: 18 },
    { skill: "AWS", growth_rate: 22 },
    { skill: "TypeScript", growth_rate: 30 },
    { skill: "Docker", growth_rate: 15 },
  ];

  const decliningSkills = [
    { skill: "jQuery", growth_rate: -12 },
    { skill: "PHP", growth_rate: -8 },
    { skill: "Angular", growth_rate: -5 },
  ];

  demo.success("Growing skills:");
  growingSkills.forEach((skill) => {
    demo.info(`  ${skill.skill}: +${skill.growth_rate}% growth`);
  });

  demo.warning("Declining skills:");
  decliningSkills.forEach((skill) => {
    demo.info(`  ${skill.skill}: ${skill.growth_rate}% decline`);
  });

  demo.code("// Emerging roles");
  const emergingRoles = [
    "AI Engineer",
    "DevSecOps Engineer",
    "Data Engineer",
    "Cloud Architect",
    "Site Reliability Engineer",
  ];

  demo.success("Emerging roles:");
  emergingRoles.forEach((role) => {
    demo.info(`  ${role}`);
  });

  await waitForEnter();
}
async function demonstrateSkillAnalysis() {
  demo.section("5. Skill Analysis");
  demo.info("The parser analyzes skill demand and salary correlation.");

  demo.code("// Skill demand analysis");
  logger.info("Analyzing skill demand...");
  await simulateProcessing();

  const skillDemand = {
    React: { count: 45, avg_salary: 98000 },
    Python: { count: 38, avg_salary: 102000 },
    AWS: { count: 32, avg_salary: 105000 },
    TypeScript: { count: 28, avg_salary: 95000 },
    Docker: { count: 25, avg_salary: 103000 },
    "Machine Learning": { count: 22, avg_salary: 115000 },
  };

  demo.success("Top in-demand skills:");
  Object.entries(skillDemand)
    .sort((a, b) => b[1].count - a[1].count)
    .forEach(([skill, data]) => {
      demo.info(
        `  ${skill}: ${data.count} jobs, avg salary $${data.avg_salary.toLocaleString()}`
      );
    });

  demo.code("// Salary correlation analysis");
  const salaryCorrelation = [
    { skill: "Machine Learning", correlation: 0.85 },
    { skill: "AWS", correlation: 0.78 },
    { skill: "Docker", correlation: 0.72 },
    { skill: "Python", correlation: 0.68 },
    { skill: "React", correlation: 0.65 },
  ];

  demo.success("Skills with highest salary correlation:");
  salaryCorrelation.forEach((item) => {
    demo.info(
      `  ${item.skill}: ${(item.correlation * 100).toFixed(0)}% correlation`
    );
  });

  await waitForEnter();
}

async function demonstrateCompetitiveIntelligence() {
  demo.section("6. Competitive Intelligence");
  demo.info(
    "The parser provides insights into company hiring patterns and salary competitiveness."
  );

  demo.code("// Company hiring analysis");
  logger.info("Analyzing company hiring patterns...");
  await simulateProcessing();

  const topHirers = [
    { company: "TechCorp", jobs: 25, avg_salary: 105000 },
    { company: "StartupXYZ", jobs: 18, avg_salary: 95000 },
    { company: "DataCorp", jobs: 15, avg_salary: 110000 },
    { company: "CloudTech", jobs: 12, avg_salary: 115000 },
    { company: "AI Solutions", jobs: 10, avg_salary: 120000 },
  ];

  demo.success("Top hiring companies:");
  topHirers.forEach((company) => {
    demo.info(
      `  ${company.company}: ${company.jobs} jobs, avg salary $${company.avg_salary.toLocaleString()}`
    );
  });

  demo.code("// Salary competitiveness");
  const salaryLeaders = [
    { company: "BigTech", avg_salary: 120000, market_position: "leader" },
    { company: "FinTech", avg_salary: 115000, market_position: "leader" },
    { company: "AI Solutions", avg_salary: 120000, market_position: "leader" },
    {
      company: "StartupXYZ",
      avg_salary: 95000,
      market_position: "competitive",
    },
    { company: "TechCorp", avg_salary: 105000, market_position: "competitive" },
  ];

  demo.success("Salary leaders:");
  salaryLeaders.forEach((company) => {
    const position = company.market_position === "leader" ? "🏆" : "📊";
    demo.info(
      `  ${position} ${company.company}: $${company.avg_salary.toLocaleString()}`
    );
  });

  await waitForEnter();
}
async function demonstrateOutputGeneration() {
  demo.section("7. Output Generation");
  demo.info(
    "Results are saved in multiple formats with comprehensive analysis."
  );

  demo.code("// Generating comprehensive report");
  logger.file("Generating job market analysis report...");

  const outputData = {
    metadata: {
      timestamp: new Date().toISOString(),
      search_parameters: {
        roles: ["software engineer", "data scientist", "frontend developer"],
        locations: ["Toronto", "Vancouver", "Calgary"],
        experience_levels: ["entry", "mid", "senior"],
        remote_preference: ["remote", "hybrid", "onsite"],
      },
      total_jobs_found: 1250,
      analysis_duration_seconds: 45,
    },
    market_overview: {
      total_jobs: 1250,
      average_salary: 95000,
      salary_range: { min: 65000, max: 180000, median: 92000 },
      remote_distribution: { remote: 45, hybrid: 35, onsite: 20 },
      experience_distribution: { entry: 15, mid: 45, senior: 40 },
    },
    trends: {
      growing_skills: [
        { skill: "React", growth_rate: 25 },
        { skill: "Python", growth_rate: 18 },
        { skill: "AWS", growth_rate: 22 },
      ],
      declining_skills: [
        { skill: "jQuery", growth_rate: -12 },
        { skill: "PHP", growth_rate: -8 },
      ],
      emerging_roles: ["AI Engineer", "DevSecOps Engineer", "Data Engineer"],
    },
    jobs: mockJobs,
    analysis: {
      skill_demand: {
        React: { count: 45, avg_salary: 98000 },
        Python: { count: 38, avg_salary: 102000 },
        AWS: { count: 32, avg_salary: 105000 },
      },
      company_insights: {
        top_hirers: [
          { company: "TechCorp", jobs: 25 },
          { company: "StartupXYZ", jobs: 18 },
        ],
        salary_leaders: [
          { company: "BigTech", avg_salary: 120000 },
          { company: "FinTech", avg_salary: 115000 },
        ],
      },
    },
  };

  // Save to demo file
  const outputPath = path.join(__dirname, "demo-job-analysis.json");
  fs.writeFileSync(outputPath, JSON.stringify(outputData, null, 2));

  demo.success(`Analysis report saved to: ${outputPath}`);
  demo.info(`Total jobs analyzed: ${outputData.metadata.total_jobs_found}`);
  demo.info(
    `Analysis duration: ${outputData.metadata.analysis_duration_seconds} seconds`
  );

  demo.code("// Output formats available");
  demo.info("📁 JSON: Comprehensive analysis with metadata");
  demo.info("📊 CSV: Tabular data for spreadsheet analysis");
  demo.info("📈 Charts: Visual trend analysis");
  demo.info("📋 Summary: Executive summary report");

  await waitForEnter();
}

// Helper functions
function waitForEnter() {
  return new Promise((resolve) => {
    const readline = require("readline");
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });

    rl.question("\nPress Enter to continue...", () => {
      rl.close();
      resolve();
    });
  });
}

async function simulateSearch() {
  return new Promise((resolve) => {
    const steps = [
      "Connecting to source",
      "Searching jobs",
      "Filtering results",
      "Extracting data",
    ];
    let i = 0;
    const interval = setInterval(() => {
      if (i < steps.length) {
        logger.info(steps[i]);
        i++;
      } else {
        clearInterval(interval);
        resolve();
      }
    }, 600);
  });
}

async function simulateProcessing() {
  return new Promise((resolve) => {
    const dots = [".", "..", "..."];
    let i = 0;
    const interval = setInterval(() => {
      process.stdout.write(`\rProcessing${dots[i]}`);
      i = (i + 1) % dots.length;
    }, 500);

    setTimeout(() => {
      clearInterval(interval);
      process.stdout.write("\r");
      resolve();
    }, 2000);
  });
}

// Run the demo if this file is executed directly
if (require.main === module) {
  runDemo().catch((error) => {
    demo.error(`Demo failed: ${error.message}`);
    process.exit(1);
  });
}

module.exports = { runDemo };
229
job-search-parser/index.js
Normal file
@ -0,0 +1,229 @@
#!/usr/bin/env node

/**
 * Job Search Parser - Refactored
 *
 * Uses core-parser for browser management and site-specific strategies for parsing logic
 */

const path = require("path");
const fs = require("fs");
const CoreParser = require("../core-parser");
const { skipthedriveStrategy } = require("./strategies/skipthedrive-strategy");
const { logger, analyzeBatch, checkOllamaStatus } = require("ai-analyzer");

// Load environment variables
require("dotenv").config({ path: path.join(__dirname, ".env") });

// Configuration from environment
const HEADLESS = process.env.HEADLESS !== "false";
const SEARCH_KEYWORDS =
  process.env.SEARCH_KEYWORDS || "software engineer,developer,programmer";
const LOCATION_FILTER = process.env.LOCATION_FILTER;
const ENABLE_AI_ANALYSIS = process.env.ENABLE_AI_ANALYSIS === "true";
const MAX_PAGES = parseInt(process.env.MAX_PAGES, 10) || 5;

// Available site strategies
const SITE_STRATEGIES = {
  skipthedrive: skipthedriveStrategy,
  // Add more site strategies here
  // indeed: indeedStrategy,
  // glassdoor: glassdoorStrategy,
};

/**
 * Parse command line arguments
 */
function parseArguments() {
  const args = process.argv.slice(2);
  const options = {
    sites: ["skipthedrive"], // default
    keywords: null,
    locationFilter: null,
    maxPages: MAX_PAGES,
  };

  args.forEach((arg) => {
    if (arg.startsWith("--sites=")) {
      options.sites = arg
        .split("=")[1]
        .split(",")
        .map((s) => s.trim());
    } else if (arg.startsWith("--keywords=")) {
      options.keywords = arg
        .split("=")[1]
        .split(",")
        .map((k) => k.trim());
    } else if (arg.startsWith("--location=")) {
      options.locationFilter = arg.split("=")[1];
    } else if (arg.startsWith("--max-pages=")) {
      options.maxPages = parseInt(arg.split("=")[1], 10) || MAX_PAGES;
    }
  });

  return options;
}
/**
 * Main job search parser function
 */
async function startJobSearchParser(options = {}) {
  const cliOptions = parseArguments();
  const finalOptions = { ...cliOptions, ...options };

  const coreParser = new CoreParser({
    headless: HEADLESS,
    timeout: 30000,
  });

  try {
    logger.step("🚀 Job Search Parser Starting...");

    // Parse keywords
    const keywords =
      finalOptions.keywords || SEARCH_KEYWORDS.split(",").map((k) => k.trim());
    const locationFilter = finalOptions.locationFilter || LOCATION_FILTER;
    const sites = finalOptions.sites;

    logger.info(`📦 Selected job sites: ${sites.join(", ")}`);
    logger.info(`🔍 Search Keywords: ${keywords.join(", ")}`);
    logger.info(`📍 Location Filter: ${locationFilter || "None"}`);
    logger.info(
      `🧠 AI Analysis: ${ENABLE_AI_ANALYSIS ? "Enabled" : "Disabled"}`
    );

    const allResults = [];
    const allRejectedResults = [];
    const siteResults = {};

    // Process each selected site
    for (const site of sites) {
      const strategy = SITE_STRATEGIES[site];
      if (!strategy) {
        logger.error(`❌ Unknown site strategy: ${site}`);
        continue;
      }

      try {
        logger.step(`\n🌐 Parsing ${site}...`);
        const startTime = Date.now();

        const parseResult = await strategy(coreParser, {
          keywords,
          locationFilter,
          maxPages: finalOptions.maxPages,
        });

        const { results, rejectedResults, summary } = parseResult;
        const duration = ((Date.now() - startTime) / 1000).toFixed(2);

        // Collect results
        allResults.push(...results);
        allRejectedResults.push(...rejectedResults);

        siteResults[site] = {
          count: results.length,
          rejected: rejectedResults.length,
          duration: `${duration}s`,
          summary,
        };

        logger.success(
          `✅ ${site} completed in ${duration}s - Found ${results.length} jobs`
        );
      } catch (error) {
        logger.error(`❌ ${site} parsing failed: ${error.message}`);
        siteResults[site] = {
          count: 0,
          rejected: 0,
          duration: "0s",
          error: error.message,
        };
      }
    }
    // AI Analysis if enabled
    let analysisResults = null;
    if (ENABLE_AI_ANALYSIS && allResults.length > 0) {
      logger.step("🧠 Running AI Analysis...");

      const ollamaStatus = await checkOllamaStatus();
      if (ollamaStatus.available) {
        analysisResults = await analyzeBatch(allResults, {
          context:
            "Job market analysis focusing on job postings, skills, and trends",
        });
        logger.success(
          `✅ AI Analysis completed for ${allResults.length} jobs`
        );
      } else {
        logger.warning("⚠️ Ollama not available, skipping AI analysis");
      }
    }

    // Save results
    const outputData = {
      metadata: {
        extractedAt: new Date().toISOString(),
        parser: "job-search-parser",
        version: "2.0.0",
        sites: sites,
        keywords: keywords.join(", "),
        locationFilter,
        analysisResults,
      },
      results: allResults,
      rejectedResults: allRejectedResults,
      siteResults,
    };

    const resultsDir = path.join(__dirname, "results");
    if (!fs.existsSync(resultsDir)) {
      fs.mkdirSync(resultsDir, { recursive: true });
    }

    const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
    const filename = `job-search-results-${timestamp}.json`;
    const filepath = path.join(resultsDir, filename);

    fs.writeFileSync(filepath, JSON.stringify(outputData, null, 2));

    // Final summary
    logger.step("\n📊 Job Search Parser Summary");
    logger.success(`✅ Total jobs found: ${allResults.length}`);
    logger.info(`❌ Total rejected: ${allRejectedResults.length}`);
    logger.info(`📁 Results saved to: ${filepath}`);

    logger.info("\n📈 Results by site:");
    for (const [site, stats] of Object.entries(siteResults)) {
      if (stats.error) {
        logger.error(`  ${site}: ERROR - ${stats.error}`);
      } else {
        logger.info(
          `  ${site}: ${stats.count} jobs found, ${stats.rejected} rejected (${stats.duration})`
        );
      }
    }

    logger.success("\n✅ Job Search Parser completed successfully!");

    return outputData;
  } catch (error) {
    logger.error(`❌ Job Search Parser failed: ${error.message}`);
    throw error;
  } finally {
    await coreParser.cleanup();
  }
}

// CLI handling
if (require.main === module) {
  startJobSearchParser()
    .then(() => process.exit(0))
    .catch((error) => {
      console.error("Fatal error:", error.message);
      process.exit(1);
    });
}

module.exports = { startJobSearchParser };
9
job-search-parser/keywords/job-search-keywords.csv
Normal file
@ -0,0 +1,9 @@
keyword
qa automation
automation test
sdet
qa lead
automation lead
playwright
cypress
quality assurance engineer
28
job-search-parser/package.json
Normal file
@ -0,0 +1,28 @@
{
  "name": "job-search-parser",
  "version": "1.0.0",
  "description": "Job search parser for multiple job sites using ai-analyzer core",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "test": "jest",
    "demo": "node demo.js",
    "parse:skipthedrive": "node parsers/skipthedrive-demo.js"
  },
  "keywords": [
    "job",
    "search",
    "parser",
    "scraper",
    "ai"
  ],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "dependencies": {
    "ai-analyzer": "file:../ai-analyzer",
    "core-parser": "file:../core-parser",
    "dotenv": "^17.0.0",
    "csv-parser": "^3.0.0"
  }
}
129
job-search-parser/parsers/skipthedrive-demo.js
Normal file
@ -0,0 +1,129 @@
#!/usr/bin/env node

/**
 * SkipTheDrive Parser Demo
 *
 * Demonstrates the SkipTheDrive job parser functionality
 */

const { parseSkipTheDrive } = require("./skipthedrive");
const fs = require("fs");
const path = require("path");
const { logger } = require("../../ai-analyzer");

// Load environment variables
require("dotenv").config({ path: path.join(__dirname, "..", ".env") });

async function runDemo() {
  logger.step("🚀 SkipTheDrive Parser Demo");

  // Demo configuration
  const options = {
    // Search for QA automation jobs (from your example)
    keywords: process.env.SEARCH_KEYWORDS?.split(",").map((k) => k.trim()) || [
      "automation qa",
      "qa engineer",
      "test automation",
    ],

    // Job type filters - can be: "part time", "full time", "contract"
    jobTypes: process.env.JOB_TYPES?.split(",").map((t) => t.trim()) || [],

    // Location filter (optional)
    locationFilter: process.env.LOCATION_FILTER || "",

    // Maximum pages to parse
    maxPages: parseInt(process.env.MAX_PAGES) || 3,

    // Browser headless mode
    headless: process.env.HEADLESS !== "false",

    // AI analysis
    enableAI: process.env.ENABLE_AI_ANALYSIS !== "false",
    aiContext: "remote QA and test automation job opportunities",
  };

  logger.info("Configuration:");
  logger.info(`- Keywords: ${options.keywords.join(", ")}`);
  logger.info(
    `- Job Types: ${
      options.jobTypes.length > 0 ? options.jobTypes.join(", ") : "All types"
    }`
  );
  logger.info(`- Location Filter: ${options.locationFilter || "None"}`);
  logger.info(`- Max Pages: ${options.maxPages}`);
  logger.info(`- Headless: ${options.headless}`);
  logger.info(`- AI Analysis: ${options.enableAI}`);
  logger.info("\nStarting parser...");

  try {
    const startTime = Date.now();
    const results = await parseSkipTheDrive(options);
    const duration = ((Date.now() - startTime) / 1000).toFixed(2);

    // Save results
    const timestamp = new Date()
      .toISOString()
      .replace(/[:.]/g, "-")
      .slice(0, -5);
    const resultsDir = path.join(__dirname, "..", "results");

    if (!fs.existsSync(resultsDir)) {
      fs.mkdirSync(resultsDir, { recursive: true });
    }

    const resultsFile = path.join(
      resultsDir,
      `skipthedrive-results-${timestamp}.json`
    );
    fs.writeFileSync(resultsFile, JSON.stringify(results, null, 2));

    // Display summary
    logger.step("\n📊 Parsing Summary:");
    logger.info(`- Duration: ${duration} seconds`);
    logger.info(`- Jobs Found: ${results.results.length}`);
    logger.info(`- Jobs Rejected: ${results.rejectedResults.length}`);
    logger.file(`- Results saved to: ${resultsFile}`);

    // Show sample results
    if (results.results.length > 0) {
      logger.info("\n🔍 Sample Jobs Found:");
      results.results.slice(0, 5).forEach((job, index) => {
        logger.info(`\n${index + 1}. ${job.title}`);
        logger.info(`   Company: ${job.company}`);
        logger.info(`   Posted: ${job.daysAgo}`);
        logger.info(`   Featured: ${job.isFeatured ? "Yes" : "No"}`);
        logger.info(`   URL: ${job.jobUrl}`);
        if (job.aiAnalysis) {
          logger.ai(
            `   AI Relevant: ${job.aiAnalysis.isRelevant ? "Yes" : "No"} (${(
              job.aiAnalysis.confidence * 100
            ).toFixed(0)}% confidence)`
          );
        }
      });
    }

    // Show rejection reasons
    if (results.rejectedResults.length > 0) {
      const rejectionReasons = {};
      results.rejectedResults.forEach((job) => {
        rejectionReasons[job.reason] = (rejectionReasons[job.reason] || 0) + 1;
      });

      logger.info("\n❌ Rejection Reasons:");
      Object.entries(rejectionReasons).forEach(([reason, count]) => {
        logger.info(`  ${reason}: ${count}`);
      });
    }
  } catch (error) {
    logger.error("\n❌ Demo failed:", error.message);
    process.exit(1);
  }
}

// Run the demo
runDemo().catch((err) => {
  logger.error("Fatal error:", err);
  process.exit(1);
});
332
job-search-parser/parsers/skipthedrive.js
Normal file
@ -0,0 +1,332 @@
/**
 * SkipTheDrive Job Parser
 *
 * Parses remote job listings from SkipTheDrive.com
 * Supports keyword search, job type filters, and pagination
 */

const { chromium } = require("playwright");
const path = require("path");

// Import from ai-analyzer core package
const {
  logger,
  cleanText,
  containsAnyKeyword,
  parseLocationFilters,
  validateLocationAgainstFilters,
  extractLocationFromProfile,
  analyzeBatch,
  checkOllamaStatus,
} = require("../../ai-analyzer");

/**
 * Build search URL for SkipTheDrive
 * @param {string} keyword - Search keyword
 * @param {string} orderBy - Sort order (date, relevance)
 * @param {Array<string>} jobTypes - Job types to filter (part time, full time, contract)
 * @returns {string} - Formatted search URL
 */
function buildSearchUrl(keyword, orderBy = "date", jobTypes = []) {
  let url = `https://www.skipthedrive.com/?s=${encodeURIComponent(keyword)}`;

  if (orderBy) {
    url += `&orderby=${orderBy}`;
  }

  // Add job type filters
  jobTypes.forEach((type) => {
    url += `&jobtype=${encodeURIComponent(type)}`;
  });

  return url;
}
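// Example (illustrative, not part of the original source): keywords with
// spaces are percent-encoded, and each job type becomes its own parameter:
//   buildSearchUrl("qa automation", "date", ["full time"])
//   // => "https://www.skipthedrive.com/?s=qa%20automation&orderby=date&jobtype=full%20time"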
/**
 * Extract job data from a single job listing element
 * @param {Element} article - Job listing DOM element
 * @returns {Object} - Extracted job data
 */
async function extractJobData(article) {
  try {
    // Extract job title and URL
    const titleElement = await article.$("h2.post-title a");
    const title = titleElement ? await titleElement.textContent() : "";
    const jobUrl = titleElement ? await titleElement.getAttribute("href") : "";

    // Extract date
    const dateElement = await article.$("time.post-date");
    const datePosted = dateElement
      ? await dateElement.getAttribute("datetime")
      : "";
    const dateText = dateElement ? await dateElement.textContent() : "";

    // Extract company name
    const companyElement = await article.$(
      ".custom_fields_company_name_display_search_results"
    );
    let company = companyElement ? await companyElement.textContent() : "";
    company = company.replace(/^\s*[^\s]+\s*/, "").trim(); // Remove icon

    // Extract days ago
    const daysAgoElement = await article.$(
      ".custom_fields_job_date_display_search_results"
    );
    let daysAgo = daysAgoElement ? await daysAgoElement.textContent() : "";
    daysAgo = daysAgo.replace(/^\s*[^\s]+\s*/, "").trim(); // Remove icon

    // Extract job description excerpt
    const excerptElement = await article.$(".excerpt_part");
    const description = excerptElement
      ? await excerptElement.textContent()
      : "";

    // Check if featured/sponsored
    const featuredElement = await article.$(".custom_fields_sponsored_job");
    const isFeatured = !!featuredElement;

    // Extract job ID from article ID
    const articleId = await article.getAttribute("id");
    const jobId = articleId ? articleId.replace("post-", "") : "";

    return {
      jobId,
      title: cleanText(title),
      company: cleanText(company),
      jobUrl,
      datePosted,
      dateText: cleanText(dateText),
      daysAgo: cleanText(daysAgo),
      description: cleanText(description),
      isFeatured,
      source: "skipthedrive",
      timestamp: new Date().toISOString(),
    };
  } catch (error) {
    logger.error(`Error extracting job data: ${error.message}`);
    return null;
  }
}

/**
 * Parse SkipTheDrive job listings
 * @param {Object} options - Parser options
 * @returns {Promise<Array>} - Array of parsed job listings
 */
async function parseSkipTheDrive(options = {}) {
  const {
    keywords = process.env.SEARCH_KEYWORDS?.split(",").map((k) =>
      k.trim()
    ) || ["software engineer", "developer"],
    jobTypes = process.env.JOB_TYPES?.split(",").map((t) => t.trim()) || [],
    locationFilter = process.env.LOCATION_FILTER || "",
    maxPages = parseInt(process.env.MAX_PAGES) || 5,
    headless = process.env.HEADLESS !== "false",
    enableAI = process.env.ENABLE_AI_ANALYSIS === "true",
    aiContext = process.env.AI_CONTEXT || "remote job opportunities analysis",
  } = options;

  logger.step("Starting SkipTheDrive parser...");
  logger.info(`🔍 Keywords: ${keywords.join(", ")}`);
  logger.info(
    `📋 Job Types: ${jobTypes.length > 0 ? jobTypes.join(", ") : "All"}`
  );
  logger.info(`📍 Location Filter: ${locationFilter || "None"}`);
  logger.info(`📄 Max Pages: ${maxPages}`);

  const browser = await chromium.launch({
    headless,
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--disable-dev-shm-usage",
    ],
  });

  const context = await browser.newContext({
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
  });

  const results = [];
  const rejectedResults = [];
  const seenJobs = new Set();

  try {
    // Search for each keyword
    for (const keyword of keywords) {
      logger.info(`\n🔍 Searching for: ${keyword}`);

      const searchUrl = buildSearchUrl(keyword, "date", jobTypes);
      const page = await context.newPage();

      try {
        logger.info(
          `Attempting navigation to: ${searchUrl} at ${new Date().toISOString()}`
        );
        await page.goto(searchUrl, {
          waitUntil: "domcontentloaded",
          timeout: 30000,
        });
        logger.info(
          `Navigation completed successfully at ${new Date().toISOString()}`
        );

        // Wait for job listings to load
        logger.info("Waiting for selector #loops-wrapper");
        await page
          .waitForSelector("#loops-wrapper", { timeout: 5000 })
          .catch(() => {
            logger.warning(`No results found for keyword: ${keyword}`);
          });
        logger.info("Selector wait completed");

        let currentPage = 1;
        let hasNextPage = true;

        while (hasNextPage && currentPage <= maxPages) {
          logger.info(`📄 Processing page ${currentPage} for "${keyword}"`);

          // Extract all job articles on current page
          const jobArticles = await page.$$("article[id^='post-']");
          logger.info(
            `Found ${jobArticles.length} job listings on page ${currentPage}`
          );

          for (const article of jobArticles) {
            const jobData = await extractJobData(article);

            if (!jobData || seenJobs.has(jobData.jobId)) {
              continue;
            }

            seenJobs.add(jobData.jobId);

            // Add keyword that found this job
            jobData.searchKeyword = keyword;

            // Validate job against keywords
            const fullText = `${jobData.title} ${jobData.description} ${jobData.company}`;
            if (!containsAnyKeyword(fullText, keywords)) {
              rejectedResults.push({
                ...jobData,
                rejected: true,
                reason: "Keywords not found in job listing",
              });
              continue;
            }

            // Location validation (if enabled)
            if (locationFilter) {
              const locationFilters = parseLocationFilters(locationFilter);
              // For SkipTheDrive, most jobs are remote, but we can check the title/description
              const locationValid =
                fullText.toLowerCase().includes("remote") ||
                locationFilters.some((filter) =>
                  fullText.toLowerCase().includes(filter.toLowerCase())
                );

              if (!locationValid) {
                rejectedResults.push({
                  ...jobData,
                  rejected: true,
                  reason: "Location requirements not met",
                });
                continue;
              }

              jobData.locationValid = locationValid;
            }

            logger.success(`✅ Found: ${jobData.title} at ${jobData.company}`);
            results.push(jobData);
          }

          // Check for next page
          const nextPageLink = await page.$("a.nextp");
          if (nextPageLink && currentPage < maxPages) {
            logger.info("📄 Moving to next page...");
            await nextPageLink.click();
            await page.waitForLoadState("domcontentloaded");
            await page.waitForTimeout(2000); // Wait for content to load
            currentPage++;
          } else {
            hasNextPage = false;
          }
        }
      } catch (error) {
        logger.error(`Error processing keyword "${keyword}": ${error.message}`);
      } finally {
        await page.close();
      }
    }

    logger.success(`\n✅ Parsing complete!`);
    logger.info(`📊 Total jobs found: ${results.length}`);
    logger.info(`❌ Rejected jobs: ${rejectedResults.length}`);

    // Run AI analysis if enabled
    let aiAnalysis = null;
    if (enableAI && results.length > 0) {
      logger.step("Running AI analysis on job listings...");

      const aiAvailable = await checkOllamaStatus();
      if (aiAvailable) {
        const analysisData = results.map((job) => ({
          text: `${job.title} at ${job.company}. ${job.description}`,
          metadata: {
            jobId: job.jobId,
            company: job.company,
            daysAgo: job.daysAgo,
          },
        }));

        aiAnalysis = await analyzeBatch(analysisData, aiContext);

        // Merge AI analysis with results
        results.forEach((job, index) => {
          if (aiAnalysis && aiAnalysis[index]) {
            job.aiAnalysis = {
              isRelevant: aiAnalysis[index].isRelevant,
              confidence: aiAnalysis[index].confidence,
              reasoning: aiAnalysis[index].reasoning,
            };
          }
        });

        logger.success("✅ AI analysis completed");
      } else {
        logger.warning("⚠️ AI not available - skipping analysis");
      }
    }

    return {
      results,
      rejectedResults,
      metadata: {
        source: "skipthedrive",
        totalJobs: results.length,
        rejectedJobs: rejectedResults.length,
        keywords,
        jobTypes,
        locationFilter,
        aiAnalysisEnabled: enableAI,
        aiAnalysisCompleted: !!aiAnalysis,
        timestamp: new Date().toISOString(),
      },
    };
  } catch (error) {
    logger.error(`Fatal error in SkipTheDrive parser: ${error.message}`);
    throw error;
  } finally {
    await browser.close();
  }
}

// Export the parser
module.exports = {
  parseSkipTheDrive,
  buildSearchUrl,
  extractJobData,
};
302
job-search-parser/strategies/skipthedrive-strategy.js
Normal file
@ -0,0 +1,302 @@
/**
 * SkipTheDrive Parsing Strategy
 *
 * Uses core-parser for browser management and ai-analyzer for utilities
 */

const {
  logger,
  cleanText,
  containsAnyKeyword,
  validateLocationAgainstFilters,
} = require("ai-analyzer");

/**
 * SkipTheDrive URL builder
 */
function buildSearchUrl(keyword, orderBy = "date", jobTypes = []) {
  const baseUrl = "https://www.skipthedrive.com/";
  const params = new URLSearchParams({
    s: keyword,
    orderby: orderBy,
  });

  if (jobTypes && jobTypes.length > 0) {
    params.append("job_type", jobTypes.join(","));
  }

  return `${baseUrl}?${params.toString()}`;
}
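// Example (illustrative, not part of the original source): unlike the
// encodeURIComponent-based builder in parsers/skipthedrive.js,
// URLSearchParams encodes spaces as "+":
//   buildSearchUrl("qa automation")
//   // => "https://www.skipthedrive.com/?s=qa+automation&orderby=date"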
/**
 * SkipTheDrive parsing strategy function
 */
async function skipthedriveStrategy(coreParser, options = {}) {
  const {
    keywords = ["software engineer", "developer", "programmer"],
    locationFilter = null,
    maxPages = 5,
    jobTypes = [],
  } = options;

  const results = [];
  const rejectedResults = [];
  const seenJobs = new Set();

  try {
    // Create main page
    const page = await coreParser.createPage("skipthedrive-main");

    logger.info("🚀 Starting SkipTheDrive parser...");
    logger.info(`🔍 Keywords: ${keywords.join(", ")}`);
    logger.info(`📍 Location Filter: ${locationFilter || "None"}`);
    logger.info(`📄 Max Pages: ${maxPages}`);

    // Search for each keyword
    for (const keyword of keywords) {
      logger.info(`\n🔍 Searching for: ${keyword}`);

      const searchUrl = buildSearchUrl(keyword, "date", jobTypes);

      try {
        // Navigate to search results
        await coreParser.navigateTo(searchUrl, {
          pageId: "skipthedrive-main",
          retries: 2,
          timeout: 30000,
        });

        // Wait for job listings to load
        const hasResults = await coreParser
          .waitForSelector(
            "#loops-wrapper",
            {
              timeout: 5000,
            },
            "skipthedrive-main"
          )
          .catch(() => {
            logger.warning(`No results found for keyword: ${keyword}`);
            return false;
          });

        if (!hasResults) {
          continue;
        }

        // Process multiple pages
        let currentPage = 1;
        let hasNextPage = true;

        while (hasNextPage && currentPage <= maxPages) {
          logger.info(`📄 Processing page ${currentPage} for "${keyword}"`);

          // Extract jobs from current page
          const pageJobs = await extractJobsFromPage(
            page,
            keyword,
            locationFilter
          );

          for (const job of pageJobs) {
            // Skip duplicates
            if (seenJobs.has(job.jobId)) continue;
            seenJobs.add(job.jobId);

            // Validate location if filtering enabled
            if (locationFilter) {
              const locationValid = validateLocationAgainstFilters(
                job.location,
                locationFilter
              );

              if (!locationValid) {
                rejectedResults.push({
                  ...job,
                  rejectionReason: "Location filter mismatch",
                });
                continue;
              }
            }

            results.push(job);
          }

          // Check for next page
          hasNextPage = await hasNextPageAvailable(page);
          if (hasNextPage && currentPage < maxPages) {
            await navigateToNextPage(page, currentPage + 1);
            currentPage++;

            // Wait for new page to load
            await page.waitForTimeout(2000);
          } else {
            hasNextPage = false;
          }
        }
      } catch (error) {
        logger.error(`Error processing keyword "${keyword}": ${error.message}`);
      }
    }

    logger.info(
      `🎯 SkipTheDrive parsing completed: ${results.length} jobs found, ${rejectedResults.length} rejected`
    );

    return {
      results,
      rejectedResults,
      summary: {
        totalJobs: results.length,
        totalRejected: rejectedResults.length,
        keywords: keywords.join(", "),
        locationFilter,
        source: "skipthedrive",
      },
    };
  } catch (error) {
    logger.error(`❌ SkipTheDrive parsing failed: ${error.message}`);
    throw error;
  }
}

/**
 * Extract jobs from current page
 */
async function extractJobsFromPage(page, keyword, locationFilter) {
  const jobs = [];

  try {
    // Get all job article elements
    const jobElements = await page.$$("article.job_listing");

    for (const jobElement of jobElements) {
      try {
        const job = await extractJobData(jobElement, keyword);
        if (job) {
          jobs.push(job);
        }
      } catch (error) {
        logger.warning(`Failed to extract job data: ${error.message}`);
      }
    }
  } catch (error) {
    logger.error(`Failed to extract jobs from page: ${error.message}`);
  }

  return jobs;
}

/**
 * Extract data from individual job element
 */
async function extractJobData(jobElement, keyword) {
  try {
    // Extract job ID
    const articleId = (await jobElement.getAttribute("id")) || "";
    const jobId = articleId ? articleId.replace("post-", "") : "";

    // Extract title
    const titleElement = await jobElement.$(".job_listing-title a");
    const title = titleElement
      ? cleanText(await titleElement.textContent())
      : "";
    const jobUrl = titleElement ? await titleElement.getAttribute("href") : "";

    // Extract company
    const companyElement = await jobElement.$(".company");
    const company = companyElement
      ? cleanText(await companyElement.textContent())
      : "";

    // Extract location
    const locationElement = await jobElement.$(".location");
    const location = locationElement
      ? cleanText(await locationElement.textContent())
      : "";

    // Extract date posted
    const dateElement = await jobElement.$(".job-date");
    const dateText = dateElement
      ? cleanText(await dateElement.textContent())
      : "";

    // Extract description
    const descElement = await jobElement.$(".job_listing-description");
    const description = descElement
      ? cleanText(await descElement.textContent())
      : "";

    // Check if featured
    const featuredElement = await jobElement.$(".featured");
    const isFeatured = featuredElement !== null;

    // Parse date
    let datePosted = null;
    let daysAgo = null;

    if (dateText) {
      const match = dateText.match(/(\d+)\s+days?\s+ago/);
      if (match) {
        daysAgo = parseInt(match[1]);
        const date = new Date();
        date.setDate(date.getDate() - daysAgo);
        datePosted = date.toISOString().split("T")[0];
      }
    }

    return {
      jobId,
      title,
      company,
      location,
      jobUrl,
      datePosted,
      dateText,
      daysAgo,
      description,
      isFeatured,
      keyword,
      extractedAt: new Date().toISOString(),
      source: "skipthedrive",
    };
  } catch (error) {
    logger.warning(`Error extracting job data: ${error.message}`);
    return null;
  }
}

/**
 * Check if next page is available
 */
async function hasNextPageAvailable(page) {
  try {
    const nextButton = await page.$(".next-page");
    return nextButton !== null;
  } catch {
    return false;
  }
}

/**
 * Navigate to next page
 */
async function navigateToNextPage(page, pageNumber) {
  try {
    const nextButton = await page.$(".next-page");
    if (nextButton) {
      await nextButton.click();
    }
  } catch (error) {
    logger.warning(
      `Failed to navigate to page ${pageNumber}: ${error.message}`
    );
  }
}

module.exports = {
  skipthedriveStrategy,
  buildSearchUrl,
  extractJobsFromPage,
  extractJobData,
};
315
linkedin-parser/README.md
Normal file
@ -0,0 +1,315 @@
# LinkedIn Parser

LinkedIn posts parser with **integrated AI analysis** using the ai-analyzer core package. AI analysis is now embedded directly into the results JSON file.

## 🚀 Quick Start

```bash
# Install dependencies
npm install

# Run with default settings (AI analysis integrated into results)
npm start

# Run without AI analysis
npm run start:no-ai
```

## 📋 Available Scripts

### Parser Modes

```bash
# Basic parsing with integrated AI analysis
npm start

# Parsing without AI analysis
npm run start:no-ai

# Headless browser mode
npm run start:headless

# Visible browser mode (for debugging)
npm run start:visible

# Disable location filtering
npm run start:no-location

# Custom keywords
npm run start:custom
```

### Testing

```bash
# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```

### AI Analysis (CLI)

```bash
# Analyze latest results file with default context
npm run analyze:latest

# Analyze latest results file for layoffs
npm run analyze:layoff

# Analyze latest results file for job market trends
npm run analyze:trends

# Analyze specific file (requires --input parameter)
npm run analyze -- --input=results.json
```

### Utilities

```bash
# Show help
npm run help

# Run demo
npm run demo

# Install Playwright browser
npm run install:playwright
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the `linkedin-parser` directory:

```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5

# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true

# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral

# Browser Configuration
HEADLESS=true
```

### Command Line Options

```bash
# Browser options
--headless=true|false    # Browser headless mode
--keyword="kw1,kw2"      # Specific keywords
--add-keyword="kw"       # Additional keywords
--no-location            # Disable location filtering
--no-ai                  # Disable AI analysis
```

## 📊 Output Files

The parser generates two main files:

1. **`linkedin-results-YYYY-MM-DD-HH-MM.json`** - Main results with **integrated AI analysis**
2. **`linkedin-rejected-YYYY-MM-DD-HH-MM.json`** - Rejected posts with reasons

### Results Structure

Each result in the JSON file now includes AI analysis:

```json
{
  "metadata": {
    "timestamp": "2025-07-21T02:00:08.561Z",
    "totalPosts": 10,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/user",
      "location": "Toronto, Ontario",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and hiring",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2025-07-21T02:48:42.487Z"
      }
    }
  ]
}
```

## 🧠 AI Analysis Workflow

### Automatic Integration

AI analysis runs automatically after parsing completes and is **embedded directly into the results JSON** (unless disabled with `--no-ai`).

### Manual Re-analysis

You can re-analyze existing results with different contexts using the CLI:

```bash
# Analyze latest results with default context
npm run analyze:latest

# Analyze latest results for layoffs
npm run analyze:layoff

# Analyze latest results for job market trends
npm run analyze:trends

# Analyze specific file with custom context
node ../ai-analyzer/cli.js --input=results.json --context="custom analysis"
```

### CLI Options

The AI analyzer CLI supports:

```bash
--input=FILE             # Input JSON file
--output=FILE            # Output file (default: original-ai.json)
--context="description"  # Analysis context
--model=MODEL            # Ollama model (default: mistral)
--latest                 # Use latest results file
--dir=PATH               # Directory to look for results
```
|
||||
## 🎯 Use Cases
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```bash
|
||||
# Run parser with integrated AI analysis
|
||||
npm start
|
||||
```
|
||||
|
||||
### Testing Different Keywords
|
||||
|
||||
```bash
|
||||
# Test with custom keywords
|
||||
npm run start:custom
|
||||
```
|
||||
|
||||
### Debugging
|
||||
|
||||
```bash
|
||||
# Run with visible browser
|
||||
npm run start:visible
|
||||
|
||||
# Run without location filtering
|
||||
npm run start:no-location
|
||||
```
|
||||
|
||||
### Re-analyzing Data
|
||||
|
||||
```bash
|
||||
# After running parser, re-analyze with different contexts
|
||||
npm run analyze:layoff
|
||||
npm run analyze:trends
|
||||
|
||||
# Analyze specific file
|
||||
node ../ai-analyzer/cli.js --input=results/linkedin-results-2025-07-20-18-00.json
|
||||
```

## 🔍 Troubleshooting

### Common Issues

1. **Missing credentials**

   ```bash
   # Check that the .env file exists and contains credentials
   cat .env
   ```

2. **Browser issues**

   ```bash
   # Install the Playwright browser
   npm run install:playwright
   ```

3. **AI not available**

   ```bash
   # Make sure Ollama is running
   ollama list

   # Install the mistral model if needed
   ollama pull mistral
   ```

4. **No results found**

   ```bash
   # Try different keywords
   npm run start:custom
   ```

5. **CLI can't find results**

   ```bash
   # Make sure you're in the linkedin-parser directory
   cd linkedin-parser
   npm run analyze:latest
   ```

## 📁 Project Structure

```
linkedin-parser/
├── index.js          # Main parser with integrated AI analysis
├── package.json      # Dependencies and scripts
├── .env              # Configuration (create this)
├── keywords/         # Keyword CSV files
└── results/          # Output files (created automatically)
    ├── linkedin-results-*.json  # Results with integrated AI analysis
    └── linkedin-rejected-*.json # Rejected posts
```

## 🤝 Integration

This parser integrates with:

- **ai-analyzer**: Core AI utilities and CLI analysis tool
- **job-search-parser**: Job market intelligence (separate module)

### AI Analysis Package

The `ai-analyzer` package provides:

- **Library functions**: `analyzeBatch`, `checkOllamaStatus`, etc.
- **CLI tool**: `cli.js` for standalone analysis
- **Reusable components**: For other parsers in the ecosystem

## 🆕 What's New

- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved
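
The embedding works by spreading each post and attaching an `aiAnalysis` object, as `index.js` does. A minimal sketch with a mocked `aiResult` (in the real run it comes from `analyzeBatch`):

```javascript
// Sketch of embedding AI results into a post (mirrors index.js).
// aiResult is mocked here; the parser gets it from analyzeBatch().
const post = { text: "Just got laid off from my role...", keyword: "layoff" };
const aiResult = { isRelevant: true, confidence: 0.9, reasoning: "Mentions a layoff" };

const postWithAI = {
  ...post, // original fields preserved (backward compatible)
  aiAnalysis: {
    isRelevant: aiResult.isRelevant,
    confidence: aiResult.confidence,
    reasoning: aiResult.reasoning,
    context: "job market analysis and trends",
    model: "mistral",
    analyzedAt: new Date().toISOString(),
  },
};

console.log(postWithAI.aiAnalysis.isRelevant); // true
```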
412
linkedin-parser/demo.js
Normal file
@ -0,0 +1,412 @@
/**
 * LinkedIn Parser Demo
 *
 * Demonstrates the LinkedIn Parser's capabilities for scraping LinkedIn content
 * with keyword-based searching, location filtering, and AI analysis.
 *
 * This demo uses simulated data for safety and demonstration purposes.
 */

const { logger } = require("../ai-analyzer");
const fs = require("fs");
const path = require("path");

// Terminal colors for demo output
const colors = {
  reset: "\x1b[0m",
  bright: "\x1b[1m",
  cyan: "\x1b[36m",
  green: "\x1b[32m",
  yellow: "\x1b[33m",
  blue: "\x1b[34m",
  magenta: "\x1b[35m",
  red: "\x1b[31m",
};

const demo = {
  title: (text) =>
    console.log(`\n${colors.bright}${colors.cyan}${text}${colors.reset}`),
  section: (text) =>
    console.log(`\n${colors.bright}${colors.magenta}${text}${colors.reset}`),
  success: (text) => console.log(`${colors.green}✅ ${text}${colors.reset}`),
  info: (text) => console.log(`${colors.blue}ℹ️ ${text}${colors.reset}`),
  warning: (text) => console.log(`${colors.yellow}⚠️ ${text}${colors.reset}`),
  error: (text) => console.log(`${colors.red}❌ ${text}${colors.reset}`),
  code: (text) => console.log(`${colors.cyan}${text}${colors.reset}`),
};

// Mock data for demonstration
const mockPosts = [
  {
    id: "post_1",
    content:
      "Just got laid off from my software engineering role at TechCorp. Looking for new opportunities in Toronto. This is really tough but I'm staying positive!",
    original_content:
      "Just got #laidoff from my software engineering role at TechCorp! Looking for new opportunities in #Toronto. This is really tough but I'm staying positive! 🚀",
    author: {
      name: "John Doe",
      title: "Software Engineer",
      company: "TechCorp",
      location: "Toronto, Ontario, Canada",
      profile_url: "https://linkedin.com/in/johndoe",
    },
    engagement: { likes: 45, comments: 12, shares: 3 },
    metadata: {
      post_date: "2024-01-10T14:30:00Z",
      scraped_at: "2024-01-15T10:30:00Z",
      search_keyword: "layoff",
      location_validated: true,
    },
  },
  {
    id: "post_2",
    content:
      "Our company is downsizing and I'm affected. This is really tough news but I'm grateful for the time I had here.",
    original_content:
      "Our company is #downsizing and I'm affected. This is really tough news but I'm grateful for the time I had here. #RIF #layoff",
    author: {
      name: "Jane Smith",
      title: "Product Manager",
      company: "StartupXYZ",
      location: "Vancouver, British Columbia, Canada",
      profile_url: "https://linkedin.com/in/janesmith",
    },
    engagement: { likes: 23, comments: 8, shares: 1 },
    metadata: {
      post_date: "2024-01-09T16:45:00Z",
      scraped_at: "2024-01-15T10:30:00Z",
      search_keyword: "downsizing",
      location_validated: true,
    },
  },
  {
    id: "post_3",
    content:
      "Open to work! Looking for new opportunities in software development. I have 5 years of experience in React, Node.js, and cloud technologies.",
    original_content:
      "Open to work! Looking for new opportunities in software development. I have 5 years of experience in #React, #NodeJS, and #cloud technologies. #opentowork #jobsearch",
    author: {
      name: "Bob Wilson",
      title: "Full Stack Developer",
      company: "Freelance",
      location: "Calgary, Alberta, Canada",
      profile_url: "https://linkedin.com/in/bobwilson",
    },
    engagement: { likes: 67, comments: 15, shares: 8 },
    metadata: {
      post_date: "2024-01-08T11:20:00Z",
      scraped_at: "2024-01-15T10:30:00Z",
      search_keyword: "open to work",
      location_validated: true,
    },
  },
];

async function runDemo() {
  demo.title("=== LinkedIn Parser Demo ===");
  demo.info(
    "This demo showcases the LinkedIn Parser's capabilities for scraping LinkedIn content."
  );
  demo.info("All data shown is simulated for demonstration purposes.");
  demo.info("Press Enter to continue through each section...\n");

  await waitForEnter();

  // 1. Configuration Demo
  await demonstrateConfiguration();

  // 2. Keyword Loading Demo
  await demonstrateKeywordLoading();

  // 3. Search Process Demo
  await demonstrateSearchProcess();

  // 4. Location Filtering Demo
  await demonstrateLocationFiltering();

  // 5. AI Analysis Demo
  await demonstrateAIAnalysis();

  // 6. Output Generation Demo
  await demonstrateOutputGeneration();

  demo.title("=== Demo Complete ===");
  demo.success("LinkedIn Parser demo completed successfully!");
  demo.info("Check the README.md for detailed usage instructions.");
}

async function demonstrateConfiguration() {
  demo.section("1. Configuration Setup");
  demo.info(
    "The LinkedIn Parser uses environment variables and command-line options for configuration."
  );

  demo.code("// Environment Variables (.env file)");
  demo.info("LINKEDIN_USERNAME=your_email@example.com");
  demo.info("LINKEDIN_PASSWORD=your_password");
  demo.info("CITY=Toronto");
  demo.info("DATE_POSTED=past-week");
  demo.info("SORT_BY=date_posted");
  demo.info("WHEELS=5");
  demo.info("LOCATION_FILTER=Ontario,Manitoba");
  demo.info("ENABLE_LOCATION_CHECK=true");
  demo.info("ENABLE_LOCAL_AI=true");
  demo.info('AI_CONTEXT="job layoffs and workforce reduction"');
  demo.info("OLLAMA_MODEL=mistral");

  demo.code("// Command Line Options");
  demo.info('node index.js --keyword="layoff,downsizing" --city="Vancouver"');
  demo.info("node index.js --no-location --no-ai");
  demo.info("node index.js --output=results/my-results.json");
  demo.info("node index.js --ai-after");

  await waitForEnter();
}

async function demonstrateKeywordLoading() {
  demo.section("2. Keyword Loading");
  demo.info(
    "Keywords can be loaded from CSV files or specified via command line."
  );

  // Simulate loading keywords from CSV
  demo.code("// Loading keywords from CSV file");
  logger.step("Loading keywords from keywords/linkedin-keywords.csv");

  const keywords = [
    "layoff",
    "downsizing",
    "reduction in force",
    "RIF",
    "termination",
    "job loss",
    "workforce reduction",
    "open to work",
    "actively seeking",
    "job search",
  ];

  demo.success(`Loaded ${keywords.length} keywords from CSV file`);
  demo.info("Keywords: " + keywords.slice(0, 5).join(", ") + "...");

  demo.code("// Command line keyword override");
  demo.info('node index.js --keyword="layoff,downsizing"');
  demo.info('node index.js --add-keyword="hiring freeze"');

  await waitForEnter();
}

async function demonstrateSearchProcess() {
  demo.section("3. Search Process Simulation");
  demo.info(
    "The parser performs automated LinkedIn searches for each keyword."
  );

  const keywords = ["layoff", "downsizing", "open to work"];

  for (const keyword of keywords) {
    demo.code(`// Searching for keyword: "${keyword}"`);
    logger.search(`Searching for "${keyword}" in Toronto`);

    // Simulate search process
    await simulateSearch();

    const foundCount = Math.floor(Math.random() * 50) + 10;
    const acceptedCount = Math.floor(foundCount * 0.3);

    logger.info(`Found ${foundCount} posts, checking profiles for location...`);
    logger.success(`Accepted ${acceptedCount} posts after location validation`);

    console.log();
  }

  await waitForEnter();
}

async function demonstrateLocationFiltering() {
  demo.section("4. Location Filtering");
  demo.info(
    "Posts are filtered based on author location using geographic validation."
  );

  demo.code("// Location filter configuration");
  demo.info("LOCATION_FILTER=Ontario,Manitoba");
  demo.info("ENABLE_LOCATION_CHECK=true");

  demo.code("// Location validation examples");
  const testLocations = [
    { location: "Toronto, Ontario, Canada", valid: true },
    { location: "Vancouver, British Columbia, Canada", valid: false },
    { location: "Calgary, Alberta, Canada", valid: false },
    { location: "Winnipeg, Manitoba, Canada", valid: true },
    { location: "New York, NY, USA", valid: false },
  ];

  testLocations.forEach(({ location, valid }) => {
    logger.location(`Checking location: ${location}`);
    if (valid) {
      logger.success(`✅ Location valid - post accepted`);
    } else {
      logger.warning(`❌ Location invalid - post rejected`);
    }
  });

  await waitForEnter();
}

async function demonstrateAIAnalysis() {
  demo.section("5. AI Analysis");
  demo.info(
    "Posts can be analyzed using local Ollama or OpenAI for relevance scoring."
  );

  demo.code("// AI analysis configuration");
  demo.info("ENABLE_LOCAL_AI=true");
  demo.info('AI_CONTEXT="job layoffs and workforce reduction"');
  demo.info("OLLAMA_MODEL=mistral");

  demo.code("// Analyzing posts with AI");
  logger.ai("Starting AI analysis of accepted posts...");

  for (let i = 0; i < mockPosts.length; i++) {
    const post = mockPosts[i];
    logger.info(`Analyzing post ${i + 1}: ${post.content.substring(0, 50)}...`);

    // Simulate AI analysis
    await simulateProcessing();

    const relevanceScore = 0.7 + Math.random() * 0.3;
    const confidence = 0.8 + Math.random() * 0.2;

    logger.success(
      `Relevance: ${relevanceScore.toFixed(2)}, Confidence: ${confidence.toFixed(2)}`
    );

    // Add AI analysis to post
    post.ai_analysis = {
      relevance_score: relevanceScore,
      confidence: confidence,
      context_match: relevanceScore > 0.7,
      analysis_text: `This post discusses ${post.metadata.search_keyword} and is relevant to the search context.`,
    };
  }

  await waitForEnter();
}

async function demonstrateOutputGeneration() {
  demo.section("6. Output Generation");
  demo.info("Results are saved to JSON files with comprehensive metadata.");

  demo.code("// Generating output file");
  logger.file("Saving results to JSON file...");

  const outputData = {
    metadata: {
      timestamp: new Date().toISOString(),
      keywords: ["layoff", "downsizing", "open to work"],
      city: "Toronto",
      date_posted: "past-week",
      sort_by: "date_posted",
      total_posts_found: 150,
      accepted_posts: mockPosts.length,
      rejected_posts: 147,
      processing_time_seconds: 180,
    },
    posts: mockPosts,
  };

  // Save to demo file
  const outputPath = path.join(__dirname, "demo-results.json");
  fs.writeFileSync(outputPath, JSON.stringify(outputData, null, 2));

  demo.success(`Results saved to: ${outputPath}`);
  demo.info(`Total posts processed: ${outputData.metadata.total_posts_found}`);
  demo.info(`Posts accepted: ${outputData.metadata.accepted_posts}`);
  demo.info(`Posts rejected: ${outputData.metadata.rejected_posts}`);

  demo.code("// Output file structure");
  demo.info("📁 demo-results.json");
  demo.info("  ├── metadata");
  demo.info("  │   ├── timestamp");
  demo.info("  │   ├── keywords");
  demo.info("  │   ├── city");
  demo.info("  │   ├── total_posts_found");
  demo.info("  │   ├── accepted_posts");
  demo.info("  │   └── processing_time_seconds");
  demo.info("  └── posts[]");
  demo.info("      ├── id");
  demo.info("      ├── content");
  demo.info("      ├── author");
  demo.info("      ├── engagement");
  demo.info("      ├── ai_analysis");
  demo.info("      └── metadata");

  await waitForEnter();
}

// Helper functions
function waitForEnter() {
  return new Promise((resolve) => {
    const readline = require("readline");
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });

    rl.question("\nPress Enter to continue...", () => {
      rl.close();
      resolve();
    });
  });
}

async function simulateSearch() {
  return new Promise((resolve) => {
    const steps = [
      "Launching browser",
      "Logging in",
      "Navigating to search",
      "Loading results",
    ];
    let i = 0;
    const interval = setInterval(() => {
      if (i < steps.length) {
        logger.info(steps[i]);
        i++;
      } else {
        clearInterval(interval);
        resolve();
      }
    }, 800);
  });
}

async function simulateProcessing() {
  return new Promise((resolve) => {
    const dots = [".", "..", "..."];
    let i = 0;
    const interval = setInterval(() => {
      process.stdout.write(`\rProcessing${dots[i]}`);
      i = (i + 1) % dots.length;
    }, 500);

    setTimeout(() => {
      clearInterval(interval);
      process.stdout.write("\r");
      resolve();
    }, 1500);
  });
}

// Run the demo if this file is executed directly
if (require.main === module) {
  runDemo().catch((error) => {
    demo.error(`Demo failed: ${error.message}`);
    process.exit(1);
  });
}

module.exports = { runDemo };
216
linkedin-parser/index.js
Normal file
@ -0,0 +1,216 @@
#!/usr/bin/env node

/**
 * LinkedIn Parser - Refactored
 *
 * Uses core-parser for browser management and linkedin-strategy for parsing logic
 */

const path = require("path");
const fs = require("fs");
const CoreParser = require("../core-parser");
const { linkedinStrategy } = require("./strategies/linkedin-strategy");
const { logger, analyzeBatch, checkOllamaStatus, DEFAULT_MODEL } = require("ai-analyzer");

// Load environment variables - check both linkedin-parser/.env and root .env
const localEnvPath = path.join(__dirname, ".env");
const rootEnvPath = path.join(__dirname, "..", ".env");

// Try local .env first, then root .env
if (fs.existsSync(localEnvPath)) {
  require("dotenv").config({ path: localEnvPath });
} else if (fs.existsSync(rootEnvPath)) {
  require("dotenv").config({ path: rootEnvPath });
} else {
  // Fall back to default dotenv behavior (looks for .env in the current working directory)
  require("dotenv").config();
}

// Configuration from environment
const LINKEDIN_USERNAME = process.env.LINKEDIN_USERNAME;
const LINKEDIN_PASSWORD = process.env.LINKEDIN_PASSWORD;
const HEADLESS = process.env.HEADLESS !== "false";
const SEARCH_KEYWORDS =
  process.env.SEARCH_KEYWORDS || "layoff,downsizing,job cuts";
const LOCATION_FILTER = process.env.LOCATION_FILTER;
const ENABLE_AI_ANALYSIS = process.env.ENABLE_AI_ANALYSIS !== "false";
const AI_CONTEXT = process.env.AI_CONTEXT || "job market analysis and trends";
const OLLAMA_MODEL = process.env.OLLAMA_MODEL || DEFAULT_MODEL;
const MAX_RESULTS = parseInt(process.env.MAX_RESULTS, 10) || 50;

/**
 * Main LinkedIn parser function
 */
async function startLinkedInParser(options = {}) {
  const coreParser = new CoreParser({
    headless: HEADLESS,
    timeout: 30000,
  });

  try {
    logger.step("🚀 LinkedIn Parser Starting...");

    // Validate credentials
    if (!LINKEDIN_USERNAME || !LINKEDIN_PASSWORD) {
      throw new Error(
        "LinkedIn credentials not found. Please set LINKEDIN_USERNAME and LINKEDIN_PASSWORD in the .env file"
      );
    }

    // Parse keywords
    const keywords = SEARCH_KEYWORDS.split(",").map((k) => k.trim());
    logger.info(`🔍 Search Keywords: ${keywords.join(", ")}`);
    logger.info(`📍 Location Filter: ${LOCATION_FILTER || "None"}`);
    logger.info(
      `🧠 AI Analysis: ${ENABLE_AI_ANALYSIS ? "Enabled" : "Disabled"}`
    );
    logger.info(`📊 Max Results: ${MAX_RESULTS}`);

    // Run LinkedIn parsing strategy
    const parseResult = await linkedinStrategy(coreParser, {
      keywords,
      locationFilter: LOCATION_FILTER,
      maxResults: MAX_RESULTS,
      credentials: {
        username: LINKEDIN_USERNAME,
        password: LINKEDIN_PASSWORD,
      },
    });

    const { results, rejectedResults, summary } = parseResult;

    // AI Analysis if enabled - embed results into each post
    let resultsWithAI = results;
    let aiAnalysisCompleted = false;
    if (ENABLE_AI_ANALYSIS && results.length > 0) {
      logger.step("🧠 Running AI Analysis...");

      const ollamaAvailable = await checkOllamaStatus(OLLAMA_MODEL);
      if (ollamaAvailable) {
        // Prepare data for analysis (analyzeBatch expects posts with a 'text' field)
        const analysisData = results.map((post) => ({
          text: post.text || post.content || "",
          location: post.location || "",
          keyword: post.keyword || "",
          timestamp: post.timestamp || post.extractedAt || "",
        }));

        const analysisResults = await analyzeBatch(
          analysisData,
          AI_CONTEXT,
          OLLAMA_MODEL
        );

        // Embed AI analysis into each result
        resultsWithAI = results.map((post, index) => {
          const aiResult = analysisResults[index];
          return {
            ...post,
            aiAnalysis: {
              isRelevant: aiResult.isRelevant,
              confidence: aiResult.confidence,
              reasoning: aiResult.reasoning,
              context: AI_CONTEXT,
              model: OLLAMA_MODEL,
              analyzedAt: new Date().toISOString(),
            },
          };
        });

        aiAnalysisCompleted = true;
        logger.success(`✅ AI Analysis completed for ${results.length} posts`);
      } else {
        logger.warning("⚠️ Ollama not available, skipping AI analysis");
      }
    }

    // Prepare results with embedded AI analysis
    const outputData = {
      metadata: {
        timestamp: new Date().toISOString(),
        totalPosts: resultsWithAI.length,
        rejectedPosts: rejectedResults.length,
        aiAnalysisEnabled: ENABLE_AI_ANALYSIS,
        aiAnalysisCompleted: aiAnalysisCompleted,
        aiContext: aiAnalysisCompleted ? AI_CONTEXT : undefined,
        aiModel: aiAnalysisCompleted ? OLLAMA_MODEL : undefined,
        locationFilter: LOCATION_FILTER || undefined,
        parser: "linkedin-parser",
        version: "2.0.0",
      },
      results: resultsWithAI,
    };

    // Prepare rejected posts file
    const rejectedData = rejectedResults.map((post) => ({
      rejected: true,
      reason: post.rejectionReason || "Location filter failed: Location not in filter",
      keyword: post.keyword,
      text: post.text || post.content,
      profileLink: post.profileLink || post.authorUrl,
      location: post.location || post.profileLocation,
      timestamp: post.timestamp || post.extractedAt,
    }));

    const resultsDir = path.join(__dirname, "results");
    if (!fs.existsSync(resultsDir)) {
      fs.mkdirSync(resultsDir, { recursive: true });
    }

    const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
    const resultsFilename = `linkedin-results-${timestamp}.json`;
    const rejectedFilename = `linkedin-rejected-${timestamp}.json`;
    const resultsFilepath = path.join(resultsDir, resultsFilename);
    const rejectedFilepath = path.join(resultsDir, rejectedFilename);

    // Save results with AI analysis
    fs.writeFileSync(resultsFilepath, JSON.stringify(outputData, null, 2));

    // Save rejected posts separately
    if (rejectedData.length > 0) {
      fs.writeFileSync(
        rejectedFilepath,
        JSON.stringify(rejectedData, null, 2)
      );
    }

    // Final summary
    logger.success("✅ LinkedIn parsing completed successfully!");
    logger.info(`📊 Total posts found: ${resultsWithAI.length}`);
    logger.info(`❌ Total rejected: ${rejectedResults.length}`);
    logger.info(`📁 Results saved to: ${resultsFilepath}`);
    if (rejectedData.length > 0) {
      logger.info(`📁 Rejected posts saved to: ${rejectedFilepath}`);
    }

    return outputData;
  } catch (error) {
    logger.error(`❌ LinkedIn parser failed: ${error.message}`);
    throw error;
  } finally {
    await coreParser.cleanup();
  }
}

// CLI handling
if (require.main === module) {
  const args = process.argv.slice(2);
  const options = {};

  // Parse command line arguments
  args.forEach((arg) => {
    if (arg.startsWith("--")) {
      const [key, value] = arg.slice(2).split("=");
      options[key] = value || true;
    }
  });

  startLinkedInParser(options)
    .then(() => process.exit(0))
    .catch((error) => {
      console.error("Fatal error:", error.message);
      process.exit(1);
    });
}

module.exports = { startLinkedInParser };
51
linkedin-parser/keywords/linkedin-keywords.csv
Normal file
@ -0,0 +1,51 @@
keyword
acquisition
actively seeking
bankruptcy
business realignment
career transition
company closure
company reorganization
cost cutting
department closure
downsizing
furlough
headcount reduction
hiring
hiring freeze
involuntary separation
job cuts
job elimination
job loss
job opportunity
job search
layoff
looking for opportunities
mass layoff
merger
new position
new role
office closure
open to work
organizational change
outplacement
plant closure
position elimination
recruiting
reduction in force
redundancies
redundancy
restructuring
rightsizing
RIF
role elimination
separation
site closure
staff reduction
terminated
termination
voluntary separation
workforce adjustment
workforce optimization
workforce reduction
workforce transition
3705
linkedin-parser/package-lock.json
generated
Normal file
File diff suppressed because it is too large
42
linkedin-parser/package.json
Normal file
@ -0,0 +1,42 @@
{
  "name": "linkedout-parser",
  "version": "1.0.0",
  "description": "LinkedIn posts parser using ai-analyzer core",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "start:no-ai": "node index.js --no-ai",
    "start:headless": "node index.js --headless=true",
    "start:visible": "node index.js --headless=false",
    "start:no-location": "node index.js --no-location",
    "start:custom": "node index.js --keyword=\"layoff,downsizing\"",
    "test": "jest",
    "test:watch": "jest --watch",
    "test:coverage": "jest --coverage",
    "demo": "node demo.js",
    "analyze": "node ../ai-analyzer/cli.js --dir=results",
    "analyze:latest": "node ../ai-analyzer/cli.js --latest --dir=results",
    "analyze:layoff": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"layoff analysis\"",
    "analyze:trends": "node ../ai-analyzer/cli.js --latest --dir=results --context=\"job market trends\"",
    "help": "node index.js --help",
    "install:playwright": "npx playwright install chromium"
  },
  "keywords": [
    "linkedin",
    "parser",
    "scraper",
    "ai"
  ],
  "author": "",
  "license": "ISC",
  "type": "commonjs",
  "dependencies": {
    "ai-analyzer": "file:../ai-analyzer",
    "core-parser": "file:../core-parser",
    "dotenv": "^17.0.0",
    "csv-parser": "^3.2.0"
  },
  "devDependencies": {
    "jest": "^29.0.0"
  }
}
366
linkedin-parser/strategies/linkedin-strategy.js
Normal file
@ -0,0 +1,366 @@
|
||||
/**
|
||||
* LinkedIn Parsing Strategy
|
||||
*
|
||||
* Uses core-parser for browser management and ai-analyzer for utilities
|
||||
*/
|
||||
|
||||
const {
|
||||
logger,
|
||||
cleanText,
|
||||
containsAnyKeyword,
|
||||
validateLocationAgainstFilters,
|
||||
extractLocationFromProfile,
|
||||
} = require("ai-analyzer");
|
||||
|
||||
/**
|
||||
* LinkedIn parsing strategy function
|
||||
*/
|
||||
async function linkedinStrategy(coreParser, options = {}) {
|
||||
const {
|
||||
keywords = ["layoff", "downsizing", "job cuts"],
|
||||
locationFilter = null,
|
||||
maxResults = 50,
|
||||
credentials = {},
|
||||
} = options;
|
||||
|
||||
const results = [];
|
||||
const rejectedResults = [];
|
||||
const seenPosts = new Set();
|
||||
const seenProfiles = new Set();
|
||||
|
||||
try {
|
||||
// Create main page
|
||||
const page = await coreParser.createPage("linkedin-main");
|
||||
|
||||
// Authenticate to LinkedIn
|
||||
logger.info("🔐 Authenticating to LinkedIn...");
|
||||
await coreParser.authenticate("linkedin", credentials, "linkedin-main");
|
||||
logger.info("✅ LinkedIn authentication successful");
|
||||
|
||||
// Search for posts with each keyword
|
||||
for (const keyword of keywords) {
|
||||
logger.info(`🔍 Searching LinkedIn for: "${keyword}"`);
|
||||
|
||||
const searchUrl = `https://www.linkedin.com/search/results/content/?keywords=${encodeURIComponent(
|
||||
keyword
|
||||
)}&sortBy=date_posted`;
|
||||
|
||||
await coreParser.navigateTo(searchUrl, {
|
||||
pageId: "linkedin-main",
|
||||
retries: 2,
|
||||
});
|
||||
|
||||
// Wait for page to load - use delay utility instead of waitForTimeout
|
||||
await new Promise(resolve => setTimeout(resolve, 3000)); // Give LinkedIn time to render
|
||||
|
||||
// Wait for search results - try multiple selectors
|
||||
let hasResults = false;
|
||||
const possibleSelectors = [
|
||||
".search-results-container",
|
||||
".search-results__list",
|
||||
".reusable-search__result-container",
|
||||
"[data-test-id='search-results']",
|
||||
".feed-shared-update-v2",
|
||||
"article",
|
||||
];
|
||||
|
||||
for (const selector of possibleSelectors) {
|
||||
try {
|
||||
await page.waitForSelector(selector, { timeout: 5000 });
|
||||
hasResults = true;
|
||||
logger.info(`✅ Found results container with selector: ${selector}`);
|
||||
break;
|
||||
} catch (e) {
|
||||
// Try next selector
|
||||
}
|
||||
}
|
||||
|
||||
if (!hasResults) {
|
||||
logger.warning(`⚠️ No search results container found for keyword: ${keyword}`);
|
||||
// Take screenshot for debugging
|
||||
try {
|
||||
const screenshotPath = `debug-${keyword.replace(/\s+/g, '-')}-${Date.now()}.png`;
|
||||
await page.screenshot({ path: screenshotPath, fullPage: true });
|
||||
logger.info(`📸 Debug screenshot saved: ${screenshotPath}`);
|
||||
} catch (e) {
|
||||
logger.warning(`Could not take screenshot: ${e.message}`);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
// Extract posts from current page
|
||||
const posts = await extractPostsFromPage(page, keyword);
|
||||
logger.info(`📊 Found ${posts.length} posts for keyword "${keyword}"`);
|
||||
|
||||
for (const post of posts) {
|
||||
// Skip duplicates
|
||||
if (seenPosts.has(post.postId)) continue;
|
||||
seenPosts.add(post.postId);
|
||||
|
||||
// Validate location if filtering enabled
|
||||
if (locationFilter) {
|
||||
const postLocation = post.location || post.profileLocation || "";
|
||||
const locationValid = validateLocationAgainstFilters(
|
||||
postLocation,
|
||||
locationFilter
|
||||
);
|
||||
|
||||
if (!locationValid) {
|
||||
logger.debug(`⏭️ Post rejected: location "${postLocation}" doesn't match filter "${locationFilter}"`);
|
||||
rejectedResults.push({
|
||||
...post,
|
||||
rejectionReason: `Location filter mismatch: "${postLocation}" not in "${locationFilter}"`,
|
||||
});
|
||||
continue;
|
||||
} else {
|
||||
logger.debug(`✅ Post location "${postLocation}" matches filter "${locationFilter}"`);
|
||||
}
|
||||
}
|
||||
|
||||
results.push(post);
|
||||
|
||||
if (results.length >= maxResults) {
|
||||
logger.info(`📊 Reached maximum results limit: ${maxResults}`);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (results.length >= maxResults) break;
|
||||
}
|
||||
|
||||
logger.info(
|
||||
`🎯 LinkedIn parsing completed: ${results.length} posts found, ${rejectedResults.length} rejected`
|
||||
);
|
||||
|
||||
return {
|
||||
results,
|
||||
rejectedResults,
|
||||
summary: {
|
||||
totalPosts: results.length,
|
||||
totalRejected: rejectedResults.length,
|
||||
keywords: keywords.join(", "),
|
||||
locationFilter,
|
||||
},
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(`❌ LinkedIn parsing failed: ${error.message}`);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
/**
 * Extract posts from current search results page
 */
async function extractPostsFromPage(page, keyword) {
  const posts = [];

  try {
    // Try multiple selectors for post elements (LinkedIn changes these frequently)
    const postSelectors = [
      ".feed-shared-update-v2",
      "article.feed-shared-update-v2",
      "[data-urn*='urn:li:activity']",
      ".reusable-search__result-container",
      ".search-result__wrapper",
      "article",
    ];

    let postElements = [];
    let usedSelector = null;

    for (const selector of postSelectors) {
      try {
        postElements = await page.$$(selector);
        if (postElements.length > 0) {
          usedSelector = selector;
          logger.info(`✅ Found ${postElements.length} post elements using selector: ${usedSelector}`);
          break;
        }
      } catch (e) {
        // Selector failed; try the next one
      }
    }

    if (postElements.length === 0) {
      logger.warning(`⚠️ No post elements found with any selector. Page might have different structure.`);
      // Log page title and URL for debugging
      try {
        const pageTitle = await page.title();
        const pageUrl = page.url();
        logger.info(`📄 Page title: ${pageTitle}`);
        logger.info(`🔗 Page URL: ${pageUrl}`);
      } catch (e) {
        // Ignore
      }
      return posts;
    }

    logger.info(`🔍 Processing ${postElements.length} post elements...`);

    for (let i = 0; i < postElements.length; i++) {
      try {
        const post = await extractPostData(postElements[i], keyword);
        if (post) {
          posts.push(post);
          logger.debug(`✅ Extracted post ${i + 1}/${postElements.length}: ${post.postId.substring(0, 20)}...`);
        } else {
          logger.debug(`⏭️ Post ${i + 1}/${postElements.length} filtered out (no keyword match or missing data)`);
        }
      } catch (error) {
        logger.warning(`❌ Failed to extract post ${i + 1} data: ${error.message}`);
      }
    }

    logger.info(`✅ Successfully extracted ${posts.length} valid posts from ${postElements.length} elements`);
  } catch (error) {
    logger.error(`❌ Failed to extract posts from page: ${error.message}`);
    logger.error(`Stack trace: ${error.stack}`);
  }

  return posts;
}

/**
 * Extract data from individual post element
 */
async function extractPostData(postElement, keyword) {
  try {
    // Extract post ID
    const postId = (await postElement.getAttribute("data-urn")) || "";

    // Extract author info
    const authorElement = await postElement.$(".feed-shared-actor__name");
    const authorName = authorElement
      ? cleanText(await authorElement.textContent())
      : "";

    const authorLinkElement = await postElement.$(".feed-shared-actor__name a");
    const authorUrl = authorLinkElement
      ? await authorLinkElement.getAttribute("href")
      : "";

    // Extract post content
    const contentElement = await postElement.$(".feed-shared-text");
    const content = contentElement
      ? cleanText(await contentElement.textContent())
      : "";

    // Extract timestamp
    const timeElement = await postElement.$(
      ".feed-shared-actor__sub-description time"
    );
    const timestamp = timeElement
      ? await timeElement.getAttribute("datetime")
      : "";

    // Extract location from profile (try multiple selectors)
    let location = "";
    const locationSelectors = [
      ".feed-shared-actor__sub-description .feed-shared-actor__sub-description-link",
      ".feed-shared-actor__sub-description .feed-shared-actor__sub-description-link--without-hover",
      ".feed-shared-actor__sub-description span[aria-label*='location']",
      ".feed-shared-actor__sub-description span[aria-label*='Location']",
    ];

    for (const selector of locationSelectors) {
      try {
        const locationElement = await postElement.$(selector);
        if (locationElement) {
          const locationText = await locationElement.textContent();
          if (locationText && locationText.trim()) {
            location = cleanText(locationText);
            break;
          }
        }
      } catch (e) {
        // Selector failed; try the next one
      }
    }

    // If no location found in the sub-description, fall back to pattern matching on its text
    if (!location) {
      try {
        const subDescElement = await postElement.$(".feed-shared-actor__sub-description");
        if (subDescElement) {
          const subDescText = await subDescElement.textContent();
          // Look for location patterns (City, Province/State, Country)
          const locationMatch = subDescText.match(/([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*),\s*([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)(?:,\s*([A-Z][a-z]+))?/);
          if (locationMatch) {
            location = cleanText(locationMatch[0]);
          }
        }
      } catch (e) {
        // Location extraction failed; continue without it
      }
    }

    // Extract engagement metrics
    const likesElement = await postElement.$(".social-counts-reactions__count");
    const likesText = likesElement
      ? cleanText(await likesElement.textContent())
      : "0";

    const commentsElement = await postElement.$(
      ".social-counts-comments__count"
    );
    const commentsText = commentsElement
      ? cleanText(await commentsElement.textContent())
      : "0";

    // Note: LinkedIn search already filters by keyword semantically.
    // We don't filter by content keyword match because:
    // 1. LinkedIn's search is semantic - it finds related posts, not just exact matches
    // 2. The keyword might be in comments, hashtags, or metadata, not visible text
    // 3. Posts might be about the topic without using the exact keyword
    //
    // Optional: log if the keyword appears in content (for debugging, but don't filter)
    const keywordLower = keyword.toLowerCase();
    const contentLower = content.toLowerCase();
    const hasKeywordInContent = contentLower.includes(keywordLower);
    if (!hasKeywordInContent && content.length > 50) {
      logger.debug(`ℹ️ Post doesn't contain keyword "${keyword}" in visible content, but including it (LinkedIn search matched it)`);
    }

    // Validate we have the minimum required data
    if (!postId && !content) {
      logger.debug(`⏭️ Post filtered: missing both postId and content`);
      return null;
    }

    return {
      postId: cleanText(postId),
      authorName,
      authorUrl,
      profileLink: authorUrl ? (authorUrl.startsWith("http") ? authorUrl : `https://www.linkedin.com${authorUrl}`) : "",
      text: content,
      content: content,
      location: location,
      profileLocation: location, // Alias for compatibility
      timestamp,
      keyword,
      likes: extractNumber(likesText),
      comments: extractNumber(commentsText),
      extractedAt: new Date().toISOString(),
      source: "linkedin",
      parser: "linkedin-parser",
    };
  } catch (error) {
    logger.warning(`Error extracting post data: ${error.message}`);
    return null;
  }
}

/**
 * Extract numbers from text (e.g., "15 likes" -> 15)
 */
function extractNumber(text) {
  // Strip thousands separators so "1,234" parses as 1234, not 1
  const match = text.replace(/,/g, "").match(/\d+/);
  return match ? parseInt(match[0], 10) : 0;
}

module.exports = {
  linkedinStrategy,
  extractPostsFromPage,
  extractPostData,
};
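As a quick sanity check, the count-parsing helper can be exercised on its own. The snippet below restates it so it is self-contained; it mirrors the helper above rather than importing it:

```javascript
// Self-contained restatement of the extractNumber helper, for illustration only.
function extractNumber(text) {
  const match = text.replace(/,/g, "").match(/\d+/);
  return match ? parseInt(match[0], 10) : 0;
}

console.log(extractNumber("15 likes")); // 15
console.log(extractNumber("1,234"));    // 1234
console.log(extractNumber("shared"));   // 0 (no digits -> default)
```

Returning 0 for non-numeric text keeps the `likes`/`comments` fields numeric even when LinkedIn renders no count element.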
3731
package-lock.json
generated
Normal file
File diff suppressed because it is too large
54
package.json
Normal file
@ -0,0 +1,54 @@
{
  "name": "job-market-intelligence",
  "version": "1.0.0",
  "description": "Job Market Intelligence Platform - Modular parsers for comprehensive job market insights with built-in AI analysis",
  "main": "linkedin-parser/index.js",
  "scripts": {
    "test": "node test/all-tests.js",
    "test:location-utils": "node test/location-utils.test.js",
    "test:ai-analyzer": "node test/ai-analyzer.test.js",
    "demo": "node demo.js",
    "demo:ai-analyzer": "node ai-analyzer/demo.js",
    "demo:linkedin-parser": "node linkedin-parser/demo.js",
    "demo:job-search-parser": "node job-search-parser/demo.js",
    "demo:all": "npm run demo && npm run demo:ai-analyzer && npm run demo:linkedin-parser && npm run demo:job-search-parser",
    "start": "node linkedin-parser/index.js",
    "start:linkedin": "node linkedin-parser/index.js",
    "start:jobs": "node job-search-parser/index.js",
    "start:linkedin-no-ai": "node linkedin-parser/index.js --no-ai",
    "install:playwright": "npx playwright install chromium"
  },
  "keywords": [
    "job-market",
    "intelligence",
    "linkedin",
    "scraper",
    "ai-analysis",
    "data-intelligence",
    "market-research",
    "automation",
    "playwright",
    "ollama",
    "openai"
  ],
  "author": "Job Market Intelligence Team",
  "license": "ISC",
  "type": "commonjs",
  "dependencies": {
    "ai-analyzer": "file:./ai-analyzer",
    "core-parser": "file:./core-parser",
    "csv-parser": "^3.2.0",
    "dotenv": "^17.0.0"
  },
  "engines": {
    "node": ">=18.0.0"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/your-username/job-market-intelligence.git"
  },
  "bugs": {
    "url": "https://github.com/your-username/job-market-intelligence/issues"
  },
  "homepage": "https://github.com/your-username/job-market-intelligence#readme"
}
34
sample-data.json
Normal file
@ -0,0 +1,34 @@
{
  "results": [
    {
      "text": "Just got laid off from my software engineering role. Looking for new opportunities in the Toronto area.",
      "location": "Toronto, Ontario, Canada",
      "keyword": "layoff",
      "timestamp": "2024-01-15T10:30:00Z"
    },
    {
      "text": "Excited to share that I'm starting a new position as a Senior Developer at TechCorp!",
      "location": "Vancouver, BC, Canada",
      "keyword": "hiring",
      "timestamp": "2024-01-15T11:00:00Z"
    },
    {
      "text": "Our company is going through a restructuring and unfortunately had to let go of 50 employees.",
      "location": "Montreal, Quebec, Canada",
      "keyword": "layoff",
      "timestamp": "2024-01-15T11:30:00Z"
    },
    {
      "text": "Beautiful weather today! Perfect for a walk in the park.",
      "location": "Calgary, Alberta, Canada",
      "keyword": "weather",
      "timestamp": "2024-01-15T12:00:00Z"
    },
    {
      "text": "We're hiring! Looking for talented developers to join our growing team.",
      "location": "Ottawa, Ontario, Canada",
      "keyword": "hiring",
      "timestamp": "2024-01-15T12:30:00Z"
    }
  ]
}
80
test/ai-analyzer.test.js
Normal file
@ -0,0 +1,80 @@
const fs = require("fs");
const assert = require("assert");
const { analyzeSinglePost, checkOllamaStatus } = require("../ai-analyzer");

console.log("AI Analyzer logic tests");

const testData = JSON.parse(
  fs.readFileSync(__dirname + "/test-data.json", "utf-8")
);
const aiResults = testData.positive;
const context = "job layoffs and workforce reduction";
const model = process.env.OLLAMA_MODEL || "mistral"; // Use OLLAMA_MODEL from env or default to mistral

(async () => {
  // Check if Ollama is available
  const ollamaAvailable = await checkOllamaStatus(model);
  if (!ollamaAvailable) {
    console.log("SKIP: Ollama not available - skipping AI analyzer tests");
    console.log("PASS: AI analyzer tests skipped (Ollama not running)");
    return;
  }

  console.log(`Testing AI analyzer with ${aiResults.length} posts...`);

  for (let i = 0; i < aiResults.length; i++) {
    const post = aiResults[i];
    console.log(`Testing post ${i + 1}: "${post.text.substring(0, 50)}..."`);

    const aiOutput = await analyzeSinglePost(post.text, context, model);

    // Test that the function returns the expected structure
    assert(
      typeof aiOutput === "object" && aiOutput !== null,
      `Post ${i} output is not an object`
    );

    assert(
      typeof aiOutput.isRelevant === "boolean",
      `Post ${i} isRelevant is not a boolean: ${typeof aiOutput.isRelevant}`
    );

    assert(
      typeof aiOutput.confidence === "number",
      `Post ${i} confidence is not a number: ${typeof aiOutput.confidence}`
    );

    assert(
      typeof aiOutput.reasoning === "string",
      `Post ${i} reasoning is not a string: ${typeof aiOutput.reasoning}`
    );

    // Test that confidence is within valid range
    assert(
      aiOutput.confidence >= 0 && aiOutput.confidence <= 1,
      `Post ${i} confidence out of range: ${aiOutput.confidence} (should be 0-1)`
    );

    // Test that reasoning exists and is not empty
    assert(
      aiOutput.reasoning && aiOutput.reasoning.length > 0,
      `Post ${i} missing or empty reasoning`
    );

    // Test that relevance is a boolean value
    assert(
      aiOutput.isRelevant === true || aiOutput.isRelevant === false,
      `Post ${i} isRelevant is not a valid boolean: ${aiOutput.isRelevant}`
    );

    console.log(
      `  ✓ Post ${i + 1}: relevant=${aiOutput.isRelevant}, confidence=${
        aiOutput.confidence
      }`
    );
  }

  console.log(
    "PASS: AI analyzer returns valid structure and values for all test posts."
  );
})();
15
test/all-tests.js
Normal file
@ -0,0 +1,15 @@
const fs = require("fs");
const path = require("path");

const testDir = path.join(__dirname);
const files = fs.readdirSync(testDir);

console.log("Running all tests...");

files.forEach((file) => {
  if (file === "all-tests.js" || !file.endsWith(".js")) return;
  console.log(`\n--- Running ${file} ---`);
  require(path.join(testDir, file));
});

console.log("\nAll tests complete.");
65
test/location-utils.test.js
Normal file
@ -0,0 +1,65 @@
const assert = require("assert");
const {
  parseLocationFilters,
  validateLocationAgainstFilters,
} = require("../ai-analyzer");

console.log("Location Utils tests");

// Test parseLocationFilters
const filters = parseLocationFilters("Ontario,Manitoba");
assert.deepStrictEqual(
  filters,
  ["ontario", "manitoba"],
  "parseLocationFilters failed"
);
console.log("PASS: parseLocationFilters works");

// Test validateLocationAgainstFilters positive
let result = validateLocationAgainstFilters("Toronto, Ontario, Canada", [
  "ontario",
]);
assert(result.isValid, "Ontario should match Toronto, Ontario, Canada");

result = validateLocationAgainstFilters("Toronto, Ontario, Canada", [
  "toronto",
]);
assert(result.isValid, "Toronto should match Toronto, Ontario, Canada");

// Negative test
result = validateLocationAgainstFilters("Vancouver, BC, Canada", ["ontario"]);
assert(!result.isValid, "Vancouver should not match Ontario");

// More positive cases
result = validateLocationAgainstFilters("Winnipeg, Manitoba, Canada", [
  "manitoba",
]);
assert(result.isValid, "Manitoba should match Winnipeg, Manitoba, Canada");

result = validateLocationAgainstFilters("Calgary, Alberta, Canada", [
  "alberta",
]);
assert(result.isValid, "Alberta should match Calgary, Alberta, Canada");

result = validateLocationAgainstFilters("Vancouver, BC, Canada", ["bc"]);
assert(result.isValid, "BC should match Vancouver, BC, Canada");

result = validateLocationAgainstFilters("Montreal, Quebec, Canada", ["quebec"]);
assert(result.isValid, "Quebec should match Montreal, Quebec, Canada");

result = validateLocationAgainstFilters("Halifax, NS, Canada", [
  "nova scotia",
  "ns",
]);
assert(result.isValid, "NS/Nova Scotia should match Halifax, NS, Canada");

// Negative edge cases
result = validateLocationAgainstFilters("Seattle, Washington, USA", [
  "ontario",
  "manitoba",
]);
assert(!result.isValid, "Seattle should not match Ontario or Manitoba");

result = validateLocationAgainstFilters("Ottawa, Ontario, Canada", ["quebec"]);
assert(!result.isValid, "Ottawa, Ontario should not match Quebec");
console.log("PASS: validateLocationAgainstFilters positive/negative cases");
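The implementations behind these assertions live in `ai-analyzer` and are not part of this diff. A minimal sketch that would satisfy the contract the tests exercise (comma-split, lowercased filters; case-insensitive substring matching; a result object carrying `isValid`) might look like this; the real module's behavior may differ:

```javascript
// Hypothetical minimal implementations consistent with the tests above.
function parseLocationFilters(raw) {
  return raw
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .filter(Boolean);
}

function validateLocationAgainstFilters(location, filters) {
  const loc = location.toLowerCase();
  const matched = filters.find((filter) => loc.includes(filter));
  return { isValid: Boolean(matched), matchedFilter: matched || null };
}

console.log(validateLocationAgainstFilters("Halifax, NS, Canada", ["nova scotia", "ns"])); // isValid: true
```

Note that plain substring matching can false-positive on short filters (for instance "ns" also matches "Kansas"), so a production implementation would likely want word-boundary or token-level matching.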
83
test/test-data.json
Normal file
@ -0,0 +1,83 @@
{
  "positive": [
    {
      "keyword": "layoff",
      "text": "I'm working to report on the recent game industry layoffs and I'm hoping to connect with anyone in connected to or impacted by the recent mass layoffs at . please feel free to contact me either here or anonymously by email, rostad@postmedia.com",
      "profileLink": "https://www.linkedin.com/in/raminostad",
      "aiRelevant": true,
      "aiConfidence": 1
    },
    {
      "keyword": "layoff",
      "text": "Tariff Impacts on the Canadian Job Market – Mid‑2025 What’s Happening? U.S. tariffs on Canadian exports — Especially steel, aluminum, electric vehicles (EVs) and certain manufacturing goods — are starting to bite. Affected Sectors & Job Impacts Manufacturing & Export JobsCompanies facing higher costs to sell into the U.S.⬅️ Reduced exports → layoffs and production cuts❗ Manufacturing jobs are down, especially in Ontario & Quebec Auto & EV SectorU.S. tariffs targeting EV components made in CanadaSlower demand → production delays → fewer shifts & temp workersAffects job stability in places like Windsor, Oshawa & parts of Alberta🧊 Ripple EffectSupply chain disruptions → job slowdowns in trucking, warehousing, logisticsSmall manufacturers face pressure → reduced hiring & capital investment The Data Unemployment in Canada (May 2025):⬆️ 7.0% — highest since 2021 Youth Unemployment:14.2% — especially impacted by reduced seasonal & temp roles Job Loss Hotspots:Manufacturing-heavy provincesExport-dependent citiesContract workers most affected Some Resilient Areas Finance, Real Estate, and Health CareStill hiringNot export-dependentLess impacted by tariffs Canada’s job market is feeling the heat from international tariffs. While some industries stay strong, others — like manufacturing and exports — are seeing real pressure. If you're in a vulnerable sector, now’s the time to reskill, explore service-based roles, or seek contract flexibility. Let me know what you're seeing in your city or industry. Stay informed, stay adaptable!",
      "profileLink": "https://www.linkedin.com/in/sammy-aggarwal-pcp-director-practice-principal-8785702a4",
      "aiRelevant": true,
      "aiConfidence": 1
    },
    {
      "keyword": "termination",
      "text": "Hospital Worker’s Termination for Misconduct Overturned by Alberta Arbitration BoardThe arbitration panel found that the employee’s termination could be overturned on two separate grounds:1) the termination was an excessive response to the employee’s conduct, and2) the employer failed to establish that it could not accommodate the employee.With regard to the grievor’s conduct, the arbitration panel found that:the employee’s interaction with the admitting clerk was inappropriate, but it was a very minor incident which would never be a ground for termination;the employee had removed his mask to take a drink while on a break, which was not misconduct or cause for discipline; andthe employer’s allegation that the employee was disrespectful at the investigation meeting was not supported by the evidence at the hearing. execution is essential in serving business owners. Communication paramount to the process of offering extended health benefits. This means that sometimes getting to “DONE” is as simple as coffee & conversation. ☕️How about we have a chat… 🫱 Humans serving Humans",
      "profileLink": "https://www.linkedin.com/in/lori-power-1850a0a",
      "aiRelevant": false,
      "aiConfidence": 0.5
    },
    {
      "keyword": "termination",
      "text": "Not my favourite part being a Fractional HR on Demand... but one of the most critical - TERMINATIONS This month alone, I’ve supported 4 employee terminations.Over my career? 300+, including for cause, with zero litigation.That’s not a fluke—it’s expert HR, done right.Terminations are high-stakes. One poorly chosen word. One emotional reaction. One misstep—and suddenly, you’re exposed legally, financially, or reputationally.Here’s how I help companies protect their people and their business:✅ I draft clear, compliant termination letters✅ I coach the manager on what to say—and what not to say✅ I guide the termination meeting so it’s respectful, legal, and professional✅ I stay present with the employee until they’re ready to leave with dignity✅ I ensure company information, data, and property are protected as they collect their belongingsMost leaders aren’t trying to do things wrong—but when emotions run high, even good intentions can create legal and cultural fallout.That’s where I come in.My job is to have your back—and make sure the exit is done cleanly, compassionately, and in full compliance with Alberta Labour Law.But it doesn’t end there.Many companies choose to go a step further—and invest in career coaching for the departing employee.Having hired 2,300+ people with a 94% retention rate, I know what hiring managers are looking for, and I give employees the exact insights and tools to land their next opportunity faster.A difficult ending can still lead to a fresh start. I help make that possible. DM me for a free 20-minute consultation on how to protect your company and support your people—when it matters most. patty@millernet.cahashtag MillerNet HR & Business Solutions Inc.",
      "profileLink": "https://www.linkedin.com/in/millernethr",
      "aiRelevant": true,
      "aiConfidence": 0.9
    },
    {
      "keyword": "termination",
      "text": "Career Coaches & Counselors: This Needs to STOP.It bothers me that multiple newcomer clients in Calgary join my coaching program and bring resumes containing roles or experience they've NEVER had before, like warehouse experience and forklift operation.From conversations I discover these untruthful details and when I ask them why these fabrications are in their resume, the answer is often: \"This counselor for that immigrant-serving agency created it for me.\"Does this not concern you? Adding false information to a client’s resume may seem like “helping” them land a job. But in truth, it can ~ Jeopardize their job performance and safety,~ Undermine their confidence when asked to perform tasks they’ve never done,~ And worse, risk termination and reputational damage. We are here to empower and educate, with integrity. Not mislead.Let’s support clients by helping them:* Identify real, transferable skills,* Tell their authentic stories, and* Grow careers based on honesty and strength.Newcomers deserve ethical guidance. This starts with us.Have you seen anything similar? How do you think can we stop this? Alberta Association of Immigrant Serving Agencies (AAISA) TIES (The Immigrant Education Society)",
      "profileLink": "https://www.linkedin.com/in/gegajo",
      "aiRelevant": true,
      "aiConfidence": 0.7
    }
  ],
  "negative": [
    {
      "rejected": true,
      "reason": "No profile link",
      "keyword": "layoff",
      "text": "Your Employment Rights Q&AJoin us live and get some answers to your workplace questions! Discover the truth about your employment rights and find out more about terminations, layoffs, severance entitlements, workplace discrimination and more. Join Canadian employment lawyer, Jeremy Herman, for a LIVE discussion on employment law.July 3, 2025 at 2:00pm ET. our live session? Join us weekly and get your questions answered live. Help@EmploymentLawyer.ca ️1-855-821-5900 (Toll-free)------Twitter: ",
      "profileLink": ""
    },
    {
      "rejected": true,
      "reason": "Keyword not present",
      "keyword": "layoff",
      "text": "Alberta Tech has the best career advice, as usual.",
      "profileLink": "https://www.linkedin.com/in/bleper"
    },
    {
      "rejected": true,
      "reason": "Location filter failed: Location 'Canada' does not match any of: ontario, alberta",
      "keyword": "terminated",
      "text": "While Section 23 of the Workers’ Compensation Act (WCA) provides that no action lies in respect of injuries arising out of and in the course of employment when compensation is payable, this statutory bar is not absolute and must be interpreted narrowly in light of established legal principles...",
      "profileLink": "https://www.linkedin.com/in/alan-penaverde-bscpt-rtwdm-a1a574355"
    },
    {
      "rejected": true,
      "reason": "Duplicate post",
      "keyword": "termination",
      "text": "Your Employment Rights Q&AJoin us live and get some answers to your workplace questions!...",
      "profileLink": null
    },
    {
      "rejected": true,
      "reason": "No text",
      "keyword": "separation",
      "text": "",
      "profileLink": null
    },
    {
      "rejected": true,
      "reason": "Keyword not present",
      "keyword": "job cuts",
      "text": "Not one single Canadian public service job will cut, in fact, I bet over the next 4 years under PM Carney the Federal Government civil service will grow by about 8 to 10% per year, as it has for the last 10 years.",
      "profileLink": "https://www.linkedin.com/in/gary-kalynchuk-p-eng-conservative-alberta-separatist-402a9011"
    }
  ]
}