tanyar09 8de65bc04c Add initial project structure for Job Market Intelligence platform
- Created core modules: `ai-analyzer`, `core-parser`, and `job-search-parser`.
- Implemented LinkedIn and job search parsers with integrated AI analysis.
- Added CLI tools for AI analysis and job parsing.
- Included comprehensive README files for each module detailing usage and features.
- Established a `.gitignore` file to exclude unnecessary files.
- Introduced sample data for testing and demonstration purposes.
- Set up package.json files for dependency management across modules.
- Implemented logging and error handling utilities for better debugging and user feedback.
2025-12-12 14:23:01 -05:00


# Job Market Intelligence Platform
A comprehensive platform for job market intelligence with **integrated AI-powered insights**. Built with a modular architecture for extensibility and maintainability.
## 🏗️ Architecture Overview
```
job-market-intelligence/
├── ai-analyzer/ # Shared core utilities (logger, AI, location, text) + CLI tool
├── linkedin-parser/ # LinkedIn-specific scraper with integrated AI analysis
├── job-search-parser/ # Job search intelligence
└── docs/ # Documentation
```
## 🚀 Quick Start
### Prerequisites
- Node.js 18+
- Playwright browser automation
- LinkedIn account credentials
- Optional: Ollama for local AI analysis
### Installation
```bash
npm install
npx playwright install chromium
```
### Basic Usage
```bash
# Run LinkedIn parser with integrated AI analysis
cd linkedin-parser && npm start
# Run LinkedIn parser with specific keywords
cd linkedin-parser && npm run start:custom
# Run LinkedIn parser without AI analysis
cd linkedin-parser && npm run start:no-ai
# Run job search parser
cd job-search-parser && npm start
# Analyze existing results with AI (CLI)
cd linkedin-parser && npm run analyze:latest
# Analyze with custom context
cd linkedin-parser && npm run analyze:layoff
# Run demo workflow
node demo.js
```
## 📦 Core Components
### 1. AI Analyzer (`ai-analyzer/`)
**Shared utilities and CLI tool used by all parsers**
- **Logger**: Consistent logging across all components
- **Text Processing**: Keyword matching, text cleaning
- **Location Validation**: Geographic filtering and validation
- **AI Integration**: Local Ollama support with integrated analysis
- **CLI Tool**: Command-line interface for standalone AI analysis
- **Test Utilities**: Shared testing helpers
**Key Features:**
- Configurable log levels with color support
- Intelligent text processing and keyword matching
- Geographic location validation against filters
- **Integrated AI analysis**: AI results embedded in data structure
- **CLI tool**: Standalone analysis with flexible options
- Comprehensive test coverage
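To make the location check concrete, here is a minimal sketch of validating a profile location against a comma-separated filter such as `LOCATION_FILTER`. The function name `validateLocation` and the case-insensitive substring matching are assumptions for illustration, not the actual `ai-analyzer` API. The returned field names mirror `locationValid`, `locationMatchedFilter`, and `locationReasoning` in the results JSON under Output Formats.

```javascript
// Hypothetical helper (not the ai-analyzer API): checks a profile location
// against a comma-separated filter like LOCATION_FILTER="Ontario,Manitoba".
// Matching is a naive case-insensitive substring test (an assumption).
function validateLocation(location, filterCsv) {
  const filters = String(filterCsv || "")
    .split(",")
    .map((f) => f.trim())
    .filter(Boolean);

  if (filters.length === 0) {
    return { locationValid: true, locationMatchedFilter: null, locationReasoning: "No location filter configured" };
  }
  if (!location) {
    return { locationValid: false, locationMatchedFilter: null, locationReasoning: "Profile location unavailable" };
  }

  const haystack = location.toLowerCase();
  const match = filters.find((f) => haystack.includes(f.toLowerCase()));
  return match
    ? { locationValid: true, locationMatchedFilter: match, locationReasoning: "Location matches filter" }
    : { locationValid: false, locationMatchedFilter: null, locationReasoning: "Location not in filter" };
}

console.log(validateLocation("Toronto, Ontario, Canada", "Ontario,Manitoba"));
```

Substring matching is deliberately simple: a filter of `Ontario` would also match Ontario, California, and a profile listing `Toronto, ON` would not match at all, so a real implementation likely needs some normalization of region names.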
### 2. LinkedIn Parser (`linkedin-parser/`)
**Specialized LinkedIn content scraper with integrated AI analysis**
- Automated LinkedIn login and navigation
- Keyword-based post searching
- Profile location validation
- Duplicate detection and filtering
- **Automatic AI analysis integrated into results**
- Configurable search parameters
**Key Features:**
- Browser automation with Playwright
- Geographic filtering by city/region
- Date range filtering (24h, week, month)
- **Integrated AI-powered content relevance analysis**
- **AI insights embedded in the results JSON (no separate AI analysis file)**
- **Two output files: results (with AI analysis) and rejected posts**
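Duplicate detection can be as simple as keying each post on its author and normalized text. The sketch below assumes that definition of "duplicate"; `dedupePosts` is a hypothetical name, and the parser's actual deduplication rule is not documented here.

```javascript
// Hypothetical duplicate filter: two posts count as duplicates when they share
// a profile link and the same whitespace- and case-normalized text.
function dedupePosts(posts) {
  const seen = new Set();
  const unique = [];
  for (const post of posts) {
    const normText = (post.text || "").replace(/\s+/g, " ").trim().toLowerCase();
    const key = `${post.profileLink || ""}\u0000${normText}`;
    if (seen.has(key)) continue; // keep only the first occurrence
    seen.add(key);
    unique.push(post);
  }
  return unique;
}
```

Keying on the profile link as well as the text keeps identical reposts by different authors, which may or may not be what you want for trend analysis.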
### 3. Job Search Parser (`job-search-parser/`)
**Job market intelligence and analysis**
- Job posting aggregation
- Role-specific keyword tracking
- Market trend analysis
- Salary and requirement insights
**Key Features:**
- Tech role keyword tracking
- Industry-specific analysis
- Market demand insights
- Competitive intelligence
### 4. AI Analysis CLI (`ai-analyzer/cli.js`)
**Command-line tool for AI analysis of any results JSON file**
- Analyze any results JSON file from the LinkedIn parser or other sources
- **Integrated analysis**: AI results embedded back into original JSON
- Custom analysis context and AI models
- Comprehensive analysis summary and statistics
- Flexible input format support
**Key Features:**
- Works with any JSON results file
- **Integrated output**: AI analysis embedded in original structure
- Custom analysis contexts
- Detailed relevance scoring
- Confidence level analysis
- Summary statistics and insights
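As an illustration of the kind of summary statistics the CLI reports, the following sketch counts relevant and irrelevant posts and averages the model's confidence over results that carry an `aiAnalysis` object. `summarizeAnalysis` and its output fields are hypothetical; the CLI's real summary format may differ.

```javascript
// Hypothetical summary helper (not the CLI's actual implementation).
// Only results that already contain an `aiAnalysis` object are counted as analyzed.
function summarizeAnalysis(results) {
  const analyzed = results.filter((r) => r.aiAnalysis);
  const relevant = analyzed.filter((r) => r.aiAnalysis.isRelevant);
  const avgConfidence = analyzed.length
    ? analyzed.reduce((sum, r) => sum + (r.aiAnalysis.confidence || 0), 0) / analyzed.length
    : 0;

  return {
    total: results.length,
    analyzed: analyzed.length,
    relevant: relevant.length,
    irrelevant: analyzed.length - relevant.length,
    averageConfidence: Number(avgConfidence.toFixed(2)), // rounded for display
  };
}
```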
## 🔧 Configuration
### Environment Variables
Create a `.env` file in the root directory:
```env
# LinkedIn Credentials
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
# Search Configuration
CITY=Toronto
DATE_POSTED=past-week
SORT_BY=date_posted
WHEELS=5
# Location Filtering
LOCATION_FILTER=Ontario,Manitoba
ENABLE_LOCATION_CHECK=true
# AI Analysis
ENABLE_AI_ANALYSIS=true
AI_CONTEXT="job market analysis and trends"
OLLAMA_MODEL=mistral
# Keywords
KEYWORDS=keywords-layoff.csv
```
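Environment variables always arrive as strings, so booleans and lists need converting before use. This is a minimal sketch of such a loader. It assumes the `.env` values are already in `process.env` (for example via the `dotenv` package, which this README does not confirm the project uses); `loadConfig` is a hypothetical name, and the defaults simply repeat the example values above.

```javascript
// Hypothetical config loader: converts the string values from .env into typed settings.
// Defaults repeat the example .env values and are assumptions, not project defaults.
function loadConfig(env = process.env) {
  const bool = (value, fallback) =>
    value === undefined ? fallback : String(value).trim().toLowerCase() === "true";
  const list = (value) =>
    String(value || "").split(",").map((s) => s.trim()).filter(Boolean);

  return {
    city: env.CITY || "Toronto",
    datePosted: env.DATE_POSTED || "past-week",
    sortBy: env.SORT_BY || "date_posted",
    wheels: Number.parseInt(env.WHEELS ?? "5", 10),
    locationFilter: list(env.LOCATION_FILTER), // "Ontario,Manitoba" -> ["Ontario", "Manitoba"]
    enableLocationCheck: bool(env.ENABLE_LOCATION_CHECK, true),
    enableAiAnalysis: bool(env.ENABLE_AI_ANALYSIS, true),
    aiContext: env.AI_CONTEXT || "job market analysis and trends",
    ollamaModel: env.OLLAMA_MODEL || "mistral",
    keywordsFile: env.KEYWORDS || "keywords-layoff.csv",
  };
}
```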
### Command Line Options
```bash
# LinkedIn Parser Options
--headless=true|false # Browser headless mode
--keyword="kw1,kw2" # Specific keywords
--add-keyword="kw1,kw2" # Additional keywords
--no-location # Disable location filtering
--no-ai # Disable AI analysis
# Job Search Parser Options
--help # Show parser-specific help
# AI Analysis CLI Options
--input=FILE # Input JSON file
--output=FILE # Output file
--context="description" # Custom AI analysis context
--model=MODEL # Ollama model
--latest # Use latest results file
--dir=PATH # Directory to look for results
```
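The options above follow a `--name=value` / bare `--flag` convention. A minimal hand-rolled parser for that style might look like this; `parseFlags` is hypothetical and not the project's actual argument handling. Node 18.3+ also ships `util.parseArgs`, which could serve the same purpose.

```javascript
// Hypothetical parser for "--name=value" and bare "--flag" arguments.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // ignore positional arguments
    const body = arg.slice(2);
    const eq = body.indexOf("=");
    if (eq === -1) {
      flags[body] = true; // bare flag, e.g. --latest or --no-ai
    } else {
      flags[body.slice(0, eq)] = body.slice(eq + 1); // split on the first "=" only
    }
  }
  return flags;
}

// Example: node index.js --headless=false --keyword="layoff,hiring" --no-ai
const flags = parseFlags(process.argv.slice(2));
const keywords = typeof flags.keyword === "string" ? flags.keyword.split(",").map((k) => k.trim()) : [];
const aiEnabled = !flags["no-ai"];
```

The shell strips the quotes in `--keyword="kw1,kw2"`, so the script receives `--keyword=kw1,kw2`. Values stay strings (`--headless=false` yields `"false"`), so boolean options still need an explicit comparison.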
## 📊 Output Formats
### LinkedIn Parser Output
The LinkedIn parser now generates **two main files**, with **AI analysis integrated** into the results file:
#### 1. Main Results with AI Analysis (`linkedin-results-YYYY-MM-DD-HH-MM.json`)
```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "rejectedPosts": 12,
    "aiAnalysisEnabled": true,
    "aiAnalysisCompleted": true,
    "aiContext": "job market analysis and trends",
    "aiModel": "mistral",
    "locationFilter": "Ontario,Manitoba"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Cleaned post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "locationValid": true,
      "locationMatchedFilter": "Ontario",
      "locationReasoning": "Location matches filter",
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "linkedin",
      "parser": "linkedout-parser",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post discusses job market conditions and layoffs",
        "context": "job market analysis and trends",
        "model": "mistral",
        "analyzedAt": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
```
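The `YYYY-MM-DD-HH-MM` suffix is zero-padded, so file names sort chronologically as plain strings, which is one simple way to implement an option like `--latest`. Both helpers below are hypothetical sketches, and using local time rather than UTC is an assumption.

```javascript
// Hypothetical: builds names like linkedin-results-2024-01-15-10-30.json (local time).
function timestampedName(prefix, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    date.getFullYear(),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
    pad(date.getHours()),
    pad(date.getMinutes()),
  ].join("-");
  return `${prefix}-${stamp}.json`;
}

// Hypothetical: picks the newest results file by name; prefix must contain no regex metacharacters.
function latestResultsFile(fileNames, prefix = "linkedin-results-") {
  const pattern = new RegExp(`^${prefix}\\d{4}(-\\d{2}){4}\\.json$`);
  return fileNames.filter((n) => pattern.test(n)).sort().at(-1) ?? null;
}
```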
#### 2. Rejected Posts (`linkedin-rejected-YYYY-MM-DD-HH-MM.json`)
```json
[
  {
    "rejected": true,
    "reason": "Location filter failed: Location not in filter",
    "keyword": "layoff",
    "text": "Post content...",
    "profileLink": "https://linkedin.com/in/janedoe",
    "location": "Vancouver, BC, Canada",
    "timestamp": "2024-01-15T10:30:00Z"
  }
]
```
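The two files can be thought of as one partition of the scraped posts: each post either passes every check or is written to the rejected file with a reason. A minimal sketch, where `partitionPosts` and the `checkPost` callback are hypothetical names:

```javascript
// Hypothetical: splits posts into kept results and rejected entries.
// checkPost returns null when a post passes, or a rejection reason string.
function partitionPosts(posts, checkPost) {
  const results = [];
  const rejected = [];
  for (const post of posts) {
    const reason = checkPost(post);
    if (reason) {
      rejected.push({ rejected: true, reason, ...post }); // same leading fields as the example above
    } else {
      results.push(post);
    }
  }
  return { results, rejected };
}

const { results, rejected } = partitionPosts(
  [
    { keyword: "layoff", location: "Toronto, Ontario, Canada" },
    { keyword: "layoff", location: "Vancouver, BC, Canada" },
  ],
  (p) => (p.location.includes("Ontario") ? null : "Location filter failed: Location not in filter")
);
console.log(results.length, rejected.length);
```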
### AI Analysis CLI Output
The CLI tool creates **integrated results** with AI analysis embedded:
#### Re-analyzed Results (`original-filename-ai.json`)
```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "totalPosts": 45,
    "aiAnalysisUpdated": "2024-01-15T11:00:00Z",
    "aiContext": "layoff analysis",
    "aiModel": "mistral"
  },
  "results": [
    {
      "keyword": "layoff",
      "text": "Post content...",
      "profileLink": "https://linkedin.com/in/johndoe",
      "location": "Toronto, Ontario, Canada",
      "aiAnalysis": {
        "isRelevant": true,
        "confidence": 0.9,
        "reasoning": "Post mentions layoffs and workforce reduction",
        "context": "layoff analysis",
        "model": "mistral",
        "analyzedAt": "2024-01-15T11:00:00Z"
      }
    }
  ]
}
```
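Re-analysis leaves every original field in place and only replaces `aiAnalysis` and the AI-related metadata. The sketch below shows one way to derive the `-ai.json` output name and merge fresh analyses without mutating the input. `aiOutputPath` and `mergeAnalysis` are hypothetical, and the sketch assumes `analyses[i]` corresponds to `results[i]`.

```javascript
const path = require("node:path");

// Hypothetical: results/foo.json -> results/foo-ai.json
function aiOutputPath(inputPath) {
  const { dir, name } = path.parse(inputPath);
  return path.join(dir, `${name}-ai.json`);
}

// Hypothetical: returns a new object; the input data is not modified.
function mergeAnalysis(data, analyses, { context, model, analyzedAt = new Date().toISOString() }) {
  return {
    ...data,
    metadata: { ...data.metadata, aiAnalysisUpdated: analyzedAt, aiContext: context, aiModel: model },
    results: data.results.map((post, i) => ({
      ...post, // original fields preserved
      aiAnalysis: { ...analyses[i], context, model, analyzedAt },
    })),
  };
}
```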
## 🧪 Testing
### Run All Tests
```bash
npm test
```
### Run Specific Test Suites
```bash
# AI Analyzer tests
cd ai-analyzer && npm test
# LinkedIn Parser tests
cd linkedin-parser && npm test
# Job Search Parser tests
cd job-search-parser && npm test
```
## 🔒 Security & Legal
### Security Best Practices
- Store credentials in `.env` file (never commit)
- Use environment variables for sensitive data
- Implement rate limiting to avoid detection
- Respect LinkedIn's Terms of Service
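A common way to keep request volume low is to pause between page actions with a base delay plus random jitter. This is a sketch under that assumption; `jitteredDelay`, `forEachPaced`, and the default timings are illustrative, not values taken from the project.

```javascript
// Hypothetical pacing helpers; the default timings are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Base delay plus up to jitterMs of random extra wait (random() returns [0, 1)).
function jitteredDelay(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

// Runs handler on each item in sequence, pausing between items but not after the last.
async function forEachPaced(items, handler, { baseMs = 2000, jitterMs = 1500 } = {}) {
  for (const [i, item] of items.entries()) {
    await handler(item);
    if (i < items.length - 1) await sleep(jitteredDelay(baseMs, jitterMs));
  }
}
```

For example, `await forEachPaced(keywords, searchKeyword)` would run a (hypothetical) `searchKeyword` function once per keyword with a 2 to 3.5 second pause between searches.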
### Legal Compliance
- Educational/research purposes only
- Respect rate limits and usage policies
- Monitor LinkedIn ToS changes
- Implement data retention policies
## 🚀 Advanced Features
### AI-Powered Analysis
- **Local AI**: Ollama integration for privacy
- **Integrated Analysis**: AI results embedded in data structure
- **Automatic Analysis**: Runs after parsing completes
- **Context Analysis**: Relevance scoring against a configurable analysis context (`AI_CONTEXT` or `--context`)
- **Confidence Scoring**: AI confidence levels for each post
- **CLI Tool**: Standalone analysis with flexible options
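The following sketch shows how a relevance check against a local Ollama server could work, using Ollama's `POST /api/generate` endpoint on its default port 11434 with `stream: false` and `format: "json"`. The prompt wording, the expected `{isRelevant, confidence, reasoning}` reply shape, and all function names are assumptions; they are not the project's actual prompt or code. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Hypothetical prompt; the wording is an assumption, not the project's prompt.
function buildPrompt(text, context) {
  return [
    `You are analyzing LinkedIn posts for: ${context}.`,
    'Reply with JSON only: {"isRelevant": boolean, "confidence": number from 0 to 1, "reasoning": string}.',
    `Post: ${text}`,
  ].join("\n");
}

// Turns the model's raw reply into the aiAnalysis shape used in the results JSON.
function parseAiResponse(raw, { context, model }) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = {}; // unparseable output is treated as "not relevant"
  }
  const confidence = Math.min(1, Math.max(0, Number(parsed.confidence) || 0)); // clamp to [0, 1]
  return {
    isRelevant: parsed.isRelevant === true,
    confidence,
    reasoning: typeof parsed.reasoning === "string" ? parsed.reasoning : "Unparseable model output",
    context,
    model,
    analyzedAt: new Date().toISOString(),
  };
}

// Calls a local Ollama server; stream: false returns a single JSON body with a `response` string.
async function analyzePost(text, { context, model = "mistral", host = "http://localhost:11434" }) {
  const res = await fetch(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: buildPrompt(text, context), stream: false, format: "json" }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: HTTP ${res.status}`);
  const body = await res.json();
  return parseAiResponse(body.response, { context, model });
}

// analyzePost("We are cutting 10% of staff", { context: "layoff analysis" }).then(console.log);
```

Keeping prompt building and response parsing separate from the HTTP call lets them be tested without a running Ollama instance.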
### Geographic Intelligence
- **Location Validation**: Profile location verification
- **Regional Filtering**: City/state/country filtering
- **Geographic Analysis**: Location-based insights
### Data Processing
- **Duplicate Detection**: Intelligent deduplication
- **Content Cleaning**: Remove hashtags, URLs, emojis
- **Metadata Extraction**: Author, engagement, timing data
- **Integrated AI**: AI insights embedded in each result
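Content cleaning and keyword matching can both be done with regular expressions. This sketch approximates "remove hashtags, URLs, emojis" and uses a word-boundary match so that `layoff` matches "layoffs" but not "playoff". `cleanText` and `matchesKeyword` are hypothetical names, and the regexes are assumptions rather than the project's actual rules.

```javascript
// Hypothetical cleaner; the regexes are assumptions. Unicode property escapes need the `u` flag.
function cleanText(text) {
  return String(text)
    .replace(/https?:\/\/\S+/g, " ") // URLs (removed first, so "#" fragments go with them)
    .replace(/#[\p{L}\p{N}_]+/gu, " ") // hashtags
    .replace(/[\p{Extended_Pictographic}\u200D\uFE0F]/gu, "") // emoji, joiners, variation selectors
    .replace(/\s+/g, " ")
    .trim();
}

// Case-insensitive match anchored at a word start: "layoff" matches "layoffs" but not "playoff".
function matchesKeyword(text, keyword) {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}`, "i").test(text);
}
```

`\b` only recognizes ASCII word characters, so keywords that start with accented letters or symbols would need a different boundary check.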
## 📈 Performance Optimization
### Recommended Settings
- **Headless Mode**: Faster execution
- **Location Filtering**: Reduces false positives
- **AI Analysis**: Improves result quality (enabled by default)
- **Batch Processing**: Efficient data handling
### Monitoring
- Real-time progress indicators
- Detailed logging with configurable levels
- Performance metrics tracking
- Error handling and recovery
## 🤝 Contributing
### Development Setup
1. Fork the repository
2. Create feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit pull request
### Code Standards
- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation
## 📄 License
This project is for educational and research purposes. Please respect LinkedIn's Terms of Service and use responsibly.
## 🆘 Support
### Common Issues
- **Browser Issues**: Ensure Playwright is installed
- **Login Problems**: Check credentials in `.env`
- **Rate Limiting**: Implement delays between requests
- **Location Filtering**: Verify location filter format
- **AI Analysis**: Ensure Ollama is running for AI features
### Getting Help
- Check the component-specific READMEs
- Review the demo files for examples
- Examine the test files for usage patterns
- Open an issue with detailed error information
## 🆕 What's New
- **Integrated AI Analysis**: AI results are now embedded directly in the results JSON
- **No Separate Files**: No more separate AI analysis files to manage
- **CLI Tool**: Standalone AI analysis with flexible options
- **Rich Context**: Each post includes detailed AI insights
- **Flexible Re-analysis**: Easy to re-analyze with different contexts
- **Backward Compatible**: Original data structure preserved
---
**Note**: This tool is designed for educational and research purposes. Always respect LinkedIn's Terms of Service and implement appropriate rate limiting and ethical usage practices.