# Job Search Parser - Job Market Intelligence

Specialized parser for job market intelligence, tracking job postings, market trends, and competitive analysis. Focuses on tech roles and industry insights.

## 🎯 Purpose

The Job Search Parser is designed to:

- **Track Job Market Trends**: Monitor demand for specific roles and skills
- **Competitive Intelligence**: Analyze salary ranges and requirements
- **Industry Insights**: Track hiring patterns across different sectors
- **Skill Gap Analysis**: Identify in-demand technologies and frameworks
- **Market Demand Forecasting**: Predict job market trends

## 🚀 Features

### Core Functionality

- **Multi-Source Aggregation**: Collect job data from multiple platforms
- **Role-Specific Tracking**: Focus on tech roles and emerging positions
- **Skill Analysis**: Extract and categorize required skills
- **Salary Intelligence**: Track compensation ranges and trends
- **Company Intelligence**: Monitor hiring companies and patterns

### Advanced Features

- **Market Trend Analysis**: Identify growing and declining job categories
- **Geographic Distribution**: Track job distribution by location
- **Experience Level Analysis**: Track entry, mid, and senior level postings
- **Remote Work Trends**: Monitor remote/hybrid work patterns
- **Technology Stack Tracking**: Track framework and tool popularity

## 🌐 Supported Job Sites

### ✅ Implemented Parsers

#### SkipTheDrive Parser

Remote job board specializing in work-from-home positions.

**Features:**

- Keyword-based job search with relevance sorting
- Job type filtering (full-time, part-time, contract)
- Multi-page result parsing with pagination
- Featured/sponsored job identification
- AI-powered job relevance analysis
- Automatic duplicate detection

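How the parser detects duplicates is not described here. As a minimal sketch, assuming each job is a plain object with `title` and `company` fields (as in the JSON output format), duplicates could be dropped by comparing a normalized key. The helpers `jobKey` and `dedupeJobs` are hypothetical names, not part of the parser.

```javascript
// Hypothetical helpers: not the parser's actual implementation.
// Builds a normalized key from title + company and keeps the first job per key.
function jobKey(job) {
  const norm = (s) => String(s || "").toLowerCase().replace(/\s+/g, " ").trim();
  return `${norm(job.title)}|${norm(job.company)}`;
}

function dedupeJobs(jobs) {
  const seen = new Set();
  const unique = [];
  for (const job of jobs) {
    const key = jobKey(job);
    if (seen.has(key)) continue; // skip repeats of an already-seen posting
    seen.add(key);
    unique.push(job);
  }
  return unique;
}

const sample = [
  { title: "QA Engineer", company: "TechCorp" },
  { title: "qa  engineer ", company: "techcorp" }, // same posting, different spacing/case
  { title: "Automation QA", company: "TechCorp" },
];
console.log(dedupeJobs(sample).length); // 2
```

Keying on title and company alone can merge distinct postings from the same employer, so a real implementation would likely also include the location or job URL in the key.
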
**Usage:**

```bash
# Parse SkipTheDrive for QA automation jobs
node index.js --sites=skipthedrive --keywords="automation qa,qa engineer"

# Filter by job type
JOB_TYPES="full time,contract" node index.js --sites=skipthedrive

# Run demo with limited results
node index.js --sites=skipthedrive --demo
```

### 🚧 Planned Parsers

- **Indeed**: Comprehensive job aggregator
- **Glassdoor**: Jobs with company reviews and salary data
- **Monster**: Traditional job board
- **SimplyHired**: Job aggregator with salary estimates
- **LinkedIn Jobs**: Professional network job postings
- **AngelList**: Startup and tech jobs
- **Remote.co**: Dedicated remote work jobs
- **FlexJobs**: Flexible and remote positions

## 📦 Installation

```bash
# Install dependencies
npm install

# Run tests
npm test

# Run demo
node demo.js
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the parser directory:

```env
# Job Search Configuration
SEARCH_SOURCES=linkedin,indeed,glassdoor
TARGET_ROLES=software engineer,data scientist,product manager
LOCATION_FILTER=Toronto,Vancouver,Calgary
EXPERIENCE_LEVELS=entry,mid,senior
REMOTE_PREFERENCE=remote,hybrid,onsite

# Analysis Configuration
ENABLE_SALARY_ANALYSIS=true
ENABLE_SKILL_ANALYSIS=true
ENABLE_TREND_ANALYSIS=true
MIN_SALARY=50000
MAX_SALARY=200000

# Output Configuration
OUTPUT_FORMAT=json,csv
SAVE_RAW_DATA=true
ANALYSIS_INTERVAL=daily
```

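Several of these variables hold comma-separated lists or boolean strings, so they need converting before use. The sketch below shows one way that could look. The variable names come from the `.env` example, while `parseList`, `parseBool`, `loadConfig`, and the fallback values are illustrative assumptions rather than the parser's real code.

```javascript
// Hypothetical config loader: variable names come from the .env example,
// but the functions and their defaults are illustrative assumptions.
function parseList(value) {
  // "a, b,,c" -> ["a", "b", "c"]
  return String(value || "")
    .split(",")
    .map((item) => item.trim())
    .filter(Boolean);
}

function parseBool(value, fallback = false) {
  if (value === undefined) return fallback;
  return String(value).trim().toLowerCase() === "true";
}

function loadConfig(env = process.env) {
  return {
    sources: parseList(env.SEARCH_SOURCES),
    roles: parseList(env.TARGET_ROLES),
    locations: parseList(env.LOCATION_FILTER),
    salaryAnalysis: parseBool(env.ENABLE_SALARY_ANALYSIS),
    minSalary: Number(env.MIN_SALARY) || 0, // assumed default: no lower bound
    maxSalary: Number(env.MAX_SALARY) || Infinity, // assumed default: no upper bound
    outputFormats: parseList(env.OUTPUT_FORMAT),
  };
}

const config = loadConfig({
  TARGET_ROLES: "software engineer, data scientist",
  ENABLE_SALARY_ANALYSIS: "true",
  MIN_SALARY: "50000",
  OUTPUT_FORMAT: "json,csv",
});
console.log(config.roles, config.minSalary, config.outputFormats);
```

In a real run, something would first have to copy the `.env` file into `process.env` (a package such as `dotenv` is a common choice, though this README does not say which mechanism the project uses), after which `loadConfig()` could be called with no arguments.
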
### Command Line Options

```bash
# Basic usage
node index.js

# Specific roles
node index.js --roles="frontend developer,backend developer"

# Geographic focus
node index.js --locations="Toronto,Vancouver"

# Experience level
node index.js --experience="senior"

# Output file
node index.js --output=results/job-market-analysis.json
```

**Available Options:**

- `--roles="role1,role2"`: Target job roles
- `--locations="city1,city2"`: Geographic focus
- `--experience="entry|mid|senior"`: Experience level
- `--remote="remote|hybrid|onsite"`: Remote work preference
- `--salary-min=NUMBER`: Minimum salary filter
- `--salary-max=NUMBER`: Maximum salary filter
- `--output=FILE`: Output filename
- `--format=json|csv`: Output format
- `--trends`: Enable trend analysis
- `--skills`: Enable skill analysis

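The options above follow two shapes: `--key=value` pairs (often comma-separated) and bare flags such as `--trends`. As a rough sketch of how such arguments could be read, and not a copy of what `index.js` does, a small hypothetical `parseArgs` helper might look like this:

```javascript
// Illustrative parser for "--key=value" and bare "--flag" arguments.
// This is an assumption about how flags could be read, not index.js itself.
function parseArgs(argv) {
  const options = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // ignore positional arguments
    const eq = arg.indexOf("=");
    if (eq === -1) {
      options[arg.slice(2)] = true; // bare flag such as --trends
    } else {
      const key = arg.slice(2, eq);
      const value = arg.slice(eq + 1);
      // Comma-separated values become arrays; a single value becomes a one-item array
      options[key] = value.split(",").map((v) => v.trim()).filter(Boolean);
    }
  }
  return options;
}

// The shell strips the quotes, so --roles="a,b" arrives as --roles=a,b
const opts = parseArgs([
  "--roles=frontend developer,backend developer",
  "--salary-min=100000",
  "--trends",
]);
console.log(opts);
```

In a script this would typically be called as `parseArgs(process.argv.slice(2))`. Node's built-in `util.parseArgs` is an alternative when stricter validation of option names and types is wanted.
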
## 📊 Keywords

### Role-Specific Keywords

Place keyword CSV files in the `keywords/` directory:

```
job-search-parser/
├── keywords/
│   ├── job-search-keywords.csv  # General job search terms
│   ├── tech-roles.csv           # Technology roles
│   ├── data-roles.csv           # Data science roles
│   ├── management-roles.csv     # Management positions
│   └── emerging-roles.csv       # Emerging job categories
└── index.js
```

### Tech Roles Keywords

```csv
keyword
software engineer
frontend developer
backend developer
full stack developer
data scientist
machine learning engineer
devops engineer
site reliability engineer
cloud architect
security engineer
mobile developer
iOS developer
Android developer
react developer
vue developer
angular developer
node.js developer
python developer
java developer
golang developer
rust developer
data engineer
analytics engineer
```

### Data Science Keywords

```csv
keyword
data scientist
machine learning engineer
data analyst
business analyst
data engineer
analytics engineer
ML engineer
AI engineer
statistician
quantitative analyst
research scientist
data architect
BI developer
ETL developer
```

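The keyword files are single-column CSVs with a `keyword` header, and some entries (for example `data scientist` and `data engineer`) appear in both lists above. A minimal sketch of loading such a file and dropping repeated keywords could look like the following; `parseKeywordCsv` is a hypothetical helper, not the parser's own loader.

```javascript
// Hypothetical loader for a single-column keyword CSV with a "keyword" header.
function parseKeywordCsv(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  // Drop the header row if present
  if (lines.length > 0 && lines[0].toLowerCase() === "keyword") lines.shift();
  // Remove case-insensitive duplicates while preserving order
  const seen = new Set();
  return lines.filter((kw) => {
    const key = kw.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const csvText = "keyword\ndata scientist\nML engineer\nData Scientist\n";
console.log(parseKeywordCsv(csvText)); // [ 'data scientist', 'ML engineer' ]
```

To merge several files, their contents could be read with `fs.readFileSync(path, "utf8")`, concatenated with newlines, and passed through the same function. A repeated `keyword` header from a later file would then survive as a keyword, so it is safer to parse each file separately and deduplicate the combined array.
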
## 📈 Usage Examples

### Basic Job Search

```bash
# Standard job market analysis
node index.js

# Specific tech roles
node index.js --roles="software engineer,data scientist"

# Geographic focus
node index.js --locations="Toronto,Vancouver,Calgary"
```

### Advanced Analysis

```bash
# Senior level positions
node index.js --experience="senior" --salary-min=100000

# Remote work opportunities
node index.js --remote="remote" --roles="frontend developer"

# Trend analysis
node index.js --trends --skills --output=results/trends.json
```

### Market Intelligence

```bash
# Salary analysis
node index.js --salary-min=80000 --salary-max=150000

# Skill gap analysis
node index.js --skills --roles="machine learning engineer"

# Competitive intelligence
node index.js --companies="Google,Microsoft,Amazon"
```

## 📊 Output Format

### JSON Structure

```json
{
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "search_parameters": {
      "roles": ["software engineer", "data scientist"],
      "locations": ["Toronto", "Vancouver"],
      "experience_levels": ["mid", "senior"],
      "remote_preference": ["remote", "hybrid"]
    },
    "total_jobs_found": 1250,
    "analysis_duration_seconds": 45
  },
  "market_overview": {
    "total_jobs": 1250,
    "average_salary": 95000,
    "salary_range": {
      "min": 65000,
      "max": 180000,
      "median": 92000
    },
    "remote_distribution": {
      "remote": 45,
      "hybrid": 35,
      "onsite": 20
    },
    "experience_distribution": {
      "entry": 15,
      "mid": 45,
      "senior": 40
    }
  },
  "trends": {
    "growing_skills": [
      { "skill": "React", "growth_rate": 25 },
      { "skill": "Python", "growth_rate": 18 },
      { "skill": "AWS", "growth_rate": 22 }
    ],
    "declining_skills": [
      { "skill": "jQuery", "growth_rate": -12 },
      { "skill": "PHP", "growth_rate": -8 }
    ],
    "emerging_roles": ["AI Engineer", "DevSecOps Engineer", "Data Engineer"]
  },
  "jobs": [
    {
      "id": "job_1",
      "title": "Senior Software Engineer",
      "company": "TechCorp",
      "location": "Toronto, Ontario",
      "remote_type": "hybrid",
      "salary": {
        "min": 100000,
        "max": 140000,
        "currency": "CAD"
      },
      "required_skills": ["React", "Node.js", "TypeScript", "AWS"],
      "preferred_skills": ["GraphQL", "Docker", "Kubernetes"],
      "experience_level": "senior",
      "job_url": "https://example.com/job/1",
      "posted_date": "2024-01-10T09:00:00Z",
      "scraped_at": "2024-01-15T10:30:00Z"
    }
  ],
  "analysis": {
    "skill_demand": {
      "React": { "count": 45, "avg_salary": 98000 },
      "Python": { "count": 38, "avg_salary": 102000 },
      "AWS": { "count": 32, "avg_salary": 105000 }
    },
    "company_insights": {
      "top_hirers": [
        { "company": "TechCorp", "jobs": 25 },
        { "company": "StartupXYZ", "jobs": 18 }
      ],
      "salary_leaders": [
        { "company": "BigTech", "avg_salary": 120000 },
        { "company": "FinTech", "avg_salary": 115000 }
      ]
    }
  }
}
```

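A report in this shape is straightforward to post-process. As a small sketch, assuming the `analysis.skill_demand` layout shown in the JSON example, skills could be ranked by posting count like this (`rankSkills` is a hypothetical helper, and the inline `report` object stands in for a parsed output file):

```javascript
// Sketch: rank skills from a report object shaped like the JSON example.
function rankSkills(report) {
  const demand = (report.analysis && report.analysis.skill_demand) || {};
  return Object.entries(demand)
    .map(([skill, stats]) => ({ skill, count: stats.count, avgSalary: stats.avg_salary }))
    .sort((a, b) => b.count - a.count); // most requested first
}

// Stand-in for JSON.parse(<contents of an output file>)
const report = {
  analysis: {
    skill_demand: {
      Python: { count: 38, avg_salary: 102000 },
      AWS: { count: 32, avg_salary: 105000 },
      React: { count: 45, avg_salary: 98000 },
    },
  },
};

for (const row of rankSkills(report)) {
  console.log(`${row.skill}: ${row.count} postings, avg salary ${row.avgSalary}`);
}
```

Sorting on `avgSalary` instead of `count` would give a salary-oriented view of the same data.
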
### CSV Output

The parser can also generate CSV files for easy analysis:

```csv
job_id,title,company,location,remote_type,salary_min,salary_max,required_skills,experience_level,posted_date
job_1,Senior Software Engineer,TechCorp,Toronto,hybrid,100000,140000,"React,Node.js,TypeScript",senior,2024-01-10
job_2,Data Scientist,DataCorp,Vancouver,remote,90000,130000,"Python,SQL,ML",mid,2024-01-09
```

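Note that `required_skills` is a quoted field containing commas, so splitting each row on `,` alone would break it apart. The sketch below is a minimal, illustrative line splitter that respects double quotes; `splitCsvLine` is a hypothetical helper, and a maintained CSV library is usually the safer choice for real data.

```javascript
// Minimal CSV line splitter that understands double-quoted fields.
// Illustrative only: it does not handle newlines inside quoted fields.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const header =
  "job_id,title,company,location,remote_type,salary_min,salary_max,required_skills,experience_level,posted_date";
const row =
  'job_1,Senior Software Engineer,TechCorp,Toronto,hybrid,100000,140000,"React,Node.js,TypeScript",senior,2024-01-10';

const keys = splitCsvLine(header);
const values = splitCsvLine(row);
const job = Object.fromEntries(keys.map((k, i) => [k, values[i]]));
job.required_skills = job.required_skills.split(","); // expand the quoted list
console.log(job.title, job.required_skills);
```
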
## 🔒 Security & Best Practices

### Data Privacy

- Respect job site terms of service
- Implement appropriate rate limiting
- Store data securely and responsibly
- Anonymize sensitive information

### Rate Limiting

- Implement delays between requests
- Respect API rate limits
- Use multiple data sources
- Monitor for blocking/detection

### Legal Compliance

- Educational and research purposes only
- Respect website terms of service
- Implement data retention policies
- Monitor for legal changes

## 🧪 Testing

### Run Tests

```bash
# All tests
npm test

# Specific test suites
npm test -- --testNamePattern="JobSearch"
npm test -- --testNamePattern="Analysis"
npm test -- --testNamePattern="Trends"
```

### Test Coverage

```bash
npm run test:coverage
```

## 🚀 Performance Optimization

### Recommended Settings

#### Fast Analysis

```bash
node index.js --roles="software engineer" --locations="Toronto"
```

#### Comprehensive Analysis

```bash
node index.js --trends --skills --experience="all"
```

#### Focused Intelligence

```bash
node index.js --salary-min=80000 --remote="remote" --trends
```

### Performance Tips

- Use specific role filters to reduce data volume
- Implement caching for repeated searches
- Use parallel processing for multiple sources
- Optimize data storage and retrieval

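Two of these tips, caching repeated searches and querying sources in parallel, can be combined in a few lines. The sketch below is only an illustration under stated assumptions: `fetchJobs` is a stand-in for a real per-source scraper, and `cachedFetch` and `searchAll` are hypothetical names. The source strings are placeholders.

```javascript
// Sketch of two tips: cache repeated searches and query sources in parallel.
// fetchJobs is a hypothetical stand-in for a real per-source scraper.
const cache = new Map();
let fetchCount = 0;

async function fetchJobs(source, role) {
  fetchCount++;
  // Placeholder result; a real implementation would query the job site here.
  return [{ source, title: role }];
}

function cachedFetch(source, role) {
  const key = `${source}::${role}`;
  if (!cache.has(key)) {
    // Store the promise itself so concurrent callers share one request
    cache.set(key, fetchJobs(source, role));
  }
  return cache.get(key);
}

async function searchAll(sources, role) {
  // Promise.all starts every per-source lookup before awaiting any of them
  const results = await Promise.all(sources.map((s) => cachedFetch(s, role)));
  return results.flat();
}

async function main() {
  const first = await searchAll(["skipthedrive", "indeed"], "qa engineer");
  const second = await searchAll(["skipthedrive", "indeed"], "qa engineer"); // served from cache
  console.log(first.length, second.length, fetchCount); // 2 2 2
  return { first, second, fetchCount };
}

const done = main();
```

Because the cache stores promises, a failed request would stay cached as a rejection; a fuller version would delete the entry on error and expire entries after some time. Running sources in parallel also does not remove the need for per-site delays between requests.
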
## 🔧 Troubleshooting

### Common Issues

#### Rate Limiting

```bash
# Reduce request frequency
export REQUEST_DELAY=2000
node index.js
```

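For context, a delay setting like this is usually applied as a pause between consecutive requests. The sketch below shows that idea under two assumptions that this README does not confirm: that `REQUEST_DELAY` is in milliseconds, and that requests run one after another. `sleep` and `runSequentially` are hypothetical helpers.

```javascript
// Sketch of a fixed delay between sequential requests.
// Assumption: REQUEST_DELAY is in milliseconds (the unit is not documented here).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runSequentially(tasks, delayMs) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(delayMs); // wait between requests, not before the first
    results.push(await tasks[i]());
  }
  return results;
}

// Falls back to 2000 when the variable is unset or not a number
const configuredDelay = Number(process.env.REQUEST_DELAY) || 2000;
console.log(`configured delay: ${configuredDelay} ms`);

// Demo with stand-in tasks and a short 50 ms delay so it finishes quickly
const demoTasks = [1, 2, 3].map((n) => async () => `page ${n}`);
const started = Date.now();
const finished = runSequentially(demoTasks, 50).then((pages) => ({
  pages,
  elapsed: Date.now() - started,
}));
finished.then((r) => console.log(r.pages));
```
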
#### Data Source Issues

```bash
# Use specific sources
node index.js --sources="linkedin,indeed"

# Check source availability
node index.js --test-sources
```

#### Output Issues

```bash
# Check output directory
mkdir -p results
node index.js --output=results/analysis.json

# Verify file permissions
chmod 755 results/
```

## 📈 Monitoring & Analytics

### Key Metrics

- **Job Volume**: Total jobs found per search
- **Salary Trends**: Average and median salary changes
- **Skill Demand**: Most requested skills
- **Remote Adoption**: Remote work trend analysis
- **Market Velocity**: Job posting frequency

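As an illustration of the salary metric, the sketch below computes an average and a median from a list of jobs. It assumes each job carries `salary.min` and `salary.max` as in the JSON output example and uses the midpoint of that range as the job's salary, which is a simplification chosen for this example rather than the parser's documented method. `salaryStats` is a hypothetical helper.

```javascript
// Sketch: average and median of salary midpoints.
// Assumes jobs have salary.min and salary.max; jobs without them are skipped.
function salaryStats(jobs) {
  const mids = jobs
    .filter((job) => job.salary && Number.isFinite(job.salary.min) && Number.isFinite(job.salary.max))
    .map((job) => (job.salary.min + job.salary.max) / 2)
    .sort((a, b) => a - b);
  if (mids.length === 0) return null;
  const average = mids.reduce((sum, v) => sum + v, 0) / mids.length;
  const half = Math.floor(mids.length / 2);
  // Odd count: middle value. Even count: mean of the two middle values.
  const median = mids.length % 2 === 1 ? mids[half] : (mids[half - 1] + mids[half]) / 2;
  return { count: mids.length, average, median };
}

const stats = salaryStats([
  { salary: { min: 100000, max: 140000 } }, // midpoint 120000
  { salary: { min: 90000, max: 130000 } }, // midpoint 110000
  { salary: { min: 60000, max: 80000 } }, // midpoint 70000
  { title: "No salary listed" }, // ignored
]);
console.log(stats); // { count: 3, average: 100000, median: 110000 }
```

Comparing these figures across runs taken on different dates would give the change over time that the salary trend metric refers to.
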
### Dashboard Integration

- Real-time market monitoring
- Trend visualization
- Salary benchmarking
- Skill gap analysis
- Competitive intelligence

## 🤝 Contributing

### Development Setup

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Code Standards

- Follow existing code style
- Add JSDoc comments
- Maintain test coverage
- Update documentation

## 📄 License

This parser is part of the LinkedOut platform and follows the same licensing terms.

---

**Note**: This tool is designed for educational and research purposes. Always respect website terms of service and implement appropriate rate limiting and ethical usage practices.