feat(jobs): suppress duplicate postings after skip or apply
Some checks failed
CI / Linting (Biome) (push) Failing after 41s
CI / Tests (push) Successful in 5m25s
CI / Type Check (adzuna-extractor) (push) Successful in 1m8s
CI / Type Check (gradcracker-extractor) (push) Successful in 1m12s
CI / Type Check (hiringcafe-extractor) (push) Successful in 1m9s
CI / Type Check (orchestrator) (push) Successful in 1m25s
CI / Type Check (startupjobs-extractor) (push) Successful in 1m9s
CI / Type Check (ukvisajobs-extractor) (push) Successful in 1m9s
CI / Documentation (push) Failing after 1m56s
Dedup by employer+title and description at import; cascade skip on dismiss; hide repeats in the job list. Document product scope and duplicate detection in docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
parent 5401f384c1
commit 17c4d4490a
@ -15,6 +15,8 @@ Country tokens are normalized to canonical keys (for example `India` → `india`
|
|||||||
|
|
||||||
Global and remote boards often return roles tagged to countries you do not want (for example India-remote QA listings while you target Canada). This filter applies at import time so those rows never enter your **Discovered** queue.
|
Global and remote boards often return roles tagged to countries you do not want (for example India-remote QA listings while you target Canada). This filter applies at import time so those rows never enter your **Discovered** queue.
|
||||||
|
|
||||||
|
When **Search cities** (or pipeline geography) is a single country such as **Canada**, JobOps also enforces an **allow-list**: only jobs that clearly hire in that country are kept. Vague `Remote` / `Worldwide` rows with no Canada signal are dropped, as are rows that mention any other country.
|
||||||
|
|
||||||
## How to use it
|
## How to use it
|
||||||
|
|
||||||
1. Open **Settings** and expand **Scoring Settings**.
|
1. Open **Settings** and expand **Scoring Settings**.
|
||||||
@ -26,7 +28,8 @@ Global and remote boards often return roles tagged to countries you do not want
|
|||||||
### Tips
|
### Tips
|
||||||
|
|
||||||
- Use country names as they appear on listings: `India`, `Poland`, `United Kingdom`, or aliases `UK`, `USA`.
|
- Use country names as they appear on listings: `India`, `Poland`, `United Kingdom`, or aliases `UK`, `USA`.
|
||||||
- Listings with **no recognizable country** in the location field (for example `Remote` only) are **kept**, not blocked.
|
- With **no** country-level search geography, listings whose location is only `Remote` / `Worldwide` with no blocked country in the text are **kept**.
|
||||||
|
- With search geography set to a country (for example `Canada`), vague remote rows with no signal for that country are **dropped** even if they are not on the blocked list.
|
||||||
- The list is capped in Settings validation (max 50 entries, each up to 100 characters).
|
- The list is capped in Settings validation (max 50 entries, each up to 100 characters).
|
||||||
- Pair with **Search cities** / **country** settings to narrow what extractors query; blocked countries filter what still comes back from broad boards.
|
- Pair with **Search cities** / **country** settings to narrow what extractors query; blocked countries filter what still comes back from broad boards.
|
||||||
|
|
||||||
@ -50,6 +53,7 @@ Global and remote boards often return roles tagged to countries you do not want
|
|||||||
|
|
||||||
## Related pages
|
## Related pages
|
||||||
|
|
||||||
|
- [Duplicate job detection](./duplicate-jobs)
|
||||||
- [Company skip list](./company-skip-list)
|
- [Company skip list](./company-skip-list)
|
||||||
- [Settings](/docs/features/settings)
|
- [Settings](/docs/features/settings)
|
||||||
- [Pipeline Run](/docs/features/pipeline-run)
|
- [Pipeline Run](/docs/features/pipeline-run)
|
||||||
|
|||||||
@ -48,6 +48,7 @@ You may want to avoid certain agencies, staffing brands, or employers without ha
|
|||||||
|
|
||||||
## Related pages
|
## Related pages
|
||||||
|
|
||||||
|
- [Duplicate job detection](./duplicate-jobs)
|
||||||
- [Blocked countries](./blocked-countries)
|
- [Blocked countries](./blocked-countries)
|
||||||
- [Settings](/docs/features/settings)
|
- [Settings](/docs/features/settings)
|
||||||
- [Pipeline Run](/docs/features/pipeline-run)
|
- [Pipeline Run](/docs/features/pipeline-run)
|
||||||
|
|||||||
79
docs-site/docs/features/duplicate-jobs.md
Normal file
@ -0,0 +1,79 @@
|
|||||||
|
---
|
||||||
|
id: duplicate-jobs
|
||||||
|
title: Duplicate job detection
|
||||||
|
description: How JobOps deduplicates cross-source postings and hides repeats after you skip or apply.
|
||||||
|
sidebar_position: 8
|
||||||
|
---
|
||||||
|
|
||||||
|
## What it is
|
||||||
|
|
||||||
|
JobOps treats the same role reposted on different boards as **one opportunity**, and remembers when you have already **skipped** or **applied** so you do not see it again.
|
||||||
|
|
||||||
|
Duplicate detection uses normalized keys:
|
||||||
|
|
||||||
|
- **Employer + title** — strips punctuation, `(Remote)`, trailing city lines, and legal suffixes (`Inc.`, `Ltd.`) so `Acme Inc.` / `SDET (Remote)` matches `Acme` / `SDET`.
|
||||||
|
- **Employer + description** — when the job description is long enough, the same posting copy under the same company matches even if the title wording differs slightly.
|
||||||
|
|
||||||
|
This runs at **import time** (pipeline) and in the **Jobs list** (UI).
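As an illustration of the employer+title key described above, here is a minimal sketch. It is not the implementation in `@shared/job-fingerprint`, which this page does not show. The names `normalizeEmployer`, `normalizeTitle`, and `employerTitleKey`, the suffix list, and the `|` separator are all assumptions made for the example, and trailing city lines are not handled.

```typescript
// Hypothetical sketch only: the real normalization lives in @shared/job-fingerprint.
// The suffix list and key format below are assumptions for illustration.
const LEGAL_SUFFIX = /\s+(?:inc|ltd|llc|corp|gmbh|plc)$/;

// Replace every run of non-alphanumeric characters with one space, then trim.
// Input is expected to be lowercased already; non-ASCII letters are dropped here.
function collapse(value: string): string {
  return value.replace(/[^a-z0-9]+/g, " ").trim();
}

// "Acme Inc." becomes "acme": punctuation is removed first, then one trailing legal suffix.
function normalizeEmployer(employer: string): string {
  return collapse(employer.toLowerCase()).replace(LEGAL_SUFFIX, "");
}

// "SDET (Remote)" becomes "sdet": the "(Remote)" marker is removed before collapsing.
function normalizeTitle(title: string): string {
  const withoutRemote = title.toLowerCase().replace(/\(\s*remote\s*\)/g, " ");
  return collapse(withoutRemote);
}

// Two postings are treated as the same role when their keys are equal.
function employerTitleKey(employer: string, title: string): string {
  return `${normalizeEmployer(employer)}|${normalizeTitle(title)}`;
}
```

Under these assumptions, `employerTitleKey("Acme Inc.", "SDET (Remote)")` and `employerTitleKey("Acme", "SDET")` produce the same key, while a different title at the same employer does not.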
|
||||||
|
|
||||||
|
## Why it exists
|
||||||
|
|
||||||
|
The same role is often reposted across LinkedIn, Indeed, QAJobsBoard, and aggregators. Skipping or applying once should mean you do not wade through the same listing again on the next run.
|
||||||
|
|
||||||
|
## How to use it
|
||||||
|
|
||||||
|
You do not configure duplicate detection separately. It is always on for your profile.
|
||||||
|
|
||||||
|
### When you skip a job
|
||||||
|
|
||||||
|
1. Skip from the job detail panel or press **`s`** on the Jobs page.
|
||||||
|
2. JobOps marks that row `skipped`.
|
||||||
|
3. Any other **Discovered** or **Ready** jobs with the same employer+title (or same employer+description) are **auto-skipped**.
|
||||||
|
4. Future pipeline imports that match those keys are **not imported**.
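The cascade in the steps above can be sketched in memory as follows. This is an assumption-laden illustration, not the repository's `skipOpenJobsWithMatchingDedupKeys`: the `JobRow` shape, `simpleKey`, and `cascadeSkip` are hypothetical names, and the key here is a plain lowercase comparison rather than the full normalization. The only detail taken from the commit is that open rows are those with status `discovered` or `ready`.

```typescript
// Hypothetical in-memory sketch of the skip cascade; the real code works on database rows.
type JobStatus = "discovered" | "ready" | "processing" | "skipped" | "applied";

interface JobRow {
  id: string;
  employer: string;
  title: string;
  status: JobStatus;
}

// Simplified stand-in for the normalized employer+title key (assumption).
const simpleKey = (row: JobRow): string =>
  `${row.employer.trim().toLowerCase()}|${row.title.trim().toLowerCase()}`;

// Marks the dismissed row as skipped, then skips every other open row
// (discovered or ready) with the same key. Returns the ids that were auto-skipped.
function cascadeSkip(rows: JobRow[], dismissedId: string): string[] {
  const dismissed = rows.find((row) => row.id === dismissedId);
  if (!dismissed) return [];
  dismissed.status = "skipped";

  const target = simpleKey(dismissed);
  const alsoSkipped: string[] = [];
  for (const row of rows) {
    if (row.id === dismissedId) continue;
    if (row.status !== "discovered" && row.status !== "ready") continue;
    if (simpleKey(row) !== target) continue;
    row.status = "skipped";
    alsoSkipped.push(row.id);
  }
  return alsoSkipped;
}
```

Rows that are already `applied` or `skipped` are left untouched in this sketch, so only the open queue changes.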
|
||||||
|
|
||||||
|
### When you mark applied
|
||||||
|
|
||||||
|
1. Mark the job applied from the UI.
|
||||||
|
2. Duplicate open rows are **auto-skipped** (not marked applied) so your Applied tab stays clean.
|
||||||
|
3. Future imports of the same role are suppressed the same way as for skips.
|
||||||
|
|
||||||
|
### Cross-source import
|
||||||
|
|
||||||
|
During a pipeline run, if a new posting matches an existing row by URL, source id, or content fingerprint, JobOps **does not create a second row** — the import is counted as skipped in run stats.
|
||||||
|
|
||||||
|
### Jobs list
|
||||||
|
|
||||||
|
Open jobs that match a prior skip or apply are **hidden** from Discovered, Ready, and All tabs so the queue stays fresh. Skipped and applied rows themselves remain visible in their statuses.
|
||||||
|
|
||||||
|
## Defaults and constraints
|
||||||
|
|
||||||
|
- Description matching requires at least **80 characters** of normalized text; short or empty descriptions fall back to employer+title only.
|
||||||
|
- Matching is **per profile** (`ownerProfileId`); different login profiles do not share dedup state.
|
||||||
|
- Dedup does **not** delete existing rows retroactively when you change the skip list or country filters; run discovery again or skip manually for old data.
|
||||||
|
- Very different titles at the same company (for example `SDET` vs `Product Designer`) are **not** collapsed.
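The description fallback in the first constraint could look roughly like the sketch below. It assumes a hypothetical `collectKeysSketch` helper and a simplified `squash` normalizer, and is not the repository's `collectJobDedupKeys`. Only the 80-character threshold comes from the text above; the key prefixes are invented for the example.

```typescript
// Hypothetical sketch: shows only how a length threshold gates the description key.
const MIN_DESCRIPTION_LENGTH = 80; // threshold stated in the docs above

interface DedupInput {
  employer: string;
  title: string;
  jobDescription?: string | null;
}

// Simplified normalizer (assumption): lowercase and collapse non-alphanumerics to spaces.
function squash(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Always emits an employer+title key; adds an employer+description key
// only when the normalized description is long enough to be meaningful.
function collectKeysSketch(job: DedupInput): string[] {
  const employer = squash(job.employer);
  const keys = [`title:${employer}|${squash(job.title)}`];

  const description = squash(job.jobDescription ?? "");
  if (description.length >= MIN_DESCRIPTION_LENGTH) {
    keys.push(`desc:${employer}|${description}`);
  }
  return keys;
}
```

With a short or missing description, this sketch returns a single key, so matching falls back to employer+title alone.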
|
||||||
|
|
||||||
|
## Common problems
|
||||||
|
|
||||||
|
### I still see the same job from another source
|
||||||
|
|
||||||
|
- Titles or employer names may differ enough that normalization does not match (for example a staffing agency name vs the hiring company).
|
||||||
|
- Add the employer to the [Company skip list](./company-skip-list) if it is always noise.
|
||||||
|
- Skip one row — siblings with matching keys should auto-skip and future imports should stop.
|
||||||
|
|
||||||
|
### A different role at the same company disappeared
|
||||||
|
|
||||||
|
- Employer+title dedup only merges **the same normalized title**. Different roles at one company should remain separate.
|
||||||
|
- If two titles normalize to the same string, check for overly generic titles on the board.
|
||||||
|
|
||||||
|
### Skipped jobs reappear after a pipeline run
|
||||||
|
|
||||||
|
- Confirm the skip saved (status `skipped` in the UI).
|
||||||
|
- If the repost uses a new employer spelling and a new title **and** a short description, it may not match — block the employer or country instead.
|
||||||
|
|
||||||
|
## Related pages
|
||||||
|
|
||||||
|
- [Orchestrator](/docs/features/orchestrator)
|
||||||
|
- [Pipeline Run](/docs/features/pipeline-run)
|
||||||
|
- [Company skip list](./company-skip-list)
|
||||||
|
- [Blocked countries](./blocked-countries)
|
||||||
|
- [Keyboard Shortcuts](/docs/features/keyboard-shortcuts)
|
||||||
@ -27,6 +27,8 @@ Job states:
|
|||||||
- `skipped`: explicitly excluded from active queue
|
- `skipped`: explicitly excluded from active queue
|
||||||
- `expired`: deadline passed
|
- `expired`: deadline passed
|
||||||
|
|
||||||
|
When you **skip** or **mark applied**, JobOps also skips matching open duplicates (same company + title or description) and blocks re-import on future runs. See [Duplicate job detection](/docs/features/duplicate-jobs).
|
||||||
|
|
||||||
## Why it exists
|
## Why it exists
|
||||||
|
|
||||||
Orchestrator centralizes the transition from discovered opportunities to application-ready artifacts.
|
Orchestrator centralizes the transition from discovered opportunities to application-ready artifacts.
|
||||||
@ -132,8 +134,15 @@ curl -X POST "http://localhost:3001/api/jobs/<jobId>/generate-pdf"
|
|||||||
|
|
||||||
- Patch `status` back to `discovered` to return the job to the active queue.
|
- Patch `status` back to `discovered` to return the job to the active queue.
|
||||||
|
|
||||||
|
### Duplicate postings
|
||||||
|
|
||||||
|
- Skipping one listing auto-skips other **Discovered** / **Ready** rows that match the same normalized employer+title (or employer+description when available).
|
||||||
|
- The Jobs list hides open rows that match a job you already skipped or applied to.
|
||||||
|
- Details: [Duplicate job detection](/docs/features/duplicate-jobs).
|
||||||
|
|
||||||
## Related pages
|
## Related pages
|
||||||
|
|
||||||
|
- [Duplicate job detection](/docs/features/duplicate-jobs)
|
||||||
- [Pipeline Run](/docs/next/features/pipeline-run)
|
- [Pipeline Run](/docs/next/features/pipeline-run)
|
||||||
- [Ghostwriter](/docs/next/features/ghostwriter)
|
- [Ghostwriter](/docs/next/features/ghostwriter)
|
||||||
- [Reactive Resume](/docs/next/features/reactive-resume)
|
- [Reactive Resume](/docs/next/features/reactive-resume)
|
||||||
|
|||||||
@ -102,13 +102,15 @@ When new listings are imported, JobOps does not create a second database row if
|
|||||||
|
|
||||||
- a **canonical job URL** (normalizes `http`/`https`, `www`, trailing slashes, common tracking query params, and sorts remaining query keys)
|
- a **canonical job URL** (normalizes `http`/`https`, `www`, trailing slashes, common tracking query params, and sorts remaining query keys)
|
||||||
- the pair **`source` + `source_job_id`** when the extractor provides an external id
|
- the pair **`source` + `source_job_id`** when the extractor provides an external id
|
||||||
|
- a **content fingerprint** (normalized **employer + title**) so the same role from another board is not imported twice
|
||||||
|
- **skip/apply memory** — imports that match a job you already skipped or applied are not added
|
||||||
|
|
||||||
Existing jobs keep their stored URL; new imports use the canonical form so the same role is not added again under a slightly different link.
|
See [Duplicate job detection](./duplicate-jobs) for skip cascades and description matching.
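To make the canonical URL rule in the list above concrete, here is a minimal sketch built on the standard `URL` and `URLSearchParams` APIs. It is not the repository's `canonicalizeJobUrl`, whose source is not shown here. The function name `canonicalizeUrlSketch` and the set of tracking parameters are assumptions.

```typescript
// Hypothetical sketch of URL canonicalization; the tracking parameter list is an assumption.
const TRACKING_PARAMS = new Set([
  "utm_source",
  "utm_medium",
  "utm_campaign",
  "utm_term",
  "utm_content",
  "gclid",
  "fbclid",
]);

function canonicalizeUrlSketch(raw: string): string {
  const url = new URL(raw);

  // Treat http and https as the same link.
  url.protocol = "https:";
  // Drop a leading "www." so both host spellings match.
  url.hostname = url.hostname.replace(/^www\./, "");
  // Remove trailing slashes, keeping "/" for an empty path.
  url.pathname = url.pathname.replace(/\/+$/, "") || "/";

  // Drop tracking params, then sort the remaining keys for a stable order.
  const kept = Array.from(url.searchParams.entries())
    .filter(([key]) => !TRACKING_PARAMS.has(key.toLowerCase()))
    .sort(([a], [b]) => a.localeCompare(b));
  url.search = new URLSearchParams(kept).toString();

  return url.toString();
}
```

Under these assumptions, `http://www.example.com/jobs/123/?utm_source=x&b=2&a=1` and `https://example.com/jobs/123?a=1&b=2` canonicalize to the same string, so the second import would be recognized as a repeat.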
|
||||||
|
|
||||||
To drop listings before import, use **Settings → Scoring Settings**:
|
To drop listings before import, use **Settings → Scoring Settings** and pipeline geography:
|
||||||
|
|
||||||
- [Company skip list](./company-skip-list) — blocked **employer** keywords
|
- [Company skip list](./company-skip-list) — blocked **employer** keywords
|
||||||
- [Blocked countries](./blocked-countries) — drop jobs whose **location** mentions a country you list (for example India)
|
- [Blocked countries](./blocked-countries) — block specific countries; when search geography is a country (for example Canada), enforce that country only
|
||||||
|
|
||||||
## Common problems
|
## Common problems
|
||||||
|
|
||||||
|
|||||||
@ -8,6 +8,19 @@ slug: /
|
|||||||
|
|
||||||
Welcome to the JobOps documentation. This site contains guides for setup, configuration, and day-to-day usage.
|
Welcome to the JobOps documentation. This site contains guides for setup, configuration, and day-to-day usage.
|
||||||
|
|
||||||
|
## What JobOps does
|
||||||
|
|
||||||
|
JobOps is a self-hosted job search operations stack: it **discovers** roles from many boards, **filters** them to your geography and profile, **scores** fit, **tailors** resumes and PDFs, and **tracks** applications after you apply.
|
||||||
|
|
||||||
|
In practice:
|
||||||
|
|
||||||
|
1. **Discover** — Run a pipeline against LinkedIn, Indeed, Glassdoor, QAJobsBoard, Canadian boards, and other extractors using your search terms and country (for example Canada, remote-only QA).
|
||||||
|
2. **Filter** — Drop unwanted countries, companies, co-op/intern patterns, non-matching locations, expired LinkedIn reposts, and duplicate postings you already skipped or applied to.
|
||||||
|
3. **Review** — Work through **Discovered** and **Ready** in the Orchestrator; skip noise, move strong fits to Ready, generate tailored PDFs.
|
||||||
|
4. **Apply & track** — Mark applied, sync Gmail for recruiter mail, and use the in-progress board and analytics.
|
||||||
|
|
||||||
|
Key filters and quality controls are documented under [Core Features](#feature-documentation) — especially [Blocked countries](/docs/features/blocked-countries), [Company skip list](/docs/features/company-skip-list), and [Duplicate job detection](/docs/features/duplicate-jobs).
|
||||||
|
|
||||||
## Getting Started
|
## Getting Started
|
||||||
|
|
||||||
- **<a href="/docs/next/getting-started/self-hosting" data-umami-event="docs_intro_self_hosting_click">Self-Hosting Guide</a>**
|
- **<a href="/docs/next/getting-started/self-hosting" data-umami-event="docs_intro_self_hosting_click">Self-Hosting Guide</a>**
|
||||||
@ -56,6 +69,18 @@ Welcome to the JobOps documentation. This site contains guides for setup, config
|
|||||||
- `?` shortcut help dialog and `Control` hint bar behavior
|
- `?` shortcut help dialog and `Control` hint bar behavior
|
||||||
- Tab-specific actions like skip, move to ready, and mark applied
|
- Tab-specific actions like skip, move to ready, and mark applied
|
||||||
|
|
||||||
|
- **[Duplicate job detection](/docs/features/duplicate-jobs)**
|
||||||
|
- Cross-source dedup by employer and title
|
||||||
|
- Auto-skip repeats when you skip or apply
|
||||||
|
- Hide duplicate open rows in the Jobs list
|
||||||
|
|
||||||
|
- **[Blocked countries](/docs/features/blocked-countries)**
|
||||||
|
- Block listings that mention specific countries
|
||||||
|
- Canada-only (and other) search geography enforcement
|
||||||
|
|
||||||
|
- **[Company skip list](/docs/features/company-skip-list)**
|
||||||
|
- Block employers by keyword during discovery
|
||||||
|
|
||||||
- **[Multi-Select and Bulk Actions](/docs/next/features/multi-select-and-bulk-actions)**
|
- **[Multi-Select and Bulk Actions](/docs/next/features/multi-select-and-bulk-actions)**
|
||||||
- Select many jobs using row checkboxes or select-all
|
- Select many jobs using row checkboxes or select-all
|
||||||
- Run bulk move, skip, and rescore actions from the floating action bar
|
- Run bulk move, skip, and rescore actions from the floating action bar
|
||||||
@ -117,12 +142,12 @@ Welcome to the JobOps documentation. This site contains guides for setup, config
|
|||||||
|
|
||||||
### Key Features
|
### Key Features
|
||||||
|
|
||||||
1. **Job Discovery**: Automatically find jobs from multiple sources.
|
1. **Job discovery**: Find roles from multiple extractors in one pipeline run.
|
||||||
2. **AI Scoring**: Rank jobs by suitability for your profile.
|
2. **Geography and quality filters**: Block countries and employers, enforce search-country allow-lists, remote-only runs, and profile deal-breakers.
|
||||||
3. **Resume Tailoring**: Generate custom resumes for each job.
|
3. **Duplicate suppression**: Collapse cross-board reposts; remember skips and applications.
|
||||||
4. **PDF Export**: Create tailored PDFs via RxResume integration.
|
4. **AI scoring**: Rank jobs by suitability for your profile.
|
||||||
5. **Application Tracking**: Monitor your applied jobs.
|
5. **Resume tailoring**: Generate custom resumes and PDFs per job (RxResume).
|
||||||
6. **Email Tracking**: Auto-track post-application responses.
|
6. **Application tracking**: Applied status, post-application Gmail sync, in-progress board, and analytics.
|
||||||
|
|
||||||
## Contributing to Documentation
|
## Contributing to Documentation
|
||||||
|
|
||||||
|
|||||||
@ -34,6 +34,7 @@ const sidebars: SidebarsConfig = {
|
|||||||
"features/settings",
|
"features/settings",
|
||||||
"features/company-skip-list",
|
"features/company-skip-list",
|
||||||
"features/blocked-countries",
|
"features/blocked-countries",
|
||||||
|
"features/duplicate-jobs",
|
||||||
"features/reactive-resume",
|
"features/reactive-resume",
|
||||||
"features/in-progress-board",
|
"features/in-progress-board",
|
||||||
"features/ghostwriter",
|
"features/ghostwriter",
|
||||||
|
|||||||
32
orchestrator/src/client/lib/job-dedup.test.ts
Normal file
@ -0,0 +1,32 @@
|
|||||||
|
import { createJob } from "@shared/testing/factories";
|
||||||
|
import { describe, expect, it } from "vitest";
|
||||||
|
import { buildDuplicateDismissHints } from "./job-dedup";
|
||||||
|
|
||||||
|
describe("buildDuplicateDismissHints", () => {
|
||||||
|
it("flags open jobs that match a skipped posting", () => {
|
||||||
|
const jobs = [
|
||||||
|
createJob({
|
||||||
|
id: "skipped-1",
|
||||||
|
employer: "Acme",
|
||||||
|
title: "SDET",
|
||||||
|
status: "skipped",
|
||||||
|
}),
|
||||||
|
createJob({
|
||||||
|
id: "open-1",
|
||||||
|
employer: "Acme Inc.",
|
||||||
|
title: "SDET (Remote)",
|
||||||
|
status: "discovered",
|
||||||
|
}),
|
||||||
|
createJob({
|
||||||
|
id: "open-2",
|
||||||
|
employer: "Contoso",
|
||||||
|
title: "QA Engineer",
|
||||||
|
status: "discovered",
|
||||||
|
}),
|
||||||
|
];
|
||||||
|
|
||||||
|
const hints = buildDuplicateDismissHints(jobs);
|
||||||
|
expect(hints.get("open-1")).toBe("skipped");
|
||||||
|
expect(hints.has("open-2")).toBe(false);
|
||||||
|
});
|
||||||
|
});
|
||||||
50
orchestrator/src/client/lib/job-dedup.ts
Normal file
@ -0,0 +1,50 @@
|
|||||||
|
import { collectJobDedupKeys } from "@shared/job-fingerprint";
|
||||||
|
import type { JobListItem, JobStatus } from "@shared/types";
|
||||||
|
|
||||||
|
export type DuplicateDismissReason = "skipped" | "applied";
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Map open jobs to a prior skip/apply when employer+title or description matches.
|
||||||
|
*/
|
||||||
|
export function buildDuplicateDismissHints(
|
||||||
|
jobs: readonly JobListItem[],
|
||||||
|
): Map<string, DuplicateDismissReason> {
|
||||||
|
const dismissedKeys = new Map<string, DuplicateDismissReason>();
|
||||||
|
|
||||||
|
for (const job of jobs) {
|
||||||
|
if (job.status !== "skipped" && job.status !== "applied") continue;
|
||||||
|
const reason: DuplicateDismissReason =
|
||||||
|
job.status === "applied" ? "applied" : "skipped";
|
||||||
|
for (const key of collectJobDedupKeys({
|
||||||
|
employer: job.employer,
|
||||||
|
title: job.title,
|
||||||
|
})) {
|
||||||
|
if (!dismissedKeys.has(key)) dismissedKeys.set(key, reason);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const hints = new Map<string, DuplicateDismissReason>();
|
||||||
|
const openStatuses = new Set<JobStatus>([
|
||||||
|
"discovered",
|
||||||
|
"ready",
|
||||||
|
"processing",
|
||||||
|
]);
|
||||||
|
|
||||||
|
for (const job of jobs) {
|
||||||
|
if (!openStatuses.has(job.status)) continue;
|
||||||
|
for (const key of collectJobDedupKeys({
|
||||||
|
employer: job.employer,
|
||||||
|
title: job.title,
|
||||||
|
})) {
|
||||||
|
const reason = dismissedKeys.get(key);
|
||||||
|
if (reason) {
|
||||||
|
hints.set(job.id, reason);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return hints;
|
||||||
|
}
|
||||||
|
|
||||||
|
export { collectJobDedupKeys };
|
||||||
@ -1,4 +1,5 @@
|
|||||||
import { useSettings } from "@client/hooks/useSettings";
|
import { useSettings } from "@client/hooks/useSettings";
|
||||||
|
import { buildDuplicateDismissHints } from "@client/lib/job-dedup";
|
||||||
import { inferCountryKeyFromSearchGeography } from "@shared/search-cities";
|
import { inferCountryKeyFromSearchGeography } from "@shared/search-cities";
|
||||||
import type React from "react";
|
import type React from "react";
|
||||||
import { useCallback, useEffect, useMemo, useState } from "react";
|
import { useCallback, useEffect, useMemo, useState } from "react";
|
||||||
@ -167,6 +168,11 @@ export const OrchestratorPage: React.FC = () => {
|
|||||||
[settings?.searchCities?.value],
|
[settings?.searchCities?.value],
|
||||||
);
|
);
|
||||||
|
|
||||||
|
const duplicateDismissHints = useMemo(
|
||||||
|
() => buildDuplicateDismissHints(jobs),
|
||||||
|
[jobs],
|
||||||
|
);
|
||||||
|
|
||||||
const jobListFilterExtras = useMemo(
|
const jobListFilterExtras = useMemo(
|
||||||
() => ({
|
() => ({
|
||||||
foundAfterYmd,
|
foundAfterYmd,
|
||||||
@ -177,6 +183,7 @@ export const OrchestratorPage: React.FC = () => {
|
|||||||
? settingsSkipEmployerKeywords
|
? settingsSkipEmployerKeywords
|
||||||
: [],
|
: [],
|
||||||
searchGeographyCountryKey,
|
searchGeographyCountryKey,
|
||||||
|
duplicateDismissHints,
|
||||||
}),
|
}),
|
||||||
[
|
[
|
||||||
foundAfterYmd,
|
foundAfterYmd,
|
||||||
@ -186,6 +193,7 @@ export const OrchestratorPage: React.FC = () => {
|
|||||||
applySettingsCompanySkipList,
|
applySettingsCompanySkipList,
|
||||||
settingsSkipEmployerKeywords,
|
settingsSkipEmployerKeywords,
|
||||||
searchGeographyCountryKey,
|
searchGeographyCountryKey,
|
||||||
|
duplicateDismissHints,
|
||||||
],
|
],
|
||||||
);
|
);
|
||||||
|
|
||||||
|
|||||||
@ -1,3 +1,4 @@
|
|||||||
|
import type { DuplicateDismissReason } from "@client/lib/job-dedup";
|
||||||
import { jobMatchesAllowedCountry } from "@shared/blocked-countries";
|
import { jobMatchesAllowedCountry } from "@shared/blocked-countries";
|
||||||
import { textMatchesKeyword } from "@shared/keyword-match";
|
import { textMatchesKeyword } from "@shared/keyword-match";
|
||||||
import type { JobListItem, JobSource } from "@shared/types";
|
import type { JobListItem, JobSource } from "@shared/types";
|
||||||
@ -19,6 +20,8 @@ export type JobListFilterExtras = {
|
|||||||
settingsBlockedEmployerKeywords: string[];
|
settingsBlockedEmployerKeywords: string[];
|
||||||
/** When settings search geography is a country (e.g. Canada), hide other countries. */
|
/** When settings search geography is a country (e.g. Canada), hide other countries. */
|
||||||
searchGeographyCountryKey?: string | null;
|
searchGeographyCountryKey?: string | null;
|
||||||
|
/** Hide open jobs that match a prior skip/apply (same company + title/description). */
|
||||||
|
duplicateDismissHints?: ReadonlyMap<string, DuplicateDismissReason>;
|
||||||
};
|
};
|
||||||
|
|
||||||
const startOfLocalDayMs = (ymd: string): number =>
|
const startOfLocalDayMs = (ymd: string): number =>
|
||||||
@ -64,6 +67,7 @@ export const useFilteredJobs = (
|
|||||||
employerExclude: [],
|
employerExclude: [],
|
||||||
settingsBlockedEmployerKeywords: [],
|
settingsBlockedEmployerKeywords: [],
|
||||||
searchGeographyCountryKey: null,
|
searchGeographyCountryKey: null,
|
||||||
|
duplicateDismissHints: undefined,
|
||||||
},
|
},
|
||||||
) =>
|
) =>
|
||||||
useMemo(() => {
|
useMemo(() => {
|
||||||
@ -96,6 +100,11 @@ export const useFilteredJobs = (
|
|||||||
filtered = filtered.filter((job) => job.closedAt == null);
|
filtered = filtered.filter((job) => job.closedAt == null);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const duplicateHints = listExtras.duplicateDismissHints;
|
||||||
|
if (duplicateHints && duplicateHints.size > 0) {
|
||||||
|
filtered = filtered.filter((job) => !duplicateHints.has(job.id));
|
||||||
|
}
|
||||||
|
|
||||||
if (sourcesFilter.length > 0) {
|
if (sourcesFilter.length > 0) {
|
||||||
const allow = new Set(sourcesFilter);
|
const allow = new Set(sourcesFilter);
|
||||||
filtered = filtered.filter((job) => allow.has(job.source));
|
filtered = filtered.filter((job) => allow.has(job.source));
|
||||||
|
|||||||
@ -389,6 +389,22 @@ async function executeJobActionForJob(
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const alsoSkipped = await jobsRepo.skipOpenJobsWithMatchingDedupKeys(
|
||||||
|
{
|
||||||
|
employer: updated.employer,
|
||||||
|
title: updated.title,
|
||||||
|
jobDescription: updated.jobDescription,
|
||||||
|
},
|
||||||
|
updated.ownerProfileId,
|
||||||
|
updated.id,
|
||||||
|
);
|
||||||
|
if (alsoSkipped > 0) {
|
||||||
|
logger.info("Auto-skipped duplicate open jobs", {
|
||||||
|
jobId: updated.id,
|
||||||
|
alsoSkipped,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
return { jobId, ok: true, job: updated };
|
return { jobId, ok: true, job: updated };
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -1383,6 +1399,22 @@ jobsRouter.post("/:id/apply", async (req: Request, res: Response) => {
|
|||||||
return fail(res, notFound("Job not found"));
|
return fail(res, notFound("Job not found"));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const alsoSkipped = await jobsRepo.skipOpenJobsWithMatchingDedupKeys(
|
||||||
|
{
|
||||||
|
employer: updatedJob.employer,
|
||||||
|
title: updatedJob.title,
|
||||||
|
jobDescription: updatedJob.jobDescription,
|
||||||
|
},
|
||||||
|
updatedJob.ownerProfileId,
|
||||||
|
updatedJob.id,
|
||||||
|
);
|
||||||
|
if (alsoSkipped > 0) {
|
||||||
|
logger.info("Auto-skipped duplicate open jobs after apply", {
|
||||||
|
jobId: updatedJob.id,
|
||||||
|
alsoSkipped,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
res.json({ success: true, data: updatedJob });
|
res.json({ success: true, data: updatedJob });
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
const message = error instanceof Error ? error.message : "Unknown error";
|
const message = error instanceof Error ? error.message : "Unknown error";
|
||||||
|
|||||||
@ -5,9 +5,11 @@
|
|||||||
import { randomUUID } from "node:crypto";
|
import { randomUUID } from "node:crypto";
|
||||||
import { getJobOwnerProfileId } from "@infra/request-context";
|
import { getJobOwnerProfileId } from "@infra/request-context";
|
||||||
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
|
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
|
||||||
import { buildJobContentFingerprint } from "@shared/job-fingerprint";
|
import {
|
||||||
|
buildJobContentFingerprint,
|
||||||
|
collectJobDedupKeys,
|
||||||
|
} from "@shared/job-fingerprint";
|
||||||
import { canonicalizeJobUrl } from "@shared/job-url-canonical";
|
import { canonicalizeJobUrl } from "@shared/job-url-canonical";
|
||||||
import { normalizeIsRemote } from "@shared/work-arrangement";
|
|
||||||
import type {
|
import type {
|
||||||
CreateJobInput,
|
CreateJobInput,
|
||||||
Job,
|
Job,
|
||||||
@ -16,6 +18,7 @@ import type {
|
|||||||
JobsRevisionResponse,
|
JobsRevisionResponse,
|
||||||
UpdateJobInput,
|
UpdateJobInput,
|
||||||
} from "@shared/types";
|
} from "@shared/types";
|
||||||
|
import { normalizeIsRemote } from "@shared/work-arrangement";
|
||||||
import { and, desc, eq, inArray, isNull, lt, ne, sql } from "drizzle-orm";
|
import { and, desc, eq, inArray, isNull, lt, ne, sql } from "drizzle-orm";
|
||||||
import { db, schema } from "../db/index";
|
import { db, schema } from "../db/index";
|
||||||
|
|
||||||
@ -39,10 +42,13 @@ function resolveOwnerForCreate(input: CreateJobInput): string {
|
|||||||
return getJobOwnerProfileId() ?? DEFAULT_JOB_OWNER_PROFILE_ID;
|
return getJobOwnerProfileId() ?? DEFAULT_JOB_OWNER_PROFILE_ID;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const OPEN_JOB_STATUSES: JobStatus[] = ["discovered", "ready"];
|
||||||
|
|
||||||
async function loadJobDedupIndexes(ownerProfileId: string): Promise<{
|
async function loadJobDedupIndexes(ownerProfileId: string): Promise<{
|
||||||
existingCanonicalSet: Set<string>;
|
existingCanonicalSet: Set<string>;
|
||||||
existingSourceJobKeySet: Set<string>;
|
existingSourceJobKeySet: Set<string>;
|
||||||
existingContentFingerprintSet: Set<string>;
|
existingContentFingerprintSet: Set<string>;
|
||||||
|
dismissedDedupKeySet: Set<string>;
|
||||||
}> {
|
}> {
|
||||||
const rows = await db
|
const rows = await db
|
||||||
.select({
|
.select({
|
||||||
@ -52,6 +58,8 @@ async function loadJobDedupIndexes(ownerProfileId: string): Promise<{
|
|||||||
contentFingerprint: jobs.contentFingerprint,
|
contentFingerprint: jobs.contentFingerprint,
|
||||||
employer: jobs.employer,
|
employer: jobs.employer,
|
||||||
title: jobs.title,
|
title: jobs.title,
|
||||||
|
jobDescription: jobs.jobDescription,
|
||||||
|
status: jobs.status,
|
||||||
})
|
})
|
||||||
.from(jobs)
|
.from(jobs)
|
||||||
.where(eq(jobs.ownerProfileId, ownerProfileId));
|
.where(eq(jobs.ownerProfileId, ownerProfileId));
|
||||||
@@ -70,27 +78,128 @@ async function loadJobDedupIndexes(ownerProfileId: string): Promise<{
   // recomputing it from (employer, title) so legacy rows participate in
   // dedup until they're rewritten.
   const existingContentFingerprintSet = new Set<string>();
+  const dismissedDedupKeySet = new Set<string>();
   for (const row of rows) {
     const stored = row.contentFingerprint?.trim();
     if (stored) {
       existingContentFingerprintSet.add(stored);
-      continue;
+    } else {
+      const recomputed = buildJobContentFingerprint({
+        employer: row.employer,
+        title: row.title,
+      });
+      if (recomputed) {
+        existingContentFingerprintSet.add(recomputed);
+      }
     }
-    const recomputed = buildJobContentFingerprint({
-      employer: row.employer,
-      title: row.title,
-    });
-    if (recomputed) {
-      existingContentFingerprintSet.add(recomputed);
+
+    if (row.status === "skipped" || row.status === "applied") {
+      for (const key of collectJobDedupKeys({
+        employer: row.employer,
+        title: row.title,
+        jobDescription: row.jobDescription,
+      })) {
+        dismissedDedupKeySet.add(key);
+      }
     }
   }
   return {
     existingCanonicalSet,
     existingSourceJobKeySet,
     existingContentFingerprintSet,
+    dismissedDedupKeySet,
   };
 }
+
+function inputMatchesDismissedDedupKeys(
+  input: CreateJobInput,
+  dismissedDedupKeySet: Set<string>,
+): boolean {
+  if (dismissedDedupKeySet.size === 0) return false;
+  const keys = collectJobDedupKeys({
+    employer: input.employer,
+    title: input.title,
+    jobDescription: input.jobDescription,
+  });
+  return keys.some((key) => dismissedDedupKeySet.has(key));
+}
+
+async function findDismissedJobByDedupKeys(
+  keys: string[],
+  ownerProfileId: string,
+): Promise<Job | null> {
+  if (keys.length === 0) return null;
+  const rows = await db
+    .select()
+    .from(jobs)
+    .where(
+      and(
+        eq(jobs.ownerProfileId, ownerProfileId),
+        inArray(jobs.status, ["skipped", "applied"]),
+      ),
+    );
+  for (const row of rows) {
+    const rowKeys = collectJobDedupKeys({
+      employer: row.employer,
+      title: row.title,
+      jobDescription: row.jobDescription,
+    });
+    if (rowKeys.some((key) => keys.includes(key))) {
+      return mapRowToJob(row);
+    }
+  }
+  return null;
+}
+
+/**
+ * Skip other open jobs that match the same employer/title or description keys.
+ */
+export async function skipOpenJobsWithMatchingDedupKeys(
+  anchor: {
+    employer: string;
+    title: string;
+    jobDescription?: string | null;
+  },
+  ownerProfileId: string,
+  excludeJobId: string,
+): Promise<number> {
+  const anchorKeys = collectJobDedupKeys(anchor);
+  if (anchorKeys.length === 0) return 0;
+
+  const rows = await db
+    .select({
+      id: jobs.id,
+      employer: jobs.employer,
+      title: jobs.title,
+      jobDescription: jobs.jobDescription,
+    })
+    .from(jobs)
+    .where(
+      and(
+        eq(jobs.ownerProfileId, ownerProfileId),
+        inArray(jobs.status, OPEN_JOB_STATUSES),
+        ne(jobs.id, excludeJobId),
+      ),
+    );
+
+  let skipped = 0;
+  for (const row of rows) {
+    const rowKeys = collectJobDedupKeys({
+      employer: row.employer,
+      title: row.title,
+      jobDescription: row.jobDescription,
+    });
+    if (!rowKeys.some((key) => anchorKeys.includes(key))) continue;
+    const updated = await updateJob(
+      row.id,
+      { status: "skipped" },
+      ownerProfileId,
+    );
+    if (updated) skipped += 1;
+  }
+  return skipped;
+}

 async function findJobByCanonicalUrl(
   canonical: string,
   ownerProfileId: string,
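The cascade-skip step above compares every open row's keys against the anchor's keys with `Array.prototype.includes`. A minimal in-memory sketch of the same idea follows, assuming a hypothetical `CascadeRow` shape, a hypothetical `cascadeSkip` helper, and a caller-supplied set of open statuses (the real values of `OPEN_JOB_STATUSES` are not shown in this diff). It swaps the array scan for a `Set` lookup, which is one possible variation and not what the commit does.

```typescript
// Hypothetical in-memory stand-in for a row of the `jobs` table, with its
// dedup keys already computed.
type CascadeRow = { id: string; status: string; keys: string[] };

// Marks every open row that shares at least one dedup key with the anchor as
// "skipped" and returns how many rows were changed.
function cascadeSkip(
  rows: CascadeRow[],
  anchorKeys: string[],
  excludeJobId: string,
  openStatuses: ReadonlySet<string>,
): number {
  if (anchorKeys.length === 0) return 0;
  const anchorKeySet = new Set(anchorKeys); // constant-time membership checks
  let skipped = 0;
  for (const row of rows) {
    if (row.id === excludeJobId) continue; // never touch the anchor itself
    if (!openStatuses.has(row.status)) continue; // only open rows are eligible
    if (!row.keys.some((key) => anchorKeySet.has(key))) continue;
    row.status = "skipped";
    skipped += 1;
  }
  return skipped;
}
```

With only one or two keys per job the difference from `includes` is negligible; the sketch is mainly meant to make the matching rule (any shared key, open status, not the anchor) easy to read in isolation.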
@@ -480,8 +589,23 @@ export async function createJobs(
     existingCanonicalSet,
     existingSourceJobKeySet,
     existingContentFingerprintSet,
+    dismissedDedupKeySet,
   } = await loadJobDedupIndexes(ownerProfileId);

+  if (
+    inputMatchesDismissedDedupKeys(normalizedWithOwner, dismissedDedupKeySet)
+  ) {
+    const existing = await findDismissedJobByDedupKeys(
+      collectJobDedupKeys({
+        employer: normalized.employer,
+        title: normalized.title,
+        jobDescription: normalized.jobDescription,
+      }),
+      ownerProfileId,
+    );
+    if (existing) return existing;
+  }
+
   const sid = normalized.sourceJobId?.trim();
   if (sid) {
     const sk = sourceJobKey(normalized.source, sid);
@@ -537,6 +661,7 @@ export async function createJobs(
     existingCanonicalSet,
     existingSourceJobKeySet,
     existingContentFingerprintSet,
+    dismissedDedupKeySet,
   } = await loadJobDedupIndexes(ownerProfileId);

   const batchBuckets = new Map<
@@ -582,6 +707,10 @@ export async function createJobs(
     const sid = input.sourceJobId?.trim();
     const sk = sid ? sourceJobKey(input.source, sid) : null;

+    if (inputMatchesDismissedDedupKeys(input, dismissedDedupKeySet)) {
+      skipped += count;
+      continue;
+    }
     if (sk && existingSourceJobKeySet.has(sk)) {
       skipped += count;
       continue;
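The batch path above drops any incoming posting whose dedup keys intersect the keys of already skipped or applied jobs, before the source-key check runs. A self-contained sketch of that filter follows. The `Posting` type, `simplePostingKey`, and `filterDismissedPostings` are hypothetical names for illustration, and the key here is a plain lowercased `employer::title` string instead of the project's real fingerprint.

```typescript
// Hypothetical minimal posting shape for illustration only.
type Posting = { employer: string; title: string };

// Assumed stand-in for the real fingerprint: trimmed, lowercased employer and title.
function simplePostingKey(posting: Posting): string {
  return `${posting.employer.trim().toLowerCase()}::${posting.title.trim().toLowerCase()}`;
}

// Splits incoming postings into those to import and a count of those
// suppressed because a dismissed job shares the same key.
function filterDismissedPostings(
  incoming: Posting[],
  dismissed: Posting[],
): { kept: Posting[]; skipped: number } {
  const dismissedKeys = new Set(dismissed.map(simplePostingKey));
  const kept: Posting[] = [];
  let skipped = 0;
  for (const posting of incoming) {
    if (dismissedKeys.has(simplePostingKey(posting))) {
      skipped += 1; // repeat of something the user already skipped or applied to
      continue;
    }
    kept.push(posting);
  }
  return { kept, skipped };
}
```

Building the key set once per batch mirrors how `loadJobDedupIndexes` returns `dismissedDedupKeySet` a single time before the loop over inputs.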
@@ -1,6 +1,8 @@
 import { describe, expect, it } from "vitest";
 import {
   buildJobContentFingerprint,
+  buildJobDescriptionFingerprint,
+  collectJobDedupKeys,
   normalizeEmployerForFingerprint,
   normalizeTitleForFingerprint,
 } from "./job-fingerprint";
@@ -64,6 +66,29 @@ describe("buildJobContentFingerprint", () => {
     expect(a).not.toBe(b);
   });

+  it("matches reposts with the same employer and description body", () => {
+    const description =
+      "We are hiring an Automation Test Engineer to build scalable test frameworks. ".repeat(
+        4,
+      );
+    const a = buildJobDescriptionFingerprint({
+      employer: "Joveo",
+      jobDescription: description,
+    });
+    const b = buildJobDescriptionFingerprint({
+      employer: "Joveo",
+      jobDescription: description,
+    });
+    expect(a).toBe(b);
+    expect(
+      collectJobDedupKeys({
+        employer: "Joveo",
+        title: "SDET",
+        jobDescription: description,
+      }),
+    ).toContain(a);
+  });
+
   describe("normalizers", () => {
     it("normalizeEmployerForFingerprint strips legal suffixes", () => {
       expect(normalizeEmployerForFingerprint("Acme Corporation")).toBe("acme");
@@ -75,3 +75,49 @@ export function buildJobContentFingerprint(args: {
   if (!employer || !title) return null;
   return `${employer}::${title}`;
 }
+
+const DESCRIPTION_MIN_CHARS = 80;
+
+export function normalizeDescriptionForFingerprint(
+  jobDescription: string | null | undefined,
+): string {
+  if (!jobDescription?.trim()) return "";
+  let value = stripDiacritics(jobDescription.toLowerCase());
+  value = value.replace(/<[^>]+>/g, " ");
+  value = value.replace(PUNCTUATION_RE, " ");
+  value = value.replace(WHITESPACE_RE, " ").trim();
+  if (value.length < DESCRIPTION_MIN_CHARS) return "";
+  return value.slice(0, 400);
+}
+
+/**
+ * Same employer + materially similar description (cross-posted copy).
+ */
+export function buildJobDescriptionFingerprint(args: {
+  employer: string | null | undefined;
+  jobDescription: string | null | undefined;
+}): string | null {
+  const employer = normalizeEmployerForFingerprint(args.employer);
+  const description = normalizeDescriptionForFingerprint(args.jobDescription);
+  if (!employer || !description) return null;
+  return `${employer}::desc::${description}`;
+}
+
+export function collectJobDedupKeys(args: {
+  employer: string | null | undefined;
+  title: string | null | undefined;
+  jobDescription?: string | null | undefined;
+}): string[] {
+  const keys = new Set<string>();
+  const content = buildJobContentFingerprint({
+    employer: args.employer,
+    title: args.title,
+  });
+  if (content) keys.add(content);
+  const description = buildJobDescriptionFingerprint({
+    employer: args.employer,
+    jobDescription: args.jobDescription,
+  });
+  if (description) keys.add(description);
+  return [...keys];
+}
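The description fingerprint above lowercases the text, strips diacritics, HTML tags, and punctuation, collapses whitespace, rejects anything shorter than 80 characters, and keeps only the first 400 characters. A standalone sketch of that pipeline follows. `stripDiacritics`, `PUNCTUATION_RE`, and `WHITESPACE_RE` are not shown in this diff, so the regular expressions below are assumed stand-ins, and `normalizeDescriptionSketch` is a hypothetical name.

```typescript
const SKETCH_MIN_CHARS = 80; // mirrors DESCRIPTION_MIN_CHARS in the diff
const SKETCH_MAX_CHARS = 400; // mirrors the slice(0, 400) cap in the diff

function normalizeDescriptionSketch(text: string | null | undefined): string {
  if (!text?.trim()) return "";
  const value = text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // assumed equivalent of stripDiacritics
    .replace(/<[^>]+>/g, " ") // drop HTML tags, as in the diff
    .replace(/[^a-z0-9\s]/g, " ") // assumed stand-in for PUNCTUATION_RE
    .replace(/\s+/g, " ") // assumed stand-in for WHITESPACE_RE
    .trim();
  // Short descriptions are too generic to identify a posting reliably.
  if (value.length < SKETCH_MIN_CHARS) return "";
  return value.slice(0, SKETCH_MAX_CHARS);
}
```

Two consequences of this design are worth keeping in mind: markup and casing differences between boards do not change the key, and two descriptions that share their first 400 normalized characters produce the same key even if they diverge later.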