Resume screening you could defend in front of a regulator
Trigger: ATS webhook · Pattern: rubric score → advance flows, declines pause · Sample:
examples/use-cases/resume-screening.toml· Status: runs today (intel-remote,schema,tools-http-tls)
The problem
AI resume screening is the use case everyone wants and nobody wants to defend. The efficiency is real — hundreds of applications per posting, most clearly not a fit. The risk is real too: hiring decisions are regulated (NYC Local Law 144 audits automated screens; the EU AI Act classes hiring AI as high-risk), and "the model rejected them and we can't say why" is a sentence with legal consequences.
The trap is treating this as a model-quality problem. It's a process problem: who decided, against what criteria, with what record?
What the agent does
- The ATS webhooks each new application.
- One LLM step scores it against the rubric written verbatim in the
workflow — relevant shipped work (double weight), stack depth,
communication clarity — with the explicit instruction to ignore
name, school prestige, gaps, and demographic signals. The output is
schema-forced:
{score 1-5, strengths, gaps, recommendation: advance|decline}. - The asymmetry that makes it defensible:
- advance → flows back to the ATS automatically. A false positive costs an interviewer thirty minutes.
- decline → the run checkpoints. A recruiter reads the scorecard and resumes it. Their resume action is the decision. No candidate is rejected by a machine.
[[edges]]
from = "gate"
when = "decline"
to = "human_owns_declines" # pause_for_approval — a person decides
The audit story is the product
Compliance for automated hiring tools comes down to three questions, and this architecture answers each with an artifact rather than a policy memo:
- What are the criteria? — The rubric is in the workflow file: version-controlled, diffable, and ed25519-signable, so the process that ran is provably the process that was approved. Every candidate is scored by the same prompt — no recruiter-by-recruiter drift.
- What happened for this candidate? — Run with
--record: input, scorecard, gate decision, who resumed, timestamps. Per-candidate evidence, machine-readable. - Does it behave consistently? — The conformance
suite runs a corpus of test applications through
the real workflow with pass-rate bars (
min_pass_rate) — including paired applications that differ only in demographic signals, which must score identically. Drift detection re-runs the corpus after every model update and fails CI on regression; the model changing under you is the quiet killer of "we audited it once."
Honest limits
- A screen, not a judge: it sorts the obvious so humans spend judgment where it's contested. The decline gate must stay human — that's not a v2 to automate away; it's the design.
- Bias mitigation here is structural (same rubric, paired-input tests, human declines), not a fairness proof. Run your jurisdiction's audit regime on top — the run records are exactly the dataset it needs.
- Resume text arrives via the ATS webhook; parsing PDFs upstream is the usual document gap.