Order fraud review in three lanes — and what webhook retries teach us about honesty
Trigger: order webhook · Pattern: risk score → three-lane switch → fulfill / queue / hold · Sample: examples/use-cases/fraud-review.toml · Status: runs today (intel-remote, schema, tools-http-tls), redelivery-safe as of v1.2.0
The problem
Every order is a race between two costs: hold a good order and you annoy a customer and slow revenue; release a bad one and you eat the chargeback. Pure-rules engines age badly (fraudsters read your rules faster than you write them); pure-ML scores are opaque exactly when the fraud team asks "why did this one get through?"; manual review of everything doesn't survive your first good sales day.
What fraud teams actually run is lanes: obvious-fine flows, obvious-bad stops, ambiguous gets a human. The question is what decides the lane, and whether you can explain it on Thursday.
What the agent does
The order webhook arrives (HMAC-verified, rate-limited), and within a 30-second budget — an order is waiting:
- One schema-enforced LLM step weighs the signals fraud analysts actually use (billing/shipping mismatch, value vs account age, rush shipping on resellables, disposable email) and must commit to {band: low|medium|high, signals: "…in plain language"}.
- A switch routes the lane:
  - low → POST /fulfill. Ship it.
  - medium → the run checkpoints into the fraud queue. An analyst resumes to fulfill, or kills it to refund.
  - high → POST /hold, alert the channel. No automatic fulfillment path exists from this lane, and that is graph shape rather than policy: there is no edge from high to fulfill.
- The signals field rides along to the shop API and the alert, so the "why" is attached to every decision at decision time, not reconstructed for the chargeback dispute.
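The two steps above can be sketched in a few lines of Python. This is an illustration only, not the agent's implementation: `Decision`, `parse_decision`, `route`, and the `CHECKPOINT` / `ALERT` action names are hypothetical, and only the `{band, signals}` shape and the `/fulfill` and `/hold` endpoints come from the text.

```python
from dataclasses import dataclass

BANDS = ("low", "medium", "high")


@dataclass(frozen=True)
class Decision:
    band: str
    signals: str


def parse_decision(raw: dict) -> Decision:
    """Accept only {band, signals} with a known band; reject everything else."""
    if set(raw) != {"band", "signals"}:
        raise ValueError(f"unexpected fields: {sorted(raw)}")
    if raw["band"] not in BANDS:
        raise ValueError(f"unknown band: {raw['band']!r}")
    if not isinstance(raw["signals"], str) or not raw["signals"].strip():
        raise ValueError("signals must be a non-empty string")
    return Decision(raw["band"], raw["signals"])


def route(decision: Decision) -> list[tuple[str, str]]:
    """Map a band to its lane actions as (verb, target) pairs.

    The action names are assumptions made for this sketch.
    """
    if decision.band == "low":
        return [("POST", "/fulfill")]
    if decision.band == "medium":
        # The run pauses here until an analyst resumes or kills it.
        return [("CHECKPOINT", "fraud-queue")]
    # high: hold and alert; this branch never returns a fulfill action.
    return [("POST", "/hold"), ("ALERT", "channel")]
```

Because the model's output has to pass `parse_decision` before `route` sees it, a malformed or invented band fails loudly instead of falling into a default lane.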
Why the lanes belong in the graph
The LLM contributes judgment; the lanes are declared edges. That split is what makes the system tunable under pressure: after a bad week, tightening "medium" to start at a lower threshold is a prompt edit reviewed in a pull request — the routing, the queue, the audit trail don't move. The model can be replaced entirely (or A/B'd via a second backend) without touching the lanes.
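One way to see what "lanes as declared edges" buys you is to treat the edges as data and check the graph's shape directly. The node names and the `AUTO_EDGES` table below are assumptions made for this sketch, not the workflow's actual format; the analyst's manual resume from the queue is deliberately left out because it is not an automatic edge.

```python
from collections import deque

# Hypothetical edge table: only edges the workflow follows without a human.
AUTO_EDGES = {
    "score": ["low", "medium", "high"],
    "low": ["fulfill"],
    "medium": ["queue"],  # the run checkpoints here and waits for an analyst
    "high": ["hold"],
    "hold": ["alert"],
}


def reachable(graph: dict, start: str) -> set:
    """Breadth-first search: every node reachable from start, start included."""
    seen, todo = {start}, deque([start])
    while todo:
        node = todo.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


# The property from the text, checked against the graph rather than the prompt.
assert "fulfill" not in reachable(AUTO_EDGES, "high")
```

A check like this keeps passing no matter how the prompt or the model changes, which is the point of keeping routing out of the LLM step.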
The honest part: webhooks lie about "once"
Every webhook provider redelivers — timeouts, retries, at-least-once semantics. Deliver the same order twice to a naive automation and it fulfills twice. This workflow's route declares the answer (shipped in v1.2.0, promoted by exactly this use case in the gap analysis §5):
[[http_routes]]
path = "/orders/created"
idempotency_key = "trigger.order.id"
A redelivered order id replays the recorded decision (marked
X-Agentd-Idempotent-Replay: true) instead of re-running the workflow —
at-least-once delivery collapses to exactly-once effect at the route
boundary. The semantics are deliberately conservative: a missing key is
a 400 and nothing runs; a concurrent duplicate gets a 409 while the
first is in flight; failed runs are not recorded, so a genuine
failure stays retryable by the provider's own redelivery.
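The semantics described above can be modelled with a small in-memory sketch. It is not the product's code: the `IdempotentRoute` class, the 200 status on replay, and the 500 status on failure are assumptions, while the 400, the 409, and the rule that failed runs are not recorded come from the text.

```python
import threading


class IdempotentRoute:
    """In-memory model of an idempotency-keyed route (illustrative only)."""

    def __init__(self, handler):
        self._handler = handler
        self._lock = threading.Lock()
        self._in_flight = set()
        self._recorded = {}

    def handle(self, key, payload):
        """Return (status, body, replayed)."""
        if not key:
            # Missing key: reject, run nothing.
            return 400, "missing idempotency key", False
        with self._lock:
            if key in self._recorded:
                # Redelivery: replay the recorded decision, do not re-run.
                return 200, self._recorded[key], True
            if key in self._in_flight:
                # Concurrent duplicate while the first run is still going.
                return 409, "duplicate in flight", False
            self._in_flight.add(key)
        try:
            body = self._handler(payload)
        except Exception:
            # Failed runs are not recorded, so a later redelivery re-runs.
            with self._lock:
                self._in_flight.discard(key)
            return 500, "run failed", False
        with self._lock:
            self._in_flight.discard(key)
            self._recorded[key] = body
        return 200, body, False
```

Delivering the same order id twice to this route calls the handler once; the second call returns the stored body with `replayed` set, which is the role the X-Agentd-Idempotent-Replay header plays in the real response.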
Honest limits
- The LLM sees what the webhook carries. Velocity features ("third card on this device today") live in your fraud data platform; fetch them with an allowlisted http_request before scoring, same pattern.
- Sub-second decisioning at checkout is a different sport; this design reviews orders post-placement, pre-fulfillment, where a 30-second budget is generous.