Order fraud review in three lanes — and what webhook retries teach us about honesty

Trigger: order webhook · Pattern: risk score → three-lane switch → fulfill / queue / hold · Sample: examples/use-cases/fraud-review.toml · Status: runs today (intel-remote,schema,tools-http-tls), redelivery-safe as of v1.2.0

The problem

Every order is a race between two costs: hold a good order and you annoy a customer and slow revenue; release a bad one and you eat the chargeback. Pure-rules engines age badly (fraudsters read your rules faster than you write them); pure-ML scores are opaque exactly when the fraud team asks "why did this one get through?"; manual review of everything doesn't survive your first good sales day.

What fraud teams actually run is lanes: obvious-fine flows, obvious-bad stops, ambiguous gets a human. The question is what decides the lane, and whether you can explain it on Thursday.

What the agent does

The order webhook arrives (HMAC-verified, rate-limited), and within a 30-second budget — an order is waiting:

  1. One schema-enforced LLM step weighs the signals fraud analysts actually use — billing/shipping mismatch, value vs account age, rush shipping on resellables, disposable email — and must commit to {band: low|medium|high, signals: "…in plain language"}.
  2. A switch routes the lane:
    • low → POST /fulfill. Ship it.
    • medium → the run checkpoints into the fraud queue. An analyst resumes it to fulfill, or kills it to refund.
    • high → POST /hold, alert the channel. No automatic fulfillment path exists from this lane. That is not a policy but the shape of the graph: there is no edge from high to fulfill.
  3. The signals field rides along to the shop API and the alert — the "why" is attached to every decision at decision time, not reconstructed for the chargeback dispute.
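The receive-and-route path above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the sample's implementation: it assumes the provider signs the raw body with HMAC-SHA256 and sends a hex digest, it treats the LLM's reply as a JSON string produced elsewhere, and the `LANES` table and all function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical lane -> action table. The strings echo the lanes described above;
# nothing here calls a real shop API or alert channel.
LANES = {
    "low": "POST /fulfill",
    "medium": "checkpoint into fraud queue",
    "high": "POST /hold + alert",
}


def verify_signature(body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Assumed scheme: hex HMAC-SHA256 of the raw body, compared in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def validate_assessment(raw: str) -> dict:
    """Reject any model reply that is not exactly {band, signals} with a known band."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"band", "signals"}:
        raise ValueError("reply must be an object with exactly 'band' and 'signals'")
    if data["band"] not in LANES:
        raise ValueError(f"band must be one of {sorted(LANES)}")
    if not isinstance(data["signals"], str) or not data["signals"].strip():
        raise ValueError("signals must be a non-empty string")
    return data


def route(body: bytes, signature_hex: str, secret: bytes, llm_reply: str) -> dict:
    """Verify the webhook, validate the model's commitment, pick exactly one lane."""
    if not verify_signature(body, signature_hex, secret):
        raise PermissionError("webhook signature mismatch")
    assessment = validate_assessment(llm_reply)
    # The plain-language signals travel with the decision, so the "why" is
    # recorded at decision time rather than reconstructed later.
    return {"action": LANES[assessment["band"]], "signals": assessment["signals"]}


if __name__ == "__main__":
    secret = b"demo-secret"
    body = b'{"order": {"id": "A-1001"}}'
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    reply = '{"band": "high", "signals": "billing and shipping countries differ"}'
    print(route(body, sig, secret, reply)["action"])  # POST /hold + alert
```

Validation failures raise rather than falling through to a default lane, so a malformed model reply can never pick a lane by accident.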

Why the lanes belong in the graph

The LLM contributes judgment; the lanes are declared edges. That split is what makes the system tunable under pressure: after a bad week, tightening "medium" to start at a lower threshold is a prompt edit reviewed in a pull request — the routing, the queue, the audit trail don't move. The model can be replaced entirely (or A/B'd via a second backend) without touching the lanes.
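Because the lanes are declared edges, a property such as "high never reaches fulfill" can be checked on the graph itself, whichever model sits in the assessment step. The sketch below assumes the workflow can be exported as a plain adjacency map; the node names are hypothetical, not the step ids in examples/use-cases/fraud-review.toml.

```python
from collections import deque

# Hypothetical adjacency map of the workflow. The model lives inside "assess";
# swapping or A/B-testing it changes no entry in this table.
EDGES = {
    "webhook": ["assess"],
    "assess": ["switch"],
    "switch": ["fulfill", "fraud_queue", "hold"],  # low / medium / high
    "fraud_queue": ["fulfill", "refund"],          # analyst resumes or kills
    "hold": ["alert"],
    "fulfill": [],
    "refund": [],
    "alert": [],
}


def reachable(graph: dict, start: str) -> set:
    """Breadth-first search: every node reachable from start, start included."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def assert_no_path(graph: dict, src: str, dst: str) -> None:
    """Fail loudly if any sequence of edges leads from src to dst."""
    if dst in reachable(graph, src):
        raise AssertionError(f"graph allows {src} -> ... -> {dst}")


assert_no_path(EDGES, "hold", "fulfill")  # the high lane cannot ship
```

A prompt edit changes which band an order lands in; only an edit to the edge table could change what a band is allowed to do, and a check like this would catch it.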

The honest part: webhooks lie about "once"

Every webhook provider redelivers: timeouts trigger retries, and delivery is at-least-once by design. Deliver the same order twice to a naive automation and it fulfills twice. This workflow's route declares the answer (shipped in v1.2.0; this use case is what promoted the feature in §5 of the gap analysis):

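[[http_routes]]
path = "/orders/created"
# Dedupe key: the order id from the webhook payload. A repeat delivery replays the recorded decision.
idempotency_key = "trigger.order.id"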
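The key is written as a path into the trigger context. Assuming it resolves segment by segment against the parsed webhook payload (an assumption about how the route evaluates it; `resolve_key` is a hypothetical helper), extraction might look like this, with `None` standing for the missing-key case:

```python
def resolve_key(path: str, context: dict):
    """Hypothetical helper: walk a dotted path such as "trigger.order.id".

    Returns None when any segment is missing, which the route would treat
    as a missing key.
    """
    node = context
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


ctx = {"trigger": {"order": {"id": "A-1001", "total": 249.0}}}
print(resolve_key("trigger.order.id", ctx))                          # A-1001
print(resolve_key("trigger.order.id", {"trigger": {"order": {}}}))  # None
```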

A redelivered order id replays the recorded decision (marked X-Agentd-Idempotent-Replay: true) instead of re-running the workflow — at-least-once delivery collapses to exactly-once effect at the route boundary. The semantics are deliberately conservative: a missing key is a 400 and nothing runs; a concurrent duplicate gets a 409 while the first is in flight; failed runs are not recorded, so a genuine failure stays retryable by the provider's own redelivery.
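The rules in that paragraph (replay, 400 on a missing key, 409 while in flight, failures stay retryable) fit in a small state machine. This is an in-memory, single-process sketch for illustration only: the class name, the 200 status on replay, and the 500 on failure are assumptions, and a real deployment would need the recorded decisions in durable storage shared by every instance.

```python
import threading
from typing import Any, Callable, Optional


class IdempotentRoute:
    """Hypothetical in-memory model of the route-boundary rules."""

    REPLAY_HEADER = {"X-Agentd-Idempotent-Replay": "true"}

    def __init__(self, run_workflow: Callable[[dict], Any]):
        self._run = run_workflow          # runs the workflow; may raise
        self._recorded: dict = {}         # key -> decision of a successful run
        self._in_flight: set = set()      # keys whose first run is still going
        self._lock = threading.Lock()

    def handle(self, key: Optional[str], payload: dict):
        """Return (status, headers, body) for one delivery."""
        if key is None:
            return 400, {}, None                      # missing key: nothing runs
        with self._lock:
            if key in self._recorded:                 # redelivery: replay, don't re-run
                return 200, dict(self.REPLAY_HEADER), self._recorded[key]
            if key in self._in_flight:                # concurrent duplicate
                return 409, {}, None
            self._in_flight.add(key)
        try:
            decision = self._run(payload)             # lock released while the run executes
        except Exception:
            return 500, {}, None                      # not recorded: provider may retry
        else:
            with self._lock:
                self._recorded[key] = decision
            return 200, {}, decision
        finally:
            with self._lock:
                self._in_flight.discard(key)
```

Delivering the same order id twice runs the workflow once; the second response carries the replay header and the first decision. A failed run leaves no record, so the provider's next redelivery starts a fresh run.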

Honest limits