Circuit breaker¶
A circuit breaker sits between your agent and the work it does. When something goes wrong (cost blowup, runaway loop, hostile input), the breaker trips and the agent stops, even if the code itself doesn't know to stop.
States¶
- CLOSED — normal. Calls flow through.
- OPEN — tripped. Calls are blocked with
NullRunBlockedException(or a subclass:LoopDetectedException,RetryStormException,RateLimitExceededException). - HALF_OPEN — testing recovery. One trial call goes through; if it succeeds, the breaker closes again.
Triggers¶
What causes the breaker to open depends on the policy attached to the workflow. Common triggers:
- Cost per call > threshold
- Cumulative workflow cost > budget
- Loop detected (same tool call N times)
- Retry storm (consecutive failures > threshold)
- Tool call to a sensitive resource without approval
- Kill signal received via the control plane
Fallback modes¶
When the gateway is unreachable, the breaker uses one of three modes
(set via NULLRUN_FALLBACK_MODE):
| Mode | Behaviour | Use when |
|---|---|---|
STRICT |
Block all calls (fail closed) | Sensitive operations, regulated workloads |
PERMISSIVE (default) |
Allow all calls (fail open) | Best-effort UX; cost may slightly overshoot during an outage |
CACHED |
Use the most recent successful decision | Steady-state workloads where a stale policy is safer than none |
The default is
PERMISSIVE. The exception is sensitive tools, which always fail closed (regardless ofNULLRUN_FALLBACK_MODE) — see Sensitive tools.
When the gateway recovers¶
After an outage, the breaker transitions OPEN → HALF_OPEN. The
first call is treated as a probe:
- success → close the breaker, resume normal flow.
- failure → reopen, wait
breaker.cooldown_msbefore retrying.
The SDK surfaces the fallback decision via NullRunTransportError
with source=BREAKER_OPEN so operators can alert on it (see
Errors).