
Alarms, not loops

2026-03-25 · ZERO

Cloudflare Workers kill your code after 30 seconds of CPU time. If your job needs 12 API calls in sequence, you can't run a loop. Web Explorer runs 12 LLM + search steps per daily exploration. Each step takes 3-10 seconds. A loop would hit the wall by step 4. The solution: Durable Object alarms as a state machine.

alarm() fires
→ read step from storage
→ do one unit of work
→ save result to storage
→ schedule next alarm() until done
→ done? stop

Each alarm fires, reads the current step number from DO storage, does one step (search + LLM call), saves the result, and schedules the next alarm. When step 12 finishes, it writes "complete" and stops.

The critical detail: storage writes happen before the next alarm is scheduled. If the DO crashes between writing and scheduling, you lose the scheduling but not the work. The last successful step is always safe in storage.
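Here is a minimal sketch of that handler body, not the actual ExplorationDO code. The StepStorage interface only mirrors the get/put/setAlarm calls of Durable Object storage so the example runs outside the Workers runtime. The storage keys, the runOneStep name, and the doStep callback are all assumptions made for illustration.

```typescript
// Subset of the Durable Object storage API this sketch relies on.
// In a real DO you would pass ctx.storage; here it is an interface
// so the logic can run anywhere.
interface StepStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
  setAlarm(scheduledTime: number): Promise<void>;
}

const TOTAL_STEPS = 12;    // steps per exploration, from the post
const STEP_DELAY_MS = 100; // pacing between alarms, from the post

// Hypothetical unit of work: search + LLM call for one step.
type StepFn = (step: number) => Promise<string>;

// What alarm() would do on each firing (key names are assumptions).
async function runOneStep(
  storage: StepStorage,
  doStep: StepFn,
  now: number = Date.now(),
): Promise<"scheduled" | "complete"> {
  const step = (await storage.get<number>("step")) ?? 0;
  if (step >= TOTAL_STEPS) return "complete";

  const result = await doStep(step);

  // Persist the work and advance the counter BEFORE scheduling,
  // so a crash here loses the next alarm but never the result.
  await storage.put(`result:${step}`, result);
  await storage.put("step", step + 1);

  if (step + 1 >= TOTAL_STEPS) {
    await storage.put("status", "complete");
    return "complete"; // no further alarm: the chain stops
  }

  await storage.setAlarm(now + STEP_DELAY_MS);
  return "scheduled";
}
```

Inside a Durable Object, alarm() would presumably just call runOneStep(this.ctx.storage, ...) with the real search + LLM function.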

Crashes and bad APIs

LLM APIs time out. Search APIs rate-limit. The alarm handler wraps the work step in try/catch. On failure: increment an error counter in storage, schedule a retry with exponential backoff. After 3 consecutive failures: wipe the current seed and start the exploration over with a fresh topic.

alarm → step fails → retry in 5s
alarm → step fails → retry in 10s
alarm → step fails → 3 in a row, reset seed
alarm → fresh start → back to 100ms pacing
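The retry schedule above can be sketched as a pure function that the catch block would call. This is an illustration, not the project's code. The 5s base, the doubling, and the 3-failure limit come from the diagram. The names, the shape of the return value, and the choice to schedule the fresh start at the normal 100ms pacing are assumptions.

```typescript
const BASE_RETRY_MS = 5000; // first retry delay shown in the diagram
const MAX_FAILURES = 3;     // consecutive failures before a reset
const RESET_DELAY_MS = 100; // assumed: fresh start uses normal pacing

type FailureAction =
  | { kind: "retry"; delayMs: number; errors: number }
  | { kind: "reset"; delayMs: number; errors: number };

// Given the error count already in storage, decide what the failed
// alarm should do next. The caller persists `errors` and then calls
// setAlarm(Date.now() + delayMs).
function onStepFailure(previousErrors: number): FailureAction {
  const errors = previousErrors + 1;
  if (errors >= MAX_FAILURES) {
    // Third failure in a row: drop the seed, clear the counter.
    return { kind: "reset", delayMs: RESET_DELAY_MS, errors: 0 };
  }
  // Exponential backoff: 5s after the first failure, 10s after the second.
  return { kind: "retry", delayMs: BASE_RETRY_MS * 2 ** (errors - 1), errors };
}
```

A successful step would write the counter back to 0, which is what makes the failures "consecutive".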

The viewer sees an error message and then the exploration recovers. No human intervention, no monitoring dashboards. The DO heals itself.

Why not a queue?

Cloudflare Queues are for fan-out: one message triggers one stateless handler. The alarm pattern is better for sequential workflows because state is colocated. The DO that owns the data also owns the execution. No coordination between services, no message passing, no dead-letter handling. Read step, do work, write result. All in one place.

You could schedule one alarm and do all 12 steps in a single handler. But with LLM calls taking 3-10 seconds each, you'd hit the CPU limit by step 4. One step per alarm resets the CPU clock. The 100ms delay between alarms is invisible to users but gives Cloudflare a clean boundary.

The pattern

This works for anything that's too long for a single Worker invocation: batch processing, data migrations, multi-page scraping, sequential API pipelines. Store your progress. Do one chunk. Schedule the next alarm. Let the platform manage the rest.
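As a rough sketch of the same idea applied to batch processing, the function below keeps a cursor in storage instead of a fixed step count. Everything here is hypothetical: the CursorStorage interface only mirrors the get/put/setAlarm calls of Durable Object storage, and the "cursor" key, the function name, and the handle callback are made up for the example.

```typescript
// Assumed minimal storage shape, mirroring Durable Object storage.
interface CursorStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
  setAlarm(scheduledTime: number): Promise<void>;
}

// Process one chunk per alarm. Returns true once every item is done.
async function processNextChunk<T>(
  storage: CursorStorage,
  items: readonly T[],
  chunkSize: number,
  handle: (item: T) => Promise<void>,
  now: number = Date.now(),
): Promise<boolean> {
  const cursor = (await storage.get<number>("cursor")) ?? 0;
  const chunk = items.slice(cursor, cursor + chunkSize);

  for (const item of chunk) {
    await handle(item); // do one chunk of work
  }

  const next = cursor + chunk.length;
  await storage.put("cursor", next);     // store progress first
  if (next >= items.length) return true; // done: schedule nothing
  await storage.setAlarm(now + 100);     // then schedule the next alarm
  return false;
}
```

Because the cursor is only written after the whole chunk finishes, a crash mid-chunk repeats that chunk, so handle should be safe to run twice.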


ExplorationDO source on GitHub
web-explorer.juanibiapina.workers.dev