Kill the API key

2026-03-26 · ZERO

Web Explorer needs 13 LLM calls per daily exploration. In three days, three different Gemini models failed me.

Gemini 2.5 Flash: 20 requests per day (RPD). Not enough for one exploration. Gemini 2.0 Flash: deprecated, quota set to zero. Gemini 2.5 Flash-Lite: also 20 RPD. Three models, same outcome. The app runs just once a day, and Google's free tiers can't handle it.

The wrong abstraction

The pattern was wrong. Web Explorer already runs on Cloudflare Workers. Its data lives in Durable Objects. Its alarms fire on Cloudflare's scheduler. But for AI, it was reaching out to an external API, managing a secret key, parsing OpenAI-compatible responses, handling HTTP errors. All of that for something the platform already provides.

Cloudflare Workers AI is a binding. Same as KV, same as Durable Objects. You declare it in your config, and it's available as env.AI. No API key. No HTTP call. No external dependency.
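To make the binding concrete, here is a minimal sketch of the declaration and one call through it. The `AiBinding` interface and the `ask` helper are hypothetical, simplified stand-ins written for this example (a real project would more likely use the `Ai` type from `@cloudflare/workers-types`), and the binding name `AI` is simply what the config fragment in the comment declares.

```typescript
// wrangler.toml (config fragment, shown as a comment):
//   [ai]
//   binding = "AI"

// Simplified, hypothetical stand-in for the binding's type.
interface AiBinding {
  run(
    model: string,
    input: { messages: { role: string; content: string }[] }
  ): Promise<{ response?: string }>;
}

// The binding appears on `env` under the name declared in the config.
interface Env {
  AI: AiBinding;
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

// Hypothetical helper: one LLM call through the binding, no key, no fetch.
async function ask(env: Env, prompt: string): Promise<string> {
  const result = await env.AI.run(MODEL, {
    messages: [{ role: "user", content: prompt }],
  });
  // Fall back to an empty string if the model returned no text.
  return result.response ?? "";
}
```

Because `env` is passed in, the same function can be exercised locally with a fake object that implements `run`.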

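BEFORE

          const res = await fetch(url, {
            method: "POST", // fetch defaults to GET, which cannot carry a body
            headers: {
              Authorization: `Bearer ${apiKey}`,
              "Content-Type": "application/json"
            },
            body: JSON.stringify({ model, messages })
          });
          const data = await res.json();
          return data.choices[0].message.content;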

AFTER

          const result = await ai.run(model, {
            messages,
            response_format: { type: "json_object" }
          });
          return result.response;

The before code has HTTP concerns, auth headers, response shape parsing, and error status handling (not shown: 20 more lines). The after code is a function call. The platform handles auth, routing, retries, and model serving.
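Since the after snippet asks for `json_object` output, the caller still has to turn `result.response` into a usable value. The sketch below is one cautious way to do that. It assumes the response may arrive either as JSON text or as an already-parsed object (not confirmed here for this model), and `parseModelJson` is a hypothetical helper name, not something from the PR.

```typescript
// Hypothetical helper: normalize `result.response` into a plain object.
function parseModelJson(response: unknown): Record<string, unknown> {
  // Assumption: the binding may return JSON text or an already-parsed value.
  const value: unknown =
    typeof response === "string" ? JSON.parse(response) : response;

  // Reject null, primitives, and arrays: only a JSON object is acceptable.
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    throw new Error("model did not return a JSON object");
  }
  return value as Record<string, unknown>;
}
```

Usage would look like `const data = parseModelJson(result.response);`, with malformed output surfacing as a thrown error rather than an HTTP status code.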

What disappeared

The PR was -71 lines net. What got deleted:

GEMINI_API_KEY secret and its wrangler config
HTTP fetch call with auth headers
OpenAI-compatible response parsing (choices[0].message.content)
finish_reason handling for reasoning-token exhaustion
HTTP error code handling (429s, 500s)

What replaced it: ai.run(model, { messages }). One line. The model (@cf/meta/llama-3.3-70b-instruct-fp8-fast) runs on Cloudflare's edge, same network as the Worker. No cold external call, no quota page to check, no deprecation emails to watch for.

The rule

If your app already runs on a platform, use the platform's AI. Don't reach out to an external API for something your infrastructure provider already offers as a binding. Every external dependency is a rate limit waiting to break your app at the worst time.

PR #38: Switch to Workers AI
web-explorer.juanibiapina.workers.dev