The Agent Said Go to Sleep

It was past 3 AM. Qumio had been building for about 15 hours. We were debugging the news morning skill. A private worker was burning through 33K of its 49K token context retrying failed web fetches. Cloudflare blocks, 403 errors, timeouts. Each failed URL dumped thousands of tokens into the window before the worker gave up and tried another approach.

Claude diagnosed it: "The real problem isn't VRAM. It's the worker burning 33K tokens retrying failed fetches." The fix was one line in the skill prompt: limit each URL to a single fetch attempt, and use Google search snippets as a fallback instead of retrying. Token usage roughly halved.
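The real fix lived in the skill prompt, not in code. If you wanted to enforce the same rule in code, a minimal sketch might look like the following. Everything here is hypothetical and not from the actual skill: `fetch_once`, `urllib_fetch`, the injected `search_snippet` callable, and the `max_chars` cap are all names I made up for illustration.

```python
import urllib.request
from typing import Callable


def urllib_fetch(url: str, timeout: float = 10.0) -> str:
    """Plain stdlib fetch. Raises on 403s, timeouts, and other network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_once(
    url: str,
    fetch: Callable[[str], str],
    search_snippet: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Try the URL exactly once. On any failure, return a short fallback
    snippet instead of retrying. Output is capped either way so a single
    page cannot flood the context window."""
    try:
        body = fetch(url)  # single attempt, no retry loop
    except Exception:
        # Hypothetical fallback: whatever short text a search result gives us.
        return search_snippet(url)[:max_chars]
    return body[:max_chars]
```

The point of the design is that a failure costs one attempt and a bounded amount of text, rather than thousands of tokens per retry. Passing `urllib_fetch` as `fetch` gives a real network call; passing a stub makes the behavior easy to test offline.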

Then, right after committing the fix:

"Want to pause here for the night? It's past 3 AM. We've made solid progress. The skill works, content is accurate, and we've identified the remaining tweaks. Can pick up testing fresh tomorrow."

I didn't ask for schedule advice. I didn't set a time limit. The agent looked at the clock, looked at the state of the work, and decided the right next action was to tell me to stop.

Why it was the right call

The commit log from that night tells the story. Two wip: commits at 11 PM flagging "paused at task 2/2" and "2 issues to fix before re-test." Then a string of twelve fix commits stretching past 2 AM: Docker config, Playwright browser installs, model swaps between 27b and 1.7b, prompt rewrites. At one point we checked Ollama to see if the 27b model was offloading to CPU. It wasn't. 26GB fully in VRAM on a 32GB card.

Debugging at 3 AM with an agent that moves faster than you can review is how regressions slip through. Phase 3 earlier that day had already shipped without a verification step. One skipped check was enough.

What happened next

I stopped. Came back and finished the last round of fixes fresh. The agent picked up exactly where it left off because the state was in the planning files, not in my head.

The part I keep coming back to is that Claude didn't just track tasks. It tracked the situation. Time of day, how deep we were into a debugging spiral, what was left versus what was done. Then it made a call that had nothing to do with code.