Weekly Review: The Hard Half

The week's curated set of relevant AI Tutorials, Tools, and News, on why building an agent is the easy half and running it is the hard one

Jun 29, 2026

Welcome to Altered Craft’s weekly AI review for developers, and thanks, as always, for reading. A pattern runs through this issue: the model is no longer the whole job. The work has moved to the system around it, the loop you engineer, the harness that runs it, the sandbox that contains it, and the provider and version-control layers that support it. The news side counts the cost of all that, from liquid-cooled data centers to the token bills arriving as the subsidies end.

TUTORIALS & CASE STUDIES

This week’s tutorials trace the agent from the inside out: the mechanism that lets it act, the loop you engineer around it, the harness that hands you that loop, the roles attackers exploit, the traces that keep it auditable, and a stack that runs on your own hardware.

Tool Calling: How AI Agents Decide What to Do Next

Estimated read time: 7 min

It starts with the mechanism. This tutorial explains tool calling, what turns an LLM into something that triggers real actions. The key distinction is that the model generates a tool call, not a tool execution. Examples cover single, multi-tool, and parallel calls.

Key point: Clear, accurate tool and parameter descriptions are what let the model pick the right function, so invest in them as much as the code behind them.

From Prompting Agents to Loop Engineering

Estimated read time: 12 min

Once an agent can act, the harder work moves up a level. This is a practical breakdown of the claim that you should stop prompting agents and design the loops that prompt them. The work moves from writing code to writing the system that writes the code, with examples and notes on where cost accumulates.

The takeaway: The loop, not the model, is now the expensive and failure-prone part, so build it with a tight scope, a hard budget, and an independent verifier before you walk away.

CUGA: The Agent Harness That Hands You the Plumbing

Estimated read time: 15 min

Anatomy of the ibm_cloud_advisor cuga-app: the main.py file layout, an inline @tool (search_ibm_catalog) that calls the IBM Cloud Global Catalog API alongside an MCP web-search tool in one tool list, and a system prompt enforcing "catalog before recommendation."

If the loop is the hard part, a harness can carry it. IBM’s open-source CUGA handles agent orchestration, planning, tool calls, state, and self-correction, so building one means writing just a tool list and a prompt. Two dozen single-file FastAPI apps show the pattern running governed in production.

What this enables: When the harness carries planning, state, and guardrails, an agent becomes a tool list plus a prompt, and a smaller open-weight model can hold its own.

Why Prompt Injection Is Really a Problem of Roles

Estimated read time: 11 min

But every harness inherits one weakness. An LLM receives a conversation as one token stream, structured only by role tags. Using “role probes,” this writeup shows models infer roles from writing style, not tags, so injected text that sounds like a command can override its tool tag.

Worth noting: Don’t treat role tags as a security boundary. LLMs identify roles from writing style, so adversarial text that sounds authoritative can override the tag it actually arrives under.

LLM Observability with LangSmith: Tracing Everything for Audit-Grade Logs

Estimated read time: 13 min

Once it runs in production, you need to see inside it. When a customer claims your bot promised something, can you replay what happened? This tutorial argues building the agent is the easy half; operating it is the hard half, covering zero-config LangSmith tracing and a tamper-evident audit log you control.

Why this matters: LLM failures arrive wearing a 200 OK, so wire in tracing and your own append-only audit log before regulators, not customers, force the question.

Building a Fully Local Coding Agent Stack

Estimated read time: 9 min

Pulling the threads together, and entirely offline, this tutorial from Sebastian Raschka, PhD assembles a coding agent with no cloud dependency, pairing Ollama-served open-weight models like Qwen3.6 35B-A3B with harnesses such as Qwen-Code, Codex, and Claude Code. It covers serving local LLMs that read, edit, and run code, with speed and quality benchmarks.

The opportunity: If you have the hardware, a local model paired with the right harness gives you a private, fixed-cost coding agent that stays fully under your control.

TOOLS

The tools this week outfit the agent’s workspace: an isolated runtime for its code, version control built for how it commits, one interface to every model provider, an open model family that writes its own scaffolds, and a way to keep your Mac awake only while it works.

AWS Lambda MicroVMs: Isolated Sandboxes for User and AI-Generated Code

Estimated read time: 7 min

Building on our coverage of LangChain’s agent sandboxes[1] last week, AWS Lambda introduces MicroVMs for running untrusted user- or AI-generated code in isolated, stateful environments. Powered by Firecracker, each session gets VM-level isolation with near-instant resume, holding state across idle periods and supporting up to 8 hours of runtime.

The pattern: If you build assistants or sandboxes that run code you didn’t write, MicroVMs hand each session a snapshot-backed, isolated environment without your own virtualization stack.

[1] Every Agent Needs Its Own Computer

Oak: Version Control Built for AI Agents, Not Humans

Estimated read time: 3 min

Once the code runs safely, it has to be tracked. Oak is an open-source VCS reimagined around how coding agents work, using branch-per-session as the unit of work and branch descriptions over commit messages. Content-addressed lazy mounts get an agent editing any repo in seconds, faster than git.

What’s interesting: If agents are doing the committing, git’s human-centric assumptions are worth questioning, and Oak shows what a ground-up rethink looks like.

RubyLLM: One Framework for Every AI Provider

Estimated read time: 4 min

Whatever model does the work, you still have to reach it. RubyLLM unifies OpenAI, Claude, Gemini, Ollama, and more behind one consistent Ruby interface with three dependencies, handling chat, vision, transcription, embeddings, tools, and agents. Rails developers get ActiveRecord integration via acts_as_chat plus a generator for a ready-to-use chat UI.

The payoff: If you build in Ruby, you can swap between GPT, Claude, and local models without rewriting your code against each provider’s bespoke client.

Ornith-1.0: Open-Source Models That Write Their Own Scaffolds

Estimated read time: 7 min

And the models worth reaching are increasingly open. Ornith-1.0 is a self-improving family of open-source coding models spanning 9B to 397B parameters, built on the idea that the model learns to generate both solutions and the scaffolds that guide them. The flagship matches Claude Opus 4.7 on coding benchmarks.

Why now: Open-source agentic coding models are now competitive with frontier proprietary systems, and self-improving scaffold generation is a big reason the gap keeps closing.

Adrafinil: Keep Your Mac Awake Only While Agents Work

Estimated read time: 5 min

Finally, a small fix for long-running agents. Adrafinil is a macOS menu bar app that blocks sleep, including lid-closed clamshell sleep, only while an AI coding agent holds an active session. Unlike caffeinate or Amphetamine, it is agent-aware, not always-on, wiring into nine agent hooks and releasing when work ends.

NEWS & EDITORIALS

The editorials circle the limits closing in on AI’s growth: the heat its data centers have to shed, the ceiling on how much larger models can get, and the bill arriving as the subsidies end.

Hotter Than a Hot Tub: NVIDIA’s Bet on 100% Liquid-Cooled AI Factories

Estimated read time: 5 min

Start with the physical limit. NVIDIA’s Rubin generation is the first AI infrastructure to run 100% liquid cooling at 45°C coolant, no fans, no cold aisles. The counterintuitive insight is that hotter coolant enables chiller-less operation in favorable climates, cutting energy and water use toward zero.

The context: As compute scales, cooling efficiency becomes as strategic as the silicon, and running coolant hotter, not colder, is how the industry closes the energy gap.

How Far Can Model Size Scale Before 2031?

Estimated read time: 8 min

From heat to sheer size, this analysis projects how large AI models could grow through 2031, weighing compute budgets, hardware trends, and economic limits. It argues that growth slows as costs hit hardware and funding ceilings, and that parameter counts may plateau before the decade closes.

The signal: Don’t assume models keep getting bigger forever. Compute economics may cap parameter growth, shifting the edge toward efficiency and data quality.

The AI Subsidy Is Ending, and the Bills Are Coming Due

Estimated read time: 11 min

Extending our coverage of the shift toward cheaper models[2] last week, this piece puts the squeeze in hard numbers. A $200 monthly subscription can burn $8,000 in Anthropic tokens or $14,000 in OpenAI tokens, and as platforms move to token-based billing, servicing AI’s projected debt would require displacing roughly 27% of US jobs. OpenAI lost $38.5 billion in 2025.

Plan ahead: Budget now for metered token costs, because the era of subsidized AI coding tools is winding down and your bill could jump several-fold overnight.

[2] The Coming Shift From Bigger Models to Cheaper Ones

Discussion about this post

Ready for more?