Small Language Models for Agents

Thoughts on the agent loop.

Peter Belcak et al. on 2025-09-15:

Small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI.

...

We contend that SLMs are:

  1. principally sufficiently powerful to handle language modeling errands of agentic applications;
  2. inherently more operationally suitable for use in agentic systems than LLMs;
  3. necessarily more economical for the vast majority of LM uses in agentic systems than their general-purpose LLM counterparts by the virtue of their smaller size;

and that on the basis of views 1–3 SLMs are the future of agentic AI.

As I've gotten more familiar with coding agents, I've come to this perspective, though perhaps not as definitively. I've been pushed in this direction by the natural pressures of working with them. Given the shift toward token-based pricing and the pronounced slowness of reasoning models, I routinely switch between specific LLMs for specific jobs.

I'll use o3, gpt-5-medium, or gpt-5-high to analyze sections of code and produce a plan, then switch to claude-4, claude-4.5, gpt-5-codex, or others to quickly execute either the whole plan or parts of it. I've heard of folks using smaller, faster models for executing tasks, like gemini-flash-2.5 or haiku-4.5. Further, some folks using Claude Code seem to get mileage from sub-agents.
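To make that split concrete, here is a minimal sketch of routing by phase. The model names are the ones mentioned above, but the `Phase` enum, the `MODEL_FOR_PHASE` table, and `pick_model` are hypothetical names I'm making up for illustration, and the particular pairing of phase to model is just one plausible choice, not a recommendation.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phases of a coding session."""
    PLAN = "plan"              # slow, careful analysis of the code
    EXECUTE = "execute"        # carrying out the plan quickly
    QUICK_TASK = "quick_task"  # small, repetitive chores


# Illustrative mapping only: one slower reasoning model for planning,
# faster models for everything else.
MODEL_FOR_PHASE = {
    Phase.PLAN: "gpt-5-high",
    Phase.EXECUTE: "gpt-5-codex",
    Phase.QUICK_TASK: "haiku-4.5",
}


def pick_model(phase: Phase) -> str:
    """Return the model name configured for the given phase."""
    return MODEL_FOR_PHASE[phase]
```

In practice I do this switching by hand; a table like this is only the first thing an orchestrator would need.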

I've been getting the itch to write an agent so that I could play around with smaller language models as pieces of some larger agentic orchestration. The large models are very useful for understanding what my input means in context, as well as digging through a codebase to work out what it does and figure out solutions. However, I'm noticing a number of repetitive tasks that are easy enough to distill into English guidelines, but which always seem to get skipped or missed when included as part of an LM's agent loop. I keep wanting finer-grained control over the loop, perhaps spawning an agent for implementing and iterating on unit tests until they pass, or one dedicated to fixing merge conflicts.
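The "iterate on unit tests until passing" idea might look something like the sketch below. Everything here is an assumption on my part: `CheckResult`, `LoopOutcome`, and `iterate_until_passing` are hypothetical names, and the three callables stand in for whatever would actually run the test suite, ask a small model for a patch, and apply it. The point is only that the loop is owned by ordinary code, so the "run the tests again" step cannot be skipped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    output: str  # test runner output, fed back to the model


@dataclass
class LoopOutcome:
    passed: bool
    fixes_applied: int


def iterate_until_passing(
    run_tests: Callable[[], CheckResult],
    propose_fix: Callable[[str], str],   # hypothetical: small model call
    apply_fix: Callable[[str], None],    # hypothetical: writes the patch
    max_fixes: int = 5,
) -> LoopOutcome:
    """Run tests, ask for a fix on failure, and repeat up to max_fixes times."""
    fixes = 0
    while True:
        result = run_tests()
        if result.passed:
            return LoopOutcome(passed=True, fixes_applied=fixes)
        if fixes >= max_fixes:
            # Give up and hand control back to the orchestrator.
            return LoopOutcome(passed=False, fixes_applied=fixes)
        apply_fix(propose_fix(result.output))
        fixes += 1
```

A dedicated merge-conflict agent could reuse the same shape, with "are there conflict markers left?" as the check instead of the test suite.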