#ai-agents #prompt-engineering #architecture #ux

Hick's Law: Why Your AI Agents Get 'Dumber' with Too Many Tools

BLUF: Applied to AI agents, Hick's Law suggests that tool-selection accuracy drops as the number of tools in the prompt grows. To maintain high performance, we must shift from "God Agents" with dozens of tools to Multi-Agent Orchestration or JIT-Tooling (Just-In-Time) architectures.


If you're designing agentic workflows, you've probably felt the temptation to give your LLM every possible tool: "read files, query the database, perform microservices orchestration, trigger predictive log analysis...".

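The problem is that models, much like humans, suffer from something akin to cognitive load: the more options (tools) the model has to evaluate for a single step, the higher the probability it will choose the wrong one or hallucinate parameters. Hick's Law, from human psychology, describes this effect: $RT = a + b \log_2(n)$, where $RT$ is reaction time, $n$ is the number of equally likely options, and $a$ and $b$ are empirically fitted constants. Because the growth is logarithmic, every doubling of $n$ adds the same fixed cost $b$. For LLMs the law is an analogy rather than a proof, but it captures the same trend: more options mean slower and less reliable choices.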
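
As a quick worked example of the formula, the sketch below evaluates Hick's Law for a few tool counts. The constants `A` and `B` are made-up illustrative values, not measurements of any model, and the helper name `hick_reaction_time` is hypothetical.

```python
import math

def hick_reaction_time(n: int, a: float, b: float) -> float:
    """Hick's Law: RT = a + b * log2(n) for n equally likely options."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + b * math.log2(n)

# Illustrative constants only (assumed, not measured):
# A = base decision cost, B = extra cost per doubling of the option count.
A, B = 0.2, 0.15

for n in (5, 10, 20, 40):
    print(f"n={n:>2}  RT={hick_reaction_time(n, A, B):.3f}")
```

Going from 5 to 40 tools is three doublings ($\log_2 8 = 3$), so under these assumptions the modeled time grows by exactly $3b$.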

The "Tool Sprawl" Phenomenon

In my experience building this portfolio and automating my workflows, the agent's success rate plummets once it has more than 10-15 tools. The model suffers from a variant of the "Lost in the Middle" phenomenon: the original paper (Liu et al. 2023) focuses on long-context retrieval, and 2025 research on "Attention Dilution" reports that the effect carries over to tool selection. When you inject 20-30 schemas into the prompt, the descriptions in the middle get "lost" as attention dilutes, leading to more hallucinations.

Not all models or architectures tolerate the same load. Here are my realistic estimates based on stress tests:

| Architecture | Total Capacity | Tools per Call | Technical Note |
| :--- | :--- | :--- | :--- |
| Single Agent Vanilla | 5-8 tools | All | Safe limit to avoid degradation. |
| JIT-Tooling (RAG) | 15-40 tools | 3-8 tools | Just-in-Time injection via semantic RAG. |
| Hierarchical Multi-Agent | 50-200+ tools | 3-5 tools | Orchestration via "Router Agents". |
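
To make the "Tools per Call" column concrete, here is a minimal sketch of hierarchical routing, assuming a router that sees only one line per sub-agent and a specialist that sees only its own tools. The agent names, tool names, and the keyword-matching `route` function are hypothetical stand-ins; in a real system the routing decision would be an LLM call.

```python
from dataclasses import dataclass

@dataclass
class SubAgent:
    name: str
    description: str
    tools: list[str]  # tool names only; real JSON schemas omitted for brevity

# Hypothetical specialists with illustrative tool sets.
AGENTS = [
    SubAgent("coder", "writes and edits code", ["read_file", "write_file", "run_tests"]),
    SubAgent("researcher", "searches docs and the web", ["web_search", "fetch_url", "summarize"]),
    SubAgent("ops", "inspects infrastructure", ["query_db", "tail_logs", "restart_service"]),
]

def flat_options(agents: list[SubAgent]) -> int:
    """n for a single 'God Agent' that sees every tool at once."""
    return sum(len(a.tools) for a in agents)

def hierarchical_options(agents: list[SubAgent], chosen: SubAgent) -> tuple[int, int]:
    """n at each step: the router picks among agents, then the specialist among its tools."""
    return len(agents), len(chosen.tools)

def route(task: str, agents: list[SubAgent]) -> SubAgent:
    """Stand-in for the router LLM call: naive word overlap with each agent description."""
    words = set(task.lower().split())
    return max(agents, key=lambda a: len(words & set(a.description.split())))

task = "search the docs for the retry policy"
chosen = route(task, AGENTS)
print(chosen.name, flat_options(AGENTS), hierarchical_options(AGENTS, chosen))
# prints: researcher 9 (3, 3)
```

With three specialists of three tools each, a flat agent would choose among 9 tools, while the hierarchical version never shows more than 3 options at any single step. The same structure scales toward the larger capacities in the table by adding specialists rather than growing any one prompt.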

Model Robustness Matters

Not all LLMs suffer equally from Hick's Law. Latest-generation models with strong native tool calling, such as Claude 4.6 Sonnet or GPT-5.4, tolerate windows of $n = 25$ to $40$ tools with high precision, something unthinkable two years ago with GPT-4o or Llama-3.1, which started failing much sooner. Still, the principle holds: a lower $n$ means lower latency and more deterministic tool selection.

Mitigation Strategies

To keep your agents sharp, you must reduce the $n$ value in every interaction:

| Strategy | Technical Action | Load Impact |
| :--- | :--- | :--- |
| Multi-Agent Delegation | Split a "God Agent" into specialized sub-agents (e.g., Coder, Researcher). | High Reduction: Each agent only sees the 3-5 tools in its niche. |
| JIT-Tooling (RAG) | Use a tool RAG to inject only the most likely tools based on current context. | Max Efficiency: The prompt stays clean and focused. |
| API Abstraction | Unify multiple granular endpoints into a single "Swiss Army Knife" with flexible parameters. | Simplification: The model makes one high-level decision instead of 20 small ones. |
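
The JIT-Tooling strategy can be sketched as a small retrieval step that runs before each model call: score every tool description against the current request and inject only the top `k`. This version uses a toy bag-of-words cosine similarity from the standard library so it runs anywhere; a real tool RAG would presumably use an embedding model and a vector index instead. The tool registry and the `select_tools` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tool registry: tool name -> natural-language description.
TOOLS = {
    "read_file": "read the contents of a file from the repository",
    "write_file": "write or overwrite a file in the repository",
    "query_db": "run a read-only SQL query against the database",
    "tail_logs": "fetch the latest log lines for a service",
    "web_search": "search the web for documentation",
    "send_email": "send an email to a recipient",
}

def vectorize(text: str) -> Counter:
    # Toy bag-of-words vector (word -> count); stands in for an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing words, so the dot product only needs a's keys.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    q = vectorize(query)
    ranked = sorted(tools, key=lambda name: cosine(q, vectorize(tools[name])), reverse=True)
    return ranked[:k]

print(select_tools("show the latest error log lines for the payment service", TOOLS, k=2))
# prints: ['tail_logs', 'web_search']
```

Only the returned tools' schemas would be placed in the prompt, so the model's $n$ stays at `k` no matter how large the registry grows. Common words like "the" still contribute to the score here; filtering stop words or switching to embeddings would make the ranking more robust.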

Senior Conclusion: Less is More

Just like in SEO and Citability (GEO), density and relevance beat volume. Don't flood your agent with "just in case" tools. Design architectures where the AI always has the simplest path to the solution.


References
