CLI > JSON: Why Tool Calling Might Be Hitting a Wall
Traditional function calling was a big step forward in the early post-ChatGPT API and JSON-mode era. Pydantic-style schemas gave us structured, reliable calls. But today, it’s starting to feel like a scaling bottleneck.
Too many tokens, too much upfront context JUST to decide what tool to call.
Context is GOLD, and shouldn’t be wasted so easily.
A former Manus backend lead wrote an interesting thread on r/LocalLLaMA about why he stopped using function calling entirely.
His approach feels way more natural: remove the rigid JSON schemas and give the agent a Unix-like CLI.
LLMs are already GREAT at writing Unix commands (and code in general). Most repos in training datasets contain bash scripts, so why not play to their strengths?
BUT wait … what makes it work is not just the CLI
It’s the good design behind it !!
Let’s get back to scale, which was the issue with JSON tool calling. How do you solve it?
If you design a mini-shell that mimics a UNIX shell, with the 4 main operators (|, &&, ||, ;), then you can compose N commands (each command handling a stream of text in/out).
Result = the composition space grows exponentially with the length of the chain, and the REAL power is that to the LLM this is just a sequence of CLI commands. A sequence of tokens that, statistically, it’s already GREAT at generating !!
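To make the composition idea concrete, here is a minimal sketch of such a mini-shell in Python. It is an illustration under assumptions, not the implementation from the thread: the three commands (echo, upper, grep), the `run_shell` and `run_pipeline` names, and the `(exit_code, stdout, stderr)` result shape are all hypothetical. It only shows how the 4 operators chain text-in/text-out commands.

```python
import shlex
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, str, str]                 # (exit_code, stdout, stderr)
Command = Callable[[List[str], str], Result]  # (args, stdin) -> Result

# Hypothetical toy commands: each one reads text and writes text.
def cmd_echo(args: List[str], stdin: str) -> Result:
    return 0, " ".join(args) + "\n", ""

def cmd_upper(args: List[str], stdin: str) -> Result:
    return 0, stdin.upper(), ""

def cmd_grep(args: List[str], stdin: str) -> Result:
    if not args:
        return 2, "", "usage: grep PATTERN\n"
    hits = [line for line in stdin.splitlines() if args[0] in line]
    return (0 if hits else 1), "".join(h + "\n" for h in hits), ""

COMMANDS: Dict[str, Command] = {"echo": cmd_echo, "upper": cmd_upper, "grep": cmd_grep}

def run_pipeline(tokens: List[str]) -> Result:
    """Run `a | b | c`: each stage's stdout becomes the next stage's stdin."""
    stages: List[List[str]] = [[]]
    for tok in tokens:
        if tok == "|":
            stages.append([])
        else:
            stages[-1].append(tok)
    code, data, errors = 0, "", ""
    for stage in stages:
        if not stage:
            return 2, "", errors + "syntax error: empty command in pipeline\n"
        command = COMMANDS.get(stage[0])
        if command is None:
            # Simplification: abort the pipeline here instead of continuing.
            return 127, "", errors + f"{stage[0]}: command not found\n"
        code, data, err = command(stage[1:], data)
        errors += err
    return code, data, errors  # exit code of the last stage

def run_shell(line: str) -> Result:
    """Run pipelines joined by `&&`, `||` and `;`, evaluated left to right."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    code, out, errors = 0, "", ""
    pending_op, current = ";", []
    for tok in list(lexer) + [";"]:  # trailing ";" flushes the last pipeline
        if tok not in ("&&", "||", ";"):
            current.append(tok)
            continue
        if current:
            should_run = (
                pending_op == ";"
                or (pending_op == "&&" and code == 0)
                or (pending_op == "||" and code != 0)
            )
            if should_run:
                code, o, e = run_pipeline(current)
                out, errors = out + o, errors + e
            current = []
        pending_op = tok
    return code, out, errors
```

For example, `run_shell("echo foo | grep bar || echo fallback")` runs the fallback branch because grep finds nothing and exits with 1. From the model's side, that whole line is one short string to generate, not a nested JSON payload.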
At the level of each command, you do not need to give a schema upfront. A token-light short description (<300 tokens) is enough to make the call decision, just like what Anthropic recommends for the Skills frontmatter.
Once the agent has decided to call a command, it enters a natural 3-phase loop:
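As a rough sketch of what "short description instead of schema" could look like, the snippet below renders a command catalog for the system prompt. Everything in it is an assumption for illustration: the `CommandCard` type, the three command names and summaries, and the 4-characters-per-token estimate, which is a crude heuristic and not a real tokenizer.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CommandCard:
    name: str      # what the agent types
    summary: str   # one line, just enough to decide whether to call it

# Hypothetical commands exposed by the mini-shell.
CARDS: List[CommandCard] = [
    CommandCard("fetch", "Send an HTTP request and print the response body."),
    CommandCard("grep", "Keep only the input lines that contain a pattern."),
    CommandCard("summarize", "Condense the input text into a few sentences."),
]

def render_catalog(cards: List[CommandCard]) -> str:
    """Build the only tool text the agent sees upfront: names and one-liners."""
    width = max(len(card.name) for card in cards)
    lines = ["Available commands (call one with no arguments to see its usage):"]
    lines += [f"  {card.name.ljust(width)}  {card.summary}" for card in cards]
    return "\n".join(lines)

def rough_token_count(text: str) -> int:
    """Crude estimate (assumption): roughly 4 characters per token."""
    return len(text) // 4

catalog = render_catalog(CARDS)
```

Argument details stay out of the prompt until the agent actually picks a command and asks for its usage.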
- Discovery: the Unix “--help” we’ve been using all our lives is actually Progressive Disclosure of context for us humans !!!
The “no-argument call” of a command is what we typically do every day to get a minimal doc, usually 1-2 sentences max (no noise, no thousand tokens) !!!
This is the key to scale !!
- Ex: when we call http (my fav HTTP CLI tool) without any argument, it returns a minimal doc, like:
  usage: http [METHOD] URL [REQUEST_ITEM ...]
  error: the following arguments are required: URL
- Correction: Error messages = navigation guide. Just like good CLI errors: clear, concise messages of “what to do instead” → minimal recovery cost.
stderr is the information agents need most, precisely when commands fail. Never drop it.
- Implicit Learning (internalizing patterns in-context): let the agent get better at using the system over time.
This can be done by adding consistent signals to every tool result, e.g.:
  - leverage standard exit codes (0 = success, 1 = error, 127 = command not found)
  - add duration → cost awareness
    - 12ms → cheap, you can call this freely in the future
    - 45s → expensive, use wisely
Consistent outputs = smarter agent. Just PURE stats: LLMs pick up patterns/rules when they appear consistently in context. Inconsistency → makes every call feel like the first.
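The three phases can be sketched in one small wrapper. This is an illustration under assumptions, not the thread's code: the in-memory `FILES` store, the toy `cat` command, the `[stderr]` prefix and the `[exit:N | duration]` footer format are all hypothetical choices. Using exit code 2 for usage errors is a common convention that is assumed here.

```python
import time
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, str, str]  # (exit_code, stdout, stderr)

FILES: Dict[str, str] = {"notes.txt": "buy milk\n"}  # hypothetical in-memory files

def cmd_cat(args: List[str]) -> Result:
    if not args:
        # Discovery: a no-argument call returns a minimal usage line.
        return 2, "", "usage: cat FILE\n"
    name = args[0]
    if name not in FILES:
        # Correction: the error says what to do instead.
        available = ", ".join(sorted(FILES))
        return 1, "", f"cat: {name}: no such file. Available files: {available}\n"
    return 0, FILES[name], ""

COMMANDS: Dict[str, Callable[[List[str]], Result]] = {"cat": cmd_cat}

def format_duration(ms: float) -> str:
    return f"{ms:.0f}ms" if ms < 1000 else f"{ms / 1000:.1f}s"

def run_for_agent(name: str, args: List[str]) -> str:
    """Run a command and return the text the agent sees, always in the same shape."""
    start = time.perf_counter()
    command = COMMANDS.get(name)
    if command is None:
        known = ", ".join(sorted(COMMANDS))
        code, out, err = 127, "", f"{name}: command not found. Available commands: {known}\n"
    else:
        code, out, err = command(args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    body = out
    if err:
        body += "[stderr] " + err  # never drop stderr
    # Implicit learning: the same footer on every result (exit code + duration).
    return f"{body}[exit:{code} | {format_duration(elapsed_ms)}]"
```

A failed call such as `run_for_agent("cat", ["todo.txt"])` comes back with the stderr line listing the available files plus an `[exit:1 | …]` footer, so the next attempt can be corrected without an extra round of discovery.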
Takeaways
This aligns a lot with what we’ve been pushing internally at Cominty to scale our AGaaS infra. Even the most experienced devs, when stumbling on a new CLI tool, naturally enter this 3-phase loop:
- try an empty call or get help
- see errors, adjust and iterate.
Programmers (including me) building software and CLIs should design good interaction loops not only for their colleagues but also for AGENTS !!
The 3 techniques give a progression:
--help → "What can I do?" → Proactive discovery
Error Msg → "What should I do?" → Reactive correction (steer in the right direction for next call)
Output Fmt → "How did it go?" → Continuous learning (to internalize success/error patterns + anticipate)