CLI > JSON: Why Tool Calling Might Be Hitting a Wall

Traditional function calling was a big step forward in the early post-ChatGPT era of APIs and JSON mode. Pydantic-style schemas gave us structured, reliable calls. But today, it’s starting to feel like a scaling bottleneck: too many tokens, too much upfront context JUST to decide which tool to call. Context is GOLD, and shouldn’t be wasted so easily.

A former Manus backend lead wrote an interesting thread on r/LocalLLaMA about why he stopped using function calling entirely. His approach feels way more natural: remove the rigid JSON schemas and give the agent a Unix-like CLI.

LLMs are already GREAT at writing Unix commands (and code in general), and most repos in training datasets contain bash scripts, so why not play to their strengths?

BUT wait … what makes it work is not just the CLI

It’s the good design behind it!!

Let’s get back to scale, which was the issue with JSON tool calling. How do you solve it?

If you design a mini-shell that mimics a Unix shell, with the 4 main operators (|, &&, ||, ;), then you can compose N commands, each one handling a stream of text in and out. Result: the composition space grows exponentially with the length of the chain, and the REAL power is that to the LLM this is just a sequence of CLI commands. A sequence of tokens that, statistically, it’s already GREAT at generating!!
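
To make the composition idea concrete, here is a minimal sketch of such a mini-shell in Python. It is not the implementation from the thread: the command names (echo, upper, grep), the (stdout, exit code) return convention and the use of shlex for tokenizing are all assumptions made for illustration.

```python
import shlex
from typing import Callable

# Assumed convention: a command takes (args, stdin_text)
# and returns (stdout_text, exit_code).
Command = Callable[[list[str], str], tuple[str, int]]

def echo(args: list[str], stdin: str) -> tuple[str, int]:
    return " ".join(args) + "\n", 0

def upper(args: list[str], stdin: str) -> tuple[str, int]:
    return stdin.upper(), 0

def grep(args: list[str], stdin: str) -> tuple[str, int]:
    if not args:
        return "usage: grep PATTERN\n", 2
    hits = [line for line in stdin.splitlines() if args[0] in line]
    return "".join(hit + "\n" for hit in hits), 0 if hits else 1

# Hypothetical toy commands, only here to exercise the operators.
COMMANDS: dict[str, Command] = {"echo": echo, "upper": upper, "grep": grep}
OPERATORS = {"|", "&&", "||", ";"}

def tokenize(line: str) -> list[str]:
    # punctuation_chars=True makes shlex return |, &&, || and ; as their own
    # tokens. Limitation: a quoted "|" is not told apart from the operator.
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

def parse(line: str) -> list[tuple[str, list[list[str]]]]:
    """Group tokens into (connective, pipeline) pairs; a pipeline is a list of argv."""
    jobs, connective, pipeline, argv = [], ";", [], []
    for token in tokenize(line) + [";"]:
        if token not in OPERATORS:
            argv.append(token)
            continue
        if argv:
            pipeline.append(argv)
            argv = []
        if token != "|":  # &&, || and ; close the current pipeline
            if pipeline:
                jobs.append((connective, pipeline))
            connective, pipeline = token, []
    return jobs

def run(line: str) -> tuple[str, int]:
    """Run a command line and return (text shown to the agent, last exit code)."""
    shown, code = [], 0
    for connective, pipeline in parse(line):
        # && runs only after success, || only after failure, ; always.
        if (connective == "&&" and code != 0) or (connective == "||" and code == 0):
            continue
        data = ""
        for argv in pipeline:
            command = COMMANDS.get(argv[0])
            if command is None:
                # Simplification: an unknown command aborts the whole pipeline.
                data, code = f"{argv[0]}: command not found\n", 127
                break
            data, code = command(argv[1:], data)  # stdout feeds the next stdin
        shown.append(data)
    return "".join(shown), code
```

With this sketch, run("echo hello world | upper") returns ("HELLO WORLD\n", 0), and run("echo a | grep z || echo fallback") falls through to the fallback because grep exits with 1. Adding a command means adding one function to COMMANDS, and the four operators compose it with everything else for free.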

At the level of each command, you do not need to give the agent a schema upfront. You can pass just a token-light short description (<300 tokens), just like what Anthropic recommends for the Skills frontmatter. This should be enough to make the call decision.
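
As a rough illustration, the upfront context can be just a catalog of one-liners. The sketch below is an assumption-heavy example: CommandCard, render_catalog, the three sample commands and the "4 characters per token" estimate are all hypothetical, and a real setup would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass

MAX_DESCRIPTION_TOKENS = 300  # the budget quoted above

@dataclass(frozen=True)
class CommandCard:
    name: str
    description: str  # one or two sentences, no flags, no schema

def rough_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token), NOT a real tokenizer.
    return max(1, len(text) // 4)

def render_catalog(cards: list[CommandCard]) -> str:
    """Build the only tool-related text the agent sees before deciding."""
    lines = ["Available commands (call one with no arguments to see its usage):"]
    for card in cards:
        if rough_tokens(card.description) > MAX_DESCRIPTION_TOKENS:
            raise ValueError(f"{card.name}: description exceeds the token budget")
        lines.append(f"  {card.name:<8}{card.description}")
    return "\n".join(lines)

# Hypothetical commands, for illustration only.
CARDS = [
    CommandCard("cat", "Print the text content of a file."),
    CommandCard("grep", "Keep only the input lines that contain a pattern."),
    CommandCard("http", "Send an HTTP request and print the response."),
]
```

Everything else (arguments, flags, examples) stays out of the prompt until the agent actually calls the command.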

Once the agent has decided to call a command, it enters a natural 3-phase loop:

Discovery: the Unix “--help” we’ve been using all our lives is actually Progressive Disclosure of context for us humans!!! The “no-argument call” of a command is what we typically do every day to get a minimal doc, usually 1-2 sentences max. This is the key to scale!!
- Ex: when we call http (my fav HTTP CLI tool) without any argument, it returns a minimal doc, like:
```
usage:
    http [METHOD] URL [REQUEST_ITEM ...]
error:
    the following arguments are required: URL
```
No noise, no thousand tokens, just the minimal doc.
Correction: Error messages = Navigation guide. Just like CLI errors: clear, concise messages of “what to do instead” → minimal recovery cost.
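
Discovery and Correction can live in the same command. Below is a small sketch, assuming a hypothetical in-memory cat command (the FILES dict, the usage text and the hint wording are made up for illustration): an empty call returns the usage, and a failed call says what to try next.

```python
import difflib

# Hypothetical in-memory "filesystem", for illustration only.
FILES = {"notes.txt": "buy milk\n", "todo.md": "- ship it\n"}

USAGE = "usage: cat FILE\n"

def cat(args: list[str]) -> tuple[str, str, int]:
    """Return (stdout, stderr, exit_code)."""
    if not args:
        # Discovery: the empty call answers with a minimal usage line.
        return "", USAGE + "error: the following arguments are required: FILE\n", 2
    name = args[0]
    if name in FILES:
        return FILES[name], "", 0
    # Correction: say what went wrong AND what to do instead.
    close = difflib.get_close_matches(name, list(FILES), n=1)
    if close:
        hint = f"did you mean '{close[0]}'?"
    else:
        hint = "run 'ls' to list available files"  # 'ls' is assumed to exist
    return "", f"cat: {name}: no such file; {hint}\n", 1
```

Here cat(["note.txt"]) fails with "cat: note.txt: no such file; did you mean 'notes.txt'?", so the agent's next call can be the right one instead of a blind retry.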

stderr is the information agents need most, precisely when commands fail. Never drop it.
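
When commands are real processes, "never drop it" mostly means capturing both streams and handing both to the agent. A minimal sketch with Python's subprocess module follows; the run_for_agent name and the "[stderr]" label are assumptions, not a standard format.

```python
import subprocess

def run_for_agent(argv: list[str]) -> str:
    """Run a process and return stdout plus stderr as one text block."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    sections = []
    if proc.stdout.strip():
        sections.append(proc.stdout.rstrip("\n"))
    if proc.stderr.strip():
        # Keep stderr and label it, so the agent can tell it apart from output.
        sections.append("[stderr]\n" + proc.stderr.rstrip("\n"))
    return "\n".join(sections)
```

A command that prints partial output and then fails comes back with both parts, which is exactly the case where the agent needs the error text to decide its next move.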

Implicit Learning (internalizing patterns in-context): let the agent get better at using the system over time. This can be done by adding consistent signals to every tool result, e.g.:
- leverage standard exit codes (0 = success, 1 = error, 127 = command not found)
- add duration → cost awareness
  - 12ms → cheap, you can call this in the future freely
  - 45s → expensive, use wisely

Consistent outputs = smarter agent. Just PURE stats: LLMs remember patterns/rules when they appear consistently in context. Inconsistency → every call feels like the first.
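
One way to get that consistency is to append the exact same footer to every result. The sketch below assumes a hypothetical "[exit:CODE | DURATION]" format and helper names; the point is only that the shape never changes between calls.

```python
import time
from typing import Callable

def format_duration(seconds: float) -> str:
    """Render 0.012 as '12ms' and 45.0 as '45s'."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    return f"{seconds:.0f}s"

def render_result(output: str, exit_code: int, seconds: float) -> str:
    """Append the same footer to every result, success or failure."""
    footer = f"[exit:{exit_code} | {format_duration(seconds)}]"
    body = output.rstrip("\n")
    return f"{body}\n{footer}" if body else footer

def timed(command: Callable[[], tuple[str, int]]) -> str:
    """Run a zero-argument command returning (output, exit_code) and time it."""
    start = time.perf_counter()
    output, exit_code = command()
    return render_result(output, exit_code, time.perf_counter() - start)
```

For example, render_result("hi\n", 0, 0.012) gives "hi\n[exit:0 | 12ms]". After a few calls the agent has seen which commands are cheap and which ones fail, without a single extra line of instructions.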

Takeaways

This aligns a lot with what we’ve been pushing internally at Cominty to scale our AGaaS infra. Even the most experienced devs, when stumbling on a new CLI tool, naturally enter this 3-phase loop:

- try an empty call or get help
- see errors, adjust and iterate

Programmers (including me) building software and CLIs should design good interaction loops not only for their colleagues but also for AGENTS!!

The 3 techniques give a progression:

--help       →  "What can I do?"        →  Proactive discovery
Error Msg    →  "What should I do?"     →  Reactive correction (steer in the right direction for next call)
Output Fmt   →  "How did it go?"        →  Continuous learning (to internalize success/error patterns + anticipate)