
CLI > JSON: Why Tool Calling Might Be Hitting a Wall

Traditional function calling was a big step forward in the early post-ChatGPT era of APIs and JSON mode. Pydantic-style schemas gave us structured, reliable calls. But today, it’s starting to feel like a scaling bottleneck: too many tokens, too much upfront context JUST to decide which tool to call. Context is GOLD, and shouldn’t be wasted so easily.

A former Manus backend lead wrote an interesting thread on r/LocalLLaMA about why he stopped using function calling entirely. His approach, which feels way more natural: remove the rigid JSON schemas and give the agent a Unix-like CLI.

LLMs are already GREAT at writing Unix commands (and code in general), and most repos in training datasets contain bash scripts. Why not play to their strengths?

BUT wait … what makes it work is not just the CLI

It’s the good design behind it !!

Let’s get back to scale, which was the issue with JSON tool calling. How do you solve it?

If you design a mini-shell that mimics the UNIX shell, with the 4 main operators (| / && / || / ;), then you can compose N commands, each one handling a stream of text in and out. Result = the composition space grows exponentially with the length of the chain, and the REAL power is that to the LLM this is just a sequence of CLI commands. A sequence of tokens that, statistically, it’s already GREAT at generating !!
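To make the composition idea concrete, here is a minimal sketch of such a mini-shell in Python. Everything in it is an assumption for illustration, not the thread author's implementation: the four toy commands (echo, grep, wc, false), the `run_line` and `run_pipeline` names, and the simplification that there is no quoting or escaping support.

```python
import re
from typing import Callable, Dict, List, Tuple

# A command takes (args, stdin_text) and returns (exit_code, stdout_text).
Command = Callable[[List[str], str], Tuple[int, str]]

def cmd_echo(args: List[str], stdin: str) -> Tuple[int, str]:
    return 0, " ".join(args) + "\n"

def cmd_grep(args: List[str], stdin: str) -> Tuple[int, str]:
    if not args:
        return 2, ""
    hits = [line for line in stdin.splitlines() if args[0] in line]
    return (0 if hits else 1), "".join(h + "\n" for h in hits)

def cmd_wc(args: List[str], stdin: str) -> Tuple[int, str]:
    return 0, f"{len(stdin.splitlines())}\n"

def cmd_false(args: List[str], stdin: str) -> Tuple[int, str]:
    return 1, ""

# Hypothetical toy commands; a real agent shell would register its own.
COMMANDS: Dict[str, Command] = {
    "echo": cmd_echo, "grep": cmd_grep, "wc": cmd_wc, "false": cmd_false,
}

def run_pipeline(pipeline: str) -> Tuple[int, str]:
    """Run `a | b | c`: each stage's stdout becomes the next stage's stdin."""
    code, data = 0, ""
    for stage in pipeline.split("|"):
        tokens = stage.split()  # no quoting support in this sketch
        if not tokens:
            return 2, "syntax error: empty pipeline stage\n"
        handler = COMMANDS.get(tokens[0])
        if handler is None:
            # Simplification: stop the pipeline at the first unknown command.
            return 127, f"{tokens[0]}: command not found\n"
        code, data = handler(tokens[1:], data)
    return code, data

def run_line(line: str) -> Tuple[int, str]:
    """Chain pipelines with `;` (always), `&&` (on success), `||` (on failure)."""
    # Split on the chaining operators first, so a remaining `|` is always a pipe.
    parts = [p.strip() for p in re.split(r"(&&|\|\||;)", line)]
    code, outputs, op = 0, [], ";"
    for part in parts:
        if part in ("&&", "||", ";"):
            op = part
            continue
        if not part:
            continue
        if (op == "&&" and code != 0) or (op == "||" and code == 0):
            continue  # skipped, the previous exit code carries over
        code, text = run_pipeline(part)
        outputs.append(text)
    return code, "".join(outputs)
```

For example, `run_line("echo hello world | grep hello | wc")` pipes three commands together, and `run_line("false && echo skipped || echo recovered")` only runs the last one. To the model, both are just one short string of familiar tokens.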

At the level of each command, you do not need to give the agent a schema upfront. A token-light short description (<300 tokens) is enough to make the call decision, just like what Anthropic recommends for the Skills frontmatter.
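As a rough sketch of what "token-light" can look like, the snippet below renders a one-line-per-command catalog for the system prompt. The command names, their descriptions, and the `render_catalog` helper are hypothetical, and the 4-characters-per-token estimate is only a crude heuristic, not a real tokenizer.

```python
from typing import Dict

# Hypothetical catalog: one short description per command, no schemas.
CATALOG: Dict[str, str] = {
    "http": "Send an HTTP request and print the response.",
    "grep": "Filter input lines that contain a pattern.",
    "wc": "Count lines of input.",
}

def render_catalog(catalog: Dict[str, str]) -> str:
    """Build the only text the agent sees upfront about its commands."""
    width = max(len(name) for name in catalog)
    lines = [f"{name.ljust(width)}  {desc}" for name, desc in sorted(catalog.items())]
    header = "Available commands (run one with no arguments for usage):"
    return header + "\n" + "\n".join(lines)

def rough_token_estimate(text: str) -> int:
    """Crude heuristic (about 4 characters per token); an assumption, not a tokenizer."""
    return len(text) // 4
```

Details such as flags and argument formats stay out of the prompt until the agent actually tries a command.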

Once the agent has decided to call a command, it enters a natural 3-phase loop:

  1. Discovery: the Unix “--help” we’ve been using all our lives is actually Progressive Disclosure of context for us humans !!! The “no-argument call” of a command is what we typically do every day to get a minimal doc, usually 1-2 sentences max (no noise, no thousands of tokens) !!! This is the key to scale !!
    • Ex: when we call http (my fav HTTP CLI tool) without any argument, it returns a minimal doc, like:
    usage:
        http [METHOD] URL [REQUEST_ITEM ...]
    error:
        the following arguments are required: URL
    No noise, no thousands of tokens, just the minimal doc.
  2. Correction: Error messages = Navigation guide. Just like good CLI errors: clear, concise messages of “what to do instead” → minimal recovery cost
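Phases 1 and 2 can be sketched together in one small command handler. The `fetch` command below is hypothetical (it is not HTTPie and sends nothing over the network); it only illustrates a no-argument call that returns a short usage text, and errors that say what to do instead. Exit code 2 for usage errors is a common convention that this sketch assumes.

```python
from typing import List, Tuple

USAGE = "usage: fetch [METHOD] URL"
METHODS = {"GET", "POST", "PUT", "DELETE"}

def fetch(args: List[str]) -> Tuple[int, str, str]:
    """Return (exit_code, stdout, stderr) for a hypothetical `fetch` command."""
    if not args:
        # Discovery: the empty call discloses the minimal doc, nothing more.
        return 2, "", f"{USAGE}\nerror: the following arguments are required: URL\n"
    if len(args) == 1:
        method, url = "GET", args[0]
    else:
        method, url = args[0].upper(), args[1]
    if method not in METHODS:
        # Correction: name the problem and list the valid choices.
        choices = ", ".join(sorted(METHODS))
        return 2, "", f"error: unknown method '{args[0]}'. Use one of: {choices}\n"
    if not url.startswith(("http://", "https://")):
        # Correction: suggest the exact call to try next.
        return 2, "", f"error: '{url}' has no scheme. Try: fetch {method} https://{url}\n"
    return 0, f"(would send {method} {url})\n", ""
```

Calling `fetch(["example.com"])` fails, but the error already contains the corrected command, so the agent's next attempt costs one call instead of a guessing session.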

stderr is the information agents need most, precisely when commands fail. Never drop it.
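A minimal sketch of "never drop stderr", assuming the agent's commands are real processes run through Python's `subprocess` module. The `run_for_agent` name and the `[stderr]` / `[exit:N]` labels are illustrative choices, not a standard.

```python
import subprocess
import sys
from typing import List

def run_for_agent(argv: List[str]) -> str:
    """Run a process and return one text block that keeps stdout, stderr and the exit code."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError:
        # Mirror the shell convention: 127 means "command not found".
        return f"[stderr] {argv[0]}: command not found\n[exit:127]"
    parts = []
    if proc.stdout:
        parts.append(proc.stdout.rstrip("\n"))
    if proc.stderr:
        # stderr is labeled and kept, even when stdout is empty.
        parts.append("[stderr] " + proc.stderr.rstrip("\n"))
    parts.append(f"[exit:{proc.returncode}]")
    return "\n".join(parts)

if __name__ == "__main__":
    failing = "import sys; print('boom', file=sys.stderr); sys.exit(1)"
    print(run_for_agent([sys.executable, "-c", failing]))
```

The design choice here is that a failed call and a successful call have the same shape, so the agent always knows where to look for the reason of a failure.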

  3. Implicit Learning (internalizing patterns in-context): let the agent get better at using the system over time. This can be done by adding consistent signals to every tool result, e.g.:
    • leverage standard exit codes (0 = success / 1 = error / 127 = command not found)
    • add duration → cost awareness
      • 12ms → cheap, call it freely in the future
      • 45s → expensive, use wisely

Consistent outputs = smarter agent. Just PURE stats: LLMs pick up patterns/rules when they appear consistently in context. Inconsistency → makes every call feel like the first.
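One way to get that consistency is to append the same small footer to every tool result. The sketch below assumes a hypothetical `[exit:N | duration]` footer format and a `with_footer` wrapper; the exact layout matters less than using it on every single call.

```python
import time
from typing import Callable, Tuple

def format_duration(seconds: float) -> str:
    """Render durations compactly: milliseconds under one second, seconds above."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    return f"{seconds:.1f}s"

def format_footer(code: int, seconds: float) -> str:
    """Hypothetical footer format, identical for every result."""
    return f"[exit:{code} | {format_duration(seconds)}]"

def with_footer(run: Callable[[], Tuple[int, str]]) -> str:
    """Time a command and append the footer to whatever it printed."""
    start = time.perf_counter()
    code, output = run()
    elapsed = time.perf_counter() - start
    body = output.rstrip("\n")
    return f"{body}\n{format_footer(code, elapsed)}"

if __name__ == "__main__":
    print(with_footer(lambda: (0, "3 files changed\n")))
```

With this in place, a 12ms call and a 45s call look different in context, and the agent can start to treat them differently without any extra instruction.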

Takeaways

This aligns a lot with what we’ve been pushing internally at Cominty to scale our AGaaS infra. Even the most experienced devs, when stumbling on a new CLI tool, naturally enter this 3-phase loop:

  • try an empty call or get help
  • see errors, adjust and iterate.

Programmers (including me) building software and CLIs should design good interaction loops not only for their colleagues but also for AGENTS !!

The 3 techniques give a progression:

--help       →  "What can I do?"        →  Proactive discovery
Error Msg    →  "What should I do?"     →  Reactive correction (steer in the right direction for next call)
Output Fmt   →  "How did it go?"        →  Continuous learning (to internalize success/error patterns + anticipate)