How CLI Tools Can Drastically Reduce Token Consumption ⚡

After discovering that tools were silently burning my context window, I had a new problem:

How do I keep powerful agents without paying the cost of massive schemas?

The answer did not come from AI.

It came from something much older.

👉 The command line.

🧠 The Problem with Traditional Tools

Most agent tools today are defined with JSON schemas, nested parameters, verbose descriptions, and strict typing.

They look clean to us as developers.

But to an LLM?

👉 They are heavy prompt payloads.

Every request includes:

  • full schema
  • all parameters
  • all descriptions

Even if the user only says:

“order coffee”

💥 Why This Does Not Scale

Let’s say you have:

  • 30 tools
  • each with 800 tokens

That means:

👉 24,000 tokens before the user even speaks

And most of those tokens are:

  • never used
  • rarely relevant
  • repeated every time

🔍 Rethinking Tools

So I asked myself:

Do tools really need to be JSON schemas?

Or…

👉 Do they just need to be understandable commands?

⚡ Enter CLI-Style Tools

Instead of this:

{
  "name": "create_order",
  "description": "Create an order for a user",
  "parameters": {
    "type": "object",
    "properties": {
      "userId": { "type": "string" },
      "productId": { "type": "string" },
      "quantity": { "type": "integer" }
    },
    "required": ["userId", "productId", "quantity"]
  }
}

You define tools like this:

create_order uid pid qty

Or even:

order coffee size=large sugar=0

🧬 Why This Works

Because LLMs are incredibly good at:

  • parsing text
  • understanding intent
  • filling structured patterns

You do not need:

  • deep JSON
  • verbose schemas
  • long descriptions

👉 You just need clear syntax

📉 Token Cost Comparison

JSON Tool

~800–1500 tokens

CLI Tool

~10–30 tokens

👉 That is a 50x–100x reduction
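
As a crude illustration, you can compare the two styles directly. This sketch uses whitespace/punctuation splitting as a stand-in for a real tokenizer, so the counts are approximate, and the schema is a made-up example, not a real tool:

```python
import re

# A hypothetical JSON-schema tool definition (abbreviated).
json_tool = """{
  "name": "create_order",
  "description": "Create an order for a user",
  "parameters": {
    "type": "object",
    "properties": {
      "userId": {"type": "string", "description": "ID of the ordering user"},
      "productId": {"type": "string", "description": "ID of the product"},
      "quantity": {"type": "integer", "description": "Number of units"}
    },
    "required": ["userId", "productId", "quantity"]
  }
}"""

# The equivalent CLI-style definition.
cli_tool = "create_order uid pid qty"

def rough_tokens(text: str) -> int:
    # Very rough proxy: count word runs and punctuation marks separately.
    return len(re.findall(r"\w+|[^\w\s]", text))

print(rough_tokens(json_tool), rough_tokens(cli_tool))
```

Even with this crude counter, the JSON version comes out more than an order of magnitude heavier, and real tokenizers tend to make JSON punctuation even more expensive.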

🧠 Less Structure, Better Reasoning

With CLI tools:

  • fewer tokens means more room for conversation
  • simpler format means easier decisions
  • less noise means better accuracy

The model does not need to:

  • parse nested JSON
  • match schemas
  • validate deep structures

It just:

👉 generates a command

⚙️ How It Looks in Practice

User says:

“I want a large coffee with no sugar”

Agent outputs:

order_coffee size=large sugar=0

Your backend:

  • parses the string
  • validates parameters
  • executes the action
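
That backend step can be a few lines. A minimal sketch (the `order_coffee` command is just the running example, not a fixed API):

```python
def parse_command(command: str) -> tuple[str, dict[str, str]]:
    """Split a CLI-style command into its name and key=value arguments."""
    name, *parts = command.split()
    args = {}
    for part in parts:
        # partition keeps everything after the first "=" as the value
        key, _, value = part.partition("=")
        args[key] = value
    return name, args

name, args = parse_command("order_coffee size=large sugar=0")
print(name, args)  # order_coffee {'size': 'large', 'sugar': '0'}
```

Validation and execution then happen in ordinary code, where you have types, tests, and error handling.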

🧩 You Move Complexity Out of the Model

This is the key shift:

Before:

  • the model handles structure
  • the model handles validation
  • the model handles formatting

After:

  • the model generates intent
  • your code handles everything else

👉 Which is exactly what we want as engineers

🚀 Benefits for Real Systems

If you are building something like:

  • WhatsApp or Telegram agents
  • multi-domain assistants

CLI tools give you:

  • massive token savings
  • faster responses
  • lower cost per user
  • better scalability
  • simpler tool definitions

⚠️ Trade-offs

CLI tools are not magic.

You lose:

  • strict schema validation in the prompt
  • automatic argument formatting
  • some guardrails

So you must:

  • validate inputs in your backend
  • handle errors gracefully
  • design clean command syntax

🧠 Best Practices

1. Keep commands short

order_coffee
book_yoga
pay_invoice

2. Use key=value pairs

book_yoga date=2026-03-28 time=10:00

3. Avoid ambiguity

Bad:

order stuff

Good:

order_food item=pizza size=large

4. Build a parser layer in C#

This fits perfectly with a .NET stack:

  • Regex or tokenizer
  • map to DTO
  • validate
  • execute
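
A sketch of that pipeline, shown in Python for brevity (the C#/.NET version maps one-to-one: Regex → DTO → validation → handler). The command name and fields are the article's running example, not a real API:

```python
import re
from dataclasses import dataclass

@dataclass
class BookYoga:  # the DTO
    date: str
    time: str

COMMAND = re.compile(r"^book_yoga\s+date=(\S+)\s+time=(\S+)$")
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
TIME = re.compile(r"^\d{2}:\d{2}$")

def parse_book_yoga(command: str) -> BookYoga:
    # 1. Tokenize with a regex.
    match = COMMAND.match(command.strip())
    if not match:
        raise ValueError(f"unrecognized command: {command!r}")
    dto = BookYoga(date=match.group(1), time=match.group(2))
    # 2. Validate the DTO fields.
    if not DATE.match(dto.date):
        raise ValueError(f"bad date: {dto.date!r}")
    if not TIME.match(dto.time):
        raise ValueError(f"bad time: {dto.time!r}")
    # 3. Return the validated DTO; the caller executes the action.
    return dto

print(parse_book_yoga("book_yoga date=2026-03-28 time=10:00"))
```

Anything the model gets wrong fails loudly here, in your code, instead of silently inside the prompt.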

5. Combine with routing

The best setup looks like this:

Router → Domain → CLI tools

Now you get:

  • small toolsets
  • tiny prompts
  • efficient agents
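
One way to sketch the Router → Domain → CLI tools shape. The domain names and tool lists are assumptions for illustration, and the keyword routing stands in for what would usually be an LLM router:

```python
# Each domain exposes only its own small CLI toolset.
DOMAINS = {
    "food":    ["order_coffee size sugar", "order_food item size"],
    "fitness": ["book_yoga date time", "cancel_class id"],
    "billing": ["pay_invoice id", "refund id amount"],
}

def route(user_message: str) -> tuple[str, list[str]]:
    """Pick a domain and return only that domain's tools for the prompt."""
    text = user_message.lower()
    if "coffee" in text or "food" in text:
        domain = "food"
    elif "yoga" in text or "class" in text:
        domain = "fitness"
    else:
        domain = "billing"
    return domain, DOMAINS[domain]

print(route("I want a large coffee"))
```

The domain agent then sees two tool definitions instead of thirty, and its whole toolset costs fewer tokens than one JSON schema.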

💡 The Big Insight

After all this, I realized:

JSON schemas are for machines.
CLI commands are for language models.

🏁 Conclusion

The goal is not to remove tools.

The goal is to:

👉 make tools cheaper to think about

CLI-style tools:

  • reduce tokens
  • simplify reasoning
  • scale better

And most importantly:


They let your agent focus on what actually matters — understanding the user.

From “Hello” to Quota Exceeded: The Day My Agent Broke 💥


After testing OpenClaw, something clicked.

The future is not chat.

👉 The future is agents.

🛠️ Building the “Perfect” Agent

I started designing what I thought would be the ultimate assistant:

  • General purpose
  • Connected to everything
  • Capable of doing real tasks

And to make that happen…

I built tools.
A lot of tools.

Not generic ones — very specific tools:

  • booking flows
  • ordering systems
  • logistics
  • payments
  • daily life actions

Before I knew it…

👉 My agent had around 50 custom tools

And honestly, it felt powerful.

💡 The Business Idea

The plan was simple:

  • Give users a few free tokens per day
  • Let them try the agent
  • Hook them with real utility

A classic freemium model.

💥 Reality Hit Immediately

What actually happened?

Users would send:

“Hello”

…and then…

👉 Quota exceeded

Not after a conversation.
Not after a task.
After the second request.

🤨 That Made No Sense

At first, I thought:

  • Maybe there’s a bug
  • Maybe token counting is wrong
  • Maybe pricing is off

But everything checked out.

Still:

  • almost no conversation
  • almost no output
  • quota gone

🧠 That’s When I Started Digging

So I did what we always do:

👉 I looked under the hood

And what I found changed how I think about agents completely.

🔍 The Hidden Cost of Tools

I realized something critical:

My agent wasn’t just sending messages.
It was sending all 50 tools on every request.

Every. Single. Time.

📦 What That Actually Means

Each tool had:

  • name
  • description
  • parameters
  • JSON schema
  • nested objects

Individually? Fine.

Together?

👉 Massive.

So even a simple request like:

“Hello”

Was actually being processed like:

[system prompt]
[conversation]
[50 tool definitions]
[user: Hello]

🔥 I Was Burning Tokens Without Knowing

That’s when it clicked.

The user wasn’t paying for:

  • the message
  • the response

They were paying for:

👉 the entire toolset injected into the prompt

📉 Why My Quota Disappeared Instantly

Let’s do the math.

  • each tool ≈ 600–1000 tokens
  • I had ~50 tools

👉 I was sending 30,000–50,000 tokens per request

For a “Hello”.

No wonder the quota was gone after two messages.

😳 The Illusion of “Light Usage”

From the user’s perspective:

  • they typed almost nothing
  • they got almost nothing

From the system’s perspective:

👉 It processed a massive prompt

🧬 The Realization

That’s when I understood:

Tools are not just capabilities.
Tools are context weight.

Every tool:

  • consumes tokens
  • competes for attention
  • increases cost

⚠️ The Bigger Problem

It wasn’t just cost.

The agent was also:

  • slower
  • less accurate
  • sometimes picking the wrong tool

Because it had to:

👉 reason over 50 options every time

🧠 The Shift in Thinking

Before:

“More tools = smarter agent”

After:

“More tools = heavier prompt = worse performance”

🚀 What This Changed for Me

I stopped trying to build:

❌ One agent that does everything

And started designing:

✅ Systems that load only what’s needed

🧩 The New Approach

Instead of:

Agent → 50 tools

I moved to:

User → Router → Domain Agent → 5 tools

Now:

  • smaller prompts
  • lower cost
  • better decisions
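
The difference in context weight is easy to put numbers on. Using the midpoint of the per-tool estimate above, and assuming a small ~200-token router prompt:

```python
TOKENS_PER_TOOL = 800               # midpoint of the ~600-1000 range
monolith = 50 * TOKENS_PER_TOOL     # one agent carrying every tool
routed = 200 + 5 * TOKENS_PER_TOOL  # router prompt + one domain's tools

print(monolith, routed)  # 40000 4200
```

Roughly a 10x cut in fixed prompt cost before the conversation even starts.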

💡 Final Insight

That experience taught me something simple but powerful:


If your agent feels expensive, slow, or dumb…
check how many tools you’re injecting into the prompt.

Because sometimes:

👉 You’re not scaling intelligence
👉 You’re scaling tokens

🏁 Closing

That “Hello → quota exceeded” moment was frustrating.

But it revealed a fundamental truth about agents:


The problem is not how many tools you have.
The problem is how many you send every time.

And once you see that…

You start building agents very differently.