
MCP vs CLI: The Token Cost You're Not Tracking

Every MCP tool call ships your entire tool schema to the model. On a long session that's thousands of tokens before you've done anything. CLI tools don't have that problem.


You added four MCP servers to your Claude setup — Jira, Notion, GitHub, and a database connector. The responses feel slightly slower. Your API bill crept up 30%. You haven’t changed any of your tasks or prompts. What changed?

The answer is invisible unless you know where to look: every single request now ships thousands of tokens of tool schema before your actual prompt even starts. You didn’t add workload. You added overhead.

This post is about understanding that overhead, when MCP is worth it, and when a plain CLI call is the right move. If you’re building toward the kind of setup described in The Agentic Developer’s Field Guide, this decision matters from day one.


What MCP Actually Is

The Model Context Protocol is a standard for connecting AI agents to external tools — Jira, Notion, GitHub, databases — via a server process. When you configure an MCP server, you’re giving the model a structured, bidirectional channel to an external system. The model can read data, write data, trigger actions, and get structured results back.

It’s genuinely powerful. The problem is the cost structure.


The Token Math

Every MCP server you have active ships its full tool schema in every request’s context window. Tool schema means: tool names, descriptions, parameter definitions, optional enum values, examples. It adds up faster than you’d expect.

A moderately-featured MCP server exposes maybe 10–15 tools. Each tool description runs 80–150 tokens once you include the name, description, and parameter schema. That’s 1,000–2,000 tokens per server.

Four servers? You’re looking at 4,000–8,000 tokens of overhead on every request before your prompt, before the conversation history, before anything. On a long session with 40–50 exchanges, you’ve spent 160,000–400,000 tokens on tool schema that you’re not using most of the time.

On Claude Sonnet this costs roughly $0.50–$1.50 per long session purely in schema overhead. It’s not catastrophic, but it’s also not nothing — and it compounds across sessions.
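The arithmetic above is easy to reproduce as a back-of-envelope check. The figures below are illustrative midpoints of the ranges in this post (12 tools per server, ~100 tokens per tool, 4 servers, 45 exchanges), not measured values:

```shell
# Midpoint estimate of schema overhead
tokens_per_server=$((12 * 100))          # ~1,200 tokens of schema per server
per_request=$((4 * tokens_per_server))   # 4 servers: ~4,800 tokens on every request
per_session=$((45 * per_request))        # 45 exchanges: ~216,000 tokens per session
echo "per server: $tokens_per_server, per request: $per_request, per session: $per_session"
```

At roughly $3 per million input tokens (Sonnet-class pricing at the time of writing), that midpoint lands around $0.65 per session, consistent with the range above.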


What CLI Tools Are

Shell commands. The agent calls them via Bash. No schema overhead. The agent runs gh issue list or reads a local cache file or pipes jq over a JSON response. The model knows how to use shell tools generically; it doesn’t need a per-tool schema registered at session start.

# Read current sprint from local cache — zero token overhead
jq '.[] | select(.status=="In Progress") | {key, summary}' .sprint-cache.json
 
# List open GitHub issues — one bash call, no schema
gh issue list --state open --assignee @me --json number,title,labels

These calls cost exactly what they cost: the tokens in the command itself and the tokens in the output. No structural overhead registered at session start.
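If you want to sanity-check that claim, a crude four-characters-per-token heuristic is enough. This is an approximation, not the model's actual tokenizer, and the command and output strings below are made up for illustration:

```shell
# Approximate token cost of one CLI read: the command plus its output
cmd='gh issue list --state open --json number,title'
output='[{"number":42,"title":"Example issue"}]'
chars=$(( ${#cmd} + ${#output} ))
echo "approx tokens: $(( chars / 4 ))"   # orders of magnitude below schema overhead
```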


The Decision Framework

The question isn’t “MCP or CLI?” as a global setting. It’s a per-task decision.

| Use MCP when… | Use CLI when… |
| --- | --- |
| The tool has no good CLI | A CLI exists and is well-documented |
| You need structured data back (not just text) | Plain text or JSON output is sufficient |
| The integration handles auth complexity (OAuth, pagination) | A simple curl + API token works |
| The agent needs to write back (create ticket, update page) | You only need to read data |
| You're doing this operation many times per session | It's a one-off orientation query |
| Error handling and retries matter | A failed call can simply be retried manually |

The key insight: reads and writes have different value propositions. MCP’s structured, reliable channel is most justified when you’re writing back to an external system. For reads — especially orientation reads at session start — CLI often wins.


Practical Examples

Jira

Creating a ticket from a spec: MCP. The structured fields, the project selection, the assignee lookup — MCP handles this cleanly.

Reading the current sprint to orient the agent: curl + jq to a local cache file. One call, run once per day by a session-start hook. The agent reads .sprint-cache.json directly on every subsequent request. Zero MCP overhead.
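One way to wire that up is a session-start hook entry. This is a sketch only: it assumes Claude Code's hooks config format, `JIRA_BASE_URL` and `JIRA_TOKEN` environment variables you'd supply yourself, and an example JQL query and field list that you'd adapt to your project:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "curl -s -H \"Authorization: Bearer $JIRA_TOKEN\" \"$JIRA_BASE_URL/rest/api/2/search?jql=sprint%20in%20openSprints()&fields=summary,status\" | jq '[.issues[] | {key, summary: .fields.summary, status: .fields.status.name}]' > .sprint-cache.json"
          }
        ]
      }
    ]
  }
}
```

The hook pays the API cost once per session; every read after that is a local file.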

Notion

Writing release notes or a meeting summary as a structured page: MCP. You want the block structure, the database properties, the formatting to come through correctly.

Reading a spec page for context: cache it to .notion-context.md at session start. The agent reads a flat markdown file for the rest of the session.

GitHub

Checking PR status, reading issue lists, viewing file contents: gh CLI. GitHub’s CLI is excellent and the agent knows how to use it.

Creating a PR with specific reviewers, labels, and linked issues from structured data: MCP might be worth it if you’re doing this repeatedly in automated flows.


The Hybrid Pattern

Use MCP for writes. Use CLI for reads. This roughly halves the overhead on read-heavy sessions, which is most development sessions.

In your AGENTS.md or .claude/rules/, make this explicit:

## Tool selection
 
- Read sprint/issue data from `.sprint-cache.json` (refreshed by session-start hook)
- Read Notion spec context from `.notion-context.md` (refreshed by session-start hook)
- Use Jira MCP only for: creating tickets, updating status, adding comments
- Use Notion MCP only for: creating pages, updating database items
- Use `gh` CLI for all GitHub reads (issues, PRs, file contents)

The agent follows explicit rules. Write the rules.


Auditing Your Current Setup

Run cat .claude/settings.json and count the entries under mcpServers. That’s your baseline overhead. For each server, ask: am I using this for reads, writes, or both?
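A quick way to do that count, shown here against a hypothetical settings file with two servers (point the last line at your real `.claude/settings.json`):

```shell
# Hypothetical settings file, for illustration only
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{"mcpServers": {"jira": {"command": "jira-mcp"}, "notion": {"command": "notion-mcp"}}}
EOF

# Count configured servers: each one is schema overhead on every request
python3 -c 'import json; print(len(json.load(open(".claude/settings.json"))["mcpServers"]))'
```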

Disable one server for a full day of sessions. Watch whether anything breaks. If nothing breaks, the server was overhead for no benefit. If something breaks, you've identified a genuine dependency.

The goal isn’t to eliminate MCP. It’s to be deliberate about it. MCP is powerful infrastructure. It’s also a recurring cost that compounds silently across every session you run. Know what you’re paying for.


Coming next

Post 2 covers the Jira MCP server specifically — installation, configuration, what it does well, and how to set up the local cache pattern that makes the reads cheap. If Jira is part of your workflow, that’s where to go next.

About the author

Prakash Poudel Sharma

Engineering Manager · Product Owner · Varicon

Engineering Manager at Varicon, leading the Onboarding squad as Product Owner. Eleven years of building software — first as a programmer, then as a founder, now sharpening the product craft from the inside of a focused team.

AI Tooling for Developers

7 parts in this series.

A seven-part guide to the agent ecosystem beyond your coding IDE — MCP vs CLI tradeoffs, Jira/Notion integrations, release note automation, Paperclip, OpenClaw, and Hermes.

  1. MCP vs CLI: The Token Cost You're Not Tracking ← you are here
  2. Setting Up the Jira MCP Server (And When Not To) ← up next
  3. Setting Up the Notion MCP Server
  4. Automating Release Notes with AI Agents
  5. Paperclip: Managing AI Agents Like a Team
  6. OpenClaw: A Personal AI with Eyes and Hands
  7. Hermes: Self-Improving Agents on Cheap Infrastructure

What did you take away?

Thoughts, pushback, or a story of your own? Drop a reply below — I read every one.
