Tracking Tokens-Per-Session: The Number Claude Code Hides

Tokens-per-session is the single most useful signal for agentic coding sessions — and Claude Code doesn't show it. /usage gives you cost. /context gives you window percent. Neither sums to the number you actually want. Here's what to track, why, and how to put it on your statusline.

If you’re working in Claude Code (or any agentic CLI) for more than a few hours a week, there is exactly one number you should be watching mid-session that the tool doesn’t show you:

Tokens used so far in this session.

Not cost in dollars. Not context-window percentage. Tokens.

TL;DR

  • The metric to track mid-session is cumulative tokens spent, summed across all turns and models.
  • /usage shows cost in dollars and a four-part token breakdown per model. It does not sum to a session total.
  • /context shows how full your current context window is. That’s a different number, and it lies to you about session spend once compaction kicks in.
  • The default statusline slot in Claude Code is empty. The fix is a ~60-line bash script that consumes the JSON Claude pipes to your statusline on every render and surfaces the summed number ambiently.
  • The companion post — Your Statusline is the Cheapest Feedback Loop in Agentic Coding — walks through the script field by field.

Why this number matters

Four reasons, in order of how often they bite:

  1. Rate-limit budgeting. Max-plan users hit 5-hour and 7-day rolling rate limits, not absolute caps. Tokens-per-session is the input. If you don’t know it, you don’t know how much runway is left before the next reset.
  2. Quality drift. Long sessions degrade — context gets noisier, the model starts repeating itself, edits get sloppier. Token count is a leading indicator: somewhere around 60–80k for most workflows, a fresh session beats a continued one even if /context says you have headroom.
  3. Scope-creep detection. “Small refactor” sessions that quietly turn into 200k-token marathons are the single most common failure mode of agentic coding. Watching the number tick up while you work is what catches this — /usage after the fact does not.
  4. Cost attribution (API users only). Tokens map directly to dollars. If a teammate asks how much that experiment cost, you need an answer.

The three numbers people confuse

Most of the confusion in agentic-coding setups is mistaking one of these for another. They are not interchangeable:

| Number | What it measures | When it lies to you | When it’s the right one to watch |
| --- | --- | --- | --- |
| Cost ($) | Estimated dollars spent this session, computed from token counts × model rates | On Max/Pro you pay a flat rate, so the dollar figure doesn’t reflect what you actually pay | API users, billing reconciliation |
| Context-window % | How much of the model’s window is currently occupied by system prompt, tools, memory, and conversation | Goes down after /compact or rolling drops, but the tokens are still spent. Can read 8% after burning 200k tokens. | “Am I about to hit the wall on this turn?” (short-horizon) |
| Tokens-per-session | Cumulative input + output (+ cache) summed over every turn so far | Doesn’t tell you what the tokens were spent on (use /context for that) | Everything else: rate-limit budgeting, quality drift, scope creep, cost |

If you only watch one, watch tokens-per-session. If you watch two, add context-% for the short-horizon “will this turn fit?” question.

What /usage actually shows

Claude Code ships /usage (aliases: /cost, /stats). Here’s the output from a real session:

Session
  Total cost:            $0.0801
  Total duration (API):  2s
  Total duration (wall): 26m 17s
  Total code changes:    0 lines added, 0 lines removed
  Usage by model:
       claude-opus-4-7:  6 input, 16 output, 16.5k cache read, 11.4k cache write ($0.0801)
 
Current session
  ███                                                6% used
  Resets 9:25am
 
Current week (all models)
  █████████▌                                         19% used

What you get:

  • Cost in dollars. Mostly noise on Max.
  • Wall-clock and API duration. Related, but not the number you’re after.
  • A four-number per-model breakdown — input, output, cache read, cache write. These are tokens, but they’re not summed. To answer “how many tokens this session?” you have to mentally add four numbers per model, then sum across models. Nobody does that mid-session.
  • Two rate-limit % bars for the 5-hour and 7-day rolling windows. These are the outputs of session token spend; they tell you you’re 19% in, not how you got there.

What you don’t get: a single line that says Session tokens: 28,012. The data is there. The compute isn’t.
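The missing arithmetic is small enough to sketch. The snippet below is a minimal illustration in Python, not the author's script: it assumes the "Usage by model" row keeps the shape shown in the sample output above (the `parse_usage_row` and `session_total` names are made up for this example, and the row format may differ between Claude Code versions). Because /usage already rounds the larger counts (16.5k, 11.4k), the result is an approximation of the true total.

```python
import re

# Multipliers for the suffixes /usage uses when it abbreviates counts.
_SUFFIX = {"": 1, "k": 1_000, "m": 1_000_000}

# Matches "<number><optional k/m> <label>", e.g. "16.5k cache read".
_FIELD = re.compile(r"([\d.]+)([km]?)\s+(input|output|cache read|cache write)")


def parse_usage_row(row: str) -> dict[str, int]:
    """Turn one 'Usage by model' row into {label: token_count}."""
    counts = {}
    for number, suffix, label in _FIELD.findall(row.lower()):
        counts[label] = round(float(number) * _SUFFIX[suffix])
    return counts


def session_total(rows: list[str]) -> int:
    """Sum every token bucket across every model row."""
    return sum(sum(parse_usage_row(r).values()) for r in rows)


# The row from the sample session above, copied verbatim.
row = "claude-opus-4-7:  6 input, 16 output, 16.5k cache read, 11.4k cache write ($0.0801)"
print(parse_usage_row(row))
print(f"Session tokens: ~{session_total([row]):,}")
```

Pasting rows by hand is still a slash command away from ambient, which is why the statusline route is the one worth setting up.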

About those four numbers — the cache nuance

The per-model row separates input / output / cache-read / cache-write for a reason. They bill at very different rates:

  • Input tokens — full rate.
  • Output tokens — ~5× input.
  • Cache reads — ~10% of input. (This is why long sessions with stable system prompts get cheap fast.)
  • Cache writes — ~125% of input. (One-time cost to populate the cache.)

If you’re tracking for dollar cost, you need the weighted sum, and /usage already does that arithmetic for you in the dollar figure. If you’re tracking for rate-limit pressure, sum all four, because the rate limiter counts them. If you’re tracking for session quality or scope creep, sum input + output and ignore cache (cache traffic doesn’t make the conversation longer).

The statusline script I run sums input + output for that reason. Cache reads and writes are a separate concern.
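To make the three views concrete, here is a small Python sketch that computes each of them from the same four buckets. The weights are the approximate ratios listed above, expressed relative to input tokens; the `ModelUsage` class and its method names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass

# Relative billing weights from the list above, with input tokens as 1.0.
# These are the approximate ratios quoted in the text, not exact prices.
WEIGHTS = {"input": 1.0, "output": 5.0, "cache_read": 0.10, "cache_write": 1.25}


@dataclass
class ModelUsage:
    input: int
    output: int
    cache_read: int
    cache_write: int

    def cost_weighted(self) -> float:
        """Input-token equivalents: roughly what the dollar figure scales with."""
        return sum(getattr(self, name) * w for name, w in WEIGHTS.items())

    def rate_limit_total(self) -> int:
        """All four buckets, for rate-limit pressure."""
        return self.input + self.output + self.cache_read + self.cache_write

    def conversation_total(self) -> int:
        """Input + output only, for quality drift and scope creep."""
        return self.input + self.output


# Numbers from the sample /usage row, with the rounded k-values expanded.
usage = ModelUsage(input=6, output=16, cache_read=16_500, cache_write=11_400)
print("cost-weighted:", usage.cost_weighted())
print("rate-limit:   ", usage.rate_limit_total())
print("conversation: ", usage.conversation_total())
```

On this sample the three views differ by orders of magnitude, which is the practical reason to decide which question you are asking before picking a sum.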

What /context actually shows

The other obvious place to look is /context. It does something completely different from what the name might suggest:

Model: claude-opus-4-7[1m]
Tokens: 78k / 1m (8%)
 
Estimated usage by category
  System prompt:  8.1k (0.8%)
  System tools:   6.4k (0.6%)
  Memory files:   6.4k (0.6%)
  Skills:           9k (0.9%)
  Messages:      48.6k (4.9%)
  Free space:   921.3k (92.1%)

This is context-window occupancy — how much of the model’s 1M-token window is currently filled by system prompt, MCP tool definitions, memory files, skills, and the conversation so far. Excellent diagnostic if you want to know why your context is filling up (mine was being eaten by an unused PM-skills plugin pack — different post).

What it doesn’t show: cumulative tokens spent over the whole session. The two diverge as soon as Claude compacts old turns or rolls them off. You can read ctx: 8% and still have burned 200k tokens over the past hour.
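A toy replay shows how the two numbers come apart. This is a deliberately simplified model, not how Claude Code accounts for tokens internally: each turn is assumed to add its tokens to both counters, and a compaction is assumed to shrink the occupied context to a fixed size (`compact_to` is a made-up parameter). Cumulative spend never goes down.

```python
def replay(events, window, compact_to):
    """Replay a session and return (tokens_spent, context_fraction).

    events: a list of ints (tokens added by a turn) and the string "compact".
    window: context window size in tokens.
    compact_to: ASSUMED size the context shrinks to when compacted.
    """
    spent = 0      # cumulative, only ever grows
    occupied = 0   # what a context-window readout would reflect
    for event in events:
        if event == "compact":
            occupied = min(occupied, compact_to)
        else:
            spent += event
            occupied += event
    return spent, occupied / window


# Six 30k turns, one compaction, then one more 20k turn in a 1M window.
events = [30_000] * 6 + ["compact"] + [20_000]
spent, fraction = replay(events, window=1_000_000, compact_to=60_000)
print(f"spent: {spent:,} tokens, ctx: {fraction:.0%}")
```

With these made-up inputs the session has spent 200k tokens while the context readout sits at 8%, the same gap described above.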

Rough benchmarks

The first question every engineer asks once and never asks again: is 80k a lot? Anchors, with the caveat that variance is huge:

| Session shape | Tokens (input + output) |
| --- | --- |
| One-off: typo fix, single-file rename, doc lookup | 2–10k |
| Small task: one feature, a few files, one round of tests | 15–40k |
| Medium task: multi-file refactor, new endpoint with tests | 40–100k |
| Large task: cross-cutting change, exploratory architecture work | 100–250k |
| Something has gone wrong | 300k+ in under an hour with low signal |

These are working numbers from my own sessions, not rules. If you’re consistently in the 150k+ band for things that feel small, your prompts are doing too much or your agent is being asked to discover instead of execute.
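If you want the bands as a lookup rather than a table to remember, a sketch like the following works. Two assumptions are baked in: the gaps the table leaves open (10–15k, 250–300k) are assigned to the next band up, and the "something has gone wrong" row is not encoded, because it depends on elapsed time and signal, which a token count alone cannot tell you.

```python
import bisect

# Upper edge of each band, taken from the benchmark table above.
UPPER_BOUNDS = [10_000, 40_000, 100_000, 250_000]
LABELS = ["one-off", "small task", "medium task", "large task", "beyond large"]


def session_band(tokens: int) -> str:
    """Return the benchmark band for an input + output token count."""
    # bisect_left keeps each upper edge inside its own band (10_000 -> one-off).
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, tokens)]


for count in (4_000, 28_000, 80_000, 180_000, 320_000):
    print(f"{count:>7,} -> {session_band(count)}")
```

So the answer to "is 80k a lot?" under these bands is: a medium task, unremarkable for a multi-file refactor and a warning sign for a rename.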

How to surface it ambiently

Twenty-five minutes of bash. The statusline I run now:

~/workspaces/2023/personal/sharmaprakash-astro (main↑1) Opus 4.7 70.4k tok +93/-45 ctx:93% 5h:2%

70.4k tok is computed from the JSON Claude Code already pipes to your statusline script every render. The full breakdown of fields, the bash gotchas, and the script itself are in the companion piece: Your Statusline is the Cheapest Feedback Loop in Agentic Coding.

The point isn’t this particular layout. The point is the number is ambient instead of behind a slash command. On-demand signals you have to summon every few minutes are signals you stop summoning.
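For a sense of the shape, here is a minimal sketch of the token segment only, in Python rather than the bash of the actual script (which lives in the companion post). The key names are assumptions: `usage.input_tokens` and `usage.output_tokens` are placeholders invented for this example, and `model.display_name` should also be checked against the statusline documentation for your Claude Code version before relying on it.

```python
import io
import json


def humanize(n: int) -> str:
    """70400 -> '70.4k'; counts under 1000 are shown as-is."""
    return f"{n / 1000:.1f}k" if n >= 1000 else str(n)


def render(payload: dict) -> str:
    """Build the model + token segment of a statusline.

    ASSUMPTION: every key name read here is a placeholder for illustration.
    Missing keys fall back to defaults so a schema change degrades quietly.
    """
    usage = payload.get("usage", {})
    tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    model = payload.get("model", {}).get("display_name", "?")
    return f"{model} {humanize(tokens)} tok"


def main(stream) -> None:
    """Read one JSON document from stream and print the segment."""
    print(render(json.load(stream)))


# Demo with a hand-written payload standing in for what the tool would pipe in.
sample = {
    "model": {"display_name": "Opus 4.7"},
    "usage": {"input_tokens": 61_200, "output_tokens": 9_200},
}
main(io.StringIO(json.dumps(sample)))
```

In a real statusline command you would call `main(sys.stdin)` instead of the demo at the bottom, after replacing the placeholder keys with the ones your version actually sends.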

What to do when the number is high

Concrete playbook — what each threshold should trigger:

  • ~40–60k and a task that should have been small — stop, scope down. The agent is probably exploring instead of executing. Tighten the prompt, restart the task.
  • ~80k+ on a long-running task — finish the current turn, /compact, continue. Quality drift hasn’t hit yet but is coming.
  • Token count climbing fast while diff stays small (e.g. +12/-8 after 60k tokens) — the agent is thrashing. Interrupt, give it a sharper instruction, or restart with a tighter brief.
  • Token count climbing fast and diff also large (+800/-400 in 10 minutes on a “small” task) — scope has run away. Stash, fresh session, re-plan.
  • Approaching rate-limit % (5h:80%+) — switch to /model Sonnet or Haiku for cheap turns, or save the heavy work for after reset. Don’t burn Opus tokens on small stuff when you’re near the wall.
  • ctx:% red AND tokens high — the session is over. Capture what you have, commit, /clear, start fresh.
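The playbook can be read as a decision function. The sketch below is one possible encoding, not a rule set from the source: the text gives examples rather than exact cutoffs, so every constant marked ASSUMED is a guess, `ctx_pct` is assumed to mean percent of the window used, and the order in which rules are checked (most severe first) is also an assumption.

```python
def next_action(tokens, lines_changed, ctx_pct, rate_5h_pct, small_task):
    """Map statusline readings to a playbook step.

    tokens: cumulative input + output for the session.
    lines_changed: lines added plus lines removed so far.
    ctx_pct: ASSUMED to be percent of the context window in use.
    rate_5h_pct: percent of the 5-hour rate limit used.
    small_task: True if the task was expected to be small.
    """
    CTX_RED = 80          # ASSUMED: what counts as "ctx:% red"
    THRASH_DIFF = 20      # ASSUMED: +12/-8 after 60k tokens is a small diff
    RUNAWAY_DIFF = 1_200  # ASSUMED: +800/-400 on a small task is a large diff

    if ctx_pct >= CTX_RED and tokens >= 80_000:
        return "commit, /clear, start fresh"
    if rate_5h_pct >= 80:
        return "switch to a cheaper model or wait for reset"
    if tokens >= 60_000 and lines_changed <= THRASH_DIFF:
        return "interrupt: agent is thrashing"
    if small_task and lines_changed >= RUNAWAY_DIFF:
        return "stash, fresh session, re-plan"
    if small_task and tokens >= 40_000:
        return "stop and scope down"
    if tokens >= 80_000:
        return "finish the turn, then /compact"
    return "carry on"


print(next_action(60_000, 20, ctx_pct=10, rate_5h_pct=2, small_task=False))
print(next_action(45_000, 150, ctx_pct=10, rate_5h_pct=2, small_task=True))
```

The value of writing it down this way is mostly that it forces you to pick your own cutoffs; tune them to your sessions rather than adopting these.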

The single highest-leverage habit: glance at the statusline before sending each turn. Three seconds of friction, catches 90% of runaway sessions.

What this post does not cover

  • Cross-session aggregation. If you want “tokens this week,” the statusline doesn’t give that — it’s per-session. The community tool ccusage reads Claude Code’s local JSONL logs and reports daily / weekly / per-project totals. Install it once, run it when you need the rollup.
  • Billing reconciliation. The dollar number in /usage is an estimate, not an invoice. For real billing, the Anthropic console is authoritative.
  • Team-level dashboards. Out of scope. If you need exec-level usage reporting, that’s a different tooling problem.

When not to track this

Two-minute “rename this variable” sessions don’t need a token watch. Neither does a quick /help lookup. Tracking matters when:

  • You’re driving the agent for more than ~15 minutes of wall time, or
  • The task could plausibly run away (refactor, debugging, exploration), or
  • You’re on a constrained plan tier and the day’s budget is real.

Don’t over-apply the lens. The cost of constantly glancing at a statusline is low but non-zero. For trivial tasks, ignore it.


How this came up

A friend asked me how many tokens my last session burned. I didn’t have an answer — I’d been watching ctx:% for months and treating that as my dashboard. It wasn’t. The reframe took twenty-five minutes of bash and changed my behaviour mid-session more than any prompt-engineering tip I’ve picked up this year.

The companion post — Your Statusline is the Cheapest Feedback Loop in Agentic Coding — has the actual script and the field-by-field rationale. Read it after this one.

About the author

Prakash Poudel Sharma

Engineering Manager · Product Owner · Varicon

Engineering Manager at Varicon, leading the Onboarding squad as Product Owner. Eleven years of building software — first as a programmer, then as a founder, now sharpening the product craft from the inside of a focused team.
