
Hermes: Self-Improving Agents on Cheap Infrastructure

A self-improving autonomous agent that runs on a $5 VPS, works with any LLM, and follows open standards. Nous Research's Hermes is what model-agnostic agentic infrastructure looks like.


You’ve built a workflow that depends on Claude. The AGENTS.md rules, the MCP configurations, the session hooks — they’re tuned for how Claude thinks. Then Anthropic adjusts pricing, or a new model from a different lab is significantly better at the specific tasks you care about. How much of your workflow survives the switch?

If your agent logic is entangled with the model provider, the answer is: not much. You’d be rewriting rules, re-testing behaviours, re-learning quirks. The model switch becomes a migration project.

Hermes from Nous Research is built around a different assumption: the agent’s capabilities, memory, and skills should be portable. The model is a replaceable component, not the foundation.


What Hermes Is

Hermes is a self-improving, open-source autonomous agent from Nous Research, the team behind significant work on open-weight models and agent architectures. It's not a coding assistant or a chat interface; it's a general-purpose agent platform designed for autonomous, continuous operation.

The repository is at github.com/nousresearch/hermes-agent.


Key Differentiators

Model-agnostic

Hermes exposes a uniform agent interface. The model underneath is configuration. You can run it with Claude, GPT-4o, a local Llama model via Ollama, DeepSeek, or any model with an OpenAI-compatible API.

Your rules, your skills, your memory — they’re attached to the agent, not the model. Swap the model in the config file and everything else stays put. This is the property that makes Hermes valuable for long-term infrastructure: you’re not betting on one provider’s pricing or availability.
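To make the "model is configuration" claim concrete, here is a minimal sketch of why OpenAI-compatible endpoints enable this: every provider is just a base URL plus a model name, so swapping providers changes data, not code. The provider table below is illustrative (endpoint paths and model names are assumptions, not Hermes's actual config schema):

```python
# Sketch: with OpenAI-compatible APIs, the provider is a data change, not a
# code change. URLs and model names below are illustrative examples only.

PROVIDERS = {
    "openai":   {"base_url": "https://api.openai.com/v1",   "model": "gpt-4o"},
    "ollama":   {"base_url": "http://localhost:11434/v1",   "model": "llama3.1"},
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
}

def make_client_config(provider: str, api_key: str) -> dict:
    """Return the kwargs an OpenAI-compatible client needs for this provider."""
    p = PROVIDERS[provider]
    return {"base_url": p["base_url"], "api_key": api_key, "model": p["model"]}
```

The agent code calls the same chat-completions interface either way; only the dict it's handed changes.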

Runs on a $5 VPS

This is not incidental. Most agent frameworks assume you’re running on a developer workstation or a beefy cloud instance. Hermes is designed to run on cheap, always-on infrastructure — a $5/month VPS, a spare Raspberry Pi, a small cloud VM.

Always-on changes what you can automate. When the agent runs only when your laptop is open, you can automate tasks that happen during your working day. When the agent runs 24/7 on a server, you can automate tasks that happen at 3am, on weekends, and while you’re on holiday.

Self-improving

Same pattern as OpenClaw: the agent can create its own skills, which are saved and reused when similar tasks arise. The difference is the deployment context — OpenClaw’s skills run on your local machine, Hermes’s skills run on the server.

Memory is agent-curated and persistent. Periodic nudges — small scheduled prompts that ask the agent to review and update its memory — keep the memory accurate over time rather than accumulating stale context.
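The nudge pattern is simple enough to sketch. The prompt wording and the idea of a single memory text blob below are assumptions for illustration, not Hermes's actual implementation:

```python
# Sketch of a "memory nudge": a small scheduled prompt that asks the agent to
# prune its own memory. Prompt text and memory layout are illustrative only.

NUDGE_PROMPT = (
    "Review your memory below. Delete entries that are stale or superseded, "
    "merge duplicates, and return the updated memory.\n\n{memory}"
)

def build_nudge(memory_text: str) -> str:
    """Compose the periodic self-review prompt from the current memory."""
    return NUDGE_PROMPT.format(memory=memory_text)

# A scheduler would call build_nudge() periodically, send the result to the
# model, and write the model's response back over the memory store.
```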

MCP-compatible

Hermes implements the Model Context Protocol. Your existing MCP server configurations — the Jira server from Post 2, the Notion server from Post 3 — work with Hermes without modification. The MCP layer is the same; you’re just connecting a different agent to it.

This matters because MCP configurations are the part of your setup that takes real effort to build and test. If Hermes can reuse them, the migration from a Claude Code workflow to a Hermes background workflow is substantially less work.
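For context, MCP server entries in the common JSON config style are a launch command plus credentials in the environment, roughly like this (the package name and token are placeholders, and Hermes's exact config key may differ):

```json
{
  "mcpServers": {
    "jira": {
      "command": "npx",
      "args": ["-y", "jira-mcp-server"],
      "env": { "JIRA_API_TOKEN": "..." }
    }
  }
}
```

Because the protocol is the same, an entry like this can in principle be carried over from a Claude Code setup unchanged.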

Multi-platform messaging

Hermes connects to Telegram, Discord, and Slack through the same gateway process as OpenClaw. You send instructions from your phone; Hermes executes them on the server and responds in the same chat thread.

The practical difference: OpenClaw connects to apps on your local machine. Hermes’s gateway runs on the server — it’s reachable from anywhere, doesn’t require your laptop to be running, and can handle incoming requests whether you’re awake or not.

Open standard

Hermes follows the agentskills.io open standard for agent skills. Skills built to this standard are portable between compatible agents. A skill you write for Hermes can run on any other agentskills.io-compatible platform. This is the same portability principle applied to skills rather than models.


Batch Trajectory Generation

Hermes has a capability that becomes interesting if you ever want to fine-tune a model on your specific workflows: it can generate training trajectories from its own runs.

When Hermes completes a task, it can log the full decision trace — the inputs, the reasoning steps, the tool calls, the outputs — in a format suitable for supervised fine-tuning. If you accumulate months of Hermes runs handling your specific workflows, you have a dataset of high-quality examples of those workflows done correctly.
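As a hedged illustration of what "a format suitable for supervised fine-tuning" means in practice, here is a sketch that flattens a logged run into one chat-format JSONL line. The trace field names are assumptions for illustration; Hermes's actual trajectory schema may differ:

```python
import json

# Sketch: convert one completed run (task, reasoning steps, tool calls, result)
# into a single chat-format SFT example. Field names are illustrative only.

def trace_to_sft(trace: dict) -> str:
    """Flatten a logged run into one JSONL line of chat messages."""
    messages = [{"role": "user", "content": trace["task"]}]
    for step in trace["steps"]:
        messages.append({"role": "assistant", "content": step["reasoning"]})
        if "tool_call" in step:
            messages.append({"role": "tool", "content": json.dumps(step["tool_call"])})
    messages.append({"role": "assistant", "content": trace["result"]})
    return json.dumps({"messages": messages})
```

Appending one such line per completed run yields the JSONL dataset most fine-tuning pipelines expect.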

Most developers won’t need this. But if you’re building on top of open-weight models and want to improve them for your domain, this is how you generate the training data without paying human annotators.


When to Use Hermes vs. Claude Code

|                  | Claude Code                                                  | Hermes                                                        |
|------------------|--------------------------------------------------------------|---------------------------------------------------------------|
| Session type     | Interactive, human in the loop                               | Autonomous, runs unattended                                   |
| Trigger          | You start it                                                 | Cron schedule or incoming message                             |
| Infrastructure   | Your laptop                                                  | $5 VPS                                                        |
| Best for         | Coding, architecture decisions, complex reasoning with a human | Scheduled tasks, background processing, continuous monitoring |
| Model dependency | Tied to Anthropic API                                        | Model-agnostic                                                |

Claude Code is your power tool for focused work. Hermes is the background process that keeps things running while you focus on the work or step away entirely.


When to Use Hermes vs. OpenClaw

Both self-improve, both use persistent memory, both support multi-platform messaging. The split is device-centric vs. server-centric.

OpenClaw runs on your machine. It has direct access to your local files, your desktop browser, your local app data. When you close your laptop, OpenClaw stops. It’s the right tool when the task requires your machine’s resources or your personal accounts.

Hermes runs on a server. It doesn’t have access to your local machine (unless you give it SSH access, which is a separate decision). It runs whether your laptop is open or not. It’s the right tool for tasks that need to happen continuously, on a schedule, without requiring your presence.

In a full stack: OpenClaw handles the personal layer (your inbox, your chat, your calendar, your local knowledge). Hermes handles the background processing layer (scheduled data pulls, monitoring, server-side automations). Paperclip manages them both.


The Paperclip Integration

Hermes agents are valid Paperclip “employees.” You register a Hermes instance in Paperclip’s org chart, set its monthly token budget, and its runs are audited through Paperclip’s dashboard alongside your other agents.

This is the complete picture of the stack from Posts 5–7: Paperclip provides the governance layer; OpenClaw and Hermes are the agents being governed. Each does something the others don’t. Together they cover the communication layer, the background processing layer, and the budget visibility that makes running them sustainable.


Getting Started

Hermes runs as a Python process. The setup is:

git clone https://github.com/nousresearch/hermes-agent
cd hermes-agent
pip install -r requirements.txt
cp config.example.yaml config.yaml
# Edit config.yaml: set your LLM provider, API key, and messaging gateway
python hermes.py

The config file is where you specify the model provider. Switching from Claude to a local model is a two-line change. Your skills, rules, and memory files stay untouched.
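As a sketch of what that two-line change might look like (the key names here are hypothetical; check config.example.yaml in the repo for the real schema):

```yaml
llm:
  # provider: anthropic
  # model: claude-sonnet
  provider: ollama
  model: llama3.1
```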

For always-on deployment, run it under systemd or a process manager on your VPS. The resource footprint is minimal — the agent is mostly idle, waking to handle scheduled tasks or incoming messages.
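A minimal systemd unit for this might look like the following. Paths, user, and service name are illustrative assumptions, not part of the Hermes repo:

```ini
# /etc/systemd/system/hermes.service -- paths and user are illustrative
[Unit]
Description=Hermes agent
After=network-online.target

[Service]
User=hermes
WorkingDirectory=/opt/hermes-agent
ExecStart=/usr/bin/python3 hermes.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now hermes` and the agent survives reboots without you thinking about it.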


Where to Go From Here

This is the last post in the AI Tooling for Developers series. The series covered the token cost decisions (Posts 1–3), a real end-to-end automation (Post 4), and the agent infrastructure layer (Posts 5–7).

Two posts worth reading next:

Choosing Your AI Stack — the model and pricing decisions that affect all of these tools. Which provider, which tier, when to use API credits vs. subscriptions, and how to estimate costs before you’re surprised by the bill.

The Parallel Developer — the coding workflow that these tools plug into. The day-to-day practice of working with agents in the coding context, not just the infrastructure around them.

The tools in this series are all real, all open-source, and all actively developed. Pick the ones that match where you are in the stack and what you’re trying to automate. Start with one, understand the costs, and add the next layer when the first one is working well.

About the author

Prakash Poudel Sharma

Engineering Manager · Product Owner · Varicon

Engineering Manager at Varicon, leading the Onboarding squad as Product Owner. Eleven years of building software — first as a programmer, then as a founder, now sharpening the product craft from the inside of a focused team.

AI Tooling for Developers

7 parts in this series.

A seven-part guide to the agent ecosystem beyond your coding IDE — MCP vs CLI tradeoffs, Jira/Notion integrations, release note automation, Paperclip, OpenClaw, and Hermes.

  1. MCP vs CLI: The Token Cost You're Not Tracking
  2. Setting Up the Jira MCP Server (And When Not To)
  3. Setting Up the Notion MCP Server
  4. Automating Release Notes with AI Agents
  5. Paperclip: Managing AI Agents Like a Team
  6. OpenClaw: A Personal AI with Eyes and Hands
  7. Hermes: Self-Improving Agents on Cheap Infrastructure (this post)