Research deep-dive for developers shipping AI agents in production
AI coding agents can write, commit, and deploy code in seconds. The gap between "agent decides to act" and "irreversible damage" is measured in milliseconds. Prompt instructions alone cannot close that gap, because they live in the same context window the agent reasons over and can therefore be reinterpreted or overridden.
Researchers at Tsinghua University formalized this problem in their work on Natural-Language Agent Harnesses (NLAH). Their key insight: the safety layer must be external to the agent, treated as a first-class object with its own contracts, verification logic, and persistent state.
The NLAH framework defines four components that any production-grade harness needs. Here is how each maps to a concrete implementation in ThumbGate:
| NLAH Component | What It Does | ThumbGate Implementation |
|---|---|---|
| Contracts | Formal rules that define what the agent must not do | Prevention rules in prevention-rules.md — auto-generated from thumbs-down feedback |
| Verification Gates | Checkpoints that intercept actions before execution | PreToolUse hooks — intercept every tool call, match against gates, block or allow |
| Durable State | Persistent memory that survives across sessions | SQLite+FTS5 lesson database — feedback, memories, and rules persist and are searchable |
| Adapters | Platform-specific connectors for different agent runtimes | MCP server + adapters for Claude Code, Cursor, Codex, Gemini, Amp, OpenCode |
A prompt rule says: "Do not force-push to main." An agent can reason around that, reinterpret it, or simply lose it in a long context window.
A contract says: if the tool call is `Bash` and the command matches `git push.*--force` targeting `main`, return `{"decision": "block"}`. The agent never executes the command. There is nothing to reason around.
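To make the difference concrete, here is a minimal sketch of that contract as code. It is an illustration and not ThumbGate's actual implementation: the function name `check_force_push`, the `reason` field, and the `"allow"` decision value are assumptions, and only the `{"decision": "block"}` shape comes from the text above.

```python
import re

# Patterns mirroring the contract described above (illustrative, not ThumbGate's own).
FORCE_PUSH = re.compile(r"git\s+push\b.*--force")
TARGETS_MAIN = re.compile(r"\bmain\b")


def check_force_push(tool_name: str, command: str) -> dict:
    """Block Bash calls that force-push to main; allow everything else."""
    if (
        tool_name == "Bash"
        and FORCE_PUSH.search(command)
        and TARGETS_MAIN.search(command)
    ):
        return {"decision": "block", "reason": "force-push to main is not allowed"}
    # "allow" is an assumed value; the text only specifies the block case.
    return {"decision": "allow"}
```

The check is a plain function over the tool name and command string, so the agent's reasoning never enters into it. Note that this sketch does not catch the short flag `-f`; a real rule would need a pattern for that as well.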
Every time your AI agent calls a tool (running a shell command, writing a file, making an API call), a PreToolUse hook fires. ThumbGate checks the call against your gates, then blocks or allows it.
This is the verification gate pattern from the NLAH framework, running in production today.
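The gate pattern can be sketched as a small dispatcher that receives the pending tool call as JSON and runs it past a list of gates. This is a sketch under stated assumptions, not ThumbGate's code: the payload fields `tool_name`, `tool_input`, `command`, and `file_path`, the two example gates, and the `"allow"` value are all hypothetical.

```python
import json
import re
from typing import Callable, Optional

# A gate inspects one tool call and returns a reason to block it, or None.
Gate = Callable[[str, dict], Optional[str]]


def no_force_push(tool_name: str, tool_input: dict) -> Optional[str]:
    command = tool_input.get("command", "")
    if tool_name == "Bash" and re.search(r"git\s+push\b.*--force", command):
        return "force-push is blocked"
    return None


def no_env_writes(tool_name: str, tool_input: dict) -> Optional[str]:
    path = tool_input.get("file_path", "")
    if tool_name == "Write" and path.endswith(".env"):
        return "writing to .env files is blocked"
    return None


GATES: list[Gate] = [no_force_push, no_env_writes]


def run_gates(payload: dict, gates: list[Gate] = GATES) -> dict:
    """Check one tool call against every gate; the first match blocks it."""
    tool_name = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})
    for gate in gates:
        reason = gate(tool_name, tool_input)
        if reason is not None:
            return {"decision": "block", "reason": reason}
    return {"decision": "allow"}


def handle_hook(raw_json: str) -> str:
    """Hook entry point: JSON describing the tool call in, JSON decision out."""
    return json.dumps(run_gates(json.loads(raw_json)))
```

Keeping each gate as an independent function means a new rule is one more entry in `GATES`, and the dispatcher itself never changes.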
One of the NLAH paper's strongest arguments is that agent harnesses need persistent state. An agent that forgets its mistakes between sessions will repeat them.
ThumbGate stores every feedback event in a SQLite database with full-text search (FTS5). When a new session starts, the agent's context is assembled from relevant past lessons — not the entire history, but the lessons most similar to the current task.
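A minimal sketch of this kind of lesson store, using Python's built-in `sqlite3` module and an FTS5 virtual table. The table name `lessons`, the column `body`, and the helper functions are hypothetical, and ThumbGate's real schema and ranking may differ. The sketch also assumes the local SQLite build has FTS5 enabled, which is common but not guaranteed.

```python
import re
import sqlite3


def open_lesson_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a lesson store backed by an FTS5 full-text index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS lessons USING fts5(body)")
    return conn


def add_lesson(conn: sqlite3.Connection, body: str) -> None:
    """Persist one lesson so later sessions can find it."""
    conn.execute("INSERT INTO lessons (body) VALUES (?)", (body,))
    conn.commit()


def relevant_lessons(conn: sqlite3.Connection, task: str, limit: int = 3) -> list[str]:
    """Return the stored lessons that best match the words in the task description."""
    words = re.findall(r"\w+", task)
    if not words:
        return []
    # Quote each word and OR them together so any overlap counts as a match.
    query = " OR ".join(f'"{word}"' for word in words)
    rows = conn.execute(
        "SELECT body FROM lessons WHERE lessons MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Ordering by FTS5's built-in `rank` column puts the best keyword matches first, and the `limit` keeps the assembled context small instead of replaying the whole history.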
The NLAH framework emphasizes platform independence. A harness should work across different agent runtimes without rewriting the safety logic.
ThumbGate achieves this through the Model Context Protocol (MCP). Any agent that speaks MCP — Claude Code, Cursor, Codex, Gemini, Amp, OpenCode — connects to the same ThumbGate server and gets the same gates. Write your rules once, enforce everywhere.
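The "write once, enforce everywhere" idea can be sketched as adapters that normalize each runtime's payload into one shared shape before a single rule set runs. This sketch does not use MCP itself and does not reflect any real runtime's payload format: `runtime_a`, `runtime_b`, their field names, and the `"allow"` value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """Runtime-neutral view of a tool call that the shared rules operate on."""
    tool: str
    command: str


# Hypothetical payload shapes for two imaginary runtimes; real ones will differ.
def from_runtime_a(payload: dict) -> ToolCall:
    return ToolCall(
        tool=payload["tool_name"],
        command=payload["tool_input"].get("command", ""),
    )


def from_runtime_b(payload: dict) -> ToolCall:
    action = payload["action"]
    return ToolCall(tool=action["type"], command=action.get("shell", ""))


ADAPTERS: dict[str, Callable[[dict], ToolCall]] = {
    "runtime_a": from_runtime_a,
    "runtime_b": from_runtime_b,
}


def is_blocked(call: ToolCall) -> bool:
    """The single shared rule: no force-pushes from a shell tool."""
    return (
        call.tool == "Bash"
        and "git push" in call.command
        and "--force" in call.command
    )


def check(runtime: str, payload: dict) -> dict:
    """Normalize the payload with the runtime's adapter, then apply the shared rule."""
    call = ADAPTERS[runtime](payload)
    return {"decision": "block" if is_blocked(call) else "allow"}
```

Only the adapters know about runtime-specific formats, so supporting another runtime means adding one function while `is_blocked` stays untouched.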
The NLAH framework describes what an agent harness should be. ThumbGate is what it looks like when you ship one:
`npx mcp-memory-gateway init`
That single command sets up:
You are not writing safety rules from scratch. You are thumbs-downing mistakes and letting the harness learn.
One command. Works with Claude Code, Cursor, Codex, Gemini, Amp, and any MCP agent.