TL;DR
The Stash MCP server is a local Model Context Protocol endpoint that lets Claude Code, Cursor, Claude Desktop, and any other MCP-speaking client query your screenshots and recordings as structured context. Five tools — list_recent, search, get_capture, get_bundle, render_plain — return app metadata, window titles, URLs, accessibility trees, per-frame video metadata, and absolute paths to on-disk assets. One command installs the stdio bridge and registers it with all three official clients. The server listens on a Unix domain socket; no capture data leaves your Mac.
What the Stash MCP Server Is
Stash is a macOS menu bar app that captures screenshots and screen recordings with embedded context — app, window title, URL, accessibility tree, cursor position, click coordinates, voice transcript. The Stash MCP server exposes that library to any AI client that speaks the Model Context Protocol (the open standard Anthropic published in November 2024 and now the default integration layer for Claude Code, Cursor, Claude Desktop, and a growing list of other agents).
Instead of dragging an image into a chat window and letting the model OCR pixels, the agent issues a JSON-RPC call over a local socket, receives structured data, and reasons over real metadata. That is the whole idea.
The architecture, in one paragraph: the agent's MCP client spawns a small stdio bridge (~/.local/bin/stash-mcp) that relays line-delimited JSON-RPC 2.0 between stdin/stdout and a Unix domain socket at ~/Library/Application Support/Stash/mcp.sock. The Stash app process hosts the server. Before any request is served, the server checks the peer's codesign team identifier against an allowlist. Replies use the MCP-standard {content: [{type: "text", text: "<json>"}]} envelope. Protocol version stash-1; additive changes stay on stash-1, breaking changes would bump to stash-2.
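As a rough illustration of the wire format described above, the Python sketch below builds one newline-terminated JSON-RPC 2.0 request and unwraps the text envelope of a reply. It assumes the server accepts the standard MCP `tools/call` method without a prior handshake and that tool arguments are passed by name (`n` here); this article confirms neither. The helper names are hypothetical. A plain Python process would also likely fail the codesign check, so `call_tool` is illustrative only; real clients go through the `stash-mcp` bridge.

```python
import json
import socket
from pathlib import Path

# Socket path as given in the article.
SOCKET_PATH = Path.home() / "Library/Application Support/Stash/mcp.sock"


def encode_request(request_id, tool, arguments):
    """Build one newline-terminated JSON-RPC 2.0 request for an MCP tool call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method; assumed to be what Stash expects
        "params": {"name": tool, "arguments": arguments},
    }
    return (json.dumps(message) + "\n").encode("utf-8")


def unwrap_envelope(response_line):
    """Return the JSON payload carried in the first text item of the envelope."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    for item in response["result"]["content"]:
        if item.get("type") == "text":
            return json.loads(item["text"])
    raise ValueError("response contained no text content")


def call_tool(tool, arguments, socket_path=SOCKET_PATH):
    """Send one request over the Unix domain socket and read one reply line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(str(socket_path))
        sock.sendall(encode_request(1, tool, arguments))
        with sock.makefile("rb") as reader:
            return unwrap_envelope(reader.readline())
```

Keeping encoding and envelope handling as pure functions means they can be exercised without a running Stash instance.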
The Five Tools
Five tools cover every query pattern an agent needs. Each is token-budgeted so the agent can triage cheaply before committing context to a full dossier.
| Tool | Signature | Returns | Typical use |
|---|---|---|---|
| `stash.list_recent` | `list_recent(n)`, max 500 | Newest-first summaries: app, window, shortID, timestamp, kind | "Show me my last few captures" (triage before a deeper fetch) |
| `stash.search` | `search(query)` | Matching IDs + snippets across appName, windowTitle, bookmarkName, textContent, browserURL | "Find the Xcode capture about the keychain bug" |
| `stash.get_capture` | `get_capture(id)` | Full dossier: app, window, URL, appearance, OS, a11y tree, userFocus annotations, devContext | "Open capture #8FD26F28 and explain the error" |
| `stash.get_bundle` | `get_bundle(id)` | Video bundle: report.md content, per-frame metadata with app/window transitions, absolute paths to every frame image | "Walk the timeline of my last recording" |
| `stash.render_plain` | `render_plain(id)` | Plain-text rendering suitable for inline paste | "Paste the text of that capture into the PR description" |
Every capture has a stable URI — stash://bundle/<uuid> for videos, stash://capture/<uuid> for screenshots — so an agent can re-fetch the same capture across sessions, across conversations, and across tool restarts.
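A client that stores these URIs between sessions needs to turn them back into tool calls. The sketch below is one way to do that in Python; `parse_stash_uri` and `TOOL_FOR_KIND` are hypothetical names, and the assumption that the URI's UUID is exactly the `id` each tool expects is not confirmed here.

```python
import uuid
from urllib.parse import urlparse

# Mapping from URI kind to the tool that fetches it, per the tool table.
TOOL_FOR_KIND = {"capture": "stash.get_capture", "bundle": "stash.get_bundle"}


def parse_stash_uri(uri):
    """Split a stash:// URI into (tool name, capture UUID string)."""
    parsed = urlparse(uri)
    if parsed.scheme != "stash":
        raise ValueError(f"not a stash URI: {uri!r}")
    kind = parsed.netloc  # "capture" for screenshots, "bundle" for videos
    if kind not in TOOL_FOR_KIND:
        raise ValueError(f"unknown capture kind: {kind!r}")
    capture_id = parsed.path.lstrip("/")
    uuid.UUID(capture_id)  # raises ValueError if the ID is not a well-formed UUID
    return TOOL_FOR_KIND[kind], capture_id
```

Validating the UUID up front lets a malformed or truncated URI fail locally instead of producing a confusing server-side miss.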
Install in One Command
The installer script drops the stdio bridge binary and registers it with all three officially supported clients at once.
    curl -sSL https://yourstash.ai/install-claude.sh | bash
What the script does:
- Places the bridge at `~/.local/bin/stash-mcp`, a small executable that relays stdin/stdout to the local socket.
- Registers with Claude Code via `claude mcp add stash /Users/<you>/.local/bin/stash-mcp -s user`. The CLI writes the entry into `~/.claude.json` under the user scope.
- Registers with Claude Desktop by editing `~/Library/Application Support/Claude/claude_desktop_config.json`.
- Registers with Cursor by editing `~/.cursor/mcp.json`.
The entries written to `~/Library/Application Support/Claude/claude_desktop_config.json` and `~/.cursor/mcp.json` look like this:
    {
      "mcpServers": {
        "stash": {
          "command": "/Users/<you>/.local/bin/stash-mcp"
        }
      }
    }
After install: Claude Code picks the server up on the next /mcp reconnect. Claude Desktop requires a full quit-and-relaunch. Cursor reloads on window restart.
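For clients that read a JSON config with an `mcpServers` object, the registration step can be approximated by hand. The following Python sketch merges a `stash` entry into such a file without disturbing other servers; `register_stash` is a hypothetical helper, not the installer's actual logic, and it assumes the file is plain JSON with no comments.

```python
import json
from pathlib import Path


def register_stash(config_path, bridge_path):
    """Add or update the "stash" entry under mcpServers, keeping other entries."""
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["stash"] = {"command": str(bridge_path)}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `register_stash(Path.home() / ".cursor/mcp.json", Path.home() / ".local/bin/stash-mcp")` would produce the entry shown above for Cursor. Claude Code is better registered through `claude mcp add`, since the CLI manages its own file.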
What Flows Through MCP
A single get_capture call on a screenshot returns a structured payload the agent can parse. Here is what the payload carries:
Capture identity
- UUID and 8-character shortID (the `#XXXXXXXX` you see in the banner)
- Protocol version (`stash-1`), bundle version, timestamp with local timezone
- `stash://capture/<uuid>` URI for stable re-fetch
System context
- macOS version — UI drifts between major releases
- Display resolution, backing scale factor, color space, locale
- Light vs. dark appearance
App and window context
- Bundle ID: distinguishes `dev.warp.Warp-Stable` from `com.apple.Terminal`, which look identical and behave differently
- App name, version, window title, window frame
Browser context (when applicable)
- Full URL of the active tab and the page title — impossible to tell from pixels alone on minimalist sites
Dev context (editors and terminals)
- Active file path, language, cursor line and column — the reason "fix the bug in this function" works from a screenshot
- Visible buffer contents (the text, not a picture of it)
- For terminals: shell, current working directory, last N commands and outputs
Accessibility tree
- Full tree of every UI element in the captured app window: role, label, value, enabled state, position, children
- Pristine structured text that bypasses OCR — menu items, button labels, text field contents, all correct
Annotation shapes
- Arrows, rectangles, ellipses, callouts exported as geometric shapes with color and stroke metadata, not baked pixels — the agent can distinguish your callouts from UI elements that happen to look similar
For a get_bundle on a video, on top of all the above, you also get a session timeline, an interaction log (clicks, drags, scroll bursts, keyboard), clipboard events with the actual text inlined, voice transcript, visual events (toast detections, stuck spinner flags), a state-change heatmap, and absolute paths to every frame image. See the full stash-1 protocol spec for the exact field shape.
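To show why a structured accessibility tree is easier to reason over than pixels, here is a small Python sketch that walks a tree and lists every labelled element. The key names (`role`, `label`, `value`, `children`) mirror the fields listed above but are assumptions about the JSON shape; the stash-1 spec is the authority on exact field names.

```python
def iter_elements(node, depth=0):
    """Yield (depth, element) pairs in depth-first order."""
    yield depth, node
    for child in node.get("children", []):
        yield from iter_elements(child, depth + 1)


def visible_text(tree):
    """Render labelled elements as indented 'role: text' lines."""
    lines = []
    for depth, element in iter_elements(tree):
        text = element.get("label") or element.get("value")
        if text:
            lines.append(f"{'  ' * depth}{element.get('role', 'unknown')}: {text}")
    return lines
```

Elements without a label or value (pure layout groups, for instance) are skipped, which keeps the rendered text compact.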
Real Examples from Claude Code and Cursor
Triage a recent session
You ask Claude Code: "What was the last thing I captured?" Claude calls stash.list_recent(5) and gets back:
    [
      { "id": "8FD26F28-…", "shortID": "8FD26F28",
        "kind": "image", "appName": "Cursor",
        "windowTitle": "ContextBannerRenderer.swift",
        "timestamp": "2026-04-20T16:42:00Z" },
      { "id": "AA1B7C31-…", "shortID": "AA1B7C31",
        "kind": "video", "appName": "Safari",
        "windowTitle": "localhost:3000 — Checkout",
        "durationSec": 42.3,
        "timestamp": "2026-04-20T16:15:00Z" },
      ...
    ]
Claude picks the most recent capture, calls stash.get_capture("8FD26F28-…"), and answers in one turn. Triage cost: ~500 tokens for the list plus ~3K for the dossier.
Find an old capture by topic
You ask Cursor: "Pull up the Xcode capture where I was debugging the keychain error." Cursor calls stash.search("keychain"):
    [
      { "id": "BC4AE0F1-…", "shortID": "BC4AE0F1",
        "kind": "image", "appName": "Xcode",
        "windowTitle": "KeychainService.swift",
        "snippet": "keychain access -25300",
        "matchedField": "textContent",
        "timestamp": "2026-04-18T10:11:00Z" }
    ]
One match. Cursor calls stash.get_capture("BC4AE0F1-…"), reads the a11y tree of Xcode's debugger pane, sees the error code in structured text (not OCR'd pixels), and suggests a fix grounded in the actual stack frame.
Walk a video timeline without opening the MP4
You ask Claude Code: "Review my last recording and tell me where the modal stutters." Claude calls stash.get_bundle("AA1B7C31-…") and receives:
    {
      "protocolVersion": "stash-1",
      "bundleVersion": 2,
      "captureId": "AA1B7C31-…",
      "report": "<contents of report.md — ~22 KB>",
      "frames": [
        { "filename": "frame_01.jpg", "timestampSec": 0.0,
          "tag": "start", "appName": "Safari",
          "path": "/Users/.../Recordings/AA1B7C31-…/frame_01.jpg" },
        { "filename": "frame_05.jpg", "timestampSec": 12.4,
          "tag": "interaction", "appName": "Safari",
          "path": "/Users/.../Recordings/AA1B7C31-…/frame_05.jpg" },
        ...
      ],
      "audioPath": "/Users/.../Recordings/AA1B7C31-…/audio.m4a",
      "mcpURI": "stash://bundle/AA1B7C31-…"
    }
Claude reads report.md, finds the interaction marked "2.3s — click (150, 200) — modal backdrop", then calls its own Read tool on frame_05.jpg directly from the returned absolute path. It never touches video.mp4. It walks the timeline in structured order, frame by frame, and pinpoints the CSS transition timing that produces the stutter.
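The frame-picking step can be sketched in a few lines of Python. Given a parsed `get_bundle` payload shaped like the example above (`frames`, `timestampSec`, `tag`, `path`), the hypothetical helper `frames_near` returns the paths of frames close to a moment of interest, so only those images need to be read. The default window of 2 seconds is an arbitrary assumption.

```python
def frames_near(bundle, at_sec, window_sec=2.0, tag=None):
    """Return paths of frames within window_sec of at_sec, nearest first."""
    candidates = [
        frame for frame in bundle.get("frames", [])
        if abs(frame["timestampSec"] - at_sec) <= window_sec
        and (tag is None or frame.get("tag") == tag)
    ]
    # Nearest frame first, so the caller can stop after the first few.
    candidates.sort(key=lambda frame: abs(frame["timestampSec"] - at_sec))
    return [frame["path"] for frame in candidates]
```

Because the payload carries paths rather than inlined images, filtering like this costs nothing in context; tokens are spent only on the frames actually opened.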
Token Economics vs. Chat Uploads
Every MCP payload is shaped to fit inside an agent's context window without waste. Dragging the same assets into a chat client costs 10-100x more tokens and arrives with weaker grounding.
| Query | Size over MCP | Tokens over MCP | Equivalent chat upload |
|---|---|---|---|
| `list_recent(20)` | ~2 KB | ~2K (≈100 per capture) | Not possible: chat clients don't index history |
| `search("keychain")` | ~1 KB | ~1K | Not possible, for the same reason |
| `get_capture`, single screenshot | 1-5 KB (a11y tree) | 2-6K total dossier | 1,500-4,000 tokens for raw pixel OCR, often with hallucinated UI labels |
| `get_bundle`, 5-min recording | ~22 KB report + paths to 30 frames | 6-10K structured + 1-3K per frame the agent chooses to read | 50K+ tokens when the client supports the upload at all; metadata and sidecars stripped |
| `render_plain`, inline text paste | ~1 KB | ~500 | Image upload, then OCR on the other side: ~4K tokens |
The savings compound on video. A 5-minute recording at 30fps is ~9,000 raw frames. Stash hard-caps each bundle at 30 interaction-anchored frames (tagged start, interaction, end, ambient, gap-fill) — a 99%+ reduction with no loss of decision-relevant frames. Scroll bursts collapse the same way: 2,222 interactions fold into ~20 burst lines ("29.7s-30.1s, 74 scroll events"). And because get_bundle returns paths to frames, not inlined images, the agent pulls only the 5-15 frames it decides matter, not all 30.
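The two reductions in this paragraph are easy to check by hand. The Python sketch below computes the frame reduction and shows one plausible way to fold scroll events into burst lines. The grouping rule (start a new burst when the gap between events exceeds `max_gap_sec`) and its 0.25 s default are assumptions for illustration, not Stash's documented algorithm.

```python
def collapse_bursts(timestamps, max_gap_sec=0.25):
    """Fold event timestamps (seconds) into burst summary lines."""
    bursts = []  # each entry is [start, end, count]
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][1] <= max_gap_sec:
            bursts[-1][1] = t       # extend the current burst
            bursts[-1][2] += 1
        else:
            bursts.append([t, t, 1])  # gap too large: start a new burst
    return [f"{start:.1f}s-{end:.1f}s, {count} scroll events"
            for start, end, count in bursts]


def frame_reduction(duration_sec, fps, kept_frames):
    """Fraction of raw frames dropped when only kept_frames are retained."""
    return 1 - kept_frames / (duration_sec * fps)
```

For the numbers quoted above, `frame_reduction(300, 30, 30)` is 1 - 30/9000, or about 99.7%, which matches the "99%+" claim.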
Privacy and Peer Auth
- Local socket only. The server listens at `~/Library/Application Support/Stash/mcp.sock`. No network port is opened. No cloud relay exists. MCP requests never leave your Mac.
- Peer-auth allowlist. Stash checks the peer's codesign team identifier before serving any request and silently rejects unknown signers. The built-in allowlist covers Anthropic (`58LP8PCM82`) and Stash itself (`VJMJQKCRMC`). You can extend it via `mcpExtraTrustedTeamIDs` in Settings.
- No keychain entanglement. There are no tokens to provision. The socket is scoped to your user account by macOS file permissions and gated by the codesign check, not by API keys you have to rotate.
- Sensitive data auto-purges. Accessibility trees, selected text, file paths, git branches, and terminal CWDs are purged 24 hours after capture by default (user-adjustable from 1h to never). Screenshots and basic metadata follow your normal history retention.
- Secret redaction at capture time. API keys, JWTs, AWS credentials, Bearer tokens, and PEM blocks are detected before anything is persisted and replaced with `[redacted]`. The secret never touches disk, so it never reaches MCP.
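To make the redaction step concrete, here is a minimal Python sketch of pattern-based secret scrubbing. The regular expressions are illustrative assumptions covering a few well-known formats (PEM blocks, Bearer tokens, JWTs, AWS access key IDs); Stash's actual detectors are not described in this article and are likely broader.

```python
import re

# Illustrative patterns only; a production detector would cover more formats.
SECRET_PATTERN = re.compile(
    r"-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----"  # PEM blocks
    r"|Bearer\s+[A-Za-z0-9._~+/-]+=*"                        # Bearer tokens
    r"|\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"  # JWTs
    r"|\bAKIA[0-9A-Z]{16}\b"                                 # AWS access key IDs
)


def redact(text):
    """Replace every matched secret with the literal marker [redacted]."""
    return SECRET_PATTERN.sub("[redacted]", text)
```

Running the substitution before anything is written to disk is what makes the guarantee above hold: downstream consumers, MCP included, only ever see the scrubbed text.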
The precedence rule for any one capture is MCP > XMP > pixel banner. The MCP payload is live; the XMP blob embedded in the PNG is a frozen snapshot from capture time. When both are reachable, the agent prefers MCP so any post-capture annotation you drew is included.
Beyond Claude and Cursor
Any MCP-speaking client can talk to Stash. The installer auto-configures the three officially supported ones; everything else is a manual stdio-bridge entry pointed at the same ~/.local/bin/stash-mcp binary.
| Client | Config path | Notes |
|---|---|---|
| Claude Code | Managed by `claude mcp add`, persisted in `~/.claude.json` | Auto-configured by the installer |
| Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` | Auto-configured by the installer; requires full quit-and-relaunch |
| Cursor | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project) | Auto-configured by the installer |
| Windsurf | Its own `mcp_config.json` in the client's support directory | Manual config; same stdio bridge shape |
| Zed | Zed settings `context_servers` block | Manual config; same stdio bridge shape |
| ChatGPT Desktop | Developer settings MCP pane | Manual config; same stdio bridge shape |
| Codex CLI | Codex MCP config file | Manual config; same stdio bridge shape |
The bridge binary is the same in every case. Only the config file changes.
Frequently Asked Questions
How do I install the Stash MCP server?
Run one command: curl -sSL https://yourstash.ai/install-claude.sh | bash. The installer places the stdio bridge at ~/.local/bin/stash-mcp and registers it with Claude Code, Claude Desktop, and Cursor automatically. Stash must be running for the MCP server to accept connections.
Which MCP clients work with Stash?
Three clients are supported out of the box by the one-line installer: Claude Code, Claude Desktop, and Cursor. Any other MCP-speaking client — Windsurf, Zed, ChatGPT Desktop, Codex CLI — is compatible via manual stdio-bridge config pointed at the same ~/.local/bin/stash-mcp binary.
Does Stash upload my captures to the cloud to use MCP?
No. The Stash MCP server listens on a Unix domain socket at ~/Library/Application Support/Stash/mcp.sock. It does not open a network port. No capture data leaves your Mac to serve an MCP request. The agent connects over a local socket, reads local data, and replies — no cloud round-trip.
What are the five tools Stash's MCP exposes?
The five tools are stash.list_recent(n) for newest-first summaries, stash.search(query) for substring search across app name, window title, bookmark name, text content, and browser URL, stash.get_capture(id) for the full dossier of a screenshot, stash.get_bundle(id) for a video bundle with report markdown and per-frame metadata, and stash.render_plain(id) for a plain-text rendering suitable for inline paste.
How big is the token cost of a typical query?
The cost scales with the tool. stash.list_recent returns roughly 100 tokens per capture summary. stash.get_capture on a screenshot returns 2-6K tokens for a full dossier including the a11y tree. stash.get_bundle on a video bundle returns 6-10K tokens for the structured context, plus 1-3K tokens per frame the agent chooses to read. An equivalent chat upload of the same video session can cost 50K+ tokens before any reasoning.
Can I use Stash's MCP server with Windsurf or ChatGPT Desktop?
Yes. Windsurf, Zed, ChatGPT Desktop, and Codex CLI all speak MCP over stdio and work with Stash via manual config. Point the client at the same stdio bridge at ~/.local/bin/stash-mcp that the one-line installer drops. The bridge relays stdio to the local socket — no code changes on the Stash side.
Is the a11y tree the same thing as OCR?
No. The a11y tree is the macOS accessibility tree of the captured app window — role, label, value, enabled state, position, children — read directly from the OS. OCR infers text from pixels and hallucinates on low-contrast or small UI. The a11y tree is structured and correct. A typical app window's a11y tree is 1-5 KB of text (roughly 250-1,200 tokens), versus 1,500-4,000 tokens to OCR the same screenshot.
Does the MCP server keep running when Stash is quit?
No. The MCP server is hosted inside the Stash app process. Quitting Stash shuts the socket down and any connected agent will see the connection drop. Stash is a menu bar app — keep it running in the background and MCP stays available.
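A script or agent wrapper can check for this condition before issuing requests. The Python sketch below simply attempts a connection to the socket path given in this article; `server_available` is a hypothetical helper. A successful connect only shows that something is listening. It does not mean the peer passed the codesign check.

```python
import socket
from pathlib import Path

# Socket path as given in the article.
SOCKET_PATH = Path.home() / "Library/Application Support/Stash/mcp.sock"


def server_available(socket_path=SOCKET_PATH, timeout_sec=0.5):
    """Return True if something is accepting connections on the socket."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout_sec)
            sock.connect(str(socket_path))
        return True
    except OSError:
        # Covers a missing socket file, a refused connection, and timeouts.
        return False
```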
Key Takeaways
- The Stash MCP server exposes your capture library to any MCP-speaking agent — Claude Code, Cursor, Claude Desktop, and compatible clients — over a local Unix domain socket.
- Five tools cover every query pattern: `list_recent`, `search`, `get_capture`, `get_bundle`, `render_plain`.
- One command, `curl -sSL https://yourstash.ai/install-claude.sh | bash`, installs the stdio bridge and registers it with Claude Code, Claude Desktop, and Cursor in one shot.
- A `get_capture` payload carries app metadata, window title, URL, OS/display context, accessibility tree, dev context (file path, cursor position, terminal history), and annotation shapes, not raw pixels.
- A `get_bundle` payload carries the full `report.md`, per-frame metadata with timestamps, and absolute paths so the agent pulls only the frames it needs.
- Token economics favor MCP by 10-100x versus dragging the same assets into a chat client, and the data arrives with metadata and sidecars intact rather than stripped at upload time.
- The server never opens a network port; peer codesign team IDs are checked against an allowlist; sensitive capture data auto-purges after 24 hours by default.
References
- Anthropic. "Introducing the Model Context Protocol." November 2024.
- Model Context Protocol. Specification, 2025-11-25.
- Anthropic. "Claude Code settings." Official documentation for `claude mcp add` and `~/.claude.json`.
- Cursor. "Model Context Protocol (MCP)." Official documentation for `~/.cursor/mcp.json`.
- Model Context Protocol. "Connect to local MCP servers." Quickstart for Claude Desktop and `claude_desktop_config.json`.