TL;DR

The Stash MCP server is a local Model Context Protocol endpoint that lets Claude Code, Cursor, Claude Desktop, and any other MCP-speaking client query your screenshots and recordings as structured context. Five tools — list_recent, search, get_capture, get_bundle, render_plain — return app metadata, window titles, URLs, accessibility trees, per-frame video metadata, and absolute paths to on-disk assets. One command installs the stdio bridge and registers it with all three official clients. The server listens on a Unix domain socket; no capture data leaves your Mac.


What the Stash MCP Server Is

Stash is a macOS menu bar app that captures screenshots and screen recordings with embedded context: app, window title, URL, accessibility tree, cursor position, click coordinates, and voice transcript. The Stash MCP server exposes that library to any AI client that speaks the Model Context Protocol, the open standard Anthropic published in November 2024 that is now the default integration layer for Claude Code, Cursor, Claude Desktop, and a growing list of other agents.

Instead of dragging an image into a chat window and letting the model OCR pixels, the agent issues a JSON-RPC call over a local socket, receives structured data, and reasons over real metadata. That is the whole idea.

The architecture, in one paragraph: the agent's MCP client spawns a small stdio bridge (~/.local/bin/stash-mcp) that relays line-delimited JSON-RPC 2.0 between stdin/stdout and a Unix domain socket at ~/Library/Application Support/Stash/mcp.sock. The Stash app process hosts the server. Before any request is served, the server checks the peer's codesign team identifier against an allowlist. Replies use the MCP-standard {content: [{type: "text", text: "<json>"}]} envelope. The protocol version is stash-1: additive changes stay on stash-1, and breaking changes would bump it to stash-2.
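To make the wire format concrete, here is a minimal Python sketch of one request line and of unwrapping the reply envelope. The tools/call method and its name/arguments parameters come from the MCP specification; the argument key n and the sample reply are assumptions for illustration, not taken from the stash-1 spec.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent emits no newlines, so the message stays on one line.
    return json.dumps(request, separators=(",", ":")) + "\n"


def unwrap_text_content(response_line):
    """Return the decoded JSON carried in the first text item of the envelope."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    for item in response["result"]["content"]:
        if item.get("type") == "text":
            return json.loads(item["text"])
    raise ValueError("no text content in response")


# Assumption: the argument key "n" mirrors the list_recent(n) signature.
line = build_tool_call(1, "stash.list_recent", {"n": 5})

# Hypothetical reply, shaped like the envelope described above.
sample_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "[{\"shortID\": \"8FD26F28\"}]"}]},
})
captures = unwrap_text_content(sample_reply)
```

In practice the agent's MCP client builds and parses these messages for you; the sketch only shows what travels across the bridge.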


The Five Tools

Five tools cover every query pattern an agent needs. Each is token-budgeted so the agent can triage cheaply before committing context to a full dossier.

| Tool | Signature | Returns | Typical use |
| --- | --- | --- | --- |
| stash.list_recent | list_recent(n), max 500 | Newest-first summaries: app, window, shortID, timestamp, kind | "Show me my last few captures" (triage before a deeper fetch) |
| stash.search | search(query) | Matching IDs + snippets across appName, windowTitle, bookmarkName, textContent, browserURL | "Find the Xcode capture about the keychain bug" |
| stash.get_capture | get_capture(id) | Full dossier: app, window, URL, appearance, OS, a11y tree, userFocus annotations, devContext | "Open capture #8FD26F28 and explain the error" |
| stash.get_bundle | get_bundle(id) | Video bundle: report.md content, per-frame metadata with app/window transitions, absolute paths to every frame image | "Walk the timeline of my last recording" |
| stash.render_plain | render_plain(id) | Plain-text rendering suitable for inline paste | "Paste the text of that capture into the PR description" |

Every capture has a stable URI — stash://bundle/<uuid> for videos, stash://capture/<uuid> for screenshots — so an agent can re-fetch the same capture across sessions, across conversations, and across tool restarts.
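A client that stores these URIs needs to route each one back to the right tool. The following is a small sketch of that routing, assuming the two URI shapes above are the only ones; parse_stash_uri and tool_for_uri are hypothetical helper names, and the UUID in the usage lines is a made-up placeholder.

```python
VALID_KINDS = {"bundle", "capture"}


def parse_stash_uri(uri):
    """Split a stash:// URI into (kind, capture_id)."""
    prefix = "stash://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a stash URI: {uri!r}")
    kind, sep, capture_id = uri[len(prefix):].partition("/")
    if kind not in VALID_KINDS or not sep or not capture_id:
        raise ValueError(f"malformed stash URI: {uri!r}")
    return kind, capture_id


def tool_for_uri(uri):
    """Pick the fetch tool that matches the URI kind."""
    kind, capture_id = parse_stash_uri(uri)
    tool = "stash.get_bundle" if kind == "bundle" else "stash.get_capture"
    return tool, capture_id


# Placeholder UUID for illustration only.
video_call = tool_for_uri("stash://bundle/123e4567-e89b-12d3-a456-426614174000")
image_call = tool_for_uri("stash://capture/123e4567-e89b-12d3-a456-426614174000")
```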


Install in One Command

The installer script drops the stdio bridge binary and registers it with all three officially supported clients at once.

curl -sSL https://yourstash.ai/install-claude.sh | bash

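What the script does: it places the stdio bridge at ~/.local/bin/stash-mcp, registers it with Claude Code (whose MCP entries are managed by claude mcp add and persisted in ~/.claude.json), and writes a stash entry into the Claude Desktop and Cursor config files.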

The written config entries look like this:

// ~/Library/Application Support/Claude/claude_desktop_config.json
// and ~/.cursor/mcp.json
{
  "mcpServers": {
    "stash": {
      "command": "/Users/<you>/.local/bin/stash-mcp"
    }
  }
}
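If you ever need to write that entry by hand, the key point is to merge it into the existing mcpServers object rather than overwrite the file. Below is a minimal sketch of such a merge, assuming the config file is plain JSON (the // path comments in the example above are annotations, not file content). The function name register_stash is hypothetical and this is not the installer's actual code.

```python
import json
from pathlib import Path


def register_stash(config_path, bridge_path):
    """Add or update the "stash" entry while preserving other servers."""
    path = Path(config_path)
    text = path.read_text().strip() if path.exists() else ""
    config = json.loads(text) if text else {}
    # Create the mcpServers object if the file did not have one yet.
    servers = config.setdefault("mcpServers", {})
    servers["stash"] = {"command": str(bridge_path)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (not executed here, since it would modify a real config file):
# register_stash(Path.home() / ".cursor" / "mcp.json",
#                Path.home() / ".local" / "bin" / "stash-mcp")
```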

After install: Claude Code picks the server up on the next /mcp reconnect. Claude Desktop requires a full quit-and-relaunch. Cursor reloads on window restart.


What Flows Through MCP

A single get_capture call on a screenshot returns a structured payload the agent can parse. Here is what the payload carries:

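Capture identity

The capture's ID and its stable stash://capture/<uuid> URI.

System context

OS and appearance.

App and window context

App name and window title.

Browser context (when applicable)

The page URL.

Dev context (editors and terminals)

File path, cursor position, and terminal history.

Accessibility tree

Role, label, value, enabled state, position, and children for the captured window.

Annotation shapes

The userFocus annotations drawn on the capture.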

For a get_bundle on a video, on top of all the above, you also get a session timeline, an interaction log (clicks, drags, scroll bursts, keyboard), clipboard events with the actual text inlined, voice transcript, visual events (toast detections, stuck spinner flags), a state-change heatmap, and absolute paths to every frame image. See the full stash-1 protocol spec for the exact field shape.


Real Examples from Claude Code and Cursor

Triage a recent session

You ask Claude Code: "What was the last thing I captured?" Claude calls stash.list_recent(5) and gets back:

[
  { "id": "8FD26F28-…", "shortID": "8FD26F28",
    "kind": "image", "appName": "Cursor",
    "windowTitle": "ContextBannerRenderer.swift",
    "timestamp": "2026-04-20T16:42:00Z" },
  { "id": "AA1B7C31-…", "shortID": "AA1B7C31",
    "kind": "video", "appName": "Safari",
    "windowTitle": "localhost:3000 — Checkout",
    "durationSec": 42.3,
    "timestamp": "2026-04-20T16:15:00Z" },
  ...
]

Claude picks the most recent capture, calls stash.get_capture("8FD26F28-…"), and answers in one turn. Triage cost: ~500 tokens for the list plus ~3K for the dossier.
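The "pick the most recent capture" step is simple enough to sketch. This assumes the summaries carry ISO 8601 UTC timestamps as in the example above; newest_capture is a hypothetical helper, and the sample data keeps only the fields it needs.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as 2026-04-20T16:42:00Z."""
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def newest_capture(summaries, kind=None):
    """Return the newest summary, optionally restricted to one kind."""
    candidates = [s for s in summaries if kind is None or s["kind"] == kind]
    if not candidates:
        return None
    return max(candidates, key=lambda s: parse_timestamp(s["timestamp"]))


summaries = [
    {"shortID": "AA1B7C31", "kind": "video", "timestamp": "2026-04-20T16:15:00Z"},
    {"shortID": "8FD26F28", "kind": "image", "timestamp": "2026-04-20T16:42:00Z"},
]
latest = newest_capture(summaries)
latest_video = newest_capture(summaries, kind="video")
```

Because list_recent already returns newest-first results, taking the first element would also work; sorting by timestamp just avoids relying on that ordering.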

Find an old capture by topic

You ask Cursor: "Pull up the Xcode capture where I was debugging the keychain error." Cursor calls stash.search("keychain"):

[
  { "id": "BC4AE0F1-…", "shortID": "BC4AE0F1",
    "kind": "image", "appName": "Xcode",
    "windowTitle": "KeychainService.swift",
    "snippet": "keychain access -25300",
    "matchedField": "textContent",
    "timestamp": "2026-04-18T10:11:00Z" }
]

One match. Cursor calls stash.get_capture("BC4AE0F1-…"), reads the a11y tree of Xcode's debugger pane, sees the error code in structured text (not OCR'd pixels), and suggests a fix grounded in the actual stack frame.
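For intuition about what a search hit contains, here is a rough sketch of substring matching over the five searchable fields. It is an illustration only: Stash's actual matching rules are not described here, so the case-sensitive comparison, the field order, and the snippet window are all assumptions, chosen because they reproduce the example result above.

```python
SEARCH_FIELDS = ("appName", "windowTitle", "bookmarkName", "textContent", "browserURL")


def search_captures(captures, query, context=20):
    """Return one hit per capture: the first field containing the query."""
    results = []
    for capture in captures:
        for field in SEARCH_FIELDS:
            text = capture.get(field) or ""
            pos = text.find(query)  # assumption: case-sensitive substring match
            if pos == -1:
                continue
            # Keep a little text on either side of the match as the snippet.
            start = max(0, pos - context)
            end = min(len(text), pos + len(query) + context)
            results.append({
                "shortID": capture["shortID"],
                "matchedField": field,
                "snippet": text[start:end].strip(),
            })
            break  # report only the first matching field per capture
    return results


library = [
    {"shortID": "BC4AE0F1", "appName": "Xcode",
     "windowTitle": "KeychainService.swift",
     "textContent": "keychain access -25300"},
    {"shortID": "8FD26F28", "appName": "Cursor",
     "windowTitle": "ContextBannerRenderer.swift"},
]
hits = search_captures(library, "keychain")
```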

Walk a video timeline without opening the MP4

You ask Claude Code: "Review my last recording and tell me where the modal stutters." Claude calls stash.get_bundle("AA1B7C31-…") and receives:

{
  "protocolVersion": "stash-1",
  "bundleVersion": 2,
  "captureId": "AA1B7C31-…",
  "report": "<contents of report.md — ~22 KB>",
  "frames": [
    { "filename": "frame_01.jpg", "timestampSec": 0.0,
      "tag": "start", "appName": "Safari",
      "path": "/Users/.../Recordings/AA1B7C31-…/frame_01.jpg" },
    { "filename": "frame_05.jpg", "timestampSec": 12.4,
      "tag": "interaction", "appName": "Safari",
      "path": "/Users/.../Recordings/AA1B7C31-…/frame_05.jpg" },
    ...
  ],
  "audioPath": "/Users/.../Recordings/AA1B7C31-…/audio.m4a",
  "mcpURI": "stash://bundle/AA1B7C31-…"
}

Claude reads report.md, finds the interaction marked "2.3s — click (150, 200) — modal backdrop", then calls its own Read tool on frame_05.jpg directly from the returned absolute path. It never touches video.mp4. It walks the timeline in structured order, frame by frame, and pinpoints the CSS transition timing that produces the stutter.
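Choosing which frames to open is a lookup over the frames array. The sketch below assumes the field names shown in the payload above (timestampSec, tag, filename); nearest_frame and frames_with_tag are hypothetical helpers, and the path field is omitted from the sample data.

```python
def nearest_frame(frames, timestamp_sec):
    """Return the frame whose timestampSec is closest to the given time."""
    return min(frames, key=lambda f: abs(f["timestampSec"] - timestamp_sec))


def frames_with_tag(frames, tag):
    """Return only the frames carrying a given tag, in their original order."""
    return [f for f in frames if f.get("tag") == tag]


frames = [
    {"filename": "frame_01.jpg", "timestampSec": 0.0, "tag": "start"},
    {"filename": "frame_05.jpg", "timestampSec": 12.4, "tag": "interaction"},
]
closest = nearest_frame(frames, 11.0)
interactions = frames_with_tag(frames, "interaction")
```

The agent would then read only the files at the path values of the selected frames, leaving the rest of the bundle untouched.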


Token Economics vs. Chat Uploads

Every MCP payload is shaped to fit inside an agent's context window without waste. Dragging the same assets into a chat client can cost 10-100x more tokens and arrives with weaker grounding.

| Query | Size over MCP | Tokens over MCP | Equivalent chat upload |
| --- | --- | --- | --- |
| list_recent(20) | ~2 KB | ~2K (≈100 per capture) | Not possible; chat clients don't index history |
| search("keychain") | ~1 KB | ~1K | Not possible, for the same reason |
| get_capture, single screenshot | 1-5 KB (a11y tree) | 2-6K total dossier | 1,500-4,000 tokens for raw pixel OCR, often with hallucinated UI labels |
| get_bundle, 5-min recording | ~22 KB report + paths to 30 frames | 6-10K structured + 1-3K per frame the agent chooses to read | 50K+ tokens when the client supports the upload at all; metadata and sidecars stripped |
| render_plain, inline text paste | ~1 KB | ~500 | Image upload, then OCR on the other side: ~4K tokens |

The savings compound on video. A 5-minute recording at 30fps is ~9,000 raw frames. Stash hard-caps each bundle at 30 interaction-anchored frames (tagged start, interaction, end, ambient, gap-fill) — a 99%+ reduction with no loss of decision-relevant frames. Scroll bursts collapse the same way: 2,222 interactions fold into ~20 burst lines ("29.7s-30.1s, 74 scroll events"). And because get_bundle returns paths to frames, not inlined images, the agent pulls only the 5-15 frames it decides matter, not all 30.
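The arithmetic and the burst folding can be sketched in a few lines. The reduction figure follows directly from the numbers above; the burst grouping is only an illustration of the idea, and the 0.25-second gap threshold is an assumption rather than Stash's actual rule.

```python
def frame_reduction(duration_sec, fps, kept_frames):
    """Return (raw_frame_count, fraction_of_frames_dropped)."""
    raw = duration_sec * fps
    return raw, 1 - kept_frames / raw


def collapse_scroll_bursts(timestamps, max_gap_sec=0.25):
    """Group scroll-event timestamps into (start, end, count) bursts."""
    bursts = []
    for t in sorted(timestamps):
        # Extend the current burst if this event follows closely enough.
        if bursts and t - bursts[-1][1] <= max_gap_sec:
            start, _, count = bursts[-1]
            bursts[-1] = (start, t, count + 1)
        else:
            bursts.append((t, t, 1))
    return bursts


def format_burst(burst):
    """Render one burst as a compact summary line."""
    start, end, count = burst
    return f"{start:.1f}s-{end:.1f}s, {count} scroll events"


raw_frames, dropped = frame_reduction(duration_sec=300, fps=30, kept_frames=30)
bursts = collapse_scroll_bursts([29.7, 29.8, 29.9, 30.1, 35.0])
lines = [format_burst(b) for b in bursts]
```

With a 5-minute recording at 30fps, raw_frames is 9,000 and keeping 30 frames drops more than 99% of them, which matches the figure quoted above.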


Privacy and Peer Auth

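The server never opens a network port. It listens only on the Unix domain socket at ~/Library/Application Support/Stash/mcp.sock, and it checks each peer's codesign team identifier against an allowlist before serving any request. Sensitive capture data auto-purges after 24 hours by default.

The precedence rule for any one capture is MCP > XMP > pixel banner. The MCP payload is live; the XMP blob embedded in the PNG is a frozen snapshot from capture time. When both are reachable, the agent prefers MCP so any post-capture annotation you drew is included.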
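As a sketch, the precedence rule amounts to taking the first available source from an ordered list. The source labels and the function name below are hypothetical; only the ordering comes from the rule above.

```python
# Highest precedence first: live MCP payload, then embedded XMP, then pixel banner.
PRECEDENCE = ("mcp", "xmp", "pixel_banner")


def pick_context_source(available):
    """Return the highest-precedence source present in `available`, or None."""
    for source in PRECEDENCE:
        if source in available:
            return source
    return None


# A PNG shared without a reachable Stash server still has XMP and the banner.
offline_choice = pick_context_source({"xmp", "pixel_banner"})
online_choice = pick_context_source({"mcp", "xmp", "pixel_banner"})
```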


Beyond Claude and Cursor

Any MCP-speaking client can talk to Stash. The installer auto-configures the three officially supported ones; everything else is a manual stdio-bridge entry pointed at the same ~/.local/bin/stash-mcp binary.

| Client | Config path | Notes |
| --- | --- | --- |
| Claude Code | Managed by claude mcp add, persisted in ~/.claude.json | Auto-configured by the installer |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | Auto-configured by the installer; requires full quit-and-relaunch |
| Cursor | ~/.cursor/mcp.json (global) or .cursor/mcp.json (project) | Auto-configured by the installer |
| Windsurf | Its own mcp_config.json in the client's support directory | Manual config; same stdio bridge shape |
| Zed | Zed settings context_servers block | Manual config; same stdio bridge shape |
| ChatGPT Desktop | Developer settings MCP pane | Manual config; same stdio bridge shape |
| Codex CLI | Codex MCP config file | Manual config; same stdio bridge shape |

The bridge binary is the same in every case. Only the config file changes.


Frequently Asked Questions

How do I install the Stash MCP server?

Run one command: curl -sSL https://yourstash.ai/install-claude.sh | bash. The installer places the stdio bridge at ~/.local/bin/stash-mcp and registers it with Claude Code, Claude Desktop, and Cursor automatically. Stash must be running for the MCP server to accept connections.

Which MCP clients work with Stash?

Three clients are supported out of the box by the one-line installer: Claude Code, Claude Desktop, and Cursor. Any other MCP-speaking client — Windsurf, Zed, ChatGPT Desktop, Codex CLI — is compatible via manual stdio-bridge config pointed at the same ~/.local/bin/stash-mcp binary.

Does Stash upload my captures to the cloud to use MCP?

No. The Stash MCP server listens on a Unix domain socket at ~/Library/Application Support/Stash/mcp.sock. It does not open a network port. No capture data leaves your Mac to serve an MCP request. The agent connects over a local socket, reads local data, and replies — no cloud round-trip.

What are the five tools Stash's MCP exposes?

The five tools are stash.list_recent(n) for newest-first summaries, stash.search(query) for substring search across app name, window title, bookmark name, text content, and browser URL, stash.get_capture(id) for the full dossier of a screenshot, stash.get_bundle(id) for a video bundle with report markdown and per-frame metadata, and stash.render_plain(id) for a plain-text rendering suitable for inline paste.

How big is the token cost of a typical query?

The cost scales with the tool. stash.list_recent returns roughly 100 tokens per capture summary. stash.get_capture on a screenshot returns 2-6K tokens for a full dossier including the a11y tree. stash.get_bundle on a video bundle returns 6-10K tokens for the structured context, plus 1-3K tokens per frame the agent chooses to read. An equivalent chat upload of the same video session can cost 50K+ tokens before any reasoning.

Can I use Stash's MCP server with Windsurf or ChatGPT Desktop?

Yes. Windsurf, Zed, ChatGPT Desktop, and Codex CLI all speak MCP over stdio and work with Stash via manual config. Point the client at the same stdio bridge at ~/.local/bin/stash-mcp that the one-line installer drops. The bridge relays stdio to the local socket — no code changes on the Stash side.

Is the a11y tree the same thing as OCR?

No. The a11y tree is the macOS accessibility tree of the captured app window (role, label, value, enabled state, position, children), read directly from the OS. OCR infers text from pixels and can hallucinate on low-contrast or small UI. The a11y tree is structured and comes from the OS rather than being inferred from pixels. A typical app window's a11y tree is 1-5 KB of text (roughly 250-1,200 tokens), versus 1,500-4,000 tokens to OCR the same screenshot.

Does the MCP server keep running when Stash is quit?

No. The MCP server is hosted inside the Stash app process. Quitting Stash shuts the socket down and any connected agent will see the connection drop. Stash is a menu bar app — keep it running in the background and MCP stays available.
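If you want to check from a script whether the socket is up, a plain connection attempt is enough. This is a minimal sketch using the socket path given earlier; stash_socket_reachable is a hypothetical helper. A successful connect only shows that something is listening, since the server's peer check may still refuse to serve a client that is not on the allowlist.

```python
import os
import socket

DEFAULT_SOCKET = os.path.expanduser("~/Library/Application Support/Stash/mcp.sock")


def stash_socket_reachable(path=DEFAULT_SOCKET, timeout=1.0):
    """Return True if something accepts connections on the Unix socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection, and timeouts.
        return False
    finally:
        sock.close()
```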


Key Takeaways

  • The Stash MCP server exposes your capture library to any MCP-speaking agent — Claude Code, Cursor, Claude Desktop, and compatible clients — over a local Unix domain socket.
  • Five tools cover every query pattern: list_recent, search, get_capture, get_bundle, render_plain.
  • One command — curl -sSL https://yourstash.ai/install-claude.sh | bash — installs the stdio bridge and registers it with Claude Code, Claude Desktop, and Cursor in one shot.
  • A get_capture payload carries app metadata, window title, URL, OS/display context, accessibility tree, dev context (file path, cursor position, terminal history), and annotation shapes — not raw pixels.
  • A get_bundle payload carries the full report.md, per-frame metadata with timestamps, and absolute paths so the agent pulls only the frames it needs.
  • Token economics can favor MCP by 10-100x versus dragging the same assets into a chat client, and the data arrives with metadata and sidecars intact rather than stripped at upload time.
  • The server never opens a network port; peer codesign team IDs are checked against an allowlist; sensitive capture data auto-purges after 24 hours by default.
