The Stash protocol.
How LLMs and agentic tools read Stash captures. This page is the human-readable companion to
/llms.txt — the same spec, same precedence rules, just nicer to skim.
stash-1.
Capture data availability
Supplemental matrix for agent-facing data. Rows are ordered from raw media data to protocol-level metadata and local MCP retrieval.
Screenshot data
A regular screenshot means a normal image with no Stash context.
| # | Data type | LLM use | Regular screenshot | AI mode | MCP | Chrome extension |
|---|---|---|---|---|---|---|
| 01 | Pixels: raster image data. | Model-visible raster input. Supports visual inspection of layout, color, shape, and apparent UI state. Anything outside the visible pixels remains inferred. | Yes | Yes | Yes | Yes |
| 02 | Visible text: text visible in the image. | Provides visible labels, error messages, filenames, and button text through vision/OCR. Reliability depends on size, contrast, truncation, and table density. | Yes | Yes | Yes | Yes |
| 03 | Visible layout: spatial relationship of UI elements. | Provides selected rows, disabled controls, overlapping content, alignment defects, empty states, and visible hierarchy. Semantic roles still require inference unless structured data is available. | Yes | Yes | Yes | Yes |
| 04 | App name and bundle ID: native screenshot metadata, XMP, MCP dossier. | Identifies the owning application. App-specific behavior differs across Chrome DevTools, Cursor, Xcode, Warp, and System Settings. Bundle ID disambiguates applications with similar names or UI. | No | Yes | Yes | No |
| 05 | Window title: native screenshot metadata, XMP, MCP dossier. | Identifies the active document, project, tab, dashboard, or terminal session when that information appears in the window title. | No | Yes | Yes | No |
| 06 | Browser URL: native browser context when available. Chrome PNG metadata for extension captures. | Identifies route, environment, object ID, query state, and sometimes tenant/account. Prevents conflating staging with production, list routes with detail routes, or one object record with another. | No | Yes | Yes | Yes |
| 07 | Timestamp: pixel banner, XMP, MCP dossier, Chrome PNG metadata. | Orders captures in time. Distinguishes current state, previous attempts, and before/after comparisons in a multi-turn coding session. | No | Yes | Yes | Yes |
| 08 | Appearance, OS, display, scale: pixel banner for appearance and OS. XMP and MCP for display details. | Specifies rendering environment. UI behavior can vary by OS version, color scheme, Retina scale, display dimensions, and viewport dimensions. | No | Yes | Yes | Yes |
| 09 | Stable capture ID: pixel banner short ID, XMP captureId, History row, MCP tools. | Provides a durable lookup key such as #A1B2C3D4. The same capture can be referenced, fetched, or searched across turns. | No | Yes | Yes | Yes |
| 10 | Annotation summary: pixel banner line such as user focus: arrow pointing. MCP get_capture returns a fuller annotation explainer; XMP carries structured annotation metadata. | Encodes user focus separately from the application UI. The annotation describes mark behavior while target resolution still comes from vision or accessibility data. MCP carries the most detailed explainer. | No | Yes | Yes | No |
| 11 | Annotation geometry: XMP userFocus array; also returned by MCP get_capture. | Stores coordinates for arrows, boxes, and other marks. Coordinates can be matched against OCR output, accessibility nodes, or visual regions. | No | Yes | Yes | No |
| 12 | XMP payload: embedded file metadata under the Stash namespace. | Provides file-embedded structured metadata for file-on-disk flows such as email, Drive, and local folders when the local MCP server is not reachable. | No | Yes | No | No |
| 13 | MCP capture dossier: get_capture(id). | Returns the local structured record for a capture: app, display, accessibility, developer context, and related metadata. | No | No | Yes | Yes |
| 14 | Accessibility tree: MCP capture dossier, with summarized fallback in XMP where available. | Provides roles, labels, values, enabled state, and hierarchy as structured text. Identifies controls, selected items, table rows, menu items, and form fields without relying on OCR. | No | No | Yes | No |
| 15 | Developer context: MCP capture dossier for supported editors and terminals. XMP can carry a snapshot summary. | Maps visible editor or terminal state to repo context: file path, language, selected text, cursor position, terminal cwd, git branch, and recent commands when available. | No | No | Yes | No |
| 16 | Recent and search indexes: list_recent(n) and search(query). | Provides retrieval over prior captures by app, window, text, URL, and recency. Supports multi-turn workflows without requiring the image to be pasted again. | No | No | Yes | Yes |
| 17 | Plain rendered image: render_plain(id). | Returns raw image bytes without banner or XMP. Supports evaluation of pixel-only model behavior separately from structured context. | No | No | Yes | No |
| 18 | Chrome full-page capture: Chrome extension scroll-and-stitch capture. | Captures document content beyond the visible viewport. Required when a web page issue depends on content below the fold or on relationships between distant page sections. | No | No | No | Yes |
| 19 | Chrome URL, title, domain, browser: Chrome extension PNG tEXt chunks. | Identifies the browser-origin page and runtime. Relevant for route-specific defects, auth redirects, extension behavior, and browser rendering differences. | No | No | No | Yes |
| 20 | Viewport, DPR, page height, parts: Chrome PNG tEXt chunks, including capturePart and capturePartsTotal. | Describes viewport geometry, device pixel ratio, full page height, and ordered split-image parts. Prevents treating one part as the complete page. | No | No | No | Yes |
| 21 | Image hash and DOM hash: Chrome extension PNG tEXt chunks. | Provides deterministic identifiers for the image artifact and captured DOM structure. Supports before/after comparisons and changed-structure detection. | No | No | No | Yes |
| 22 | Signature and attestation fields: local ECDSA signature, public key, attestation ID, server timestamp when available. | Provides verification metadata for evidence workflows: local signature, public key, attestation ID, and server timestamp when available. | No | No | No | Yes |
| 23 | Downloads-to-History import: Stash watches ~/Downloads for stash-*.png files when permission is granted. | Indexes Chrome extension PNGs into local Stash History when Downloads access is granted. Imported records become retrievable through MCP recent/search tools. | No | No | No | Yes |
Video data
A regular recording means a normal screen video file with no Stash bundle metadata. There is no Chrome extension video capture path.
| # | Data type | LLM use | Regular recording | Stash recording | MCP |
|---|---|---|---|---|---|
| 01 | Video pixels: rendered frames in the media file. | Model-visible visual input when frames are sampled. Content outside sampled frames remains unavailable to the model. | Yes | Yes | Yes |
| 02 | Audio track: spoken audio or system audio when present. | Provides spoken context, narration, or system sounds when the recorder includes audio. Stash stores extracted audio as audio.m4a when present. | Yes | Yes | Yes |
| 03 | Duration: total recording length. | Orders the recording as a time span and supports timeline references such as beginning, middle, and end. | Yes | Yes | Yes |
| 04 | Sampled still frames: frame_NN.jpg, capped at 30 per recording. | Provides discrete visual checkpoints that an LLM can inspect without reading the entire video file. Stash keeps start and end bookends and samples remaining frame budget from interaction and ambient frames. | No | Yes | Yes |
| 05 | Frame order: numeric frame filenames and frame_tags.json. | Defines the intended reading order for sampled frames, preventing the model from treating still frames as unrelated screenshots. | No | Yes | Yes |
| 06 | Per-frame app/window tags: frame_tags.json. | Maps sampled frames to visible app and window state over time. This lets an agent track app switches, active windows, and context changes inside one recording. | No | Yes | Yes |
| 07 | Frame tag classification: start, interaction, end, ambient, and gap-fill. | Labels why each saved frame exists in the bundle. The tag set is stored in frame_tags.json. | No | Yes | Yes |
| 08 | report.md: YAML frontmatter plus markdown timeline. | Provides a text-first summary of the recording with machine-readable fields such as protocol, bundle version, capture ID, duration, frame count, audio presence, primary app, and MCP URI. | No | Yes | Yes |
| 09 | llms.txt: offline self-description inside the bundle. | Gives an agent local instructions for reading the bundle even when the website protocol page is unavailable. | No | Yes | Yes |
| 10 | Bundle capture ID and MCP URI: captureId and stash://bundle/<UUID>. | Provides a durable handle for referencing, fetching, and discussing the same recording across turns. | No | Yes | Yes |
| 11 | MCP bundle fetch: get_bundle(id). | Returns the video bundle as one unit: report.md, enriched frame_tags, and absolute paths to every asset in the folder. | No | No | Yes |
| 12 | Absolute asset paths: returned by get_bundle(id). | Allows a local agent to inspect specific frames, audio, report files, or the original video without asking the user to locate files manually. | No | No | Yes |
Three channels, one capture
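Every native Stash screenshot carries structure in three places: the pixel banner, XMP metadata, and the local MCP server. A capture can therefore still be resolved after the user pastes it into a web chat and every byte of file metadata is stripped. Browser captures from the Chrome extension carry their structured context in PNG tEXt chunks instead of XMP.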
| Channel | Survives | What it carries |
|---|---|---|
| Pixel banner | Anything an image survives | App, window title, appearance, OS version, timestamp, shortID |
| XMP metadata | File-on-disk flows (Drive, email) | Full structured payload: annotations, a11y tree summary, dev context |
| Chrome extension PNG tEXt | Downloaded browser captures and ordered long-page parts | URL, title, browser, viewport, page height, image/DOM hashes, part fields, attestation/signature fields, generic visible page context |
| MCP server | Local RPC (same machine) | Live dossier including full a11y tree, annotation explainers, and un-summarized fields |
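The FAQ on this page ranks the channels for native captures. An agent should try the live MCP dossier first, then the XMP payload, then the pixel banner.

The Python sketch below shows that fallback chain. It assumes each channel is wrapped in a reader function that returns a dict or None. The reader names, the `resolve_capture` helper, and the returned shape are all hypothetical, not part of stash-1.

```python
from typing import Callable, Optional

# Hypothetical reader: takes a capture reference (UUID or 8-char short ID) and
# returns whatever metadata that channel can produce, or None if it has nothing.
Reader = Callable[[str], Optional[dict]]

def resolve_capture(ref: str, readers: list[tuple[str, Reader]]) -> tuple[str, dict]:
    """Return (channel, data) from the first channel, in priority order, that resolves ref."""
    for channel, read in readers:
        try:
            data = read(ref)
        except OSError:
            data = None  # e.g. no local MCP socket because the agent runs on another machine
        if data:
            return channel, data
    raise LookupError(f"no channel resolved capture {ref!r}")

# Illustrative readers for a capture pasted into a web chat: no socket, no XMP.
def from_mcp(ref: str) -> Optional[dict]:
    raise ConnectionRefusedError("MCP socket not reachable")

def from_xmp(ref: str) -> Optional[dict]:
    return None

def from_banner(ref: str) -> Optional[dict]:
    return {"shortId": ref}

channel, data = resolve_capture(
    "8FD26F28", [("mcp", from_mcp), ("xmp", from_xmp), ("banner", from_banner)]
)
```

Listing the readers explicitly keeps the precedence visible at the call site. A Chrome extension capture would swap in readers for the imported History/MCP item and the PNG tEXt chunks.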
Screenshot banner
Rendered at the bottom of every Stash screenshot in a monospace font:
```text
📌 Claude — Settings · dark · macOS 26.4 · 2026-04-12 14:24 · #8FD26F28
```
- App name (always).
- windowTitle: when available.
- dark/light: system appearance at capture time.
- macOS version.
- Capture timestamp to the minute, local time.
- #XXXXXXXX: first 8 hex chars of the capture UUID. Call it out in a follow-up prompt or filter with search.
When the user drew annotations, a second line appears above the pin:
```text
user focus: blue arrow pointing · red box enclosing
```
The banner describes shape behavior — never the target. Resolve the target yourself using vision and/or the a11y tree.
Standard banner explainers are intentionally compact: arrow pointing, double-arrow connecting, box enclosing, oval enclosing, blur obscuring, mark/highlight marking, callout annotating, emoji marking, and label text.
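An agent that only has pixels, such as a capture pasted into a web chat, can still recover structure by parsing the banner text after OCR. The sketch below makes three assumptions:

- Fields are separated by " · " exactly as in the example.
- The " — windowTitle" part is simply omitted when no title is available.
- The timestamp uses the YYYY-MM-DD HH:MM form shown.

The output keys are hypothetical, loosely mirroring the XMP names.

```python
import re
from datetime import datetime

PIN, EM_DASH, DOT = "\U0001F4CC", "\u2014", "\u00b7"  # banner separators as literal code points

def parse_banner(line: str) -> dict:
    """Parse an OCR-recovered banner line into fields (sketch of the documented layout)."""
    parts = [p.strip() for p in line.strip().removeprefix(PIN).strip().split(f" {DOT} ")]
    if len(parts) != 5:
        raise ValueError(f"unexpected banner shape: {line!r}")
    head, appearance, os_version, stamp, short_id = parts
    app, _, window = head.partition(f" {EM_DASH} ")  # window title is optional
    if not re.fullmatch(r"#[0-9A-Fa-f]{8}", short_id):
        raise ValueError(f"bad short ID: {short_id!r}")
    return {
        "app": app,
        "windowTitle": window or None,
        "appearance": appearance,
        "osVersion": os_version,
        "timestamp": datetime.strptime(stamp, "%Y-%m-%d %H:%M"),  # local time, minute precision
        "shortId": short_id[1:],
    }

def parse_focus_line(line: str) -> list[str]:
    """Split 'user focus: ... · ...' into mark phrases; they describe behavior, not targets."""
    prefix = "user focus:"
    if not line.strip().startswith(prefix):
        return []
    return [p.strip() for p in line.strip()[len(prefix):].split(DOT) if p.strip()]
```

The short ID returned here is what a follow-up search(query) call would filter on when the local MCP server is reachable.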
XMP payload
On auto-save-to-desktop for developer apps, the JPEG carries an XMP payload under namespace
http://stash.app/ns/1.0/, serialized as a single JSON string under stash:payload:
```jsonc
{
  "protocolVersion": "stash-1",
  "source": "xmp-snapshot",
  "captureId": "8FD26F28-…",
  "mcpURI": "stash://capture/8FD26F28-…",
  "snapshotTimestamp": "2026-04-12T14:24:00Z",
  "appName": "Cursor",
  "bundleID": "com.todesktop.230313mzl4w4u92",
  "windowTitle": "ContextBannerRenderer.swift",
  "appearance": "dark",
  "osVersion": "macOS 26.4",
  "userFocus": [
    { "type": "arrow", "color": "BA0C2F", "behavior": "pointing",
      "llmInstruction": "User drew a single arrow. Treat the arrow tip/end point as the specific object, control, text, state, or visual detail the user wants called out. Do not treat it as decoration.",
      "from": [120, 340], "to": [420, 300] }
  ],
  "a11yTreeSummary": { /* trimmed: top 3 levels + labelled controls */ },
  "devContext": {
    "activeFilePath": "/Users/x/proj/Foo.swift",
    "selectedText": "let appearance = …",
    "gitBranch": "main"
  }
}
```
The file is also tagged with IPTC 2025.1 Iptc4xmpExt:AISystemUsed = "Stash" so conformant
tooling can detect AI-assisted captures. Files saved to the desktop follow the naming convention
Stash-YYYY-MM-DD-HHmmss-{shortID}.jpg.
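In file-on-disk flows such as email, Drive, or a local folder, an agent has to dig the payload out of the JPEG itself. Standard XMP lives in a JPEG APP1 segment that starts with the identifier http://ns.adobe.com/xap/1.0/ followed by a NUL byte.

The sketch below makes three assumptions:

- stash:payload appears either as an attribute on an rdf:Description or as a simple element. This page does not say which.
- Extended XMP split across several segments is out of scope.
- The helper names are hypothetical.

```python
import json
import struct
import xml.etree.ElementTree as ET
from typing import Optional

XMP_SIG = b"http://ns.adobe.com/xap/1.0/\x00"  # identifier that prefixes a standard XMP APP1 segment
STASH_NS = "http://stash.app/ns/1.0/"

def find_xmp_packet(jpeg: bytes) -> Optional[bytes]:
    """Walk JPEG marker segments up to the image data and return the XMP packet, if any."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        marker = jpeg[i + 1]
        if marker in (0xD9, 0xDA):  # EOI or start-of-scan: no more metadata segments
            break
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])  # length counts its own 2 bytes
        segment = jpeg[i + 4:i + 2 + length]
        if marker == 0xE1 and segment.startswith(XMP_SIG):
            return segment[len(XMP_SIG):]
        i += 2 + length
    return None

def read_stash_payload(jpeg: bytes) -> Optional[dict]:
    """Return the decoded stash:payload JSON object, or None if the file has none."""
    packet = find_xmp_packet(jpeg)
    if packet is None:
        return None
    key = f"{{{STASH_NS}}}payload"
    for el in ET.fromstring(packet).iter():
        if key in el.attrib:           # <rdf:Description stash:payload="{...}"/>
            return json.loads(el.attrib[key])
        if el.tag == key and el.text:  # <stash:payload>{...}</stash:payload>
            return json.loads(el.text)
    return None
```

A caller would pass the raw file bytes, for example `read_stash_payload(Path(p).read_bytes())` for a file named with the Stash-YYYY-MM-DD-HHmmss-{shortID}.jpg convention. Per the Versioning section, the result is a frozen snapshot and live MCP data should win when available.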
Chrome extension PNG metadata
The Stash Chrome extension saves full-page browser captures as PNG files named
stash-{domain}-{date}-{time}.png or
stash-{domain}-{date}-{time}-part-XX-of-YY.png. Very tall pages are split
into ordered parts, and each part is a separate image artifact.
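Because a tall page arrives as several independent PNGs, an agent should put the parts in order and confirm the set is complete before treating it as the whole page.

The sketch below relies only on the filename pattern above. It assumes parts are 1-indexed, as "part-01-of-03" suggests, and it lets the domain contain hyphens. The helper names are hypothetical. The capturePart and capturePartsTotal tEXt fields would be the more authoritative source.

```python
import re
from typing import Optional

# stash-{domain}-{date}-{time}-part-XX-of-YY.png; the greedy ".+" absorbs hyphens in the domain.
PART_RE = re.compile(r"^stash-.+-part-(\d+)-of-(\d+)\.png$")

def part_info(filename: str) -> Optional[tuple[int, int]]:
    """Return (part, total) for a split-page filename, or None for a single-image capture."""
    m = PART_RE.match(filename)
    return (int(m.group(1)), int(m.group(2))) if m else None

def order_parts(filenames: list[str]) -> list[str]:
    """Sort split-page parts and refuse incomplete or mixed sets."""
    infos = [(part_info(f), f) for f in filenames]
    if any(info is None for info, _ in infos):
        raise ValueError("not every file is a split-page part")
    infos.sort()
    totals = {total for (_, total), _ in infos}
    indices = [idx for (idx, _), _ in infos]
    if len(totals) != 1 or indices != list(range(1, totals.pop() + 1)):
        raise ValueError("incomplete or inconsistent part set")
    return [f for _, f in infos]
```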
Each PNG includes tEXt chunks with stash: keywords, including
URL, title, timestamp, browser, viewport, DPR, page height, domain, extension version,
image hash, DOM hash, security protocol, extension ID, attestation ID, capture part,
capture parts total, generic visible page context, and optional local/server signature
fields.
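Reading these chunks needs nothing beyond the PNG container format. A PNG is an 8-byte signature followed by chunks laid out as length, type, data, and CRC-32. A tEXt chunk's data is a Latin-1 keyword, a NUL byte, and Latin-1 text.

The sketch below collects every tEXt chunk whose keyword starts with "stash:". The exact keyword spellings after that prefix are not listed on this page, so the function returns whatever it finds. The function name is hypothetical.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def read_stash_text_chunks(png: bytes) -> dict[str, str]:
    """Return {keyword: text} for tEXt chunks whose keyword starts with 'stash:'."""
    if not png.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    out: dict[str, str] = {}
    pos = len(PNG_SIG)
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + data) != crc:  # CRC covers chunk type + data
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            key = keyword.decode("latin-1")  # tEXt is Latin-1 per the PNG spec
            if key.startswith("stash:"):
                out[key] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out
```

Checking the CRC costs little and catches files truncated by a browser or a chat upload. Per the next paragraph, the imported History/MCP record is still preferred over this raw parse.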
When Stash for Mac is running and has Downloads Folder access, it event-watches
~/Downloads, imports new stash-*.png files into History as
source app Chrome Extension, preserves the PNG metadata in
metadata_json, and dedupes by the original PNG SHA-256 hash. Prefer the
imported History/MCP item over raw PNG parsing when both are available.
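Stash for Mac performs this import itself. The sketch below only illustrates the documented dedupe rule: a file is new only if its SHA-256 hash has not been seen before.

It differs from the real importer in two ways:

- It scans the folder once instead of event-watching it.
- It keeps the seen hashes in a plain set.

The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hash a file in 64 KiB blocks so large captures are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()

def new_stash_pngs(downloads: Path, seen_hashes: set[str]) -> list[Path]:
    """Return stash-*.png files whose content hash has not been imported yet."""
    fresh = []
    for path in sorted(downloads.glob("stash-*.png")):
        digest = sha256_file(path)
        if digest not in seen_hashes:
            seen_hashes.add(digest)  # also dedupes identical files within one scan
            fresh.append(path)
    return fresh
```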
Video bundles
The Stash screen recorder writes each recording as a self-describing folder that can be indexed as one unit:
```text
Recordings/<uuid>/
├── report.md        ← YAML frontmatter + markdown timeline
├── frame_tags.json  ← { "frames": [ … ] } — per-frame app/window/tag
├── llms.txt         ← offline self-description
├── frame_NN.jpg     ← 1-indexed, zero-padded; hard-capped at 30 per
│                      recording. Start + end bookends always kept;
│                      remaining budget sampled uniformly from
│                      interaction frames, then ambient. Read in
│                      numeric order via frame_tags.json.
├── audio.m4a        ← extracted audio when present
└── video.mp4        ← original; generally skip
```
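The frame budget rule above reads as a small selection algorithm. The sketch below is our interpretation, not Stash's code, and rests on these assumptions:

- Candidate frames arrive already tagged and in time order.
- The first and last candidates are the start and end bookends.
- "Sampled uniformly" means evenly spaced picks.
- Gap-fill frames are ignored.

It returns 0-based candidate indices, not frame_NN.jpg numbers.

```python
def sample_frames(tags: list[str], cap: int = 30) -> list[int]:
    """Pick at most `cap` frames: bookends first, then interaction, then ambient."""
    n = len(tags)
    if n <= cap:
        return list(range(n))  # few enough candidates: keep them all
    keep = {0, n - 1}          # start and end bookends are always kept
    budget = cap - len(keep)
    for wanted in ("interaction", "ambient"):
        pool = [i for i, t in enumerate(tags) if t == wanted and i not in keep]
        take = min(budget, len(pool))
        # Evenly spaced picks; (j * len(pool)) // take is distinct for j < take <= len(pool).
        keep.update(pool[(j * len(pool)) // take] for j in range(take))
        budget -= take
    return sorted(keep)
```

With 50 interaction and 10 ambient candidates, all 28 non-bookend slots go to interaction frames. Ambient frames only fill the budget that interaction frames leave unused.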
The report.md opens with machine-readable YAML frontmatter:
```yaml
---
protocol: stash-1
bundleVersion: 2
captureId: <UUID>
duration: 42.30
frameCount: 12
hasAudio: true
primaryApp: Cursor
mcpURI: stash://bundle/<UUID>
---
```
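The frontmatter fields shown are flat scalars, so an agent without a YAML library can still read them.

The sketch below handles only flat "key: value" lines with bool, int, float, or string values. It is not a general YAML parser, and a real client could use PyYAML instead. The function names are hypothetical.

```python
def _scalar(raw: str):
    """Best-effort scalar conversion: true/false, int, float, else the raw string."""
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def parse_frontmatter(report_md: str) -> dict:
    """Parse the flat frontmatter between the opening and closing '---' of report.md."""
    lines = report_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("report.md does not start with frontmatter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, raw = line.partition(":")  # split on the first colon only, so URIs survive
        if sep:
            fields[key.strip()] = _scalar(raw.strip())
    raise ValueError("unterminated frontmatter")
```

Checking `fields["protocol"] == "stash-1"` before trusting the rest matches the Versioning rules. frameCount indicates how many frame_NN.jpg files to expect.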
MCP server
Stash ships a local Model Context Protocol server on a UNIX domain socket at
~/Library/Application Support/Stash/mcp.sock — line-delimited JSON-RPC 2.0.
Local-only by design; the socket is not exposed to the network.
Transport
Stdio MCP clients (Claude Code, Claude Desktop, Cursor, Codex CLI, Continue, Windsurf,
Zed, Warp, Cline, …) connect via a small bridge binary that relays stdin/stdout to the
socket. The bridge ships bundled inside the app at
/Applications/Stash.app/Contents/Helpers/stash-mcp — pre-signed as part of
Stash.app under Stash's Apple team with Hardened Runtime. There is no compile step and
no Apple Developer certificate required. The one-line installer at
yourstash.ai/claude verifies the bundled helper exists and is
validly signed, then points Claude Code, Claude Desktop, and Cursor at that absolute
path. Other clients need manual configuration; see
/claude#manual-setup for full per-client snippets.
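Because the bridge relays stdin/stdout, any script can speak MCP to Stash by launching it as a subprocess. It writes one JSON-RPC message per line, which is the standard MCP stdio transport.

The sketch below makes several assumptions:

- The bridge expects the usual MCP lifecycle: an initialize request followed by a notifications/initialized notification.
- "2025-06-18" is one published MCP protocol revision and may need adjusting.
- Tool arguments are named after the parameters in the Tools table below, so {"n": …} for list_recent and {"id": …} for get_capture.

The StashClient class is hypothetical and has no timeouts.

```python
import json
import subprocess

BRIDGE = "/Applications/Stash.app/Contents/Helpers/stash-mcp"

class StashClient:
    """Minimal stdio MCP client that drives the bundled bridge (sketch)."""

    def __init__(self, command=(BRIDGE,)):
        self.proc = subprocess.Popen(list(command), stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        self.next_id = 0

    def _send(self, msg: dict) -> None:
        self.proc.stdin.write((json.dumps(msg) + "\n").encode("utf-8"))  # one message per line
        self.proc.stdin.flush()

    def request(self, method: str, params: dict) -> dict:
        self.next_id += 1
        self._send({"jsonrpc": "2.0", "id": self.next_id, "method": method, "params": params})
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise EOFError("bridge closed the stream")
            msg = json.loads(line)
            if msg.get("id") != self.next_id:
                continue  # skip server notifications and unrelated messages
            if "error" in msg:
                raise RuntimeError(msg["error"])
            return msg["result"]

    def initialize(self) -> dict:
        result = self.request("initialize", {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            "clientInfo": {"name": "stash-sketch", "version": "0.1"},
        })
        self._send({"jsonrpc": "2.0", "method": "notifications/initialized"})
        return result

    def call_tool(self, name: str, arguments: dict) -> dict:
        return self.request("tools/call", {"name": name, "arguments": arguments})

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Typical use would be to create a `StashClient()`, call `initialize()`, then call `call_tool("list_recent", {"n": 5})`, and finally `close()`. Launching the signed bridge, rather than opening the socket directly, is what lets an unsigned script pass peer auth.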
Note: GUI clients launched from Finder/Dock do not inherit your shell PATH
and ~ does not expand reliably — always use the absolute path to
stash-mcp in the command field. The bundled path
/Applications/Stash.app/Contents/Helpers/stash-mcp is already absolute.
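For clients configured by hand, the entry that matters is the absolute command path. The sketch below merges a server entry into a config dict shaped like Claude Desktop's claude_desktop_config.json, which maps mcpServers to per-server {command, args} objects. It refuses relative or ~-prefixed paths.

The server key "stash" and the function name are our own choices. The installer and /claude#manual-setup remain the supported route.

```python
import json
import os

BRIDGE = "/Applications/Stash.app/Contents/Helpers/stash-mcp"

def add_stash_server(config: dict, command: str = BRIDGE, name: str = "stash") -> dict:
    """Return a copy of an mcpServers-style config with a Stash entry; other servers untouched."""
    if not os.path.isabs(command):  # rejects "stash-mcp" and "~/..." alike
        raise ValueError(f"use an absolute path, got {command!r}")
    merged = json.loads(json.dumps(config))  # cheap deep copy of JSON-compatible data
    merged.setdefault("mcpServers", {})[name] = {"command": command, "args": []}
    return merged
```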
Peer auth
Stash reads the peer's codesign team identifier on connect and silently rejects unknown
signers. Built-in allowlist: Anthropic (58LP8PCM82) and Stash itself
(VJMJQKCRMC). The bundled bridge is signed under VJMJQKCRMC,
so it connects with no extra setup. Extend the allowlist via
Stash → Settings → Privacy → Additional trusted team IDs, or toggle
Allow unsigned MCP clients — the mcpAllowUnsignedClients UserDefault.
That toggle is for advanced/local use only: enabling it lets any unsigned
local process connect to the MCP socket, so it is not recommended for general use.
One honest caveat: team-ID allowlisting is a speedbump, not a hard boundary. Because the bridge is a pure relay, any local process can launch the trusted bridge and drive it — a confused-deputy weakness. It stops other signed apps from connecting directly; it does not stop arbitrary same-user code. A capability-token handshake is the planned future replacement.
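The decision rule above fits in a few lines. The sketch below models only the policy as this page describes it, where the FAQ says the unsigned toggle bypasses the check entirely. It does not read code signatures, and the function name and parameters are hypothetical.

```python
from typing import Optional

BUILT_IN_TEAMS = {"58LP8PCM82", "VJMJQKCRMC"}  # Anthropic, Stash

def accept_peer(team_id: Optional[str], extra_teams: frozenset = frozenset(),
                allow_unsigned: bool = False) -> bool:
    """True if a connecting peer passes the documented team-ID allowlist."""
    if allow_unsigned:
        return True  # mcpAllowUnsignedClients: the check is bypassed entirely
    # None models an unsigned peer, which is silently rejected.
    return team_id is not None and team_id in BUILT_IN_TEAMS | set(extra_teams)
```

The model also makes the confused-deputy gap concrete. The socket only ever sees the bridge, so `accept_peer("VJMJQKCRMC")` is true no matter which local process launched the bridge.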
Tools
| Tool | Purpose |
|---|---|
| get_capture(id) | Full dossier for a screenshot or video capture |
| get_bundle(id) | Video bundle: report.md, enriched frame_tags, absolute file paths |
| list_recent(n) | Paste-flow fallback; compact summaries newest-first |
| search(query) | Substring match over app / window / text / URL |
| render_plain(id) | Raw JPEG bytes, no banner, no XMP (for evals) |
All tools except render_plain return the MCP-standard {content: [{type: "text", text: "<json>"}]} envelope, where text holds a JSON document.
render_plain returns an image content item instead: {type: "image", data: "<base64>", mimeType: "image/jpeg"}.
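Both envelope shapes unpack in a uniform way.

The sketch below makes a few assumptions:

- It takes the result object of a tools/call response.
- It parses text items as JSON and base64-decodes image items.
- It raises if the result sets isError. isError is the standard MCP flag for tool failures, but whether Stash sets it is not stated here.

The function name is hypothetical.

```python
import base64
import json

def decode_tool_result(result: dict) -> list:
    """Unpack a tools/call result: text items -> parsed JSON, image items -> raw bytes."""
    if result.get("isError"):
        raise RuntimeError(f"tool reported an error: {result.get('content')}")
    values = []
    for item in result.get("content", []):
        if item.get("type") == "text":
            values.append(json.loads(item["text"]))  # the text field holds a JSON document
        elif item.get("type") == "image":
            values.append(base64.b64decode(item["data"]))  # render_plain JPEG bytes
    return values
```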
Annotation explainers
The screenshot banner has limited space, so it uses compact behavior labels. MCP returns the
same shapes with coordinates plus an expanded llmInstruction, a
userFocusSummary, and a userFocusGuidance field that tells agents to treat
annotations as intentional user instructions, not decoration.
| Shape | MCP explainer |
|---|---|
| Arrow | The tip/end point is the exact thing the user wants called out. |
| Double arrow | The endpoints define a relationship, comparison, distance, before/after connection, or dependency between two visible objects. |
| Rectangle | The enclosed area is deliberate focus. It may contain existing content to consider together, or it may reference a proposed/new region or shape the user wants the agent to reason about. |
| Ellipse | The enclosed area is deliberate focus. It may call out one object, a cluster, a status, or an ambiguous region the user wants isolated from the rest of the screen. |
| Blur | The region is intentionally hidden or sensitive. The agent should never attempt to retrieve, reconstruct, infer, OCR, or guess what is under the blur; it should reason from surrounding visible context only. |
| Line | The mark indicates alignment, boundary, path, separation, or a visual span. |
| Highlight | The marked content is high-priority evidence. Preserve exact visible wording when possible. |
| Callout | The leader endpoint is the referenced target and the callout marker is an ordered user note or sequence marker. |
| Emoji | The marker position carries user emphasis or sentiment attached to the nearest visible UI/content. |
| Text label | The text is user-authored instruction or context for the nearest visible object or region. |
| Multiple colors | If annotations use multiple colors, assume the user is separating distinct situations, categories, priorities, or comparisons on the screenshot unless the visible context proves otherwise. |
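Two of these rules translate directly into preprocessing on the userFocus array from the XMP example:

- Group marks by color, so each color can be reasoned about as its own situation.
- Take an arrow's "to" point, the tip, as its target rather than the tail.

The sketch below assumes a hex "color" field and "from"/"to" coordinates as in that example. The type strings for shapes other than "arrow" are not documented on this page, and the function names are hypothetical.

```python
from collections import defaultdict

def group_focus_by_color(user_focus: list[dict]) -> dict[str, list[dict]]:
    """Bucket userFocus marks by color; each bucket is a candidate distinct situation."""
    groups = defaultdict(list)
    for mark in user_focus:
        groups[str(mark.get("color", "unknown")).upper()].append(mark)
    return dict(groups)

def arrow_target(mark: dict) -> tuple[int, int]:
    """For an arrow, the tip ("to") is the thing being called out, not the tail ("from")."""
    if mark.get("type") != "arrow":
        raise ValueError("not an arrow mark")
    x, y = mark["to"]
    return (x, y)
```

The target point only says where to look. Resolving what sits there still comes from vision or the accessibility tree.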
Privacy model
- Everything Stash captures stays on the user's Mac.
- The MCP socket is local-only.
- Sensitive capture data (a11y tree, selected text, file paths, git branches, terminal CWD) is purged 24 hours after capture by default. The retention window is user-adjustable from 1 hour to never.
- Screenshots and basic metadata follow the user's normal history retention.
- Detected secrets (API keys, tokens, private keys) are redacted at capture time and never stored.
- Accessibility tree capture is skipped for password managers and their web UIs.
- Stash servers never see any capture data.
Versioning
stash-1 is stable. Additive changes (new fields, new tools) land as v1.1, v1.2
and do not break v1 readers. A breaking change bumps to stash-2. Prefer live
MCP data over a frozen XMP snapshot when both are available.
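A v1 reader following these rules accepts any stash-1 payload and ignores fields it does not know, since those may be additive v1.x fields. It refuses any other major version.

The sketch below has three assumptions worth flagging:

- A minor version might be spelled "stash-1.2". This page does not say how v1.1 or v1.2 are stamped into protocolVersion.
- The KNOWN_FIELDS list is our own illustration.
- The function names are hypothetical.

```python
import re

KNOWN_FIELDS = ("captureId", "appName", "windowTitle", "appearance", "osVersion")

def protocol_major(version: str) -> int:
    """'stash-1' -> 1; an optional '.minor' suffix is tolerated (assumed format)."""
    m = re.fullmatch(r"stash-(\d+)(?:\.(\d+))?", version)
    if not m:
        raise ValueError(f"not a stash protocol version: {version!r}")
    return int(m.group(1))

def can_read(version: str, supported_major: int = 1) -> bool:
    return protocol_major(version) == supported_major

def read_v1(payload: dict) -> dict:
    """Keep known fields; additive fields from later v1.x revisions are ignored, not rejected."""
    if not can_read(payload.get("protocolVersion", "")):
        raise ValueError("unsupported protocol version")
    return {k: payload[k] for k in KNOWN_FIELDS if k in payload}
```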
The machine-readable version of this spec lives at yourstash.ai/llms.txt. Point your agent's config there
for a one-shot sync; it is the same spec as above, optimized for plain-text consumption.
Frequently asked questions
What is the stash-1 protocol?
The stash-1 protocol is Stash's specification for how LLMs and agentic tools read Stash captures across native screenshot channels, Chrome extension PNG metadata, and a local Model Context Protocol (MCP) server. Each channel is a fallback for the layer above.
Which channel is most authoritative — banner, XMP, or MCP?
For native Stash captures, the MCP server is most authoritative when the original Mac is reachable, because it carries the full live dossier including the un-summarized accessibility tree. XMP is next — it survives file-on-disk flows like email and Drive. The pixel banner is the last-resort channel that survives anything an image survives, including pasted-into-chat captures where every byte of metadata has been stripped. For Chrome extension captures, prefer the imported Stash History/MCP item, then the PNG tEXt chunks, then visible pixels.
Can I read Stash captures from a remote machine?
No. The MCP server binds to a UNIX domain socket under the user account and is not exposed to the network by design. To read captures remotely, agents must rely on the XMP payload embedded in the file or the pixel banner.
How does Stash authenticate MCP clients?
Stash reads the connecting peer's codesign team identifier on connect and silently rejects unknown signers. The built-in allowlist trusts Anthropic (team 58LP8PCM82) and Stash itself (team VJMJQKCRMC). The bridge binary ships bundled inside Stash.app, pre-signed under team VJMJQKCRMC, so it connects with no extra setup. Additional team IDs can be added via Settings → Privacy, or the mcpAllowUnsignedClients toggle can bypass the check entirely — but that lets any unsigned local process connect to the socket, so it's for advanced local use only, not a general recommendation. Note that team-ID allowlisting is a speedbump, not a hard boundary: because the bridge is a pure relay, any local process can launch the trusted bridge and drive it (a confused-deputy weakness). It stops other signed apps from connecting directly; it does not stop arbitrary same-user code. A capability-token handshake is the planned future replacement.
Are Stash video bundles indexable by AI tools?
Yes. Each bundle is a self-describing folder containing report.md with YAML frontmatter, frame_tags.json with per-frame app and window metadata, an offline llms.txt, sampled JPEG frames (capped at 30 per recording), and extracted audio. Agents read it as a single unit via get_bundle().
Will the stash-1 protocol ever break?
No. The stash-1 protocol is stable. Additive changes — new fields, new tools — land as v1.1, v1.2 and do not break existing v1 readers. A breaking change would bump to stash-2.
Key takeaways
- Native captures carry context through the pixel banner, XMP metadata, and MCP server.
- Chrome extension browser captures carry context in PNG tEXt chunks and are imported into Stash History from Downloads when available.
- Clients should prefer the highest-fidelity available source: MCP/History first, embedded file metadata next, visible pixels last.
- The pixel banner describes shape behavior (arrow pointing, box enclosing) — never the target. Agents resolve targets via vision plus the a11y tree.
- The MCP server runs on a local UNIX domain socket and is never exposed to the network.
- Peer auth is by codesign team-ID allowlist; unknown signers are silently rejected.
- Sensitive fields (a11y tree, selected text, file paths, git branches, terminal CWD) auto-purge 24 hours after capture by default.
- stash-1 is stable; only additive changes land within v1. Breaking changes bump to stash-2.
Related reading
- Stash for Claude — install & setup — the operator-facing companion to this spec.
- What is MCP (Model Context Protocol)? — background on the protocol Stash speaks.
- Stash MCP server for Claude Code and Cursor — practical integration walkthrough.
- Screenshot context for AI coding — why captures need metadata, not just pixels.
- /llms.txt — the machine-readable version of this page.