How Crawdad Handles Your Data

Full technical transparency. Zero-knowledge by architecture, not by promise.

1. The Architecture

Your Machine

  Agent ↔ Crawdad Proxy (:7744–:7748, 7-layer detection) ↔ Real LLM API
                 ↓
  Signed metering counts only → Crawdad Cloud (billing, signatures)

Proxy ports: 7748 (Anthropic) · 7747 (OpenAI) · 7746 (Google) · 7745 (xAI) · 7744 (NVIDIA NIM)
Local services: 7749 (sidecar control + security API) · 7750 (dashboard)

Crawdad runs as a transparent HTTP proxy on your machine. Your agent points its base URL at the local Crawdad port (e.g., ANTHROPIC_BASE_URL=http://localhost:7748), and every request flows through the 7-layer detection pipeline before reaching the upstream LLM. The only data sent to Crawdad Cloud is signed metering packets (event counts, Ed25519-signed, sequence-numbered). Raw prompts, responses, action parameters, and PII never leave your machine except as part of the traffic your agent already exchanges with its LLM provider. This is enforced by architecture, not policy: even Crawdad as a company cannot see your content.
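
As a minimal illustration of the port table above, the Python sketch below builds the base URL an agent would be pointed at. The provider keys and the helper name crawdad_base_url are illustrative labels, not part of Crawdad.

```python
# Proxy ports as listed above; the dictionary keys are illustrative labels.
CRAWDAD_PROXY_PORTS = {
    "anthropic": 7748,
    "openai": 7747,
    "google": 7746,
    "xai": 7745,
    "nvidia_nim": 7744,
}


def crawdad_base_url(provider: str, host: str = "localhost") -> str:
    """Return the local Crawdad proxy URL to use as an agent's base URL."""
    try:
        port = CRAWDAD_PROXY_PORTS[provider]
    except KeyError:
        raise ValueError(f"no Crawdad proxy port for provider {provider!r}") from None
    return f"http://{host}:{port}"


# Same effect as exporting ANTHROPIC_BASE_URL=http://localhost:7748 for the agent.
agent_env = {"ANTHROPIC_BASE_URL": crawdad_base_url("anthropic")}
```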

7-layer detection pipeline

Every decision happens in-proxy before the upstream LLM sees the traffic. Pattern-only latency is sub-millisecond; the ML layer adds inference time that varies by platform (see Getting Started for current platform notes).

Per-agent attribution and trust levels

When a connection lands on a proxy port, Crawdad resolves the caller's TCP socket to its owning PID, then walks up the parent process tree, calling the agent classifier at each step until a known signature matches (claude, cursor, aider, etc.) or the walk reaches PID 1. On macOS, socket ownership is resolved via lsof, excluding the sidecar's own PID so it isn't mistaken for the caller; on Linux, the sidecar parses /proc/net/tcp to find the socket's inode and scans /proc/<pid>/fd/ for the process holding it. The result is cached per connection, so HTTP keep-alive and HTTP/2 multiplexed requests don't repeat the lookup. Attribution failure is always fail-open: unknown callers get full scanning with no restrictions and are never blocked.
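
The Linux side of that lookup can be sketched in Python. This is an illustrative reconstruction, not Crawdad's code. The agent's socket is the row whose local port is the caller's ephemeral port and whose remote port is the proxy port. The sketch assumes a little-endian IPv4 host, replaces the real agent classifier with a hypothetical substring match, and takes hypothetical parent_of and name_of callables so the tree walk can run without a live process table.

```python
from __future__ import annotations

import os
from typing import Callable, Optional


def parse_hex_addr(field: str) -> tuple[str, int]:
    """Decode an address such as '0100007F:1E44' into ('127.0.0.1', 7748).

    /proc/net/tcp stores the IPv4 address in host byte order; this assumes a
    little-endian host (x86-64 and typical ARM64 Linux).
    """
    ip_hex, port_hex = field.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in range(6, -1, -2)]
    return ".".join(octets), int(port_hex, 16)


def find_socket_inode(proc_net_tcp: str, peer_port: int, proxy_port: int) -> Optional[int]:
    """Find the agent-side socket: local port = caller's port, remote port = proxy port."""
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        _, local_port = parse_hex_addr(fields[1])
        _, remote_port = parse_hex_addr(fields[2])
        if local_port == peer_port and remote_port == proxy_port:
            return int(fields[9])  # tenth column is the socket inode
    return None


def pid_owning_inode(inode: int, proc_root: str = "/proc") -> Optional[int]:
    """Scan /proc/<pid>/fd/ for a descriptor that links to socket:[inode]."""
    target = f"socket:[{inode}]"
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return int(entry)
        except OSError:
            continue  # process exited or its fds are not readable by us
    return None


KNOWN_AGENTS = ("claude", "cursor", "aider")  # hypothetical classifier signatures


def attribute_agent(pid: Optional[int],
                    parent_of: Callable[[int], Optional[int]],
                    name_of: Callable[[int], Optional[str]]) -> Optional[str]:
    """Walk up the parent chain until a known agent matches or PID 1 is reached.

    None means "unknown caller": fail open, full scanning, never blocked.
    """
    seen = set()
    while pid is not None and pid > 1 and pid not in seen:
        seen.add(pid)
        name = (name_of(pid) or "").lower()
        for signature in KNOWN_AGENTS:
            if signature in name:
                return signature
        pid = parent_of(pid)
    return None
```

In a real proxy the result would be cached per connection, as described above, so the /proc scan runs once per socket rather than once per request.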

Every agent runs at one of four trust levels: Autonomous, Monitored, Restricted, and Quarantined. The level controls which detection layers run (Autonomous skips L7 for speed; Monitored and Restricted run the full L1–L7 pipeline) and which per-agent enforcement applies (Restricted evaluates per-tool-call restrictions; Quarantined returns HTTP 403 before detection runs). Because enforcement keys on the attributed agent rather than the provider port, blocking one agent never affects another agent on the same provider.
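
A sketch of how a per-request plan could follow from the trust level, assuming the level is looked up by attributed agent name. TrustLevel, RequestPlan, and plan_request are hypothetical names, and treating an agent with no recorded level like an unattributed caller is an assumption of this sketch. The layer numbers mirror L1 to L7 above.

```python
from enum import IntEnum
from typing import NamedTuple, Optional


class TrustLevel(IntEnum):
    AUTONOMOUS = 0
    MONITORED = 1
    RESTRICTED = 2
    QUARANTINED = 3


ALL_LAYERS = tuple(range(1, 8))  # L1..L7


class RequestPlan(NamedTuple):
    reject_status: Optional[int]  # 403 for Quarantined; None means "forward"
    layers: tuple                 # detection layers to run
    tool_restrictions: bool       # evaluate per-tool-call restrictions?


def plan_request(agent: Optional[str], levels: dict) -> RequestPlan:
    level = levels.get(agent) if agent is not None else None
    if level is None:
        # Unattributed caller (or, by assumption here, an agent with no recorded
        # level): fail open with full scanning and no restrictions.
        return RequestPlan(None, ALL_LAYERS, False)
    if level == TrustLevel.QUARANTINED:
        return RequestPlan(403, (), False)  # rejected before detection runs
    layers = ALL_LAYERS[:-1] if level == TrustLevel.AUTONOMOUS else ALL_LAYERS
    return RequestPlan(None, layers, level == TrustLevel.RESTRICTED)
```

Because the lookup key is the agent name, quarantining "claude" leaves a "cursor" request on the same provider port untouched.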

Automatic escalation. When a detection fires on an attributed agent, the system ratchets its trust level down one step according to a fixed rule set — score-based, rate-based, and category-based (exfiltration patterns quarantine immediately from any level). The full trigger table is in the FAQ and in the crawdad-sidecar/src/trust_escalation.rs source. Every transition is audited with its trigger string, previous level, and whether it was manual or automatic. A background loop walks auto-escalated agents back toward Autonomous after a configurable quiet period; manual changes are never auto-reverted.
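
The ratchet and the quiet-period walk-back can be sketched as a small state machine. The score threshold and quiet period are placeholder values, the rate-based rule is omitted, and all names are hypothetical; the actual trigger table lives in crawdad-sidecar/src/trust_escalation.rs.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class TrustLevel(IntEnum):
    AUTONOMOUS = 0   # most trusted; each step down adds 1
    MONITORED = 1
    RESTRICTED = 2
    QUARANTINED = 3


SCORE_THRESHOLD = 0.9     # placeholder, not the real trigger value
QUIET_PERIOD_S = 3600.0   # placeholder for the configurable quiet period


@dataclass
class AgentTrust:
    level: TrustLevel = TrustLevel.AUTONOMOUS
    manual: bool = False            # True when an operator set the level
    last_event_at: float = 0.0
    audit: list = field(default_factory=list)

    def _transition(self, new: TrustLevel, trigger: str, manual: bool, now: float) -> None:
        # Every transition records its trigger, previous level, and manual/automatic.
        self.audit.append({"trigger": trigger, "previous": self.level.name,
                           "new": new.name, "manual": manual, "at": now})
        self.level, self.manual = new, manual

    def on_detection(self, category: str, score: float, now: float) -> None:
        self.last_event_at = now
        if category == "exfiltration":
            new, trigger = TrustLevel.QUARANTINED, "category:exfiltration"
        elif score >= SCORE_THRESHOLD:
            new = TrustLevel(min(self.level + 1, TrustLevel.QUARANTINED))  # one step down
            trigger = f"score:{score}"
        else:
            return
        if new != self.level:
            self._transition(new, trigger, manual=False, now=now)

    def set_manual(self, new: TrustLevel, now: float) -> None:
        self._transition(new, "operator", manual=True, now=now)

    def relax_if_quiet(self, now: float) -> None:
        """One background-loop tick: step auto-escalated agents back toward Autonomous."""
        if self.manual or self.level == TrustLevel.AUTONOMOUS:
            return  # manual levels are never auto-reverted
        if now - self.last_event_at >= QUIET_PERIOD_S:
            self._transition(TrustLevel(self.level - 1), "quiet-period", manual=False, now=now)
            self.last_event_at = now  # the next step needs another full quiet period
```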

2. Remote control plane

Crawdad v0.10.0 adds an optional remote control plane so operators can monitor agents and change trust levels from a paired phone without relaxing the zero-knowledge boundary. The design keeps the relay on a strict need-to-know diet — it forwards encrypted blobs and opaque device IDs and nothing else.

Your Machine (keeps all content)
  Sidecar → encrypted state snapshot (AES-256-GCM, key held by paired device)
            60-second cadence + on-change push
      ↕ WebSocket
  Gateway relay (stateless, opaque blobs, opaque device IDs)
      ↕ WebSocket
  Paired phone (decrypts locally; commands Ed25519-signed by the phone)

Pairing

On the desktop, Settings → Connect Device renders a QR code that encodes a pairing-session token plus a Curve25519 public key. The phone scans it in the browser, runs the handshake over the LAN, and both sides derive a shared symmetric key plus a per-device Ed25519 signing key. The device record (public key, permission scope, created_at, last_seen) is stored in a dedicated pairing_db SQLite file with journal_mode=WAL and synchronous=NORMAL, so a paired device survives a restart. The first pairing requires both devices on the same Wi-Fi network; every subsequent connection goes through the relay and works from anywhere.
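
A sketch of the device-record store using Python's built-in sqlite3 module. The table and column names are assumptions based on the fields listed above; only the two PRAGMA settings come from the text.

```python
import sqlite3
import time


def open_pairing_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers run alongside a writer; NORMAL avoids an fsync on every commit.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS paired_devices (
               device_id  TEXT PRIMARY KEY,  -- opaque per-device token
               public_key BLOB NOT NULL,     -- phone's Ed25519 verifying key
               scope      TEXT NOT NULL,     -- permission scope
               created_at INTEGER NOT NULL,
               last_seen  INTEGER NOT NULL
           )"""
    )
    conn.commit()
    return conn


def record_pairing(conn: sqlite3.Connection, device_id: str,
                   public_key: bytes, scope: str) -> None:
    now = int(time.time())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO paired_devices VALUES (?, ?, ?, ?, ?)",
            (device_id, public_key, scope, now, now),
        )
```

Because journal_mode=WAL is stored in the database file itself, reopening the file after a restart keeps WAL mode; synchronous is per-connection, which is why open_pairing_db sets it on every open.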

What the relay sees

It does not see prompt text, response text, tool-call arguments, PII values, detection content, trust-level strings, or agent names. It also cannot correlate devices to tenants or identify fleet membership, because the only identifier crossing the boundary is the opaque per-device token.

Remote commands

Commands from the phone (change trust level, release quarantine, acknowledge alert) are Ed25519-signed by the phone's device key and carry a monotonic nonce. The sidecar verifies the signature against the paired-device record, rejects replays, rate-limits per device (5 trust changes per 10 minutes, 1 quarantine release per hour), and records every accepted command in the audit log. A PIN gate can be required for sensitive actions. The desktop-side kill switch (Settings → Paired Devices → Disconnect) immediately invalidates a device's key so any in-flight or future command is rejected.
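
The checks on an incoming command can be sketched as follows. Signature verification is an injected verify_sig hook (hypothetical), because Python's standard library has no Ed25519; the message is assumed to include the nonce so the signature covers it. The nonce rule and the sliding-window limits follow the numbers in the text.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Optional

# (max accepted, window in seconds) per device, from the limits in the text.
LIMITS = {"change_trust": (5, 600.0), "release_quarantine": (1, 3600.0)}


class CommandGate:
    def __init__(self, verify_sig: Callable[[str, bytes, bytes], bool]):
        # Hypothetical hook: Ed25519-verify `message` against the device's stored key.
        self.verify_sig = verify_sig
        self.last_nonce: dict[str, int] = {}
        self.windows: dict[tuple[str, str], deque] = {}
        self.revoked: set[str] = set()  # filled by the desktop kill switch
        self.audit: list[tuple[str, str, int]] = []

    def accept(self, device_id: str, action: str, nonce: int,
               message: bytes, signature: bytes, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if device_id in self.revoked:
            return "rejected: device disconnected"
        if not self.verify_sig(device_id, message, signature):
            return "rejected: bad signature"
        if nonce <= self.last_nonce.get(device_id, -1):
            return "rejected: replay"  # nonces must strictly increase per device
        if action in LIMITS:
            limit, window = LIMITS[action]
            recent = self.windows.setdefault((device_id, action), deque())
            while recent and now - recent[0] >= window:
                recent.popleft()  # drop acceptances that fell out of the window
            if len(recent) >= limit:
                return "rejected: rate limited"
            recent.append(now)
        self.last_nonce[device_id] = nonce
        self.audit.append((device_id, action, nonce))  # every accepted command is logged
        return "accepted"
```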

Out of scope for the relay

Local telemetry tables that feed the snapshot

The encrypted state snapshot pushed to the phone is assembled on the sidecar from three local SQLite tables. Only the aggregated metadata below crosses the trust boundary — never the rows themselves.

State snapshot shape

A snapshot is a JSON object containing:

The snapshot is pushed over the WebSocket every 60 seconds and on change. It is also served over plain HTTP at /api/v1/overview/state-snapshot for phones on the same LAN, where crypto.subtle (AES-GCM) isn't available because browsers expose it only in secure contexts.

Alert push path

On every blocked detection, the sidecar sends a fire-and-forget, metadata-only alert with the shape {event_type, agent_name, machine_id, detection_category, pattern_name, verdict, severity, timestamp}. It encrypts the payload with AES-256-GCM under each paired device's symmetric key and POSTs the ciphertext plus the opaque device ID to the gateway relay's /api/relay/push endpoint; the relay forwards it to the phone's open WebSocket. The alert carries no prompt text, no response text, and no tool arguments. Auto-escalation events (trust-level changes) and permissioned quarantines follow the same pipeline.
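
A sketch of a metadata-only alert builder that refuses any field outside the fixed shape above. Encryption is an injected encrypt(key, plaintext) callable standing in for AES-256-GCM, the envelope field names are assumptions, and the POST to /api/relay/push is not shown.

```python
from __future__ import annotations

import json
from typing import Callable

# The fixed alert shape from the text; nothing else may be serialized.
ALERT_FIELDS = ("event_type", "agent_name", "machine_id", "detection_category",
                "pattern_name", "verdict", "severity", "timestamp")


def build_alert(**fields) -> bytes:
    """Serialize a metadata-only alert, refusing any extra or missing field."""
    extra = sorted(set(fields) - set(ALERT_FIELDS))
    missing = sorted(set(ALERT_FIELDS) - set(fields))
    if extra or missing:
        raise ValueError(f"alert shape mismatch: extra={extra} missing={missing}")
    return json.dumps({k: fields[k] for k in ALERT_FIELDS}, separators=(",", ":")).encode()


def relay_envelopes(alert: bytes, device_keys: dict[str, bytes],
                    encrypt: Callable[[bytes, bytes], bytes]) -> list[dict]:
    """One envelope per paired device: opaque device ID plus ciphertext, nothing else."""
    return [{"device_id": device_id, "ciphertext": encrypt(key, alert).hex()}
            for device_id, key in device_keys.items()]
```

Rejecting unknown keys at build time, rather than filtering them, makes an accidental attempt to attach a prompt fail loudly instead of being silently dropped.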

Relay fallback when the LAN IP changes

After a desktop reboot or IP renewal, the phone's cached direct-API URL is often stale. At load time the phone races a direct probe of that URL (3-second timeout) against the encrypted relay and falls back to the relay automatically when the probe fails. The user never sees a blank screen: the snapshot arrives either directly over HTTP or through the encrypted relay over WebSocket, whichever answers first. A re-probe every 60 seconds restores the direct path once the IP stabilizes or the phone rejoins the LAN.
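
The race can be sketched with asyncio. direct_probe and relay_fetch are hypothetical zero-argument coroutine functions; a direct probe that raises or exceeds its timeout simply drops out, and the first successful answer wins.

```python
import asyncio


async def first_answer(direct_probe, relay_fetch, direct_timeout: float = 3.0):
    """Race a time-boxed direct fetch against the relay; return (source, snapshot)."""
    direct = asyncio.create_task(asyncio.wait_for(direct_probe(), direct_timeout))
    relay = asyncio.create_task(relay_fetch())
    source = {direct: "direct", relay: "relay"}
    pending = {direct, relay}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:  # failed or timed-out paths are skipped
                    return source[task], task.result()
        raise ConnectionError("neither the direct API nor the relay answered")
    finally:
        for task in pending:
            task.cancel()  # stop the slower path once we have an answer
```

The 60-second re-probe could reuse direct_probe on a timer and switch back to the direct path once it succeeds.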

3. What Never Leaves Your Machine

Even if Crawdad's servers were fully compromised, there is no customer content to retrieve. This is enforced by architecture, not policy.

4. What Is Transmitted Upstream

The sidecar sends a signed metering packet on a fixed cadence containing only operation counts:

{
  "tenant_id": "t_abc123",
  "device_id": "d_xyz789",
  "sequence": 42,
  "counts": {
    "firewall_scans": 142,
    "action_authorizations": 89,
    "outbound_scans": 142,
    "memory_writes": 23,
    "privacy_classifications": 67
  }
}

The packet is signed with the device's Ed25519 key. Any tampering invalidates the signature. No prompt text, no response text, no tool-call arguments, no PII values — only how many operations ran. Sequence numbers prevent replay.
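
A receiver-side sketch of two properties claimed for the packet: a deterministic byte encoding to sign over (sorted keys and compact separators, an assumed scheme) and strictly increasing sequence numbers per device. The Ed25519 signing and verification step itself is not shown, and the function names are hypothetical.

```python
from __future__ import annotations

import json

COUNT_KEYS = {"firewall_scans", "action_authorizations", "outbound_scans",
              "memory_writes", "privacy_classifications"}
TOP_LEVEL_KEYS = {"tenant_id", "device_id", "sequence", "counts"}


def canonical_bytes(packet: dict) -> bytes:
    """Deterministic encoding (sorted keys, no whitespace) for signing and verifying."""
    return json.dumps(packet, sort_keys=True, separators=(",", ":")).encode("utf-8")


def validate_counts_only(packet: dict) -> None:
    """Reject anything that is not an identifier, a sequence number, or a counter."""
    if set(packet) != TOP_LEVEL_KEYS:
        raise ValueError(f"unexpected fields: {sorted(set(packet) ^ TOP_LEVEL_KEYS)}")
    counts = packet["counts"]
    if not set(counts) <= COUNT_KEYS:
        raise ValueError(f"unknown counters: {sorted(set(counts) - COUNT_KEYS)}")
    for name, value in counts.items():
        if type(value) is not int or value < 0:
            raise ValueError(f"counter {name} must be a non-negative integer")


class SequenceGuard:
    """Receiver-side replay protection: sequences must strictly increase per device."""

    def __init__(self) -> None:
        self.last_seen: dict[str, int] = {}

    def accept(self, packet: dict) -> bool:
        device, seq = packet["device_id"], packet["sequence"]
        if seq <= self.last_seen.get(device, -1):
            return False  # replayed or reordered packet
        self.last_seen[device] = seq
        return True
```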

5. What Is Stored Locally

The data directory is created with 0700 permissions on first run; the sidecar refuses to start on a group- or world-accessible path. Paths: ~/Library/Application Support/crawdad/ on macOS, ~/.local/share/crawdad/ on Linux, %APPDATA%\crawdad\ on Windows.
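
The permission rule can be sketched as a startup check. The function names are hypothetical. The check relies on POSIX mode bits, so this sketch skips it on Windows, where ACLs govern access.

```python
import os
import stat
import sys


def default_data_dir() -> str:
    """Per-platform data directory, following the paths listed above."""
    if sys.platform == "darwin":
        return os.path.expanduser("~/Library/Application Support/crawdad")
    if sys.platform.startswith("win"):
        return os.path.join(os.environ["APPDATA"], "crawdad")
    return os.path.expanduser("~/.local/share/crawdad")


def ensure_private_dir(path: str) -> None:
    """Create `path` as 0700, or refuse to continue if it is group/world-accessible."""
    if not os.path.exists(path):
        os.makedirs(path, mode=0o700)
        os.chmod(path, 0o700)  # makedirs' mode is masked by the umask; set it exactly
    if os.name != "posix":
        return  # mode bits don't describe access on Windows, where ACLs apply
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; refusing to start (expected 700)")
```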

6. Formal Verification

Five architectural invariants are enforced at every checkpoint:

  1. No content bytes ever cross the metering-packet boundary
  2. No prompt text ever appears in the audit chain payload
  3. PII excerpts are only in redacted form outside the proxy process
  4. Device config never re-enters memory after being hashed
  5. Policy decisions log the content hash, not the content

The crawdad-zk-verify crate property-tests all five invariants over 1,000,000 iterations on every release. A standalone, MIT-licensed getcrawdad/zk-verify reproducer is on the roadmap so any operator can independently confirm the invariants against a running sidecar.
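
To make the idea concrete, here is a much smaller property test in the same spirit, covering invariant 5 and a hash-chained audit log. It is not the crawdad-zk-verify crate: log_decision, the SHA-256 chaining scheme, and the entry fields are assumptions chosen for illustration.

```python
import hashlib
import json
import random
import string

GENESIS = "0" * 64


def _link(prev_chain_hash: str, body: dict) -> str:
    # Assumed chaining scheme: SHA-256 over the previous link plus the canonical entry body.
    payload = prev_chain_hash + json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def log_decision(content: str, decision: str, prev_chain_hash: str) -> dict:
    """Hypothetical audit entry: records the content hash, never the content."""
    body = {"decision": decision,
            "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest()}
    return {**body, "chain_hash": _link(prev_chain_hash, body)}


def chain_valid(entries: list) -> bool:
    prev = GENESIS
    for entry in entries:
        body = {"decision": entry["decision"], "content_hash": entry["content_hash"]}
        if entry["chain_hash"] != _link(prev, body):
            return False
        prev = entry["chain_hash"]
    return True


def property_test(iterations: int = 10_000, seed: int = 0) -> int:
    rng = random.Random(seed)
    prev, entries = GENESIS, []
    for _ in range(iterations):
        content = "".join(rng.choices(string.printable, k=rng.randint(16, 64)))
        entry = log_decision(content, rng.choice(["allow", "block"]), prev)
        # Invariant 5: the serialized entry never contains the raw content.
        assert content not in json.dumps(entry), "content leaked into audit entry"
        entries.append(entry)
        prev = entry["chain_hash"]
    assert chain_valid(entries), "audit chain broken"
    return len(entries)
```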

Runtime attestation

The sidecar exposes a signed attestation on its control port:

$ curl http://127.0.0.1:7749/v1/verify

{
  "architecture": "Zero-knowledge sidecar v0.10.0",
  "data_never_leaves": [
    "Prompt content",
    "Action parameters",
    "Agent responses",
    "PII values",
    "Memory content",
    "Device keys"
  ],
  "data_sent_upstream": [
    "Signed metering counts (operation totals only)",
    "Device certificate renewal requests"
  ],
  "audit_chain_valid": true,
  "sidecar_bound_to": "127.0.0.1:7749"
}
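
A sketch of checking the attestation fields shown above. It only inspects the JSON; the signature format is not described here, so signature verification is out of scope, and attestation_problems is a hypothetical helper.

```python
import ipaddress
import json
import urllib.request

ATTESTATION_URL = "http://127.0.0.1:7749/v1/verify"


def fetch_attestation(url: str = ATTESTATION_URL, timeout: float = 2.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def attestation_problems(att: dict) -> list:
    """Return human-readable problems with the attestation fields (empty list = OK)."""
    problems = []
    if att.get("audit_chain_valid") is not True:
        problems.append("audit chain not reported valid")
    host = att.get("sidecar_bound_to", "").rsplit(":", 1)[0].strip("[]")
    try:
        if not ipaddress.ip_address(host).is_loopback:
            problems.append(f"sidecar bound to non-loopback address {host}")
    except ValueError:
        problems.append(f"unparseable bind address {host!r}")
    for item in ("Prompt content", "Device keys"):
        if item not in att.get("data_never_leaves", []):
            problems.append(f"{item!r} missing from data_never_leaves")
    return problems
```

Against a running sidecar, attestation_problems(fetch_attestation()) should return an empty list.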

The audit database can be inspected directly. Its schema contains no content, message, or text columns (the macOS path is shown):

$ sqlite3 "$HOME/Library/Application Support/crawdad/audit.db" ".schema"
-- Columns: entry_id, timestamp, endpoint, decision,
--          risk_score, content_hash, pii_categories, chain_hash
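
The same check can be automated against every table in the file, opened read-only. The column test uses exact names, so content_hash passes while a column literally named content, message, or text is flagged; the helper name is hypothetical.

```python
import sqlite3
from pathlib import Path

FORBIDDEN_COLUMNS = {"content", "message", "text"}


def content_columns(db_path: str) -> list:
    """Return (table, column) pairs whose names match a forbidden column exactly."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # open read-only
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        hits = []
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for _cid, column, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
                if column.lower() in FORBIDDEN_COLUMNS:
                    hits.append((table, column))
        return hits
    finally:
        conn.close()
```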

7. Threat Model

What we protect against

What we do not protect against

8. Deployment Modes

Cloud-hosted inspection (server-side content processing) is not offered. It would violate the zero-knowledge claim — the entire product architecture depends on the trust boundary being on the customer's machine, not on a Crawdad server.