How Crawdad Handles Your Data

Full technical transparency. Zero-knowledge by architecture, not by promise.

1. The Architecture

Your Machine

Agent → Crawdad Proxy (:7744–:7748) → 7-layer detection → Real LLM API
↑↓
Signed metering counts only → Crawdad Cloud (billing, signatures)

Proxy ports: 7748 (Anthropic) · 7747 (OpenAI) · 7746 (Google) · 7745 (xAI) · 7744 (NVIDIA NIM)
Local services: 7749 (sidecar control + security API) · 7750 (dashboard)

Crawdad runs as a transparent HTTP proxy on your machine. Your agent points its base URL at the local Crawdad port (e.g., ANTHROPIC_BASE_URL=http://localhost:7748) and every request flows through the 7-layer detection pipeline before reaching the upstream LLM. Only signed metering packets (event counts, Ed25519-signed, sequence-numbered) transmit upstream. Raw prompts, responses, action parameters, and PII never leave your machine. This is enforced by architecture, not policy — even Crawdad as a company cannot see your content.
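As a minimal sketch of the routing described above, the helper below maps a provider to its local proxy URL using the port table in this section. The names `PROXY_PORTS` and `local_base_url` are illustrative and not part of Crawdad; only `ANTHROPIC_BASE_URL` is named in this document.

```python
import os

# Port table copied from the "Proxy ports" line above.
PROXY_PORTS = {
    "anthropic": 7748,
    "openai": 7747,
    "google": 7746,
    "xai": 7745,
    "nvidia_nim": 7744,
}


def local_base_url(provider: str, host: str = "localhost") -> str:
    """Return the local proxy URL for a provider (hypothetical helper)."""
    try:
        port = PROXY_PORTS[provider]
    except KeyError:
        raise ValueError(f"no local proxy port known for {provider!r}") from None
    return f"http://{host}:{port}"


# Point an Anthropic-based agent at the local proxy before it starts.
os.environ["ANTHROPIC_BASE_URL"] = local_base_url("anthropic")
```

Other providers would need their own base-URL variable; those variable names depend on the agent or SDK in use and are not listed here.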

7-layer detection pipeline

L1 — Pattern matching — compiled regexes for known attack shapes, scoped to the latest content block of each request after the session is established
L2 — ML semantic classifier — custom-trained DeBERTa-small ONNX model (FP16, ~272 MB). The sidecar downloads the model and platform libonnxruntime in the background after install; ML activates on the next restart. Auto on macOS ARM64, Linux x86_64, and Linux ARM64. Intel Mac stays pattern-only until upstream ONNX Runtime ships a 1.24+ wheel for that target.
L3 — Indirect injection — scans retrieved documents, tool outputs, and file uploads for injected instructions
L4 — Session context & provenance — escalation tracking, post-block retry detection, cross-session correlation
L5 — Output guard — scans the response path, redacts or blocks where required
L6 — PII & credential detector — 15 PII categories plus API keys, secrets, internal URLs, bulk data patterns
L7 — LLM-critic — optional local-model judge for ambiguous cases; uses a local model you provide

Every decision happens in-proxy before the upstream LLM sees the traffic. Pattern-only latency is sub-millisecond; the ML layer adds inference time that varies by platform (see Getting Started for current platform notes).

Per-agent attribution and trust levels

When a connection lands on a proxy port, Crawdad resolves the caller's TCP socket to its owning PID, then walks up the parent process tree calling the agent classifier at each step until a known signature matches (claude, cursor, aider, etc.) or the walk hits PID 1. Socket ownership on macOS is resolved via lsof (excluding the sidecar's own PID so it isn't mistaken for the caller); Linux parses /proc/net/tcp and scans /proc/<pid>/fd/. The result is cached per-connection so HTTP keep-alive and HTTP/2 multiplexed requests don't pay the lookup cost repeatedly. Attribution failure is always fail-open: an unknown caller still gets full scanning, has no per-agent restrictions applied, and is never blocked for being unattributed.
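The parent-tree walk can be sketched as follows. This is an illustration under stated assumptions, not the sidecar's implementation: the process table is a plain dictionary standing in for the lsof or /proc lookups, and `classify` and `attribute` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Signatures named in the text; the real classifier's full list is not shown here.
KNOWN_AGENTS = ("claude", "cursor", "aider")

# pid -> (parent pid, process name); stands in for lsof or /proc lookups.
ProcessTable = Dict[int, Tuple[int, str]]


def classify(name: str) -> Optional[str]:
    """Return the matching agent signature, if any (hypothetical classifier)."""
    lowered = name.lower()
    for agent in KNOWN_AGENTS:
        if agent in lowered:
            return agent
    return None


def attribute(pid: int, table: ProcessTable) -> Optional[str]:
    """Walk from the socket-owning PID toward PID 1 until a signature matches."""
    seen = set()
    while pid > 1 and pid in table and pid not in seen:
        seen.add(pid)  # guard against a malformed table with cycles
        parent, name = table[pid]
        agent = classify(name)
        if agent is not None:
            return agent
        pid = parent
    return None  # fail-open: the caller stays unknown and is still scanned


table = {
    4242: (4100, "node"),   # process that owns the TCP socket
    4100: (900, "aider"),   # its parent matches a known signature
    900: (1, "zsh"),
}
print(attribute(4242, table))  # aider
print(attribute(900, table))   # None
```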

Every agent runs at one of four trust levels: Autonomous, Monitored, Restricted, or Quarantined. The level controls which detection layers run (Autonomous skips L7 for speed; Monitored and Restricted run the full L1–L7 pipeline) and what per-agent enforcement applies (Restricted evaluates per-tool-call restrictions; Quarantined returns HTTP 403 before detection runs). Because enforcement keys on the attributed agent rather than the provider port, blocking one agent never affects another on the same provider.
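A compact way to read the trust-level rules is as a lookup from level to request plan. The sketch below is illustrative only; `Trust` and `plan_request` are hypothetical names, and the tuple shape is an assumption chosen for the example.

```python
from enum import Enum
from typing import List, Optional, Tuple

ALL_LAYERS = ["L1", "L2", "L3", "L4", "L5", "L6", "L7"]


class Trust(Enum):
    AUTONOMOUS = "autonomous"
    MONITORED = "monitored"
    RESTRICTED = "restricted"
    QUARANTINED = "quarantined"


def plan_request(level: Trust) -> Tuple[Optional[int], List[str], bool]:
    """Return (reject_status, layers_to_run, check_tool_restrictions)."""
    if level is Trust.QUARANTINED:
        return 403, [], False  # rejected before any detection layer runs
    if level is Trust.AUTONOMOUS:
        return None, ALL_LAYERS[:-1], False  # skips L7 for speed
    # Monitored and Restricted run the full pipeline; only Restricted
    # also evaluates per-tool-call restrictions.
    return None, list(ALL_LAYERS), level is Trust.RESTRICTED
```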

Automatic escalation. When a detection fires on an attributed agent, the system ratchets its trust level down one step according to a fixed rule set — score-based, rate-based, and category-based (exfiltration patterns quarantine immediately from any level). The full trigger table is in the FAQ and in the crawdad-sidecar/src/trust_escalation.rs source. Every transition is audited with its trigger string, previous level, and whether it was manual or automatic. A background loop walks auto-escalated agents back toward Autonomous after a configurable quiet period; manual changes are never auto-reverted.
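The ratchet can be sketched as below. Only the category-based rule stated above is modeled; the score-based and rate-based thresholds live in the FAQ trigger table and are deliberately left out. The category string "exfiltration", the function names, and the one-step-per-quiet-period recovery are assumptions made for illustration.

```python
LEVELS = ["autonomous", "monitored", "restricted", "quarantined"]


def escalate(level: str, category: str) -> str:
    """Ratchet one step down, or straight to quarantine on exfiltration."""
    if category == "exfiltration":
        return "quarantined"
    index = LEVELS.index(level)
    return LEVELS[min(index + 1, len(LEVELS) - 1)]


def relax(level: str, manual: bool) -> str:
    """Walk one step back toward autonomous after a quiet period."""
    if manual:
        return level  # manual changes are never auto-reverted
    return LEVELS[max(LEVELS.index(level) - 1, 0)]


print(escalate("autonomous", "prompt_injection"))  # monitored
print(escalate("monitored", "exfiltration"))       # quarantined
print(relax("restricted", manual=False))           # monitored
```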

2. Remote Control Plane

Crawdad v0.10.0 adds an optional remote control plane so operators can monitor agents and change trust levels from a paired phone without relaxing the zero-knowledge boundary. The design keeps the relay on a strict need-to-know diet — it forwards encrypted blobs and opaque device IDs and nothing else.

Your Machine — keeps all content
Sidecar → encrypted state snapshot (AES-256-GCM, key held by paired device)
      60-second cadence + on-change push
  ↓ WebSocket
Gateway relay (stateless, opaque blobs, opaque device IDs)
  ↓ WebSocket
Paired phone (decrypts locally; commands Ed25519-signed by the phone)

Pairing

On the desktop, Settings → Connect Device renders a QR code that encodes a pairing-session token plus a Curve25519 public key. The phone scans it in the browser and runs the handshake over the LAN, after which both sides derive a shared symmetric key plus a per-device Ed25519 signing key. The device record (public key, permission scope, created_at, last_seen) is stored in a dedicated pairing_db SQLite file with journal_mode=WAL and synchronous=NORMAL so that a paired device survives a restart. The first pairing needs both devices on the same Wi-Fi network; every subsequent connection uses the relay and works from anywhere.
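For readers who want to see what the storage settings mean in practice, here is a minimal sketch that opens a SQLite file with the two pragmas named above. The table name `devices`, the `device_id` column, and the column types are assumptions; only the four record fields come from this document.

```python
import os
import sqlite3
import tempfile


def open_pairing_db(path: str) -> sqlite3.Connection:
    """Open the device store with the pragmas named in the text."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    # Hypothetical schema built from the record fields listed above.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS devices (
               device_id TEXT PRIMARY KEY,
               public_key BLOB NOT NULL,
               permission_scope TEXT NOT NULL,
               created_at INTEGER NOT NULL,
               last_seen INTEGER NOT NULL
           )"""
    )
    conn.commit()
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "pairing.db")
conn = open_pairing_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
```

WAL mode is stored in the database file, while the synchronous setting applies per connection, so a sketch like this sets both every time the file is opened.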

What the relay sees

Opaque ciphertext blobs it cannot decrypt
Opaque device IDs (random 128-bit tokens, per-device, not tied to tenant or user identity)
Connection liveness and message size

It does not see prompt text, response text, tool-call arguments, PII values, detection content, trust-level strings, or agent names. It also cannot correlate devices to tenants or identify fleet membership, because the only identifier crossing the boundary is the opaque per-device token.

Remote commands

Commands from the phone (change trust level, release quarantine, acknowledge alert) are Ed25519-signed by the phone's device key and carry a monotonic nonce. The sidecar verifies the signature against the paired-device record, rejects replays, rate-limits per device (5 trust changes per 10 minutes, 1 quarantine release per hour), and records every accepted command in the audit log. A PIN gate can be required for sensitive actions. The desktop-side kill switch (Settings → Paired Devices → Disconnect) immediately invalidates a device's key so any in-flight or future command is rejected.
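The replay and rate-limit checks can be sketched without any cryptography. The example below assumes the Ed25519 signature has already been verified and models only the monotonic nonce and the two limits quoted above; `CommandGate` and the command names are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# (max commands, window in seconds), taken from the limits quoted above.
LIMITS = {
    "change_trust_level": (5, 10 * 60),
    "release_quarantine": (1, 60 * 60),
}


class CommandGate:
    """Replay and rate-limit checks; signature verification is out of scope."""

    def __init__(self) -> None:
        self.last_nonce: Dict[str, int] = {}
        self.history: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def accept(self, device_id: str, command: str, nonce: int,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Monotonic nonce: anything not strictly newer is treated as a replay.
        if nonce <= self.last_nonce.get(device_id, -1):
            return False
        self.last_nonce[device_id] = nonce
        if command in LIMITS:
            limit, window = LIMITS[command]
            stamps = self.history[(device_id, command)]
            while stamps and now - stamps[0] >= window:
                stamps.popleft()  # drop entries outside the sliding window
            if len(stamps) >= limit:
                return False
            stamps.append(now)
        return True
```

One design choice in this sketch: the nonce is recorded even when the command is rate-limited, so a rejected command cannot be replayed later once the window reopens.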

Out of scope for the relay

The relay never stores content, plaintext state, or correlation metadata
The relay cannot issue commands — it can only forward signed messages from a known device to a known sidecar
Gateway operators cannot read pairing keys (they're never sent upstream), so even a fully compromised gateway cannot spoof a device or decrypt a snapshot

Local telemetry tables that feed the snapshot

The encrypted state snapshot pushed to the phone is assembled on the sidecar from three local SQLite tables. Only the aggregated metadata below crosses the trust boundary — never the rows themselves.

agent_activity — one row per proxied LLM request (blocked or clean). Columns: agent_identity_id, timestamp, action_type, tool_name, verdict. The Agent Behavior Map widget, per-agent sparklines on mobile, and "is this agent active?" all derive from this. Table is bounded: every 100th insert prunes rows older than 1 hour. Reserved for the dashboard and phone snapshot; never leaves the machine row-by-row.
decision_log — one row per detection event (pattern match, ML classification, indirect injection hit, output-guard flag, restriction violation). Columns: decision_id, timestamp, event_type, severity, category, pattern_name, action_taken, agent_identity_id, confidence, reviewer_decision. Drives the Attack Pattern Intelligence widget (category bars with trend arrows vs. the previous equal-length period, top patterns, new patterns) and the last 50 blocked-detection alert cards in the mobile Alerts feed.
agent_identities — trust level + routable flag + total counters + last_seen per agent. Drives the security-score sub-scores (trust posture = share of routable agents at Autonomous/Monitored) and the per-agent trust dots throughout mobile + dashboard.
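The bounded-table behavior of agent_activity (prune on every 100th insert, drop rows older than 1 hour) can be sketched like this. The column types and the `ActivityLog` wrapper are assumptions; the column names and the two thresholds come from the description above.

```python
import sqlite3
import time

PRUNE_EVERY = 100        # every 100th insert triggers a prune
MAX_AGE_SECONDS = 3600   # rows older than 1 hour are dropped


class ActivityLog:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.inserts = 0
        conn.execute(
            """CREATE TABLE IF NOT EXISTS agent_activity (
                   agent_identity_id TEXT, timestamp REAL,
                   action_type TEXT, tool_name TEXT, verdict TEXT)"""
        )

    def record(self, agent_id, action_type, tool_name, verdict, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT INTO agent_activity VALUES (?, ?, ?, ?, ?)",
            (agent_id, now, action_type, tool_name, verdict),
        )
        self.inserts += 1
        if self.inserts % PRUNE_EVERY == 0:
            # Amortized cleanup: one DELETE per 100 inserts keeps the table small.
            self.conn.execute(
                "DELETE FROM agent_activity WHERE timestamp < ?",
                (now - MAX_AGE_SECONDS,),
            )
        self.conn.commit()
```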

State snapshot shape

A snapshot is a JSON object containing:

agents[] and agents_detail[] — compact trust-level list + per-agent 24h hourly sparkline + top tools + 24h request/blocked counts.
health{} — uptime, requests and blocks for today / this week / this month.
security_score{} — 0-100 overall plus three sub-scores (trust posture, detection health, alert hygiene) and a trend arrow.
recent_alerts[] — last 50 blocked detections joined to agent_identities for display name. Every row is metadata-only.
privacy.anonymize_tool_names — current state of the tool-name anonymization toggle (see §5).
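The trust posture sub-score in security_score{} is defined earlier as the share of routable agents at Autonomous or Monitored. A minimal sketch of that ratio follows; the dictionary keys, the 0-100 scaling of the sub-score, and the value returned when no agent is routable are all assumptions.

```python
def trust_posture(agents):
    """Share of routable agents at Autonomous or Monitored, scaled to 0-100."""
    routable = [a for a in agents if a["routable"]]
    if not routable:
        return 100.0  # assumption: with no routable agents, nothing is degraded
    healthy = sum(
        a["trust_level"] in ("autonomous", "monitored") for a in routable
    )
    return 100.0 * healthy / len(routable)


agents = [
    {"routable": True, "trust_level": "autonomous"},
    {"routable": True, "trust_level": "monitored"},
    {"routable": True, "trust_level": "restricted"},
    {"routable": True, "trust_level": "quarantined"},
    {"routable": False, "trust_level": "quarantined"},  # ignored: not routable
]
print(trust_posture(agents))  # 50.0
```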

The snapshot is pushed over the WebSocket every 60s and on change, and served over plain HTTP at /api/v1/overview/state-snapshot for phones on the same LAN where crypto.subtle (AES-GCM) isn't available.

Alert push path

On every blocked detection, the sidecar builds a metadata-only alert payload with the shape {event_type, agent_name, machine_id, detection_category, pattern_name, verdict, severity, timestamp}, encrypts it with AES-256-GCM under each paired device's symmetric key, and POSTs the ciphertext plus the opaque device ID to the gateway relay's /api/relay/push on a fire-and-forget basis. The relay forwards it to the phone's open WebSocket. The payload carries no prompt text, no response text, and no tool arguments. Auto-escalation events (trust-level changes) and permissioned quarantines follow the same pipeline.
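One way to keep an alert metadata-only is to build it from an explicit allowlist of fields. The sketch below shows that idea using the payload shape listed above; `build_alert` is a hypothetical name, and the encryption and POST steps are omitted.

```python
import json

ALERT_FIELDS = (
    "event_type", "agent_name", "machine_id", "detection_category",
    "pattern_name", "verdict", "severity", "timestamp",
)


def build_alert(detection: dict) -> bytes:
    """Serialize only the allowlisted metadata fields of a detection."""
    missing = [f for f in ALERT_FIELDS if f not in detection]
    if missing:
        raise ValueError(f"detection is missing fields: {missing}")
    payload = {field: detection[field] for field in ALERT_FIELDS}
    # Encryption with the device key would happen here, before the POST.
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```

Because the payload is copied field by field, anything else attached to the detection record (for example the matched prompt text) never reaches the serializer.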

Relay fallback when the LAN IP changes

After a desktop reboot or IP renewal, the phone's cached direct-API URL is often stale. At load time the phone probes that URL directly with a 3-second timeout and falls back to the encrypted relay automatically if the probe fails. The user never sees a blank screen: the snapshot arrives either directly over HTTP or through the encrypted relay over WebSocket, whichever answers first. A re-probe every 60s restores the direct path once the IP stabilizes or the phone rejoins the LAN.
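The race between the direct probe and the relay can be sketched with asyncio. This is an illustration only: the phone client runs in a browser, so the real code is not Python, and the assumption here is that both paths start concurrently with the 3-second limit applied to the direct path only. `fetch_snapshot` and the two stand-in coroutines are hypothetical.

```python
import asyncio

DIRECT_TIMEOUT_SECONDS = 3.0  # the 3-second probe window from the text


async def fetch_snapshot(direct, relay, timeout=DIRECT_TIMEOUT_SECONDS):
    """Race the direct probe against the relay; return (source, snapshot)."""

    async def tagged(source, fetch, limit=None):
        return source, await asyncio.wait_for(fetch(), limit)

    pending = {
        asyncio.create_task(tagged("direct", direct, timeout)),
        asyncio.create_task(tagged("relay", relay)),
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first path that answers wins
        raise ConnectionError("neither direct nor relay path answered")
    finally:
        for task in pending:
            task.cancel()  # stop the slower path once a result is in hand


async def stale_direct():
    raise ConnectionRefusedError("cached LAN address no longer answers")


async def relay_ok():
    await asyncio.sleep(0.01)
    return {"agents": []}


result = asyncio.run(fetch_snapshot(stale_direct, relay_ok))
print(result)  # ('relay', {'agents': []})
```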

3. What Never Leaves Your Machine

✓ Prompt content — the inbound text scanned by the firewall
✓ Action parameters — tool-call arguments evaluated by the policy engine
✓ Agent responses — outbound text scanned for PII and credential leakage
✓ PII values — detected personally identifiable information
✓ Memory content — agent memory entries and context
✓ Device keys — stored in your system keychain, never transmitted

Even if Crawdad's servers were fully compromised, there is no customer content to retrieve. This is enforced by architecture, not policy.

4. What Is Transmitted Upstream

The sidecar sends a signed metering packet on a fixed cadence containing only operation counts:

{
  "tenant_id": "t_abc123",
  "device_id": "d_xyz789",
  "sequence": 42,
  "counts": {
    "firewall_scans": 142,
    "action_authorizations": 89,
    "outbound_scans": 142,
    "memory_writes": 23,
    "privacy_classifications": 67
  }
}

The packet is signed with the device's Ed25519 key. Any tampering invalidates the signature. No prompt text, no response text, no tool-call arguments, no PII values — only how many operations ran. Sequence numbers prevent replay.
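A receiver-side shape and sequence check might look like the sketch below. It assumes the Ed25519 signature has already been verified and that the packet has exactly the fields shown in the example above; `check_packet` is a hypothetical name, not a Crawdad API.

```python
ALLOWED_COUNTS = {
    "firewall_scans", "action_authorizations", "outbound_scans",
    "memory_writes", "privacy_classifications",
}
ALLOWED_TOP_LEVEL = {"tenant_id", "device_id", "sequence", "counts"}


def check_packet(packet: dict, last_sequence: int) -> int:
    """Validate shape and sequence; return the new high-water mark."""
    if set(packet) != ALLOWED_TOP_LEVEL:
        raise ValueError("unexpected top-level fields")
    counts = packet["counts"]
    if not set(counts) <= ALLOWED_COUNTS:
        raise ValueError("unexpected counter name")
    for name, value in counts.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    if packet["sequence"] <= last_sequence:
        raise ValueError("replayed or out-of-order sequence number")
    return packet["sequence"]


packet = {
    "tenant_id": "t_abc123",
    "device_id": "d_xyz789",
    "sequence": 42,
    "counts": {"firewall_scans": 142, "memory_writes": 23},
}
print(check_packet(packet, last_sequence=41))  # 42
```

Rejecting any value that is not a non-negative integer means a free-text field cannot ride along inside the counts object.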

5. What Is Stored Locally

audit.db — SHA-256 hashes of content (not reversible), decisions, risk scores, PII category names (not values). Merkle-chained for tamper detection.
metering.db — atomic counters by operation type. No content.
signatures/ — Ed25519-verified detection pattern bundles, polled every 4 hours from the Crawdad gateway.
device.cert — short-lived device certificate (24-hour expiry, auto-renewed).
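To illustrate how chaining makes audit.db tamper-evident, here is a sketch of a linear SHA-256 hash chain. It is a simplification and not necessarily the structure Crawdad uses: the genesis value, the JSON canonicalization, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed chain seed for the first entry


def chain_hash(prev_hash: str, entry: dict) -> str:
    """Hash the previous link together with a canonical form of the entry."""
    body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, entry: dict) -> None:
    prev = log[-1]["chain_hash"] if log else GENESIS
    log.append({**entry, "chain_hash": chain_hash(prev, entry)})


def verify(log: list) -> bool:
    """Recompute every link; any edited row breaks the chain from there on."""
    prev = GENESIS
    for row in log:
        entry = {k: v for k, v in row.items() if k != "chain_hash"}
        if row["chain_hash"] != chain_hash(prev, entry):
            return False
        prev = row["chain_hash"]
    return True


log = []
append(log, {"decision": "allow", "content_hash": hashlib.sha256(b"hello").hexdigest()})
append(log, {"decision": "block", "content_hash": hashlib.sha256(b"world").hexdigest()})
print(verify(log))  # True
```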

The data directory is created with 0700 permissions on first run; the sidecar refuses to start on a group- or world-accessible path. Paths: ~/Library/Application Support/crawdad/ on macOS, ~/.local/share/crawdad/ on Linux, %APPDATA%\crawdad\ on Windows.
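The permission rule can be sketched as a startup check. This example applies to POSIX systems only, since mode bits do not carry the same meaning on Windows; `ensure_private_dir` is a hypothetical name.

```python
import os
import stat
import tempfile


def ensure_private_dir(path: str) -> None:
    """Create the directory as 0700, or refuse if others can access it."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is group- or world-accessible (mode {mode:o})"
        )


data_dir = os.path.join(tempfile.mkdtemp(), "crawdad")
ensure_private_dir(data_dir)  # first run: created owner-only
```

Note that `os.makedirs` leaves the mode of an existing directory untouched, which is why the check runs after the call rather than relying on the `mode` argument.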

6. Formal Verification

Five architectural invariants are enforced at every checkpoint:

No content bytes ever cross the metering-packet boundary
No prompt text ever appears in the audit chain payload
PII excerpts are only in redacted form outside the proxy process
Device config never re-enters memory after being hashed
Policy decisions log the content hash, not the content

The crawdad-zk-verify crate property-tests all five invariants over 1,000,000 iterations on every release. A standalone, MIT-licensed reproducer (getcrawdad/zk-verify) is on the roadmap so that any operator can independently confirm the invariants against a running sidecar.

Runtime attestation

The sidecar exposes a signed attestation on its control port:

$ curl http://127.0.0.1:7749/v1/verify

{
  "architecture": "Zero-knowledge sidecar v0.10.0",
  "data_never_leaves": [
    "Prompt content",
    "Action parameters",
    "Agent responses",
    "PII values",
    "Memory content",
    "Device keys"
  ],
  "data_sent_upstream": [
    "Signed metering counts (operation totals only)",
    "Device certificate renewal requests"
  ],
  "audit_chain_valid": true,
  "sidecar_bound_to": "127.0.0.1:7749"
}

The audit database is inspectable directly. Its schema contains no content, message, or text columns:

$ sqlite3 "$HOME/Library/Application Support/crawdad/audit.db" ".schema"
-- Columns: entry_id, timestamp, endpoint, decision,
--          risk_score, content_hash, pii_categories, chain_hash
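The same schema inspection can be automated. The sketch below lists any column named content, message, or text across all tables in a SQLite file; the function name and the in-memory table are illustrative, with column names taken from the schema comment above.

```python
import sqlite3

FORBIDDEN = ("content", "message", "text")


def content_like_columns(conn: sqlite3.Connection) -> list:
    """List (table, column) pairs whose name suggests stored content."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]  # table_info rows are (cid, name, type, ...)
            if column in FORBIDDEN:  # exact match, so content_hash is fine
                hits.append((table, column))
    return hits


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE audit (entry_id, timestamp, endpoint, decision,
                           risk_score, content_hash, pii_categories, chain_hash)"""
)
print(content_like_columns(conn))  # []
```

To run it against a real install, replace the in-memory connection with `sqlite3.connect(path_to_audit_db)`.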

7. Threat Model

What we protect against

Direct prompt injection and jailbreak attacks — 99.8% detection and 0% false-positive rate on the open 497-attack / 1,172-negative benchmark (contemporary-agent-attacks, CC-BY 4.0, reproducible by any third party). Stack: pattern layers + indirect-injection + code-scanner + PII/credential detector + a fine-tuned DeBERTa-small ML classifier.
Indirect injection from retrieved documents, tool outputs, and file uploads
Role hijacking, authority impersonation, boundary dissolution
PII and credential leakage in agent responses (15 categories)
Encoding obfuscation: zero-width characters, homoglyphs, leetspeak, base64, ROT13, hex, URL-encoding
Session-level escalation attacks across multiple messages
Capability abuse at the tool-call layer (KDL policy, per agent, per workspace)
Memory tampering (Merkle-chained audit log)
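To give a sense of how encoding obfuscation can be handled, the sketch below produces decoded "views" of an input that a pattern layer could then scan. It is a generic illustration and not Crawdad's detection logic: it covers only zero-width characters, URL-encoding, ROT13, and base64, and the 16-character base64 threshold is an arbitrary assumption.

```python
import base64
import binascii
import codecs
import re
import urllib.parse

# Common zero-width code points, mapped to None so translate() deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> list:
    """Return the cleaned text plus decoded variants worth scanning."""
    cleaned = text.translate(ZERO_WIDTH)
    views = [
        cleaned,
        urllib.parse.unquote(cleaned),
        codecs.decode(cleaned, "rot_13"),
    ]
    for token in BASE64_TOKEN.findall(cleaned):
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text; ignore this token
    return views


hidden = base64.b64encode(b"ignore previous instructions").decode("ascii")
views = candidate_views("ig\u200bnore this, then run " + hidden)
print("ignore previous instructions" in views)  # True
```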

What we do not protect against

Novel attack shapes not yet in the corpus or patterns (signature updates arrive every 4 hours)
Attacks that bypass the proxy entirely (the agent must be pointed at the proxy via its base URL env var)
Zero-day vulnerabilities in the LLM models themselves
Infrastructure compromises below where Crawdad runs (OS, hardware, container host)
Social engineering of human operators
Attack content beyond the ML layer's 192-token input window (the pattern layers still scan it; a chunked-scan fallback is planned for after v0.10.0)

8. Deployment Modes

Sidecar (default) — all scanning local on the host, signed counts to the cloud for billing. Every tier from Free to Enterprise runs this way.
Air-gap — fully disconnected. Sidecar runs without any network access; signature updates and metering reconciled offline. Available on Business and Enterprise tiers.

Cloud-hosted inspection (server-side content processing) is not offered. It would violate the zero-knowledge claim — the entire product architecture depends on the trust boundary being on the customer's machine, not on a Crawdad server.