Operations · Source: docs/OPERATOR_GUIDE.md

Cubby operator guide

QUICKSTART.md gets you to a signed change against a lab device. This document is for the operator driving the system on something more than a laptop lab — a dev-net a vendor lent you, a pre-production cell, or a small pilot of your own network.

Read this before you invite other humans to drive the system.

Environment-variable matrix

Every variable is read in exactly one place (packages/common/runtime_config.py), cached as a singleton, and discoverable via cubby config show. The ones that materially change behaviour:

| Variable | Default | What changes when you set it |
| --- | --- | --- |
| NETOPS_ENV | development | production flips the plugin registry to strict mode: simulated adapters are rejected at register time, so the default build_demo_harness will refuse to boot. You must wire real adapters yourself before setting this. |
| NETOPS_API_AUTH_MODE | dev | dev prints a random token on start-up. hmac validates bearer tokens signed with NETOPS_API_HMAC_SECRET. oidc validates JWTs against the IdP. production + dev is refused at boot. |
| NETOPS_API_HMAC_SECRET | (empty) | HMAC signing secret for API bearer tokens. 32+ bytes. Required when NETOPS_API_AUTH_MODE=hmac. |
| NETOPS_OIDC_ISSUER / NETOPS_OIDC_AUDIENCE / NETOPS_OIDC_JWKS_URL | (empty) | OIDC validator config. JWKS URL must be HTTPS; non-HTTPS URLs are refused at refresh time. JWKS is cached 15 minutes by default. |
| ANTHROPIC_API_KEY | (empty) | Selects ClaudeAgentRuntime over any OpenAI option. Preferred when multiple are set. |
| NETOPS_ANTHROPIC_MODEL | claude-opus-4-7 | Override the Claude model id. |
| OPENAI_API_KEY | (empty) | Selects OpenAIAgentRuntime with a static API key. Honours OPENAI_BASE_URL for Azure / vLLM / Ollama / etc. |
| NETOPS_CODEX_CREDENTIAL_PATH | (empty) | Path to a Codex CLI auth.json. Bills against a ChatGPT subscription via OAuth refresh; requires NETOPS_CODEX_TOKEN_URL for the refresh endpoint. |
| NETOPS_EVIDENCE_HMAC_SECRET | (empty) | Production evidence-signing key. When unset, a deterministic dev key is written under var/keys/. Set NETOPS_EVIDENCE_REQUIRE_CONFIGURED_KEY=1 to refuse the dev fallback. |
| NETOPS_APPROVAL_HMAC_SECRET | (empty) | Approval-signing key (distinct from evidence). Same dev-key fallback policy as above. |
| NETOPS_EVIDENCE_LEGACY_KEY_IDS | (empty) | Comma-separated list of key_ids the verifier tolerates without cryptographic check. Use only for unrecoverable key-loss scenarios. |
| NETOPS_API_MAX_BODY_BYTES | 65536 | HTTP request body cap in bytes. Rejects both Content-Length and chunked requests that exceed the cap. |
| NETOPS_WIKI_ROOT | <repo>/docs | Root of the hand-curated knowledge base the agents read. |
| NETOPS_CAB_ACKNOWLEDGE_SHARED_SECRET | (empty) | Set to 1 to acknowledge a shared-secret CAB quorum and either (a) stop the stderr boot banner in non-production envs or (b) allow NETOPS_ENV=production boot despite the shared-secret limitation. Without this, production boot with a multi-member CAB + single HMAC signer fails fast. |

cubby config show renders this matrix against the current process environment so you can see what's resolved vs what's falling back to defaults.
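To make "resolved vs falling back to defaults" concrete, here is a minimal sketch of that kind of comparison. It is not the code in packages/common/runtime_config.py: the helper name resolve_matrix, the Resolved tuple, and the choice to treat an empty value as unset are all assumptions made for illustration, and only a few non-secret rows from the matrix are included.

```python
import os
from typing import Mapping, NamedTuple

# Defaults copied from the matrix above; only a non-secret subset is shown.
DEFAULTS = {
    "NETOPS_ENV": "development",
    "NETOPS_API_AUTH_MODE": "dev",
    "NETOPS_API_MAX_BODY_BYTES": "65536",
    "NETOPS_ANTHROPIC_MODEL": "claude-opus-4-7",
}


class Resolved(NamedTuple):
    name: str
    value: str
    source: str  # "env" when set in the environment, otherwise "default"


def resolve_matrix(env: Mapping[str, str] = os.environ) -> list[Resolved]:
    """Hypothetical helper: report each variable and where its value came from."""
    rows = []
    for name, default in DEFAULTS.items():
        raw = env.get(name, "")
        if raw:  # assumption: an empty string counts as unset
            rows.append(Resolved(name, raw, "env"))
        else:
            rows.append(Resolved(name, default, "default"))
    return rows


if __name__ == "__main__":
    for row in resolve_matrix():
        print(f"{row.name:<28} {row.value:<20} ({row.source})")
```

Running it with and without NETOPS_ENV exported shows the same row switching between "env" and "default", which is the distinction to look for in the real cubby config show output.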

Demo vs production posture

Two rules the platform enforces at boot when NETOPS_ENV=production:

  1. No simulated adapters. The plugin registry refuses to register any plugin with simulated=True, so build_demo_harness() fails fast with SimulationLeakError on the first simulated device adapter. You must wire real vendor adapters (plugins/device/*/real_adapter.py) and/or custom adapters before the harness will construct.
  2. No dev auth. NETOPS_API_AUTH_MODE=dev is refused — you must set hmac or oidc and supply the matching secret/issuer config.

Both are intentional: it's much safer for the system to refuse to start than to silently boot a prod-tagged deployment on demo adapters or a printed dev token.
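Because a refused boot is easier to handle before deployment than during it, a small pre-deploy check can mirror the auth rules described in this guide. The sketch below is an assumption-laden illustration, not the platform's own validation: preflight_errors is a hypothetical helper, the 32-byte check counts UTF-8 bytes, and requiring all three OIDC variables is a guess at what "matching issuer config" means. It cannot check the simulated-adapter rule, which depends on what the harness registers.

```python
from typing import Mapping


def preflight_errors(env: Mapping[str, str]) -> list[str]:
    """Hypothetical pre-deploy check for the auth settings described above."""
    errors = []
    netops_env = env.get("NETOPS_ENV", "development")
    auth_mode = env.get("NETOPS_API_AUTH_MODE", "dev")

    # Rule from the guide: production combined with dev auth is refused at boot.
    if netops_env == "production" and auth_mode == "dev":
        errors.append("NETOPS_API_AUTH_MODE=dev is refused when NETOPS_ENV=production")

    # hmac mode needs a signing secret of 32+ bytes (assumed: UTF-8 byte length).
    if auth_mode == "hmac":
        secret = env.get("NETOPS_API_HMAC_SECRET", "")
        if len(secret.encode("utf-8")) < 32:
            errors.append("NETOPS_API_HMAC_SECRET must be at least 32 bytes in hmac mode")

    # oidc mode: assumed to need all three values; the JWKS URL must be HTTPS.
    if auth_mode == "oidc":
        for name in ("NETOPS_OIDC_ISSUER", "NETOPS_OIDC_AUDIENCE", "NETOPS_OIDC_JWKS_URL"):
            if not env.get(name):
                errors.append(f"{name} must be set in oidc mode")
        jwks_url = env.get("NETOPS_OIDC_JWKS_URL", "")
        if jwks_url and not jwks_url.startswith("https://"):
            errors.append("NETOPS_OIDC_JWKS_URL must be HTTPS")

    return errors


if __name__ == "__main__":
    import os

    for problem in preflight_errors(os.environ):
        print("preflight:", problem)
```

An empty list means none of these particular rules would trip; it does not guarantee the system will boot.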

Wiring real device adapters

Real adapters exist today for:

To use them in production, construct a harness that registers the real classes in place of the simulated defaults. The quickest path is a thin wrapper on build_demo_harness(..., allow_simulated=False) that replaces the registry's device_adapters dict. A reference implementation lives at tests/devicelab/harness.py:build_lab_harness — it routes every call through real adapters via LabDeviceRouter.

If your vendor isn't in the list above, you can either:

CAB signing — from shared-secret to per-approver

The default bootstrap pairs a multi-member CAB (alice, bob, carol, …) with a single HMAC approval-signing key. That configuration works, but at boot the system logs a loud warning because anyone holding the HMAC secret can mint approvals under any approver name — quorum separation is nominal, not cryptographic.
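The weakness is easy to demonstrate with the standard library. The sketch below does not use Cubby's real approval format: sign_approval, verify_approval, the message layout, and the change id CHG-1 are all hypothetical. It only shows the general property that a verifier sharing one HMAC secret cannot tell who produced a tag.

```python
import hashlib
import hmac


def sign_approval(secret: bytes, approver: str, change_id: str) -> str:
    """Hypothetical approval tag: HMAC-SHA256 over the approver name and change id."""
    message = f"{approver}\n{change_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_approval(secret: bytes, approver: str, change_id: str, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_approval(secret, approver, change_id)
    return hmac.compare_digest(expected, tag)


shared_secret = b"one-secret-shared-by-the-whole-cab"

# A single holder of the secret mints an approval under every CAB member's name.
forged = {
    name: sign_approval(shared_secret, name, "CHG-1")
    for name in ("alice", "bob", "carol")
}

# Every forged approval verifies, so a three-person quorum is met by one person.
all_valid = all(
    verify_approval(shared_secret, name, "CHG-1", tag)
    for name, tag in forged.items()
)
print(all_valid)
```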

To upgrade to real multi-party authorization:

  1. Generate per-approver Ed25519 keypairs. Each approver holds their private key on a YubiKey or equivalent.
  2. Build a SignerKeyring at bootstrap that loads every approver's public key under their key_id (the key_id is what ends up in SignedApproval.signer_key_id). Ed25519 signers implement the same EvidenceSigner interface as the HMAC signer, so the rest of the CAB code is unchanged.
  3. Configure ApproverGroup.members with the approver identities ("alice", "bob", …). The verifier checks both that the signer_key_id resolves to a known signer AND that the approver name is a member of the group.
  4. Remove or demote the shared HMAC signer. Keep it only as a legacy-verifier entry via NETOPS_EVIDENCE_LEGACY_KEY_IDS if you have historical bundles signed with it.

Until you do this, assume the CAB is "one person with the secret can do anything" and size your deployment's operator trust accordingly.
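The two verifier checks from step 3 can be sketched as follows. This is an illustration under stated assumptions, not the SignerKeyring or ApproverGroup implementation: ApprovalSketch, approval_is_valid, and distinct_approvers are hypothetical names, and the keyring here maps each key_id to a plain callable. The stand-in verifier is deliberately not cryptographic; in a real deployment that callable would wrap an Ed25519 public-key verification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

# A verifier takes (payload, signature) and returns True when the signature is good.
Verifier = Callable[[bytes, bytes], bool]


@dataclass(frozen=True)
class ApprovalSketch:
    approver: str
    signer_key_id: str
    payload: bytes
    signature: bytes


def approval_is_valid(
    approval: ApprovalSketch,
    keyring: Mapping[str, Verifier],
    members: frozenset,
) -> bool:
    """Apply both checks from the guide, then verify the signature itself."""
    verify = keyring.get(approval.signer_key_id)
    if verify is None:  # signer_key_id does not resolve to a known signer
        return False
    if approval.approver not in members:  # approver is not in the group
        return False
    return verify(approval.payload, approval.signature)


def distinct_approvers(
    approvals: Iterable[ApprovalSketch],
    keyring: Mapping[str, Verifier],
    members: frozenset,
) -> set:
    """Names of approvers with at least one valid approval."""
    return {a.approver for a in approvals if approval_is_valid(a, keyring, members)}


def fake_verifier(expected_signature: bytes) -> Verifier:
    """Stand-in for a public-key check. NOT cryptographic; for illustration only."""
    return lambda payload, signature: signature == expected_signature


keyring = {"alice-key": fake_verifier(b"sig-a"), "bob-key": fake_verifier(b"sig-b")}
members = frozenset({"alice", "bob"})
approvals = [
    ApprovalSketch("alice", "alice-key", b"CHG-1", b"sig-a"),
    ApprovalSketch("bob", "bob-key", b"CHG-1", b"sig-b"),
    ApprovalSketch("mallory", "bob-key", b"CHG-1", b"sig-b"),    # not a group member
    ApprovalSketch("alice", "unknown-key", b"CHG-1", b"sig-a"),  # unknown key id
]
approved_by = distinct_approvers(approvals, keyring, members)
```

Counting distinct approver names rather than raw approvals keeps a duplicate submission from one person from inflating the quorum.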

API auth — dev → HMAC → OIDC

Three modes, increasing production-readiness:

Role names the routes check today:

Secrets custody — what's dev-generated and what must be rotated

Everything under var/keys/ is dev-generated and persisted in local state between runs. On a first prod deployment, rotate all of them:

| File | Role | Rotation path |
| --- | --- | --- |
| var/keys/dev_evidence_hmac.key | Signs evidence bundles | Set NETOPS_EVIDENCE_HMAC_SECRET (inline) or NETOPS_EVIDENCE_HMAC_KEY_PATH (file). Set NETOPS_EVIDENCE_REQUIRE_CONFIGURED_KEY=1 to refuse fallback. |
| var/keys/dev_approval_hmac.key | Signs CAB approvals | Same mechanism as evidence, with NETOPS_APPROVAL_* env vars. Ideally replaced with per-approver Ed25519 keys (see above). |
| var/evidence/chain.tip | Prev-hash pointer for the evidence chain | Not a secret, so it is safe to check in, but do not delete it after a prod deployment starts. Deleting it breaks the chain; use NETOPS_EVIDENCE_CHAIN_RESET_BUNDLE_IDS only for known planned resets. |
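Why deleting a prev-hash pointer breaks verification is easiest to see on a toy hash chain. The sketch below is a generic illustration, not Cubby's evidence format: the entry layout, the all-zero genesis value, and the helper names are assumptions. Each entry records the hash of the one before it, so a bundle written after the tip is lost links to the wrong predecessor.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def bundle_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, tip: str, payload: dict) -> str:
    """Add an entry linked to the current tip and return the new tip."""
    digest = bundle_hash(tip, payload)
    chain.append({"prev": tip, "payload": payload, "hash": digest})
    return digest


def verify(chain: list, start: str = GENESIS) -> bool:
    """Walk the chain, checking each link and each stored hash."""
    expected_prev = start
    for entry in chain:
        if entry["prev"] != expected_prev:
            return False
        if bundle_hash(entry["prev"], entry["payload"]) != entry["hash"]:
            return False
        expected_prev = entry["hash"]
    return True


chain = []
tip = GENESIS
tip = append(chain, tip, {"change": "CHG-1"})
tip = append(chain, tip, {"change": "CHG-2"})
intact = verify(chain)

# Simulate a deleted tip file: the writer falls back to genesis, so the
# new entry no longer points at CHG-2 and the walk fails at that link.
lost_tip = GENESIS
append(chain, lost_tip, {"change": "CHG-3"})
broken = not verify(chain)
```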

The operator should also rotate:

Test-user readiness checklist

Before letting another human operator drive the system against anything other than a lab they own:

Where to go if something's wrong