Canister Production Activation

Last Updated: 2026-05-12
Audience: Engineers bringing a backend canister from staging-only to production-ready
Estimated time: ~30 min per canister (assuming the WASM builds cleanly and oracle-bridge prod is already deployed)

Overview

This runbook walks through the standard procedure for activating a backend Rust canister on IC mainnet production. It handles three starting states:

  1. No prod canister exists — create from scratch, deploy current WASM, configure
  2. Prod canister exists with outdated WASM — upgrade in place (BL-111 Tier 1 canisters)
  3. Prod canister exists with current WASM — only cycles + controller hardening needed

The runbook stops short of activating user-facing flows that depend on the prod oracle-bridge, which is deployed during the OVH cutover (BL-519.5), not by this runbook. Init hooks such as set_oracle_bridge_url and reset_signing_key are documented but deferred until the prod oracle-bridge is live.

Prerequisites

  • Working directory: target canister repo (e.g. ~/git/user-service)
  • Identity with controller access to the target canister (typically github-ci)
  • icp CLI installed (/home/coby/.local/bin/icp)
  • dfx 0.24.3+ installed
  • Cycles available: verify with icp cycles balance --network ic --identity github-ci (you need at least the equivalent of 0.5 ICP to fund the 1T cycles top-up plus a reserve)
  • Branch at main HEAD, working tree clean

Decision tree

Does the canister have a prod entry in canister_ids.json?
├── NO  → Step 0 (create prod canister) → Step 1 (build) → Step 2 (install fresh)
└── YES → Run `icp canister status <PROD_ID> --network ic --identity github-ci` to check module hash + cycles
         ├── Module hash empty (controller-only canister) → Step 1 (build) → Step 2 (install fresh)
         ├── Module hash matches current main build → Skip to Step 3 (cycles + controllers)
         └── Module hash differs from main → Step 1 (build) → Step 2 (upgrade)
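
The decision tree above can be sketched as a small shell function. This is a hypothetical helper and does not ship in the canister repos. You gather the three inputs by hand: whether a prod entry exists, the module hash from status, and the 0x-prefixed sha256 of a local Step 1 build. The function prints the steps where that branch enters the runbook. Every branch then carries on through Step 3 and later.

```bash
# Hypothetical helper mirroring the decision tree (not part of any repo).
# Arguments, all gathered by hand:
#   $1  "yes" if canister_ids.json has a prod entry, anything else for no
#   $2  module hash from `icp canister status` ("" when the module is empty)
#   $3  0x-prefixed sha256 of the local Step 1 build
# Prints the steps that branch starts with, in order.
activation_plan() {
  if [ "$1" != "yes" ]; then
    echo "0 1 2:install"      # no prod canister: create, build, install fresh
  elif [ -z "$2" ]; then
    echo "1 2:install"        # controller-only canister: build, install fresh
  elif [ "$2" = "$3" ]; then
    echo "3"                  # already current: go straight to cycles/controllers
  else
    echo "1 2:upgrade"        # outdated module: build, upgrade in place
  fi
}
```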

Steps

Step 0 — Create prod canister (only if it doesn't exist)

bash
icp canister create <CANISTER_NAME> \
  --network ic \
  --identity github-ci \
  --with-cycles 1_000_000_000_000   # 1T initial fund

Capture the returned canister ID. Add it to the repo's canister_ids.json under the production key:

json
{
  "<canister_name>": {
    "ic": "<staging-id>",
    "production": "<new-prod-id>"
  }
}

Commit the canister_ids.json update via a PR (branch protection applies).

Step 1 — Build current WASM

bash
cd ~/git/<canister-repo>
git checkout main && git pull
cargo build --release --target wasm32-unknown-unknown

# Attach Candid metadata (required for the IC dashboard + dfx introspection)
ic-wasm target/wasm32-unknown-unknown/release/<canister_name>.wasm \
  -o target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm \
  metadata candid:service -f src/<canister_name>.did -v public

# Capture build hash for verification later
sha256sum target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm

Most canister repos have this exact sequence in their deploy-production.yml; running it manually here matches CI behaviour.

Step 2 — Install or upgrade WASM

For a fresh install (Step 0 just ran, or the existing prod canister has an empty module hash):

bash
dfx canister --network ic install <PROD_CANISTER_ID> \
  --wasm target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm \
  --mode install \
  --yes \
  --identity github-ci

For an upgrade (the canister already runs an older WASM):

bash
dfx canister --network ic install <PROD_CANISTER_ID> \
  --wasm target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm \
  --mode upgrade \
  --yes \
  --identity github-ci

Critical: pass the prod canister ID explicitly. dfx.networks.json defines only local, testnet, and mainnet (plus dfx's built-in ic), so there is no production network: --network production fails with Network not found: production. The "production" key inside canister_ids.json is therefore dead config that dfx will not resolve without a matching network definition. The reliable pattern is --network ic plus the explicit prod canister ID (the identity-gateway BL-174 pattern). In CI, pass the ID via a secret. In manual runbooks, hard-code it or keep it in a shell variable. Before BL-524 fixed user-service, several repos' deploy-production.yml files relied on the canister-name resolution path, so verify the other repos when you next touch them.
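
A cheap guard against the --network production and canister-name mistakes is to validate the ID before any install call. The sketch below is hypothetical. It assumes the textual shape that every canister ID in this runbook has: four five-character base32 groups followed by -cai. It only checks the shape, not that the ID belongs to the right canister, so the post-install module hash check is still needed.

```bash
# Hypothetical guard: refuse anything not shaped like a mainnet canister ID.
# Assumes four groups of five base32 characters (a-z, 2-7) followed by "-cai",
# e.g. 6iu47-dyaaa-aaaaf-qgeza-cai.
require_canister_id() {
  if printf '%s\n' "$1" | grep -Eq '^[a-z2-7]{5}(-[a-z2-7]{5}){3}-cai$'; then
    return 0
  fi
  echo "refusing: '$1' is not a canister ID; pass the prod ID, not a name" >&2
  return 1
}

# Manual-run usage. PROD_CANISTER_ID is a shell variable you set yourself
# (in CI it comes from secrets.PRODUCTION_<CANISTER>_ID):
#   require_canister_id "$PROD_CANISTER_ID" &&
#     dfx canister --network ic install "$PROD_CANISTER_ID" \
#       --wasm target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm \
#       --mode upgrade --yes --identity github-ci
```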

Verify the install succeeded:

bash
icp canister status <PROD_ID> --network ic --identity github-ci | grep "Module hash"

The module hash should equal the local build's sha256sum output with a 0x prefix.
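
This comparison can be scripted. The sketch below rests on two assumptions. First, the status output contains a Module hash: 0x<hex> line, which is the line the grep above matches. Second, GNU coreutils sha256sum is available; substitute shasum -a 256 on macOS. The function names are hypothetical.

```bash
# Hypothetical helpers: is the deployed module the one you just built?
local_module_hash() {
  # sha256sum prints "<hex>  <file>"; keep the hex and add the 0x prefix
  printf '0x%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# $1 = local wasm path, $2 = full `icp canister status` output
module_hash_matches() {
  local remote
  # Extract the first "Module hash: 0x..." value; empty if absent
  remote=$(printf '%s\n' "$2" | sed -n 's/.*Module hash: *\(0x[0-9a-f]*\).*/\1/p' | head -n 1)
  [ -n "$remote" ] && [ "$remote" = "$(local_module_hash "$1")" ]
}

# Usage:
#   module_hash_matches \
#     target/wasm32-unknown-unknown/release/<canister_name>_with_metadata.wasm \
#     "$(icp canister status <PROD_ID> --network ic --identity github-ci)"
```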

Step 3 — Top up cycles to 1T

The operational thresholds are 500B cycles (warn) and 100B cycles (critical), per cycles-topup.md. Target 1T to leave headroom for upgrades plus several months of idle reserve.
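
These thresholds translate directly into shell comparisons. The helpers below are hypothetical and are not taken from cycles-topup.md. They assume that a balance strictly below a threshold triggers it, and they accept underscore-grouped numbers as the CLIs print them. cycles_to_target only reports the shortfall to 1T; Path A still sends a flat 1T.

```bash
# Hypothetical helpers for the Step 3 thresholds.
WARN_CYCLES=500000000000      # 500B: warn below this
CRIT_CYCLES=100000000000      # 100B: critical below this
TARGET_CYCLES=1000000000000   # 1T: top-up target

strip_underscores() { printf '%s' "$1" | tr -d '_'; }

# Prints ok, warn, or critical for a balance such as 38_800_000_000
cycles_level() {
  local c
  c=$(strip_underscores "$1")
  if [ "$c" -lt "$CRIT_CYCLES" ]; then echo critical
  elif [ "$c" -lt "$WARN_CYCLES" ]; then echo warn
  else echo ok
  fi
}

# Cycles still needed to reach the 1T target (0 if already there)
cycles_to_target() {
  local c
  c=$(strip_underscores "$1")
  if [ "$c" -ge "$TARGET_CYCLES" ]; then echo 0
  else echo $((TARGET_CYCLES - c))
  fi
}
```

For the worked example's starting balance, cycles_level 38_800_000_000 reports critical.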

Path A — drain wdhec wallet (preferred while it still has cycles; see CLAUDE.md cycles section):

bash
export DFX_WARNING=-mainnet_plaintext_identity
dfx canister --network ic --identity github-ci \
  call wdhec-2yaaa-aaaao-a6dgq-cai wallet_send \
  '(record { canister = principal "<PROD_ID>"; amount = 1_000_000_000_000 : nat64 })'
# Expect: (variant { 17_724 }) — Ok variant
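
When scripting Path A, it helps to fail loudly unless the reply is the Ok variant. dfx shows that variant here by its Candid field hash, 17_724, as the Expect comment notes. The function below is a minimal sketch with a hypothetical name, and it only pattern-matches the printed reply.

```bash
# Hypothetical check: succeed only if wallet_send printed the Ok variant.
wallet_send_ok() {
  case "$1" in
    *"variant { 17_724 }"*) return 0 ;;
    *) echo "wallet_send did not return Ok: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   out=$(dfx canister --network ic --identity github-ci \
#     call wdhec-2yaaa-aaaao-a6dgq-cai wallet_send \
#     '(record { canister = principal "<PROD_ID>"; amount = 1_000_000_000_000 : nat64 })')
#   wallet_send_ok "$out"
```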

Path B — fresh mint (when wdhec is empty or you want to bypass it):

bash
icp canister top-up --amount 1t <PROD_ID> \
  --network ic --identity github-ci

Verify:

bash
icp canister status <PROD_ID> --network ic --identity github-ci | grep Cycles:
# Expect: Cycles: 1_0xx_xxx_xxx_xxx (≥1T)
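
The same verification can be automated. The sketch below assumes the status output has a Cycles: line in the underscore-grouped form shown in the Expect comment. The helper names are hypothetical.

```bash
# Hypothetical helpers: read the balance from `icp canister status` output.
status_cycles() {
  # Take the first "Cycles: 1_024_..." line and drop the underscores
  printf '%s\n' "$1" | sed -n 's/^ *Cycles: *\([0-9_]*\).*/\1/p' | head -n 1 | tr -d '_'
}

# Succeeds when the reported balance is at least 1T cycles
meets_cycles_target() {
  local c
  c=$(status_cycles "$1")
  [ -n "$c" ] && [ "$c" -ge 1000000000000 ]
}

# Usage:
#   meets_cycles_target "$(icp canister status <PROD_ID> --network ic --identity github-ci)"
```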

Step 4 — Add controller redundancy

Many BL-111 Tier 1 canisters were deployed with github-ci as their sole controller. That is a single point of failure: if the CI credentials are lost, the canister is unrecoverable. The canonical pattern (locked 2026-05-19) is to wire all three co-founder principals (Graydon, Menley, and Coby's default identity) as controllers alongside github-ci, matching staging and applied to every prod backend canister.

Check current controllers:

bash
dfx canister info <PROD_CANISTER_ID> --network ic | grep Controllers

Add the three co-founder principals (canonical 4-controller pattern):

bash
dfx canister --network ic update-settings <PROD_CANISTER_ID> \
  --add-controller hzlpj-e6tp4-utvf2-cllfw-mcuxn-gwpde-hwmyc-zeh33-rv7md-w7rxh-2qe \
  --add-controller ixqcj-7o7fo-ggcky-e3676-ke7nf-n4zkl-syi35-fshpo-fh4d5-5zfq4-qae \
  --add-controller vfnqe-wn3wk-lowg4-xzrfv-7j62s-mo4ne-vewaa-f7zce-3igsf-vobbr-3qe \
  --identity github-ci

Principal-to-human mapping:

  • hzlpj-e6tp4-...-2qe = Graydon (degenotterdev)
  • ixqcj-7o7fo-...-qae = Menley (romcomrade)
  • vfnqe-wn3wk-...-3qe = Coby (RationalSolutions) default identity

End state should be exactly 4 controllers per canister: [github-ci] [Graydon] [Menley] [Coby]. Verify via the canonical readback:

bash
dfx canister info <PROD_CANISTER_ID> --network ic | grep Controllers
# Expected: 4 principals — exactly one each of [ci] [Graydon] [Menley] [Coby]
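
The readback can also be checked mechanically. The sketch below assumes dfx canister info prints the controllers as one space-separated Controllers: line, and the helper name is hypothetical. github-ci's principal is not listed in this runbook, so the usage comment looks it up with dfx identity get-principal. The function requires the controller count to equal the number of expected principals and requires each expected principal to appear. Together, those two checks enforce exactly the canonical set.

```bash
# Hypothetical check for the canonical 4-controller end state.
# $1 = full `dfx canister info` output; remaining args = expected principals.
controllers_canonical() {
  local line p
  line=$(printf '%s\n' "$1" | sed -n 's/^Controllers: *//p' | head -n 1)
  shift
  # Same number of controllers as expected principals...
  if [ "$(printf '%s\n' "$line" | wc -w | tr -d ' ')" -ne "$#" ]; then
    echo "expected $# controllers, got: $line" >&2
    return 1
  fi
  # ...and every expected principal present
  for p in "$@"; do
    case " $line " in
      *" $p "*) ;;
      *) echo "missing controller: $p" >&2; return 1 ;;
    esac
  done
}

# Usage:
#   ci=$(dfx identity get-principal --identity github-ci)
#   controllers_canonical "$(dfx canister info <PROD_CANISTER_ID> --network ic)" "$ci" \
#     hzlpj-e6tp4-utvf2-cllfw-mcuxn-gwpde-hwmyc-zeh33-rv7md-w7rxh-2qe \
#     ixqcj-7o7fo-ggcky-e3676-ke7nf-n4zkl-syi35-fshpo-fh4d5-5zfq4-qae \
#     vfnqe-wn3wk-lowg4-xzrfv-7j62s-mo4ne-vewaa-f7zce-3igsf-vobbr-3qe
```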

Why the 4-controller pattern is canonical:

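  • Recoverability: losing any single principal still leaves three other recovery paths. With github-ci as sole controller, losing the CI credentials leaves the canister unrecoverable. (Any controller can change the controller set, so redundancy protects against loss rather than compromise of a principal.)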
  • Fleet-wide consistency: matches the staging-side end state (workspace commit 4d669b34 2026-05-19 reconciled all 14 staging canisters to this shape; the prod fleet pass on 2026-05-19 extended the pattern to all 7 prod backend canisters — documented in workspace commit a3331848).
  • No additional rotation overhead: controllers don't sign deploys (github-ci does); they're emergency-only. Three additional principals add zero ongoing operational cost.
  • Pre-IAM-001 transitional posture: once IAM-001-1 (principals canister + ECDSA P-256 signers) is fleet-wide, this pattern migrates to single primary controller (IAM service principal) + the same 3 co-founder controllers as break-glass recovery. The 4-controller pattern survives the IAM-001 cutover.

Existing prod canisters already at canonical state (as of 2026-05-19 prod fleet pass): membership, treasury, user-service, identity-gateway, governance, dom-exchange-service, blog. Future prod activations under this runbook should land at the canonical 4-controller end state on first activation — no follow-up "Step 4 deferred" pattern needed.

See ops-infra/runbooks/controller-management.md for the canonical controller-management playbook.

Step 5 — Init hooks (DEFER unless prod oracle-bridge is live)

Some canisters require post-deploy initialization (e.g. user-service needs set_oracle_bridge_url + reset_signing_key; auth-gated canisters need set_oracle_bridge). Do not run these against the staging oracle-bridge URL — mixing prod canisters with staging off-chain services creates inconsistent state and signed-request verification failures.

Defer init hooks until:

  • The prod oracle-bridge instance is deployed (BL-519.5 scope)
  • Its DNS record (oracle.helloworlddao.com) has cut over from the staging IP to the prod IP

When prod oracle-bridge is live:

bash
# user-service example
dfx canister --network ic call <PROD_CANISTER_ID> set_oracle_bridge_url \
  '("https://oracle.helloworlddao.com")' \
  --identity github-ci

dfx canister --network ic call <PROD_CANISTER_ID> reset_signing_key \
  --identity github-ci
# Capture returned public key + register it with oracle-bridge prod

Reference: the Configure step of <canister-repo>/.github/workflows/deploy-production.yml in the post-BL-524 pattern (canister ID passed via secrets.PRODUCTION_<CANISTER>_ID, with --network ic as the network flag).

Step 6 — Audit + sprint-status update

  • If this is the first prod activation of a previously-staging-only canister, add a production: entry to canister_ids.json (PR required)
  • Flip the canister's relevant sprint-status entry from backlog to the appropriate state
  • Do NOT log cycles topups or WASM upgrades in bmad-artifacts/runbooks/key-rotation-log.md — that log is specifically for credential rotations (passwords, API keys, OAuth secrets, signing keys). Cycle topups belong in monitor-metrics.yml workflow run history (auto-topup) or sprint-status notes (manual). WASM upgrades belong in the relevant story's sprint-status done entry with commit references.

Verification checklist

  • [ ] icp canister status <PROD_ID> shows Status: Running
  • [ ] Module hash matches local build sha256
  • [ ] Cycles ≥1T
  • [ ] Exactly 4 controllers: github-ci, Graydon, Menley, Coby (canonical recovery pattern)
  • [ ] canister_ids.json has production: entry
  • [ ] Init hooks status documented (run / deferred / N/A)
  • [ ] sprint-status entry updated

Worked example: user-service prod hardening (2026-05-12)

Starting state:

  • Prod ID 6iu47-dyaaa-aaaaf-qgeza-cai exists from BL-111 Tier 1 (2026-03-29)
  • Module hash 0x9ef2...e98e9f7 (older than staging 0x38ca...4e74f4 — needs upgrade)
  • Cycles 38.8B (way below 500B operational threshold)
  • Controllers: github-ci only (single point of failure)
  • canister_ids.json already has the prod entry

Plan: Skip Step 0 (the canister exists). Run Step 1 (build), Step 2 (upgrade mode), and Step 3. Defer Step 4 (controller redundancy): Graydon's and Menley's IC principals were not documented in MEMORY.md or anywhere reliable, so a collab question was filed asking them to share their principals via dfx identity get-principal. Defer Step 5: prod oracle-bridge doesn't exist yet (BL-519.5).

Result after runbook execution (2026-05-12):

  • Cycles: 38.8B → 1.024T (Path A wallet_send from wdhec; wdhec remaining ≈10.18 TC)
  • Module hash: 0x9ef2676... (BL-111 Tier 1 deploy 2026-03-29) → 0x9e3275f06afdd37b7e1b0e3590e86b3c013fe275922bb5c10b8bdf0eba084533 (local sha256 of user_service_with_metadata.wasm built from main@1f046ee)
  • Memory size: 3.58MB → 4.24MB (newer WASM)
  • Controllers: still github-ci only; adding Coby, Graydon, and Menley is pending principal confirmation
  • Init hooks: NOT run (oracle-bridge prod doesn't exist; running against staging URL would mix envs)

Canister now sits in "deployed but unwired" state until OVH cutover (BL-519.5) provisions prod oracle-bridge.

Known issues / gotchas

  • --network production does not exist as a network: dfx.networks.json defines only local, testnet, mainnet (+ dfx's builtin ic). The "production" key inside canister_ids.json is dead config unless your dfx setup defines a matching network. Always use --network ic + explicit prod canister ID. Verify the install hit the prod canister by checking the module hash post-install equals the local build sha256. (BL-524 — user-service deploy-production.yml fix landed 2026-05-12.)
  • Legacy deploy triggers: Some repos use a /deploy production PR-comment trigger instead of the org-standard /deliver slash command. Don't run those workflows for prod; use this manual runbook until they're migrated. (BL-526)
  • canister_ids.json drift: identity-gateway prod is mwutv-4iaaa-aaaac-behda-cai (per memory) and is verified as deployed with WASM, but canister_ids.json is missing its production: entry. (BL-525)
  • wdhec wallet has an IC0504 forwarding bug: dfx ledger top-up silently fails under certain target conditions, accumulating cycles in wdhec without delivering them. Use dfx canister call wdhec ... wallet_send directly (Path A) or icp canister top-up (Path B). See the CLAUDE.md cycles section and the memory entry feedback_icp_cycles_transfer_wrong_verb.

Hello World Co-Op DAO