Infrastructure & Deployment

This document covers the production and staging infrastructure for Hello World DAO: the Hetzner VPS, Cloudflare DNS, Docker-based oracle-bridge deployment, and the FounderyOS Kubernetes cluster on AX42-U.

Hetzner VPS (oracle-bridge Staging)

The oracle-bridge off-chain service is deployed to a Hetzner server in Helsinki.

Property	Value
Provider	Hetzner Cloud
Location	Helsinki, Finland
IP	`65.21.149.226`
SSH	`ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226`

Docker Containers

Oracle-bridge runs as Docker containers managed by Docker Compose:

Container	Port	Purpose
`oracle-bridge-staging`	`8787`	Staging environment
`oracle-bridge-production`	`8788`	Production environment

bash

# SSH to VPS
ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226

# View running containers
docker ps

# View staging logs (last 100 lines)
docker logs oracle-bridge-staging --tail 100 -f

# Restart staging container
docker compose -f ~/oracle-bridge/docker-compose.staging.yml restart

Docker Compose Directory

Compose files and .env files live in ~/oracle-bridge/ on the VPS. Never edit these manually in production — changes are managed by CI/CD.

Container Registry

Images are published to GitHub Container Registry:

ghcr.io/hello-world-co-op/oracle-bridge:staging   (latest staging build)
ghcr.io/hello-world-co-op/oracle-bridge:latest     (latest production build)
ghcr.io/hello-world-co-op/oracle-bridge:v0.x.x     (versioned release tags)

PEM and CA Certificates

Secrets stored on the VPS:

Path	Contents	Permissions
`/etc/oracle-bridge/github-ci-identity.pem`	IC identity for canister calls	`640 root:docker`
`/etc/oracle-bridge/proton-bridge-ca.crt`	Internal CA cert	`640 root:docker`

These are mounted read-only into containers via the compose file. To rotate: update the file on the VPS and restart the container.

CI/CD — Docker Build & Deploy

Oracle-bridge uses GitHub Actions for automated deployment. Manual dfx deploy or SSH deploys are not used.

Workflow Files

Workflow	File	Trigger
Staging deploy	`docker-build.yml`	Push to `main`
Production deploy	`deploy-production.yml`	GitHub Release event

Staging Deploy Flow

git push origin main
    │
    ▼
GHA: docker-build.yml
    ├── Build Docker image
    ├── Push to ghcr.io:staging
    └── SSH to VPS → docker compose pull + up -d

To check deploy status:

bash

# View latest staging deploy run
gh run list --workflow=docker-build.yml --repo Hello-World-Co-Op/oracle-bridge --limit 5

# View logs for a specific run
gh run view <run-id> --log

Production Release Flow

gh release create v0.x.x --title "..." --notes "..."
    │
    ▼
GHA: deploy-production.yml
    ├── Build Docker image
    ├── Push to ghcr.io:latest + ghcr.io:v0.x.x
    └── SSH to VPS → docker compose pull + up -d (production container)

Cloudflare DNS

DNS for helloworlddao.com migrated from GoDaddy to Cloudflare in February 2026.

Property	Value
Nameservers	`elmo.ns.cloudflare.com`, `tessa.ns.cloudflare.com`
Zone ID	`c54ceb83773e6dc926a644a2d4e8d4af` (org account, migrated 2026-04-17 PLATFORM-001.2)
Credentials	`~/.config/cloudflare/org-credentials.env` (canonical)
Env overrides	`CLOUDFLARE_ZONE_ID`, `CLOUDFLARE_CREDENTIALS` — accepted by both sync + DDNS scripts

DNS Sync Script

DNS records are managed declaratively via ops-infra/scripts/cloudflare-dns-sync.sh. This script defines all 61+ records and syncs them to Cloudflare.

bash

# Dry run — shows what would change
./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run

# Apply changes
./ops-infra/scripts/cloudflare-dns-sync.sh

Important: IC boundary node records MUST be proxied: false (DNS-only, grey cloud). Cloudflare's auto-import sets proxied: true for all records — the sync script corrects this. Proxied IC records break canister routing.

Zone Migration — Pre-Cutover Checklist (AI-P1-01)

Background: Cloudflare's "add a zone to a new account" flow does NOT import all records, and silently sets proxied=true on many record types that need to be DNS-only. Treating auto-import as a migration will break IC boundary TLS, break subdomains that weren't imported, and leave latent bugs invisible. Follow this checklist every time a zone moves between Cloudflare accounts.

Before changing NS at the registrar:

Add the zone to the destination Cloudflare account.

Get the new zone ID:

bash

source ~/.config/cloudflare/<dest>-credentials.env
curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/zones?name=<domain>" \
  | jq '.result[] | {id, name, account: .account.name}'

Run the sync script in dry-run mode against the new zone:

bash

CLOUDFLARE_ZONE_ID=<new-zone-id> \
CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run

Inspect the diff:
- CREATE count tells you how many records auto-import missed.
- UPDATE count with proxied=true→false tells you how many records auto-import flipped to proxied (IC records MUST be DNS-only).
- Unmanaged records tells you what extras (AAAA apex, duplicate DMARC, legacy CAA) need manual review.

Run the live sync:

bash

CLOUDFLARE_ZONE_ID=<new-zone-id> \
CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
./ops-infra/scripts/cloudflare-dns-sync.sh

Verify with dig @<new-ns>.ns.cloudflare.com for the critical record types:

bash

dig +short @<new-ns> CNAME staging-portal.<domain>
dig +short @<new-ns> TXT _canister-id.www.<domain>
dig +short @<new-ns> CNAME _acme-challenge.www.<domain>

Only after the new zone dig matches expectations, update NS at the registrar.

After NS cutover:

Re-run the full dig + HTTPS curl checklist at T+15m and T+1h. Schedule these as concrete tasks in the migration story — an aspirational "we'll check later" leaves outages invisible.
If IC boundary node serves a wrong cert (CN=<other-canister>.icpex.org etc.), verify the domain is registered with IC's custom-domain registry (POST https://icp0.io/custom-domains/v1). DNS being correct is necessary but not sufficient.

Why this checklist exists: PLATFORM-001.2 (2026-04-17) migrated helloworlddao.com to the org account. Cloudflare auto-import brought in 28 of 61 records and set IC CNAMEs to proxied=true. Staging suites were unreachable for hours before the sync script restored them. Source: epic-platform-001-retro-2026-04-17.md.

DDNS Script (legacy)

A DDNS cron job historically kept DNS current with a dynamic WAN IP for the Sector7 lab network (now decommissioned). The script remains in ops-infra for any contributor whose home/office network needs the same pattern:

bash

# Script location
ops-infra/scripts/cloudflare-ddns-helloworlddao.sh

The active staging + production stack now lives entirely on Hetzner (oracle-bridge VPS + AX42-U dedicated server, both with static IPs), so DDNS is no longer required for the platform itself.

Key DNS Records

Subdomain	Type	Target	Notes
`www`	CNAME	IC boundary node	Marketing suite canister
`portal`	CNAME	IC boundary node	DAO suite canister
`admin`	CNAME	IC boundary node	DAO admin suite canister
`oracle`	A	`65.21.149.226`	Oracle bridge (staging on 8787)
`staging-oracle`	A	`65.21.149.226`	Explicit staging endpoint
`think-tank`	CNAME	IC boundary node	Think Tank suite
`ottercamp`	CNAME	IC boundary node	Otter Camp suite
`governance`	CNAME	IC boundary node	Governance suite

Unmanaged Records

These records exist in Cloudflare but are NOT managed by the sync script:

Record	Reason
`_domainconnect`	GoDaddy artifact — safe to leave
`_gh-hello-world-coop-dao-e`	GitHub org verification — do not delete

IC Custom Domain API

To transfer a custom domain between IC canisters (e.g., promote staging canister to production):

bash

# Trigger domain transfer
curl -X PATCH https://icp0.io/custom-domains/v1/helloworlddao.com \
  -H "Content-Type: application/json" \
  -d '{"canister_id": "<new-canister-id>"}'

You must also remove the domain from the old canister's .well-known/ic-domains file before or after the API call. See IC Custom Domain Runbook for the full procedure.

Platform API Gateway (AX42-U k3s — PLATFORM-006)

See also: System Topology — the cross-machine architecture overview (VPS + AX42-U + IC mainnet, vSwitch bridging, per-path gateway routing, cross-domain auth bridge, payment + notification data flow). This section is the developer-operational view; the system-topology doc is the architecture-overview view.

Traefik-based TLS ingress on AX42-U that unifies all backend services under one pair of hostnames, with path-based routing, TLS termination, CORS, rate limiting, and service-token auth for internal routes.

mermaid

graph LR
  subgraph public["Public internet"]
    browser["Browser / API client"]
  end

  subgraph cloudflare["Cloudflare DNS"]
    dns["apis.helloworlddao.com<br/>staging-apis.helloworlddao.com<br/>A → 157.180.13.84"]
  end

  subgraph ax42u["AX42-U (k3s)"]
    traefik["Traefik Ingress<br/>:80 → 308 redirect<br/>:443 TLS (Let's Encrypt)"]
    mw["Middleware chain<br/>CORS · rate-limit · security-headers<br/>strip-* · service-token-auth"]
    subgraph ns_platform["ns: platform"]
      health["health (nginx)"]
      authz["token-authz (ForwardAuth)"]
      shim["ExternalName / Endpoints"]
    end
    subgraph ns_fos["ns: founderyos"]
      fosapi["founderyos-api:8000"]
    end
  end

  subgraph vps["VPS (Hetzner Cloud)"]
    vsw["vSwitch 10.0.0.2<br/>(oracle-bridge via private net — pending rebind)"]
    ob["oracle-bridge :8787 staging<br/>oracle-bridge :8788 prod"]
  end

  browser -->|HTTPS| dns
  dns --> traefik
  traefik --> mw
  mw -->|/health| health
  mw -->|/fos/*| shim -->|strip /fos| fosapi
  mw -->|/oracle/*| shim -->|strip /oracle, vSwitch| vsw
  vsw -.->|pending oracle-bridge listener| ob
  mw -->|/notify/* with token| authz
  authz -.->|401 no token| mw

Hosts and routing

Host	Environment
`https://apis.helloworlddao.com`	Production
`https://staging-apis.helloworlddao.com`	Staging

Path	Backend	Notes
`/health`	in-cluster nginx	200 `ok` — gateway liveness
`/fos/*`	`founderyos-api.founderyos:8000` via ExternalName shim	prefix stripped
`/oracle/*`	oracle-bridge VPS `10.0.0.2:8787` (staging) / `:8788` (prod) via vSwitch	prefix stripped; pending oracle-bridge rebind
`/auth/*`	placeholder (503)	filled in by PLATFORM-003
`/notify/*`	placeholder (503) + service-token ForwardAuth	filled in by PLATFORM-002

Bootstrap

Manifests: ops-infra/k8s/platform-gateway/ — namespace, health, middlewares, service-token-auth, per-path Ingresses, Traefik HelmChartConfig.

First-time apply and one-time Traefik arg patch are documented in the platform-gateway README.

Service tokens

Stored in k8s Secret service-tokens (namespace platform). One token per calling service (TOKEN_NOTIFICATION_SERVICE, TOKEN_FOUNDERYOS_API, TOKEN_ORACLE_BRIDGE). Validated by the token-authz Deployment — a small Node.js ForwardAuth verifier. Rotate with the script in the README. Never committed to git.

Access logs + dashboard

Traefik emits JSON access logs to stdout. Authorization and Cookie headers are stripped. Tail via kubectl -n kube-system logs -f deploy/traefik.
Traefik dashboard is enabled but not Ingress-exposed. Access: kubectl -n kube-system port-forward deploy/traefik 9000:8080 → http://127.0.0.1:9000/dashboard/.

Adding a new service

See ops-infra/runbooks/api-gateway-add-service.md — template-based walkthrough covering ExternalName shim, strip middleware, Ingress, token provisioning, verification, and troubleshooting.

AX42-U Kubernetes Cluster (FounderyOS + platform services)

Cluster note (2026-04-27): The Sector7 cluster (Aurora, Theo, Library, Knower nodes on 192.168.2.0/24) was fully decommissioned. AX42-U is the only k3s cluster. The platform API gateway, FounderyOS API, and Ollama all run here.

The off-chain platform (FounderyOS API, notification-service, payment-gateway, Ollama, GlitchTip) runs on a single-node k3s cluster on the Hetzner AX42-U dedicated server.

Server

Property	Value
Hostname	`ax42u-hel1`
Public IP	`157.180.13.84`
Private IP	`10.0.1.3/24` (vSwitch VLAN, see "Private Network" below)
Provider	Hetzner Robot
Location	Helsinki, Finland
OS	Ubuntu 6.8.0-90
SSH	`ssh -i ~/.ssh/hetzner_vps root@157.180.13.84`
k3s	Single-node — `cni0` MTU 1450, pod CIDR `10.42.0.0/24`, svc CIDR `10.43.0.0/16`

Private Network (vSwitch — PLATFORM-006.1)

Hetzner Cloud Network hwdao-private (10.0.0.0/16) bridged to Robot vSwitch 80388 (VLAN 4010). Connects oracle-bridge VPS and AX42-U for cross-machine backend traffic.

Machine	Public	Private	Iface
oracle-bridge VPS	`65.21.149.226`	`10.0.0.2/32`	`enp7s0` (cloud-init)
AX42-U (this server)	`157.180.13.84`	`10.0.1.3/24`	`enp7s0.4010` (VLAN)

Gateway 10.0.1.1 forwards between subnets but drops ICMP. ufw on each box allows 10.0.0.0/16 inbound on the private iface only.

kubectl Access

bash

# kubectl binary
~/.local/bin/kubectl

# Kubeconfig
~/.kube/config

# Context and cluster (post-cutover from sector7)
kubectl config get-contexts

bash

# Check cluster status
kubectl cluster-info

# List pods in hello-world namespace
kubectl get pods -n hello-world

# List pods in platform namespace (api gateway + service tokens)
kubectl get pods -n platform

# View pod logs
kubectl logs -n hello-world <pod-name> --tail 100 -f

Namespaces

Namespace	Purpose
`hello-world`	Canonical app namespace — FounderyOS API, workers, Ollama, future microservices (per project memory: consolidate here)
`platform`	API gateway middleware, service-token-auth ForwardAuth, notification-service, payment-gateway
`founderyos`	FounderyOS API (legacy ns — being consolidated into `hello-world`)
`kube-system`	Traefik ingress, k3s control-plane
`the-flourish`	Affiliate project — view-only for Coby

Coby's RBAC: edit on hello-world, view on the-flourish. No longer cluster-admin (post-S206).

Platform Services

Service	Namespace	Internal Address	External	Notes
Traefik (ingress)	`kube-system`	—	`apis.helloworlddao.com` / `staging-apis.helloworlddao.com`	TLS via Let's Encrypt
FounderyOS API	`founderyos`	`founderyos-api:8000`	`founderyos.dev` (via Traefik `/fos/*`)	Node.js + Fastify
notification-service	`platform`	`notification-service:3100`	gated by service-token	Resend-backed email (PLATFORM-002)
payment-gateway	`platform`	`payment-gateway:3200`	gated by service-token	Stripe + Stripe Connect + ICP/DOM (PLATFORM-007)
Ollama	`platform`	`ollama:11434`	—	Inference for AI features
GlitchTip	`hello-world`	`glitchtip.founderyos.dev`	direct DNS	Error tracking — single project ID 4

Network Access

Public-facing services route through Traefik on AX42-U. Tailscale is NOT in use (decommissioned 2026-04-17). Direct SSH to AX42-U for cluster admin; service traffic is public via Traefik on :443.

Database Migrations (FounderyOS)

FounderyOS uses Prisma for database schema management:

bash

# Apply pending migrations (from founderyos-api repo)
npx prisma migrate deploy

# Generate Prisma client after schema changes
npx prisma generate

Migrations run automatically on deploy via the founderyos-api pod's startup script. For manual migration during incidents:

bash

# Exec into API pod
kubectl exec -it -n founderyos <founderyos-api-pod> -- sh

# Run migration inside pod
npx prisma migrate deploy

Cycle Monitoring

The IC canister fleet (12 backend canisters + 6 frontend asset canisters, ~30 TC total) is monitored via:

Local script: ops-infra/scripts/check-cycles.sh
GHA cron: ops-infra/.github/workflows/monitor-metrics.yml (runs every 6 hours)
Minimum balance: 100B cycles per canister

Canisters with high burn rates (user-service, membership) require monthly top-ups. Top-up command:

bash

# Convert ICP to cycles and top up a canister
dfx ledger top-up <canister-id> --amount <icp> --network ic

Resource Stability Rules

These rules apply to all cluster workloads (learned from EPIC-033 OOM crash resolution):

k3s reserves 2 GiB memory on all nodes (systemReserved)
kubelet eviction triggers at < 500 Mi available
All pods must have resources.limits and resources.requests set
LimitRange and ResourceQuota are enforced per namespace
Vitest test workers: maxThreads: 4 to prevent OOM during CI

Violating these rules causes node OOM and pod eviction cascades. Review ops-infra/k8s/ for current LimitRange and ResourceQuota manifests.

Infrastructure & Deployment ​

Hetzner VPS (oracle-bridge Staging) ​

Docker Containers ​

Docker Compose Directory ​

Container Registry ​

PEM and CA Certificates ​

CI/CD — Docker Build & Deploy ​

Workflow Files ​

Staging Deploy Flow ​

Production Release Flow ​

Cloudflare DNS ​

DNS Sync Script ​

Zone Migration — Pre-Cutover Checklist (AI-P1-01) ​

DDNS Script (legacy) ​

Key DNS Records ​

Unmanaged Records ​

IC Custom Domain API ​

Platform API Gateway (AX42-U k3s — PLATFORM-006) ​

Hosts and routing ​

Bootstrap ​

Service tokens ​

Access logs + dashboard ​

Adding a new service ​

AX42-U Kubernetes Cluster (FounderyOS + platform services) ​

Server ​

Private Network (vSwitch — PLATFORM-006.1) ​

kubectl Access ​

Namespaces ​

Platform Services ​

Network Access ​

Database Migrations (FounderyOS) ​

Cycle Monitoring ​

Resource Stability Rules ​

Infrastructure & Deployment

Hetzner VPS (oracle-bridge Staging)

Docker Containers

Docker Compose Directory

Container Registry

PEM and CA Certificates

CI/CD — Docker Build & Deploy

Workflow Files

Staging Deploy Flow

Production Release Flow

Cloudflare DNS

DNS Sync Script

Zone Migration — Pre-Cutover Checklist (AI-P1-01)

DDNS Script (legacy)

Key DNS Records

Unmanaged Records

IC Custom Domain API

Platform API Gateway (AX42-U k3s — PLATFORM-006)

Hosts and routing

Bootstrap

Service tokens

Access logs + dashboard

Adding a new service

AX42-U Kubernetes Cluster (FounderyOS + platform services)

Server

Private Network (vSwitch — PLATFORM-006.1)

kubectl Access

Namespaces

Platform Services

Network Access

Database Migrations (FounderyOS)

Cycle Monitoring

Resource Stability Rules