Infrastructure & Deployment

This document covers the production and staging infrastructure for Hello World DAO: the Hetzner VPS, Cloudflare DNS, Docker-based oracle-bridge deployment, and the FounderyOS Kubernetes cluster on AX42-U.


Hetzner VPS (oracle-bridge Staging)

The oracle-bridge off-chain service is deployed to a Hetzner server in Helsinki.

| Property | Value |
| --- | --- |
| Provider | Hetzner Cloud |
| Location | Helsinki, Finland |
| IP | 65.21.149.226 |
| SSH | ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226 |

Docker Containers

Oracle-bridge runs as Docker containers managed by Docker Compose:

| Container | Port | Purpose |
| --- | --- | --- |
| oracle-bridge-staging | 8787 | Staging environment |
| oracle-bridge-production | 8788 | Production environment |

bash
# SSH to VPS
ssh -i ~/.ssh/oracle-bridge-deploy deploy@65.21.149.226

# View running containers
docker ps

# View staging logs (last 100 lines)
docker logs oracle-bridge-staging --tail 100 -f

# Restart staging container
docker compose -f ~/oracle-bridge/docker-compose.staging.yml restart

Docker Compose Directory

Compose files and .env files live in ~/oracle-bridge/ on the VPS. Never edit these manually in production — changes are managed by CI/CD.

Container Registry

Images are published to GitHub Container Registry:

ghcr.io/hello-world-co-op/oracle-bridge:staging   (latest staging build)
ghcr.io/hello-world-co-op/oracle-bridge:latest     (latest production build)
ghcr.io/hello-world-co-op/oracle-bridge:v0.x.x     (versioned release tags)

PEM and CA Certificates

Secrets stored on the VPS:

| Path | Contents | Permissions |
| --- | --- | --- |
| /etc/oracle-bridge/github-ci-identity.pem | IC identity for canister calls | 640 root:docker |
| /etc/oracle-bridge/proton-bridge-ca.crt | Internal CA cert | 640 root:docker |

These are mounted read-only into containers via the compose file. To rotate: update the file on the VPS and restart the container.


CI/CD — Docker Build & Deploy

Oracle-bridge uses GitHub Actions for automated deployment. Manual dfx deploy or SSH deploys are not used.

Workflow Files

| Workflow | File | Trigger |
| --- | --- | --- |
| Staging deploy | docker-build.yml | Push to main |
| Production deploy | deploy-production.yml | GitHub Release event |

Staging Deploy Flow

git push origin main
    │
    ▼
GHA: docker-build.yml
    ├── Build Docker image
    ├── Push to ghcr.io:staging
    └── SSH to VPS → docker compose pull + up -d

To check deploy status:

bash
# View latest staging deploy run
gh run list --workflow=docker-build.yml --repo Hello-World-Co-Op/oracle-bridge --limit 5

# View logs for a specific run
gh run view <run-id> --log

Production Release Flow

gh release create v0.x.x --title "..." --notes "..."
    │
    ▼
GHA: deploy-production.yml
    ├── Build Docker image
    ├── Push to ghcr.io:latest + ghcr.io:v0.x.x
    └── SSH to VPS → docker compose pull + up -d (production container)
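
Because the production workflow fires on a GitHub Release event, a mistyped tag would start a production deploy. The guard below is a minimal sketch of a pre-flight check; the `vMAJOR.MINOR.PATCH` pattern is an assumption inferred from the `v0.x.x` placeholder above, and `is_release_tag` is a hypothetical helper name, not part of the repository.

```shell
# Hypothetical pre-flight check: accept only tags shaped like vMAJOR.MINOR.PATCH.
# The tag policy is an assumption inferred from the v0.x.x examples in this document.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage (not executed here; $tag is whatever version is being released):
#   is_release_tag "$tag" && gh release create "$tag" --title "..." --notes "..."
```

The function only validates the shape of the tag; it does not check whether the tag already exists.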

Cloudflare DNS

DNS for helloworlddao.com migrated from GoDaddy to Cloudflare in February 2026.

| Property | Value |
| --- | --- |
| Nameservers | elmo.ns.cloudflare.com, tessa.ns.cloudflare.com |
| Zone ID | c54ceb83773e6dc926a644a2d4e8d4af (org account, migrated 2026-04-17 PLATFORM-001.2) |
| Credentials | ~/.config/cloudflare/org-credentials.env (canonical) |
| Env overrides | CLOUDFLARE_ZONE_ID, CLOUDFLARE_CREDENTIALS; accepted by both sync + DDNS scripts |
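
The override behaviour can be pictured as "environment value wins, documented default otherwise". The sketch below illustrates that pattern only. How the real sync and DDNS scripts resolve these values is an assumption, and the helper names `effective_zone_id` and `effective_credentials` are hypothetical.

```shell
# Sketch of the override pattern: environment values win, defaults apply otherwise.
# Only the variable names, the zone ID, and the canonical credentials path come
# from the table above; the function names are illustrative.
DEFAULT_ZONE_ID="c54ceb83773e6dc926a644a2d4e8d4af"
DEFAULT_CREDENTIALS="$HOME/.config/cloudflare/org-credentials.env"

effective_zone_id() {
  printf '%s\n' "${CLOUDFLARE_ZONE_ID:-$DEFAULT_ZONE_ID}"
}

effective_credentials() {
  printf '%s\n' "${CLOUDFLARE_CREDENTIALS:-$DEFAULT_CREDENTIALS}"
}
```

Printing both values before a live sync is a cheap way to confirm which zone and which account a run is about to touch.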

DNS Sync Script

DNS records are managed declaratively via ops-infra/scripts/cloudflare-dns-sync.sh. This script defines all 61+ records and syncs them to Cloudflare.

bash
# Dry run — shows what would change
./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run

# Apply changes
./ops-infra/scripts/cloudflare-dns-sync.sh

Important: IC boundary node records MUST be proxied: false (DNS-only, grey cloud). Cloudflare's auto-import sets proxied: true for all records — the sync script corrects this. Proxied IC records break canister routing.

Zone Migration — Pre-Cutover Checklist (AI-P1-01)

Background: Cloudflare's "add a zone to a new account" flow does NOT import all records, and silently sets proxied=true on many record types that need to be DNS-only. Treating auto-import as a migration will break IC boundary TLS, break subdomains that weren't imported, and leave latent bugs invisible. Follow this checklist every time a zone moves between Cloudflare accounts.

Before changing NS at the registrar:

  1. Add the zone to the destination Cloudflare account.
  2. Get the new zone ID:
    bash
    source ~/.config/cloudflare/<dest>-credentials.env
    curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      "https://api.cloudflare.com/client/v4/zones?name=<domain>" \
      | jq '.result[] | {id, name, account: .account.name}'
  3. Run the sync script in dry-run mode against the new zone:
    bash
    CLOUDFLARE_ZONE_ID=<new-zone-id> \
    CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
    ./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run
  4. Inspect the diff:
    • CREATE count tells you how many records auto-import missed.
    • UPDATE count with proxied=true→false tells you how many records auto-import flipped to proxied (IC records MUST be DNS-only).
    • Unmanaged records tells you what extras (AAAA apex, duplicate DMARC, legacy CAA) need manual review.
  5. Run the live sync:
    bash
    CLOUDFLARE_ZONE_ID=<new-zone-id> \
    CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/<dest>-credentials.env \
    ./ops-infra/scripts/cloudflare-dns-sync.sh
  6. Verify with dig @<new-ns>.ns.cloudflare.com for the critical record types:
    bash
    dig +short @<new-ns> CNAME staging-portal.<domain>
    dig +short @<new-ns> TXT _canister-id.www.<domain>
    dig +short @<new-ns> CNAME _acme-challenge.www.<domain>
  7. Only after the new zone dig matches expectations, update NS at the registrar.
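
The verification in step 6 can be made repeatable by comparing each answer with an expected value instead of reading dig output by eye. This is a minimal sketch: `resolve` and `check_record` are hypothetical helpers, and the expected values must be supplied by whoever runs the check.

```shell
# resolve NS TYPE NAME: thin wrapper around dig so it can be stubbed in tests.
resolve() {
  dig +short "@$1" "$2" "$3"
}

# check_record NS TYPE NAME EXPECTED: compare the first answer with EXPECTED.
# Expected values come from the caller; none are taken from the real zone.
check_record() {
  actual=$(resolve "$1" "$2" "$3" | head -n 1)
  if [ "$actual" = "$4" ]; then
    echo "OK $2 $3"
  else
    echo "MISMATCH $2 $3: expected '$4', got '$actual'"
    return 1
  fi
}

# Usage (not executed here):
#   check_record <new-ns> CNAME staging-portal.<domain> <expected-target>.
```

Note that `dig +short` prints CNAME targets with a trailing dot and TXT values wrapped in double quotes, so the expected value has to be written the same way.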

After NS cutover:

  • Re-run the full dig + HTTPS curl checklist at T+15m and T+1h. Schedule these as concrete tasks in the migration story — an aspirational "we'll check later" leaves outages invisible.
  • If the IC boundary node serves the wrong certificate (for example CN=<other-canister>.icpex.org), verify that the domain is registered with IC's custom-domain registry (POST https://icp0.io/custom-domains/v1). DNS being correct is necessary but not sufficient.

Why this checklist exists: PLATFORM-001.2 (2026-04-17) migrated helloworlddao.com to the org account. Cloudflare auto-import brought in 28 of 61 records and set IC CNAMEs to proxied=true. Staging suites were unreachable for hours before the sync script restored them. Source: epic-platform-001-retro-2026-04-17.md.

DDNS Script (legacy)

A DDNS cron job historically kept DNS current with a dynamic WAN IP for the Sector7 lab network (now decommissioned). The script remains in ops-infra for any contributor whose home/office network needs the same pattern:

bash
# Script location
ops-infra/scripts/cloudflare-ddns-helloworlddao.sh

The active staging + production stack now lives entirely on Hetzner (oracle-bridge VPS + AX42-U dedicated server, both with static IPs), so DDNS is no longer required for the platform itself.

Key DNS Records

| Subdomain | Type | Target | Notes |
| --- | --- | --- | --- |
| www | CNAME | IC boundary node | Marketing suite canister |
| portal | CNAME | IC boundary node | DAO suite canister |
| admin | CNAME | IC boundary node | DAO admin suite canister |
| oracle | A | 65.21.149.226 | Oracle bridge (staging on 8787) |
| staging-oracle | A | 65.21.149.226 | Explicit staging endpoint |
| think-tank | CNAME | IC boundary node | Think Tank suite |
| ottercamp | CNAME | IC boundary node | Otter Camp suite |
| governance | CNAME | IC boundary node | Governance suite |

Unmanaged Records

These records exist in Cloudflare but are NOT managed by the sync script:

| Record | Reason |
| --- | --- |
| _domainconnect | GoDaddy artifact; safe to leave |
| _gh-hello-world-coop-dao-e | GitHub org verification; do not delete |

IC Custom Domain API

To transfer a custom domain between IC canisters (e.g., promote staging canister to production):

bash
# Trigger domain transfer
curl -X PATCH https://icp0.io/custom-domains/v1/helloworlddao.com \
  -H "Content-Type: application/json" \
  -d '{"canister_id": "<new-canister-id>"}'

You must also remove the domain from the old canister's .well-known/ic-domains file before or after the API call. See IC Custom Domain Runbook for the full procedure.


Platform API Gateway (AX42-U k3s — PLATFORM-006)

See also: System Topology — the cross-machine architecture overview (VPS + AX42-U + IC mainnet, vSwitch bridging, per-path gateway routing, cross-domain auth bridge, payment + notification data flow). This section is the developer-operational view; the system-topology doc is the architecture-overview view.

The gateway is a Traefik-based TLS ingress on AX42-U that unifies all backend services under one pair of hostnames. It provides path-based routing, TLS termination, CORS, rate limiting, and service-token auth for internal routes.

mermaid
graph LR
  subgraph public["Public internet"]
    browser["Browser / API client"]
  end

  subgraph cloudflare["Cloudflare DNS"]
    dns["apis.helloworlddao.com<br/>staging-apis.helloworlddao.com<br/>A → 157.180.13.84"]
  end

  subgraph ax42u["AX42-U (k3s)"]
    traefik["Traefik Ingress<br/>:80 → 308 redirect<br/>:443 TLS (Let's Encrypt)"]
    mw["Middleware chain<br/>CORS · rate-limit · security-headers<br/>strip-* · service-token-auth"]
    subgraph ns_platform["ns: platform"]
      health["health (nginx)"]
      authz["token-authz (ForwardAuth)"]
      shim["ExternalName / Endpoints"]
    end
    subgraph ns_fos["ns: founderyos"]
      fosapi["founderyos-api:8000"]
    end
  end

  subgraph vps["VPS (Hetzner Cloud)"]
    vsw["vSwitch 10.0.0.2<br/>(oracle-bridge via private net — pending rebind)"]
    ob["oracle-bridge :8787 staging<br/>oracle-bridge :8788 prod"]
  end

  browser -->|HTTPS| dns
  dns --> traefik
  traefik --> mw
  mw -->|/health| health
  mw -->|/fos/*| shim -->|strip /fos| fosapi
  mw -->|/oracle/*| shim -->|strip /oracle, vSwitch| vsw
  vsw -.->|pending oracle-bridge listener| ob
  mw -->|/notify/* with token| authz
  authz -.->|401 no token| mw

Hosts and routing

| Host | Environment |
| --- | --- |
| https://apis.helloworlddao.com | Production |
| https://staging-apis.helloworlddao.com | Staging |

| Path | Backend | Notes |
| --- | --- | --- |
| /health | in-cluster nginx | 200 ok; gateway liveness |
| /fos/* | founderyos-api.founderyos:8000 via ExternalName shim | prefix stripped |
| /oracle/* | oracle-bridge VPS 10.0.0.2:8787 (staging) / :8788 (prod) via vSwitch | prefix stripped; pending oracle-bridge rebind |
| /auth/* | placeholder (503) | filled in by PLATFORM-003 |
| /notify/* | placeholder (503) + service-token ForwardAuth | filled in by PLATFORM-002 |
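
A quick liveness probe follows directly from the tables above: `/health` on either host should answer 200. The helpers below are a hedged sketch with hypothetical names (`gateway_url`, `http_status`, `smoke_check`); only the hostnames and the `/health` path come from this document.

```shell
# gateway_url ENV PATH: build a gateway URL from the documented hostnames.
gateway_url() {
  case "$1" in
    production) host="apis.helloworlddao.com" ;;
    staging)    host="staging-apis.helloworlddao.com" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
  printf 'https://%s%s\n' "$host" "$2"
}

# http_status URL: print only the HTTP status code (curl wrapper, stubbable in tests).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# smoke_check ENV: succeed when /health returns 200 on the chosen gateway host.
smoke_check() {
  url=$(gateway_url "$1" /health) || return 1
  code=$(http_status "$url")
  if [ "$code" = "200" ]; then
    echo "OK $url"
  else
    echo "FAIL $url ($code)"
    return 1
  fi
}

# Usage (not executed here):
#   smoke_check staging && smoke_check production
```

Keeping the curl call in its own function makes the check easy to dry-run without touching the network.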

Bootstrap

Manifests: ops-infra/k8s/platform-gateway/ — namespace, health, middlewares, service-token-auth, per-path Ingresses, Traefik HelmChartConfig.

First-time apply and one-time Traefik arg patch are documented in the platform-gateway README.

Service tokens

Stored in k8s Secret service-tokens (namespace platform). One token per calling service (TOKEN_NOTIFICATION_SERVICE, TOKEN_FOUNDERYOS_API, TOKEN_ORACLE_BRIDGE). Validated by the token-authz Deployment — a small Node.js ForwardAuth verifier. Rotate with the script in the README. Never committed to git.
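
For debugging, one token can be read back from the Secret with standard kubectl JSONPath output. This is a sketch only: `read_service_token` is a hypothetical helper, and the rotation script in the README remains the supported way to manage tokens.

```shell
# read_service_token KEY: print the decoded value of one key from the
# service-tokens Secret in the platform namespace (names as documented above).
read_service_token() {
  kubectl -n platform get secret service-tokens \
    -o "jsonpath={.data.$1}" | base64 -d
}

# Usage (not executed here; avoid running this where output is logged):
#   read_service_token TOKEN_NOTIFICATION_SERVICE
```

Secret values are stored base64-encoded, which is why the output is piped through `base64 -d`.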

Access logs + dashboard

  • Traefik emits JSON access logs to stdout. Authorization and Cookie headers are stripped. Tail via kubectl -n kube-system logs -f deploy/traefik.
  • Traefik dashboard is enabled but not Ingress-exposed. Access: run kubectl -n kube-system port-forward deploy/traefik 9000:8080, then open http://127.0.0.1:9000/dashboard/.

Adding a new service

See ops-infra/runbooks/api-gateway-add-service.md — template-based walkthrough covering ExternalName shim, strip middleware, Ingress, token provisioning, verification, and troubleshooting.


AX42-U Kubernetes Cluster (FounderyOS + platform services)

Cluster note (2026-04-27): The Sector7 cluster (Aurora, Theo, Library, Knower nodes on 192.168.2.0/24) was fully decommissioned. AX42-U is the only k3s cluster. The platform API gateway, FounderyOS API, and Ollama all run here.

The off-chain platform (FounderyOS API, notification-service, payment-gateway, Ollama, GlitchTip) runs on a single-node k3s cluster on the Hetzner AX42-U dedicated server.

Server

| Property | Value |
| --- | --- |
| Hostname | ax42u-hel1 |
| Public IP | 157.180.13.84 |
| Private IP | 10.0.1.3/24 (vSwitch VLAN, see "Private Network" below) |
| Provider | Hetzner Robot |
| Location | Helsinki, Finland |
| OS | Ubuntu (kernel 6.8.0-90) |
| SSH | ssh -i ~/.ssh/hetzner_vps root@157.180.13.84 |
| k3s | Single-node; cni0 MTU 1450, pod CIDR 10.42.0.0/24, svc CIDR 10.43.0.0/16 |

Private Network (vSwitch — PLATFORM-006.1)

Hetzner Cloud Network hwdao-private (10.0.0.0/16) is bridged to Robot vSwitch 80388 (VLAN 4010). It connects the oracle-bridge VPS and AX42-U for cross-machine backend traffic.

| Machine | Public | Private | Iface |
| --- | --- | --- | --- |
| oracle-bridge VPS | 65.21.149.226 | 10.0.0.2/32 | enp7s0 (cloud-init) |
| AX42-U (this server) | 157.180.13.84 | 10.0.1.3/24 | enp7s0.4010 (VLAN) |

Gateway 10.0.1.1 forwards between subnets but drops ICMP. ufw on each box allows 10.0.0.0/16 inbound on the private iface only.
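
Because the gateway drops ICMP, a failed ping across the private network proves nothing. A TCP connection attempt is a more useful reachability test. The sketch below assumes bash and coreutils `timeout` are installed; `tcp_check` is a hypothetical helper name.

```shell
# tcp_check HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device; inside the bash -c script,
# $0 is the host and $1 is the port.
tcp_check() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from AX42-U (not executed here):
#   tcp_check 10.0.0.2 8787 && echo "staging oracle-bridge reachable over vSwitch"
```

A failure against 10.0.0.2 can also mean that nothing is listening on the private interface yet, which matches the pending oracle-bridge rebind noted in the gateway section.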

kubectl Access

bash
# kubectl binary
~/.local/bin/kubectl

# Kubeconfig
~/.kube/config

# Context and cluster (post-cutover from sector7)
kubectl config get-contexts

bash
# Check cluster status
kubectl cluster-info

# List pods in hello-world namespace
kubectl get pods -n hello-world

# List pods in platform namespace (api gateway + service tokens)
kubectl get pods -n platform

# View pod logs
kubectl logs -n hello-world <pod-name> --tail 100 -f

Namespaces

| Namespace | Purpose |
| --- | --- |
| hello-world | Canonical app namespace: FounderyOS API, workers, Ollama, future microservices (per project memory: consolidate here) |
| platform | API gateway middleware, service-token-auth ForwardAuth, notification-service, payment-gateway |
| founderyos | FounderyOS API (legacy namespace, being consolidated into hello-world) |
| kube-system | Traefik ingress, k3s control-plane |
| the-flourish | Affiliate project; view-only for Coby |

Coby's RBAC: edit on hello-world, view on the-flourish. No longer cluster-admin (post-S206).

Platform Services

| Service | Namespace | Internal Address | External | Notes |
| --- | --- | --- | --- | --- |
| Traefik (ingress) | kube-system | | apis.helloworlddao.com / staging-apis.helloworlddao.com | TLS via Let's Encrypt |
| FounderyOS API | founderyos | founderyos-api:8000 | founderyos.dev (via Traefik /fos/*) | Node.js + Fastify |
| notification-service | platform | notification-service:3100 | gated by service-token | Resend-backed email (PLATFORM-002) |
| payment-gateway | platform | payment-gateway:3200 | gated by service-token | Stripe + Stripe Connect + ICP/DOM (PLATFORM-007) |
| Ollama | platform | ollama:11434 | | Inference for AI features |
| GlitchTip | hello-world | glitchtip.founderyos.dev | direct DNS | Error tracking; single project ID 4 |

Network Access

Public-facing services route through Traefik on AX42-U. Tailscale is NOT in use (decommissioned 2026-04-17). Direct SSH to AX42-U for cluster admin; service traffic is public via Traefik on :443.

Database Migrations (FounderyOS)

FounderyOS uses Prisma for database schema management:

bash
# Apply pending migrations (from founderyos-api repo)
npx prisma migrate deploy

# Generate Prisma client after schema changes
npx prisma generate

Migrations run automatically on deploy via the founderyos-api pod's startup script. For manual migration during incidents:

bash
# Exec into API pod
kubectl exec -it -n founderyos <founderyos-api-pod> -- sh

# Run migration inside pod
npx prisma migrate deploy

Cycle Monitoring

The IC canister fleet (12 backend canisters + 6 frontend asset canisters, ~30 TC total) is monitored via:

  • Local script: ops-infra/scripts/check-cycles.sh
  • GHA cron: ops-infra/.github/workflows/monitor-metrics.yml (runs every 6 hours)
  • Minimum balance: 100B cycles per canister
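
The 100B threshold reduces to a single integer comparison. The sketch below shows only that comparison; how check-cycles.sh obtains and parses balances is not documented here, and the underscore-separated input format is an assumption about how balances may be printed. `needs_top_up` is a hypothetical helper name.

```shell
MIN_CYCLES=100000000000   # 100B cycles, the documented minimum per canister

# needs_top_up BALANCE: succeed when BALANCE is below the minimum.
# Underscore digit separators (e.g. 3_000_000_000_000) are stripped first.
needs_top_up() {
  balance=$(printf '%s' "$1" | tr -d '_')
  [ "$balance" -lt "$MIN_CYCLES" ]
}

# Usage (not executed here):
#   needs_top_up "$balance" && echo "top up <canister-id>"
```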

Canisters with high burn rates (user-service, membership) require monthly top-ups. Top-up command:

bash
# Convert ICP to cycles and top up a canister
dfx ledger top-up <canister-id> --amount <icp> --network ic

Resource Stability Rules

These rules apply to all cluster workloads (learned from EPIC-033 OOM crash resolution):

  1. k3s reserves 2 GiB memory on all nodes (systemReserved)
  2. kubelet eviction triggers at < 500 Mi available
  3. All pods must have resources.limits and resources.requests set
  4. LimitRange and ResourceQuota are enforced per namespace
  5. Vitest test workers: maxThreads: 4 to prevent OOM during CI

Violating these rules causes node OOM and pod eviction cascades. Review ops-infra/k8s/ for current LimitRange and ResourceQuota manifests.
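
Rule 3 can be spot-checked from the command line with kubectl's JSONPath output. The following is a rough sketch, not an enforcement mechanism (LimitRange and ResourceQuota do that); `pods_missing_memory_limit` is a hypothetical helper name.

```shell
# pods_missing_memory_limit NAMESPACE: list pods where no container declares
# a memory limit. Caveat: a multi-container pod is not reported if at least
# one of its containers sets a limit.
pods_missing_memory_limit() {
  kubectl get pods -n "$1" \
    -o 'jsonpath={range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}' \
    | awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage (not executed here):
#   pods_missing_memory_limit hello-world
```

An empty result means every pod in the namespace has at least one container with a memory limit; requests would need a second, analogous query.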
