Email Deliverability — Architecture Overview
Full DMARC rollout runbook: platform-002-1a-founderyos-dmarc-hardening. Resend account reference: internal MEMORY, reference_resend_account.md.
Why this exists
Every transactional email — account verification, password reset, parental consent, membership renewal receipt, event reminders, membership digest, FounderyOS welcome — flows through Resend. Without proper domain authentication (SPF, DKIM, DMARC), receiving mail servers downgrade the mail to spam or drop it silently. BL-259 showed how painful the failure mode gets: a platform-gateway ingress misconfiguration silently 503'd every DAO transactional email until a user noticed his verification email never arrived. The email subsystem is narrow, cross-cutting, and easy to misdiagnose; this page is the one-pager that orients you before you start pulling at threads.
Architecture
Both brands (helloworlddao.com and founderyos.dev) share a single Resend account under the org mailbox devops@helloworlddao.com. All three co-founders have access. The only service that sends mail today is notification-service (k8s namespace founderyos on AX42-U); oracle-bridge and founderyos-api will route through it in PLATFORM-002.3 and .4.
Env-var convention is fixed across all environments: single name RESEND_API_KEY, test-mode value in CI and staging, live-mode value in production. Storage: GH Secret in the notification-service repo, rendered into k8s Secret notification-service-secrets by deploy-staging.yml, referenced by the pod via valueFrom.secretKeyRef with optional: true (pod boots in stub mode if the key is missing). Do NOT introduce RESEND_TEST_API_KEY / RESEND_LIVE_API_KEY splits — one name, different values per deployment.
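The stub-mode fallback described above can be sketched roughly as follows. This is a minimal illustration, not the notification-service implementation: `resolve_send_mode` and the mode names "send" and "stub" are hypothetical, and the only detail taken from this page is the single variable name RESEND_API_KEY.

```python
import os
from typing import Mapping


def resolve_send_mode(env: Mapping[str, str]) -> str:
    """Return "send" when RESEND_API_KEY is set and non-empty, else "stub".

    Hypothetical helper: the real service may structure this differently.
    """
    key = env.get("RESEND_API_KEY", "").strip()
    return "send" if key else "stub"


if __name__ == "__main__":
    # With optional: true on the secretKeyRef, a missing key leaves the
    # variable unset instead of blocking pod startup.
    print(resolve_send_mode(os.environ))
```

Because the variable name is identical in every environment, the same check works in CI, staging, and production; only the value behind it differs.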
Per-subdomain verification
Resend verifies domains at the notifications.<brand> subdomain rather than the root. This is Resend's architecture, not ours — but it cascades into a few rules that are easy to get wrong:
| Record | Location | Purpose |
|---|---|---|
| DKIM TXT | resend._domainkey.notifications.<brand> | Signing-key publication — receivers fetch this to verify the DKIM-Signature header on inbound mail |
| SPF TXT | send.notifications.<brand> | v=spf1 include:amazonses.com ~all — Resend relays through Amazon SES, so senders must be in SES's SPF tree |
| Bounce MX | send.notifications.<brand> | 10 feedback-smtp.us-east-1.amazonses.com — bounce notifications from SES flow back here |
| DMARC TXT | _dmarc.<brand> | Organizational-domain policy — covers all subdomain sends under relaxed alignment |
Root SPF for both brands remains untouched. helloworlddao.com root SPF continues to serve Proton and founderyos.dev root SPF continues to serve SendGrid, exactly as they were before PLATFORM-002.1. Do NOT add include:send.resend.dev to root SPF. The subdomain pattern is deliberate.
All DNS records must be DNS-only (grey cloud / proxied=false) — Cloudflare proxying breaks DKIM verification and bounce MX delivery.
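As a quick reference, the record layout above can be expressed as data for one brand. The helper name `expected_email_records` and the dictionary keys are hypothetical and do not reflect the format used by cloudflare-dns-sync.sh. The DKIM and DMARC values are left out on purpose: the first is the per-domain key issued by Resend, the second depends on the rollout phase.

```python
from typing import Dict, List


def expected_email_records(brand: str) -> List[Dict[str, object]]:
    """List the record names and types from the table above for one brand."""
    sub = f"notifications.{brand}"
    records: List[Dict[str, object]] = [
        {"type": "TXT", "name": f"resend._domainkey.{sub}", "purpose": "DKIM"},
        {"type": "TXT", "name": f"send.{sub}", "purpose": "SPF",
         "content": "v=spf1 include:amazonses.com ~all"},
        {"type": "MX", "name": f"send.{sub}", "purpose": "bounce",
         "priority": 10, "content": "feedback-smtp.us-east-1.amazonses.com"},
        {"type": "TXT", "name": f"_dmarc.{brand}", "purpose": "DMARC"},
    ]
    # Every email record must be DNS-only (grey cloud), never proxied.
    for record in records:
        record["proxied"] = False
    return records


if __name__ == "__main__":
    for rec in expected_email_records("founderyos.dev"):
        print(rec["type"], rec["name"])
```

Note that nothing in the list touches the root SPF record, which matches the rule that root SPF stays as it was.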
DKIM record maintenance
DKIM records are managed declaratively by ops-infra/scripts/cloudflare-dns-sync.sh. The record value is the base64 RSA public key Resend generates during domain verification. Dry-run the sync script on the target zone before any live apply:
CLOUDFLARE_ZONE_ID=<zone-id> CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/org-credentials.env ./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run --domain=<brand>
The script should report 0 create / 0 update / N unchanged / 0 unmanaged. Any non-zero unmanaged value means a record exists in Cloudflare that the script does not know about; investigate before proceeding.
If you need to rotate DKIM (key compromise or provider-initiated rotation), re-run the "verify domain" flow in the Resend dashboard first; the dashboard surfaces the new public key, which you then paste into the sync script and re-apply.
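If the dry-run check is ever automated, the pass condition could look something like the sketch below. It assumes the script prints a summary line shaped exactly like "0 create / 0 update / N unchanged / 0 unmanaged"; the real output format of cloudflare-dns-sync.sh is not confirmed here, and both function names are hypothetical.

```python
import re
from typing import Dict

# Assumed summary format, taken from the wording on this page.
SUMMARY_RE = re.compile(
    r"(\d+)\s+create\s*/\s*(\d+)\s+update\s*/\s*(\d+)\s+unchanged\s*/\s*(\d+)\s+unmanaged"
)


def parse_sync_summary(line: str) -> Dict[str, int]:
    """Extract the four counters from a dry-run summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"no sync summary found in: {line!r}")
    keys = ("create", "update", "unchanged", "unmanaged")
    return dict(zip(keys, map(int, match.groups())))


def is_clean_dry_run(summary: Dict[str, int]) -> bool:
    """True when nothing would change and no unmanaged records exist."""
    return (
        summary["create"] == 0
        and summary["update"] == 0
        and summary["unmanaged"] == 0
    )


if __name__ == "__main__":
    summary = parse_sync_summary("0 create / 0 update / 7 unchanged / 0 unmanaged")
    print(is_clean_dry_run(summary))
```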
DMARC phase rollout
DMARC policy is rolled out in phases, not flipped directly from p=none to p=quarantine or p=reject. Starting with a hard policy fails legitimate mail that happens to be misaligned, producing exactly the silent-drop pathology we are trying to prevent.
| Phase | Policy | When to advance |
|---|---|---|
| PLATFORM-002.1 baseline | helloworlddao.com: p=quarantine. founderyos.dev: p=none; rua=mailto:postmaster@founderyos.dev | founderyos.dev stayed at p=none pending rua-report review |
| PLATFORM-002.1a phase 1 | founderyos.dev: p=quarantine; pct=25 | Advance after ≥14 days at phase 1 with no legitimate-sender failures in rua reports |
| PLATFORM-002.1a phase 2 | founderyos.dev: p=quarantine; pct=100 | Advance after another monitoring window with no regressions |
| Future | p=reject | Explicitly out of scope for 002.1a — filed as a follow-up decision |
The 14-day window between pct=25 and pct=100 exists to catch partial alignment failures: a source that DKIM-passes 97% of the time causes no delivery impact at p=none, becomes visible at pct=25, and is catastrophic at pct=100 if not fixed first. Pull the rua reports from postmaster@founderyos.dev, confirm all legitimate senders show DMARC pass, then advance.
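To put rough numbers on the phase difference: pct is the percentage of DMARC-failing messages to which receivers apply the published policy. The volume below is illustrative, not measured data, and `quarantined_count` is a hypothetical helper. Receivers sample messages, so the result is an expected value rather than an exact count.

```python
def quarantined_count(failing_messages: int, pct: int) -> int:
    """Expected number of DMARC-failing messages that receive the policy.

    Integer arithmetic keeps the example exact.
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return failing_messages * pct // 100


# Illustrative volume: 10,000 messages from a source that fails 3% of the time.
failing = 10_000 * 3 // 100

print(quarantined_count(failing, 25))   # 75
print(quarantined_count(failing, 100))  # 300
```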
Do NOT add adkim=s or aspf=s (strict alignment) to the DMARC record. Resend sends from notifications.<brand> and relaxed alignment is what makes that work under the organizational-domain DMARC; strict would break every Resend send.
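The difference between relaxed and strict alignment can be illustrated with a small sketch. The two domains are assumptions inferred from the record table (mail sent from notifications.<brand>, SPF and bounce records at send.notifications.<brand>), and `organizational_domain` is deliberately naive: it takes the last two labels, whereas real DMARC evaluators consult the Public Suffix List.

```python
def _normalize(domain: str) -> str:
    return domain.lower().rstrip(".")


def organizational_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (assumption)."""
    labels = _normalize(domain).split(".")
    return ".".join(labels[-2:])


def is_aligned(header_from: str, authenticated: str, strict: bool) -> bool:
    """Strict needs an exact match; relaxed needs the same organizational domain."""
    if strict:
        return _normalize(header_from) == _normalize(authenticated)
    return organizational_domain(header_from) == organizational_domain(authenticated)


header_from = "notifications.founderyos.dev"     # assumed From: header domain
mail_from = "send.notifications.founderyos.dev"  # assumed SPF (return-path) domain

print(is_aligned(header_from, mail_from, strict=False))  # True
print(is_aligned(header_from, mail_from, strict=True))   # False
```

Under these assumptions, aspf=s would turn the second result into an SPF alignment failure for every message.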
Troubleshooting
Email not arriving at all? The failure is almost never Resend itself. Work the path end to end:
- Check the gateway ingress. BL-259 fixed a silent 503 at /notify/* because the ExternalName shim existed on the cluster but not in git. Verify POST https://staging-apis.helloworlddao.com/notify/api/v1/send returns 200, not 503.
- Check the Resend dashboard message log. A message that never appears means the request never reached Resend. Pod env missing RESEND_API_KEY is the usual cause: the pod boots in stub mode and logs but does not send.
- Check DMARC alignment on a real delivery. Send a test to Gmail, open raw headers, confirm dmarc=pass and dkim=pass with the expected d= domain. Mismatched d= alignment is the usual cause of silent spam-foldering.
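For the header check in the last step, a small parser can pull the relevant verdicts out of an Authentication-Results value. The sample header is illustrative (documentation IP range, made-up signature fragment), not a captured delivery, and `summarize_auth_results` is a hypothetical helper. Receivers differ in whether they report the signing domain as header.d= or header.i=, so the sketch accepts both.

```python
import re
from typing import Dict, Optional

# Illustrative value only; not taken from a real message.
SAMPLE = (
    "mx.google.com; dkim=pass header.i=@notifications.founderyos.dev "
    "header.s=resend header.b=AbCdEf; spf=pass (google.com: domain of "
    "bounce@send.notifications.founderyos.dev designates 192.0.2.10 as "
    "permitted sender) smtp.mailfrom=bounce@send.notifications.founderyos.dev; "
    "dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) "
    "header.from=notifications.founderyos.dev"
)


def summarize_auth_results(header: str) -> Dict[str, Optional[str]]:
    """Return the DKIM verdict, DMARC verdict, and DKIM signing domain."""

    def first(pattern: str) -> Optional[str]:
        match = re.search(pattern, header, flags=re.IGNORECASE)
        return match.group(1).lower() if match else None

    return {
        "dkim": first(r"\bdkim=(\w+)"),
        "dmarc": first(r"\bdmarc=(\w+)"),
        # Signing domain appears as header.d=<domain> or header.i=[local]@<domain>.
        "dkim_domain": first(r"\bheader\.(?:d|i)=(?:[^\s;@]*@)?([\w.-]+)"),
    }


if __name__ == "__main__":
    print(summarize_auth_results(SAMPLE))
```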
rua aggregate reports. DMARC phase advances on founderyos.dev require reviewing the XML rua reports arriving at postmaster@founderyos.dev. Each report lists sending IPs, DKIM/SPF pass/fail, and DMARC disposition. Legitimate senders that fail DMARC must be fixed (alignment, record drift) before raising pct.
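Reviewing the XML by hand gets tedious, so a short script can list the rows that failed DMARC. The sketch below assumes the standard aggregate-report element names (record, row, source_ip, count, policy_evaluated, identifiers) with no XML namespace; some reporters do add one, which would need handling. The embedded report is fabricated sample data using documentation IP ranges, and `failing_sources` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Fabricated sample report for illustration only.
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>120</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>notifications.founderyos.dev</header_from></identifiers>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>quarantine</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>founderyos.dev</header_from></identifiers>
  </record>
</feedback>"""


def failing_sources(report_xml: str) -> List[Dict[str, object]]:
    """Return rows where neither aligned DKIM nor aligned SPF passed."""
    failures: List[Dict[str, object]] = []
    for record in ET.fromstring(report_xml).iter("record"):
        row = record.find("row")
        if row is None:
            continue
        dkim = row.findtext("policy_evaluated/dkim", default="fail")
        spf = row.findtext("policy_evaluated/spf", default="fail")
        # DMARC passes when either mechanism passes with alignment.
        if dkim != "pass" and spf != "pass":
            failures.append({
                "source_ip": row.findtext("source_ip", default=""),
                "count": int(row.findtext("count", default="0")),
                "disposition": row.findtext("policy_evaluated/disposition", default=""),
                "header_from": record.findtext("identifiers/header_from", default=""),
            })
    return failures


if __name__ == "__main__":
    for failure in failing_sources(SAMPLE_REPORT):
        print(failure)
```

Each entry still needs a human decision: a failing IP may be a legitimate sender with record drift or simply a spoofer, and only the former blocks raising pct.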
Pod stub mode. notification-service intentionally boots when RESEND_API_KEY is unset so the deployment doesn't hard-fail on fresh environments. Symptom: service is "healthy" in k8s, no mail ever sends. Fix: populate the GH Secret and re-run the deploy workflow — no manual kubectl required.
References
| Reference | Purpose |
|---|---|
| platform-002-1-resend-domain-verification | Resend account, DKIM/SPF/DMARC baseline, k8s Secret wiring |
| platform-002-1a-founderyos-dmarc-hardening | Two-phase DMARC rollout runbook (pct=25 → pct=100) |
| cloudflare-dns-sync.sh | Declarative DNS management: source of truth for all email records |
| reference_resend_account.md | Internal MEMORY: account owner, env-var convention, API-key storage |
| BL-259 | Gateway-misconfig postmortem: silent 503 on /notify/* |
| PLATFORM-002.3 / .4 | Wire oracle-bridge + founderyos-api to Resend |
| system-topology.md | Cross-machine architecture overview: where notification-service sits, how payment-gateway calls it, brand-based routing data flow |