Email Deliverability — Architecture Overview

Full DMARC rollout runbook: platform-002-1a-founderyos-dmarc-hardening. Resend account reference: internal MEMORY reference_resend_account.md.

Why this exists

Every transactional email (account verification, password reset, parental consent, membership renewal receipt, event reminders, membership digest, FounderyOS welcome) flows through Resend. Without proper domain authentication (SPF, DKIM, DMARC), receiving mail servers route the mail to spam or silently drop it. BL-259 showed how painful this failure mode gets: a platform-gateway ingress misconfiguration silently returned 503 for every DAO transactional email until a user noticed that their verification email never arrived. The email subsystem is narrow, cross-cutting, and easy to misdiagnose; this one-pager orients you before you start pulling at threads.

Architecture

Both brands (helloworlddao.com and founderyos.dev) share a single Resend account under the org mailbox devops@helloworlddao.com. All three co-founders have access. The only service that sends mail today is notification-service (k8s namespace founderyos on AX42-U); oracle-bridge and founderyos-api will route through it in PLATFORM-002.3 and .4.

Env-var convention is fixed across all environments: single name RESEND_API_KEY, test-mode value in CI and staging, live-mode value in production. Storage: GH Secret in the notification-service repo, rendered into k8s Secret notification-service-secrets by deploy-staging.yml, referenced by the pod via valueFrom.secretKeyRef with optional: true (pod boots in stub mode if the key is missing). Do NOT introduce RESEND_TEST_API_KEY / RESEND_LIVE_API_KEY splits — one name, different values per deployment.
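A minimal sketch of the single-name convention, in Python and under stated assumptions. This page does not show notification-service's actual language or code, so MailConfig, load_mail_config, and the choice to fail fast when split variable names appear are all hypothetical. It reads only RESEND_API_KEY and treats a missing or empty value as stub mode, mirroring the optional: true secret reference.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical guard: these split names are exactly what the convention forbids.
SPLIT_NAMES = ("RESEND_TEST_API_KEY", "RESEND_LIVE_API_KEY")


@dataclass(frozen=True)
class MailConfig:
    api_key: Optional[str]

    @property
    def stub_mode(self) -> bool:
        # Stub mode: the service boots and logs sends but never calls Resend.
        return not self.api_key


def load_mail_config(env: Mapping[str, str] = os.environ) -> MailConfig:
    """Read the single RESEND_API_KEY variable; only its value differs per deployment."""
    stray = [name for name in SPLIT_NAMES if name in env]
    if stray:
        # Refuse to guess which key to use instead of silently picking one.
        raise ValueError(f"use RESEND_API_KEY only, found: {', '.join(stray)}")
    key = env.get("RESEND_API_KEY", "").strip()
    return MailConfig(api_key=key or None)
```

Passing the environment as a mapping keeps the loader testable: load_mail_config({}) yields a config in stub mode, while a mapping containing RESEND_API_KEY does not.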

Per-subdomain verification

Resend verifies domains at the notifications.<brand> subdomain rather than the root. This is Resend's architecture, not ours — but it cascades into a few rules that are easy to get wrong:

| Record | Location | Purpose |
|---|---|---|
| DKIM TXT | resend._domainkey.notifications.<brand> | Signing-key publication; receivers fetch this to verify the DKIM-Signature header on inbound mail |
| SPF TXT | send.notifications.<brand> | v=spf1 include:amazonses.com ~all; Resend relays through Amazon SES, so senders must be in SES's SPF tree |
| Bounce MX | send.notifications.<brand> | 10 feedback-smtp.us-east-1.amazonses.com; bounce notifications from SES flow back here |
| DMARC TXT | _dmarc.<brand> | Organizational-domain policy; covers all subdomain sends under relaxed alignment |
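As a cross-check on the layout in the table above, here is a sketch of the per-brand subdomain records as data. DnsRecord and expected_email_records are hypothetical names; the DKIM value is whatever Resend displays during domain verification, copied verbatim, and the DMARC record is left out because its value depends on the rollout phase.

```python
from typing import List, NamedTuple


class DnsRecord(NamedTuple):
    rtype: str
    name: str
    content: str
    priority: int = 0  # only meaningful for MX records


def expected_email_records(brand: str, dkim_value: str) -> List[DnsRecord]:
    """Subdomain email records for one brand (DMARC omitted: value is phase-dependent)."""
    sub = f"notifications.{brand}"
    return [
        # dkim_value is pasted from the Resend dashboard, not derived here.
        DnsRecord("TXT", f"resend._domainkey.{sub}", dkim_value),
        DnsRecord("TXT", f"send.{sub}", "v=spf1 include:amazonses.com ~all"),
        DnsRecord("MX", f"send.{sub}", "feedback-smtp.us-east-1.amazonses.com", 10),
    ]
```

Note that nothing here touches the root domain: both SPF and the bounce MX live on send.notifications.<brand>.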

Root SPF for both brands remains untouched. helloworlddao.com root SPF continues to serve Proton and founderyos.dev root SPF continues to serve SendGrid, exactly as they were before PLATFORM-002.1. Do NOT add include:send.resend.dev to root SPF. The subdomain pattern is deliberate.

All DNS records must be DNS-only (grey cloud / proxied=false) — Cloudflare proxying breaks DKIM verification and bounce MX delivery.
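The two DNS rules above (no Resend include in root SPF, email records DNS-only) lend themselves to a lint pass over a zone export. This sketch assumes Cloudflare-style record dicts with type, name, content, and proxied keys; lint_email_records is hypothetical and is not part of cloudflare-dns-sync.sh. It scopes the proxied check to notifications.<brand> and _dmarc.<brand>, so proxied web records elsewhere in the zone are not flagged.

```python
from typing import Dict, Iterable, List

FORBIDDEN_ROOT_INCLUDE = "include:send.resend.dev"


def lint_email_records(records: Iterable[Dict], brand: str) -> List[str]:
    """Flag proxied email records and a Resend include in the root SPF record."""
    sub = f"notifications.{brand}"
    problems = []
    for rec in records:
        name = rec["name"]
        is_email = name == sub or name.endswith("." + sub) or name == f"_dmarc.{brand}"
        if is_email and rec.get("proxied"):
            problems.append(f"{name}: proxied, must be DNS-only")
        # TXT content may come back wrapped in double quotes; strip them first.
        content = rec.get("content", "").strip('"')
        is_root_spf = rec["type"] == "TXT" and name == brand and content.startswith("v=spf1")
        if is_root_spf and FORBIDDEN_ROOT_INCLUDE in content.split():
            problems.append(f"{brand}: root SPF must not contain {FORBIDDEN_ROOT_INCLUDE}")
    return problems
```

An empty list means the zone respects both rules; each string names the offending record.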

DKIM record maintenance

DKIM records are managed declaratively by ops-infra/scripts/cloudflare-dns-sync.sh. The record value is the base64 RSA public key Resend generates during domain verification. Dry-run the sync script on the target zone before any live apply:

CLOUDFLARE_ZONE_ID=<zone-id> CLOUDFLARE_CREDENTIALS=~/.config/cloudflare/org-credentials.env ./ops-infra/scripts/cloudflare-dns-sync.sh --dry-run --domain=<brand>

The script should report 0 create / 0 update / N unchanged / 0 unmanaged. Any non-zero unmanaged value means a record exists in Cloudflare that the script does not know about — investigate before proceeding. If you need to rotate DKIM (key compromise or provider-initiated rotation), re-run the "verify domain" flow in the Resend dashboard first; the dashboard surfaces the new public key, which you then paste into the sync script and re-apply.
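To make the four dry-run counters concrete, here is a simplified model of how a declarative sync can classify records. It is an illustration only, not the script's actual logic (which this page does not show): plan_sync is hypothetical, and keying records on (type, name) collapses multiple records with the same name, which a real sync must handle.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (record type, record name); simplification, see above


def plan_sync(desired: Dict[Key, str], actual: Dict[Key, str]) -> Dict[str, int]:
    """Count create / update / unchanged / unmanaged, mirroring the dry-run summary."""
    counts = {"create": 0, "update": 0, "unchanged": 0, "unmanaged": 0}
    for key, content in desired.items():
        if key not in actual:
            counts["create"] += 1
        elif actual[key] != content:
            counts["update"] += 1
        else:
            counts["unchanged"] += 1
    # Present in Cloudflare but unknown to the declared state: investigate these.
    counts["unmanaged"] = sum(1 for key in actual if key not in desired)
    return counts
```

A healthy zone is one where desired and actual agree exactly, so only the unchanged counter is non-zero.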

DMARC phase rollout

DMARC policy is rolled out in phases, not flipped directly from p=none to p=quarantine or p=reject. Starting with an enforcing policy quarantines or rejects legitimate mail that happens to be misaligned, producing exactly the silent-drop pathology we are trying to prevent.

| Phase | Policy | When to advance |
|---|---|---|
| PLATFORM-002.1 baseline | helloworlddao.com: p=quarantine. founderyos.dev: p=none; rua=mailto:postmaster@founderyos.dev | founderyos.dev stayed at p=none pending rua-report review |
| PLATFORM-002.1a phase 1 | founderyos.dev: p=quarantine; pct=25 | Advance after ≥14 days at phase 1 with no legitimate-sender failures in rua reports |
| PLATFORM-002.1a phase 2 | founderyos.dev: p=quarantine; pct=100 | Advance after another monitoring window with no regressions |
| Future | p=reject | Explicitly out of scope for 002.1a; filed as a follow-up decision |
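A sketch of how the _dmarc TXT values for each phase are assembled. dmarc_record is a hypothetical helper, and the phase examples assume the rua tag stays in place after the baseline, since the rua reports are what gate each advance. It deliberately emits no adkim or aspf tags: DMARC defaults both to relaxed alignment.

```python
from typing import Optional


def dmarc_record(policy: str, pct: Optional[int] = None, rua: Optional[str] = None) -> str:
    """Build a _dmarc TXT value; v= must come first and p= second."""
    if policy not in ("none", "quarantine", "reject"):
        raise ValueError(f"unknown DMARC policy: {policy}")
    tags = ["v=DMARC1", f"p={policy}"]
    if pct is not None:
        if not 0 <= pct <= 100:
            raise ValueError("pct must be between 0 and 100")
        tags.append(f"pct={pct}")
    if rua:
        tags.append(f"rua=mailto:{rua}")
    # No adkim/aspf tags: the relaxed default is what Resend sends rely on.
    return "; ".join(tags)
```

For founderyos.dev phase 1, dmarc_record("quarantine", pct=25, rua="postmaster@founderyos.dev") yields "v=DMARC1; p=quarantine; pct=25; rua=mailto:postmaster@founderyos.dev".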

The 14-day window between pct=25 and pct=100 exists to catch partial alignment failures. A source that DKIM-passes 97% of the time has no delivery impact at p=none, surfaces at pct=25 as a small number of quarantined messages (roughly a quarter of its 3% failures), and has every failing message quarantined at pct=100 if it is not fixed first. Pull the rua reports sent to postmaster@founderyos.dev, confirm that every legitimate sender shows DMARC pass, then advance.
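The arithmetic behind that window, as a sketch. Under p=quarantine, pct applies the policy to that percentage of DMARC-failing messages and treats the rest as p=none, so the quarantined share of a source's mail is its failure rate times pct/100. The advance gate encodes the phase-1 row of the table; both function names are hypothetical.

```python
def quarantined_share(dmarc_fail_rate: float, pct: int) -> float:
    """Fraction of one source's mail quarantined under p=quarantine at a given pct."""
    return dmarc_fail_rate * pct / 100


def ready_to_advance(days_at_phase: int, legit_sender_failures: int, min_days: int = 14) -> bool:
    """Phase-1 gate: enough monitoring time and zero legitimate-sender failures in rua."""
    return days_at_phase >= min_days and legit_sender_failures == 0
```

For the 97% source above, quarantined_share(0.03, 25) is about 0.0075 (0.75% of its mail), against 0.03 (3%) once pct=100: small enough to spot and fix in phase 1 before it matters.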

Do NOT add adkim=s or aspf=s (strict alignment) to the DMARC record. Resend sends from notifications.<brand> and relaxed alignment is what makes that work under the organizational-domain DMARC; strict would break every Resend send.
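A sketch of the identifier-alignment rule, to show why relaxed mode matters here. It assumes From addresses are on notifications.<brand> and that the SPF-checked envelope (Return-Path) domain is send.notifications.<brand>, where the SPF record in the table lives. org_domain is deliberately naive (last two labels), which works for helloworlddao.com and founderyos.dev; real DMARC evaluators use the Public Suffix List instead.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (not PSL-aware)."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(auth_domain: str, from_domain: str, strict: bool) -> bool:
    """DMARC alignment between an authenticated domain and the From header domain."""
    a = auth_domain.lower().rstrip(".")
    f = from_domain.lower().rstrip(".")
    if strict:
        # adkim=s / aspf=s: exact domain match required.
        return a == f
    # Relaxed (the default): same organizational domain is enough.
    return org_domain(a) == org_domain(f)
```

With these assumptions, aligned("send.notifications.founderyos.dev", "notifications.founderyos.dev", strict=False) is True, while the same call with strict=True is False: the SPF identity only aligns under relaxed mode.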

Troubleshooting

Email not arriving at all? The failure is almost never Resend itself. Work the path end to end:

  1. Check the gateway ingress. BL-259 fixed a silent 503 at /notify/* that traced back to an ExternalName shim existing on the cluster but not in git. Verify that POST https://staging-apis.helloworlddao.com/notify/api/v1/send returns 200, not 503.
  2. Check the Resend dashboard message log. A message that never appears there never reached Resend. The usual cause is a pod environment without RESEND_API_KEY: the pod boots in stub mode and logs the send but does not deliver it.
  3. Check DMARC alignment on a real delivery. Send a test to Gmail, open raw headers, confirm dmarc=pass and dkim=pass with the expected d= domain. Mismatched d= alignment is the usual cause of silent spam-foldering.
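Check 3 can be partly scripted once you have copied the Authentication-Results header out of the raw message. A rough sketch, assuming the header follows the RFC 8601 method=result layout and that the receiver reports the DKIM signing identity as header.d= or header.i=; summarize_auth_results is hypothetical and ignores edge cases such as multiple DKIM signatures.

```python
import re
from typing import Dict, Optional

RESULT_RE = re.compile(r"\b(dkim|spf|dmarc)=(\w+)")
DKIM_DOMAIN_RE = re.compile(r"\bheader\.(?:d|i)=@?([A-Za-z0-9.-]+)")


def summarize_auth_results(header_value: str) -> Dict[str, Optional[str]]:
    """Extract dkim/spf/dmarc verdicts and the DKIM signing domain from one header."""
    summary: Dict[str, Optional[str]] = {"dkim": None, "spf": None, "dmarc": None}
    for method, result in RESULT_RE.findall(header_value):
        # Keep only the first verdict reported for each method.
        if summary[method] is None:
            summary[method] = result
    match = DKIM_DOMAIN_RE.search(header_value)
    summary["dkim_domain"] = match.group(1) if match else None
    return summary
```

A healthy Resend delivery should show dkim and dmarc as pass, with dkim_domain on notifications.<brand>.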

rua aggregate reports. DMARC phase advances on founderyos.dev require reviewing the XML rua reports arriving at postmaster@founderyos.dev. Each report lists sending IPs, DKIM/SPF pass/fail, and DMARC disposition. Legitimate senders that fail DMARC must be fixed (alignment, record drift) before raising pct.
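A sketch of flattening one aggregate report so that failing sources stand out. It assumes the attachment has already been decompressed and uses the un-namespaced schema from RFC 7489 appendix C; parse_rua, RuaRow, and failing_sources are hypothetical names. DMARC passes when either the aligned DKIM or the aligned SPF result in policy_evaluated is pass.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class RuaRow(NamedTuple):
    source_ip: str
    count: int
    header_from: str
    disposition: str
    dkim: str  # DMARC-evaluated (aligned) DKIM result
    spf: str   # DMARC-evaluated (aligned) SPF result

    @property
    def dmarc_pass(self) -> bool:
        # Either aligned identifier passing is enough for DMARC pass.
        return self.dkim == "pass" or self.spf == "pass"


def parse_rua(xml_text: str) -> List[RuaRow]:
    """Flatten one decompressed aggregate report into one row per <record>."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        row = record.find("row")
        evaluated = row.find("policy_evaluated")
        rows.append(RuaRow(
            source_ip=row.findtext("source_ip", ""),
            count=int(row.findtext("count", "0")),
            header_from=record.findtext("identifiers/header_from", ""),
            disposition=evaluated.findtext("disposition", ""),
            dkim=evaluated.findtext("dkim", ""),
            spf=evaluated.findtext("spf", ""),
        ))
    return rows


def failing_sources(rows: List[RuaRow]) -> List[RuaRow]:
    """Rows to investigate before raising pct: anything that failed DMARC."""
    return [r for r in rows if not r.dmarc_pass]
```

Each failing row gives a source IP and message count; decide whether it is a legitimate sender (fix alignment or record drift) or spoofing (which the policy should catch) before advancing.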

Pod stub mode. notification-service intentionally boots when RESEND_API_KEY is unset so the deployment doesn't hard-fail on fresh environments. Symptom: service is "healthy" in k8s, no mail ever sends. Fix: populate the GH Secret and re-run the deploy workflow — no manual kubectl required.

References

| Reference | Purpose |
|---|---|
| platform-002-1-resend-domain-verification | Resend account, DKIM/SPF/DMARC baseline, k8s Secret wiring |
| platform-002-1a-founderyos-dmarc-hardening | Two-phase DMARC rollout runbook (pct=25, then pct=100) |
| cloudflare-dns-sync.sh | Declarative DNS management; source of truth for all email records |
| reference_resend_account.md | Internal MEMORY: account owner, env-var convention, API-key storage |
| BL-259 | Gateway-misconfig postmortem: silent 503 on /notify/* |
| PLATFORM-002.3 / .4 | Wire oracle-bridge + founderyos-api to Resend |
| system-topology.md | Cross-machine architecture overview: where notification-service sits, how payment-gateway calls it, brand-based routing data flow |

Hello World Co-Op DAO