SendGrid outbound email — Orcha spec
Approved

2026-05-17 · Daniel · spec

Problem

All outbound email is sent through AWS SES (com.getorcha.aws/send-email!, SESv2). SES has only ever granted us sandbox access — production access has been requested and denied by AWS several times. Sandbox restricts sending to verified addresses, so customer-facing and notification email cannot be delivered in production.

We are moving outbound email to SendGrid, authenticated for the subdomain mail.getorcha.com. Inbound email (document acquisition: SES receipt rule → S3 → SQS → triage) is unaffected and stays on SES.

Goals & non-goals

Goals

Non-goals

Approach

Send via the SendGrid Web API v3 (POST https://api.sendgrid.com/v3/mail/send) through the existing traced HTTP wrapper com.getorcha.http.client/request (hato's shared default client + X-Ray subsegment tracing). A single side-effecting function com.getorcha.email/send! dispatches on a :provider key. No new Integrant component; no custom HTTP client.

| Option | Effort | Fit with codebase | Chosen |
|---|---|---|---|
| Web API v3 via raw hato/post | Low | Loses X-Ray parity with the old SES path | No |
| SendGrid SMTP relay (JavaMail) | Medium | New TLS/credential surface, no upside | No |
| Keep SES impl as config fallback | Low | Dead code for a path we won't use | No |
| Wrap sender in an Integrant component | Medium | No lifecycle resource to manage | No |

Decision matrix. The SES client is a component because the SDK client is a heavyweight closeable resource; a SendGrid HTTP call has no such resource, so the component shape does not transfer.
Decision

The email sender is a plain function, not an Integrant component. hato keeps its own shared default HttpClient behind an internal delay and is used client-less everywhere in this codebase; there is no resource to init/halt. Provider selection is config, not lifecycle.

Design

[Diagram: notify! / admin / verification email → email/send! (dispatch on :provider) → :sendgrid → http.client/request → SendGrid API, or → :log (local/test). Config: :provider, :sender, :api-key (from :com.getorcha/notifications, #profile-gated).]
Outbound path. Inbound SES (document acquisition) is a separate pipeline and is not shown.

1 · DNS — manual, management account (not CDK)

All six records sit on mail.getorcha.com labels, which resolve only from the root getorcha.com hosted zone. That zone lives in the management account (333886071599, zone Z02414383CQNYTPGX2EIK) and is in no CDK stack — CDK only manages the per-environment {env}.getorcha.com zones in workload accounts. These records cannot go in the CDK-managed zone (wrong zone) and pulling the root zone into CDK is out of scope.

| Type | Host | Value |
|---|---|---|
| CNAME | url5843.mail.getorcha.com | sendgrid.net |
| CNAME | 107596766.mail.getorcha.com | sendgrid.net |
| CNAME | em8281.mail.getorcha.com | u107596766.wl017.sendgrid.net |
| CNAME | s1._domainkey.mail.getorcha.com | s1.domainkey.u107596766.wl017.sendgrid.net |
| CNAME | s2._domainkey.mail.getorcha.com | s2.domainkey.u107596766.wl017.sendgrid.net |
| TXT | _dmarc.mail.getorcha.com | v=DMARC1; p=quarantine; rua=mailto:max@getorcha.com; adkim=s; aspf=s; pct=100 |

Source: SendGrid domain authentication configured for mail.getorcha.com. The earlier apex-scoped getorcha.com record set is discarded.

Procedure:

  1. Pre-flight (read-only): with --profile orcha, list Z02414383CQNYTPGX2EIK and check for an existing _dmarc.mail.getorcha.com TXT, existing s1/s2._domainkey.mail.getorcha.com, and any name collision with em8281 / url5843 / 107596766. SES Easy-DKIM uses token selectors, not s1/s2, so no DKIM collision is expected.
  2. Capture the change as an idempotent UPSERT change-batch script committed under infra/scripts/ (mirrors the documented "manual records in management account" pattern).
  3. Apply with explicit confirmation before the mutating route53 change-resource-record-sets call. Existing mail.getorcha.com SPF/inbound records are left untouched.
Warning

If a _dmarc.mail.getorcha.com TXT already exists, reconcile — do not blind-overwrite. Only one DMARC record per name is valid; clobbering an existing one can change failure handling for other mail on that name.
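
The pre-flight check and the UPSERT change batch from the procedure above can be sketched as plain data transformations. This is a minimal sketch, not the committed script: the function names, the TTL of 300 seconds, and the assumption that the existing record sets arrive as JSON-parsed maps with "Name" and "Type" keys (the shape returned by listing a hosted zone) are all illustrative. The record values are copied from the table above.

```clojure
(require '[clojure.string :as str])

;; Records from the DNS table in this section.
(def planned-records
  [{:type "CNAME" :name "url5843.mail.getorcha.com"       :value "sendgrid.net"}
   {:type "CNAME" :name "107596766.mail.getorcha.com"     :value "sendgrid.net"}
   {:type "CNAME" :name "em8281.mail.getorcha.com"        :value "u107596766.wl017.sendgrid.net"}
   {:type "CNAME" :name "s1._domainkey.mail.getorcha.com" :value "s1.domainkey.u107596766.wl017.sendgrid.net"}
   {:type "CNAME" :name "s2._domainkey.mail.getorcha.com" :value "s2.domainkey.u107596766.wl017.sendgrid.net"}
   {:type "TXT"   :name "_dmarc.mail.getorcha.com"
    :value "v=DMARC1; p=quarantine; rua=mailto:max@getorcha.com; adkim=s; aspf=s; pct=100"}])

(defn- fqdn
  "Normalizes a DNS name for comparison: lower case, with a trailing dot."
  [n]
  (str/lower-case (if (str/ends-with? n ".") n (str n "."))))

(defn- rr-value
  "Route 53 expects TXT values wrapped in double quotes."
  [{:keys [type value]}]
  (if (= type "TXT") (str "\"" value "\"") value))

(defn collisions
  "Pre-flight: returns the planned records whose name already exists in the
  zone. `existing` is a seq of maps with a \"Name\" key (assumed shape)."
  [planned existing]
  (let [taken (into #{} (map (comp fqdn #(get % "Name"))) existing)]
    (filterv #(contains? taken (fqdn (:name %))) planned)))

(defn change-batch
  "Builds a Route 53 change batch that UPSERTs every record.
  The TTL is a caller-supplied assumption; the spec does not state one."
  [records ttl]
  {"Comment" "SendGrid domain authentication for mail.getorcha.com"
   "Changes" (mapv (fn [r]
                     {"Action" "UPSERT"
                      "ResourceRecordSet"
                      {"Name"            (:name r)
                       "Type"            (:type r)
                       "TTL"             ttl
                       "ResourceRecords" [{"Value" (rr-value r)}]}})
                   records)})
```

Serialized to JSON with whichever library the script already uses, the result of `(change-batch planned-records 300)` has the shape accepted by the `--change-batch` argument of `aws route53 change-resource-record-sets`. A non-empty result from `collisions` is the signal to stop and reconcile rather than apply.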

2 · Secret & IAM (CDK)

3 · Code

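```clojure
;; com.getorcha.email: provider-dispatched, plain fn.
;; `http` aliases com.getorcha.http.client and `log` the project's logging
;; namespace; `sendgrid-payload` builds the v3 mail/send request body.
(defmulti send! (fn [{:keys [provider]} _msg] provider))

(defmethod send! :sendgrid [{:keys [api-key sender]} {:keys [to subject body]}]
  (let [{:keys [status] :as resp}
        (http/request {:method  :post
                       :url     "https://api.sendgrid.com/v3/mail/send"
                       :headers {"Authorization" (str "Bearer " api-key)}
                       :content-type :json
                       :form-params  (sendgrid-payload sender to subject body)})]
    ;; Reached only if http/request returns non-2xx responses rather than throwing.
    ;; SendGrid answers 202 Accepted on success; any non-2xx status is a failure.
    (when-not (<= 200 status 299)
      (throw (ex-info "SendGrid send failed" {:status status, :body (:body resp)})))))

;; Local/test provider: logs recipient and subject only, never the body.
(defmethod send! :log [_cfg msg]
  (log/info "[LOCAL DEV] Would send email" (select-keys msg [:to :subject])))
```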
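
The `sendgrid-payload` helper is referenced above but not shown. A minimal sketch of what it might look like follows, assuming a plain-text body and that `to` is either one address or a collection of addresses. The key names follow the SendGrid v3 Mail Send request body; the content type and the recipient handling are assumptions, not part of this spec.

```clojure
;; Hypothetical sketch of the sendgrid-payload helper used by send! :sendgrid.
(defn sendgrid-payload
  "Builds the request body for POST /v3/mail/send as a Clojure map.
  `to` may be a single address string or a collection of addresses."
  [sender to subject body]
  (let [recipients (if (string? to) [to] (vec to))]
    {:personalizations [{:to (mapv (fn [addr] {:email addr}) recipients)}]
     :from             {:email sender}
     :subject          subject
     ;; Assumption: bodies are plain text; HTML mail would use "text/html".
     :content          [{:type "text/plain" :value body}]}))
```

Because the helper is a pure function, it can be unit-tested without touching the network, which keeps the side effect confined to `send!`.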

4 · Infra removal (CDK)

Remove the VerifySendingDomain custom resource and its CfnOutput from infra/stacks/foundation_stack.py.

Warning

The custom resource's on_delete calls SESv2.deleteEmailIdentity for mail.getorcha.com — a real, stateful infra mutation. Inbound document acquisition uses the separate mail.{env}.getorcha.com receiving identity, so ingestion is unaffected. Change is applied via a CDK deploy (no CLI → no drift). Orphaned old SES DKIM CNAMEs in the management zone are harmless; cleaning them is optional and out of scope.

5 · Sequencing & verification

  1. Apply DNS (step 1). Wait for propagation.
  2. Hard gate: SendGrid dashboard shows the domain authenticated/verified.
  3. Generate the API key; set the SSM SecureString value.
  4. Deploy CDK (secret + IAM + VerifySendingDomain removal) and the code change.
  5. Send a test to a Gmail inbox; "Show original" must report DKIM: PASS and DMARC: PASS with d=mail.getorcha.com.
Warning

p=quarantine; pct=100 with strict alignment means any DKIM misconfiguration quarantines 100% of outbound mail. The "SendGrid verified" gate before code/infra cutover is mandatory, not advisory.
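
To make the warning concrete, here is a small sketch that parses the DMARC record from step 1 and applies the strict DKIM alignment rule (`adkim=s`): the DKIM signing domain must equal the From domain exactly. The function names are illustrative, and the From domain mail.getorcha.com is an assumption based on the authenticated subdomain.

```clojure
(require '[clojure.string :as str])

;; DMARC record value from the DNS table in step 1.
(def dmarc-record
  "v=DMARC1; p=quarantine; rua=mailto:max@getorcha.com; adkim=s; aspf=s; pct=100")

(defn parse-dmarc
  "Parses a DMARC TXT value into a map of tag -> value (both strings)."
  [txt]
  (into {}
        (for [part (str/split txt #";")
              :let [[k v] (str/split (str/trim part) #"=" 2)]
              :when (and (seq k) v)]
          [(str/trim k) (str/trim v)])))

(defn quarantines-all?
  "True when the policy quarantines every failing message.
  pct defaults to 100 when the tag is absent."
  [policy]
  (and (= "quarantine" (get policy "p"))
       (= "100" (get policy "pct" "100"))))

(defn strictly-aligned?
  "Strict alignment: the DKIM d= domain must match the From domain exactly
  (case-insensitive). Relaxed alignment is deliberately not modelled here."
  [from-domain dkim-domain]
  (= (str/lower-case from-domain) (str/lower-case dkim-domain)))
```

With this policy, a message from mail.getorcha.com signed with `d=mail.getorcha.com` is aligned, while the same message signed with a fallback domain such as `d=sendgrid.net` is not and would be quarantined.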

Rollback: outbound SES removal and the code change ship together; reverting the commit and redeploying restores the SES path, but SES is still sandbox-limited, so the real recovery path is fixing DNS/SendGrid, not rollback.

Open questions

All resolved 2026-05-17:

References