SendGrid outbound email
Problem
All outbound email is sent through AWS SES (com.getorcha.aws/send-email!, SESv2). SES has only ever granted us sandbox access — production access has been requested and denied by AWS several times. Sandbox restricts sending to verified addresses, so customer-facing and notification email cannot be delivered in production.
We are moving outbound email to SendGrid, authenticated for the subdomain mail.getorcha.com. Inbound email (document acquisition: SES receipt rule → S3 → SQS → triage) is unaffected and stays on SES.
Goals & non-goals
Goals
- SendGrid is the sole outbound email provider in production, sending from `noreply@mail.getorcha.com`.
- Mail passes DKIM and the published `_dmarc.mail.getorcha.com` policy (`p=quarantine; adkim=s; aspf=s; pct=100`).
- SendGrid API key stored as an SSM parameter (CDK-created placeholder, value set out-of-band); everything CDK-able is in CDK.
- Outbound SES is removed: send code path, IAM grant, and the `VerifySendingDomain` CDK resource.
- The hard-coded sender in `app/http/settings/notifications.clj` is eliminated; sender comes from config in one place.
- Local/test behaviour preserved: emails are logged, not sent.
Non-goals
- Inbound SES / document acquisition: untouched.
- HTML email, SendGrid dynamic templates, marketing/campaign features: body stays plain text.
- Bounce / spam-complaint webhook handling: explicitly deferred (see Open questions).
- Importing the root `getorcha.com` Route53 zone into CDK: out of scope and high-risk.
Approach
Send via the SendGrid Web API v3 (POST https://api.sendgrid.com/v3/mail/send) through the existing traced HTTP wrapper com.getorcha.http.client/request (hato's shared default client + X-Ray subsegment tracing). A single side-effecting function com.getorcha.email/send! dispatches on a :provider key. No new Integrant component; no custom HTTP client.
| Option | Effort | Fit with codebase | Chosen |
|---|---|---|---|
| Web API v3 via `http.client/request`, plain fn | Low | Matches hato-default + X-Ray convention | Yes |
| Web API v3 via raw `hato/post` | Low | Loses X-Ray parity with the old SES path | No |
| SendGrid SMTP relay (JavaMail) | Medium | New TLS/credential surface, no upside | No |
| Keep SES impl as config fallback | Low | Dead code for a path we won't use | No |
| Wrap sender in an Integrant component | Medium | No lifecycle resource to manage | No |
The email sender is a plain function, not an Integrant component. hato keeps its own shared default HttpClient behind an internal delay and is used client-less everywhere in this codebase; there is no resource to init/halt. Provider selection is config, not lifecycle.
Design
1 · DNS — manual, management account (not CDK)
All six records sit on mail.getorcha.com labels, which resolve only from the root getorcha.com hosted zone. That zone lives in the management account (333886071599, zone Z02414383CQNYTPGX2EIK) and is in no CDK stack — CDK only manages the per-environment {env}.getorcha.com zones in workload accounts. These records cannot go in the CDK-managed zone (wrong zone) and pulling the root zone into CDK is out of scope.
| Type | Host | Value |
|---|---|---|
| CNAME | url5843.mail.getorcha.com | sendgrid.net |
| CNAME | 107596766.mail.getorcha.com | sendgrid.net |
| CNAME | em8281.mail.getorcha.com | u107596766.wl017.sendgrid.net |
| CNAME | s1._domainkey.mail.getorcha.com | s1.domainkey.u107596766.wl017.sendgrid.net |
| CNAME | s2._domainkey.mail.getorcha.com | s2.domainkey.u107596766.wl017.sendgrid.net |
| TXT | _dmarc.mail.getorcha.com | v=DMARC1; p=quarantine; rua=mailto:max@getorcha.com; adkim=s; aspf=s; pct=100 |
Procedure:
- Pre-flight (read-only): with `--profile orcha`, list `Z02414383CQNYTPGX2EIK` and check for an existing `_dmarc.mail.getorcha.com` TXT, existing `s1`/`s2._domainkey.mail.getorcha.com`, and any name collision with `em8281`/`url5843`/`107596766`. SES Easy-DKIM uses token selectors, not `s1`/`s2`, so no DKIM collision is expected.
- Capture the change as an idempotent `UPSERT` change-batch script committed under `infra/scripts/` (mirrors the documented "manual records in management account" pattern).
- Apply with explicit confirmation before the mutating `route53 change-resource-record-sets` call. Existing `mail.getorcha.com` SPF/inbound records are left untouched.
If a _dmarc.mail.getorcha.com TXT already exists, reconcile — do not blind-overwrite. Only one DMARC record per name is valid; clobbering an existing one can change failure handling for other mail on that name.
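The six records and the pre-flight collision check can be expressed as plain data. The following is a minimal sketch, not the committed script: the names `records`, `change-batch`, and `collisions` and the TTL passed in are assumptions for illustration. The map mirrors the Route53 ChangeBatch JSON shape and would still need to be serialized to JSON and handed to `route53 change-resource-record-sets`.

```clojure
(ns dns-sketch
  (:require [clojure.set :as set]
            [clojure.string :as str]))

;; The six records from the table above, as [type host value].
(def records
  [["CNAME" "url5843.mail.getorcha.com"       "sendgrid.net"]
   ["CNAME" "107596766.mail.getorcha.com"     "sendgrid.net"]
   ["CNAME" "em8281.mail.getorcha.com"        "u107596766.wl017.sendgrid.net"]
   ["CNAME" "s1._domainkey.mail.getorcha.com" "s1.domainkey.u107596766.wl017.sendgrid.net"]
   ["CNAME" "s2._domainkey.mail.getorcha.com" "s2.domainkey.u107596766.wl017.sendgrid.net"]
   ["TXT"   "_dmarc.mail.getorcha.com"
    "v=DMARC1; p=quarantine; rua=mailto:max@getorcha.com; adkim=s; aspf=s; pct=100"]])

(defn- rr-value
  "Route53 expects TXT values wrapped in double quotes; CNAMEs are passed as-is."
  [type value]
  (if (= type "TXT") (str "\"" value "\"") value))

(defn change-batch
  "Builds a ChangeBatch-shaped map. UPSERT makes re-running it idempotent."
  [records ttl]
  {:Comment "SendGrid domain authentication for mail.getorcha.com"
   :Changes (mapv (fn [[type host value]]
                    {:Action "UPSERT"
                     :ResourceRecordSet
                     {:Name            host
                      :Type            type
                      :TTL             ttl
                      :ResourceRecords [{:Value (rr-value type value)}]}})
                  records)})

(defn- normalize
  "Lower-cases a record name and drops the trailing dot Route53 returns."
  [host]
  (-> host str/lower-case (str/replace #"\.$" "")))

(defn collisions
  "Names already present in the zone that the batch would overwrite.
   `existing-names` is whatever the read-only zone listing returned."
  [records existing-names]
  (set/intersection (set (map (comp normalize second) records))
                    (set (map normalize existing-names))))
```

A non-empty result from `collisions` is the signal to stop and reconcile by hand rather than apply the batch, which matters most for the `_dmarc` name.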
2 · Secret & IAM (CDK)
- `FoundationStack`: add `("sendgrid-api-key", "SendgridApiKey", …)` to the existing `params` list (`foundation_stack.py` ~:793). It is created as a plain `ssm.StringParameter` placeholder (`PLACEHOLDER_UPDATE_ME`), identical to every sibling secret. Value generated in SendGrid (scope: Mail Send) and set out-of-band; it may be overwritten as a SecureString since `aws/get-parameter` already passes `WithDecryption=true`.
- `ComputeStack`: the instance role already grants `ssm:GetParameter` on `parameter/v1-orcha/*` (~:314), so the new key needs no new read grant and no KMS block. The only IAM change is to remove the `ses:SendEmail`/`ses:SendRawEmail` statement (~:267–275).
- Config consumes the key via the existing `#orcha/param` reader.
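For illustration, the resulting config entry might look like the fragment below, together with the `:provider` key described in §3. This is a sketch, not the actual file: the `#profile` keys (`:prod`, `:default`) and the exact argument form of `#orcha/param` are assumptions. Only the `:sender`, `:provider`, and `:api-key` keys and the parameter name `sendgrid-api-key` come from this design.

```clojure
;; resources/com/getorcha/config.edn (fragment, hypothetical shape)
{:com.getorcha/notifications
 {:sender   "noreply@mail.getorcha.com"
  ;; Assumed profile names; anything that is not prod falls through to :log.
  :provider #profile {:prod    :sendgrid
                      :default :log}
  ;; Assumed reader usage for the SSM parameter created by FoundationStack.
  :api-key  #orcha/param "sendgrid-api-key"}}
```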
3 · Code
- New ns `com.getorcha.email`: `send!` takes the email config and a message map `{:from :to :subject :body}`; dispatches on `:provider`.
  - `:sendgrid`: builds the v3 payload, calls `com.getorcha.http.client/request` with `Authorization: Bearer <api-key>`. Non-2xx → throw `ex-info` with status and the SendGrid error body logged. `to` accepts a string or vector (parity with current `send-email!`).
  - `:log`: logs to/subject/body and returns, replacing today's "nil ses-client → log" idiom. Selected via `#profile` for local/test.
- Config lives in the existing `:com.getorcha/notifications` map: add `:provider` and `:api-key` alongside the existing `:sender`. Prod → `:sendgrid`; local/test → `:log`.
- Rewire three call sites to `email/send!`: `notifications.clj` user-email (~:43–55) and admin-email (~:224–227), and `app/http/settings/notifications.clj` `send-verification-email!` (~:352–366). Delete the hard-coded `(def ^:private sender-email …)` at `settings/notifications.clj:220` and thread the configured `:sender` in instead.
- Delete outbound SES: `aws/build-ses-client`, `aws/send-email!`, the `:clients :ses` wiring, and now-unused SESv2 imports.
```clojure
;; com.getorcha.email: provider-dispatched, plain fn
(defmulti send! (fn [{:keys [provider]} _msg] provider))

(defmethod send! :sendgrid
  [{:keys [api-key sender]} {:keys [to subject body]}]
  (let [{:keys [status] :as resp}
        (http/request {:method       :post
                       :url          "https://api.sendgrid.com/v3/mail/send"
                       :headers      {"Authorization" (str "Bearer " api-key)}
                       :content-type :json
                       :form-params  (sendgrid-payload sender to subject body)})]
    (when-not (<= 200 status 299)
      (throw (ex-info "SendGrid send failed"
                      {:status status, :body (:body resp)})))))

(defmethod send! :log
  [_cfg msg]
  (log/info "[LOCAL DEV] Would send email" (select-keys msg [:to :subject])))
```
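The snippet above calls `sendgrid-payload` without defining it. A minimal sketch of such a helper follows, assuming the v3 Mail Send body shape (`personalizations`, `from`, `subject`, `content`) and plain-text content only. The helper name comes from the snippet; its body here is illustrative, not the final implementation.

```clojure
(defn sendgrid-payload
  "Builds the JSON-ready body for POST /v3/mail/send.
   `to` may be a single address string or a collection of addresses."
  [sender to subject body]
  (let [recipients (if (string? to) [to] (vec to))]
    {:personalizations [{:to (mapv (fn [addr] {:email addr}) recipients)}]
     :from             {:email sender}
     :subject          subject
     :content          [{:type "text/plain" :value body}]}))
```

Putting every recipient in one personalization keeps parity with a single multi-recipient send, but all recipients then see each other in the To header. If that is not wanted for a given call site, one personalization per address would be the alternative.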
4 · Infra removal (CDK)
Remove the VerifySendingDomain custom resource and its CfnOutput from infra/stacks/foundation_stack.py.
The custom resource's on_delete calls SESv2.deleteEmailIdentity for mail.getorcha.com — a real, stateful infra mutation. Inbound document acquisition uses the separate mail.{env}.getorcha.com receiving identity, so ingestion is unaffected. Change is applied via a CDK deploy (no CLI → no drift). Orphaned old SES DKIM CNAMEs in the management zone are harmless; cleaning them is optional and out of scope.
5 · Sequencing & verification
- Apply DNS (step 1). Wait for propagation.
- Hard gate: SendGrid dashboard shows the domain authenticated/verified.
- Generate the API key; set the SSM SecureString value.
- Deploy CDK (secret + IAM + `VerifySendingDomain` removal) and the code change.
- Send a test to a Gmail inbox; "Show original" must report `DKIM: PASS` and `DMARC: PASS` with `d=mail.getorcha.com`.
p=quarantine; pct=100 with strict alignment means any DKIM misconfiguration quarantines 100% of outbound mail. The "SendGrid verified" gate before code/infra cutover is mandatory, not advisory.
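To make the risk concrete: under `adkim=s`, the DKIM signing domain (`d=`) must equal the From domain exactly, and a parent or sibling domain does not count. The sketch below models only that strict comparison. The function name `strictly-aligned?` is hypothetical, and the code does not perform DKIM verification itself.

```clojure
(require '[clojure.string :as str])

(defn strictly-aligned?
  "True when the DKIM d= domain exactly matches the From address domain
   (case-insensitive), as strict alignment requires."
  [dkim-d from-address]
  (let [from-domain (last (str/split from-address #"@"))]
    (= (str/lower-case dkim-d) (str/lower-case from-domain))))

;; Signed with the authenticated subdomain: aligned.
(strictly-aligned? "mail.getorcha.com" "noreply@mail.getorcha.com")

;; Signed with SendGrid's own domain (authentication not in effect): not aligned.
(strictly-aligned? "sendgrid.net" "noreply@mail.getorcha.com")

;; Parent domain would pass relaxed alignment but fails under adkim=s.
(strictly-aligned? "getorcha.com" "noreply@mail.getorcha.com")
```

Assuming SendGrid's return-path uses the `em8281` subdomain, SPF cannot satisfy `aspf=s` for a `mail.getorcha.com` From address either, which leaves DKIM as the only identifier that can align.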
Rollback: outbound SES removal and the code change ship together; reverting the commit and redeploying restores the SES path, but SES is still sandbox-limited, so the real recovery path is fixing DNS/SendGrid, not rollback.
Open questions
All resolved 2026-05-17:
- API key timing (resolved): the user generates the SendGrid key (Mail Send scope) at step 3.
- DNS execution (resolved): I drive it via `--profile orcha` with read-only pre-flight and explicit confirmation before the mutating call.
- Bounce / spam-complaint handling (resolved): deferred, accepted for first cutover. No in-app suppression/feedback loop initially.
- Body format (resolved): stays plain text. No HTML/templates.
- `max@getorcha.com` (DMARC `rua`): confirmed monitored.
References
- Canonical DNS record list: Design §1 table above, the six `mail.getorcha.com` records to UPSERT into zone `Z02414383CQNYTPGX2EIK` (management account).
- Send sites: `src/com/getorcha/notifications.clj` (~:43, ~:224), `src/com/getorcha/app/http/settings/notifications.clj` (:220, ~:352).
- SES code to remove: `src/com/getorcha/aws.clj` (`build-ses-client` ~:420, `send-email!` ~:436).
- Traced HTTP wrapper: `src/com/getorcha/http/client.clj`.
- CDK: `infra/stacks/foundation_stack.py` (`VerifySendingDomain` ~:608–642), `infra/stacks/compute_stack.py` (SES IAM ~:267–275).
- Config: `resources/com/getorcha/config.edn` (`:com.getorcha/notifications` :81).
- SendGrid v3 Mail Send API · SendGrid domain authentication