Forward-based email acquisition using AWS SES for automatic invoice extraction.
SES email acquisition provides a simpler alternative to OAuth-based email integration. Instead of managing OAuth tokens, webhook subscriptions, and provider-specific APIs, customers simply forward emails to a dedicated SES receiving address.
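Benefits over the OAuth approach:
- No token refresh or subscription renewal to maintain
- Works with any email system that can auto-forward, not only Outlook and Gmail

Trade-offs:
- Forwarding adds a delay (around a minute) compared with real-time webhooks
- Only the forwarded message (sender, subject, attachments) is available; no folders or read status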
```
┌─────────────────────┐
│   Customer Email    │
│ (M365, Gmail, etc)  │
└──────────┬──────────┘
           │
           │ Mail rule (auto-forward)
           ▼
┌─────────────────────────────────────────────────────────────────┐
│ AWS SES                                                         │
│ documents@mail.{env}.getorcha.com                               │
│                                                                 │
│ Receipt Rule:                                                   │
│ ├── Verify domain identity (mail.{env}.getorcha.com)            │
│ └── Store to S3 bucket (v1-orcha-ses-emails-{account})          │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       │ S3 Event Notification
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│ SQS: email-acquire                                              │
│                                                                 │
│ Message: {bucket, key, event: "ObjectCreated"}                  │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       │ Poll
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│ Workers Service                                                 │
│                                                                 │
│ Acquisition Orchestrator                                        │
│ └── SES Handler (multi/handle-queue-message :ses)               │
│     1. Fetch .eml from S3                                       │
│     2. Parse MIME (headers, body, attachments)                  │
│     3. Lookup ap_doc_source_ses by sender email                 │
│     4. If known sender: triage → upload → queue ingestion       │
│     5. Record status in ap_doc_source_ses_processed             │
│     6. Delete .eml from S3                                      │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       │ SQS: ingest
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│ Ingestion Pipeline (transcription → extraction → validation)    │
└─────────────────────────────────────────────────────────────────┘
```
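1. The customer forwards an invoice email, and SES receives it at `documents@mail.{env}.getorcha.com`.
2. The receipt rule saves the raw .eml file to S3:
   - Bucket: `v1-orcha-ses-emails-{account-id}`
   - Key: the SES message ID (e.g., `abc123def456`)
3. An S3 event notification sends a message to the `email-acquire` queue (trimmed here to the relevant fields):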
```json
{
  "Records": [{
    "s3": {
      "bucket": {"name": "v1-orcha-ses-emails-123456789"},
      "object": {"key": "abc123def456"}
    }
  }]
}
```
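The handler below destructures its message as `{:keys [bucket key]}`, so the S3 `Records` payload has to be unwrapped first. That step is not shown in this document. Here is a minimal Python sketch of the unwrapping. It assumes one queue message may carry several records, and the function name `s3_objects_from_event` is hypothetical. S3 URL-encodes object keys in event notifications, so each key is decoded before use.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_event(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(body)
    pairs = []
    # S3 also sends an s3:TestEvent without "Records" when a notification
    # is first configured; it simply yields no pairs here.
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded (spaces as '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs
```

For the payload above, this yields `[("v1-orcha-ses-emails-123456789", "abc123def456")]`.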
The workers service polls the `email-acquire` queue and dispatches each message to the SES handler:
```clojure
;; SES handler entry point
(defmethod multi/handle-queue-message :ses
  [{:keys [aws] :as context}
   {:keys [bucket key] :as _message}]
  (let [s3-client (:s3-client aws)]
    ;; 1. Deduplication: skip objects that were already handled
    (when-not (already-processed? context key)
      ;; 2. Fetch the raw .eml from S3
      (let [eml-bytes (aws/get-object s3-client bucket key)
            ;; 3. Parse MIME
            {:keys [from] :as email} (parse-eml eml-bytes)]
        ;; 4. Look up the tenant by sender address
        (if-let [doc-source-ses (lookup-doc-source-by-sender context from)]
          ;; Known sender: triage and queue for ingestion, record, clean up
          (do
            (triage/queue-extractable-items! context doc-source-ses email)
            ;; Audit keys mirror the ap_doc_source_ses_processed columns
            (record-processed! context {:doc-source-id (:ap-doc-source-ses/doc-source-id doc-source-ses)
                                        :s3-object-key key
                                        :sender-email  from
                                        :status        "processed"})
            (aws/delete-object! s3-client bucket key))
          ;; Unknown sender: record the rejection, then delete the .eml
          (do
            (record-processed! context {:s3-object-key key
                                        :sender-email  from
                                        :status        "rejected"
                                        :error-reason  "unknown-sender"})
            (aws/delete-object! s3-client bucket key)))))))
```
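`parse-eml` is not shown in this document. The Python standard library `email` package can extract the same fields as a rough equivalent. The function name and the returned dict keys are assumptions, chosen to mirror the `from`/`subject`/`attachments` fields the handler works with. This sketch only reads top-level attachments. It does not descend into a forwarded message attached as `message/rfc822`.

```python
from email import policy
from email.parser import BytesParser
from email.utils import parseaddr


def parse_eml(eml_bytes: bytes) -> dict:
    """Parse a raw .eml into sender, subject, Message-ID and attachments."""
    # policy.default returns an EmailMessage, which offers iter_attachments()
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    # Drop the display name: 'Supplier AP <invoices@supplier.com>' -> address
    _, sender = parseaddr(msg.get("From", ""))
    attachments = [
        {
            "filename": part.get_filename(),
            "content-type": part.get_content_type(),
            # decode=True undoes the transfer encoding (e.g. base64)
            "bytes": part.get_payload(decode=True),
        }
        # Skips the text body part and yields the remaining sub-parts
        for part in msg.iter_attachments()
    ]
    return {
        "from": sender,
        "subject": msg.get("Subject", ""),
        "message-id": msg.get("Message-ID"),
        "attachments": attachments,
    }
```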
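The shared triage module (`triage/queue-extractable-items!`) processes the email exactly as in OAuth-based acquisition: it triages the attachments, uploads the extractable items, and queues them for ingestion on the `ingest` queue.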
The `ap_doc_source_ses` table maps sender email addresses to doc sources (tenants).
```sql
CREATE TABLE ap_doc_source_ses (
  doc_source_id         UUID PRIMARY KEY REFERENCES ap_doc_source(id) ON DELETE CASCADE,
  sender_email          TEXT NOT NULL UNIQUE,
  ses_receiving_address TEXT NOT NULL, -- e.g., documents@mail.prod.getorcha.com
  created_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_ap_doc_source_ses_sender_email ON ap_doc_source_ses(sender_email);
```
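Here is a minimal sketch of `lookup-doc-source-by-sender` against this table. It is written in Python, with an in-memory SQLite database standing in for PostgreSQL. Two steps are assumptions: stripping the display name with `parseaddr`, and lowercasing the address. The schema above compares `sender_email` as plain `TEXT`, which is case-sensitive.

```python
import sqlite3
from email.utils import parseaddr

# SQLite stands in for PostgreSQL here; columns follow ap_doc_source_ses.
SENDER_SCHEMA = """
CREATE TABLE ap_doc_source_ses (
    doc_source_id         TEXT PRIMARY KEY,
    sender_email          TEXT NOT NULL UNIQUE,
    ses_receiving_address TEXT NOT NULL
)
"""


def lookup_doc_source_by_sender(conn: sqlite3.Connection, from_header: str):
    """Return the doc_source_id registered for the From address, or None."""
    # Keep only the address part of 'Display Name <address>'.
    _, address = parseaddr(from_header)
    # Assumption: sender_email values are stored lowercased.
    row = conn.execute(
        "SELECT doc_source_id FROM ap_doc_source_ses WHERE sender_email = ?",
        (address.lower(),),
    ).fetchone()
    return row[0] if row else None
```

Because `sender_email` is `UNIQUE`, each sender maps to at most one doc source, so the lookup returns a single row or nothing.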
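Usage: when an email arrives from a registered sender such as `invoices@supplier.com`, the lookup finds the tenant.

The `ap_doc_source_ses_processed` table is the audit and deduplication record for processed emails.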
```sql
CREATE TABLE ap_doc_source_ses_processed (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  doc_source_id   UUID REFERENCES ap_doc_source(id) ON DELETE CASCADE,
  s3_object_key   TEXT NOT NULL UNIQUE,  -- SES message ID (S3 key)
  ses_message_id  TEXT,                  -- Message-ID header
  sender_email    TEXT NOT NULL,
  processed_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  status          TEXT NOT NULL,         -- 'processed' | 'rejected' | 'error'
  error_reason    TEXT
);
```
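The table also serves as the deduplication record. The sketch below shows the two operations the handler relies on, `already-processed?` and `record-processed!`, in Python with SQLite standing in for PostgreSQL. Treating `error` rows as not yet processed follows this document's note on manual retries. The upsert on `s3_object_key` is an assumption about how a retry could be written back, given the `UNIQUE` constraint.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; id and processed_at are omitted.
PROCESSED_SCHEMA = """
CREATE TABLE ap_doc_source_ses_processed (
    s3_object_key TEXT NOT NULL UNIQUE,
    doc_source_id TEXT,
    sender_email  TEXT NOT NULL,
    status        TEXT NOT NULL,
    error_reason  TEXT
)
"""


def already_processed(conn: sqlite3.Connection, s3_key: str) -> bool:
    # Assumption: 'error' rows do not count, so a failed email can be retried.
    row = conn.execute(
        "SELECT 1 FROM ap_doc_source_ses_processed "
        "WHERE s3_object_key = ? AND status <> 'error'",
        (s3_key,),
    ).fetchone()
    return row is not None


def record_processed(conn, s3_key, sender, status, doc_source_id=None, error_reason=None):
    # Upsert on the UNIQUE key: a retry overwrites the earlier 'error' row
    # instead of violating the constraint. PostgreSQL supports the same
    # ON CONFLICT ... DO UPDATE clause.
    conn.execute(
        """
        INSERT INTO ap_doc_source_ses_processed
            (s3_object_key, doc_source_id, sender_email, status, error_reason)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (s3_object_key) DO UPDATE SET
            doc_source_id = excluded.doc_source_id,
            status        = excluded.status,
            error_reason  = excluded.error_reason
        """,
        (s3_key, doc_source_id, sender, status, error_reason),
    )
```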
Status values:
- `processed`: successfully triaged and queued for ingestion
- `rejected`: unknown sender or spam; the email is deleted from S3
- `error`: processing failed; the email is retained in S3 for manual review

When an email arrives from an unregistered sender, it is recorded in `ap_doc_source_ses_processed` with status `rejected` and deleted from S3.

If MIME parsing fails, the email is recorded in `ap_doc_source_ses_processed` with status `error` and kept in S3 for manual review.

SES has a 30MB message size limit. Emails exceeding this are rejected by SES before reaching our infrastructure.
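The status rules above reduce to one decision per email: which status to record, and whether to delete the .eml. The Python sketch below makes three assumptions. `parse` and `lookup` are injected stand-ins for `parse-eml` and the sender lookup. Any parsing exception counts as `error`. The triage step for known senders is omitted. The name `decide` is hypothetical.

```python
from typing import Callable, Optional


def decide(eml_bytes: bytes,
           parse: Callable[[bytes], dict],
           lookup: Callable[[str], Optional[str]]) -> dict:
    """Return the audit fields to record and whether to delete the .eml."""
    try:
        email = parse(eml_bytes)
    except Exception as exc:  # e.g. malformed MIME
        # 'error': keep the .eml in S3 for manual review
        return {"status": "error", "error_reason": str(exc), "delete_eml": False}
    doc_source_id = lookup(email["from"])
    if doc_source_id is None:
        # 'rejected': unregistered sender, the .eml is deleted
        return {"status": "rejected", "error_reason": "unknown-sender", "delete_eml": True}
    # 'processed': triage and queueing (omitted) happen before the .eml is deleted
    return {"status": "processed", "doc_source_id": doc_source_id, "delete_eml": True}
```

Only the `error` branch keeps the object, which is what makes the manual-review path possible.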
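The `already-processed?` check prevents reprocessing: the handler looks up the S3 object key in `ap_doc_source_ses_processed` and skips the message if it was already handled. Rows with `error` status do not block reprocessing, so failed emails can be retried manually.

The SES emails bucket is configured under `com.getorcha/aws`: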
```clojure
{:s3-buckets {:ses-emails #join ["v1-orcha-ses-emails-"
                                 #profile {:local-dev "local-stack"
                                           :test      "test"
                                           :default   #orcha/param "/v1-orcha/account-id"}]}}
```
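`#join` and `#profile` appear to be Aero reader tags. `#profile` picks the value for the active profile, falling back to `:default`, and `#join` concatenates the parts. `#orcha/param` is project-specific and presumably reads the account ID from a parameter store. Below is a Python sketch of the resulting bucket name, with hypothetical function and constant names. In deployed environments the result should match the CDK `bucket_name` (`v1-orcha-ses-emails-{self.account}`).

```python
BUCKET_PREFIX = "v1-orcha-ses-emails-"

# Mirrors the #profile map: fixed suffixes for local-dev and test,
# the AWS account ID for every other profile (:default).
PROFILE_SUFFIXES = {"local-dev": "local-stack", "test": "test"}


def ses_emails_bucket(profile: str, account_id: str) -> str:
    """Resolve the SES emails bucket name for a config profile."""
    # Assumption: account_id is what #orcha/param reads from
    # /v1-orcha/account-id.
    return BUCKET_PREFIX + PROFILE_SUFFIXES.get(profile, account_id)
```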
The infrastructure is defined in `infra/stacks/foundation_stack.py`:
```python
# Module aliases as used below
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n
from aws_cdk import aws_ses as ses
from aws_cdk import aws_ses_actions as ses_actions

# SES emails bucket
self.ses_emails_bucket = s3.Bucket(
    self, "SesEmailsBucket",
    bucket_name=f"v1-orcha-ses-emails-{self.account}",
    # ...
)

# S3 → SQS event notification
self.ses_emails_bucket.add_event_notification(
    s3.EventType.OBJECT_CREATED,
    s3n.SqsDestination(self.email_acquire_queue),
)

# SES Receipt Rule
ses.ReceiptRule(
    self, "StoreToS3Rule",
    rule_set=receipt_rule_set,
    recipients=[f"documents@{mail_domain}"],
    actions=[ses_actions.S3(bucket=self.ses_emails_bucket)],
)
```

| Alarm | Threshold | Description |
|---|---|---|
| `v1-orcha-email-acquire-dlq-not-empty` | > 0 messages | Processing failures |
| `v1-orcha-email-acquire-latency` | > 60 seconds | Queue backup |

| Metric | Source | Description |
|---|---|---|
| `ApproximateNumberOfMessagesVisible` | SQS | Queue depth |
| `ApproximateAgeOfOldestMessage` | SQS | Processing latency |
| `NumberOfMessagesReceived` | SQS | Throughput |
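
The two alarm thresholds can be written out as a predicate. This is a minimal sketch with an assumption the tables do not state: the DLQ alarm watches `ApproximateNumberOfMessagesVisible` on the dead-letter queue, and the latency alarm watches `ApproximateAgeOfOldestMessage` on `email-acquire`. The function and constant names are hypothetical.

```python
DLQ_THRESHOLD_MESSAGES = 0      # dlq-not-empty fires at > 0 messages
LATENCY_THRESHOLD_SECONDS = 60  # latency fires at > 60 seconds


def alarms_firing(dlq_visible: int, oldest_message_age_s: int) -> list[str]:
    """Return the names of the alarms whose thresholds are exceeded."""
    firing = []
    # Any message in the DLQ means a processing failure
    if dlq_visible > DLQ_THRESHOLD_MESSAGES:
        firing.append("v1-orcha-email-acquire-dlq-not-empty")
    # An old head-of-queue message means the queue is backing up
    if oldest_message_age_s > LATENCY_THRESHOLD_SECONDS:
        firing.append("v1-orcha-email-acquire-latency")
    return firing
```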

Application logs are written to the CloudWatch log group `/v1-orcha/application`:
```clojure
;; Success path
(log/info "Processing SES email" {:bucket bucket :key key})
(log/info "Parsed SES email" {:from from :subject subject :attachment-count n})
(log/info "SES email processed" {:doc-source-id id :queued-items n})

;; Rejection
(log/warn "Unknown sender for SES email, rejecting" {:from from :key key})

;; Error
(log/error e "Failed to parse/process SES email" {:key key})
```
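SES spam and virus scanning is enabled on the receipt rule (`scan_enabled=True`).

Only registered sender emails are processed: the sender must have an `ap_doc_source_ses` row, and mail from any other address is rejected.

| Aspect | SES (Forward) | OAuth (Direct) |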
|---|---|---|
| Setup | Customer configures mail rule | Customer authorizes OAuth |
| Maintenance | None | Token refresh, subscription renewal |
| Latency | Forwarding delay (~1 min) | Real-time webhooks |
| Reliability | Depends on email forwarding | Depends on OAuth tokens and webhooks |
| Metadata | From, Subject, Attachments | Full message, folders, read status |
| Provider support | Any email system | Outlook, Gmail only |