AWS CDK infrastructure for the Orcha document processing system.
                         +---------------------------+
                         |      ALB (HTTPS:443)      |
Internet --------------->|   app.prod.getorcha.com   |
                         +-------------+-------------+
                                       |
                                       | Port 8888
                                       v
+-----------------------------------------------------------------------------+
|                         Public Subnet (AZ-A, AZ-B)                          |
|                                                                             |
|  +-----------------------------------------------------------------------+  |
|  |                    Auto Scaling Group (min=max=1)                     |  |
|  |                            EC2 t4g.medium                             |  |
|  |                                                                       |  |
|  |                            docker-compose                             |  |
|  |          +---------------------+     +---------------------+          |  |
|  |          |    App Container    |     |  Worker Container   |          |  |
|  |          |     (port 8888)     |     |     (polls SQS)     |          |  |
|  |          +---------------------+     +---------------------+          |  |
|  +-----------------------------------------------------------------------+  |
|                                      |                                      |
+--------------------------------------+--------------------------------------+
                                       |
+--------------------------------------+--------------------------------------+
|                         Private Subnet (AZ-A, AZ-B)                         |
|                                      |                                      |
|                                      v                                      |
|                              +----------------+                             |
|                              |      RDS       |                             |
|                              |   PostgreSQL   |                             |
|                              +----------------+                             |
+-----------------------------------------------------------------------------+
CI/CD Pipeline:
+----------+    +-------------+    +-----------+    +---------+    +------------+
| GitHub   |--->| CodePipeline|--->| CodeBuild |--->| Approve |--->| CodeDeploy |
| (push)   |    |             |    | (test +   |    | (manual)|    | (to ASG)   |
+----------+    +-------------+    |  build)   |    +---------+    +------------+
                                   +-----------+
                                         |
                                         v
                                    +---------+
                                    |   ECR   |
                                    | (images)|
                                    +---------+
Infrastructure is organized into 4 CDK stacks deployed in order:
| Stack | Resources | Dependencies |
|---|---|---|
| FoundationStack | VPC, subnets, security groups, S3, SQS, ECR, Route53 hosted zone | None |
| DataStack | RDS PostgreSQL, Secrets Manager | Foundation |
| ComputeStack | ALB, ASG, ACM certificate, Route53 record, IAM roles | Foundation, Data |
| OpsStack | CloudWatch, SNS, CodePipeline, CodeBuild, CodeDeploy, Budget | Foundation, Data, Compute |
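
The Dependencies column fixes the deploy order. As a minimal sketch (this is not the project's `app.py`, which is not shown here), the order can be derived from the table with Python's standard `graphlib` module; the dictionary is transcribed from the table, and `deploy_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Stack -> stacks it depends on, transcribed from the table above.
STACK_DEPENDENCIES = {
    "FoundationStack": set(),
    "DataStack": {"FoundationStack"},
    "ComputeStack": {"FoundationStack", "DataStack"},
    "OpsStack": {"FoundationStack", "DataStack", "ComputeStack"},
}


def deploy_order(dependencies):
    """Return stack names so that every stack follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(deploy_order(STACK_DEPENDENCIES)))
```

Because each stack depends on all earlier ones, there is only one valid order, which is why `cdk deploy --all` can resolve it without extra flags.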
Deploy all stacks:
source .venv/bin/activate
AWS_PROFILE=orcha-prod cdk deploy --all --context env_name=prod
| Resource | Name/Value |
|---|---|
| VPC CIDR | 10.0.0.0/16 |
| Region | eu-central-1 |
| Availability Zones | eu-central-1a, eu-central-1b |
| NAT Gateway | None (EC2 in public subnet) |
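
Subnet sizes are not stated in this document. Purely as an illustration, the sketch below carves the 10.0.0.0/16 VPC into one public and one private subnet per Availability Zone, assuming /24 subnets. The prefix length, the allocation order, and the `plan_subnets` helper are all assumptions; the ranges CDK actually allocates may differ.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["eu-central-1a", "eu-central-1b"]
SUBNET_PREFIX = 24  # assumption: not stated in this README


def plan_subnets(vpc, azs, prefix):
    """Assign consecutive subnets: public ones first, then private ones."""
    blocks = vpc.subnets(new_prefix=prefix)
    plan = {}
    for tier in ("public", "private"):
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


if __name__ == "__main__":
    for (tier, az), cidr in plan_subnets(VPC_CIDR, AZS, SUBNET_PREFIX).items():
        print(f"{tier:8} {az}  {cidr}")
```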
Security Groups:
- v1-orcha-alb-sg - ALB (inbound 80, 443 from internet)
- v1-orcha-ec2-sg - EC2 (inbound 8888 from ALB only)
- v1-orcha-rds-sg - RDS (inbound 5432 from EC2, CodeBuild)
- v1-orcha-codebuild-sg - CodeBuild (outbound only)

| Resource | Name |
|---|---|
| S3 Bucket (documents) | v1-orcha-global-storage-{account_id} |
| S3 Bucket (pipeline) | v1-orcha-pipeline-artifacts-{account_id} |
| ECR Repository | v1-orcha (keeps last 5 images) |
| Queue | Purpose | Visibility Timeout |
|---|---|---|
| v1-orcha-global-ingest | Document processing | 600s (10 min) |
| v1-orcha-global-ingest-dlq | Failed documents | 14 day retention |
| v1-orcha-global-email-acquire | Email acquisition | 300s (5 min) |
| v1-orcha-global-email-acquire-dlq | Failed emails | 14 day retention |
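
The worker container polls these queues. The sketch below shows what a single poll cycle could look like, assuming a boto3-style SQS client is passed in. `poll_once`, `handler`, the batch size, and the wait time are hypothetical; the real worker is part of the application and is not shown here. A message is deleted only after the handler succeeds, so a failed message becomes visible again once the visibility timeout expires and can eventually move to the DLQ under the queue's redrive policy.

```python
def poll_once(sqs, queue_url, handler, max_messages=10, wait_seconds=20):
    """Receive one batch, run handler on each body, delete on success.

    `sqs` is expected to behave like a boto3 SQS client. Returns the
    number of messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS allows at most 10
        WaitTimeSeconds=wait_seconds,      # long polling, at most 20 s
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Leave the message on the queue; it reappears after the
            # visibility timeout and can be retried or dead-lettered.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

The handler should finish well inside the visibility timeout (600s for ingest, 300s for email acquisition); otherwise the same message may be delivered to a second poll while the first is still working on it.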
| Attribute | Value |
|---|---|
| Instance | v1-orcha-db |
| Engine | PostgreSQL 18.1 |
| Instance type | db.t4g.medium |
| Storage | 30 GB gp3 (autoscales to 100 GB) |
| Multi-AZ | No |
| Backup retention | 14 days |
| Deletion protection | Yes |
Credentials stored in Secrets Manager: /v1-orcha/db-credentials
| Attribute | Value |
|---|---|
| Instance type | t4g.medium (ARM64) |
| AMI | Amazon Linux 2023 |
| ASG capacity | min=1, max=1 |
| Health check | ELB, 300s grace period |
IAM Role (v1-orcha-service-role):
- /v1-orcha/*
- /v1-orcha/*

| Resource | Value |
|---|---|
| Hosted zone | prod.getorcha.com |
| A record | app.prod.getorcha.com → ALB |
| Certificate | ACM, DNS-validated, auto-renews |
Pipeline: v1-orcha-deploy
| Stage | Action |
|---|---|
| Source | GitHub (CodeConnections) on master branch push |
| Build | CodeBuild: run tests, build uberjar, build/push Docker image |
| Approve | Manual approval (SNS notification) |
| Deploy | CodeDeploy to ASG (AllAtOnce, auto-rollback on failure) |
CodeBuild (v1-orcha-build):
- aws/codebuild/amazonlinux2-aarch64-standard:3.0 (ARM64)
- /v1-orcha/codebuild

CodeDeploy (v1-orcha / v1-orcha-production):
CloudWatch Alarms (10 total):
| Alarm | Condition | Severity |
|---|---|---|
| v1-orcha-alb-unhealthy | HealthyHostCount < 1 | Critical |
| v1-orcha-ec2-status-check | GroupInServiceInstances < 1 | Critical |
| v1-orcha-rds-no-connections | DatabaseConnections = 0 | Critical |
| v1-orcha-ingest-dlq-not-empty | DLQ messages > 0 | Critical |
| v1-orcha-email-acquire-dlq-not-empty | DLQ messages > 0 | Critical |
| v1-orcha-ec2-high-cpu | CPU > 80% | Operational |
| v1-orcha-rds-high-cpu | CPU > 80% | Operational |
| v1-orcha-rds-low-storage | Free storage < 5 GB | Operational |
| v1-orcha-rds-high-connections | Connections > 100 | Operational |
| v1-orcha-cert-expiring | Days to expiry < 14 | Operational |
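
To make the thresholds concrete, the sketch below encodes a few of these alarms as data and evaluates them against a metrics snapshot. The metric keys (`healthy_hosts`, `ec2_cpu_percent`, and so on) are hypothetical names chosen for this example, not CloudWatch metric names; the real alarms are CloudWatch resources defined in the OpsStack.

```python
import operator

# (alarm name, metric key, comparison, threshold, severity)
# Metric keys are illustrative names, not CloudWatch metric names.
ALARMS = [
    ("v1-orcha-alb-unhealthy", "healthy_hosts", operator.lt, 1, "Critical"),
    ("v1-orcha-rds-no-connections", "db_connections", operator.eq, 0, "Critical"),
    ("v1-orcha-ingest-dlq-not-empty", "ingest_dlq_messages", operator.gt, 0, "Critical"),
    ("v1-orcha-ec2-high-cpu", "ec2_cpu_percent", operator.gt, 80, "Operational"),
    ("v1-orcha-rds-low-storage", "rds_free_storage_gb", operator.lt, 5, "Operational"),
]


def firing_alarms(metrics):
    """Return (name, severity) for every alarm whose condition holds."""
    return [
        (name, severity)
        for name, key, compare, threshold, severity in ALARMS
        if key in metrics and compare(metrics[key], threshold)
    ]


if __name__ == "__main__":
    snapshot = {"healthy_hosts": 1, "ingest_dlq_messages": 2, "ec2_cpu_percent": 91.5}
    for name, severity in firing_alarms(snapshot):
        print(f"{severity:12} {name}")
```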
SNS Topic: v1-orcha-alerts (email subscription)
Cost Controls:
Log Groups (30-day retention):
- /v1-orcha/application
- /v1-orcha/user-data
- /v1-orcha/codebuild
- /v1-orcha/ssm-sessions

Bootstrap CDK:
AWS_PROFILE=orcha-prod cdk bootstrap aws://700558745280/eu-central-1 --context env_name=prod
Deploy all stacks:
AWS_PROFILE=orcha-prod cdk deploy --all --context env_name=prod
Delegate subdomain (in management account):
./scripts/delegate-subdomain.sh prod <HOSTED_ZONE_ID>
Complete GitHub connection:
Confirm SNS subscription (check email)
Update SSM parameters:
./scripts/update-secrets.sh --from-file secrets
Push to master branch triggers the pipeline automatically.
Manual deployment:
AWS_PROFILE=orcha-prod cdk deploy --all --context env_name=prod
No SSH. Use SSM Session Manager:
# Get the ID of the running instance (skips terminated instances with the same tag)
INSTANCE_ID=$(AWS_PROFILE=orcha-prod aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=v1-orcha-app" "Name=instance-state-name,Values=running" \
  --query "Reservations[0].Instances[0].InstanceId" --output text)
# Start an interactive session
AWS_PROFILE=orcha-prod aws ssm start-session --target "$INSTANCE_ID"
# Forward the REPL port (9878) to localhost
AWS_PROFILE=orcha-prod aws ssm start-session --target "$INSTANCE_ID" \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["9878"],"localPortNumber":["9878"]}'
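
The JSON passed to `--parameters` is easy to get wrong when quoted by hand. As a small sketch, the hypothetical helper below builds the same port-forwarding command as an argument list and generates the JSON with `json.dumps`; the function name and its defaults are assumptions for illustration.

```python
import json


def port_forward_command(instance_id, remote_port=9878, local_port=9878):
    """Build the `aws ssm start-session` argv for a port-forwarding session."""
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
    ]


if __name__ == "__main__":
    # "i-0123456789abcdef0" is a placeholder instance ID.
    print(port_forward_command("i-0123456789abcdef0"))
```

The resulting list can be handed to `subprocess.run` with `AWS_PROFILE=orcha-prod` set in the environment, which avoids shell quoting altogether.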
Applied to all resources:
| Key | Value |
|---|---|
| Project | orcha |
| Environment | prod |
| ManagedBy | cdk |
Items identified but not yet implemented:
| Item | Description | Priority |
|---|---|---|
| Migration Test CodeBuild | v1-orcha-migration-test project to test schema changes against RDS snapshot before deploying | Medium |
| SQS Backlog Alarm | v1-orcha-ingest-backlog alarm when queue > 500 messages | Low |
| Pipeline Notifications | SNS notifications for pipeline success/failure events (beyond manual approval) | Low |
| Developer SSM Policy | v1-orcha-developer-ssm-access IAM managed policy for team access | Low |
| Item | Cost/Month | Priority | When |
|---|---|---|---|
| S3 Gateway Endpoint | FREE | Medium | Security improvement, keeps S3 traffic in AWS |
| ECR retention → 10 images | +$1 | Medium | Better rollback capability |
| WAF | +$15 | Medium | When handling sensitive customer data |
| VPC Interface Endpoints | +$122 | Low | Security, compliance requirements |
| HA (min=2 instances) | +$30 | Low | When SLA commitments needed |
| NAT Gateway | +$38 | Low | Compliance (move EC2 to private subnet) |
| RDS Multi-AZ | +$60 | Low | When SLA commitments needed |
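
As a quick worked example of the arithmetic, the monthly figures in this table can be totalled per priority. The numbers are copied from the table, with FREE treated as $0; `monthly_cost_by_priority` is a hypothetical helper.

```python
# (item, added monthly cost in USD, priority), copied from the table above.
OPTIONAL_ITEMS = [
    ("S3 Gateway Endpoint", 0, "Medium"),
    ("ECR retention -> 10 images", 1, "Medium"),
    ("WAF", 15, "Medium"),
    ("VPC Interface Endpoints", 122, "Low"),
    ("HA (min=2 instances)", 30, "Low"),
    ("NAT Gateway", 38, "Low"),
    ("RDS Multi-AZ", 60, "Low"),
]


def monthly_cost_by_priority(items):
    """Sum the added monthly cost of the listed items for each priority."""
    totals = {}
    for _name, cost, priority in items:
        totals[priority] = totals.get(priority, 0) + cost
    return totals


if __name__ == "__main__":
    totals = monthly_cost_by_priority(OPTIONAL_ITEMS)
    print(totals)                # {'Medium': 16, 'Low': 250}
    print(sum(totals.values()))  # 266
```

Adopting every Medium item adds about $16 per month, while the Low items together add about $250, most of it from the VPC interface endpoints.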
| Item | Description |
|---|---|
| Auto DB Credential Rotation | EventBridge + Lambda to rotate credentials and restart app |
| Per-Tenant Resource Scaling | Dedicated SQS queues and workers for large tenants |
| Multi-Environment Support | Config-driven stack parameters for dev/staging/prod |
./
├── app.py # CDK app entry point
├── cdk.json # CDK configuration
├── requirements.txt # Python dependencies
├── stacks/
│ ├── foundation_stack.py # VPC, S3, SQS, ECR, Route53
│ ├── data_stack.py # RDS, Secrets Manager
│ ├── compute_stack.py # ALB, ASG, ACM, Route53 record
│ └── ops_stack.py # CI/CD, monitoring, alerts
├── runbooks/
│ ├── deploy.md # Deployment instructions
│ ├── bootstrap-cdk.md # CDK bootstrap details
│ └── update-secrets.md # SSM parameter updates
└── scripts/
├── delegate-subdomain.sh # NS delegation helper
└── update-secrets.sh # SSM parameter updates