Date: 12.04.2026
Reviewer: Automated codebase analysis
Scope: Verified every claim in O14 v2.0 against the Orcha codebase (orcha/src/, orcha/infra/) and CDK infrastructure
| Source | Key findings area |
|---|---|
| infra/stacks/foundation_stack.py | Cognito, S3, SQS, KMS, SSM parameters |
| infra/stacks/data_stack.py | RDS PostgreSQL, storage encryption |
| infra/stacks/compute_stack.py | ALB, EC2, TLS certificates |
| src/com/getorcha/ai/llm.clj | AI provider HTTP clients, endpoints |
| src/com/getorcha/workers/ap/ingestion/extraction.clj | Data actually sent to Anthropic |
| src/com/getorcha/workers/ap/ingestion/transcription.clj | Data sent to Google Document AI and Gemini Vision |
| src/com/getorcha/workers/ap/acquisition/email/outlook.clj | Microsoft Graph API usage |
| src/com/getorcha/workers/ap/acquisition/email/gmail.clj | Gmail API usage |
| src/com/getorcha/oauth/providers/microsoft.clj | Microsoft Entra ID authentication |
| src/com/getorcha/notifications.clj | Slack API, Microsoft Bot Framework |
| src/com/getorcha/search.clj | Google Vertex AI embeddings |
| resources/com/getorcha/config.edn | Provider endpoints and regions |
| Full-text grep across src/, infra/ | Claim verification |
| Prior O3 and O1 review findings | Cross-referenced |
Original: Section 6 was a full TIA for "OpenAI OpCo, LLC (Pre-Approved)" including company profile, data flow, transfer mechanism, US law exposure analysis, and risk determination. The document stated OpenAI was "pre-approved as a sub-processor but not yet active".
Finding: OpenAI is not integrated in the codebase. Exhaustive grep for openai, open-ai, gpt-4, gpt-3 across the entire repository returned zero matches. No SSM parameters, no config entries, no dependencies, no code.
Confidence: High. This matches the O3 review finding.
Action: Removed the entire OpenAI section. A TIA should assess actual data transfers. If OpenAI is desired as a future sub-processor, a TIA should be conducted as part of the activation procedure, not in advance on speculation. Removed OpenAI from scope, overview table, conclusion, and pre-approved list references.
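The repository search behind this finding can be reproduced with a short script. The following is a minimal sketch, not the tooling used for the review: the function name `find_matches` is hypothetical, the pattern list is taken from the finding above, and case-insensitive matching and skipping `.git` are assumptions.

```python
import re
from pathlib import Path

# Patterns named in the finding; matched case-insensitively (an assumption).
PATTERNS = ["openai", "open-ai", "gpt-4", "gpt-3"]


def find_matches(root, patterns=PATTERNS):
    """Return (path, line_number, line) for every line containing any pattern."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and version-control internals.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: nothing to report
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

An empty list from `find_matches(".")` at the repository root corresponds to the "zero matches" result reported here.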
Original: Microsoft was not covered. The original document treated Microsoft as though it was not a sub-processor.
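Finding: Microsoft is a significant US-based sub-processor processing personal data in three distinct ways: Microsoft Entra ID authentication (oauth/providers/microsoft.clj), Outlook email acquisition through the Microsoft Graph API (workers/ap/acquisition/email/outlook.clj), and notifications through the Microsoft Bot Framework (notifications.clj).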
Confidence: High. Direct API URLs found in oauth/providers/microsoft.clj, workers/ap/acquisition/email/outlook.clj, and notifications.clj.
Action: Added a full Section 6 individual TIA for Microsoft Corporation with company profile, services used, data flow, transfer mechanism, US law exposure analysis, supplementary measures, and risk determination.
Original: Slack was not covered.
Finding: Slack Technologies (Salesforce subsidiary) receives notification messages containing document processing status, which may include supplier names and document references. Direct API calls to slack.com/api/chat.postMessage found in notifications.clj.
Confidence: High. Direct API usage confirmed in code.
Action: Added a full Section 7 individual TIA for Slack Technologies.
Original claim: "Cloud hosting, compute (EC2), database (RDS/DynamoDB), object storage (S3), content delivery (CloudFront), authentication (Cognito), email delivery (SES), and backup services."
Findings:
- DynamoDB: not used. Zero matches for dynamodb or DynamoDB across the codebase. Only RDS PostgreSQL is used.
- CloudFront: not used. Zero matches in infra/.

Action: Removed DynamoDB and CloudFront. Added RDS, SQS, KMS, and secrets management as accurate representations of the services in use.
Original claim: "Encryption at rest: AES-256 encryption for all stored data, with customer-managed keys (AWS KMS)"
Finding: This overstates the actual encryption configuration.
Evidence:
- `data_stack.py` uses `storage_encrypted=True` without `storage_encryption_key`.
- `foundation_stack.py` S3 buckets use `encryption=s3.BucketEncryption.S3_MANAGED`.
- A single customer-managed KMS key (`db-secrets-key`) is used for specific field-level encryption only.
Confidence: High.
Action: Corrected to distinguish between AWS-managed encryption (storage-level) and customer-managed KMS (sensitive application fields).
Original claim: "TLS 1.3 encryption for all API calls" (appears multiple times)
Finding: TLS 1.3 is not explicitly configured. The ALB uses the AWS default TLS policy. Outbound HTTP calls use the JVM default TLS settings via the hato HTTP client. No TLS version is pinned in code or infrastructure. Actual TLS version negotiated depends on server support (likely TLS 1.3 for modern endpoints, TLS 1.2 fallback).
Confidence: High. Zero matches for TLS version configuration in code.
Action: Replaced "TLS 1.3" with "TLS" throughout to reflect the actual measure (transport encryption) without claiming a specific version that isn't enforced.
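The difference between a negotiated and an enforced TLS version can be illustrated outside the codebase. The sketch below uses Python's standard `ssl` module purely as an analogy. Orcha's outbound calls run on the JVM through hato, so none of this is Orcha code, and the function names are hypothetical.

```python
import socket
import ssl


def default_context() -> ssl.SSLContext:
    # Library defaults: no version is pinned, so the negotiated version
    # depends on what the peer supports.
    return ssl.create_default_context()


def tls13_only_context() -> ssl.SSLContext:
    # Pinning the floor makes "TLS 1.3" an enforced property: handshakes
    # with peers that only offer TLS 1.2 fail instead of falling back.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, ctx: ssl.SSLContext, port: int = 443) -> str:
    # Opens a real connection and reports the version actually negotiated.
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Only a configuration like `tls13_only_context` would justify the original "TLS 1.3" wording. With defaults, the version is whatever the handshake settles on.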
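Original claims: The document stated that PII masking, IBAN masking, and tax ID redaction are applied before data is sent to AI providers.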
Finding: FALSE. No PII masking exists. The extraction prompt explicitly instructs the LLM: "IBAN & BIC: CRITICAL payment fields -- always extract the supplier's bank details". Full document text with all financial PII is transmitted unmasked. This matches the O3 and O1 review findings.
Confidence: High.
Action: Removed all PII masking / IBAN masking / tax ID redaction claims. The actual data minimization measures in place are: Vertex AI embeddings exclude IBANs (for search indexing only), email triage truncates body text, previews are low-resolution. Described these accurately.
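For reference, a masking step of the kind the original document claimed would have to look roughly like the sketch below. This is purely illustrative and does not exist in the codebase. The regular expression is a simplified IBAN shape (no checksum validation), and the names `IBAN_RE` and `mask_ibans` are hypothetical.

```python
import re

# Simplified IBAN shape: country code, two check digits, then groups of up
# to four alphanumerics with optional spaces. No checksum validation.
IBAN_RE = re.compile(
    r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,4})?\b"
)


def mask_ibans(text: str) -> str:
    """Replace each IBAN-like token, keeping the country code and last 4 characters."""

    def _mask(match: re.Match) -> str:
        compact = re.sub(r"\s", "", match.group(0))
        return compact[:2] + "*" * (len(compact) - 6) + compact[-4:]

    return IBAN_RE.sub(_mask, text)
```

Applied to the extraction input, such a step would conflict with the extraction prompt quoted above, which requires the supplier's bank details to be extracted in full.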
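Original claims: The document presented zero data retention at the AI providers (Anthropic and Google) as a technical measure enforced at the API level.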
Finding: There is no API-level zero-retention enforcement in the code. Anthropic's anthropic-beta header for zero-retention is not set. No Google-specific data retention configuration is applied. Zero-retention, if it applies, is enforced only contractually (via DPA), not technically at the API level.
Confidence: Medium. The contractual provisions may exist but cannot be verified from code; the claim that this is a technical measure is not supported.
Action: Reclassified "zero retention" from a technical measure to a contractual commitment throughout the document. Rephrased risk mitigation language to attribute these protections to DPA provisions rather than API configuration.
Original claim: "Customer document is retrieved from AWS Frankfurt, transmitted over TLS 1.3 to Google's Document AI and Gemini endpoints, processed with no persistent storage"
Finding: This is partially accurate but conflates services that have very different data residency:
- Document AI uses the EU endpoint (eu-documentai.googleapis.com)
- Vertex AI embeddings use europe-west1
- Gemini uses generativelanguage.googleapis.com, which is a global endpoint with no EU guarantee

Confidence: High. Matches O3 review finding.
Action: Corrected the Google section to distinguish between EU-regional Google services (Document AI, Vertex AI) and global Gemini endpoint. Updated the overview table to reflect this split.
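The EU-regional versus global split can be kept checkable with a small classifier over configured endpoint hosts. The following is a sketch under stated assumptions: the prefix heuristic and the name `residency` are hypothetical, and the Vertex AI host used in the usage note is an assumed regional form, since the review only records the region `europe-west1`.

```python
# Hostname prefixes treated as EU-regional (an illustrative heuristic only).
EU_PREFIXES = ("eu-", "europe-")


def residency(host: str) -> str:
    """Classify an API host as 'eu-regional' or 'global' by its first DNS label."""
    first_label = host.lower().split(".")[0]
    return "eu-regional" if first_label.startswith(EU_PREFIXES) else "global"
```

With this heuristic, `eu-documentai.googleapis.com` and an assumed `europe-west1-aiplatform.googleapis.com` classify as `eu-regional`, while `generativelanguage.googleapis.com` classifies as `global`.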
Original claim: "Orcha makes an API call to Anthropic or Google, transmitting only the document content and extracted fields"
Finding: The actual scope is broader:
Confidence: High. Matches O3 review findings.
Action: Expanded the data flow and data categories descriptions to reflect the actual scope of data transferred.
Original: DPF certifications were mentioned for Google, OpenAI, and AWS only. Microsoft's DPF certification was not referenced.
Finding: Microsoft is DPF-certified (public information). Since Microsoft is now in scope, its DPF certification should be referenced. OpenAI's DPF certification is no longer relevant as OpenAI is not used.
Confidence: Medium. DPF certifications are external facts (on the DPF public list); I did not independently verify current certification status, but these statuses can be verified against dataprivacyframework.gov at document review time.
Action: Added Microsoft DPF reference; removed OpenAI DPF reference.
Throughout the document, the original risk analysis relied heavily on the "zero retention" technical claim to justify MEDIUM risk ratings. For example:
"Orcha's zero data retention policy ensures that data sent to sub-processors is transient; there is no persistent data store at the AI provider to produce in response to a 702 order."
This was the primary justification for mitigating FISA 702, EO 12333, and CLOUD Act risks. Since the zero-retention protection is contractual only (not technically enforced), the mitigation language has been softened accordingly.
Action: Rephrased FISA 702, EO 12333, and CLOUD Act mitigation language to attribute protection to contractual commitments rather than technical enforcement. The overall MEDIUM risk rating is retained but the justification is more accurate.
Original: Version 2.0, April 2026. No change history table.
Action: Set to Version 3.0, 12.04.2026. Added change history table documenting v1.0, v2.0, and v3.0 changes.
| Finding | Confidence | Basis |
|---|---|---|
| OpenAI not integrated | High | Exhaustive grep; matches O3 review |
| Microsoft is a US sub-processor | High | Direct API URLs to graph.microsoft.com, login.microsoftonline.com |
| Slack is a US sub-processor | High | Direct calls to slack.com/api |
| DynamoDB not used | High | Zero matches |
| CloudFront not used | High | Zero matches in infra/ |
| TLS 1.3 not explicitly configured | High | No TLS version config in code |
| No PII masking before AI calls | High | Matches O3 and O1 findings; extraction prompt requests IBANs |
| No API-level zero-retention | Medium | No retention headers found; contractual terms not verifiable from code |
| Gemini uses global endpoint | High | Direct URL in llm.clj |
| Document AI uses EU region | High | Config value location=eu |
| Vertex AI uses europe-west1 | High | Config value |
| AWS encryption uses mix of managed/customer keys | High | CDK configuration directly inspected |
| DPF certifications of Google/Microsoft/Amazon | Medium | Public facts; verifiable at DPF list but not from code |
| §203 StGB contractual coverage | Unknown | Legal content not verifiable from code |
| Specific SCCs and UK Addendum versions | Unknown | Contractual documents not in codebase |
None. All claims in this review are supported by code evidence or cross-referenced with prior reviews.
- Set the anthropic-beta zero-retention header. Verify Google's default API retention and configure explicit no-retention where available. This converts the zero-retention guarantee from contract-only to technically enforced.
- Serving Gemini from europe-west1 instead of the global generativelanguage.googleapis.com would significantly reduce data residency risk and simplify the TIA.
- Verify DPF certification status against dataprivacyframework.gov at each TIA review.