O14 Transfer Impact Assessments -- Codebase Verification Review

Date: 12.04.2026
Reviewer: Automated codebase analysis
Scope: Verified every claim in O14 v2.0 against the Orcha codebase (orcha/src/, orcha/infra/) and CDK infrastructure

Sources Consulted

Source | Key findings area
infra/stacks/foundation_stack.py | Cognito, S3, SQS, KMS, SSM parameters
infra/stacks/data_stack.py | RDS PostgreSQL, storage encryption
infra/stacks/compute_stack.py | ALB, EC2, TLS certificates
src/com/getorcha/ai/llm.clj | AI provider HTTP clients, endpoints
src/com/getorcha/workers/ap/ingestion/extraction.clj | Data actually sent to Anthropic
src/com/getorcha/workers/ap/ingestion/transcription.clj | Data sent to Google Document AI and Gemini Vision
src/com/getorcha/workers/ap/acquisition/email/outlook.clj | Microsoft Graph API usage
src/com/getorcha/workers/ap/acquisition/email/gmail.clj | Gmail API usage
src/com/getorcha/oauth/providers/microsoft.clj | Microsoft Entra ID authentication
src/com/getorcha/notifications.clj | Slack API, Microsoft Bot Framework
src/com/getorcha/search.clj | Google Vertex AI embeddings
resources/com/getorcha/config.edn | Provider endpoints and regions
Full-text grep across src/, infra/ | Claim verification
Prior O3 and O1 review findings | Cross-referenced

Changes Made

1. REMOVED: OpenAI (entire Section 6 deleted)

Original: Section 6 was a full TIA for "OpenAI OpCo, LLC (Pre-Approved)" including company profile, data flow, transfer mechanism, US law exposure analysis, and risk determination. The document stated OpenAI was "pre-approved as a sub-processor but not yet active".

Finding: OpenAI is not integrated in the codebase. Exhaustive grep for openai, open-ai, gpt-4, gpt-3 across the entire repository returned zero matches. No SSM parameters, no config entries, no dependencies, no code.

Confidence: High. This matches the O3 review finding.

Action: Removed the entire OpenAI section. A TIA should assess actual data transfers. If OpenAI is desired as a future sub-processor, a TIA should be conducted as part of the activation procedure, not in advance on speculation. Removed OpenAI from scope, overview table, conclusion, and pre-approved list references.
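The search behind this finding can be reproduced with a short script. A minimal sketch in Python, assuming "exhaustive grep" means a case-insensitive substring search over every file in the working tree; the function name find_mentions is hypothetical, and the pattern list is the one quoted above. The application itself is Clojure; Python is used here only for illustration.

```python
import os
from typing import Iterable, List, Tuple

# Patterns quoted in the finding above.
PATTERNS = ("openai", "open-ai", "gpt-4", "gpt-3")


def find_mentions(
    root: str, patterns: Iterable[str] = PATTERNS
) -> List[Tuple[str, int, str]]:
    """Case-insensitive substring search; returns (path, line number, pattern)."""
    needles = [p.lower() for p in patterns]
    hits: List[Tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata so packed history objects are not scanned.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        lowered = line.lower()
                        hits.extend(
                            (path, lineno, n) for n in needles if n in lowered
                        )
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return hits
```

An empty result for the repository root supports the "not integrated" finding. Any hit still needs manual triage, because a substring such as gpt-4 inside an unrelated identifier would also be reported.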

2. ADDED: Microsoft Corporation (new Section 6)

Original: Microsoft was not covered. The original document treated Microsoft as though it was not a sub-processor.

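Finding: Microsoft is a significant US-based sub-processor processing personal data in three distinct ways:

  1. Microsoft Graph API: email acquisition from Outlook mailboxes (workers/ap/acquisition/email/outlook.clj).
  2. Microsoft Entra ID: OAuth authentication (oauth/providers/microsoft.clj).
  3. Microsoft Bot Framework: delivery of notification messages (notifications.clj).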

Confidence: High. Direct API URLs found in oauth/providers/microsoft.clj, workers/ap/acquisition/email/outlook.clj, and notifications.clj.

Action: Added a full Section 6 individual TIA for Microsoft Corporation with company profile, services used, data flow, transfer mechanism, US law exposure analysis, supplementary measures, and risk determination.

3. ADDED: Slack Technologies, LLC (new Section 7)

Original: Slack was not covered.

Finding: Slack Technologies (Salesforce subsidiary) receives notification messages containing document processing status, which may include supplier names and document references. Direct API calls to slack.com/api/chat.postMessage found in notifications.clj.

Confidence: High. Direct API usage confirmed in code.

Action: Added a full Section 7 individual TIA for Slack Technologies.
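The Microsoft and Slack findings rest on outbound API hosts found in source. A minimal sketch of that check, assuming sub-processors can be identified from the hostnames of literal https:// URLs; the host-to-vendor map below is hypothetical and lists only hosts named in this review, so endpoints assembled at runtime or read from configuration would be missed.

```python
import re
from typing import Dict, Set

# Hypothetical, partial map built only from hosts named in this review.
VENDOR_HOSTS = {
    "graph.microsoft.com": "Microsoft Corporation",
    "login.microsoftonline.com": "Microsoft Corporation",
    "slack.com": "Slack Technologies, LLC",
    "generativelanguage.googleapis.com": "Google",
}

# Host part of a literal https:// URL (stops at '/', ':', '"', whitespace, ...).
_URL_HOST = re.compile(r"https://([A-Za-z0-9.-]+)")


def vendors_in_source(text: str) -> Dict[str, Set[str]]:
    """Map each known vendor to the hosts through which it is reached in `text`."""
    found: Dict[str, Set[str]] = {}
    for host in _URL_HOST.findall(text):
        host = host.lower()
        for known, vendor in VENDOR_HOSTS.items():
            # Exact host or any subdomain of it (e.g. hooks.slack.com).
            if host == known or host.endswith("." + known):
                found.setdefault(vendor, set()).add(host)
    return found
```

Running this over the contents of notifications.clj and the OAuth and email modules would produce a per-vendor host list that can be compared against the sub-processor register.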

4. CORRECTED: AWS services list (Section 8)

Original claim: "Cloud hosting, compute (EC2), database (RDS/DynamoDB), object storage (S3), content delivery (CloudFront), authentication (Cognito), email delivery (SES), and backup services."

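Findings:

  1. DynamoDB is not used: zero matches in code or infrastructure.
  2. CloudFront is not used: zero matches in infra/.
  3. SQS, KMS, and SSM parameters are provisioned in foundation_stack.py but were not listed.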

Action: Removed DynamoDB and CloudFront. Retained RDS and added SQS, KMS, and secrets management to reflect the services actually used.

5. CORRECTED: AWS encryption at rest claim (Section 8)

Original claim: "Encryption at rest: AES-256 encryption for all stored data, with customer-managed keys (AWS KMS)"

Finding: This overstates the actual encryption configuration.

Evidence: data_stack.py uses storage_encrypted=True without storage_encryption_key. foundation_stack.py S3 buckets use encryption=s3.BucketEncryption.S3_MANAGED. A single customer-managed KMS key (db-secrets-key) is used only for field-level encryption of specific sensitive fields.

Confidence: High.

Action: Corrected to distinguish between AWS-managed encryption (storage-level) and customer-managed KMS (sensitive application fields).
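Evidence of this kind can be collected with a static check over the CDK stacks. A minimal sketch, assuming the stacks pass encryption settings as literal keyword arguments (storage_encrypted, storage_encryption_key, encryption) as quoted above; audit_encryption is a hypothetical helper. It parses the source with Python's ast module and never executes CDK, so it will not see values passed through variables or set by construct defaults.

```python
import ast
from typing import List, Tuple


def audit_encryption(source: str) -> List[str]:
    """Flag CDK calls that encrypt with AWS-managed rather than customer-managed keys."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        # RDS: storage_encrypted=True with no storage_encryption_key -> AWS-managed key.
        enc_flag = kwargs.get("storage_encrypted")
        if (
            isinstance(enc_flag, ast.Constant)
            and enc_flag.value is True
            and "storage_encryption_key" not in kwargs
        ):
            hits.append((node.lineno, "storage encryption uses the AWS-managed key"))
        # S3: encryption=...BucketEncryption.S3_MANAGED -> S3-managed keys (SSE-S3).
        bucket_enc = kwargs.get("encryption")
        if isinstance(bucket_enc, ast.Attribute) and bucket_enc.attr == "S3_MANAGED":
            hits.append((node.lineno, "bucket uses S3-managed keys"))
    return [f"line {line}: {msg}" for line, msg in sorted(hits)]
```

Reading each stack file and passing its text to audit_encryption lists every storage-level resource that would need a customer-managed key before an "all data, customer-managed keys" claim could be made.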

6. CORRECTED: TLS 1.3 claim throughout

Original claim: "TLS 1.3 encryption for all API calls" (appears multiple times)

Finding: TLS 1.3 is not explicitly configured. The ALB uses the AWS default TLS policy. Outbound HTTP calls use the JVM default TLS settings via the hato HTTP client. No TLS version is pinned in code or infrastructure. The TLS version actually negotiated depends on server support (likely TLS 1.3 for modern endpoints, with TLS 1.2 as a fallback).

Confidence: High. Zero matches for TLS version configuration in code.

Action: Replaced "TLS 1.3" with "TLS" throughout to reflect the actual measure (transport encryption) without claiming a specific version that isn't enforced.
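For reference, this is what pinning a minimum version looks like on the client side. A minimal sketch using Python's standard ssl module, for illustration only: the outbound clients in question are JVM-based (hato wraps the java.net.http client, where the equivalent control is SSLParameters), and the ALB needs its own security policy. make_client_context and negotiated_version are hypothetical helper names.

```python
import socket
import ssl


def make_client_context(minimum: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Client context with certificate and hostname checks and a pinned version floor."""
    ctx = ssl.create_default_context()
    # Refuse to negotiate anything older than `minimum`.
    ctx.minimum_version = minimum
    return ctx


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open one connection and report the protocol version actually negotiated."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_version for each provider host (for example slack.com) records what is negotiated today; the pinned minimum is what makes the transport-encryption claim enforceable at a stated version.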

7. CORRECTED: PII masking / data minimization claims

Original claims: PII masking before AI API calls, including IBAN masking and tax ID redaction.

Finding: FALSE. No PII masking exists. The extraction prompt explicitly instructs the LLM: "IBAN & BIC: CRITICAL payment fields -- always extract the supplier's bank details". Full document text with all financial PII is transmitted unmasked. This matches the O3 and O1 review findings.

Confidence: High.

Action: Removed all PII masking / IBAN masking / tax ID redaction claims. The actual data minimization measures in place are: Vertex AI embeddings exclude IBANs (for search indexing only), email triage truncates body text, previews are low-resolution. Described these accurately.

8. CORRECTED: Zero-retention claims

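Original claims: zero data retention at the AI providers, presented as a technical measure (for example, documents "processed with no persistent storage").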

Finding: There is no API-level zero-retention enforcement in the code. Anthropic's anthropic-beta header for zero-retention is not set. No Google-specific data retention configuration is applied. Zero-retention, if it applies, is enforced only contractually (via DPA), not technically at the API level.

Confidence: Medium. The contractual provisions may exist but cannot be verified from code; the claim that this is a technical measure is not supported.

Action: Reclassified "zero retention" from a technical measure to a contractual commitment throughout the document. Rephrased risk mitigation language to attribute these protections to DPA provisions rather than API configuration.

9. CORRECTED: Google data flow and regions

Original claim: "Customer document is retrieved from AWS Frankfurt, transmitted over TLS 1.3 to Google's Document AI and Gemini endpoints, processed with no persistent storage"

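Finding: This is partially accurate but conflates services with very different data residency:

  1. Document AI uses an EU regional endpoint (config value location=eu).
  2. Vertex AI embeddings use europe-west1.
  3. Gemini Vision calls go to the global generativelanguage.googleapis.com endpoint (direct URL in llm.clj), so processing is not restricted to the EU.
  4. TLS 1.3 is not enforced (see item 6).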

Confidence: High. Matches O3 review finding.

Action: Corrected the Google section to distinguish between EU-regional Google services (Document AI, Vertex AI) and global Gemini endpoint. Updated the overview table to reflect this split.
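The split can be checked mechanically from the endpoint URLs in config.edn and llm.clj. A minimal sketch, assuming the hostnames follow Google's regional patterns (a region prefix on aiplatform and documentai hosts, and generativelanguage.googleapis.com for the global Gemini API); classify_google_endpoint is hypothetical, and unrecognised Google hosts are reported as unknown rather than guessed.

```python
from urllib.parse import urlparse

GLOBAL_HOSTS = {"generativelanguage.googleapis.com"}
REGIONAL_SUFFIXES = ("-aiplatform.googleapis.com", "-documentai.googleapis.com")


def classify_google_endpoint(url: str) -> str:
    """Return 'global', 'regional:<prefix>', 'unknown' or 'not-google' for a URL."""
    host = (urlparse(url).hostname or "").lower()
    if host in GLOBAL_HOSTS:
        return "global"
    for suffix in REGIONAL_SUFFIXES:
        if host.endswith(suffix):
            # The prefix is the region or multi-region, e.g. "europe-west1" or "eu".
            return "regional:" + host[: -len(suffix)]
    if host.endswith(".googleapis.com"):
        return "unknown"  # un-prefixed default endpoints: check residency manually
    return "not-google"
```

Keeping the classification explicit makes the overview-table split auditable: any endpoint that comes back as global or unknown needs its own residency statement in the TIA.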

10. EXPANDED: Data scope for AI services

Original claim: "Orcha makes an API call to Anthropic or Google, transmitting only the document content and extracted fields"

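Finding: The actual scope is broader:

  1. Full document text, including IBAN, BIC, and other financial PII, is sent unmasked to Anthropic for extraction (extraction.clj).
  2. Document content is sent to Google Document AI and Gemini Vision for transcription (transcription.clj).
  3. Text is sent to Vertex AI to build search embeddings (search.clj), with IBANs excluded.
  4. Email body text is sent, truncated, for triage.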

Confidence: High. Matches O3 review findings.

Action: Expanded the data flow and data categories descriptions to reflect the actual scope of data transferred.

11. CORRECTED: DPF certifications list

Original: DPF certifications were mentioned for Google, OpenAI, and AWS only. Microsoft's DPF certification was not referenced.

Finding: Microsoft is DPF-certified (public information). Since Microsoft is now in scope, its DPF certification should be referenced. OpenAI's DPF certification is no longer relevant as OpenAI is not used.

Confidence: Medium. DPF certifications are external facts (on the DPF public list); I did not independently verify current certification status, but these statuses can be verified against dataprivacyframework.gov at document review time.

Action: Added Microsoft DPF reference; removed OpenAI DPF reference.

12. REMOVED: Specific risk-analysis claims that rely on removed measures

Throughout the document, the original risk analysis relied heavily on the "zero retention" technical claim to justify MEDIUM risk ratings. For example:

"Orcha's zero data retention policy ensures that data sent to sub-processors is transient; there is no persistent data store at the AI provider to produce in response to a 702 order."

This was the primary justification for mitigating FISA 702, EO 12333, and CLOUD Act risks. Since the zero-retention protection is contractual only (not technically enforced), the mitigation language has been softened accordingly.

Action: Rephrased FISA 702, EO 12333, and CLOUD Act mitigation language to attribute protection to contractual commitments rather than technical enforcement. The overall MEDIUM risk rating is retained but the justification is more accurate.

13. CORRECTED: Change history and version

Original: Version 2.0, April 2026. No change history table.

Action: Set to Version 3.0, 12.04.2026. Added a change history table documenting the v1.0, v2.0, and v3.0 changes.


Confidence Assessment

Finding | Confidence | Basis
OpenAI not integrated | High | Exhaustive grep; matches O3 review
Microsoft is a US sub-processor | High | Direct API URLs to graph.microsoft.com, login.microsoftonline.com
Slack is a US sub-processor | High | Direct calls to slack.com/api
DynamoDB not used | High | Zero matches
CloudFront not used | High | Zero matches in infra/
TLS 1.3 not explicitly configured | High | No TLS version config in code
No PII masking before AI calls | High | Matches O3 and O1 findings; extraction prompt requests IBANs
No API-level zero-retention | Medium | No retention headers found; contractual terms not verifiable from code
Gemini uses global endpoint | High | Direct URL in llm.clj
Document AI uses EU region | High | Config value location=eu
Vertex AI uses europe-west1 | High | Config value
AWS encryption uses mix of managed/customer keys | High | CDK configuration directly inspected
DPF certifications of Google/Microsoft/Amazon | Medium | Public facts; verifiable at DPF list but not from code
§203 StGB contractual coverage | Unknown | Legal content not verifiable from code
Specific SCCs and UK Addendum versions | Unknown | Contractual documents not in codebase

Retractions

None. All claims in this review are supported by code evidence or cross-referenced with prior reviews.

Recommendations

Critical

  1. Remove OpenAI from operational planning unless actual integration begins -- A proactive TIA for a non-integrated provider creates maintenance burden and risk of stale commitments. If OpenAI is genuinely planned, the TIA should be produced at the point of activation as part of the O3 approval procedure.
  2. Add Microsoft DPA coverage -- Ensure DPAs are in place with Microsoft covering Graph API, Entra ID, and Bot Framework usage. This has been flagged in the O3 review as well.
  3. Add Slack DPA coverage -- Ensure a DPA with Salesforce (parent of Slack) covers Slack API usage.
  4. Implement technical enforcement of zero-retention -- Set Anthropic anthropic-beta zero-retention header. Verify Google's default API retention and configure explicit no-retention where available. This converts the zero-retention guarantee from contract-only to technically enforced.
  5. Implement PII masking before AI API calls -- Currently the document's supplementary measures cannot include data minimization of PII fields because none is implemented. Either build the masking, or the document should not claim it.
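Recommendation 5 implies a masking step between transcription and the LLM call. A minimal sketch for the most sensitive field named in this review, written in Python for illustration (the pipeline itself is Clojure): it finds IBAN-shaped strings, confirms them with the ISO 13616 mod-97 checksum so that unrelated reference numbers are left alone, and replaces them with a placeholder that keeps the last four characters. mask_ibans and the placeholder format are hypothetical. Because the extraction prompt currently asks the model to return IBAN & BIC, masking would also require capturing the IBAN locally before the call; the sketch does not address that, nor other PII such as tax IDs.

```python
import re

# IBAN-shaped text: country code, two check digits, then 4-character groups
# (optionally space-separated, as printed on invoices) and a short tail.
_IBAN_CANDIDATE = re.compile(
    r"\b[A-Z]{2}[0-9]{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def is_valid_iban(compact: str) -> bool:
    """ISO 13616 mod-97 check on an IBAN with spaces removed."""
    if not 15 <= len(compact) <= 34:
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the standard requires.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def mask_ibans(text: str, keep_last: int = 4) -> str:
    """Replace checksum-valid IBANs with a placeholder keeping the last characters."""

    def _replace(match: re.Match) -> str:
        compact = match.group(0).replace(" ", "")
        if not is_valid_iban(compact):
            return match.group(0)  # IBAN-shaped but invalid: leave untouched
        return f"[IBAN ...{compact[-keep_last:]}]"

    return _IBAN_CANDIDATE.sub(_replace, text)
```

Validating the checksum before masking trades a little recall (mistyped IBANs pass through) for not corrupting order numbers and other references that happen to look like IBANs.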

High Priority

  1. Migrate Gemini to Vertex AI regional endpoint -- Using Vertex AI Gemini endpoints in europe-west1 instead of the global generativelanguage.googleapis.com would significantly reduce data residency risk and simplify the TIA.
  2. Verify DPF certification status annually -- Google, Microsoft, and Amazon DPF certifications should be verified against dataprivacyframework.gov at each TIA review.
  3. Document formal escalation path for government requests -- The escalation procedure names roles (DPO, Technical Lead, Managing Director) but should map to specific individuals and contact details.
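For the Gemini migration in item 1, the most visible change is the request URL. A minimal sketch, assuming the Vertex AI REST pattern for Google publisher models (https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:generateContent); vertex_gemini_url is hypothetical, and the project and model IDs in the usage note are placeholders, not Orcha values. Vertex AI requests are authorised with Google Cloud credentials, which the sketch does not cover.

```python
def vertex_gemini_url(project: str, region: str, model: str) -> str:
    """Regional Vertex AI generateContent URL; the region appears in host and path."""
    if not region or region == "global":
        raise ValueError("a concrete region such as europe-west1 is required")
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{region}/publishers/google/models/{model}:generateContent"
    )
```

For example, vertex_gemini_url("example-project", "europe-west1", "example-model") yields a europe-west1 host in place of generativelanguage.googleapis.com; rejecting "global" keeps the data-residency benefit from being undone by configuration.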

Medium Priority

  1. Pin TLS version explicitly -- Configure ALB TLS security policy explicitly and set minimum TLS 1.2 (or 1.3) in outbound HTTP client configuration, so the "TLS encryption" claim is auditable at a specific version.
  2. Keep TIA scope aligned with the sub-processor list -- Maesn (DATEV integration) is an EU sub-processor, so it does not require US transfer analysis. If other third-country sub-processors are added in the future, extend this TIA to cover them.