Note (2026-04-24): After this document was written,
legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.
MCP tool for programmatic Excel file analysis via a sandboxed Clojure DSL. Part of the Data Discovery Protocol (Phase 1 — File-by-File Analysis).
The DDP agent needs to inspect Excel files deeply: read cells/ranges, understand
sheet structure, detect merged cells, find named ranges, check formulas and
number formats. orcha-fpna-list-files with include_summary gives a surface
overview (sheet names, headers, row counts), but the agent needs full
programmatic access to handle the variety of financial spreadsheet layouts it
will encounter.
evaluate-excel(code: String, file-bytes: byte[]) -> String
The boundary takes a Clojure code string and raw file bytes. Returns the evaluation result as an EDN string. This signature works identically whether the implementation is in-process or out-of-process.
Results are serialized to EDN via pr-str. EDN is Clojure's native data format
— vectors, maps, keywords, nil all round-trip faithfully. The agent receives a
string that directly represents the Clojure data structure returned by its code.
For v2, the native binary does the same: pr-str to stdout.
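As a minimal illustration of the round-trip claim, the snippet below serializes a result with pr-str and reads it back with clojure.edn/read-string. The sample data is made up for the example and is not taken from a real workbook.

```clojure
(require '[clojure.edn :as edn])

;; A made-up result of the kind an agent's code might return.
(def result
  {:sheet "Sheet1"
   :rows  [[1 2.5 nil] ["Total" 3.5 :n/a]]})

;; What the boundary would hand back: a plain string.
(def edn-string (pr-str result))

;; What a consumer can recover from that string.
(def round-tripped (edn/read-string edn-string))

(= result round-tripped) ;=> true
```

Values with no EDN representation do not round-trip: a function value, for instance, prints as an #object form that EDN readers reject, so agent code should return plain data.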
The boundary always returns an EDN string — either the serialized result or a serialized error map. Two error categories:
Evaluation errors (syntax error, undefined symbol, type error): SCI throws with location info. Returned as:
{:error {:type :eval :message "..." :line 3 :column 12}}
Infrastructure errors (timeout, unparseable file, OOM in v2): caught by the host. Returned as:
{:error {:type :timeout}}
{:error {:type :parse :message "Not a valid Excel file"}}
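One way the two error shapes above might be produced is sketched below. The helper names eval-error and infra-error are hypothetical, and the sketch assumes the thrown exception carries :line and :column in its ex-data. That is how location info is modeled here, not a statement about SCI's exact exception format.

```clojure
(require '[clojure.edn :as edn])

(defn eval-error
  "Hypothetical helper: turns an exception thrown during evaluation into
  the serialized :eval error map. Assumes :line/:column live in ex-data."
  [^Throwable t]
  (let [{:keys [line column]} (ex-data t)]
    (pr-str {:error (cond-> {:type :eval :message (ex-message t)}
                      line   (assoc :line line)
                      column (assoc :column column))})))

(defn infra-error
  "Hypothetical helper: serializes a host-side failure such as :timeout
  or :parse, with an optional message."
  ([error-type] (pr-str {:error {:type error-type}}))
  ([error-type message] (pr-str {:error {:type error-type :message message}})))

;; Simulated evaluation failure with location info attached.
(def sample
  (eval-error (ex-info "Could not resolve symbol: foo" {:line 3 :column 12})))

;; Reading the string back recovers the error map.
(def parsed (edn/read-string sample))
```

Because both paths return a string, the caller never has to distinguish a thrown exception from a normal result; it only inspects the :error key after parsing.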
The MCP handler wraps the EDN string in the standard
{:content [{:type "text" :text <edn-string>}]} response.
MCP Server (JVM)
orcha-fpna-excel handler
1. Resolve legal entity, get FileStore
2. Read file via FileStore -> byte[]
3. Pass (code, file-bytes) to evaluate-excel
a. Load workbook from bytes via POI
b. Build SCI context (workbook bound to custom fns)
c. Evaluate code with 30s thread deadline
d. pr-str the result (or error map) to EDN string
e. Close workbook
4. Return EDN string to agent
SCI runs on a dedicated thread with a deadline. If the thread exceeds the timeout, it is interrupted and the result is an error map, also serialized to EDN.
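The deadline behavior can be sketched with plain Clojure concurrency primitives. This is an illustration under assumptions, not the handler's code: eval-with-deadline is a hypothetical name, a zero-argument function stands in for the SCI evaluation, and a future (which runs on Clojure's shared pool) stands in for the dedicated thread.

```clojure
(defn eval-with-deadline
  "Runs thunk on another thread and returns an EDN string: the printed
  result, an :eval error map, or a :timeout error map after timeout-ms."
  [thunk timeout-ms]
  (let [task   (future
                 (try
                   (pr-str (thunk))
                   (catch Throwable t
                     (pr-str {:error {:type :eval :message (ex-message t)}}))))
        result (deref task timeout-ms ::timed-out)]
    (if (= ::timed-out result)
      (do
        ;; Interrupts the worker thread. Code that never blocks or checks
        ;; the interrupt flag may keep running despite the cancel.
        (future-cancel task)
        (pr-str {:error {:type :timeout}}))
      result)))

;; Finishes well inside the deadline.
(def ok (eval-with-deadline #(+ 1 2) 1000))

;; Overruns a 50 ms deadline and is reported as a timeout.
(def late (eval-with-deadline #(Thread/sleep 5000) 50))
```

Thread interruption is cooperative on the JVM, so a tight CPU-bound loop can outlive the cancel even though the caller has already received the timeout error.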
MCP Server (JVM)
orcha-fpna-excel handler
1. Resolve legal entity, get FileStore
2. Read file via FileStore -> byte[]
3. Spawn native binary, pipe file-bytes via stdin, code via CLI arg
4. Read EDN string from stdout, errors from stderr
5. SIGKILL after timeout if process hasn't exited
6. Return EDN string to agent
Native binary
1. Read file-bytes from stdin
2. Load workbook from bytes via POI
3. Build SCI context (workbook bound to custom fns)
4. Evaluate code
5. pr-str the result (or error map) to stdout
6. Close workbook, exit
Each invocation is a fresh process with no shared state.
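The host side of this flow could look roughly like the sketch below, which uses java.lang.ProcessBuilder through interop. The function name run-native is hypothetical, stderr handling is omitted, and ordinary Unix commands stand in for the native binary, whose name and arguments are not specified here.

```clojure
(import '(java.util.concurrent TimeUnit))

(defn run-native
  "Starts command (a vector of strings), pipes file-bytes to its stdin, and
  returns its stdout, or a :timeout error map if it outlives timeout-ms."
  [command ^bytes file-bytes timeout-ms]
  (let [process (.start (ProcessBuilder. ^java.util.List command))
        ;; Drain stdout concurrently so the child cannot block on a full pipe.
        stdout  (future (slurp (.getInputStream process)))]
    ;; Closing stdin signals end-of-input to the child.
    (with-open [stdin (.getOutputStream process)]
      (.write stdin file-bytes))
    (if (.waitFor process (long timeout-ms) TimeUnit/MILLISECONDS)
      @stdout
      (do
        ;; Forcible termination; on Unix-like systems this sends SIGKILL.
        (.destroyForcibly process)
        (pr-str {:error {:type :timeout}})))))

;; Stand-ins for the real binary: `cat` echoes stdin, `sleep` overruns the deadline.
(def echoed  (run-native ["cat"] (.getBytes "{:ok true}") 2000))
(def expired (run-native ["sleep" "5"] (.getBytes "ignored") 100))
```

In this sketch the deadline only covers the wait for exit; a child that never reads its stdin could still stall the write for a large file, so a production version would likely move the write onto its own thread as well.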
POI has been confirmed to work inside GraalVM native-image with appropriate reflection configuration.
:classes {} except allowlisted Math methods
In v2, each tool call spawns a native process that could run for up to the timeout duration. An attacker with a valid token (or an agent manipulated via prompt injection through crafted Excel content) could attempt to exhaust resources by firing many concurrent evaluations.
Mitigations:
Apache POI directly. No docjure — it doesn't expose formulas, merged cells, named ranges, or cell format strings, all of which the DDP requires.
Custom functions use POI's ss.usermodel interfaces:
- Workbook, Sheet, Row, Cell for data access
- CellRangeAddress for merged regions
- Name + AreaReference for named ranges
- CellStyle.getDataFormatString() for number/date/currency formats
- Cell.getCellFormula() for formula strings
All functions operate implicitly on the workbook loaded from the provided file bytes. The SCI code never sees the workbook object directly.
(excel/summary)
Returns a map from sheet names to metadata.
{"Sheet1" {:headers ["Col A" "Col B" ...] :row-count 150 :column-count 8}
"Sheet2" {:headers [...] :row-count 42 :column-count 5}}
(excel/sheets)
Returns a vector of sheet names.
["Sheet1" "Sheet2" "Assumptions"]
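A sketch of how an agent might combine the two calls for triage follows. Literal values stand in for the results of (excel/summary) and (excel/sheets) so the snippet runs outside the sandbox; the headers and counts are invented.

```clojure
;; Literal stand-ins for (excel/summary) and (excel/sheets).
(def summary
  {"Sheet1"      {:headers ["Account" "Amount"] :row-count 150 :column-count 2}
   "Sheet2"      {:headers ["Dept" "Cost"]      :row-count 42  :column-count 2}
   "Assumptions" {:headers []                   :row-count 0   :column-count 0}})

(def sheets ["Sheet1" "Sheet2" "Assumptions"])

;; Non-empty sheets in workbook order, paired with their row counts.
(def triage-order
  (->> sheets
       (filter (fn [s] (pos? (get-in summary [s :row-count]))))
       (mapv (fn [s] [s (get-in summary [s :row-count])]))))
;=> [["Sheet1" 150] ["Sheet2" 42]]
```

Walking the sheets vector rather than the summary map keeps the workbook's sheet order, which a map is not guaranteed to preserve.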
(excel/read range)
(excel/read range opts)
Reads cells. Range uses standard Excel notation: "A1", "A1:D10",
"Sheet1!A1:D10". Always returns a 2D vector (vector of row vectors), even for
a single cell.
Options:
With :metadata? true, leaf values become maps with :value, :formula, and :format keys.
(excel/read "A1") ;=> [[42]]
(excel/read "A1:C2") ;=> [[1 2 3] [4 5 6]]
(excel/read "Sheet2!B3:D5") ;=> [[...] [...] [...]]
(excel/read "A1" {:metadata? true})
;=> [[{:value 42 :formula "SUM(B1:B10)" :format "#,##0.00"}]]
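To show what the metadata shape makes possible, the sketch below pulls formulas and number formats out of a grid. The grid is a literal stand-in for a metadata read, and it assumes that cells without a formula carry :formula nil, which the description above does not spell out.

```clojure
;; Literal stand-in for the result of a read with {:metadata? true}.
;; Assumption: non-formula cells have :formula nil.
(def cells
  [[{:value "Revenue" :formula nil :format "General"}
    {:value 42 :formula "SUM(B1:B10)" :format "#,##0.00"}]
   [{:value "Margin" :formula nil :format "General"}
    {:value 0.31 :formula "B1/B3" :format "0.0%"}]])

;; Every formula string in the grid, in row-major order.
(def formulas
  (->> cells flatten (filter :formula) (mapv :formula)))
;=> ["SUM(B1:B10)" "B1/B3"]

;; How often each number format occurs.
(def format-counts
  (frequencies (map :format (flatten cells))))
```

flatten only descends into sequential collections, so the per-cell maps survive intact and can be filtered by key.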
(excel/merged-regions sheet-name)
Returns merged cell ranges for a sheet. Ranges only, no values — use
excel/read to get values if needed.
(excel/merged-regions "Sheet1")
;=> [{:range "B1:F1"} {:range "A3:A8"}]
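Since Excel stores a merged region's value in its top-left cell, one plausible follow-up is to derive that cell's address from each range and pass it to excel/read. The sketch uses a literal stand-in for the result and assumes the sheet name "Sheet1".

```clojure
;; Only needed when running outside the sandbox, where require is unavailable.
(require 'clojure.string)

;; Literal stand-in for (excel/merged-regions "Sheet1").
(def regions [{:range "B1:F1"} {:range "A3:A8"}])

;; Sheet-qualified address of each region's top-left cell.
(def anchor-cells
  (mapv (fn [region]
          (str "Sheet1!" (first (clojure.string/split (:range region) #":"))))
        regions))
;=> ["Sheet1!B1" "Sheet1!A3"]
```

Each resulting string is in the range notation that excel/read accepts, so the values can be fetched one address at a time.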
(excel/named-ranges)
Returns all named ranges in the workbook. Hidden/internal Excel names are filtered out.
(excel/named-ranges)
;=> [{:name "Revenue" :refers-to "Sheet1!$B$2:$B$50" :scope :workbook}
; {:name "Dept_Costs" :refers-to "Sheet2!$C$3:$C$20" :scope "Sheet2"}]
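A short sketch of working with this shape follows, using a literal copy of the example result rather than a live call. It builds a name-to-reference lookup and picks out the workbook-scoped names.

```clojure
;; Literal stand-in for (excel/named-ranges).
(def names
  [{:name "Revenue"    :refers-to "Sheet1!$B$2:$B$50" :scope :workbook}
   {:name "Dept_Costs" :refers-to "Sheet2!$C$3:$C$20" :scope "Sheet2"}])

;; Lookup from range name to the reference it points at.
(def by-name
  (zipmap (map :name names) (map :refers-to names)))

;; Names visible across the whole workbook.
(def workbook-scoped
  (mapv :name (filter (fn [n] (= :workbook (:scope n))) names)))
;=> ["Revenue"]

(get by-name "Dept_Costs") ;=> "Sheet2!$C$3:$C$20"
```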
Data: map, filter, reduce, mapv, filterv, into, get, get-in, assoc, dissoc, update, select-keys, keys, vals, merge, zipmap, group-by, sort-by, frequencies, first, second, last, rest, next, nth, take, drop, take-while, drop-while, concat, cons, conj, distinct, flatten, reverse, partition, partition-by, interleave, interpose, count, empty?, not-empty, contains?, some, every?, vector, hash-map, hash-set, set, list, vec, seq, range, repeat, repeatedly
Arithmetic: + - * / inc dec mod rem quot max min abs
Comparison: < > <= >= = not= compare
Logic: and, or, not, if, when, when-let, if-let, cond, condp, case
Strings: str, subs, clojure.string/split, clojure.string/join, clojure.string/replace, clojure.string/trim, clojure.string/lower-case, clojure.string/upper-case, clojure.string/includes?, clojure.string/starts-with?, clojure.string/ends-with?, re-find, re-matches, re-seq
Type predicates: nil?, string?, number?, integer?, double?, keyword?, map?, vector?, set?, seq?, coll?, boolean?, true?, false?, zero?, pos?, neg?, even?, odd?
Binding & control: let, fn, def, defn, do, -> ->> as-> cond-> cond->> some-> some->>
Math: Math/floor, Math/ceil, Math/round, Math/pow (via :classes allowlist)
No IO, no network, no atoms/refs/agents, no loop/recur/trampoline, no Java interop beyond Math, no require/import/eval, no side effects.
range, repeat, and repeatedly produce infinite lazy sequences when called
without bounds. Combined with eager functions like mapv or vec, they will
run until the timeout kills execution. The timeout is the safety net — removing
these functions would cripple legitimate use (e.g., (range 12) for months).
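The difference between bounded and unbounded use is illustrated below with functions from the allowlist. The unbounded form is left commented out because it would never finish on its own.

```clojure
;; Bounded: a finite range realized eagerly.
(def months (mapv inc (range 12)))
;=> [1 2 3 4 5 6 7 8 9 10 11 12]

;; Infinite sources are safe when a lazy bound sits in front of the eager step.
(def padding (vec (take 3 (repeat 0))))
;=> [0 0 0]

(def first-evens (vec (take 5 (filter even? (range)))))
;=> [0 2 4 6 8]

;; Unbounded and eager: runs until something external stops it.
;; (vec (range))
```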
Registered as ::excel defmethod on tools/-tool in
com.getorcha.link.mcp.tools.fpna. Scope: "fpna:read".
Input schema:
{
"type": "object",
"properties": {
"legal_entity_id": {
"type": "string",
"description": "UUID of the legal entity. Optional if identity has access to exactly one.",
"format": "uuid"
},
"file": {
"type": "string",
"description": "Relative file path within the legal entity's data directory."
},
"code": {
"type": "string",
"description": "Clojure code to evaluate. See tool description for available functions."
}
},
"required": ["file", "code"]
}
The tool description will contain the full DSL reference: all custom functions with signatures, return shapes, and examples, plus the list of available clojure.core functions.
Full DSL reference embedded in the MCP tool description. The agent sees this when it discovers the tool. Covers: all 5 custom functions with signatures, return shapes, and examples; the available clojure.core subset; what's not available; timeout behavior.
Add operational guidance to Phase 1:
- (excel/summary) first for sheet-level triage
- (excel/read ...) for headers, sample data, specific cells
- (excel/merged-regions ...) when headers appear incomplete or empty
- (excel/named-ranges) for semantic markers in hand-built financial models
- {:metadata? true} to detect formulas and number formats (currency, dates, percentages)
New:
- babashka/sci: SCI interpreter
Existing (already in project):
- org.apache.poi/poi-ooxml: Apache POI for Excel parsing (used by list-files via docjure, but we use POI directly for this tool)
Note: docjure remains as a dependency for list-files's extract-excel-summary.
No need to remove it; it is already there and working for that use case.