For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Prove that FastExcel + supplemental StAX can read xlsx files (values, formulas, formats, merged regions, named ranges) inside a GraalVM native-image binary.
Architecture: Standalone Clojure CLI with two parsing layers: a FastExcel reader for cell data and supplemental javax.xml.stream StAX parsing for merged regions and named ranges. Both read the same xlsx zip file. Built with GraalVM native-image, using clj-easy/graal-build-time to initialize Clojure classes at build time.
Tech Stack: Clojure 1.12, FastExcel reader 0.19.0 (aalto-xml + commons-compress), GraalVM CE 21.0.2, tools.build for uberjar.
Design doc: docs/plans/2026-03-05-excel-native-image-design.md
FastExcel reader classes (source at ~/code/oss/fastexcel/fastexcel-reader/):
ReadableWorkbook(File, ReadingOptions): opens xlsx. ReadingOptions(true, false) enables format strings.
ReadableWorkbook.getSheets() → Stream<Sheet>
ReadableWorkbook.findSheet(String) → Optional<Sheet>
Sheet.getName(), Sheet.getIndex(), Sheet.openStream() → Stream<Row>
Row.getRowNum() (1-based), Row.getCellCount(), iterable over Cell
Cell.getValue(), .getText(), .getType() (NUMBER/STRING/BOOLEAN/FORMULA/ERROR/EMPTY)
Cell.getFormula() → formula string or null
Cell.getDataFormatString() → format string or null (requires ReadingOptions(true, false))
Cell.getAddress() → CellAddress (.getRow(), .getColumn(), both 0-based)
FastExcel directly instantiates com.fasterxml.aalto.stax.InputFactoryImpl (no ServiceLoader) in DefaultXMLInputFactory.java, which is good for native-image.
XLSX XML locations for supplemental parsing:
xl/worksheets/sheet{N}.xml → <mergeCells><mergeCell ref="A1:B2"/>...</mergeCells> (after </sheetData>)
xl/workbook.xml → <definedNames><definedName name="Revenue" localSheetId="0">Sheet1!$B$2:$B$50</definedName>...</definedNames>
Task 1
Files:
deps.edn (add :native-excel and :build-native-excel aliases)
scripts/ci/build_native_excel.clj
Step 1: Add aliases to deps.edn
Add two aliases after the existing :antq alias:
:native-excel {:replace-paths ["src"]
               :replace-deps {org.clojure/clojure {:mvn/version "1.12.0"}
                              org.dhatim/fastexcel-reader {:mvn/version "0.19.0"}
                              com.github.clj-easy/graal-build-time {:mvn/version "1.0.5"}}}
:build-native-excel {:replace-paths ["scripts/ci"]
                     :replace-deps {io.github.clojure/tools.build {:git/tag "v0.10.12" :git/sha "97c5562"}}
                     :ns-default build-native-excel}
Step 2: Create the build script
Create scripts/ci/build_native_excel.clj:
(ns build-native-excel
  (:require [clojure.tools.build.api :as b]))

(defn uber [_]
  (let [class-dir "target/native-excel"
        uber-file "target/excel-sandbox.jar"
        basis (b/create-basis {:aliases [:native-excel]})]
    (b/delete {:path class-dir})
    (b/copy-dir {:target-dir class-dir
                 :src-dirs ["src"]})
    (b/compile-clj {:basis basis
                    :class-dir class-dir
                    :src-dirs ["src"]
                    :ns-compile '[com.getorcha.link.excel-sandbox]})
    (b/uber {:basis basis
             :class-dir class-dir
             :uber-file uber-file
             :main 'com.getorcha.link.excel-sandbox})))
Step 3: Verify the alias resolves
Run: clj -A:native-excel -Stree 2>&1 | head -20
Expected: Dependency tree showing clojure, fastexcel-reader, aalto-xml, commons-compress, graal-build-time. No POI.
Step 4: Commit
git add deps.edn scripts/ci/build_native_excel.clj
git commit -m "feat: add native-excel build alias and build script"
Task 2
Files:
src/com/getorcha/link/excel_sandbox.clj
Step 1: Create the namespace with imports and FastExcel reading functions
Create src/com/getorcha/link/excel_sandbox.clj:
(ns com.getorcha.link.excel-sandbox
  "Prototype: read Excel files using FastExcel + supplemental StAX parsing.
  Designed to compile to GraalVM native-image."
  (:gen-class)
  (:import
   [java.io File]
   [javax.xml.stream XMLInputFactory XMLStreamConstants]
   [org.apache.commons.compress.archivers.zip ZipFile]
   [org.dhatim.fastexcel.reader
    Cell CellType ReadableWorkbook ReadingOptions Row Sheet]))

(defn- cell->value
  "Extract the display value from a Cell."
  [^Cell cell]
  (when cell
    (case (.name (.getType cell))
      "NUMBER" (.getValue cell)
      "STRING" (.getText cell)
      "BOOLEAN" (.asBoolean cell)
      "FORMULA" (.getText cell)
      "ERROR" (str "ERROR:" (.getRawValue cell))
      "EMPTY" nil
      nil)))

(defn- cell->metadata
  "Extract value + formula + format from a Cell."
  [^Cell cell]
  (when cell
    {:value (cell->value cell)
     :formula (.getFormula cell)
     :format (.getDataFormatString cell)}))

(defn ^:private sheets
  "Returns a vector of sheet names."
  [^ReadableWorkbook wb]
  (mapv #(.getName ^Sheet %) (iterator-seq (.iterator (.getSheets wb)))))

(defn ^:private summary
  "Returns a map from sheet names to metadata (headers, row-count, column-count)."
  [^ReadableWorkbook wb]
  (into {}
        (map (fn [^Sheet sheet]
               (let [rows (with-open [stream (.openStream sheet)]
                            (vec (iterator-seq (.iterator stream))))
                     headers (when (seq rows)
                               (mapv (fn [^Cell c] (when c (.getText c)))
                                     (first rows)))]
                 [(.getName sheet)
                  {:headers headers
                   :row-count (count rows)
                   :column-count (if (seq rows)
                                   (.getCellCount ^Row (first rows))
                                   0)}])))
        (iterator-seq (.iterator (.getSheets wb)))))

(defn ^:private read-sheet-rows
  "Read all rows from a sheet. When metadata? is true, includes formula and format."
  [^ReadableWorkbook wb ^String sheet-name metadata?]
  ;; Optional.orElse is seen as returning Object by the Clojure compiler; the
  ;; ^Sheet hint keeps .openStream from being compiled as a reflective call.
  (let [^Sheet sheet (.orElse (.findSheet wb sheet-name) nil)]
    (when sheet
      (let [extract (if metadata? cell->metadata cell->value)]
        (with-open [stream (.openStream sheet)]
          (mapv (fn [^Row row] (mapv extract row))
                (iterator-seq (.iterator stream))))))))
Step 2: Verify it compiles on the JVM
Run: clj -A:native-excel -e "(require 'com.getorcha.link.excel-sandbox) (println :ok)"
Expected: :ok printed, no errors.
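A clean load does not show whether any interop call is resolved by reflection, and reflective calls generally need extra configuration before they work in a native image. As a hedged sketch (the functions len-reflective and len-direct are hypothetical and use only JDK classes), enabling *warn-on-reflection* makes the compiler print a warning for the unhinted call, while the hinted one compiles to a direct method call:

```clojure
(set! *warn-on-reflection* true)

(import '[java.util Optional])

;; Unhinted: the compiler only knows that .orElse returns Object, so .length
;; is looked up reflectively at runtime and a reflection warning is printed.
(defn len-reflective [^Optional opt]
  (let [s (.orElse opt nil)]
    (when s (.length s))))

;; Hinted: ^String lets the compiler emit a direct call to String.length.
(defn len-direct [^Optional opt]
  (let [^String s (.orElse opt nil)]
    (when s (.length s))))

(len-direct (Optional/of "abc"))
```

On the JVM both functions return the same value; the difference only matters once reflection is restricted, as it is in a native image.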
Step 3: Commit
git add src/com/getorcha/link/excel_sandbox.clj
git commit -m "feat: add FastExcel-based excel reading functions"
Task 3
Files:
src/com/getorcha/link/excel_sandbox.clj
Step 1: Add merged regions parser
Add after the read-sheet-rows function:
(defn- ^XMLInputFactory xml-input-factory []
  (doto (XMLInputFactory/newFactory)
    (.setProperty XMLInputFactory/IS_NAMESPACE_AWARE false)
    (.setProperty XMLInputFactory/SUPPORT_DTD false)))

(defn ^:private parse-merged-regions
  "Parse <mergeCells> from a sheet's XML entry in the xlsx zip."
  [^ZipFile zip-file ^String sheet-entry-name]
  (let [entry (.getEntry zip-file sheet-entry-name)]
    (when entry
      (with-open [is (.getInputStream zip-file entry)]
        (let [reader (.createXMLStreamReader (xml-input-factory) is)]
          (try
            (loop [regions (transient [])]
              (if (.hasNext reader)
                (do (.next reader)
                    (if (and (= (.getEventType reader) XMLStreamConstants/START_ELEMENT)
                             (= (.getLocalName reader) "mergeCell"))
                      (recur (conj! regions {:range (.getAttributeValue reader nil "ref")}))
                      (recur regions)))
                (persistent! regions)))
            (finally
              (.close reader))))))))
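
The event loop in parse-merged-regions can be tried at the REPL without an xlsx file. The sketch below is an illustration only: merged-refs is a hypothetical helper, it uses the JDK's default StAX implementation rather than aalto-xml, and it reads from an in-memory string instead of a zip entry.

```clojure
(import '[java.io StringReader]
        '[javax.xml.stream XMLInputFactory XMLStreamConstants XMLStreamReader])

(defn merged-refs
  "Collect the ref attribute of every <mergeCell> element in an XML string."
  [^String xml]
  (let [factory (doto (XMLInputFactory/newFactory)
                  (.setProperty XMLInputFactory/IS_NAMESPACE_AWARE false)
                  (.setProperty XMLInputFactory/SUPPORT_DTD false))
        ^XMLStreamReader reader (.createXMLStreamReader factory (StringReader. xml))]
    (try
      (loop [refs []]
        (if (.hasNext reader)
          ;; .next returns the event type of the event it just moved to
          (if (and (= (.next reader) XMLStreamConstants/START_ELEMENT)
                   (= (.getLocalName reader) "mergeCell"))
            (recur (conj refs (.getAttributeValue reader nil "ref")))
            (recur refs))
          refs))
      (finally
        (.close reader)))))

(merged-refs
 (str "<worksheet><sheetData/>"
      "<mergeCells count=\"2\">"
      "<mergeCell ref=\"A1:B2\"/><mergeCell ref=\"C3:D4\"/>"
      "</mergeCells></worksheet>"))
;; => ["A1:B2" "C3:D4"]
```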
Step 2: Add named ranges parser
Add after parse-merged-regions:
(defn ^:private parse-named-ranges
  "Parse <definedNames> from workbook.xml in the xlsx zip."
  [^ZipFile zip-file]
  (let [entry (or (.getEntry zip-file "xl/workbook.xml")
                  (.getEntry zip-file "xl/Workbook.xml"))]
    (when entry
      (with-open [is (.getInputStream zip-file entry)]
        (let [reader (.createXMLStreamReader (xml-input-factory) is)]
          (try
            (loop [ranges (transient [])
                   in-names false]
              (if (.hasNext reader)
                (let [_ (.next reader)
                      type (.getEventType reader)]
                  (cond
                    (and (= type XMLStreamConstants/START_ELEMENT)
                         (= (.getLocalName reader) "definedNames"))
                    (recur ranges true)

                    (and (= type XMLStreamConstants/END_ELEMENT)
                         (= (.getLocalName reader) "definedNames"))
                    (persistent! ranges)

                    (and in-names
                         (= type XMLStreamConstants/START_ELEMENT)
                         (= (.getLocalName reader) "definedName"))
                    ;; Read the attributes first: getElementText advances the
                    ;; reader to the matching END_ELEMENT. It also gathers all
                    ;; character data, even if the parser reports the text in
                    ;; several CHARACTERS events.
                    (let [name (.getAttributeValue reader nil "name")
                          local-sheet (.getAttributeValue reader nil "localSheetId")
                          refers-to (.getElementText reader)]
                      (recur (conj! ranges {:name name
                                            :refers-to refers-to
                                            :scope (if local-sheet local-sheet :workbook)})
                             true))

                    :else (recur ranges in-names)))
                (persistent! ranges)))
            (finally
              (.close reader))))))))
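
A StAX parser may deliver an element's text in more than one CHARACTERS event unless coalescing is enabled, so reading a single event can return only part of a long reference. XMLStreamReader.getElementText collects the whole text and leaves the reader on the element's END_ELEMENT. A minimal sketch, assuming the JDK's default StAX implementation and a hypothetical helper named defined-names:

```clojure
(import '[java.io StringReader]
        '[javax.xml.stream XMLInputFactory XMLStreamConstants XMLStreamReader])

(defn defined-names
  "Return {:name ... :refers-to ...} for every <definedName> in an XML string."
  [^String xml]
  (let [factory (doto (XMLInputFactory/newFactory)
                  (.setProperty XMLInputFactory/IS_NAMESPACE_AWARE false))
        ^XMLStreamReader reader (.createXMLStreamReader factory (StringReader. xml))]
    (try
      (loop [acc []]
        (if (.hasNext reader)
          (if (and (= (.next reader) XMLStreamConstants/START_ELEMENT)
                   (= (.getLocalName reader) "definedName"))
            ;; Attributes must be read before getElementText moves the cursor.
            (let [n (.getAttributeValue reader nil "name")
                  text (.getElementText reader)]
              (recur (conj acc {:name n :refers-to text})))
            (recur acc))
          acc))
      (finally
        (.close reader)))))

(defined-names
 (str "<workbook><definedNames>"
      "<definedName name=\"Revenue\" localSheetId=\"0\">Sheet1!$B$2:$B$50</definedName>"
      "</definedNames></workbook>"))
;; => [{:name "Revenue", :refers-to "Sheet1!$B$2:$B$50"}]
```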
Step 3: Verify it still compiles
Run: clj -A:native-excel -e "(require 'com.getorcha.link.excel-sandbox) (println :ok)"
Expected: :ok
Step 4: Commit
git add src/com/getorcha/link/excel_sandbox.clj
git commit -m "feat: add supplemental StAX parsing for merged regions and named ranges"
Task 4
Files:
src/com/getorcha/link/excel_sandbox.clj
Step 1: Add the -main function
Add at the end of the file:
(defn- run-all-features
  "Exercise all Excel reading features on the given file and print results."
  [^String path]
  (let [file (File. path)
        opts (ReadingOptions. true false)
        zip (ZipFile. file)]
    (try
      (with-open [wb (ReadableWorkbook. file opts)]
        (println "=== SHEETS ===")
        (let [sheet-names (sheets wb)]
          (prn sheet-names)
          (println "\n=== SUMMARY ===")
          (prn (summary wb))
          (println "\n=== FIRST 5 ROWS (values) ===")
          (when-let [first-name (first sheet-names)]
            (let [rows (read-sheet-rows wb first-name false)]
              (doseq [row (take 5 rows)]
                (prn row))))
          (println "\n=== FIRST 3 ROWS (metadata) ===")
          (when-let [first-name (first sheet-names)]
            (let [rows (read-sheet-rows wb first-name true)]
              (doseq [row (take 3 rows)]
                (prn row))))
          (println "\n=== NAMED RANGES ===")
          (prn (parse-named-ranges zip))
          (println "\n=== MERGED REGIONS ===")
          ;; Assumes worksheet parts are named sheet1.xml, sheet2.xml, ... in
          ;; workbook order. The authoritative sheet-to-part mapping lives in
          ;; xl/_rels/workbook.xml.rels; sheets stored under other names are
          ;; silently skipped here.
          (doseq [[idx sheet-name] (map-indexed vector sheet-names)]
            (let [entry-name (str "xl/worksheets/sheet" (inc idx) ".xml")
                  regions (parse-merged-regions zip entry-name)]
              (when (seq regions)
                (println (str " " sheet-name ":"))
                (prn regions))))))
      (finally
        (.close zip)))))

(defn -main [& args]
  (if-let [path (first args)]
    (run-all-features path)
    (do (println "Usage: excel-sandbox <path-to-xlsx>")
        (System/exit 1))))
Step 2: Run on JVM with the test file
Run: clj -A:native-excel -M -m com.getorcha.link.excel-sandbox dump/duo-maesn.xlsx
Expected: Output showing sheets, summary, rows (values and metadata), named ranges, merged regions. No exceptions. This validates that FastExcel and the supplemental parsing both work on the JVM before attempting native-image.
Step 3: Commit
git add src/com/getorcha/link/excel_sandbox.clj
git commit -m "feat: add -main entry point with full feature exercise"
Task 5
Step 1: Build the uberjar
Run: clj -T:build-native-excel uber
Expected: target/excel-sandbox.jar created. No errors. The uberjar includes AOT-compiled classes for com.getorcha.link.excel-sandbox plus all dependencies.
Step 2: Verify uberjar runs on JVM
Run: java -jar target/excel-sandbox.jar dump/duo-maesn.xlsx
Expected: Same output as Task 4 Step 2. This confirms the uberjar is correctly assembled with the right main class and all deps.
Step 3: Commit
Nothing to commit. No code changed; this task only verifies the build.
Task 6
Step 1: Run native-image compilation
Run:
/usr/lib/jvm/java-21-graalvm/bin/native-image \
  --features=clj_easy.graal_build_time.InitClojureClasses \
  --no-fallback \
  --report-unsupported-elements-at-runtime \
  -H:+ReportExceptionStackTraces \
  -jar target/excel-sandbox.jar \
  -o target/excel-sandbox
Flags explained:
--features=clj_easy.graal_build_time.InitClojureClasses: auto-initializes Clojure classes at build time (from graal-build-time lib)
--no-fallback: fail if native-image can't compile everything (no JVM fallback)
--report-unsupported-elements-at-runtime: defer unsupported element errors to runtime rather than failing the build
-H:+ReportExceptionStackTraces: full stack traces for build errors
Expected: Compilation succeeds (may take 1-3 minutes). Binary at target/excel-sandbox.
If it fails with ServiceLoader/reflection errors:
The most likely issue is javax.xml.stream.XMLInputFactory ServiceLoader lookup in the supplemental parser. FastExcel avoids this by directly instantiating com.fasterxml.aalto.stax.InputFactoryImpl, but our XMLInputFactory/newFactory call uses ServiceLoader.
Fix: Replace (XMLInputFactory/newFactory) in xml-input-factory with a direct instantiation:
(defn- ^XMLInputFactory xml-input-factory []
  (doto (InputFactoryImpl.)
    (.setProperty XMLInputFactory/IS_NAMESPACE_AWARE false)
    (.setProperty XMLInputFactory/SUPPORT_DTD false)))
This requires adding the import [com.fasterxml.aalto.stax InputFactoryImpl] to the ns form; aalto-xml is already on the classpath as a FastExcel dependency. Then rebuild the uberjar and retry native-image.
If it fails with other reflection errors:
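Create resources/META-INF/native-image/reflect-config.json with the classes mentioned in the error. Add resources to the :replace-paths in the :native-excel alias, and to :src-dirs in the b/copy-dir call of the build script so the file is copied into the uberjar. Rebuild uberjar and retry.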
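For orientation, a reflect-config.json is a JSON array with one object per class. The fragment below is only a sketch of the shape: com.example.SomeReflectedClass is a placeholder, to be replaced by whatever class names the native-image error actually reports, and the two flags shown may be broader than a given error needs.

```json
[
  {
    "name": "com.example.SomeReflectedClass",
    "allDeclaredConstructors": true,
    "allPublicMethods": true
  }
]
```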
Step 2: Commit any fixes
git add -p # interactively stage changed hunks; new files (e.g. reflect-config.json) need an explicit git add <path>
git commit -m "fix: resolve native-image compilation issues"
Task 7
Step 1: Run the native binary
Run: ./target/excel-sandbox dump/duo-maesn.xlsx
Expected: Same output as the JVM run (Task 4 Step 2 / Task 5 Step 2). All features work: sheets, summary, cell values, cell metadata (formulas + formats), named ranges, merged regions.
Step 2: Compare JVM vs native output
Run:
java -jar target/excel-sandbox.jar dump/duo-maesn.xlsx > /tmp/jvm-output.txt 2>&1
./target/excel-sandbox dump/duo-maesn.xlsx > /tmp/native-output.txt 2>&1
diff /tmp/jvm-output.txt /tmp/native-output.txt
Expected: No differences (or only minor formatting differences in BigDecimal rendering).
Step 3: Check binary size and startup time
Run:
ls -lh target/excel-sandbox
time ./target/excel-sandbox dump/duo-maesn.xlsx > /dev/null
Expected: Binary roughly 20-50MB. Startup+execution under 1 second (vs several seconds on JVM).
Step 4: Final commit
If any fixes were needed during testing:
git add -p
git commit -m "fix: resolve native-image runtime issues"
Task 8
Files:
docs/plans/2026-03-05-excel-native-image-design.md
Step 1: Add results section to design doc
Append a ## Results section to the design doc with:
Step 2: Commit
git add docs/plans/2026-03-05-excel-native-image-design.md
git commit -m "docs: add native-image prototype results"