Excel Native-Image Prototype Design

Goal

Determine whether an xlsx reader can run inside a GraalVM native-image binary, using FastExcel (not Apache POI) as the primary reading library.

Research findings

Apache POI: possible but painful

FastExcel reader: strong fit

Architecture

┌─────────────────────────────────────────────┐
│          excel_sandbox.clj (-main)          │
├───────────────────┬─────────────────────────┤
│  FastExcel Reader │  Supplemental StAX XML  │
│  (values, formulas│  (merged regions,       │
│   formats, sheets)│   named ranges)         │
├───────────────────┴─────────────────────────┤
│  commons-compress ZipFile (shared)          │
├─────────────────────────────────────────────┤
│  aalto-xml (StAX)  │  javax.xml.stream      │
└─────────────────────────────────────────────┘

FastExcel handles cell values, formulas, format strings, sheet enumeration, and streaming row iteration.

Supplemental StAX handles merged regions (<mergeCells> in sheet XML) and named ranges (<definedNames> in workbook.xml) by parsing the same xlsx zip with javax.xml.stream.XMLInputFactory (JDK built-in, zero reflection).
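The merged-region half of the supplemental parser could look roughly like the following. This is a minimal sketch rather than the prototype's actual code: the class name MergedRegionSketch and the method mergedRegions are hypothetical, the input is an in-memory string standing in for a sheet XML part, and disabling DTDs and external entities is a hardening assumption that the design above does not mention.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of FastExcel or of the prototype source.
public class MergedRegionSketch {

    /** Collects the ref attribute of every <mergeCell> element in a sheet XML stream. */
    public static List<String> mergedRegions(InputStream sheetXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Assumption: xlsx parts need neither DTDs nor external entities, so turn both off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<String> refs = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
        try {
            while (reader.hasNext()) {
                // Only start tags named mergeCell matter; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "mergeCell".equals(reader.getLocalName())) {
                    String ref = reader.getAttributeValue(null, "ref");
                    if (ref != null) {
                        refs.add(ref);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return refs;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<sheetData/>"
            + "<mergeCells count=\"2\">"
            + "<mergeCell ref=\"A1:B2\"/><mergeCell ref=\"C3:D4\"/>"
            + "</mergeCells>"
            + "</worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(mergedRegions(in)); // prints [A1:B2, C3:D4]
    }
}
```

Because the reader only inspects start tags, the potentially large <sheetData> block is streamed past without building any object tree.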

DSL features covered

Feature          Source             Method
---------------  -----------------  ----------------------------------------------------
Cell values      FastExcel          Cell.getValue/getText/asNumber/asBoolean
Formulas         FastExcel          Cell.getFormula()
Format strings   FastExcel          Cell.getDataFormatString() (requires ReadingOptions)
Sheet names      FastExcel          ReadableWorkbook.getSheets()
Sheet metadata   FastExcel          Iterate rows, count, read first row
Merged regions   Supplemental StAX  Parse <mergeCell ref="..."/> from sheet XML
Named ranges     Supplemental StAX  Parse <definedName> from workbook.xml
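For the named-ranges feature, the StAX pass over workbook.xml might be sketched as follows. The class DefinedNameSketch and the method definedNames are hypothetical names chosen for illustration. The sketch also assumes workbook-scoped names only: a sheet-scoped name (one carrying a localSheetId attribute) that repeats an earlier name would overwrite the earlier entry in this simplified map.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of FastExcel or of the prototype source.
public class DefinedNameSketch {

    /** Maps each <definedName name="..."> to its text content, in document order. */
    public static Map<String, String> definedNames(InputStream workbookXml)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        Map<String, String> names = new LinkedHashMap<>();
        XMLStreamReader reader = factory.createXMLStreamReader(workbookXml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definedName".equals(reader.getLocalName())) {
                    // Read the attribute first: getElementText() moves the cursor
                    // forward to the matching end tag.
                    String name = reader.getAttributeValue(null, "name");
                    String refersTo = reader.getElementText().trim();
                    if (name != null) {
                        names.put(name, refersTo);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<definedNames>"
            + "<definedName name=\"Totals\">Sheet1!$A$1:$B$2</definedName>"
            + "<definedName name=\"TaxRate\">Config!$B$1</definedName>"
            + "</definedNames>"
            + "</workbook>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(definedNames(in)); // prints {Totals=Sheet1!$A$1:$B$2, TaxRate=Config!$B$1}
    }
}
```

Unlike the merged-region case, the payload here is element text rather than an attribute, which is why the sketch uses getElementText().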

Dependencies

Dependency                   Version  Why
---------------------------  -------  ------------------------------
org.clojure/clojure          1.12.0   Runtime
org.dhatim/fastexcel-reader  0.19.0   Cell values, formulas, formats

Transitive: aalto-xml 1.3.4, commons-compress 1.28.0.

Build pipeline

  1. Define a :native-excel alias in deps.edn, using :replace-deps.
  2. AOT-compile the entry namespace.
  3. Build an uberjar with tools.build.
  4. Compile the uberjar with /usr/lib/jvm/java-21-graalvm/bin/native-image.
  5. Test the resulting binary against dump/duo-maesn.xlsx.

Native-image concerns

Separate zip access

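FastExcel's OPCPackage is package-private, so the supplemental parser cannot reuse the archive handle that FastExcel already holds. Instead, it opens its own ZipFile on the same file. This is acceptable: both handles only read the archive, and opening a second ZipFile is cheap.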
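A second, independent handle on the same file could be opened along these lines. This is a sketch under stated assumptions: it uses java.util.zip.ZipFile from the JDK so that it runs without extra dependencies, whereas the architecture above lists the commons-compress ZipFile. The class SeparateZipSketch, the method readPart, and the tiny stand-in archive built in main are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; the prototype may use the commons-compress ZipFile instead.
public class SeparateZipSketch {

    /** Opens its own read-only handle on the archive and returns one part as bytes. */
    public static byte[] readPart(Path xlsx, String partName) throws IOException {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("Missing part: " + partName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in archive so the sketch runs without a real workbook.
        Path tmp = Files.createTempFile("sketch", ".xlsx");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            out.putNextEntry(new ZipEntry("xl/workbook.xml"));
            out.write("<workbook/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        byte[] part = readPart(tmp, "xl/workbook.xml");
        System.out.println(new String(part, StandardCharsets.UTF_8)); // prints <workbook/>
        Files.delete(tmp);
    }
}
```

The handle is closed as soon as the part has been read, so it never outlives the parse. In a real workbook the sheet part names would normally be resolved through xl/_rels/workbook.xml.rels rather than hard-coded.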