Today we’re publicly releasing HEAL: the interface and development suite hardware teams use to plug their accelerators into Lattica’s fully homomorphic encryption (FHE) stack and run real encrypted workloads on it.
Practical FHE is a full-stack problem. The cryptography, the compiler, the runtime, and the silicon all have to line up. HEAL is the contract between Lattica’s stack and the accelerator underneath it, and the toolkit that lets a hardware team go from an empty function stub to a working backend without first becoming CKKS experts.
What HEAL is
A small, well-defined surface area. Functions take pointers to input and output tensors; the host allocates memory and the device executes. On top of that interface sits a JSON transcript format: an ordered sequence of function calls plus the encrypted inputs they operate on. The HEAL runtime walks the transcript and dispatches each call to whichever implementation you’ve wired up.
Tensor-shaped
Integer tensors (int32 / int64) with explicit shapes and strides. Non-contiguous layouts are first-class, so your hardware’s preferred memory pattern isn’t an afterthought.
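To make the shape-and-stride idea concrete, here is a minimal sketch of a strided view over host memory. It is an illustration only, not HEAL's actual tensor struct: `TensorView`, `at`, and `transpose2d` are hypothetical names, and strides are assumed to be counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view type (not HEAL's real definition): a data pointer plus
// an explicit shape and per-dimension strides, measured in elements.
struct TensorView {
    std::int64_t* data;                   // memory owned by the host
    std::vector<std::size_t> shape;       // extent of each dimension
    std::vector<std::ptrdiff_t> strides;  // step per dimension, in elements
};

// Translate a multi-dimensional index into a flat offset using the strides.
inline std::int64_t& at(const TensorView& t, const std::vector<std::size_t>& idx) {
    assert(idx.size() == t.shape.size());
    std::ptrdiff_t offset = 0;
    for (std::size_t d = 0; d < idx.size(); ++d) {
        assert(idx[d] < t.shape[d]);
        offset += static_cast<std::ptrdiff_t>(idx[d]) * t.strides[d];
    }
    return t.data[offset];
}

// A transposed 2-D view swaps shape and strides; the data itself never moves.
inline TensorView transpose2d(const TensorView& t) {
    assert(t.shape.size() == 2);
    return TensorView{t.data,
                      {t.shape[1], t.shape[0]},
                      {t.strides[1], t.strides[0]}};
}
```

A row-major 2x3 buffer would be described with shape `{2, 3}` and strides `{3, 1}`. Its transpose is the same pointer with shape `{3, 2}` and strides `{1, 3}`, which is the kind of non-contiguous layout a backend can consume without a copy.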
Transcript-driven
Workloads are recorded as JSON transcripts. They are reproducible and inspectable, and each function can be configured to dispatch to a specific implementation.
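The dispatch model can be sketched roughly as follows. This is not the HEAL runtime or its transcript schema: `Call`, `Registry`, and `run_transcript` are hypothetical names, the transcript is assumed to be already parsed from JSON into memory, and the default implementation name `"cpu"` is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::vector<std::int64_t>;
using Kernel = std::function<void(const Buffer& in, Buffer& out)>;

// Hypothetical in-memory form of one transcript entry after JSON parsing.
struct Call {
    std::string function;  // which function to run
    std::size_t in;        // index of the input buffer
    std::size_t out;       // index of the output buffer
};

// Maps each function name to its available implementations and remembers
// which one is selected for that function.
class Registry {
public:
    void add(const std::string& function, const std::string& impl, Kernel k) {
        impls_[function][impl] = std::move(k);
    }
    // Per-function choice; functions without a choice fall back to "cpu".
    void select(const std::string& function, const std::string& impl) {
        selected_[function] = impl;
    }
    const Kernel& resolve(const std::string& function) const {
        auto choice = selected_.find(function);
        const std::string impl =
            (choice == selected_.end()) ? std::string("cpu") : choice->second;
        auto by_name = impls_.find(function);
        if (by_name == impls_.end())
            throw std::runtime_error("unknown function: " + function);
        auto kernel = by_name->second.find(impl);
        if (kernel == by_name->second.end())
            throw std::runtime_error("no '" + impl + "' implementation of " + function);
        return kernel->second;
    }
private:
    std::map<std::string, std::map<std::string, Kernel>> impls_;
    std::map<std::string, std::string> selected_;
};

// Walk the transcript in order and dispatch each call to the selected kernel.
void run_transcript(const std::vector<Call>& transcript, const Registry& reg,
                    std::vector<Buffer>& buffers) {
    for (const Call& c : transcript) {
        assert(c.in < buffers.size() && c.out < buffers.size());
        reg.resolve(c.function)(buffers[c.in], buffers[c.out]);
    }
}
```

Keeping the selection per function is what lets a backend take over one call at a time while every other call keeps running on the existing implementation.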
Pointer-based
Functions never manage memory. The host allocates, the device executes. The model maps cleanly onto how real accelerator APIs already work.
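A function in this style might look like the sketch below. The name `mod_add` and its signature are assumptions made for illustration, not a function taken from the HEAL specification; the point is only that the function reads and writes caller-owned memory and never allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signature: element-wise (a + b) mod q over caller-owned memory.
// The function neither allocates nor frees. Inputs are assumed to be already
// reduced to the range [0, q).
void mod_add(const std::int64_t* a, const std::int64_t* b, std::int64_t* out,
             std::size_t n, std::int64_t q) {
    assert(a != nullptr && b != nullptr && out != nullptr && q > 0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t s = a[i] + b[i];  // below 2q, safe for q < 2^62
        out[i] = (s >= q) ? s - q : s;       // one conditional subtraction
    }
}

// Host side: allocation happens here, before the call, and the function only
// receives raw pointers into buffers the host already owns.
std::vector<std::int64_t> host_add(const std::vector<std::int64_t>& a,
                                   const std::vector<std::int64_t>& b,
                                   std::int64_t q) {
    assert(a.size() == b.size());
    std::vector<std::int64_t> out(a.size());  // host allocates the output
    mod_add(a.data(), b.data(), out.data(), out.size(), q);
    return out;
}
```

On a real accelerator the pointers would refer to device memory that the host allocated through the vendor's API, but the division of responsibility stays the same.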
What ships in the suite
HEAL is designed so a hardware team can make progress on day one and keep making progress for months without ever being blocked by us.
Reference CPU implementation
Every function HEAL expects, already implemented in C++ as a working reference on CPU. Run a full encrypted transcript end-to-end before writing a single line of hardware code, then swap in your kernels one function at a time.
Unit-test packs
Standardized validation suites on GitHub. Deterministic inputs, expected outputs, per-function pass/fail and performance logs. As you implement a function, the test pack tells you unambiguously: correct or not, and how fast.
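The shape of such a check can be sketched in a few lines. This is not the format of the published test packs: `TestCase`, `TestResult`, and `run_pack` are hypothetical names, and the kernel signature is simplified to one input and one output buffer.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Buffer = std::vector<std::int64_t>;
using Kernel = std::function<void(const Buffer& in, Buffer& out)>;

// Hypothetical test-case layout: a deterministic input and its expected output.
struct TestCase {
    std::string name;
    Buffer input;
    Buffer expected;
};

// Per-case outcome: correct or not, and how long the call took.
struct TestResult {
    std::string name;
    bool passed;
    double micros;
};

// Run one kernel against every case, comparing outputs exactly and timing
// each call with a monotonic clock.
std::vector<TestResult> run_pack(const Kernel& kernel,
                                 const std::vector<TestCase>& pack) {
    assert(kernel != nullptr);  // an empty std::function would throw on call
    std::vector<TestResult> results;
    for (const TestCase& tc : pack) {
        Buffer out(tc.expected.size());  // the harness allocates the output
        const auto t0 = std::chrono::steady_clock::now();
        kernel(tc.input, out);
        const auto t1 = std::chrono::steady_clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
        results.push_back(TestResult{tc.name, out == tc.expected, us});
    }
    return results;
}
```

Because FHE primitives are exact integer computations, the comparison is plain equality rather than a floating-point tolerance.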
The HEAL runtime
The same runtime we use internally to execute encrypted workloads. Point it at a transcript and a backend, and get real, measurable inference numbers on your silicon.
Example transcripts
Real encrypted workloads captured as transcripts. Benchmark against meaningful end-to-end scenarios, not just isolated micro-kernels.
Specification & docs
Every function documented with its semantics, tensor contracts, and memory model. Published openly at healdocs.lattica.ai.
Direct engineering support
We work alongside hardware partners through integration: triaging test failures, profiling kernels, and, where it makes sense, extending HEAL with new high-level functions that map cleanly onto your architecture.
Built so the hardware can shine
The default HEAL surface is intentionally low-level: NTTs, modular arithmetic, basis conversions, the building blocks every FHE backend needs. That’s a fine starting point, but the most interesting accelerators don’t want to be told how to compose those primitives. They want to own the composition.
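For readers new to these primitives, here is a deliberately naive number-theoretic transform over a small prime field. It is a textbook O(n^2) sketch for intuition, not HEAL's NTT function or signature: the names `pow_mod`, `ntt_naive`, and `intt_naive` are hypothetical, unsigned 64-bit values are used for simplicity, and the modulus is assumed to be a prime below 2^32 so that products fit in 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Modular exponentiation by squaring. Assumes q < 2^32 so products fit in u64.
u64 pow_mod(u64 base, u64 exp, u64 q) {
    u64 result = 1 % q;
    base %= q;
    while (exp > 0) {
        if (exp & 1) result = result * base % q;
        base = base * base % q;
        exp >>= 1;
    }
    return result;
}

// Naive forward transform: out[k] = sum_j in[j] * w^(j*k) mod q, where w is
// a primitive n-th root of unity modulo the prime q.
std::vector<u64> ntt_naive(const std::vector<u64>& in, u64 w, u64 q) {
    const std::size_t n = in.size();
    std::vector<u64> out(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        u64 acc = 0;
        for (std::size_t j = 0; j < n; ++j) {
            const u64 twiddle = pow_mod(w, (j * k) % n, q);  // w has order n
            acc = (acc + in[j] % q * twiddle) % q;
        }
        out[k] = acc;
    }
    return out;
}

// Inverse transform: same sum with w^-1, then scale by n^-1. Both inverses
// come from Fermat's little theorem, x^(q-2) mod q, which requires q prime.
std::vector<u64> intt_naive(const std::vector<u64>& in, u64 w, u64 q) {
    const u64 w_inv = pow_mod(w, q - 2, q);
    const u64 n_inv = pow_mod(in.size() % q, q - 2, q);
    std::vector<u64> out = ntt_naive(in, w_inv, q);
    for (u64& x : out) x = x * n_inv % q;
    return out;
}
```

With q = 17 and w = 4 (a primitive 4th root of unity modulo 17), transforming `{1, 2, 3, 4}` gives `{10, 7, 15, 6}`, and the inverse recovers the input. Production backends use O(n log n) butterfly networks and far larger moduli, which is exactly the work an accelerator is built to speed up.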
So HEAL lets hardware teams register their own higher-level functions. Instead of executing key-switch or mod-switch as a long chain of low-level ops dispatched one at a time, a backend can expose a single fused entry point and run the whole operation natively. The transcript calls the high-level function; the runtime dispatches it directly to your kernel. Fewer round-trips, fewer intermediate allocations, and full freedom to schedule the work the way your silicon prefers.
-
Start from the low-level spec
Implement the primitive functions HEAL ships with and you have a correct, working backend.
-
Promote hot paths to fused ops
Replace sequences like key-switch or mod-switch with a single high-level function your hardware executes end-to-end.
-
Keep data resident on device
High-level functions let your backend hold intermediates in its own memory hierarchy instead of round-tripping through the host between every primitive.
-
Co-design new primitives with us
When a fused op maps cleanly onto your architecture, we add it to the spec together so the rest of the stack can target it.
A hardware team isn’t forced to express its design through someone else’s decomposition. The accelerator gets to define what an “operation” means, and the rest of Lattica’s stack adapts to it.
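The difference between a chain of primitives and a fused entry point can be shown with a toy stand-in. Real key-switching is far more involved than this; the example only contrasts two dispatched primitives that pass through a host-allocated intermediate with a single pass that keeps the intermediate local. All names here (`mod_mul`, `mod_add`, `mul_add_unfused`, `mod_mul_add_fused`) are hypothetical, and values are assumed to lie in [0, q) with q below 2^31 so that a*b + c fits in a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using i64 = std::int64_t;

// Primitive 1: element-wise (a * b) mod q.
void mod_mul(const i64* a, const i64* b, i64* out, std::size_t n, i64 q) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i] % q;
}

// Primitive 2: element-wise (a + b) mod q.
void mod_add(const i64* a, const i64* b, i64* out, std::size_t n, i64 q) {
    for (std::size_t i = 0; i < n; ++i) out[i] = (a[i] + b[i]) % q;
}

// Unfused path: two separate calls and one intermediate buffer that the host
// has to allocate and hand from the first primitive to the second.
std::vector<i64> mul_add_unfused(const std::vector<i64>& a,
                                 const std::vector<i64>& b,
                                 const std::vector<i64>& c, i64 q) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<i64> tmp(a.size());  // intermediate visible to the host
    std::vector<i64> out(a.size());
    mod_mul(a.data(), b.data(), tmp.data(), tmp.size(), q);
    mod_add(tmp.data(), c.data(), out.data(), out.size(), q);
    return out;
}

// Fused path: a single entry point computes (a * b + c) mod q in one pass,
// so the product never leaves the kernel.
void mod_mul_add_fused(const i64* a, const i64* b, const i64* c, i64* out,
                       std::size_t n, i64 q) {
    for (std::size_t i = 0; i < n; ++i) out[i] = (a[i] * b[i] + c[i]) % q;
}
```

Both paths produce identical results, which is what lets a transcript switch to the fused call without changing the workload's output. On an accelerator, the fused version is also free to tile, pipeline, or reorder the work however the hardware prefers.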
How a typical integration goes
-
Understand the architecture
Read the core concepts: tensors, transcripts, host/device memory model. A few hours, not weeks.
-
Implement functions
Pick the subset that matters for your hardware. Use the C++ reference as a behavioral spec.
-
Run unit tests
Pull the test pack from GitHub, point it at your build, and validate function-by-function.
-
Execute the runtime
Run a full encrypted workload transcript on your hardware and measure end-to-end performance.
Who this is for
GPU vendors, FPGA shops, and custom-silicon teams who see encrypted AI as a workload worth optimizing for. If you’re building an accelerator and want a concrete, well-scoped way to prove it on FHE workloads, HEAL is the fastest path from idea to measured numbers.
“We didn’t want hardware partners to have to become FHE experts to integrate. HEAL is the contract that makes that unnecessary.”
— Lattica engineering team
Get started
Dig into the architecture on our HEAL page, browse the developer docs, or grab the unit-test packs from GitHub. If you’re a hardware team interested in a deeper integration, talk to us directly. We’d rather work with you than hand you a PDF.