FULLY HOMOMORPHIC ENCRYPTION

Compute on encrypted data. No decryption. No exposure. No trust required.

FHE lets a server run computations directly on ciphertext. The data is never decrypted, never seen, never reconstructable, yet the result, when decrypted by the user, is identical to the result of running the same computation on plaintext.

THE PRIMITIVE

What is Fully Homomorphic Encryption?

A cryptographic scheme where operations on ciphertext correspond to operations on the underlying plaintext.

The defining property

ENCRYPT            Enc(x)       user-side
COMPUTE            f(Enc(x))    server-side
ENCRYPTED RESULT   Enc(f(x))    server output
DECRYPT            f(x)         user-side

Any function f (a matrix multiply, an attention layer, a similarity score, a SQL predicate) evaluates on encrypted inputs and produces an encrypted output that decrypts to the correct plaintext answer.

Encrypt locally

The user's secret key never leaves their device. Plaintext is encrypted before it touches the network.

Compute on ciphertext

The server runs additions and multiplications directly on encrypted values. The math works under the encryption.

Decrypt only the result

Only the user can decrypt the encrypted output. The server never learns the input, intermediates, or answer.
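
A minimal sketch of the whole loop, using a toy LWE-style additively homomorphic scheme. Everything below is illustrative: toy parameters with no real security, and not Lattica's scheme. It exists only to show the defining property end to end, the server adds ciphertexts without the key, and the user still decrypts the right answer.

    import random

    q     = 2**32           # ciphertext modulus
    n     = 64              # secret dimension (toy-sized; real schemes use far more)
    t     = 256             # plaintext modulus
    delta = q // t          # scaling factor that keeps the message above the noise

    secret = [random.randrange(q) for _ in range(n)]   # never leaves the user's device

    def encrypt(m):
        # User-side: hide delta*m behind an inner product with the secret key
        # plus a small noise term (the LWE assumption).
        a = [random.randrange(q) for _ in range(n)]
        e = random.randrange(-8, 9)
        b = (sum(ai * si for ai, si in zip(a, secret)) + delta * m + e) % q
        return (a, b)

    def add(ct1, ct2):
        # Server-side: component-wise addition of ciphertexts. No key, no plaintext.
        (a1, b1), (a2, b2) = ct1, ct2
        return ([(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q)

    def decrypt(ct):
        # User-side: strip the mask with the secret key, then round away the noise.
        a, b = ct
        noisy = (b - sum(ai * si for ai, si in zip(a, secret))) % q
        return round(noisy / delta) % t

    ct = add(encrypt(17), encrypt(25))   # the server only ever sees ciphertext
    assert decrypt(ct) == 42             # Enc(17) + Enc(25) decrypts to 17 + 25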

vs. other privacy approaches
  • TEEs: trust the hardware vendor
  • MPC: needs every party online and chatty
  • Federated: trust the aggregator
  • FHE: trust no one. Provably.

THE 18-YEAR BOTTLENECK

Why FHE was too slow to use

The math has been known since Gentry's 2009 thesis. The problem was never feasibility; it was performance. For most of the last decade, FHE ran 10,000 to 1,000,000 times slower than the same computation on plaintext.

One encrypted query over a small vector DB (cosine similarity + threshold)

  Plaintext (GPU)               ~5 ms
  FHE on CPU (2018)             ~6 hours
  FHE on CPU (2026)             ~19 minutes
  Lattica FHE via HEAL (GPU)    ~200 ms

Illustrative ranges for an encrypted nearest-neighbor query (cosine similarity with a threshold filter) over a small vector index.

Ciphertexts are huge

A single encrypted number is a high-degree polynomial with thousands of coefficients. A 4-byte weight becomes kilobytes of ciphertext.
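
The arithmetic, with illustrative CKKS-style parameters (our assumptions, not Lattica's production settings). Packing many values into one ciphertext amortizes the cost; a lone value still carries the full overhead.

    ring_dim = 2**14    # polynomial degree N: 16,384 coefficients per ring element
    log_q    = 438      # total bit-width of the coefficient modulus
    polys    = 2        # a fresh ciphertext is a pair of ring elements

    ct_bytes  = polys * ring_dim * log_q // 8   # ~1.8 MB for one ciphertext
    slots     = ring_dim // 2                   # CKKS packs N/2 values per ciphertext
    per_value = ct_bytes / slots                # ~220 bytes per packed value

    print(f"{ct_bytes / 2**20:.1f} MiB per ciphertext, "
          f"~{per_value:.0f} bytes per packed 4-byte value")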

Noise growth & bootstrapping

Every operation adds noise. Once the noise crosses a threshold, an expensive 'bootstrapping' step must reset it; bootstrapping has historically been the slowest operation in FHE.
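
A toy budget model, with made-up numbers, shows why multiplication depth is the enemy:

    FRESH_BUDGET_BITS = 180   # headroom between the noise and the message at encryption
    COST_PER_MULT     = 20    # budget consumed by each multiplication (rescaling)

    budget, depth = FRESH_BUDGET_BITS, 0
    while budget >= COST_PER_MULT:
        budget -= COST_PER_MULT
        depth += 1

    print(f"{depth} multiplications, then bootstrapping must reset the budget")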

CPU-only reference libraries

Open-source FHE libraries were built for CPUs. A single encrypted inference call could take hours: fine for papers, unusable in production.

Non-linear functions are hard

FHE natively supports + and ×. The non-linearities at the heart of AI inference (activations, softmax, comparisons, normalizations) must be approximated by polynomials. Done naively, this destroys accuracy or explodes cost.
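
A quick illustration of the naive failure mode: fit ReLU with ordinary least squares over a wide, un-normalized range. Function, range, and degrees here are illustrative choices.

    import numpy as np
    from numpy.polynomial import Polynomial

    xs   = np.linspace(-10.0, 10.0, 2001)   # a wide, un-normalized input range
    relu = np.maximum(xs, 0.0)

    for degree in (3, 7, 15):
        fit = Polynomial.fit(xs, relu, deg=degree)   # plain least-squares fit
        err = np.abs(fit(xs) - relu).max()
        print(f"degree {degree:2d}: worst-case error {err:.3f}")

    # The error shrinks slowly with degree, while under FHE every extra degree
    # means more ciphertext multiplications and more noise to manage.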

WHY LATTICA

What changed

Lattica rebuilt the FHE stack from the kernels up around HEAL, our Homomorphic Encryption Abstraction Layer. Encrypted workloads compile to tensor operations and run on whatever hardware is fastest today (GPU, TPU, FPGA, or FHE-specific ASIC) with no software rewrite.

Classical FHE

  • Built for CPUs, can't exploit modern accelerators
  • Hand-written circuits per workload
  • Sequential, no real batching
  • Locked to one stack, rewrite the world for new hardware
  • Minutes to hours per query

Lattica with HEAL

  • FHE built for acceleration hardware, lowered to tensor ops
  • Compile models and workloads from a high-level SDK
  • Massively parallel, free batching across ciphertexts
  • Hardware-agnostic: GPU today, TPU/FPGA/ASIC tomorrow
  • Sub-second on production hardware

HEAL and acceleration

HEAL, hardware-agnostic FHE

Our Homomorphic Encryption Abstraction Layer, think 'CUDA for FHE'. We compile CKKS and BGV primitives (NTT, key switching, rescaling, bootstrapping) to tensor operations that run on GPUs, TPUs, FPGAs, and FHE-specific ASICs without rewriting the software.

FHE meets tensors

FHE is linear algebra at its core: polynomial rings, NTTs, matrix-style key switching. That makes it a natural fit for tensor hardware. HEAL lowers encrypted workloads to tensor operations so the accelerator does what it is already best at.
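
To make that concrete, here is the workhorse transform, the NTT over Z_q[x]/(x^n + 1), written as nothing more than matrix-vector products and an elementwise multiply. Toy parameters, and the naive O(n²) form rather than the fast butterfly version:

    n, q = 8, 97    # toy ring Z_q[x]/(x^n + 1); q = 1 (mod 2n) so the roots exist

    # psi is a primitive 2n-th root of unity mod q; psi^n = -1 encodes the
    # negacyclic wrap-around of x^n + 1. omega = psi^2 drives the NTT itself.
    psi   = next(x for x in range(2, q) if pow(x, n, q) == q - 1)
    omega = pow(psi, 2, q)

    # The forward transform is literally a matrix: T[i][j] = psi^j * omega^(i*j).
    T = [[pow(psi, j, q) * pow(omega, i * j, q) % q for j in range(n)]
         for i in range(n)]

    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) % q for row in M]

    def negacyclic_mul(a, b):
        # Transform, multiply pointwise, transform back: polynomial multiplication
        # in the FHE ring becomes two matmuls and an elementwise product.
        fa, fb = matvec(T, a), matvec(T, b)
        fc = [x * y % q for x, y in zip(fa, fb)]
        n_inv, psi_inv, omega_inv = pow(n, -1, q), pow(psi, -1, q), pow(omega, -1, q)
        return [n_inv * pow(psi_inv, j, q)
                * sum(fc[i] * pow(omega_inv, i * j, q) for i in range(n)) % q
                for j in range(n)]

    # Sanity check against schoolbook multiplication with x^n = -1:
    a, b = [3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8]
    ref = [0] * n
    for i in range(n):
        for j in range(n):
            k = (i + j) % n
            ref[k] = (ref[k] + (-1) ** ((i + j) // n) * a[i] * b[j]) % q
    assert negacyclic_mul(a, b) == ref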

Free batching

Modern accelerators are massively parallel. We pack many ciphertexts into a single batched computation and process them together at effectively no extra cost, so large models and deep inference graphs scale without a per-item penalty.
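
The shape of the win, sketched with stand-in arrays (placeholders, not real ciphertexts):

    import numpy as np

    batch, n = 256, 512
    q = 2**31 - 1

    cts = np.random.randint(0, 2**16, size=(batch, n)).astype(np.int64)  # stand-ins
    W   = np.random.randint(0, 2**16, size=(n, n)).astype(np.int64)      # shared op

    # One at a time: the accelerator pays launch and memory overhead per item.
    looped = np.stack([ct @ W % q for ct in cts])

    # Batched: the identical work as a single (batch x n) @ (n x n) matmul,
    # the shape accelerators are built to saturate.
    batched = cts @ W % q

    assert np.array_equal(looped, batched)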

Approximations that hold

Carefully bounded polynomial approximations of the activations, comparisons, and normalizations used across AI inference and analytics keep accuracy within fractions of a percent of plaintext.
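
A sketch of the recipe, with an assumed input range and degree: normalize inputs into a fixed interval before encryption, then fit only on that interval.

    import numpy as np
    from numpy.polynomial import Chebyshev

    lo, hi, degree = -8.0, 8.0, 15      # assumed input range and degree
    xs  = np.linspace(lo, hi, 4001)
    sig = 1.0 / (1.0 + np.exp(-xs))

    fit = Chebyshev.fit(xs, sig, deg=degree)   # evaluable with only + and ×
    err = np.abs(fit(xs) - sig).max()
    print(f"degree {degree}: worst-case error {err:.1e} over [{lo}, {hi}]")

The Chebyshev basis keeps the fit stable at higher degrees, and the resulting polynomial evaluates under FHE with additions and multiplications alone.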

10,000×+

Speedup over CPU reference implementations on encrypted AI inference workloads

<1%

Accuracy delta vs. plaintext baselines on production workloads

Zero

Plaintext exposure on Lattica infrastructure, by construction

THE STACK

Built end to end for encrypted data

Every layer, from GPU kernels up to the developer SDK, is designed for one job: making FHE fast enough to ship.

HEAL, Hardware Abstraction

Our 'CUDA for FHE'. CKKS and BGV primitives compile to tensor operations and run on GPU, TPU, FPGA, or FHE-specific ASICs, no software rewrite when the hardware changes.

Compiler

FHE is linear algebra under the hood, a natural match for tensor hardware. The compiler lowers high-level AI models and workloads to batched tensor ops that the accelerator runs in parallel.

Runtime

A scheduler that batches ciphertexts, amortizes bootstrapping, and pipelines accelerator work, as sketched below.
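
A minimal sketch of the amortization idea only, our illustrative model rather than a description of Lattica's actual runtime: when many ciphertexts exhaust their noise budget together, one batched pass refreshes all of them.

    FRESH_BITS, MULT_COST = 180, 20   # illustrative noise-budget numbers

    def run_chain(budgets, levels):
        # Walk a deep multiplication chain; refresh exhausted ciphertexts in bulk.
        passes = 0
        for _ in range(levels):
            needing = [i for i, b in enumerate(budgets) if b < MULT_COST]
            if needing:                 # one batched bootstrap pass for the group
                for i in needing:
                    budgets[i] = FRESH_BITS
                passes += 1
            budgets = [b - MULT_COST for b in budgets]
        return passes

    budgets = [FRESH_BITS] * 256            # 256 ciphertexts in flight
    print(run_chain(budgets, levels=12))    # 1 batched pass, not 256 bootstraps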

Key Management

Secret keys never leave the customer's device. Lattica only ever holds public evaluation keys and ciphertext. Plaintext is never seen, never stored, never reconstructable.