Key Takeaways
- A new community standard for FHE benchmarking has arrived
The HomomorphicEncryption.org community launched a benchmarking suite that measures real application workloads, making it possible to compare FHE implementations on equal footing for the first time.
- FHE is ready to be compared, not just celebrated
For years, fragmented benchmarks made performance claims hard to interpret. A shared framework with standardized workloads and public results tables means the field can now have honest, apples-to-apples conversations about what FHE can actually do.
- The same performance advantage holds across workload types
Whether classifying encrypted images or running encrypted vector search, Lattica’s compute phase consistently outperforms the reference implementation by orders of magnitude.
- Lattica delivers dramatic speedups
In ML Inference, Lattica’s compute phase runs over 3,000x faster than the reference implementation for small batches and over 60,000x faster for large batches. In encrypted vector search, the Fetch Small compute phase runs approximately 249x faster than the reference implementation.
FHE performance has long been measured across libraries, frameworks, academic projects, compilers, and hardware platforms. Library-specific benchmarks, such as those published for TFHE-rs and OpenFHE, help developers evaluate implementation-level performance across core operations. Compiler and hardware efforts such as HEIR and HERACLES have pushed the FHE community toward more systematic evaluation.
Those efforts are valuable for engineering progress. However, their fragmentation limits comparability across implementations and makes performance claims harder to interpret. Results often differ by workload, scheme, implementation, security parameters, hardware, batching strategy, and what is included in timing.
A new FHE Benchmarking Suite, led by the HomomorphicEncryption.org community, is moving the field closer to application workloads with deployment-relevant metrics. It defines standardized, application-driven workloads and public result tables for evaluating encrypted computation across implementations and platforms.
The first workloads are:
- ML inference - encrypted classification over MNIST images
- Fetch-by-Similarity - encrypted vector search
- Zn multiplication - encrypted modular integer multiplication
Apples to Apples
The new suite is a community-wide, workload-level benchmarking effort. It defines shared workloads, common reporting fields, and public results across implementations.
That matters because workload-level benchmarks answer a different question than primitive-level benchmarks. Instead of measuring isolated operations, they help application teams evaluate how FHE behaves in more realistic settings, including total latency, compute time, communication, memory, key sizes, encrypted input and output sizes, and quality metrics.
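To make the list of reporting fields concrete, here is a minimal sketch of how one workload-level result could be recorded. The class name, field names, and units are all hypothetical and chosen only for illustration; they are not the suite’s actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkloadResult:
    """Hypothetical result record; names and units are assumptions,
    not the benchmarking suite's real reporting format."""

    workload: str                 # e.g. "ML Inference"
    implementation: str           # which submission produced the result
    total_latency_s: float        # end-to-end time, all stages included
    compute_time_s: float         # server-side homomorphic compute only
    communication_bytes: int      # data moved between client and server
    peak_memory_bytes: int        # memory high-water mark
    key_size_bytes: int           # size of the key material
    encrypted_input_bytes: int    # size of the encrypted inputs
    encrypted_output_bytes: int   # size of the encrypted outputs
    quality: Optional[float] = None  # e.g. classification accuracy

    def non_compute_time_s(self) -> float:
        """Time spent outside the homomorphic compute phase."""
        return self.total_latency_s - self.compute_time_s
```

Keeping total latency and compute time as separate fields is the point of the sketch: a primitive-level benchmark would report only something like the compute figure, while a workload-level result also exposes everything around it.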
Benchmarking Encrypted AI and Vector Search
Lattica contributed to two of the suite’s application-level workloads: ML Inference and Fetch-by-Similarity.
Figure: Lattica’s Encrypted Compute Speedup vs. Reference (compute speedup vs. reference, log scale)
In the ML Inference benchmark, Lattica’s Small Batch result covers 100 encrypted inputs with 205 ms of server-side homomorphic compute time, and its Medium Batch result covers 1,000 encrypted inputs with the same compute time. In other words, Lattica processes 10x more encrypted image queries in approximately the same compute time, thanks to GPU parallelism. Relative to the reference implementation, that is over 3,000x faster for a Small Batch and over 31,000x faster for a Medium Batch. The Large Batch result is over 60,000x faster, covering 10,000 encrypted inputs with 1.05 seconds of server-side homomorphic compute time.
Figure: Runtime Added for 900 Additional Encrypted Image Queries (~96 minutes eliminated)
These are compute-phase speedups. They refer only to the server’s homomorphic computation phase and do not include other end-to-end stages such as client-side preparation, encryption, network transfer, and decryption.
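The ML Inference figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text. The implied reference times are an assumption: they treat each "over N x" speedup as exactly N, so they are rough lower bounds rather than measured reference results.

```python
# Compute-phase figures quoted in the text (times in milliseconds).
BATCHES = {
    "small": {"inputs": 100, "compute_ms": 205, "min_speedup": 3_000},
    "medium": {"inputs": 1_000, "compute_ms": 205, "min_speedup": 31_000},
    "large": {"inputs": 10_000, "compute_ms": 1_050, "min_speedup": 60_000},
}


def per_input_ms(batch):
    """Average server-side compute time per encrypted input."""
    return batch["compute_ms"] / batch["inputs"]


def implied_reference_ms(batch):
    """Reference compute time implied by the quoted speedup.

    Assumption: "over N x" is treated as exactly N, so this is a
    lower bound, not a measured value.
    """
    return batch["compute_ms"] * batch["min_speedup"]


def added_reference_minutes(smaller, larger):
    """Extra implied reference runtime when moving to the larger batch."""
    delta_ms = implied_reference_ms(larger) - implied_reference_ms(smaller)
    return delta_ms / 60_000


if __name__ == "__main__":
    for name, batch in BATCHES.items():
        print(f"{name}: {per_input_ms(batch):.3f} ms per encrypted input")
    extra = added_reference_minutes(BATCHES["small"], BATCHES["medium"])
    print(f"implied extra reference runtime, small to medium: {extra:.1f} min")
```

Under that assumption, going from 100 to 1,000 inputs adds about 95.7 minutes of implied reference compute time, which is in line with the roughly 96 minutes quoted above, while Lattica’s compute time stays at 205 ms.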
Lattica’s Fetch Small result in the Fetch-by-Similarity benchmark shows the same pattern in encrypted vector search. Lattica completes the encrypted compute phase in 285 ms, compared with 71 seconds (about 1.2 minutes) for the reference implementation. That makes Lattica approximately 249x faster on the compute phase. As with ML Inference, the full end-to-end runtime includes additional stages such as setup, encryption, data movement, and decryption.
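The 249x figure follows directly from the two compute times quoted above. A minimal sketch of the calculation, where the helper names are illustrative and the only inputs are the 71 seconds and 285 ms from the text:

```python
import math


def speedup(reference_seconds: float, candidate_seconds: float) -> float:
    """Ratio of reference time to candidate time for the same phase."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return reference_seconds / candidate_seconds


def orders_of_magnitude(ratio: float) -> float:
    """Speedup expressed on a log10 scale."""
    return math.log10(ratio)


REFERENCE_COMPUTE_S = 71.0   # reference implementation, compute phase
LATTICA_COMPUTE_S = 0.285    # Lattica, compute phase (285 ms)

if __name__ == "__main__":
    ratio = speedup(REFERENCE_COMPUTE_S, LATTICA_COMPUTE_S)
    print(f"compute-phase speedup: {ratio:.0f}x")
    print(f"orders of magnitude: {orders_of_magnitude(ratio):.1f}")
```

Both times must describe the same phase for the ratio to mean anything, which is why the comparison here is restricted to server-side compute.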
Remote execution is built into the benchmark design for workloads that reflect real client/server FHE deployments. In both ML Inference and Fetch-by-Similarity, the client-side flow prepares keys and encrypted inputs, the server performs computation over ciphertexts, and the client decrypts the result. This makes the benchmark more representative of deployed encrypted compute systems, where performance depends not only on the encrypted computation itself, but also on data movement, setup, encryption, and decryption.