Measuring Encrypted AI and Vector Search with Workload-Level FHE Benchmarks

Key Takeaways

A new community standard for FHE benchmarking has arrived
The HomomorphicEncryption.org community launched a benchmarking suite that measures real application workloads, making it possible to compare FHE implementations on equal footing for the first time.
FHE is ready to be compared, not just celebrated
For years, fragmented benchmarks made performance claims hard to interpret. A shared framework with standardized workloads and public results tables means the field can now have honest, apples-to-apples conversations about what FHE can actually do.
The same performance advantage holds across workload types
Whether classifying encrypted images or running encrypted vector search, Lattica’s compute phase consistently outperforms the reference implementation by orders of magnitude.
Lattica delivers dramatic speedups
In ML Inference, Lattica’s compute phase runs over 3,000x faster than the reference implementation for a small batch and over 60,000x faster for a large batch. In encrypted vector search, it runs about 249x faster.

FHE performance has long been measured across libraries, frameworks, academic projects, compilers, and hardware platforms. Library-specific benchmarks, such as those published for TFHE-rs and OpenFHE, help developers evaluate implementation-level performance across core operations. Compiler and hardware efforts such as HEIR and HERACLES have pushed the FHE community toward more systematic evaluation.

Those efforts are valuable for engineering progress. However, their fragmentation limits comparability across implementations and makes performance claims harder to interpret. Results often differ by workload, scheme, implementation, security parameters, hardware, batching strategy, and what is included in timing.

A new FHE Benchmarking Suite, led by the HomomorphicEncryption.org community, is moving the field closer to application workloads with deployment-relevant metrics. It defines standardized, application-driven workloads and public result tables for evaluating encrypted computation across implementations and platforms.

The first workloads are:

ML inference - encrypted classification over MNIST images
Fetch-by-Similarity - encrypted vector search
Zn multiplication - encrypted modular integer multiplication

Apples to Apples

The new suite is a community-wide, workload-level benchmarking effort. It defines shared workloads, common reporting fields, and public results across implementations.

That matters because workload-level benchmarks answer a different question than primitive-level benchmarks. Instead of measuring isolated operations, they help application teams evaluate how FHE behaves in more realistic settings, including total latency, compute time, communication, memory, key sizes, encrypted input and output sizes, and quality metrics.
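As a rough illustration of what a workload-level result carries, the sketch below models one result row with the fields listed above. The class name, field names, and sample values are hypothetical and are not taken from the suite's actual schema.

```python
from dataclasses import dataclass


@dataclass
class WorkloadResult:
    """One hypothetical result row; field names are illustrative, not the suite's schema."""
    workload: str
    total_latency_s: float        # end-to-end wall-clock time
    compute_s: float              # server-side homomorphic compute only
    communication_bytes: int      # data moved between client and server
    peak_memory_bytes: int
    key_bytes: int
    encrypted_input_bytes: int
    encrypted_output_bytes: int
    quality: float                # e.g. classification accuracy in [0, 1]

    def compute_share(self) -> float:
        """Fraction of end-to-end latency spent in homomorphic compute."""
        return self.compute_s / self.total_latency_s


# Made-up values, used only to exercise the class.
example = WorkloadResult("example", 4.0, 1.0, 10**6, 10**9, 10**7, 10**5, 10**4, 0.98)
print(f"compute share: {example.compute_share():.0%}")
```

Keeping compute time and total latency as separate fields is what lets a reader tell a compute-phase speedup apart from an end-to-end one.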

Benchmarking Encrypted AI and Vector Search

Lattica contributed to two of the suite’s application-level workloads: ML Inference and Fetch-by-Similarity.

Lattica’s Encrypted Compute Speedup vs. Reference

Chart: compute speedup vs. reference, log scale.

ML Inference, batch of 10,000 image queries - 60,836x
ML Inference, batch of 1,000 image queries - 31,160x
ML Inference, batch of 100 image queries - 3,087x
Fetch-by-Similarity, search over a 50,000 record DB - 249x

In the ML Inference benchmark, Lattica’s Small Batch result covers 100 encrypted inputs with 205 ms of server-side homomorphic compute time, and its Medium Batch result covers 1,000 encrypted inputs with the same compute time. In other words, Lattica processes 10x more encrypted image queries in approximately the same compute time, thanks to GPU parallelism. Relative to the reference implementation, that is over 3,000x faster for a Small Batch and over 31,000x faster for a Medium Batch. The Large Batch result covers 10,000 encrypted inputs with 1.05 seconds of server-side homomorphic compute time, over 60,000x faster than the reference.

Runtime Added for 900 Additional Encrypted Image Queries

Reference - 95.914 minutes
Lattica - 0 minutes

About 96 minutes eliminated!

These are compute-phase speedups. They refer only to the server’s homomorphic computation phase and do not include other end-to-end stages such as client-side preparation, encryption, network transfer, and decryption.

Lattica’s Fetch Small result in the Fetch-by-Similarity workload shows the same pattern in encrypted vector search. Lattica completes the encrypted compute phase in 285 ms, compared with 71 seconds (about 1.2 minutes) for the reference implementation. That makes Lattica approximately 249x faster on the compute phase. As with ML Inference, the full end-to-end runtime includes additional stages such as setup, encryption, data movement, and decryption.
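The compute-phase figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below multiplies each quoted Lattica compute time by its quoted speedup to get an implied reference compute time, then derives the runtime the reference adds for 900 extra image queries and the Fetch-by-Similarity ratio. The variable and function names are illustrative, and because the published times and speedups are rounded, the derived values are approximate.

```python
# Figures quoted in the text: Lattica compute time (seconds) and speedup vs. reference.
LATTICA_COMPUTE_S = {"small": 0.205, "medium": 0.205, "large": 1.05}
SPEEDUP = {"small": 3_087, "medium": 31_160, "large": 60_836}


def implied_reference_seconds(batch: str) -> float:
    """Reference compute time implied by Lattica's time and the quoted speedup."""
    return LATTICA_COMPUTE_S[batch] * SPEEDUP[batch]


ref = {batch: implied_reference_seconds(batch) for batch in SPEEDUP}

# Going from 100 to 1,000 queries adds this much reference compute time.
added_minutes = (ref["medium"] - ref["small"]) / 60

# Fetch-by-Similarity: reference 71 s vs. Lattica 285 ms.
fetch_speedup = 71 / 0.285

for batch, seconds in ref.items():
    print(f"{batch}: implied reference compute ~{seconds / 60:.1f} min")
print(f"added for 900 extra queries: ~{added_minutes:.1f} min")
print(f"fetch-by-similarity speedup: ~{fetch_speedup:.0f}x")
```

The implied reference time grows roughly 10x with each 10x increase in batch size, which is consistent with the reference processing queries one after another while Lattica's compute time stays nearly flat between the Small and Medium batches.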

Remote execution is built into the benchmark design for workloads that reflect real client/server FHE deployments. In both ML Inference and Fetch-by-Similarity, the client-side flow prepares keys and encrypted inputs, the server performs computation over ciphertexts, and the client decrypts the result. This makes the benchmark more representative of deployed encrypted compute systems, where performance depends not only on the encrypted computation itself, but also on data movement, setup, encryption, and decryption.
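To make the distinction between compute time and end-to-end time concrete, here is a minimal timing sketch of the client/server flow described above. It is not the suite's harness: `run_flow`, `timed`, and the phase names are hypothetical, the stand-in functions perform no cryptography at all, and network transfer is not modeled.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_flow(keygen, encrypt, compute, decrypt, query) -> Tuple[Any, Dict[str, float]]:
    """Time each phase of a client -> server -> client round trip."""
    timings: Dict[str, float] = {}
    keys, timings["setup"] = timed(keygen)                     # client
    ct, timings["encrypt"] = timed(encrypt, keys, query)       # client
    ct_out, timings["compute"] = timed(compute, ct)            # server: never receives keys
    result, timings["decrypt"] = timed(decrypt, keys, ct_out)  # client
    timings["total"] = sum(timings.values())
    return result, timings


# Placeholder phases with NO cryptography, only to exercise the harness.
result, timings = run_flow(
    keygen=lambda: "placeholder-key",
    encrypt=lambda keys, q: ("ct", q),
    compute=lambda ct: ("ct", ct[1] * 2),
    decrypt=lambda keys, ct: ct[1],
    query=21,
)
print(result, {name: f"{secs:.6f}s" for name, secs in timings.items()})
```

A compute-phase speedup compares only the `compute` entry between two implementations, while an end-to-end comparison uses `total` plus whatever data movement the deployment adds.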

Join Us!

Lattica’s contribution to this new benchmarking initiative reflects where we see the market heading: privacy-preserving AI, private retrieval, secure vector search, and data collaboration without exposing sensitive inputs.

Shared measurement gives the FHE ecosystem a stronger foundation for comparison. It helps developers understand implementation tradeoffs, helps application teams evaluate feasibility, and helps the broader market move from performance claims to measurable results.

The suite is open and public. We encourage teams working on FHE implementations to run their systems against the workloads and contribute results.
