If you’ve ever waited hours—or even days—for a full‑wave FDTD simulation to finish, you know how quickly those delays can bottleneck a photonics project. Whether you’re tuning a waveguide or sweeping parameters for a metasurface, speed matters. That’s why more engineers and researchers are turning to GPU acceleration.
This post shows how much faster—and more cost‑effective—your simulations can run with GPUs in Ansys Lumerical. You’ll see benchmark results, learn why GPUs are a natural fit for FDTD, and pick up optimisation tips for both on‑prem and cloud hardware.
Overview of Finite‑Difference Time‑Domain (FDTD) Simulation
FDTD solves Maxwell’s equations directly in the time domain, making it highly flexible for broadband problems and complex 3‑D geometries. The trade‑off is computational cost: fine spatial resolution, long propagation times, and large meshes quickly balloon to billions of Yee cells—especially in integrated photonics and nanophotonics design.
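To make that cost concrete, the method marches the fields forward with a leapfrog update on the Yee grid. In simplified 1D form (source-free, non-dispersive medium) the two half-steps look schematically like

$$
H_y^{n+1/2}\!\left(i+\tfrac12\right) = H_y^{n-1/2}\!\left(i+\tfrac12\right) + \frac{\Delta t}{\mu_0\,\Delta x}\left[E_z^{n}(i+1) - E_z^{n}(i)\right],
$$

$$
E_z^{n+1}(i) = E_z^{n}(i) + \frac{\Delta t}{\varepsilon_0\,\varepsilon_r(i)\,\Delta x}\left[H_y^{n+1/2}\!\left(i+\tfrac12\right) - H_y^{n+1/2}\!\left(i-\tfrac12\right)\right],
$$

repeated for every cell at every time step, which is where the billions of field updates come from.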
Why GPU Computing for FDTD?
GPUs contain thousands of lightweight cores that can update Yee cells concurrently, whereas CPUs rely on a few heavyweight cores. Because each cell update is independent, FDTD maps almost perfectly onto GPU hardware, yielding order‑of‑magnitude speed‑ups and superior energy efficiency.
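To see why the method maps so well onto GPUs, here is a purely illustrative 2D (TMz) Yee-grid update in NumPy (not Lumerical code, and stripped of sources and boundaries): every cell reads only its immediate neighbours, so all cells can be updated concurrently, which is exactly the data-parallel pattern GPUs excel at.

```python
import numpy as np

# Minimal 2D TMz Yee update. Each field update touches only nearest neighbours,
# so every cell can be computed independently and in parallel.
nx, ny = 512, 512
dx = dt = 1.0             # toy values; real solvers enforce the CFL stability limit
eps = np.ones((nx, ny))   # relative permittivity map (the "geometry")

Ez = np.zeros((nx, ny))        # E at integer grid points
Hx = np.zeros((nx, ny - 1))    # H components live on half-integer points
Hy = np.zeros((nx - 1, ny))

def step(Ez, Hx, Hy):
    # H update: each H cell depends only on the two adjacent E cells.
    Hx -= dt / dx * (Ez[:, 1:] - Ez[:, :-1])
    Hy += dt / dx * (Ez[1:, :] - Ez[:-1, :])
    # E update: each interior E cell depends only on the four surrounding H cells.
    Ez[1:-1, 1:-1] += dt / (eps[1:-1, 1:-1] * dx) * (
        (Hy[1:, 1:-1] - Hy[:-1, 1:-1]) - (Hx[1:-1, 1:] - Hx[1:-1, :-1])
    )
    return Ez, Hx, Hy

for _ in range(100):
    Ez, Hx, Hy = step(Ez, Hx, Hy)
```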
Whether you start with a workstation‑class RTX A6000, scale up to a server‑grade A100, or burst to multi‑GPU L40S nodes in the cloud, GPU acceleration can slash turnaround time from hours to minutes.
For supported features and best practices, see the Ansys Knowledge Base article "Getting started with running FDTD on GPU."
Benchmarking Methodology
To provide actionable insights, we benchmarked performance using the Ansys metalens example project for Lumerical (~0.85 billion Yee cells), which is attached to this page for reference. This model represents a realistic, computationally demanding scenario. Settings were kept identical across all runs, with the auto-shutoff criterion triggering when the field energy decayed to 10⁻⁵, so the comparisons reflect hardware capability alone.
We evaluated multiple hardware platforms reflecting diverse user scenarios, ranging from local workstations and enterprise HPC clusters to scalable cloud solutions.
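As a sketch of how such a run can be scripted for repeatability, assuming a local Ansys Lumerical install with its Python API (lumapi), the consistent auto-shutoff setup looks roughly like the following; the file name is a placeholder and property names can vary slightly between releases, so check the FDTD solver object in your version.

```python
import lumapi  # ships with Ansys Lumerical; add its python directory to sys.path if needed

# Sketch: load the benchmark project, apply the same auto-shutoff criterion used
# in our benchmarks, and run. "metalens_benchmark.fsp" is a hypothetical local
# copy of the attached file; property names are as found in recent releases.
fdtd = lumapi.FDTD(hide=True)
fdtd.load("metalens_benchmark.fsp")
fdtd.setnamed("FDTD", "use early shutoff", True)
fdtd.setnamed("FDTD", "auto shutoff min", 1e-5)   # stop when field energy decays to 1e-5
fdtd.run()
fdtd.close()
```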
Hardware Configurations Tested
- CPU Cluster – 63‑core Intel Xeon (7 × 9 MPI ranks), 241 GiB RAM, CentOS 7.9.
- Workstation GPU – RTX A6000, 48 GiB VRAM, 111 GiB RAM, Windows 11.
- Server GPU – A100 80 GB PCIe, 500 GiB RAM, Rocky Linux 8.9.
- Burst Cloud – 4 × L40S, 739 GiB RAM, 200 Gb/s fabric.
Benchmark Results
| Configuration | Wall Time | Speed‑Up vs CPU | Throughput (Mnodes/s) | Peak GPU Memory |
| --- | --- | --- | --- | --- |
| 63× CPU Cluster | 62 min | — | 1 985 | — |
| 1× RTX A6000 | 25 min | 2.5× | 7 277 | 32 GiB |
| 1× A100 80 GB | 12 min | 5.2× | 13 200 | 32 GiB |
| 4× L40S (Burst) | 9 min | 7.1× | 29 074 | 8 GiB / GPU |
Key Observations
- Cloud‑scale GPUs finish in under 10 minutes, turning an overnight task into a coffee‑break job.
- A single A100 delivers >5× speed‑up over a 63‑core cluster.
- Solver throughput scales nearly linearly from one to four GPUs.
Technical Optimisation Tips
- Balance MPI ranks and threads. When launching simulations, aim for the right mix of MPI ranks (processes) and threads per rank: too many ranks can cause memory bottlenecks, while too few may leave the hardware underused (see the sketch after this list).
- Reduce meshing overhead. Meshing runs on the CPU before the solve begins; placing the scratch directory on a fast NVMe SSD rather than slower storage can cut this step by roughly 30%.
- Keep benchmark criteria consistent. Use the same auto-shutoff threshold (e.g., 10⁻⁵) or a fixed simulation time when comparing hardware, so the performance data stays accurate and fair.
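As a rough sketch of one way to script that rank/thread balance on a Linux node: the engine binary name, its install path, and the thread flag all vary by Lumerical release, so treat everything below as placeholders to adapt to your own setup.

```python
import os
import subprocess

# Hypothetical launcher: pick an MPI rank / thread split that fills the node
# without oversubscribing it. ENGINE path and the "-t" thread flag are
# assumptions; check the engine's help output for your Lumerical version.
ENGINE = "/opt/lumerical/v242/bin/fdtd-engine-ompi-lcl"   # assumed install path
PROJECT = "metalens_benchmark.fsp"                        # hypothetical project file

cores = os.cpu_count() or 8
ranks = 8                          # a modest rank count keeps per-rank memory in check
threads = max(1, cores // ranks)   # remaining cores become threads within each rank

cmd = ["mpirun", "-n", str(ranks), ENGINE, "-t", str(threads), PROJECT]
print("Launching:", " ".join(cmd))
subprocess.run(cmd, check=True)
```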
Advantages of Ansys Cloud Burst
For an in-depth overview of how Ansys Lumerical Burst works, including queue management, licensing, and job submission, see the official Ansys documentation: Ansys Lumerical Burst – How it Works.
- Rapid Scalability: Instantly scale simulations across multiple GPUs without capital expenditures.
- Flexible Licensing: Leverage existing FDTD_Solutions_engine licenses seamlessly in the cloud.
- High-Performance Infrastructure: NVMe storage and 200 Gb/s network connections support heavy computational tasks.
- Easy Accessibility: Access cloud resources securely via browser-based interfaces, eliminating local GPU driver management.
- Massive Cost Savings: In real-world testing, the GPU configurations ran at significantly lower cost per simulation, often just a few dollars, compared with CPU runs that can cost an order of magnitude more.
Next Steps
Ready to try GPU acceleration yourself?
- Book a personalised demo with Ozen Engineering.
The benchmark file used in these tests is available for download here (file):
Ozen Engineering Expertise
Ozen Engineering Inc. brings extensive consulting expertise in CFD, FEA, optics, photonics, and electromagnetic simulation to a wide range of engineering projects, addressing complex challenges such as antenna design, signal integrity, electromagnetic interference (EMI), and electric motor analysis with Ansys software.
We offer support, mentoring, and consulting services to enhance the performance and reliability of your electronics systems. Trust our proven track record to accelerate projects, optimize performance, and deliver high-quality, cost-effective results. For more information, please visit https://ozeninc.com.
If you want to learn more about our consulting services, please visit: https://www.ozeninc.com/consulting/
CFD: https://www.ozeninc.com/consulting/cfd-consulting/
FEA: https://www.ozeninc.com/consulting/fea-consulting/
Optics: https://www.ozeninc.com/consulting/optics-photonics/
Photonics: https://www.ozeninc.com/consulting/optics-photonics/
Electromagnetic Simulations: https://www.ozeninc.com/consulting/electromagnetic-consulting/
Thermal Analysis & Electronics Cooling: https://www.ozeninc.com/consulting/thermal-engineering-electronics-cooling/