If you’ve ever waited hours—or even days—for a full‑wave FDTD simulation to finish, you know how quickly those delays can bottleneck a photonics project. Whether you’re tuning a waveguide or sweeping parameters for a metasurface, speed matters. That’s why more engineers and researchers are turning to GPU acceleration.
This post shows how much faster—and more cost‑effective—your simulations can run with GPUs in Ansys Lumerical. You’ll see benchmark results, learn why GPUs are a natural fit for FDTD, and pick up optimisation tips for both on‑prem and cloud hardware.
Overview of Finite‑Difference Time‑Domain (FDTD) Simulation
FDTD solves Maxwell’s equations directly in the time domain, making it highly flexible for broadband problems and complex 3‑D geometries. The trade‑off is computational cost: fine spatial resolution, long propagation times, and large meshes quickly balloon to billions of Yee cells—especially in integrated photonics and nanophotonics design.
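To make that cost concrete, the method marches the fields forward with a leapfrog update on the Yee grid. In simplified 1D form (source-free, non-dispersive medium) the two half-steps look schematically like

$$
H_y^{n+1/2}\!\left(i+\tfrac12\right) = H_y^{n-1/2}\!\left(i+\tfrac12\right) + \frac{\Delta t}{\mu_0\,\Delta x}\left[E_z^{n}(i+1) - E_z^{n}(i)\right],
$$

$$
E_z^{n+1}(i) = E_z^{n}(i) + \frac{\Delta t}{\varepsilon_0\,\varepsilon_r(i)\,\Delta x}\left[H_y^{n+1/2}\!\left(i+\tfrac12\right) - H_y^{n+1/2}\!\left(i-\tfrac12\right)\right],
$$

repeated for every cell at every time step, which is where the billions of field updates come from.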
Why GPU Computing for FDTD?
GPUs contain thousands of lightweight cores that can update Yee cells concurrently, whereas CPUs rely on a few heavyweight cores. Because each cell update is independent, FDTD maps almost perfectly onto GPU hardware, yielding order‑of‑magnitude speed‑ups and superior energy efficiency.
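To see why the method maps so well onto GPUs, here is a purely illustrative 2D (TMz) Yee-grid update in NumPy (not Lumerical code, and stripped of sources and boundaries): every cell reads only its immediate neighbours, so all cells can be updated concurrently, which is exactly the data-parallel pattern GPUs excel at.

```python
import numpy as np

# Minimal 2D TMz Yee update. Each field update touches only nearest neighbours,
# so every cell can be computed independently and in parallel.
nx, ny = 512, 512
dx = dt = 1.0             # toy values; real solvers enforce the CFL stability limit
eps = np.ones((nx, ny))   # relative permittivity map (the "geometry")

Ez = np.zeros((nx, ny))        # E at integer grid points
Hx = np.zeros((nx, ny - 1))    # H components live on half-integer points
Hy = np.zeros((nx - 1, ny))

def step(Ez, Hx, Hy):
    # H update: each H cell depends only on the two adjacent E cells.
    Hx -= dt / dx * (Ez[:, 1:] - Ez[:, :-1])
    Hy += dt / dx * (Ez[1:, :] - Ez[:-1, :])
    # E update: each interior E cell depends only on the four surrounding H cells.
    Ez[1:-1, 1:-1] += dt / (eps[1:-1, 1:-1] * dx) * (
        (Hy[1:, 1:-1] - Hy[:-1, 1:-1]) - (Hx[1:-1, 1:] - Hx[1:-1, :-1])
    )
    return Ez, Hx, Hy

for _ in range(100):
    Ez, Hx, Hy = step(Ez, Hx, Hy)
```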
Whether you start with a workstation‑class RTX A6000, scale up to a server‑grade A100, or burst to multi‑GPU L40S nodes in the cloud, GPU acceleration can slash turnaround time from hours to minutes.
For supported features and best practices, see the Ansys Knowledge Base article "Getting started with running FDTD on GPU."
Benchmarking Methodology
To provide actionable insights, we benchmarked performance using the Ansys metalens example project for Lumerical (~0.85 billion Yee cells), which is attached to this page for reference. This model represents a realistic, computationally demanding scenario. Settings were kept identical across all runs, with the auto-shutoff criterion triggering when the field energy decayed to 10⁻⁵, so the comparisons reflect hardware capability alone.
We evaluated multiple hardware platforms reflecting diverse user scenarios, ranging from local workstations and enterprise HPC clusters to scalable cloud solutions.
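As a sketch of how such a run can be scripted for repeatability, assuming a local Ansys Lumerical install with its Python API (lumapi), the consistent auto-shutoff setup looks roughly like the following; the file name is a placeholder and property names can vary slightly between releases, so check the FDTD solver object in your version.

```python
import lumapi  # ships with Ansys Lumerical; add its python directory to sys.path if needed

# Sketch: load the benchmark project, apply the same auto-shutoff criterion used
# in our benchmarks, and run. "metalens_benchmark.fsp" is a hypothetical local
# copy of the attached file; property names are as found in recent releases.
fdtd = lumapi.FDTD(hide=True)
fdtd.load("metalens_benchmark.fsp")
fdtd.setnamed("FDTD", "use early shutoff", True)
fdtd.setnamed("FDTD", "auto shutoff min", 1e-5)   # stop when field energy decays to 1e-5
fdtd.run()
fdtd.close()
```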
Hardware Configurations Tested
- CPU Cluster – 63‑core Intel Xeon (7 × 9 MPI ranks), 241 GiB RAM, CentOS 7.9.
- Workstation GPU – RTX A6000, 48 GiB VRAM, 111 GiB RAM, Windows 11.
- Server GPU – A100 80 GB PCIe, 500 GiB RAM, Rocky Linux 8.9.
- Burst Cloud – 4 × L40S, 739 GiB RAM, 200 Gb/s fabric.
Benchmark Results
| Configuration | Wall Time | Speed‑Up vs CPU | Throughput (Mnodes/s) | Peak GPU Memory |
| --- | --- | --- | --- | --- |
| 63× CPU Cluster | 62 min | — | 1 985 | — |
| 1× RTX A6000 | 25 min | 2.5× | 7 277 | 32 GiB |
| 1× A100 80 GB | 12 min | 5.2× | 13 200 | 32 GiB |
| 4× L40S (Burst) | 9 min | 7.1× | 29 074 | 8 GiB / GPU |
Key Observations
- Cloud‑scale GPUs finish in under 10 minutes, turning an overnight task into a coffee‑break job.
- A single A100 delivers >5× speed‑up over a 63‑core cluster.
- Solver throughput scales nearly linearly from one to four GPUs.
Technical Optimisation Tips
- Balance MPI ranks and threads. When launching simulations, aim for the right mix of MPI ranks (processes) and threads per rank: too many ranks can cause memory bottlenecks, while too few may leave the hardware underused (see the sketch after this list).
- Reduce meshing overhead. Meshing runs on the CPU before the solve begins; placing the scratch directory on a fast NVMe SSD rather than slower storage can cut this step by roughly 30%.
- Keep benchmark criteria consistent. Use the same auto-shutoff threshold (e.g., 10⁻⁵) or a fixed simulation time when comparing hardware, so the performance data stays accurate and fair.
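As a rough sketch of one way to script that rank/thread balance on a Linux node: the engine binary name, its install path, and the thread flag all vary by Lumerical release, so treat everything below as placeholders to adapt to your own setup.

```python
import os
import subprocess

# Hypothetical launcher: pick an MPI rank / thread split that fills the node
# without oversubscribing it. ENGINE path and the "-t" thread flag are
# assumptions; check the engine's help output for your Lumerical version.
ENGINE = "/opt/lumerical/v242/bin/fdtd-engine-ompi-lcl"   # assumed install path
PROJECT = "metalens_benchmark.fsp"                        # hypothetical project file

cores = os.cpu_count() or 8
ranks = 8                          # a modest rank count keeps per-rank memory in check
threads = max(1, cores // ranks)   # remaining cores become threads within each rank

cmd = ["mpirun", "-n", str(ranks), ENGINE, "-t", str(threads), PROJECT]
print("Launching:", " ".join(cmd))
subprocess.run(cmd, check=True)
```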
Advantages of Ansys Cloud Burst
For an in-depth overview of how Ansys Lumerical Burst works, including queue management, licensing, and job submission, see the official Ansys documentation: Ansys Lumerical Burst – How it Works.
- Rapid Scalability: Instantly scale simulations across multiple GPUs without capital expenditures.
- Flexible Licensing: Leverage existing FDTD_Solutions_engine licenses seamlessly in the cloud.
- High-Performance Infrastructure: NVMe storage and 200 Gb/s network connections support heavy computational tasks.
- Easy Accessibility: Access cloud resources securely via browser-based interfaces, eliminating local GPU driver management.
- Massive Cost Savings: In real-world testing, the GPU configurations ran at significantly lower cost per simulation, often just a few dollars, compared with CPU runs that can cost an order of magnitude more.
Next Steps
Ready to try GPU acceleration yourself?
- Book a personalised demo with Ozen Engineering.
The benchmark file used in these tests is available for download here (file):
Ozen Engineering Expertise
Ozen Engineering Inc. brings extensive consulting expertise in CFD, FEA, optics, photonics, and electromagnetic simulation to a wide range of engineering projects, addressing complex challenges such as antenna design, signal integrity, electromagnetic interference (EMI), and electric motor analysis with Ansys software.
We offer support, mentoring, and consulting services to enhance the performance and reliability of your electronics systems. Trust our proven track record to accelerate projects, optimize performance, and deliver high-quality, cost-effective results. For more information, please visit https://ozeninc.com.
If you want to learn more about our consulting services, please visit: https://www.ozeninc.com/consulting/
CFD: https://www.ozeninc.com/consulting/cfd-consulting/
FEA: https://www.ozeninc.com/consulting/fea-consulting/
Optics: https://www.ozeninc.com/consulting/optics-photonics/
Photonics: https://www.ozeninc.com/consulting/optics-photonics/
Electromagnetic Simulations: https://www.ozeninc.com/consulting/electromagnetic-consulting/
Thermal Analysis & Electronics Cooling: https://www.ozeninc.com/consulting/thermal-engineering-electronics-cooling/