Accelerate AI and Graphics Performance

To transform their businesses with generative AI, enterprises need to deploy compute resources at greater scale. ASUS offers multiple NVIDIA L40S server systems that shorten time to AI deployment through quicker GPU availability and deliver better performance per dollar.

ASUS is a select NVIDIA OVX server system provider and an experienced, trusted AI-solutions provider, with the knowledge and capabilities to bridge technology gaps and deliver optimized solutions to customers.

Top 3 Reasons to Choose ASUS L40S Server Systems

  • Faster Deployment

    Short lead time

  • Better Price-Performance

    2X better performance than A100

  • Higher Performance

    Powerful AI and graphics


NVIDIA L40S

The NVIDIA L40S GPU, based on the Ada Lovelace architecture, is the most powerful universal GPU for the data center, delivering breakthrough multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications.
  • LLM Fine-Tuning

    4 hrs

    GPT-175B, 860M tokens

  • LLM Inference

    1.1X

    Performance vs. HGX A100

  • AI Inference

    1.5X

    Performance vs. A100 80GB SXM

NVIDIA L40S Specifications

Specification            L40S                         A100 80GB SXM
Best For                 Universal GPU for Gen AI     Highest Perf Multi-Node AI
GPU Architecture         NVIDIA Ada Lovelace          NVIDIA Ampere
FP64                     N/A                          9.7 TFLOPS
FP32                     91.6 TFLOPS                  19.5 TFLOPS
RT Core                  212 TFLOPS                   N/A
TF32 Tensor Core         366 TFLOPS                   312 TFLOPS
FP16/BF16 Tensor Core    733 TFLOPS                   624 TFLOPS
FP8 Tensor Core          1466 TFLOPS                  N/A
INT8 Tensor Core         1466 TOPS                    1248 TOPS
GPU Memory               48 GB GDDR6                  80 GB HBM2e
GPU Memory Bandwidth     864 GB/s                     2039 GB/s
L2 Cache                 96 MB                        40 MB
Media Engines            3 NVENC (+AV1), 3 NVDEC,     0 NVENC, 5 NVDEC,
                         4 NVJPEG                     5 NVJPEG
Power                    Up to 350 W                  Up to 400 W
Form Factor              2-slot FHFL                  8-way HGX
Interconnect             PCIe Gen4 x16: 64 GB/s       PCIe Gen4 x16: 64 GB/s
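
The throughput figures above can be turned into rough efficiency comparisons. Below is a minimal Python sketch that derives FP16 Tensor Core TFLOPS per watt from the table values; dividing peak throughput by maximum board power is our simplifying assumption, since real workloads rarely hold either GPU at its power limit.

```python
# Minimal sketch: derive performance-per-watt from the spec table above.
# Throughput and power values are transcribed from the table; the
# peak-TFLOPS-over-max-power metric is our simplifying assumption.

SPECS = {
    "L40S":          {"fp16_tensor_tflops": 733, "max_power_w": 350},
    "A100 80GB SXM": {"fp16_tensor_tflops": 624, "max_power_w": 400},
}

for gpu, s in SPECS.items():
    perf_per_watt = s["fp16_tensor_tflops"] / s["max_power_w"]
    print(f"{gpu}: {perf_per_watt:.2f} FP16 Tensor TFLOPS per watt")
```

On these numbers the L40S works out to roughly 2.1 TFLOPS per watt versus roughly 1.6 for the A100, the kind of arithmetic that underlies the price-performance claim above.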

NVIDIA L40S for LLM Training

The L40S is a great solution for fine-tuning, training small models, and small- to mid-scale training runs with up to 4K GPUs. The tables below list time-to-train on HGX A100, with L40S and HGX H100 results expressed as expected speedups using Transformer Engine (TE) and FP8; the sketch after the tables shows how to convert these speedups into estimated times.
Fine-Tuning Existing Models (Time to Train 860M Tokens)

                          HGX A100    Expected Speedup w/ TE/FP8
                                      L40S        HGX H100
GPT-40B LoRA (8 GPU)      12 hrs      1.7x        4.4x
GPT-175B LoRA (64 GPU)    6 hrs       1.6x        4.3x


Training Small Models (Time to Train 10B Tokens)

                          HGX A100    Expected Speedup w/ TE/FP8
                                      L40S        HGX H100
GPT-7B (8 GPU)            12 hrs      1.7x        4.4x
GPT-13B (8 GPU)           6 hrs       1.6x        4.3x


Training Foundation Models (Time to Train 300B Tokens)

                          HGX A100    Expected Speedup w/ TE/FP8
                                      L40S        HGX H100
GPT-175B (256 GPU)        64 hrs      1.4x        4.5x
GPT-175B (1K GPU)         16 hrs      1.3x        4.6x
GPT-175B (4K GPU)         4 hrs       1.2x        4.1x
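
Because the tables quote L40S and HGX H100 results as speedup multiples over the HGX A100 baseline, an estimated time-to-train follows by dividing the baseline hours by the speedup. Here is a minimal Python sketch using the foundation-model rows above (the helper function name is our own):

```python
# Minimal sketch: convert the speedup multiples quoted above into
# estimated time-to-train. Baseline hours are the HGX A100 column;
# speedups are the "Expected Speedup w/ TE/FP8" columns.

def estimated_hours(a100_hours: float, speedup: float) -> float:
    """Estimated time-to-train given an A100 baseline and a speedup multiple."""
    return a100_hours / speedup

# GPT-175B, 300B tokens (from the "Training Foundation Models" table).
rows = [
    ("GPT-175B (256 GPU)", 64, 1.4, 4.5),
    ("GPT-175B (1K GPU)",  16, 1.3, 4.6),
    ("GPT-175B (4K GPU)",   4, 1.2, 4.1),
]

for name, a100, l40s_x, h100_x in rows:
    print(f"{name}: A100 {a100} hrs | "
          f"L40S ~{estimated_hours(a100, l40s_x):.1f} hrs | "
          f"H100 ~{estimated_hours(a100, h100_x):.1f} hrs")
```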

Products for Your Solution

ESC8000-E11

8 GPUs, 4U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E11

4 GPUs, 2U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E10

4 GPUs, 2U, Dual-socket 3rd Gen Intel Xeon Scalable CPUs

ESC8000A-E12

8 GPUs, 4U, Dual-socket AMD EPYC 9004 CPUs, PCIe 5.0 switch solution

ESC8000A-E11

8 GPUs, 4U, Dual-socket AMD EPYC 7003 CPUs

ESC4000A-E12

4 GPUs, 2U, Single-socket AMD EPYC 9004 CPU

ESC4000A-E11

4 GPUs, 2U, Single-socket AMD EPYC 7003 and 7002 CPUs