NVIDIA Mellanox MQM9790-NS2F InfiniBand Switch in Action: Low-Latency Interconnect Optimization

April 13, 2026


NVIDIA Mellanox MQM9790-NS2F InfiniBand Switch in Action: Low-Latency Interconnect Optimization for RDMA/HPC/AI Clusters

In AI training, high-performance computing (HPC) simulations, and large-scale distributed storage, network latency and bandwidth often dictate the upper limit of cluster efficiency. To help organizations break through this bottleneck, the MQM9790-NS2F InfiniBand switch from NVIDIA Mellanox is becoming a core interconnect component in many AI and HPC deployments. This article walks through a real-world upgrade of a large-scale AI training cluster, illustrating how this switch delivers on low-latency RDMA networks and measurable performance gains.

Background & Challenge: From Thousand-GPU to Ten-Thousand-GPU Network Pressure

A leading research institution previously operated a thousand-GPU cluster for large language model training and weather simulation. As model parameters grew from tens of billions to hundreds of billions, the existing 200Gb/s HDR InfiniBand network began experiencing congestion and rising communication overhead. Cross-node All-Reduce operations took significantly longer, and GPUs frequently idled while waiting for network transfers. Architects urgently needed a solution offering higher port density, finer-grained load balancing, and full compatibility with existing RDMA infrastructure.

After thorough evaluation, the team selected an NDR-grade InfiniBand fabric built around the NVIDIA Mellanox MQM9790-NS2F. The switch provides 64 NDR 400Gb/s ports through 32 OSFP cages, matching the throughput demands of next-generation GPU servers.

Solution & Deployment: NDR Fabric + Lossless RDMA Network

In the new design, each GPU server is equipped with dual-port ConnectX-7 adapters, uplinked to two leaf switches. The fabric is built from MQM9790-NS2F 64-port 400Gb/s NDR OSFP switches arranged in a two-layer, non-blocking Fat-Tree (Clos) topology. Adaptive routing and congestion control are enabled, and native InfiniBand RDMA moves data directly from GPU memory to remote GPU memory, bypassing CPU and software-stack overhead.
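The article does not spell out how the fabric was sized. As a rough illustration only, the Python sketch below estimates leaf and spine counts for a non-blocking two-tier Fat-Tree built from 64-port switches. The function name, the even down/up port split per leaf, and the example endpoint count are illustrative assumptions, not figures from this deployment.

    import math

    def size_two_tier_fat_tree(endpoints: int, radix: int = 64) -> dict:
        """Rough sizing for a non-blocking two-tier (leaf/spine) fat-tree.

        Assumes each leaf splits its ports evenly between servers and spines.
        radix defaults to the 64 NDR 400Gb/s ports of an MQM9790-NS2F.
        """
        down_per_leaf = radix // 2                  # server-facing ports per leaf
        max_endpoints = radix * down_per_leaf       # 64 * 32 = 2048 at full scale
        if endpoints > max_endpoints:
            raise ValueError("requires a three-tier topology or a larger radix")
        leaves = math.ceil(endpoints / down_per_leaf)
        spines = math.ceil(leaves * down_per_leaf / radix)  # spine ports must absorb every leaf uplink
        return {"leaves": leaves, "spines": spines, "max_endpoints": max_endpoints}

    # Example: 1,024 dual-port GPU servers -> 2,048 NDR 400Gb/s endpoints
    print(size_two_tier_fat_tree(2048))  # {'leaves': 64, 'spines': 32, 'max_endpoints': 2048}

At full scale this arithmetic tops out at 2,048 endpoints per two-tier fabric; larger clusters would move to a three-tier topology.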

  • Port utilization & compatibility: Existing HDR adapters can operate at reduced speed, protecting prior investments. The MQM9790-NS2F compatibility list covers mainstream GPU servers and storage systems, requiring no driver modifications during deployment.
  • Intelligent operations: Built-in telemetry monitors link errors and congestion in real time, helping teams quickly isolate optical-module or cable issues and sharply reducing mean time to repair (a minimal monitoring sketch follows this list).
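The article does not describe the monitoring tooling in detail. As a minimal sketch of how per-port error counters might be polled from a host, the following assumes the standard perfquery utility from infiniband-diags and its usual "CounterName:....value" output format; the LID, port number, and counter selection are illustrative.

    import re
    import subprocess

    # Counters that typically climb when an optical module or cable is degrading.
    # (Names follow the InfiniBand PortCounters attribute; adjust as needed.)
    ERROR_COUNTERS = {
        "SymbolErrorCounter",
        "LinkErrorRecoveryCounter",
        "LinkDownedCounter",
        "PortRcvErrors",
        "PortXmitDiscards",
    }

    def check_port(lid: int, port: int) -> dict:
        """Query one LID/port with perfquery and return any non-zero error counters."""
        out = subprocess.run(
            ["perfquery", str(lid), str(port)],
            capture_output=True, text=True, check=True,
        ).stdout
        problems = {}
        for line in out.splitlines():
            m = re.match(r"(\w+):\.*(\d+)\s*$", line.strip())
            if m and m.group(1) in ERROR_COUNTERS and int(m.group(2)) > 0:
                problems[m.group(1)] = int(m.group(2))
        return problems

    if __name__ == "__main__":
        suspect = check_port(lid=4, port=1)  # hypothetical LID and port number
        if suspect:
            print("link needs attention:", suspect)

In practice such polling is usually fed into the fabric manager or a time-series database rather than run ad hoc; this only shows the shape of the data the built-in telemetry exposes.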

Results & Benefits: Training Iteration Time Cut by 38%, Network Overhead Drops to 8%

After the upgrade, the institution ran comparative tests on production workloads. In a 100-billion-parameter GPT-style pre-training task, the cluster built on the MQM9790-NS2F InfiniBand switch reduced iteration time from 2.8 seconds to 1.73 seconds, a 38% reduction. Network communication's share of total iteration time fell from 22% to 8%, meaning GPUs spent significantly more time on useful computation. Thanks to SHARPv3 in-network computing inside the NDR switch, All-Reduce bandwidth utilization nearly doubled.
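The benchmark harness itself is not included in the article. As a rough illustration of the communication pattern being measured, here is a minimal PyTorch sketch of a cross-node All-Reduce over the NCCL backend. The NCCL_COLLNET_ENABLE setting (which requests SHARP/CollNet offload when the matching NCCL plugin is installed) and the tensor size are assumptions, not details from the source.

    import os
    import torch
    import torch.distributed as dist

    # Hint NCCL to use in-network (SHARP/CollNet) reductions when the
    # nccl-rdma-sharp plugin is present; otherwise this is ignored.
    # (Exact variables depend on the deployed NCCL/HPC-X versions.)
    os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")

    def main() -> None:
        # Expects the usual torchrun/srun environment (RANK, WORLD_SIZE, MASTER_ADDR, ...).
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        torch.cuda.set_device(local_rank)

        # Stand-in for a gradient bucket: every rank sums the same tensor.
        grads = torch.ones(64 * 1024 * 1024, device="cuda")  # ~256 MB of fp32 "gradients"
        dist.all_reduce(grads, op=dist.ReduceOp.SUM)          # reduction runs over InfiniBand RDMA
        torch.cuda.synchronize()

        if dist.get_rank() == 0:
            print("all-reduce done, element value =", grads[0].item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Timing loops of this pattern at different message sizes is one way to see the effect of in-network reduction on effective All-Reduce bandwidth.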

On the storage side, low-latency NVMe over InfiniBand boosted aggregate read/write bandwidth of the parallel file system by 2.3×. Checkpoint save and restore times shrank from 12 minutes to under 5 minutes. These figures are captured in internal test reports and align with the baseline figures in the MQM9790-NS2F specifications.

Summary & Outlook: NDR Interconnect as the Default Choice for Next‑Gen AI Infrastructure

This case demonstrates that, for large-scale RDMA/HPC/AI clusters, a fabric built on the MQM9790-NS2F InfiniBand switch effectively eliminates network congestion, boosts GPU utilization, and simplifies operations. For architects planning ten-thousand-GPU clusters, the MQM9790-NS2F datasheet is an essential reference for evaluating power, port density, and feature sets. The model is now in volume production; for MQM9790-NS2F pricing and availability, please contact authorized NVIDIA partners. As future workloads drive demand toward 800Gb/s and beyond, the NDR switching platform will continue to play a pivotal role in unlocking compute potential.