
Deploy VMware Private AI on HGX servers with Broadcom Ethernet Networking

AI and Generative AI (Gen AI) require substantial infrastructure, and tasks like fine-tuning, customization, deployment, and querying can strain resources; without adequate infrastructure, scaling these operations up becomes problematic. In addition, diverse compliance and legal requirements must be met across industries and countries, so Gen AI solutions must ensure access control, proper workload placement, and audit readiness. To address these challenges, Broadcom introduced VMware Private AI to help customers run models next to their proprietary data. By combining innovations from Broadcom and NVIDIA, the two companies aim to unlock the power of AI and unleash productivity with a lower total cost of ownership (TCO).

Our technical white paper, "Deploy VMware Private AI on HGX servers with Broadcom Ethernet Networking," details the end-to-end deployment and configuration, focusing on DirectPath I/O (passthrough) GPUs and Thor 2 NICs with a Tomahawk 5 Ethernet switch. This guide is essential for infrastructure architects, VCF administrators, and data scientists aiming to get optimal performance from their AI models in VCF.

What's Covered in the White Paper?

The white paper provides in-depth guidance on:

  • Broadcom Thor 2 NICs and NVIDIA GPUs: Learn how to effectively integrate Broadcom NICs and NVIDIA GPUs within Ubuntu-based Deep Learning Virtual Machines (DLVMs) in a VMware Cloud Foundation (VCF) environment.
  • Network Configuration: Detailed instructions for configuring Thor 2 Ethernet NICs and Tomahawk 5 switches to enable RoCE (RDMA over Converged Ethernet) with NVIDIA GPUs, ensuring the low-latency, high-throughput communication critical for AI workloads.
  • Benchmark Testing: Step-by-step procedures for running benchmarks with essential collective communication libraries such as NCCL, validating the efficiency of multi-GPU operations (see the NCCL sketch after this list).
  • LLM Inference: Guidance on launching and benchmarking Large Language Model (LLM) inference using NVIDIA Inference Microservices (NIM) and vLLM, demonstrating near bare-metal performance (a minimal vLLM example also follows this list).
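
To make the benchmark step concrete, the sketch below shows one way to drive the nccl-tests all_reduce benchmark across two DLVMs from Python. The hostnames, GPU counts, binary path, and RDMA device name are illustrative assumptions, not values from the white paper:

    # Minimal sketch: run the nccl-tests all_reduce benchmark across two DLVMs
    # via mpirun and report the average bus bandwidth. Hostnames, GPU counts,
    # binary path, and RDMA device name are hypothetical placeholders.
    import re
    import subprocess

    cmd = [
        "mpirun", "--allow-run-as-root",
        "-np", "16",                           # total ranks = 8 GPUs x 2 nodes
        "-H", "dlvm-node1:8,dlvm-node2:8",     # hypothetical DLVM hostnames
        "-x", "NCCL_IB_HCA=bnxt_re0",          # bind NCCL to the Thor 2 RDMA device (name varies)
        "./nccl-tests/build/all_reduce_perf",  # assumed nccl-tests build path
        "-b", "8",    # start at 8-byte messages
        "-e", "8G",   # sweep up to 8 GiB messages
        "-f", "2",    # double the message size each step
        "-g", "1",    # one GPU per rank
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(result.stdout)

    # nccl-tests ends its output with a line such as "# Avg bus bandwidth : 370.5"
    match = re.search(r"Avg bus bandwidth\s*:\s*([\d.]+)", result.stdout)
    if match:
        print(f"Average bus bandwidth: {match.group(1)} GB/s")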
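
For the LLM inference piece, vLLM's offline Python API offers a quick way to sanity-check a model inside a DLVM before running full benchmarks. The model name and tensor-parallel degree below are illustrative assumptions, not the white paper's settings:

    # Minimal sketch: offline inference with vLLM inside a DLVM whose GPUs
    # are attached via DirectPath I/O. Model id and parallelism are assumed.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-70B-Instruct",  # assumed Hugging Face model id
        tensor_parallel_size=8,                     # shard across 8 passthrough GPUs
    )
    params = SamplingParams(temperature=0.7, max_tokens=128)
    for output in llm.generate(["Explain RoCE in one sentence."], params):
        print(output.outputs[0].text)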

Key Highlights of the Solution

The solution detailed in the white paper focuses on VMware Private AI certified HGX systems, which typically feature 4x or 8x H100/H200 GPUs with NVSwitch and NVLink interconnects. The target environment is a VCF-based private cloud using Broadcom 400G BCM957608 (Thor 2) NICs and clustered NVIDIA H100 GPUs connected over Ethernet.

A crucial aspect of this deployment is the emphasis on DirectPath I/O for GPUs and Thor 2 NICs, which gives workloads dedicated access to the hardware and maximizes performance. The guide also covers vital aspects such as:

  • BIOS and Firmware Settings: Recommended configurations for HGX servers to unlock peak performance.
  • ESXi Settings: Optimizing ESXi for GPU and network device passthrough, including proper hardware labeling and ACS (Access Control Services) configuration.
  • VM Settings: Customizing Deep Learning VMs (DLVMs) for DirectPath I/O, including static IP assignment and the advanced VM settings required for successful power-on and best performance (example settings follow this list).
  • Performance Validation:
    • Detailed sections on running RDMA, GPUDirect RDMA with Perftest, and multi-node NCCL tests, providing insights into expected bandwidth and latency (a client-side Perftest sketch follows this list).
    • Benchmarks of the Llama-3.1-70b NIM comparing virtualized and bare-metal performance with genai-perf, demonstrating near bare-metal results.
    • Accuracy and stress-test benchmarks of the state-of-the-art reasoning model gpt-oss-120b using evalscope.
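
As a concrete example of the advanced VM settings mentioned above, a DLVM with large-BAR GPUs in passthrough typically needs 64-bit MMIO enabled and sized to cover the combined GPU BAR space in its .vmx file. The size below is an illustrative assumption that depends on GPU count and model, not a recommendation from the white paper:

    pciPassthru.use64bitMMIO = "TRUE"
    pciPassthru.64bitMMIOSizeGB = "1024"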
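
To illustrate the Perftest step, the client side of a GPUDirect RDMA bandwidth run could be driven as below. This assumes a CUDA-enabled perftest build and a server DLVM already listening with ib_write_bw; the device name, GPU index, and peer address are hypothetical:

    # Minimal sketch: client side of a GPUDirect RDMA bandwidth test using
    # perftest's ib_write_bw. Assumes the peer DLVM is already running
    # "ib_write_bw -d bnxt_re0 --use_cuda=0 -a --report_gbits" as the server.
    import subprocess

    cmd = [
        "ib_write_bw",
        "-d", "bnxt_re0",    # Thor 2 RDMA device (name varies per host)
        "--use_cuda=0",      # stage buffers in GPU 0 memory (GPUDirect RDMA path)
        "-a",                # sweep all message sizes
        "--report_gbits",    # report bandwidth in Gb/s
        "192.168.100.11",    # hypothetical RoCE address of the server DLVM
    ]
    print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)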

This comprehensive guide serves as an invaluable resource for anyone looking to deploy and optimize AI inference workloads on a robust, virtualized infrastructure using NVIDIA HGX servers and Broadcom Ethernet. By following the best practices outlined, organizations can build scalable and high-performing AI platforms that meet the demands of modern deep learning applications.

For a deeper dive into the technical specifics and deployment procedures, we encourage you to read the full white paper: https://www.vmware.com/docs/paif-hgx-brcm-eth

Ready to get started on your AI and ML journey? Check out these helpful resources:

  • Complete this form to contact us!
  • Read the VMware Private AI solution brief.
  • Learn more about VMware Private AI.
  • Connect with us on Twitter at @VMwareVCF and on LinkedIn at VMware VCF.