Digging Into The Ultra Accelerator Link Consortium

The newly formed UALink Consortium brings together major tech companies to address the vital technical challenge of GPU-to-GPU connectivity in datacenters.

The Ultra Accelerator Link Consortium has recently incorporated, giving companies the opportunity to join, and it has announced that the UALink 1.0 specification will be available for public consumption in Q1 2025.
The Consortium’s “Promoter” members include AMD, Astera Labs, AWS, Cisco, Google, HPE, Intel, Meta and Microsoft. The UALink Consortium aims to deliver specifications and standards that allow industry players to develop high-speed interconnects for AI accelerators at scale. In other words, it addresses the GPU clusters that train the largest of large language models and solve the most complex challenges.
Much like Nvidia developed its proprietary NVLink to address GPU-to-GPU connectivity, UALink looks to broaden this capability across the industry. The key to the UALink Consortium is the partnership among the biggest technology companies—many of whom compete with one another—to better enable the future of AI and other accelerator-dependent workloads. Let’s explore this initiative and what it could mean for the market.
How We Got Here — The CPU Challenge

High-performance computing was perhaps the first workload class to demonstrate that CPUs are not always the best processor for the job. The massive parallelism and high data throughput of GPUs enable tasks like deep learning, genomic sequencing and big data analytics to perform far better than they would on a CPU. These architectural strengths, along with their programmability, have made GPUs the accelerator of choice for AI.
In particular, the training of LLMs that double in size every six months or so happens far more efficiently and much faster on GPUs. However, in a server architecture, the CPU (emphasis on the “C” for central) is the brain of the server, with all functions routing through it. If a GPU is to be used for a function, it connects to a CPU over PCIe.
Regardless of how fast that GPU can perform a function, system performance is limited by how quickly the CPU can route traffic to and from it. This limitation becomes glaringly noticeable as LLMs and datasets grow ever larger, requiring large numbers of GPUs to train a model in concert. This is especially true for hyperscalers and other large organizations training AI frontier models.
Consider a training cluster with thousands of GPUs spread across several racks, all dedicated to training GPT-4, Mistral or Gemini 1.5. The amount of latency introduced into the training period is considerable.
This is not just a training issue, however. As enterprise IT organizations begin to operationalize generative AI, performing inference at scale is also challenging. In the case of AI and other demanding workloads such as HPC, the CPU can significantly limit system and cluster performance.
This can have many implications in terms of performance, cost and accuracy.

Introducing UALink

The UALink Consortium was formed to develop a set of standards that enables accelerators to communicate with one another (bypassing the CPU) in a fast, low-latency way—and at scale. The specification defines an I/O architecture that enables speeds of up to 200 Gbps per lane, scaling up to 1,024 AI accelerators.
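As a back-of-the-envelope illustration, here is a minimal Python sketch of what those headline numbers imply for aggregate bandwidth. The per-lane rate and accelerator ceiling come from the figures above; the four-lane link width per accelerator is purely an assumption for illustration, since actual per-device lane counts will vary by implementation.

```python
# Back-of-the-envelope math from the UALink 1.0 figures cited above.
# NOTE: LANES_PER_ACCELERATOR is a hypothetical value for illustration;
# actual link widths will depend on the implementation.

GBPS_PER_LANE = 200        # per-lane speed cited for UALink 1.0
LANES_PER_ACCELERATOR = 4  # assumed link width (not from the spec)
MAX_ACCELERATORS = 1_024   # pod ceiling cited for UALink 1.0

per_accelerator_gbps = GBPS_PER_LANE * LANES_PER_ACCELERATOR
pod_aggregate_tbps = per_accelerator_gbps * MAX_ACCELERATORS / 1_000

print(f"Per-accelerator link: {per_accelerator_gbps} Gbps")        # 800 Gbps
print(f"Aggregate pod bandwidth: {pod_aggregate_tbps:,.1f} Tbps")  # 819.2 Tbps
```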
This specification delivers considerably better performance than Ethernet and connects far more GPUs than Nvidia’s NVLink. To better contextualize UALink and its value, think about connectivity in three ways: the front-end network, the scale-up network and the scale-out network. Generally, the front-end network connects hosts to the broader datacenter network for access to compute and storage clusters as well as the outside world, and it runs over Ethernet NICs attached to the CPU. The back-end network is focused on GPU-to-GPU connectivity and is composed of two components: the scale-up fabric and the scale-out fabric.
Scale-up connects hundreds of GPUs at the lowest latency and highest bandwidth (which is where UALink comes in). Scale-out is for scaling AI clusters beyond 1,024 GPUs—to 10,000 or 100,000. This is enabled using scale-out NICs and Ethernet and is where Ultra Ethernet will play.
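To make that scale-up ceiling concrete, here is a minimal sketch of the pod arithmetic, assuming a typical eight-GPU AI server; the per-server GPU count is an illustrative assumption, not a requirement of the specification.

```python
# Pod-sizing arithmetic against the 1,024-accelerator scale-up ceiling.
# GPUS_PER_SERVER reflects a typical eight-GPU AI server and is an
# assumption for illustration, not a figure from the specification.

MAX_ACCELERATORS = 1_024
GPUS_PER_SERVER = 8

servers_per_pod = MAX_ACCELERATORS // GPUS_PER_SERVER
print(f"Fully populated pod: {servers_per_pod} servers")  # 128 servers
```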
When thinking about a product like the Dell PowerEdge XE9680, which can support up to eight AMD Instinct or Nvidia HGX GPUs, a UALink-enabled cluster would support well over 100 of these servers in a pod where GPUs would have direct, low-latency access to one another. As an organization’s needs grow, Ultra Ethernet Consortium (UEC)-based connectivity can be used for scale-out. In 2023, industry leaders including Broadcom, AMD, Intel and Arista formed the UEC to drive performance, scale and interoperability for bandwidth-hungry AI and HPC workloads.
In fact, AMD just launched the first UEC-compliant NIC, the Pensando Pollara 400, a few weeks ago. (Our Moor Insights & Strategy colleague Will Townsend has written about it in detail.) Getting back to UALink, it is important to understand that this is not simply some pseudo-standard being used to challenge the dominance of Nvidia and NVLink.
This is a real working group developing a genuine standard with actual solutions being designed. In parallel, we see some of the groundwork being laid by UALink Promoter companies like Astera Labs, which recently introduced its Scorpio P-Series and X-Series fabric switches. While the P-Series switch enables GPU-to-CPU connectivity over PCIe Gen 6 (which can be customized), the X-Series is a switch aimed at GPU-to-GPU connectivity.
Given that the company has already built the underlying fabric, one can see how it could support UALink sometime soon after the specification is published. It is important to understand that UALink is agnostic about accelerators and the fabrics, switches, retimers and other technology that enable accelerator-to-accelerator connectivity. It doesn’t favor AMD over Nvidia, nor does it favor Astera Labs over, say, Broadcom (if that company chooses to contribute).
It’s about building an open set of standards that favors innovation across the ecosystem. While the average enterprise IT administrator, or even CIO, won’t care much about UALink, they will care about what it will deliver to their organization: faster training and inference on platforms that consume less power and can be somewhat self-managed and tuned. Putting a finer point on it—faster results at lower cost.
What About Nvidia And NVLink?

It’s easy to regard what UALink is doing as an attempt to counter Nvidia’s dominance. And at some level, it certainly is. However, in the bigger picture this is less about copying what Nvidia does and more about ensuring that critical capabilities like GPU-to-GPU connectivity don’t fall under the purview of one company with a vested interest in optimizing for its own GPUs.
It will be interesting to watch how server vendors such as Dell, HPE, Lenovo and others choose to support both UALink and NVLink. (Lenovo is a “Contributor” member of the UALink Consortium, but Dell has not yet joined.) NVLink uses a proprietary signaling interconnect to support Nvidia GPUs.
By contrast, UALink will support accelerators from a range of vendors, with switching and fabric from any vendor that adheres to the UALink standard. There is a real and significant cost to these server vendors—from design to manufacturing and through the qualification and sales/support process. On the surface, it’s easy to see why UALink would appeal to, say, Dell or HPE.
However, there is a market demand for Nvidia that cannot and will not be ignored. Regardless of one’s perspective on the ability of “the market” to erode Nvidia’s dominance, we can all agree that its dominance will not fade fast.

Cooperating For Better Datacenter Computing

The UALink Consortium (and forthcoming specification) is a significant milestone for the industry as the challenges surrounding training AI models and operationalizing data become increasingly complex, time-consuming and costly.
If and when we see companies like Astera Labs and others develop the underlying fabric and switching silicon to drive accelerator-to-accelerator connectivity, and when companies like Dell and HPE build platforms that light all of this up, the downmarket impact will be significant. The benefits realized by hyperscalers like AWS and Meta will extend to enterprise IT organizations that look to operationalize AI across business functions. Ideally, we would have a market with one standard interconnect specification for all accelerators—all GPUs.
And maybe at some point that day will come. But for now, it’s good to see rivals like AMD and Intel or Google and AWS coalesce around a standard that is beneficial to all.