May 2, 2025 | 3 minute read
Ejaz Akram
Senior Cloud Solutions Architect
Ricardo Anda
Cloud Solution Architect - Multicloud & Networking
Oracle AI infrastructure provides best-in-class services for any AI workload or application. Compute, network, and storage services work hand in hand as solid building blocks for running the most advanced AI applications. Oracle Cloud Infrastructure Kubernetes Engine (OKE) integrates tightly with this infrastructure, adding the scalability and containerization needed for productivity and manageability, and orchestrating containers seamlessly on top of the AI infrastructure.
In this blog, we cover how Oracle Cloud Infrastructure (OCI) services are stitched together with OCI networking, built from the ground up without compromising on performance or security. We also discuss how OCI provides an optimized AI networking fabric for customers to run large language models (LLMs), generative artificial intelligence (GenAI) applications, physics simulations, and more.
Oracle AI Infrastructure provides one of the highest-performance, lowest-cost graphics processing unit (GPU) cluster technologies in the world: remote direct memory access (RDMA) as part of a lossless, nonblocking network architecture; local non-volatile memory express (NVMe) storage for containerized applications; high-performance, scalable file system storage for model training and inferencing; and powerful bare metal compute underpinned by Peripheral Component Interconnect Express (PCIe) interfaces that drive all components together at scale.
AI Infra Networking
OKE provides full container orchestration for scalability and manageability, connected over the cloud fabric for seamless integration with other cloud services and with the GPU/Kubernetes cluster.
OCI File Storage with Lustre integrates deeply with OKE over the cloud fabric network, giving the file system direct access to bare metal GPU nodes and to the thousands of GPUs in a supercluster.
A Kubernetes node connects directly to the GPUs over the PCIe/NVMe interfaces within the bare metal compute node. At the same time, NVIDIA NVLink provides seamless communication between every GPU within each bare metal node of a cluster at up to 900 GB/sec per GPU.
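To put those intra-node link speeds in perspective, here is a back-of-the-envelope transfer-time comparison. The NVLink figure is from the text; the PCIe Gen5 x16 rate (~64 GB/sec) and the 140 GB payload (roughly a 70B-parameter model at FP16) are illustrative assumptions, not measured OCI numbers.

```python
# Rough transfer-time comparison for moving model weights between GPUs.
# Assumed figures: ~900 GB/s NVLink per GPU (from the text) versus
# ~64 GB/s for a PCIe Gen5 x16 link (illustrative assumption).
NVLINK_GBPS = 900       # GB/s, NVLink aggregate per GPU
PCIE5_X16_GBPS = 64     # GB/s, PCIe Gen5 x16 (assumed)

payload_gb = 140        # ~70B parameters at FP16 (2 bytes each), hypothetical

nvlink_s = payload_gb / NVLINK_GBPS
pcie_s = payload_gb / PCIE5_X16_GBPS

print(f"NVLink: {nvlink_s:.2f} s, PCIe Gen5 x16: {pcie_s:.2f} s")
```

Even as a rough sketch, this shows why GPU-to-GPU traffic inside a node rides NVLink rather than the PCIe bus.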
GPU cluster nodes are knitted together with a high-throughput, low-latency RDMA over Converged Ethernet version 2 (RoCE v2) network. They deliver staggering performance and can scale to meet large AI application demands, from training models to inferencing. OCI AI Infrastructure includes:
Ultrafast and scalable networking
Custom-designed RDMA over Converged Ethernet (RoCE v2) protocol
2.5 to 9.1 microseconds of latency for cluster networking
Up to 3,200 Gb/sec of cluster network bandwidth
Up to 200 Gb/sec of front-end network bandwidth
3-tier Clos topology
Lossless network
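A quick unit sanity check on the cluster-network figure above. The 3,200 Gb/sec per node is taken from the list; the assumption that it is spread across eight GPUs (one 400 Gb/sec RDMA link each) reflects a typical 8-GPU bare metal node layout and is not stated in the text.

```python
# Sanity-check the quoted cluster bandwidth per node.
# Assumption (not stated above): 8 GPUs per bare metal node,
# each with its own RDMA link.
CLUSTER_GBITS = 3200            # Gb/s per node, from the list above
GPUS_PER_NODE = 8               # assumed node layout

per_gpu_gbits = CLUSTER_GBITS / GPUS_PER_NODE   # Gb/s per GPU
per_gpu_gbytes = per_gpu_gbits / 8              # GB/s per GPU

# Time to push a hypothetical 10 GB gradient shard off-node at line rate:
shard_gb = 10
print(f"{per_gpu_gbits:.0f} Gb/s ({per_gpu_gbytes:.0f} GB/s) per GPU; "
      f"{shard_gb / per_gpu_gbytes:.2f} s per {shard_gb} GB shard")
```

Under these assumptions, each GPU gets a 400 Gb/sec (50 GB/sec) path into the RDMA fabric.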
Supercharged compute
Bare metal instances without any hypervisor overhead
Accelerated by NVIDIA Blackwell (GB200 NVL72, HGX B200), Hopper (H200, H100), and previous-generation GPUs
Option to use AMD MI300X GPUs
Data processing unit (DPU) for built-in hardware acceleration
Massive capacity and high-throughput storage
Local storage: up to 61.44 TB of NVMe SSD capacity
File storage: OCI File Storage with Lustre scales up to 20 petabytes (PB)
High sustained performance per terabyte (TB) of provisioned capacity, in four tiers:
125 MBps per provisioned TB
250 MBps per provisioned TB
500 MBps per provisioned TB
1,000 MBps per provisioned TB
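Because Lustre throughput scales with provisioned capacity, aggregate bandwidth at each tier is a simple multiplication. The per-TB rates below come from the tiers above; the 500 TB capacity is a hypothetical example, and 1 GB/sec is treated as 1,000 MBps for simplicity.

```python
# Aggregate Lustre throughput per performance tier for an assumed
# provisioned capacity. Per-TB rates are from the tiers listed above;
# the capacity value is hypothetical.
TIERS_MBPS_PER_TB = [125, 250, 500, 1000]
capacity_tb = 500   # hypothetical provisioned capacity

totals_gbps = {rate: rate * capacity_tb / 1000 for rate in TIERS_MBPS_PER_TB}
for rate, total in totals_gbps.items():
    print(f"{rate} MBps/TB tier: {total:,.1f} GB/s aggregate at {capacity_tb} TB")
```

For example, 500 TB provisioned at the top tier sustains roughly 500 GB/sec in aggregate under this arithmetic.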
Oracle AI infrastructure gives customers superior performance and highly scalable networking. Customers can access large GPU superclusters without compromising on network throughput, latency, or security while building containerized application layers with OKE for scale and simple manageability.
For more information on how to build better architectures with OCI AI infrastructure, see the following links:
OCI AI Infrastructure
Blog: First Principles: Inside Zettascale OCI Superclusters for Next-gen AI
Blog: Generally Available: Fully Managed Lustre File Storage in the Cloud
Blog: Deploying an HPC cluster with RDMA network on OCI OKE and File Storage service mount
Blog: Announcing the General Availability of NVIDIA GPU Device Plugin Add-On for OKE