RDMA RoCE network profile

This page provides an overview of the Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) network profile in Google Cloud.

Overview

The RDMA RoCE network profile lets you create a Virtual Private Cloud (VPC) network that uses the RoCE v2 protocol to provide low-latency, high-bandwidth RDMA communication between the GPUs of VMs that are created in the network. A VPC network that uses the RoCE network profile is called an RoCE VPC network.

RoCE VPC networks are useful for running AI workloads. For more information about running AI workloads in Google Cloud, see AI Hypercomputer overview.

The resource name of an RoCE network profile has the following format: ZONE-vpc-roce. For example, the profile for the europe-west1-b zone is named europe-west1-b-vpc-roce. To view specific network profile names, see List network profiles.
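The naming format can be illustrated with a short Python sketch. The helper names roce_profile_name and zone_from_profile_name are hypothetical and are not part of any Google Cloud client library; they only encode the ZONE-vpc-roce pattern described above.

```python
SUFFIX = "-vpc-roce"  # suffix taken from the ZONE-vpc-roce format


def roce_profile_name(zone: str) -> str:
    """Build the RoCE network profile name for a zone (hypothetical helper)."""
    return f"{zone}{SUFFIX}"


def zone_from_profile_name(name: str) -> str:
    """Recover the zone from a profile name; raise if the suffix is missing."""
    if not name.endswith(SUFFIX) or len(name) == len(SUFFIX):
        raise ValueError(f"not an RoCE network profile name: {name!r}")
    return name[: -len(SUFFIX)]


print(roce_profile_name("europe-west1-b"))  # europe-west1-b-vpc-roce
```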

Supported zones

The RoCE network profile is available in the following zones:

  • europe-west1-b
  • us-central1-a
  • us-central1-b
  • us-east4-b
  • us-west1-c

You can only create an RoCE VPC network in a zone where an RoCE network profile is available.
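As a rough illustration, a deployment script could check a target zone against the preceding list before it attempts to create a network. The following sketch hardcodes the zones listed on this page, which is an assumption that can go stale; listing the network profiles is the authoritative source. The function name is hypothetical.

```python
# Zones copied from the list on this page; the set can change over time.
ROCE_ZONES = frozenset({
    "europe-west1-b",
    "us-central1-a",
    "us-central1-b",
    "us-east4-b",
    "us-west1-c",
})


def can_create_roce_network(zone: str) -> bool:
    """Return True if the zone appears in the hardcoded RoCE zone list."""
    return zone in ROCE_ZONES


for candidate in ("us-east4-b", "us-east4-a"):
    print(candidate, can_create_roce_network(candidate))
```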

Specifications

RoCE VPC networks have the following specifications:

  • NVIDIA ConnectX NICs: NVIDIA ConnectX NICs appear as MRDMA network interfaces in Google Cloud.

  • Zonal constraint: resources that use an RoCE VPC network must be in the same zone as the RoCE network profile that you associate with the network when you create it. This zonal constraint has the following effects:

    • All instances that have network interfaces in an RoCE VPC network must be created in the zone that matches the zone of the RoCE network profile used by the RoCE VPC network.

    • All subnets created in an RoCE VPC network must be located in the region that contains the zone of the RoCE network profile used by the RoCE VPC network.

  • MRDMA network interfaces only: RoCE VPC networks only support MRDMA network interfaces (NICs), which are only available on the A3 Ultra, A4, and A4X machine series.

    All non-MRDMA NICs of a virtual machine (VM) must be attached to a regular VPC network.

  • 8896 byte default MTU: the default maximum transmission unit (MTU) of an RoCE VPC network is 8896 bytes. This allows the RDMA driver in the VM's guest operating system to use smaller MTUs if needed. For best performance, we recommend that you not change the default MTU.

  • Firewall differences: RoCE VPC networks use different implied firewall rules. They only support regional network firewall policies that have an RoCE firewall policy type. The set of parameters for rules within a supported regional network firewall policy is limited. For more information, see Cloud NGFW for RoCE VPC networks.

  • No VPC Flow Logs support: RoCE VPC networks don't support VPC Flow Logs, even if you enable VPC Flow Logs for a subnet in an RoCE VPC network.

  • No Connectivity Tests support: Connectivity Tests doesn't support RoCE VPC networks.

  • Other VPC features: RoCE VPC networks support a limited set of other VPC features. For more information, see the following Supported and unsupported features section.
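The zonal constraint in the preceding list can be expressed as a small pre-flight check. This is a minimal sketch, not an official validation routine: the function names are hypothetical, and region_of assumes the usual zone naming convention in which the region is the zone name without its final "-<letter>" suffix.

```python
from typing import List


def region_of(zone: str) -> str:
    """Derive the region by dropping the trailing '-<letter>' (assumed convention)."""
    region, _, _ = zone.rpartition("-")
    return region


def check_placement(profile_zone: str, instance_zone: str, subnet_region: str) -> List[str]:
    """Return a list of zonal-constraint problems; an empty list means none found."""
    problems = []
    if instance_zone != profile_zone:
        problems.append(
            f"instance zone {instance_zone} must match profile zone {profile_zone}"
        )
    if subnet_region != region_of(profile_zone):
        problems.append(
            f"subnet region {subnet_region} must be {region_of(profile_zone)}"
        )
    return problems


# An instance in a different zone of the same region still violates the constraint.
print(check_placement("us-central1-a", "us-central1-b", "us-central1"))
```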

Supported and unsupported features

The following table lists which VPC features are supported by RoCE VPC networks.

| Feature | Supported | Network profile property | Network profile property value | Details |
|---|---|---|---|---|
| MRDMA NICs | Yes | interfaceTypes | MRDMA | RoCE VPC networks only support MRDMA NICs, not other types, such as GVNIC or VIRTIO_NET. |
| Multi-NIC in the same network | Yes | allowMultiNicInSameNetwork | MULTI_NIC_IN_SAME_NETWORK_ALLOWED | RoCE VPC networks support multi-NIC VMs, allowing two or more MRDMA VM NICs to be in the same RoCE VPC network. Each NIC must attach to a unique subnet in the RoCE VPC network. See RoCE VPC network multi-NIC considerations. |
| IPv4-only subnets | Yes | subnetworkStackTypes | SUBNET_STACK_TYPE_IPV4_ONLY | RoCE VPC networks support IPv4-only subnets, including the same valid IPv4 ranges as regular VPC networks. RoCE VPC networks don't support dual-stack or IPv6-only subnets. For more information, see Types of subnets. |
| PRIVATE subnet purpose | Yes | subnetworkPurposes | SUBNET_PURPOSE_PRIVATE | RoCE VPC networks support regular subnets, which have a purpose attribute value of PRIVATE. RoCE VPC networks don't support Private Service Connect subnets, proxy-only subnets, or Private NAT subnets. For more information, see Purposes of subnets. |
| GCE_ENDPOINT address purpose | Yes | addressPurposes | GCE_ENDPOINT | RoCE VPC networks support IP addresses with a purpose attribute value of GCE_ENDPOINT, which is used by internal IP addresses of VM NICs. RoCE VPC networks don't support special purpose IP addresses, such as the SHARED_LOADBALANCER_VIP purpose. For more information, see the address resource reference. |
| Attachments from nic0 | No | allowDefaultNicAttachment | DEFAULT_NIC_ATTACHMENT_BLOCKED | RoCE VPC networks don't support attaching the nic0 network interface of a VM to the network. Each MRDMA NIC attached to an RoCE VPC network must not be nic0. |
| External IP addresses for VMs | No | allowExternalIpAccess | EXTERNAL_IP_ACCESS_BLOCKED | RoCE VPC networks don't support assigning external IP addresses to MRDMA VM NICs. Consequently, MRDMA VM NICs don't have internet access. |
| Dynamic Network Interfaces | No | allowSubInterfaces | SUBINTERFACES_BLOCKED | RoCE VPC networks don't support Dynamic NICs. |
| Alias IP ranges | No | allowAliasIpRanges | ALIAS_IP_RANGE_BLOCKED | RoCE VPC networks don't support assigning alias IP ranges to MRDMA NICs. |
| IP forwarding | No | allowIpForwarding | IP_FORWARDING_BLOCKED | RoCE VPC networks don't support IP forwarding. |
| VM network migration | No | allowNetworkMigration | NETWORK_MIGRATION_BLOCKED | RoCE VPC networks don't support migrating VM NICs between networks. |
| Auto mode | No | allowAutoModeSubnet | AUTO_MODE_SUBNET_BLOCKED | RoCE VPC networks can't be auto mode networks. For more information, see subnet creation mode. |
| VPC Network Peering | No | allowVpcPeering | VPC_PEERING_BLOCKED | RoCE VPC networks don't support connecting to other VPC networks using VPC Network Peering. Consequently, RoCE VPC networks don't support connecting to services using private services access. |
| Static routes | No | allowStaticRoutes | STATIC_ROUTES_BLOCKED | RoCE VPC networks don't support static routes. |
| Packet Mirroring | No | allowPacketMirroring | PACKET_MIRRORING_BLOCKED | RoCE VPC networks don't support Packet Mirroring. |
| Cloud NAT | No | allowCloudNat | CLOUD_NAT_BLOCKED | RoCE VPC networks don't support Cloud NAT. |
| Cloud Router | No | allowCloudRouter | CLOUD_ROUTER_BLOCKED | RoCE VPC networks don't support Cloud Routers and dynamic routes. |
| Cloud Interconnect | No | allowInterconnect | INTERCONNECT_BLOCKED | RoCE VPC networks don't support Cloud Interconnect VLAN attachments. |
| Cloud VPN | No | allowVpn | VPN_BLOCKED | RoCE VPC networks don't support Cloud VPN tunnels. |
| Network Connectivity Center | No | allowNcc | NCC_BLOCKED | RoCE VPC networks don't support Network Connectivity Center. You can't add an RoCE VPC network as a VPC spoke to a Network Connectivity Center hub. |
| Cloud Load Balancing | No | allowLoadBalancing | LOAD_BALANCING_BLOCKED | RoCE VPC networks don't support Cloud Load Balancing. Consequently, RoCE VPC networks don't support load balancer features, including Google Cloud Armor. |
| Private Google Access | No | allowPrivateGoogleAccess | PRIVATE_GOOGLE_ACCESS_BLOCKED | RoCE VPC networks don't support Private Google Access. |
| Private Service Connect | No | allowPsc | PSC_BLOCKED | RoCE VPC networks don't support Private Service Connect. |
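
To show how the property values in the preceding table translate into constraints on a VM's network interfaces, the following sketch checks a planned NIC against a few of them. The property names and values are copied from the table. The flat dictionary layout, the NicRequest class, and the violations function are assumptions made for illustration and don't reflect the actual shape of the network profile resource in the API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Subset of the table; the flat dict layout is an illustrative assumption.
ROCE_PROFILE: Dict[str, object] = {
    "interfaceTypes": ["MRDMA"],
    "allowDefaultNicAttachment": "DEFAULT_NIC_ATTACHMENT_BLOCKED",
    "allowExternalIpAccess": "EXTERNAL_IP_ACCESS_BLOCKED",
    "allowAliasIpRanges": "ALIAS_IP_RANGE_BLOCKED",
}


@dataclass
class NicRequest:
    """Hypothetical description of a NIC that a VM would attach to the network."""
    name: str                      # for example "nic0" or "nic1"
    nic_type: str                  # for example "MRDMA" or "GVNIC"
    external_ip: bool = False
    alias_ranges: List[str] = field(default_factory=list)


def blocked(profile: Dict[str, object], prop: str) -> bool:
    """Treat any value ending in _BLOCKED as a disabled feature."""
    return str(profile[prop]).endswith("_BLOCKED")


def violations(nic: NicRequest, profile: Dict[str, object] = ROCE_PROFILE) -> List[str]:
    """Return the profile properties that the planned NIC would violate."""
    found = []
    if nic.nic_type not in profile["interfaceTypes"]:
        found.append("interfaceTypes")
    if nic.name == "nic0" and blocked(profile, "allowDefaultNicAttachment"):
        found.append("allowDefaultNicAttachment")
    if nic.external_ip and blocked(profile, "allowExternalIpAccess"):
        found.append("allowExternalIpAccess")
    if nic.alias_ranges and blocked(profile, "allowAliasIpRanges"):
        found.append("allowAliasIpRanges")
    return found


print(violations(NicRequest(name="nic0", nic_type="GVNIC", external_ip=True)))
```

A check like this only mirrors the documented restrictions on the client side; the network itself enforces them regardless.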

RoCE VPC network multi-NIC considerations

To support workloads that benefit from cross-rail GPU-to-GPU communication, RoCE VPC networks support VMs that have multiple MRDMA NICs in the network. Each MRDMA NIC must be in a unique subnet.

Placing two or more MRDMA NICs in the same RoCE VPC network might affect network performance, including increased latency. MRDMA NICs use the NVIDIA Collective Communications Library (NCCL). NCCL attempts to align all network transfers, even for cross-rail communication. For example, it uses PXN to copy data through NVLink to a rail-aligned GPU before transferring it over the network.
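
The unique-subnet requirement can be checked before a VM is created. The following is a minimal sketch under the assumption that the planned NICs are described as a mapping from NIC name to subnet name; the function name and the subnet names in the example are hypothetical.

```python
from typing import Dict, List


def shared_subnets(nic_subnets: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each subnet used by more than one NIC, mapped to those NIC names."""
    by_subnet: Dict[str, List[str]] = {}
    for nic, subnet in nic_subnets.items():
        by_subnet.setdefault(subnet, []).append(nic)
    return {subnet: nics for subnet, nics in by_subnet.items() if len(nics) > 1}


# Hypothetical plan: nic2 and nic3 reuse the same subnet, which isn't allowed.
plan = {
    "nic1": "roce-subnet-1",
    "nic2": "roce-subnet-2",
    "nic3": "roce-subnet-2",
}
print(shared_subnets(plan))
```

An empty result means that every MRDMA NIC in the plan has its own subnet.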

What's next