Goodput optimization recipes

This document helps you optimize Goodput for your workloads. Goodput measures how much of a workload's elapsed time goes to useful progress, such as training steps whose results are kept, rather than to startup, failures, or recomputation. To help you improve it, we have curated reproducible Goodput recipes that use common machine learning (ML) frameworks and models. To review these recipes, see the AI Hypercomputer GitHub organization.
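This document doesn't give a formula for Goodput. The following is a minimal sketch, assuming you record three things for each run: total wall-clock time, time spent in training steps whose results were kept, and the number of tokens from those steps. With those, it computes two common views of the metric: a time fraction and an effective throughput. The `TrainingRun` class, its fields, and both functions are hypothetical names for illustration. They are not part of the recipes or of any Google Cloud API.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical summary of one training job; field names are illustrative."""
    wall_clock_s: float       # total elapsed time, including startup and failures
    productive_step_s: float  # time in training steps whose results were kept
    tokens_kept: int          # tokens from steps that were not lost to a restart


def goodput_fraction(run: TrainingRun) -> float:
    """Share of elapsed time spent on productive training steps."""
    if run.wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return run.productive_step_s / run.wall_clock_s


def effective_tokens_per_s(run: TrainingRun) -> float:
    """Useful tokens per second of total elapsed time."""
    if run.wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return run.tokens_kept / run.wall_clock_s


# Example: a 10-hour job that lost 1.5 hours to startup, checkpoint restores,
# and recomputing steps after a node failure (illustrative numbers only).
run = TrainingRun(wall_clock_s=36_000, productive_step_s=30_600,
                  tokens_kept=1_224_000_000)
print(f"goodput: {goodput_fraction(run):.1%}")  # prints "goodput: 85.0%"
print(f"effective throughput: {effective_tokens_per_s(run):,.0f} tokens/s")
```

The time fraction isolates non-productive time such as startup and recomputation after failures. The throughput view also reflects how fast each productive step runs, so the two can move independently.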

Before you begin

Before you use the Goodput recipes in this document, complete the following steps if you haven't already:

  1. Choose an accelerator that best suits your workload. See Choose a deployment strategy.
  2. Select a consumption method based on your accelerator of choice. See Consumption options.
  3. Create your cluster based on the type of accelerator selected. See Cluster deployment guides.

Recipes

The following reproducible Goodput recipes are available for pre-training on GKE clusters:

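Recipe name            | Accelerator | Model        | Framework | Workload type
-----------------------+-------------+--------------+-----------+--------------------
Llama3.1 70B - A3 Mega | A3 Mega     | Llama3.1 70B | NeMo      | Pre-training on GKE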