Goodput optimization recipes

This document helps you optimize Goodput, the rate of useful data transferred, for your workloads. It points to curated, reproducible Goodput recipes that use common machine learning (ML) frameworks and models. To review these recipes, see the AI Hypercomputer GitHub organization. The recipes were tested on clusters created using Cluster Toolkit.
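To make the definition concrete, the following minimal sketch computes Goodput as useful data per second of wall-clock time. It is not taken from the recipes. The `Interval` record, the `goodput` function, and the sample numbers are hypothetical, and the sketch assumes that time spent on a restart contributes no useful data.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One stretch of a run (hypothetical record, for illustration only)."""
    seconds: float       # wall-clock duration of the interval
    useful_units: float  # data that counted toward progress (e.g. tokens, bytes)


def goodput(intervals: list[Interval]) -> float:
    """Return useful data per second of wall-clock time across all intervals."""
    total_seconds = sum(i.seconds for i in intervals)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return sum(i.useful_units for i in intervals) / total_seconds


# Assumed sample run: the restart interval consumes time but adds no useful data.
run = [
    Interval(seconds=600, useful_units=1_200_000),  # normal progress
    Interval(seconds=120, useful_units=0),          # restart: no useful data
    Interval(seconds=480, useful_units=960_000),    # progress resumes
]
print(goodput(run))  # 1800.0
```

In this sample the workload moves 2,000 units per second while it is running, but the restart lowers the Goodput of the whole run to 1,800 units per second. Reducing that kind of lost time is what the recipes aim at.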

Before you begin

Before you use the Goodput recipes in this document, complete the following steps if you haven't already:

1. Choose an accelerator that best suits your workload. See "Choose a deployment strategy".
2. Select a consumption method based on your accelerator of choice. See "Consumption options".
3. Create your cluster based on the type of accelerator selected. See "Cluster deployment guides".

Recipes

The following reproducible Goodput recipes are available for pre-training on GKE clusters:

Recipe name	Accelerator	Model	Framework	Workload type
Llama3.1 70B - A3 Mega	A3 Mega	Llama3.1 70B	NeMo	Pre-training on GKE