To help you run your workloads, we have curated a set of reproducible benchmark recipes for some of the most common machine learning (ML) frameworks and models. The recipes are stored in GitHub repositories; to access them, see the AI Hypercomputer GitHub organization.
Overview
Before you get started with these recipes, ensure that you have completed the following steps:
- Choose an accelerator that best suits your workload. For more information, see Choose a deployment strategy.
- Select a consumption method based on your chosen accelerator. For more information, see Consumption options.
- Create your cluster based on the type of accelerator that you selected. For more information, see Cluster deployment guides.
Recipes
The following reproducible benchmark recipes are available for pre-training and inference on GKE clusters.
The catalog covers the following combinations of framework, model, accelerator, and workload type:

- Frameworks: NeMo, MaxText, SGLang, TensorRT-LLM, vLLM
- Models: DeepSeek R1 671B, GPT3-175B, Llama3 70B, Llama3.1 70B, Llama-3.1-405B, Mixtral 8x7B
- Accelerators: A3 Ultra, A3 Mega
- Workload types: Inference, Pre-training
Recipe name | Accelerator | Model | Framework | Workload type
---|---|---|---|---
DeepSeek R1 671B | A3 Mega | DeepSeek R1 671B | SGLang | Inference on GKE
DeepSeek R1 671B | A3 Mega | DeepSeek R1 671B | vLLM | Inference on GKE
DeepSeek R1 671B | A3 Ultra | DeepSeek R1 671B | SGLang | Inference on GKE
DeepSeek R1 671B | A3 Ultra | DeepSeek R1 671B | vLLM | Inference on GKE
GPT3-175B - A3 Mega | A3 Mega | GPT3-175B | NeMo | Pre-training on GKE
Llama-3.1-405B | A3 Ultra | Llama-3.1-405B | TensorRT-LLM | Inference on GKE
Llama3 70B - A3 Mega | A3 Mega | Llama3 70B | NeMo | Pre-training on GKE
Llama3.1 70B - A3 Ultra | A3 Ultra | Llama3.1 70B | MaxText | Pre-training on GKE
Llama3.1 70B - A3 Ultra | A3 Ultra | Llama3.1 70B | NeMo | Pre-training on GKE
Mixtral 8x7B - A3 Mega | A3 Mega | Mixtral 8x7B | NeMo | Pre-training on GKE
Mixtral 8x7B - A3 Ultra | A3 Ultra | Mixtral 8x7B | MaxText | Pre-training on GKE
Mixtral 8x7B - A3 Ultra | A3 Ultra | Mixtral 8x7B | NeMo | Pre-training on GKE
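If you want to filter the catalog above programmatically, the following Python sketch shows one way to do it. The recipe list is transcribed from the table; the `find_recipes` helper and its field names are illustrative, not an official API.

```python
# Recipes transcribed from the catalog table above.
RECIPES = [
    {"name": "DeepSeek R1 671B", "accelerator": "A3 Mega", "model": "DeepSeek R1 671B", "framework": "SGLang", "workload": "Inference"},
    {"name": "DeepSeek R1 671B", "accelerator": "A3 Mega", "model": "DeepSeek R1 671B", "framework": "vLLM", "workload": "Inference"},
    {"name": "DeepSeek R1 671B", "accelerator": "A3 Ultra", "model": "DeepSeek R1 671B", "framework": "SGLang", "workload": "Inference"},
    {"name": "DeepSeek R1 671B", "accelerator": "A3 Ultra", "model": "DeepSeek R1 671B", "framework": "vLLM", "workload": "Inference"},
    {"name": "GPT3-175B - A3 Mega", "accelerator": "A3 Mega", "model": "GPT3-175B", "framework": "NeMo", "workload": "Pre-training"},
    {"name": "Llama-3.1-405B", "accelerator": "A3 Ultra", "model": "Llama-3.1-405B", "framework": "TensorRT-LLM", "workload": "Inference"},
    {"name": "Llama3 70B - A3 Mega", "accelerator": "A3 Mega", "model": "Llama3 70B", "framework": "NeMo", "workload": "Pre-training"},
    {"name": "Llama3.1 70B - A3 Ultra", "accelerator": "A3 Ultra", "model": "Llama3.1 70B", "framework": "MaxText", "workload": "Pre-training"},
    {"name": "Llama3.1 70B - A3 Ultra", "accelerator": "A3 Ultra", "model": "Llama3.1 70B", "framework": "NeMo", "workload": "Pre-training"},
    {"name": "Mixtral 8x7B - A3 Mega", "accelerator": "A3 Mega", "model": "Mixtral 8x7B", "framework": "NeMo", "workload": "Pre-training"},
    {"name": "Mixtral 8x7B - A3 Ultra", "accelerator": "A3 Ultra", "model": "Mixtral 8x7B", "framework": "MaxText", "workload": "Pre-training"},
    {"name": "Mixtral 8x7B - A3 Ultra", "accelerator": "A3 Ultra", "model": "Mixtral 8x7B", "framework": "NeMo", "workload": "Pre-training"},
]

def find_recipes(recipes, **filters):
    """Return the recipes whose fields match every given filter value."""
    return [r for r in recipes if all(r.get(k) == v for k, v in filters.items())]

# Example: all A3 Ultra pre-training recipes that use MaxText.
for recipe in find_recipes(RECIPES, accelerator="A3 Ultra", framework="MaxText"):
    print(recipe["name"])
```

Any combination of fields can be passed as keyword arguments, so the same helper answers "which frameworks serve DeepSeek R1?" or "which recipes run on A3 Mega?" without extra code.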