Run batch inference using GPUs on Cloud Run jobs

You can run batch inference with Meta's Llama 3.2 1B model and vLLM on a Cloud Run job, then write the results directly to Cloud Storage using Cloud Run volume mounts.
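A minimal sketch of the inference step is shown below, assuming vLLM is installed in the job's container image and a Cloud Storage bucket is mounted via a Cloud Run volume mount. The model ID, prompts, and mount path `/mnt/results` are illustrative, not values prescribed by this guide.

```python
import json

from vllm import LLM, SamplingParams

# Prompts to process in one batch; in practice these might be read from a
# mounted input file or passed to the job through environment variables.
prompts = [
    "Summarize the benefits of serverless GPUs.",
    "Explain what a Cloud Run job is in one sentence.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM loads the model once, then batches and schedules the prompts
# internally across the GPU attached to the Cloud Run job.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
outputs = llm.generate(prompts, sampling_params)

# Write results to the mounted bucket; files written under this path land
# directly in Cloud Storage because of the volume mount (path is assumed).
with open("/mnt/results/inference_results.jsonl", "w") as f:
    for output in outputs:
        f.write(json.dumps({
            "prompt": output.prompt,
            "response": output.outputs[0].text,
        }) + "\n")
```

Because a Cloud Run job runs to completion rather than serving requests, this pattern suits workloads where all prompts are known up front and only the output files need to persist after the job exits.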

For a step-by-step walkthrough, see the codelab How to run batch inference on Cloud Run jobs.