Run LLM inference on Cloud Run GPUs with vLLM (services)

The following codelab shows how to run a backend service powered by vLLM, an inference engine for production systems, serving Google's Gemma 2, a 2-billion-parameter instruction-tuned model.
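As a rough illustration of what calling such a service looks like, here is a minimal Python sketch. It assumes the container runs vLLM's OpenAI-compatible server and that the Cloud Run service URL shown is a placeholder for your own deployment; the exact setup is covered in the codelab itself.

    # Minimal client sketch (assumptions: vLLM's OpenAI-compatible API is
    # exposed by the service, and SERVICE_URL is a placeholder, not a real
    # deployment).
    import requests

    SERVICE_URL = "https://vllm-gemma-example-uc.a.run.app"  # hypothetical URL

    response = requests.post(
        f"{SERVICE_URL}/v1/chat/completions",
        json={
            "model": "google/gemma-2-2b-it",  # Gemma 2 2B instruction-tuned model
            "messages": [{"role": "user", "content": "Why is the sky blue?"}],
            "max_tokens": 128,
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])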

See the entire codelab at Run LLM inference on Cloud Run GPUs with vLLM.