Run LLM inference on Cloud Run GPUs with Hugging Face TGI
The following example shows how to run a backend service on Cloud Run that serves Llama 3 with the Hugging Face Text Generation Inference (TGI) toolkit, a toolkit for deploying and serving Large Language Models (LLMs).
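Once such a service is running, a client can send it prompts over HTTP. The following is a minimal sketch, not part of the original example, that posts a prompt to TGI's `/generate` endpoint using only the Python standard library. The base URL, the helper names `build_generate_request` and `generate`, and the default token limit are assumptions made for illustration. The sketch also assumes the service is reachable without extra authentication headers, which may not hold for a private Cloud Run service.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with wherever your TGI service is reachable.
BASE_URL = "http://localhost:8080"


def build_generate_request(base_url, prompt, max_new_tokens=64):
    """Build a POST request for TGI's /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=64, timeout=60):
    """Send the prompt to the service and return the generated text."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # TGI's /generate response carries the completion in "generated_text".
    return body["generated_text"]
```

With a service available at the assumed address, calling `generate(BASE_URL, "What is Cloud Run?")` would return the model's completion as a string. Building the request separately from sending it keeps the payload easy to inspect without a live endpoint.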
Hugging Face TGI documentation: https://huggingface.co/docs/text-generation-inference

See the entire example at Deploy Llama 3.1 8B with TGI DLC on Cloud Run: https://huggingface.co/docs/google-cloud/examples/cloud-run-tgi-deployment

Last updated 2025-08-28 UTC.