Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is our fastest Gemini 2.0 model, optimized for cost efficiency and low latency.

Note: To use the "Deploy example app" feature, you need a Google Cloud project with billing and Vertex AI API enabled.
Model ID: gemini-2.0-flash-lite
Supported inputs & outputs
  • Inputs:
    Text, Code, Images, Audio, Video
  • Outputs:
    Text
Token limits
  • Maximum input tokens: 1,048,576
  • Maximum output tokens: 8,192 (default)
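As a rough illustration, the limits above could be checked on the client before a request is sent. This is a minimal sketch: the helper name `fits_token_limits` is hypothetical, and it assumes the token counts have already been obtained elsewhere (for example from a token-counting call).

```python
MAX_INPUT_TOKENS = 1_048_576   # maximum input tokens, from the list above
MAX_OUTPUT_TOKENS = 8_192      # maximum output tokens, from the list above


def fits_token_limits(input_tokens: int,
                      requested_output_tokens: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if both counts are within the documented limits.

    Hypothetical helper: the counts are assumed to be computed elsewhere.
    """
    if input_tokens < 0 or requested_output_tokens < 1:
        raise ValueError("input_tokens must be >= 0 and requested_output_tokens >= 1")
    return (input_tokens <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)
```

A prompt that fails this check would need to be shortened or split before it is submitted.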
Capabilities
Usage types
Input size limit: 500 MB
Technical specifications
Images
  • Maximum images per prompt: 3,000
  • Maximum image size: 7 MB
  • Maximum tokens per minute (TPM):
    • High/Medium/Default media resolution:
      • US/Asia: 6.7 M
      • EU: 2.6 M
    • Low media resolution:
      • US/Asia: 2.6 M
      • EU: 2.6 M
  • Supported MIME types:
    image/png, image/jpeg, image/webp
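The image constraints above lend themselves to a pre-flight check. The sketch below is illustrative only: `ImagePart` and `validate_images` are hypothetical names, and it assumes "7 MB" means 7 × 1024 × 1024 bytes, which this page does not specify.

```python
from dataclasses import dataclass

MAX_IMAGES_PER_PROMPT = 3_000
MAX_IMAGE_BYTES = 7 * 1024 * 1024  # assumption: "7 MB" read as 7 MiB
SUPPORTED_IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}


@dataclass
class ImagePart:
    """Hypothetical description of one image attached to a prompt."""
    mime_type: str
    size_bytes: int


def validate_images(images: list[ImagePart]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(images) > MAX_IMAGES_PER_PROMPT:
        problems.append(
            f"{len(images)} images exceeds the limit of {MAX_IMAGES_PER_PROMPT}")
    for index, image in enumerate(images):
        if image.mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            problems.append(
                f"image {index}: unsupported MIME type {image.mime_type}")
        if image.size_bytes > MAX_IMAGE_BYTES:
            problems.append(
                f"image {index}: {image.size_bytes} bytes exceeds the 7 MB limit")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a prompt at once.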
Documents
  • Maximum number of files per prompt: 3,000
  • Maximum number of pages per file: 1,000
  • Maximum file size per file for the API or Cloud Storage imports: 50 MB
  • Maximum file size per file for direct uploads through the console: 7 MB
  • Maximum tokens per minute (TPM) per project:
    • US/Asia: 3.4 M
    • EU: 3.4 M
  • Supported MIME types:
Video
  • Maximum video length (with audio): Approximately 45 minutes
  • Maximum video length (without audio): Approximately 1 hour
  • Maximum number of videos per prompt: 10
  • Maximum tokens per minute (TPM):
    • High/Medium/Default media resolution:
      • US/Asia: 6.3 M
      • EU: 3.2 M
    • Low media resolution:
      • US/Asia: 3.2 M
      • EU: 3.2 M
  • Supported MIME types:
    video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp
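For capacity planning, the video TPM quotas above can be kept in a small lookup table. This is a sketch under assumptions: the key names (`"default"`, `"low"`, `"us_asia"`, `"eu"`) and the function `video_tpm` are hypothetical, and "M" is read as millions of tokens.

```python
# Video TPM quotas in tokens per minute, transcribed from the list above.
# Keys are (resolution_tier, region_group); both vocabularies are hypothetical.
VIDEO_TPM = {
    ("default", "us_asia"): 6_300_000,
    ("default", "eu"): 3_200_000,
    ("low", "us_asia"): 3_200_000,
    ("low", "eu"): 3_200_000,
}

KNOWN_RESOLUTIONS = {"high", "medium", "default", "low"}


def video_tpm(media_resolution: str, region_group: str) -> int:
    """Look up the documented video TPM quota for a resolution and region group."""
    if media_resolution not in KNOWN_RESOLUTIONS:
        raise ValueError(f"unknown media resolution: {media_resolution}")
    # High, medium, and default resolutions share one quota tier.
    tier = "low" if media_resolution == "low" else "default"
    try:
        return VIDEO_TPM[(tier, region_group)]
    except KeyError:
        raise ValueError(f"unknown region group: {region_group}") from None
```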
Audio
  • Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
  • Maximum number of audio files per prompt: 1
  • Speech understanding for: Audio summarization, transcription, and translation
  • Maximum tokens per minute (TPM):
    • US/Asia: 3.5 M
    • EU: 3.5 M
  • Supported MIME types:
Parameter defaults
  • Temperature: 0.0–2.0 (default 1.0)
  • topP: 0.0–1.0 (default 0.95)
  • topK: 64 (fixed)
  • candidateCount: 1–8 (default 1)
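The ranges and defaults above could be enforced when assembling a request. The following is a minimal sketch, not the official client: `build_generation_config` is a hypothetical helper, and the camelCase keys are assumed to match the `generationConfig` field names of the REST API. Because topK is fixed at 64, the sketch does not send it.

```python
def build_generation_config(temperature: float = 1.0,
                            top_p: float = 0.95,
                            candidate_count: int = 1,
                            max_output_tokens: int = 8192) -> dict:
    """Validate parameters against the documented ranges and return a config dict."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if not 1 <= candidate_count <= 8:
        raise ValueError("candidate_count must be between 1 and 8")
    if not 1 <= max_output_tokens <= 8192:
        raise ValueError("max_output_tokens must be between 1 and 8192")
    # topK is fixed at 64 for this model, so it is intentionally omitted.
    return {
        "temperature": temperature,
        "topP": top_p,
        "candidateCount": candidate_count,
        "maxOutputTokens": max_output_tokens,
    }
```

Calling `build_generation_config()` with no arguments yields the documented defaults; passing `temperature=0.0` is one way to request less varied output.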
Supported regions

Model availability

(Includes dynamic shared quota & Provisioned Throughput)

  • Global
    • global
  • United States
    • us-central1
    • us-east1
    • us-east4
    • us-east5
    • us-south1
    • us-west1
    • us-west4
  • Europe
    • europe-central2
    • europe-north1
    • europe-southwest1
    • europe-west1
    • europe-west4
    • europe-west8
    • europe-west9
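To show how a location from the list above and the model ID might combine into a request target, here is a hedged sketch. The URL pattern is an assumption based on the usual Vertex AI REST layout and is not stated on this page; `generate_content_url` and `my-project` are hypothetical.

```python
SUPPORTED_LOCATIONS = {
    "global",
    "us-central1", "us-east1", "us-east4", "us-east5",
    "us-south1", "us-west1", "us-west4",
    "europe-central2", "europe-north1", "europe-southwest1",
    "europe-west1", "europe-west4", "europe-west8", "europe-west9",
}


def generate_content_url(project_id: str, location: str,
                         model: str = "gemini-2.0-flash-lite") -> str:
    """Build a generateContent URL for a supported location (assumed pattern)."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"{location} is not in the supported locations list")
    # Assumption: the global location uses the unprefixed host name.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (f"https://{host}/v1/projects/{project_id}/locations/{location}"
            f"/publishers/google/models/{model}:generateContent")
```

For example, `generate_content_url("my-project", "europe-west4")` produces a URL whose host and path both carry the chosen region.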

ML processing

  • United States
    • Multi-region
  • Europe
    • Multi-region
See Data residency for more information.
Knowledge cutoff date: June 2024
Versions
  • gemini-2.0-flash-lite-001
    • Launch stage: Generally available
    • Release date: February 25, 2025
    • Discontinuation date: February 25, 2026
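Because the version above has a fixed discontinuation date, a deployment might want to track how much time remains. A minimal sketch follows; the function name `days_until_discontinuation` is hypothetical.

```python
from datetime import date

RELEASE_DATE = date(2025, 2, 25)          # from the version entry above
DISCONTINUATION_DATE = date(2026, 2, 25)  # from the version entry above


def days_until_discontinuation(today: date) -> int:
    """Days left before gemini-2.0-flash-lite-001 is discontinued.

    Returns a negative number once the discontinuation date has passed.
    """
    return (DISCONTINUATION_DATE - today).days
```

A scheduled job could call this with `date.today()` and raise an alert once the result drops below a chosen threshold.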
Security controls
Online prediction
  • Data residency (at rest): Supported
  • Customer-managed encryption keys (CMEK): Supported
  • VPC Service Controls: Supported
  • Access Transparency (AXT): Supported
Batch prediction
  • Data residency (at rest): Supported
  • Customer-managed encryption keys (CMEK): Not supported
  • VPC Service Controls: Supported
  • Access Transparency (AXT): Not supported
Tuning
  • Data residency (at rest): Not supported
  • Customer-managed encryption keys (CMEK): Not supported
  • VPC Service Controls: Not supported
  • Access Transparency (AXT): Not supported
See Security controls for more information.
Supported languages: See Supported languages.
Pricing: See Pricing.