Lyria | AI Music Generator

You can use Lyria to generate new instrumental music tracks from a text prompt that you provide in the Google Cloud console or send in a request to the Vertex AI API.

Try Lyria on Vertex AI (Vertex AI Studio)

Lyria features and launch stage

Lyria offers generative AI features for music creation.

The following table describes features that are Generally Available (GA) to all users:

Feature Description Launch stage
Generate music from text Generate instrumental music tracks from descriptive text input. General Availability
Negative prompting Guide the model to avoid specific elements in the generated music. General Availability
Seed for reproducibility Use a seed to ensure consistent generation outputs for the same prompt and parameters. General Availability

Locations

A location is a region you can specify in a request to control where data is stored at rest. Lyria 2 is offered globally. For the list of available regions for Vertex AI services, see Generative AI on Vertex AI locations.

Performance and limitations

Limits Value
Modalities Text-to-music generation (instrumental only)
API calls (prompts per project per minute) Refer to [Quotas and limits documentation](/vertex-ai/generative-ai/docs/quotas) for standard Vertex AI Prediction limits.
Request latency (Generation time) Audio clips are typically generated within 10 to 20 seconds. Some requests may be queued during peak usage.
Maximum number of audio clips returned per request Up to the value of sample_count specified in the request (a small integer, typically 1 to 4).
Audio length per clip 30 seconds
Supported returned audio format WAV audio at 48kHz sample rate
Input prompt language US English (en-us)

Responsible AI

Lyria generates instrumental music from text prompts. Lyria applies safety measures including content safety filters, recitation checking, and artist intent checks to prevent the input and generation of harmful or inappropriate content. SynthID watermarking is used on generated audio. Prompts that violate responsible AI guidelines may be blocked.

Lyria Vertex AI model versions and lifecycle

The Lyria model and version are the following:

Model name Identifier
Lyria 2 lyria-002

Generate music from text

You can generate novel instrumental music tracks using descriptive text as input.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Enable the Vertex AI API.

    Enable the API

  4. Set up authentication for your environment.

    Select the tab for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      After installing the Google Cloud CLI, initialize it by running the following command:

      gcloud init

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Console

  1. In the Google Cloud console, go to the Vertex AI Studio > Media Studio page.

    Media Studio

  2. Select the Lyria model or the music generation option.

  3. Optional: In the Settings pane, configure the following settings:

    • Model: Choose lyria-002 if multiple versions are available.
    • Number of samples (sample_count): Adjust the number of audio clips to generate for the prompt. (Cannot be used with Seed).
    • Seed: Enter an integer for reproducible outputs. (Cannot be used with Number of samples).
  4. In the Write your prompt box, enter your text prompt in US English that describes the music to generate.

  5. Optional: In the Negative prompt box, enter terms or descriptions to exclude from the generated music.

  6. Click Generate.

  7. The generated audio clip(s) are available to preview and download as WAV files.

REST

Use the predict method to send a music generation request. The response contains the generated audio data, base64-encoded in the JSON response body.

For more information about lyria-002 model requests, see the lyria-002 model API reference.

To generate music, send a POST request to the model's predict endpoint.

Request:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/lyria-002:predict \
  -d '{
    "instances": [
      {
        "prompt": "An uplifting and hopeful orchestral piece with a soaring string melody and triumphant brass.",
        "negative_prompt": "dissonant, minor key",
        "seed": 12345
      }
    ],
    "parameters": {}
  }'

Because this request sets seed, it omits sample_count; the two can't be combined. To generate multiple clips instead, remove seed and set sample_count in parameters.
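The same request can be sketched in Python using only the standard library. This is a minimal sketch assuming the instances/parameters schema shown in the curl example; build_lyria_request and predict are hypothetical helper names, and the access token is obtained from the gcloud CLI exactly as in the curl command.

```python
import json
import subprocess
import urllib.request

def build_lyria_request(prompt, negative_prompt=None, seed=None, sample_count=None):
    """Build a Lyria predict request body (seed and sample_count are mutually exclusive)."""
    if seed is not None and sample_count is not None:
        raise ValueError("Use either seed or sample_count, not both.")
    instance = {"prompt": prompt}
    if negative_prompt:
        instance["negative_prompt"] = negative_prompt
    if seed is not None:
        instance["seed"] = seed
    parameters = {}
    if sample_count is not None:
        parameters["sample_count"] = sample_count
    return {"instances": [instance], "parameters": parameters}

def predict(project_id, location, body):
    # Hypothetical caller: POSTs the body to the lyria-002 predict endpoint.
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    url = (f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
           f"/locations/{location}/publishers/google/models/lyria-002:predict")
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_lyria_request(
    "An uplifting and hopeful orchestral piece with a soaring string melody "
    "and triumphant brass.",
    negative_prompt="dissonant, minor key",
    seed=12345,
)
```

The helper raises an error if you try to set both seed and sample_count, matching the constraint noted in the Console settings above.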

Response:

A successful request returns a JSON object containing the generated audio data, similar to the following:

{
  "predictions": [
    {
      "audioContent": "BASE64_ENCODED_WAV_STRING_SAMPLE_1",
      "mimeType": "audio/wav"
    },
    {
      "audioContent": "BASE64_ENCODED_WAV_STRING_SAMPLE_2",
      "mimeType": "audio/wav"
    }
  ],
  "deployedModelId": "xxxxxxxxxxxxxxx",
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/lyria-002",
  "modelDisplayName": "Lyria 2"
}

Decode the base64-encoded audioContent field to obtain the WAV audio file. Each clip is 30 seconds long.
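The decoding step can be sketched in Python with the standard library. save_clips is a hypothetical helper name; it assumes the predictions list shape shown in the response example above, and uses the wave module to sanity-check the 48 kHz WAV output documented in the limits table.

```python
import base64
import io
import wave

def save_clips(predictions, prefix="lyria_clip"):
    """Decode each base64 audioContent field and write it out as a .wav file."""
    paths = []
    for i, pred in enumerate(predictions):
        wav_bytes = base64.b64decode(pred["audioContent"])
        path = f"{prefix}_{i}.wav"
        with open(path, "wb") as f:
            f.write(wav_bytes)
        # Sanity check: the docs state clips are WAV audio at a 48 kHz sample rate.
        with wave.open(io.BytesIO(wav_bytes), "rb") as w:
            duration = w.getnframes() / w.getframerate()
            print(f"{path}: {w.getframerate()} Hz, {duration:.1f} s")
        paths.append(path)
    return paths
```

Given a parsed response, you would call save_clips(response["predictions"]) to write one WAV file per returned clip.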

What's next