チュートリアル: Python SDK を使用して評価を行う

このページでは、Vertex AI SDK for Python を使用して Gen AI Evaluation Service でモデルベースの評価を行う方法について説明します。

始める前に

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.
Gen AI Evaluation Service に依存する Vertex AI SDK for Python をインストールします。
```
!pip install google-cloud-aiplatform[evaluation]
```

認証情報を設定します。このクイックスタートを Colaboratory で実行している場合は、次のコマンドを実行します。

from google.colab import auth
auth.authenticate_user()

他の環境については、Vertex AI に対する認証をご覧ください。

ライブラリをインポートする

ライブラリをインポートして、プロジェクトとロケーションを設定します。

import pandas as pd

import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
EXPERIMENT_NAME = "EXPERIMENT_NAME"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

EXPERIMENT_NAME に使用できるのは、小文字の英数字とハイフンのみです。最大 127 文字です。

条件に基づいて評価指標を設定する

次の指標の定義では、大規模言語モデルから生成されたテキストの品質を、Fluency と Entertaining の 2 つの基準に基づいて評価します。このコードでは、次の 2 つの条件を使用して custom_text_quality という指標を定義しています。

custom_text_quality = PointwiseMetric(
    metric="custom_text_quality",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "fluency": (
                "Sentences flow smoothly and are easy to read, avoiding awkward"
                " phrasing or run-on sentences. Ideas and sentences connect"
                " logically, using transitions effectively where needed."
            ),
            "entertaining": (
                "Short, amusing text that incorporates emojis, exclamations and"
                " questions to convey quick and spontaneous communication and"
                " diversion."
            ),
        },
        rating_rubric={
            "1": "The response performs well on both criteria.",
            "0": "The response is somewhat aligned with both criteria",
            "-1": "The response falls short on both criteria",
        },
    ),
)

データセットを準備する

次のコードを追加して、データセットを準備します。

responses = [
    # An example of good custom_text_quality
    "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
    # An example of medium custom_text_quality
    "The weather is nice today, not too hot, not too cold.",
    # An example of poor custom_text_quality
    "The weather is, you know, whatever.",
]

eval_dataset = pd.DataFrame({
    "response" : responses,
})

データセットで評価を実行する

評価を実行します。

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[custom_text_quality],
    experiment=EXPERIMENT_NAME
)

pointwise_result = eval_task.evaluate()

metrics_table Pandas DataFrame で各レスポンスの評価結果を表示します。

pointwise_result.metrics_table

クリーンアップ

このページで使用したリソースについて、 Google Cloud アカウントに課金されないようにするには、次の手順を実施します。

評価によって作成された ExperimentRun を削除します。

aiplatform.ExperimentRun(
    run_name=pointwise_result.metadata["experiment_run"],
    experiment=pointwise_result.metadata["experiment"],
).delete()