Quickstart: Gen AI evaluation service workflow

This page shows how to run a model-based evaluation with the Gen AI Evaluation Service by using the Vertex AI SDK for Python.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Make sure that billing is enabled for your Google Cloud project.

Install the Vertex AI SDK for Python with the Gen AI Evaluation Service dependency.
```
!pip install google-cloud-aiplatform[evaluation]
```
Set up your credentials. If you are running this quickstart in Colaboratory, run the following:
```
from google.colab import auth
auth.authenticate_user()
```
For other environments, see Authenticate to Vertex AI.

Import libraries

Import the libraries and set your project and location.

import pandas as pd

import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
EXPERIMENT_NAME = "EXPERIMENT_NAME"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

EXPERIMENT_NAME can contain only lowercase alphanumeric characters and hyphens, up to a maximum of 127 characters.
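
The naming rule above can be checked locally before the name is passed to the service. The following is a minimal sketch based only on the rule as stated on this page. The helper name `is_valid_experiment_name` is hypothetical and is not part of the Vertex AI SDK, and the service may enforce additional constraints that this check does not cover.

```python
import re

# Pattern derived from the rule stated on this page: lowercase letters,
# digits, and hyphens only, between 1 and 127 characters in total.
# Assumption: the service may apply stricter rules than this pattern.
_EXPERIMENT_NAME_PATTERN = re.compile(r"[a-z0-9-]{1,127}")


def is_valid_experiment_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rule."""
    return _EXPERIMENT_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_experiment_name("my-eval-experiment-01"))  # True
print(is_valid_experiment_name("My_Eval_Experiment"))     # False
```

Running a check like this before creating the `EvalTask` can surface a naming problem early, although the service remains the authority on what it accepts.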

Set up evaluation metrics based on your criteria

The following metric definition evaluates the quality of text generated by a large language model based on two criteria: Fluency and Entertaining. The code defines a metric named custom_text_quality that uses these two criteria.

custom_text_quality = PointwiseMetric(
    metric="custom_text_quality",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "fluency": (
                "Sentences flow smoothly and are easy to read, avoiding awkward"
                " phrasing or run-on sentences. Ideas and sentences connect"
                " logically, using transitions effectively where needed."
            ),
            "entertaining": (
                "Short, amusing text that incorporates emojis, exclamations and"
                " questions to convey quick and spontaneous communication and"
                " diversion."
            ),
        },
        rating_rubric={
            "1": "The response performs well on both criteria.",
            "0": "The response is somewhat aligned with both criteria",
            "-1": "The response falls short on both criteria",
        },
    ),
)

Prepare your dataset

Add the following code to prepare your dataset.

responses = [
    # An example of good custom_text_quality
    "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
    # An example of medium custom_text_quality
    "The weather is nice today, not too hot, not too cold.",
    # An example of poor custom_text_quality
    "The weather is, you know, whatever.",
]

eval_dataset = pd.DataFrame({
    "response" : responses,
})
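
Because the dataset here carries only a response column, a blank entry would give the judge model nothing to rate. The following is a minimal sketch of a pre-check on the `responses` list. The helper name `find_blank_responses` and the sample values are hypothetical and are not part of the Vertex AI SDK.

```python
def find_blank_responses(responses: list[str]) -> list[int]:
    """Return the indexes of entries that are not non-empty strings."""
    return [
        index
        for index, text in enumerate(responses)
        if not isinstance(text, str) or not text.strip()
    ]


# Illustrative values only; the second entry is whitespace.
sample = ["Life is a rollercoaster!", "   ", "The weather is nice today."]
print(find_blank_responses(sample))  # [1]
```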

Run an evaluation with your dataset

Run the evaluation.

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[custom_text_quality],
    experiment=EXPERIMENT_NAME
)

pointwise_result = eval_task.evaluate()

View the evaluation results for each response in the metrics_table pandas DataFrame.

pointwise_result.metrics_table
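
To read the scores against the rating rubric, each row can be paired with its rubric description. The following is a minimal sketch that works on plain dictionaries, such as those returned by `pointwise_result.metrics_table.to_dict("records")`. The column name `custom_text_quality/score` is an assumption about how the table is laid out, the helper `summarize_scores` is hypothetical, and the example rows are made up for illustration rather than real judge-model output.

```python
# Rubric descriptions copied from the custom_text_quality definition.
RATING_RUBRIC = {
    1: "The response performs well on both criteria.",
    0: "The response is somewhat aligned with both criteria",
    -1: "The response falls short on both criteria",
}


def summarize_scores(rows, score_column="custom_text_quality/score"):
    """Attach the rubric description to each row and compute the mean score.

    Assumption: `score_column` matches the column name in metrics_table.
    """
    if not rows:
        raise ValueError("rows must contain at least one result")
    labeled = []
    for row in rows:
        score = int(row[score_column])
        labeled.append(
            {
                "response": row["response"],
                "score": score,
                "label": RATING_RUBRIC[score],
            }
        )
    mean_score = sum(item["score"] for item in labeled) / len(labeled)
    return labeled, mean_score


# Made-up rows for illustration only; real scores come from the judge model.
example_rows = [
    {"response": "first response", "custom_text_quality/score": 1.0},
    {"response": "second response", "custom_text_quality/score": 0.0},
    {"response": "third response", "custom_text_quality/score": -1.0},
]

labeled, mean_score = summarize_scores(example_rows)
for item in labeled:
    print(item["score"], item["label"])
print("mean:", mean_score)
```

If the column name differs in your SDK version, pass the actual name through the `score_column` parameter.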

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the ExperimentRun created by the evaluation.

aiplatform.ExperimentRun(
    run_name=pointwise_result.metadata["experiment_run"],
    experiment=pointwise_result.metadata["experiment"],
).delete()