Quickstart: Gen AI Evaluation Service workflow
This page shows you how to use the Vertex AI SDK for Python to run a model-based evaluation with the Gen AI Evaluation Service.
Before you begin
Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Make sure that billing is enabled for your Google Cloud project.
Install the Vertex AI SDK for Python with the Gen AI Evaluation Service dependencies:
!pip install google-cloud-aiplatform[evaluation]
Set up your credentials. If you are running this quickstart in Colaboratory, run the following:
from google.colab import auth
auth.authenticate_user()
For other environments, see Authenticate to Vertex AI.
Import libraries
Import your libraries and set up your project and location.
import pandas as pd

import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
EXPERIMENT_NAME = "EXPERIMENT_NAME"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)
Note that EXPERIMENT_NAME can contain only lowercase alphanumeric characters and hyphens, up to a maximum of 127 characters.
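The naming rule for EXPERIMENT_NAME can be checked locally before any call to the service. The following is a minimal sketch and not part of the SDK: the helper name `is_valid_experiment_name` is hypothetical, and the pattern encodes only the rule stated on this page. The service may enforce additional constraints that this check does not cover.

```python
import re

# Encodes only the rule stated on this page: lowercase letters, digits,
# and hyphens, 1 to 127 characters. Assumption: the service may apply
# further checks that are not modeled here.
_EXPERIMENT_NAME_PATTERN = re.compile(r"[a-z0-9-]{1,127}")


def is_valid_experiment_name(name: str) -> bool:
    """Return True if `name` satisfies the documented character and length rule."""
    return _EXPERIMENT_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_experiment_name("my-eval-experiment-01"))  # True
print(is_valid_experiment_name("My_Experiment"))          # False
```

Running a check like this before creating the evaluation task surfaces a bad name immediately instead of at request time.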
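Set up evaluation metrics based on your criteria
The following metric definition evaluates the quality of text generated by a large language model against two criteria: Fluency and Entertaining. The code defines a metric called custom_text_quality using those two criteria: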
custom_text_quality = PointwiseMetric(
    metric="custom_text_quality",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "fluency": (
                "Sentences flow smoothly and are easy to read, avoiding awkward"
                " phrasing or run-on sentences. Ideas and sentences connect"
                " logically, using transitions effectively where needed."
            ),
            "entertaining": (
                "Short, amusing text that incorporates emojis, exclamations and"
                " questions to convey quick and spontaneous communication and"
                " diversion."
            ),
        },
        rating_rubric={
            "1": "The response performs well on both criteria.",
            "0": "The response is somewhat aligned with both criteria",
            "-1": "The response falls short on both criteria",
        },
    ),
)
Prepare your dataset
Add the following code to prepare your dataset:
responses = [
    # An example of good custom_text_quality
    "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
    # An example of medium custom_text_quality
    "The weather is nice today, not too hot, not too cold.",
    # An example of poor custom_text_quality
    "The weather is, you know, whatever.",
]

eval_dataset = pd.DataFrame({
    "response": responses,
})
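Because the dataset here consists of a single response column, a quick local check of the response list can catch empty or non-text entries before they reach the evaluation. This is a minimal sketch under stated assumptions: `find_invalid_responses` is a hypothetical helper, not an SDK function, and "invalid" here means only "not a non-empty string".

```python
def find_invalid_responses(responses):
    """Return the indices of entries that are not non-empty strings."""
    return [
        i
        for i, text in enumerate(responses)
        if not isinstance(text, str) or not text.strip()
    ]


# Stand-in data for illustration; pass your own `responses` list instead.
sample = ["The weather is nice today.", "", None, "   "]
print(find_invalid_responses(sample))  # [1, 2, 3]
```

An empty result means every entry is a usable string; otherwise the returned indices point at the rows to fix or drop.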
Run evaluation with your dataset
Run the evaluation:
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[custom_text_quality],
    experiment=EXPERIMENT_NAME,
)

pointwise_result = eval_task.evaluate()
View the evaluation results for each response in the metrics_table Pandas DataFrame:
pointwise_result.metrics_table
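To read the per-response scores against the rubric defined earlier, you can tally how many responses fall on each rubric level. The sketch below is illustrative only: `tally_scores` and the short labels are hypothetical, the sample scores are stand-in values rather than real results, and the column name `custom_text_quality/score` is an assumption about how the results table is laid out.

```python
from collections import Counter

# Short labels for the three rating_rubric levels defined on this page.
RUBRIC_LABELS = {1: "good", 0: "medium", -1: "poor"}


def tally_scores(scores):
    """Count how many responses landed on each rubric level."""
    counts = Counter()
    for score in scores:
        try:
            # Scores may arrive as ints, floats, or strings; normalize to int.
            label = RUBRIC_LABELS.get(int(float(score)), "unknown")
        except (TypeError, ValueError, OverflowError):
            # Missing or non-numeric scores are counted separately.
            label = "unknown"
        counts[label] += 1
    return dict(counts)


# Stand-in values. With real results you might pass a column such as
# pointwise_result.metrics_table["custom_text_quality/score"] (assumed name).
print(tally_scores([1.0, 0.0, -1.0, 1.0]))  # {'good': 2, 'medium': 1, 'poor': 1}
```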
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the ExperimentRun created by the evaluation:
aiplatform.ExperimentRun(
    run_name=pointwise_result.metadata["experiment_run"],
    experiment=pointwise_result.metadata["experiment"],
).delete()