Summarization automatic evaluation (autoevaluation) assesses the quality of summaries based on three metrics: accuracy, completeness, and adherence. This feature helps you evaluate the quality of summarization generators, which is crucial when you tune custom section definitions or upgrade software that affects summarization models. For more details about the evaluation metrics, see summarization autoevaluation basics.
Before you begin
- Create a summarization generator that generates candidate summaries for your evaluation.
- The section definitions in the generator serve as the ground truth for evaluation.
- For the final score to reflect generator quality, each section's instructions must match those in the generator. The sketch after this list shows one way to verify the match.
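If you keep section definitions in scripts or version control, you can sanity-check the match before running an evaluation. The following is a minimal sketch that assumes section definitions kept as plain dictionaries; the layout, section names, and instructions are illustrative only, not the Agent Assist API schema.

```python
# Minimal sketch: verify that a candidate generator's section instructions
# match the baseline's before running an autoevaluation.
# The dictionary layout below is illustrative, not the Agent Assist schema.

baseline_sections = {
    "situation": "Summarize the customer's issue in one sentence.",
    "resolution": "Describe how the agent resolved the issue.",
}

candidate_sections = {
    "situation": "Summarize the customer's issue in one sentence.",
    "resolution": "Describe the next steps the agent proposed.",
}

def diff_sections(baseline: dict, candidate: dict) -> list[str]:
    """Return the names of sections whose instructions differ."""
    return [
        name for name, instructions in baseline.items()
        if candidate.get(name) != instructions
    ]

mismatches = diff_sections(baseline_sections, candidate_sections)
if mismatches:
    print(f"Instructions differ for sections: {', '.join(mismatches)}")
```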
Create a summarization autoevaluation request
Follow these steps to create a summarization autoevaluation request.
- Navigate to the Agent Assist console and sign in.
- Select your project.
- Click Evaluations > New evaluation.
- Enter a unique Display name and choose a Generator that matches the candidate summary.
- Click Next.
- Choose an evaluation dataset using one of the following options:
    - Random sample of conversations from date range: If you've paid for the Agent Assist summarization feature, specify a Date range and a maximum Number of conversations. The evaluation randomly selects up to that number of conversations and summaries from your history.
    - Select a specific dataset: Choose a specific sample of conversations that you curated with Conversational Insights.
- Choose a summary source from one of the following options:
    - Generate only missing summaries using the specified generator. You pay the summary generation cost.
    - Generate new summaries for all conversations. You pay the summary generation cost.
    - Don't generate summaries. Instead, use the existing summaries found in the dataset. This option incurs a different cost.
- Click Next.
- Click Select folder, choose a Cloud Storage folder in a bucket to store your results, then click OK. Autoevaluation writes the results to a CSV file and saves it in your Cloud Storage bucket, so you can also read them programmatically (see the sketch after these steps).
- Click Run. This starts a long-running operation. Your new evaluation is listed in the table.
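Because the output is a plain CSV file, you can fetch and inspect the results after the run finishes. The following is a minimal sketch using the google-cloud-storage Python client; the bucket name, object path, and column layout are assumptions for illustration, not a documented schema.

```python
import csv
import io

from google.cloud import storage  # pip install google-cloud-storage

# Assumed values for illustration; use your own bucket and the actual
# path of the CSV file that the evaluation wrote.
BUCKET_NAME = "my-agent-assist-results"
RESULT_OBJECT = "evaluations/run-001/results.csv"

client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob(RESULT_OBJECT)
data = blob.download_as_text()

# Column names are not assumed here; check the header row of your file.
for row in csv.DictReader(io.StringIO(data)):
    print(row)
```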
Evaluations
Follow these steps to view a list of your summarization evaluations and completed evaluation runs.
- Navigate to the Agent Assist console and sign in.
- Select your project.
- Click Evaluations.
- Optional: To reverse the sort order, click Down in the Create time column heading.
Filter the list
Follow these steps to filter the list and search for a specific evaluation. The sketch after these steps shows an equivalent filter you can apply to exported evaluation metadata.
- Click Filter evaluations.
- Choose one of the following options:
    - **Display name**
    - **Generator name**
    - **Create time**
- Enter the value of the parameter.
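If you also track evaluation metadata outside the console, for example in a script or exported spreadsheet, the same filters are straightforward to reproduce. The following is a minimal sketch over hypothetical records; the field names are illustrative, not an API schema.

```python
from datetime import date

# Hypothetical evaluation metadata records; field names are illustrative.
evaluations = [
    {"display_name": "baseline-run", "generator_name": "gen-v1",
     "create_time": date(2024, 5, 1)},
    {"display_name": "tuned-run", "generator_name": "gen-v2",
     "create_time": date(2024, 6, 12)},
]

def filter_evaluations(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

print(filter_evaluations(evaluations, generator_name="gen-v2"))
```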
Delete an evaluation
- Wait until the evaluation run finishes.
- Click Delete to clear the evaluation.
View evaluation results
Follow these steps to view detailed evaluation results from the Agent Assist console.
- Navigate to the Agent Assist console and sign in.
- Select your project.
- Click Evaluations.
- Click any row in the list to view Evaluation details, which includes the following information:
- Display name for the generator
- Number of conversations
- Name of the associated Conversational Insights dataset, if any
- Run time
- Cloud Storage output
- Overall performance results for the entire dataset
- Conversation-level results for each conversation in the dataset (the sketch after this list illustrates how the two levels relate)
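To illustrate how the two levels of results relate, the following sketch averages hypothetical per-conversation scores into dataset-level scores. The plain mean is an assumption for illustration, not the documented aggregation method.

```python
# Hypothetical per-conversation scores; real values come from the
# evaluation's CSV output. Plain-mean aggregation is an assumption.
conversation_scores = [
    {"accuracy": 0.92, "completeness": 0.85, "adherence": 0.90},
    {"accuracy": 0.88, "completeness": 0.80, "adherence": 0.95},
]

overall = {
    metric: sum(s[metric] for s in conversation_scores) / len(conversation_scores)
    for metric in ("accuracy", "completeness", "adherence")
}
print(overall)  # approx: accuracy 0.90, completeness 0.825, adherence 0.925
```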
Conversation-level results
Agent Assist displays the conversation-level evaluation results in a table.
- From the Evaluation details page, click any row of the Conversation-level results table to view detailed results, a summary, and a transcript of the conversation.
- Click a section to view examples of that section from the summary.
Results comparison
Follow these steps to compare results across different evaluations in the console. The sketch after these steps shows a similar side-by-side view that you can build outside the console.
- Within the Agent Assist console, navigate to Evaluations.
- Click an evaluation to view its details. This evaluation provides the basis for comparison and determines which metrics the comparison includes.
- Click Compare > Add run to add other evaluations for comparison. You can compare up to five evaluations at one time. Each display name is highlighted in gray.
- To remove an evaluation from the comparison, click Close.
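Outside the console, for example in a notebook, you can line up the overall metrics of several runs from their exported results. The following is a minimal sketch over hypothetical overall scores; the run names and values are illustrative.

```python
# Hypothetical overall metrics per evaluation run; in practice you would
# read these from each run's CSV output in Cloud Storage.
runs = {
    "baseline-run": {"accuracy": 0.90, "completeness": 0.82, "adherence": 0.93},
    "tuned-run": {"accuracy": 0.94, "completeness": 0.86, "adherence": 0.91},
}

metrics = ("accuracy", "completeness", "adherence")
print(f"{'run':<14}" + "".join(f"{m:>14}" for m in metrics))
for name, scores in runs.items():
    print(f"{name:<14}" + "".join(f"{scores[m]:>14.2f}" for m in metrics))
```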
Limitation
Summarization autoevaluation doesn't support VPC Service Controls.
Available regions
Summarization autoevaluation is available only in two Agent Assist regions. For a list of regions available for each feature, see the regionalization table.