Vertex AI includes a safety classifier that screens requests to all hosted Anthropic models for images that may contain Child Sexual Abuse Material (CSAM). Vertex AI's CSAM safety classifier is separate from the Trust and Safety (T&S) filters that ship directly with Anthropic's models.
This document covers which parts of the request and response the CSAM safety classifier filters and what happens when the classifier blocks a request.
Safety and content filters act as a barrier to prevent harmful output, but they don't directly influence the model's behavior. To learn more about model steerability, see System instructions for safety.
Unsafe prompts
The CSAM classifier filters only the images in requests to Anthropic models in Vertex AI; it doesn't filter the model's outputs.
Requests that trigger the CSAM classifier are blocked and return an HTTP 200 status code with the following message:
{
  "promptFeedback": {
    "blockReason": "PROHIBITED_CONTENT"
  }
}
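As a minimal sketch, the following Python snippet shows one way a client could detect this block in a non-streaming response body. The check_for_csam_block helper is hypothetical and not part of any SDK.

import json

def check_for_csam_block(response_body: str) -> bool:
    # Return True if the Vertex AI CSAM classifier blocked the request.
    # `response_body` is assumed to be the raw JSON body of a non-streaming
    # response from a hosted Anthropic model on Vertex AI.
    payload = json.loads(response_body)
    feedback = payload.get("promptFeedback", {})
    return feedback.get("blockReason") == "PROHIBITED_CONTENT"

# Example: a blocked response body like the one shown above.
blocked = '{"promptFeedback": {"blockReason": "PROHIBITED_CONTENT"}}'
if check_for_csam_block(blocked):
    print("Request was blocked by the CSAM safety classifier.")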
If a streaming request is blocked by the classifier, the stream is cancelled and the following event is returned:
"event": "vertex-block-event",
"data": {"promptFeedback": {"blockReason": "PROHIBITED_CONTENT"}}
Location availability
The CSAM classifier is available in all supported regions.