This document describes how you deploy the architecture in Automate malware scanning for files uploaded to Cloud Storage.
This deployment guide assumes that you're familiar with the basic functionality of the following technologies:
Architecture
The following diagram shows the deployment architecture that you create in this document:
The diagram shows the following two pipelines that are managed by this architecture:
- File scanning pipeline, which checks if an uploaded file contains malware.
- ClamAV malware database mirror update pipeline, which maintains an up-to-date mirror of the database of malware that ClamAV uses.
For more information about the architecture, see Automate malware scanning for files uploaded to Cloud Storage.
Objectives
Build a mirror of the ClamAV malware definitions database in a Cloud Storage bucket.
Build a Cloud Run service with the following functions:
- Scanning files in a Cloud Storage bucket for malware by using ClamAV, and moving scanned files to clean or quarantined buckets based on the outcome of the scan.
- Maintaining a mirror of the ClamAV malware definitions database in Cloud Storage.
Create an Eventarc trigger to invoke the malware-scanner service when a file is uploaded to Cloud Storage.
Create a Cloud Scheduler job to invoke the malware-scanner service to refresh the mirror of the malware definitions database in Cloud Storage.
Costs
This architecture uses the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Artifact Registry, Cloud Build, Resource Manager, Cloud Scheduler, Eventarc, Logging, Monitoring, Pub/Sub, Cloud Run, and Service Usage APIs.
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
In this deployment, you run all commands from Cloud Shell.
Deploy the architecture
You can deploy the architecture described in this document by using one of the following methods:
Use Cloud Shell: Use this method if you want to see how each component of the solution is deployed and configured by using the Google Cloud CLI command-line tool.
To use this deployment method, follow the instructions in Deploy using Cloud Shell.
Use the Terraform CLI: Use this method if you want to deploy the solution in as few manual steps as possible. This method relies on Terraform to deploy and configure the individual components.
To use this deployment method, follow the instructions in Deploy using the Terraform CLI.
Deploy using Cloud Shell
To manually deploy the architecture described in this document, complete the steps in the following subsections.
Prepare your environment
In this section, you assign settings for values that are used throughout the deployment, such as the region and location.
In this deployment, you use us-central1 as the region for the Cloud Run service and us as the location for the Eventarc trigger and Cloud Storage buckets.
In Cloud Shell, set common shell variables, including region and location:
REGION=us-central1
LOCATION=us
PROJECT_ID=PROJECT_ID
SERVICE_NAME="malware-scanner"
SERVICE_ACCOUNT="${SERVICE_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
Replace PROJECT_ID with your project ID.
Initialize the gcloud environment with your project ID:
gcloud config set project "${PROJECT_ID}"
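The bucket names and the service account email are all derived from PROJECT_ID, so forgetting to replace the placeholder only surfaces later as confusing errors. The following sketch adds an early check against the standard Google Cloud project ID format (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen). The valid_project_id function is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 matches the standard Google Cloud
# project ID format: 6 to 30 characters (lowercase letters, digits, hyphens),
# starting with a letter and not ending with a hyphen.
valid_project_id() {
  local re='^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
  [[ "$1" =~ $re ]]
}

# Example guard: warn if the PROJECT_ID placeholder wasn't replaced.
if ! valid_project_id "${PROJECT_ID:-}"; then
  echo "PROJECT_ID is not a valid project ID; replace the placeholder first." >&2
fi
```

The literal placeholder PROJECT_ID fails the check because it contains uppercase letters and an underscore.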
Create three Cloud Storage buckets with unique names:
gcloud storage buckets create "gs://unscanned-${PROJECT_ID}" --location="${LOCATION}"
gcloud storage buckets create "gs://quarantined-${PROJECT_ID}" --location="${LOCATION}"
gcloud storage buckets create "gs://clean-${PROJECT_ID}" --location="${LOCATION}"
${PROJECT_ID} is used to make sure that the bucket names are unique.
These three buckets hold the uploaded files at various stages during the file scanning pipeline:
- unscanned-PROJECT_ID: Holds files before they're scanned. Your users upload their files to this bucket.
- quarantined-PROJECT_ID: Holds files that the malware-scanner service has scanned and deemed to contain malware.
- clean-PROJECT_ID: Holds files that the malware-scanner service has scanned and found to be uninfected.
Create a fourth Cloud Storage bucket:
gcloud storage buckets create "gs://cvd-mirror-${PROJECT_ID}" --location="${LOCATION}"
${PROJECT_ID} is used to make sure that the bucket name is unique.
This bucket, cvd-mirror-PROJECT_ID, holds a local mirror of the malware definitions database. Because the service instances fetch definition updates from this mirror instead of directly from the ClamAV CDN, the mirror prevents the ClamAV CDN from rate limiting the downloads.
Set up a service account for the malware-scanner service
In this section, you create a service account to use for the malware scanner service. You then grant the appropriate roles to the service account so that it has permissions to read and write to the Cloud Storage buckets. The roles ensure that the account has minimal permissions and that it only has access to the resources that it needs.
Create the malware-scanner service account:
gcloud iam service-accounts create "${SERVICE_NAME}"
Grant the Storage Object Admin role on the four buckets. The role allows the service to read and delete files in the unscanned bucket, to write files to the quarantined and clean buckets, and to maintain the malware definitions mirror in the cvd-mirror bucket.
gcloud storage buckets add-iam-policy-binding "gs://unscanned-${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}" --role=roles/storage.objectAdmin
gcloud storage buckets add-iam-policy-binding "gs://clean-${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}" --role=roles/storage.objectAdmin
gcloud storage buckets add-iam-policy-binding "gs://quarantined-${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}" --role=roles/storage.objectAdmin
gcloud storage buckets add-iam-policy-binding "gs://cvd-mirror-${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}" --role=roles/storage.objectAdmin
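The four bindings differ only in the bucket name prefix, so they can also be expressed as a loop. The following sketch does that. The grant_bucket_object_admin function and the DRY_RUN variable are hypothetical conveniences, not part of the repository. Setting DRY_RUN=echo prints the gcloud commands so that you can review them before you run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant roles/storage.objectAdmin on the four buckets.
# Set DRY_RUN=echo to print the gcloud commands instead of running them.
grant_bucket_object_admin() {
  local project_id="$1" member="$2" prefix
  for prefix in unscanned clean quarantined cvd-mirror; do
    ${DRY_RUN:-} gcloud storage buckets add-iam-policy-binding \
      "gs://${prefix}-${project_id}" \
      --member="${member}" \
      --role=roles/storage.objectAdmin
  done
}

# Usage, with the variables set earlier in this section:
# DRY_RUN=echo grant_bucket_object_admin "${PROJECT_ID}" "serviceAccount:${SERVICE_ACCOUNT}"
# grant_bucket_object_admin "${PROJECT_ID}" "serviceAccount:${SERVICE_ACCOUNT}"
```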
Grant the Metric Writer role, which allows the service to write metrics to Monitoring:
gcloud projects add-iam-policy-binding \
  "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}" \
  --role=roles/monitoring.metricWriter
Create the malware-scanner service in Cloud Run
In this section, you deploy the malware-scanner service to Cloud Run. The service runs in a Docker container that's built from the following:
- A Dockerfile to build a container image with the service, Node.js runtime, Google Cloud SDK, and ClamAV binaries.
- The TypeScript files for the malware-scanner Cloud Run service.
- A config.json configuration file to specify your Cloud Storage bucket names.
- An updateCvdMirror.sh shell script to refresh the ClamAV malware definitions database mirror in Cloud Storage.
- A bootstrap.sh shell script to run the necessary services on instance startup.
To deploy the service, do the following:
In Cloud Shell, clone the GitHub repository that contains the code files:
git clone https://github.com/GoogleCloudPlatform/docker-clamav-malware-scanner.git
Change to the cloudrun-malware-scanner directory:
cd docker-clamav-malware-scanner/cloudrun-malware-scanner
Create the config.json configuration file based on the config.json.tmpl template file in the GitHub repository:
sed "s/-bucket-name/-${PROJECT_ID}/" config.json.tmpl > config.json
The preceding command uses a search-and-replace operation to give the Cloud Storage buckets unique names that are based on the project ID.
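To see what the substitution produces, the following sketch runs the same sed expression over a sample template. The template text is an assumption modeled on the sample config.json files in the Advanced configuration section; the repository's config.json.tmpl might differ. Because the expression has no g flag, sed replaces only the first -bucket-name on each line, which works as long as each bucket name is on its own line. The render_config function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply the documented sed substitution to an assumed template.
# Project IDs contain only lowercase letters, digits, and hyphens, so the
# "/" delimiter in the sed expression is safe.
render_config() {
  local project_id="$1"
  sed "s/-bucket-name/-${project_id}/" <<'EOF'
{
  "buckets": [
    {
      "unscanned": "unscanned-bucket-name",
      "clean": "clean-bucket-name",
      "quarantined": "quarantined-bucket-name"
    }
  ],
  "ClamCvdMirrorBucket": "cvd-mirror-bucket-name"
}
EOF
}

render_config "my-project-123"
```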
Optional: View the updated configuration file:
cat config.json
Perform an initial population of the ClamAV malware database mirror in Cloud Storage:
python3 -m venv pyenv
. pyenv/bin/activate
pip3 install crcmod cvdupdate
./updateCvdMirror.sh "cvd-mirror-${PROJECT_ID}"
deactivate
These commands install the CVDUpdate tool in a local Python virtual environment, and then run the updateCvdMirror.sh script, which uses CVDUpdate to copy the ClamAV malware database to the cvd-mirror-PROJECT_ID bucket that you created earlier.
You can check the contents of the mirror bucket:
gcloud storage ls "gs://cvd-mirror-${PROJECT_ID}/cvds"
The bucket should contain several CVD files that contain the full malware database, several .cdiff files that contain the daily differential updates, and two JSON files with configuration and state information.
Create and deploy the Cloud Run service by using the service account that you created earlier:
gcloud beta run deploy "${SERVICE_NAME}" \
  --source . \
  --region "${REGION}" \
  --no-allow-unauthenticated \
  --memory 4Gi \
  --cpu 1 \
  --concurrency 20 \
  --min-instances 1 \
  --max-instances 5 \
  --no-cpu-throttling \
  --cpu-boost \
  --timeout 300s \
  --service-account="${SERVICE_ACCOUNT}"
The command creates a Cloud Run service whose instances each have 1 vCPU and 4 GiB of memory. This size is acceptable for this deployment. However, in a production environment, you might want to choose a larger CPU and memory size for each instance, and a larger --max-instances value. The resource sizes that you need depend on how much traffic the service needs to handle.
The command includes the following specifications:
- The --concurrency parameter specifies the number of simultaneous requests that each instance can process.
- The --no-cpu-throttling parameter lets the instance perform operations in the background, such as updating malware definitions.
- The --cpu-boost parameter doubles the number of vCPUs on instance startup to reduce startup latency.
- The --min-instances 1 parameter keeps at least one instance active, because the startup time for each instance is relatively high.
- The --max-instances 5 parameter prevents the service from being scaled up too high.
- The --timeout 300s parameter sets the request timeout to five minutes, which is roughly the time that it takes to scan a file of the default maximum scan size of 500 MiB.
When prompted, enter Y to build and deploy the service. The build and deployment take about 10 minutes. When they're complete, the following message is displayed:
Service [malware-scanner] revision [malware-scanner-UNIQUE_ID] has been deployed and is serving 100 percent of traffic.
Service URL: https://malware-scanner-UNIQUE_ID.a.run.app
Store the Service URL value from the output of the deployment command in a shell variable. You use the value later when you create a Cloud Scheduler job.
SERVICE_URL="SERVICE_URL"
Replace SERVICE_URL with the Service URL from the deployment output.
Optional: To check the running service and the ClamAV version, run the following command:
curl -D - -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "${SERVICE_URL}"
After the HTTP response headers (printed because of the -D - option), the output looks like the following sample. It shows the version of the malware-scanner service, the version of ClamAV, and the version of the malware definitions with the date that they were last updated.
gcs-malware-scanner version 3.2.0
Using Clam AV version: ClamAV 1.4.1/27479/Fri Dec 6 09:40:14 2024
The Cloud Run service requires that all invocations are authenticated, and the authenticating identities must have the run.routes.invoke permission on the service. You add the permission in the next section.
Create an Eventarc Cloud Storage trigger
In this section, you add permissions to allow Eventarc to capture Cloud Storage events, and you create a trigger to send these events to the Cloud Run malware-scanner service.
If you're using an existing project that was created before April 8, 2021, add the iam.serviceAccountTokenCreator role to the Pub/Sub service account:
PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")
PUBSUB_SERVICE_ACCOUNT="service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${PUBSUB_SERVICE_ACCOUNT}" \
  --role='roles/iam.serviceAccountTokenCreator'
This role addition is only required for older projects and allows Pub/Sub to invoke the Cloud Run service.
In Cloud Shell, grant the Pub/Sub Publisher role to the Cloud Storage service account:
STORAGE_SERVICE_ACCOUNT=$(gcloud storage service-agent --project="${PROJECT_ID}")
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:${STORAGE_SERVICE_ACCOUNT}" \
  --role "roles/pubsub.publisher"
Allow the malware-scanner service account to invoke the Cloud Run service and to act as an Eventarc event receiver:
gcloud run services add-iam-policy-binding "${SERVICE_NAME}" \
  --region="${REGION}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role roles/run.invoker
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:${SERVICE_ACCOUNT}" \
  --role "roles/eventarc.eventReceiver"
Create an Eventarc trigger to capture the finalized object event in the unscanned Cloud Storage bucket and send it to your Cloud Run service. The trigger uses the malware-scanner service account for authentication:
BUCKET_NAME="unscanned-${PROJECT_ID}"
gcloud eventarc triggers create "trigger-${BUCKET_NAME}-${SERVICE_NAME}" \
  --destination-run-service="${SERVICE_NAME}" \
  --destination-run-region="${REGION}" \
  --location="${LOCATION}" \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=${BUCKET_NAME}" \
  --service-account="${SERVICE_ACCOUNT}"
If you receive one of the following errors, wait one minute and then run the commands again:
ERROR: (gcloud.eventarc.triggers.create) INVALID_ARGUMENT: The request was invalid: Bucket "unscanned-PROJECT_ID" was not found. Please verify that the bucket exists.
ERROR: (gcloud.eventarc.triggers.create) FAILED_PRECONDITION: Invalid resource state for "": Permission denied while using the Eventarc Service Agent. If you recently started to use Eventarc, it may take a few minutes before all necessary permissions are propagated to the Service Agent. Otherwise, verify that it has Eventarc Service Agent role.
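Instead of rerunning the command by hand, you can wrap it in a small retry helper. The following sketch defines a hypothetical retry function that runs a command up to a given number of times and waits between attempts. It retries on any failure, not only on the two propagation errors above, so keep the number of attempts small.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1" delay="$2" n
  shift 2
  for ((n = 1; n <= attempts; n++)); do
    "$@" && return 0
    if ((n < attempts)); then
      echo "Attempt ${n}/${attempts} failed; retrying in ${delay}s..." >&2
      sleep "${delay}"
    fi
  done
  return 1
}

# Usage: up to 5 attempts, one minute apart, with the same arguments as the
# trigger command above:
# retry 5 60 gcloud eventarc triggers create "trigger-${BUCKET_NAME}-${SERVICE_NAME}" <flags from above>
```

For the Terraform deployment, the same helper could wrap terraform apply -auto-approve, which skips the interactive yes prompt.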
Change the message acknowledgement deadline to five minutes in the underlying Pub/Sub subscription that's used by the Eventarc trigger. The default value of 10 seconds is too short for large files or high loads.
SUBSCRIPTION_NAME=$(gcloud eventarc triggers describe \
  "trigger-${BUCKET_NAME}-${SERVICE_NAME}" \
  --location="${LOCATION}" \
  --format="get(transport.pubsub.subscription)")
gcloud pubsub subscriptions update "${SUBSCRIPTION_NAME}" --ack-deadline=300
Although your trigger is created immediately, it can take up to two minutes for that trigger to be fully functional.
Create a Cloud Scheduler job to trigger ClamAV database mirror updates
Create a Cloud Scheduler job that executes an HTTP POST request on the Cloud Run service with a command to update the mirror of the malware definitions database. To avoid having too many clients use the same time slot, ClamAV requires that you schedule the job at a random minute between 3 and 57, avoiding multiples of 10.
while : ; do
  # set MINUTE to a random number between 3 and 57
  MINUTE="$((RANDOM%55 + 3))"
  # exit loop if MINUTE isn't a multiple of 10
  [[ $((MINUTE % 10)) != 0 ]] && break
done
gcloud scheduler jobs create http \
  "${SERVICE_NAME}-mirror-update" \
  --location="${REGION}" \
  --schedule="${MINUTE} */2 * * *" \
  --oidc-service-account-email="${SERVICE_ACCOUNT}" \
  --uri="${SERVICE_URL}" \
  --http-method=post \
  --message-body='{"kind":"schedule#cvd_update"}' \
  --headers="Content-Type=application/json"
The --schedule command-line argument defines when the job runs, by using the unix-cron string format. The value shown runs the job at the randomly generated minute every two hours.
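To make the schedule concrete, the following sketch checks a minute value against the ClamAV rule described above and lists the 12 daily run times that the */2 hour field produces, in the job's time zone. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: minute must be 3-57 and not a multiple of 10.
valid_mirror_minute() {
  local m="$1"
  (( m >= 3 && m <= 57 && m % 10 != 0 ))
}

# "*/2" in the hour field matches hours 0, 2, 4, ..., 22.
mirror_run_times() {
  local minute="$1" hour
  for ((hour = 0; hour < 24; hour += 2)); do
    printf '%02d:%02d\n' "${hour}" "${minute}"
  done
}

valid_mirror_minute 17 && mirror_run_times 17
```

With MINUTE=17, the job runs at 00:17, 02:17, and so on through 22:17.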
This job only updates the ClamAV mirror in Cloud Storage. The ClamAV freshclam daemon in each instance of the Cloud Run service checks the mirror for new definitions every 30 minutes and updates the ClamAV daemon.
Deploy using the Terraform CLI
This section describes how to deploy the architecture by using the Terraform CLI.
Clone the GitHub repository
In Cloud Shell, clone the GitHub repository that contains the code and Terraform files:
git clone https://github.com/GoogleCloudPlatform/docker-clamav-malware-scanner.git
Prepare the environment
In this section, you assign settings for values that are used throughout the deployment, such as the region and location.
In this deployment, you use us-central1 as the region for the Cloud Run service and us as the location for the Eventarc trigger and Cloud Storage buckets.
In Cloud Shell, set common shell variables, including region and location:
REGION=us-central1
LOCATION=us
PROJECT_ID=PROJECT_ID
Replace PROJECT_ID with your project ID.
Initialize the gcloud CLI environment with your project ID:
gcloud config set project "${PROJECT_ID}"
Create the config.json configuration file based on the config.json.tmpl template file in the GitHub repository:
sed "s/-bucket-name/-${PROJECT_ID}/" \
  docker-clamav-malware-scanner/cloudrun-malware-scanner/config.json.tmpl \
  > docker-clamav-malware-scanner/cloudrun-malware-scanner/config.json
The preceding command uses a search-and-replace operation to give the Cloud Storage buckets unique names that are based on the project ID.
Optional: View the updated configuration file:
cat docker-clamav-malware-scanner/cloudrun-malware-scanner/config.json
Configure the Terraform variables. The contents of the config.json configuration file are passed to Terraform by using the TF_VAR_config_json variable, so that Terraform knows which Cloud Storage buckets to create. The value of this variable is also passed to Cloud Run to configure the service.
TF_VAR_project_id="${PROJECT_ID}"
TF_VAR_region="${REGION}"
TF_VAR_bucket_location="${LOCATION}"
TF_VAR_config_json="$(cat docker-clamav-malware-scanner/cloudrun-malware-scanner/config.json)"
TF_VAR_create_buckets=true
export TF_VAR_project_id TF_VAR_region TF_VAR_bucket_location TF_VAR_config_json TF_VAR_create_buckets
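Terraform reads input variables from environment variables named TF_VAR_ followed by the variable name, but only if they're exported to the terraform process. If your Cloud Shell session restarts, these values are lost. The following sketch checks that each variable is exported and not empty before you run terraform apply; require_tf_vars is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named variable is unset, not exported,
# or empty, and report each missing name on stderr.
require_tf_vars() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what terraform reads.
    if [[ -z "$(printenv "${name}")" ]]; then
      echo "Missing or not exported: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage, before terraform apply:
# require_tf_vars TF_VAR_project_id TF_VAR_region TF_VAR_bucket_location \
#   TF_VAR_config_json TF_VAR_create_buckets
```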
Deploy the base infrastructure
In Cloud Shell, run the following commands to deploy the base infrastructure:
gcloud services enable \
  cloudresourcemanager.googleapis.com \
  serviceusage.googleapis.com
cd docker-clamav-malware-scanner/terraform/infra
terraform init
terraform apply
Respond yes when prompted.
This Terraform configuration performs the following tasks:
- Creates the service accounts
- Creates the Artifact Registry repository
- Creates the Cloud Storage buckets
- Sets the appropriate roles and permissions
- Performs an initial population of the Cloud Storage bucket that contains the mirror of the ClamAV malware definitions database
Build the container for the service
In Cloud Shell, run the following commands to launch a Cloud Build job to create the container image for the service:
cd ../../cloudrun-malware-scanner
gcloud builds submit \
  --region="${TF_VAR_region}" \
  --config=cloudbuild.yaml \
  --service-account="projects/${PROJECT_ID}/serviceAccounts/malware-scanner-build@${PROJECT_ID}.iam.gserviceaccount.com" \
  .
Wait a few minutes for the build to complete.
Deploy the service and trigger
In Cloud Shell, run the following commands to deploy the Cloud Run service:
cd ../terraform/service/
terraform init
terraform apply
Respond yes when prompted.
It can take several minutes for the service to deploy and start.
This Terraform configuration performs the following tasks:
- Deploys the Cloud Run service by using the container image that you just built.
- Sets up the Eventarc triggers on the unscanned Cloud Storage buckets. Although your trigger is created immediately, it can take up to two minutes for that trigger to be fully functional.
- Creates the Cloud Scheduler job to update the ClamAV malware definitions mirror.
If the deployment fails with one of the following errors, wait one minute and then run the terraform apply command again to retry creating the Eventarc trigger.
Error: Error creating Trigger: googleapi: Error 400: Invalid resource state for "": The request was invalid: Bucket "unscanned-PROJECT_ID" was not found. Please verify that the bucket exists.
Error: Error creating Trigger: googleapi: Error 400: Invalid resource state for "": Permission denied while using the Eventarc Service Agent. If you recently started to use Eventarc, it may take a few minutes before all necessary permissions are propagated to the Service Agent. Otherwise, verify that it has Eventarc Service Agent role..
Optional: To check the running service and the ClamAV version in use, run the following commands:
MALWARE_SCANNER_URL="$(terraform output -raw cloud_run_uri)"
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "${MALWARE_SCANNER_URL}"
The output looks like the following sample. It shows the version of the malware-scanner service, the version of ClamAV, and the version of the malware definitions with the date that they were last updated.
gcs-malware-scanner version 3.2.0
Using Clam AV version: ClamAV 1.4.1/27479/Fri Dec 6 09:40:14 2024
Test the pipeline by uploading files
To test the pipeline, you upload one clean (malware-free) file and one test file that mimics an infected file:
Create a sample text file, or use an existing clean file, to test the pipeline.
In Cloud Shell, copy the sample data file to the unscanned bucket:
gcloud storage cp FILENAME "gs://unscanned-${PROJECT_ID}"
Replace FILENAME with the name of the clean text file. The malware-scanner service inspects each file and moves it to the appropriate bucket. This file is moved to the clean bucket.
Give the pipeline a few seconds to process the file, and then check your clean bucket to see whether the processed file is there:
gcloud storage ls "gs://clean-${PROJECT_ID}" --recursive
You can check that the file was removed from the unscanned bucket:
gcloud storage ls "gs://unscanned-${PROJECT_ID}" --recursive
Upload a file called eicar-infected.txt that contains the EICAR standard anti-malware test signature to your unscanned bucket:
echo -e 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
  | gcloud storage cp - "gs://unscanned-${PROJECT_ID}/eicar-infected.txt"
This text string has a signature that triggers malware scanners for testing purposes. The test file is widely used and isn't actual malware, so it's harmless to your workstation. However, if you try to create a file that contains this string on a computer that has a malware scanner installed, you might trigger an alert.
Wait a few seconds and then check your quarantined bucket to see if your file successfully went through the pipeline:
gcloud storage ls "gs://quarantined-${PROJECT_ID}" --recursive
The service also writes a log entry to Logging when it detects a malware-infected file.
You can check that the file was removed from the unscanned bucket:
gcloud storage ls "gs://unscanned-${PROJECT_ID}" --recursive
Test the malware definitions database update mechanism
In Cloud Shell, trigger the check for updates by forcing the Cloud Scheduler job to run:
gcloud scheduler jobs run "${SERVICE_NAME}-mirror-update" --location="${REGION}"
The results of this command are only shown in the detailed logs.
Monitor the service
You can monitor the service by using Cloud Logging and Cloud Monitoring.
View detailed logs
In the Google Cloud console, go to the Cloud Logging Logs Explorer page.
If the Log fields filter isn't displayed, click Log Fields.
In the Log Fields filter, click Cloud Run Revision.
In the Service Name section of the Log Fields filter, click malware-scanner.
The log query results show the logs from the service, including several lines that show the scan requests and status for the two files that you uploaded:
Scan request for gs://unscanned-PROJECT_ID/FILENAME, (##### bytes) scanning with clam ClamAV CLAMAV_VERSION_STRING
Scan status for gs://unscanned-PROJECT_ID/FILENAME: CLEAN (##### bytes in #### ms)
...
Scan request for gs://unscanned-PROJECT_ID/eicar-infected.txt, (69 bytes) scanning with clam ClamAV CLAMAV_VERSION_STRING
Scan status for gs://unscanned-PROJECT_ID/eicar-infected.txt: INFECTED stream: Eicar-Signature FOUND (69 bytes in ### ms)
The output shows the ClamAV version and malware database signature revision, along with the malware name for the infected test file. You can use these log messages to set up alerts for when malware has been found, or for when failures occurred while scanning.
The output also shows the malware definitions mirror update logs:
Starting CVD Mirror update
CVD Mirror update check complete. output: ...
If the mirror was updated, the output shows additional lines:
CVD Mirror updated: DATE_TIME - INFO: Downloaded daily.cvd. Version: VERSION_INFO
Freshclam update logs appear every 30 minutes:
DATE_TIME -> Received signal: wake up
DATE_TIME -> ClamAV update process started at DATE_TIME
DATE_TIME -> daily.cvd database is up-to-date (version: VERSION_INFO)
DATE_TIME -> main.cvd database is up-to-date (version: VERSION_INFO)
DATE_TIME -> bytecode.cvd database is up-to-date (version: VERSION_INFO)
If the database was updated, the freshclam log lines are instead similar to the following:
DATE_TIME -> daily.cld updated (version: VERSION_INFO)
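To find scan results without clicking through the Log Fields filter, you can build a Logging query for the "Scan status" lines shown earlier in this section. The following sketch is an assumption-based helper (scan_status_query is a hypothetical name). The bare quoted terms are global text searches, so the query doesn't depend on which payload field the service writes its messages to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Cloud Logging query for "Scan status" entries
# with a given verdict (for example INFECTED or CLEAN).
# Optional second argument: Cloud Run service name (default malware-scanner).
scan_status_query() {
  local verdict="$1" service="${2:-malware-scanner}"
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND "Scan status for" AND "%s"\n' \
    "${service}" "${verdict}"
}

scan_status_query INFECTED

# Usage with the gcloud CLI (not run here):
# gcloud logging read "$(scan_status_query INFECTED)" --freshness=1d --limit=20
```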
View metrics
The service generates the following metrics for monitoring and alerting purposes:
- Number of clean files processed:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/clean-files
- Number of infected files processed:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/infected-files
- Number of files ignored and not scanned:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/ignored-files
- Time spent scanning files:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/scan-duration
- Total number of bytes scanned:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/bytes-scanned
- Number of failed malware scans:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/scans-failed
- Number of CVD Mirror update checks:
workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/cvd-mirror-updates
You can view these metrics in the Cloud Monitoring Metrics Explorer:
In the Google Cloud console, go to the Cloud Monitoring Metrics Explorer page.
Click the Select a metric field and enter the filter string malware.
Expand the Generic Task resource.
Expand the Googlecloudplatform category.
Select the googlecloudplatform/gcs-malware-scanning/clean-files metric. The graph shows a data point that indicates when the clean file was scanned.
You can use metrics to monitor the pipeline and to create alerts for when malware is detected, or when files fail processing.
The generated metrics have the following labels, which you can use for filtering and aggregation to view more fine-grained details with Metrics Explorer:
- source_bucket
- destination_bucket
- clam_version
- cloud_run_revision
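Outside Metrics Explorer, for example in a call to the Monitoring API's timeSeries.list method or in an alerting policy condition, you select these metrics with a filter string. The following sketch builds such a filter, optionally narrowed to one source bucket through the source_bucket label. It assumes that the "Generic Task" resource shown in Metrics Explorer corresponds to the generic_task resource type; scanner_metric_filter is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Cloud Monitoring filter for a scanner metric.
# Arguments: METRIC_SUFFIX (for example infected-files) [SOURCE_BUCKET]
METRIC_PREFIX="workload.googleapis.com/googlecloudplatform/gcs-malware-scanning"

scanner_metric_filter() {
  local metric="$1" source_bucket="${2:-}"
  local filter="metric.type=\"${METRIC_PREFIX}/${metric}\" AND resource.type=\"generic_task\""
  if [[ -n "${source_bucket}" ]]; then
    filter+=" AND metric.labels.source_bucket=\"${source_bucket}\""
  fi
  printf '%s\n' "${filter}"
}

scanner_metric_filter infected-files "unscanned-my-project"
```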
In the ignored-files metric, the following reason labels define why files are ignored:
- ZERO_LENGTH_FILE: The ignoreZeroLengthFiles config value is set, and the file is empty.
- FILE_TOO_LARGE: The file exceeds the maximum scan size of 500 MiB.
- REGEXP_MATCH: The filename matches one of the patterns defined in fileExclusionPatterns.
- FILE_SIZE_MISMATCH: The file size changed while the file was being examined.
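The first three ignore rules can be summarized as a small decision function. The following is a hypothetical sketch of the documented rules, not the service's TypeScript code. The order in which the conditions are checked is an assumption, and FILE_SIZE_MISMATCH is left out because it depends on the object changing during the scan.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented ignore rules; not the service's code.
MAX_FILE_SIZE=$((500 * 1024 * 1024))   # 500 MiB in bytes

# Usage: ignore_reason NAME SIZE_BYTES IGNORE_ZERO_LENGTH(true|false) [PATTERN...]
# Prints the reason label and returns 0 if the file would be ignored;
# prints nothing and returns 1 if the file would be scanned.
ignore_reason() {
  local name="$1" size="$2" ignore_zero="$3" pattern
  shift 3
  if [[ "${ignore_zero}" == true ]] && (( size == 0 )); then
    echo ZERO_LENGTH_FILE
    return 0
  fi
  if (( size > MAX_FILE_SIZE )); then
    echo FILE_TOO_LARGE
    return 0
  fi
  for pattern in "$@"; do
    if [[ "${name}" =~ ${pattern} ]]; then
      echo REGEXP_MATCH
      return 0
    fi
  done
  return 1
}

ignore_reason upload.tmp 1024 false '\.tmp$'
```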
Advanced configuration
The following sections describe how you can configure the scanner with more advanced parameters.
Handle multiple buckets
The malware scanner service can scan files from multiple source buckets and send the files to separate clean and quarantined buckets. Although this advanced configuration is out of the scope of this deployment, the following is a summary of the required steps:
Create unscanned, clean, and quarantined Cloud Storage buckets that have unique names.
Grant the appropriate roles to the malware-scanner service account on each bucket.
Edit the config.json configuration file to specify the bucket names for each configuration:
{
  "buckets": [
    {
      "unscanned": "unscanned-bucket-1-name",
      "clean": "clean-bucket-1-name",
      "quarantined": "quarantined-bucket-1-name"
    },
    {
      "unscanned": "unscanned-bucket-2-name",
      "clean": "clean-bucket-2-name",
      "quarantined": "quarantined-bucket-2-name"
    }
  ],
  "ClamCvdMirrorBucket": "cvd-mirror-bucket-name"
}
For each of the unscanned buckets, create an Eventarc trigger. Make sure to create a unique trigger name for each bucket.
The Cloud Storage bucket must be in the same project and region as the Eventarc trigger.
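The per-bucket triggers can be created in a loop that reuses the flags from the single-bucket trigger command earlier in this document. The following sketch assumes that the SERVICE_NAME, REGION, LOCATION, and SERVICE_ACCOUNT variables from the deployment steps are set; the function name and the DRY_RUN variable are hypothetical. Including the bucket name in the trigger name keeps each trigger name unique.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one Eventarc trigger per unscanned bucket.
# Set DRY_RUN=echo to print the gcloud commands instead of running them.
create_unscanned_triggers() {
  local bucket
  for bucket in "$@"; do
    ${DRY_RUN:-} gcloud eventarc triggers create "trigger-${bucket}-${SERVICE_NAME}" \
      --destination-run-service="${SERVICE_NAME}" \
      --destination-run-region="${REGION}" \
      --location="${LOCATION}" \
      --event-filters="type=google.cloud.storage.object.v1.finalized" \
      --event-filters="bucket=${bucket}" \
      --service-account="${SERVICE_ACCOUNT}"
  done
}

# Usage:
# DRY_RUN=echo create_unscanned_triggers unscanned-bucket-1-name unscanned-bucket-2-name
```

Each trigger gets its own underlying Pub/Sub subscription, so you would also repeat the acknowledgement-deadline update for each trigger.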
If you're using the Terraform deployment, the steps in this section are applied automatically when you pass your updated config.json configuration file in the Terraform configuration variable TF_VAR_config_json.
Ignore temporary files
Some uploading services, such as SFTP to Cloud Storage gateways, create one or more temporary files during the upload process. These services then rename these files to the final filename once the upload is complete.
By default, the scanner scans and moves all files as soon as they're written, including these temporary files, which might cause the uploader service to fail when it can't find its temporary files.
The fileExclusionPatterns section of the config.json configuration file lets you use regular expressions to specify a list of filename patterns to ignore. Any files that match these regular expressions are left in the unscanned bucket.
When this rule is triggered, the ignored-files counter is incremented, and a message is logged to indicate that the file matching the pattern was ignored.
The following code sample shows a config.json configuration file with the fileExclusionPatterns list set to ignore two kinds of files: those that end in .tmp, and those that contain the string ".partial_upload." anywhere in the name.
{
"buckets": [
{
"unscanned": "unscanned-bucket-name",
"clean": "clean-bucket-name",
"quarantined": "quarantined-bucket-name"
}
],
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
"fileExclusionPatterns": [
"\\.tmp$",
"\\.partial_upload\\."
]
}
Take care when you use backslash (\) characters in a regular expression, because each one must be escaped in the JSON file with another backslash. For example, to specify a literal . in a regular expression, the symbol needs to be escaped twice: once for the regular expression, and again for the text in the JSON file. The result is \\. in the JSON file, as in both patterns in the preceding code sample.
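Before you add patterns to config.json, you can try them against sample filenames. In the following sketch, the patterns are written as they look after JSON decoding (\.tmp$ and \.partial_upload\.), which is what the regular expression engine sees. The service itself evaluates the patterns in Node.js; for simple patterns like these, bash extended regular expressions are assumed to behave the same way. The is_excluded function and the sample filenames are hypothetical.

```shell
#!/usr/bin/env bash
# Patterns after JSON decoding: "\\.tmp$" becomes \.tmp$ and so on.
EXCLUSION_PATTERNS=('\.tmp$' '\.partial_upload\.')

# Hypothetical helper: succeed if the filename matches any pattern.
is_excluded() {
  local name="$1" pattern
  for pattern in "${EXCLUSION_PATTERNS[@]}"; do
    [[ "${name}" =~ ${pattern} ]] && return 0
  done
  return 1
}

for f in report.pdf upload.tmp data.partial_upload.csv tmp-notes.txt; do
  if is_excluded "$f"; then echo "ignored: $f"; else echo "scanned: $f"; fi
done
```

Note that a name such as file.partial_upload, without a trailing dot, doesn't match the second pattern.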
Ignore zero-length files
Similarly to temporary files, some upload services create a zero-length file in Cloud Storage and then update this file later with more contents.
These files can also be ignored by setting the config.json parameter ignoreZeroLengthFiles to true, for example:
{
"buckets": [
{
"unscanned": "unscanned-bucket-name",
"clean": "clean-bucket-name",
"quarantined": "quarantined-bucket-name"
}
],
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
"ignoreZeroLengthFiles": true
}
When this rule is triggered, the ignored-files metric is incremented, and a message is logged to indicate that a zero-length file was ignored.
Maximum scan file size
The default maximum scan file size is 500 MiB. This value was chosen because it takes approximately 5 minutes to scan a file of this size.
Files that are larger than 500 MiB are ignored and left in the unscanned bucket. The ignored-files metric is incremented and a message is logged.
If you need to increase this limit, update the following limits so that they accommodate the new maximum file size and scan duration values:
- The Cloud Run service request timeout is 5 minutes.
- The Pub/Sub subscription message acknowledgement deadline is 5 minutes.
- The scanner code has a MAX_FILE_SIZE constant of 500 MiB.
- The ClamAV service config has StreamMaxLength, MaxScanSize, and MaxFileSize settings of 512 MB. These settings are set by the bootstrap.sh script.
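To pick starting values for the new timeouts, you can extrapolate from the figure given earlier in this section (about 300 seconds for 500 MiB). The following sketch assumes that scan time grows linearly with file size, which is a simplification; measure real scans on your instance size before relying on the numbers. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough sketch (assumption): scan time scales linearly with file size,
# anchored to the documented ~300 seconds for a 500 MiB file.
estimated_scan_seconds() {
  local size_mib="$1"
  # Multiply before dividing to stay in integer arithmetic; round up.
  echo $(( (size_mib * 300 + 499) / 500 ))
}

for size_mib in 100 500 1000; do
  echo "${size_mib} MiB: about $(estimated_scan_seconds "${size_mib}") seconds"
done
```

Under this assumption, a 1,000 MiB limit suggests about 600 seconds. Pub/Sub caps a subscription's acknowledgement deadline at 600 seconds, so estimates above that value can't be covered by raising the acknowledgement deadline alone.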
Clean up
The following section explains how you can avoid future charges for the Google Cloud project that you used in this deployment.
Delete the Google Cloud project
To avoid incurring charges to your Google Cloud account for the resources used in this deployment, you can delete the Google Cloud project.
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Explore Cloud Storage documentation.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.