[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated (UTC): 2024-07-16."],[[["\u003cp\u003eThis architecture provides an automated, event-driven pipeline to scan files uploaded to Cloud Storage for malware, like trojans and viruses, using ClamAV.\u003c/p\u003e\n"],["\u003cp\u003eThe system uses a user-uploaded file scanning pipeline to automatically move files into clean or quarantined Cloud Storage buckets depending on whether ClamAV detects malware.\u003c/p\u003e\n"],["\u003cp\u003eThe ClamAV malware database mirror update pipeline keeps a local copy of the malware database up to date so that scans stay relevant.\u003c/p\u003e\n"],["\u003cp\u003eThe architecture uses Cloud Run, Eventarc, and Cloud Storage, and records events in Cloud Logging and Cloud Monitoring, using Google Cloud products in combination with the open source ClamAV.\u003c/p\u003e\n"],["\u003cp\u003eThe design is optimized to avoid rate limiting from the public malware database CDN and helps ensure ClamAV service availability.\u003c/p\u003e\n"]]],[],null,["# Automate malware scanning for files uploaded to Cloud Storage\n\nThis reference architecture shows you how to build an event-driven pipeline that\ncan help you automate the evaluation of files for malware like trojans,\nviruses, and other malicious code. Manually evaluating the large number of files\nthat are uploaded to\n[Cloud Storage](/storage)\nis too time-consuming for most apps. 
Automating the process can help you save\ntime and improve efficiency.\n\nThe pipeline in this architecture uses Google Cloud products along\nwith the open source antivirus engine\n[ClamAV](https://www.clamav.net/).\nYou can also use any other anti-malware engine that performs on-demand\nscanning in Linux containers. In this architecture, ClamAV runs in a Docker\ncontainer hosted in\n[Cloud Run](/run).\nThe pipeline also writes log entries to\n[Cloud Logging](/logging)\nand records metrics to\n[Cloud Monitoring](/monitoring).\n\nArchitecture\n------------\n\nThe following diagram gives an overview of the architecture:\n\nThe architecture shows the following pipelines:\n\n- User-uploaded file scanning pipeline, which checks if an uploaded file contains malware.\n- ClamAV malware database mirror update pipeline, which maintains an up-to-date mirror of the database of malware that ClamAV uses.\n\nThe pipelines are described in more detail in the following sections.\n\n### User-uploaded file scanning pipeline\n\nThe file scanning pipeline operates as follows:\n\n1. End users upload their files to the *unscanned* Cloud Storage bucket.\n2. The Eventarc service catches this upload event and tells the Cloud Run service about this new file.\n3. The Cloud Run service downloads the new file from the unscanned Cloud Storage bucket and passes it to the ClamAV malware scanner.\n4. Depending on the result of the malware scan, the service performs one of the following actions:\n - If ClamAV declares that the file is clean, then it's moved from the unscanned Cloud Storage bucket to the *clean* Cloud Storage bucket.\n - If ClamAV declares that the file contains malware, then it's moved from the unscanned Cloud Storage bucket to the *quarantined* Cloud Storage bucket.\n5. 
The service reports the result of these actions to Logging and Monitoring so that administrators can take action.\n\n### ClamAV malware database mirror update pipeline\n\nThe ClamAV malware database mirror update pipeline keeps an up-to-date\n[private local mirror](https://docs.clamav.net/appendix/CvdPrivateMirror.html)\nof the database in Cloud Storage. This ensures that the ClamAV public\ndatabase is only accessed once per update to download the smaller differential\nupdate files, and not the full database, which helps prevent rate limiting.\n\nThis pipeline operates as follows:\n\n1. A Cloud Scheduler job is configured to trigger every two hours, which is the same as the default update check interval used by the ClamAV freshclam service. This job makes an HTTP `POST` request to the Cloud Run service instructing it to update the malware database mirror.\n2. The Cloud Run instance copies the malware database mirror from the Cloud Storage bucket to the local file system.\n3. The instance then runs the [ClamAV CVDUpdate](https://github.com/Cisco-Talos/cvdupdate) tool, which downloads any available differential updates and applies them to the database mirror.\n4. Then, it copies the updated malware database mirror back to the Cloud Storage bucket.\n\nOn startup, the\n[ClamAV freshclam](https://docs.clamav.net/manual/Usage/SignatureManagement.html#freshclam)\nservice running in the Cloud Run instance downloads the\nmalware database from Cloud Storage. During runtime, the service also\nregularly checks for and downloads any available database updates from the\nCloud Storage bucket.\n\nDesign considerations\n---------------------\n\nThe following guidelines can help you to develop an architecture that meets your\norganization's requirements for reliability, cost, and operational efficiency.\n\n### Reliability\n\nTo scan effectively, the ClamAV malware scanner needs to maintain an\nup-to-date database of malware signatures. 
The ClamAV service runs on\nCloud Run, which is a stateless service. When an\ninstance of the service starts, ClamAV must always download the latest complete malware\ndatabase, which is several hundred megabytes in size.\n\nThe public malware database for ClamAV is hosted on a Content Distribution\nNetwork (CDN), which rate limits these downloads. If multiple instances start up\nand attempt to download the full database, rate limiting can be triggered. This\ncauses the external IP address used by Cloud Run to be blocked\nfor 24 hours. The block prevents the ClamAV service from starting up and from\ndownloading malware database updates.\n\nAlso, Cloud Run uses a shared pool of external IP addresses. As a\nresult, downloads from different projects' malware scanning instances are seen\nby the CDN as coming from a single address and also trigger the block.\n\nTo avoid these problems, the architecture keeps a private mirror of the database\nin Cloud Storage, so that instances download the database from the mirror\ninstead of from the CDN.\n\n### Cost optimization\n\nThis architecture uses the following billable components of Google Cloud:\n\n- [Cloud Storage](/storage/pricing)\n- [Cloud Run](/run/pricing)\n- [Eventarc](/eventarc/pricing)\n\nTo generate a cost estimate based on your projected usage, use the\n[pricing calculator](/products/calculator).\n\n### Operational efficiency\n\nTo\n[trigger log-based alerts](/monitoring/alerts)\nfor files that are infected, you can use log entries from\nLogging. However, setting up these alerts is outside the scope of\nthis architecture.\n\nDeployment\n----------\n\nTo deploy this architecture, see\n[Deploy automated malware scanning for files uploaded to Cloud Storage](/architecture/automate-malware-scanning-for-documents-uploaded-to-cloud-storage/deployment).\n\nWhat's next\n-----------\n\n- Explore [Cloud Storage documentation](/storage/docs).\n- For more reference architectures, diagrams, and best practices, explore the [Cloud Architecture Center](/architecture)."]]