Last reviewed 2024-11-20 UTC
Reliable infrastructure is a critical requirement for workloads in the cloud.
As a cloud architect, to design reliable infrastructure for your workloads, you
need a good understanding of the reliability capabilities of your cloud provider
of choice. This document describes the building blocks of reliability in
Google Cloud (zones, regions, and location-scoped resources) and the
availability levels that they provide. It also provides guidelines for
assessing the reliability requirements of your workloads, and presents
architectural recommendations for building and managing reliable infrastructure
in Google Cloud.
If you've read this guide previously and want to see what's changed, see the
Release notes (/architecture/release-notes).
Overview of reliability
An application or workload is reliable when it meets your current objectives
for availability and resilience to failures.
Availability (or uptime) is the percentage of time that an application is
usable. For example, for an application that has an availability target of
99.99%, the total downtime must not exceed 8.64 seconds during a 24-hour period
(0.01% of 86,400 seconds).
Sometimes, availability is measured as the proportion of requests that the
application serves successfully during a given period. For example, for an
application that has an availability target of 99.99%, for every 100,000
requests received, no more than ten requests can fail. Availability is often
expressed as the number of nines in the percentage. For example, 99.99%
availability is expressed as "4 nines".
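The arithmetic behind these examples can be sketched in a few lines of Python. This is a minimal illustration, not part of any Google Cloud API: the function names are hypothetical, and the availability target is passed as a string so that `fractions.Fraction` can keep the calculation exact.

```python
from fractions import Fraction
import math


def unavailable_fraction(target_pct: str) -> Fraction:
    """Exact share of time (or requests) that may fail for a target such as "99.99"."""
    target = Fraction(target_pct)
    if not 0 < target < 100:
        raise ValueError("target must be strictly between 0 and 100")
    return 1 - target / 100


def downtime_budget_seconds(target_pct: str, period_seconds: int) -> float:
    """Maximum total downtime, in seconds, allowed during the period."""
    return float(unavailable_fraction(target_pct) * period_seconds)


def failed_request_budget(target_pct: str, total_requests: int) -> int:
    """Maximum number of requests that may fail out of total_requests."""
    return math.floor(unavailable_fraction(target_pct) * total_requests)


def count_nines(target_pct: str) -> int:
    """Number of leading nines in the target, for example 4 for "99.99"."""
    unavailable = unavailable_fraction(target_pct)
    nines = 0
    # Each additional nine divides the allowed unavailability by ten.
    while unavailable * 10 ** (nines + 1) <= 1:
        nines += 1
    return nines


if __name__ == "__main__":
    print(downtime_budget_seconds("99.99", 24 * 60 * 60))  # 8.64
    print(failed_request_budget("99.99", 100_000))         # 10
    print(count_nines("99.99"))                            # 4
```

The same functions can be applied to other periods, for example a 30-day month, by changing `period_seconds`.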
Depending on the purpose of the application, you might have different sets of
indicators for how reliable the application is. The following are examples of
such reliability indicators:
For applications that serve content, availability, latency, and throughput are
important reliability indicators. They indicate whether the application can
respond to requests, how long the application takes to respond to requests,
and how many requests the application can process successfully in a given
period.
For databases and storage systems, latency, throughput, availability, and
durability (how well data is protected against loss or corruption) are
indicators of reliability. They indicate how long the system takes to read or
write data, and whether data can be accessed on demand.
For big data and analytics workloads such as data processing pipelines,
consistent pipeline performance (throughput and latency) is essential to
ensure freshness of the data products, and is an important reliability
indicator. It indicates how much data can be processed, and how long it takes
for the pipeline to progress from data ingestion to data processing.
Most applications have data correctness as an essential reliability indicator.
For further guidelines to define the reliability objectives for your
applications, see Assess the reliability requirements for your cloud workloads
(/architecture/infra-reliability-guide/requirements).
Note: Planning for disaster recovery (DR) is related to reliability, and DR is
essential for business continuity. For detailed guidance about DR planning, see
the Disaster recovery planning guide (/architecture/dr-scenarios-planning-guide).
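As a rough illustration of the indicators for content-serving applications, the following Python sketch derives availability, a latency percentile, and throughput from a batch of request records. The `RequestRecord` and `ReliabilitySummary` types, the choice of the 95th percentile, and the nearest-rank percentile method are assumptions made for this example; real monitoring systems typically compute these values for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical record of one request observed during a measurement window."""
    latency_ms: float
    succeeded: bool


@dataclass(frozen=True)
class ReliabilitySummary:
    availability_pct: float    # share of requests served successfully
    latency_ms: float          # latency at the requested percentile
    throughput_per_sec: float  # successful requests per second


def summarize(records: list[RequestRecord], window_seconds: float,
              percentile: int = 95) -> ReliabilitySummary:
    """Summarize availability, latency, and throughput for one window."""
    if not records or window_seconds <= 0:
        raise ValueError("need at least one record and a positive window")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")

    successes = sum(1 for r in records if r.succeeded)
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank percentile: rank = ceil(percentile / 100 * n), in integers.
    rank = (percentile * len(latencies) + 99) // 100

    return ReliabilitySummary(
        availability_pct=100 * successes / len(records),
        latency_ms=latencies[rank - 1],
        throughput_per_sec=successes / window_seconds,
    )


if __name__ == "__main__":
    sample = [RequestRecord(100.0, True)] * 19 + [RequestRecord(900.0, False)]
    print(summarize(sample, window_seconds=10))
```

Measuring availability per request, as done here, corresponds to the request-based definition of availability rather than the time-based one.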
Factors that affect application reliability
The reliability of an application that's deployed in Google Cloud depends on
the following factors:
The internal design of the application.
The secondary applications or components that the application depends on.
Google Cloud infrastructure resources such as compute, networking, storage,
databases, and security that the application runs on, and how the application
uses the infrastructure.
Infrastructure capacity that you provision, and how the capacity scales.
The DevOps processes and tools that you use to build, deploy, and maintain the
application, its dependencies, and the Google Cloud infrastructure.
These factors are summarized in the following diagram:
As shown in the preceding diagram, the reliability of an application that's
deployed in Google Cloud depends on multiple factors. The focus of this guide
is the reliability of the Google Cloud infrastructure.
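To see why dependencies and infrastructure matter alongside the application's own design, consider a simplified model in which the application works only when every component it depends on works, and in which components fail independently of each other. Both conditions are assumptions of this sketch, and the function name is hypothetical. Under that model, the combined availability is the product of the individual availabilities:

```python
from fractions import Fraction


def serial_availability_pct(component_pcts: list[str]) -> float:
    """Combined availability (%) when every component is required.

    Assumes independent failures; each target is given as a string such as "99.9".
    """
    combined = Fraction(1)
    for pct in component_pcts:
        combined *= Fraction(pct) / 100
    return float(combined * 100)


if __name__ == "__main__":
    # An application at 99.99% that requires a dependency at 99.9%.
    print(serial_availability_pct(["99.99", "99.9"]))
```

In this model the combined figure is never higher than that of the least available component, which is one reason to assess each factor in the list above rather than the application in isolation.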
What's next
Building blocks of reliability in Google Cloud (/architecture/infra-reliability-guide/building-blocks)
Assess the reliability requirements for your cloud workloads (/architecture/infra-reliability-guide/requirements)
Design reliable infrastructure for your workloads in Google Cloud (/architecture/infra-reliability-guide/design)
Manage traffic and load for your workloads in Google Cloud (/architecture/infra-reliability-guide/traffic-load)
Manage and monitor your Google Cloud infrastructure (/architecture/infra-reliability-guide/manage-and-monitor)
Contributors
Authors:
Nir Tarcic (https://www.linkedin.com/in/nirtarcic) | Cloud Lifecycle SRE UTL
Kumar Dhanagopal (https://www.linkedin.com/in/kumardhanagopal) | Cross-Product Solution Developer
Other contributors:
Alok Kumar (https://www.linkedin.com/in/alok-kumar-0a51159) | Distinguished Engineer
Andrew Fikes (https://www.linkedin.com/in/andrew-fikes) | Engineering Fellow, Reliability
Chris Heiser (https://www.linkedin.com/in/christopher-heiser) | SRE TL
David Ferguson (https://www.linkedin.com/in/davidsferguson) | Director, Site Reliability Engineering
Joe Tan (https://www.linkedin.com/in/joe-tan-378a55a8) | Senior Product Counsel
Krzysztof Duleba (https://www.linkedin.com/in/kduleba) | Principal Engineer
Narayan Desai (https://www.linkedin.com/in/nldesai) | Principal SRE
Sailesh Krishnamurthy (https://www.linkedin.com/in/saileshkrishnamurthy) | VP, Engineering
Steve McGhee (https://www.linkedin.com/in/stevemcghee) | Reliability Advocate
Sudhanshu Jain (https://www.linkedin.com/in/sudhanshujain) | Product Manager
Yaniv Aknin (https://www.linkedin.com/in/yanivaknin) | Software Engineer