Enterprise application with Oracle Database on Compute Engine

Last reviewed 2024-09-02 UTC

This document provides a reference architecture to help you build the infrastructure to host a highly available enterprise application that uses an Oracle database, with the entire stack deployed on Compute Engine VMs. You can use this reference architecture to efficiently rehost (lift and shift) on-premises applications that use Oracle databases to Google Cloud. This document also includes guidance to help you build an Oracle Database topology in Google Cloud that meets Oracle's maximum availability architecture (MAA) requirements. The intended audience for this document is cloud architects and Oracle database administrators. The document assumes that you're familiar with Compute Engine and Oracle Database.

Architecture

The following diagram shows the infrastructure for a multi-tier enterprise application that uses Oracle Database. The web tier, application tier, and Oracle Database instances are hosted on Compute Engine VMs. The web tier and application tier run in active-active mode on VMs that are distributed across two zones within a Google Cloud region. The primary and standby database instances are deployed in separate zones. This architecture is aligned with the regional deployment archetype, which helps to ensure that your Google Cloud topology is robust against single-zone outages.

A multi-tier enterprise application uses Oracle Database on Compute Engine VMs.

The architecture that's shown in the preceding diagram includes the following components:

  • Regional external Application Load Balancer: The regional external Application Load Balancer receives and distributes user requests to the web tier VMs.
  • Google Cloud Armor security policy: The Google Cloud Armor security policy helps to protect your application stack against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS).
  • Regional managed instance group (MIG) for the web tier: The web tier of the application is deployed on Compute Engine VMs that are part of a regional MIG. This MIG is the backend for the external Application Load Balancer. The MIG contains Compute Engine VMs in two zones. Each of these VMs hosts an independent instance of the web tier of the application.
  • Regional internal Application Load Balancer: The regional internal Application Load Balancer distributes traffic from the web tier VMs to the application tier VMs.
  • Regional MIG for the application tier: The application tier, such as an Oracle WebLogic Server cluster, is deployed on Compute Engine VMs that are part of a regional MIG. This MIG is the backend for the internal Application Load Balancer. The MIG contains Compute Engine VMs in two zones. Each VM hosts an independent instance of the application server.
  • Oracle Database instances deployed on Compute Engine VMs: The application in this architecture uses a primary-standby pair of Oracle Database instances that are deployed on Compute Engine VMs in separate zones. You bring your own licenses (BYOL) for these Oracle Database instances, and you manage the VMs and database instances.
  • Hyperdisk Storage Pools: The VMs in each zone (across all the tiers in the application stack) use Hyperdisk Balanced volumes from a Hyperdisk Storage Pool. By creating and managing all the disks in a single storage pool, you improve capacity utilization and reduce operational complexity while maintaining the storage capacity and performance that the VMs need.
  • Oracle Data Guard FSFO observer: The Oracle Data Guard Fast-Start Failover (FSFO) observer is a lightweight program that initiates automatic failover to the standby Oracle Database instance when the primary instance is unavailable. The observer runs on a Compute Engine VM in a zone that's different from the zones where the primary and standby database instances are deployed.
  • Cloud Storage bucket: To store backups of the Oracle Database instances, this architecture uses a Cloud Storage bucket. To facilitate recovery of the database during a region outage, you can store the backups geo-redundantly in a dual-region or multi-region bucket.
  • Virtual Private Cloud (VPC) network and subnet: All the Google Cloud resources in the architecture use a single VPC network and subnet. Depending on your requirements, you can choose to build an architecture that uses multiple VPC networks or multiple subnets. For more information, see Deciding whether to create multiple VPC networks.
  • Public Cloud NAT gateway: The architecture includes a public Cloud NAT gateway to enable secure outbound connections from the Compute Engine VMs that have only internal IP addresses.
  • Cloud Interconnect and Cloud VPN: To connect your on-premises network to the VPC network in Google Cloud, you can use Cloud Interconnect or Cloud VPN. For information about the relative advantages of each approach, see Choosing a Network Connectivity product.
  • Cloud Monitoring and Cloud Logging: Cloud Monitoring helps you to observe the behavior, health, and performance of your application and Google Cloud resources. Ops Agent collects metrics and logs from the Compute Engine VMs, including the VMs that host the Oracle Database instances. The agent sends logs to Cloud Logging and sends metrics to Cloud Monitoring.

Products used

This reference architecture uses the following Google Cloud products:

  • Compute Engine: A secure and customizable compute service that lets you create and run VMs on Google's infrastructure.
  • Google Cloud Hyperdisk: A network storage service that you can use to provision and dynamically scale block storage volumes with configurable and predictable performance.
  • Cloud Load Balancing: A portfolio of high-performance, scalable, global and regional load balancers.
  • Cloud Storage: A low-cost, no-limit object store for diverse data types. Data can be accessed from within and outside Google Cloud, and it's replicated across locations for redundancy.
  • Virtual Private Cloud: A virtual system that provides global, scalable networking functionality for your Google Cloud workloads.
  • Google Cloud Armor: A network security service that offers WAF rules and helps to protect against DDoS and application attacks.
  • Cloud NAT: A service that provides Google Cloud-managed high-performance network address translation.
  • Cloud Monitoring: A service that provides visibility into the performance, availability, and health of your applications and infrastructure.
  • Cloud Logging: A real-time log management system with storage, search, analysis, and alerting.
  • Cloud Interconnect: A service that extends your external network to the Google network through a high-availability, low-latency connection.
  • Cloud VPN: A service that securely extends your peer network to Google's network through an IPsec VPN tunnel.

This reference architecture uses the following Oracle products:

  • Oracle Database: A relational database management system (RDBMS) that extends the relational model to an object-relational model.
  • Oracle Data Guard: A set of services to create, maintain, manage, and monitor one or more standby databases.

You're responsible for procuring licenses for the Oracle products that you deploy in Google Cloud, and you're responsible for complying with the terms and conditions of the Oracle licenses.

Design considerations

This section describes design factors, best practices, and design recommendations that you should consider when you use this reference architecture to develop a topology that meets your specific requirements for security, reliability, operational efficiency, cost, and performance.

The guidance in this section isn't exhaustive. Depending on the specific requirements of your application and the Google Cloud and third-party products and features that you use, there might be additional design factors and trade-offs that you should consider.

System design

This section provides guidance to help you to choose Google Cloud regions for your deployment and to select appropriate Google Cloud services.

Region selection

When you choose the Google Cloud region for your deployment, consider the following factors and requirements:

  • Availability of Google Cloud services in each region. For more information, see Products available by location.
  • Availability of Compute Engine machine types in each region. For more information, see Regions and zones.
  • End-user latency requirements.
  • Cost of Google Cloud resources.
  • Regulatory requirements.

Some of these factors and requirements might involve trade-offs. For example, the most cost-efficient region might not have the lowest carbon footprint. For more information, see Best practices for Compute Engine regions selection.

Compute infrastructure

The reference architecture in this document uses Compute Engine VMs to host all the tiers of the application. Depending on the requirements of your application, you can choose the following other Google Cloud compute services:

  • Containers: You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
  • Serverless: If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run.

The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility and control, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. The design guidance for those services is outside the scope of this document. For more information about service options, see Application Hosting Options.

Storage options

The architecture shown in this document uses a Hyperdisk Storage Pool in each zone, with Hyperdisk Balanced volumes for the VMs in all the tiers. Hyperdisk volumes provide better performance, flexibility, and efficiency than Persistent Disk. For information about Hyperdisk types and features, see About Hyperdisk.

To store data that's shared across multiple VMs in a region, like configuration files for all the VMs in the web tier, you can use a Filestore regional instance. The data that you store in a Filestore regional instance is replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs.

When you design storage for your workloads, consider the functional characteristics of the workloads, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.

Network design

When you build infrastructure for a multi-tier application stack, you must choose a network design that meets your business and technical requirements. The architecture that's shown in this document uses a simple network topology with a single VPC network and subnet. Depending on your requirements, you can choose to use multiple VPC networks or multiple subnets. For more information, see Deciding whether to create multiple VPC networks.

Security, privacy, and compliance

This section describes factors to consider when you use this reference architecture to design a topology in Google Cloud that meets the security and compliance requirements of your workloads.

Protection against external threats

To protect your application against external threats like DDoS attacks and XSS, define appropriate Google Cloud Armor security policies based on your requirements. Each policy is a set of rules that specifies the conditions to be evaluated and actions to take when the conditions are met. For example, a rule could specify that if the incoming traffic's source IP address matches a specific IP address or CIDR range, then the traffic must be denied. You can also apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
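The rule evaluation described above can be sketched in a few lines of Python. This is a simplified model for illustration only, not the Google Cloud Armor API: the `Rule` class, the `evaluate` function, and the example CIDR range are all hypothetical, and the sketch assumes that rules are checked in ascending priority order and that the first match decides the action.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number is evaluated first (assumption of this model)
    src_range: str  # CIDR range to compare with the client's source IP address
    action: str     # "allow" or "deny"


def evaluate(rules, client_ip, default_action="allow"):
    """Return the action of the first matching rule, in priority order."""
    ip = ipaddress.ip_address(client_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.src_range):
            return rule.action
    return default_action  # no rule matched


# Hypothetical policy: deny one documentation-only CIDR range, allow the rest.
policy = [
    Rule(priority=1000, src_range="198.51.100.0/24", action="deny"),
]

print(evaluate(policy, "198.51.100.7"))  # source IP inside the denied range
print(evaluate(policy, "203.0.113.9"))   # source IP outside the denied range
```

A real policy would typically combine rules like this with the preconfigured WAF rules, which this sketch doesn't model.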

External access for VMs

In the reference architecture that this document describes, the VMs that host the web tier, application tier, and Oracle Database instances don't need direct inbound access from the internet. Don't assign external IP addresses to those VMs. Google Cloud resources that have only private, internal IP addresses can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.

To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Secure Web Proxy or Cloud NAT.

VM image security

Approved images are images with software that meets your policy or security requirements. To ensure that your VMs use only approved images, you can define an organization policy that restricts the use of images in specific public image projects. For more information, see Setting up trusted image policies.

Service account privileges

In Google Cloud projects where the Compute Engine API is enabled, a default service account is created automatically. For Google Cloud organizations that were created before May 3, 2024, this default service account is granted the Editor IAM role (roles/editor), unless this behavior is disabled.

By default, the default service account is attached to all VMs that you create by using the Google Cloud CLI or the Google Cloud console. The Editor role includes a broad range of permissions, so attaching the default service account to VMs creates a security risk. To avoid this risk, you can create and use dedicated service accounts for each tier of the application stack. To specify the resources that the service account can access, use fine-grained policies. For more information, see Limit service account privileges.

Disk encryption

By default, the data that's stored in Hyperdisk volumes is encrypted using Google-owned and Google-managed keys. As an additional layer of protection, you can choose to encrypt the Google-owned data encryption keys by using keys that you own and manage in Cloud Key Management Service (Cloud KMS). For more information, see About disk encryption.

Network security

To control network traffic between the resources in the architecture, you must configure appropriate Cloud Next Generation Firewall (NGFW) policies. Each policy contains rules that let you control traffic based on parameters like the protocol, IP address, and port. For example, you can configure a firewall rule to allow TCP traffic from the web server VMs to a specific port of the database VMs, and block all other traffic.
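As a rough illustration of the example rule above, the following Python sketch models how an allow rule for one port and a catch-all deny rule interact. It isn't the Cloud NGFW API. The `FirewallRule` class, the `decide` function, and the tier labels are hypothetical, and port 1521 is used only because it's the default Oracle listener port; your database might listen on a different port.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirewallRule:
    priority: int        # lower number is evaluated first
    action: str          # "allow" or "deny"
    source_tag: str      # hypothetical tier label; "*" matches any source
    protocol: str        # "tcp", "udp", or "*" for any protocol
    port: Optional[int]  # None matches any port


def decide(rules, source_tag, protocol, port):
    """Return the action of the first rule that matches the connection."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source_tag not in ("*", source_tag):
            continue
        if rule.protocol not in ("*", protocol):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "deny"  # no rule matched: this model treats ingress as denied


# Hypothetical ingress rules for the database VMs.
DB_INGRESS = [
    FirewallRule(1000, "allow", "web-tier", "tcp", 1521),
    FirewallRule(65000, "deny", "*", "*", None),
]

print(decide(DB_INGRESS, "web-tier", "tcp", 1521))  # matches the allow rule
print(decide(DB_INGRESS, "web-tier", "tcp", 22))    # falls through to deny
```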

More security considerations

When you build the architecture for your workload, consider the platform-level security best practices and recommendations that are provided in the Enterprise foundations blueprint.

Reliability

This section describes design factors to consider when you use this reference architecture to build and operate reliable infrastructure for your deployment in Google Cloud.

Robustness against VM failures

In the architecture that's shown in this document, if a Compute Engine VM in any of the tiers fails, the application can continue to process requests.

  • If a VM in the web tier or application tier crashes, the relevant MIG recreates the VM automatically. The load balancers forward requests to the available web server instances and application server instances.
  • If the VM that hosts the primary Oracle Database instance fails, the Oracle Data Guard FSFO observer initiates an automatic failover to the standby Oracle Database instance.
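The failover decision for the database tier can be pictured with the following Python sketch. It's a deliberately simplified model, not Oracle's implementation: it assumes that failover starts only when both the observer and the standby have lost contact with the primary for at least a threshold period, loosely mirroring the Data Guard broker's FastStartFailoverThreshold property. The function and parameter names are hypothetical.

```python
def should_fail_over(observer_sees_primary, standby_sees_primary,
                     seconds_unreachable, threshold_seconds=30):
    """Decide whether this simplified model would start a failover.

    The 30-second default is an assumption for illustration. Check the
    threshold that's configured in your Data Guard broker.
    """
    # If either party can still reach the primary, the primary is assumed
    # to be alive, and no failover is attempted.
    if observer_sees_primary or standby_sees_primary:
        return False
    return seconds_unreachable >= threshold_seconds


# Primary unreachable from both the observer and the standby for 45 seconds.
print(should_fail_over(False, False, 45))
# Only the observer lost contact, for example because of a network partition.
print(should_fail_over(False, True, 45))
```

Requiring agreement from both the observer and the standby is what makes it useful to run the observer in a third zone: a single-zone network problem can't trigger a failover on its own.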

VM autohealing

Sometimes the VMs that host your web tier and application tier might be running and available, but there might be issues with the application itself. The application might freeze, crash, or not have enough memory. In this scenario, the VMs won't respond to load balancer health checks, and the load balancer won't route traffic to the unresponsive VMs. To help ensure that applications respond as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see About repairing VMs for high availability.
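To show how an application-based health check avoids reacting to a single failed probe, here is a minimal Python sketch of threshold-based health tracking. It assumes a model in which a VM changes state only after a configured number of consecutive contradicting probe results, similar in spirit to the healthy and unhealthy thresholds of a health check. The `HealthTracker` class is hypothetical and doesn't call any Google Cloud API.

```python
class HealthTracker:
    """Tracks the health state of one VM from a stream of probe results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok):
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
            if self._streak >= limit:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


tracker = HealthTracker(unhealthy_threshold=3, healthy_threshold=2)
probes = [True, False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
print(states)
```

In this model, the two isolated failures early in the sequence don't change the state. Only the run of three consecutive failures marks the VM as unhealthy, which is the point at which a MIG with autohealing would repair it.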

Robustness against zone outages

If a zone outage occurs, the application remains available.

  • The web tier and application tier are available (and responsive) because the VMs are in regional MIGs. The regional MIGs ensure that new VMs are created automatically in the other zone to maintain the configured minimum number of VMs. The load balancers forward requests to the available web server VMs and application server VMs.
  • If an outage affects the zone that has the primary Oracle Database instance, then the Oracle Data Guard FSFO observer initiates an automatic failover to the standby Oracle Database instance. The FSFO observer runs on a VM in a zone that's different from the zones that have the primary and standby database instances.
  • To ensure high availability of data in Hyperdisk volumes during a single-zone outage, you can use Hyperdisk Balanced High Availability. When data is written to a volume, the data is replicated synchronously between two zones in the same region.

Robustness against region outages

If both of the zones in the architecture have an outage or if a region outage occurs, then the application is unavailable. To reduce the downtime caused by multi-zone or region outages, you can implement the following approach:

For business-critical applications that must continue to be available even when a region outage occurs, consider using the multi-regional deployment archetype. For the database tier, you can use Oracle Active Data Guard FSFO to automatically fail over to a standby Oracle Database instance in the failover region. This approach maps to Oracle's MAA Gold tier.

MIG autoscaling

When you run your application on VMs in a regional MIG, the application remains available during isolated zone outages. The autoscaling capability of stateless MIGs lets you maintain application availability and performance at predictable levels. Stateful MIGs can't be autoscaled.

To control the autoscaling behavior of your MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling. For more information, see Autoscaling groups of instances.
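The effect of a target utilization metric can be approximated with simple proportional arithmetic, as in the following Python sketch. This is an illustrative model, not the actual Compute Engine autoscaler algorithm, which also accounts for factors like initialization and stabilization periods. The function name and parameters are hypothetical, and utilization is expressed as whole percentages to keep the arithmetic exact.

```python
def recommended_size(current_size, observed_pct, target_pct, min_size, max_size):
    """Estimate the group size that brings average utilization to the target.

    Uses the proportional rule size * observed / target, rounded up, and then
    clamps the result to the configured minimum and maximum sizes.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    needed = -(-(current_size * observed_pct) // target_pct)  # ceiling division
    return max(min_size, min(max_size, needed))


# 4 VMs averaging 90% CPU with a 60% target: scale out.
print(recommended_size(4, 90, 60, min_size=2, max_size=12))
# 6 VMs averaging 20% CPU with a 60% target: scale in.
print(recommended_size(6, 20, 60, min_size=2, max_size=12))
# 10 VMs averaging 95% CPU: the estimate is capped at max_size.
print(recommended_size(10, 95, 60, min_size=2, max_size=12))
```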

VM placement

In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against single-zone outages. To improve this robustness further, you can create a spread placement policy and apply it to the MIG template. With a spread placement policy, when the MIG creates VMs, it places them within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. However, a trade-off with this approach is that the latency for inter-VM network traffic might increase. For more information, see Placement policies overview.

VM capacity planning

To make sure that capacity for Compute Engine VMs is available when required for MIG autoscaling, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or it can be shared across multiple projects. You incur charges for reserved resources even if the resources aren't provisioned or used. For more information about reservations, including billing considerations, see Reservations of Compute Engine zonal resources.

Block storage availability

The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. You create a pool of block storage capacity for a zone. You then create Hyperdisk volumes in the storage pool and attach the volumes to VMs in the zone. The storage pool attempts to add capacity automatically to ensure that the utilization rate doesn't exceed 80% of the pool's provisioned capacity. This approach ensures that block storage space is available when required. For more information, see How Hyperdisk Storage Pools work.
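The 80% utilization threshold can be made concrete with a short calculation. The following Python sketch only illustrates the arithmetic behind the threshold. The increments that the service actually uses when it adds capacity aren't modeled, and the function name and the example sizes are hypothetical.

```python
def capacity_to_add_gib(provisioned_gib, used_gib, max_utilization_pct=80):
    """Return the extra capacity needed to keep utilization at or below the limit."""
    # Smallest pool size for which used / size <= max_utilization_pct / 100,
    # computed with integer ceiling division.
    required_gib = -(-used_gib * 100 // max_utilization_pct)
    return max(0, required_gib - provisioned_gib)


# A 10,240 GiB pool with 9,000 GiB used is at about 88% utilization.
print(capacity_to_add_gib(10240, 9000))
# The same pool with 8,000 GiB used is at about 78%, so nothing is needed.
print(capacity_to_add_gib(10240, 8000))
```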

Stateful storage

A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them easily to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.

Backup and recovery

The architecture in this document uses Cloud Storage to store database backups. If you choose the dual-region or multi-region location type for the Cloud Storage bucket, the backups are replicated asynchronously across at least two geographic locations. If a region outage occurs, you can use the backups to restore the database in another region. With a dual-region bucket, you can achieve faster replication by enabling the turbo replication option. For more information, see Data availability and durability.

You can use Backup and DR Service to create, store, and manage backups of Compute Engine VMs. Backup and DR Service stores backup data in its original, application-readable format. When required, you can restore workloads to production by directly using data from long-term backup storage without time-consuming data-movement or preparation activities. For more information, see the Backup and DR Service documentation.

More reliability considerations

When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in Google Cloud Architecture Framework: Reliability.

Cost optimization

This section provides guidance to optimize the cost of setting up and operating a Google Cloud topology that you build by using this reference architecture.

VM machine types

To help you optimize the utilization of your VM resources, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads that have predictable resource requirements, you can customize the machine type to your needs and save money by using custom machine types.

VM provisioning model

If your application is fault tolerant, then Spot VMs can help to reduce the Compute Engine costs for your VMs in the web tier and application tier. Spot VMs cost significantly less than regular VMs. However, Compute Engine might preempt (stop or delete) Spot VMs to reclaim capacity.

Spot VMs are suitable for batch jobs that can tolerate preemption and that don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs with Spot VMs might not be able to scale out (that is, create VMs) automatically to reach the specified target size until the required capacity becomes available again. Don't use Spot VMs for the VMs that host the Oracle Database instances.

VM resource utilization

The autoscaling capability of stateless MIGs enables your application to gracefully handle increases in traffic to the web tier and application tier. Autoscaling also helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.

Oracle Database licensing

You're responsible for procuring licenses for the Oracle products that you deploy on Compute Engine, and you're responsible for complying with the terms and conditions of the Oracle licenses. When you calculate the Oracle Database licensing cost, consider the number of Oracle Processor licenses that are required based on the machine type that you choose for the Compute Engine VMs that host the Oracle Database instances. For more information, see Licensing Oracle Software in the Cloud Computing Environment.
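As a starting point for a licensing estimate, the following Python sketch converts a VM's vCPU count into a number of Processor licenses. It assumes the counting rule for authorized cloud environments as commonly summarized from Oracle's cloud licensing policy: two vCPUs count as one Processor license when multithreading is enabled, and one vCPU counts as one license when it's disabled. Treat this rule, the rounding behavior, and the function name as assumptions, and verify the current terms in Oracle's policy document and your own contract.

```python
def processor_licenses(vcpus, multithreading_enabled=True):
    """Estimate Oracle Processor licenses for one database VM.

    Assumption: 2 vCPUs per license with multithreading enabled (rounded up),
    1 vCPU per license otherwise. This is not an authoritative calculation.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    if multithreading_enabled:
        return -(-vcpus // 2)  # ceiling division
    return vcpus


# Hypothetical database VM with 48 vCPUs.
print(processor_licenses(48))
print(processor_licenses(48, multithreading_enabled=False))
```

Because the estimate scales directly with vCPU count, the machine type that you choose for the database VMs affects the licensing cost as well as the Compute Engine cost.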

Block storage resource utilization

The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. You can improve the overall utilization of block storage capacity and reduce cost by using Advanced capacity storage pools, which use thin provisioning and data reduction technologies to improve storage efficiency.

More cost considerations

When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Cost optimization.

Operational efficiency

This section describes the factors to consider when you use this reference architecture to design a Google Cloud topology that you can operate efficiently.

VM configuration updates

To update the configuration of the VMs in a MIG (like the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using an update method that you specify: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.

VM images

For your MIG instance templates, instead of using Google-provided public images, we recommend that you create and use custom OS images that include the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version. You must regularly update your custom images to include the security updates and patches that are provided by the OS vendor.
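The way an image family reference resolves to a concrete image can be modeled as follows. This Python sketch isn't the Compute Engine API. The `Image` class, the `resolve_family` function, and the image names are hypothetical, and the sketch assumes that the family resolves to the most recently created image that isn't deprecated.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Image:
    name: str
    family: str
    created: datetime
    deprecated: bool = False


def resolve_family(images, family):
    """Return the newest non-deprecated image in the given family."""
    candidates = [i for i in images if i.family == family and not i.deprecated]
    if not candidates:
        raise LookupError(f"no usable image in family {family!r}")
    return max(candidates, key=lambda i: i.created)


# Hypothetical custom images for the web tier.
images = [
    Image("web-v1", "web-tier", datetime(2024, 6, 1)),
    Image("web-v2", "web-tier", datetime(2024, 7, 1)),
    Image("web-v3", "web-tier", datetime(2024, 8, 1), deprecated=True),
]

print(resolve_family(images, "web-tier").name)
```

In this model, deprecating a faulty image (web-v3 in the example) makes the family resolve to the previous image again, without any change to the instance template that references the family.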

Deterministic instance templates

If the instance templates that you use for your MIGs include startup scripts (for example, to install third-party software), make sure that the scripts explicitly specify the software-installation parameters, like the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server (the apache2 package), then make sure that the script specifies the exact apache2 version that should be installed, such as version 2.4.53. For more information, see Deterministic instance templates.
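One way to catch nondeterministic startup scripts before they reach an instance template is a small lint check. The following Python sketch is a minimal example that assumes Debian-style `apt-get install` commands, where a pinned package is written as `name=version`. The function name and the version string in the sample script are hypothetical, and the check doesn't handle other package managers or multi-line commands.

```python
import re

INSTALL_RE = re.compile(r"apt-get\s+install\s+(.*)")


def unpinned_packages(script):
    """Return the packages that are installed without an explicit version."""
    unpinned = []
    for line in script.splitlines():
        line = line.split("#", 1)[0]  # ignore comments (and the shebang line)
        match = INSTALL_RE.search(line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.startswith("-"):
                continue  # command-line option such as -y
            if "=" not in token:
                unpinned.append(token)
    return unpinned


# Hypothetical startup script: apache2 is pinned, curl is not.
startup_script = """#!/bin/bash
apt-get update
apt-get install -y apache2=2.4.53-1 curl
"""

print(unpinned_packages(startup_script))
```

You could run a check like this in a build pipeline and reject templates for which the returned list isn't empty.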

Block storage management

The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. Hyperdisk Storage Pools help simplify storage management. Instead of allocating and managing capacity individually for numerous disks, you define a pool of capacity that can be shared across multiple workloads in a zone. You then create Hyperdisk volumes in the storage pool and attach the volumes to the VMs in the zone. The storage pool attempts to add capacity automatically to ensure that the utilization rate doesn't exceed 80% of the pool's provisioned capacity.

Application server to database connectivity

For connections from your application to Oracle Database, we recommend that you use the database VM's zonal internal DNS name rather than its IP address. Google Cloud automatically resolves the DNS name to the VM's primary internal IP address. An added advantage with this approach is that you don't need to reserve and assign static internal IP addresses for the database VMs.
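The following Python sketch shows how an application might build its database connection string from a zonal internal DNS name instead of an IP address. It assumes the `VM_NAME.ZONE.c.PROJECT_ID.internal` format for zonal DNS names and Oracle's Easy Connect syntax. The VM name, zone, project ID, and service name are hypothetical placeholders, and 1521 is only the default listener port.

```python
def zonal_dns_name(vm_name, zone, project_id):
    """Build the zonal internal DNS name of a Compute Engine VM."""
    return f"{vm_name}.{zone}.c.{project_id}.internal"


def easy_connect(host, service_name, port=1521):
    """Build an Oracle Easy Connect string of the form //host:port/service."""
    return f"//{host}:{port}/{service_name}"


# Hypothetical values for illustration only.
host = zonal_dns_name("oracle-db-primary", "us-central1-a", "example-project")
print(host)
print(easy_connect(host, "ORCLPDB1"))
```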

Oracle Database administration and support

When you run a self-managed Oracle Database instance on a Compute Engine VM, the operational concerns are similar to those of running Oracle Database on-premises. However, with a Compute Engine VM, you no longer need to manage the underlying compute, networking, and storage infrastructure.

More operational considerations

When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Architecture Framework: Operational excellence.

Performance optimization

This section describes the factors to consider when you use this reference architecture to design a topology in Google Cloud that meets the performance requirements of your workloads.

Compute performance

Compute Engine offers a wide range of predefined and customizable machine types that you can choose from depending on the performance requirements of your workloads.

  • For the VMs that host the web tier and application tier, choose an appropriate machine type based on your performance requirements for those tiers. To get a list of the available machine types that support Hyperdisk volumes and that meet your performance and other requirements, use the Machine series comparison table.
  • For the VMs that host the Oracle Database instances, we recommend that you use a machine type in the C4 machine series from the general-purpose machine family. C4 machine types provide consistently high performance for database workloads.

Network performance

For workloads that need low inter-VM network latency, you can create a compact placement policy and apply it to the MIG template that's used for the application tier. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. While a compact placement policy helps improve inter-VM network performance, a spread placement policy can help improve VM availability as described earlier. To achieve an optimal balance between network performance and availability, when you create a compact placement policy, you can specify how far apart the VMs must be placed. For more information, see Placement policies overview.

Compute Engine has a per-VM limit for egress network bandwidth. This limit depends on the VM's machine type and whether traffic is routed through the same VPC network as the source VM. For VMs with certain machine types, to improve network performance, you can get a higher maximum egress bandwidth by enabling Tier_1 networking. For example, by default, the c4-standard-48 machine type supports up to 34 Gbps for egress traffic that's routed within the same VPC network. With Tier_1 networking enabled, the maximum egress bandwidth for a VM with the c4-standard-48 machine type is 50 Gbps. For more information, see Configure per VM Tier_1 networking performance.

Hyperdisk storage performance

The architecture that's described in this document uses Hyperdisk volumes for the VMs in all the tiers. Hyperdisk lets you scale performance and capacity dynamically. You can adjust the provisioned IOPS, throughput, and the size of each volume to match your workload's storage performance and capacity needs. The performance of Hyperdisk volumes depends on the Hyperdisk type and the machine type of the VMs to which the volumes are attached. For more information about Hyperdisk performance limits and tuning, see the Hyperdisk documentation.

More performance considerations

When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Performance optimization.

Contributors

Author: Kumar Dhanagopal | Cross-Product Solution Developer
