Azure multi-region & multi-subnet – Basic Availability Groups – SQL 2016 Standard Edition deployment for SCOM 2016

Overview

In one of my recent System Center Operations Manager 2016 deployments, the databases were designed to run in Basic Availability Groups (BAGs are new in SQL Server 2016) hosted by a SQL cluster in Azure, with nodes in 2 separate regions and 2 separate subnets. I spent about 2-3 hours searching the internet for a solution documented by someone else to make my life a little easier, but I was not lucky enough to find one. That was when I had to use my own Azure environment and test the possibilities. In the following paragraphs, I try to give you as much information as possible about this deployment.

SQL requirements:

According to the Low-Level Design of the environment, the SQL deployment needs to be built to support the following requirements:

  • it needs to be a 2-node cluster
  • SQL Server 2016 Standard is essential
  • each node needs to be in a separate Azure region
  • each node needs to be in a separate IP subnet

To accomplish all the requirements I decided to build the environment according to the following diagram:

Basic Availability Group

Basic Availability Group (BAG) is a new feature in Microsoft SQL Server 2016 Standard Edition that makes a single database highly available. It is similar to the advanced Always On Availability Group solution, with a few limitations:

  • only 2 replicas in a BAG (a Primary and a Secondary)
  • only 1 database per BAG
  • no read access on secondary replica
  • no backups on secondary replica
  • BAG cannot be upgraded to Enterprise Availability Group
  • only SQL Server 2016 Standard edition supports BAG

Prerequisites

To deploy the solution described above, there are several prerequisites we need to meet, including firewall clearings and software installations.

Software

The Availability Group solution for SQL Server is built on top of Windows Server Failover Cluster (WSFC) technology, so to configure a BAG, the nodes first need to be installed and configured as a WSFC. (For more information, see the Configure WSFC chapter.)

Firewall clearings

Because more than one node is involved in this scenario, certain ports need to be open between the servers.

WSFC (between the 2 SQL Cluster nodes)

This is the list of ports required to be open between the nodes for the WSFC to work properly:

  • TCP 135
  • UDP 137
  • TCP/UDP 3343
  • TCP/UDP 49152-65535 (RPC High Ports)
  • ICMP
  • TCP 5022 – Basic Availability Group Endpoint port

FSW (between the 2 SQL nodes and the FSW)

This is the list of ports required to be open between the SQL nodes and the File Share Witness (FSW) server:

  • TCP 445

BAG (between the ILB and their related cluster node)

These ports are required to be open from the Azure Internal Load Balancer (ILB) to the SQL nodes and the Listener IPs:

  • TCP 1433 – This port depends on the setting of your SQL installation. The default instance is listening on 1433 by default
  • TCP 59999 – Probe Port configured on ILB

If you are using server-level Network Security Groups (NSGs), the source should be configured as AzureLoadBalancer.
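The firewall openings above can be scripted with Windows PowerShell on both SQL nodes. This is only a sketch: the rule names are arbitrary, and in production you would likely scope the rules with -RemoteAddress to the peer subnets instead of allowing any source.

```powershell
# Sketch: open the WSFC/BAG/FSW/ILB ports listed above (rule names are arbitrary).
# Run in an elevated PowerShell window on both SQL nodes.
New-NetFirewallRule -DisplayName 'WSFC RPC Endpoint Mapper' -Direction Inbound -Protocol TCP -LocalPort 135 -Action Allow
New-NetFirewallRule -DisplayName 'WSFC NetBIOS Name' -Direction Inbound -Protocol UDP -LocalPort 137 -Action Allow
New-NetFirewallRule -DisplayName 'WSFC Cluster TCP' -Direction Inbound -Protocol TCP -LocalPort 3343 -Action Allow
New-NetFirewallRule -DisplayName 'WSFC Cluster UDP' -Direction Inbound -Protocol UDP -LocalPort 3343 -Action Allow
New-NetFirewallRule -DisplayName 'RPC High Ports TCP' -Direction Inbound -Protocol TCP -LocalPort '49152-65535' -Action Allow
New-NetFirewallRule -DisplayName 'RPC High Ports UDP' -Direction Inbound -Protocol UDP -LocalPort '49152-65535' -Action Allow
New-NetFirewallRule -DisplayName 'ICMP Echo' -Direction Inbound -Protocol ICMPv4 -IcmpType 8 -Action Allow
New-NetFirewallRule -DisplayName 'BAG Endpoint' -Direction Inbound -Protocol TCP -LocalPort 5022 -Action Allow
New-NetFirewallRule -DisplayName 'SMB (FSW)' -Direction Inbound -Protocol TCP -LocalPort 445 -Action Allow
New-NetFirewallRule -DisplayName 'SQL Instance' -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow
New-NetFirewallRule -DisplayName 'ILB Probe' -Direction Inbound -Protocol TCP -LocalPort 59999 -Action Allow
```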

Dedicated IP Addresses

To build this environment, I needed to dedicate a few extra IP addresses other than the server addresses. It is a good idea to have them provisioned as soon as possible:

  • Cluster IP in EUN subnet
  • Cluster IP in EUW subnet
  • ILB-EUN IP (This is the same as the Listener IP for the BAGs in EUN)
  • ILB-EUW IP (This is the same as the Listener IP for the BAGs in EUW)

Installing SQL instances

When the requested Virtual Machines (VMs) are provisioned properly in Azure, we can start with the installation of SQL Server 2016 Standard Edition.

  • Insert the SQL Server Media and run setup.exe
  • Click Installation and New SQL Server stand-alone installation… to start the wizard


  • Select Enter the product key and enter the key into the field. In the version we used during the installation the key was already added. Click Next


  • Check I accept the license terms and click Next
  • The setup now runs a prerequisite check. If something is not configured correctly for this phase, a warning or error appears. During the installation the check passed without any issue
  • On the next screen, select Microsoft Update if you wish and click Next. During the installation it was not selected


  • Acknowledge the prerequisites and click Next


  • The wizard now installs the setup files


  • Firewall ports may need to be opened depending on the environment.


  • On the next page select the following features:

    • Database Engine Service
    • Full-Text and Semantic Extractions…
  • Then click Next


  • Select Named Instance and fill in the fields, then click Next


  • Set up the services with the correct Startup Type and Account information, then click the Collation tab.


  • Make sure the collation is SQL_Latin1_General_CP1_CI_AS. If not, click Customize


  • Select SQL Collation, used for backwards compatibility and select the correct one. Hit OK and Next on the Server Configuration page


  • On Database Engine Configuration screen add your current user and the SQL service account from AD, then go to the Data Directories tab


  • Modify the entries according to the requirements (as in the screenshot). On the TempDB tab, it is possible to configure the TempDB folders and sizes. We will configure TempDB together with the other databases after the SCOM installation


  • Check the summary screen and click Install to start the installation


  • Once the installation is finished click Close


  • Because SQL Server Management Studio is not included in the installation, we need to download it. If your server has internet access, this can be done using the Install SQL Server Management Tools link in the Installation Centre. It will open a browser window and the file can be downloaded. If the server does not have internet access, use this link (https://msdn.microsoft.com/en-us/library/mt238290.aspx) to download it from an internet-facing computer.


  • Always use the latest GA release and not the RC one


  • Save the file to the server and start the installation. On the first page click Install


  • When the installation is finished click Close


  • By default, every Named Instance uses dynamic port selection. We need to configure our instance to use 1433
  • Start SQL Server Configuration Manager on both nodes
  • Expand SQL Server Network Configuration and select Protocols for <InstanceName>
  • Right-click TCP/IP and select Properties


  • On the IP Addresses tab, scroll down to IPAll and set TCP Port to 1433. Click OK

  • For the configuration to take effect, we need to restart the services. Select SQL Server Services on the left side, right-click the SQL Server (<InstanceName>) entry and select Restart.
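If you prefer to script the static port change, the same setting lives in the registry. A sketch, assuming SQL Server 2016 (instance ID prefix MSSQL13) with a placeholder instance name:

```powershell
# Sketch: set a static TCP port of 1433 for a named instance via the registry.
# MSSQL13 = SQL Server 2016; replace <InstanceName> with your instance name.
$regPath = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL13.<InstanceName>\MSSQLServer\SuperSocketNetLib\Tcp\IPAll'
Set-ItemProperty -Path $regPath -Name TcpPort -Value '1433'
Set-ItemProperty -Path $regPath -Name TcpDynamicPorts -Value ''   # clear dynamic port selection
Restart-Service -Name 'MSSQL$<InstanceName>' -Force               # restart for the change to take effect
```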


Configure WSFC

This process starts with the installation of the Windows feature and finishes with a Windows Server Failover Cluster in place with an FSW quorum configuration. The steps marked with (*) need to be done on both cluster nodes.

  • Login to the WSFC node (*)
  • Open Server Manager and select Add roles and features. This will open a wizard (*)

  • Click Next until Features screen appears (*)
  • Check Failover Clustering on the list. When a window appears about extra features needed just click Add Features. Then click Next (*)

  • Check the Confirmation page and hit Install. This will start the installation of this feature (*)

  • Wait until the installation finishes and click Close (*)

  • Launch Failover Cluster Manager using Server Manager console

  • Validate the cluster using Validate Configuration wizard

  • Click Next on the first page and Browse for the servers of the cluster. When the servers are checked and added click Next

  • On Testing Option, select Run all tests. This is the recommended action. Hit Next twice to start the tests

  • Wait until all the tests finish and check the results. If the results are satisfactory, check Create the cluster now using the validated nodes… and click Finish. This will finish the validation wizard and start the Create Cluster wizard.

  • When the Create Cluster wizard starts click Next on the first page. Enter the desired cluster name into the field and click Next

  • Uncheck the Add all eligible storage to the cluster and click Next

  • This will now create the cluster with the given parameters. Wait until the process finishes and check the result. If everything looks OK, click Finish

  • When the process creates the cluster, it tries to use the nodes' IP addresses as cluster IP addresses. Obviously, that will not work. We need to modify the IP addresses listed under the Cluster Name resource. There are 2 IP addresses, because this cluster has nodes in 2 different subnets

  • Use the previously selected IP addresses and modify the resources. Right-click on the resource and select Properties

  • Select Static IP Address and fill up the Address field correctly with your selected value. When finished click OK

  • Do the same for the other resource. During installation I picked the address ending in .110 in both the EUN and EUW subnets
  • When the IP address resources are configured correctly, the Cluster Name resource can be brought Online. It will bring the correct IP address resource online as well. Obviously, only 1 IP address resource can be online at a time

  • The next step is to create and configure the FSW for the quorum. Login to the selected server and create a folder named FSW if it does not exist. Modify the permissions to give the 2 cluster nodes Modify access and the cluster computer object Full access

  • For the share permission, it is OK to use Everyone with Read/Write access. The restriction is on NTFS level.

  • Go to the Failover Cluster Manager. Right-click->More Action->Configure Cluster Quorum Settings…

  • Click Next on the first page of Configure Cluster Quorum Wizard then select the Select the quorum witness configuration option. Hit Next

  • On the next page, select Configure a file share witness option and click Next

  • On the next page Browse… for the correct file share and click Next

  • Click Next on the Confirmation page if everything is OK to configure the FSW
  • When the FSW is configured click Finish. The following should appear in the list of Cluster Core Resources
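For reference, the whole WSFC build above can also be done from an elevated PowerShell window. A sketch with placeholder node names, cluster name, IPs and share path:

```powershell
# Run on both nodes: install the Failover Clustering feature
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Run on one node: validate, create the cluster with one static IP per subnet,
# then configure the File Share Witness quorum
Test-Cluster -Node 'SQLNODE1','SQLNODE2'
New-Cluster -Name 'SCOMSQLCLU' -Node 'SQLNODE1','SQLNODE2' -NoStorage `
    -StaticAddress '10.0.1.110','10.0.2.110'   # the .110 addresses picked earlier
Set-ClusterQuorum -FileShareWitness '\\FSWSERVER\FSW'
```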

Configure BAGs

To configure the BAGs we need to create 2 temporary databases. With these 2 databases we can create the groups and later the new databases can replace the temporary ones.

  • Login to a server where SSMS is installed and has access to the primary replica server
  • Start SSMS
  • Connect to the SQL servers which will hold the primary replica of the BAGs
  • Create 2 temporary databases by Right-click->New Database… on the Databases folder

  • Name it as you see fit. It is only a temporary DB. We will delete it later. Hit OK to create the database.

  • Create the second database as well
  • To be able to configure AlwaysOn High Availability the nodes need to be prepared for that
  • Start SQL Server Configuration Manager. Select SQL Server Services
  • Right-click on the SQL Server (<InstanceName>) entry and select Properties

  • On the AlwaysOn High Availability tab check the Enable AlwaysOn Availability Groups and click OK

  • For the configuration to take effect, we need to restart the services. Right-click on the SQL Server (<InstanceName>) entry and select Restart.

  • Go back to the SSMS and create Full Backup of both temporary DBs. We can start by right-click on the DB and select Tasks and Back Up…

  • Make sure that the details are selected accordingly and note the folder of the backup location. Click OK to start the backup

  • When finished create a Full Backup of the other database and copy the files to the secondary SQL server
  • Open SSMS on the secondary replica server. Right-click on the Databases and select Restore Database…

  • Select Device as Source and click to add a device. In the opening window, select Add and select the first file just copied to the server and click OK on the file selection and OK again on the device selection.

  • Go to the Options page and select RESTORE WITH NORECOVERY as Recovery state. Then click OK to start the restore

  • If the restore was successful, you should see the database in a state (Restoring…). This state is required on the secondary replica when we add it to an Availability Group.

  • Repeat the steps with the second database
  • Go back to the SSMS and expand AlwaysOn High Availability. Right-click Availability Groups and select New Availability Group Wizard…

  • On the first page of the wizard click Next. Fill the Name of availability group and click Next again

  • Select the desired database from the list and click Next

  • Select Add Replica… and connect to the secondary replica server

  • Add and configure the replica as you can see on the screenshot and go to the Endpoints tab

  • Verify that the port number is 5022

  • The Backup Preferences are greyed out, because of the limitation of Basic Availability Groups. You can back up only the Primary replica. Do not create a listener just yet. Click Next
  • On the Data Synchronization settings select Join only, as we already backed up the database and restored it on the secondary replica with the correct settings. Click Next again

  • Check the Validation and click Next. Note that there is a warning at the listener configuration, because we have not configured it yet.

  • On the Summary page click Finish to start the creation of the Availability Group

  • When the BAG is created, a Result should appear. If no error occurred, click Close

  • Repeat the steps to create the second BAG, using the name you selected for the DW and the other temporary DB
  • When both BAGs are created, the Object Explorer in SSMS should look like this:
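The same BAG can also be created with T-SQL instead of the wizard. A sketch run through Invoke-Sqlcmd, assuming placeholder instance and group names, an existing database mirroring endpoint on port 5022 (the wizard normally creates it), and a temporary database already restored WITH NORECOVERY on the secondary:

```powershell
Import-Module SQLPS -DisableNameChecking

# Enable Always On on both instances (this restarts the SQL service)
Enable-SqlAlwaysOn -ServerInstance 'SQLNODE1\SCOM' -Force
Enable-SqlAlwaysOn -ServerInstance 'SQLNODE2\SCOM' -Force

# On the primary: create the BAG - WITH (BASIC) is what makes it a Basic Availability Group
Invoke-Sqlcmd -ServerInstance 'SQLNODE1\SCOM' -Query @"
CREATE AVAILABILITY GROUP [BAG-OpsMgr]
WITH (BASIC)
FOR DATABASE [TempOpsMgr]
REPLICA ON
    N'SQLNODE1\SCOM' WITH (ENDPOINT_URL = N'TCP://sqlnode1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL),
    N'SQLNODE2\SCOM' WITH (ENDPOINT_URL = N'TCP://sqlnode2.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL);
"@

# On the secondary: join the group and the restored database (the "Join only" option)
Invoke-Sqlcmd -ServerInstance 'SQLNODE2\SCOM' -Query @"
ALTER AVAILABILITY GROUP [BAG-OpsMgr] JOIN;
ALTER DATABASE [TempOpsMgr] SET HADR AVAILABILITY GROUP = [BAG-OpsMgr];
"@
```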

Create and configure Internal Load Balancers (ILBs)

To create and configure the ILBs, we need to do the following steps. Please do these tasks in the same subscription and Azure Resource Group as the SQL servers.

Create Load Balancers

  • On the Main screen Click Add
  • Use the search field to look for Load Balancer
  • Select the one called Load Balancer and the provider is Microsoft
  • Use the following settings to create 2 Load Balancers, 1 in each subnet where the SQL servers are in

    • ILB Name: A text name representing the load balancer. The names used during the initial configuration were LB-EUN-SCOM-SQL / LB-EUW-SCOM-SQL
    • Type: Internal
    • Virtual network: The virtual network the SQL server is in
    • Subnet: Subnet that the SQL server is in
    • IP Address assignment: Static
    • Private IP address: EUN IP address ending with .111 / EUW IP address ending with .111
    • Subscription: Same as the servers are in
    • Resource Group: Same as the specified SQL server is in
    • Location: Same as the specified SQL server is in

Configure Backend Pool

  • Click the load balancer you just created
  • On Settings, select Backend pools
  • On Backend pool, click Add to create a new pool
  • On Add backend pool, click +Add a virtual machine
  • Under Choose virtual machines select “None” as availability set and select the available VM
  • Click OK to finish adding the backend pool

Configure Health Probe

  • On the load balancer Settings blade, select Health probes
  • On the Health Probes blade, click Add
  • Configure the probe with the following details:

    • Name: A text name representing the probe.
    • Protocol: TCP
    • Port: 59999
    • Interval: 5
    • Unhealthy threshold: 2

Configure Load Balancing Rules (5 rules on both ILB)

  • On the load balancer Settings blade, select Load balancing rules
  • On the LB Rules blade, click Add
  • Add load balancing rules with the following details:

    • Name: A text name representing the load balancing rule
    • Protocol: TCP
    • Port: 1433 / 135 / 139 / 5723 / 445
    • Probe: Select the probe you created before
    • Session persistence: None
    • Idle timeout (minutes): 4
    • Floating IP (direct server return): Enabled
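The ILB, backend pool, probe and rules above can also be created with the AzureRM PowerShell module. A sketch for the EUN side, with placeholder resource names, IPs and location:

```powershell
# Look up the subnet the SQL node lives in (names are placeholders)
$vnet   = Get-AzureRmVirtualNetwork -Name 'VNET-EUN' -ResourceGroupName 'RG-SCOM'
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name 'Subnet-SQL' -VirtualNetwork $vnet

# Frontend = the static .111 address that doubles as the listener IP
$fe    = New-AzureRmLoadBalancerFrontendIpConfig -Name 'FE-SQL' -Subnet $subnet -PrivateIpAddress '10.0.1.111'
$pool  = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name 'BE-SQL'
$probe = New-AzureRmLoadBalancerProbeConfig -Name 'SQLProbe' -Protocol Tcp -Port 59999 `
             -IntervalInSeconds 5 -ProbeCount 2

# One rule per port, all with Floating IP (direct server return) enabled
$rules = foreach ($port in 1433, 135, 139, 5723, 445) {
    New-AzureRmLoadBalancerRuleConfig -Name "Rule-$port" -FrontendIpConfiguration $fe `
        -BackendAddressPool $pool -Probe $probe -Protocol Tcp -FrontendPort $port `
        -BackendPort $port -IdleTimeoutInMinutes 4 -EnableFloatingIP
}

New-AzureRmLoadBalancer -ResourceGroupName 'RG-SCOM' -Name 'LB-EUN-SCOM-SQL' -Location 'northeurope' `
    -FrontendIpConfiguration $fe -BackendAddressPool $pool -Probe $probe -LoadBalancingRule $rules
```

Repeat the same steps with the EUW subnet, IP and location for the second load balancer.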

Configure Listener for the Primary BAG

When the ILBs are created, the time is right to configure the listener for one of the BAGs. In these steps, we will manually create the listener in Failover Cluster Manager and SSMS. We only need one listener, because we make sure that both BAGs run on the same node at all times. See the SQL Agent Job for Failover section later.

  • Login to the primary cluster node with a user that has sufficient rights to configure cluster resources
  • Start Failover Cluster Manager and connect to the cluster if necessary
  • Expand the cluster name and select the Roles folder
  • In the Roles pane, right-click the Availability Group name and select Add Resource->Client Access Point

  • In the Name field, enter the name of the Availability Group Listener, click Next twice and Finish. Do not bring the resource online just yet

  • Click to the Resources tab and then expand the Client Access Point we just created. There should be 2 IP address resources, 1 for each cluster network

  • Modify both IP Address resources to use the static IP addresses we selected for them (x.x.x.111). Right-click and select Properties

  • Select Static IP Address and fill up the Address field with the IP Address related to that network

  • After changing the IP addresses, take a note of the names of the IP Address resources. Those names can also be changed as we see fit. In this configuration, we changed them to IP Address EUN / IP Address EUW to represent the location of the subnet. The name will be used to assign a value to the $IPResourceName variable in the PowerShell script we will run on both nodes
  • Go to the Networks folder. We can see two cluster networks, because the cluster has nodes in two Azure VNETs. Please take a note of both network names. Later we will use them to assign values to the $ClusterNetworkName variable in a PowerShell script

  • Now, login to both cluster nodes. Modify the following script according to the data we collected for the variables and execute it on the corresponding node in an elevated PowerShell window. This script configures the IP Address resource with the IP address of the ILB and sets other parameters like the probe port. When the IP Address resource is online, it can respond to the polling on the probe port from the ILBs created in the previous section.

# Define variables

$ClusterNetworkName = "<Name of the ClusterNetwork of this node>"
# the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)

$IPResourceName = "<Name of the IP resource>"
# the IP Address resource name

$ILBIP = "<IP Address>"
# the IP Address of the Internal Load Balancer (ILB)

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="<ProbePort>";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"EnableDhcp"=0}

  • Now, navigate to the Failover Cluster Manager. Expand Roles and the select the Availability Group we just created for the listener. On the Resources tab, right-click on the Listener’s Name and click Properties

  • Click on the Dependencies tab. If there are multiple resources listed, verify that the IP addresses have OR, not AND, dependencies. Click OK

  • Bring the listener Online. Once it is online, from Resource tab, right-click the availability group and click Properties

  • Create a dependency on the listener name resource (not the IP address resource names). Click OK

  • Launch the SSMS and connect to the primary replica
  • Navigate to AlwaysOn High Availability->Availability Groups->Availability Group Listeners
  • The listener we created in the Failover Cluster Manager should now be available. Right-click on the listener name and click Properties

  • In the Port field, specify the port number for the availability group listener, the same port the instance is listening on (default is 1433), then click OK

  • Because SCOM does not support MultiSubnetFailover in the connection string, we need to make sure that in case of a failover the interruption in the connection is minimal. To achieve this, we need to set the RegisterAllProvidersIP and HostRecordTTL parameters for the listener's Client Access Point cluster resource. To do this, run the following PowerShell commands in an elevated PS window on one of the cluster nodes.

$CAPName = '<ListenerNetworkName>'

# RegisterAllProvidersIP = 0 registers only the active listener IP in DNS,
# which clients without MultiSubnetFailover support need; the low HostRecordTTL
# lets them re-resolve the name quickly after a failover
Get-ClusterResource $CAPName | Set-ClusterParameter -Multiple @{"HostRecordTTL"=300;"RegisterAllProvidersIP"=0}

  • When everything is done, the only thing we need to do is to test the availability group connectivity using the listener name.
  • Login to the previously mentioned server and start SSMS
  • Try to connect to the instance using the Listener Name only.

  • When everything is configured correctly, the SSMS can connect to the DB Engine without issues
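The same check can be scripted; with a placeholder listener name, this should return the name of the current primary node:

```powershell
# Connect through the listener; the result shows which replica is currently primary
Invoke-Sqlcmd -ServerInstance '<ListenerName>' -Query 'SELECT @@SERVERNAME AS CurrentPrimary'
```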

SQL Agent Job for Failover

Because we use only one AG listener, we need to make sure that both BAGs use the same node as Primary replica. To accomplish this, we create a SQL Agent Job on both nodes that checks whether the priority database is running locally as primary. If it is, and the secondary BAG is not, the script initiates a failover for the secondary BAG. Here is the T-SQL script we use for it:

DECLARE @king_bag UNIQUEIDENTIFIER;
-- set this to GUID of king bag
SELECT @king_bag = ag_id
FROM sys.dm_hadr_name_id_map
WHERE ag_name = '<Name_of_the_king_ag>';

DECLARE @loyal_subject UNIQUEIDENTIFIER;
-- other bag
SELECT @loyal_subject = ag_id
FROM sys.dm_hadr_name_id_map
WHERE ag_name = '<Name_of_the_loyal_subject_ag>';

USE master;

IF ( SELECT role_desc
     FROM sys.dm_hadr_availability_replica_states
     WHERE group_id = @king_bag
       AND is_local = 1 ) = 'PRIMARY'
BEGIN
    IF ( SELECT role_desc
         FROM sys.dm_hadr_availability_replica_states
         WHERE group_id = @loyal_subject
           AND is_local = 1 ) <> 'PRIMARY'
    BEGIN
        ALTER AVAILABILITY GROUP [<Name_of_the_loyal_subject_ag>] FAILOVER;
    END;
END;

We need to create a SQL Agent Job which runs the T-SQL script above every 5 minutes. In the first step, we need to decide which database is the priority one and which is the secondary:

  • Primary: OperationsManager database
  • Secondary: OperationsManagerDW database

Follow the steps to create the job:

  • Logon to a node with SQL Server Management Studio (SSMS) installed on it and access to the Primary and Secondary replicas of the desired Cluster
  • In SSMS connect to the Primary replica of the Priority BAG and expand SQL Server Agent->Jobs folder

  • Right-click on Jobs and select New Job… This will start the New Job wizard

  • The first is the General page. Fill up the Name as you see fit, select the Category and enter Description. When done, go to the Steps page

  • Click New… to create a new step. This will open a New Job Step wizard.

  • On the General page of this wizard enter the Step name, select the Type, leave the database as master and enter the script into the Command field. When done go to the Advanced page

  • Because this is the only step in this job, we select Quit the job reporting success as On success action. Retry attempt is 3 and Retry interval is 1 minute. Select the Output file for the report by using the button. When everything is set click OK

  • This will take us back to the New Job wizard where we now select the Schedules page and click New… on it to start the New Job Schedule configuration window

  • Enter the desired data to the fields (like screenshot) and click OK to add the schedule

  • Click OK on the New Job wizard to create the job. If everything is set correctly, we will see our new job appear on the list.
  • Do the same steps on the Secondary replica as well
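The same job can also be created with the msdb stored procedures instead of the wizard, which helps keep both nodes identical. A sketch with placeholder names; the failover T-SQL from above goes into @command:

```powershell
# Run against both replicas so each node carries the same job
Invoke-Sqlcmd -ServerInstance 'SQLNODE1\SCOM' -Query @"
USE msdb;
EXEC sp_add_job @job_name = N'BAG Failover Check';
EXEC sp_add_jobstep @job_name = N'BAG Failover Check',
    @step_name = N'Check and fail over', @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'<paste the failover T-SQL script here>',
    @retry_attempts = 3, @retry_interval = 1;
EXEC sp_add_jobschedule @job_name = N'BAG Failover Check',
    @name = N'Every 5 minutes',
    @freq_type = 4, @freq_interval = 1,                -- daily recurrence
    @freq_subday_type = 4, @freq_subday_interval = 5;  -- every 5 minutes
EXEC sp_add_jobserver @job_name = N'BAG Failover Check';
"@
```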

Change the databases in the BAGs

  • When the installation of the Management Group and its databases is complete, stop the following services on all Management Servers:

    • System Center Data Access Service (OMSDK)
    • System Center Management Configuration (cshost)
    • Microsoft Monitoring Agent (HealthService)
  • Login to the SQL server which is the Primary replica for the BAGs, start SSMS and connect to the DB engine running on the primary replica. Use the installation user
  • Modify the Recovery Model to FULL for the following databases. It is a requirement for Availability Groups:

    • OperationsManager
    • OperationsManagerDW
  • To do this Expand the Databases folder and right-click on the database. Click Properties

  • Select Options page and set the Recovery Model to FULL then click OK

  • The next step is to create a FULL backup of the databases and restore them on the secondary replica server. Follow the Configure BAGs steps 12-18. Obviously, we do this for the production databases and not for the temporary ones
  • After the databases are available on both nodes with the desired status, we can start replacing the temporary databases with the ones we just prepared. Go to the SSMS on the Primary replica server
  • Expand AlwaysOn High Availability->Availability Groups-> <Your_availiability_group> ->Availability Databases. Right-click on the temporary database and select Remove Database from Availability Group…

  • Click OK in the confirmation window. This will remove the database from this Availability Group. Then right-click on the Availability Databases folder and select Add Database…

  • Click Next on the first page. Then select the database we are about to add to the AG and click Next

  • Select Join only and click Next

  • Click on Connect… and connect to the secondary replica database

  • When it is connected, click Next
  • Click Next on the Validation page and Finish on the Summary page
  • When the process is finished click Close

  • Repeat steps for the other database
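Scripted, the swap for one BAG looks roughly like this (placeholder server, group and path names; repeat for the DW database):

```powershell
# On the primary: switch the recovery model, back up, then swap the databases in the BAG
Invoke-Sqlcmd -ServerInstance 'SQLNODE1\SCOM' -Query @"
ALTER DATABASE [OperationsManager] SET RECOVERY FULL;
BACKUP DATABASE [OperationsManager] TO DISK = N'\\FSWSERVER\Backup\OperationsManager.bak';
ALTER AVAILABILITY GROUP [BAG-OpsMgr] REMOVE DATABASE [TempOpsMgr];
ALTER AVAILABILITY GROUP [BAG-OpsMgr] ADD DATABASE [OperationsManager];
"@

# On the secondary, after restoring the same backup WITH NORECOVERY:
Invoke-Sqlcmd -ServerInstance 'SQLNODE2\SCOM' -Query @"
ALTER DATABASE [OperationsManager] SET HADR AVAILABILITY GROUP = [BAG-OpsMgr];
"@
```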

Summary

After following the steps described in this article, you should have a complete multi-subnet, multi-datacentre SQL backend solution for SCOM 2016, where the databases run in highly available mode using SQL Server 2016 Standard Edition Basic Availability Groups.
