Basics of SQL Server Clustering

If your mission-critical SQL Server suffers a memory board failure, how long will the outage last? How much will it cost your business in lost productivity and data availability? Being a SQL Server DBA can be demanding and stressful, because the success of your application is often a function of your database uptime. As DBAs, we have some control over the uptime of our SQL Servers, but many things are outside our control; there is not much a DBA can do if a motherboard fails. As you may already be aware, one way to boost your SQL Server’s uptime is to cluster your SQL Servers. Should one SQL Server in the cluster fail, another clustered server automatically takes over, keeping downtime to minutes instead of hours or more.

Clustering is best described as a technology that allows one physical server to automatically take over the tasks and responsibilities of another physical server that has failed. Given that all computer hardware and software will eventually fail, the goal is to ensure that users running mission-critical applications experience little or no downtime when such a failure occurs. Downtime can be very expensive, and our goal as DBAs is to reduce it as much as possible.

More specifically, clustering refers to a group of two or more servers, also called nodes, that work together and present themselves to the network as a single virtual server. In other words, when a client connects to clustered SQL Servers, it sees only a single SQL Server. When one of the nodes fails, its responsibilities are taken over by another server in the cluster, and the end user notices little, if any, difference before, during, and after the failover.

One very important aspect of clustering that often gets overlooked is that it is not a complete backup system for your databases. It is only one part of a multi-part strategy required to ensure minimum downtime and 100% recoverability.

The main benefit that clustering provides is the ability to recover from failed server hardware (excluding the shared disk) and from failed software, such as failed services or a server lockup. It is not designed to protect data, to protect against failure of the shared disk array, to prevent hack attacks, to protect against network failure, or to protect SQL Server from other potential disasters, such as power outages.

 

Clustering is just one part of the overall strategy needed to reduce SQL Server downtime. You will also need a shared disk array that offers redundancy, and you will need to make tape backups. So don’t think that clustering is all you need to create a highly available SQL Server system; it is just one part of it.

 

Types of SQL Server Clustering

Once you decide to cluster SQL Server, you have to choose the cluster layout. This choice is extremely important when architecting the clustering environment, and it should be based on your application and business needs. Let’s look at the configuration types.

 

Active / Passive

An Active/Passive, or Single Instance, cluster refers to a scenario where only one instance of SQL Server is running on one of the physical nodes in the cluster, and the other physical node does nothing other than wait to take over should the primary node fail or a manual failover be performed for maintenance. From a performance perspective, this is the better solution. On the other hand, this option makes less productive use of your physical hardware, which makes it more expensive.

If an active node fails and there is a passive node available, applications and services running on the failed node can be transferred to the passive node. Since the passive node has no current workload, the server should be able to assume the workload of the failed server without any problems (assuming the hardware of the nodes is the same).

 

2-Node Clustering Active / Passive Scenario

In this case, let's look at a two-node example, Node X and Node Y. Node X is configured as the Active Node: it is the primary owner of the SQL Server instance, and that instance runs on it. As you can see in the case below, Node Y is in passive, or standby, mode, doing nothing. The active node communicates and works with the shared disks.

 

2-Node Clustering Active / Passive Failover Scenario

When a failover occurs on Node X, SQL Server Instance A is transferred, with all its running processes, connections, and responsibilities, to Passive Node Y, and Node Y becomes the Active Node. As you can see, even after the failover, the active node continues to communicate and work with the shared disks as usual; nothing changes there.
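
The failover sequence described above can be sketched in a few lines of Python. This is only an illustrative model, not how the Windows cluster service is implemented: the `Cluster` class, its fields, and the `fail_node` method are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a failover cluster (hypothetical, for illustration only)."""
    nodes: list                                   # node names, e.g. ["X", "Y"]
    owner: dict = field(default_factory=dict)     # instance name -> node running it
    failed: set = field(default_factory=set)      # nodes that are currently down

    def fail_node(self, node):
        """Mark a node as failed and move its instances to a surviving node."""
        self.failed.add(node)
        for instance, current in self.owner.items():
            if current != node:
                continue
            survivors = [n for n in self.nodes if n not in self.failed]
            if not survivors:
                raise RuntimeError("no surviving node: the whole cluster is down")
            # Prefer an idle (passive) node; otherwise fall back to any survivor.
            idle = [n for n in survivors if n not in self.owner.values()]
            self.owner[instance] = (idle or survivors)[0]


# Two-node Active/Passive: Instance A runs on Node X, Node Y stands by.
cluster = Cluster(nodes=["X", "Y"], owner={"A": "X"})
cluster.fail_node("X")
print(cluster.owner)   # Instance A is now owned by Node Y
```

The model deliberately leaves out the shared disks: in the scenario above they stay where they are, and only ownership of the instance moves.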

 

4-Node Clustering Active / Passive Scenario

In this case, let's look at an example of four nodes: Node X, Node Y, Node XX, and Node YY. Node X is configured as an Active Node, the primary owner of SQL Server Instance A, and Node XX is also an Active Node, the primary owner of SQL Server Instance AA. As you can see in the case below, Nodes Y and YY are in passive, or standby, mode, doing nothing.

 

4-Node Clustering Active / Passive Failover Scenario

When failover occurs on Node X, SQL Server Instance A is transferred, with all its running processes, connections, and responsibilities, to Passive Node Y, and Node Y becomes an Active Node. Likewise, when failover occurs on Node XX, SQL Server Instance AA is transferred, with all its running processes, connections, and responsibilities, to Passive Node YY, and Node YY becomes an Active Node.

 

Active / Active

An Active/Active SQL Server cluster means that two separate SQL Server instances are running, one on each node of a two-way cluster. Each SQL Server acts independently, and users see two different SQL Server instances. If one of the SQL Servers in the cluster should fail, the failed instance of SQL Server will fail over to the remaining server. Both instances of SQL Server will then be running on one physical server instead of two. As you can imagine, when two instances have to run on one physical server, performance can be affected, especially if the servers have not been sized appropriately. Remember that the two separate SQL Server instances in this configuration are entirely isolated entities by default.

If all servers in a cluster are active and a node fails, the applications and services running on the failed node can be transferred to another active node. Since that server is already active, it will have to handle the processing load of both systems. The server must be sized to handle multiple workloads, or it may fail as well.

 

2-Node Clustering Active / Active Scenario

In this case, let's look at an example of two nodes, Node X and Node Y. Both are configured as Active Nodes: Node X is the primary owner of SQL Server Instance A, and Node Y is the primary owner of SQL Server Instance B. As you can see below, Nodes X and Y are both active, and each is running an instance of SQL Server.

 

2-Node Clustering Active / Active Failover Scenario

When failover occurs on Node X, SQL Server Instance A is transferred, with all its running processes, connections, and responsibilities, to Active Node Y, and Node Y now has to share all its memory, CPU, and network resources between Instances A and B.
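
To make the resource-sharing point concrete, here is a rough Python sketch. The utilization figures (45% and 40%) and the function name `load_after_failover` are made up for illustration, and the model assumes both nodes have identical hardware; real sizing should be based on measured workloads.

```python
def load_after_failover(loads, failed_node, target_node):
    """Return per-node load after the failed node's instance moves to target_node.

    `loads` maps node name -> fraction of that node's capacity in use (0.0 to 1.0),
    assuming all nodes have identical hardware.
    """
    result = dict(loads)                       # leave the caller's dict untouched
    result[target_node] += result.pop(failed_node)
    return result


# Hypothetical average utilization before the failure.
before = {"X": 0.45, "Y": 0.40}   # Instance A on Node X, Instance B on Node Y
after = load_after_failover(before, failed_node="X", target_node="Y")

print(after)                                   # Node Y now carries both instances
print("overloaded:", after["Y"] > 1.0)
```

With these numbers Node Y ends up at roughly 85% of capacity, which still fits; two nodes that each ran at 60% would not.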

 

4-Node Clustering Active / Active Scenario

In the 4-node configuration illustrated below, nodes X, Y, XX, and YY are all configured as active, and failover can occur between nodes X and Y or between nodes XX and YY. This could mean configuring the servers so that they use about 25% of CPU and memory resources under average workload. In this example, node X could fail over to Y, or node XX could fail over to YY.

 

4-Node Clustering Active / Active Failover Scenario

When failover occurs on Node Y, SQL Server Instance B is transferred, with all its running processes, connections, and responsibilities, to Active Node X, and Node X now runs two instances, A and B, sharing all its resources. Likewise, when failover occurs on Node YY, SQL Server Instance BB is transferred, with all its running processes, connections, and responsibilities, to Active Node XX, and Node XX now runs two instances, AA and BB, sharing all its resources.

 

In a multi-node configuration where there are more active nodes than passive nodes, the servers can be configured so that under average workload they use a proportional percentage of CPU and memory resources.
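
One way to read "a proportional percentage" is as simple headroom arithmetic: if a surviving node may have to absorb the workloads of k failed peers, each node should stay at or below 1/(k+1) of its capacity under normal load. The sketch below assumes identical hardware and equal workloads on every node; the function name `max_normal_utilization` is hypothetical.

```python
from fractions import Fraction


def max_normal_utilization(absorbed_workloads):
    """Largest share of capacity a node can use and still absorb
    `absorbed_workloads` equal-sized workloads from failed peers."""
    if absorbed_workloads < 0:
        raise ValueError("absorbed_workloads must be non-negative")
    return Fraction(1, absorbed_workloads + 1)


for k in (0, 1, 3):
    share = max_normal_utilization(k)
    print(f"absorbing {k} extra workload(s): stay under {float(share):.0%}")
```

Under these assumptions, k = 1 (a single failover partner) gives 50%, and k = 3 (one node in a four-node cluster ending up with every workload) gives 25%.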

An Active/Active configuration can be set up as a multiple-instance cluster, which can support up to 16 SQL Server instances. Windows NT Server 4.0 Enterprise Edition, Windows 2000 Advanced Server, and Windows Server 2003 Enterprise Edition all support two-node clustering. Windows 2000 Datacenter Server supports up to four-node clustering, and Windows 2003 supports up to eight-node clustering; however, you are limited to four nodes if SQL Server 2000 clustering is to be used.

SQL Server in a clustered environment also behaves differently from a stand-alone named instance with regard to IP ports. During installation, a dynamic port, which may be something other than 1433, is configured and reserved for the instance. In a failover cluster, multiple instances can be configured to share the same port, such as 1433, because each instance listens only on the IP address assigned to its SQL Server virtual server, so ports and instances are not limited to a 1:1 ratio. However, for security and potentially increased availability, you may want to assign each virtual server its own unique port of your choice, or leave the port as it was configured during installation.
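
The port-sharing rule can be illustrated with a small check: two instances may use the same port as long as their virtual server IP addresses differ, because what has to be unique is the (IP address, port) pair. The addresses and virtual server names below are placeholders, and `find_endpoint_conflicts` is a hypothetical helper, not part of any SQL Server tooling.

```python
from collections import defaultdict


def find_endpoint_conflicts(virtual_servers):
    """Return {(ip, port): [names]} for endpoints claimed by more than one virtual server."""
    claimed = defaultdict(list)
    for name, (ip, port) in virtual_servers.items():
        claimed[(ip, port)].append(name)
    return {endpoint: names for endpoint, names in claimed.items() if len(names) > 1}


# Placeholder virtual servers: same port, different IP addresses.
shared_port = {
    "VSQL1": ("10.0.0.11", 1433),
    "VSQL2": ("10.0.0.12", 1433),
}
print(find_endpoint_conflicts(shared_port))   # no conflicts: the IP addresses differ

# The same IP address and the same port would collide.
clash = {
    "VSQL1": ("10.0.0.11", 1433),
    "VSQL2": ("10.0.0.11", 1433),
}
print(find_endpoint_conflicts(clash))
```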

 

Scaling Clustering Resources