Are clustered servers too difficult?
I was talking the other day with a friend who works for a medical device manufacturer about his company's fear of clustered servers. His peers and managers are of the opinion that clustered servers are hard to maintain, require long and frequent maintenance periods, and are difficult to support.
Both he and I have been using clusters in enterprise environments for many years, and this perception of difficulty is far from the truth. Granted, clustered servers are more difficult to set up than a standalone server and are more expensive, but what does downtime cost per hour or per minute for your business application (or, as is typical with clustered databases, many applications)? If you had a hardware failure or an extended maintenance window because you are not using clustered servers, shame on you. But I digress.
Once you set up a cluster, there is no "cluster maintenance"; it just works. There are standard OS updates, but at least you can fail over to the other server and have an outage of seconds compared to however long the same work takes on a standalone server. Let's say your application has a memory issue. In a non-clustered environment you may need to restart the entire server, but in a clustered environment you can fail the application over to another node and cut downtime by a factor of 10 or more.
Managing a clustered server takes a little bit of knowledge, but any skilled systems engineer or DBA will not have a problem. I think fear of the unknown and the misconception that clustered servers are really complicated give them a bad rep.
Are clustered servers worth the money? Talk with your preferred server vendor and get a price comparison between a standalone server and a clustered pair. You will find the storage component is the biggest cost factor. Clustered servers are often paired with SANs, but this isn't a requirement. Weigh the added cost of the clustered servers against the cost of a server being down during business hours. Say it takes 2 hours for a support person to find what the issue is with the server, and then they need to find replacement parts. If you are lucky enough to have a support agreement with fast turnaround, you may be down for 4 hours. If you don't, include the time for a person to get to the server, diagnose the problem, find replacement parts, and hope they actually solve it. Now it has turned into an all-night ordeal. Granted, that is a worst-case scenario, but this is life and it happens to the best of us. Factor in lost business, the damage to your reputation with affected clients, and the effect on your employees. Now consider that if you had a clustered server, it would have failed over automatically and the downtime would have been measured in seconds.
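To make that comparison concrete, here is a rough back-of-the-envelope sketch. Every number in it is a placeholder I made up for illustration (the cost of downtime per hour, the extra price of the clustered pair, the 30-second failover); only the 4-hour outage comes from the scenario above. Plug in your own vendor quote and your own estimate of what an hour of downtime costs your business.

```python
def downtime_cost(outage_seconds: float, cost_per_hour: float) -> float:
    """Cost of a single outage, given what an hour of downtime costs."""
    return outage_seconds / 3600.0 * cost_per_hour


def outages_to_break_even(extra_cluster_cost: float,
                          standalone_outage_s: float,
                          cluster_outage_s: float,
                          cost_per_hour: float) -> float:
    """How many outages it takes for the cluster to pay for itself."""
    saved_per_outage = (downtime_cost(standalone_outage_s, cost_per_hour)
                        - downtime_cost(cluster_outage_s, cost_per_hour))
    if saved_per_outage <= 0:
        return float("inf")  # the cluster never pays off on downtime alone
    return extra_cluster_cost / saved_per_outage


# Hypothetical figures, not real quotes or measurements.
COST_PER_HOUR = 10_000.0        # assumed cost of one hour of downtime
EXTRA_CLUSTER_COST = 40_000.0   # assumed price difference vs. a standalone server
STANDALONE_OUTAGE_S = 4 * 3600  # the 4-hour outage from the scenario above
CLUSTER_OUTAGE_S = 30           # assumed failover time, "seconds" in the post

standalone = downtime_cost(STANDALONE_OUTAGE_S, COST_PER_HOUR)
clustered = downtime_cost(CLUSTER_OUTAGE_S, COST_PER_HOUR)
break_even = outages_to_break_even(EXTRA_CLUSTER_COST, STANDALONE_OUTAGE_S,
                                   CLUSTER_OUTAGE_S, COST_PER_HOUR)

print(f"Standalone outage cost: ${standalone:,.2f}")
print(f"Clustered outage cost:  ${clustered:,.2f}")
print(f"Outages to break even:  {break_even:.2f}")
```

With these made-up inputs, a single 4-hour outage costs about as much as the extra hardware. The sketch ignores the softer costs (reputation, employee morale), which only tilt the math further toward the cluster.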
Tell me what you think.
