How to Deal with Single Points of Failure: Software

Published by Mike Solinap
on May 28, 2013

In our series looking at false economies that place your business at risk, we have considered the dangers of leaving system administration and maintenance to engineers, and looked at the single point of failure risks associated with hardware. But hardware isn’t the only possible single point of failure. There are several others, including software and the unpredictable human factor.

Software is a key risk: if it fails, for whatever reason, your engineers (and likely others) are left unproductive. In this context, software includes operating systems, development and design tools, and support services such as web and email.

The problem with software is that it can be complicated. Hardware is complicated too, but in the worst case a misbehaving piece of hardware can be replaced with a new one. Software is different: it has to be installed, configured, upgraded, updated and tuned for performance. Any one of these tasks, if performed incorrectly, can cause the software to malfunction.

Operating systems aren’t generally upgraded often (maybe every few years), but they are frequently updated. Microsoft, Apple and the Linux distribution providers all publish regular, critical updates. Without properly managed IT services, a failed update on a key system can bring everything in your business to a halt.

When the time for an OS upgrade does come around, the upgrade shouldn’t be undertaken lightly. Upgrading from one edition of Windows Server to another isn’t necessarily straightforward, and neither is a move between major releases of a Linux distribution (like CentOS 5 to CentOS 6).
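
One way to make the update versus upgrade distinction concrete is to compare version numbers and flag any change in the major number as an upgrade that deserves the fuller test process. The sketch below is illustrative only: the helper names `parse_version` and `classify_change` and the simple dotted version format are assumptions, not part of any vendor tool.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "5.9" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def classify_change(current: str, target: str) -> str:
    """Label a version change (hypothetical helper for illustration).

    "upgrade"   - the major number increases (e.g. 5.x to 6.x)
    "update"    - same major number, newer minor or patch level
    "none"      - the versions are identical
    "downgrade" - the target is older than the current version
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt == cur:
        return "none"
    if tgt < cur:
        return "downgrade"
    if tgt[0] > cur[0]:
        return "upgrade"
    return "update"


if __name__ == "__main__":
    # A major release jump, as in the CentOS 5 to CentOS 6 example.
    print(classify_change("5.9", "6.0"))
    # A routine point update within the same major release.
    print(classify_change("6.3", "6.4"))
```

A check like this could sit at the front of a change process so that anything labelled an upgrade is routed to a longer test cycle than a routine update.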

Development tools are another key single point of failure. If your design team can’t use the CAD software, or your developers can’t compile code, then valuable time is lost while whole teams of people sit around waiting for the issues to be fixed.

The damage done to a business by the failure of key support systems like email and web can also be significant. The warnings about OS and development tools are equally applicable to email servers (like Exchange) or web services (like Apache or JBoss).

There are several rules to follow to help reduce the risk of software failure:

  1. Never update or upgrade a production (live) system until the same change has been applied to a test system and the results monitored.
  2. Always keep good backups, not only of system data but also of configuration information. Full system (image) backups are also essential.
  3. Ensure that engineers and other tech-savvy staff don’t try to “tweak” the systems. All performance tuning, updates and upgrades should be done by those who are intimately familiar with the system.

Another way to mitigate the risk of critical software failure is to switch to Software as a Service (SaaS). By moving critical software applications to the cloud or to managed hosting, the risk of local failures in terms of configuration, scalability or upgrades is reduced. Good SaaS services also include redundancy and backup, which removes that burden from local IT staff.

Testing upgrades before they are applied to live systems, keeping backups and tuning systems are time-consuming tasks. If the proper support staff aren’t on hand, these tasks tend to be treated as a lower priority. This in turn increases the risk of software becoming a failure point.
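
Part of this work can be scripted so that it is harder to skip. The following is a minimal sketch, not a definitive implementation: it copies a configuration file to a backup location and verifies the copy by checksum, and it refuses to approve a production change until the same change has run on a test system for a "soak" period. The function names, the seven-day default and the file names are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def backup_config(config_file: Path, backup_dir: Path) -> Path:
    """Copy a config file into backup_dir and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{config_file.name}.{stamp}.bak"
    shutil.copy2(config_file, target)
    original = hashlib.sha256(config_file.read_bytes()).hexdigest()
    copied = hashlib.sha256(target.read_bytes()).hexdigest()
    if original != copied:
        raise IOError(f"backup of {config_file} failed verification")
    return target


def ready_for_production(test_applied_at, now, soak=timedelta(days=7)) -> bool:
    """Return True only if the change has run on the test system for `soak`.

    test_applied_at is None when the test system has not been updated yet.
    The seven-day default is an assumed value, not a recommendation.
    """
    if test_applied_at is None:
        return False
    return now - test_applied_at >= soak


if __name__ == "__main__":
    # Demonstrate with a throwaway directory and a made-up config file.
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "app.conf"
        config.write_text("max_connections = 100\n")
        saved = backup_config(config, Path(tmp) / "backups")
        print("verified backup written:", saved.name)

    now = datetime.now()
    print(ready_for_production(None, now))                      # test system not updated
    print(ready_for_production(now - timedelta(days=2), now))   # still soaking
    print(ready_for_production(now - timedelta(days=10), now))  # soak period complete
```

A script like this covers only configuration files and a simple timing rule. Full system (image) backups and real monitoring of the test system still need their own tooling.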

Using an IT outsourcing company to handle these software-related tasks, including creating and managing SaaS, can free up your current staff. It also ensures that experts with extensive software administration experience are protecting your investment and keeping your designers working.
