How to Deal with Single Points of Failure: Software

Published by Mike Solinap
on May 28, 2013

In our series looking at false economies that place your business at risk, we have considered the dangers of leaving system administration and maintenance to engineers and looked at the single point of failure risks associated with hardware. But hardware isn’t the only possible single point of failure. There are several others, including software and the unknown human factor.

Software is a key risk: if it fails, for whatever reason, it leaves your engineers (and likely others) unproductive. In this context, software includes operating systems, development and design tools, and support services such as web and email.

The problem with software is that it can be complicated. Hardware is complicated too, but in the worst case a misbehaving piece of hardware can be replaced with a new one. Software is different: it involves installation, configuration, upgrades, updates and performance tuning. Any one of these tasks, if performed incorrectly, can cause software to malfunction.

Operating systems aren’t generally upgraded often (maybe every few years), but they are frequently updated. Microsoft, Apple and the Linux distribution providers all publish regular, critical updates. Without properly managed IT services, a failed update on a key system can bring everything in your business to a halt.

When OS upgrade time does come around, it shouldn’t be undertaken lightly. Upgrading from one edition of Windows Server to another isn’t necessarily straightforward, and neither is a move between major releases of a Linux distribution (like CentOS 5 to CentOS 6).

Development tools are another key single point of failure. If your design team can’t use the CAD software, or your developers can’t compile code, then valuable time can be lost while whole teams sit around waiting for the issues to be fixed.

The damage done to a business by the failure of key support systems like email and web can also be significant. The warnings about OS and development tools are equally applicable to email servers (like Exchange) or web services (like Apache or JBoss).

There are several rules to follow to help reduce the risk of software failure:

  1. Never update or upgrade a production / live system until a test system has been upgraded / updated first and the results monitored.
  2. Always keep good backups not only of system data but also of configuration information. Full system (image) backups are also essential.
  3. Ensure that engineers and other tech-savvy staff don’t try to “tweak” the systems. All performance tuning, updates and upgrades should be done by those who are intimately familiar with the system.
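
As an illustration of the first rule, here is a minimal Python sketch of a promotion check. It is not taken from any particular tool: the names `UpdateRecord` and `ready_for_production`, and the default of three health checks, are assumptions made for this example. The idea is simply that an update is held back from the live system until it has been applied to a test system and every monitored check there has passed.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateRecord:
    """Tracks one update as it moves from the test system toward production.

    Hypothetical structure for illustration only.
    """
    name: str
    applied_to_test: bool = False
    test_checks: list = field(default_factory=list)  # one bool per health check

    def record_check(self, healthy: bool) -> None:
        """Store the result of one monitoring check on the test system."""
        self.test_checks.append(healthy)


def ready_for_production(update: UpdateRecord, required_checks: int = 3) -> bool:
    """Allow promotion only after the test system was updated and monitored.

    required_checks is an assumed threshold; choose one that fits your
    own monitoring period.
    """
    if not update.applied_to_test:
        return False  # never touch production before the test system
    if len(update.test_checks) < required_checks:
        return False  # not monitored for long enough yet
    return all(update.test_checks)  # any failed check blocks promotion


update = UpdateRecord("os-security-patch")
update.applied_to_test = True
for result in (True, True, True):
    update.record_check(result)
print(ready_for_production(update))  # True
```

A single failed check on the test system keeps the update out of production until someone familiar with the system has investigated.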

Another way to mitigate the risk of critical software failure is to switch to Software as a Service (SaaS). By moving critical software applications to the cloud or to managed hosting, the risk of local failures in terms of configuration, scalability or upgrades is reduced. Good SaaS services also include redundancy and backup, which removes that burden from local IT staff.

Testing upgrades before they are applied to live systems, keeping backups and tuning systems are time-consuming tasks. If the proper support staff aren’t on hand, these tasks can be treated as a lower priority, which in turn increases the risk of software becoming a failure point.
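
Configuration backups are one of the tasks that can be scripted so they are less likely to be skipped. The Python sketch below uses only the standard library; the function names `snapshot_config` and `verify_snapshot` and the directory layout are assumptions for illustration, not a recommendation of a specific backup product. It archives a configuration directory under a timestamped name and records a SHA-256 checksum so the backup can be checked later.

```python
import hashlib
import tarfile
import time
from pathlib import Path


def snapshot_config(config_dir: str, backup_dir: str) -> Path:
    """Archive config_dir as a timestamped .tar.gz and store its checksum."""
    source = Path(config_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name
        tar.add(str(source), arcname=source.name)

    # Write the checksum next to the archive, e.g. "conf-...tar.gz.sha256"
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{digest}  {archive.name}\n")
    return archive


def verify_snapshot(archive: Path) -> bool:
    """Return True if the archive still matches its recorded checksum."""
    checksum_file = archive.with_name(archive.name + ".sha256")
    recorded = checksum_file.read_text().split()[0]
    return hashlib.sha256(archive.read_bytes()).hexdigest() == recorded
```

A call such as `snapshot_config("/etc/httpd", "/backups/config")` (both paths are placeholders) would be run before any update or tuning change. This covers configuration only and does not replace full system (image) backups.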

Using an IT outsourcing company to handle these software-related tasks, including creating and managing SaaS, can free your current staff. It also ensures that experts with extensive software administration experience are protecting your investment and keeping your designers working.
