IT Obstacles: If It Ain’t Broke
Many of us go through life following the common adage, “If it ain’t broke, don’t fix it.” In technology, this stance can be detrimental to our business and personal lives if we do not pay close attention to the risks that come with it. Running out-of-date technology can look like a money saver on the surface, but more often it is a money trap waiting to spring, from both a capital expense perspective and an operating expense perspective.
Years ago, I was working to identify a series of systems and determine their use and necessary upgrades. In my work, I came across several old, older, and oldest systems that were in use, and one stood out: an older AS400 that was over 15 years old. The system was core to about 400 individuals and critical to their work, and each person I discussed the system with promptly told me two things: they could not work without the system, and it was OK because they paid for support on that system.
Each person involved was adamant that we could not touch that system because “They were special”, “It could not be down”, and “They had support so we did not need to worry about it.”
As my team reviewed the system, I sat down with the system owner and called the vendor. They had been paying an excessive amount of money each year for support, and I asked the vendor a simple question: “If the system goes down with a hardware failure, will you guarantee it will be repaired?” There was a pause, and then the answer came back: “Our SLA is we will have a technician on site within 4 hours.” I smiled, waited, and asked the question differently: “Can you guarantee you will be able to bring the system back online?” The answer came back again: “Our SLA is we will have a technician on site within 4 hours.” We had some additional discussions, but after the call I looked at the system owner, a non-technical person in charge of a major area, and asked if they understood what had just happened. They were very thoughtful and simply said, “I think we need to look at some additional options.”
We replaced that system with a newer box and worked towards replacement of the software. By utilizing virtualization techniques, we moved the system to a more resilient platform. The answer to an outage would no longer be a tech on site within 4 hours, but a system supporting 400 workers that would stay online even in the event of a disaster.
So why did we make a good decision? It is easy. First, if the system had gone down for even 1 hour, the 400 affected workers would have cost an excessive dollar amount. Even at a minimum wage of $10 an hour, which it was not, that is $4,000 an hour. A longer outage could have become a massive operating expense in lost time, one that overshadows any other cost. Second, if the data had been lost, there would not have been alternate operating systems or hardware to bring the system back online, and the cost of losing the data could be immeasurable. Third, the system itself, being out of date for so long, had numerous security issues and could easily have led to a breach of data that is protected by regulation. This alone can destroy both the credibility of a business and its finances, with minimal opportunity for recovery. Fourth, the system itself was impacting users and becoming less and less usable, causing workers to find workarounds to do their work that were even more costly.
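The downtime arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function name `downtime_cost` is hypothetical, it counts idle wages only (not lost revenue, recovery work, or penalties), and the 8-hour case is an assumed example that was not part of the original story.

```python
def downtime_cost(workers: int, hourly_rate: float, hours: float) -> float:
    """Estimate the wages paid to idle workers during an outage.

    Wages only: lost revenue and recovery costs are deliberately ignored.
    """
    if workers < 0 or hourly_rate < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return workers * hourly_rate * hours


# The example from the text: 400 workers at $10 an hour, down for 1 hour.
one_hour = downtime_cost(400, 10, 1)

# Assumption for illustration: the same outage lasting a full 8-hour day.
one_day = downtime_cost(400, 10, 8)

print(f"1 hour:  ${one_hour:,.0f}")
print(f"8 hours: ${one_day:,.0f}")
```

Even with the deliberately low $10 rate, a single lost working day costs more than many hardware refreshes would.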
Of course there were many more reasons, but how does this matter to small and large businesses alike? Well, as the age of a system goes up, we add risk to that system and potential points of failure, including replacement issues. The bigger the system, meaning the more moving parts, the more likely it is to run into issues, since more can go wrong and each failure reaches more users.
A simple approach can be: HardwareAge + OSAge + Risk + UserImpact + FinancialImpact - DRResilience < 10.
Why?
Well, as hardware ages, it requires updates but also may require replacement parts. As the parts become less available, the risk to the system grows and repairs become difficult and frustrating. If you virtualize, you should consider the virtual strategy to be part of the same equation, but in that case your hardware age is always 1, as the virtual platform then becomes the thing that needs to be kept upgraded.
The operating system can become a nightmare as its age goes up, since it will develop more and more security risks. If it is end of life and no longer supported, you are instantly at major risk and need to find a solution. We often forget the operating system, yet it is the source of much of what we do, and for most programs it is the foundation for doing work at all.
Risk can be a massive discussion on its own, since the whole equation is indirectly about risk, but in this case let us consider risk as regulatory or agency risk. Rate it from 0-5, where 5 covers the most tightly controlled data and regulated work, like HIPAA, and 0 is no risk at all.
User impact and financial impact are subjective, but rate each from 0-3, where 0 is no impact at all and 3 is high impact.
Disaster resilience can subtract from your score by creating situations where you can be back online quickly without as much risk of downtime. This can be achieved through programs that bring your system back online quickly. Using a virtual machine and a solution like Datto can get you back online quickly even in the event of a total loss, lowering overall risk.
This is not a hard rule; it is something I worked out to explain the risks associated with systems to people in a simple manner. A good technology professional would look at this and say it is a start and that there is a lot more to it, but it will let you know where to begin. If you come up with a number greater than 10, it is definitely time to start talking to someone. If we take the earlier example, we get these numbers:
15+16+5+3+3-1=41
Every point beyond 10 should have been a red flag, and in this case even a DR resilience of 1 would still have left the score looking bad.
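For readers who want to try the rule of thumb on their own systems, here is a minimal Python sketch of it. The names `SystemProfile`, `score`, `needs_attention`, and the `virtualized` flag are hypothetical, chosen only for illustration. The 0-5 and 0-3 ranges come from the descriptions above; the text gives no scale for DR resilience, so the sketch only assumes it is not negative. Working the listed example terms through (15, 16, 5, 3, 3, with a DR resilience of 1) gives 41.

```python
from dataclasses import dataclass

THRESHOLD = 10  # above this, it is time to start talking to someone


@dataclass
class SystemProfile:
    hardware_age: int          # years
    os_age: int                # years
    risk: int                  # 0-5, regulatory or agency risk
    user_impact: int           # 0-3, subjective
    financial_impact: int      # 0-3, subjective
    dr_resilience: int = 0     # subtracted; scale is an assumption
    virtualized: bool = False  # if True, hardware age counts as 1

    def score(self) -> int:
        """Add up the rule-of-thumb terms after checking the stated ranges."""
        if not 0 <= self.risk <= 5:
            raise ValueError("risk must be between 0 and 5")
        for name in ("user_impact", "financial_impact"):
            if not 0 <= getattr(self, name) <= 3:
                raise ValueError(f"{name} must be between 0 and 3")
        if self.dr_resilience < 0:
            raise ValueError("dr_resilience must not be negative")
        hardware = 1 if self.virtualized else self.hardware_age
        return (hardware + self.os_age + self.risk
                + self.user_impact + self.financial_impact
                - self.dr_resilience)

    def needs_attention(self) -> bool:
        return self.score() > THRESHOLD


# The legacy system from the story, using the numbers listed in the text.
legacy = SystemProfile(hardware_age=15, os_age=16, risk=5,
                       user_impact=3, financial_impact=3, dr_resilience=1)
print(legacy.score(), legacy.needs_attention())
```

Setting `virtualized=True` on the same profile shows that moving to a virtual platform alone does not rescue a system with a 16-year-old operating system; the score stays well above 10.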
This is still just a guess. It is just as valid to measure supportability and availability with no equation. As the number of people who can support a system dwindles, risk builds quickly, whether the system is high risk or not. Multiple times, I have been put in the position of finding a way into a system that no one knows the password to and no one knows how to repair. If your support is single-threaded, it is time to replace the software, the hardware, or both.
It is also important to look closely at what vendors say to you. Obviously, there is no guarantee on any system, but when you are not given an ETA or an escalation path in case of an outage, you are flirting with downtime and the costs that come with it.
Remember, if a system is not critical, will cost no time, will not be missed, has no critical or useful data on it, and can be gone forever with no impact on you or your business, then maybe it is OK to keep a very old system. I am sure there are some exceptions as well, where a piece of software would cost a lot to upgrade and the upgrade is avoided. But in the end, if you have these systems and say, “If it ain’t broke, don’t fix it,” maybe those machines should be shut down anyway and new solutions found to really help the business.