Saturday, May 23, 2015

Factors Affecting Software Resiliency

The digital transformation is happening everywhere right from small private firms to government organizations. On the personal front, connected things is coming on, where by every thing that we have or use will be smart enough to connect and communicate with other things(systems). This in effect means there will be an increased reliance on IT systems to accomplish various tasks. This will call for high order of resilience on the part of such systems and the absence of which may lead to disasterous situation.

As we all know, the word resiliency means 'the ability to bounce-back after some events'. In otherwords, it is a capability of withstanding any shock or impact without any major deformation or rupture. In software terms, resilience is the persistence of the avoidance of failures when facing a change or in a deviated circumstance.

To design a resilient system, one should first understand the various factors that work against the resiliency. Here are some such factors:

Design Flaws

Design and Architecture of the systems is a major factor that works in favor or against the resiliency requirement. The architects shall while designing the system or solution should have a good understanding of what could go wrong and provide for an exception handling ability, so that all exceptions are appropriately handled, making the system not to go down and instead recover from such exception and continue to operate. The architects have many options today in terms of tools, technologies, standards, methodologies and frameworks that help buidling resiliency within. It is the ability of choosing the right combination of tools, technologies, etc for the specific systems that will decide on the resilience capability of the system. 

Software Complexity

The size and complexity of software systems is increasing, thus the ways in which a system can fail also increases. It is fair to assume that the increase in failure possibilities does not bear a linear or additive relationship to system complexity. Typically, the complexity of the software systems increases as it evolves by responding to the changing business needs. This is more so as the tools and technologies used to design and build the software are becoming outdated, making it difficult in maintaining the systems. 

This complexity attribute makes it increasingly difficult to incorporate resiliency routines that will respond effectively to failures in the individual systems and in their complex system. The cost of achieving an equivalent level of resiliency due to the complexity factor should be added to that of the individual systems

Interdependency and Interconnectivity

We are living in a connected world and systems of many of today's businesses depend on connectivity with their partner entities to do their business. This adds multiple points of failures over and above the network connectivity. The system resiliency is increasingly dependent on the resiliency of systems different other organizations over which the entity has no control. This means that a failure or outage of a business partner's system can have a ripple effect. This situation requires the systems need to be aware and capable of such failure or outage with other connected systems and the ability to recover from such events should be designed within. 

Rapid Changes

Thanks to the evolving digital economy, the business needs are changing too frequently and thus needing system changes. Every change in an existing system, for sure will add a bit of complexity, as the architecture on which the system originally designed wouldn't have considered the changes that are coming through. Many a times, considering the time to market, such changes need to be implemented quicker than expected, leaving the software designers to adopt a quick and dirty approach to deliver the change, leaving a permanent solution for a later time period. The irony is that there will never be a time when the 'permanent solution' is implemented.

Change is one of the key source of adding complexity to the Software systems. However, the evolving tools, technologies and methodologies come to the rescue, so that the Architects design systems and solutions in such a way to pave way for embracing such changes and to embed the resiliency factors in the design.

A frequently held criticism of Common Criteria testing is that, by the time the results are available, there is a good chance that the tested software has already been replaced. The danger here is that the new software may contain new vulnerabilities that may not have existed in prior versions. Thus, determining that an obsolete piece of software is sufficiently resilient is not particularly indicative of the state of the newest version and, therefore, is not very useful


Higher levels of resilience can be achieved by leveraging Machine Learning and Big Data tools and techniques. As the world is moving towards more and more connected things, high order of resilience is critical. With Machine Learning capability, the systems and devices can be embedded with algorithms that make them learn from past events and the data collected from various other connected networks and systems in addition to the ambient data. The systems can be designed to predict the health of various underlying components and thus its own health as well. Based on such prediction, the components may choose to use alternate approaches, like using alternate network protocols like Wireless, Bluetooth, etc, or choose to connect to a different component or system altogether.