Tech Bytes by Kannan Subbiah: software quality


Sunday, April 10, 2016

Economics of Software Resiliency

Resilience is a design feature that enables software to recover from a disruptive event. In other words, it is a kind of automated recovery from a disastrous event after it has occurred. Given the option, we would want any software that we build or buy to have resilience built in. Resilience comes at a cost, however, and the economics of the benefit should be examined before deciding what level of resilience is required. The cost and effectiveness of the recovery or resilience capabilities need to be balanced against the events that cause disruption or downtime. These costs may be reduced, or rather optimized, if the expectation of failure or compromise is lowered through preventative measures, deterrence, or avoidance.

There is a trade-off between protective measures and investments in survivability, i.e., between the cost of preventing the event and the cost of recovering from it. Another key factor that influences this decision is the cost of such an event if it occurs. This suggests that a number of combinations need to be evaluated, depending on the resiliency of the primary systems, the criticality of the application, and the options for backup systems and facilities.

In a sense, this analysis is identical to the risk management process. The following elements form part of this process:

Identify problems

The events that could lead to failure of the software are numerous. Developers know that exception handling is an important best practice to adhere to while designing and developing a software system, and most modern programming languages provide support for catching and handling exceptions. At a low level, this helps in identifying the exceptions encountered by a particular application component at run time. Certain events, however, cannot be handled from within the component and require an external component to monitor and handle them. Leaving aside the exception handling ability of the programming language, the architects designing the system should identify and document such exceptions and design a solution to get past them, so that the system becomes more resilient and reliable. The following are the primary sources of problems or exceptions that need to be handled to make the system more resilient:

Dependency on Hardware / Software resources - Whenever the system needs to access a hardware resource, for example a specified folder on the local disk drive, expect situations such as the folder not being there, the application context not having enough permissions to perform its actions, or the disk space being exhausted. This applies equally to software resources such as an operating system or a third-party software component.
Dependency on external Devices / Servers / Services / Protocols - Access to external devices such as printers and scanners, or to other services used by the application system, such as an SMTP service for sending emails, database access, or a web service over HTTPS, could also cause problems: the remote device not being reachable, a protocol mismatch, inconsistent request or response data, missing access permissions, etc.
Data inconsistency - In complex application systems, certain scenarios could lead to inconsistent internal data, which may in turn lead to the application getting into a deadlock or a never-ending loop. Such a situation may have a cascading effect, as the affected components quickly consume considerable system resources, leading to a total system crash. This is typical of web applications, where each external request is executed in a separate thread: as more and more threads get into a 'hung' state, the request queue soon exceeds the installed capacity.
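
The folder example from the first item above can be turned into a small pre-flight check. This is only a minimal sketch in Python: the function name `check_folder_dependency` and the `min_free_bytes` threshold are made up for illustration, and a real system would decide for itself which conditions are worth checking up front.

```python
import os
import shutil
import tempfile


def check_folder_dependency(path, min_free_bytes):
    """Return a list of problems found; an empty list means the folder is usable."""
    if not os.path.isdir(path):
        # Nothing else can be checked if the folder itself is missing.
        return [f"folder does not exist: {path}"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"no write permission on: {path}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, need {min_free_bytes}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(check_folder_dependency(folder, min_free_bytes=1))
        print(check_folder_dependency(os.path.join(folder, "missing"), min_free_bytes=1))
```

Returning a list of problems, rather than raising on the first one, lets the caller decide whether to degrade, retry, or stop.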

Cost of Prevention / recovery

The cost of prevention depends on the solutions available to overcome or handle such exceptions. For instance, if the issue is the SMTP service being unavailable, the solution could be to have a redundant, always-active SMTP service running in a totally different network environment, so that the system can switch over to the alternate service when it encounters issues with the primary one. While the cost of implementing support for multiple SMTP services and a fail-over algorithm may not be significant, maintaining a redundant SMTP service could have a significant cost impact. Thus, for each event that may affect software resilience, the total cost of a proactive solution vis-a-vis a reactive solution should be assessed.
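
The fail-over part of the SMTP example can be sketched in a few lines of Python. This is an illustration under assumptions, not a production mailer: `send_with_failover`, `primary_down` and `secondary_ok` are hypothetical names, and the two stub functions stand in for real mail clients. Catching `OSError` is a deliberate choice, since connection errors and `smtplib.SMTPException` are both subclasses of it.

```python
def send_with_failover(message, senders):
    """Try each (name, send_function) pair in order; return the name that succeeded."""
    errors = []
    for name, send in senders:
        try:
            send(message)
            return name
        except OSError as exc:  # refused connection, unreachable network, SMTP errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all mail services failed: " + "; ".join(errors))


# Hypothetical stand-ins for a primary and a redundant SMTP service.
def primary_down(message):
    raise ConnectionRefusedError("primary SMTP unreachable")


delivered = []


def secondary_ok(message):
    delivered.append(message)


if __name__ == "__main__":
    used = send_with_failover("hello", [("primary", primary_down), ("secondary", secondary_ok)])
    print(used, delivered)
```

As the paragraph above notes, this code is the cheap part; the recurring cost lies in keeping the second service running.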

Time to Recover & Impact of Event

While the cost of prevention / recovery as assessed above indicates how expensive the solution is, the Time to Recover and the Impact of the event indicate the cost of not having the event handled or worked around. Simple issues like a database deadlock may be handled reactively by the DBAs, who monitor for such issues and act immediately when such an event arises. But issues like the failure of the network link to an external service may mean extended system unavailability, thus impacting the business. So it is critical to assess the time to recover and the impact that such an event may have if it is not handled instantly.

Depending on the above metrics, the software architect may suggest a cost-effective solution to handle each such event. The level of resiliency that is appropriate for an organization depends on how critical the system in question is to the business, and on the impact that a lack of resilience would have on the business. The organization should understand that resiliency has its own cost-benefit trade-off. The architects should keep this in mind and design solutions to suit the specific organization.
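
One simple way to put numbers on this comparison is to weigh the yearly cost of a proactive solution against the expected yearly loss if the event is left to reactive handling. The sketch below assumes the loss can be approximated as frequency x time to recover x cost per hour of downtime; the function names and all the figures are invented for illustration.

```python
def expected_annual_loss(events_per_year, hours_to_recover, cost_per_hour):
    """Expected yearly cost of leaving an event to reactive handling."""
    return events_per_year * hours_to_recover * cost_per_hour


def cheaper_option(prevention_cost_per_year, events_per_year, hours_to_recover, cost_per_hour):
    """Return 'prevent' when the proactive solution costs less than the expected loss."""
    loss = expected_annual_loss(events_per_year, hours_to_recover, cost_per_hour)
    return "prevent" if prevention_cost_per_year < loss else "recover"


if __name__ == "__main__":
    # Made-up figures: a short mail outage versus a day-long network link failure.
    print(cheaper_option(6000, events_per_year=2, hours_to_recover=4, cost_per_hour=500))
    print(cheaper_option(15000, events_per_year=1, hours_to_recover=24, cost_per_hour=2000))
```

In practice the inputs are estimates, so the result is a starting point for the discussion between business and IT rather than a verdict.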

The following are some of the best practices that the architects and the developers should follow while designing and building the software systems:

Avoid the use of proprietary protocols and software that make migration or graceful degradation very difficult.
Identify and handle single points of failure. Of course, building redundancy has a cost.
Loosely couple the service integrations, so that the inter-dependence of services is managed appropriately.
Identify and overcome weak architecture / designs within the software modules or components.
Anticipate the failure of every function and design for fall-back scenarios and, where appropriate, graceful degradation.
Design to protect state in multi-threaded and distributed execution environments.
Expect exceptions and implement safe use of inheritance and polymorphism.
Manage and handle the bounds of various software and hardware resources.
Manage allocated resources by holding them only when needed.
Be aware of the timeouts of various services and protocols and handle them appropriately.
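
Two of the practices above, designing a fall-back for every call and respecting timeouts, can be combined in one small helper. The following is a sketch using Python's standard `concurrent.futures` module; `call_with_timeout` is a hypothetical helper name, and the slow function merely simulates an unresponsive service.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_seconds, fallback):
    """Run func in a worker thread; return fallback if it fails or takes too long."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func)
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return fallback  # degrade gracefully instead of hanging the caller
    except Exception:
        return fallback  # the dependency failed outright
    finally:
        # Do not wait for a slow call; the worker thread finishes on its own.
        executor.shutdown(wait=False)


def slow_service():
    time.sleep(0.3)  # simulates a service that answers too late
    return "fresh value"


if __name__ == "__main__":
    print(call_with_timeout(lambda: "fresh value", 1.0, "cached value"))
    print(call_with_timeout(slow_service, 0.05, "cached value"))
```

Note that the timed-out call keeps running in the background; the helper only stops the caller from waiting on it, so the underlying service call should still carry its own timeout where the protocol allows one.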

Saturday, May 23, 2015

Factors Affecting Software Resiliency

Digital transformation is happening everywhere, from small private firms to government organizations. On the personal front, connected things are arriving, whereby everything that we have or use will be smart enough to connect and communicate with other things (systems). In effect, this means there will be an increased reliance on IT systems to accomplish various tasks. This calls for a high order of resilience on the part of such systems, the absence of which may lead to disastrous situations.

As we all know, the word resiliency means 'the ability to bounce back after some event'. In other words, it is the capability of withstanding a shock or impact without any major deformation or rupture. In software terms, resilience is the persistence of the avoidance of failures when facing a change or a deviation from normal circumstances.

To design a resilient system, one should first understand the various factors that work against the resiliency. Here are some such factors:

Design Flaws

The design and architecture of a system is a major factor that works for or against the resiliency requirement. While designing the system or solution, the architects should have a good understanding of what could go wrong and provide for exception handling, so that all exceptions are appropriately handled and the system does not go down but instead recovers from the exception and continues to operate. Architects today have many options in terms of tools, technologies, standards, methodologies and frameworks that help in building resiliency in. It is the ability to choose the right combination of these for the specific system that decides its resilience capability.

Software Complexity

The size and complexity of software systems are increasing, and so are the ways in which a system can fail. It is fair to assume that the increase in failure possibilities does not bear a linear or additive relationship to system complexity. Typically, the complexity of a software system increases as it evolves in response to changing business needs. This is even more so when the tools and technologies used to design and build the software become outdated, making the system difficult to maintain.

This complexity makes it increasingly difficult to incorporate resiliency routines that will respond effectively to failures in the individual systems and in the complex system they form together. The cost of achieving an equivalent level of resiliency in the face of this complexity should be added to that of the individual systems.

Interdependency and Interconnectivity

We are living in a connected world, and the systems of many of today's businesses depend on connectivity with partner entities to do business. This adds multiple points of failure over and above the network connectivity itself. System resiliency is increasingly dependent on the resiliency of the systems of other organizations, over which the entity has no control. This means that a failure or outage of a business partner's system can have a ripple effect. The systems therefore need to be aware of, and capable of handling, such failures or outages in connected systems, and the ability to recover from such events should be designed in.
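
A common pattern for containing such ripple effects is the circuit breaker, which is named here as one possible technique rather than something the discussion above prescribes. The sketch below is a deliberately small Python version; the class name, thresholds and injectable `clock` parameter are illustrative assumptions.

```python
import time


class CircuitBreaker:
    """Stop calling a failing partner system for a while after repeated failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (calls allowed)

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback  # still open: do not touch the partner system
            # Cool-down is over: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result
```

While the circuit is open, callers get the fallback immediately instead of tying up threads on a partner that is known to be down, which is what keeps one outage from spreading.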

Rapid Changes

Thanks to the evolving digital economy, business needs are changing frequently, and systems have to change with them. Every change to an existing system will surely add a bit of complexity, as the architecture on which the system was originally designed would not have anticipated the changes coming through. Many a time, considering the time to market, such changes need to be implemented quicker than expected, leading the software designers to adopt a quick and dirty approach to deliver the change and leave a permanent solution for later. The irony is that the time to implement the 'permanent solution' never comes.

Change is one of the key sources of complexity in software systems. However, evolving tools, technologies and methodologies come to the rescue, enabling architects to design systems and solutions in a way that embraces such changes and embeds the resiliency factors in the design.

A frequent criticism of Common Criteria testing is that, by the time the results are available, there is a good chance that the tested software has already been replaced. The danger here is that the new software may contain vulnerabilities that did not exist in prior versions. Thus, determining that an obsolete piece of software is sufficiently resilient is not particularly indicative of the state of the newest version and, therefore, is not very useful.

Conclusion

Higher levels of resilience can be achieved by leveraging Machine Learning and Big Data tools and techniques. As the world moves towards more and more connected things, a high order of resilience is critical. With Machine Learning capability, systems and devices can be embedded with algorithms that let them learn from past events and from the data collected from various other connected networks and systems, in addition to the ambient data. The systems can be designed to predict the health of various underlying components and thus their own health as well. Based on such predictions, the components may choose alternate approaches, such as using alternate network protocols like Wireless or Bluetooth, or connecting to a different component or system altogether.

Sunday, June 29, 2014

Governance of Agile Delivery

Introduction

The Agile methodology brings in an alternate approach to traditional project management, where success was hard to come by. Typically used in software development, the Agile methodology helps businesses respond to unpredictability. Because it focuses on the repetition of smaller work cycles as well as on the deliverables, the agile methodology is described as “iterative” and “incremental”. In waterfall, development teams only have one chance to get each aspect of a project right. In an agile paradigm, every aspect of development, viz. requirements, design, etc., is continually revisited. When a team stops and re-evaluates the direction of a project every two weeks, there’s time to change course. Because teams can develop software at the same time they’re gathering requirements, “analysis paralysis” is less likely to impede a team from making progress. Agile development preserves a product’s critical market relevance and ensures a team’s work doesn’t wind up on a shelf, never released. Considering the value delivery that the Agile methodology promises, its adoption has been on the rise, and today most organizations, including governments, are embracing Agile approaches.

Governance of Agile Delivery

Critics say that Agile methodology is all about working in an unstructured way and for that reason, they believe that governing agile practices is always a challenge. While some of the Agile principles appear to support such criticism, there are many cases where organizations have successfully implemented processes and frameworks towards governance of Agile practices. Agile practitioners believe that because the agile methods are designed to be self-assuring, when practiced right, there exists built-in governance and accountability.

Moreover, agile practices are collaborative and operate continuously, requiring the stakeholders to review and test the deliverables on a continuous basis, which helps the team take an alternate course of action as needed. A collaborative culture helps resolve problems quicker and ensures that decisions are made on time. This helps maintain a continuous focus on the value forecast with respect to the business case and manage the risks that may impact the expected value.

Principles of Governance

The following are the key principles for successful governance of Agile Delivery:

Focus on the value delivery - Only do a task if it brings value to the business. This principle also recognizes the timely delivery of a task, as the value derived is likely to deteriorate with delayed delivery. In the case of Agile deliveries, the governance is continuous and at a work unit level. It should also focus on what activity is taking place and the value that activity delivers.

Embrace Change - This is another principle of Agile, and the governance framework should take it into consideration. This means that the decisions or work flows should be flexible enough to change course based on the feedback received. Given that all stakeholders collaborate, decisions should be taken across the table without putting things on hold, and for that purpose all needed specialists should take part in the reviews.

Decide on the performance metrics - Another key principle of Agile methodology is to 'fail fast and learn quickly'. Given that the overall objective is to improve the certainty that the team will deliver a usable product or service of good quality, the team should identify and implement the right metrics that accurately indicate the quality of the deliverables and the performance of the team. For example, teams measure the tasks completed, the rework they had to perform, the backlog list, and the value of the product or service to the business at the end of each iteration. Teams display this information visually, updating it frequently. This makes progress transparent to business users and management. If senior managers require performance information to oversee projects, they define what the ‘must have’ data are. Performance reports for senior management become a task in each iteration and an output of the delivery team.

Collaboration - All stakeholders, including senior management, external assessors, business users and the development team, should be partners in quality, and this collaborative approach is an essential change in mindset. The business owner and delivery team define what ‘quality’ tests they will use and what results are acceptable at the outset of each iteration – the definition of ‘done’. Regular user feedback identifies whether the product or service is providing the expected business value at each stage. External assessors are not gatekeepers; rather, they are an integral part of the team. The iterative approach ensures continual reviews and feedback on progress, so external assessors are not just involved at critical points as defined in a traditional project life cycle.

Focus on behaviours and not just processes and documentation - More specifically, the external reviews or assessments will be more effective in providing critical challenge if the assessors have high-end skills, including technical and Agile delivery experience. In addition, they provide better value if they continually review how the team is performing, using observation as their main method of evidence collection. The focus of such external review or assessment shall be on the following:

the skills and experience of the team;
the team dynamics – frequency and nature of communication inside and outside of the delivery team, and the level of input to the delivery team from the business;
the organisational culture – the level of commitment and openness;
the timing and nature of quality control by the delivery team – the testing and release framework;
the order in which the team tackles the tasks – prioritisation of actions and deliverables, the number of actions in the backlog list;
the way the team changes its activity in response to the results achieved in each iteration; and
the value of outputs to the business.

IBM's Disciplined Agile Delivery Methodology

IBM believes Agile delivery allows it to continually issue new capabilities that meet user needs. It usually introduces software as part of a wider business change project so, to keep both in step, it has developed several Agile project methodologies. Disciplined Agile Delivery is a hybrid method that can be applied by a large number of teams working on the same project at the same time. The image below shows the Disciplined Agile Delivery life cycle. It starts with a few short iterations that allow the team and its stakeholders to identify the initial requirements, develop the architecture and agree a release plan. IBM also uses this to determine the system level properties and characteristics – the non-functional requirements. There are iterations after the business owner has decided that the system has sufficient functionality. These additional iterations are necessary for IBM to support the operation and maintenance of the solution once it is in service.

In contrast to the traditional approach of looking at outputs, plans, resourcing and how a project is organised, external assessors should focus on outcomes, prioritisation of work and team dynamics. The most useful indicators of success are how the teams are organising the delivery of an operational service or capability and what Agile behaviours and practices are used. Areas for assessment include whether:

system level issues (security, availability) are addressed within the iterations;
short- and longer-term planning exists;
the stakeholders have a shared vision;
there is continuous integration; and
the team has the right people

Reference:

National Audit Office's Review on Governance of Agile Delivery

Thursday, August 29, 2013

Common & Practical Problems of Requirements Elicitation

Requirement elicitation is an important and challenging phase of any software project. This holds good for both product and project development activities, though the approach and techniques might vary. A well-specified set of requirements has been found to considerably improve the success rates of projects. Though various methods and techniques have evolved over the last couple of decades to produce a good requirements specification, many still struggle to get it done well.

This could be mainly because requirement elicitation is not just a science; it is an art too. It is more of an art because it is highly human-intensive and much depends on the skills of the people involved in the process. More so because the choice of method or technique, and the way the document is structured and written, depend on the abilities of the person driving this activity. Based on my experience in bespoke project development and product development activities, I have listed some of the most common and practical problems with this activity below:

1. Preconceived Notions

The requirements of every customer, even in the same business domain, would be different. For example, the requirements of bank X would not be the same as those of bank Y. Each enterprise would have different business processes to differentiate its abilities or value delivery from those of its competitors. The teams involved in requirement elicitation should start with a clean slate for every project and should not bias the elicitation work with their previous project experience. Ignoring this principle would result in a misaligned requirement specification and thus in the delivery of a deficient product. As this is a human-intensive process, it is quite common for the customer representatives, too, to miss such things.

This is quite a common problem with product companies. Irrespective of whether the client contracts for a product with customization or for a project, vendors prefer to reuse their existing code assets. As such, the business analysts engaged in the requirement elicitation tend to scope the customer requirements in such a way that they fit within the existing product architecture and related constraints. Even in the case of a product-based contract, the requirement elicitation or gap study should be unbiased, and it is then for the Solution Architects to come up with solutions to bridge the gaps. In the process, the customer will have the option of diluting a requirement in favor of an existing workaround.

The business analysts shall master the art of unlearning and relearning to handle this area well.

2. The Design Mix-up

The next common problem is mixing up requirement elicitation with solution design. This happens on both sides, i.e., the vendors and the customers. The business analysts from the vendor side often start visualizing the solution design for a specific use case and start suggesting deviations from, or workarounds to, the use case. Similarly, on the customer front, the users may start talking from the system perspective. For example, customers narrating the requirements might talk about a drop-down list, check boxes, etc. Ideally, such details should be left to the design teams; where appropriate, the customer might review those designs, specify the design guidelines to be followed, or specify usability requirements for the vendor to conform to.

There is another school of thought that visualizing or thinking of the solution early on eliminates feasibility issues down the line. While this is partly true, the problem arises when such design constraints hide the actual underlying business requirement, which could lead to misinterpretations later on.

3. Poor Planning

Requirement elicitation has to be a planned process with proper entry and exit criteria for each of the sub-processes. There are many frameworks and techniques to perform this activity. Irrespective of the methods or techniques, the elicitation process should comprise the following activities: Identify the Stakeholders; Define Use Case specifications; Generate scenarios; Organize walk-throughs / interviews; Document Requirements; and Validate Requirements. It is quite possible that each of these activities might have to be performed in multiple iterations. Poor planning of these activities might result in ambiguous or deficient requirements.

A related key issue is exit planning, i.e., when to consider the requirement elicitation complete. Depending on other project constraints, the exit criteria have to be carefully identified, and further planning should be built around them. For instance, if time is a key constraint, the elicitation activities should not be hurried just for the sake of meeting the timeline, ending up with an imperfect specification. Instead, in such cases, the scope can be divided into broader sub-components, and the team can agree with the customer to defer some of them to a later phase based on priorities. An Agile approach could also be considered here, i.e., eliciting the requirements as specific user stories are taken up in their respective sprints. A careful consideration of all the project constraints and priorities is a must in choosing a solution and thereby arriving at the best course of action.

4. Volatility

In one of our projects, we were handed a four-hundred-page requirements specification document that was a year's work by the customer's internal business analysts. It was no surprise that the actual business requirements were far different from what was documented, as the business practices and processes had changed a lot during that same period. This is a common problem that the industry has been battling, and the Agile approach is emerging as a solution to it. The volatile nature of business requirements requires solutions to be delivered quicker to reap the time-to-market advantage.

Another aspect of volatility is that the requirements elicited from different users / departments could be different and at times conflicting. In some cases such differences could be misstatements or misunderstandings; in others they could be genuine, in which case the different requirements should be specified appropriately, leaving the design teams to come up with solutions that meet all of them.

5. Undiscovered Ruins

It is human nature to answer just the questions that were asked. Thus the business analysts should master the art of asking appropriate follow-up questions based on the responses from the customer representatives. That is where elicitation is important, i.e., the business analysts should prompt the customer to fully reveal what is required of the system. In the process, it is very common for certain needs to go undiscovered, only to show up later as a deficiency. This problem can be partly addressed by identifying the right stakeholders for the purpose and then getting the requirements validated by different stakeholders, who would look at them from a different perspective, which might bring out gaps, if any.

Saturday, June 1, 2013

Software Quality Attributes: Trade-off analysis

We all know that software quality is not just about meeting the functional requirements, but also about the extent to which the software meets a combination of quality attributes. Building quality software requires much attention to be paid to identifying and prioritizing the quality attributes and to designing and building the software to adhere to them. Again, going by the saying "you cannot manage what you cannot measure", it is also important to design the software with the ability to collect metrics around these quality attributes, so that the degree to which the end product satisfies each quality attribute can be measured and monitored.

It has always been a challenge for software architects or designers to come up with the right mix of quality attributes with appropriate priorities. This is further complicated by the fact that these attributes are highly interlinked: a higher priority on one may have an adverse impact on another. Here is a sample matrix showing the inter-dependencies of some of the software quality attributes.

	Availability	Efficiency	Flexibility	Integrity	Interoperability	Maintainability	Portability	Reliability	Reusability	Robustness	Testability
Availability								+		+
Efficiency			-		-	-	-	-		-	-
Flexibility		-		-		+	+	+		+
Integrity		-			-				-		-
Interoperability		-	+	-			+
Maintainability	+	-	+					+			+
Portability		-	+		+	-			+		+
Reliability	+	-	+			+				+	+
Reusability		-	+	-				-			+
Robustness	+	-						+
Testability	+	-	+			+		+

While the '+' sign indicates a positive impact, the '-' sign indicates a negative impact. This is only a likely indication of the dependencies; in reality, this could be different. The important takeaway, however, is that there is a need for planning and prioritizing the quality attributes for every software being designed or built, and the prioritization has to be accomplished keeping in mind the inter-dependencies amongst the quality attributes. This means that some trade-offs have to be made, and the business and IT should be in agreement with these trade-off decisions.
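
Such a matrix can also be kept in a machine-readable form, so that conflicts among the attributes chosen for a given project are listed automatically. The Python sketch below transcribes only four rows of the matrix above and assumes that each row describes the effect of raising that attribute on the column attribute; the names `IMPACT` and `conflicts` are made up for illustration.

```python
# Four rows transcribed from the sample matrix; '+' helps, '-' hurts.
IMPACT = {
    "efficiency": {"flexibility": "-", "interoperability": "-", "maintainability": "-",
                   "portability": "-", "reliability": "-", "robustness": "-",
                   "testability": "-"},
    "integrity": {"efficiency": "-", "interoperability": "-", "reusability": "-",
                  "testability": "-"},
    "maintainability": {"availability": "+", "efficiency": "-", "flexibility": "+",
                        "reliability": "+", "testability": "+"},
    "portability": {"efficiency": "-", "flexibility": "+", "interoperability": "+",
                    "maintainability": "-", "reusability": "+", "testability": "+"},
}


def conflicts(priorities):
    """List (attribute, affected attribute) pairs marked '-' among the chosen priorities."""
    found = []
    for source in priorities:
        for target in priorities:
            if IMPACT.get(source, {}).get(target) == "-":
                found.append((source, target))
    return found


if __name__ == "__main__":
    for source, target in conflicts(["efficiency", "portability", "testability"]):
        print(f"prioritizing {source} is likely to hurt {target}")
```

A list like this gives business and IT a concrete agenda for the trade-off decisions they need to agree on.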

SEI's Architecture Trade-off Analysis Method (ATAM) provides a structured method to evaluate the trade-off points. The ATAM not only reveals how well an architecture satisfies particular quality goals (such as performance or modifiability), but also provides insight into how those quality attributes interact with each other, that is, how they trade off against each other. Such design decisions are critical; they have the most far-reaching consequences and are the most difficult to change after a system has been implemented.

A prerequisite of an evaluation is to have a statement of quality attribute requirements and a specification of the architecture with a clear articulation of the architectural design decisions. However, it is not uncommon for quality attribute requirement specifications and architecture renderings to be vague and ambiguous. Therefore, two of the major goals of ATAM are to

elicit and refine a precise statement of the architecture’s driving quality attribute requirements
elicit and refine a precise statement of the architectural design decisions

Sensitivity points use the language of the attribute characterizations. So, when performing an ATAM, the attribute characterizations are used as a vehicle for suggesting questions and analyses that guide towards potential sensitivity points. For example, the priority of a specific quality attribute might be a sensitivity point if it is a key property for achieving an important latency goal (a response) of the system. It is not uncommon for an architect to answer an elicitation question by saying: “we haven’t made that decision yet”. However, it is important to flag key decisions that have been made as well as key decisions that have not yet been made.

All sensitivity points and tradeoff points are candidate risks. By the end of the ATAM, all sensitivity points and tradeoff points should be categorized as either a risk or a non-risk. The risks/non-risks, sensitivity points, and tradeoffs are gathered together in three separate lists.

Saturday, May 18, 2013

Why Software Product Delivery is not Identical to a Car Delivery?

I happened to sit in on a project review meeting where the client raised a question about software delivery, expecting the software to be usable in production on the day the vendor delivered it. He went on to compare it with a customer driving off in a car upon taking delivery. I know all the project managers out there will jump in to say, don't compare apples with oranges. On the one hand, yes, software development is unique and cannot be compared with the production of a tangible product. On the other hand, various standards organizations are attempting to help the industry achieve high process maturity and thus deliver production-quality software consistently.

Software product vendors selling or licensing software products can, however, be compared to car manufacturers. Take, for example, the Microsoft Office product suite: one can buy it off the shelf and use it, just like driving off in a car. This is possible because product vendors, over a period of time, acquire an enormous amount of knowledge of the targeted product domain and subject the product to many cycles of testing before announcing its readiness.

A typical car may take at least a few years from concept to production, on the order of 3 to 6 years. During this period, the product undergoes various cycles of tests, including crash tests, drivability on the road conditions of the target market, etc. Similarly, software product vendors conceptualize the product idea and then work on it over a period of time. Vendors attempt to achieve a faster time to market by adopting newer product development methodologies. The one that works best is to identify the smallest piece of the software that can meet certain specific use cases and then build on top of it over time.

In the case of a software product, it is the vendor's responsibility to subject the product to various tests in production-like environments before getting it out to customers. The tests include alpha and beta tests, wherein interested end users are engaged to use it in production environments, feedback is collected on various aspects, and the developers address the critical issues. Achieving a zero-defect state may not be possible, and vendors take a balanced approach in deciding whether to wait until all identified issues are addressed or to go to market with certain known issues, which can be addressed in subsequent product releases.

But it is a different story when it comes to bespoke software projects. The major challenges that differentiate bespoke software projects from software products include:

It is the client who specifies the requirements. Though the vendor might facilitate documenting the requirements, it is the customer who formally agrees as to what needs to be produced.
There is a lack of understanding and agreement on the non-functional requirements, which are not documented in most cases.
The vendor might not be an expert in the target domain area. Even if the vendor has expertise in the domain area, it is the customer's need that is final and not the vendor's understanding of the requirement.
The requirements are ever changing. By the time the vendor delivers the software, the requirements as agreed by the client would have undergone change due to various reasons.
It is difficult to unambiguously understand the requirements as the project progresses through various phases involving humans with varying abilities.
There are dependencies on various pre-existing or emerging software and hardware environments in the client's production environment.
The client has roles and responsibilities to assume to make the project successful. However, it is for the vendor to ensure that the client understands their role and responsibility throughout the project life cycle.

Another point to consider for this discussion is what constitutes delivery. A well-written SoW (Statement of Work) clearly lists the acceptance criteria, which when met would constitute acceptance of the delivery by the client. For a given set of requirements, no two vendors would build an identical solution. That's due to the tools and technologies available for use and the varying intellectual abilities of those involved in building the software. What the client wants matters more than how the vendor delivers the solution. For this reason, the client shall assume the responsibility of performing user acceptance tests. Sometimes, the delivery of software might mean implementation in the production environment, which might involve data migration and appropriately configuring various pre-existing software and/or hardware.

In the end, it is all about managing the expectations of the client. It is not just about setting expectations at the start; they need to be appropriately managed throughout the project life cycle.

Sunday, March 31, 2013

Software Complexity - an IT Risk perspective

IT risk management predominantly focuses on IT security, though there are other categories of IT risk which are less known. Software complexity is one such important but less popular IT risk. Basically, software complexity is one of many non-functional quality attributes, which in most cases is ignored completely, leading to the associated risks materializing and thus to financial and non-financial impacts on the business entity. In this blog, let us try to understand software complexity from the perspective of IT risk, and shed some light on whose baby it is and how to manage or control it.

What is Software Complexity?

Let us first understand what software complexity is. The simple definition of complexity is how hard a piece of software is to understand, use and/or verify. Complexity is relative to the observer, meaning that what appears complex to one person might be well understood by another. For example, a software engineer familiar with the systems theory of control would see good structure and organization in a well-designed attitude control system, but another engineer, unfamiliar with the theory, would have much more difficulty understanding the design.

Basili defines complexity as a measure of the resources expended by a system while interacting with a piece of software to perform a given task. In terms of the end users, this could be understood as usability, i.e. how much time the user needs to spend to accomplish a task using the software and how much training is required to get familiar with it. Similarly, in terms of the developers, this could be understood as maintainability, i.e. how easy it is for new developers to understand and work on it to further improve it or to address issues. Again, in terms of the IT infrastructure team, this could be understood as the level of effort needed to deploy and support it, which again should be considered under maintainability.

Why is Complexity an Evil?

Complexity, as the above definition goes, requires more resources to be expended than normal and is thus counterproductive. We have numerous best practices, standards and frameworks that advocate eliminating complexity, but somehow it creeps in and poses a challenge in most cases. The consequence of software complexity, as we have put it above, is clearly risk, and it could even be a business risk when we look at it from the end user's perspective. The following is a sample list of risks that could arise due to software complexity:

Complex software is difficult to verify and validate, and this could lead to new defects surfacing even after months of production use. Depending on the severity of the defect, the impact could range from low to very high.
Software complexity could render business processes inefficient, as end users may have to spend more time using the software to execute certain business processes, which could result in a loss of competitiveness and thereby of market share.
Failure to consider downstream complexity by the development teams could result in a state of ‘project successful, but mission failed’, i.e. even if the project teams have successfully completed the project, the resulting software product might be complex enough to render it useless, thereby rendering the entire investment in the project a loss.
Post-production support will always be a challenge, as it is difficult to retain trained and capable resources to maintain complex software; this is one of the most common risks that enterprises continue to battle.
Complexity in deploying and distributing the software is another area of concern for the infrastructure team. Sometimes, such complex software might demand so much computing resource that the cost of the hardware and other infrastructure might outweigh the perceived benefits of using the software.

The above list is only illustrative; it could be much longer, and I would let you come up with more such consequences in the form of feedback.

Measuring Software Complexity

To manage or control complexity, we should be able to measure it. While a detailed discussion of software complexity measures is out of the scope of this blog (I might write another blog on measuring software complexity), at a high level, the following measures can be used.

Complexity Metric	Primary Measure of …
Cyclomatic complexity (McCabe)	Soundness and confidence; measures the number of linearly-independent paths through a program module; strong indicator of testing effort
Halstead complexity measures	Algorithmic complexity, measured by counting operators and operands; a measure of maintainability
Henry and Kafura metrics	Coupling between modules (parameters, global variables, calls)
Bowles metrics	Module and system complexity; coupling via parameters and global variables
Troy and Zweben metrics	Modularity or coupling; complexity of structure (maximum depth of structure chart); calls-to and called-by
Ligier metrics	Modularity of the structure chart
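
As a rough illustration of the first metric in the table, cyclomatic complexity can be approximated as one plus the number of decision points in a piece of code. The sketch below uses Python's standard `ast` module to count them. The function name `cyclomatic_complexity`, the choice of which node types count as decision points, and the `SAMPLE` snippet are all assumptions of this sketch; real tools differ in the details of what they count.

```python
import ast

# Node types that each add one decision point (an assumption of this sketch;
# real tools differ in what they count).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                   ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' has three operands, so it adds two branches
            complexity += len(node.values) - 1
    return complexity


# Hypothetical sample code, used only to exercise the counter.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # prints 5
```

The sample has two `if` statements, one `for` loop and one `and`, giving 1 + 4 = 5. A higher number means more independent paths to cover, which is why the metric is treated as an indicator of testing effort.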

Controlling Software Complexity

As we have seen in the definition section, complexity is context sensitive, i.e. the same software may be viewed at different complexity levels by different classes of users. While this is a challenge for developers, they also find it a useful shelter to stay away from addressing it. Developers primarily depend on the business system analysts, who in most cases play the role of representing the end users, and when the analysts fail to see the complexity that the end users would, development teams are misled. In one of the projects that we have been working on, the business system analyst, who was also tasked with triaging the defects logged by end users, simply turned down such defects, suggesting that the users learn to use the system. This definitely goes against controlling complexity and thereby managing the IT risk.

Controlling software complexity calls for adopting certain practices in the areas of architecture, design, build, testing and project management, some of which are listed below:

Set the expectations: Overly stringent requirements and simplistic hardware interfaces can complicate software, and a lack of consideration for testability can complicate verification efforts. Cultivating awareness of downstream complexity, and of how detrimental it could be if ignored, would help in getting the much-needed team collaboration and commitment.
Requirement Reviews: Ambiguous software requirements have been a cause of complexity, as developers often tend to assume things. Similarly, certain requirements may be unnecessary or overly stringent, which can be spotted in the review process, leading to a reduction in complexity.
Domain Skills: Having the relevant product or domain skills will go a long way in reducing downstream complexity, as the development team would be familiar with how the industry handles such requirements.
Architecture Reviews: A good architectural design will go a long way in reducing downstream complexity, and design and architecture reviews early on help in achieving it. It would even make sense to have an Architecture Review Board comprising representatives who can use their expertise in specific areas to spot problem areas.
COTS: Third-party libraries, usually termed COTS (Commercially available Off the Shelf) components, often come with unneeded features that add to the complexity of the software. It is important to make a careful and thoughtful decision in the case of COTS.
Software Testability: Design the software with testability in mind, so that the complexity can be verified by the QA teams.

Conclusion:

Many of the plans for mitigating the software complexity risk revolve around the software acquisition or software development process. This calls for close coordination and collaboration between the IT risk managers and the development teams. Though risk management is well understood as an important function of project managers, how well the risk of complexity is managed is still questionable. Project managers are primarily concerned with those risks that might impact project success and fail to consider those, like complexity, that emerge downstream after the project. It has also been observed that reduction of complexity is an important characteristic of disruptive innovation. Innovations become disruptive when they address the complexity in already existing products or systems and are thus able to capture market segments one after the other on their course to the top.
