Saturday, September 15, 2012

Leveraging Lessons Learned

Success = failure + failure + failure … Sound familiar?


Leadership experts and management gurus have said enough about how failures lead to success. That is very true for individuals, provided the person takes the failure in the right context and works on its causes to overcome them at the next opportunity. But how does this work in reality for an organization?
 
If you have been part of a project that failed to deliver the promised features on time or at the agreed cost, you are most likely out of that organization, as management wants to penalize those involved. In the process, the organization loses, because it fails to capitalize on the lessons learned by the team through the failed project, and the new team that takes over might commit the same or even different mistakes, which could again lead to failure.
 
Agile projects are likely to fare better in this space, as Agile project management calls for identifying the things that went well and those that did not at the end of every sprint. Here again, one question remains: how do the Scrum Master and the team deal with the things that did not go well in the earlier sprint? Another is how open the project team members are in admitting their own errors and omissions, which could have adversely impacted the project.
 
 
As far as development teams are concerned, there is much to be learnt on a daily basis: defects uncovered in unit testing, findings from requirements, design and code reviews, and even project issues can offer great lessons for every other member of the team.
 
 
Here are a few ideas that will help an organization leverage the lessons learned by its teams through various errors, mistakes and omissions.
 
  • Mentor the teams to demonstrate accountability and responsibility and to see that admitting a mistake early on is a good thing. The earlier the triggers are known, the better, as other members of the team can then avoid committing the same mistakes.
  • Coach the teams to share, share and share with their peers and even across teams. This can be accomplished by removing employees' mental blocks about admitting their own mistakes; they should be encouraged to share them for the good of themselves and the organization. Employees tend to quietly fix issues they uncover during unit testing and reviews themselves without reporting them further.
  • Encourage teams to share their previous experiences every now and then; there will surely be takeaways from such experiences for some members of the team.
  • Build a culture within the organization that discourages the egos and emotions that are the usual barriers to sharing.
  • Promote risk management and encourage every employee to participate in it. Needless to say, every identified risk has the potential to become an issue and get in the way of project success. Past experience and lessons learned are a great source of risk identification.
  • Above all, make sharing lessons easy by putting an appropriate knowledge base platform in place, and train and encourage employees to use it.
 
Though the above ideas are most suitable for IT services organizations, they can be practiced in any other organization as well with some tweaks.
 
Here is an interesting article in which Ken Bruss discusses leveraging lessons learned for competitive advantage.

Sunday, August 26, 2012

Solution Architecture - Basic Principles

As I wrote in my earlier blog post Solution Architect – Understanding the Role, the Solution Architecture practice area demands wide knowledge across business and technical areas. Solution Architects need to be jacks of all trades rather than masters of a specific area. They should be able to bridge the gaps that business users have in the technical space and those that technical teams have in the business areas.


Architecting a good solution is always challenging, as the context and the technology keep changing fast. A solution that was perfect at one point in time may not remain so later. Given that each non-functional quality attribute might adversely affect one or more others, Solution Architects should be able to balance these attributes in line with the business needs and other factors foreseeable in the near and longer term.


Here are some basic principles which, when practiced, help a Solution Architect deliver a good solution.


Business Drives Information Technology

If one has expertise in IT, it is important to take off that hat and wear the business hat while architecting solutions. The business does not mind which technology or tools are used; it wants its business problems solved with reasonable longevity and the other non-functional requirements met. Ignoring or undermining business expectations or priorities could lead to a solution with excessive engineering or technical complexity, resulting in higher cost and delays.


Just Enough Architecture, Just in Time

Let the solutions evolve based on business priorities. It is ideal to take one business problem at a time, based on its priority, and architect the solution keeping the first principle in mind while factoring in the ability to change and scale in response to business needs. This helps the business get solutions faster and in increments. It also helps the Architects adopt agile methods and ensure timely responses to business changes.


Common Business Solutions

Departments tend to go in for solutions on their own, as they have their own budgets to spend. This often leads to multiple solutions being in use for the same problem domain across various departments. Architects must have a complete view of the problems and solutions within the enterprise, and the solutions they architect should be common and usable across multiple departments. If need be, an existing solution can be enhanced to meet the changing requirements of a different department. This principle, when practiced, facilitates easier maintenance and ensures better data governance.


Conform to Data Governance principles

Data and information are used for decision making at various levels, and it is important that data is organized and maintained at the highest possible quality level. Understanding the sensitivity levels of the data within and across departments and partner systems is equally important while architecting the solution. With the big data era emerging, organizations are looking at managing petabytes of structured and unstructured data. It is imperative to have Data Governance policies and processes in place, and the Solution Architecture team should ensure that the solutions comply with them.


Comply with Information Security Framework

Similar to the Data Governance policies, the organization should have Information Security policies and a related framework in place, and the Solution Architecture team should ensure that the solutions conform to them. Security must be designed into solutions from inception; adding it later could result in higher cost and delays. The extent of this, however, depends on the organization’s business context and its risk appetite.


Also read my related blog post on Architecture Review - Scalability.

Friday, August 17, 2012

Taking over - stay away from wrong battles

If you are about to take up a senior managerial role in a different organization, it is important to settle down at the right pace and pick the right battles to make a mark in the first few weeks of taking over. While it is true that management has, through multiple rounds of discussions, tried its best to understand your abilities and become convinced that you are the person to take the organization further down the roadmap, there could be challenges you have not faced before, and you should take a little care about things like the following.

The takeover session
 
 
Usually, you might have a chance for a few rounds of discussions with your predecessor as part of the handover process. It is important to use this time effectively. Among other things, the important items to pick up in these sessions are:
  • Get to know why your predecessor is leaving; this will help you plan for and carefully handle the same pain areas so that you don’t end up in a battle as well.
  • Get your predecessor's opinions about the people, process and technology in the organization; this will give you certain handles to pick up and carry on with.
  • Get to know what he has been up to in the past three to six months, so as to understand his unfinished initiatives and, where initiatives failed, why they did. This will help you understand the constraints he has been operating under; most likely those constraints will remain for you to deal with too.
  • Get to know the strategy, vision and goals of the organization and the roadmap to achieve them. You may be able to identify certain areas to work on, but again, don’t jump into action plans before you get a 360-degree view of the issues.
 
 
In case you don’t have the opportunity of a smooth handover, try to get the same inputs from the next level of executives, but use such inputs with care and validate them from a few other sources.
 
 
The Cultural Values
 
 
Each organization has its own culture that suits its teams and business best. As part of taking over, it is important that you understand the organizational culture and the morale of the employees; if required, spend a little more time making yourself fit into the prevalent culture and gaining the confidence of the teams. While there is a chance that the given culture is the cause of certain pain points and may need a change, you may not want to pick such battles too soon, as doing so could lead to the teams not accepting you as a leader. Since you may have come from a different cultural background, it is easy to get carried away and make missteps.
 
 
Spot the problem areas and the pain points
 
 
While you will have got to know some of the priority areas that need immediate attention, before jumping into action, spend more time talking to the various teams to fully understand the current state of the projects and initiatives they are working on. Depending on your approach, style and experience, you will spot certain pain points that need attention; capture those for later action. The sooner you identify them, the quicker you will settle down. It is also a good idea to spend some time understanding recently failed projects or initiatives, which will help you pick certain process areas to revisit and work on. You might have to use your tact and people skills here so that the teams open up to you freely and you get a good handle on the areas to work on.
 
 
Perform a careful analysis of these items for their impact on other aspects like culture, process and technology; this will help you categorize and prioritize these areas and come up with a revised roadmap for the near term and the longer term.
 
 
In the process of settling in, it is very likely that you will try to use your experience to suggest course corrections or jump into action midway, which, if not done well, could land you in trouble, as you might start facing resistance from some quarters. Though such resistance could be overcome with authority, that may not work well early on. In such cases, you should convincingly demonstrate to these teams that the course correction is much needed given the situation, and take them into confidence that way. Picking the wrong battle early on could prove costly.
 
 
As many leaders say, a good leader is a good listener. So listen more early on to understand the perspectives better and try to pick up some lessons. For a while, forget your experience and expertise, be a learner, and keep listening. Once you are done listening, do an analysis, and in that process you may use your experience. Keep in mind that there is no single best way to accomplish a thing; there could be multiple ways and means, and you might have a chance to pick up certain new things that work well too.
 
 
Please understand that this is not a complete guide to be practiced blindly. It could be completely out of context in some situations and may not hold good at all. What should be taken away from it, however, is to stay away from picking the wrong battles early on.

Friday, July 13, 2012

Architecture Review - Scalability

Scalability is an important quality attribute of any system, be it hardware or software. But in most cases, the need for a scalability check or review is felt only when certain signs of scalability problems show up. Typically, the following signs call for a scalability review of an existing application.
  • Changes requested on certain subsystems are turned down by the development team(s), citing that the subsystem is complex and any change might call for huge regression testing efforts or could have a bigger impact on the whole system. This indicates that certain components or subsystems are preventing the system from scaling.
  • Months into production usage, application performance gradually degrades, and there is a tendency to accept the slowdown or to pump in more hardware to compensate for it. This again is an important sign that the application is not scaling to take on the ever-growing user base and transaction volumes.
There could be more signs indicating scalability issues within the application. It is unfortunate that scalability reviews are often not done in the initial design phase, which would prevent these post-production troubles from showing up. While reviewing an existing application for potential scalability issues may be easy, the solutions for addressing them may not be, because of the underlying design and architecture of the application and its inter-dependencies with other systems in use. Let us examine certain important aspects to look into to spot potential scalability problems.

Distributed architecture: While a distributed design is likely to improve performance, it could lead to scalability issues when one or more components or subsystems rely on local resources. Another reason to review this with care is that an ill-designed system may call for too much communication across the physical and logical boundaries of the various subsystems and rely heavily on the communication infrastructure.

Component interaction: Examine how the components or subsystems are designed to interact with each other and how closely they are positioned. Too much component interaction can lead to network congestion and very high latencies, which result in performance and scalability issues later on as usage increases. Measure the payload and latency of such inter-component interactions and isolate the components that need redesign. Keeping the data and the behaviour close together reduces interactions across boundaries and, as a result, keeps latency in check.
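As a minimal sketch of how such measurements might be taken (the CatalogService interface and the component names are purely illustrative), a thin wrapper around an inter-component call can record the latency and payload size that tell you which interactions need redesign:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical downstream component; substitute the real interface being measured.
interface CatalogService {
    byte[] fetchCatalog(String category);
}

// Wrapper that records latency and payload size for every inter-component call.
class MeasuredCatalogService implements CatalogService {
    private final CatalogService delegate;

    MeasuredCatalogService(CatalogService delegate) { this.delegate = delegate; }

    @Override
    public byte[] fetchCatalog(String category) {
        long start = System.nanoTime();
        byte[] payload = delegate.fetchCatalog(category);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // In practice this would go to a metrics sink; printing keeps the sketch self-contained.
        System.out.printf("fetchCatalog(%s): %d ms, %d bytes%n",
                category, elapsedMs, payload == null ? 0 : payload.length);
        return payload;
    }
}

public class ComponentLatencyDemo {
    public static void main(String[] args) {
        CatalogService real = category -> new byte[2048]; // stand-in for the real remote component
        CatalogService measured = new MeasuredCatalogService(real);
        measured.fetchCatalog("books");
    }
}
```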

Resource contention: Look for potential limitations on the hardware or software resources used by the application. For instance, if the application writes huge amounts of log data to the same disk where its transaction data is stored, the write requests may encounter contention. Similarly, consider how fast the data files grow and whether the disk subsystem supports such growth. Possible solutions for such issues include resource pooling, message queues and other asynchronous mechanisms.
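As a minimal sketch of the asynchronous mechanism hinted at above (the log contents and queue size are illustrative), the hot path can hand log lines to a bounded in-memory queue and return immediately, while a single background thread absorbs the slow disk writes:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Request threads enqueue log lines and return immediately; a single background
// thread drains the queue and performs the slow write.
public class AsyncLogDemo {
    private static final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();                 // blocks until a line is available
                    System.out.println("persisting: " + line);  // stand-in for the disk write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();

        for (int i = 0; i < 5; i++) {
            queue.put("transaction " + i + " completed");       // fast, non-contending call
        }
        Thread.sleep(200); // give the background writer time to drain before exit
    }
}
```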

Remote Communications: It is always beneficial to keep remote calls to a minimum; too many remote calls make the system overly dependent on the reliability and availability of the communication infrastructure. Ensure required validations are performed ahead of remote calls so that unnecessary calls are avoided. Where possible, remote calls should be stateless and asynchronous. Synchronous calls may hold up the communication channels and associated resources for longer periods, which can be a cause of performance and scalability issues. Message queues can help decouple subsystems from being held up waiting for synchronous responses.

Cache Management: While use of a cache can help achieve better performance, it can also prevent the application from scaling in a load-balanced environment unless a distributed caching mechanism is designed and used.
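A minimal sketch of the cache-aside pattern that such a mechanism usually follows; the local map below only stands in for a distributed cache client (such as memcached or Redis) so that the example stays self-contained:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside lookup: check the shared cache first, fall back to the database,
// then populate the cache so every node behind the load balancer sees the entry.
public class CacheAsideDemo {
    // Stand-in for a distributed cache client; a local map keeps the sketch self-contained.
    private static final Map<String, String> sharedCache = new ConcurrentHashMap<>();

    static String loadFromDatabase(String key) {
        return "profile-for-" + key; // stand-in for the real query
    }

    static String getUserProfile(String userId) {
        String cached = sharedCache.get(userId);
        if (cached != null) {
            return cached;                       // cache hit: no database round trip
        }
        String value = loadFromDatabase(userId); // cache miss: load and populate
        sharedCache.put(userId, value);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getUserProfile("42")); // miss, loads from the database
        System.out.println(getUserProfile("42")); // hit, served from the cache
    }
}
```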

State Management: Look at how the state of persistent objects is managed. Stateless objects scale better than stateful ones. Distributed state management addresses the state management issue in a load-balanced environment. Always prefer stateless components or services, as they perform well and at the same time scale well.
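As a small illustration of why stateless components scale so easily (the service and the figures are made up), everything the method needs arrives with the call and nothing is remembered between calls, so any node behind the load balancer can serve any request:

```java
import java.util.List;

// A stateless service keeps no per-user fields between calls; all inputs are
// parameters and the result is returned, so any instance can handle any request.
public class PricingService {

    public double orderTotal(List<Double> itemPrices, double taxRate) {
        double subtotal = itemPrices.stream().mapToDouble(Double::doubleValue).sum();
        return subtotal * (1.0 + taxRate);
    }

    public static void main(String[] args) {
        PricingService service = new PricingService();
        // Identical calls give identical results on any instance, which is what
        // makes horizontal scaling behind a load balancer straightforward.
        System.out.println(service.orderTotal(List.of(10.0, 25.5), 0.08));
    }
}
```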

Here are some best practices that help achieve high scalability:
  • Prefer stateless, asynchronous communications, as this frees up resources considerably and supports load balancing.
  • Design the application as multiple fault-isolated subsystems that can be deployed on different hardware environments (or isolated application pools), so that faults in one subsystem do not impact the others. This partitioning can be by service categories or by customer segments.
  • Use distributed cache solutions, so that cached data is available across clustered environments.
  • Use distributed databases with appropriate replication, so that loads can be distributed.
  • Do not depend too much on specific capabilities of the RDBMS, as this might couple the application tightly to one vendor’s product. A high degree of scalability can be achieved by keeping the business logic outside the RDBMS.
  • Spot potential scalability issues early by performing design reviews during development and by running periodic load and performance tests.
  • Do not ignore capacity planning during the pre-project phase, as it significantly impacts application usage in production over time. Also be aware of data growth rates and have a roadmap to support the ever-increasing data and transaction volumes.
  • Do not ignore root cause analysis; many times, when developers roll in a fix for a defect, they are not fixing the root cause, which can come back later as a scalability bottleneck.
Update:
Also read this MSDN Library article, which lists five key considerations for a scalable design.

Saturday, July 7, 2012

Direct Database Updates – A Cause of Concern


Many organizations still have the practice of directly updating production databases to fix data integrity issues. This shows that one or more of the applications deployed on top of the database are not reliable enough to maintain its integrity. It is one of the biggest concerns for information security auditors, as it requires certain resources to be granted access privileges to the production databases, which opens up opportunities for internal hackers to indulge in fraudulent activities.

A multitude of reasons can lead to such a situation of needing frequent database updates. The following are some that impact reliability:
  • Incomplete requirements – The business rules and/or validations may not have been completely gathered and documented.
  • Design deficiencies – Deficiencies like inappropriate error handling, poor concurrency management, etc. can also lead to data integrity issues.
  • Shared database across multiple applications – When multiple applications use a shared database, some business rules or data validation requirements might be implemented differently, or some applications might have technology or design limitations, introducing data integrity issues.
  • Creeping code complexity over the maintenance period – As applications move into the maintenance cycle and newer resources come on board to maintain the code base, chances are high that, due to growing complexity and incomplete knowledge, issues slip through development and sometimes the QA phase as well.
  • Lack of adequate QA / reviews – Review is a very effective technique for identifying potential issues early in the application development life cycle. Unfortunately, many organizations do not give importance to requirement, design and code reviews, or don’t get them done effectively. This review or QA deficiency impacts reliability.
Though the software development process has matured, organizations tend to compromise on some quality attributes, which can leave an application unreliable. Thus, it may not be possible to completely eliminate the need for direct database updates. However, a process with adequate checks and controls should be put in place around this activity to ensure that the chances of a security breach through this channel are kept under control. At a minimum, the following checks and controls need to be in place to keep database updates under control.
  • Every request for a database update should originate from the business function heads and be formally supported by a service request logged in an appropriate tracking system or register.
  • Every such request shall be reviewed by analysts and/or architects to determine whether the data update is necessary and whether there is another way of fixing it using the application's own features.
  • The review should suggest two solutions: one isolating the specific data tables and columns that need to be updated (corrective action), and the other a possible enhancement to the application(s) to prevent such integrity issues from occurring in the future. The review should also identify the constraints in implementing the data fix; for instance, some fixes may need to be executed before or after a specific scheduled job, or may need the database to be taken offline before execution.
  • In most cases, these issues are very hard to investigate, as they occur rarely and only upon a unique combination of data and program flow. It is beneficial if the results of such reviews flow back into the process, with checks and controls put in place to prevent similar issues slipping through the review and testing phases of the SDLC.
  • On completion of the review, developers may be engaged to create the SQL scripts required for the updates.
  • The scripts shall be reviewed by the analysts and/or architects and then tested by the QA team.
  • Once the review and test results are clear, the scripts shall be forwarded to the DBAs, who execute them in production. Ideally, such data updates should be performed in batches, and the affected tables / objects should be backed up prior to execution, so that the old data can be restored when needed (a sketch of this backup-then-update pattern follows this list).
  • The DBAs should maintain a record of each execution and the resulting log data, and these shall be subject to periodic audit to ensure that the scripts remain unaltered and that no additional unwanted activities happen along with the script execution.
  • None of the resources involved in this process except the DBAs should have access to the production database. For investigation or troubleshooting of certain cases, a clone of the production data may be made available on request and should be taken down once its intended purpose is complete. It is important to mask sensitive data while making such production clones, and they should have restricted access over the network.
  • It is important that responsibilities are divided amongst different groups, that the associated employees have demonstrated high credibility in the past, and that accountability is well established.
  • A periodic end-to-end audit should be performed, tracking right from the origination of the service request to its execution in the production database, and any non-compliance must be dealt with seriously.
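As a minimal sketch of the backup-then-update step referred to in the list above (the connection string, table names and service request number are all hypothetical), plain JDBC is enough to copy the affected rows aside, apply the fix and record the execution inside a single transaction:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DataFixRunner {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string and schema; substitute the real ones.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://db-host/prod", "dba_user", "secret")) {
            con.setAutoCommit(false);
            try (PreparedStatement backup = con.prepareStatement(
                     "INSERT INTO orders_backup SELECT * FROM orders WHERE status = ?");
                 PreparedStatement fix = con.prepareStatement(
                     "UPDATE orders SET status = 'CLOSED' WHERE status = ?");
                 PreparedStatement audit = con.prepareStatement(
                     "INSERT INTO datafix_log(service_request, rows_updated) VALUES (?, ?)")) {

                backup.setString(1, "ORPHANED");
                backup.executeUpdate();               // keep the old rows restorable

                fix.setString(1, "ORPHANED");
                int updated = fix.executeUpdate();    // the actual corrective update

                audit.setString(1, "SR-1234");        // tie the run back to the service request
                audit.setInt(2, updated);
                audit.executeUpdate();

                con.commit();                         // all three steps succeed or none do
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}
```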

Beyond these checks and controls, the organization should look for a decline in database update requests over time, which is an indicator of improving system reliability. Another way to look at the improvement is that recurring requests of the same nature should vanish after two or three occurrences. The organization's software engineering process should also call for adequate checks and controls, which will contribute to improved system reliability.

Sunday, July 1, 2012

Software Architecture Reviews


Review is a powerful technique that contributes to software quality. Various artifacts of the software development lifecycle are subject to review so that deficiencies can be spotted early and addressed sooner, before they slip through further phases and consume more effort than expected down the line. One such important review is the review of the software architecture. If you are asked to review the architecture of a piece of software, it could be for one of the following reasons.

  1. Possibly you are a Senior Architect and are expected to complement a fellow Architect by reviewing his work, thereby helping him, and in turn the organization, get the best possible software design. Many organizations mandate this as part of their engineering process. When this review is done effectively, the benefits are huge, as it occurs early in the development life cycle.
  2. One or more of the custom-built applications used in the organization are suspected to have serious reliability or performance issues, and you are engaged to come up with an analysis and a plan to set things right. If this situation arises, it is fairly evident that the first kind of review did not happen or wasn’t done well. In some cases, such a situation arises when the stakeholders knowingly compromise on certain software quality attributes initially and are then surprised when the impact hits back down the line.
  3. You are possibly looking to license a product and are evaluating its suitability for your organization. In this case, you will probably have a checklist of items created based on your organization's IT policies and framework, and the review is highly dependent on the information revealed by the product vendor.

Though there could be more, the above are some of the primary reasons why one would need to perform an architecture review. In spite of many reviews and much testing, issues slip through and challenge the IT architects at some point down the line. Resolution of such issues may call for specific reviews, and the method and approach will differ based on the type of problem. For instance, if there is a data breach, a security review of the architecture is needed, not only to identify the root cause of the current problem, but also to identify potential vulnerabilities and come up with solutions to plug those gaps.

These specific reviews can typically be associated with the broad software quality attributes, also termed non-functional requirements. The best way to approach them is to start with an architecture review. A review checklist is a good tool for the purpose, but it should be exhaustive enough to cover the necessary areas, so that the reviewer gets the right inputs, is in a good position to form an opinion about possible deficiencies, and can relate them to the problem being resolved.

Keep a watch on my blog for more on specific architecture reviews.

Saturday, June 23, 2012

The pressure points of Cloud Adoption


The value and benefits of cloud adoption are increasingly clear and well known. So as not to be carried away by these benefits, it is important to identify and be aware of the pressure points that cloud adoption brings, as called out by ISACA in its white paper titled ‘Guiding Principles for Cloud Computing Adoption and Use’. Essentially, the differences in the technology itself and in its use impact the way IT is governed and managed, and management’s reaction to these impacts creates pressure points as well, which need to be managed.

Differences such as the change in cost allocation from capital to operational may have consequences that are not apparent at the beginning. For instance, contracting for cloud-based software is an operational spend with a lower cost of entry, so such decisions may fall outside the review and approval process. While in most cases the pressure points are to be managed as risks, they are not necessarily risks.

Speed and Agility

Time to market is a driver for cloud adoption, as solutions to meet market needs become available quicker and at lower cost, though there could be gaps in meeting the exact requirements. This agile exploitation in a reduced time frame puts greater pressure on enterprises whose culture, processes and human factors related to technology were developed to support longer development cycles and long-term technology use. This pressure, when not handled appropriately, could result in increased risk levels in the following areas:

  • An unbalanced prioritisation of value over trust in technology solution choices
  • Missed opportunities when other alternatives are not considered
  • Recovery mishaps because fallback positions are not fully exploited
  • Missing functionality if full requirements are not identified
  • Increased long-term costs due to reliance on multiple short-lived solutions
  • Reduced performance when enterprises are hesitant to introduce new solutions because of existing technology investments


Changing Boundaries

Reliance on cloud providers calls for a change in the roles and responsibilities within the enterprise and transfers certain responsibilities to outside parties. Contracts and SLAs with service providers attempt to assign accountabilities, but governance dictates that the enterprises themselves, their boards and management remain accountable. With this, the locus of decision making shifts from governance functions to business line leaders. This change in organizational boundaries can put greater pressure on enterprises. The risk outcomes of this pressure point could be:

  • Role confusion when accountabilities and responsibilities are not clearly defined
  • Diminished effectiveness when decisions are made without engaging in a wider consideration of trust and value before cloud acquisition
  • Failure to satisfy constituent and end-user expectations for protection and privacy
  • Project delay and increased costs due to the need for personnel with governance responsibilities to revisit cloud plans
  • Unclear specifications of provider responsibilities and accountabilities in SLAs
  • Incomplete information being provided to board members and senior management


New Technologies and Technology Expectations

Cloud follows a sequence of disruptions in how technology is viewed, integrated into organizational strategy and managed, and in how IT risks are identified and managed. Areas of high pressure can result when strategy and enterprise architecture do not consider the unique qualities of cloud computing, and when enterprise processes and procedures do not easily adapt to the changes it makes possible. The following risks could be the outcome of this pressure point:

  • Missed opportunities to extract value from the integration of cloud and internal systems
  • Increased vulnerability from incompatibilities and inconsistencies between cloud and internal systems
  • Less than expected results when human factors are not considered in the design and integration of cloud services and infrastructures
  • Levels of organizational performance that do not meet expectations because cloud solutions do not fully support organizational processes
  • Levels of technical performance that do not meet expectations because processes do not take full advantage of cloud capabilities


Level Playing Field

Cloud computing removes the advantage that large enterprises have traditionally had in terms of the availability of technology specialists and technical sophistication. Smaller enterprises can now leverage cloud services and use the technological sophistication that large enterprises used to enjoy, putting small and medium enterprises on an equal footing with much larger ones. This level playing field can have an impact on strategy and its implementation. Ignoring this impact can result in increased risk levels in the following areas:

  • New entrants claiming a segment of traditional market dominance
  • Strategies that do not address competitor capabilities
  • Less-than-expected benefits received from technology-dependent solutions


Utility Services and Service Supply Chains

With cloud computing, where computing is viewed as a utility, focus shifts to the value and benefits obtained from such utilities. Agile enterprises benefit from solutions that can be used as needed and discarded when they no longer provide value. This view of computing as a utility, and the delivery of solutions as a supply chain of information system solutions, puts greater pressure on enterprises with a culture that does not accept utility solutions, a structure that does not facilitate cooperative planning, and processes that cannot take advantage of computing solutions provided as a supply chain of utilities. Ignoring this could result in the following risk outcomes:

  • Over-investment of resources in planning and building internally developed information system solutions
  • Less-than-optimal results when value-producing cloud utilities are missing from the total solution
  • Duplication of effort when specialist services available through cloud providers are not integrated as part of system management
  • Less-than-expected results when utility components are not integrated into and managed as an information system capability supply chain


In the white paper, ISACA suggests that enterprises follow six guiding principles that can help illuminate the path to cloud adoption. Click here to download the complete white paper, which is available to registered users (free registration).

Saturday, June 16, 2012

Key risk areas that can impact project success


Until recently, and to some extent even now, some of us don’t want our insurance advisor to talk about the risk to our own lives. Some believe in not thinking or talking about the risk of losing one's life. However, things are changing, and most of us today manage personal risks well by at least transferring the risk to an insurer for financial protection. Risk management is not just financial protection; there is more to it, even though finance forms the core part in most cases. In the same way, looking at managing software projects, project managers and sponsors had to deal with many issues time and again before it was felt that preventing such issues from arising is better than dealing with them, which is what risk management is.

Risk managers and risk experts always think of a what-if scenario for every action or decision so that all possible risks are identified early on. In the past this was thought to create ‘negative vibes’ and did not receive much support from project sponsors, because when many of the risks are identified upfront, the project may have to be shelved at the start itself. Here again, things have changed, and most CxOs accept ‘failing fast’ as a better option than failing at the end. Failing at the start is an even better option; see my own blog post on ‘failing at start’.

Given that most project sponsors and project managers have realized that risk management is better than its only alternative, crisis management, and believe that it is a key area of project management, let us explore, in a broader context, the key risk areas that need a close watch and that could otherwise impact the success of an end-to-end software project.

User Expectations

Even when a well-written functional specification exists and the development team builds to that requirement, there is a risk that end users, on seeing the final product, will say “this is not working the way we want” and push the product back for rework. This is mainly because everything around the business changes with time. The longer the development team takes to involve the end users, the more likely it is that expectations will have changed. Agile is the solution to address this risk area, whereby end users participate in development and small chunks of the product are delivered regularly for user feedback.

There is more to it: most projects do not have the non-functional requirements documented, and much of user expectation revolves around such software capabilities. For instance, application performance is a key non-functional requirement that the development team is expected to take care of as a matter of course. While the solution for this lies with the solution architect, the project manager and the stakeholders should not lose sight of this important area and should manage user expectations all through project execution.

If the potential risks around user expectations are spotted and managed well, the chances of the project successfully reaching its milestones are very high.

Technology Shocks

This is a broader risk area and can be broken down into many sub-areas. Many projects hit a roadblock as they get closer to production deployment, by which time the infrastructure team may find that heavy investment in hardware and software tools is required to support the product in production. Some projects face issues in the early stages as well, as the development team may find some tools or technologies unsuitable for the given solution.

A well-done pre-project risk assessment can help address this risk, as the architects involved in such assessments can anticipate and call out such roadblocks, which helps the project sponsors take a call. Development teams in most cases want to jump onto the project and start building everything themselves, without considering that reusable off-the-shelf components are available for most capabilities. This tendency can add to the schedule risk, as the project may take longer due to technical issues the team faces.

Managing risks in this area early on is very important, as certain choices may be very difficult to reverse in the middle of the project. It could also be the case that implementing certain requirements with a given technology platform or set of tools is much more complex than with others. Much of this responsibility lies with the Architects, who should do a good job of taking the project through this risk area.

Skill gaps

While this item is to some extent related to technology shocks, there is more to skill than the technology itself, which made me call this out as a separate risk area. Soft skills play a key role in keeping a project on the path to success. Staff attrition is inevitable in software projects, so managing the dependency on people is very important. The resources holding key roles should have the right attitude: hand-holding the teams, willing to share knowledge, and quick and effective in onboarding new members to the team.

The skills of the key resources holding lead roles can even influence staff leaving the team or the organization, and it could be some of the project's best resources exiting for such reasons. The Project Sponsors should play an important role here in getting to know of such risk items and managing them well, so as to keep team morale high and get the best out of the team.

Every member of the project should contribute towards its success, keeping in mind the project goals and objectives. It is not uncommon for some key members to make certain decisions in a way that benefits themselves, in turn putting the success of the project at risk. This is a sort of political behaviour by some members within the project. Here again, the sponsors should get more involved in the project and look for such risks early on, so that control measures can be put in place by changing the team composition or by imparting necessary training or counselling to the members who need it.

This third risk, when identified and managed well, ensures that team members collaborate and communicate well and deliver their best, which eventually contributes towards project success.

Lack of Risk awareness

Finally, lack of risk awareness among the project stakeholders is the biggest risk. This calls for a proper risk strategy and objectives at the organization level and at every project level, and for letting every member of the project actively participate in risk identification and management.

While most project managers do mandate risk management as part of the project charter and maintain a risk register, they fail to apply risk management principles consistently, which eventually leads to incorrect risk prioritization. Needless to say, the controls and tasks identified to reduce the risk level also need to be monitored on par with other project tasks so that actions are timely. It is also a common mistake for project managers to miss estimating the effort required for risk management and to leave it out of the project schedule.

Another important aspect of risk management that is normally ignored is communication and continuous monitoring. Risks need continuous monitoring and different levels of communication or escalation depending on the risk level. Most projects have a stale risk register, where risks are merely identified and no monitoring or follow-up is done on them. Use of an appropriate risk management tool is recommended, as it ensures visibility of the risks to all concerned, with automatic alerts and escalations, and also facilitates consolidation of risks across multiple projects, thereby enabling risk management at the enterprise level.

Friday, May 18, 2012

Pre-project Reviews help projects fail at start


A new project was approved, and a 15-member team sat round the table in the project kick-off meeting with excitement and enthusiasm. Business Analysts put together the stories, which the Developers started developing. There were periodic project status reviews, issues and risks were discussed in these meetings, and the project manager did well in managing them. Then comes a message from the stakeholders that all work on the project is to be stopped with immediate effect.

That may sound familiar to everyone in IT, as one study says that 37% of IT projects are at risk of failing. Another study finds that 31% of projects are cancelled before even hitting the finish line. It really hurts the project team members when a project is cancelled or shelved. But there can be good reasons for doing so, and such decisions are taken when continuing with the project would only increase the loss and would not bring value to the sponsors. This is where ‘fail fast’ helps, as sometimes pulling the project down even earlier saves a lot of effort and money for the sponsors.

Let us see what can be done even before starting any work on a project, so that a potential failure is spotted in the pre-project stage itself and the project is not allowed to start in the first place. When the need for a project is felt, the executive management (sometimes called the Project Review Board) takes a look at it, reviews it, and, if it sees merit, gives the project a go. In some cases, by the time the Project Review Board (PRB) sees the project proposal, considerable effort has already been incurred in the form of requirement gathering and analysis. The review by the PRB members, when done well, brings out the ability to measure the risk exposure and the RoI (Return on Investment) of the project, which in turn influences the decision. However, it is important that the PRB members are presented with adequate information to help them take the right decision. A well-drafted checklist or template that captures all the required information helps ensure no details are missed and reduces the number of projects failing down the road.

The project proposal template shall at a minimum capture the following details, so that an informed decision is taken by the PRB.

Motivation:

The problem the project is expected to solve, or the business opportunity it will help the business capitalize on, should be stated well, so that the PRB members can appreciate the motivation behind the project. It is appropriate to quantify the value or benefit the solution will bring when implemented as intended. Some projects bring immediate benefits, whereas others bring benefits in the longer term, in which case this should be called out explicitly. Similarly, some projects bring monetary benefits, while others bring intangible value.

Estimated investment:

The project may not make sense if the investment required is significantly higher than the value or benefit it delivers. Most projects get hit on account of cost overruns. The estimate should be close to accurate, and use of a template makes sure that all cost elements are considered. In some cases, a better estimate can be made only after gathering requirements in sufficient detail; in such cases, the PRB may wisely take a call to sponsor the initial effort alone and have the proposal placed again for review with a more accurate estimate.

Constraints:

All projects are constrained by various ifs and buts. Each possible constraint should be identified and called out in the proposal document. Each constraint will have one or more associated risk items for which contingency and mitigation plans have to be put in place. Risk management practice has evolved in recent times, and there are methods and techniques with which risks can be quantified (as a factor of probability and impact) and the overall risk exposure of the proposed project can be measured. This helps the reviewers take an informed decision on whether this much risk is worth taking for the value the project might give back.
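As a small illustration of the probability-and-impact quantification mentioned above (the risks and figures are invented for the example), exposure per risk and for the proposal as a whole can be computed as probability times impact:

```java
import java.util.List;

// Risk exposure quantified as probability x impact; the overall exposure of the
// proposal is the sum over all identified risks. All figures are illustrative.
public class RiskExposureDemo {
    record Risk(String name, double probability, double impactCost) {
        double exposure() { return probability * impactCost; }
    }

    public static void main(String[] args) {
        List<Risk> risks = List.of(
            new Risk("Key vendor API delayed", 0.30, 200_000),
            new Risk("Compliance rework",      0.10, 500_000),
            new Risk("Data migration overrun", 0.50,  80_000));

        double total = 0;
        for (Risk r : risks) {
            System.out.printf("%-28s exposure = %,.0f%n", r.name(), r.exposure());
            total += r.exposure();
        }
        System.out.printf("Overall risk exposure = %,.0f%n", total);
    }
}
```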

Solution longevity and re-usability:

It helps if the expected life of the solution is determined. Some solutions may have a shorter life but offer reusability with minor changes or enhancements. This has to be determined considering the longevity of the tools and technology that form part of the solution. A considerable number of projects get cancelled when the sponsors learn that the useful life of the solution is going to be short and that the project may not be completed on time, further reducing that longevity.

The above are some of the key factors that influence the decision on a project. There are many other factors which, depending on the nature and size of the project, may have significant influence on the decision making. For instance, security and compliance requirements could be a significant risk, which the PRB needs to know about while taking go decisions on projects. Other factors that may find a place in the proposal document include deployment infrastructure, post-production maintenance aspects, choice of technology, resource availability, etc.

On top of everything, the team that prepares the project proposal document should have expertise in the business domain, the technology used, estimation and related skill areas. Typically, a team of architects with the appropriate specializations is required to accomplish this.


Saturday, May 12, 2012

Software Testability Review


Introduction

Close to 40% of the overall software development effort is spent in testing activities, which include test design, test preparation, test execution and test result analysis. Improvement of test criteria and coverage, test automation, use of tools for test analysis, test case reuse, etc. are the popular methods used to bring this cost down.

The testability of a software application also influences the testing effort, making it harder or easier to test and to analyse the test results. Testability is an important factor in achieving an efficient and effective test process. Organizations attempt to achieve higher testability by adopting a systematic approach that cuts across the SDLC, called Testability Engineering. Software architects or specialized test architects are called in at different stages of product design and development to ensure that testability is designed and built into the system, because it plays a vital role in influencing the cost of testing. The testability review is one of the primary techniques which, when adopted at the right stages, ensures high testability. This blog post focuses primarily on testability reviews.

Testability Reviews

Reviews in general are a popular and widely used technique for software project teams to isolate defects at various stages of design and development. While testers do participate in most such reviews, whether they are called upon to bring out potential testing problems during these reviews needs validation. Let us see some of the testability heuristics derived from testing practice and related literature.

Architecture Level


Concentrate control structure related to a particular functionality in one cluster (or class). At times, software architects tend to overlook this aspect, which can lead to dependent functions or methods being part of different clusters and, in the case of distributed development, even belonging to different development teams. This calls for unnecessary development of test stubs or test drivers while testing.

Give higher priority to the modularity of a system than to the reuse of components. This one is debatable, but the reason it is an important testability heuristic is that a heavy stress on reusability may mean that some dependent methods or components are not yet available, leading to delays in testing. One has to carefully evaluate the pros and cons and take a call on this heuristic.

Implement a standard test interface within each domain class. Implement a standard test interface within base classes and technical classes if they are selected for direct class testing. A standard test interface in all classes causes minimal additional implementation effort and has a high potential payback.
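As a minimal sketch of what such a standard test interface could look like (the interface and class names are illustrative, not taken from any particular framework), every domain class implements the same small set of hooks, so one generic test driver works for all of them:

```java
import java.util.List;

// One uniform hook implemented by every domain class, so a generic test driver
// can reset and inspect any of them without class-specific plumbing.
interface Testable {
    void resetForTest();
    String dumpStateForTest();
}

class Order implements Testable {
    private int itemCount;
    void addItem() { itemCount++; }
    public void resetForTest()       { itemCount = 0; }
    public String dumpStateForTest() { return "Order{items=" + itemCount + "}"; }
}

class Customer implements Testable {
    private String name = "unknown";
    void rename(String newName) { name = newName; }
    public void resetForTest()       { name = "unknown"; }
    public String dumpStateForTest() { return "Customer{name=" + name + "}"; }
}

public class TestInterfaceDemo {
    public static void main(String[] args) {
        List<Testable> objects = List.of(new Order(), new Customer());
        // The same driver loop works for every class that implements the interface.
        for (Testable t : objects) {
            t.resetForTest();
            System.out.println(t.dumpStateForTest());
        }
    }
}
```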

Introduce observation points at semantically meaningful places. Observation points help in collecting or watching the states and values of various objects or variables as code execution passes through different stages. They are very helpful for debugging or troubleshooting hard-to-simulate defects.

Map your test strategy and your design approach with respect to inheritance hierarchies. Polymorphic method calls between classes within an inheritance hierarchy may require re-testing of the superclass when a subclass changes and vice versa. There are at least three different ways you can deal with this problem:

  1. Avoid inheritance and use delegation instead: re-testing is not necessary (see the sketch after this list).
  2. Use inheritance, perform an analysis of the dependencies between super- and subclasses, and use a selective regression testing strategy.
  3. Use inheritance, don’t analyse dependencies, and rerun all test cases (a test automation strategy).
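A small sketch of the first option (the price classes are invented for the example): the dependent class holds a delegate instead of extending it, so changes behind the delegate's public contract do not force re-testing of the dependent class:

```java
// Delegation instead of inheritance: DiscountedPrice does not extend BasePrice,
// it holds one. Changes inside BasePrice stay behind its public contract, so
// DiscountedPrice does not need to be re-tested whenever BasePrice changes.
class BasePrice {
    private final double amount;
    BasePrice(double amount) { this.amount = amount; }
    double value() { return amount; }
}

class DiscountedPrice {
    private final BasePrice base;   // delegate, not superclass
    private final double discount;

    DiscountedPrice(BasePrice base, double discount) {
        this.base = base;
        this.discount = discount;
    }

    double value() { return base.value() * (1.0 - discount); }
}

public class DelegationDemo {
    public static void main(String[] args) {
        DiscountedPrice price = new DiscountedPrice(new BasePrice(100.0), 0.10);
        System.out.println(price.value()); // 90.0
    }
}
```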


Design Level


Make control structures explicit. Sometimes control structure is hidden in data, which leaves room for omitting important test cases, thereby reducing test coverage. Being explicit helps testers overcome this problem.

Avoid cyclic dependencies, especially between methods. Cyclic dependencies make determining the test order complicated, and they also necessitate stubs and drivers, which can significantly increase the testing effort.

Avoid unmotivated polymorphic method parameters, especially if strict subtyping does not apply. These can have a multiplier effect on the number of test cases or the test data, as all combinations of parameter classes need to be tested. An alternative is to use strict subtyping, or to minimize the number of parameter classes and restrict the type with appropriate casts wherever possible.

Avoid implicit inputs and side-effects. It is easy to miss implicit input and side-effects of methods, especially if they are not well documented.

Avoid unmotivated state behavior in objects. This can lead to an increase in the number of test cases, as it becomes necessary to test each method for each object state.

Implement a state testing function for each test-relevant class and restrict its invocation to test drivers if necessary. The state of an object is an important part of the test result after each test case execution, but it is normally not accessible from the outside. Encapsulation makes testing more difficult, while breaking encapsulation introduces unwanted dependencies between the test drivers and the class under test.
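As a minimal sketch (the Account class is illustrative), the state testing function can be a package-private check that a test driver in the same package calls after each test case, instead of exposing every private field through public getters:

```java
// The account keeps its fields encapsulated, but offers package-private state
// checks that a test driver in the same package can call after each test case.
public class Account {
    private double balance;
    private int transactionCount;

    public void deposit(double amount) {
        balance += amount;
        transactionCount++;
    }

    // Visible to test drivers in the same package only.
    boolean stateIsConsistentForTest() {
        return balance >= 0 && transactionCount >= 0;
    }

    String describeStateForTest() {
        return "balance=" + balance + ", transactions=" + transactionCount;
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(50.0);
        System.out.println(account.describeStateForTest());
        System.out.println("consistent: " + account.stateIsConsistentForTest());
    }
}
```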

Compensate for test-relevant information loss with built-in tests. Built-in test facilities like assertions provide a way to review and control intermediate computation results, thereby reducing the impact of information loss on testing efficiency.
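A small sketch of such a built-in test (the computation is invented for the example); the assertion makes an intermediate result visible to the test process when assertions are enabled with the -ea flag:

```java
// A built-in assertion that checks an intermediate result before it is folded
// into the final answer; run with `java -ea AssertionDemo` to enable assertions.
public class AssertionDemo {

    static double averageOrderValue(double totalRevenue, int orderCount) {
        double average = totalRevenue / orderCount;
        // The intermediate value would otherwise be invisible to the test driver.
        assert average >= 0 : "average must not be negative, got " + average;
        return average;
    }

    public static void main(String[] args) {
        System.out.println(averageOrderValue(1200.0, 30)); // 40.0
    }
}
```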

There has to be at least one input element (or combination of input values) for each output element. Unachievable output values (called output inconsistencies) may be indicators of unreachable statements and paths. Avoiding them makes test result analysis a lot easier.

Provide means to trigger all exceptions. Some exceptions should never occur in theory, but they still need consideration, as handling them improves the reliability of the system. It is a challenge for testers to be able to trigger these exceptions so that test cases cover all of them. Testable exception handling requires a design strategy and perhaps simulation of failure modes.

Code Level


Don’t squeeze the code. Code readability is an important factor and is of great help when testers design test cases and try to achieve maximum code coverage, especially in unit testing.

Avoid variable reuse. Variable reuse leads to implicit information loss, i.e. the loss of intermediate computation results, and can be avoided by using more variables.

Minimize the number of unachievable paths. Unachievable paths impact code coverage, as it is difficult for testers to design test cases that traverse such paths. To reduce the number of unachievable paths, avoid correlated decisions.

Avoid recursive implementations of algorithms if there is no checking of invariants. It is difficult to test recursive algorithms because we cannot easily create a stub for the component or method we want to test. A solution to this problem is to split the recursion into pairs of methods that call each other, or to use built-in assertions.
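A minimal sketch of the second option (the factorial example is illustrative): pre- and postcondition assertions give the tester visibility into every level of the recursion when run with the -ea flag:

```java
// A recursive factorial guarded by invariant assertions; run with
// `java -ea RecursionInvariantDemo` so the checks are active at every level.
public class RecursionInvariantDemo {

    static long factorial(int n) {
        assert n >= 0 : "precondition violated: n = " + n;
        long result = (n <= 1) ? 1 : n * factorial(n - 1);
        assert result >= 1 : "postcondition violated at n = " + n;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```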

Summary

Dealing with testability issues during reviews of software artifacts should be the first choice for achieving an effective and efficient test process. Testability is not a characteristic of code alone; it applies to the architecture and design as well. The heuristics presented above are not exhaustive, as more may need consideration depending on the test and design strategies, the nature of the system, and the tools and technologies used for development and testing.