Saturday, December 29, 2012

Resilient Systems - Survivability of Software Systems

Resilience as we all know is an ability to withstand through tough times.There is also another term quite interchangeably used, which is Reliability. But Reliability and Resilience are different. Reliability is about a system or a process that has zero tolerance to failure or the one that should not fail. In other words, when we talk about reliable systems, the context is that failure is not expected or rather acceptable. Whereas Resilience is about the ability to recover from failures. What is important to understand about resilience is that failure is expected and is inherent in any systems or processes, which might be triggered due to changes to the platform, environment and data. While Reliability is about the system’s robustness of not failing, Resilience is its ability to sense or detect failures ahead and then prevent it from encountering such events that lead to failure and when it cannot be avoided, allow it to happen and then recover from the failure sooner.


A working definition for resilience (of a system) developed by the Resilient Systems Working Group (RSWG) is as follows:

“Resilience is the capability of a system with specific characteristics before, during and after a disruption to absorb the disruption, recover to an acceptable level of performance, and sustain that level for an acceptable period of time.“ The following words were clarified:
  • The term capability is preferred over capacity since capacity has a specific meaning in the design principles.
  • The term system is limited to human-made systems containing software, hardware, humans, concepts, and processes. Infrastructures are also systems.
  • The term sustain allows determination of long-term performance to be stated.
  • Characteristics can be static features, such as redundancy, or dynamic features, such as corrective action to be specified.
  • Before, during and after – Allows the three phases of disruption to be considered.
    • Before – Allows anticipation and corrective action to be considered
    • During – How the system survives the impact of the disruption
    • After – How the system recovers from the disruption
  • Disruption is the initiating event of a reduction is performance. A disruption may be either a sudden or sustained event. A disruption may either be internal (human or software error) or external (earthquake, tsunami, hurricane, or terrorist attack).

Evan Marcus, and Hal Stern in their book Blueprints for High Availability, define a resilient system as one that can take a hit to a critical component and recover and come back for more in a known, bounded, and generally acceptable period of time.


In general Resilience is a term of concern for Information Security professionals as the final impact of disruption (from which a system needs to recover), could mostly be on Availability which is one of the three tenets of Information Security (CIA - Confidentiality, Integrity and Availability). But there is a lot for System designers and developers, especially those tasked to build mission critical systems to be concerned about Resilience and architect the systems to build in a required level of Resilience characteristics. For a system to be resilient, it should draw necessary support from dependent software and hardware components, systems and the platform. For instance a disruption for a web application can even be from network outage, security attacks at the network layer, which the software has no control over. But it is important to consider software resiliency in relation to the resiliency of the entire system, including the human and operational components.


The PDR (Protect - Detect - React) strategy is no longer as effective as it used to be due to various factors. It is time that predictive analytics and some of the disruptive technologies like big data and machine learning need a consideration in enhancing the system resiliency. Based on the logs of various inter-connected applications or components and other traffic data on the network, intelligence need to be built into the system to a combination of number of possible actions. For instance, if there is a reason to suspect an attacker attempting to gain access to the systems, a possible action could be to operate the system at a reduced access mode, i.e. parts of the systems may be shut down or parts of the networks to which the system is exposed could be blocked, etc.


OWASP’s AppSensor project is worth checking by the architects and developers. The AppSensor project defines a conceptual framework and methodology that offers prescriptive guidance to implement intrusion detection and automated response into an existing application. AppSensor defines over 50 different detection points which can be used to identify a malicious attacker.Appsensor provides guidance in the form of possible responses for each such identified malicious atacer.


The following are some of the factors that need to be appropriately addressed to enhance the resilience of the systems:

Complexity - With systems continuously evolving and many discrete systems and components increasingly integrated into today’s IT eco-system, the complexity is on the rise. This makes the resiliency routines as built in to individual systems needing a constant review and revision.

Cloud - Cloud computing is gaining higher acceptance and as organizations embrace cloud for its IT needs, the location of data, systems and components are widespread across the globe and that brings in challenge for those involved in building resilience.

Agility - To stay on top of the competition, organizations need agility in their business processes, which means rapid changes in the underlying systems and this could be a challenge as this will call for a constant check to ensure that the changes being introduced does not downgrade or compromise the resiliency level of the systems.


While there are techniques and guiding principles which when followed and applied, the resilience of the systems can be greatly improved, such design or implementation comes with a price and that is where the economics of Resiliency needs to be considered. For instance, mission critical software systems like the ones used in medical devices, need to have a high resilience characteristic, but quite many of the business systems can have a higher tolerance level and thereby being less resilient. However, it is good to document the expected resilience level at the initial stage and work on it in the early life cycle of the system development. thinking about resilience later in the life cycle may not be any good as implementation will call for higher investment.



References:

Crosstalk - The journal of Defense Software Engineering Vol 22 No:6

Resilient Systems Working Group

OWASP - AppSensor Project

Saturday, December 15, 2012

Effective vs Ineffective Security Governance

Continuing with my earlier blog on Measuring the Performance of EA, I was looking for methods and measures that can be used for measuring the effectiveness of the security program in an enterprise. I happened to read a CERT article titled as Characteristics of Effective Security Governance which contains a good comparision of what is effective and what is ineffective. I have reproduced it here in this blog for a quick reference. The original article of CERT though out dated is worth reading.

EffectiveIneffective or Absent
Board members understand that information security is critical to the organization and demand to be updated quarterly on security performance and breaches.

The board establishes a board risk committee (BRC) that understands security’s role in achieving compliance with applicable laws and regulations, and in mitigating organization risk.

The BRC conducts regular reviews of the ESP.

The board’s audit committee (BAC) ensures that annual internal and external audits of the security program are conducted and reported.
Board members do not understand that information security is in their realm of responsibility, and focus solely on corporate governance and profits.

Security is addressed adhoc, if at all.

Reviews are conducted following a major incident, if at all.

The BAC defers to internal and external auditors on the need for reviews. There is no audit plan to guide this selection.
The BRC and executive management team set an acceptable risk level. This is based on comprehensive and periodic risk assessments that take into account reasonably foreseeable internal and external security risks and magnitude of harm.

The resulting risk management plan is aligned with the entity’s strategic goals, forming the basis for the company's security policies and program.
The CISO locates boilerplate security policies, inserts the organization's name, and has the CEO sign them.

If a documented security plan exists, it does not map to the organization’s risk management or strategic plan, and does not capture security requirements for systems and other digital assets.
A cross-organizational security team comprised of senior management, general counsel, CFO, CIO, CSO and/or CRO, CPO, HR, internal communication/public relations, and procurement personnel meet regularly to discuss the effectiveness of the security program, new issues, and to coordinate the resolution of problems.CEO, CFO, general counsel, HR, procurement personnel, and business unit managers view information security as the responsibility of the CIO, CISO, and IT department and do not get involved.

The CSO handles physical and personnel security and rarely interacts with the CISO.
The general counsel rarely communicates particular compliance requirements or contractual security provisions to managers and technical staff, or communicates on an ad-hoc basis.
The CSO/CRO reports to the COO or CEO of the organization with a clear delineation of responsibilities and rights separate from the CIO.

Operational policies and procedures enforce segregation of duties (SOD) and provide checks and balances and audit trails against abuses.
The CISO reports to the CIO. The CISO is responsible for all activities associated with system and information ownership.

The CRO does not interact with the CISO or consider security to be a key risk for the organization.
Risks (including security) inherent at critical steps and decision points throughout business processes are documented and regularly reviewed.

Executive management holds business leaders responsible for carrying out risk management activities (including security) for their specific business units.

Business leaders accept the risks for their systems and authorize or deny their operation.
All security activity takes place within the security department, thus security works within a silo and is not integrated throughout the organization.

Business leaders are not aware of the risks associated with their systems or take no responsibility for their security.
Critical systems and digital assets are documented and have designated owners and defined security requirements.Systems and digital assets are not documented and not analyzed for potential security risks that can affect operations, productivity, and profitability. System and asset ownership are not clearly established.
There are documented policies and procedures for change management at both the operational and technical levels, with appropriate segregation of duties.

There is zero tolerance6 for unauthorized changes with identified consequences if these are intentional.
The change management process is absent or ineffective. It is not documented or controlled.

The CIO (instead of the CISO) ensures that all necessary changes are made to security controls. In effect, SOD is absent.
Employees are held accountable for complying with security policies and procedures. This includes reporting any malicious security breaches, intentional compromises, or suspected internal violations of policies and procedures.Policies and procedures are developed but no enforcement or accountability practices are envisioned or deployed. Monitoring of employees and checks on controls are not routinely performed.
The ESP implements sound, proven security practices and standards necessary to support business operations.No or minimal security standards and sound practices are implemented. Using these is not viewed as a business imperative.
Security products, tools, managed services, and consultants are purchased and deployed in a consistent and informed manner, using an established, documented process.

They are periodically reviewed to ensure they continue to meet security requirements and are cost effective.
Security products, tools, managed services, and consultants are purchased and deployed without any real research or performance metrics to be able to determine their ROI or effectiveness.

The organization has a false sense of security because it is using products, tools, managed services, and consultants.
The organization reviews its enterprise security program, security processes, and security’s role in business processes.

The goal of the ESP is continuous improvement.
The organization does not have an enterprise security program and does not analyze its security processes for improvement.

The organization addresses security in an ad-hoc fashion, responding to the latest threat or attack, often repeating the same mistakes.
Independent audits are conducted by the BAC. Independent reviews are conducted by the BRC. Results are discussed with leaders and the Board. Corrective actions are taken in a timely manner, and reviewed.Audits and reviews are conducted after major security incidents, if at all.


The article also lists eleven characteristics of effective security governance in addition to listing the Ten challenges to implementing an effective security governance. I would highly recommend you to read the full article.


References:
CERT’s resources on Governing for Enterprise Security


CERT and CERT Coordination Center are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University

Thursday, December 13, 2012

Implementing IT Balanced Scorecard

Source: ISACA, Board Briefing on IT Governance 2nd edition

With IT increasingly becoming an enabler of business, more and more organizations are looking for effective and efficient management of IT, so that the investment in IT fetches optimum value. On the same lines, the need for better IT Governance is being felt by the Board of increasing number of organizations. One of the key domain of IT Governance is Performance Measurement. Going by "what is not measured cannot be managed", there need to be plans and processes in place for measuring the performance of IT so that it can be better governed.

Much of the value returned by IT are intangible. While it is easy to measure the tangible benefits, measuring intangible benefits is difficult. Business Scorecard (BSC) which evolved in the early 1990s has evolved into an very useful tool for measuring both tangible and intangible benefits segmented into four perspectives - Financial, Customer, Internal Process and Learning. IT BSC as derived from the Business Scorecards were found to be a a very effective measurement system addressing the concerns of reporting the intangible benefits to the Board.
The Balanced Scorecard as it has evolved over a period of time is being looked at not just as a performance measurement tool, but as a strategic planning and management system. This is because, the Balanced Scorecards can be cascaded down smaller business units including IT and aggregated upwards to the higher-level. IT BSC, which is cascaded from the Business Scorecard can be further subdivided into one for each of the technology domains, for instance one for managing the IT Operations and another to manage the IT Development areas. While doing so, it is important to maintain the linkages between each such cascaded Scorecards and this way the Balanced Scorecard can facilitate Strategy Mapping, thereby improving the Alignment of the objectives of the smaller business and IT units into the business strategy.


The perspective of the IT BSC may be redefined to better represent the IT organization. For instance, the following four perspectives may be used in IT BSC:

  • Corporate Contribution - Equivalent to the Finance perspective of the Balanced Scorecard, this represents the view of business executives on the IT department. 
  • Customer Orientation - Equivalent to the Customer perspective of the Balanced Scorecard, this represents the view of the end users on the IT department. 
  • Operational Excellence - Equivalent to Process perspective of the Balanced Scorecard, this represents the effectiveness and efficiency of various standards, processes and policies followed by the IT department. 
  • Future Orientation - Equivalent to Learning and Growth perspective of the Balanced scorecard, this represent a view of how well IT is prepared to meet the future needs of the business.

To be effective, the following three principles need to be built into the balanced scorecards:

  • Cause-and-effect relationships - the identified performance measures have a cause and effect relationships amongst them, for instance a measure on Improved developer skills (Future Orientation perspective) as a cause will result in improved quality in the applications delivered(Operational Excellence perspective), which in turn should contribute for user statisfaction (User Orientation perspective) 
  • Sufficient performance drivers - While it is common to measure all the possible outcomes (measuring what you have done), it is also important to identify and include suufficient performance drivers(how you are doing). A good mix of both outcome measures and performance drivers are essential for the Scorecard to be effective. 
  • Linkage to financial measures - IT Scorecard, being cascaded from the Enterprise Business Scorecard, the measures in the IT Scorecard should link up to a corresponding measure in the top-level business scorecard. 

To have the Balanced Scorecard implemented as part of the IT Governance initiative, the following steps are recommended:

  • Obtain commitment - Make a presentation to the board and executives explaining the concepts, benefits and cost of implementing it and get a commitment to go ahead. 
  • Kick-off - Kick off the Balanced Scorecard initiative as a project and as part of this activity, train the staff and identify the project team members. 
  • Strategy map - Get an understanding of the corporate business strategy and the sub unit level strategies and then establish a strategy map. 
  • Metrics selection - Understand the existing metrics if any and identify the required metrics, which should be a good mix of both outcome measures and performance drivers 
  • Metrics definition - With respect to each identified metric, create a standard definition, related processes to collect and manage the data. As part of this, the cause and effect relationships should also be clarified and the linkage with higher level scorecards should also be established. 
  • Assign ownership - Assign owners for each metric. 
  • Define Targets - With respect to each metric, set targets (may be a range) for the function heads to achieve and devise strategic initiatives to achieve these targets. 
  • Act on the results - Have the appropriate executive management or board as may be required to review the resulting measures and then act on the results. 
  • Review periodically - The metric definitions, the linkages and the cause-effect relationships may require revision based on experience and this achieved through periodic reviews. 

Successful execution of strategy requires the successful alignment of four components: the strategy, the organization, the employees and the management systems. As Kaplan and Norton put it, “Strategy execution is not a matter of luck. It is the result of conscious attention, combining both leadership and management processes to describe and measure the strategy, to align internal and external organizational units with the strategy, to align employees with the strategy through intrinsic and extrinsic motivation and targeted competency development programs and finally, to align existing management processes, reports and review meetings, with the execution, monitoring and adapting of the strategy.”

Sunday, December 9, 2012

Measuring Performance of Enterprise Architecture

Enterprise Architecture is viewed as an important function in IT Governance and it plays a vital role in aligning IT with the Business. This function is expected to define the technical direction and to ensure application of principles of Architecture to the design and maintenance of IT systems, which in turn should be in alignment with the business vision, mission and strategies and the role that the IT within the organization is expected to play. While the support for commitment and funding is important, it is also important that the EA function consider the following(not exhaustive) to be successful:

  • Alignment with the business strategy and the culture of the organization, 
  • Actively involve in projects to ensure that the principles of design and evolution are adhered to and ensure the continued focus on the business requirements.
  • Offer technical consultancy for all the business and IT functions both internal and external. 
  • Acting as a gate for all decisions impacting the design & evolution architecture. 

IT Governance views EA as the hub of the IT wheel with linkages with various processes, components and goals of the enterprise and some of such key and enabling links are:

  • Promoting and enabling Business Agility
  • Providing standards, policies and principles for the IT Project, Program and Portfolio management function
  • Guides and enables cost management and consolidation
  • Facilitates cost-effective, scalable integration of various IT systems
  • Supports IT Governance by defining / providing the conceptual and technical priorities and thereby promoting informed decision making

Going by the premise, what is not measured does not get managed, it is important to identify the measurable objectives for the Architecture function itself, so that it is well managed and its contribution to the success of the organization is established. While measurement of Architectural activities is difficult, COBIT suggests a set of measurable outcomes and performance measures, some of which are the following:

Number of technology solutions that are not aligned with the business strategy - One of the objectives of the EA function should be to ensure that the technology solutions chosen or implemented are aligned to the business strategy a measure around this could be very useful to establish that the number of misaligned solutions are on the decline. There could be other measures derived around this and could be represented as a relative measure to the total solutions.

Percent of non-compliant technology projects and platforms planned - With the fast changing business environment, there will be times when the business will need to solutions technology projects that are not compliant with the standards and principles laid out by EA. EA has the responsibility to carefully review such needs and grant waivers. Such waivers should be for a shorter term and should be backed with a plan to normalize it. At times, this could call revision in the standards, policies or principles. A measure around this could be very useful that the EA is effective in dealing with non-compliant technology projects and platforms.

Decreased number of technology platforms to maintain - Standardization is one of the objectives of EA, which could contribute to cost reductions and reduce the technical complexity. Statistics and surveys show that enterprises without an active IT Governance / EA function have more multiple applications requiring different platforms for the same business requirement being used by different departments. With an effective EA function, these should be very less in number and should decline over a period. A measure around this is a very good indicator of EA being effective in this area.

Reduced application deployment effort and time-to-market - Supporting Business agility is yet another key objective of the EA function. Today’s businesses are operating in a highly dynamic industry environment and in order to stay competitive and to sustain its market position, need support from IT to have the new or changed capabilities with a reduced time-to-market. A delayed delivery from IT could mean an opportunity lost. A measure around this indicator would really be helpful in establishing how IT is supporting the business changes.

Increased interoperability between systems and applications - It is quite common that most enterprises have multiple applications for specific needs, but there is a need to have these applications share data and information amongst each other. With cloud computing gaining wider acceptance, most enterprises are looking at discrete cloud based solutions. With the benefits outweighing the concerns and constraints, and that the industry is working towards addressing these concerns, there will be increased focus on move to cloud. This will mean hosted applications from various providers would need to be interoperable and working with other in house applications. EA should ensure that the technology and solutions acquired or designed should support this important attribute i.e. interoperability. A measure around this parameter would be an important indicator of EA function’s effectiveness.

Percent of IT budget assigned to technology infrastructure and research - Yet another expectation from the EA function is that it should help businesses in leveraging emerging technology to its advantage. This will require the Architects to be continuously looking for newer technologies and its application areas, and recommend such technology or solutions for implementation so that the business will get the most out of IT to accomplish its mission. It is also important the extent of this activity should be in line with the identified and stated role of IT in the organization. While the percentage of IT budget used for research is an useful measure, there could be other useful measures derived from this, for instance, number of research solutions getting implemented as a percentage to total number of solutions implemented in a given period. Business satisfaction on timely identification and analysis of technology opportunities is another related measure which is indicative of the outcome of this research.

Number of months since the last technology infrastructure review - With the fast changing IT space, it is important to ensure that the technology infrastructure is of continued relevance to meeting the business objectives and if needed changes should be considered. A measure indicating that the EA function is performing review of the technology infrastructure periodically is a good indicator of its effectiveness. Measures derived around this review could be based on the outcome of the review will also be very useful.

There is no one size fits all in IT and as such the measures indicated above need to be tailored to suit the organization and the role of IT within the organization.

References:

COBIT - an IT Governance framework from ISACA.

Developing a successful governance strategy - A best practice guide for decision makers in IT from The National Computing Center, UK