Sunday, March 31, 2013

Software Complexity - an IT Risk perspective

IT risk management predominantly focuses on IT security, though there are other categories of IT risk that are less well known. Software complexity is one such important but less-recognized IT risk. Basically, software complexity is one of many non-functional quality attributes, and in most cases it is ignored completely, allowing the associated risks to materialize and cause financial and non-financial impacts on the business entity. In this blog, let us try to understand software complexity from an IT risk perspective, and shed some light on whose baby it is and how to manage or control it.


What is Software Complexity?


Let us first understand what software complexity is. The simple definition of complexity is how hard a piece of software is to understand, use and/or verify. Complexity is relative to the observer, meaning that what appears complex to one person might be well understood by another. For example, a software engineer familiar with control systems theory would see good structure and organization in a well-designed attitude control system, but another engineer, unfamiliar with the theory, would have much more difficulty understanding the design.


Basili defines complexity as a measure of the resources expended by a system while interacting with a piece of software to perform a given task. In terms of end users, this can be understood as usability, i.e. how much time a user needs to spend to accomplish a task using the software and how much training is required to get familiar with it. Similarly, in terms of developers, this can be understood as maintainability, i.e. how easy it is for new developers to understand the software and work on it, whether to improve it further or to address issues. Again, in terms of the IT infrastructure team, this can be understood as the level of effort needed to deploy and support the software, which again falls under maintainability.


Why is Complexity Evil?


Complexity, as the above definition goes, requires more resources than normal to be expended and is thus counterproductive. We have numerous best practices, standards and frameworks that advocate eliminating complexity, but somehow it creeps in and poses a challenge in most cases. The consequence of software complexity, as we have put it above, is clearly risk, and it could even be a business risk when we look at it from the end user perspective. The following is a sample list of risks that could arise due to software complexity:


  • Complex software is difficult to verify and validate, and this could lead to new defects surfacing even after months of production use. Depending on the severity of the defect, the impact could range from low to very high.
  • Software complexity could render business processes inefficient, as end users may have to spend more time using the software to execute certain business processes, which could result in losing competitiveness and thereby losing market share.
  • Failure by the development teams to consider downstream complexity could result in a state of ‘project successful, but mission failed’, i.e. even if the project teams have successfully completed the project, the resulting software product might be complex enough to render it useless, thereby turning the entire investment in the project into a loss.
  • Post-production support will always be a challenge, as it is difficult to retain trained and capable resources to maintain complex software; this is one of the most common risks that enterprises continue to battle with.
  • Complexity in deploying and distributing the software is another area of concern for the infrastructure team. Sometimes such complex software might demand so many computing resources that the cost of the hardware and other infrastructure might outweigh the perceived benefits of using the software.


The above list is only illustrative; it could be even bigger, and I would let you come up with more such consequences in the form of feedback.


Measuring Software Complexity


To be managed or controlled, complexity must first be measured. While a detailed discussion of software complexity measures is out of the scope of this blog (I might write another blog on measuring software complexity), at a high level, the following measures can be used.



  • Cyclomatic complexity (McCabe): Soundness and confidence; measures the number of linearly independent paths through a program module; a strong indicator of testing effort.
  • Halstead complexity measures: Algorithmic complexity, measured by counting operators and operands; a measure of maintainability.
  • Henry and Kafura metrics: Coupling between modules (parameters, global variables, calls).
  • Bowles metrics: Module and system complexity; coupling via parameters and global variables.
  • Troy and Zweben metrics: Modularity or coupling; complexity of structure (maximum depth of structure chart); calls-to and called-by.
  • Ligier metrics: Modularity of the structure chart.
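To make the first of these metrics concrete, here is a minimal sketch (in Python, purely for illustration; the metric itself is language-agnostic) that approximates McCabe's cyclomatic complexity by counting decision points in parsed source. Note the simplification: boolean operators are counted once per expression, whereas the extended metric counts each operator.

```python
import ast

# Node types that add a decision point, i.e. one more linearly independent path.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.And, ast.Or)

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return 1 + decisions

sample = """
def classify(n):
    if n < 0:
        return "neg"
    elif n == 0:
        return "zero"
    return "pos"
"""
complexity = cyclomatic_complexity(sample)  # two branch points, so 1 + 2 = 3
```

A function scoring above 10 on this measure is commonly flagged as hard to test, since each independent path ideally needs its own test case.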



Controlling Software Complexity

As we have seen in the definition section, complexity is context sensitive, i.e. the same software may be perceived at different complexity levels by different classes of users. While this is a challenge for developers, they often find it a useful shelter to avoid addressing complexity at all. Developers depend primarily on the business systems analysts, who in most cases play the role of representing the end users, and when the analysts fail to see the complexity that the end users would, the development teams are misled. In one of the projects we have been working on, the business systems analyst, who was also tasked with triaging the defects logged by end users, simply turned down such defects, suggesting that the users learn to use the system. This definitely goes against controlling the complexity and thereby managing the IT risk.

Controlling Software Complexity calls for adopting certain practices in the areas of architecture, design, build, testing and project management some of which are listed below:

  • Set the expectations: Overly stringent requirements and simplistic hardware interfaces can complicate software, and a lack of consideration for testability can complicate verification efforts. Cultivating awareness of downstream complexity, and of how detrimental complexity can be if ignored, helps in getting the much needed team collaboration and commitment.
  • Requirement Reviews: Ambiguous software requirements have long been a cause of complexity, as developers often tend to assume things. Similarly, certain requirements may be unnecessary or overly stringent, which can be spotted in the review process, leading to a reduction in complexity.
  • Domain Skills: Having the relevant product or domain skills goes a long way in reducing downstream complexity, as the development team will be familiar with how the industry handles such requirements.
  • Architecture Reviews: A good architectural design goes a long way in reducing downstream complexity, and design and architecture reviews early on address this need. It may even make sense to have an Architecture Review Board comprising representatives who can use their expertise in specific areas to spot problem areas.
  • COTS: Third-party libraries, usually termed COTS (Commercial Off-The-Shelf) components, often come with unneeded features that add to the complexity of the software. It is important to make a careful and thoughtful decision in the case of COTS.
  • Software Testability: Design the software with testability in mind, so that its complexity can be verified by the QA teams.

Conclusion:


Much of the planning for mitigating software complexity risk revolves around the software acquisition or software development process. This calls for close co-ordination and collaboration between IT risk managers and development teams. Though risk management is well understood as an important function of project managers, how well the risk of complexity is managed is still questionable. Project managers are primarily concerned with risks that might impact project success, and fail to consider those, like complexity, that emerge downstream, post project. It has also been observed that reduction of complexity is an important characteristic of disruptive innovation. Innovations emerge as disruptive when they address the complexity in existing products or systems, and are thus able to capture market segments one after the other on their course to the top of the cliff.

Saturday, March 23, 2013

Surviving Disruptive Innovations


I recently made a presentation on Disruptive Technologies at the Chennai Chapter of ISACA. I chose the topic in the context of presenting a picture of the pace at which disruption is happening in the IT world and the upcoming disruptions to watch out for. But as I was preparing the agenda and content for the presentation, I was curious to find out how successful enterprises are managing, or rather surviving, disruptions, and in the process I stumbled upon some of the research work done by Clayton Christensen.


It was interesting to observe a few things from his theory, which are the following:


Good management principles are not of great help in managing or surviving disruptive innovations. Christensen cites the example of how Toyota came up and disrupted General Motors. He sees a pattern in how disruptions happen, in the form of an S-curve where the top of the curve is a cliff. Leaders and leadership teams follow the best management principles to climb up the S-curve, and when they reach the top they just fall off the cliff.


An extendable core is the key enabler of innovations becoming disruptive. A potentially disruptive innovation may appear insignificant compared to the competitive capabilities of the incumbent's existing products, tempting the incumbent to ignore it. But with an extendable core within it, the new entrant quietly enhances its capabilities, slowly gets into the incumbent's mainstream market, and then disrupts the whole market, driving the big and well-managed incumbents out.


Disruption emerges from where it is least expected. For example, we now find it very comfortable to use a smartphone for various jobs which were otherwise performed by special-purpose devices; examples include GPS devices, digital cameras and even PCs. GPS device manufacturers still believe that the GPS feature of smartphones is not a threat to them, as special-purpose GPS devices have certain unique advantages which smartphones don't. But be reminded that smartphones have the extendable core and can easily close this capability gap; soon GPS devices will be a thing of the past, and we are already seeing the signs of it.


While there are many other interesting observations to note, I will leave it to you to find those out. I was then curious to look into cases of disruption that happened in the past. The following three cases were of interest to me:


Kodak: Kodak ruled the photography market for a whole century. Their management had all the best qualities and was praised in all respects. Kodak has many innovations to its credit, and many firsts as well. Despite such a performance, it recently went into bankruptcy and sold its patent portfolio, which included close to 1000 patents, to salvage some value. It is natural for us to think that the emergence of digital cameras disrupted Kodak in a big way. But as many would know, Kodak knew that the digital era was emerging, and they were the first to introduce a digital camera, back in the 1970s. So what went wrong, and how did they fail to sustain that innovation and stay alive in the market? Kodak kept believing, till the early 2000s, that photographic film wouldn't die so soon. The other interesting observation from Kodak's failure is that with a heavyweight team of experts, sustaining innovation is really expensive, and the outside view is most likely ignored.


That’s where management tends to give up on some innovations, as the time and investment may not seem worth it while they are making decent business with their current line of products. The situation is different for new entrants, as startups usually break the rules of convention and are in a position to pursue such innovations relatively cheaply and in a progressive manner. Startups usually focus on a market which is ignored, or to which the incumbents don't pay much attention, thereby not drawing the incumbents' attention until a point when it is difficult for them to respond.


NOKIA: Nokia made it big in cellular phones, but failed to get its innovation strategy right with smartphones. Even in Nokia's case, its research team came up with a prototype smartphone with internet access and a touch interface way back in 2003, but the management, again going by good management principles and citing the risk of the product not being successful along with the very high cost of its development, turned down the proposal to pursue the plan further. A few years later, Apple launched its iPhone.


Netflix: Netflix's case is a little different. Netflix was very successful in its DVD rental business, and in fact saw the emergence of disruptive innovation in the form of streaming video. It even responded by pursuing research in that direction and developed a video streaming service. What went wrong, according to analysts, is that it got its business model and pricing wrong: it combined the traditional service and the digital streaming service as a bundle and increased the pricing. It would have been more appropriate to offer digital streaming under a different brand or as a separate service, as surveys indicated that DVD rentals still accounted for 70% of total video sales in the US.


Now, given that good management principles alone don't help in sustaining or surviving disruptive innovations, what should organizations do to stay alive in today's world, where IT has enabled disruptive innovations to emerge at a much faster pace, leaving very little time for incumbents to respond? We also keep hearing that "breaking the rules" is the way to go to foster innovation. While disruption is always seen as a risk to be managed, how well enterprises come up with the right risk mitigation and contingency plans to handle the risk of disruption is still a mystery.


You may check out my presentation on the subject on Slideshare, and feel free to share your views and thoughts on this topic. You may also Google some of the great articles and papers on the theory of disruptive innovation by Clayton Christensen, and you will find some good video lectures of his on YouTube.

Saturday, February 9, 2013

Stress Testing a Multi-player Game Application

I recently had an opportunity to consult for a friend of mine on stress testing a multi-player game application. This was a totally new experience for me, and this blog details how I approached the need to simulate the required amount of stress and have the application tested under stressed circumstances.

About the Application Architecture

The application was developed using Flash ActionScript, with a few PHP scripts for some of the support activities. The multi-player platform is aided by the SmartFox multi-player gaming middleware. It also makes use of MySQL. The Flash files containing the ActionScript, along with a lot of images, are hosted on an Apache web server, which also hosts the PHP scripts. Apache, MySQL and SmartFox are all hosted on a single cloud-hosted server running the Linux operating system.

The test approach

My first take on the test approach was to focus on simulating stress on the server and keep the client out of the scope of this test. This made sense, as all of the Flash ActionScript executes on the client side, and in reality it is typically a single user playing the game on a client device, with all of the CPU, memory and related resources of that device available to the client-side application. Thus, in reality, there is no multi-player stress on the client.

Given that the focus would now be on the impact of the stress on server resources, I had to understand how the client communicates with the server, along with the request/response protocols and related payloads. I used Fiddler to monitor the traffic out of the client device on which the game was being played, and I could only see HTTP requests fetching the Flash and image files and a few of the PHP files. But I could not find any traffic to the SmartFox server over HTTP, and figured out that those requests are TCP socket requests and thus not captured by Fiddler.

The test tools

At this stage, it was clear that we needed to simulate stress on Apache by sending in as many HTTP requests as possible, and to simulate stress on the SmartFox server as well, over TCP sockets. We have a choice of numerous open source tools to simulate HTTP traffic. I chose JMeter for the HTTP traffic simulation, as it is open source, UI driven, and easy to set up and use. It also supports multi-node load simulation.
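For readers who want to see the HTTP side of such a test in code, here is a minimal Python sketch of the idea, not what JMeter does internally, just a hypothetical load generator; the thread and request counts are placeholders, and a throwaway local server stands in for the Apache host.

```python
import http.server
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def hit(url):
    """Issue one GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status

def run_load(url, n_threads, n_requests):
    """Fire n_requests GETs spread across n_threads concurrent workers."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(hit, [url] * n_requests))

# Demo against a throwaway local server (stand-in for the Apache host).
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
statuses = run_load(f"http://127.0.0.1:{server.server_address[1]}/", 4, 8)
server.shutdown()
```

A real test would also record per-request latency and ramp the thread count up gradually, which is exactly what JMeter's thread-group configuration automates.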

I then needed to find a tool for simulating load on sockets. I checked with SmartFox to see if they offer a stress test tool, but they don't. A search through the SmartFox forums revealed that a custom tool is the way to go, and to make it easier, we can use one of the SmartFox client API libraries, which are available for .NET, Java, ActionScript and a few other languages. I settled for the .NET route, as C# is the language I have been working with in recent years.

I built a multi-threaded custom .NET tool using the SmartFox client API to simulate the stress on SmartFox. To my surprise, the SmartFox client API library has not been designed to work with multi-threading, and SmartFox support confirmed this behaviour. I then decided to redesign my custom tool to use a multi-process architecture, and it worked fine.
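The multi-process idea can be illustrated with a small Python sketch. This is hypothetical and does not use the actual SmartFox client API; the login payload and the echo server below stand in for the real protocol. The point is that each worker is a separate OS process with its own connections, which sidesteps thread-safety limits in a client library.

```python
import multiprocessing
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in for the game server: echo one request back, then close."""
    def handle(self):
        self.request.sendall(self.request.recv(1024))

def worker(host, port, n_requests, results):
    """One load-generating process: one connection per simulated request."""
    ok = 0
    for _ in range(n_requests):
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall(b"login|guest")          # hypothetical payload
            if s.recv(1024) == b"login|guest":
                ok += 1
    results.put(ok)

def run_stress(host, port, n_procs, n_requests):
    """Spawn n_procs processes, each issuing n_requests socket round trips."""
    ctx = multiprocessing.get_context("fork")  # fork keeps the sketch self-contained
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(host, port, n_requests, results))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sum(results.get() for _ in procs)
```

The .NET version followed the same shape, with each process driving the SmartFox client API independently and reporting its results back to a coordinator.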

I also needed a server monitoring tool to monitor and measure various server performance parameters under stress conditions. I found the cloud-based New Relic to be the tool of choice to monitor the Linux server hosting the game components.

The test execution

I had JMeter configured on three nodes (one being the monitoring node) and set it up to spawn the desired number of threads. I had the custom .NET tool on another client, set up to spawn the desired number of processes making a sequence of TCP socket requests. I also engaged a couple of QA resources to play the game and record the user experience under stress conditions.

The test execution went well, and we gathered the data needed to form an opinion and make recommendations.



Saturday, January 26, 2013

Solution Architecture - Aligning Solutions with Business Needs

As Enterprise Architecture and IT Governance adoption is on the rise, we keep hearing a lot about IT-Business alignment. ISACA positions Strategy Alignment as an important knowledge area along with five other knowledge areas. Enterprise Architecture, being an important function in architecting or engineering the enterprise itself, has a major role to play in appropriately deriving the IT strategy from the business strategy and ensuring its execution as intended. Within the overall Enterprise Architecture function, it is the Solution Architecture function that acts as a bridge between the Business Architecture and the Technology/Application Architecture. Solution Architecture is thus a key role in transforming business needs into IT solutions, and it is this function that is key to ensuring that the IT solutions are in alignment with the business needs and strategies.

Now, even though most Solution Architects are well aware of this important need at the early stage of solution design and architecture, things keep going the wrong way, and the business in general is not happy with the solutions delivered. While most Solution Architects do well in the initial phase of solutioning, somewhere down the line things tend to stray and gaps creep in, resulting in a misaligned solution being delivered. A close examination of various misaligned projects reveals the causes listed below. This list is not exhaustive and is not in any order of priority.

Requirement Analysis

Typically, a detailed Business Requirements Specification document is the input to the Solution Architecture function, and the solution that the architects propose largely depends on the veracity of this document. Good knowledge of the business domain of the organization in general, and the ability to apply critical review to the requirements, would bring out possible gaps that, if ignored, might creep into the solution implementation. Needless to stress here are the non-functional requirements, which on most occasions remain undocumented and largely contribute to the gaps that creep in. Lack of formal processes that call for iterative reviews of the business case, and in turn the requirements specifications, is a cause for concern.

Generally the Solution Architects should be engaged from the initial stages of problem definition, so that they will be in a position to gather the needed details, which might not figure in the requirement specifications.

Expectation Management

Traditionally, the business just throws in the problem they want to solve or the opportunity they want to exploit, and perhaps participates in defining the requirements. Thereafter, they just leave it to the architects and developers to do the rest and finally deliver the solution. In one project for a leading multinational bank, when the solution was delivered, the business reacted by stating that this was not what they wanted, and found that the requirements specification, as duly accepted by them, was in total mismatch with what was expected. It is good, and easier, to have this expectation mismatch ironed out early on, before getting deep into implementation. While the expectation mismatch is generally on the non-functional needs, like usability, portability, reliability, etc., at times even the functional requirements leave room for misunderstanding by different teams, leading to misalignment. The best way to address this area is to involve the business in periodical reviews using Agile practices. Even with Agile, this could happen if those representing the business do not have the same understanding of the business needs at large.

Validating the solution design with the business representatives, and keeping a watch on the expected design outcome as the implementation progresses for possible impacts on the expectations, is the best way to address this concern.

Skill Gaps

Solutioning is not a pure science; one should be able to blend science and art. Getting the best solution depends on the Solution Architects' ability to visualize the final outcome, given the constraints and choice of tools, and to make sure that this is what the business would want. While skill gaps in other related functions would still be a cause for concern, deficiencies in the Solution Architecture function can cost dearly, as it plays its part early in the life cycle.


Choice of Tools & Technology

While the choice of tools and technology is not generally within the Solution Architect's domain, being knowledgeable about the various tools and technologies that can be leveraged within the constraints of the IT strategy will go a long way in proposing the best solution. Another concern around this issue is that as the Technical Architects or the development team encounter feasibility issues with a chosen tool or technology, the Solution Architect has to jump in again and align or tweak the solution to overcome such issues without impacting the expected final outcome.

Culture, Communication & Collaboration

While continuous review and monitoring would help bring out exceptions, it is for the organization to nurture a culture in which everyone knows the business and IT strategy and the value of the work they deliver towards IT solutions. The Solution Architecture team should have a thorough understanding of the IT strategy and ensure that the solutions they propose are in line with it. It is also important that the teams collaborate well and work for the common objective of ensuring that the execution of the solution design progresses in the expected manner, with timely escalation of exceptions for appropriate action.

Roles & Responsibilities

Lack of a clear definition of roles and responsibilities leads to negligence in the deliverables of individuals and teams. As with the project I cited earlier, the business representative who approved the requirements specification did so without being sure of what he was approving. Further reviews and monitoring are very much needed to spot such lapses early on. The culture of the organization should also be such that the teams and team members own the problem and contribute to a solution that meets the expectations of the business.

Review and Monitoring

Depending on the size and strategic importance of the projects, the program and project management teams should keep a watch for any potential impact on the expected outcome, and any exceptions observed should be analysed for possible changes to the solution design and implementation, in turn managing expectations appropriately. This calls for identifying the key goals and outcomes upfront and coming up with appropriate metrics to track.

Another area to keep a watch on is the assumptions, constraints and dependencies identified as part of the project charter. It would be impossible to iron out all ambiguities at the start of the project, and as such it is a prevalent practice to start off by making assumptions about such ambiguities and proceed. However, efforts should be made to validate those assumptions and make them unambiguous as the project progresses. Any change in an assumption might trigger redesigning the solution. The same is the case with the various known constraints and dependencies with which the solution is designed and implemented.


As we can observe, all the stakeholders have a role to play in ensuring alignment with the business need. However, the Solution Architecture function sets it off, and the other teams carry on from there. The Solution Architects hold the key: being in close touch with the implementation teams, making themselves available for clarifications, being receptive to emerging issues around the solution design, and coming up with the needed improvements to the solution designs.

Related Reads:
Solution Architect - Understanding the role

Saturday, January 12, 2013

Characteristics of High Maturity SOA Implementation

Cloud adoption is increasing, and so is SOA implementation. Analysts point out that, be it cloud adoption or SOA adoption, unless an appropriate governance framework is in place ahead of the adoption journey, it is highly likely that the perceived benefits will not be realized. Benchmarking is a good approach to identify the current state of an SOA implementation as compared to an industry-recognized maturity model. This also helps to define a target state and then work towards achieving it. Among the many SOA maturity models, the Open Group's OSIMM v2 is quite popular and defines seven levels. While the first three levels are classified as foundational, Levels 4 through 7 expect certain demonstrated characteristics, with Level 7 indicating the highest level of maturity.

OSIMM v2 provides a base model which is designed to be extended or customized by customers and consulting organizations. OSIMM has seven dimensions across seven maturity levels. The states represented by these levels are given below:

  • Level 1: Silo - No integration implementation.
  • Level 2: Integrated - Technology is in place to integrate the silos, but the integration does not extend to common standards in data or business processes.
  • Level 3: Componentized - IT systems are componentized, but the components are not loosely coupled, thus limiting interoperability.
  • Level 4: Service - Services are identified and implemented, and composite systems are built from the loosely coupled services. However, the composition and the service definition itself are implemented using bespoke code rather than a declarative flow language, thus limiting agility.
  • Level 5: Composite Services - Business processes can be constructed through BPM tools by assembling the relevant interacting services. At this level agility is possible, but developers still have a role to play under the guidance of business analysts.
  • Level 6: Virtualized Services - At this level the services are virtualized, i.e. a level of indirection is introduced (like a facade layer based on dependency injection principles), with which the services are more decoupled from the infrastructure on which they run. But their definition and implementation still require the help of developers.
  • Level 7: Dynamically Re-configurable Services - At this level, composite services are assembled at run time by business analysts, or by the system itself, based on various parameters and using service repositories.
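As a toy illustration of the Level 7 idea (every name and service below is hypothetical), a composite service can be assembled at run time from a declarative list of capability names looked up in a registry, instead of being wired together in bespoke code:

```python
# Hypothetical registry mapping capability names to service implementations.
SERVICE_REGISTRY = {}

def register(capability):
    """Decorator: publish a service implementation under a capability name."""
    def wrap(fn):
        SERVICE_REGISTRY[capability] = fn
        return fn
    return wrap

@register("validate-order")
def validate(order):
    return {**order, "valid": order["qty"] > 0}

@register("price-order")
def price(order):
    return {**order, "total": order["qty"] * order["unit_price"]}

def compose(process):
    """Assemble a composite service at run time from a declarative step list."""
    steps = [SERVICE_REGISTRY[name] for name in process]
    def composite(payload):
        for step in steps:
            payload = step(payload)
        return payload
    return composite

# The "process definition" is pure data, so a business analyst (or the system
# itself) could re-configure it without touching the service implementations.
checkout = compose(["validate-order", "price-order"])
result = checkout({"qty": 2, "unit_price": 5})
```

In a real Level 7 environment the registry would be a governed service repository and the step list a BPM process definition, but the decoupling principle is the same.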

Let us now examine Level 7 in detail across the seven dimensions. Note that, as in any other maturity model, being at a given level means that the system also meets all the previous levels.

Business Dimension:

OSIMM specifies about 19 assessment questions under the business dimension, and the answers to these questions are used to assess the current state. The following are the key characteristics of this dimension in practice:

  • High Agility - Enterprise services are available on demand.
  • Existence of EA - A well-defined EA exists, which includes a formal end-to-end definition of business process flows.
  • Use of BPM - BPM is in use for defining and testing the process flows necessary to meet well-defined SLAs.
  • Integration - Services are integrated across the enterprise and externally with business partners.

Organization & Governance Dimension:

Under this dimension, the focus is on how an organization formally defines and documents its organization and governance processes. OSIMM recommends about 11 questions to gather the information that helps the assessor form an opinion on the current state. At Level 7, the following key characteristics are expected to be demonstrated:

  • Adaptive Enterprise - Ability to adjust operations quickly and smoothly to meet the rapid changes in the market, technology and priorities and being able to evolve with the global economy, so that the services never become obsolete.
  • Aligned with Business Strategy - Services are modelled and managed as elements of the evolving business strategy.
  • Measurement & Monitoring - Appropriate service metrics are identified, defined and automatically gathered and are effectively used in monitoring and in decision making.
  • Governance - SOA governance is part of the organization's culture.

Method Dimension:

This dimension concerns the formal use of an SOA architectural design, construction and deployment methodology in the organization. OSIMM suggests about 10 questions, with which the assessor can gather the information required to map the maturity indicators to the associated maturity attributes. Here are the key characteristics that map to Level 7:

  • Adaptive Enterprise - The design, construction and deployment methodologies evolve in line with the industry standards and best practices.
  • Dynamic Services - Formal methods exist to leverage architectural constructs and assets for supporting virtualization and dynamic services and BPM.
  • Consistent Adoption - Existence of best practice guidance to facilitate consistent adoption of SOA, Virtualization, Middleware technology, Service Registry, etc.

Application Dimension: 

In this dimension, the focus is on the application architecture being designed using SOA principles, the usage of constructs such as loose coupling and separation of concerns, and the usage of related technologies such as XML, web services, service bus, service registry, virtualization, etc. The following key characteristics are required to be assessed at Level 7:

  • Adaptive Enterprise - The enterprise is able to adapt to and use evolving SOA design principles, tools and technologies.
  • Dynamic Application Assembly - The application architecture and design should support the dynamic re-configuration of the services and support consumption of services by external business partners.
  • Design Patterns - Extensive use of SOA / ESB design patterns to support BPM.

Architecture Dimension:

The formal use of SOA methods, principles, patterns, frameworks and techniques in the SOA architecture is evaluated under this dimension. There are 11 questions to help the assessors gather the required information. The key characteristics required to be assessed at Level 7 are the following:

  • Adaptive Enterprise - While the use of formal methods, principles, patterns, frameworks and techniques is necessary, these should also evolve in line with the industry practices in architecting SOA services.
  • Formal Enterprise BIS - Design and implementation of formal enterprise Business Information Services supporting both the enterprise and external entities.
  • Integration - Appropriate architecture is used to support enterprise wide integration and also externally to support partner entities.

Information Dimension:

This dimension focuses on the existence of a formal information architecture that supports a master data model and implements a common business data vocabulary. The assessors can make use of the 13 questions suggested by OSIMM to assess this dimension. The following are the key characteristics required to be assessed at Level 7:

  • Adaptive Enterprise - The information architecture evolves in line with industry standards and practices.
  • Vocabularies - Existence of business data vocabularies, with the ability to easily expand or enhance them to support new services, business partners and process re-configuration.
  • Data definition - Business data is defined using semantic web constructs or ontologies.
  • BIS Model - A formal enterprise Business Information Model has been designed and implemented that includes both enterprise and external relationship entities.
  • Master Data - Existence and use of Master Data services.

Infrastructure & Management Dimension:

This dimension is about the IT infrastructure that supports the non-functional and operational requirements and SLAs needed to operate an SOA environment. There are 12 questions to help the assessors gather the required information. The key characteristics that need to be established in this dimension are:

  • Adaptive Enterprise - The infrastructure and related facilities used to support the SOA environment evolve with industry best practices.
  • Service Management - Changes to the services necessary to optimize quality are tracked and predicted.
  • Service Re-usability - Services are re-used in new and dynamic ways without negatively impacting the quality of service.
  • Security - Service security policies are dynamic and managed in real time.

While the first three levels are classified as foundational, Level 4 is where the existence of minimal SOA practices is expected. In that sense, one should at least start at Level 4 and look forward to transitioning to higher levels. The Open Group's OSIMM also lays out the assessment method and the assessment questions to facilitate adoption.
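To make the roll-up from assessment questions to maturity levels concrete, here is a minimal Python sketch of how an assessor might aggregate per-question maturity ratings into a per-dimension level. OSIMM itself does not prescribe this arithmetic; the conservative min-based roll-up and the sample ratings are illustrative assumptions.

```python
# Hypothetical OSIMM-style assessment roll-up. The min-based
# aggregation (a dimension is only as mature as its weakest
# indicator) is an illustrative assumption, not part of the standard.

OSIMM_LEVELS = [
    "Silo", "Integrated", "Componentized", "Services",
    "Composite Services", "Virtualized Services",
    "Dynamically Re-Configurable Services",
]  # Levels 1-7

def dimension_level(ratings):
    """Roll per-question maturity ratings (1-7) up to a dimension level."""
    return min(ratings)

def assess(dimensions):
    """dimensions: dict of dimension name -> list of ratings (1-7)."""
    return {name: dimension_level(r) for name, r in dimensions.items()}

scores = assess({
    "Business": [5, 4, 6, 4],
    "Organization & Governance": [3, 4, 3],
    "Method": [5, 5, 4],
})
for name, level in scores.items():
    print(f"{name}: Level {level} ({OSIMM_LEVELS[level - 1]})")
```

A real assessment would of course weigh qualitative evidence, not just numeric ratings, but a structure like this makes gaps between dimensions easy to spot.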


References:

OSIMM v2 Technical Standard, The Open Group.

Saturday, December 29, 2012

Resilient Systems - Survivability of Software Systems

Resilience, as we all know, is the ability to withstand tough times. Another term used quite interchangeably is Reliability, but Reliability and Resilience are different. Reliability is about a system or a process that has zero tolerance for failure, i.e. one that should not fail; when we talk about reliable systems, the context is that failure is not acceptable. Resilience, on the other hand, is about the ability to recover from failures. What is important to understand about resilience is that failure is expected and is inherent in any system or process, and might be triggered by changes to the platform, environment and data. While Reliability is about the system's robustness in not failing, Resilience is its ability to sense or detect failures ahead, prevent the events that lead to failure where possible, and, when failure cannot be avoided, allow it to happen and then recover from it sooner.


A working definition for resilience (of a system) developed by the Resilient Systems Working Group (RSWG) is as follows:

“Resilience is the capability of a system with specific characteristics before, during and after a disruption to absorb the disruption, recover to an acceptable level of performance, and sustain that level for an acceptable period of time.“ The following words were clarified:
  • The term capability is preferred over capacity since capacity has a specific meaning in the design principles.
  • The term system is limited to human-made systems containing software, hardware, humans, concepts, and processes. Infrastructures are also systems.
  • The term sustain allows determination of long-term performance to be stated.
  • Characteristics can be static features, such as redundancy, or dynamic features, such as corrective action to be specified.
  • Before, during and after – Allows the three phases of disruption to be considered.
    • Before – Allows anticipation and corrective action to be considered
    • During – How the system survives the impact of the disruption
    • After – How the system recovers from the disruption
  • Disruption is the initiating event of a reduction in performance. A disruption may be either a sudden or a sustained event. A disruption may be either internal (human or software error) or external (earthquake, tsunami, hurricane, or terrorist attack).

Evan Marcus and Hal Stern, in their book Blueprints for High Availability, define a resilient system as one that can take a hit to a critical component and recover and come back for more in a known, bounded, and generally acceptable period of time.


In general, Resilience is a term of concern for Information Security professionals, as the final impact of a disruption (from which a system needs to recover) is mostly on Availability, one of the three tenets of Information Security (CIA - Confidentiality, Integrity and Availability). But system designers and developers, especially those tasked with building mission-critical systems, also have much to be concerned about, and should architect systems to build in the required level of resilience characteristics. For a system to be resilient, it should draw the necessary support from dependent software and hardware components, systems and the platform. For instance, a disruption to a web application can come from a network outage or from security attacks at the network layer, over which the software has no control. It is therefore important to consider software resiliency in relation to the resiliency of the entire system, including the human and operational components.


The PDR (Protect - Detect - React) strategy is no longer as effective as it used to be, due to various factors. It is time that predictive analytics and some of the disruptive technologies like big data and machine learning were considered for enhancing system resiliency. Based on the logs of various inter-connected applications or components and other traffic data on the network, intelligence needs to be built into the system to trigger a combination of possible actions. For instance, if there is reason to suspect an attacker attempting to gain access to the systems, a possible action could be to operate the system in a reduced access mode, i.e. parts of the system may be shut down, or parts of the networks to which the system is exposed could be blocked.


OWASP’s AppSensor project is worth checking out by architects and developers. The AppSensor project defines a conceptual framework and methodology that offers prescriptive guidance for implementing intrusion detection and automated response into an existing application. AppSensor defines over 50 different detection points which can be used to identify a malicious attacker, and provides guidance in the form of possible responses for each such detection.
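As a rough illustration of the detection-point idea (not AppSensor's actual API or catalogue), the Python sketch below maps hypothetical detection points to per-user thresholds and responses; the detection-point names, thresholds and response names are all assumptions.

```python
# Illustrative AppSensor-style detection-point / response mapping.
# The detection-point IDs, thresholds and responses are hypothetical.
from collections import defaultdict

THRESHOLDS = {
    "unexpected_http_method": (3, "log_and_warn"),
    "sql_injection_pattern":  (1, "lock_account"),
    "forced_browsing":        (5, "reduced_access_mode"),
}

_events = defaultdict(int)

def report(user, detection_point):
    """Record a detection-point hit; return a response once the
    per-user threshold for that detection point is reached."""
    _events[(user, detection_point)] += 1
    threshold, response = THRESHOLDS[detection_point]
    if _events[(user, detection_point)] >= threshold:
        return response
    return None

# A single SQL-injection attempt is enough to trigger an account lock:
print(report("mallory", "sql_injection_pattern"))  # lock_account
```

The interesting design point is that responses escalate automatically and inside the application, where there is full business context, rather than only at the network perimeter.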


The following are some of the factors that need to be appropriately addressed to enhance the resilience of the systems:

Complexity - With systems continuously evolving and many discrete systems and components increasingly integrated into today's IT eco-system, complexity is on the rise. This means that the resiliency routines built into individual systems need constant review and revision.

Cloud - Cloud computing is gaining wider acceptance, and as organizations embrace the cloud for their IT needs, data, systems and components are spread across the globe, which brings challenges for those involved in building resilience.

Agility - To stay on top of the competition, organizations need agility in their business processes, which means rapid changes to the underlying systems. This can be a challenge, as it calls for constant checks to ensure that the changes being introduced do not downgrade or compromise the resiliency level of the systems.


While there are techniques and guiding principles which, when followed and applied, can greatly improve the resilience of systems, such design and implementation come at a price, and that is where the economics of resiliency needs to be considered. For instance, mission-critical software systems like the ones used in medical devices need to have high resilience characteristics, but many business systems can have a higher tolerance for failure and can thereby be less resilient. Either way, it is good to document the expected resilience level at the initial stage and work on it early in the system development life cycle; thinking about resilience later in the life cycle may not do much good, as implementation will then call for higher investment.



References:

Crosstalk - The journal of Defense Software Engineering Vol 22 No:6

Resilient Systems Working Group

OWASP - AppSensor Project

Saturday, December 15, 2012

Effective vs Ineffective Security Governance

Continuing with my earlier blog on Measuring the Performance of EA, I was looking for methods and measures that can be used for measuring the effectiveness of the security program in an enterprise. I happened to read a CERT article titled Characteristics of Effective Security Governance, which contains a good comparison of what is effective and what is ineffective. I have reproduced it here in this blog for quick reference. The original CERT article, though dated, is worth reading.

Effective: Board members understand that information security is critical to the organization and demand to be updated quarterly on security performance and breaches. The board establishes a board risk committee (BRC) that understands security's role in achieving compliance with applicable laws and regulations, and in mitigating organization risk. The BRC conducts regular reviews of the ESP. The board's audit committee (BAC) ensures that annual internal and external audits of the security program are conducted and reported.

Ineffective or Absent: Board members do not understand that information security is in their realm of responsibility, and focus solely on corporate governance and profits. Security is addressed ad hoc, if at all. Reviews are conducted following a major incident, if at all. The BAC defers to internal and external auditors on the need for reviews. There is no audit plan to guide this selection.

Effective: The BRC and executive management team set an acceptable risk level. This is based on comprehensive and periodic risk assessments that take into account reasonably foreseeable internal and external security risks and magnitude of harm. The resulting risk management plan is aligned with the entity's strategic goals, forming the basis for the company's security policies and program.

Ineffective or Absent: The CISO locates boilerplate security policies, inserts the organization's name, and has the CEO sign them. If a documented security plan exists, it does not map to the organization's risk management or strategic plan, and does not capture security requirements for systems and other digital assets.

Effective: A cross-organizational security team comprised of senior management, general counsel, CFO, CIO, CSO and/or CRO, CPO, HR, internal communication/public relations, and procurement personnel meets regularly to discuss the effectiveness of the security program and new issues, and to coordinate the resolution of problems.

Ineffective or Absent: The CEO, CFO, general counsel, HR, procurement personnel, and business unit managers view information security as the responsibility of the CIO, CISO, and IT department and do not get involved. The CSO handles physical and personnel security and rarely interacts with the CISO. The general counsel rarely communicates particular compliance requirements or contractual security provisions to managers and technical staff, or communicates on an ad-hoc basis.

Effective: The CSO/CRO reports to the COO or CEO of the organization with a clear delineation of responsibilities and rights separate from the CIO. Operational policies and procedures enforce segregation of duties (SOD) and provide checks and balances and audit trails against abuses.

Ineffective or Absent: The CISO reports to the CIO. The CISO is responsible for all activities associated with system and information ownership. The CRO does not interact with the CISO or consider security to be a key risk for the organization.

Effective: Risks (including security) inherent at critical steps and decision points throughout business processes are documented and regularly reviewed. Executive management holds business leaders responsible for carrying out risk management activities (including security) for their specific business units. Business leaders accept the risks for their systems and authorize or deny their operation.

Ineffective or Absent: All security activity takes place within the security department, thus security works within a silo and is not integrated throughout the organization. Business leaders are not aware of the risks associated with their systems or take no responsibility for their security.

Effective: Critical systems and digital assets are documented and have designated owners and defined security requirements.

Ineffective or Absent: Systems and digital assets are not documented and not analyzed for potential security risks that can affect operations, productivity, and profitability. System and asset ownership are not clearly established.

Effective: There are documented policies and procedures for change management at both the operational and technical levels, with appropriate segregation of duties. There is zero tolerance for unauthorized changes, with identified consequences if these are intentional.

Ineffective or Absent: The change management process is absent or ineffective. It is not documented or controlled. The CIO (instead of the CISO) ensures that all necessary changes are made to security controls. In effect, SOD is absent.

Effective: Employees are held accountable for complying with security policies and procedures. This includes reporting any malicious security breaches, intentional compromises, or suspected internal violations of policies and procedures.

Ineffective or Absent: Policies and procedures are developed but no enforcement or accountability practices are envisioned or deployed. Monitoring of employees and checks on controls are not routinely performed.

Effective: The ESP implements sound, proven security practices and standards necessary to support business operations.

Ineffective or Absent: No or minimal security standards and sound practices are implemented. Using these is not viewed as a business imperative.

Effective: Security products, tools, managed services, and consultants are purchased and deployed in a consistent and informed manner, using an established, documented process. They are periodically reviewed to ensure they continue to meet security requirements and are cost effective.

Ineffective or Absent: Security products, tools, managed services, and consultants are purchased and deployed without any real research or performance metrics to be able to determine their ROI or effectiveness. The organization has a false sense of security because it is using products, tools, managed services, and consultants.

Effective: The organization reviews its enterprise security program, security processes, and security's role in business processes. The goal of the ESP is continuous improvement.

Ineffective or Absent: The organization does not have an enterprise security program and does not analyze its security processes for improvement. The organization addresses security in an ad-hoc fashion, responding to the latest threat or attack, often repeating the same mistakes.

Effective: Independent audits are conducted by the BAC and independent reviews by the BRC. Results are discussed with leaders and the board. Corrective actions are taken in a timely manner, and reviewed.

Ineffective or Absent: Audits and reviews are conducted after major security incidents, if at all.


The article also lists eleven characteristics of effective security governance, in addition to the ten challenges to implementing effective security governance. I would highly recommend reading the full article.


References:
CERT’s resources on Governing for Enterprise Security


CERT and CERT Coordination Center are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University

Thursday, December 13, 2012

Implementing IT Balanced Scorecard

Source: ISACA, Board Briefing on IT Governance 2nd edition

With IT increasingly becoming an enabler of business, more and more organizations are looking for effective and efficient management of IT, so that the investment in IT fetches optimum value. Along the same lines, the need for better IT Governance is being felt by the boards of an increasing number of organizations. One of the key domains of IT Governance is Performance Measurement. Going by "what is not measured cannot be managed", there need to be plans and processes in place for measuring the performance of IT so that it can be better governed.

Much of the value returned by IT is intangible. While it is easy to measure the tangible benefits, measuring intangible benefits is difficult. The Balanced Scorecard (BSC), which evolved in the early 1990s, has become a very useful tool for measuring both tangible and intangible benefits, segmented into four perspectives - Financial, Customer, Internal Process and Learning. The IT BSC, derived from the business scorecard, has been found to be a very effective measurement system for addressing the concerns of reporting intangible benefits to the board.

The Balanced Scorecard, as it has evolved over a period of time, is being looked at not just as a performance measurement tool, but as a strategic planning and management system. This is because Balanced Scorecards can be cascaded down to smaller business units, including IT, and aggregated upwards to higher levels. The IT BSC, which is cascaded from the business scorecard, can be further subdivided into one for each of the technology domains, for instance one for managing IT Operations and another for IT Development. While doing so, it is important to maintain the linkages between each such cascaded scorecard; this way the Balanced Scorecard can facilitate strategy mapping, thereby improving the alignment of the objectives of the smaller business and IT units with the business strategy.


The perspective of the IT BSC may be redefined to better represent the IT organization. For instance, the following four perspectives may be used in IT BSC:

  • Corporate Contribution - Equivalent to the Finance perspective of the Balanced Scorecard, this represents the view of business executives on the IT department. 
  • Customer Orientation - Equivalent to the Customer perspective of the Balanced Scorecard, this represents the view of the end users on the IT department. 
  • Operational Excellence - Equivalent to Process perspective of the Balanced Scorecard, this represents the effectiveness and efficiency of various standards, processes and policies followed by the IT department. 
  • Future Orientation - Equivalent to the Learning and Growth perspective of the Balanced Scorecard, this represents a view of how well IT is prepared to meet the future needs of the business.

To be effective, the following three principles need to be built into the balanced scorecards:

  • Cause-and-effect relationships - The identified performance measures should have cause-and-effect relationships amongst them; for instance, a measure on improved developer skills (Future Orientation perspective) as a cause will result in improved quality of the applications delivered (Operational Excellence perspective), which in turn should contribute to user satisfaction (Customer Orientation perspective).
  • Sufficient performance drivers - While it is common to measure all the possible outcomes (measuring what you have done), it is also important to identify and include sufficient performance drivers (how you are doing). A good mix of both outcome measures and performance drivers is essential for the scorecard to be effective.
  • Linkage to financial measures - Since the IT scorecard is cascaded from the enterprise business scorecard, the measures in the IT scorecard should link up to corresponding measures in the top-level business scorecard.
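Once the cause-and-effect links are captured explicitly, the linkage principle can be checked mechanically. The Python sketch below traces each measure's downstream effects so you can verify that every measure ultimately influences a top-level one; the measure names are illustrative assumptions, not a prescribed catalogue.

```python
# Cause-and-effect links between illustrative scorecard measures,
# roughly one per perspective; the names are assumptions for the sketch.
CAUSE_EFFECT = {
    "developer_skills":       ["application_quality"],    # Future Orientation
    "application_quality":    ["user_satisfaction"],      # Operational Excellence
    "user_satisfaction":      ["corporate_contribution"], # Customer Orientation
    "corporate_contribution": [],                         # top-level measure
}

def effects_of(measure):
    """All measures transitively influenced by `measure`."""
    seen, stack = set(), [measure]
    while stack:
        for nxt in CAUSE_EFFECT.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Every non-top-level measure should reach the top-level measure:
for m in CAUSE_EFFECT:
    assert m == "corporate_contribution" or "corporate_contribution" in effects_of(m)
```

Modelling the strategy map as a graph also makes orphan measures (ones that influence nothing) easy to detect during scorecard design.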

To have the Balanced Scorecard implemented as part of the IT Governance initiative, the following steps are recommended:

  • Obtain commitment - Make a presentation to the board and executives explaining the concepts, benefits and cost of implementing it and get a commitment to go ahead. 
  • Kick-off - Kick off the Balanced Scorecard initiative as a project and as part of this activity, train the staff and identify the project team members. 
  • Strategy map - Get an understanding of the corporate business strategy and the sub unit level strategies and then establish a strategy map. 
  • Metrics selection - Understand the existing metrics if any and identify the required metrics, which should be a good mix of both outcome measures and performance drivers 
  • Metrics definition - With respect to each identified metric, create a standard definition, related processes to collect and manage the data. As part of this, the cause and effect relationships should also be clarified and the linkage with higher level scorecards should also be established. 
  • Assign ownership - Assign owners for each metric. 
  • Define Targets - With respect to each metric, set targets (may be a range) for the function heads to achieve and devise strategic initiatives to achieve these targets. 
  • Act on the results - Have the appropriate executive management or board as may be required to review the resulting measures and then act on the results. 
  • Review periodically - The metric definitions, the linkages and the cause-effect relationships may require revision based on experience, and this is achieved through periodic reviews.

Successful execution of strategy requires the successful alignment of four components: the strategy, the organization, the employees and the management systems. As Kaplan and Norton put it, “Strategy execution is not a matter of luck. It is the result of conscious attention, combining both leadership and management processes to describe and measure the strategy, to align internal and external organizational units with the strategy, to align employees with the strategy through intrinsic and extrinsic motivation and targeted competency development programs and finally, to align existing management processes, reports and review meetings, with the execution, monitoring and adapting of the strategy.”

Sunday, December 9, 2012

Measuring Performance of Enterprise Architecture

Enterprise Architecture is viewed as an important function in IT Governance and plays a vital role in aligning IT with the business. This function is expected to define the technical direction and to ensure the application of architectural principles to the design and maintenance of IT systems, which in turn should be in alignment with the business vision, mission and strategies and with the role that IT within the organization is expected to play. While commitment and funding support is important, it is also important that the EA function consider the following (not exhaustive) to be successful:

  • Alignment with the business strategy and the culture of the organization.
  • Active involvement in projects to ensure that the principles of design and evolution are adhered to and that the focus remains on the business requirements.
  • Offering technical consultancy for all the business and IT functions, both internal and external.
  • Acting as a gate for all decisions impacting the design and evolution of the architecture.

IT Governance views EA as the hub of the IT wheel, with linkages to various processes, components and goals of the enterprise. Some of the key enabling links are:

  • Promoting and enabling Business Agility
  • Providing standards, policies and principles for the IT Project, Program and Portfolio management function
  • Guides and enables cost management and consolidation
  • Facilitates cost-effective, scalable integration of various IT systems
  • Supports IT Governance by defining / providing the conceptual and technical priorities and thereby promoting informed decision making

Going by the premise that what is not measured does not get managed, it is important to identify measurable objectives for the architecture function itself, so that it is well managed and its contribution to the success of the organization is established. While the measurement of architectural activities is difficult, COBIT suggests a set of measurable outcomes and performance measures, some of which are the following:

Number of technology solutions that are not aligned with the business strategy - One of the objectives of the EA function should be to ensure that the technology solutions chosen or implemented are aligned with the business strategy; a measure around this could be very useful in establishing that the number of misaligned solutions is on the decline. Other measures could be derived around this, for instance representing it as a relative measure against the total number of solutions.

Percent of non-compliant technology projects and platforms planned - With the fast-changing business environment, there will be times when the business will need technology projects or solutions that are not compliant with the standards and principles laid out by EA. EA has the responsibility to carefully review such needs and grant waivers. Such waivers should be for a shorter term and should be backed by a plan to normalize them. At times, this could call for a revision of the standards, policies or principles. A measure around this could be very useful in establishing that EA is effective in dealing with non-compliant technology projects and platforms.

Decreased number of technology platforms to maintain - Standardization is one of the objectives of EA, and it can contribute to cost reductions and reduce technical complexity. Statistics and surveys show that enterprises without an active IT Governance / EA function tend to have multiple applications, requiring different platforms, serving the same business requirement in different departments. With an effective EA function, these should be few in number and should decline over a period. A measure around this is a very good indicator of EA being effective in this area.

Reduced application deployment effort and time-to-market - Supporting business agility is yet another key objective of the EA function. Today's businesses operate in a highly dynamic industry environment and, in order to stay competitive and sustain their market position, need support from IT to deliver new or changed capabilities with a reduced time-to-market. A delayed delivery from IT could mean an opportunity lost. A measure around this indicator would be really helpful in establishing how IT is supporting business changes.

Increased interoperability between systems and applications - It is quite common for enterprises to have multiple applications for specific needs, with a need for these applications to share data and information with each other. With cloud computing gaining wider acceptance, many enterprises are looking at discrete cloud-based solutions. With the benefits outweighing the concerns and constraints, and with the industry working towards addressing those concerns, there will be an increased focus on the move to the cloud. This will mean that hosted applications from various providers need to be interoperable and to work with other in-house applications. EA should ensure that the technology and solutions acquired or designed support this important attribute, i.e. interoperability. A measure around this parameter would be an important indicator of the EA function's effectiveness.

Percent of IT budget assigned to technology infrastructure and research - Yet another expectation from the EA function is that it should help the business leverage emerging technology to its advantage. This requires the architects to be continuously looking for newer technologies and their application areas, and to recommend such technologies or solutions for implementation so that the business gets the most out of IT in accomplishing its mission. It is also important that the extent of this activity be in line with the identified and stated role of IT in the organization. While the percentage of the IT budget used for research is a useful measure, other useful measures can be derived from it, for instance the number of research solutions implemented as a percentage of the total number of solutions implemented in a given period. Business satisfaction with the timely identification and analysis of technology opportunities is another related measure indicative of the outcome of this research.

Number of months since the last technology infrastructure review - With the fast-changing IT space, it is important to ensure that the technology infrastructure remains relevant to meeting the business objectives, and where needed, changes should be considered. A measure indicating that the EA function performs periodic reviews of the technology infrastructure is a good indicator of its effectiveness. Measures derived from the outcomes of these reviews will also be very useful.

There is no one size fits all in IT, and as such the measures indicated above need to be tailored to suit the organization and the role of IT within the organization.
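As a toy illustration of turning two of the indicators above into numbers, here is a small Python sketch; the record fields and sample portfolio data are assumptions for the example, not part of COBIT.

```python
# Toy computation of two EA measures discussed above; the fields
# and sample portfolio data are illustrative assumptions.
portfolio = [
    {"name": "CRM revamp",  "aligned": True,  "compliant": True},
    {"name": "Ad-hoc tool", "aligned": False, "compliant": False},
    {"name": "Data lake",   "aligned": True,  "compliant": False},
    {"name": "HR portal",   "aligned": True,  "compliant": True},
]

def misaligned_count(projects):
    """Number of technology solutions not aligned with the strategy."""
    return sum(1 for p in projects if not p["aligned"])

def non_compliant_pct(projects):
    """Percent of projects not compliant with EA standards."""
    return 100.0 * sum(1 for p in projects if not p["compliant"]) / len(projects)

print("Misaligned solutions:", misaligned_count(portfolio))            # 1
print(f"Non-compliant projects: {non_compliant_pct(portfolio):.0f}%")  # 50%
```

Tracked quarter over quarter, the trend in such numbers matters more than any single snapshot.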

References:

COBIT - an IT Governance framework from ISACA.

Developing a successful governance strategy - A best practice guide for decision makers in IT from The National Computing Center, UK

Monday, November 12, 2012

PPM - Project Categorization

PPM is about 'doing the right things right', where 'things' refers to work efforts, projects and programs, 'doing the right things' refers to prioritizing and selecting projects and programs in line with the strategy and objectives of the organization, and 'doing things right' refers to executing projects and programs in such a way that the benefits are realized.

Portfolios, programs and projects are all formal expressions of resource assignment decisions. Accordingly, a systematic approach to collecting, selecting, planning and managing them is key to a high-quality governance process. Projects, programs and portfolios share a common life cycle formed around four key stage gates: Create, Select, Plan and Manage. In the context of a portfolio, the first and an important task is to determine the investment types or categories with which projects or programs are categorized. The purpose of this blog is to examine the need for, and methods of, project categorization.
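The four stage gates above form a simple sequential life cycle, which can be sketched as follows (a toy illustration, not part of any PPM standard):

```python
# Illustrative sketch: the common life cycle of portfolios, programs and
# projects as four sequential stage gates.

from enum import Enum
from typing import Optional

class StageGate(Enum):
    CREATE = 1
    SELECT = 2
    PLAN = 3
    MANAGE = 4

def next_gate(current: StageGate) -> Optional[StageGate]:
    """Advance to the next stage gate; returns None once in Manage."""
    order = list(StageGate)
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None

print(next_gate(StageGate.CREATE))  # StageGate.SELECT
print(next_gate(StageGate.MANAGE))  # None
```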

Purpose of Categorization

Broadly, the following are the purposes for which organizations find categorization useful.

Strategic alignment: Certain projects share common characteristics that warrant a common approach to prioritization, tracking and monitoring, strategic visibility, etc. Appropriate categorization of projects offers better visibility into strategy execution and benefits realization.

Capability specialization: Another reason organizations would be interested in categorizing projects is to identify and group similar projects so that they share common tools, technology and terminology, which helps in better management of resources and communication across projects.

Promote project approach: Though this purpose is of a minor nature, it helps the organization promote a project culture, differentiate operations from projects, and use a common methodology to manage such projects.

The following mind map presents the organizational purposes around categorization (Primary Purpose → Sub Purpose → Benefits):

Strategic Alignment
- Selecting / prioritizing of projects / programs
  - Aligning commitments with capabilities
  - Managing risk / controlling exposure
  - Allocating budget
  - Balancing the portfolio
  - Identifying the approval process
- Planning, tracking and reporting of
  - Resource usage
  - Performance, results, value
  - Investments
  - Comparability across projects, divisions, organizations
- Creating strategic visibility
  - Visibility across projects, divisions, organizations

Capability Specialization
- Capability alignment
  - Choosing risk mitigation strategy
  - Choosing contract type
  - Choosing project organization structure
  - Choosing methods and tools
  - Matching skill sets to projects
  - Allocating projects to organizational units
  - Setting price
  - Enhancing credibility with clients
- Capability development
  - Developing methods and tools
  - Managing knowledge
  - Developing human resources
  - Adapting to market / customer / client

Promoting a Project Approach
- Providing a common language
  - Facilitating better communication
- Distinguishing projects from operations
  - Better managing the work efforts


Categorization Schemes

None of the standards or frameworks specifies a particular categorization scheme; it is left to organizations to decide on the scheme that best suits their needs. It has been found that most organizations use multi-dimensional, composite, attribute-based categorization schemes.

Here again the following three broad approaches are in wide use:

Hierarchical scheme: In this scheme, projects are hierarchically grouped based on multiple attributes. For instance, at the top level, projects may be categorized as small, medium and large based on the estimated investment, and then further categorized into application areas like Infrastructure, Enterprise Applications, Field Applications, etc. There could be further levels of categorization as well. Under this scheme, each project falls under exactly one category at each level.

Parallel scheme: Unlike in the hierarchical scheme, in this case, projects are assigned several sets of attributes. For instance, the projects may be categorized by complexity, technology and strategic importance and a project may find a place in all three categories.

Composite scheme: In this scheme, the categories are determined based on the result of applying more than one attribute. For instance, the category project management complexity may be determined by combining multiple attributes like team size, number of units or modules to be delivered, development efforts, duration, etc. Similarly a category deployment complexity may be based on attributes like process impact, end user scope of impact, project profile, project motivation, etc.
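A composite scheme can be sketched in code as a function that derives one category from several attributes. The attribute names, thresholds and scoring below are entirely hypothetical, chosen only to illustrate the idea:

```python
# Illustrative sketch: a composite categorization in which the
# "project management complexity" category is derived by scoring
# several attributes of a project and combining the scores.

def pm_complexity(team_size: int, modules: int, effort_person_months: float) -> str:
    """Combine multiple attributes into one composite category."""
    score = 0
    score += 2 if team_size > 20 else (1 if team_size > 8 else 0)
    score += 2 if modules > 10 else (1 if modules > 3 else 0)
    score += 2 if effort_person_months > 60 else (1 if effort_person_months > 12 else 0)
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

print(pm_complexity(team_size=25, modules=12, effort_person_months=80))  # high
print(pm_complexity(team_size=5, modules=2, effort_person_months=6))     # low
```

A hierarchical scheme would instead nest such categories (e.g. size, then application area), and a parallel scheme would keep each attribute as an independent categorization of the same project.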

Conclusion

There is no single scheme or approach that best suits every organization. It is for the people involved in IT / business governance to carefully choose a categorization based on the organization’s vision and strategy, and then create a process to consistently determine the appropriate category for each project. This, in turn, allows the relevant strategic and tactical tools and methodologies to be applied to the project, ensuring realization of the intended benefits with the desired level of efficiency and effectiveness.

References:

Project Portfolio Management: Doing the Right Things Right