Saturday, May 18, 2013

Why Software Product Delivery is not Identical to Car Delivery

I happened to sit in on a project review meeting where the client raised a question on a software delivery, expecting it to be used in production on the very day the vendor delivered it. He went on to compare it with a customer driving off in a car upon taking delivery. I know all the project managers out there will jump in to say don't compare apples with oranges. On the one hand, yes, software development is unique and cannot be compared with the production of a tangible product. On the other hand, various standards organizations are attempting to help the industry achieve high-maturity process capability and thus deliver production-quality software consistently.

Software product vendors selling or licensing software products can, however, be compared to car manufacturers. Take, for example, the Microsoft Office product suite: one can buy it off the shelf and use it right away, just like driving off in a car. This is possible because product vendors, over a period, acquire an enormous amount of knowledge of the targeted product domain and subject the product to many cycles of testing before announcing its readiness.

A typical car takes a few years from concept to production, typically in the order of three to six years. During this period, the product undergoes various cycles of tests, including crash tests, drivability on the road conditions of the target market, and so on. Similarly, software product vendors conceptualize the product idea and then work on it over a period. Vendors are attempting to achieve a faster time to market, but by adopting newer product development methodologies. The one that works best is to identify the smallest piece of the software that can meet certain specific use cases and then build on top of it over time.

In the case of a software product, it is the vendor's responsibility to put the product through various tests in production-like environments before getting it out to customers. These include alpha and beta tests, wherein interested end users are engaged to use it in production environments and provide feedback on various aspects, while the developers work to have all the critical issues addressed. Achieving a zero-defect state may not be possible, and vendors take a balanced approach, deciding whether to wait until all identified issues are addressed or to reach the market with certain known issues that can be addressed in subsequent product releases.

But it is a different story when it comes to bespoke software projects. The major challenges that differentiate bespoke software projects from software products include:


  1. It is the client who specifies the requirements. Though the vendor might facilitate documenting the requirements, it is the customer who formally agrees as to what needs to be produced.
  2. Lack of understanding of and agreement on the non-functional requirements, which in most cases are not documented.
  3. The vendor might not be an expert in the target domain. Even when the vendor has domain expertise, it is the customer's need that is final, not the vendor's understanding of the requirement.
  4. Ever-changing requirements. By the time the vendor delivers the software, the requirements as agreed by the client may have changed for various reasons.
  5. It is difficult to understand the requirements unambiguously as the project progresses through various phases involving people with varying abilities.
  6. There are dependencies on various pre-existing or emerging software and hardware in the client's production environment.
  7. The client has roles and responsibilities to assume to make the project successful. However, it is for the vendor to ensure that the client understands those responsibilities throughout the project life cycle.


Another point to consider in this discussion is what constitutes delivery. A well-written SoW (Statement of Work) clearly lists the acceptance criteria which, when met, constitute acceptance of the delivery by the client. For a given set of requirements, no two vendors would build an identical solution. That is due to the tools and technologies available and the varying abilities of those building the software. What matters is what the client wants, not how the vendor delivers the solution. For this reason, the client should assume the responsibility of performing user acceptance tests. Sometimes, the delivery of software might mean implementation in the production environment, which might involve data migration and appropriate configuration of various pre-existing software and / or hardware.

In the end, it is all about managing the expectations of the client. Expectations are not just set at the start; they need to be managed appropriately throughout the project life cycle.

Sunday, May 12, 2013

Application Integration - Business & IT Drivers

Design patterns are finding increased use for their apparent benefits: when there is a solution design for a commonly recognized problem, why reinvent it, instead of using the design that is expected to address the problem? However, it is not one size fits all, as a combination of factors may influence the choice of design and architecture. Enterprise Application Integration is a complex undertaking, and it requires a thorough understanding of the applications being integrated and of the various business and IT drivers that necessitate the integration.

There are numerous approaches, methods and tools to design and implement application integration, and choosing the right combination of design, tools and methods to accomplish the business need is always a challenge. Selecting the right design requires good knowledge of the problem at hand and of the other attributes that drive the solution. In this blog, let us try to identify the various business and IT drivers that architects need to understand well before choosing a design, method, tool or approach.

It is also worth noting that these drivers should be considered collectively. While a particular tool or design might seem to solve the specific need, it might require certain other needs to be compromised. Thus it is a mix of art and science that architects need to apply.

Business Drivers:


  • Business Process Efficiency - The level of efficiency that the business wants to achieve through the integration is a basic need that should be considered. Being a basic need, there is a tendency amongst architects to neglect documenting it, leaving a chance of failure in this area. Knowing it is also useful in validating the design through the implementation phases.
  • Latency of Business Events - Certain business events are time sensitive and need to be propagated to the target systems within a time interval. Achieving a low-latency integration may require tooling the components at a much lower layer of the computing stack.
  • Information / Data Flow Direction - Whether the data or information should flow one way, or whether it should be a request / response combination, has an influence on the choice of design.
  • Routing / Distribution - The information or data flow might be just a broadcast, or in some cases there has to be a response for each request. Similarly, the target applications may be one or many, possibly determined dynamically by certain data combinations. These routing and brokering rules have to be identified to orchestrate the data flow appropriately (see the routing sketch after this list).
  • Level of Automation - End-to-end automation might require changes to the source and / or target applications. Alternatively, it might be best to leave things semi-automatic, so that the existing systems and related processes can remain unchanged. An example is the participation of a legacy system that is likely to be sunset in the near term, in which case changes to the legacy system are not preferred.
  • Inter-Process Dependencies - Dependencies between processes determine whether to use serial or parallel processing. It is important to identify and understand this need, so that processes that can run in parallel are identified and designed accordingly, so as to achieve efficiency.
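
To make the routing driver concrete, here is a minimal content-based router sketch in C#. The message shape, routing keys and target names are illustrative assumptions, not any specific product's API:

```csharp
using System;
using System.Collections.Generic;

// A message with a routing attribute and a payload.
record Message(string OrderType, string Payload);

class ContentBasedRouter
{
    // Routing rules: message attribute -> one or many target endpoints.
    private static readonly Dictionary<string, string[]> Routes = new()
    {
        ["retail"]    = new[] { "RetailOrderSystem" },
        ["wholesale"] = new[] { "WholesaleOrderSystem", "CreditCheckService" },
    };

    public static void Route(Message msg)
    {
        // Inspect the message content and broadcast to the matching targets.
        if (Routes.TryGetValue(msg.OrderType, out var targets))
            foreach (var target in targets)
                Console.WriteLine($"Dispatching to {target}: {msg.Payload}");
        else
            Console.WriteLine("No route found: sending to dead-letter channel");
    }

    static void Main() => Route(new Message("wholesale", "order #4711"));
}
```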


IT Drivers:


  • Budget / TCO - Any enterprise initiative is budget-driven, and the return on investment must be established to get the consent of the project governance body (or steering committee). The choice of tools, design and approach should be made considering the allocated budget and aiming to achieve a lower TCO (Total Cost of Ownership).
  • Technology and Skills - It is also important that the design and architecture consider the technology in use in the organization and the availability of skilled resources to build, as well as maintain, the integration implementation. Application integration solutions require continuous maintenance, as the source or target systems, or even the business processes, are likely to change.
  • Legacy Investment - All enterprises have investments in legacy systems, as technology obsolescence happens faster than planned. It would be prudent for enterprises to explore opportunities to get returns out of such legacy investments where possible. Architects should consider this aspect while designing integration solutions and thus facilitate a planned sunset of legacy systems over a period.
  • Application Layers - Integration can be achieved at various layers, viz. the data layer, application layer, UI layer, etc. The choice of integration layer depends on the various business drivers, and an appropriate choice needs to be made to achieve the desired efficiency, latency and other needs.
  • Application Complexity - An enterprise is likely to have a portfolio of applications, within and outside the organization, to integrate with. The complexity of these applications has a direct influence on the design and architecture of the integration solution. A thorough study and evaluation of the applications is a must to come up with a good integration solution.

The above is not an exhaustive list, and there are various other business and IT drivers that need consideration. In addition, quality attributes like security, availability, performance, maintainability and extensibility need consideration in choosing the right design and architecture for an application integration solution. I will write another blog on Quality of Service (QoS) considerations for application integration solutions.

Saturday, April 13, 2013

A Framework for Cloud Strategy and Planning

Cloud technologies are here to stay, but not without concerns. The concerns differ from organization to organization, and there is no 'one size fits all' solution to address them. It is for the enterprise to come up with a cloud strategy and carefully weigh the pros and cons of cloud adoption, with due consideration to the competing constraints. When not done well, this could have a telling impact on the business.

A model-based or framework-based approach to coming up with the cloud strategy can help maximize the value created by cloud adoption. With their expertise in the standards and frameworks and in the internal business capabilities, Enterprise Architects are in the best position to take up this mantle. Though there are various standards and frameworks that can be used to strategize and plan cloud adoption, Mike Walker in his article suggests a simple three-step framework for Cloud Strategy and Planning, which he adapts from TOGAF and other frameworks. The following diagram visualizes the mapping of the TOGAF methods to the Cloud Strategy Planning Framework.

[Diagram: TOGAF methods mapped to the three-step Cloud Strategy Planning Framework]

The following are the seven activities that form part of the three-step Cloud Strategy Planning Framework:

Strategy Rationalization:

1. Establish Scope and Approach - This activity is all about educating the stakeholders about cloud computing, envisioning with them, and then coming up with the business model for cloud computing.

2. Set Strategic Vision - This is about coming up with a strategic vision for cloud computing in the enterprise. For this purpose, due consideration should be given to the IT and business strategic objectives of the organization, the cloud computing patterns and technologies, and feasibility and readiness.

3. Identify & Prioritize Capabilities - This is an important activity, as it lays down the criteria for evaluating capabilities in the form of IT and business value drivers. The outcome of this activity is the set of key capabilities that are subject to further analysis.


Cloud Valuation:

4. Profile Capabilities - This again is a critical activity, in which the current maturity levels of the capabilities are determined and then risks and other constraints like architectural fit, readiness and feasibility are assessed.

5. Recommend Deployment Patterns - Based on the capability profiling, come up with recommended cloud services and deployment patterns based on architectural fit, value and risk. Of course, this calls for due consideration of proven practices and industry trends.


Business Transformation Planning:

6. Define and Prioritize Opportunities - This activity is more like coming up with a business case for a cloud transformation opportunity, and yes, it should include the perceived benefits, expected risks, technology impacts and a detailed architecture.

7. Define Business Transformation Roadmap - This activity is about performing an assessment of implementation risks and dependencies and developing a transformation roadmap, which should be validated with the business users.

In line with the above framework, Mike Walker suggests a top-down model, as below:

[Diagram: Mike Walker's top-down Cloud Strategy and Planning model]

You may wish to check out the original article here.

Sunday, April 7, 2013

NGINX - A High Performance HTTP Server

What you will find about NGINX from its website is that it is a free, open source, lightweight HTTP web server and reverse proxy. NGINX doesn't rely on threads to handle requests; instead it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly predictable, amounts of memory under load. We have found that it holds up to what it says. Here is how we ended up using NGINX for one of our clients and found it meeting our expectations.


We were on a performance testing assignment for a game application, where much of the game content was static Flash files (dynamic in a sense, as they had embedded ActionScript for execution on the front end) and image files, quite many of which were above 500 KB in size. Needless to mention, this is the compressed size, so enabling compression did not offer any performance gains. These files were served by Apache, which also served the dynamic PHP requests.


Our initial performance tests indicated that at about 1000 concurrent hits, the Apache server's memory consumption shot up considerably, beyond 2 GB, and quite many requests were timing out. Examination of the server configuration revealed that the server was configured to serve 256 child processes and was using the 'prefork' MPM module, which limits each child process to a single thread. This technically limits the number of concurrent requests the Apache server can serve to 256.


We also figured out that Apache needs to use the 'prefork' module in order to safely serve PHP requests. Though later versions of PHP are reported to work with the other MPM module ('worker', which can use multiple threads per child process), many still have concerns around thread safety. The 256 child process limit is also a hard limit, and increasing it requires rebuilding the Apache server.
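
For reference, the prefork settings involved look roughly like the following (in httpd.conf); the values here are illustrative, not the client's actual configuration:

```
<IfModule mpm_prefork_module>
    # MaxClients caps concurrent requests; with prefork, one thread per child
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients          256
    MaxRequestsPerChild 4000
</IfModule>
```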


With this diagnosis, the choice we had was to have the static content served by a server other than Apache and leave Apache to serve just the PHP requests. We thought of trying out NGINX and, without wasting much time, went ahead and implemented it. NGINX was configured to listen on port 80 and serve content from Apache's document root. Apache was configured to listen on a non-standard port, and NGINX was also configured to act as a reverse proxy for all PHP requests, getting them processed by Apache.
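
A minimal sketch of that NGINX server block is below; the document root and the Apache port are illustrative assumptions, not the actual values we used:

```
server {
    listen 80;
    root /var/www/html;               # Apache's document root

    # Serve the static Flash and image files directly from disk
    location / {
        try_files $uri $uri/ =404;
    }

    # Reverse-proxy PHP requests to Apache on its non-standard port
    location ~ \.php$ {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```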


We performed the tests again and found tremendous improvement. The average latency at the 1000-concurrent-hit stress level came down from about 80 seconds to under 5 seconds. We also observed that NGINX was consuming just around 10 MB of memory. NGINX by itself does not impose any limit on the number of concurrent connections; it is constrained only by the limits of the server itself and the other daemons running on it.

Sunday, March 31, 2013

Software Complexity - an IT Risk perspective

IT risk management predominantly focuses on IT security, though there are other categories of IT risk that are less known. Software complexity is one such important but less-recognized IT risk. Basically, software complexity is one of many non-functional quality attributes, and in most cases it is ignored completely, leading to the associated risks occurring and thus to financial and non-financial impacts on the business. In this blog, let us try to understand software complexity from an IT risk perspective, and shed some light on whose baby it is and how to manage or control it.


What is Software Complexity?


Let us first understand what software complexity is. The simple definition of complexity is how hard a piece of software is to understand, use and / or verify. Complexity is relative to the observer, meaning that what appears complex to one person might be well understood by another. For example, a software engineer familiar with control systems theory would see good structure and organization in a well-designed attitude control system, but another engineer, unfamiliar with the theory, would have much more difficulty understanding the design.


Basili defines complexity as a measure of the resources expended by a system while interacting with a piece of software to perform a given task. In terms of end users, this could be understood as usability, i.e. how much time the user needs to spend to accomplish a task using the software and how much training is required to get familiar with it. Similarly, in terms of developers, this could be understood as maintainability, i.e. how easy it is for new developers to understand the software and work on it, to improve it further or to address issues. Again, in terms of the IT infrastructure team, this could be understood as the level of effort needed to deploy and support it, which again falls under maintainability.


Why is Complexity Evil?


Complexity, as the above definition goes, requires more resources than normal to be expended and is thus counterproductive. We have numerous best practices, standards and frameworks that advocate eliminating complexity, but somehow it creeps in and poses a challenge in most cases. The consequence of software complexity, as we have put it above, is clearly risk, and it could even be a business risk when we look at it from the end user's perspective. The following is a sample list of risks that could arise due to software complexity:


  • Complex software is difficult to verify and validate, and this could lead to new defects surfacing even after months of production use. Depending on the severity of the defect, the impact could be low to very high.
  • Software complexity could render business processes inefficient, as end users may have to spend more time using the software to execute certain business processes, which could result in losing competitiveness and thereby losing market share.
  • Failure by the development teams to consider downstream complexity could result in a state of 'project successful, but mission failed': even if the project teams complete the project successfully, the resulting software product might be complex enough to render it useless, thereby turning the entire investment in the project into a loss.
  • Post-production support will always be a challenge, as it is difficult to retain trained and capable resources to maintain complex software; this is one of the most common risks that enterprises continue to battle.
  • Complexity in deploying and distributing the software is another area of concern, this time for the infrastructure team. Sometimes complex software demands so much computing resource that the costs of the hardware and other infrastructure might outweigh the perceived benefits of using the software.


The above list is only illustrative; it could be even bigger, and I would let you come up with more such consequences in the form of feedback.


Measuring Software Complexity


To be managed or controlled, complexity must first be measured. While a discussion of software complexity measures is out of the scope of this blog (I might write another blog on measuring software complexity), at a high level the following metrics, each listed with what it primarily measures, can be used:



  • Cyclomatic complexity (McCabe) - Soundness and confidence; measures the number of linearly independent paths through a program module; a strong indicator of testing effort
  • Halstead complexity measures - Algorithmic complexity, measured by counting operators and operands; a measure of maintainability
  • Henry and Kafura metrics - Coupling between modules (parameters, global variables, calls)
  • Bowles metrics - Module and system complexity; coupling via parameters and global variables
  • Troy and Zweben metrics - Modularity or coupling; complexity of structure (maximum depth of structure chart); calls-to and called-by
  • Ligier metrics - Modularity of the structure chart



Controlling Software Complexity

As we saw in the definition section, complexity is context sensitive, i.e. the same software may be seen at different complexity levels by different classes of users. While this is a challenge for developers, they also find it a useful shelter for staying away from addressing it. Developers primarily depend on the business system analysts, who in most cases play the role of representing the end users, and when the analysts fail to see the complexity the end users would, the development teams are misled. In one of the projects we have been working on, the business system analyst, who was also tasked with triaging the defects logged by end users, simply turned down such defects, suggesting that the users learn to use the system. This definitely goes against controlling complexity and thereby managing the IT risk.

Controlling software complexity calls for adopting certain practices in the areas of architecture, design, build, testing and project management, some of which are listed below:

  • Set the expectations: Overly stringent requirements and simplistic hardware interfaces can complicate software, and a lack of consideration for testability can complicate verification efforts. Cultivating awareness of downstream complexity, and of how detrimental complexity can be if ignored, helps in getting the much-needed team collaboration and commitment.
  • Requirement Reviews: Ambiguous software requirements have been a cause of complexity, as many times the developers tend to assume things. Similarly, certain requirements may be unnecessary or overly stringent; these can be spotted in the review process, leading to a reduction in complexity.
  • Domain Skills: Having the relevant product or domain skills goes a long way in reducing downstream complexity, as the development team will be familiar with how the industry handles such requirements.
  • Architecture Reviews: A good architectural design greatly reduces downstream complexity, and design and architecture reviews early on address this need. It would even make sense to have an Architecture Review Board comprising representatives who can use their expertise in specific areas to spot problem areas.
  • COTS: Third-party libraries, usually termed COTS (Commercially available Off-the-Shelf) components, often come with unneeded features that add to the complexity of the software. It is important to make a careful and thoughtful decision in the case of COTS.
  • Software Testability: Design the software with testability in mind, so that the QA teams can verify it despite its complexity.

Conclusion:


Much of the planning for mitigating software complexity risk revolves around the software acquisition or software development process. This calls for close coordination and collaboration between the IT risk managers and the development teams. Though risk management is well understood as an important function of project managers, how well the risk of complexity is managed is still questionable. Project managers are primarily concerned with risks that might impact project success and fail to consider those, like complexity, that emerge downstream, after the project. It has also been observed that reduction of complexity is an important characteristic of disruptive innovation. Innovations become disruptive when they address the complexity in existing products or systems, capturing market segments one after another on their course to the top of the cliff.

Saturday, March 23, 2013

Surviving Disruptive Innovations


I recently made a presentation on Disruptive Technologies at the Chennai Chapter of ISACA. I chose the topic in the context of presenting a picture of the pace at which disruption is happening in the IT world and the upcoming disruptions to watch out for. But as I was preparing the agenda and content for the presentation, I grew curious about how successful enterprises are managing, or rather surviving, disruptions, and in the process stumbled upon some of the research work done by Clayton Christensen.


It was interesting to observe a few things from his theory, which are the following:


Good management principles are not of great help in managing or surviving disruptive innovations. Christensen cites the example of how Toyota came up and disrupted General Motors. He sees a pattern in how disruptions happen, in the form of an S-curve where the top of the curve is a cliff. Leaders and leadership teams follow the best management principles to climb up the S-curve, and when they reach the top they just fall off the cliff.


An extendable core is the key enabler of innovations becoming disruptive. A potentially disruptive innovation appears insignificant in terms of the competitive capabilities of the incumbent's existing products, tempting the incumbent to ignore it. But having an extendable core within it, the new entrant quietly enhances its capabilities, slowly gets into the incumbent's mainstream market, and then disrupts the whole market, driving the big and well-managed incumbents out.


Disruption emerges from where it is least expected. For example, we now find it very comfortable to use a smartphone for various jobs that were otherwise performed by special-purpose devices, such as GPS devices, digital cameras and even PCs. GPS device manufacturers still believe that the GPS feature of a smartphone is not a threat to them, as special-purpose GPS devices have certain unique advantages that smartphones don't. But be reminded that smartphones have the extendable core and can easily close this capability gap; soon GPS devices will be a thing of the past, and we are already seeing the signs of it.


While there are many other interesting observations to note, I will leave those for you to find out. I was then curious to look into cases of disruption that happened in the past. The following three cases were of interest to me:


Kodak: Kodak ruled the photography market for a whole century. Their management had all the best qualities and was praised in all respects. Kodak has many innovations to its credit, and many firsts as well. Yet with such a record it recently went into bankruptcy and sold its patent portfolio, which included close to 1,000 patents, to salvage some value. It is natural for us to think that the emergence of digital cameras disrupted Kodak in a big way. But as many would know, Kodak knew the digital era was emerging, and they were the first to introduce a digital camera, back in the 1970s. So what went wrong, and how did they fail to sustain that innovation and stay alive in the market? Kodak kept believing, until the early 2000s, that photographic film wouldn't die so soon. The other interesting observation from Kodak's failure is that, with a heavyweight team of experts, sustaining innovation is really expensive, and the outside view is most likely ignored.


That's where management tends to give up on some innovations, as the time and investment may not seem worth it while they are doing decent business with their current line of products. The situation is different for new entrants: startups usually break the rules of convention and are in a position to pursue such innovations a lot more cheaply and progressively. Startups usually start by focusing on a market that is ignored, or to which the incumbents don't pay much attention, thereby not drawing the incumbents' attention until a point at which it is difficult for them to respond.


NOKIA: Nokia came up big in cellular phones but failed to get its innovation strategy right with smartphones. In Nokia's case too, its research team came up with a prototype smartphone with internet access and a touch interface way back in 2003. But the management, again going by good management principles and citing the risk of the product not succeeding and the very high cost of its development, turned down the proposal to pursue the plan further. A few years later, Apple launched its iPhone.


Netflix: The Netflix case is a little different. Netflix was very successful in its DVD rental business and in fact saw the emergence of disruptive innovation in the form of streaming video. It even responded by pursuing its research activities in that direction and developed a video streaming service. What went wrong, according to analysts, is that it got its business model and pricing wrong: it combined the traditional service and the digital streaming service into a bundle and increased the price. Ideally, it would have been more appropriate to offer digital streaming under a different brand or as a separate service, as surveys indicated that DVD rentals still accounted for 70% of total video sales in the US.


Now, given that good management principles alone don't help in sustaining or surviving disruptive innovations, what should organizations do to stay alive in today's world, where IT has enabled disruptive innovations to emerge at a much faster pace, leaving very little time for incumbents to respond? We also keep hearing that "break the rules" is the way to go to foster innovation. While disruption is always seen as a risk to be managed, how well enterprises come up with the right risk mitigation and contingency plans to handle the risk of disruption is still a mystery.


You may check out my presentation on the subject on SlideShare, and feel free to share your views and thoughts on this topic. You may Google to find some of the great articles and papers on the theory of disruptive innovation by Clayton Christensen. You will also find some good video lectures of his on YouTube.

Saturday, February 9, 2013

Stress Testing a Multi-player Game Application

I recently had an opportunity to consult for a friend of mine on stress testing a multi-player game application. This was a totally new experience for me, and this blog details how I approached the need to simulate the required amount of stress and have the application tested under stressed conditions.

About the Application Architecture

The application was developed using Flash ActionScript, with a few PHP scripts for some of the support activities. The multi-player platform is aided by the SmartFox multi-player gaming middleware. It also makes use of MySQL. The Flash files containing the ActionScript, and a lot of images, are hosted on an Apache web server, which also hosts the PHP. Apache, MySQL and SmartFox are all hosted on a single cloud-hosted server running Linux.

The test approach

My first take on the test approach was to focus on simulating stress on the server and keep the client out of the scope of this test. This made sense, as all the Flash ActionScript executes on the client side; in reality, it is typically a single user playing the game on a client device, and the client-side application has all the CPU, memory and related resources of that device available to it. Thus, in reality, there is no multi-player stress on the client.

Given that the focus would now be on the impact of the stress on server resources, I had to understand how the client communicates with the server, and the request / response protocols and related payloads. I used Fiddler to monitor the traffic out of the client device on which the game was being played, and I could only see HTTP requests fetching the Flash and image files and a few of the PHP files. I could not find any traffic to the SmartFox server over HTTP, and figured out that those requests are TCP socket requests and thus not captured by Fiddler.

The test tools

At this stage, it was quite clear that besides simulating stress on Apache by sending in HTTP requests, we needed to simulate stress on the SmartFox server as well, over TCP sockets. We had a choice of numerous open source tools to simulate HTTP traffic. I chose JMeter for the HTTP traffic simulation: it is open source, UI driven, and easy to set up and use, and it also supports multi-node load simulation.

I needed to figure out a tool for simulating load on sockets. I checked with SmartFox to see if they offer a stress test tool, but they don't. A search through the SmartFox forums revealed that a custom tool is the way to go, and that to make it easier, we could use one of the SmartFox client API libraries, which are available for .NET, Java, ActionScript and a few other languages. I settled on the .NET route, as C# is the language I have been working with in recent years.

I built a multi-threaded custom .NET tool using the SmartFox client API to simulate stress on SmartFox. To my surprise, the SmartFox client API library was not designed to work with multi-threading, and SmartFox support confirmed this behaviour. I then decided to redesign my custom tool to use a multi-process architecture, and it worked fine.
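
The controller side of that multi-process design looks roughly like the sketch below; the worker executable name and its argument are hypothetical placeholders, and the per-player SmartFox session logic lives in the worker, which is not shown:

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Controller: spawns one OS process per simulated player, since the
// single-threaded SmartFox client API cannot be shared across threads.
class StressController
{
    static void Main()
    {
        const int players = 200; // desired number of concurrent simulated players
        var workers = new List<Process>();

        for (int i = 0; i < players; i++)
        {
            // "GameWorker.exe" is a hypothetical single-player client that
            // opens one TCP socket session to SmartFox and runs a scripted
            // sequence of game requests.
            workers.Add(Process.Start("GameWorker.exe", $"--player-id {i}"));
        }

        // Wait for all scripted runs to finish before collecting results.
        foreach (var p in workers)
            p.WaitForExit();
    }
}
```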

I needed a server monitoring tool to monitor and measure various server performance parameters under stress conditions. I found the cloud-based New Relic to be the tool of choice for monitoring the Linux server hosting the game components.

The test execution

I had JMeter configured on three nodes (one being the controlling node) and set it up to spawn the desired number of threads. I had the custom .NET tool on another client, set up to spawn the desired number of processes, each making a sequence of TCP socket requests. I also engaged a couple of QA resources to play the game and record the user experience under stress conditions.
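
For reference, a distributed JMeter run of this shape can be started from the controlling node roughly as follows; the plan file and host names are placeholders:

```
# Non-GUI run: -t names the test plan, -R lists the remote load-generator nodes
jmeter -n -t game-stress-plan.jmx -R loadnode1,loadnode2
```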

The test execution went well, and we could gather the data needed to form an opinion and make recommendations.



Saturday, January 26, 2013

Solution Architecture - Aligning Solutions with Business Needs

As Enterprise Architecture & IT Governance adoption is on the rise, we keep hearing a lot about IT-Business alignment. ISACA positions Strategy Alignment as an important knowledge area along with five other knowledge areas. Enterprise Architecture, being an important function in architecting or engineering the enterprise itself, has a major role to play in appropriately deriving the IT strategy from the business strategy and ensuring its execution as intended. Within the overall Enterprise Architecture function, it is the Solution Architecture function that acts as the bridge between the Business Architecture and the Technology / Application Architecture. Solution Architecture is thus a key role in transforming business needs into IT solutions, and it is this function that is key to ensuring that the IT solutions are in alignment with the business needs and strategies.

Now, although most Solution Architects are well aware of this important need at the early stage of solution design and architecture, things keep going wrong, and the business in general is not happy with the solutions delivered. While most Solution Architects do well in the initial phase of solutioning, somewhere down the line things tend to stray and gaps creep in, resulting in a misaligned solution being delivered. A close examination of various misaligned projects reveals the causes listed below. This list is not exhaustive and is not in any order of priority.

Requirement Analysis

Typically, a detailed Business Requirements Specification document is the input to the Solution Architecture function, and the solution the Architects propose largely depends on the veracity of this document. Good knowledge of the business domain of the organization in general, and the ability to review the requirements critically, will bring out possible gaps which, if ignored, might creep into the solution implementation. Needless to stress here are the non-functional requirements, which on most occasions remain undocumented and contribute largely to the gaps. A lack of formal processes that call for iterative reviews of the business case, and in turn the requirements specification, is a cause of concern.

Generally, the Solution Architects should be engaged from the initial stages of problem definition, so that they are in a position to gather the needed details, which might not figure in the requirements specification.

Expectation Management

Traditionally, the business just throws in the problem they want to solve or the opportunity they want to exploit, and probably participates in defining the requirements. Thereafter, they leave it to the Architects and developers to do the rest and finally deliver the solution. In a project for a leading multinational bank, it so happened that when the solution was delivered, the business reacted by stating that this was not what they wanted; the requirements specification, as duly accepted by them, was in total mismatch with what was expected. It is good, and easier, to have this expectation mismatch ironed out early on, before getting deep into implementation. While the expectation mismatch is generally on the non-functional needs, like usability, portability and reliability, at times even the functional requirements leave room for misunderstanding by different teams, leading to misalignment. The best way to address this area is to involve the business in periodical reviews using Agile practices. Even with Agile, mismatches can occur if those representing the business do not share the same understanding of the business needs at large.

Validating the solution design with the business representatives, and keeping a watch on the expected design outcome for possible impacts as the implementation progresses, is the best way to address this concern.

Skill Gaps

Solutioning is not a pure science; one should be able to blend science and art. Getting the best solution depends on the Solution Architects' ability to visualize the final outcome, given the constraints and the choice of tools, and to make sure this is what the business would want. While skill gaps in other related functions would still be a cause for concern, deficiencies in the Solution Architecture function cost dear, as it plays its part early in the life cycle.


Choice of Tools & Technology

While the choice of tools and technology is not generally within the domain of the Solution Architect, being knowledgeable about the various tools and technologies that can be leveraged within the constraints of the IT strategy goes a long way in proposing the best solution. Another concern around this issue is that, as the Technical Architects or the development team encounter feasibility issues with a chosen tool or technology, the Solution Architect has to jump in again and align or tweak the solution to overcome such issues without impacting the expected final outcome.

Culture, Communication & Collaboration

While continuous review and monitoring help bring out exceptions, it is for the organization to nurture a culture in which everyone knows the business and IT strategy and the value of the work they deliver towards IT solutions. The Solution Architecture team should have a thorough understanding of the IT strategy and ensure that the solutions they propose are in line with it. It is also important that the teams collaborate well and work towards the common objective of ensuring that execution of the solution design progresses in the expected manner, with timely escalation of exceptions for appropriate action.

Roles & Responsibilities

A lack of clearly defined roles and responsibilities leads to negligence in deliverables by individuals and teams. As with the project I cited earlier, the business representative who approved the requirements specification did so without being sure of what he was approving. Further reviews and monitoring are very much needed to spot such lapses early on. The culture of the organization should also be such that the teams and team members own the problem and contribute to a solution that meets the expectations of the business.

Review and Monitoring

Depending on the size and strategic importance of the project, the program and project management teams should keep a watch for any potential impact on the expected outcome; exceptions, if observed, should be analysed for possible changes to the solution design and implementation, in turn managing expectations appropriately. This calls for identifying the key goals and outcomes upfront and coming up with appropriate metrics to track them.

Another area to keep a watch on is the assumptions, constraints and dependencies identified as part of the project charter. It is impossible to iron out all ambiguities at the start of a project, and as such it is a prevalent practice to start off by making assumptions about such ambiguities and proceeding. However, efforts should continue to validate those assumptions and make them unambiguous as the project progresses. Any change in an assumption might trigger a redesign of the solution. The same is the case with the various known constraints and dependencies under which the solution is designed and implemented.


As we can observe, all the stakeholders have a role to play in ensuring alignment with the business need. However, the Solution Architecture function sets it off, and the other teams carry on from there. The Solution Architects hold the key: staying in close touch with the implementation teams, making themselves available for clarifications, being receptive to emerging issues around the solution design, and coming up with the needed improvements to the solution designs.

Related Reads:
Solution Architect - Understanding the role

Saturday, January 12, 2013

Characteristics of High Maturity SOA Implementation

Cloud adoption is increasing, and so is SOA implementation. As analysts point out, be it cloud adoption or SOA adoption, unless an appropriate governance framework is in place ahead of the adoption journey, it is highly likely that the perceived benefits will not be realized. Benchmarking is a good approach to identifying the current state of an SOA implementation against an industry-recognized maturity model. It also helps to define a target state and then work towards achieving it. Amongst the many SOA maturity models, The Open Group's OSIMM v2 is quite popular; it defines seven levels. The first three levels are classified as foundational, Levels 4 through 7 expect certain demonstrated characteristics, and Level 7 indicates the highest level of maturity.

OSIMM v2 provides a base model that is designed to be extended or customized by customers and consulting organizations. OSIMM has seven dimensions across seven maturity levels. The states represented by these levels are given below:

  • Level 1: Silo - No integration implementation.
  • Level 2: Integrated - Technology is in place to integrate the silos, but it does not extend to common standards in data or business process.
  • Level 3: Componentized - IT systems are componentized, but the components are not loosely coupled, thus limiting interoperability.
  • Level 4: Service - Services are identified and implemented and composite systems are built from the loosely coupled services. However, the composition and the service definition itself are implemented using bespoke code rather than by a declarative flow language, thus limiting agility.
  • Level 5: Composite Services - Business processes can be constructed with BPM tools by assembling the relevant interacting services. At this level agility is possible, but developers still have a role to play under the guidance of business analysts.
  • Level 6: Virtualized Services - At this level the services are virtualized, i.e. a level of indirection is introduced (like a facade layer based on dependency injection principles), with which the services are more decoupled from the infrastructure on which they run. But service definition and implementation still require the help of developers.
  • Level 7: Dynamically Re-configurable Services - At this level, composite services are assembled at run time by business analysts, or by the system itself, based on various parameters and using service repositories.

Let us now examine Level 7 in detail across the seven dimensions. Note that, as in any other maturity model, attaining a level means that the system also meets all the previous levels.

Business Dimension:

OSIMM specifies about 19 assessment questions under the business dimension, and the answers to these questions are used in assessing the current state. The following are the key characteristics of this dimension in practice:

  • High Agility - Enterprise services are available on demand.
  • Existence of EA - A well-defined EA exists, which includes a formal end-to-end definition of business process flows.
  • Use of BPM - BPM is in use for defining and testing the process flows necessary to meet well-defined SLAs.
  • Integration - Services are integrated across the enterprise and externally with business partners.

Organization & Governance Dimension:

Under this dimension, the focus is on how an organization formally defines and documents its organization and governance processes. OSIMM recommends about 11 questions to gather the information that helps the assessor form an opinion on the current state. At Level 7, the following key characteristics are expected to be demonstrated:

  • Adaptive Enterprise - The ability to adjust operations quickly and smoothly to meet rapid changes in the market, technology and priorities, and to evolve with the global economy, so that services never become obsolete.
  • Aligned with Business Strategy - Services are modelled and managed as elements of the evolving business strategy.
  • Measurement & Monitoring - Appropriate service metrics are identified, defined and automatically gathered, and are used effectively in monitoring and decision making.
  • Governance - SOA governance is part of the organization's culture.

Method Dimension:

This dimension is about the formal use of an SOA architectural design, construction and deployment methodology in the organization. OSIMM suggests about 10 questions with which the assessor can gather the information required to map the maturity indicators to the associated maturity attributes. Here are the key characteristics that map to Level 7:

  • Adaptive Enterprise - The design, construction and deployment methodologies evolve in line with industry standards and best practices.
  • Dynamic Services - Formal methods exist to leverage architectural constructs and assets in support of virtualization, dynamic services and BPM.
  • Consistent Adoption - Best practice guidance exists to facilitate consistent adoption of SOA, virtualization, middleware technology, service registries, etc.

Application Dimension: 

In this dimension, the focus is on the application architecture being designed using SOA principles, the usage of constructs such as loose coupling and separation of concerns, and the usage of related technologies such as XML, web services, service bus, service registry, virtualization, etc. The following key characteristics are assessed at Level 7:

  • Adaptive Enterprise - The enterprise is able to adapt to and use the evolving SOA design principles, tools and technologies.
  • Dynamic Application Assembly - The application architecture and design support the dynamic re-configuration of services and the consumption of services by external business partners.
  • Design Patterns - Extensive use of SOA / ESB design patterns to support BPM.

Architecture Dimension:

The formal use of SOA methods, principles, patterns, frameworks and techniques in the architecture is evaluated under this dimension. There are 11 questions to help the assessors gather the required information. The key characteristics assessed at Level 7 are the following:

  • Adaptive Enterprise - While the use of formal methods, principles, patterns, frameworks and techniques is necessary, these should also evolve in line with industry practices in architecting SOA services.
  • Formal Enterprise BIS - Design and implementation of formal enterprise Business Information Services supporting both the enterprise and external entities.
  • Integration - Appropriate architecture is used to support enterprise-wide integration and, externally, to support partner entities.

Information Dimension:

This dimension focuses on the existence of a formal information architecture that supports a master data model and implements a common business data vocabulary. Assessors can make use of the 13 questions suggested by OSIMM to assess this dimension. The following key characteristics are assessed at Level 7:

  • Adaptive Enterprise - The information architecture evolves in line with industry standards and practices.
  • Vocabularies - Business data vocabularies exist, with the ability to easily expand or enhance them to support new services, business partners and process re-configuration.
  • Data Definition - Business data is defined using semantic web constructs or ontologies.
  • BIS Model - A formal enterprise Business Information Model has been designed and implemented, covering both enterprise and external relationship entities.
  • Master Data - Master Data services exist and are in use.

Infrastructure & Management Dimension:

This dimension is about the IT infrastructure that supports the non-functional and operational requirements and the SLAs needed to operate an SOA environment. There are 12 questions to help the assessors gather the required information. The key characteristics to be established in this dimension are:

  • Adaptive Enterprise - The infrastructure and related facilities used to support the SOA environment evolve with the industry's best.
  • Service Management - Tracks and predicts the changes to services necessary to optimize quality.
  • Service Re-usability - Services are re-used in new and dynamic ways without negatively impacting the Quality of Service.
  • Security - Service security policies are dynamic and managed in real time.

While the first three levels are classified as foundational, Level 4 is where the existence of minimal SOA practices is expected. That way, we can say that one should at least start at Level 4 and look forward to transitioning to the higher levels. The Open Group's OSIMM also lays out the assessment method and the assessment questions to facilitate adoption.


References:

OSIMM v2 Technical Standard.

Saturday, December 29, 2012

Resilient Systems - Survivability of Software Systems

Resilience, as we all know, is the ability to withstand tough times. There is another term used quite interchangeably: reliability. But reliability and resilience are different. Reliability is about a system or a process that has zero tolerance for failure, one that should not fail. In other words, when we talk about reliable systems, the context is that failure is not expected, nor acceptable. Resilience, on the other hand, is about the ability to recover from failures. What is important to understand about resilience is that failure is expected and inherent in any system or process, and might be triggered by changes to the platform, environment or data. While reliability is about the system's robustness in not failing, resilience is its ability to sense or detect failures ahead, prevent itself from encountering the events that lead to failure and, when failure cannot be avoided, let it happen and then recover from it sooner.


A working definition for resilience (of a system) developed by the Resilient Systems Working Group (RSWG) is as follows:

"Resilience is the capability of a system with specific characteristics before, during and after a disruption to absorb the disruption, recover to an acceptable level of performance, and sustain that level for an acceptable period of time." The following words were clarified:
  • The term capability is preferred over capacity since capacity has a specific meaning in the design principles.
  • The term system is limited to human-made systems containing software, hardware, humans, concepts, and processes. Infrastructures are also systems.
  • The term sustain allows determination of long-term performance to be stated.
  • The term characteristics allows static features, such as redundancy, or dynamic features, such as corrective action, to be specified.
  • Before, during and after – Allows the three phases of disruption to be considered.
    • Before – Allows anticipation and corrective action to be considered
    • During – How the system survives the impact of the disruption
    • After – How the system recovers from the disruption
  • Disruption is the initiating event of a reduction in performance. A disruption may be either a sudden or a sustained event. A disruption may be either internal (human or software error) or external (earthquake, tsunami, hurricane, or terrorist attack).

Evan Marcus and Hal Stern, in their book Blueprints for High Availability, define a resilient system as one that can take a hit to a critical component, recover, and come back for more in a known, bounded, and generally acceptable period of time.


In general, resilience is a term of concern for information security professionals, as the final impact of a disruption (from which a system needs to recover) is mostly on Availability, one of the three tenets of information security (CIA: Confidentiality, Integrity and Availability). But there is a lot for system designers and developers, especially those tasked with building mission-critical systems, to be concerned about: they need to architect systems that build in the required level of resilience characteristics. For a system to be resilient, it should draw the necessary support from the dependent software and hardware components, systems and platform. For instance, a disruption to a web application can come from a network outage or from security attacks at the network layer, which the software has no control over. It is therefore important to consider software resiliency in relation to the resiliency of the entire system, including the human and operational components.


The PDR (Protect - Detect - React) strategy is no longer as effective as it used to be, due to various factors. It is time that predictive analytics and some of the disruptive technologies, like big data and machine learning, were considered for enhancing system resiliency. Based on the logs of the various inter-connected applications or components, and other traffic data on the network, intelligence needs to be built into the system to trigger one of a number of possible actions. For instance, if there is reason to suspect an attacker attempting to gain access to the systems, a possible action could be to operate the system in a reduced-access mode, i.e. parts of the system may be shut down, or parts of the networks to which the system is exposed could be blocked.


OWASP's AppSensor project is worth checking out by architects and developers. The AppSensor project defines a conceptual framework and methodology that offers prescriptive guidance for implementing intrusion detection and automated response in an existing application. AppSensor defines over 50 detection points which can be used to identify a malicious attacker, and provides guidance in the form of possible responses for each such detection point.
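
A minimal sketch of the detect-and-respond idea that AppSensor formalizes is below; the detection point, threshold and response are illustrative assumptions of mine, not AppSensor's actual API:

```csharp
using System;
using System.Collections.Concurrent;

// Accumulates suspicious events per user and triggers an automated
// response once a threshold is crossed, instead of waiting for an outage.
class IntrusionMonitor
{
    private readonly ConcurrentDictionary<string, int> _suspicionScores =
        new ConcurrentDictionary<string, int>();
    private const int LockoutThreshold = 5;

    // Detection point: e.g. a request parameter was tampered with.
    public void RecordSuspiciousEvent(string userId)
    {
        int score = _suspicionScores.AddOrUpdate(userId, 1, (_, s) => s + 1);
        if (score >= LockoutThreshold)
            Respond(userId);
    }

    // Automated response: degrade gracefully to a reduced-access mode.
    private void Respond(string userId)
    {
        Console.WriteLine($"User {userId}: switching session to reduced-access mode");
        // e.g. disable sensitive features, force re-authentication,
        // or block the originating network segment.
    }
}
```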


The following are some of the factors that need to be appropriately addressed to enhance the resilience of the systems:

Complexity - With systems continuously evolving, and many discrete systems and components increasingly integrated into today's IT ecosystem, complexity is on the rise. This means the resiliency routines built into individual systems need constant review and revision.

Cloud - Cloud computing is gaining greater acceptance, and as organizations embrace the cloud for their IT needs, data, systems and components become widespread across the globe, which brings challenges for those involved in building resilience.

Agility - To stay on top of the competition, organizations need agility in their business processes, which means rapid changes to the underlying systems. This can be a challenge, as it calls for constant checks to ensure that the changes being introduced do not downgrade or compromise the resiliency level of the systems.


While there are techniques and guiding principles which, when followed and applied, can greatly improve the resilience of systems, such design and implementation come at a price, and that is where the economics of resiliency needs to be considered. For instance, mission-critical software systems, like the ones used in medical devices, need highly resilient characteristics, but quite many business systems can have a higher tolerance level and can thereby be less resilient. Either way, it is good to document the expected resilience level at the initial stage and work on it early in the system development life cycle; thinking about resilience later in the life cycle may not do any good, as implementing it then will call for a higher investment.



References:

Crosstalk - The Journal of Defense Software Engineering, Vol. 22, No. 6

Resilient Systems Working Group

OWASP - AppSensor Project