Sunday, April 10, 2016

Economics of Software Resiliency

Resilience is a design attribute that enables software to recover from disruptive events. In essence, it is automated recovery after such events occur. Given the option, we would want the software we build or buy to have resilience built in. Resilience, however, comes at a cost, and the economics of the benefit should be examined before deciding what level of resilience is required. The cost and effectiveness of recovery or resilience capabilities must be balanced against the events that cause disruption or downtime. These costs may be reduced, or rather optimized, if the expectation of failure or compromise is lowered through preventative measures, deterrence, or avoidance.

There is a trade-off between protective measures and investments in survivability, i.e., the cost of preventing the event versus recovering from it. Another key factor influencing this decision is the cost of such an event if it occurs. This suggests that a number of combinations need to be evaluated, depending on the resiliency of the primary systems, the criticality of the application, and the options for backup systems and facilities.

This analysis is, in essence, similar to the risk management process. The following elements form part of it:


Identify Problems


The events that could lead to failure of software are numerous. Developers know that exception handling is an important best practice to adhere to while designing and developing a software system. Most modern programming languages provide support for catching and handling exceptions, which helps, at a low level, in identifying the exceptions a particular application component encounters at run time. There may be certain events which cannot be handled from within the component and which require an external component to monitor and handle them. Beyond the exception-handling facilities of the programming language, the architects designing the system should identify and document such exceptions and design solutions to overcome them, so that the system becomes more resilient and reliable. The following are the primary sources of problems or exceptions that need to be handled to make the system more resilient:


  • Dependency on Hardware / Software resources - Whenever the designed system needs to access a hardware resource, for example a specified folder on the local disk drive, anticipate situations such as the folder not being there, the application context not having enough permissions to perform its actions, disk space being exhausted, etc. (see the sketch after this list). This applies equally to software resources such as the operating system, a third-party software component, etc.
  • Dependency on external Devices / Servers / Services / Protocols - Access to external devices like printers, scanners, etc., or to other services exposed for use by the application system, like an SMTP service for sending emails, database access, or a web service over the HTTPS protocol, could also cause problems, such as the remote device not being reachable, a protocol mismatch, request or response data inconsistency, access permission failures, etc.
  • Data inconsistency - In complex application systems, certain scenarios could lead to inconsistent internal data, which may drive the application into a deadlock or a never-ending loop. Such a situation may have a cascading effect, as the affected components quickly consume considerable system resources, leading to a total system crash. This is typical in web applications, where each external request is executed on a separate thread; as such threads get into a 'hung' state, the request queue will, over time, surpass the installed capacity.
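As an illustration of the first bullet, the following Python sketch accesses a folder on the local disk while anticipating the failures described above. The folder path, file name and fall-back actions are hypothetical; the point is that each anticipated failure has an explicit, documented handling path rather than an unhandled exception.

```python
import errno
import os

def write_report(folder: str, name: str, content: str) -> bool:
    """Write a report file, anticipating missing folders, permissions and disk space."""
    try:
        os.makedirs(folder, exist_ok=True)            # the folder may not exist yet
        with open(os.path.join(folder, name), "w") as f:
            f.write(content)
        return True
    except PermissionError:
        # The application context lacks permission to perform its action.
        print(f"No permission to write under {folder}; alerting operations")
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Disk space exhausted; a resilient design degrades gracefully here.
            print("Disk full; report generation deferred")
        else:
            print(f"Unexpected I/O failure: {exc}")
    return False
```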


Cost of Prevention / Recovery


The cost of prevention depends on the solutions available to overcome or handle such exceptions. For instance, if the issue is the SMTP service being unavailable, the solution could be to have an alternate, redundant, always-active SMTP service running out of a totally different network environment, so that the system can switch over to the alternate service if it encounters issues with the primary one. While the cost of implementing support for multiple SMTP services and a fail-over algorithm may not be significant, maintaining a redundant SMTP service could have a significant cost impact. Thus, for each event that may have an impact on software resilience, the total cost of a pro-active solution vis-a-vis a reactive solution should be assessed.
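The following Python sketch illustrates such a fail-over between a primary and a redundant SMTP service; the hostnames, ports and the ten-second timeout are hypothetical placeholders, not part of any specific system.

```python
import smtplib

# Primary service first, then a redundant service running out of a different network.
SMTP_SERVERS = [("smtp-primary.example.com", 587),
                ("smtp-standby.example.net", 587)]

def send_mail(sender: str, recipient: str, message: str) -> bool:
    for host, port in SMTP_SERVERS:
        try:
            with smtplib.SMTP(host, port, timeout=10) as server:
                server.sendmail(sender, recipient, message)
            return True                        # sent via this service
        except (smtplib.SMTPException, OSError):
            continue                           # try the next, redundant service
    return False                               # all services failed; caller must degrade gracefully
```

The code itself is cheap; the recurring cost lies in keeping the stand-by SMTP service provisioned, patched and monitored, which is exactly the cost-benefit question the architect has to answer.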

Time to Recover & Impact of Event


While the cost of prevention / recovery assessed above indicates how expensive the solution is, the time to recover and the impact of such an event happening indicate the cost of not having the event handled or worked around. Simple issues like a database deadlock may be handled reactively by the DBAs, who monitor for such issues and act immediately when one arises. But issues like the network link to an external service failing may mean extended system unavailability and thus impact the business. So, it is critical to assess the time to recover and the impact such an event may have if it is not handled instantly.

Depending on these metrics, the software architect may suggest a cost-effective solution to handle each such event. The level of resiliency that is appropriate for an organization depends on how critical the system in question is for the business, and on the impact that a lack of resilience would have on the business. Resiliency has its own cost-benefit trade-off; the architects should keep this in mind and design solutions to suit the specific organization.

The following are some of the best practices that the architects and the developers should follow while designing and building the software systems:
  • Avoid proprietary protocols and software that make migration or graceful degradation very difficult.
  • Identify and handle single points of failure. Of course, building redundancy has a cost.
  • Loosely couple the service integrations, so that inter-dependence of services is managed appropriately.
  • Identify and overcome weak architecture / designs within the software modules or components.
  • Anticipate failure of every function and design for fall-back scenarios and graceful degradation where appropriate.
  • Design to protect state in multi‐threaded and distributed execution environments.
  • Expect exceptions and implement safe use of inheritance and polymorphism.
  • Manage and handle the bounds of various software and hardware resources.
  • Manage allocated resources by using them only when needed.
  • Be aware of the timeouts of various services and protocols and handle them appropriately (a sketch follows this list).
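As a small illustration of the last bullet, here is a Python sketch of calling an external service with an explicit timeout and a defined fall-back; the URL and the five-second limit are hypothetical.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional

def fetch_rates(url: str = "https://rates.example.com/latest") -> Optional[bytes]:
    """Call an external service without ever waiting indefinitely."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout):
        # Unreachable or timed out: return a sentinel so the caller can fall back
        # to cached data instead of leaving a request thread hung.
        return None
```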

Sunday, March 20, 2016

Big Data for Governance - Implications for Policy, Practice and Research

A recent IDC forecast shows that the Big Data technology and services market will grow at a 26.4% compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market. Additionally, by 2020 IDC believes that line of business buyers will help drive analytics beyond its historical sweet spot of relational (performance management) to the double-digit growth rates of real-time intelligence and exploration/discovery of the unstructured worlds.

This predicted growth is expected to have a significant impact on all organizations, be they small, medium or large, including exchanges, banks, brokers, insurers, data vendors, and technology and services suppliers. It also extends beyond the organization, with the increasing focus on rules and regulations designed to protect a firm’s employees, customers and shareholders as well as the economic wellbeing of the state in which the organization resides. This pervasive use and commercialization of big data analytical technologies is likely to have far-reaching implications for meeting regulatory obligations and governance-related activities.

Certain disruptive technologies such as complex event processing (CEP) engines, machine learning, and predictive analytics using emerging big-data technologies such as Hadoop, in-memory, or NoSQL illustrate a trend in how firms are approaching technology selection to meet regulatory compliance requirements. A distinguishing factor between big data analytics and regular analytics is the performative nature of Big Data: it goes beyond merely representing the world to actively shaping it.


Analytics and Performativity


Regulators are staying on top of big data tools and technologies and are leveraging them to search through vast amounts of organizational data, both structured and unstructured, to prove a negative. This forces organizations to use the latest and most effective forms of analytics in order to avoid regulatory sanctions and stay compliant. Analytical outputs may provide a basis for strategic decision making by regulators, who may refine and adapt regulatory obligations accordingly and then require firms to use related forms of analytics to test for compliance. Compliance analytics do not simply report on practices but also shape them: accelerated decision making turns strategic planning from a long-term, top-down exercise into a bottom-up, reflexive one. Due to 'automation bias' or the underlying privileged nature of the visualization algorithms, compliance analytics may not be neutral in the data and information they provide and the responses they elicit.

Technologies which implement surveillance and monitoring capabilities may also create self-disciplined behaviours through a pervasive suspicion that individuals are currently being observed or may have to account for their actions in the future. The complexity and heterogeneity of the underlying data and related analytics add a further layer of technical complexity to banking matters and so add further opacity to understanding controls, behaviours and misdeeds.

Design decisions are embedded within technologies, shaped by the underlying analytics and further underpinned by data. Thus, changes to one part of a system may have a cascading effect on the outcome. Data accuracy may also unduly influence outcomes. This underscores the need to understand big data analytics at the level of micro practice and from the bottom up.


Information Control and Privacy


The collection and storage of Big Data raises concerns over privacy. In some cases, the uses of Big Data can run afoul of existing privacy laws. In all cases, organizations risk backlash from customers and others who object to how their personal data is collected and used. This can present a challenge for organizations seeking to tap into Big Data’s extraordinary potential, especially in industries with rigorous privacy laws such as financial services and healthcare. Some wonder whether these laws, which were not developed with Big Data in mind, sufficiently address both privacy concerns and the need to access large quantities of data to realize the full potential of the new technologies.

The challenges to privacy arise because these technologies collect so much data and analyze it so efficiently that it is possible to learn far more than most people had predicted or can predict. These challenges are compounded by the limitations of traditional technologies used to protect privacy. The degree of awareness and control can determine information privacy concerns; however, that degree may depend on personal privacy risk tolerance. In order to be perceived as ethical, an organization must ensure that individuals are aware that their data is being collected and that they have control over how their data is used. As data privacy regulations impose increasing levels of administration and sanctions, we expect policy makers at the global level to be placed under increased pressure to mitigate regulatory conflicts and multijurisdictional tensions between data privacy and financial services’ regulations.

Technologies such as social media or cloud computing facilitate data sharing across borders, yet legislative frameworks are moving in the opposite direction towards greater controls designed to prevent movement of data under the banner of protecting privacy. This creates a tension which could be somewhat mediated through policy makers’ deeper understanding of data and analytics at a more micro level, and thereby an appreciation of how technical architectures and analytics are entangled with laws and regulations.

The imminent introduction of data protection laws will further require organizations to account for how they manage information, placing much more responsibility on data controllers. Firms are likely to be required to understand the privacy impact of new projects and correspondingly assess and document perceived levels of intrusiveness.


Implementing an Information Governance Strategy


The believability of analytical results when there is limited visibility into the trustworthiness of the data sources is one of the foremost concerns an end user will have. A common challenge associated with adopting any new technology is walking the fine line between speculative application development, assessing pilot projects as successful, and transitioning those successful pilots into the mainstream. The enormous speed and volume of data processed with Big Data technologies can cause the slightest discrepancy between expectation and performance to exacerbate quality issues. This may be further compounded by metadata complications when conceiving definitions for unstructured and semi-structured data.

This necessitates that organizations work towards developing an enterprise-wide information governance strategy with related policies. The governance strategy should encompass the continued development and maturation of processes and tools for data quality assurance, data standardization, and data cleansing. The management of metadata and its preservation, so that it can be evidenced to regulators and courts, should also be considered when formulating strategies and tactics. The policies should be high-level enough to be relevant across the organization while allowing each function to interpret them according to its own circumstances.

Outside of regulations expressly for Big Data, lifecycle management concerns for Big Data are fairly similar to those for conventional data. One of the biggest differences, of course, is in providing the resources needed for data storage, considering the rate at which the data grows. Different departments will need access to data for different lengths of time, which factors into how long data is kept. Lifecycle principles are inherently related to data quality issues as well, since such data is only truly accurate once it has been cleaned and tested for quality. As with conventional data, lifecycle management for Big Data is also industry specific and must adhere to external regulations.

Security issues must be part of an Information Governance strategy, which will require current awareness of regulatory and legal data security obligations so that a data security approach can be developed based on repeatable and defensible best practices.

Sunday, January 3, 2016

Enterprise Architecture - Guiding Principles

Enterprise Architecture (EA) artifacts must be developed with a clear understanding of how the EA will be used and who will use it. The EA may be used as a tool for evaluating design alternatives and selecting optimal solutions, as a guide providing insights into how practices will be streamlined or improved through automation, or as a plan for needed investments and an understanding of what cost savings will be achieved through consolidation. Throughout, the people involved in the development and maintenance of an EA framework should consistently follow certain guiding principles, so that the EA contributes to the vision and mission of the enterprise. That makes the guiding principles one of the most important elements, and usually the first step, in developing an EA.


Enterprise architecture principles serve as a Framework for decision making by providing guidance about the preferred outcomes of a decision in a given context. This acts as a mechanism for harmonizing decision making across organization functions & departments in addition to guiding the selection and evolution of information systems to be as consistent and cost effective as possible. Alignment with enterprise architecture principles should be a goal for any initiative and will result in fewer obstacles, surprises and course corrections later in the project.


The usefulness of principles is in their general orientation and perspective; they do not prescribe specific actions. A given principle applies in some contexts but not all contexts. Different principles may conflict with each other, such as the principle of accessibility and the principle of security. Therefore, applying principles in the development of EA requires deliberation and often tradeoffs. The selection of principles to apply to a given EA is based on a combination of the general environment of the enterprise and the specifics of the goals and purpose of the EA. The application of appropriate principles facilitates grounding, balance, and positioning of an EA. Deviating from the principles may result in unnecessary and avoidable long-term costs and risks.


Typically there will be a set of overarching general principles and specific principles with respect to Business Architecture, Application & Systems, Data & Information, Security, etc. The following are some of the generic guiding principles that could be applicable to all enterprises.


Maximize Value

Architectures are designed to provide long-term benefits to the enterprise. Decisions must balance multiple criteria based on business needs. Every strategic decision must be assessed from a cost, risk and benefit perspective. Maximizing the benefit to the enterprise requires that information system decisions adhere to enterprise-wide drivers and priorities. Achieving maximum enterprise-wide benefits will require changes in the way information systems are planned and managed. Technology alone will not bring about change. To maximize utility, some functions or departments may have to concede their preferences for the benefit of the entire enterprise.


Business Continuity

As system operations become more pervasive, the enterprise becomes more dependent on them. This calls for ensuring reliability and scalability, suited to the current and anticipated future use of such systems, throughout their design and use. Business premises throughout the enterprise must be provided with the capability to continue their business functions regardless of external events. Hardware failure, natural disasters, and data corruption should not be allowed to disrupt or stop enterprise activities. The enterprise business functions must be capable of operating on alternative information delivery mechanisms. Applications and systems must be assessed for criticality and impact on the enterprise's mission in order to determine the level of continuity that is required, as well as the need for an appropriate recovery plan.


Applications & Systems Architecture

Applications and Systems should be scalable to support use by organizations of different sizes and to handle decline or growth in business levels. While unexpected surges or declines in volume are to be handled, support for horizontal scaling is also essential. Enterprise applications should be easy to support, maintain, and modify; such applications lower the cost of support and improve the user experience. Applications and Systems shall have the following characteristics: Flexibility, Extensibility, Availability, Interoperability, Maintainability, Manageability and Scalability.


Legal and Regulatory Compliance

Information system management processes must comply with all relevant contracts, laws, regulations and policies. Enterprise policy is to abide by laws, policies, and regulations; this will not preclude business process improvements that lead to changes in policies and regulations. The enterprise must be mindful to comply with laws, regulations, and external policies regarding the collection, retention, and management of data. Education about, and access to, these rules is essential; efficiency, need, and common sense are not the only drivers. Changes in the law and changes in regulations may drive changes in our processes or applications. Staff need to be educated about the importance of regulatory compliance and their responsibility to maintain it. Where existing information systems are non-compliant, they must be strategically brought into compliance.


Leverage investments

All systems shall leverage existing and planned components, enterprise software, management systems, infrastructure, and standards. It is impossible to accurately predict everything upfront. A try-before-you-buy approach validates investment plans, designs and technologies. Prototypes enable users to provide early feedback about the design of the solution. If an enterprise capability is incomplete or deficient, efforts will be made to address the deficiency rather than duplicating it or investing further in building new capabilities. This allows us to achieve maximum utility from existing investments.


Risk Based Approach to Security

Following a risk-based approach provides the enterprise with an opportunity to: identify threats to projects, initiatives, data and the ongoing operation of information systems; effectively allocate and use resources to manage those risks; avoid unwarranted speculation, misinterpretation and inappropriate use; and improve stakeholder confidence and trust. Information systems, data and technologies must be protected from unauthorized access and manipulation. Enterprise information must be safeguarded against inadvertent or unauthorized alteration, sabotage, disaster or disclosure. The cost and level of safeguards and security controls must be appropriate and proportional to the value of the information assets and the severity, probability and extent of harm.


Continuous Improvement

The rate of change and improvement in the worldwide information technology market has led to extremely high expectations regarding quality, availability and accessibility. As a result, ICT must deliver projects and service-level agreements (SLAs) on progressively shorter deadlines, and information systems of increasingly higher quality, in a cost-effective manner. This demand requires an operating model that continuously reviews and improves upon current practices and processes. Routine tasks that can be automated should be, but only where the benefit justifies the cost; the complexity of the process, the potential time savings and the potential for error reduction should be factored into the benefit. Processes and tasks must be analyzed and understood to determine the opportunity for improvement and automation. Service outages, errors and problems need to be analyzed to understand and improve upon deficiencies in existing processes and practices. Manual integration, where data is copied from one information system to another by hand, should give way to automated processes that are repeatable, timely and less prone to error.


Responsive Change Management

Changes to the enterprise information environment are to be implemented in a timely manner. If people are expected to work within the enterprise information environment, that environment must be responsive to their needs. Processes may need to be developed to manage priorities and expectations. This principle will, at times, conflict with other principles. When this occurs, the business need must be considered, but initiatives must also be balanced against other enterprise architecture principles. Without this balanced perspective, short-term considerations, supposedly convenient exceptions, and inconsistencies will rapidly undermine the management of information systems.


Technology Independence

Business architecture describes the business model independent of its supporting technology and provides the foundation for the analysis of opportunities for automation. Eliminate technology constraints when defining business architecture and ensure automated processes are described at the business process level for analysis and design. Enterprise functions and IT organizations must have a common vision of both a unit’s business functions and the role of technology in them. They have joint responsibility for defining the IT needs and ensuring that the solutions delivered by the development teams meet expectations and provide the projected benefits. Independence of applications from the supporting technology allows applications to be developed, upgraded and operated under the best cost-to-benefit ratio. Otherwise technology, which is subject to continual obsolescence and vendor dependence, becomes the driver rather than the user requirements themselves.


Data is a Shared Resource

Timely access to accurate data is essential to improving the quality and efficiency of enterprise decision making. It is less costly to maintain timely, accurate data and share it from a single application than it is to maintain duplicate data in multiple applications with multiple rules and disparate management practices. The speed of data collection, creation, transfer and assimilation is driven by the ability of the enterprise to efficiently share these islands of data across the organization. A shared data environment will result in improved decision making and support activities, as we will rely on fewer sources (ultimately one) of accurate and timely managed data. Data sharing will require a significant cultural change. This principle of data sharing will need to be balanced with the principle of data security; under no circumstance will the data sharing principle cause confidential data to be compromised.

The above is not an exhaustive list. The set of principles actually depends on the enterprise's vision and mission, and since the EA is aligned to that vision and mission, the principles should also be formulated with this alignment in mind. While the above principles are generic and may be used by any enterprise, it is important to state each principle in a structured manner. A principle should be supported with a rationale, so that users can understand why it exists and to what extent it can be traded off when a conflict arises.