Tech Bytes by Kannan Subbiah: Design

Showing posts with label Design. Show all posts

Friday, November 21, 2025

How Artificial Intelligence is Reshaping the Software Development Life Cycle (SDLC)

Artificial Intelligence (AI) is no longer a futuristic concept confined to research labs. It has reshaped numerous industries, with being one of its most profoundly affected domains. It’s a powerful, tangible force transforming every stage of the Software Development Life Cycle (SDLC). From initial planning to final maintenance, are automating tedious tasks, boosting code quality, and accelerating the pace of innovation, marking a fundamental shift from traditional, sequential processes to a more dynamic, intelligent ecosystem.

In the past, software engineering depended heavily on human expertise for tasks like gathering requirements, designing systems, coding, and performing functional tests. However, this landscape has changed dramatically as AI now automates many routine operations, improves analysis, boosts collaboration, and greatly increases productivity. With AI tools, workflows become faster and more efficient, giving engineers more time to concentrate on creative innovation and tackling complex challenges. As these models advance, they can better grasp context, learn from previous projects, and adapt to evolving needs.

AI is streamlining the software development lifecycle (SDLC), making it smarter and more efficient. This article explores how AI-driven platforms shape software development, highlighting challenges and strategic benefits for businesses using .

Impact Across the SDLC Phases

The Software Development Life Cycle (SDLC) has long been a structured framework guiding teams through planning, building, testing, and maintaining software. But with the rise of artificial intelligence—especially and —the SDLC is undergoing a profound transformation. Let’s explore how each phase of the SDLC is getting transformed into.

1. Project Planning:

AI streamlines project management by automating tasks, offering , and supporting . This shift allows project managers to focus on strategy, problem-solving, and leadership rather than administrative duties.

Automated Task Management: AI automates time-consuming, repetitive administrative tasks like scheduling meetings, assigning tasks, tracking progress, and generating status reports.
Predictive Analytics and Risk Management: By analyzing vast amounts of historical data and current trends, AI can predict potential issues like project delays, budget overruns, and resource shortages before they occur. This allows for proactive risk mitigation and contingency planning.
Optimized Resource Allocation: AI algorithms can analyze team members' skills, workloads, and availability to recommend the most efficient allocation of resources, ensuring that the right people are assigned to the right tasks at the right time.
Enhanced Decision-Making: AI provides project managers with real-time, data-driven insights by processing large datasets faster and more objectively than humans. It can also run "what-if" scenarios to simulate the impact of different decisions, helping managers choose the optimal course of action.
Improved Communication and Collaboration: AI tools can transcribe and summarize meeting notes, identify action items, and power chatbots that provide quick answers to common project queries, ensuring all team members are aligned and informed.
Cost Estimation and Control: AI helps in creating more accurate cost estimations and tracking spending patterns to flag potential overruns, contributing to better budget adherence.

2. Requirements Gathering

This phase traditionally relies on manual documentation and subjective interpretation. AI introduces data-driven clarity.

Requirements Gathering: AI can transcribe meetings, summarize discussions, and automatically format conversations into structured documents like user stories and acceptance criteria. It can also analyzes raw stakeholder input, market research, and other unstructured data to identify patterns and key requirements.
Automated Requirements Analysis: Artificial intelligence technologies are capable of evaluating requirements for clarity, completeness, consistency, and potential conflicts, while also identifying ambiguities or incomplete information. Advanced tools employing Natural Language Processing (NLP) systematically analyze user stories, technical specifications, and client feedback—including input from social media platforms—to detect ambiguities, inconsistencies, and conflicting requirements at an early stage. Additionally, AI systems can facilitate interactive dialogues to clarify uncertainties and reveal implicit business needs expressed by analysts.
Non-Functional Requirements: AI tools help identify non-functional needs such as regulatory and security compliance based on the project's scope, industry, and stakeholders. This streamlines the process and saves time.

3. Design and Architecture

AI streamlines software design by speeding up prototyping, automating routine tasks, optimizing with predictive analytics, and strengthening security. It generates design options, translates business goals into technical requirements, and uses fitness functions to keep code aligned with architecture. This allows architects to prioritize strategic innovation and boosts development quality and efficiency.

Optimal Architecture Suggestions: Generative AI agents can analyze project constraints and suggest optimal design patterns and architectural frameworks (like microservices vs. monolithic) based on industry best practices and past successful projects.
Automated UI/UX Prototyping: Generative AI can transform natural language prompts or even simple hand-drawn sketches into functional wireframes and high-fidelity mockups, significantly accelerating the design iteration process.
Automated governance and fitness functions: AI can generate code for fitness functions (which check if the implementation adheres to architectural rules) from a higher-level description, making it easier to manage architectural changes over time.
Guidance on design patterns: AI can analyze vast datasets of real-world projects to suggest proven and efficient design patterns for complex systems, including those specific to modern, dynamic architectures.
Focus on strategic innovation: By handling more of the routine and complex analysis, AI allows human architects to focus on aligning technology with long-term strategy and fostering innovation.

4. Development (Coding)

AI serves as an effective "pair programmer", automating repetitive tasks and improving code quality. This enables developers to concentrate on complex problem-solving and design, rather than being replaced.

Intelligent Code Generation: Tools like GitHub Copilot and Amazon CodeWhisperer use Large Language Models (LLMs) to provide real-time, context-aware code suggestions, complete lines, or generate entire functions based on a simple comment or prompt, dramatically reducing boilerplate code.
AI-Powered Code Review: Machine learning models are trained on vast codebases to automatically scan and flag potential bugs, security vulnerabilities (like SQL injection or XSS), and code style violations, ensuring consistent quality and security before the code is even merged.
Documentation and Code Explanation: Using Natural Language Processing (NLP), AI can generate documentation and comments from source code, ensuring that projects remain well-documented with minimal manual effort.
Learning and Upskilling: AI serves as an interactive learning aid and tutor for developers, helping them quickly grasp new programming languages or frameworks by explaining concepts and providing context-aware guidance.

AI is shifting developers’ roles from manual coding to strategic "." Critical thinking, business insight, and remain vital. AI can manage routine tasks, but human validation is necessary for security, quality, and goal alignment. Developers skilled in AI tools will be highly sought after.

5. Testing and Quality Assurance (QA)

AI streamlines software testing and quality assurance by automating tasks, predicting defects, and increasing accuracy. AI tools analyze data, create test cases, and perform validations, resulting in better software and user experiences.

Automated Test Case Generation: AI can analyze requirements and code logic to automatically generate comprehensive unit, integration, and user acceptance test cases and scripts, covering a wider range of scenarios, including complex edge cases often missed by humans.
Predictive Bug Detection: AI-powered analysis of code changes, historical defects, and application behavior can predict which parts of the code are most likely to fail, allowing QA teams to prioritize testing efforts where they matter most.
Self-Healing Tests: Advanced tools can automatically update test scripts to adapt to UI changes, drastically reducing the maintenance overhead for automated testing.
Smarter visual validation: AI-powered tools can perform visual checks that go beyond simple pixel-perfect comparisons, identifying meaningful UI changes that impact user experience.
Predictive analysis: AI uses historical data to predict areas with higher risk of defects, helping to prioritize testing efforts more efficiently.
Enhanced performance testing: AI can simulate real user behavior and stress-test software under high traffic loads to identify performance bottlenecks before they affect users.
Continuous testing: AI integrates with CI/CD pipelines to provide continuous, automated testing throughout the development lifecycle, enabling faster and more frequent releases without sacrificing quality.
Data-driven insights: By analyzing vast datasets from past tests, AI provides valuable, data-driven insights that lead to better decision-making and improved software quality assurance processes.

6. Deployment

Artificial intelligence is integral to modern software deployment, streamlining task automation, enhancing continuous integration and delivery (CI/CD) pipelines, and strengthening system reliability with advanced monitoring capabilities. AI-driven solutions automate processes such as testing and deployment, analyze performance metrics to anticipate and address potential issues, and detect security vulnerabilities to safeguard applications. By transitioning deployment practices from reactive to proactive, AI supports greater efficiency, stability, and security throughout the software lifecycle.

Intelligent CI/CD: AI can analyze deployment metrics to recommend the safest deployment windows, predict potential integration issues, and even automate rollbacks upon detecting critical failures, ensuring a more reliable Continuous Integration/Continuous Deployment pipeline.
Automated testing and code review: AI automates code quality checks, identifies vulnerabilities, and uses intelligent test automation to prioritize tests and reduce execution time.
Streamlined processes: By automating routine tasks and using data to optimize workflows, AI helps streamline the entire delivery pipeline, reducing deployment times and improving efficiency.

7. Operations & Maintenance

AI streamlines software operations by predicting failures, automating coding and testing, and optimizing resources to boost performance and cut costs.

Real-Time Monitoring and Observability: AI-driven tools continuously monitor application performance metrics, system logs, and user behavior to detect anomalies and predict potential performance bottlenecks or system failures before they impact users.
Automated Documentation: AI can analyze code and system changes to automatically generate and update technical documentation, ensuring that documentation remains accurate and up-to-date with the latest software version.
Root Cause Analysis: AI tools can sift through massive amounts of logs, metrics, and traces to find relevant information, eliminating the need for manual, repetitive searches. AI algorithms identify subtle and complex patterns across large datasets that humans would miss, linking seemingly unrelated events to a specific failure. By automating the initial analysis and suggesting remediation steps, AI significantly reduces the time-to-resolution for critical bugs.

The Future: AI as a Team Amplifier, Not a Replacement

The integration of artificial intelligence into the software development life cycle (SDLC) does not signal the obsolescence of ; rather, it redefines their roles. AI facilitates automation of repetitive and low-value activities—such as generating boilerplate code, creating test cases, and performing basic debugging—while simultaneously enhancing human capabilities.

This evolution enables developers and engineers to allocate their expertise toward higher-level, strategic concerns that necessitate creativity, critical thinking, sophisticated architectural design, and a thorough understanding of business objectives and user requirements. The AI-supported SDLC promotes the development of superior software solutions with increased efficiency and security, fostering an intelligent, adaptive, and automated environment.

AI serves to augment, not replace, the contributions of by managing extensive data processing and pattern recognition tasks. The synergy between AI's computational proficiency and human analytical judgment results in outcomes that are both more precise and actionable. Engineers are thus empowered to concentrate on interpreting AI-generated insights and implementing informed decisions, as opposed to conducting manual data analysis.

Sunday, November 13, 2016

A Software Product Vs Project

In short, a software Project is all about to execute a Statement of Work of an internal or external customer, where what customer required is right irrespective of what is ideal or what the end user would expect. Though some projects are scoped in such a way that certain aspects of non-functional requirements are left to the choice of the project teams.

Product development isn’t about implementing what the customer wanted to. In product development, the product manager owns and comes up with the product requirements. A large product or product suite, typically comprise of many projects and will evolve over time.

Unlike a project the product will be improved continuously without an end date based on feedback from end users and the product team prioritizes what needs to be built next based on its perceived value for its target users or customers.

A project on the other hand is funded with specific goals, a business case in mind and with finite expected value and cost.

Here is an attempt to bring out the differences between a software project and product and such differences are categorised as below:

The Mindset:

Projects are many a times started off with main focus on to deliver on time, under budget, within scope and with a temporary team. All these constraints are set in stone and any deviation is viewed seriously, which may impact the course of the project depending on the methodology adopted. So, the mindset of the project team will be with primary focus on the project parameters that determine the success of delivery and may not be the success of the product that the project may form part of. This is more so as the resources keep changing and the resources with no or little knowledge on the business domain may still deliver the project, but the product may be crappy.

Products tend to have a longer lifetime than projects and mostly built with more focus on the outcome instead of the output. Product teams are given the freedom and responsibility to think of a strategy they believe will result in the best product within a boundary of product framework. This leads to less waste and more creativity being introduced into the product development process, allowing room for embracing changes continously.

Management:

The product roadmap is key for the success of the prodct and as such, the product manager shall align the product vision and strategy with that of the business. A Project Manager, on the other hand, is responsible for executing on a predefined objective.

A Project Managers function is to create a plan, that the project will follow, and then to drive the people involved in the project to follow that plan with as little change as possible. If deviations from the planned execution are beyond an accepted threshold, the Project Manager must escalate and explain the situation to the stakeholders, who in turn will either accept the deviation or may choose to fail the project.

A product manager with the focus on constantly evaluating the viability of the product, will typically follow an agile approach with shorter sprints of developments, so the product evolves incrementally, delivering values at every stage.

Motivation:

With the primary focus of the project team being on delivering on time and within budget, the team does not have enough room to be creative enough. This brings down the motivation because the teams lose a sense of purpose and the autonomy in how to operate.

On the other hand, as typically, the resources stay longer with the product teams, they get aligned to the product strategy and the vision and thus they are given the freedom to bring in their thinking and creativity into the product, process and methodology. The feedback and collaboration with stakeholders enables the right environment, where the resources reach a higher potential and operate autonomously, resulting in better problem solving, higher ownership of outcomes, and faster time to market.

Tools:

Product management software and project management software are entirely different tools — each designed for a different type of role, to help address different business needs. Product management software helps product managers organize, develop, and communicate the product strategy, while project management software helps project managers in track the execution and incidentally manage the resource allocation, risk and issue management.

Scope:

Product scope is defined as "The features and functions that characterize a product, service, or result". Whereas the project scope is defined as "The work performed to deliver a product, service, or result with the specified features and functions".

The Product Scope defines all the capabilities of a product from the User point of view. The Product is the end result of your project and characterizes by the Product Scope. Thus, the Product Scope description includes features of a product, how the product will look like using these features, and how will it work. Product Scope also describe the ways of measuring the product performance.

The Project Scope on the other hand is an agreement of the work which is needed to deliver the product, service, or result. To develop a product features, you establish a project which has a schedule, budget, and resource allocation. In other words, the work you do to construct your product is the Project Scope.

Design & Architecture:

The product owner or manger is responsible for defining the architecture and design of the product, which should take the following into consideration:

Business Idea & Strategy
Identifying and Creating a product feature
Aligning with Market Trends
Define Product Performance Indicators
Prioritize the implementation of features and bugs

Though a project may include the product architecture and design as part of the scope, the focus of the project team will be more on the following:

Defining the project scheduling, taking into account the deliverables at various milestones.
Monitoring the budget
Planning and managing resources
Problem and issue management
Risk management
Managing the scope creep.

Saturday, May 23, 2015

Factors Affecting Software Resiliency

The digital transformation is happening everywhere right from small private firms to government organizations. On the personal front, connected things is coming on, where by every thing that we have or use will be smart enough to connect and communicate with other things(systems). This in effect means there will be an increased reliance on IT systems to accomplish various tasks. This will call for high order of resilience on the part of such systems and the absence of which may lead to disasterous situation.

As we all know, the word resiliency means 'the ability to bounce-back after some events'. In otherwords, it is a capability of withstanding any shock or impact without any major deformation or rupture. In software terms, resilience is the persistence of the avoidance of failures when facing a change or in a deviated circumstance.

To design a resilient system, one should first understand the various factors that work against the resiliency. Here are some such factors:

Design Flaws

Design and Architecture of the systems is a major factor that works in favor or against the resiliency requirement. The architects shall while designing the system or solution should have a good understanding of what could go wrong and provide for an exception handling ability, so that all exceptions are appropriately handled, making the system not to go down and instead recover from such exception and continue to operate. The architects have many options today in terms of tools, technologies, standards, methodologies and frameworks that help buidling resiliency within. It is the ability of choosing the right combination of tools, technologies, etc for the specific systems that will decide on the resilience capability of the system.

Software Complexity

The size and complexity of software systems is increasing, thus the ways in which a system can fail also increases. It is fair to assume that the increase in failure possibilities does not bear a linear or additive relationship to system complexity. Typically, the complexity of the software systems increases as it evolves by responding to the changing business needs. This is more so as the tools and technologies used to design and build the software are becoming outdated, making it difficult in maintaining the systems.

This complexity attribute makes it increasingly difficult to incorporate resiliency routines that will respond effectively to failures in the individual systems and in their complex system. The cost of achieving an equivalent level of resiliency due to the complexity factor should be added to that of the individual systems

Interdependency and Interconnectivity

We are living in a connected world and systems of many of today's businesses depend on connectivity with their partner entities to do their business. This adds multiple points of failures over and above the network connectivity. The system resiliency is increasingly dependent on the resiliency of systems different other organizations over which the entity has no control. This means that a failure or outage of a business partner's system can have a ripple effect. This situation requires the systems need to be aware and capable of such failure or outage with other connected systems and the ability to recover from such events should be designed within.

Rapid Changes

Thanks to the evolving digital economy, the business needs are changing too frequently and thus needing system changes. Every change in an existing system, for sure will add a bit of complexity, as the architecture on which the system originally designed wouldn't have considered the changes that are coming through. Many a times, considering the time to market, such changes need to be implemented quicker than expected, leaving the software designers to adopt a quick and dirty approach to deliver the change, leaving a permanent solution for a later time period. The irony is that there will never be a time when the 'permanent solution' is implemented.

Change is one of the key source of adding complexity to the Software systems. However, the evolving tools, technologies and methodologies come to the rescue, so that the Architects design systems and solutions in such a way to pave way for embracing such changes and to embed the resiliency factors in the design.

A frequently held criticism of Common Criteria testing is that, by the time the results are available, there is a good chance that the tested software has already been replaced. The danger here is that the new software may contain new vulnerabilities that may not have existed in prior versions. Thus, determining that an obsolete piece of software is sufficiently resilient is not particularly indicative of the state of the newest version and, therefore, is not very useful

Conclusion

Higher levels of resilience can be achieved by leveraging Machine Learning and Big Data tools and techniques. As the world is moving towards more and more connected things, high order of resilience is critical. With Machine Learning capability, the systems and devices can be embedded with algorithms that make them learn from past events and the data collected from various other connected networks and systems in addition to the ambient data. The systems can be designed to predict the health of various underlying components and thus its own health as well. Based on such prediction, the components may choose to use alternate approaches, like using alternate network protocols like Wireless, Bluetooth, etc, or choose to connect to a different component or system altogether.

Sunday, July 20, 2014

A Checklist for Architecture & Design Review

Mostly the security requirements remain undocumented and is left to the choice or experience of the architects and developers thus leaving vulnerabilities in the application, which hackers exploit to launch an attack on the enterprise's digital assets. Security threats are on the rise and is now being considered as a Board Item as the impact of security breach is very high and could cause monetary and non monetary losses.

One of the key aspects of the IT Governance is to ensure that the investments made in software assets are optimal and there is a quantifiable return on such investments. This also means that such investment does not lead to risks that could lead to damages. Most of us are well aware that reviews play a key role in ensuring the quality of the software assets. As such, in this blog post, I have tried to come up with a checklist for reviewing the architecture and design of a software application.

While the choice of specific design best practice is interdependent on another, a careful tradeoff is necessary. For a detailed discussion on Trade off Analysis of Software Quality Attributes. Each of the checklist item listed here needs further elaboration and identification of specific practices, which will depend on the enterprise architecture and design principles of the organization.

Deployment Considerations

The design references the security policy of the organization and is in compliance of the same.
The application components are designed to comply with the various networking and other infrastructure related security restrictions like firewall rules, using appropriate secure protocols, etc.
The trust level with which the application accesses various resources are known and are in line with the acceptable practices.
The design supports the scalability requirements such as clustering, web farms, shared session management.
The design identifies the configuration / maintenance points, and the access to the same is manageable.
Communication with various local or remote components of the application is using secure protocols.
The design addresses performance requirements by adhering to relevant design best practices.

Application Architecture Considerations

Input Validation

Whether the design identifies all entry points and trust boundaries of the application.
Appropriate validations are in place for all inputs that comes from ourside the trust boundary.
The input validation strategy that the application adopted is modular and consistent.
The validation approach is to constrain, reject, and then sanitize input.
The design addresses potential canonicalization issues.
The design addresses SQL Injection, Cross Site Scripting and other vunerabilities
The design applies defense in depth to the input validation strategy by providing input validation across tiers.

Authentication

The design identifies the identities or roles that are used to access resources across the trust boundaries.
Service account or such other predefined identity requirements to, if so needed to access variuos system resources are identified and documented.
User credentials or authentication tokens are stored in secure manner and access to the same is appropriately controlled and managed.
Where the credentials are shared over the network, appropriate security protocol and encryption techniques are used.
Appropriate account management policies are considered.
In case of authentication failures, the error information displayed is minimal so that it does not reveal any clues that could make the credential guessing easier.
The design adopts a policy of using least-privileged accounts.
Password digests with salt are stored in the user store for verification.
Password rules are defined so that the stronger passwords are enforced.

Authorization

The user role design offers sufficient separation of privileges and considers authorization
granularity.
Multiple gatekeepers are envisaged for defense in depth.
The application’s identity is restricted in the database to access-specific stored procedures and does not have permissions to access tables directly.
Access to system level resources are restricted unless there is an absolute necessity.
Code Access Security requirements are established and considered.

Configuration Management

Stronger authentication and authorization is considered for access to administrration modules.
Secure protocols are used for remote administration of the application.
Configuration data is stored in a secured store and access to the same is appropriately controlled and managed
Least-privileged process accounts and service accounts are used.

Sensitive Data

Design recognizes sensitive data and considers appropriate checks and controls on the same.
Database connections, passwords, keys, or other secrets are not stored in plain text.
The design identifies the methodology to store sensitive data securely. Appropriate algorithms and
key sizes are used for encryption.
Error logs, audit logs or such other application logs does not store sensitive data in plain text.
The design identifies protection mechanisms for sensitive data that is sent over the network.

Session Management

The contents of authentication cookies are encrypted.
Session lifetime is limited and times out upon expiration.
Session state is protected from unauthorized access.
Session identifiers are not passed in query strings.

Cryptography

Platform-level cryptography is used and it has no custom implementations.
The design identifies the correct cryptographic algorithm and key size for the application’s data encryption requirements.
The methodology to secure the encryption keys is identified and the same is in line with the acceptable best practices.
The design identifies and establishes the key recycle policy for the application.

Parameter Manipulation

All input parameters are validated including form fields, query strings, cookies, and HTTP headers.
Sensitive data is not passed in query strings or form fields.
HTTP header information is not relied on to make security decisions.
View state is protected using MACs.

Exception Management

The design outlines a standardized approach to structured exception handling across the application.
Application exception handling minimizes the information disclosure in case of an exception.
Application errors are logged to the error log, and the design provides for periodic review of such logs.
Sensitive data is not logged as part of the error logs, but where necessary, the same is logged with appropriate de-identification technique

Auditing and Logging

The design identifies the level of auditing and logging necessary for the application and identifies the key parameters to be logged and audited.
The design considers how to flow caller identity across multiple tiers at the operating system or application level for auditing.
The design identifies the storage, security, and analysis of the application log files

Saturday, November 9, 2013

Webservice Security Standards

SOA adoption is on the rise and Webservices is predominantly used for its implementation. Webservice messages are sent across the network in an XML format defined by the W3C SOAP specification. Webservices have come a long way and has sufficiently matured to offer the required tenets especially on the security domain. In this blog let us have a quick look at the available standards with respect to the security dimensions and look at how the related security requirements are addressed.

Secure Messaging

WS-Security - This specification was originally developed by IBM, Microsoft and Verisgn and OASIS (Organization for the Advancement of Structured Information Standards) continued the work on this standard. This standard addresses the Integrity and Confidentiality requirements of the webservice messages. The specification describes the signing, encrypting of the SOAP messages and also about attaching security tokens. Various signature formats and encryption algorithms are supported. The security tokens supported include: X.509 Certificates, Kerberos tickets, User ID/Password credentials, SAML assertions and custom tokens. Due to the increased size of the SOAP messages and the cryptographic requirements, this standard requires significantly higher compute resources and network bandwidth.
SSL/TLS - SSL was developed by Netscape Communications Corporation in 1994 to secure transactions over the World Wide Web. Soon after, the Internet Engineering Task Force (IETF) began work to develop a standard protocol that provided the same functionality. They used SSL 3.0 as the basis for that work, which became the TLS protocol. In applications design, TLS is usually implemented on top of any of the Transport Layer protocols, encapsulating the application-specific protocols such as HTTP, FTP, SMTP, NNTP and XMPP. Historically it has been used primarily with reliable transport protocols such as the Transmission Control Protocol (TCP). This standard helps address the Strong authentication, message privacy and integrity requirements.

Resource Protection

XACML - eXtensible Access Control Markup Language defines a declarative access control policy language implemented in XML and a processing model describing how to evaluate access requests. Version 3.0 of this standard has been published by OASIS in January 2013. The new features of the latest version of this standard include: Multiple Decision Profile, Delegation, Obligation Expressions, Advice Expressions and Policy Combination Algorithms.While there are many ways the base language can be extended, many environments will not need to do so. The standard language already supports a wide variety of data types, functions, and rules about combining the results of different policies. In addition to this, there are already standards groups working on extensions and profiles that will hook XACML into other standards like SAML and LDAP, which will increase the number of ways that XACML can be used.
XrML - Developed by Content Guard, a subsidiary of Xerox, and supported by Microsoft, eXtensible Rights Markup Language would provide a universal method for specifying rights and issuing conditions associated with the use and protection of content in a digital rights management system. XrML licenses can be attached to WS-Security in the form of tokens. XACML and XrML both deal with authorization. They share requirements from many of the same application domains. Both share the same concepts but use different terms. Both are based on XML Schema. Microsoft's Active Directory Rights Management Services (AD RMS) uses the eXtensible rights Markup Language (XrML) in licenses, certificates, and templates to identify digital content and the rights and conditions that govern use of that content.
RBAC, ABAC - Similar to XrML, RBAC and ABAC are established approaches to define and implement Role Based Access Control and Attribute Based Access Controls and can be attached to WS-Security as tokens. The use of RBAC or ABAC to manage user privileges (computer permissions) within a single system or application is widely accepted as a best practice.
EPAL - The Enterprise Privacy Authorization Language (EPAL) is an interoperability language for exchanging privacy policy in a structured format between applications and can be leveraged for addressing the privacy concerns with the SOAP messages. An EPAL policy categorizes the data an enterprise holds and the rules which govern the usage of data of each category. Since EPAL is designed to capture privacy policies in many areas of responsibility, the language cannot predefine the elements of a privacy policy. Therefore, EPAL provides a mechanism for defining the elements which are used to build the policy.

Negotiation of Contracts

ebXML - e-business XML is a modular suite of standards advanced by OASIS and UNCEFACT and approved as ISO 15000. While the ebXML standards seek to provide formal XML-enabled mechanisms that can be implemented directly, the ebXML architecture is focused on concepts and methodologies that can be more broadly applied to allow practitioners to better implement e-business solutions. ebXML provides companies with a standard method to exchange business messages, conduct trading relationships, communicate data in common terms and define and register business processes. A CPA (Collaboration Protocol Agreement) document is the intersection of two CPP documents, and describes the formal relationship between two parties.
SWSA - The SWSA(Semantic Web Services Architecture) interoperability architecture covers the support functions to be accomplished by Semantic Web agents (service providers, requestors, and middle agents). While not all operational environments will find it necessary to support all functions to the same degree, the distributed functions to be addressed by this architecture to include: Dynamic Service Discovery, Service Engagement (Negotiating & Contracting), Service Process Enactment & Management, Semantic Web Community Support Services, Semantic Web Service Lifecycle & Resource Management Services and Cross Cutting Issues.

Trust Management

WS-Trust - The goal of WS-Trust is to enable applications to construct trusted SOAP message exchanges. This trust is represented through the exchange and brokering of security tokens. This specification provides a protocol agnostic way to issue, renew, and validate these security tokens. The Web service security model defined in WS-Trust is based on a process in which a Web service can require that an incoming message prove a set of claims (e.g., name, key, permission, capability, etc.). If a message arrives without having the required proof of claims, the service SHOULD ignore or reject the message. A service can indicate its required claims and related information in its policy as described by WS-Policy and WS-PolicyAttachment specifications.
XKMS - XML Key Management Specification is a protocol developed by W3C which describes the distribution and registration of public keys. Services can access an XKMS compliant server in order to receive updated key information for encryption and authentication. The XML Key Management Specification (XKMS) allows for easy management of the security infrastructure, while the Security Assertion Markup Language (SAML) makes trust portable. SAML provides a mechanism for transferring assertions about authentication of entities between various cooperating entities without forcing them to lose ownership of the information.
SAML - Security Assertion Markup Language is a product of the OASIS Security Services Technical Committee intended for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider. SAML allows business entities to make assertions regarding the identity, attributes, and entitlements of a subject (an entity that is often a human user) to other entities, such as a partner company or another enterprise application. SAML specifies three components: assertions, protocol, and binding. There are three assertions: authentication, attribute, and authorization. Authentication assertion validates the user's identity. Attribute assertion contains specific information about the user. And authorization assertion identifies what the user is authorized to do. Protocol defines how SAML asks for and receives assertions. Binding defines how SAML message exchanges are mapped to Simple Object Access Protocol (SOAP) exchanges.
WS-Federation - WS-Federation extends the WS-Security, WS-Trust and WS-SecurityPolicy by describing how the claim transformation model inherent in security token exchanges can enable richer trust relationships and advanced federation of services. A fundamental goal of WS-Federation is to simplify the development of federated services through cross-realm communication and management of Federation Services by re-using the WS-Trust Security Token Service model and protocol. A variety of Federation Services (e.g. Authentication, Authorization, Attribute and Pseudonym Services) can be developed as variations of the base Security Token Service.

Security properties

WS-Policy, WS-SecurityPolicy - WS-Policy represents a set of specifications that describe the capabilities and constraints of the security policies on intermediaries and end points and how to associate policies with services and end points. Web Services Policy is a machine-readable language for representing these Web service capabilities and requirements as policies. Policy makes it possible for providers to represent such capabilities and requirements in a machine-readable form. A policy-aware client uses a policy to determine whether one of these policy alternatives (i.e. the conditions for an interaction) can be met in order to interact with the associated Web Service. Such clients may choose any of these policy alternatives and must choose exactly one of them for a successful Web service interaction. Clients may choose a different policy alternative for a subsequent interaction.
WS-ReliableMessaging, WS-Reliability - WS-ReliableMessaging, was originally written by BEA Systems, Microsoft, IBM, and Tibco and later submitted to the OASIS Web Services Reliable Exchange (WS-RX) Technical Committee for adoption and approval.Prior to WS-ReliableMessaging, OASIS produced a competing standard WS-Reliability that was supported by a coalition of vendors. The protocol allows endpoints to meet the guarantee for the delivery assurances namely, Atmost Once, Atleast Once, Exactly Once and In Order. Persistence considerations related to an endpoint's ability to satisfy the delivery assurances are the responsibility of the implementation and do not affect the wire protocol.

Sunday, April 7, 2013

NGINX - A High Performance HTTP Server

What you will find about NGINX from its website is that it is a free, open source light weight HTTP web server and a reverse proxy. Nginx doesn't rely on threads to handle requests. Instead it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly, predictable amounts of memory under load. We have found that it is holding up to what it says. Here is how we ended up using NGINX for one of our client and have found it meeting up to our expectation.

We were on a performance testing assignment of a game application, where much of the game contents were static flash files (of course they are dynamic as they had embedded action scripts for execution in the front end) and image files of which quite many are of sizes above 500KB. Need not to mention that this is the compressed size and as such enabling compression did not offered any performance gains. These files were served by Apache, which also serves dynamic php requests.

Our initial performance tests did indicated that for about 1000 concurrent hits, the Apache server memory consumption shot up considerably beyond 2 GB and also quite many requests were timing out. Examination of the server configuration did reveal that the server is configured to serve 256 child processes and also that it was configured to use the ‘prefork’ mpm module, which limits each child process to use only one thread. This technically limits the number of concurrent requests the Apache Server can serve to 256.

We also figured out that Apache needs to use ‘prefork’ module in order to safely serve PHP requests. Though later versions of PHP are found to be working with the other mpm module (‘worker’, which can use multiple threads per child process) many still have concerns around thread safety. The 256 client process limit is also a hard limit and if that needs to be increased, the Apache server needs to be rebuilt.

With this diagnosis, the choices we had with us was to have the static content served out of a server other than Apache and just leave Apache to serve the PHP requests. We just then thought of trying out NGINX and without wasting much time, we went ahead and implemented it. NGINX was then configured to listen on port 80 and has been configured to serve the contents from the root folder of Apache. Apache server was configured to listen at a non standard port and NGINX has also been configured to act as a reverse proxy for all php requests by getting them served processed by Apache.

We performed the tests again and have found tremendous improvement. The average latency at 1000 concurrent hits stress level has come down to under 5 seconds from about 80 seconds. We could also observe that NGINX was consuming just around 10MB. NGINX, by itself does not has any limitation on the number concurrent threads and it is constrained by the limits of the server itself and other daemons running on it.

Saturday, December 29, 2012

Resilient Systems - Survivability of Software Systems

Resilience as we all know is an ability to withstand through tough times.There is also another term quite interchangeably used, which is Reliability. But Reliability and Resilience are different. Reliability is about a system or a process that has zero tolerance to failure or the one that should not fail. In other words, when we talk about reliable systems, the context is that failure is not expected or rather acceptable. Whereas Resilience is about the ability to recover from failures. What is important to understand about resilience is that failure is expected and is inherent in any systems or processes, which might be triggered due to changes to the platform, environment and data. While Reliability is about the system’s robustness of not failing, Resilience is its ability to sense or detect failures ahead and then prevent it from encountering such events that lead to failure and when it cannot be avoided, allow it to happen and then recover from the failure sooner.

A working definition for resilience (of a system) developed by the Resilient Systems Working Group (RSWG) is as follows:

“Resilience is the capability of a system with specific characteristics before, during and after a disruption to absorb the disruption, recover to an acceptable level of performance, and sustain that level for an acceptable period of time.“ The following words were clarified:

The term capability is preferred over capacity since capacity has a specific meaning in the design principles.

The term system is limited to human-made systems containing software, hardware, humans, concepts, and processes. Infrastructures are also systems.

The term sustain allows determination of long-term performance to be stated.

Characteristics can be static features, such as redundancy, or dynamic features, such as corrective action to be specified.

Before, during and after – Allows the three phases of disruption to be considered.

Before – Allows anticipation and corrective action to be considered

During – How the system survives the impact of the disruption

After – How the system recovers from the disruption

Disruption is the initiating event of a reduction is performance. A disruption may be either a sudden or sustained event. A disruption may either be internal (human or software error) or external (earthquake, tsunami, hurricane, or terrorist attack).

Evan Marcus, and Hal Stern in their book Blueprints for High Availability, define a resilient system as one that can take a hit to a critical component and recover and come back for more in a known, bounded, and generally acceptable period of time.

In general Resilience is a term of concern for Information Security professionals as the final impact of disruption (from which a system needs to recover), could mostly be on Availability which is one of the three tenets of Information Security (CIA - Confidentiality, Integrity and Availability). But there is a lot for System designers and developers, especially those tasked to build mission critical systems to be concerned about Resilience and architect the systems to build in a required level of Resilience characteristics. For a system to be resilient, it should draw necessary support from dependent software and hardware components, systems and the platform. For instance a disruption for a web application can even be from network outage, security attacks at the network layer, which the software has no control over. But it is important to consider software resiliency in relation to the resiliency of the entire system, including the human and operational components.

The PDR (Protect - Detect - React) strategy is no longer as effective as it used to be due to various factors. It is time that predictive analytics and some of the disruptive technologies like big data and machine learning need a consideration in enhancing the system resiliency. Based on the logs of various inter-connected applications or components and other traffic data on the network, intelligence need to be built into the system to a combination of number of possible actions. For instance, if there is a reason to suspect an attacker attempting to gain access to the systems, a possible action could be to operate the system at a reduced access mode, i.e. parts of the systems may be shut down or parts of the networks to which the system is exposed could be blocked, etc.

OWASP’s AppSensor project is worth checking by the architects and developers. The AppSensor project defines a conceptual framework and methodology that offers prescriptive guidance to implement intrusion detection and automated response into an existing application. AppSensor defines over 50 different detection points which can be used to identify a malicious attacker.Appsensor provides guidance in the form of possible responses for each such identified malicious atacer.

The following are some of the factors that need to be appropriately addressed to enhance the resilience of the systems:

Complexity - With systems continuously evolving and many discrete systems and components increasingly integrated into today’s IT eco-system, the complexity is on the rise. This makes the resiliency routines as built in to individual systems needing a constant review and revision.

Cloud - Cloud computing is gaining higher acceptance and as organizations embrace cloud for its IT needs, the location of data, systems and components are widespread across the globe and that brings in challenge for those involved in building resilience.

Agility - To stay on top of the competition, organizations need agility in their business processes, which means rapid changes in the underlying systems and this could be a challenge as this will call for a constant check to ensure that the changes being introduced does not downgrade or compromise the resiliency level of the systems.

While there are techniques and guiding principles which when followed and applied, the resilience of the systems can be greatly improved, such design or implementation comes with a price and that is where the economics of Resiliency needs to be considered. For instance, mission critical software systems like the ones used in medical devices, need to have a high resilience characteristic, but quite many of the business systems can have a higher tolerance level and thereby being less resilient. However, it is good to document the expected resilience level at the initial stage and work on it in the early life cycle of the system development. thinking about resilience later in the life cycle may not be any good as implementation will call for higher investment.

References:

Crosstalk - The journal of Defense Software Engineering Vol 22 No:6

Resilient Systems Working Group

OWASP - AppSensor Project

Sunday, October 14, 2012

Application Architecture Review - Security

In continuation of the Architecture review series, let us focus on security review in this blog. With information security breaches hitting the news headlines quite frequently, many enterprises are realizing the real need to manage this security risk and be resilient. As such, it is possible that as an Architect, you might have been called for to perform a security review of the existing applications. I have tried to put together the following areas of concern, which need a closer look to form an opinion whether the application architecture is secure enough.

In general the broad areas of concern for the security architects should be the following:

Authentication – Review the tools, technology and the approach used by the application to establish the identity of the application users for possible deficiencies. In this connection the following specific areas need attention.

Look for identification of the legitimate human and system users of the application in the requirements document which in turn are validated with appropriate business scenarios.
If the application exposes interface to external systems, understand how access by such systems are identified and authenticated. Also understand how secure such other external systems are and if possible ask for a security assessment of such other systems.
Identify how users are authenticated, whether two factor or three factor authentication.
Check if Single Sign On is implemented and in such case, understand how it is implemented, what tools and technology are used. If the Identity provider is external to the system boundary, then also check how the information in transit between the identity provider and the application is secured.
In case of external identity providers, it would also be worth checking the security practices followed by the Service Provider and whether they are being subject to regular external independent security assessment.
If the application maintains the user information locally and authenticates against it, ensure whether identity related data is secured appropriately from unauthorized access.
It would also be worth understanding how the database servers authenticate the application or the application user. If the application users happen to be the users of the database as well, then the mechanisms implemented to prevent such users directly accessing the database needs to be scrutinized.

Authorization - Each of the identified human or system users would be operating on the application by assuming defined roles and the authorization to access various components or information should be dependent on such roles. Get a clear view of how the roles and authorization are implemented in the application. The following specific areas are worth the attention in this regard.

Check if there exists an information sensitivity policy or information privacy policy as relevant to the data or information being accessed or managed by the application.
Understand how the defined roles are mapped to the various datasets in terms of the permission to the Create, Read, Update and Delete. It would also be good to examine the various roles defined by the organization whether they are in line with that of the principles of segregation of duties and look for how the users with multiple or overlapping roles are handled by the system.
With a view to improve application performance, developers tend to create interfaces (both visual and non-visual) in such a way they are chunky as against being chatty. While this holds good in terms of application performance, the datasets being served need to be reviewed with respect to the information sensitivity and the role based permission restrictions should be applied to all internal and external APIs and interfaces.

Availability / Scalabiltiy – Systems are designed to process data in the expected and timely manner so that the information users make the most of it, and perform the business operations efficiently and effectively. The general experience is that the systems perform very well in the initial testing phase and when it is deployed in production its behaviour could be different and might slow down considerably due to various environment and load related issues. As an architect it is essential that the proper estimation is done for expected user and data growth and the application is designed to meet such needs. Examination of the following areas might reveal how the application meets this concern.

Check out my own blog on Architecture Review – Scalability to know more on this specific area.

Auditability – The systems should be designed to log certain events, which could be potentially lead to security breach. These logs should be readable when needed by users with appropriate roles and should be monitored periodically. Event alerts also help notifying the administrators on the occurrence of certain type of events, which may require immediate attention. Examine the following areas of the application design to form an opinion on this concern.

Review the Application architecture to understand how the event alert and logging mechanism is implemented.
Review for completeness of various events that are being handled and the relevant data is being logged. Examine if any sensitive data is being logged and if so, whether role based access restrictions is also implemented around the log data.
Check how the event log data is organized and stored and also look for existence of any policy or procedures around managing such log data.
Understand the regulatory needs, which many times govern the data to be logged and how long such log data need to be retained.
The log data grows too fast and many times if the storage of log data is within the same production database of the application, there is a possibility that this growth may impact the performance of the application itself impacting the Availability needs. Depending on the volume and growth rate of the data, ensure that the chosen tools and technology is adequate and appropriate.

This blog is not an exhaustive checklist and just intended to bring out the broad concerns which at a minimum should be considered in the Architecture Review. TOGAF 9.1 has in its ADM Guidelines and Techniques has listed the design considerations with respect to building security as part of the design and architecture. These security design considerations can be used for an exhaustive security review, which also covers the implementation, change management and the IT infrastructure.

Also check out my own blog titled as Building Secure Application, which is abour making security part of the SDLC.

Sunday, September 30, 2012

Building Secure Applications

The awareness on security has increased manifold and thanks to the frequent news headlines on security breaches and compromises leaving organizations with heavy damage and / or loss. The board has started realizing that information security is one of the primary causes of most of the IT Risks that their organization has to mitigate or to be ready to face. But the application design and development processes has not seen significant change to ensure that the delivered application is secure and would help the organization bring down the risk exposure.

Most of the software engineers and architects involved in application design and development still have very less security awareness and they believe that that security is something that the IT Infrastructure team will take care of. On the other hand, the IT infrastructure team believes in the investment they would have made in robust security tools and expect that they have done their part to protect the organization and pass on any application specific security issues back to the development teams. Then, whose responsibility it is to ensure building a secure application?

This is where an independent, enterprise wide security consulting team needs to extend the security consulting to various project teams within the organization. But still, this will not solve the problem completely, as it is for the development teams to be security aware and design and build security applications.

Here is what the security consulting team can offer to the project teams:

Pre-project security reviews: This helps the decision makers to be aware of what security threat landscape that they are going to be dealing with upon implementation of the project. This could even have an impact on the project costing as additional cost might have to be incurred to mitigate the security risks that this project may bring on and in most cases it is not just one time investment and it has to be recurring operating expenses as well. This helps the organization to make informed decisions considering the extent of impact on the organization’s risk exposure.

Design level security reviews: Like in case of software quality, earlier one spot the security concerns, it would cheaper and easier to factor in the mitigation plans. At this stage, the security consulting teams can help the project teams in the following ways:

A high level evaluation of the security concerns that the design could expose and suggest possible security controls to mitigate such concerns.
Equip the project stake holders with necessary information associated with the identified security concerns, so that they can take better risk management decisions.
Extend guidance to the project teams on the choice of controls and solutions that might best address the security concerns.
Perform research and exploration on any the technology or feature is quite new and innovative which could potentially open a new thread landscape
Offer training to the design and development teams about the most common security vulnerabilities and the attack patterns, so that they design and build counter measures early on.

Implementation Security Reviews: Security consulting team offers to perform the reviews and verifications in the later phase of the application development, so that vulnerabilities if any exist may not slip through to production. Typically the services offered by the security consulting team are one or more of the following:

Ensure that the security concerns identified in the design review is fully addressed.
Perform security specific reviews on the code and this could be on a sampling basis, depending on the acceptable risk level of the application.
Use automated tools that will examine the code and / or the packaged component or application.
Review the security specific test cases as created by the software testing team and suggest additional test cases to ensure better coverage in the security testing.

Having a security consulting team will not just be enough to build secure applications. The Software Engineering Process which defines and describes the approach and methodologies used for project execution should also mandate for practising and consuming the services of the Security Consulting teams and their review reports should be identified as entry and exit criteria for various phases of the application development.

Security education is also required to ensure that the project team is aware why security is so important for the organization. The following measures towards security education would help the teams to design and develop secure applications:

Include a module of security education in the employee induction program, so that every new employee is oriented towards the security needs of the organization.
Offer in-depth security training to architects and select software developers and ensure that every project team has at least couple of such trained resources.
Ensure that the coding guidelines document also factors writing secure code.
Hold periodic training sessions where in addition to security related presentations, recent security review findings and or defects found during the security testing can be taken up for discussion and in the end come up with solutions to prevent such issues creeping in.
If the organization publishes a periodic news bulletin, ensure that security is a mandated section in it and use it to publish security related updates.

As it can be observed, security is not just one person or department’s responsibility and it should be the outcome of collaborated efforts of concerned teams with the common goal to build a secure application. The security consulting team can also be external to the organization, i.e. this function can be outsourced to security consuling firms who would be specializing in the area of application security practice.

Sunday, August 26, 2012

Solution Architecture - Basic Principles

As I have written in my earlier blog post Solution Architect – Understanding the Role, the Solution Architecture practice area demands a wider knowledge in business and technical areas. The solution Architects need to be jack of all trades rather than being the master of a specific area. The Solution Architects should be able to bridge the gaps that the business users have on technical space and those that the technical teams have on the business areas.

Architecting a good solution is always challenging as the context and the technology keeps changing too fast over time. A solution which was perfect in a time frame may not be so after a time period. Given that each of the non-functional quality attributes might adversely affect one or more other non-functional quality attributes, the Solution Architects should be able to do the balancing act on these attributes in line with the business needs and other factors that could be foreseen in the near and longer term.

Here are some of the basic principles, which when practiced, would help a solution architect to deliver a good solution.

Business Drives Information Technology

Just in case, if one has the expertise in IT, then it is important to take off that hat and wear the business hat while architecting solutions. Business does not mind which technology or tools that would be in use, but they want their business problems solved with reasonable longevity and other non-functional requirements. Ignoring or undermining business expectations or priorities could lead to building a solution with excessive engineering or technical complexity, and could result in higher cost and delays.

Just Enough Architecture, Just in Time

Let the solutions evolve based on business priorities. It would be ideal to take one business problems at a time based on its priority and architect the solution keeping in mind the first principle and factoring the ability to change and scale in response to the business needs. This will help the businesses get the solutions faster and in increments. This will also help the Architects adopt agile methods and ensure timely responses to business changes as mostly needed.

Common Business Solutions

There is a tendency for departments to go in for solutions on their own as they have their own budgets to spend. This many times leads to a situation where multiple solutions being in use for the same problem domain within various departments. It is a must for architects to have a complete view of the problems and solutions within the enterprise and the solutions architected should be common and usable across multiple departments. If need be, the existing solution can be enhanced to meet the changing requirements of different department. This principle when practiced will facilitate easy maintenance and will ensure better data governance.

Conform to Data Governance principles

Data and information is used for decision making at various levels and it is important that the data is organized and maintained at the highest possible quality level. The need to understand the sensitivity levels of the data within and across departments and partner systems is also equally important while architecting the solution. With big data era emerging, organizations are looking at handling petabytes of structured and unstructured data to be managed. It is imperative to have the Data Governance policies and processes in place and the Solution Architecture team should ensure that the solutions are in compliance with it.

Comply with Information Security Framework

Similar to the Data Governance policies, the organization should have Information Security policies and related framework in place and the Solution Architecture team should ensure that the solutions are in conformance with it. Security must be designed into the solutions from the inception and adding it later could result in higher cost and delays. This however is dependent on the organization’s business context and its risk appetite.

Also read the following other blog posts on Architecture Review - Scalability

Friday, July 13, 2012

Architecture Review - Scalability

Scalability is an important quality attribute of any system, be it hardware or software. But in most cases, the need for a scalability check or review is felt only when certain signs of scalability problems show up. Typically, the following are such signs that call for a scalability review of an existing application.

When changes requested on certain subsystems are turned down by the development team(s) citing that it is a complex subsystem and any change to it might call for huge efforts in terms of regression testing or else, it could lead to a bigger impact on the whole system. This indicates that there are certain components or sub systems which prevent the system from scaling.
Months after production usage, the application performance gradually slows down and there is a tendency to accept the performance slow down or to pump in more hardware to compensate the slowdown. This is again is an important sign that the application is not scaling to take on the ever growing user base and transaction volumes.

There could be more signs that could indicate that there are scalability issues within the application. It is unfortunate that scalability reviews are not done in the initial design phase, so that these post production troubles won’t show up. While reviewing an existing application for potential scalability issues may be easy, the solutions for addressing those may not be really easy. That could be because of the underlying design & architecture of the application and its inter-dependencies with other systems in use. Let us examine certain important aspects to look into to spot potential scalability problems.

Distributed architecture: While distributed design is likely to improve performance, it could lead to scalability issues when one or more components or sub system relies on the local resources. Another reason for this to be reviewed with care is an ill designed system may call for too much of communication across physical and logical boundaries of various subsystems and would rely more on the communication infrastructure.

Component interaction: Examine how the components or subsystems are designed to interact with each other and how closely they are positioned. Too much of component interactions could lead to network congestion and also result in very high latencies which results in performance scalability issues later on when the usage increases. Measure the payload and the latency of such inter component interactions and isolate the components that need redesign. Keeping the data and the behaviour closer will reduce the interactions across boundaries and as a result keeps the latency under check.

Resource contentions: Look for potential limitations on the hardware or software resources used by the application. For instance, if the application produces huge amounts of log data, on the same disk where its transaction data is stored, the write requests may encounter resource contentions. Similarly, how fast the data files grow and how does the disk subsystems support such growth. Possible solutions for such issues are using resource pooling, message queues or such other asynchronous mechanisms.

Remote Communications: It would always be beneficial to limit the remote calls to the minimum or else, too many remote calls may expose the system too much on the reliability and availability of the communication infrastructure. Ensure required validations are performed ahead of the remote calls, so that unnecessary remote calls are avoided. Where possible, the remote calls should be stateless and asynchronous. Synchronous calls may hold up the communication channels and associated resources for longer period which could be the potential cause for performance and scalability issues. Use of message queues may help in decoupling subsystems from being held up for synchronous responses.

Cache Management: While use of Cache can help achieve better performance, it could also prevent the application from scaling in a load balanced environment, unless a distributed caching mechanism is designed and used.

State Management: Look for how the state of the persistent objects is being managed. Stateless objects scale better than the stateful objects. Distributed state management is the solution to address the state management issue in a load balanced environment. Always prefer stateless components or services as this will perform well and at the same time scale well.

Here are some of the best practices that help achieve high scalability

Prefer stateless asynchronous communications as this will free up resources considerably and supports load balancing.
Design the application into multiple fault isolated subsystems with ability of being deployed on different hardware environments (or isolated application pools), so that faults in one subsystem does not impact the other subsystems. This partition can be either by service categories or by customer segments.
Use distributed cache solutions, so that the cached data is available on multiple clustered environments.
Use distributed databases with appropriate replication so that loads can be distributed.
Do not depend too much on the specific capabilities of the RDBMS, as this might couple the application tightly to one vendor’s RDBMS. High degree of scalability can be achieved by keeping the business logic outside of the RDBMS.
Spot the potential scalability issues early on by performing design reviews during development and by performing periodic load and performance tests.
Do not ignore the capacity planning activity early during the pre-project phase, as it could significantly impact the application usage in production over a period of time. Also be aware of the data growth rates and have a road map to support the ever increasing data and volume growth.
Do not ignore the root cause analysis as many times when developers roll in a fix for a defect, they are not fixing the root cause, which could come back later as a scalability bottleneck.

Update:

Also read this MSDN Library article which lists down five key considerations for a scalable design.

Saturday, July 7, 2012

Direct Database Updates – A Cause of Concern

Many organizations still have the practice of directly updating the production databases to fix data integrity issues. This shows that the one or more applications deployed on top of the database are not reliable enough to maintain the database integrity. This is one of the biggest concerns for the information security auditors as this requires certain resources being granted he access privilege to the production databases. This opens up opportunity for internal hackers to indulge into fraudulent activities.

There could be a multitude of reasons which could lead to such a situation, needing frequent database updates. The following are some such reasons that impact the reliability:

Incomplete requirements – It may be possible that the business rules and / or validations are not completely gathered and documented.
Design deficiencies – Design deficiencies like inappropriate error handling, managing the concurrency, etc. could also lead to data integrity issues.
Shared database across multiple applications – When multiple applications use a shared database, it might possible that some business rules or data validation requirements might be implemented differently or some applications might have technology or design limitations leading to introducing data integrity issues.
Creeping code complexity over a period of application maintenance – As the applications move into maintenance cycle, and as newer resources may get on to maintain the application code base, chances are high that due to the growing complexity and lack of complete knowledge, issues might slip through the development and sometimes QA phase as well.
Lack of adequate QA / Reviews – Review is a very effective technique to identify potential issues way ahead in the application development life cycle. But, unfortunately, most organizations does not give importance to requirement, design and code reviews or don’t get it done effectively. This review or QA deficiency could impact the reliability.

Though the software development process has matured enough, organizations tend to compromise in some of the quality attributes which might lead to a situation of the application being not reliable. Thus, it may not be possible to completely eliminate the need for direct database updates. However, a process with adequate checks and controls should be put in place around this activity to ensure that the chances of security breach through this channel are under control. At a minimum, he following checks and controls need to be in place to have the database updates in control.

Every request for database update should originate from business function heads and should formally be supported by a service request as logged in to an appropriate tracking system or into a register.
Every such request shall be reviewed by the analysts and / or architects to identify whether the data update is necessary and there not another way of fixing this using any of the application features.
The review should also suggest two solutions, one being the isolating the specific data table and columns that need to be updated (corrective action) and the other being the possible enhancement to the application(s) to prevent such integrity issue from occurring in the future. The review should also identify the constraints in implementing the data fix, for instance some of the fixes may warrant that they should be executed ahead or after a specific scheduled job or sometimes may need the database to be taken offline before execution.
In most cases, these issues would be very hard to investigate, as the occurrence would be rare and upon encountering a unique combination of data / program flow. It would be beneficial if the result of such review flows into the process and necessary checks and controls are put in place to prevent such issues slipping through the review and testing phases of the SDLC.
On completion of the review, developers may be engaged to create necessary SQL scripts that are required for such updates.
This shall be subject to review by the analysts and / or architects and then subject to testing by the QA team.
Once the review and test results are clear the scripts shall be forwarded to the DBAs who should execute the scripts in production. Ideally such data updates should be performed in batches and the affected tables / objects should be backed up prior to execution, so that the old data can be restored when needed.
The DBAs should maintain a record of such execution and the resulting log data and the same shall be subject to periodic audit, so as to ensure that the scripts remain unaltered and that no additional unwanted activities happen along with script execution.
None of the resources involved in this process except the DBAs should have access to production database. For the purpose of investigation or troubleshooting certain cases, a clone of the production data may be made available on request and should be taken off when the its intended purpose is complete. It is important to have a practice of masking sensitive data while making such production clones and also should have restricted access over the network.
It is important that the responsibilities are divided amongst different groups and the associated employees should have demonstrated high credibility in the past and the accountability should be well established.
A periodic end to end audit should be performed, which should track right from the origination of the service request to its execution in the production database and any non-compliance must be seriously dealt with.

More than these checks and controls, the organization should look for declining database update requests over a period of time, which is an indicator of improving system reliablity. Another way to look at the improvement is that the recurring requests of the same nature should vanish after two or three occurrances. The organization's software engineering process also should call far adequate checks and controls which will contribute to improved system reliability.

Sunday, July 1, 2012

Software Architecture Reviews

Review is a powerful technique that contributes to software quality. Various artifacts of the software development lifecycle are subject to review to ensure that any deficiencies could be spotted early on and addressed sooner, before letting it slip through further phases and in turn consuming more efforts than expected down the line. One such important review is the review of the software architecture. If you are asked to review the architecture of a software, it could be due to one of the following reasons.

Possibly, you are a Senior Architect and is expected to complement your fellow Architect by reviewing his work and thereby helping him and in turn the organization to get the best possible software Design. Some or most organizations mandate this need as part of their engineering process. When this review is done effectively, the benefits are huge, as this review occurs early in the development life cycle.
One or more of the custom built application(s) used in the organization are suspected to have certain serious reliability / performance issue and you are engaged to come up with an analysis and a plan to set it right. If this situation arise, then it is very much evident that the first one did not happen or it wasn’t done well. In some cases, such situation arise when the stake holders knowingly compromise on certain software quality attributes initially and then surprised to see its impact down the line as it hits back.
You are possibly looking out to license a product and are evaluating its suitability to your organization. In this, case you will probably have a checklist of items created based on the IT policies and framework of your organization and this is highly dependent on the information revealed by the product vendor.

Though there could be more reasons, the above are some of the primary reasons as to why one would need to perform an architectural review. In spite of as many reviews and testing, issues slips through and challenges the IT architects at some point down the line. Resolution of such issues may call for certain specific reviews and the method and approach would be different based on the type of the problem. For instance, if there be a data breach, a security review of the architecture is what is needed to not only identify the root cause for the current problem, but also to identify potential vulnerabilities and come up with solutions to plug those gaps.

These specific reviews can be typically associated with the broad software quality attributes, which are also termed as non-functional-requirements. The best way to approach these specific reviews is to start with an architectural review. A review checklist would be a good tool to use for the purpose, but the checklist should be exhaustive enough to cover necessary areas, so that the reviewer can get the right and required inputs and would be in a good position to form an opinion about the possible deficiencies and can relate it with the problem being attempted to be resolved.

Keep a watch on my blog for more on specific architecture reviews.

Pages