Sunday, October 14, 2012

Application Architecture Review - Security

In continuation of the architecture review series, this post focuses on security reviews. With information security breaches hitting the news headlines quite frequently, many enterprises are realizing the real need to manage security risk and be resilient. As an architect, you might therefore be called upon to perform a security review of existing applications. I have put together the following areas of concern, which need a closer look before forming an opinion on whether an application architecture is secure enough.
 
 
In general, the broad areas of concern for security architects are the following:
 
 
Authentication – Review the tools, technology and the approach used by the application to establish the identity of its users, looking for possible deficiencies. In this connection, the following specific areas need attention.
  • Look for identification of the legitimate human and system users of the application in the requirements document, and check that these are validated against appropriate business scenarios.
  • If the application exposes interfaces to external systems, understand how access by such systems is identified and authenticated. Also understand how secure those external systems are and, if possible, ask for a security assessment of them.
  • Identify how users are authenticated, for example whether two-factor or three-factor authentication is used.
  • Check if Single Sign On is implemented and, if so, understand how it is implemented and what tools and technology are used. If the identity provider is external to the system boundary, also check how information in transit between the identity provider and the application is secured.
  • In the case of external identity providers, it is also worth checking the security practices followed by the service provider and whether they are subject to regular, independent external security assessments.
  • If the application maintains user information locally and authenticates against it, verify that identity-related data is appropriately secured from unauthorized access (a minimal sketch of local credential checking follows this list).
  • It is also worth understanding how the database servers authenticate the application or the application user. If the application users happen to be database users as well, the mechanisms implemented to prevent such users from directly accessing the database need to be scrutinized.
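To make the local-credential concern concrete, here is a minimal sketch of storing and verifying passwords as salted hashes rather than plaintext, using only the Python standard library. It is illustrative only; the function names and parameters (such as the iteration count) are my assumptions, not a prescription for any particular product.

```python
# Minimal sketch: never store the plaintext password, only a salted hash.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted hash suitable for storage."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Recompute the hash and compare in constant time to resist timing attacks."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)

# Store (salt, digest) at registration; verify at login.
salt, digest = hash_password("s3cret!")
print(verify_password("s3cret!", salt, digest))   # True
print(verify_password("guess", salt, digest))     # False
```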
 
 
Authorization - Each identified human or system user operates on the application by assuming defined roles, and authorization to access various components or information should depend on those roles. Get a clear view of how roles and authorization are implemented in the application. The following specific areas are worth attention in this regard.
  • Check if there exists an information sensitivity policy or information privacy policy relevant to the data or information being accessed or managed by the application.
  • Understand how the defined roles are mapped to the various datasets in terms of Create, Read, Update and Delete permissions (see the sketch after this list). Also examine whether the roles defined by the organization are in line with the principle of segregation of duties, and look at how users with multiple or overlapping roles are handled by the system.
  • With a view to improving application performance, developers tend to create interfaces (both visual and non-visual) that are chunky rather than chatty. While this holds good for performance, the datasets being served need to be reviewed for information sensitivity, and role-based permission restrictions should be applied to all internal and external APIs and interfaces.
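As an illustration of role-to-dataset permission mapping, the following sketch enforces CRUD permissions per role at a single choke point. The roles, datasets and actions are invented for the example; a real implementation would load them from the organization's own role model.

```python
# Illustrative role -> dataset -> allowed-action mapping, enforced in one place.
PERMISSIONS = {
    ("clerk", "orders"): {"create", "read"},
    ("manager", "orders"): {"create", "read", "update"},
    ("auditor", "orders"): {"read"},
}

def authorize(role, dataset, action):
    """Raise if the role is not permitted to perform the action on the dataset."""
    allowed = PERMISSIONS.get((role, dataset), set())
    if action not in allowed:
        raise PermissionError(f"{role} may not {action} {dataset}")

authorize("manager", "orders", "update")   # passes silently
try:
    authorize("auditor", "orders", "delete")
except PermissionError as exc:
    print(exc)   # auditor may not delete orders
```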
 
 
Availability / Scalability – Systems are designed to process data in an expected and timely manner so that information users can make the most of it and perform business operations efficiently and effectively. The general experience is that systems perform very well in the initial testing phase, yet once deployed in production their behaviour can differ and slow down considerably due to various environment and load related issues. As an architect, it is essential that proper estimation is done for expected user and data growth and that the application is designed to meet such needs. The areas examined in my earlier post, Architecture Review - Scalability, are a good starting point for assessing how the application meets this concern.
 
 
Auditability – Systems should be designed to log certain events which could potentially lead to a security breach. These logs should be readable when needed by users with appropriate roles and should be monitored periodically. Event alerts also help notify administrators of the occurrence of certain types of events which may require immediate attention. Examine the following areas of the application design to form an opinion on this concern (a logging sketch follows the list).
  • Review the application architecture to understand how the event alert and logging mechanism is implemented.
  • Review for completeness of the events being handled and the relevant data being logged. Examine whether any sensitive data is being logged and, if so, whether role-based access restrictions are also implemented around the log data.
  • Check how the event log data is organized and stored, and look for the existence of any policy or procedures around managing such log data.
  • Understand the regulatory needs, which often govern what data is to be logged and how long such log data needs to be retained.
  • Log data grows fast, and if it is stored within the same production database as the application, this growth may impact the performance of the application itself, affecting its availability. Depending on the volume and growth rate of the data, ensure that the chosen tools and technology are adequate and appropriate.
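For illustration, here is a minimal sketch of structured security-event logging of the kind discussed above, assuming the log file is stored and access-controlled separately from the application data. The event names and fields are invented.

```python
# Minimal structured audit logging sketch using the standard library.
import json
import logging

audit = logging.getLogger("audit")
handler = logging.FileHandler("audit.log")   # in practice: a separate, protected store
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
audit.addHandler(handler)
audit.setLevel(logging.INFO)

def log_event(event, user, **details):
    """Record a security-relevant event; avoid logging sensitive payloads."""
    audit.info(json.dumps({"event": event, "user": user, **details}))

log_event("login_failed", user="jdoe", source_ip="203.0.113.7")
log_event("role_changed", user="admin", target="jdoe", new_role="manager")
```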
 
 
This post is not an exhaustive checklist; it is intended to bring out the broad concerns which, at a minimum, should be considered in an architecture review. TOGAF 9.1, in its ADM Guidelines and Techniques, lists design considerations for building security into the design and architecture. These security design considerations can be used for an exhaustive security review, which also covers implementation, change management and the IT infrastructure.

Also check out my earlier post titled Building Secure Applications, which is about making security part of the SDLC.

Sunday, September 30, 2012

Building Secure Applications

Awareness of security has increased manifold, thanks to the frequent news headlines on security breaches and compromises leaving organizations with heavy damage and / or loss. Boards have started realizing that information security is one of the primary sources of IT risk that their organizations have to mitigate or be ready to face. But application design and development processes have not seen significant change to ensure that the delivered application is secure and helps the organization bring down its risk exposure.
 
Most software engineers and architects involved in application design and development still have limited security awareness and believe that security is something the IT infrastructure team will take care of. On the other hand, the IT infrastructure team believes in the investment it has made in robust security tools, expects that it has done its part to protect the organization, and passes any application-specific security issues back to the development teams. So whose responsibility is it to ensure a secure application gets built?
 
This is where an independent, enterprise-wide security consulting team needs to extend security consulting to the various project teams within the organization. But this alone will not solve the problem completely, as it is ultimately for the development teams to be security aware and to design and build secure applications.
 
Here is what the security consulting team can offer to the project teams:
 
Pre-project security reviews: These help the decision makers become aware of the security threat landscape they will be dealing with upon implementation of the project. This could even impact project costing, as additional cost might have to be incurred to mitigate the security risks the project brings on; in most cases this is not just a one-time investment but a recurring operating expense as well. Such reviews help the organization make informed decisions considering the extent of impact on its risk exposure.
 
 
Design level security reviews: As with software quality, the earlier security concerns are spotted, the cheaper and easier it is to factor in mitigation plans. At this stage, the security consulting team can help the project teams in the following ways:
  • A high level evaluation of the security concerns that the design could expose and suggest possible security controls to mitigate such concerns.
  • Equip the project stake holders with necessary information associated with the identified security concerns, so that they can take better risk management decisions.
  • Extend guidance to the project teams on the choice of controls and solutions that might best address the security concerns.
  • Perform research and exploration whenever a technology or feature is quite new and innovative and could potentially open a new threat landscape.
  • Offer training to the design and development teams about the most common security vulnerabilities and the attack patterns, so that they design and build counter measures early on.
Implementation security reviews: The security consulting team offers to perform reviews and verifications in the later phases of application development, so that any vulnerabilities that exist do not slip through to production. Typically the services offered are one or more of the following:
  • Ensure that the security concerns identified in the design review are fully addressed.
  • Perform security specific reviews on the code and this could be on a sampling basis, depending on the acceptable risk level of the application.
  • Use automated tools that will examine the code and / or the packaged component or application.
  • Review the security-specific test cases created by the software testing team and suggest additional test cases to ensure better coverage in security testing (an illustrative test sketch follows this list).
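As an illustration of that last point, here is a sketch of security-specific test cases written with Python's unittest module. The handle_request function is a hypothetical stand-in for the application's entry point, not a real API.

```python
# Illustrative security-focused tests: unauthenticated and forged requests
# must be rejected before any business logic runs.
import unittest

def handle_request(path, token=None):
    """Toy request handler: reject anything without a valid token."""
    if token != "valid-token":
        return 401
    return 200

class SecurityTests(unittest.TestCase):
    def test_rejects_missing_token(self):
        self.assertEqual(handle_request("/accounts"), 401)

    def test_rejects_forged_token(self):
        self.assertEqual(handle_request("/accounts", token="forged"), 401)

    def test_accepts_valid_token(self):
        self.assertEqual(handle_request("/accounts", token="valid-token"), 200)

if __name__ == "__main__":
    unittest.main()
```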
Having a security consulting team alone will not be enough to build secure applications. The software engineering process, which defines and describes the approach and methodologies used for project execution, should also mandate consuming the services of the security consulting team, and its review reports should be identified as entry and exit criteria for the various phases of application development.
 
Security education is also required to ensure that the project team is aware of why security is so important for the organization. The following measures towards security education will help teams design and develop secure applications:
  • Include a module of security education in the employee induction program, so that every new employee is oriented towards the security needs of the organization.
  • Offer in-depth security training to architects and selected software developers, and ensure that every project team has at least a couple of such trained resources.
  • Ensure that the coding guidelines document also factors writing secure code.
  • Hold periodic training sessions where, in addition to security-related presentations, recent security review findings and defects found during security testing are taken up for discussion, ending with solutions to prevent such issues from creeping in.
  • If the organization publishes a periodic news bulletin, ensure that security is a mandated section in it and use it to publish security related updates.
As can be observed, security is not just one person's or department's responsibility; it should be the outcome of the collaborative efforts of the teams concerned, with the common goal of building a secure application. The security consulting team can also be external to the organization, i.e. this function can be outsourced to security consulting firms specializing in application security practice.

Sunday, September 23, 2012

Data De-identification Dilemma

De-identification is the process of removing various elements of a dataset so that a data row ceases to be personally identifiable to an individual. This is all about protecting the privacy of the users of systems, as backed by legislation prevalent in many countries. While HIPAA in the US is the best known act providing for protection of personally identifiable data, many other countries have also promulgated legislation to regulate the handling of such data in varying degrees.
 
Most organizations are increasingly becoming security aware as they get impacted by the risks of not appropriately protecting their data and information assets. For the purpose of this discussion we can assume that appropriate checks and controls are in place for data in the active store. But the cloud evolution and the increasing integration of external systems mean that when data is exchanged with or disclosed to an interconnected system, or stored elsewhere on the cloud to support needs such as backup or business analytics, the disclosed datasets need to be de-identified, so that the privacy interests of the individuals concerned are protected and applicable privacy legislation is complied with.
  
Under HIPAA, individually identifiable health information is de-identified if the following specific fields of data are removed or generalized (a simplified scrubbing sketch follows the list):
  • Names
  • Geographic subdivisions smaller than a state
  • All elements of dates (except year) related to an individual (including dates of admission, discharge, birth, death)
  • Telephone & FAX numbers
  • Email addresses
  • Social security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate / license numbers
  • Vehicle identifiers and serial numbers including license plates
  • Device identifiers and serial numbers
  • Web URLs
  • Internet protocol addresses
  • Biometric identifiers (including finger and voice prints)
  • Full face photos and comparable images
  • Any unique identifying number, characteristic or code
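A much simplified sketch of this kind of scrubbing is shown below: direct identifiers are dropped, and dates and geography are generalized. The field names are invented, and the list is far from the full HIPAA enumeration; it only illustrates the mechanics.

```python
# Simplified safe-harbor style de-identification sketch: drop direct
# identifiers outright, generalize dates to the year, coarsen geography.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_no",
    "account_no", "license_no", "vehicle_id", "device_id", "url", "ip",
}

def deidentify(record):
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                       # remove direct identifiers
        if field.endswith("_date"):
            out[field] = value[:4]         # generalize ISO dates to the year
        elif field == "zip":
            out[field] = value[:3] + "XX"  # coarsen geography below state level
        else:
            out[field] = value
    return out

row = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "10014",
       "admission_date": "2012-03-18", "diagnosis": "J45"}
print(deidentify(row))
# {'zip': '100XX', 'admission_date': '2012', 'diagnosis': 'J45'}
```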
In today’s context, a vast amount of personal information is becoming available from public and private sources all around the world, including public records like telephone directories, property records, voter registers and even social networking sites. The chance of using this data to link against de-identified data, and thereby re-identify the individual, is high. Professor Sweeney testified that there is a 0.04% chance that data de-identified under the health rule’s methodology could be re-identified when compared to voter registration records for a confined population.
 
Others have also written about the shortcomings of de-identification. A June 2010 article by Arvind Narayanan and Vitaly Shmatikov offers a broad and general conclusion:  
The emergence of powerful re-identification algorithms demonstrates not just a flaw in a specific anonymization technique(s), but the fundamental inadequacy of the entire privacy protection paradigm based on “de-identifying” the data.
 
With various tools and technologies, it may be possible at times to achieve something close to absolute de-identification. However, it seems unlikely that there is a general solution that will work for all types of data, all types of users and all types of activities. Thus, we continue to face the possibility that de-identified personal data shared for research and other purposes may be subject to re-identification.
 
There is wide variance in regulatory requirements on the subject across legislations. While some require removal of specific data fields, some mandate adherence to certain administrative processes, and a few others require compliance with one or more standards.
 
Robert Gellman, in his paper titled The Deidentification Dilemma: A Legislative and Contractual Proposal, calls for a contractual solution backed by new legislation. Whether or not it is backed by legislation, it would be wise to follow this approach, as it binds data recipients to the requirements of the data discloser. With the use of SaaS applications on the rise, the chances of data being stored elsewhere and travelling on the wire are very high. The increasing need for data and application integrations over the cloud, across various partner organizations, again makes such a contractual solution a must.
 
The core proposal in the legislation is a voluntary data agreement, a contract between a data discloser and a data recipient, under a proposed Personal Data Deidentification Act (PDDA). The PDDA would apply only to those who choose to accept its terms and penalties through a data agreement. It establishes standards for behaviour, with civil and criminal penalties for violations; in exchange, there are benefits to both the discloser and the recipient.
 
With the above requirements and understanding of data de-identification, let us list the circumstances which mandate data de-identification:
  • All non-production database instances, including the development, test, training and production support instances maintained by an organization. It is quite prevalent for DBAs to maintain and run scripts that anonymize personal data before such an instance is exposed for general use. But it is also important to ensure that the anonymization is in line with the regulatory requirements of the region where such instances are hosted.
  • The increased use of business analytics calls for the maintenance of one or more data marts, which are often replicas of the production database. While it would be absolutely fine if such data marts store data summarized at a level where no row represents an individual, care has to be taken in case micro-level data is also maintained in the mart to facilitate drill-through.
  • Application controls – All systems that work with databases containing personally identifiable information should be designed in such a way that appropriate access controls are built in to protect the sensitive information from being displayed or extracted out.
  • Remote workers & mobility challenges – Organizations have started accepting the culture of remote working and employee mobility. This means employees will access data through one or more applications from anywhere in the world, using a multitude of devices. This calls for appropriate policies, checks and controls to stay compliant with privacy legislation.
  • Partner systems – In today’s connected world, business partners, who might be customers, vendors or contracted outsourced service providers, may gain access to the organization’s systems and databases. This calls for a careful evaluation of such parties’ practices and their voluntary agreement to comply with the organization’s data privacy needs, and even for periodic training and audits covering the employees and systems of such partner organizations.
Today’s lack of clear definitions, de-identification procedures and legal certainty can impede useful data sharing. It can also affect the privacy of users when the lack of clarity about de-identification results in sharing of identifiable data that could have been avoided. The approach proposed by Robert Gellman would make available a new tool that fairly balances the needs and interests of data disclosers, data users and data subjects. The solution could be invoked voluntarily by data disclosers and data recipients. Its use could also be mandated by regulation or legislation seeking to allow broader use of personal data for beneficial purposes.


Reference:
Robert Gellman, The Deidentification Dilemma: A Legislative and Contractual Proposal, Version 2.4, July 12, 2010.

Saturday, September 22, 2012

Cloud Computing - Governance Challenges


I recently happened to read a technical note published in October 2006 by the Software Engineering Institute, titled ‘System-of-Systems Governance: New Patterns of Thought’. The note was primarily aimed at organizations like the Department of Defense (DoD), where multiple organizations and systems need to collaborate to form part of the bigger enterprise. It discusses some of the key areas where traditional governance needs review and revision to handle the very nature of a system of systems.

CIOs are in favour of embracing the cloud and as such may have contracted for multiple software systems (SaaS) for different needs, for instance Salesforce for CRM, ServiceNow for IT service management, Windows Azure for custom application development, NetSuite for ERP, and so on. Similarly, the IT organization may have contracted with different cloud service providers for its infrastructure (IaaS), storage and platform (PaaS) needs. The XaaS list is growing, as we now see vendor offerings for Database as a Service, Security as a Service, Identity as a Service, etc. With all these multiple contracted system components, the IT organization certainly has challenges in implementing governance practices, as diverse system components form part of the larger system.

In today’s world, with increasing cloud adoption and outsourcing, business organizations are in almost the same state as described for the DoD, i.e. a system of systems, and the key governance challenges very much hold good to be addressed. The following are five of the six areas that need to be looked into to realign the governance practice within an enterprise:

Collaboration and Authority

With systems and components owned by various organizations in use, even if owners of constituent systems are unusually committed to the system of systems, a single authority is likely to be ineffective. And if authority is essential to the enforcement of IT policy, then without sufficient authority, independent vendor organizations can always be expected to be reluctant to adopt shared policies. Collaboration amongst the constituent system owners is required, at least in problem solving, in participating in decision making and in coming up with provisional solutions to meet emergency needs. While cloud computing is still maturing, standards need to emerge to facilitate the necessary collaboration, both technically and in terms of policy federation.

Motivation and Accountability

There should be motivation for anyone to adhere or adapt to a shared policy; enforcing shared standards or policies across independent vendors will not work as effectively. The technical note refers to Zadek’s five stages of learning that organizations go through to achieve the benefit of voluntary collaboration. These are:

[Table: Zadek’s five stages of organizational learning]

The table above not only shows what motivates organizations at different stages but also reveals what they may need to learn. For example, a defensive organization that claims common system-of-systems practices are irrelevant may need to be educated about threats to its reputation due to its lack of voluntary compliance. At all stages, we need policies to give individuals and organizations the incentive to do the right thing.

Multiple Models

A simple example highlighting the significance of this area is the security implementation within the component systems. Each component and system vendor may have different security models implemented within their individual systems, and that makes the governance of the overall organization challenging. This calls for a dynamic governance model which can map and interact between the different models of the individual components. While security is just one example, there are other areas where the components are designed and modelled differently. Since framing one set of governance policies to suit all such systems is certainly not a good approach, the governance framework should provide for variables based on the type of system, type of service or similar categories.

Expectation of Evolution

This can easily be related to the change and release management of independent vendor systems and to changes in the common infrastructure of the enterprise itself. If governance cannot eliminate the independent evolution of components within the system of systems, it should aim to reduce the harmful effects of uncontrolled evolution by the component systems. Thus, policies must be created and enforced to provide rules and guidance for components as they change.

At a minimum, governance for evolution should include rules and guidelines for the following (a small versioning sketch follows the list):

  • informing other component systems (when known) of changes in the interfaces to and functionality of one system
  • coordinating schedules with other component systems so that those that have to change can do so together (when backward compatibility of interfaces cannot be maintained)
  • maintaining multiple versions of the system when schedules cannot be coordinated
  • developing each system to insulate it from changes in other component systems
  • minimizing the perturbations to interfaces when changing a system
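As a small illustration of maintaining multiple versions side by side, the sketch below keeps an old interface contract alive while a new one is introduced, so consumer systems can migrate on their own schedules. The payload shapes and function names are invented.

```python
# Versioned interface sketch: v1 stays available until consumers migrate to v2.
def get_customer_v1(customer_id):
    """Original contract: returns a flat name field."""
    rec = _fetch(customer_id)
    return {"id": rec["id"], "name": rec["first"] + " " + rec["last"]}

def get_customer_v2(customer_id):
    """New contract: structured name; breaking change isolated in a new version."""
    rec = _fetch(customer_id)
    return {"id": rec["id"], "name": {"first": rec["first"], "last": rec["last"]}}

def _fetch(customer_id):
    # Stand-in for the real data access layer.
    return {"id": customer_id, "first": "Ada", "last": "Lovelace"}

print(get_customer_v1(7))   # {'id': 7, 'name': 'Ada Lovelace'}
print(get_customer_v2(7))   # {'id': 7, 'name': {'first': 'Ada', 'last': 'Lovelace'}}
```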

Highly Fluid Processes

Agility is the order of the day in every function of the enterprise. Being agile and responding to changes quickly gives enterprises a competitive edge, and that applies equally to the governance framework.

Planning for rapid changes in system-of-systems governance is needed. For example, governance strategies may provide a mechanism for adapting to rapid policy change, such as a way to relax security policies to achieve some urgent goal and then tighten them up again. Governance policies should be framed such that systems or components likely to see problems or changes quickly have flexible policies, while policies further away from such systems remain relatively stable. For example, a neighbourhood of closely related systems might be the first to notice a problem with a current component or process and will need to respond quickly. At the extreme, where neighbourhoods of related systems are themselves fluid, some details of system-of-systems governance policies might be negotiated.

Summary

As with the technical note, I have not attempted to suggest solutions for the governance challenges listed above. Organizations like ISACA have been researching governance issues, and the widely adopted COBIT framework may have solutions for these problems, which I will explore in future blog posts.

References:
SEI Technical Note: System-of-Systems Governance: New Patterns of Thought (October 2006)

Saturday, September 15, 2012

Leveraging Lessons Learned

Success = failure + failure + failure … Sounds familiar?


Leadership experts and management gurus have said enough about how failures lead to success. That is very true for individuals, when the person takes it in the right context and works on the causes of the failure to overcome it at the next opportunity. But how does this work in reality for an organization?
 
If you have been part of a project which failed to deliver the promised features on time or at the agreed cost, you are most likely out of that organization, as the management wants to penalize those involved. In the process, the organization loses, as it fails to capitalize on the lessons learned by the team through the failed project, and the new team that takes over might commit the same or even different mistakes, which could again lead to failure.
 
Agile projects are likely to fare better in this space, as agile project management calls for identifying the things that went well and those that did not at the end of every sprint. Here again, one question remains to be answered: how do the scrum master and the team deal with the things that did not go well in the earlier sprint? Yet another question is how open the project team members are in admitting their own errors and omissions which could have adversely impacted the project.
 
 
As for development teams, there is much to be learned on a daily basis: the defects uncovered in unit testing, findings from requirements, design and code reviews, and even project issues can yield a great lesson for every other member of the team.
 
 
Here are a few ideas that will help an organization leverage the lessons learned by its teams through various errors, mistakes and omissions.
 
  • Mentor the teams to demonstrate accountability and responsibility, and to see that admitting a mistake early on is a good thing. The earlier the triggers are known, the better, as other members of the team can then stay away from committing the same mistakes.
  • Coach the teams to share, share and share with their peers and even across teams. This can be accomplished by removing the mental blocks employees have about admitting their own mistakes; they should be encouraged to share them for the good of themselves and the organization. The tendency of employees is that when they uncover issues during unit testing or reviews, they just fix them themselves and do not report them further.
  • Encourage teams to share their previous experiences every now and then; for sure there will be takeaways from such experiences for some members of the team.
  • Bring in a culture within the organization that discourages the egos and emotions which are found to be barriers to sharing.
  • Promote risk management and encourage every employee to participate in it. Needless to say, every identified risk has the potential to become an issue that can prevent the project from being successful. Past experience and lessons learned are a great source of risk identification.
  • Above all, make sharing the lessons easy by putting in place an appropriate knowledge base platform, and train and encourage employees to use it.
 
Though the above ideas are most suitable for IT services organizations, they can be practiced in any other organization as well, with some tweaks.
 
Here is an interesting article to read, wherein Ken Bruss discusses leveraging lessons learned for competitive advantage.

Sunday, August 26, 2012

Solution Architecture - Basic Principles

As I wrote in my earlier blog post Solution Architect – Understanding the Role, the solution architecture practice area demands wide knowledge in business and technical areas. Solution architects need to be jacks of all trades rather than masters of a specific area. They should be able to bridge the gaps that business users have in the technical space and those that technical teams have in the business areas.


Architecting a good solution is always challenging, as the context and the technology keep changing fast over time. A solution which was perfect in one time frame may not remain so later. Given that each non-functional quality attribute might adversely affect one or more of the others, solution architects should be able to balance these attributes in line with business needs and other factors foreseeable in the near and longer term.


Here are some basic principles which, when practiced, help a solution architect deliver a good solution.


Business Drives Information Technology

If one has expertise in IT, it is important to take off that hat and wear the business hat while architecting solutions. The business does not mind which technology or tools are used; it wants its business problems solved, with reasonable longevity and the other non-functional requirements met. Ignoring or undermining business expectations or priorities could lead to a solution with excessive engineering or technical complexity, resulting in higher cost and delays.


Just Enough Architecture, Just in Time

Let solutions evolve based on business priorities. It is ideal to take one business problem at a time, based on its priority, and architect the solution keeping the first principle in mind while factoring in the ability to change and scale in response to business needs. This helps the business get solutions faster and in increments. It also helps the architects adopt agile methods and ensure timely responses to business changes.


Common Business Solutions

There is a tendency for departments to go for solutions on their own, as they have their own budgets to spend. This often leads to multiple solutions being in use for the same problem domain across departments. It is a must for architects to have a complete view of the problems and solutions within the enterprise, and the solutions architected should be common and usable across multiple departments. If need be, an existing solution can be enhanced to meet the changing requirements of different departments. This principle, when practiced, facilitates easy maintenance and ensures better data governance.


Conform to Data Governance principles

Data and information are used for decision making at various levels, and it is important that data is organized and maintained at the highest possible quality level. Understanding the sensitivity levels of the data within and across departments and partner systems is equally important while architecting the solution. With the big data era emerging, organizations are looking at handling petabytes of structured and unstructured data. It is imperative to have data governance policies and processes in place, and the solution architecture team should ensure that solutions comply with them.


Comply with Information Security Framework

Similar to the data governance policies, the organization should have information security policies and a related framework in place, and the solution architecture team should ensure that solutions conform to them. Security must be designed into solutions from inception; adding it later could result in higher cost and delays. The extent of this effort, however, depends on the organization’s business context and its risk appetite.


Also read my other blog post, Architecture Review - Scalability.

Friday, August 17, 2012

Taking over - stay away from wrong battles

If you are about to take up a senior managerial role in a different organization, it is important to settle down at the right pace and pick the right battles to make a mark in the first few weeks of taking over. While it is true that the management, through multiple rounds of discussions, has tried its best to understand your abilities and is convinced that you are the person to take the organization further down the roadmap, there could be challenges you have not faced before, and you should take care about a few things like the following.

The takeover session
 
 
Usually you will have a chance for a few rounds of discussions with your predecessor as part of the handing over process. It is important to use this very effectively. Among other things, the important items to pick up in this session are:
  • Get to know why your predecessor is leaving; this will help you plan for and carefully handle such pain areas so that you don’t end up getting into the same battle.
  • Get to know your predecessor’s opinions about the people, process and technology in the organization; these will give you certain handles to pick up and carry on with.
  • Get to know what he has been up to in the past three to six months, to understand his unfinished initiatives and, where initiatives failed, why. This will help you understand the constraints under which he has been operating; most likely those constraints will remain for you to deal with too.
  • Get to know the strategy, vision and goals of the organization and the roadmap to achieve them. You may be able to identify certain areas to work on, but don’t jump into action plans yet; you need a 360-degree view of the issues first.
 
 
Just in case you don’t have this opportunity of a smooth handover, try to get the same inputs from the next level of executives, but use such inputs with care, as you might want to validate them with a few other sources.
 
 
The Cultural Values
 
 
Each organization has its own culture that suits its teams and business the most. As part of taking over, it is important that you understand the organizational culture and the morale of the employees; if required, spend a little more time to make yourself fit into the prevalent culture and gain the confidence of the teams. While there is a chance that the given culture is the cause of certain pain points and may need a change, you may not want to pick such battles so soon, as it could lead to the teams not accepting you as a leader. Having come from a different cultural background, it is easy to get carried away and make missteps.
 
 
Spot the problem areas and the pain points
 
 
While you would have got to know some of the priority areas needing immediate attention, before jumping into action, spend more time talking to the various teams to fully understand the current state of the projects and initiatives they are up to. Depending on your approach, style and experience, you will spot certain pain points that need attention; capture those for later action. The sooner you identify them, the quicker you will settle down. It is also a good idea to spend some time understanding recently failed projects or initiatives, which will help you pick certain process areas to revisit and work on. You may have to use your tact and people skills here so that the teams open up to you freely and you get a good handle on the areas to work on.
 
 
Perform a careful analysis of these items for their impact on other aspects like culture, process and technology; this will help you categorize and prioritize these areas and come up with a revised roadmap for the near term and the longer term.
 
 
In the process of settling in, it is very likely that you will try to use your experience and suggest course corrections, or jump into actions midway, which, if not done well, could land you in trouble, as you might start facing resistance from some quarters. Though this could be overcome with authority, using authority may not work well early on. In such cases, you should convincingly demonstrate to the teams that the course correction is much needed given the situation, and take them into confidence that way. Picking the wrong battle early on could prove costly.
 
 
As many leaders say, a good leader is a good listener. So listen more early on to get to know the perspectives better and try to pick up some lessons. For a while, forget your experience and expertise; try to be a learner and keep listening. Once you are done listening, do an analysis, and in that process you may use your experience. Keep in mind that there is no single best way to accomplish a thing; there could be multiple ways and means, and you might have a chance to pick up certain new approaches that work well too.
 
 
Please understand that this is not a complete guide to practice blindly. It could be completely out of context in some situations and may not hold good at all. The takeaway, however, is to stay away from picking the wrong battles early on.

Friday, July 13, 2012

Architecture Review - Scalability

Scalability is an important quality attribute of any system, be it hardware or software. But in most cases, the need for a scalability check or review is felt only when certain signs of scalability problems show up. Typically, the following signs call for a scalability review of an existing application.
  • When changes requested on certain subsystems are turned down by the development team(s), citing that the subsystem is complex and any change would call for huge regression testing efforts or could have a bigger impact on the whole system. This indicates that certain components or subsystems are preventing the system from scaling.
  • Months into production usage, application performance gradually slows down, and there is a tendency to accept the slowdown or to pump in more hardware to compensate. This again is an important sign that the application is not scaling to take on the ever-growing user base and transaction volumes.
There could be more signs indicating scalability issues within an application. It is unfortunate that scalability reviews are often not done in the initial design phase, which would prevent these post-production troubles from showing up. While reviewing an existing application for potential scalability issues may be easy, the solutions for addressing them may not be, because of the underlying design and architecture of the application and its interdependencies with other systems in use. Let us examine certain important aspects to look into to spot potential scalability problems.

Distributed architecture: While a distributed design is likely to improve performance, it can lead to scalability issues when one or more components or subsystems rely on local resources. Another reason to review this with care is that an ill-designed system may call for too much communication across the physical and logical boundaries of the various subsystems and rely heavily on the communication infrastructure.

Component interaction: Examine how the components or subsystems are designed to interact with each other and how closely they are positioned. Too many component interactions can lead to network congestion and very high latencies, which turn into performance and scalability issues later, as usage increases. Measure the payload and latency of such inter-component interactions and isolate the components that need redesign. Keeping the data and the behaviour close together reduces interactions across boundaries and as a result keeps latency in check. The sketch below contrasts a chatty interaction with a chunky one.
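In the sketch, the latency figure is an assumption used only to make the difference in round trips visible.

```python
# Chatty vs chunky: one remote call per item vs a single batched call.
import time

NETWORK_LATENCY = 0.005   # assumed 5 ms round trip per call

def fetch_price(item_id):
    time.sleep(NETWORK_LATENCY)          # one round trip per item
    return 9.99

def fetch_prices(item_ids):
    time.sleep(NETWORK_LATENCY)          # one round trip for the whole batch
    return {i: 9.99 for i in item_ids}

items = list(range(50))

start = time.perf_counter()
chatty = [fetch_price(i) for i in items]          # 50 round trips
print("chatty:", time.perf_counter() - start)

start = time.perf_counter()
chunky = fetch_prices(items)                      # 1 round trip
print("chunky:", time.perf_counter() - start)
```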

Resource contention: Look for potential limitations on the hardware or software resources used by the application. For instance, if the application writes huge amounts of log data to the same disk where its transaction data is stored, write requests may encounter resource contention. Similarly, review how fast the data files grow and whether the disk subsystem supports such growth. Possible solutions for such issues are resource pooling, message queues or other asynchronous mechanisms, as in the sketch below.
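The sketch uses an in-process queue flushed by a background thread, keeping log writes off the transaction path. It is a single-process illustration; a real system might use a message broker instead, and the file name is invented.

```python
# Offload log writes to a background consumer so transactions never block on disk.
import queue
import threading

log_queue = queue.Queue()

def log_writer():
    """Background consumer: drains the queue and writes sequentially."""
    with open("app.log", "a") as f:
        while True:
            line = log_queue.get()
            if line is None:          # sentinel: shut down
                break
            f.write(line + "\n")

writer = threading.Thread(target=log_writer, daemon=True)
writer.start()

def handle_transaction(n):
    # ... business work here ...
    log_queue.put(f"processed transaction {n}")   # returns immediately

for n in range(5):
    handle_transaction(n)

log_queue.put(None)
writer.join()
```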

Remote communications: It is always beneficial to keep remote calls to a minimum, or else too many remote calls expose the system too much to the reliability and availability of the communication infrastructure. Ensure required validations are performed ahead of remote calls, so that unnecessary calls are avoided (see the sketch below). Where possible, remote calls should be stateless and asynchronous; synchronous calls can hold up communication channels and associated resources for long periods, a potential cause of performance and scalability issues. Message queues can help decouple subsystems from being held up waiting for synchronous responses.
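In the sketch, malformed requests are rejected locally and never consume the communication channel. The remote_lookup function and the account-number format are invented for the example.

```python
# Validate before calling out: obviously bad requests never go on the wire.
import re

ACCOUNT_PATTERN = re.compile(r"^\d{10}$")

def remote_lookup(account_no):
    # Stand-in for an actual (and comparatively expensive) remote call.
    return {"account": account_no, "status": "active"}

def lookup_account(account_no):
    if not ACCOUNT_PATTERN.match(account_no):
        raise ValueError("malformed account number; remote call skipped")
    return remote_lookup(account_no)

print(lookup_account("0123456789"))
try:
    lookup_account("12-34")
except ValueError as exc:
    print(exc)
```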

Cache management: While using a cache can help achieve better performance, it can also prevent the application from scaling in a load balanced environment unless a distributed caching mechanism is designed and used. The cache-aside sketch below shows where a distributed backend would slot in.
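In this sketch the backend is process-local; the point is the narrow interface, which would let a shared, distributed store replace it without touching the callers. The class and data are illustrative.

```python
# Cache-aside pattern behind a pluggable backend.
class LocalCache:
    """Single-node backend; swap for a distributed store when scaling out."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

cache = LocalCache()

def load_profile(user_id):
    cached = cache.get(user_id)
    if cached is not None:
        return cached                            # cache hit
    profile = {"id": user_id, "plan": "basic"}   # stand-in for a DB read
    cache.set(user_id, profile)
    return profile

print(load_profile(42))   # miss: reads the "database", fills the cache
print(load_profile(42))   # hit: served from the cache
```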

State management: Look at how the state of persistent objects is managed. Stateless objects scale better than stateful ones, and distributed state management addresses the state management issue in a load balanced environment. Always prefer stateless components or services, as they perform well and scale well at the same time.

Here are some best practices that help achieve high scalability:
  • Prefer stateless asynchronous communications, as this frees up resources considerably and supports load balancing.
  • Design the application as multiple fault-isolated subsystems that can be deployed on different hardware environments (or isolated application pools), so that faults in one subsystem do not impact the others. This partitioning can be by service category or by customer segment.
  • Use distributed cache solutions, so that cached data is available across clustered environments.
  • Use distributed databases with appropriate replication so that load can be distributed.
  • Do not depend too much on the specific capabilities of the RDBMS, as this might couple the application tightly to one vendor’s RDBMS. A high degree of scalability can be achieved by keeping the business logic outside the RDBMS.
  • Spot potential scalability issues early by performing design reviews during development and periodic load and performance tests.
  • Do not ignore capacity planning during the pre-project phase, as it can significantly impact application usage in production over time. Also be aware of data growth rates and have a roadmap to support the ever-increasing data and transaction volumes.
  • Do not ignore root cause analysis: many times when developers roll in a fix for a defect, they do not fix the root cause, which can come back later as a scalability bottleneck.
Update:
Also read this MSDN Library article, which lists five key considerations for a scalable design.

Saturday, July 7, 2012

Direct Database Updates – A Cause of Concern


Many organizations still have the practice of directly updating production databases to fix data integrity issues. This shows that one or more of the applications deployed on top of the database are not reliable enough to maintain database integrity. It is one of the biggest concerns for information security auditors, as it requires certain resources to be granted access privileges to the production databases, opening up an opportunity for internal hackers to indulge in fraudulent activities.

There could be a multitude of reasons leading to such a situation of frequent database updates. The following are some reasons that impact reliability:
  • Incomplete requirements – It may be possible that the business rules and / or validations are not completely gathered and documented. 
  • Design deficiencies – Design deficiencies like inappropriate error handling, managing the concurrency, etc. could also lead to data integrity issues.
  • Shared database across multiple applications – When multiple applications use a shared database, it is possible that some business rules or data validation requirements are implemented differently, or that some applications have technology or design limitations, either of which can introduce data integrity issues.
  • Creeping code complexity over the maintenance period – As applications move into the maintenance cycle and newer resources take over the application code base, chances are high that, due to growing complexity and a lack of complete knowledge, issues slip through development and sometimes the QA phase as well.
  • Lack of adequate QA / reviews – Review is a very effective technique for identifying potential issues early in the application development life cycle. Unfortunately, most organizations do not give importance to requirement, design and code reviews, or don’t get them done effectively. This review or QA deficiency impacts reliability.
Though software development processes have matured, organizations tend to compromise on some quality attributes, which can leave an application unreliable. Thus, it may not be possible to completely eliminate the need for direct database updates. However, a process with adequate checks and controls should be put in place around this activity to ensure that the chances of a security breach through this channel are kept under control. At a minimum, the following checks and controls need to be in place to keep database updates under control.
  • Every request for database update should originate from business function heads and should formally be supported by a service request as logged in to an appropriate tracking system or into a register.
  • Every such request shall be reviewed by analysts and / or architects to determine whether the data update is really necessary and whether there is another way of fixing the issue using the application’s features.
  • The review should also suggest two solutions: one isolating the specific tables and columns that need to be updated (corrective action), and the other a possible enhancement to the application(s) to prevent such integrity issues from recurring. The review should also identify the constraints in implementing the data fix; for instance, some fixes may have to be executed before or after a specific scheduled job, or may need the database to be taken offline before execution.
  • In most cases, these issues are hard to investigate, as occurrences are rare and arise only upon a unique combination of data and program flow. It is beneficial if the results of such reviews flow into the process, with necessary checks and controls put in place to prevent such issues slipping through the review and testing phases of the SDLC.
  • On completion of the review, developers may be engaged to create necessary SQL scripts that are required for such updates.
  • This shall be subject to review by the analysts and / or architects and then subject to testing by the QA team. 
  • Once the review and test results are clear, the scripts shall be forwarded to the DBAs, who should execute them in production. Ideally such data updates should be performed in batches, and the affected tables / objects should be backed up prior to execution so that the old data can be restored when needed (a sketch of such a guarded fix follows this list).
  • The DBAs should maintain a record of such execution and the resulting log data and the same shall be subject to periodic audit, so as to ensure that the scripts remain unaltered and that no additional unwanted activities happen along with script execution.
  • None of the resources involved in this process except the DBAs should have access to the production database. For investigation or troubleshooting of certain cases, a clone of the production data may be made available on request and should be taken off when its intended purpose is complete. It is important to mask sensitive data while making such production clones, and to restrict access to them over the network.
  • It is important that the responsibilities are divided amongst different groups and the associated employees should have demonstrated high credibility in the past and the accountability should be well established.
  • A periodic end-to-end audit should be performed, tracking right from the origination of the service request to its execution in the production database, and any non-compliance must be dealt with seriously.
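To make the batching and backup points concrete, here is a sketch of a guarded data-fix script. It uses sqlite3 purely so the example is self-contained; the table, column and status values are invented, and a real fix would of course target the production DBMS under the controls described above.

```python
# Guarded data fix: back up affected rows, update in small batches inside
# transactions, and keep a record of what was done.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("stuck",)] * 7 + [("done",)] * 3)

# 1. Back up the affected rows so the fix can be reversed if needed.
conn.execute("CREATE TABLE orders_backup AS SELECT * FROM orders WHERE status = 'stuck'")

# 2. Apply the fix in small batches, each in its own transaction.
BATCH = 3
while True:
    with conn:   # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE orders SET status = 'done' WHERE id IN "
            "(SELECT id FROM orders WHERE status = 'stuck' LIMIT ?)", (BATCH,))
    if cur.rowcount == 0:
        break
    print(f"fixed {cur.rowcount} rows")   # 3. record of execution

print(conn.execute("SELECT COUNT(*) FROM orders WHERE status='stuck'").fetchone())
```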

More than these checks and controls, the organization should look for a declining number of database update requests over time, which is an indicator of improving system reliability. Another way to look at the improvement is that recurring requests of the same nature should vanish after two or three occurrences. The organization's software engineering process should also call for adequate checks and controls which contribute to improved system reliability.

Sunday, July 1, 2012

Software Architecture Reviews


Review is a powerful technique that contributes to software quality. Various artifacts of the software development lifecycle are subject to review to ensure that deficiencies are spotted early and addressed sooner, before they slip through further phases and consume more effort than expected down the line. One such important review is the review of the software architecture. If you are asked to review the architecture of a software system, it could be for one of the following reasons.

  1. Possibly, you are a senior architect expected to complement a fellow architect by reviewing his work, thereby helping him, and in turn the organization, get the best possible software design. Some organizations mandate this as part of their engineering process. When this review is done effectively, the benefits are huge, as it occurs early in the development life cycle.
  2. One or more of the custom-built applications used in the organization are suspected to have serious reliability or performance issues, and you are engaged to come up with an analysis and a plan to set things right. If this situation arises, it is evident that the first kind of review either did not happen or wasn't done well. In some cases such situations arise when the stakeholders knowingly compromise on certain software quality attributes initially and are then surprised when the impact hits back down the line.
  3. You are possibly looking to license a product and are evaluating its suitability for your organization. In this case, you will probably have a checklist of items created from your organization's IT policies and framework, and the review is highly dependent on the information revealed by the product vendor.

Though there could be more reasons, the above are some of the primary reasons why one would need to perform an architectural review. In spite of many reviews and tests, issues slip through and challenge the IT architects at some point down the line. Resolution of such issues may call for certain specific reviews, and the method and approach differ based on the type of problem. For instance, if there is a data breach, a security review of the architecture is needed, not only to identify the root cause of the current problem but also to identify potential vulnerabilities and come up with solutions to plug those gaps.

These specific reviews can typically be associated with the broad software quality attributes, also termed non-functional requirements. The best way to approach them is to start with an architectural review. A review checklist is a good tool for the purpose, but it should be exhaustive enough to cover the necessary areas, so that the reviewer gets the right inputs and is in a good position to form an opinion about possible deficiencies and relate them to the problem being resolved.

Keep a watch on my blog for more on specific architecture reviews.