Sunday, September 23, 2012

Data De-identification Dilemma

De-identification is the process of removing various elements of a dataset so that a data row ceases to be personally identifiable to an individual. It is all about protecting the privacy of the users of systems, as backed by legislation prevalent in many countries. While HIPAA in the US is the best-known act providing for the protection of personally identifiable data, many other countries have also promulgated legislation that regulates the handling of such data in varying degrees.
 
Most organizations are becoming increasingly security aware as they get impacted by the risks of not appropriately protecting their data and information assets. For the purpose of this discussion, we can assume that appropriate checks and controls are in place for data in the active store. But the cloud evolution and the increasing integration of external systems mean that when data is exchanged with or disclosed to any interconnected system, or stored elsewhere on the cloud to support different needs such as backup or business analytics, the datasets so disclosed or stored need to be de-identified, so that the privacy interests of the individuals concerned are protected and the organization complies with applicable privacy legislation.
  
Under HIPAA, individually identifiable health information is de-identified if the following specific fields of data are removed or generalized (a minimal sketch of this follows the list):
  • Names
  • Geographic subdivisions smaller than a state
  • All elements of dates (except year) related to an individual (including dates of admission, discharge, birth, death)
  • Telephone & FAX numbers
  • Email addresses
  • Social security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate / license numbers
  • Vehicle identifiers and serial numbers including license plates
  • Device identifiers and serial numbers
  • Web URLs
  • Internet protocol addresses
  • Biometric identifiers (including finger and voice prints)
  • Full face photos and comparable images
  • Any unique identifying number, characteristic or code
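As a rough illustration of the Safe Harbor idea above, here is a minimal Python sketch that removes direct identifiers, generalizes dates to the year and drops geography smaller than a state. The field names and record layout are assumptions made for this example, not a standard schema.

```python
from datetime import date

# Illustrative subset of the identifiers listed above; field names are
# assumptions for this sketch, not a standard schema.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_no",
    "health_plan_no", "account_no", "license_no", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed, dates
    generalized to the year, and geography generalized to the state."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                      # remove the field outright
        if isinstance(value, date):
            out[field] = value.year       # keep only the year
        elif field == "zip_code":
            out[field] = None             # drop subdivisions smaller than a state
        else:
            out[field] = value
    return out

patient = {
    "name": "Jane Doe", "ssn": "123-45-6789", "state": "TX",
    "zip_code": "75001", "admission_date": date(2012, 3, 14),
    "diagnosis_code": "E11.9",
}
print(deidentify(patient))
# {'state': 'TX', 'zip_code': None, 'admission_date': 2012, 'diagnosis_code': 'E11.9'}
```

Whether this is sufficient in practice depends on the remaining fields; as discussed below, seemingly harmless attributes can still be linked to public records and re-identified.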
In today’s context, a vast amount of personal information is becoming available from various public and private sources all around the world, including public records such as telephone directories, property records, voter registers and even social networking sites. The chance of using these data to link against de-identified data and thereby re-identify the individual is high. Professor Sweeney testified that there is a 0.04% chance that data de-identified under the health rule’s methodology could be re-identified when compared to voter registration records for a confined population.
 
Others have also written about the shortcomings of de-identification. A June 2010 article by Arvind Narayanan and Vitaly Shmatikov offers a broad and general conclusion:  
The emergence of powerful re-identification algorithms demonstrates not just a flaw in a specific anonymization technique(s), but the fundamental inadequacy of the entire privacy protection paradigm based on “de-identifying” the data.
 
With various tools and technologies, it may at times be possible to achieve something close to absolute de-identification. However, it seems unlikely that there is a general solution that will work for all types of data, all types of users, and all types of activities. Thus, we continue to face the possibility that de-identified personal data shared for research and other purposes may be subject to re-identification.
 
There is wide variance in the regulatory requirements on the subject across legislations. While some require removal of specific data fields, some mandate adherence to certain administrative processes, and a few others require compliance with one or more standards.
 
Robert Gellman, in his paper titled The Deidentification Dilemma: A Legislative and Contractual Proposal, calls for a contractual solution backed by new legislation. However, irrespective of whether it is backed by legislation, it would be wise to follow this approach, as it helps bind data recipients to the requirements of the data discloser. With the use of SaaS applications on the rise, the chances of the data being stored elsewhere and travelling on the wire are very high. The increasing need for data and application integration over the cloud across various partner organizations again makes such a contractual solution a must.
 
The core proposal in the legislation is a voluntary data agreement, which is a contract between a data discloser and a data recipient. The proposed act (the PDDA) will only apply to those who choose to accept its terms and penalties through a data agreement. The PDDA establishes standards for behaviour and civil and criminal penalties for violations. In exchange, there are benefits to both the discloser and the recipient.
 
With the above requirements and understanding of data de-identification, let us list the circumstances that mandate de-identification:
  • All non-production database instances, which include the development, test, training and production support instances of the databases maintained by an organization. It is quite common for DBAs to maintain and run scripts to anonymize personal data before such an instance is exposed for general use by the intended users (a sketch of such a masking script follows this list). But it is also important to ensure that the anonymization is in line with the regulatory requirements of the region in which such instances are hosted.
  • The increased use of business analytics calls for the maintenance of one or more data marts, which are typically replicas of the production database. While it would be absolutely fine if such data marts store data summarized at a level where no row represents one individual, care has to be taken if micro-level data is also maintained in the mart to facilitate drill-through.
  • Application controls – All systems that work with databases containing personally identifiable information should be designed in such a way that appropriate access controls are built in to protect the sensitive information from being displayed or extracted out.
  • Remote workers & mobility challenges – Organizations have started accepting the culture of remote working and employee mobility. That means that the employees would be accessing the data through one or more applications from anywhere in the world using a multitude of devices. This call for an appropriate policy, checks and controls to be compliant with the privacy legislations.
  • Partner systems – In today’s connected world, business partners, who might be customers, vendors or even contracted outsourced service providers, may gain access to the systems and databases of the organization. This certainly calls for a careful evaluation of such parties and a voluntary agreement by them to comply with the organization’s data privacy needs. It even calls for periodic training and audits of the employees and systems of such partner organizations.
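To make the first point concrete, here is a minimal sketch of the kind of anonymization script a DBA might run against a non-production copy. It uses sqlite3 purely for illustration; the table and column names are hypothetical, and a salted one-way hash stands in for whatever masking rule the applicable regulation requires.

```python
import hashlib
import sqlite3

def mask_value(value: str, salt: str = "nonprod") -> str:
    """Replace a sensitive value with a deterministic, irreversible token so
    that joins still work but the original value is not recoverable."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def anonymize_customers(db_path: str) -> None:
    # Table and column names are hypothetical; adapt to the actual schema.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, full_name, email, phone FROM customers"
        ).fetchall()
        for cid, name, email, phone in rows:
            conn.execute(
                "UPDATE customers SET full_name = ?, email = ?, phone = ? WHERE id = ?",
                (mask_value(name), mask_value(email) + "@example.invalid",
                 mask_value(phone), cid),
            )
        conn.commit()
    finally:
        conn.close()
```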
Today’s lack of clear definitions, de-identification procedures, and legal certainty can impede some useful data sharing. It can also affect users’ privacy when the lack of clarity about de-identification results in sharing of identifiable data that could have been avoided. The approach proposed by Robert Gellman would make available a new tool that fairly balances the needs and interests of data disclosers, data users, and data subjects. The solution could be invoked voluntarily by data disclosers and data recipients. Its use could also be mandated by regulation or legislation seeking to allow broader use of personal data for beneficial purposes.


Reference:
Robert Gellman, The Deidentification Dilemma: A Legislative and Contractual Proposal, Version 2.4, July 12, 2010









Saturday, September 22, 2012

Cloud Computing - Governance Challenges


I recently happened to read a Technical Note published in October 2006 by the Software Engineering Institute titled ‘System-of-Systems Governance: New Patterns of Thought’. The technical note is primarily aimed at organizations like the Department of Defense, wherein multiple organizations and systems need to collaborate to form part of the DoD as the bigger enterprise. The Note discusses some of the key areas where traditional governance needs review and revision to handle the very nature of a system of systems.

CIOs are in favour of embracing the cloud and as such would have contracted for multiple software systems (SaaS) for different needs, for instance Salesforce for CRM, ServiceNow for IT Service Management, Windows Azure for custom application development, NetSuite for ERP, and so on. Similarly, the IT organization would have contracted with different cloud service providers for its Infrastructure (IaaS), Storage and Platform (PaaS) needs. The XaaS list is growing, as we now see offerings for Database as a Service, Security as a Service, Identity as a Service, etc. With all these contracted system components, the IT organization certainly faces challenges in implementing governance practices, as diverse system components form part of the larger system.

In today’s world, with increasing cloud adoption and outsourcing, business organizations find themselves in almost the same state as that described of the DoD, i.e. a system of systems, and the key governance challenges very much hold good and need to be addressed. The following are five of the six areas that need to be looked into to realign the governance practice within an enterprise:

Collaboration and Authority

With systems and components owned by various organizations being in use, even if owners of constituent systems are unusually committed to the system of systems, a single authority is likely to be ineffective. And if authority is essential to the enforcement of IT policy, then with insufficient authority, independent vendor organizations can always be expected to be reluctant to adopt shared policies. Collaboration amongst the constituent system owners is required, at least in problem solving, participating in the decision-making process, and coming up with provisional solutions to meet an emergency need. While cloud computing is still maturing, standards need to emerge to facilitate the necessary collaboration, both technically and in terms of policy federation.

Motivation and Accountability

There should be motivation for anyone to adhere to or adapt to a shared policy; enforcing shared standards or policies across independent vendors would otherwise not work effectively. The Technical Note refers to Zadek’s five stages of learning that organizations go through to achieve the benefit of voluntary collaboration. These are:

[Table: Zadek’s five stages of organisational learning]

The table above not only shows what motivates organizations at different stages but also reveals what they may need to learn. For example, a defensive organization that claims common system-of-systems practices are irrelevant may need to be educated about threats to its reputation due to its lack of voluntary compliance. At all stages, we need policies to give individuals and organizations the incentive to do the right thing.

Multiple Models

A simple example that highlights why this area is significant is the security implementation within the component systems. Each component and system vendor may have a different security model implemented within its individual system, and that makes governance of the overall organization challenging. This calls for a dynamic governance model that can map and interact between the different models of the individual components. While security is just one example, there are other areas where the components are designed and modelled differently. Framing governance policies to suit all such systems is certainly not a good approach; rather, the governance framework should provide for variables based on the type of system, type of service or a similar category.
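As a small illustration of what such a mapping layer might look like, the sketch below maps each component's native roles to a canonical enterprise role so that a single governance policy can be evaluated across differently modelled systems. All system names, role names and permissions here are assumptions for the example.

```python
# Map each component's native roles to canonical enterprise roles so that one
# governance policy can be evaluated across differently modelled systems.
ROLE_MAP = {
    "crm_saas":    {"SalesOps Admin": "administrator", "Rep": "user"},
    "itsm_saas":   {"Fulfiller": "user", "System Administrator": "administrator"},
    "custom_paas": {"app_admin": "administrator", "app_user": "user"},
}

ENTERPRISE_POLICY = {
    "administrator": {"read", "write", "configure"},
    "user":          {"read", "write"},
    "guest":         {"read"},
}

def permissions_for(system: str, native_role: str) -> set:
    canonical = ROLE_MAP.get(system, {}).get(native_role, "guest")
    return ENTERPRISE_POLICY[canonical]

print(permissions_for("itsm_saas", "Fulfiller"))  # the canonical 'user' permissions
```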

Expectation of Evolution

This can be easily related to change and release management of independent vendor systems and the change of the common infrastructure of the enterprise itself. If governance cannot eliminate the independent evolution of components within the system of systems, it should aim to reduce the harmful effects of uncontrolled evolution by the component systems. Thus, policies must be created and enforced to provide rules and guidance for components as they change.

At a minimum, governance for evolution should include rules and guidelines for the following (a minimal compatibility-check sketch follows the list):

  • informing other component systems (when known) of changes in the interfaces to and functionality of one system
  • coordinating schedules with other component systems so that those that have to change can do so together (when backward compatibility of interfaces cannot be maintained)
  • maintaining multiple versions of the system when schedules cannot be coordinated
  • developing each system to insulate it from changes in other component systems
  • minimizing the perturbations to interfaces when changing a system
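As a minimal sketch of the backward-compatibility point, the check below assumes the component systems version their interfaces semantically and lets a consuming system refuse to integrate, or fall back to a still-maintained older version, when an interface has changed incompatibly. The versioning convention is an assumption for the example, not something prescribed by the SEI note.

```python
def is_compatible(provided: str, required: str) -> bool:
    """Assume semantic versioning: a provider is compatible when the major
    version matches and the minor version is at least the one required."""
    p_major, p_minor, _ = (int(x) for x in provided.split("."))
    r_major, r_minor, _ = (int(x) for x in required.split("."))
    return p_major == r_major and p_minor >= r_minor

print(is_compatible("2.3.1", "2.1.0"))  # True  - safe to consume
print(is_compatible("3.0.0", "2.1.0"))  # False - coordinate schedules or keep both versions
```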

Highly Fluid Processes

Agility is the order of the day within every function of the enterprise. Being agile and responding to changes quickly gives the competitive edge to the enterprises and that is equally applicable for the Governance Framework as well.

Planning for rapid changes in system-of-systems governance is needed. For example, governance strategies may provide a mechanism for adapting to rapid policy change, such as a way to relax security policies to achieve some urgent goal and then tighten them up again. Governance policies should be framed such that those around systems or components likely to see problems or changes quickly are flexible, whereas policies farther away from such systems can remain relatively stable. For example, a neighbourhood of closely related systems might be the first to notice a problem with a current component or process and will need to respond quickly. At the extreme, where neighbourhoods of related systems are themselves fluid, some details of system-of-systems governance policies might be negotiated.

Summary

As with the technical note, I have not attempted to suggest solutions for the governance challenges listed above. Organizations like ISACA have been researching governance issues, and their widely adopted COBIT framework may well have solutions for these problems, which I will explore in future blog posts.

References:
SEI Technical Note – System-of-Systems Governance: New Patterns of Thought (October 2006)

Saturday, September 15, 2012

Leveraging Lessons Learned

Success = failure + failure + failure … Sounds familiar?


Leadership experts and management gurus have said enough about how failures lead to success. That is very true for individuals, when the person concerned takes it in the right context and works on the causes of the failure to overcome it at the next opportunity. But how does this work in reality for the organization?
 
If you have been part of a project that failed to deliver the promised features on time or at the agreed cost, you are most likely out of that organization, as the management wants to penalize those involved in it. In the process, the organization loses, as it fails to capitalize on the lessons learned by the team through the failed project, and the new team that takes over might commit the same or even different mistakes, which could again lead to failure.
 
Agile projects are likely to fare better in this space, as Agile project management calls for identifying the things that went well and those that did not at the end of every sprint. Here again, the one question that remains to be answered is how the scrum master and the teams deal with the things that did not go well in the earlier sprint. Yet another question is how willing project team members are to openly admit their own errors and omissions, which could have adversely impacted the project.
 
 
As for the development teams, there is so much to be learnt on a daily basis; for example, the defects uncovered in unit testing, the findings in requirements, design and code reviews, and even project issues could each hold a great lesson for every other member of the team.
 
 
Here are a few ideas that will help the organization leverage the lessons learned by the teams through various errors, mistakes and omissions.
 
  • Mentor the teams to the effect that they demonstrate accountability and responsibility, and that admitting a mistake early on is a good thing. The earlier the triggers are known, the better, as other members of the team can then stay away from committing the same mistakes.
  • Coach the teams to share, share and share with their peers and even across teams. This can be accomplished by removing the mental blocks employees have about admitting their own mistakes; they should be encouraged to share those for the good of themselves and the organization. The tendency of employees is that when they uncover issues during unit testing and reviews, they just fix them themselves and do not report them further.
  • Encourage teams to share their previous experiences every now and then; for sure there will be some takeaways from such experiences for some members of the team.
  • Bring in a culture within the organization that discourages egos and emotions, which are found to be barriers to sharing.
  • Promote risk management and encourage every employee to participate in it. Needless to say, every identified risk has the potential of becoming an issue that can soon come in the way of the project’s success. Past experience and lessons learned are a great source of risk identification.
  • Above all, make sharing the lessons easy by putting in place an appropriate knowledge base platform, and train and encourage employees to use it.
 
Though the above ideas are more suitable for IT services organization, they can be practiced in any other organization as well with some tweaks.
 
Here is an interesting article to read, wherein Ken Bruss discusses leveraging lessons learned for competitive advantage.

Sunday, August 26, 2012

Solution Architecture - Basic Principles

As I have written in my earlier blog post Solution Architect – Understanding the Role, the Solution Architecture practice area demands wide knowledge of both business and technical areas. Solution Architects need to be jacks of all trades rather than masters of a specific area. They should be able to bridge the gaps that business users have in the technical space and those that technical teams have in the business areas.


Architecting a good solution is always challenging, as the context and the technology keep changing fast over time. A solution that was perfect in one time frame may not remain so later. Given that each of the non-functional quality attributes might adversely affect one or more of the others, Solution Architects should be able to perform a balancing act on these attributes in line with the business needs and other factors foreseeable in the near and longer term.


Here are some of the basic principles which, when practiced, help a solution architect deliver a good solution.


Business Drives Information Technology

If one has expertise in IT, it is important to take off that hat and wear the business hat while architecting solutions. Business does not mind which technology or tools are used; it wants its business problems solved with reasonable longevity and the other non-functional requirements met. Ignoring or undermining business expectations or priorities could lead to building a solution with excessive engineering or technical complexity, resulting in higher cost and delays.


Just Enough Architecture, Just in Time

Let the solutions evolve based on business priorities. It would be ideal to take one business problem at a time based on its priority and architect the solution keeping in mind the first principle and factoring in the ability to change and scale in response to business needs. This helps the business get solutions faster and in increments. It also helps the Architects adopt agile methods and ensure timely responses to business changes.


Common Business Solutions

There is a tendency for departments to go in for solutions on their own, as they have their own budgets to spend. This often leads to a situation where multiple solutions are in use for the same problem domain across various departments. It is a must for architects to have a complete view of the problems and solutions within the enterprise, and the solutions architected should be common and usable across multiple departments. If need be, an existing solution can be enhanced to meet the changing requirements of different departments. This principle, when practiced, will facilitate easy maintenance and ensure better data governance.


Conform to Data Governance principles

Data and information are used for decision making at various levels, and it is important that the data is organized and maintained at the highest possible quality level. Understanding the sensitivity levels of the data within and across departments and partner systems is equally important while architecting the solution. With the big data era emerging, organizations are looking at handling petabytes of structured and unstructured data. It is imperative to have Data Governance policies and processes in place, and the Solution Architecture team should ensure that the solutions comply with them.


Comply with Information Security Framework

Similar to the Data Governance policies, the organization should have Information Security policies and a related framework in place, and the Solution Architecture team should ensure that the solutions conform to them. Security must be designed into the solutions from inception; adding it later could result in higher cost and delays. This, however, depends on the organization’s business context and its risk appetite.


Also read my other blog post on Architecture Review – Scalability.

Friday, August 17, 2012

Taking over - stay away from wrong battles

If you are about to take up a senior managerial role in a different organization, it is important that you settle down at the right pace and pick the right battles to make a mark in the first few weeks of taking over. While it is true that the management has, through multiple rounds of discussions, tried its best to understand your abilities and is convinced that you are the person to take the organization further down the roadmap, there could be challenges you have not faced before, and you should take a little care about a few things like the following.

The takeover session
 
 
Usually, you might have the chance of a few rounds of discussions with your predecessor as part of the handing-over process. It is important to use this very effectively. Among other things, the important items to pick up in these sessions are:
  • Get to know why your predecessor is leaving, and this would help you to plan and carefully handle such pain areas so that you also don’t end up getting into a battle.
  • Get to know your predecessor's opinions about the people, processes and technology in the organization; this would give you certain handles to pick up and carry on with.
  • Get to know what he has been up to in the past three to six months, so as to understand his unfinished initiatives and, if certain initiatives failed, why they did so. This will help you understand the various constraints he has been operating under, and most likely you will have to deal with those constraints too.
  • Get to know the strategy, vision and goals of the organization and the roadmap to achieve them. It might be possible for you to identify certain areas to work on, but again, don’t jump into action plans; you need to get a 360-degree view of the issues.
 
 
Just in case you don’t have this opportunity of a smooth handover, try to get the same inputs from the next-level executives, but use such inputs with care, as you might want to validate them against a few other sources.
 
 
The Cultural Values
 
 
Each organization has its own culture that suits its teams and business best. As part of your taking over, it is important that you understand the organizational culture and the morale of the employees; if required, spend a little more time making yourself fit into the prevalent culture and gaining the confidence of the teams. While there is a chance that the given culture could be the cause of certain pain points and may need a change, you may not want to pick such battles so soon, as it could lead to the teams not accepting you as a leader. Coming from a different cultural background, it is easy to get carried away and make missteps.
 
 
Spot the problem areas and the pain points
 
 
While you will have got to know some of the priority areas that need immediate attention, before jumping into action, spend more time talking to the various teams to understand completely the current state of the projects and initiatives they are working on. Depending on your approach, style and experience, you will spot certain pain points that need attention; capture those for later action. The sooner you identify them, the quicker you will settle down. It would also be a good idea to spend some time understanding recently failed projects or initiatives, which would help you pick certain process areas to revisit and work on. You might have to use your tact and people skills here so that the teams open up to you freely and you get a good handle on the areas to work on.
 
 
Perform a careful analysis of these items for their impact on other aspects like culture, process and technology, which will help you categorize and prioritize these areas and come up with a revised roadmap for the near term and the longer term.
 
 
In the process of settling in, it is very likely that you will try to use your experience and suggest course corrections or jump into action midway, which, if not done well, could land you in trouble, as you might start facing resistance from some quarters. Though this could be overcome with authority, that may not work well if used early on. In such cases, you should convincingly demonstrate to the teams that the course correction is much needed given the situation, and take them into confidence that way. Picking the wrong battle early on could prove costly.
 
 
As many leaders say, a good leader is a good listener. So listen more early on to understand the perspectives better and try to pick up some lessons. For a while, you should forget your experience and expertise, be a learner and keep listening. Once you are done listening, do an analysis, and in that process you may use your experience. Keep in mind that there is no single best way to accomplish a thing; there could be multiple ways and means, and you might have the chance to pick up certain new things that work well too.
 
 
Please understand that this is not a complete guide to practice blindly. It could be completely out of context in some situations and may not hold good at all. However, the takeaway is this: try to stay away from picking the wrong battles early on.

Friday, July 13, 2012

Architecture Review - Scalability

Scalability is an important quality attribute of any system, be it hardware or software. But in most cases, the need for a scalability check or review is felt only when certain signs of scalability problems show up. Typically, the following are such signs that call for a scalability review of an existing application.
  • When changes requested on certain subsystems are turned down by the development team(s) citing that the subsystem is complex and any change to it might call for huge regression testing effort, or could have a bigger impact on the whole system. This indicates that there are certain components or subsystems that prevent the system from scaling.
  • Months into production usage, the application performance gradually slows down, and there is a tendency either to accept the slowdown or to pump in more hardware to compensate for it. This again is an important sign that the application is not scaling to take on the ever-growing user base and transaction volumes.
There could be more signs that indicate scalability issues within the application. It is unfortunate that scalability reviews are often not done in the initial design phase, which would prevent these post-production troubles from showing up. While reviewing an existing application for potential scalability issues may be easy, the solutions for addressing them may not be, because of the underlying design and architecture of the application and its inter-dependencies with other systems in use. Let us examine certain important aspects to look into to spot potential scalability problems.

Distributed architecture: While a distributed design is likely to improve performance, it could lead to scalability issues when one or more components or subsystems rely on local resources. Another reason to review this with care is that an ill-designed system may call for too much communication across the physical and logical boundaries of various subsystems and would rely heavily on the communication infrastructure.

Component interaction: Examine how the components or subsystems are designed to interact with each other and how closely they are positioned. Too much component interaction could lead to network congestion and very high latencies, which result in performance and scalability issues later on when usage increases. Measure the payload and the latency of such inter-component interactions and isolate the components that need redesign. Keeping the data and the behaviour closer together will reduce interactions across boundaries and, as a result, keep the latency in check.
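A simple way to gather these measurements is to wrap each inter-component call and record its payload size and latency; the sketch below is one minimal way to do that in Python, with the component name and the called operation standing in for real services.

```python
import json
import time

def timed_call(component: str, operation, payload: dict):
    """Wrap an inter-component call, recording payload size and latency so
    that chatty or heavy interactions can be isolated for redesign."""
    body = json.dumps(payload)
    start = time.perf_counter()
    result = operation(payload)                 # the actual remote or local call
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{component}: {len(body)} bytes, {elapsed_ms:.1f} ms")
    return result

# Usage (the target operation here is a stand-in for a real component call):
timed_call("pricing-service", lambda p: {"total": 42}, {"items": [1, 2, 3]})
```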

Resource contentions: Look for potential limitations on the hardware or software resources used by the application. For instance, if the application writes huge amounts of log data to the same disk where its transaction data is stored, write requests may encounter resource contention. Similarly, examine how fast the data files grow and how the disk subsystems support such growth. Possible solutions for such issues include resource pooling, message queues and other such asynchronous mechanisms.

Remote communications: It is always beneficial to keep remote calls to a minimum; too many remote calls make the system overly dependent on the reliability and availability of the communication infrastructure. Ensure the required validations are performed ahead of remote calls, so that unnecessary calls are avoided. Where possible, remote calls should be stateless and asynchronous. Synchronous calls may hold up communication channels and associated resources for longer periods, which can be a cause of performance and scalability issues. Use of message queues can help decouple subsystems from being held up for synchronous responses.
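The sketch below shows the decoupling idea with Python's standard-library queue: the producer hands off work and returns immediately instead of holding a synchronous channel open. In a real deployment a message broker or a cloud queue service would take the place of the in-process queue.

```python
import queue
import threading

work_queue = queue.Queue()

def worker() -> None:
    """Consume requests asynchronously so the caller never blocks on a slow
    remote operation."""
    while True:
        request = work_queue.get()
        if request is None:          # sentinel used to stop the worker
            break
        # ... perform the remote call / heavy processing here ...
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# The producer enqueues and moves on instead of waiting for a response.
work_queue.put({"order_id": 1001, "action": "notify-partner"})
work_queue.join()  # only needed if completion must be awaited
```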

Cache management: While the use of a cache can help achieve better performance, it could also prevent the application from scaling in a load-balanced environment unless a distributed caching mechanism is designed and used.
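Here is a minimal read-through sketch, assuming a Redis instance reachable by every node behind the load balancer; the cache host, key layout and the load_user_from_db helper are all assumptions for the example.

```python
import json
import redis  # shared cache reachable by every node behind the load balancer

cache = redis.Redis(host="cache.internal", port=6379)

def load_user_from_db(user_id: int) -> dict:
    # Hypothetical placeholder for the real database read.
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                # hit: same data on any node
    user = load_user_from_db(user_id)
    cache.set(key, json.dumps(user), ex=300)     # miss: populate with a TTL
    return user
```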

State management: Look at how the state of persistent objects is being managed. Stateless objects scale better than stateful ones, and distributed state management is the way to address state in a load-balanced environment. Always prefer stateless components or services, as they perform well and at the same time scale well.

Here are some of the best practices that help achieve high scalability:
  • Prefer stateless asynchronous communications, as this frees up resources considerably and supports load balancing.
  • Design the application as multiple fault-isolated subsystems with the ability to be deployed on different hardware environments (or isolated application pools), so that faults in one subsystem do not impact the others. This partitioning can be either by service categories or by customer segments (see the sketch after this list).
  •  Use distributed cache solutions, so that the cached data is available on multiple clustered environments.
  •  Use distributed databases with appropriate replication so that loads can be distributed.
  • Do not depend too much on the specific capabilities of the RDBMS, as this might couple the application tightly to one vendor’s RDBMS. A high degree of scalability can be achieved by keeping the business logic outside of the RDBMS.
  •  Spot the potential scalability issues early on by performing design reviews during development and by performing periodic load and performance tests.
  • Do not ignore capacity planning during the pre-project phase, as skipping it could significantly impact application usage in production over a period of time. Also be aware of the data growth rates and have a roadmap to support the ever-increasing data and volume growth.
  • Do not ignore root cause analysis; many times, when developers roll in a fix for a defect, they are not fixing the root cause, which could come back later as a scalability bottleneck.
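As a minimal sketch of the partitioning idea mentioned above, the routing function below assigns each customer deterministically to one fault-isolated pool, so a fault in a single pool affects only the customers mapped to it. The pool names and the choice of hash are assumptions for the example.

```python
import hashlib

# Each pool is a fault-isolated deployment; names are hypothetical.
POOLS = ["pool-a.internal", "pool-b.internal", "pool-c.internal"]

def pool_for_customer(customer_id: str) -> str:
    """Route a customer deterministically to one pool so that a fault in a
    single pool affects only the segment of customers assigned to it."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return POOLS[int(digest, 16) % len(POOLS)]

print(pool_for_customer("ACME-001"))
```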
Update:
Also read this MSDN Library article, which lists five key considerations for a scalable design.

Saturday, July 7, 2012

Direct Database Updates – A Cause of Concern


Many organizations still have the practice of directly updating production databases to fix data integrity issues. This shows that one or more applications deployed on top of the database are not reliable enough to maintain the database's integrity. It is one of the biggest concerns for information security auditors, as it requires certain resources to be granted access privileges to the production databases, which opens up opportunities for internal hackers to indulge in fraudulent activities.

There could be a multitude of reasons which could lead to such a situation, needing frequent database updates. The following are some such reasons that impact the reliability:
  • Incomplete requirements – It may be possible that the business rules and / or validations are not completely gathered and documented. 
  • Design deficiencies – Design deficiencies like inappropriate error handling, managing the concurrency, etc. could also lead to data integrity issues.
  • Shared database across multiple applications – When multiple applications use a shared database, it is possible that some business rules or data validation requirements are implemented differently, or that some applications have technology or design limitations, leading to data integrity issues.
  • Creeping code complexity over a period of application maintenance – As the applications move into maintenance cycle, and as newer resources may get on to maintain the application code base, chances are high that due to the growing complexity and lack of complete knowledge, issues might slip through the development and sometimes QA phase as well.
  • Lack of adequate QA / reviews – Review is a very effective technique to identify potential issues early in the application development life cycle. Unfortunately, most organizations do not give importance to requirements, design and code reviews, or do not get them done effectively. This review or QA deficiency can impact reliability.
Though the software development process has matured, organizations tend to compromise on some quality attributes, which can leave an application unreliable. Thus, it may not be possible to completely eliminate the need for direct database updates. However, a process with adequate checks and controls should be put in place around this activity to ensure that the chances of a security breach through this channel remain under control. At a minimum, the following checks and controls need to be in place to keep database updates in control.
  • Every request for database update should originate from business function heads and should formally be supported by a service request as logged in to an appropriate tracking system or into a register.
  • Every such request shall be reviewed by the analysts and / or architects to identify whether the data update is really necessary and whether there is another way of fixing the issue using the application’s own features.
  • The review should also suggest two solutions: one isolating the specific data tables and columns that need to be updated (corrective action), and the other a possible enhancement to the application(s) to prevent such integrity issues from occurring in the future. The review should also identify the constraints in implementing the data fix; for instance, some fixes may have to be executed before or after a specific scheduled job, or may need the database to be taken offline before execution.
  • In most cases, these issues are very hard to investigate, as they occur rarely and only upon a unique combination of data and program flow. It would be beneficial if the results of such reviews flow back into the process, with the necessary checks and controls put in place to prevent such issues from slipping through the review and testing phases of the SDLC.
  • On completion of the review, developers may be engaged to create necessary SQL scripts that are required for such updates.
  • This shall be subject to review by the analysts and / or architects and then subject to testing by the QA team. 
  • Once the review and test results are clear, the scripts shall be forwarded to the DBAs, who should execute them in production. Ideally such data updates should be performed in batches, and the affected tables / objects should be backed up prior to execution, so that the old data can be restored when needed (a minimal sketch of this pattern follows the list).
  • The DBAs should maintain a record of such execution and the resulting log data and the same shall be subject to periodic audit, so as to ensure that the scripts remain unaltered and that no additional unwanted activities happen along with script execution.
  • None of the resources involved in this process except the DBAs should have access to the production database. For the purpose of investigating or troubleshooting certain cases, a clone of the production data may be made available on request and should be taken down when its intended purpose is complete. It is important to mask sensitive data while making such production clones and to restrict access to them over the network.
  • It is important that the responsibilities are divided amongst different groups; the employees involved should have demonstrated high credibility in the past, and accountability should be well established.
  • A periodic end to end audit should be performed, which should track right from the origination of the service request to its execution in the production database and any non-compliance must be seriously dealt with.
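Here is a minimal sketch of the execution pattern described above: back up the affected rows first, then apply the approved fix in small batches inside explicit transactions, leaving a trace for the audit. It uses sqlite3 purely for illustration, and the table, columns and request identifier are hypothetical.

```python
import sqlite3

BATCH_SIZE = 500

def run_approved_fix(db_path: str, request_id: str) -> None:
    # request_id must be a simple identifier, e.g. "SR1042".
    conn = sqlite3.connect(db_path)
    try:
        # 1. Back up the affected rows so the old data can be restored if needed.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS orders_backup_{request_id} AS "
            "SELECT * FROM orders WHERE status IS NULL"
        )
        # 2. Apply the correction in batches inside explicit transactions.
        while True:
            with conn:
                cur = conn.execute(
                    "UPDATE orders SET status = 'CLOSED' WHERE rowid IN "
                    "(SELECT rowid FROM orders WHERE status IS NULL LIMIT ?)",
                    (BATCH_SIZE,),
                )
            if cur.rowcount == 0:
                break
        # 3. Leave an execution record for the periodic audit.
        print(f"Request {request_id}: fix applied in batches of {BATCH_SIZE}.")
    finally:
        conn.close()
```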

More than these checks and controls, the organization should look for a declining number of database update requests over time, which is an indicator of improving system reliability. Another way to look at the improvement is that recurring requests of the same nature should vanish after two or three occurrences. The organization's software engineering process should also call for adequate checks and controls that contribute to improved system reliability.

Sunday, July 1, 2012

Software Architecture Reviews


Review is a powerful technique that contributes to software quality. Various artifacts of the software development lifecycle are subject to review to ensure that any deficiencies are spotted early and addressed sooner, rather than slipping through further phases and consuming more effort than expected down the line. One such important review is the review of the software architecture. If you are asked to review the architecture of a software system, it could be due to one of the following reasons.

  1. Possibly, you are a Senior Architect and are expected to complement your fellow Architect by reviewing his work, thereby helping him, and in turn the organization, get the best possible software design. Some or most organizations mandate this as part of their engineering process. When this review is done effectively, the benefits are huge, as it occurs early in the development life cycle.
  2. One or more of the custom-built applications used in the organization are suspected to have serious reliability / performance issues, and you are engaged to come up with an analysis and a plan to set them right. If this situation arises, it is very much evident that the first kind of review did not happen or wasn’t done well. In some cases, such situations arise when the stakeholders knowingly compromise on certain software quality attributes initially and are then surprised to see the impact down the line as it hits back.
  3. You are possibly looking to license a product and are evaluating its suitability for your organization. In this case, you will probably have a checklist of items created based on the IT policies and framework of your organization, and the review is highly dependent on the information revealed by the product vendor.

Though there could be more, the above are some of the primary reasons why one would need to perform an architectural review. In spite of many reviews and tests, issues slip through and challenge the IT architects at some point down the line. Resolution of such issues may call for certain specific reviews, and the method and approach will differ based on the type of the problem. For instance, if there is a data breach, a security review of the architecture is what is needed, not only to identify the root cause of the current problem but also to identify potential vulnerabilities and come up with solutions to plug those gaps.

These specific reviews can typically be associated with the broad software quality attributes, also termed non-functional requirements. The best way to approach these specific reviews is to start with an architectural review. A review checklist is a good tool for the purpose, but it should be exhaustive enough to cover the necessary areas, so that the reviewer gets the right inputs, is in a good position to form an opinion about possible deficiencies, and can relate them to the problem being resolved.

Keep a watch on my blog for more on specific architecture reviews.

Saturday, June 23, 2012

The pressure points of Cloud Adoption


The values and benefits of cloud adoption are increasingly clear and well known. So as not to be carried away by these values and benefits, it is important to identify and be aware of the pressure points that cloud adoption brings in, as called out by ISACA in its white paper titled ‘Guiding Principles for Cloud Computing Adoption and Use’. Essentially, the differences in the technology itself and in its use impact the way IT is governed and managed, and the management’s reaction to these impacts brings on pressure points as well, which need to be managed.

Differences such as the change in cost allocation from capital to operational may have consequences that are not apparent at the beginning. For instance, contracting for cloud-based software is an operational spend and may have a lower cost of entry, and thus such decisions may fall outside the review and approval process. While in most cases the pressure points are to be managed as risks, they are not necessarily risks.

Speed and Agility

Time-to-market is a driver for cloud adoption, as solutions to meet market needs become available more quickly at lower cost, though there could be gaps in meeting the exact requirements. This agile exploitation in a reduced time frame puts greater pressure on enterprises in which culture, process, and human factors related to technology have been developed to support longer development cycles and long-term technology use. This pressure, when not handled appropriately, could result in increased risk levels in the following areas:

  • An unbalanced prioritisation of value over trust in technology solution choices
  • Missed opportunities when other alternatives are not considered
  • Recovery mishaps because fallback positions are not fully exploited
  • Missing functionality if full requirements are not identified
  • Increased long-term costs due to reliance on multiple short-lived solutions
  • Reduced performance when enterprises are hesitant to introduce new solutions because of existing technology investments


Changing Boundaries

The reliance on cloud providers calls for a change in roles and responsibilities within the enterprise and transfers certain responsibilities to outside parties. Contracts and SLAs with service providers attempt to assign accountabilities, but governance dictates that the enterprises themselves, their boards and management remain accountable. With this, the locus of decision making shifts from governance functions to business line leaders. This change in organizational boundaries can put greater pressure on enterprises. The risk outcomes of this pressure point could be:

  • Role confusion when accountabilities and responsibilities are not clearly defined
  • Diminished effectiveness when decisions are made without engaging in a wider consideration of trust and value before cloud acquisition
  • Failure to satisfy constituent and end-user expectations for protection and privacy
  • Project delay and increased costs due to the need for personnel with governance responsibilities to revisit cloud plans
  • Unclear specifications of provider responsibilities and accountabilities in SLAs
  • Incomplete information being provided to board members and senior management


New Technologies and Technology Expectations

Cloud follows a sequence of disruptions in how technology is viewed, integrated into organizational strategy and managed and in how IT risks are identified and managed. Areas of high pressure can result when strategy and enterprise architecture do not consider the unique qualities of cloud computing and when enterprise processes and procedures do not easily adapt to changes made possible by cloud computing. The following risks could be the outcome of this pressure point:

  • Missed opportunities to extract value from the integration of cloud and internal systems
  • Increased vulnerability from incompatibilities and inconsistencies between cloud and internal systems
  • Less than expected results when human factors are not considered in the design and integration of cloud services and infrastructures
  • Levels of organizational performance that do not meet expectations because cloud solutions do not fully support organizational processes
  • Levels of technical performance that do not meet expectations because processes do not take full advantage of cloud capabilities


Level Playing Field

Cloud computing removes the advantage that large enterprises have traditionally had in terms of availability of technology specialists and technical sophistication. Smaller enterprises now have the ability to leverage cloud services and use the technological sophistication that large enterprises used to enjoy. This brings small and medium enterprises to an equal position with much larger enterprises. This level playing field can have an impact on strategy and its implementation. Ignoring this impact can result in increased risk levels in the following areas:

  • New entrants claiming a segment of traditional market dominance
  • Strategies that do not address competitor capabilities
  • Less-than-expected benefits received from technology-dependent solutions


Utility Services and Service Supply Chains

With cloud computing, where computing is viewed as a utility, focus is shifting to the value and benefits obtained from such utilities. Agile enterprises benefit from solutions that can be used as needed and discarded when they no longer provide value. This view of computing as a utility, and the delivery of solutions as a supply chain of information system solutions, puts greater pressure on enterprises that have a culture not accepting of utility solutions, a structure that does not facilitate cooperative planning, and processes that cannot take advantage of computing solutions provided as a supply chain of utilities. Ignoring this could result in the following risk outcomes:

  • Over-investment of resources in planning and building internally developed information system solutions
  • Less-than-optimal results when value-producing cloud utilities are missing from the total solution
  • Duplication of effort when specialist services available through cloud providers are not integrated as part of system management
  • Less-than-expected results when utility components are not integrated into and managed as an information system capability supply chain


In the white paper, ISACA suggests six guiding principles that can help illuminate the path for cloud adoption. Click here to download the complete white paper, which is available to registered users (free registration).

Saturday, June 16, 2012

Key risk areas that can impact the project success


Till recently, and to some extent even now, some of us don’t want our insurance advisor to talk about the risk to our own life. It has been the belief of some not to think or talk about the risk of losing life. However, things are changing, and most of us today manage personal risks well by at least transferring the risk to an insurer for financial protection. Risk management is not just financial protection; there is more to it, even though finance forms the core part in most cases. In the same way, if we look at managing software projects, project managers and sponsors had to deal with many issues every now and then before it was felt that preventing such issues from coming along the way is better than dealing with them, which is what risk management is.

Risk managers and risk experts always think of a what-if scenario for every action or decision so that all possible risks are identified early on; in the past this was thought to be creating ‘negative vibes’ and did not receive much support from project sponsors, because when many of the risks are identified upfront, the project may have to be shelved at the start itself. Here again, things have changed, and most CxOs accept ‘failing fast’ as a better option than failing at the end. Failing at the start is an even better option; refer to my own blog post on ‘failing at start’.

Given that most project sponsors and project managers have realized that risk management is better than its only alternative, crisis management, and believe that risk management is a key area of project management, let us explore, in a broader context, the key risk areas that need a close watch and which, if ignored, could impact the success of an end-to-end software project.

User Expectations

Even though well-written functional specifications may exist, and the development team may be developing to those requirements, there is a potential risk that end users, when they look at the final product, go back and say “this is not working the way we want”, pushing the product back for rework. This is mainly because everything around the business changes with time: the longer the development team takes to involve the end users, the more likely it is that the expectations would have changed. Agile is the solution to address this risk area, whereby the end users participate in the development and small chunks of the product are delivered frequently for user feedback.

There is more to it: most projects do not have the non-functional requirements documented, and much of the user expectation revolves around such software capabilities. For instance, application performance is a key non-functional requirement that the development team is expected to take care of as part of the solution. While the solution for this lies with the solution architect, the project manager and the stakeholders should not lose sight of this important area and should manage user expectations all through the project execution.

If one can identify or spot the potential risks around user expectations and manage them well, the chances of the project successfully reaching its milestones are very high.

Technology Shocks

This is a broader risk area and can be broken down into many sub-areas. Many projects hit a roadblock as they get closer to production deployment, by which time the infrastructure team may find that heavy investment in hardware and software tools is required to support the product in production. Some projects face issues in the early stages as well, as the development team may find some tools or technology unsuitable for the given solution.

A well-done pre-project risk assessment can help address this risk, as the architects involved in such assessments can anticipate and call out such roadblocks, which helps the project sponsors take a call. Development teams in most cases want to jump into the project and start building everything themselves without considering the reusable off-the-shelf components available for most capabilities. This tendency adds to the schedule risk, as the project may take longer due to technical issues that the team may face.

Managing risks in this area early on is very important, as certain choices might be very difficult to reverse in the middle of the project. It could also be the case that implementing certain requirements with a given technology platform or tool is much more complex than with certain other tools. Much of this responsibility lies with the Architects, who should do a good job of taking the project safely through this risk area.

Skill gaps

While this item is to some extent related to technology shocks, there is more to skill than the technology itself, and that made me call this out as a separate risk area. Soft skills play a key role in taking the project onto the success path. Staff attrition is inevitable in software projects, and as such, managing the dependency on people is very important. The resources holding key roles should have the right attitude: hand-holding the teams, willingness to share knowledge, and being quick and effective in on-boarding new members to the team.

Such skills in the resources holding lead roles even influence whether staff leave the team or the organization, and it could be some of the best resources of the project team who exit for such reasons. The Project Sponsors should play an important role here in getting to know of such risk items and managing them well, so as to keep the morale of the team high and get the best out of the team.

Every member of the project should contribute towards its success, keeping in mind the project goals and objectives. It is not uncommon for some key members of the project to make certain decisions in a way that benefits themselves, in turn putting the success of the project at risk. This is a sort of political behaviour by some members within the project. Here again, the sponsors should get more involved in the project and look to spot such risks early, so that control measures can be put in place by changing the team composition or by imparting necessary training or counselling to the members who need it.

This third risk, when identified and managed well, will ensure the team members collaborate and communicate well and deliver their best, which eventually will contribute towards project success.

Lack of Risk awareness

Finally, lack of risk awareness among the project stakeholders is the biggest risk. This calls for a proper risk strategy and objectives at the organization level and at every project level, and for letting every member of the project actively participate in risk identification and management.

While most project managers do mandate risk management as part of the project charter and do maintain a risk register, they fail to apply the risk management principles consistently, which eventually leads to incorrect risk prioritization. Needless to stress, the controls and tasks identified to reduce the risk level also need to be monitored on par with other project tasks for timely action. It is also a common mistake of project managers to miss estimating the effort for risk management and to leave it out of the project schedule.
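One simple way to keep prioritization consistent is to score every entry in the risk register the same way, for instance exposure = probability × impact, and rank on that score at every review. The entries below are purely illustrative.

```python
# Illustrative risk register entries; probability is 0-1, impact is 1-5.
risks = [
    {"id": "R1", "desc": "Key architect attrition",        "probability": 0.3, "impact": 5},
    {"id": "R2", "desc": "Non-functional reqs not defined", "probability": 0.6, "impact": 4},
    {"id": "R3", "desc": "Late hardware procurement",       "probability": 0.2, "impact": 3},
]

for risk in risks:
    risk["exposure"] = risk["probability"] * risk["impact"]

# Consistent prioritization: highest exposure first, re-scored at every review.
for risk in sorted(risks, key=lambda r: r["exposure"], reverse=True):
    print(f'{risk["id"]}  exposure={risk["exposure"]:.1f}  {risk["desc"]}')
```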

Another important aspect of risk management, which is normally ignored, is communication and continuous monitoring. Risks need continuous monitoring and need different levels of communication or escalation depending on the risk level. Most projects have a stale risk register, where the risks are just identified and no monitoring or follow-up is done on them. Use of an appropriate risk management tool is recommended, as it will ensure the visibility of the risks to all concerned with automatic alerts and escalations and will also facilitate consolidation of risks across multiple projects, thereby facilitating risk management at the enterprise level.