Saturday, December 24, 2011

Driving fast into the Tech Lane


As I was driving down to a restaurant with a friend of mine, we were chatting about a mutual friend and his new venture in mobile applications. The conversation soon took on a technical flavor, and it was a nice drive into the fast changing technology lane. Here are some excerpts from our conversation during the drive.

On why enterprises are in a hurry to port existing applications to the mobile platform...

Technology is evolving fast, and enterprises will soon be embracing mobile devices ranging from smartphones to tablets. Nearly every tech worker owns a smart mobile device of his or her choice. Many of these workers hold senior positions in the enterprise, are keen to use their devices for work, and so try to influence the IT heads to allow such devices in the work environment. This is a challenge for CIOs in terms of information security and confidentiality. But as the trend grows, the IT heads have no option other than to embrace it and start regulating it with a formal BYOD (Bring Your Own Device) policy, with controls and a governance framework around it.

On how BYOD is relevant in the context of mobile applications...

Yes, as BYOD gains acceptance, the next big challenge is to get existing applications working on such devices, so that employees don’t have to be provided with a desktop or even a laptop. This in turn drives the need to port applications to the mobile platform. Many tools and methodologies are emerging in this space to facilitate building mobile applications from the ground up and porting existing legacy applications to the mobile platform. "Write once, deploy anywhere" is the USP for today’s development tool vendors.

On how legacy applications can be ported...

This is where Service Orientation is gaining importance. Business functions are identified and exposed as reusable services, and a portal application is then built on top of them to present them appropriately for end user access on a variety of devices. Organizations would also consider embracing cloud based SaaS applications to replace the legacy applications. And yes, migration to the cloud could be a daunting task, but CIOs are seeing a longer term benefit in doing so. An alternative shorter term solution could be to get a virtual desktop on the mobile device and then work on whatever legacy app runs on that desktop.

About the concerns on cloud...

Yes, there are still certain concerns that keep organizations away from the cloud. However, this trend is changing. Many organizations have already moved less critical applications to the public cloud. Just as central / reserve banks regulate the banking industry, it is time for an industry consortium to come up with an independent regulatory body / framework that can help establish trust amongst enterprises, which in turn will ease some of the security concerns. While industries like banking and healthcare have reasons to be cautious about embracing the cloud, other industries are showing serious signs of embracing it.

On the amount of data that banks process and manage and whether that could be a deterrent for cloud adoption...

Be it cloud or not, data quality and data maintenance are going to emerge as critical functions. Dirty data and redundant data are being identified as having considerable impact on the profits of the organization. Tools have emerged for assuring data quality, data de-duplication and master data management. Advances in computing hardware and related technologies like virtualization have made vertical and horizontal scaling very easy, thereby making the usage of these data intensive tools a possibility.

We both enjoyed this conversation, and I am sure you will enjoy reading it too.

Friday, December 16, 2011

Debugging a performance problem


In typical application development, performance is conveniently ignored in most phases of the development life cycle. In spite of being a key non-functional requirement, it mostly remains undocumented. This is made worse by the fact that the development, test and UAT environments may not really represent the real world production usage of the application, so some performance problems cannot be spotted early. Even if the application is put through a load test, there are certain factors in the production environment, like data growth and user load, which may lead to performance degradation over a period of time.

While most performance problems can easily be spotted and resolved, some can be a challenge and may require sleepless nights to resolve. A structured approach may help address such issues within a reasonably short time frame. Here is a step by step approach which should work in most cases.

1.       Understand the production environment

It is important to understand the production environment thoroughly so as to identify the various hardware and networking resources and the middleware components involved in application delivery. In a typical n-tiered application, a request may pass through multiple appliances and servers and get processed by them before a response is sent back to the user. Also understand which of these components are capable of collecting logs / metrics or of being monitored in real time.

2.       Understand the specific feedback from the end users

Gather details like who noticed the performance degradation, at what time, and whether it repeats in a pattern or simply brings the system down. Also understand whether the entire application is slowing down or only some specific application components are not performing. Try to experience the problem first hand too, by sitting alongside an end user or, if possible, by using appropriate user credentials. The ‘who’ also matters because in certain circumstances the slowdown may affect only users associated with a specific role, as the amount of data to be processed and transmitted may differ based on the user role.

3.       Review available logs and metrics

Gather the available logs and metrics collected by the various hardware and software components, and look for information relevant to the specific application or, more specifically, to the set of requests that demonstrate the performance issue. As logging itself can be a performance overhead, production systems often have logging switched off or set to collect only minimal logs. If that is the case, make the necessary configuration or code change to achieve an appropriate level of logging, and then collect the required details by re-deploying the application to a production equivalent environment.
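To make the log review concrete, here is a minimal sketch of ranking request paths by how long they take. The log line format (timestamp, method, path, status, duration in milliseconds) and the function name slowest_endpoints are assumptions made up for illustration; a real access log will need its own pattern.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed (hypothetical) log format: "<timestamp> <method> <path> <status> <duration_ms>"
LOG_LINE = re.compile(r"^(\S+) (\S+) (\S+) (\d{3}) (\d+)$")


def slowest_endpoints(lines, top=3):
    """Group request durations by path and rank paths by median duration."""
    durations = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        _, _, path, _, duration_ms = match.groups()
        durations[path].append(int(duration_ms))
    ranked = sorted(
        ((path, median(values), max(values), len(values))
         for path, values in durations.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:top]


# Made-up sample lines, only to show the shape of the input.
sample = [
    "2011-12-16T10:00:01 GET /orders 200 1200",
    "2011-12-16T10:00:02 GET /login 200 80",
    "2011-12-16T10:00:03 GET /orders 200 1800",
    "2011-12-16T10:00:04 GET /login 200 120",
    "not a log line",
]

for path, med, worst, count in slowest_endpoints(sample):
    print(f"{path}: median={med} ms, max={worst} ms, requests={count}")
```

Ranking by median rather than by average keeps one freak request from hiding a path that is consistently slow.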

4.       Isolate the problem area

This step is very important and could be very challenging too. Take the help of developers and of performance and load testing tools to simulate the problem, and meanwhile monitor key measurement data as the request and response pass through the various hardware and software components.

By analyzing the data gathered from the application end users or from first hand experience, along with the available logs and metrics, try to isolate the issue to a specific hardware or software component. This is best done step by step as follows:

a.       Trace the request from the UI to the final destination, which typically may be the Database.

b.      If the request does reach the final destination, measure the time taken for the request to cross the various physical and logical layers and look for any information that could explain the slowdown. If a hardware resource is overutilized, requests may be queued up or rejected after a timeout. Look for such information in the logs.

c.       Then review the response cycle and try to spot the delays in the return path.

d.      Try the elimination technique, whereby the components involved are cleared of performance bottlenecks one after the other, starting from the bottom.
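The idea of measuring the time a request spends in each layer can be sketched as below. This is only an illustration: the layer names and the functions query_database, business_logic and handle_request are hypothetical stand-ins, and the sleeps merely simulate work.

```python
import time
from contextlib import contextmanager

timings = {}  # layer name -> list of elapsed seconds


@contextmanager
def timed(layer):
    """Record how long the enclosed block takes under the given layer name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(layer, []).append(time.perf_counter() - start)


def query_database():
    time.sleep(0.05)  # simulated slow query (hypothetical)
    return [1, 2, 3]


def business_logic():
    with timed("database"):
        rows = query_database()
    return sum(rows)


def handle_request():
    with timed("total"):
        with timed("business"):  # includes the time spent in "database"
            result = business_logic()
        with timed("render"):
            time.sleep(0.01)  # simulated response rendering (hypothetical)
            body = f"total={result}"
    return body


handle_request()
for layer, samples in timings.items():
    print(f"{layer}: {samples[-1] * 1000:.1f} ms")
```

Comparing the inner timings against the total shows which layer accounts for most of the delay, which is the same elimination from the bottom up described in the steps above.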

Experience and expertise with the application and the infrastructure architecture come in handy for spotting the problem area quickly. It is possible to come across multiple problems, whether or not they contribute to the problem at hand. This may shift focus to different areas, resulting in a longer time to resolve the problem. It is important to always stay focused and keep proceeding in the right direction.

5.       Simulate the problem in Test /UAT environment

Make sure that the findings are correct by simulating the problem multiple times. This will reveal much more data and help characterize the problem better.
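Where a dedicated load testing tool is not at hand, repeated simulation can be approximated with a small script. The sketch below is built on assumptions: slow_operation is a hypothetical stand-in for the request under test, and the worker count and payload sizes are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def slow_operation(payload_size):
    """Hypothetical stand-in for the request under test; cost grows with payload size."""
    time.sleep(0.001 * payload_size)
    return payload_size


def run_load(operation, payloads, workers=8):
    """Call the operation concurrently and return the latency of each call in seconds."""
    def timed_call(payload):
        start = time.perf_counter()
        operation(payload)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, payloads))


def summarize(latencies):
    """Summarize latencies; quantiles(n=100) returns the 99 percentile cut points."""
    cuts = quantiles(latencies, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "max": max(latencies)}


latencies = run_load(slow_operation, payloads=[1, 2, 5, 10] * 10)
print(summarize(latencies))
```

Running it several times and comparing the p95 and max figures between runs gives a rough feel for whether the slowdown is reproducible or intermittent.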

6.       Perform reviews

If the problem area has already been isolated in any of the steps above, narrow the scope of the review to the components involved in the isolated problem area. If not, the scope of the review is a little wider: look for problem areas in every component involved in the request response cycle. Code reviews to debug performance issues require unique skills. For instance, looping blocks, disk usage and processor intensive operations could be candidates for a detailed review. Similarly, in the case of a distributed application, look for too many back and forth calls between physical tiers, which could easily contribute to a performance problem. Good knowledge of the various third party components and Operating System APIs consumed by the application may sometimes be helpful.
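As an illustration of the back and forth calls mentioned above, the sketch below contrasts a loop that makes one remote call per item with a single batched call. FakeRemoteService and its methods are hypothetical; it only counts round trips and does not use a real network.

```python
class FakeRemoteService:
    """Hypothetical remote tier that counts round trips instead of using a network."""

    def __init__(self, prices):
        self._prices = prices
        self.round_trips = 0

    def get_price(self, item_id):
        self.round_trips += 1  # one round trip per item
        return self._prices[item_id]

    def get_prices(self, item_ids):
        self.round_trips += 1  # one round trip for the whole batch
        return {item_id: self._prices[item_id] for item_id in item_ids}


def total_chatty(service, item_ids):
    """Calls the remote tier once per item inside the loop."""
    return sum(service.get_price(item_id) for item_id in item_ids)


def total_batched(service, item_ids):
    """Fetches all prices in one call, then sums them locally."""
    prices = service.get_prices(item_ids)
    return sum(prices[item_id] for item_id in item_ids)


price_list = {item_id: item_id * 10 for item_id in range(1, 51)}
items = list(price_list)

chatty = FakeRemoteService(price_list)
batched = FakeRemoteService(price_list)
print(total_chatty(chatty, items), "in", chatty.round_trips, "round trips")
print(total_batched(batched, items), "in", batched.round_trips, "round trip")
```

Both functions return the same total, but the first pays the network latency once per item, which is the kind of pattern worth looking for in a review of a distributed application.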

When the problem is isolated to a server and the application components seem to have no issues, it is possible that other services or components running on the same server are loading the server resources, thereby impacting the application being reviewed. If the problem is isolated to the Database server, look for deadlocks, missing or inappropriate indexes, etc. Sometimes, lack of archival / data retention policies could result in the database tables growing at a much faster pace, leading to performance degradation.
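The effect of a missing index can be seen with a small experiment. The sketch below uses an in-memory SQLite database purely as a stand-in, since the production database and its tooling will differ; the orders table, its columns and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan for a query; the last column holds the description."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sql = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the plan is a scan of the whole table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index in place, the plan switches to a search using that index.
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a scan to an index search is the thing to look for. Most database servers offer an equivalent way to inspect query plans.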

7.       Identify the root cause

By now one should have identified the specific application procedure or function that could be the cause of the problem at hand. Have it validated by doing more simulations and tests in environments equivalent to production.

8.       Come up with solution

It is not over yet, as root cause identification should be followed by a solution. Sometimes the solution may require a change in the architecture and might have a larger impact on the entire application. An ideal solution should prevent the problem from recurring, should not introduce newer problems, and should require minimal effort. Alternatively, if the ideal solution is not possible under the various constraints, a break-fix solution should be offered so that the business continues, along with a plan for having the ideal solution implemented in the longer term.

Hope this one is a useful read for those of you in production support. Feel free to share your thoughts on this subject in the form of comments.