Saturday, January 9, 2010

Machine, application or database ?! Where's the bottleneck?

When a bottleneck happens how can you know whether
  • It's an application problem
  • Machine is undersized
  • SQL requires optimization
  • Database is mis-configured
Let's look at some graphical examples of each of these cases.
What do you think the issue is on database below?
First let's take a look at the components of this screen.
The screen has 6 important parts
  1. Databases
  2. Load of selected database measured in Average Active Sessions (AAS)
  3. Maximum CPU line
  4. Top SQL
  5. Top Bottlenecks
  6. Top Sessions
First, on top left, is a list of our databases we have registered.
Second and most important for our purposes of performance monitoring and tuning is the load of the database measured in a single powerful unified metric Average Active Sessions (AAS). AAS cans easily and quickly show a performance bottleneck on a database when comparing the AAS value to our yardstick, the maximum CPU line. The line "Max CPU" represents the number of CPU processors on the machine. If we have one CPU then only one user can can be running on the CPU at a time, and if the AAS line shows more than one user on CPU, then we know that the second user isn't really on CPU but actually waiting for CPU. When CPU is the bottleneck and users are waiting for CPU, then how do we know whether we should add more CPUs to the machine or if we should spend time trying to tune the application? The question of buying a bigger machine or tuning the application can become a huge debate wasting vast amounts of time in meeting after meeting, but with the profiling screen it's easy to determine clearly without a debate.
In the above chart, do you think it's an application problem or a machine resource problem?
In order to tell we have to find out where is that demand coming from. To find out where the demand is coming from we can look at Top SQL and Top Session tables below the load chart. In our case shown here the load of well distributed over all sql in Top SQL and all sessions inTop Session. There is no outlier or resource hog. In this case it's the machine that's underpowered. What does a case look like where we should tune the application?
In this case, again the CPU demand is more than the machine can supply but if we look at "Top SQL" we can see that the first SQL statement (with the large green bar) uses up much more CPU than any of the rest, actually 60%! If we could get it down to 10% CPU then we'd save 50% of the CPU usage on the machine ! Thus in this case it's worth our while to spend a day or week or even a couple weeks trying to tune that one SQL statement instead of buying a bigger machine.
Finally, how do we know when the database configuration is a problem? We know it's a configuration problem when we are seeing something other that CPU as the bottleneck in Top Bottleneck section. Here's an example
In this case we can see the load is higher than the Max CPU line but the load is coming from brown colored bars and the the green CPU colored bars. If we look at Top SQL we see that there is only one SQL taking up almost all the load, but it's not because of CPU which would be a green bar, but some other color. What does this other color represent? We can look at the Top Bottleneck section and see that it is "log file switch (incomplete)" which basically means the log files are too small, ie the database is not correctly configured. This bottleneck was resolved simply by increasing the log size.

No comments:

Post a Comment