Friday, January 16, 2009

Priming For Performance Testing

One of my recent engagements involved planning performance testing of the critical application stack for an extremely high-volume business. Specifically, they were looking to make some architectural changes because of performance and needed to know which ones to make. Which explains why I was in a room with a bunch of big-wigs yet again explaining why a performance test is more than throwing requests/transactions/load at the system and seeing what breaks.

After quite a bit of round-and-round, it became evident that I had to bring us back to basics and get everyone on the same page. This is hardly new; I often find myself giving impromptu primers on performance testing. This post covers the basics and illustrates a simple example of how to apply them.

Let's get started...

When it comes to performance testing, there are two generally accepted types. You can Prove, or you can Predict. These are not equivalent. The names may be self-evident but let me give some quick examples to clarify.

If you are seeking to Prove, you put load on a system and determine that the system can in fact handle the load within set constraints. For example, it handles X requests using only Y memory and Z processing units with Q response time. You know it to be true because you actually did the work. It's not simulated work; there are no stubs; there is no approximation involved.

If you are seeking to Predict, you put load on the system under constraints in such a way as to understand how the system will react to changes in either load or constraints. For example, the system performed X requests with Q response time utilizing Y memory and Z processing units. Further, it performed (X*2) requests with Q response time utilizing (Y*2) memory and Z processing units. Therefore I predict the system can perform (X*3) requests with Q response time utilizing (Y*3) memory and Z processing units. Obviously it is never this simple, as there are many ways the variables interact, but you get the idea.
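To make that concrete, here is a minimal sketch in Python of the extrapolation above. The numbers are hypothetical stand-ins for X and Y, and it assumes perfectly linear scaling, which real systems rarely deliver.

# A minimal sketch of the Predict approach, using hypothetical numbers
# for X (requests) and Y (memory). Response time Q and processing
# units Z are held constant across the observations.
observations = [
    (1000, 4.0),  # X requests used Y GB of memory
    (2000, 8.0),  # X*2 requests used Y*2 GB of memory
]

# Derive the memory cost per request from the two observed points.
(x1, y1), (x2, y2) = observations
memory_per_request = (y2 - y1) / (x2 - x1)

# Extrapolate to X*3 requests, assuming the linear trend holds.
target_requests = 3000
predicted_memory = y1 + memory_per_request * (target_requests - x1)
print(f"Predicted memory for {target_requests} requests: {predicted_memory:.1f} GB")

In practice you would gather many more data points and expect the curve to bend well before (X*3).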

As part of the discussion, it became clear that an understanding of the basic math involved would be helpful. Naturally you can't expect a bunch of execs to sit still for a calculus lesson. But it was possible to give them a simple example to demonstrate how complicated even the "simple" vectors can become very quickly. So we walked through a very simple probability formula to derive the mean of a Poisson distribution for the number of concurrent users.

The simplest form of this generally only uses three variables:
  1. User Population
    This is the population or total sample size. There are billions of people in the world, but not all of them will be using your application. Hopefully.
  2. Session Length
    This is a measure of how long the operation each user performs takes. As you can imagine, this one is hard to simplify, and in sophisticated models it is the first to require much more work to derive a realistic value.
  3. Availability Window
    This is a measure of the time range in which the application will be available for use. Usually you want to exclude maintenance windows, or perhaps reduce this to only include normal business hours.

Using these variables, we can create a formula c = (p * s) / A to find the expected concurrency, which serves as the mean of the distribution:

c = Concurrency
p = User Population
s = Session Length
A = Availability Window (in the same time units as s)

So for example, there are 2000 employees who have access to the portal. Each user spends an average of 7 minutes submitting an expense report. They only do this from work during normal business hours (9am-6pm, or 9 hours = 540 minutes).

Therefore the expected concurrency is (2000 * 7) / 540, or roughly 26 users in the application at any given moment.

These are of course just probabilistic estimates, not proof. But by treating that average as the mean of a Poisson distribution and taking further percentiles, you can figure out what your concurrency expectations might be, as the sketch below illustrates.
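Here is a rough Python sketch of that step (my illustration, not something we ran in the meeting). It takes the 26-user average from the example as the mean of a Poisson distribution and walks the distribution to find the concurrency level you would exceed only 5% of the time:

import math

def poisson_pmf(k, lam):
    # Probability of exactly k concurrent users when the mean is lam.
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Values from the expense-report example above.
population = 2000        # p: users with access to the portal
session_length = 7       # s: average minutes per expense report
availability = 9 * 60    # A: 9-hour business day, in minutes

lam = population * session_length / availability  # ~25.9 mean concurrent users

# Accumulate probability until 95% of the distribution is covered;
# the last k added is the 95th-percentile concurrency.
cumulative, k = 0.0, 0
while cumulative < 0.95:
    cumulative += poisson_pmf(k, lam)
    k += 1

print(f"Mean concurrency: {lam:.1f}")
print(f"95th percentile concurrency: {k - 1}")

The point for the execs: sizing for the average (26) is not the same as sizing for the busy moments (mid-30s here), and that gap is exactly what a proper model has to capture.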

This really just scratches the surface of a very large topic, but it helped demonstrate how large and complicated it can become. Which is why you should rely on expertise and not guesswork. And why things that appear really simple in models like this typically are not.
