Friday, January 16, 2009

Priming For Performance Testing

One of my recent engagements involved planning performance testing of a critical application stack for an extremely high-volume business. Specifically, they were looking to make some architectural changes because of performance and needed to know which ones to make. Which explains why I was in a room with a bunch of big-wigs, yet again explaining why a performance test is more than throwing requests/transactions/load at the system and seeing what breaks.

After quite a bit of round-and-round, it became evident that I had to bring us back to basics and get everyone on the same page. This is hardly new; I often find myself giving impromptu primers on Performance Testing. This post covers the basics and illustrates a simple example of how to apply them.

Let's get started. . .

When it comes to performance testing, there are two generally accepted types. You can Prove, or you can Predict. These are not equivalent. The names may be self-evident but let me give some quick examples to clarify.

If you are seeking to Prove, you put load on a system and determine that the system can in fact handle the load within set constraints. For example, it handles X requests using only Y memory and Z processing units with Q response time. You know it to be true because you actually did the work. It's not simulated work, there aren't stubs, there is no approximation involved.

If you are seeking to Predict, you put load on the system under constraints in such a way as to understand how the system will react to changes in either load or constraints. For example, the system performed X requests with Q response time utilizing Y memory and Z processing units. Further, it performed (X*2) requests with Q response time utilizing (Y*2) memory and Z processing units. Therefore I predict the system can perform (X*3) requests with Q response time utilizing (Y*3) memory and Z processing units. Obviously it is never this simple, as there are many ways the variables interact, but you get the idea.
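To make that extrapolation concrete, here is a minimal Python sketch of the kind of arithmetic a Predict exercise implies: fit a trend to a couple of measured points and project it to a load you have not tested. The measurements, the function name and the assumption of perfectly linear scaling are purely illustrative; real systems rarely scale that cleanly.

    # Illustrative sketch only: two hypothetical measured points and a
    # straight-line extrapolation to an untested load level.
    measurements = [
        (1000, 512),   # (requests per hour, MB of memory observed)
        (2000, 1024),
    ]

    # Simple two-point linear fit: memory = slope * requests + intercept
    (x1, y1), (x2, y2) = measurements
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1

    def predicted_memory(requests):
        """Extrapolate memory use for a load level we never actually ran."""
        return slope * requests + intercept

    print(predicted_memory(3000))  # a prediction only -- verify by testing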

As part of the discussion, it became clear that an understanding of the basic math involved would be helpful. Naturally you can't expect a bunch of execs to sit still for a calculus lesson. But it was possible to give them a simple example to demonstrate how complicated even the "simple" vectors can become very quickly. So we walked through a very simple statistical formula that estimates average concurrency, which can serve as the mean of a Poisson distribution for the number of concurrent users.

The simplest form of this generally only uses three variables:
  1. User Population
    This is the population or total sample size. There may be billions of people in the world, but not all of them will be using your application. Hopefully.
  2. Session Length
    This is a measure of how long the operation each user performs takes. As you can imagine, this one is hard to simplify, and in more sophisticated models it is the first to require significantly more work to derive a realistic value.
  3. Availability Window
    This is a measure of the time range in which the application will be available for use. Usually you want to exclude maintenance windows, or perhaps reduce this to only include normal business hours.

Using these variables, we can create the formula c = (p * s) / A to find the expected concurrency.

c = Concurrence
p = User Population
s = Session Length
A = Availability Window

So for example, there are 2000 employees who have access to the portal. Each user spends an average of 7 minutes submitting an expense report. They only do this from work during normal business hours (9am-6pm or 9 hours).

Therefore the likely concurrency is (2000 * 7) / 540, or roughly 26 users on the application at any given time.
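For anyone who wants to play with the numbers, here is a small Python sketch of that same calculation. The function name and the values are just the expense-report example above, nothing more.

    def average_concurrency(population, session_minutes, window_minutes):
        """Expected number of users on the system at any one time: c = (p * s) / A."""
        return (population * session_minutes) / window_minutes

    p = 2000          # employees with access to the portal
    s = 7             # average session length in minutes
    A = 9 * 60        # availability window: 9 business hours, in minutes

    print(round(average_concurrency(p, s, A)))  # ~26 concurrent users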

These are of course just probabilistic estimates, not proof. But by layering further percentages on top of them, you can figure out what your concurrency expectations might be.
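As a rough illustration of what "further percentages" might look like: if you treat the average of roughly 26 as the mean of a Poisson distribution, you can estimate how often concurrency is likely to spike above a given level. The threshold of 35 below is an arbitrary example, and the model itself is a simplifying assumption, not a capacity guarantee.

    import math

    def poisson_tail(mean, threshold):
        """Probability of seeing MORE than `threshold` concurrent users."""
        cumulative = sum(math.exp(-mean) * mean**k / math.factorial(k)
                         for k in range(threshold + 1))
        return 1.0 - cumulative

    mean_concurrency = (2000 * 7) / 540   # ~25.9 from the example above

    # Roughly how often should we expect more than 35 concurrent users?
    print(f"{poisson_tail(mean_concurrency, 35):.3%}")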

This is really just a small scratch on the surface of a very large topic, but it helped demonstrate how large and complicated it can be. Which is why you should rely on expertise and not guesswork. And why things that appear really simple in models like this typically are not.

Friday, January 09, 2009

A Truth Revealed...

If you've done any offshore work, and I do mean really done the work not just been near the work, you know what a house of cards some of the industry players can be. Finally it seems we are getting more details on exactly how messed up this market can get.

Just recently the head of one of the larger organizations, Satyam Computer Services, released a letter detailing the ongoing fraudulent activity that was riddling the corporate books. If you've ever competed against these guys you have invariably asked yourself, "How are they making money?" as they priced you out of a deal with ridiculously low costs. Now you know: They Didn't.

You can read more in the New York Times and elsewhere on the web.

The near-term impact is that quite a few companies, including GE, GM, Nestle, Caterpillar, Coca-Cola, Microsoft, Pfizer and the US government, will be looking for new IT providers pretty quickly. The real downside is that those companies have been budgeting for years using unrealistic offshore rates. Shifting that burden to more realistic rate schedules on short notice could be quite a jolt to many corporate bottom lines. The ripple effect is that it will take money away from advancement, growth and sustainability just when we need this type of spending the most.

It's a fun thing to be able to say "I told you so" when a company that consistently undercut the market in extremely detrimental ways finally comes clean. Unfortunately, it's not very satisfying. You see, we have the same experience in many markets and segments of our economy. Rather than paying what something should be worth and recognizing that money spent is money that feeds our nation, we continually strive to cut costs and do things on the cheap. Commoditizing high-tech doesn't help anyone, anywhere. Not in this country, and not in the country you are moving the work to. And besides not helping in the short term, the long-term effects of depressed rate structures create unrealistic expectations and remove incentives for growth. They have ripple effects throughout the economies and cultures involved and can do irreversible damage when reality finally does come crashing down.

This situation is not unlike the consumer credit crisis or the disease that is WalMart. If people think they can pay less, they will. Even if it isn't best for them or anyone else in the long term. In many cases, even if it isn't best for them in the short term! That's why people spend multiple dollars in gas to drive to a farther store and save a few pennies. That's why people buy organic food that has to be shipped from another country at horrendous environmental detriment instead of recognizing that industrialization is VITAL to a sustainable civilization. But I digress.

In the future, we won't always get such a clear explanation of the full cost of those ludicrous deals we take advantage of every day. Sometimes it isn't just plain-old fraud that's at the root. But there are always costs and impacts and we should try and understand the big picture instead of being naive consumers. Remaining ignorant about the supply and service chains that provide everything we consume and rely on is a surefire way to create an even bigger mess in the future (like we need a bigger one than gas prices?).

Next time you see a deal that seems too good to be true, remember the old adage and realize: It Is.