Wednesday, October 07, 2009

Azure Adventures

If you are going to be using Azure, make sure your SQL Server instance isn't on a compressed drive. Also, running in Windows Advanced Server is much easier than trying to juggle all these pieces on Vista. If, like me, you already have an instance of SQL Server, you need to use the DSINIT tool to get storage initialized before trying to run anything. It's not a big deal; just run dsinit /sqlinstance:. from the Azure SDK bin directory (there's a quick example at the end of this post). One trick is to use the period for an unnamed instance. If, like me, you are running Advanced Server, you also need to run Visual Studio in elevated mode if you plan to debug at all. And who are we kidding, you have to debug, right? I've been playing with the SQL Azure CTP as well, and I see some good things in store. I'll post more useful learnings as I port more code into the cloud.
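For reference, the invocation looks like this. It's just a sketch of the command described above, run from an elevated command prompt in the SDK's bin directory.

rem Run from an elevated command prompt in the Azure SDK bin directory.
rem The trailing period points DSINIT at the default (unnamed) SQL Server instance.
dsinit /sqlinstance:.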

Tuesday, July 07, 2009

Packaging Contracts for Services

While working on a green field system recently (green field means new, from-scratch development), the discussion for a series of services turned to the question of which type of data packaging should be used. Specifically, should the services be designed using message contracts or data contracts?

Since it became clear that the basic inflection points were not well understood, I had to hold an impromptu clinic. After we finished, I wrote down the salient points and turned it into this post.

The first thing to consider when choosing a packaging format is whether the messages need to conform to an existing shape. You see, many technologies don't comply with the published contract specifications, choosing instead to rely on low-level XML serialization for interchange. If such messages are already in use, or you intend to interoperate with a system that has those restrictions, the choice is usually made for you.

If you are playing in a green field (as in the conversation that sparked this post), it helps to understand that data contracts have good portability, which makes them widely applicable. Message contracts are less flexible and can therefore be less portable.

So the less strictly you are required to conform, the more reason to choose data contracts. The stricter your conformance requirements, the more likely you are to need message contracts or even raw XML serialization.
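To make the distinction concrete, here is a minimal sketch of the two packaging styles in WCF. The type and member names are mine, invented for illustration rather than taken from any particular system.

using System.Runtime.Serialization;
using System.ServiceModel;

// Data contract: WCF owns the SOAP envelope and you only shape the payload,
// which is what keeps it portable across callers.
[DataContract]
public class ExpenseReport
{
    [DataMember]
    public string EmployeeId { get; set; }

    [DataMember]
    public decimal Amount { get; set; }
}

// Message contract: you dictate the message layout itself, mapping members
// directly onto SOAP headers and the body to match a required wire format.
[MessageContract]
public class SubmitExpenseRequest
{
    [MessageHeader]
    public string CorrelationId { get; set; }

    [MessageBodyMember]
    public ExpenseReport Report { get; set; }
}

The data contract version serializes the same way for any caller; the message contract version exists precisely because someone has dictated where each piece must land in the envelope.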

Tuesday, June 02, 2009

What's The Hold-up?

You know you are having one of those days when you're writing code like this:

-- Pull the five statements with the highest last elapsed time from the plan cache.
select top 5
     (select substring(text, (statement_start_offset / 2) + 1,
             ((case when statement_end_offset = -1
                    then len(convert(nvarchar(max), text)) * 2
                    else statement_end_offset end) - statement_start_offset) / 2)
      from sys.dm_exec_sql_text(sql_handle)) as [statement]
    ,last_worker_time
    ,total_worker_time
    ,last_elapsed_time
    ,total_elapsed_time
    ,execution_count
from sys.dm_exec_query_stats
order by last_elapsed_time desc


This technique of pulling from the query stats is great for finding the worst-performing culprits, especially in environments where you can't just attach a profiler.

In case you were wondering, I use queries like this instead of sp_who or sp_who2 because they grab the specific statement, not just the batch.
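For what is running right now (rather than what has already run and landed in the stats), a similar trick works against sys.dm_exec_requests. This is just a hedged sketch; the offset math is the same pattern as above and the output columns are my own picks.

-- Show the statement each active session is executing at this moment.
select
     r.session_id
    ,substring(t.text, (r.statement_start_offset / 2) + 1,
        ((case when r.statement_end_offset = -1
               then datalength(t.text)
               else r.statement_end_offset end) - r.statement_start_offset) / 2 + 1) as running_statement
    ,r.status
    ,r.wait_type
    ,r.total_elapsed_time
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.session_id <> @@spid    -- leave our own session out of the list
order by r.total_elapsed_time desc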

Thursday, February 12, 2009

Cyclomatic Complexity

Much of my day is spent convincing engineers that there are simpler ways of implementing their designs. Much of the rest is convincing executives and managers that what they want isn't simple at all.

Invariably some little hot-head tries to argue with me that implementing a general case is more complicated than a special optimized case. If the punk in question has cracked a book they'll likely bring up some metric like Cyclomatic Complexity.

Thomas McCabe socialized this metric in the 70's as a mechanism for predicting where code would have future maintenance problems. It's often thrown into the blender when people are trying to construct qualitative measures for code quality. Here's the formal equation:

CC = E - N + 2P

E = number of edges in the control flow graph
N = number of nodes
P = number of connected components (one for a single method)

It can be hard to see how equations like this apply to code, so let me say it a different way: Cyclomatic Complexity represents the number of code paths through a section of code.

How do you count the code paths? Start with one for the entry into the code, and add one more for every decision point. Decision points are control-of-flow statements like If…Then…Else or Switch. The CC for the following snippet is two (2): one for the entry plus one for the if.

public int adjustRange( int rangePrimary ) {
    int adjustedValue = 0;          // start the path count at one for the method entry
    if ( rangePrimary == 6 ) {      // the single decision point adds one more, for a CC of two
        adjustedValue = 42;
    } else {
        adjustedValue = 9;
    }
    return adjustedValue;
}

This example was really simple but is illustrative of how the CC may be calculated.
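To show how quickly the count climbs, here is a made-up variation (the method name, cases, and values are mine) that mixes an if with a switch:

public int classifyRange( int rangePrimary ) {
    int adjustedValue = 0;
    if ( rangePrimary < 0 ) {       // decision point: +1
        rangePrimary = 0;
    }
    switch ( rangePrimary ) {
        case 6:                     // each case label is a decision point: +1
            adjustedValue = 42;
            break;
        case 7:                     // +1
            adjustedValue = 9;
            break;
        case 8:                     // +1
            adjustedValue = 17;
            break;
        default:
            adjustedValue = 1;
            break;
    }
    return adjustedValue;           // 1 for the entry + 4 decision points = CC of 5
}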

So how complex is complex? As with most metrics, the interpretation is very subjective. Since the higher the number, the more complex the code, it stands to reason that code with higher numbers would be more prone to issues. Perhaps bugs because of the logic convolutions, or issues with maintenance because of the number of things impacted by even simple changes. In a sweeping generalization I can say that for my reviews, code with values higher than 20 would be suspect, and higher than 30 would likely not pass.

The important thing is to realize it isn't a measure of quality, just an indicator of where better organization or testing might be advisable. Personally, I tend to gauge an engineer's competence inversely to the CCs of the code they routinely produce. In my experience, engineers who write better code routinely produce code with low CCs, and vice versa.

Friday, February 06, 2009

A Goofball CEO and a Silly Journalist

If you run a major international company, you shouldn't be dropping cuss words in public. IMHO.

In a recent article, the SAP co-CEO Leo Apotheker was quoted making some of the dumbest CEO comments I've ever heard. I wonder how big his *ahem* bonus must be for him to so blatantly disregard common sense.

The follow-on article was equally hilarious. It shows how much pressure the Old Timers are under to stay relevant. Add this to the recent absurdity of the Satyam Scandal and it becomes clear some house cleaning is in order.

I found the most insightful comment in the follow-on to be:
The core of most cost/time overruns stems from CIOs committing to ERP, but middle managers insisting after implementation is already well underway that the software be changed to accommodate legacy business processes rather than the other way around. -- Eric Krangel

Amen, brother.

But wait, it doesn't stop there. You have to read the comments too. Here's my favorite:
Accenture made me rich when we IPO'd it in 2001.
I'm willing to forgive all else.

-- Maurice (commented on the Alley Insider article)

That pretty much sums up how the Bozo Effect is getting worse. I think it's time to go back into retirement. Again.

Or maybe a start-up? Hmmm...

Friday, January 16, 2009

Priming For Performance Testing

One of my recent endeavors involved planning performance testing of the critical application stack for an extremely large-volume business. Specifically, they were looking to make some architectural changes because of performance and needed to know which ones to make. Which explains why I was in a room with a bunch of big-wigs, yet again explaining why performance testing is more than throwing requests/transactions/load at the system and seeing what breaks.

After quite a bit of round-and-round, it became evident that I had to bring us back to basics and get everyone on the same page. This is hardly new, and I find myself often giving impromptu primers on performance testing. This post covers the basics and illustrates a simple example of how to apply them.

Let's get started...

When it comes to performance testing, there are two generally accepted types. You can Prove, or you can Predict. These are not equivalent. The names may be self-evident but let me give some quick examples to clarify.

If you are seeking to Prove, you put load on a system and determine that the system can in fact handle the load within set constraints. For example, it handles X requests using only Y memory and Z processing units with Q response time. You know it to be true because you actually did the work. It's not simulated work; there aren't stubs; there is no approximation involved.

If you are seeking to Predict, you put load on the system under constraints in such a way as to understand how the system will react to changes in either the load or the constraints. For example, the system performed X requests with Q response time utilizing Y memory and Z processing units. Further, it performed (X*2) requests with Q response time utilizing (Y*2) memory and Z processing units. Therefore I predict the system can perform (X*3) requests with Q response time utilizing (Y*3) memory and Z processing units. Obviously it is never this simple, as there are many ways the variables interact, but you get the idea.

As part of the discussion, it became clear that an understanding of the basic math involved would be helpful. Naturally you can't expect a bunch of execs to sit still for a calculus lesson. But it was possible to give them a simple example to demonstrate how complicated even the "simple" vectors can become very quickly. So we walked through using a very simple statistical probability formula to calculate a Poisson distribution for the number of concurrent users.

The simplest form of this generally only uses three variables:
  1. User Population
    This is the population or total sample size. Just because there are billions of people in the world doesn't mean all of them will be using your application. Hopefully.
  2. Session Length
    This is a measure of how long the operation each user performs takes. As you can imagine, this one is hard to simplify, and in sophisticated models it is the quickest to require much more work to derive a realistic value.
  3. Availability Window
    This is a measure of the time range in which the application will be available for use. Usually you want to exclude maintenance windows, or perhaps reduce this to only include normal business hours.

Using these variables, we can create a formula c = (p * s) / A to allow us to find a concurrency distribution.

c = Concurrence
p = User Population
s = Session Length
A = Availability Window

So for example, there are 2000 employees who have access to the portal. Each user spends an average of 7 minutes submitting an expense report. They only do this from work during normal business hours (9am-6pm or 9 hours).

Therefore the likely concurrency is (2000 * 7) / 540, or about 26 users in the application at any given time during that window.

These are of course just probabilities, not proof. But they help you figure out, by taking further percentages, what your concurrency expectations might be.
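To push past the single average and take those further percentages, a short sketch using the Poisson distribution does the job. It reuses the example numbers above; the 35-user threshold is just one I picked for illustration.

using System;

class ConcurrencyEstimate
{
    // Poisson probability of exactly k concurrent sessions, given the average concurrency lambda.
    static double Poisson( double lambda, int k )
    {
        double p = Math.Exp( -lambda );
        for ( int i = 1; i <= k; i++ )
        {
            p *= lambda / i;   // builds lambda^k / k! incrementally instead of computing a huge factorial
        }
        return p;
    }

    static void Main()
    {
        double population = 2000;     // p: users with access to the portal
        double sessionLength = 7;     // s: minutes per expense report
        double availability = 540;    // A: 9 business hours, in minutes

        // c = (p * s) / A, roughly 26 concurrent users on average
        double lambda = ( population * sessionLength ) / availability;

        // Chance of seeing more than 35 concurrent users at any given moment
        double atMost35 = 0;
        for ( int k = 0; k <= 35; k++ )
        {
            atMost35 += Poisson( lambda, k );
        }

        Console.WriteLine( "Average concurrency: {0:F1}", lambda );
        Console.WriteLine( "Chance of more than 35 concurrent users: {0:P1}", 1 - atMost35 );
    }
}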

This is really just a small scratch on the surface of a very large topic, but it helped to demonstrate how large and complicated it can be. Which is why you should rely on expertise and not guesswork. And why things that appear really simple in models like this typically are not.

Friday, January 09, 2009

A Truth Revealed...

If you've done any offshore work, and I do mean really done the work, not just been near it, you know what a house of cards some of the industry players can be. Finally it seems we are getting more details on exactly how messed up this market can get.

Just recently the head of one of the larger organizations, Satyam Computer Services, released a letter detailing the ongoing fraudulent activity that was riddling the corporate books. If you've ever competed against these guys you have invariably asked yourself, "How are they making money?" as they priced you out of a deal with ridiculously low costs. Now you know: They Didn't.

You can read more in the New York Times and elsewhere on the web.

The near term impact means that there will be quite a few companies, including GE, GM, Nestle, Caterpillar, Coca-Cola, Microsoft, Pfizer and the US government, who will be looking for new IT providers pretty quickly. The real downside is that those companies have been budgeting for years using unrealistic offshore rates. Shifting that burden to more realistic rate schedules on short notice could be quite a jolt to many corporate bottom lines. The ripple effect is that it will take money away from advancements, growth and sustainability just when we need this type of spending the most.

It's a fun thing to be able to say "I told you so" when you find out a company that consistently undercut the market in extremely detrimental ways has finally come clean. Unfortunately, it's not very satisfying. You see, we have the same experience in many markets and segments of our economy. Rather than paying what something should be worth and recognizing that money spent is money that feeds our nation, we continually strive to cut costs and do things on the cheap. Commoditizing high-tech doesn't help anyone, anywhere. Not in this country, and not in the country you are moving the work to. And besides not helping in the short-term, the long-term effects of depressed rate structures create unrealistic expectations and remove incentives for growth. They have ripple effects throughout the economies and cultures involved and can do irreversible damage when reality finally does come crashing down.

This situation is not unlike the consumer credit crisis or the disease that is WalMart. If people think they can pay less, they will. Even if it isn't best for them or anyone in the long-term. In many cases, even if it isn't best for them in the short-term! That's why people spend multiple dollars in gas to drive to a farther store and save a few pennies. That's why people buy organic food that has to be shipped from another country at a horrendous environmental cost instead of recognizing that industrialization is VITAL to a sustainable civilization. But I digress.

In the future, we won't always get such a clear explanation of the full cost of those ludicrous deals we take advantage of every day. Sometimes it isn't just plain-old fraud that's at the root. But there are always costs and impacts and we should try and understand the big picture instead of being naive consumers. Remaining ignorant about the supply and service chains that provide everything we consume and rely on is a surefire way to create an even bigger mess in the future (like we need a bigger one than gas prices?).

Next time you see a deal that seems too good to be true, remember the old adage and realize: It Is.