Thursday, January 14, 2010

Why Is The Build, The Build?

Someone asked me recently about why I structure builds the way I do.

For background, I should explain that I have a reference SQL build that I've been using for many years. It uses command files (.cmd) to iterate over a directory structure and create (or re-create) a database. It has no dependencies, works with integrated or SQL Server security, logs all output, etc. Pretty much the usual.
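
To make that concrete, here is a minimal sketch of the kind of command file I mean. The server, database, and folder names are made up for illustration; it assumes sqlcmd is on the path and that the database already exists, and the real build does quite a bit more setup and logging.

  @echo off
  rem Illustrative driver: run every script in one folder, log everything,
  rem and keep going even if an individual script fails.
  set SERVER=localhost
  set DB=SampleDb
  if not exist logs mkdir logs

  for %%F in (Procedures\*.sql) do (
    rem -E is integrated security; swap in -U user -P password for SQL Server security.
    rem -b is deliberately left off so a failing script does not stop the run.
    sqlcmd -S %SERVER% -d %DB% -E -i "%%F" >> logs\build.log 2>&1
  )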

In this reference build, the structure is broken down so that tables are separate from keys and defaults and constraints and indexes, etc. Procedures are broken down by type, and views are separated similarly. The configuration scripts for users, data, etc. are all broken down as well. There are dynamic drop scripts that tear everything down before attempting to build everything back up.
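
As a rough picture of what that granularity looks like on disk (the folder names here are illustrative, not the actual reference layout), the tree ends up something like this, with the drop scripts run first, one object per file within each folder, and indexes created before the data is loaded:

  Build.cmd
  Drop\          (dynamic drop scripts, run first)
  Tables\
  Keys\
  Defaults\
  Constraints\
  Indexes\
  Data\
  Views\
  Procedures\
  Users\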

Four specific concerns were raised about why it's structured the way it is, and it took some time to go through the rationale. I figured I would capture the thinking for you in this post.

The first was about using granular scripts. After all, most tools, and many shops, just merge artifacts into either one big script or lumps of connected objects, for example merging keys and constraints into tables. The second was about the use of the tear down scripts. If all the scripts are designed to create objects rather than perform alters, why the need for tearing everything down en masse at the start? The third concern was about how we handle errors. Specifically, the build only outputs the errors from each script, and doesn't actually stop the build. Lastly, the placement of indexes within the build, before the data load, was questioned. Typical data loading would want indexes removed so that the load happens quickly, and then the indexes can be restored with the server online. These are all great questions.

To address these concerns we need to set the expectations for the build process and how it fits into a larger team effort. The core expectation is that these scripts will be used by an automated process deploying a database repeatedly into multiple environments, as well as by individual engineers who want to deploy locally, or into alternative environments, as needed. It is also expected that the scripts will be managed by a larger body of engineers rather than by a single engineer or team managing all the scripts.

We can certainly discuss whether having a single definition of the build, or being able to leverage the larger team model for script development, is advantageous or even relevant at this level, but for this discussion these expectations are foundational.

Given these expectations, it becomes important to surface, as early as possible, as many of the errors in a build as possible. Essentially, since there are potentially many people making many changes, all present in a single build, we want to uncover all of the problems in a single pass, because the different issues have a good chance of being unrelated. For example, changing an index or a table default won't affect whether a procedure will build, so we want to catch all of these errors in one pass.

This also translates into creating the indexes before loading the data. If there is going to be a problem with the data because of an index, you want to find it where the data is managed, not where the index is managed.

Further, by using granular files, individuals change independent elements like keys, constraints, and defaults, and the success or failure of each change is tied to a particular file, which can be tied to a particular engineer and a particular change. In short, we want to know about all the errors as early as possible, but we also want to be able to pinpoint the source and responsible party for every error as efficiently as possible.
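
As a hedged illustration of what that pinpointing can look like (the script name is hypothetical, and it assumes sqlcmd's usual error text), the driver can send each script's output to a log named after the script and then flag only the scripts whose logs contain errors:

  sqlcmd -S %SERVER% -d %DB% -E -i "Keys\FK_Orders_Customers.sql" -o "logs\FK_Orders_Customers.out"
  findstr /i /c:"Msg " "logs\FK_Orders_Customers.out" > nul && echo ERROR in Keys\FK_Orders_Customers.sql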

So a quick review:
  1. Use Granular Scripts - so we can detect all errors independently and pinpoint the source and responsible party as quickly as possible.
  2. Use Tear Down Helpers - so engineers who are working locally or in alternative environments can restore to a known point as quickly as possible with as little effort as possible.
  3. Don't Stop On Errors - so we can find all errors independently with a single run of the build.
  4. Create Indexes Before Loading Data - so we can find errors in the data load and pinpoint the source and responsible party as quickly as possible.

Hopefully it is clear how the specific patterns support the expectations.
