Wednesday, July 30, 2008

Master Data Management

Recently I had the pleasure of discussing the bulky subject of Master Data Management (MDM) with a prospective client. The client was considering a variety of MDM solutions and wanted specific direction on which technology or vendor to choose.

Now, it is no secret that I tend to favor Microsoft products, but in this case the platform already in place was Oracle, so naturally I wanted to give them more general advice instead of just pushing MS MDM.

There are quite a few upfront activities that must be done regardless of which products you are going to use. I've listed some introductory efforts that belong early in the process and can be completed before product selection.

  • Governance
    To get the ball rolling, you really need to figure out what data you are talking about, where it lives, who thinks they own it now, and what organization you need to address this effort. Here are some suggestions on how to make this go smoothly:

    1. Identify External Data Ownership
      Often multiple owners will be identified for the same data. Being able to identify how each of them uses the data in their various applications is a significant lever throughout the rest of the process. Without a clear understanding of where the data is currently 'owned' and how critical it is to each application, mismatched expectations will cause problems.
    2. Formalize Ownership in MDM
      This often requires significant negotiation and compromise from the various stakeholders. In organizations where the inter-organization trust level is high, this is more straightforward. In some environments, only a top-down directive will accomplish this. Whichever tactic is used, establishing formal ownership is a prerequisite for success.
    3. Identify Data Domains
      This sounds a lot easier than it usually ends up being. Attributes used by disparate systems often have subtle differences in their definitions which must be accommodated. Being able to provide synonym mapping and transition approaches is imperative to keeping comfort levels high between organizations.
    4. Formalize Domain Administration Process
      Having an established (and hopefully standardized!) set of processes to perform CRUD operations on domains can take the stress out of relationships where trust levels between application teams are subpar.
    5. Establish Organizational Governance
      This is often rolled up into Change Management or a similar phase, but in reality needs to be an ongoing activity. These organizational groupings will allow for escalation procedures, conflict resolution, and increased visibility into the data lifecycles.
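
The synonym mapping mentioned under Identify Data Domains can start out as nothing more than a lookup table from each system's local attribute name to an agreed canonical name. Here is a minimal sketch in Python. The system names (crm, billing) and attribute names are hypothetical, made up purely for illustration:

```python
# Hypothetical synonym map: (source system, local attribute) -> canonical attribute.
SYNONYMS = {
    ("crm", "cust_nm"): "customer_name",
    ("billing", "account_holder"): "customer_name",
    ("crm", "zip"): "postal_code",
    ("billing", "postcode"): "postal_code",
}


def to_canonical(system, record):
    """Rename a record's attributes to their canonical names.

    Attributes with no mapping are returned separately so they can be
    taken back to the data owners instead of being silently dropped.
    """
    mapped, unmapped = {}, {}
    for name, value in record.items():
        canonical = SYNONYMS.get((system, name))
        if canonical is None:
            unmapped[name] = value
        else:
            mapped[canonical] = value
    return mapped, unmapped


mapped, unmapped = to_canonical(
    "billing", {"account_holder": "Acme Ltd", "postcode": "90210", "fax": "n/a"}
)
print(mapped)
print(unmapped)
```

Keeping the unmapped attributes visible is a deliberate choice here: every one of them is a question for the governance group, not something for a developer to guess at.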

Once you have the basic framework to talk about data (and who owns it, how it gets maintained, etc.) you can then delve into the specifics of the data.

  • Model Dimensions
    If governance is properly addressed, this should be a fairly straightforward exercise for data architects. They'll go through some steps similar to the following:

    1. Identify Dimensions
      What data are you really talking about? You need to pick a single name for each domain. Sometimes the same type of data is called different things in different places. You have to standardize on a common taxonomy.
    2. Identify Consuming Applications
      There may be downstream consumers beyond the currently perceived data owners. Make sure you understand the data lifecycle and flows so you can estimate impacts properly.
    3. Identify Entities Per Dimension
      Make sure you flesh out the taxonomy down to the most granular level necessary. The more detail you include now, the more hours saved later.
    4. Identify Entity Attributes
      This is a critical step that is often treated as a second-class citizen. In reality, it is a key driver. Without precision about attributes, the potential values and rules (which come next) can't be properly defined. It also means your estimates will be incorrect.
    5. Quantify Attribute Values
      How much this step matters depends on how complex the data rules (which come next) are and how accurate your rule implementation estimates need to be. If your sources have this well defined, it should be straightforward to ensure this is comprehensive.
    6. Identify Data Rules
      These are sometimes referred to as Business Rules, which is very imprecise. Here we are speaking not of rules governing operations or activities (hence the term Business) but of the states, lifecycles, and value cohorts for entities and domains.
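
To make the steps above a little more concrete, here is a rough sketch of how an entity, its attributes, their allowed values, and one data rule (a lifecycle rule on status) might be written down. It is plain Python with no particular MDM product in mind, and the Customer entity, its attributes, and the status values are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    allowed_values: frozenset = frozenset()  # empty set means "any value"


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)

    def add(self, attribute):
        self.attributes[attribute.name] = attribute


# Hypothetical lifecycle rule: which status changes are legal.
ALLOWED_TRANSITIONS = {
    "prospect": {"active"},
    "active": {"suspended", "closed"},
    "suspended": {"active", "closed"},
    "closed": set(),
}


def check_record(entity, record):
    """Return a list of violations: unknown attributes or disallowed values."""
    problems = []
    for name, value in record.items():
        attr = entity.attributes.get(name)
        if attr is None:
            problems.append(f"unknown attribute: {name}")
        elif attr.allowed_values and value not in attr.allowed_values:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


def can_transition(old_status, new_status):
    """Data rule: is this status change part of the agreed lifecycle?"""
    return new_status in ALLOWED_TRANSITIONS.get(old_status, set())


# One entity in a hypothetical Customer dimension.
customer = Entity("Customer")
customer.add(Attribute("customer_name"))
customer.add(Attribute("status", frozenset(ALLOWED_TRANSITIONS)))

print(check_record(customer, {"customer_name": "Acme", "status": "deleted"}))
print(can_transition("closed", "active"))
```

Notice that nothing in the sketch mentions orders, invoices, or any other business activity. It only describes which values and states the data itself may take, which is the distinction between data rules and business rules drawn above.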

As you can well imagine, this is a significant undertaking for any organization, and it can be done in a technology-agnostic way. Once you've identified and quantified this information, you can begin to look at transition plans, data flow planning and infrastructure. These are very technology- and environment-specific.

As the proverb states: "Measure twice, cut once."