Recently I was asked for my opinion on the subject of enterprise indexing. This is usually lumped in with the enterprise search space, but there are good reasons why indexing is increasingly treated as a first-class citizen these days. Since I made time to write some thoughts down, however crudely, I figured I might share them here.
Summary
Enterprise indexing is an aspirational goal but has generally been found impractical for enterprises of significant size. Focusing on isolated but consistent patterns for indexing individual organizations within the enterprise will generally unlock more value and gain substantially more adoption. I have found that the generally accepted approach is to begin with this isolated indexing of individual organizations. This is seen as the first step on the journey to enterprise indexing. We just don't mention that it's a never-ending road.
Opinion
Using a search index to surface data and make it widely and easily accessible is an imperative for any business. However, “enterprise indexing,” as we typically understand it, proves to be mostly impractical. There are three primary reasons for this impracticality.
The first is that the vast majority of corporate data is sensitive by nature. Consider how many business systems actually allow open access across the enterprise. There are many reasons for this information containment, most of which are immovable.
Moving beyond the sensitivity concerns to the second point, it is also understood that the vast majority of corporate data is useless without significant context. Consider how many acronyms, slang words, abbreviations, and specialized definitions exist in the typical organizational vocabulary. Even the relatively new ability of tools to automatically detect this context does not address the massive effort required to correlate and harmonize these contexts between organizations. For example, consider Organization 1, which serves the public, and Organization 2, which serves Organization 1. Both refer to clients or customers, but one means members of the public and the other means employees. This is just a simple example; what if Organization 1 sometimes served both employees and the public? These are real issues that compound the problem of context.
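To make the customer example concrete, here is a minimal sketch, assuming each organization maintains its own glossary that maps local terms to shared concepts. The organization names, the concept labels, and the `resolve` helpers are all hypothetical and illustrative only, not any particular search product's API.

```python
# Hypothetical per-organization glossaries: each maps a local term to the set
# of shared concepts it can mean inside that organization. Illustrative only.
GLOSSARIES: dict[str, dict[str, set[str]]] = {
    "org1": {"customer": {"member_of_public"}},  # Organization 1 serves the public
    "org2": {"customer": {"employee"}},          # Organization 2 serves Organization 1's staff
}


def resolve(term: str, org: str) -> set[str]:
    """Return the concepts a term means within a single organization."""
    return GLOSSARIES.get(org, {}).get(term.lower(), set())


def resolve_enterprise_wide(term: str) -> set[str]:
    """Naive enterprise-wide lookup: union every organization's meanings.

    Once the source organization is lost, the term can no longer be
    disambiguated, so the result is the union of all local meanings.
    """
    meanings: set[str] = set()
    for org in GLOSSARIES:
        meanings |= resolve(term, org)
    return meanings


if __name__ == "__main__":
    print(resolve("customer", "org1"))
    print(resolve("customer", "org2"))
    print(sorted(resolve_enterprise_wide("Customer")))
```

Scoped to one organization, "customer" resolves cleanly; merged across the enterprise, it becomes ambiguous. And if Organization 1 started serving employees as well, its own entry would hold two concepts, so even the scoped lookup would need more context to be useful.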
Lastly, organizations operate in silos for practical reasons as well as sensitivity (i.e., the need to govern the flow of information). For example, consider how many forecasts, and versions of forecasts, roll up in the typical capital planning process. Without a way to tell which data is correct when the volume is both hierarchical and broad, the index becomes a far less reliable source of answers, which lowers its relevance to users and, in turn, adoption.
The challenges of indexing across an entire enterprise don't excuse the need to do it within the organizations inside the enterprise. As a first step, every organization across an enterprise should use consistent toolsets and processes for indexing and searching. Only when this is happening will an enterprise index have the context to provide relevance and security to the information being indexed. As the patterns for publishing and consuming information mature, information quality can begin to improve, which may ultimately allow these isolated indexes to be aggregated into a wider scope.
Because of these complexities, the typical consultative answer is to start by rolling out these "enterprise" index and search tools to one organization at a time and then bring others on incrementally. While this approach has merit and suits smaller enterprises, or cases where the appetite for information transparency is actually quite shallow, it typically hits a few harmonization and integration snags and the effort stalls. For example, failure to provide the necessary context makes results unusable, so adoption stagnates. Or a sensitivity issue is uncovered and the whole thing is scrapped for being too risky. And so on and so forth. If it were a consistent, ongoing part of your enterprise strategy and operational initiatives for information management, it could easily thrive. That requires treating it as an ongoing operational concern with sustained investment and effort. Enterprises that don't acknowledge this, and don't embark with their eyes open and clear expectations, will inevitably give up, which is pretty much where most of them are today. With a little digging, you'll find this is true even of those who say they do it well: their true user adoption and penetration numbers, and their actual productivity metrics, rarely tell the same story.