Friday, August 02, 2013

Data About Food

I've just spent the last many days reviewing, testing, quantifying, validating, and otherwise taking the measure of the various data sources that exist for and

The USDA gives great nutrition data and a sampling of volumetric data for a pretty wide swath of foods. But then you have manufacturers and brands and alternative nutritional information. And you have categorization and ingredient discrepancies, plus all manner of portioning, form, and packaging options. And so on, and so on.

Not surprisingly, some of the best food data comes from the online shopping hubs and support establishments who want to make it easy for you to buy through their channels. After all, who shops online without looking at pictures, even if it is just for a bottle of ketchup or a can of soup?

As typically happens in these scenarios, the categorization schemes they each favor don't particularly overlap or support one another. They certainly aren't designed to co-exist, as each source has a vested interest in keeping you in its garden of data. So you go with Option A with 10k foods or Option B with a partially overlapping 50k foods. Maybe Option A is better organized with better data and easier access. Option B might be much more expensive or just have no reliable categorization scheme. Choices, choices.

In an enterprise we would call this a Master Data Management problem. In the real world, it's just business. Good thing I know a little about addressing these MDM issues in really big enterprises. Rationalizing tens of thousands of foods ain't no thing if you've got the right patterns and the discipline to do a little wrangling. I just needed to wrestle this foundation to the bedrock so that now I can take on much more interesting concerns. Like merging some hardcore analytics, massive datasets, and some truly sick algorithms, all towards helping people get and stay healthy.
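For the curious, here is a minimal sketch of the kind of MDM pattern I mean: give every food a normalized master key, then hang each source's record off that key. This is an illustration, not the actual pipeline. The catalogs, field names, values, and the `normalize` and `merge_catalogs` helpers are all hypothetical, and real matching would need far more than name cleanup (brands, package sizes, fuzzy matching, and so on).

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Build a crude master key: lowercase, drop punctuation, sort the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(words))


def merge_catalogs(*catalogs):
    """Group records from several (source, records) pairs under one master key."""
    master = defaultdict(dict)
    for source, records in catalogs:
        for record in records:
            master[normalize(record["name"])][source] = record
    return dict(master)


# Made-up sample records standing in for "Option A" and "Option B".
option_a = [{"name": "Ketchup, Tomato", "category": "condiments"}]
option_b = [
    {"name": "tomato ketchup", "package": "bottle"},
    {"name": "Chicken Noodle Soup", "package": "can"},
]

master = merge_catalogs(("A", option_a), ("B", option_b))

for key, sources in master.items():
    print(key, "->", sorted(sources))
```

The point of the design is that neither source's categorization scheme wins. Each record keeps its own fields, and the master key is the only thing they have to agree on.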

Let me finish this glass of scotch and I'll tell you more about the coolest new stuff we are working on in connected health...
