The idea of amassing huge amounts of information is attractive to many organizations. But what’s often overlooked is how all that data, while a potentially powerful asset, can also become a big liability. Collecting information carries both a moral and a monetary responsibility: data needs to be cleansed of private information not needed for analysis, safeguarded, and eventually disposed of properly. But what I would call “healthy” data practices are lacking at many companies. The shortfall arises from a disconnect between the incentives and the risks of data harvesting. While the benefits of data mining can be quantified and realized in the short term, the risks take longer to appear and are harder to measure.
A good case in point is how organizations try to make data anonymous. Often no attempt is made at all, and when one is, the effort often fails because there’s little awareness of the context in which the data lives. Take the case of Massachusetts' HIV patient data that was “anonymized” and then released to researchers with patients’ ZIP codes and dates of birth as the only identifiers. The problem, however, was that public voting records, which contain both of those data points, made it easy to match names to many of the patients.
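To make the mechanics concrete, here is a minimal sketch of how such a match might work. It is an illustration only, not a reconstruction of the actual incident: the records, the names, and the functions `link` and `reidentified` are all hypothetical. It assumes both data sets store ZIP code and date of birth in the same format.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only.
released = [
    {"zip": "02139", "dob": "1961-07-04", "diagnosis": "condition A"},
    {"zip": "02139", "dob": "1975-02-11", "diagnosis": "condition B"},
    {"zip": "01002", "dob": "1980-09-30", "diagnosis": "condition C"},
]
voter_roll = [
    {"name": "Alice Example", "zip": "02139", "dob": "1961-07-04"},
    {"name": "Bob Example", "zip": "02139", "dob": "1975-02-11"},
    {"name": "Carol Example", "zip": "02139", "dob": "1975-02-11"},
    {"name": "Dan Example", "zip": "01002", "dob": "1990-01-01"},
]


def link(released, public, keys=("zip", "dob")):
    """Pair each released record with the public names that share its keys."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    return [
        (record, index.get(tuple(record[k] for k in keys), []))
        for record in released
    ]


def reidentified(released, public, keys=("zip", "dob")):
    """Return (name, record) pairs where exactly one public name matches."""
    return [
        (candidates[0], record)
        for record, candidates in link(released, public, keys)
        if len(candidates) == 1
    ]


for name, record in reidentified(released, voter_roll):
    print(name, "->", record["diagnosis"])
```

In this toy data only one record is pinned to a single name; the second matches two voters and the third matches none. The fewer people who share a given ZIP code and birth date, the more records fall into the first group.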
In general, businesses look at the merging of data sets, much like what happened in Massachusetts, as an opportunity. In combined data sets, the value of the amalgamation is often greater than the sum of its parts. Consequently, companies accumulate low-value or even seemingly useless data on the theory that at some point in the future it will become more valuable, either as circumstances change or when another data stream can be merged with it. This compulsion to harvest not only raises the risk of data leakage, since large stores of data are tough to safeguard, but also carries a high IT overhead for storing and maintaining the information.
It’s incumbent upon organizations to have a better idea of the long-term risks of data harvesting. One key point is to realize that information, unlike physical assets, is nearly impossible to recapture once it leaks out. A single glitch in data security can expose large quantities of data to the entire Internet for years.
What to do about this? A critical first step is to create greater awareness among IT decision makers of the full life-cycle cost of data. Second, stricter governance over data is needed, with well-defined end-to-end responsibilities from data gathering to disposal. Optimally, all organizational units would make their information needs transparent to a central data privacy function, where usage processes are defined and risk assessments of the various usage scenarios are developed. Much as in computer security, a trade-off must be struck between data privacy and the business's need for information. A sensible evaluation of the overall risks and costs of processing data will allow companies to find the optimal point on this trade-off curve.
Data is an important asset. But the full cost of mining and protecting it needs to become part of the executive mindset.