Big Data Analytics and Document Management

Big data analytics has seen huge growth in the past few years. Fueled by technology innovations that have made the collection, storage, and processing of huge amounts of data practical, companies have leveraged their data to try to improve their efficiency, customer experience, market impact, and more. Now that the technology has become so pervasive, companies are applying it to their own document management systems (DMS) to glean more insight into their internal and external workings.

Caution is warranted, however, according to the authors of a recent paper in Science. The authors use Google Flu Trends as their primary case study. Google Flu Trends purports to track flu outbreaks around the world, primarily by using searches for flu-related terms as a proxy for infected individuals. Google’s reasoning is that a person who gets sick is very likely to go online and search for symptoms, treatments, and general information about the disease. The definitive data on flu outbreaks is collected and aggregated from health care facilities around the world, which can take significant time. Because Flu Trends is updated in near-real time, it should ideally work as a kind of early warning system. In reality, the authors say, Google’s proxy is not accurate, and so the analysis built on it produces incorrect results.

The proxy is not accurate because it rests on the assumption that people almost always search for flu-related terms only when they, or someone they know, are getting sick. The reality is that people also tend to search for flu-related topics when flu is in the news. If there’s a major outbreak, or news outlets talk about the predictions for the upcoming flu season, more people will search for flu information. Flu Trends will then take that uptick in flu searches as an indicator of an outbreak, even when the number of actual infections has not changed.
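
To make that failure mode concrete, here is a minimal sketch in Python. It is not how Flu Trends actually works; the weekly numbers, the two rates, and the function names are all invented for illustration. It assumes searches come from two sources (sick people and news coverage) and then applies a naive proxy that credits every search to illness.

```python
# Synthetic weekly data, invented purely for illustration.
true_cases = [100, 120, 150, 160, 150, 130]   # actual flu cases per week
news_stories = [0, 0, 0, 40, 60, 10]          # flu stories in the news per week

# Assumed rates (hypothetical, not taken from the paper).
SEARCHES_PER_CASE = 2.0    # searches generated by each real case
SEARCHES_PER_STORY = 5.0   # searches generated by each news story


def simulate_searches(cases, stories):
    """Total flu searches: some driven by illness, some by news coverage."""
    return [
        SEARCHES_PER_CASE * c + SEARCHES_PER_STORY * s
        for c, s in zip(cases, stories)
    ]


def estimate_cases(searches):
    """Naive proxy: assume every search comes from someone who is sick."""
    return [s / SEARCHES_PER_CASE for s in searches]


searches = simulate_searches(true_cases, news_stories)
estimates = estimate_cases(searches)

# How far the proxy overshoots the real case count each week.
overestimates = [e - c for e, c in zip(estimates, true_cases)]

for week, (actual, estimate) in enumerate(zip(true_cases, estimates), start=1):
    print(f"week {week}: actual={actual}, proxy estimate={estimate:.0f}")
```

In this toy setup the proxy matches the real counts in the quiet weeks and overshoots only in the weeks with news coverage, which is the pattern described above. A real system would need some way to separate the two sources of searches, which is exactly what a search count alone cannot provide.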

The upshot of the paper is not that Google Flu Trends (and by extension, big data) is terrible and should be abandoned. In fact, the authors acknowledge that it provides helpful data in addition to the official data from the Centers for Disease Control and Prevention (CDC). The point the researchers want to make is that, contrary to some claims, big data and its analyses are not a substitute for more traditional forms of data collection and analysis.

What this means for users of document management systems and other big data systems is that the analysis of their data is not perfect and is not necessarily accurate. Many companies rely on analyses of their data sets to do their thinking for them, treating them as a new, fully automated way to find correlations in the data. As many of us know, however, correlation does not equal causation. Without thinking critically about what the data contains and what the analyses actually show, companies can be led to incorrect conclusions.

In summary, leverage the big data in your document management systems, but treat the analyses these systems can perform as a supplement to more traditional methods, not a replacement for them.
