Dark data is data that an organization collects and stores but never uses in its decision-making. It typically comes from logs, sensors, and other transactional records that are available but generally ignored. Dark data also makes up the largest share of the big data that organizations collect each year.
Dark data does not usually play a vital role in analytics because:
- Companies do not want to use their bandwidth on additional data processing
- There’s a lack of technical resources
- Organizations do not believe dark data adds any value to their analytics
All of these are valid reasons for dark data to take a back seat. Today, however, a series of data-centric technological advances has greatly increased our ability to source, ingest, analyze, and store large volumes of data. As a result, it has become important for organizations to recognize the value in this largely untapped data.
The conventional way to use this data would be to systematically drain all of it into a data warehouse, identify, reconcile, and rationalize it, and then report on it. While this process is methodical, relatively few projects truly call for it.
The Immense Volume of Dark Data in the Enterprise
Current evidence suggests that as much as 90% of all data stored by enterprises could be dark. Since organizations now keep large volumes of data in data lakes, it makes sense to tag the data appropriately as it is stored. The key may be to extract metadata from the data at ingestion and store it alongside the data itself.
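To make the idea of tagging data at ingestion more concrete, here is a minimal sketch, assuming files land one at a time and that a simple JSON Lines file can stand in for a metadata catalog. The function names `extract_metadata` and `ingest`, the `source` and `tags` fields, and the catalog layout are all hypothetical choices for illustration, not part of any particular data-lake product.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def extract_metadata(path: Path, source: str, tags: list[str]) -> dict:
    """Build a catalog record for one file as it lands in the lake."""
    data = path.read_bytes()
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "path": str(path),
        "source": source,  # hypothetical: the system that produced the file
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # helps spot duplicates later
        "mime_type": mime or "application/octet-stream",
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "tags": sorted(set(tags)),  # de-duplicated, stable order
    }


def ingest(path: Path, catalog: Path, source: str, tags: list[str]) -> dict:
    """Append the file's metadata record to a JSON Lines catalog."""
    record = extract_metadata(path, source, tags)
    with catalog.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "sensor-readings.txt"
        log.write_bytes(b"t=0 temp=21.5\nt=1 temp=21.7\n")
        rec = ingest(log, Path(tmp) / "catalog.jsonl", "sensor-gateway", ["iot", "raw"])
        print(rec["size_bytes"], rec["tags"])
```

Because the catalog is written at ingestion time, later exploration can query the metadata (source, type, size, tags) without rescanning the raw files. A production setup would more likely use a dedicated catalog service, but the principle is the same.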
The data can be profiled and explored with one tool, or a combination of tools, already available on the market. Cognitive computing and machine learning can then scale up this processing and open up new ways to make intelligent use of dark data.
Dark data may or may not have an identifiable structure. For example, most contacts and reports in an organization are structured, yet over time they too add to the pile of dark data. Unstructured data, on the other hand, can contain small pieces of personally identifiable information (PII) such as birth dates and billing details. Until very recently, this type of data would have remained dark.
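Even before any machine learning is involved, a rule-based pass can surface some of the PII buried in unstructured text. The sketch below is a deliberately simple stand-in, not a production PII detector: the regular expressions, the `scan_for_pii` name, and the category labels are hypothetical. It flags ISO-style dates (which may be birth dates), email addresses, and card-like numbers, and it uses the Luhn checksum to discard digit runs that cannot be valid payment card numbers.

```python
import re
from collections import Counter

# Hypothetical, intentionally naive patterns; real PII detection needs
# locale-aware rules or a trained model.
DATE = re.compile(r"\b(?:19|20)\d{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD = re.compile(r"\b(?:\d{4}[ -]?){3}\d{1,4}\b")  # 13-16 digits in groups of 4


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_for_pii(text: str) -> Counter:
    """Count likely PII occurrences in a block of free text."""
    hits: Counter = Counter()
    hits["date"] += len(DATE.findall(text))
    hits["email"] += len(EMAIL.findall(text))
    for match in CARD.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):  # drop digit runs that fail the checksum
            hits["card_number"] += 1
    return +hits  # unary plus drops zero counts
```

Such counts can then serve as tags on the catalog entry for a document, so that files containing PII are identified (and handled with the appropriate controls) before anyone starts mining them.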
Machine learning can help organize this data automatically, after which it can be linked to other data attributes to build a complete view. Geolocation data is slightly trickier: while it is extremely valuable, its useful lifespan is short. Collections of historical geolocation data, however, can be mined with machine learning to support predictive analysis.
Recognition of Regular Data as Dark Data
Other data sets often considered “dark” in the past include sensor data, logs, emails, and even voice transcripts. At most, they were used for troubleshooting, and few organizations thought of making such data part of actual decision-making. Now that we can convert voice to text (and vice versa) and mine the results for intelligence, many use cases can take advantage of data traditionally considered dark.
An IDC estimate suggests that the total volume of data could be close to 44 ZB (zettabytes) in 2020. This data explosion is driven in part by new data sources such as the Internet of Things (IoT). Unless we illuminate this data with new technologies and processes, much of it will stay dark.
The first and most obvious step is to make all the dark data available for exploration. The second is to categorize the data, extract its metadata, and run a quality check on everything extracted. Modern data management and data visualization tools make it possible to explore the data visually, which helps determine whether the data can be illuminated and its useful signal separated from the noise.
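As an illustration of the quality-check step, here is a minimal profiling sketch, assuming the extracted data has already been parsed into a list of Python dictionaries (one per record). The `profile` function, the `FieldProfile` class, and the 20% missing-value threshold are hypothetical choices; commercial profiling tools compute similar statistics at much larger scale.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FieldProfile:
    name: str
    null_rate: float     # share of records where the field is missing or empty
    distinct: int        # number of distinct non-empty values
    dominant_type: str   # most common Python type among non-empty values


def profile(records: list[dict], max_null_rate: float = 0.2) -> tuple[list[FieldProfile], list[str]]:
    """Profile each field and collect simple data-quality issues."""
    n = len(records)
    fields = sorted({key for r in records for key in r})
    profiles: list[FieldProfile] = []
    issues: list[str] = []
    for name in fields:
        values = [r.get(name) for r in records]
        present = [v for v in values if v is not None and v != ""]
        null_rate = (n - len(present)) / n if n else 0.0
        types = Counter(type(v).__name__ for v in present)
        dominant = types.most_common(1)[0][0] if types else "none"
        # repr() lets us count distinct values even if some are unhashable
        profiles.append(FieldProfile(name, null_rate, len({repr(v) for v in present}), dominant))
        if null_rate > max_null_rate:
            issues.append(f"{name}: {null_rate:.0%} missing")
        if len(types) > 1:
            issues.append(f"{name}: mixed types {sorted(types)}")
    # Exact duplicate records are a common symptom of repeated ingestion.
    seen = Counter(tuple(sorted((k, repr(v)) for k, v in r.items())) for r in records)
    dupes = sum(count - 1 for count in seen.values() if count > 1)
    if dupes:
        issues.append(f"{dupes} duplicate record(s)")
    return profiles, issues
```

For example, a batch of sensor records in which one temperature reading is missing and another was logged as the string "n/a" would be flagged for both missing values and mixed types, which is exactly the kind of issue worth resolving before the data feeds any analysis.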
The many advances in Artificial Intelligence (AI) will certainly help uncover the secrets of oft-ignored “dark data”. The trick, however, is still to use the data prudently: misusing it can lead to incorrect predictions and may invite regulatory sanctions.
The sheer volume of dark data calls for the expertise of Big Data and AI specialists. There also needs to be a clear plan for how the data will be applied once it is sorted. At Futran Solutions, we work with a pool of incredibly talented Big Data and Artificial Intelligence experts who can help your organization make the most of its dark data. Contact us today to talk about solutions in big data and artificial intelligence.