Data-driven analysis has been the buzzword in business for years now, and most informed business owners understand that they need to adopt data-driven practices if they want to succeed and compete in today’s market. Many have already invested in technology that captures data in a variety of ways, and they may plan to expand their data collection further.
Many of them, however, don’t quite know what to do with all of that data. The more data that you collect, the more difficult and unwieldy it becomes to navigate. Data is only as useful as it is usable, and data that is inaccessible or disorganized is not much better than having no data collection at all.
What is the solution to this problem? A data catalog. You can expect to hear a lot more about data catalogs soon, as they become a necessary tool for any business adapting to today’s data-driven world.
What does a Data Catalog do?
To put it simply, a data catalog makes data organized and accessible for the teams and individuals who need to use it to make decisions. One way to explain it is that a data catalog provides a knowledge graph of information. While the term “knowledge graph” may or may not be familiar to you, you have most certainly seen one in action—probably on a daily basis.
The most familiar example of a knowledge graph is Google’s search results. You’ve likely noticed that many search terms result in a box of information that is aggregated data from multiple sources. For example, let’s say that you are searching for the name of an actor to see more about her performances and filmography. If you type “Ellen Page” into Google’s search engine, you’ll get the typical list of results based on Google’s algorithm, but you will also see a box of information that’s not linked to any one specific page.
This box contains a range of information. It has multiple photos of Page, a brief biography (linked to Wikipedia), and details such as where she was born, her height, her nationality, her spouse, and her education. It also includes thumbnails of her most popular film and television appearances as well as links to her social media accounts. At the bottom, there are thumbnails of other actors labeled “People also search for.” All of this information is presented in an easy-to-read format that makes it incredibly quick to scan.
While this box of information may not seem particularly impressive on its face, it is precisely its unassuming quality that makes it so important. The box, which Google calls a knowledge panel, is generated from Google’s Knowledge Graph, a knowledge base that connects facts drawn from many different types of sources. Google aggregates those facts into one user-friendly format that puts the most important and frequently needed information front and center. It also anticipates what someone using that search term is likely to look for next, creating efficient and effective pathways through complex sets of data without any additional effort on the user’s part.
How can a Data Catalog help businesses?
Think about the different kinds of data a business typically collects. There are spreadsheets full of sales records, databases filled with customer information, unstructured content like policy and procedure handbooks and internal memos, and so much more. This information is not only massive in its scope, but it also has to be accessed by people who need it for different reasons and in different formats. The marketing team may look at the exact same information as the product improvement team, but what they see will be filtered through their particular lenses and purposes.
A data catalog takes this expansive and often overwhelming collection of data and turns it into a user-friendly format that delivers the necessary information with ease. Data catalogs use knowledge graph technology to combine, connect, and filter data from a variety of different sources. This benefits businesses in several ways. It saves time, both because the data can be accessed more quickly and because team members need less training to use it. It also capitalizes on past data collection investments, ensuring that all the effort and resources used to capture important data can be fully utilized.
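To make the idea of combining, connecting, and filtering a little more concrete, here is a minimal sketch in Python. It is not how any particular data catalog product works. It simply assumes that a catalog can be modeled as a set of (asset, relation, value) facts, which is the basic shape of a knowledge graph. The class name `DataCatalog`, the relation names (`type`, `about`, `used_by`), and the asset names are all hypothetical, chosen to mirror the spreadsheets, databases, and documents described above.

```python
from collections import defaultdict


class DataCatalog:
    """Toy catalog (hypothetical): stores (asset, relation, value) facts."""

    def __init__(self):
        self.triples = set()
        self.by_asset = defaultdict(set)

    def add(self, asset, relation, value):
        # Combine: facts from any source land in one shared graph.
        self.triples.add((asset, relation, value))
        self.by_asset[asset].add((relation, value))

    def find(self, relation, value):
        # Filter: return every asset connected to `value` by `relation`.
        return sorted(a for (a, r, v) in self.triples if r == relation and v == value)

    def describe(self, asset):
        # Connect: gather everything known about one asset into one summary.
        summary = defaultdict(list)
        for relation, value in sorted(self.by_asset[asset]):
            summary[relation].append(value)
        return dict(summary)


# Illustrative assets only; the names are made up for this example.
catalog = DataCatalog()
catalog.add("sales_2019.xlsx", "type", "spreadsheet")
catalog.add("sales_2019.xlsx", "about", "customers")
catalog.add("sales_2019.xlsx", "used_by", "marketing")
catalog.add("crm_db.customers", "type", "database table")
catalog.add("crm_db.customers", "about", "customers")
catalog.add("crm_db.customers", "used_by", "marketing")
catalog.add("crm_db.customers", "used_by", "product")
catalog.add("returns_policy.pdf", "type", "document")
catalog.add("returns_policy.pdf", "about", "returns")
catalog.add("returns_policy.pdf", "used_by", "product")

marketing_assets = catalog.find("used_by", "marketing")
customer_assets = catalog.find("about", "customers")
crm_summary = catalog.describe("crm_db.customers")

print(marketing_assets)
print(customer_assets)
print(crm_summary)
```

In this sketch, the marketing team and the product team query the same graph but filter it through their own lens, and `describe` plays a role loosely similar to the search-result box: one compact summary assembled from facts that were recorded separately.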
Put simply, a data catalog is a way to make your data work for you instead of the other way around.
As Director of Enterprise Analytics, James Nanscawen helped Thomson Reuters establish data management capabilities and an enterprise-wide analytics competency.