The card catalog consisted of wooden or metal cabinets that contained rows and rows of small drawers, each of which contained hundreds of cards. A single card listed the author, date published and other pertinent information about a book. The cards were then arranged, in alphabetical order, within the drawers by author, title, subject matter and other indexes. Importantly, the card also listed the location number of the actual book in the library. Even though most library catalogs were computerized in the 1980s, card catalogs continued to be used (primarily as a backup system) and thus were printed in the United States until 2015.
I mention this brief history lesson to illustrate a point about the function of a catalog. Although some card catalogs or even card catalog rooms were or are quite ornate (e.g., the Bill Blass Public Catalog Room at the main Manhattan branch of the New York Public Library), that’s not why you go to the library. You go to the library to get a book. The card catalog was simply the means to find and get the book.
So, just like the old paper card catalog told you where the book was located so you could go get it, the data catalog should do more than just make it easy for users to explore what data is being stored within the enterprise. The data catalog should also make it easy for the user to get that data.
A typical challenge for many data analysts and scientists is the length of time it takes for them to acquire data from a new source. Many times, the only way to get this new data is to submit a request to IT or the BI team and then wait. And wait. And wait. When the data is finally delivered, it sometimes isn’t exactly what the user expected. And so, the whole request process must be repeated.
A data catalog will at least partially solve this issue by enabling data analysts and scientists to explore, understand and find for themselves what data is being stored throughout the company. No more submitting a request and waiting! But analysts also want to use the data, not just look at it. That’s why it’s also important that the data catalog be able to move a copy of the desired data directly into their BI tool or other applications, so they can immediately begin to uncover insights within the data.
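To make the "find it, then get it" idea concrete, here is a minimal Python sketch. It is purely illustrative and does not represent any particular product's API: `CatalogEntry`, `DataCatalog`, `search`, `fetch` and the in-memory `storage` dictionary are all hypothetical names I made up for this example. The point is that each entry, like a card, carries both descriptive metadata and a location, and the catalog can use that location to hand the user a copy of the data.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One 'card': descriptive metadata plus where the data actually lives."""
    name: str
    description: str
    tags: list
    location: str  # hypothetical key into the storage layer


class DataCatalog:
    """Hypothetical catalog that supports both finding and getting data."""

    def __init__(self, storage):
        # Assumption: storage maps a location string to a list of row dicts.
        self._storage = storage
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, term):
        """Explore: case-insensitive match on name, description or tags."""
        term = term.lower()
        return [
            entry for entry in self._entries.values()
            if term in entry.name.lower()
            or term in entry.description.lower()
            or any(term in tag.lower() for tag in entry.tags)
        ]

    def fetch(self, name):
        """Get: return a copy of the rows so the source stays untouched."""
        entry = self._entries[name]
        return [dict(row) for row in self._storage[entry.location]]


storage = {
    "warehouse/sales_2023": [
        {"region": "EMEA", "revenue": 120},
        {"region": "APAC", "revenue": 95},
    ]
}
catalog = DataCatalog(storage)
catalog.register(CatalogEntry(
    name="sales_2023",
    description="Quarterly revenue by region",
    tags=["sales", "finance"],
    location="warehouse/sales_2023",
))

hits = catalog.search("revenue")      # the analyst explores on their own
rows = catalog.fetch(hits[0].name)    # and then actually gets the data
```

In this sketch, `search` is the card catalog and `fetch` is the trip to the shelf. A catalog that stopped at `search` would leave the analyst back in the request queue.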
Just to be clear, I think it’s a good thing for a data catalog to enable easy searching and exploration by having lots of characteristics defined against datasets and data fields. But that shouldn’t be the end goal. No one went to the library to just admire the card catalog.