If you’re looking for a data catalog, join the party. By one estimate, nearly 93% of organizations have deployed a data catalog or plan to do so. But as your purchasing decision draws closer, you might find yourself overwhelmed by the sheer number of catalogs on the market and the features they offer. That’s where a recent Eckerson Group CDO TechVent, which focused on this popular product category, comes in.
CDO TechVent is a brand new program launched by the Eckerson Group late last year. It is designed to dig deeper into specific data and analytics products, explore and compare their relative characteristics, and provide information and advice on purchasing decisions. The first event, held on December 15 with Datanami as a media partner, focused on data catalogs, a category of tools we have covered extensively in these digital pages.
“We created CDO TechVent because we know how overwhelming data and analytics technology has become,” said Wayne Eckerson, President and Founder of Eckerson Group, at the event. “Our goal is to make it easy for you to compare leading data catalog products and perhaps more importantly, to give you an idea of what’s possible with data catalog technology.”
The Eckerson Group pulled out all the stops for its first CDO TechVent. In addition to keynotes from Eckerson Group analyst Sanjeev Mohan, the event featured a panel discussion with four data catalog vendors: Alation, BigID, Data.world, and erwin. There were also virtual breakout rooms staffed by representatives from these vendors and even virtual product prep that allowed attendees to compare their own needs with shrink-wrapped solutions. Over 100 people attended the sessions, a recording of which can be found here.
An Eckerson survey conducted just before the virtual event found that only 10% of organizations have “fully deployed” a data catalog, but only 7% have no interest in data catalogs at all, says Mohan, a former analyst at Gartner.
“So I think the consensus is there: data catalogs are hot,” Mohan says. “There’s no doubt about that.”
An evolving category
However, that doesn’t mean all is well in the data catalog world. Because data catalogs originated as IT-focused tools, some products in the space still carry remnants of that legacy, such as difficulty in deployment, which was a commonly cited drawback for data catalogs in Eckerson research.
Overall, however, the category is moving away from that IT-focused legacy. SaaS versions of data catalog tools simplify what used to be complex deployments, and many vendors offer services to help their customers get up and running quickly, says Mohan.
And instead of focusing on defensive use cases such as risk and compliance, which were the primary reasons for deploying a data catalog in the past, today’s products are being used to open up more offensive use cases for analytics and AI. That, in turn, has introduced new people to the tools, Mohan says.
“Previously, only a few people actually used the data catalog,” he says, “but now we’re starting to see data analysts, data scientists, even data engineers. We’re seeing wider adoption in terms of the number and types of users using it.”
Catalogs in the past were static and primarily rule-based, whereas machine learning and AI are widely used in today’s data catalogs. Mohan noted that in a few years, getting algorithms to make the first pass on data classification won’t be a big deal. “It’s the only way to manage the large amount of data we will need,” he says.
Data catalogs are increasingly becoming a place for data teams to collaborate. Users benefit from being able to see data metrics, such as data quality scores, for business data. Integration with other data tools, such as master data management (MDM) and extract, transform, and load (ETL) tools, and increasingly with real-time streaming data, also helps to raise the profile of data catalogs in the enterprise.
Additionally, in many tools, users can preview dashboards and other types of content that would normally be found in full-featured BI tools. “So for really fast data analysis, you can visualize the data from one source,” says Mohan. “Now, to create important Business Intelligence reports and dashboards, you still use a BI tool. But we see that there is an expansion in the scope of data catalogs.”
As a Gartner analyst, Mohan immersed himself in the Magic Quadrants, which are ubiquitous at the Connecticut company. At Eckerson, Mohan devised a similar, but different, framework to rank the various data catalog products.
“This one is, if I may say so, a bit controversial because we tried to put the large number of data catalogs into some sort of framework,” he says. “It’s very similar to a Gartner Magic Quadrant, but it has nothing to do with it.”
Mohan’s quadrant divides data catalogs by the degree of feature integration (standalone/pureplay or fully integrated) on the Y axis and the level of adoption (emerging or established) on the X axis.
The quadrant with fully integrated and well-established data catalogs has household names, such as Alation, Collibra, Informatica, and IBM. Some of the emerging integrated data catalog vendors include Quest Erwin (Quest Software purchased Erwin in January 2021), OneTrust, Ovaledge, Alex Solutions, Precisely, Zaloni, and Hitachi Vantara.
“Because they’re emerging, that doesn’t mean they’re new,” adds Mohan. “Quest Erwin has been around for a very long time, but now they are starting to expand their presence in the market.”
On the pureplay side of the aisle, Mohan lists some well-established catalog providers, such as the aforementioned Data.world and BigID, as well as catalogs from a host of tech giants, such as Microsoft Azure, Google Cloud, AWS, Oracle, SAP, and SAS. Emerging providers of pureplay data catalogs include Promethium, Atlan (which announced a $50 million Series B today), Boomi, Global IDs, MANTA Software, and Octopai.
Finally, Mohan shared a bonus list, which is a collection of open source data catalogs such as Amundsen (backed by commercial company Stemma), DataHub (backed by commercial company Acryl Data), LF Egeria, and Apache Hive/Apache Atlas. These products have been placed in the emerging pureplay quadrant.
That’s quite a list, but wait, Mohan has more! The analyst says he tracks 75 data catalog products, so he decided to share some of the integrated data catalogs that are out there. Not all of them are really integrated, as they can be used on their own. But it’s worth getting their names out there, if only to demonstrate the scale of development taking place in this very active space.
The vendor panel also provided lots of good advice on how to select and deploy a data catalog, as well as some pitfalls to watch out for. Stay tuned to Datanami for a future story on this. In the meantime, check out Eckerson Group’s upcoming CDO TechVent, taking place April 26 on the related topic of data governance tools. You can register for this free event here.
Editor’s note: This article has been corrected. Mohan tracks 75 data catalog products, not 7,500. Datanami regrets the error.