Data Observability for Databricks Register


Takeaways from CDO TechVent on Data Observability

The Eckerson Group recently presented a CDO TechVent that explored data observability, “Data Observability: Managing Data Quality and Pipelines for the Cloud Era.” Hosted by Wayne Eckerson, president of Eckerson Group, Dr. Laura Sebastian-Coleman, Data Quality […]

  • 3 min read

The Eckerson Group recently presented a CDO TechVent that explored data observability, “Data Observability: Managing Data Quality and Pipelines for the Cloud Era.” Hosted by Wayne Eckerson, president of Eckerson Group, Dr. Laura Sebastian-Coleman, Data Quality Director at Prudential Financial, and Eckerson VP of Research Kevin Petrie, the virtual event kicked off with a keynote overview of data observability products and best practices, followed by a technology panel discussion, “How to Evaluate and Select a Data Observability Platform,” with four industry experts.

Here are some of the highlights and interesting insights from the roundtable discussion on the factors data leaders should consider when looking at data observability solutions.

Josh Benamram, CEO of Databand, said it really depends on what problem you’re facing, or which team is trying to solve which problem. It’s going to be different for different stakeholders. For example, if your challenge is maintaining SLAs, your evaluation criteria would lean more towards Ops-oriented solutions that cater to data platform teams that need to manage the overall system. On the other hand, if your observability requirements are more targeted towards analysts, your criteria would be more oriented towards understanding the health of data tables rather than the overall pipeline. 

This idea of identifying the problems you’re trying to solve first, or as Wayne put it, “not putting the tools before the horse,” was a consistent refrain among all panelists.

Seth Rao, CEO of FirstEigen, said he would start by asking three questions: Where do you want to observe? What do you want to observe? How much do you want to automate? If you’re running Snowflake, there are a bunch of vendor solutions; but if you’re talking about data lakes, there’s a different set of solutions. Pipelines are yet another group of solutions. If you’re looking to observe the data itself, that’s a different type of observability altogether. Different solutions automate different pieces of observability. He suggested not trying to “boil the ocean” with one product that tries to do everything. He feels that you’ll get only an average product for all functions. Rather, he said, get flexibility of tooling—like Lego blocks that connect with other Lego blocks in your ecosystem.

This point drew the biggest reaction from the attendees (at lest as evidenced by the Q&A chat). Who’s responsible for integrating all the different tools? We already don’t have enough time! A couple of panelists tackled the argument head-on, either in the panel discussion or in breakout sessions. 

Specifically, Rohit Choudhary, CEO of Acceldata, said that the purpose of observability is to simplify everything data teams have to do. You don’t have enough data engineers as it is, and now you’re asking data leaders to invest in a bunch of different data observability tools. Instead of actually help them solve problems, you’re handing them more problems. He said to look at two things when evaluating data observability solutions: what it is capable of today, and what its roadmap looks like and what use cases it will support moving forward. Observability means different things to different people, and it all depends on whether the offering fits your maturity model. Smaller organizations with analytics teams of 10-20 people are probably fine with point solutions. But large enterprises that are dealing with data pipelines at petabyte scale are dealing with much greater complexity. For them, it would be prohibitively expensive to build their own observability solution. 

Chris Santiago, Unravel Data VP of Solutions Engineering, was of the same opinion but looked at things from a different slant. He agreed that different tools—system-specific point tools, native cloud vendor capabilities, various data quality monitoring solutions—all have strengths and weaknesses, with insight into different “pieces of the puzzle.” But rather than connect them all together as discrete building blocks, observability would be better realized by extracting all the relevant granular details, correlating them into a holistic context, and analyzing them with ML and other analytical algorithms so that data teams get the intelligence they need in one place. The problems data teams face—around performance, quality, reliability, cost—are all interconnected, so you’re saving a lot of valuable time and reducing manual effort to have as much insight as possible in a single pane of glass. He refers to such comprehensive capabilities as DataOps observability.

The dimension of cost was something Eckerson analyst Kevin Petrie highlighted in the wrap-up as a key emerging factor. He’s seeing an increased focus on FinOps capabilities, which Chris called out specifically: it’s not just making sure pipelines are running smoothly, but understanding where the spend is going and who the “big spenders” are, so that observability can uncover opportunities to optimize for cost and control/govern the cloud spend.

That’s the cost side for the business, but he said it’s also crucial to understand the profit side. Companies are investing millions of dollars in their modern data stack, but how are we measuring whether they’re getting the value they expected from their investment? Can the observability platform help make sense of all the business metrics in some way? Because at the end of the day, all these data projects have to deliver value. 

Check out Unravel’s breakout presentation, A DataOps Observability Dialogue: Empowering DevOps for Data Teams.