One of the highlights of the DataOps Unleashed 2022 virtual conference was a roundtable panel discussion on building versus buying when it comes to your data stack. Build versus buy is a question for every layer of the enterprise infrastructure stack. But over the last five years, and even in the last year alone, it's hard to think of a part of IT that has changed more dramatically than the modern data stack.
These transformations shape how today's businesses engage and work with data. The panel, moderated by Lightspeed Venture Partners' Nnamdi Iregbulem, featured three speakers: Andrei Lopatenko, VP of Engineering at Zillow; Gokul Prabagaren, Software Engineering Manager at Capital One; and Aaron Richter, Data Engineer at Squarespace. They weighed in on the build versus buy question and walked us through their thinking on two questions:
- What motivates companies to build instead of buy?
- How do particular technologies and/or goals affect their decision?
The panelists discussed these and other considerations. A few highlights follow; the entire session is available on demand.
What are the key variables to consider when deciding whether to build or buy in the data stack?
Gokul: I think the things we probably consider most are what kind of customization a particular product offers and what we uniquely need. Then there are cases in which we may need unique data schemas and formats to ingest the data. We must consider how much control we have over the product, as well as our processing and regulatory needs. We have to ask how we will answer those kinds of questions if we build in-house or choose to adopt an outsourced product.
Aaron: Thinking from the organizational perspective, there are a few factors that come into play when purchasing or choosing to invest in something. Money is always a factor. It's going to depend on the organization and how much you're willing to invest.
Beyond that, a key factor is the expertise of the organization or the team. If a company has only a handful of analysts doing the heavy-lifting data work, building an orchestration tool would pull them away from their focus and their expertise: providing insights to the business.
Andrei: Another important thing to consider is the quality of the solution. Not every data product on the market is high quality from every point of view. So sometimes it makes sense to build something yourself with a narrower focus. Compatibility with your operations environment is another crucial consideration when choosing between building and buying.
What’s the more compelling consideration: saving headcount or increasing productivity of the existing headcount?
Aaron: In general, everybody’s oversubscribed, right? Everybody always has too much work to do. And we don’t have enough people to accomplish that work. From my perspective, the compelling part is, we’re going to make you more efficient, we’re going to give you fewer headaches, and you’ll have fewer things to manage.
Gokul: I feel much the same. It depends more on where we want to invest and whether we're ready to change where we're investing: upfront costs or running costs.
Andrei: And development costs: do we want to buy this, or invest in building? And again, consider the human equation. It’s not just the number of people in your headcount. Maybe you have a small number of engineers, but then you have to invest more of their time into data science or data engineering or analytics. Saving time is a significant factor when making these choices.
How does the decision matrix change when the cloud becomes part of the consideration set in terms of build versus buy?
Gokul: I feel like it's trending toward a place where things are more managed. That may not be the same question as build or buy. But it skews more toward the managed option because of that compatibility, where all these things are available within the same ecosystem.
Aaron: I think about it in terms of three pillars: a cloud data warehouse; some kind of processing tool, like dbt; and some kind of orchestration tool, like Airflow or Prefect. There's probably one pillar you would never think to build yourself, and that's the cloud data warehouse. So you're pretty much always going to be paying a cloud vendor, whether it's Snowflake or BigQuery or something of that nature.
So you already have your foot in the door there, and you're already buying, right? That opens the door to buying more things and adding on tools that integrate easily. This helps shift the culture: if a culture is very build-oriented, it makes people more comfortable with buying things.
Andrei: Theoretically, you want your infrastructure to be independent of any one cloud, but that never happens, for multiple reasons. First, a cloud company's own tools make integration work much easier. Second, of course, once you have to think about multi-cloud, you must address privacy and security concerns. In principle, it's possible to be independent, but you'll often run into a lot of technical problems. Once the cloud becomes part of the picture, many different factors come into play in deciding what you will build and which tools you will use.