Getting Real With Data Analytics

By: Kunal Agarwal
Posted on: July 30, 2020

CDO Sessions: Getting Real with Data Analytics: Acting on Market Risks & Opportunities

On July 9th, big data leaders Harinder Singh of Anheuser-Busch, Kumar Menon of Equifax, and Unravel’s Sandeep Uttamchandani joined our CDO Session, hosted by our co-founder and CEO, Kunal Agarwal, to discuss how their companies have adapted and evolved during these challenging times.


Kunal: Hi guys, welcome to this chat today with big data leaders. Thank you everybody for joining today’s call. We’re all going through very uncertain, unprecedented times, to say the least, with the pandemic combined with all the geopolitical and social unrest. Business models and analytics that have been developed over the last several years have perhaps been thrown out of the window, or need to be reworked for what we call “the new normal”. We’ve seen companies take a defensive position for the first couple of weeks when this pandemic hit, and now we’re looking at companies taking an offensive position. We have an excellent panel here to discuss how they’re using data within their companies to uncover both risks and opportunities. Can we start with your background and your current role?

Harinder: Hey guys, it’s Harinder Singh. I lead data strategy and architecture at Anheuser-Busch InBev, also known as AB InBev. AB InBev is a $55 billion revenue company with 500 plus brands, including Corona, Budweiser, and Stella. We operate in well over a hundred countries. Personally, I’ve been in this industry for about 20 years, and prior to being at AB InBev, I was at Walmart, Stanford, and a few other companies. I’m very excited to be here as part of the panel.

Kumar: Hey guys, it’s Kumar Menon. I lead Data Fabric at Equifax. I’ve been at Equifax for a couple of years and we are a very critical part of the credit life cycle within the economy in almost all the regions that we operate in. So it puts us in an interesting position during situations like these. I’ve been in the industry for 25 years, doing data work primarily in two major, highly regulated industries, life sciences and pharmaceuticals, and financial services. That’s the experience that I’ve been able to bring to Equifax to be able to really rethink how we use data and analytics to deliver new value to our customers.

Sandeep: Hey everyone, it’s a pleasure to be here. I’m Sandeep Uttamchandani, the VP of Engineering and the Chief Data Officer at Unravel Data. My career has basically been a blend of building data products and running data engineering at large scale, most recently at Intuit. My passion is democratizing data: how do you make data self-serve and really build a data-driven culture for enterprises? At Intuit, I was responsible for data for Intuit QuickBooks, a $3 billion franchise, and continuing that passion at Unravel, we’re building out what we refer to as the self-serve platform, looking at telemetry data combined across resources, clusters, jobs, and applications, and really making sense of it as a true data product. I’m really excited to be talking more about what changes we’ve gone through and how we bounce back.

Kunal: Thank you, Sandeep. To gather some context, why don’t we go around the room and understand what your companies are doing at this particular time? What’s top of mind for your businesses?

Harinder: What’s on top of our mind today for our business is the following: Number one, taking care of our people. We operate in 126 countries and we don’t just ship our products, we actually brew locally, sell locally, meaning everything from the grain, the farmer, the brewery, is local in that country or city. For us, taking care of our people is our number one priority and there are different ways to do that. For example, in our corporate offices, it can be as simple as sending office chairs to people’s houses.

Our second priority is taking care of our customers. When I say customers, I’m not talking just about consumers, I’m talking about bars and restaurants, our B2B customers. They have been significantly impacted because people are not going into bars and restaurants. We have supported them by extending credit, making advance purchases, forming alliances with other companies, and creating gift cards that people can buy now and use later. And finally, another thing that’s top of mind is taking care of finances. We’re a big company and we don’t know how long this will last, so we want to make sure that we’re in it for the long run. We’re very fortunate that our C-level leadership has very strong finance backgrounds; even in technology, people actually come from finance. So that’s definitely something that we are doing as a business for sure.

Kunal: That’s fascinating, Harinder. That was the sequence of things that we expect from a big company. Hopefully, once live sports start again, the sales will start picking back up. Kumar, I’d love to hear from you as well.

Kumar: Before our current situation, back in late 2017, Equifax had a significant, unfortunate breach, after which we as a company made a serious commitment to ensure that we look at everything that we do as a business, in terms of our product strategy, our platform strategy, and our security posture, and transform the company to really help our customers, in turn helping the consumer. We were in the middle of a massive transformation and we’re still going through it. We were already moving at a quick pace to rebuild all of our data and application platforms, and refactoring our product portfolio, our engagement with our customers, etc. The situation, I would say, has actually pushed us even further.

When you look at the pandemic scenario, the credit economy has taken on a life of its own in terms of how banks and other financial institutions are looking at data and at the impact of the pandemic on their portfolios and customers. As a company, we’ve been looking at the more macroeconomic impact of the situation on consumers in the regions where we operate. In these unique times, traditional data indicators or signals don’t hold as much value as they would in normal times, and we’re constantly looking for new indicators that will help not only our customers, but eventually the consumers, get through the situation in a much better way.

We are actually helping banks and other financial institutions reassess, and while we’ve seen business pick up in certain areas, in other areas of course we see lower volumes. But overall, as a company, we’ve actually done pretty well. And in executing the transformation, we’ve had to make some interesting choices. We’ve had to accelerate things, and we’re taking a really offensive strategy, where we quickly deploy the new capabilities that we’re building. This will help businesses in regions with slower uptake execute better and serve their customers, and eventually the consumers, better.

Kunal: We would love to hear some of those changes that you’re making in our next round of questions. Sandeep?

Sandeep: I’ll answer this question by drawing from my own experience as well as my discussions with a broader set of CDOs on this topic. There are three high-level aspects that are loud and clear. First, I think data is more important than ever. During these uncertain times, it is important to use data to make decisions, which is clear across pretty much every vertical that I’m tracking.

The other aspect here is the need for agility and speed. If you look at the traditional analytical models and ML models built over years and years of data, the question is, do they represent the new normal? How do you now make sure that what you’re representing in the form of predictive algorithms and forecasting is even valid? Are those assumptions right? This is leading to the need to build a whole set of analytics, ML models, and data pipelines really quickly and I’ve seen people finding and looking for signals and data which they otherwise wouldn’t have.

The last piece of the puzzle, as a result of this, is more empowerment of the end users. End users here would be the data users: the data analysts, the data scientists. In fact, everyone within the enterprise needs data at this time, whether I’m a marketer trying to decide on a campaign, or I’m in sales trying to decide on pricing. Data allows you to determine how you react to the competition, how you react to changing demand, and how you react to changing needs.

Big Data Projects

Kunal: So let’s talk some data. What data projects have you started or accelerated during these times? And how do you continue to leverage your data at your different companies while thinking about cutting costs and extending your resources?

Harinder: As times are changing, everything we do as a company has to change and align with that. To start off, we were already investing quite heavily in digital transformation to take us to the next level. The journey to taking a business view end to end, including everything from farmers, to customers, to breweries and corporations, has already begun. And due to COVID, we have really expedited the process.

Second, we had some really big projects to streamline our ERP systems. AB InBev grew heavily through M&A; we acquired and partnered with many companies, small and big. Each company was a successful business in its own right, meaning they each had their own data and technology.

The journey to Cloud is definitely big as well. Some aspects of our organization were already multi-Cloud, but if you look at the positive side of this crisis, it really pushed us hard to move to the Cloud faster. The same is true for remote work. Something that would have taken three to five years to execute happened overnight.

So the question then becomes, well, how do we manage the costs? Because all of these things that I’m talking about expediting require a budget to go with it. One thing we’ve done is reprioritize some of our initiatives. While these things that I talked about earlier have gone from, let’s say, priority three to priority one, we have some initiatives that we were working on that have been pushed to the backburner.

Let me give you some examples of managing or cutting costs. I run the data platform, and the approach there was to scale, scale, scale, because the business is growing and we’re bringing all these different countries and zones online into Cloud. We still want to grow, but we’ve gone from looking at scale to focusing on how we can optimize and keep more of a real-time inventory of what’s needed, rather than provisioning it three months ahead. The fact that you’re in the Cloud enables you to do that. It’s a positive thing on both sides: it helps expedite the journey to the Cloud, while moving to the Cloud helps you keep your inventory lean. Then there are some basic sanity checks: are there systems that have been around but are not significantly used? Is there software that we need less of? And if there are things, in terms of technology, hardware, software, or applications, that we need more of because of COVID, can we negotiate better because of scale?

Kunal: Alright Harinder, so you’ve been scrutinizing everything, scrutinizing size, scrutinizing projects, and making sure that you’re scaling in an optimized fashion, not scaling out ahead of unforeseen loads, if I summarized that correctly.

Harinder: You’re absolutely right. I think we were in a unique position to do that because our company follows a zero-based budget model, which essentially means that at the start of each year, we don’t just build on where we were last year. We start from scratch every single year, so that’s already in our culture, our DNA. And once or twice a year, we just had to take the playbooks out and do it again. That’s actually quite easy for us as a company versus, I can imagine, big companies that may have a tough time doing that.

Kunal: One last question before we move on to Kumar. What about some of the challenges that Cloud presented to you?

Harinder: Anybody going into the Cloud has to keep two things in mind. One is that it’s a double-edged sword. It gives you convenience and fast time to market, but you also have to be very careful about security. All of these Cloud vendors, Google, Amazon, or Azure, spend more on security than most companies can. So the Cloud security out of the box is much better compared to an on-prem system. But you also have to be careful about how you manage it, configure it, enforce it, and so on.

The second part to me is the cost. If you do a true comparison and don’t manage your cost properly, then Cloud costs can be much higher. Used and managed properly, Cloud costs are much better for the business. A lot of people and companies that I talk to say that they are going to move to the Cloud to save costs, but moving to the Cloud is only step one. You must also manage the cost and watch out for it, especially in the very beginning, and prioritize it equally. Those two things, done in combination, really take care of the bottleneck issues with moving to Cloud.

Kunal: Yeah, Cloud definitely needs guardrails. Harinder, thank you so much for that.

Sandeep: I just want to quickly add to Harinder’s points. From our own experience moving into the Cloud, we had to rethink how we provision resources: one instance for ten hours versus ten instances for one hour. I completely resonate with that point, Harinder. You also mentioned multi-Cloud, and I would love to learn more.

Kunal: How about you, Kumar?

Kumar: For us, since we were already executing this blazing transformation, we didn’t really have to start anything specifically new. We went through some reprioritization of our roadmaps and were already executing at a serious pace, looking to complete this global transformation in a two year timeframe. So what we really focused on from an acceleration or a reprioritization perspective was deploying the capabilities as quickly as possible into the global footprint. Once the pandemic hit, we had to think about the impact on our portfolio. Most of our customers are big financial institutions and we quickly realized that traditional data points are no longer as predictive for understanding the current scenario, as Sandeep mentioned before. So we had to really reevaluate and look at how we can bring our data together in a much faster way, in a much more frequent manner, that can help our customers understand portfolios better. And obviously, how does this impact our traditional predictive models that we deploy for credit decisioning, fraud, and other areas where we saw some significant uptake in certain ways? All this required the capability to be deployed much faster.

Our transformation was based on a Cloud-first strategy, so we are 100% on the Cloud. That helped us accelerate pushing these capabilities out into the global regions at a much faster pace, and we completed the global deployment of several of our platforms over the last couple of months or so.

From a data projects perspective, our goal throughout this transformation has been to enable faster ingestion of new data into our ecosystem, bringing this data together and driving better insights for customers. So we’re constantly looking for new data sources that we can acquire that can add value to the more traditional and the very high fidelity data sources we already have. When you look at our footprint in a particular region, we actually have some of the most important data about a consumer within the region that we operate in. In a traditional environment, that data is very unique and very powerful, but when you look at a scenario like the pandemic situation that we’re in, we have to bring in data and figure out how the current situation impacts customers, therefore understanding consumers better.

Also, anything that we produce has to be explainable. While we absolutely have the desire to, and currently do, use very advanced analytics techniques, ML and AI, for several things, for some of our regulated businesses everything has to be explainable. So we’ve accelerated some of our work in the explainable AI space, and we think that’s going to be an interesting play in the market as more and more regulations start governing how we use data and how we help our customers, or the consumers, eventually own the data. We, in fact, own a patent in the industry that allows for credit decisioning using explainable AI capabilities.

Kunal: We’d love to hear about some of the signals that weren’t considered earlier that are now considered. Would you be able to share some of those, Kumar?

Kumar: Absolutely. We have some new data sets that not many of the credit rating agencies or other financial data providers have today. For example, the standard credit exchange is all the banking loan data that we get at the consumer level, which every credit rating agency has. But we also have a very valuable data asset called The Work Number, which is information about people’s employment and income. We also have a utilities exchange, where we get consumers’ utility payment information. I can talk about some insights that you don’t have to be a genius to think of, opportunities that you can literally unravel by combining this data.

If you were to just look at a traditional credit score based on credit data, as an example, I could say, “Kumar Menon worked in the restaurant business and has a credit score of 800”. In the traditional way of looking at credit scoring, I would still be a fairly worthy customer to lend money to. But looking at the industry I work in, maybe there is a certain element of risk now introduced into the equation, because I’m in the restaurant business, which is obviously not doing well. So what does that mean when I look at Kumar Menon as a consumer? There are things you can do to understand the portfolio and plan better. I’m not saying that all data points are valid, but understanding the portfolio helps financial institutions prepare better, help consumers, work with consumers to understand forbearance scenarios, and work out scenarios where you don’t have to go into default. The goal of the financial institutions has always been to bring more people into the credit economy, which is what we are trying to enable.

By providing financial institutions with more data, we’re helping them become more aware of potential risks or scenarios that may not be visible in a traditional paradigm.

Kunal: That’s very interesting. Thanks so much for sharing, Kumar. Moving on to you, Sandeep.

Sandeep: Piggybacking on what Harinder and Kumar touched on, one of the key projects has been accelerated movement to the Cloud. When you think about moving to the Cloud, it’s a highly non-trivial process. On one side, you have your data and thousands of data sets; on the other side, you have these active pipelines that run daily: pipelines, dashboards, ML models feeding into the product. So the question really becomes, what is the sequence to move these? Some pipelines are fairly isolated data sets with, I would say, trivial query constructs, but on the other side you’ll have query constructs that are deeply embedded in the on-prem system, highly non-trivial to move, requiring rethinking of the schema, rethinking the partitioning logic, the whole nine yards. This is a huge undertaking. How do you sequence? How do you minimize risk? These are live systems, and the analogy here is, how do you change the engine of the plane while the plane is flying? We don’t have the luxury to say, “okay, these pipelines, these dashboards or models won’t refresh for a week”.

The other aspect is the meaning of data. Traditionally, I think as data professionals, the number one question is, where is the data and how do I get to the data? Within data sets, which attribute is authentic and which attribute has been refreshed? During the pandemic, the question is slightly changing from just “where is my data?” to “is this the right data to use?” Are these the right signals I should be relying on? This new normal requires a combination of business understanding and expertise, combined with the critical aspects of data and the data platform. So there’s clearly increasing synergy in several places as people think about, “okay, how do I rework my models?” It’s a combination of building this out as well as using the right data.

The last piece is how you shorten the whole process of getting a pipeline or an insight developed. We are writing apps to do things we haven’t done before, no matter which vertical you’re in, and the moment you have these apps coming out at a fast pace, in production, there are a lot of discoveries. Misusing the data, a full scan, a join across a billion-row table: all these aspects can inundate any system. Comparing it to a drainage system, one bad query is like a clog that stops everything, affecting everything below. I think that’s the other aspect, increasing awareness of how we fortify the CICD and the process of improving that.

Kumar: That’s a very interesting point you bring up, because when we look at our data ecosystem, all the way from ingestion of raw data to what we call purposing the data for a specific set of products, we must ensure that the models and other insights that execute on that data all stay in sync. So how do we monitor that entire ecosystem? How do we ingest data faster, deploy new models, monitor them, and understand whether they’re performing in the right way? We looked at that ecosystem, and we want it to be almost a push-button scenario where analysts develop models against data schemas that are identical to what is running in production, so that no rewiring of the data is required. And the deployment of the models is seamless, so you’re not rewriting the models.

In many of the on-prem systems, you actually end up rewriting the model in a production system because of performance challenges, etc. So, do you really want to extend the CICD pipeline concept to the analytic space, where there is an automated mechanism for data scientists to be able to build and deploy in a way that a traditional data engineer would deploy some pipeline code? And how do we make sure that that synergy is available for us to deploy seamlessly? It’s something that we’ve actually looked at very consciously and are building it into our stack. It’s a very relevant industry problem that I think many companies are trying to solve.
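The idea Kumar describes, extending CICD to analytics so a model is only promoted when it matches what production actually serves, can be sketched as an automated gate. This is a minimal, hypothetical illustration (the schema format, function names, and compatibility rule are invented for the example, not Equifax’s actual stack):

```python
# Illustrative CI gate for promoting an analytics model to production.
# The schema representation (column name -> type) is a simplification.

def schemas_compatible(training_schema: dict, production_schema: dict) -> bool:
    """Every column the model was trained on must exist in production
    with the same type; extra production columns are allowed."""
    return all(
        production_schema.get(col) == dtype
        for col, dtype in training_schema.items()
    )

def promote_model(model_name: str, training_schema: dict,
                  production_schema: dict) -> str:
    """Fail the build on schema drift; otherwise promote the model."""
    if not schemas_compatible(training_schema, production_schema):
        raise RuntimeError(f"{model_name}: schema drift detected, failing build")
    return f"{model_name}: promoted"

trained = {"credit_score": "int", "employment_status": "str"}
prod_ok = {"credit_score": "int", "employment_status": "str", "region": "str"}
prod_drifted = {"credit_score": "float"}  # type changed, column missing

print(promote_model("risk_model_v2", trained, prod_ok))
```

Running this gate in the pipeline means a data scientist’s model deploys the same way an engineer’s pipeline code does: automatically when checks pass, never when they do not.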

Big Data Challenges and Bottlenecks

Kunal: To summarize what Kumar and Sandeep were saying, we’re growing data projects and somebody, at the end of the day, needs to make sure it runs properly. We’re looking at operations and Kumar made a comment, comparing it with the more mature DevOps lifecycle, which we are thinking about as a DataOps lifecycle. What are some of these challenges and bottlenecks that are slowing you guys down, Harinder?

Harinder: I would like to start off by giving our definition of DataOps. We define DataOps as the end-to-end management and support of a data platform, data pipelines, and data science models, essentially the full lifecycle of consumption and production of information. Now, when we look at this lifecycle, there are the basics of people, process, technology, and data.

Starting with our people, we started building this team about three years ago, so there’s a lot of experienced talent with a blend of new and upcoming individuals. We were still in the growth phase of the team, but I think that the current situation has slowed down that process a bit.

The technology was always there, but it’s really about adoption: when you have to strike a balance between growth in data projects and a greater need for data, you usually scale your people and technology along with it. In our case, like I said, the team is not able to grow as fast because of the situation, so we are looking to automation. How can we utilize our team better? CICD was there in some parts of the platform and not in others. So we are finding those opportunities, automating the platform, and applying full CICD.

When we talk about Cloud, there are different ways you can move: as infrastructure, as a platform, or as full SaaS. We always wanted to be platform agnostic and multi-Cloud. There are some things we have done, mostly on the infrastructure side, but now we are taking the middle ground a bit, moving away from infrastructure toward more of a platform-as-a-service model so that, again, going back to people, we can save some time to market.

On the process side, it’s about striking the right balance between governance and time to market. When you have to move fast, governance always slows you down. Our industry is very regulated, which means you still have to maintain a minimum bar on compliance. It depends on the country: in the US, not so much, but in other countries where we operate there’s always something like the GDPR. So those requirements have to be met while we move fast to meet the demands of our internal customers for data, analytics, and insights. When we talk about this whole process end to end, I think it’s about how we continue to scale and meet the needs of our business, while also doing our best to strike that balance given the space we are in. And when I talk about regulation, I’m not just talking about required regulation or compliance; it’s also just good data hygiene, maintaining the catalog, maintaining the glossaries. Right now, sometimes speed takes over and other times governance takes over, so we’re trying to find the right balance there.

Kunal: As is every organization, Harinder, so you’re not alone there for sure. Kumar, anything to add there?

Kumar: I think Harinder covered it pretty well. For us, moving to the Cloud required a different philosophy: you build Cloud native applications differently than you build on-prem. It’s really about improving the skill sets of your people so they think more broadly. Take a developer who has been developing in Java on-prem: she or he now has to understand a little bit about infrastructure, a little bit about the network, a little bit about how Cloud security works, so we can actually have security in the stack versus just an overlay on the application. A lot of on-prem applications are built that way, relying on perimeter security from the network. How do you engineer the right IAM policies into every layer of the services you’re building? How do you make sure that the encryption and decryption capabilities you enable for the application follow enterprise-wide policies?

I’ve come back often to the ability to deploy into the Cloud. How do you ensure that your deployment is compliant? How do you make sure that everything in the Cloud is code: infrastructure is code, security is code, your application is code? How do you check in your CICD pipeline that you have all your controls in place, so that your build fails and you don’t actually deploy if you’re violating policy? So we actually started to implement policy as code within our CICD pipeline to ensure that no bad behavior manifests itself in production.
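A policy-as-code gate of this kind can be sketched in a few lines. This is a hypothetical illustration (the resource format and the two rules are invented for the example); real pipelines typically lean on dedicated tooling such as Open Policy Agent, but the principle is the same: evaluate declared resources against codified rules, and fail the build on any violation.

```python
# Illustrative "policy as code" check run inside a CI/CD pipeline:
# the build fails if any declared cloud resource violates a policy.
# Resource format and rules are hypothetical examples.

POLICIES = [
    ("storage buckets must be encrypted",
     lambda r: r["type"] != "bucket" or r.get("encrypted", False)),
    ("no resource may be publicly accessible",
     lambda r: not r.get("public", False)),
]

def check_policies(resources):
    """Return a list of violation messages; an empty list means the build may proceed."""
    violations = []
    for resource in resources:
        for description, rule in POLICIES:
            if not rule(resource):
                violations.append(f"{resource['name']}: {description}")
    return violations

resources = [
    {"name": "customer-data", "type": "bucket", "encrypted": True},
    {"name": "scratch-bucket", "type": "bucket", "encrypted": False, "public": True},
]

for violation in check_policies(resources):
    print("POLICY VIOLATION:", violation)  # a CI step would exit non-zero here
```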

We’ve also been ruthlessly looking at security because of the situation we were in before, as well as the fact that we hold some very valuable high fidelity data. How do you ensure that what our security policy is on the data is also on the technology stack that operates on the data? So those have been some very interesting learnings and I wouldn’t say this has slowed us down, but these things are mandatory and we must learn and be able to master them as a company.

Regulations are ever-changing. We’ve encountered new regulations as we’re building this. New privacy laws are coming into existence, like the CCPA in California, and I think other states will pursue similar privacy laws. Obviously, that will impact you globally when you extrapolate to GDPR and other regional laws. So when you’re deploying to the Cloud, how do you make sure you’re adhering to the data residency requirements within those regions, as well as the privacy laws? How you build an architecture that can adapt and be flexible to that change is really the big challenge.

Kunal: Thank you for sharing all of that. Sandeep, any thoughts there?

Sandeep: I define DataOps as the point in a project where the honeymoon phase, prototyping and building out the models and analytics, ends and reality sets in.

On a single weekend, I’ve seen a bad query accumulate more than a hundred thousand dollars in cost. That’s an example where if you don’t really have the right guardrails, just one weekend with high end GPUs in play, trying to do ML training for a model that honestly we did not even require, you get a bill of $100,000.

I think the other thing is just the sheer root cause analysis and debugging. There are so many technologies out there. On one side, there is the philosophy of using the right tool for the job, which is actually the right thing; there is no silver bullet. But on the other side, the ugly side, you need the skill sets to understand how Presto works versus how Hive works versus how Spark works, and to tune each of them to figure out where issues are happening. That’s much more difficult: how do you package that expertise? Figuring it out is one of those issues that has always been there, but is now becoming even more critical.

The last thing to wrap up, and I think Kumar touched on this, is a very different way to think about some of the newer technologies. If you think of these serverless technologies like Google BigQuery or AWS Athena, they have different pricing models. Here, you’re actually being charged by the amount of data scanned and imagine a query that is basically doing a massive scan of data, incurring significant costs. So you need to incorporate all of these aspects, be it compliance, cost, root cause analysis, tuning, and so on, early on so that DataOps is seamless and you can avoid surprises.
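A pre-flight cost guardrail for a scan-priced engine can be as simple as estimating the bytes a query will touch and refusing to run it over budget. This is a minimal sketch under stated assumptions: the $5-per-terabyte rate mirrors published on-demand pricing for services like BigQuery but may differ by service and date, and the byte estimate would in practice come from something like a dry-run API.

```python
# Illustrative guardrail: block queries whose estimated scan cost
# exceeds a budget. Pricing and the byte estimate are assumptions.

PRICE_PER_TB_USD = 5.0       # assumed on-demand scan price
TB = 1024 ** 4               # bytes in one tebibyte

def estimated_cost_usd(bytes_scanned: int) -> float:
    """Estimated query cost from bytes scanned, at the assumed rate."""
    return bytes_scanned / TB * PRICE_PER_TB_USD

def guard_query(bytes_scanned: int, budget_usd: float = 50.0) -> bool:
    """Return True if the query may run, False if it is blocked."""
    cost = estimated_cost_usd(bytes_scanned)
    if cost > budget_usd:
        print(f"Blocked: estimated ${cost:,.2f} exceeds ${budget_usd:,.2f} budget")
        return False
    print(f"OK to run: estimated ${cost:,.2f}")
    return True

guard_query(200 * TB)  # a full-table scan: blocked
guard_query(2 * TB)    # a pruned, partitioned scan: allowed
```

The same check wired into CICD catches the “one bad query clogs the drain” scenario Sandeep describes before the bill arrives, rather than after the weekend.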

How Big Data Professionals Can Adjust to Current Times

Kunal: Thank you. We’ll have one rapid-fire, one-minute question for everybody as a parting thought. There are several hundred people listening in right now, so what should all of us data professionals plan for as we’re thinking through this prolonged period of uncertainty? What is the one thing that they should be doing right now that they have not in the past?

Harinder: I actually have not one, but five, but they’re all very quick. First of all, empathy. We are in completely different times, so make sure you have empathy towards your team and towards your business partners.

Number two, move fast. It’s not the time to think too hard or plan, you just have to move fast and adapt.

Number three, manage your costs.

Number four, focus on your business partners internally and try to understand what their needs are, because it’s not just you, everybody is in a unique situation. So focus on your internal customers, what do they need from you, in terms of data analytics?

And finally, focus on your external customers, try to understand their needs. One of the most important things would be maybe changing the delivery model of your product or service and meeting where the customer is instead of expecting customers to come to you.

Kumar: I totally agree with focusing on internal customers. Obviously focus on the ecosystem you’re operating in so it’s your customers as well as potentially your customers’ customers. Definitely make sure that you connect a lot more with your customers and your coworkers to keep the momentum going.

I think, in several scenarios there are new opportunities that are being unearthed in the market space, so really watch out for where those opportunities lie, especially when you’re in the data space. There are new signals coming up, new ways of looking at data that can provide you better insights. So how do you constantly look at that?

Finally, I would say to keep an eye out for how fast regulations are changing. I’m sure new regulations will come into play with this new normal, so just make sure that what you build today can withstand the test of time.

Kunal: Thank you, Sandeep?

Sandeep: One piece of advice for professionals would be to also focus on data literacy and explainable insights within your organization. Not everyone understands data the way you do, and when you think about insights, there are basically three parts: what’s going on, why it is happening, and how to get out of it. Not everyone will have the skills and expertise to do all three. On the “what” part, what’s going on in the business, how to think about it, how to slice and dice, data professionals have a unique opportunity to educate and build that literacy within their enterprise for better decision making. And everything that Harinder and Kumar mentioned is spot on.

Kunal: Thank you. Again, guys, this was a fantastic hour. We had a ton of viewers here today. I hope we all took away something from these data professionals; I certainly learned a lot. Harinder, Kumar, Sandeep, thank you so much for taking time out during such crazy times and sharing your experiences, practical advice, and strategies with the entire data community.