

Big Data Meets the Cloud


This article by Unravel CEO Kunal Agarwal originally appeared as a Forbes Technology Council post under the title The Future of Big Data Is in the Hybrid Cloud: Part 2 and has been updated to reflect 2021 statistics.

With interest in big data and cloud rising around the same time, it wasn’t long before organizations began deploying big data in the cloud. Big data comes with some challenges when deployed in traditional, on-premises settings. There’s significant operational complexity, and, worst of all, scaling deployments to meet the continued exponential growth of data is difficult, time-consuming, and costly.

The cloud provides the perfect solution to this problem since it was built for convenience and scalability. In the cloud, you don’t have to tinker around trying to manually configure and troubleshoot complicated open-source technology. When it comes to growing your deployments, you can simply hit a few buttons to instantly roll out more instances of Hadoop, Spark, Kafka, Cloudera, or any other big data app. This saves money and headaches by eliminating the need to physically grow your infrastructure and then service and manage that larger deployment. Moreover, the cloud allows you to roll back these deployments when you don’t really need them—a feature that’s ideal for big data’s elastic computing nature.

Big data’s elastic compute requirements mean that organizations will have a great need to process big data at certain times but little need to process it at other times. Consider the major retail players. They likely saw massive surges of traffic on their websites this past Cyber Monday, which generated a reported $10.7 billion in sales. These companies probably use big data platforms to provide real-time recommendations for shoppers as well as to analyze and catalog their actions. In a traditional big data infrastructure, a company would need to deploy physical servers to support this activity. These servers would likely sit idle the other 364 days of the year, resulting in wasted expenditures. In the cloud, however, retail companies can simply spin up the big data platforms and resources they need, then spin them down when traffic subsides.

This sort of elasticity occurs on a day-to-day basis for many companies that are driving the adoption of big data. Most websites experience a few hours of peak traffic and a few hours of light traffic each day. Think of social media, video streaming, or dating sites. Elasticity is a major feature of big data, and the cloud provides the elasticity to keep those sites performing under any conditions.

One important thing to keep in mind when deploying big data in the cloud is cost assurance. In situations like the ones described above, organizations suddenly use a lot more compute and other resources. It’s important to have set controls when operating in the cloud to prevent unforeseen, massive cost overruns. In short, a business’s autoscaling rules must operate within its larger business context so it’s not running over budget during traffic spikes. And it’s not just the sudden spikes you need to worry about. A strict cost assurance strategy needs to be in place even as you gradually migrate apps and grow your cloud deployments. Costs can rise quickly based on tiered pricing, and there’s not always a lot of visibility depending on the cloud platform.
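To make the idea concrete, here is a minimal sketch of what a budget-aware autoscaling guard could look like. All names here (`BudgetGuard`, `approve`, the dollar figures) are illustrative assumptions, not any cloud provider's real API; in practice this logic would sit between your scaling trigger and the provider's autoscaling call.

```python
# Hypothetical sketch: cap scale-out requests so projected spend
# stays within a monthly budget. Names and numbers are illustrative.

class BudgetGuard:
    """Approves or caps autoscaling requests against a remaining budget."""

    def __init__(self, monthly_budget: float, spent_so_far: float,
                 cost_per_node_hour: float, hours_remaining: float):
        self.remaining = monthly_budget - spent_so_far
        self.cost_per_node_hour = cost_per_node_hour
        self.hours_remaining = hours_remaining

    def max_affordable_nodes(self) -> int:
        """Largest fleet size whose projected cost fits the remaining budget."""
        if self.remaining <= 0 or self.hours_remaining <= 0:
            return 0
        return int(self.remaining /
                   (self.cost_per_node_hour * self.hours_remaining))

    def approve(self, requested_nodes: int) -> int:
        """Return the node count actually allowed: the request, budget-capped."""
        return min(requested_nodes, self.max_affordable_nodes())

# Example: $20k of a $50k budget left, $2/node-hour, 200 hours left in the month.
guard = BudgetGuard(monthly_budget=50_000, spent_so_far=30_000,
                    cost_per_node_hour=2.0, hours_remaining=200)
print(guard.approve(80))  # a traffic spike requests 80 nodes; prints 50
```

The point of the sketch is the `min()` at the end: the autoscaler still reacts to demand, but the business budget, not the traffic spike, sets the ceiling.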


A Hybrid Future

Of course, the cloud isn’t ideal for all big data deployments. Some amount of sensitive data, such as financial or government records, will always be stored on-premises. Also, in specific environments such as high-performance computing (HPC), data will often be kept on-premises to meet rigorous speed and latency requirements. But for most big data deployments, the cloud is the best way to go.

As a result, we can expect to see organizations adopt a hybrid cloud approach in which they deploy more and more big data in the cloud but keep certain applications in their own data centers. Hybrid is the way of the future, and the market seems to be bearing that out. A hybrid approach allows enterprises to keep their most sensitive and heaviest data on-premises while moving other workloads to the public cloud.

It’s important to note that this hybrid future will also be multi-cloud, with organizations putting big data in a combination of AWS, Azure, and Google Cloud. These organizations will have the flexibility to operate seamlessly between public clouds and on-premises. The different cloud platforms have different strengths and weaknesses, so it makes sense for organizations embracing the cloud to use a combination of platforms to best accommodate their diverse needs. In doing so, they can also help optimize costs by migrating apps to the cloud that is cheapest for that type of workload. A multi-cloud approach is also good for protecting data, enabling customers to keep apps backed up in another platform. Multi-cloud also helps avoid one of the bigger concerns about the cloud: vendor lock-in.

Cloud adoption is a complex, dynamic life cycle—there aren’t firm start and finish dates like with other projects. Moving to the cloud involves phases such as planning, migration, and operations that, in a way, are always ongoing. Once you’ve gotten apps to the cloud, you’re always trying to optimize them. Nothing is stationary, as your organization will continue to migrate more apps, alter workload profiles, and roll out new services. In order to accommodate the fluidity of the cloud, you need the operational capacity to monitor, adapt, and automate the entire process.

The promise of big data was always about the revolutionary insights it offers. As the blueprint for how best to deploy, scale, and optimize big data becomes clearer, enterprises can focus more on leveraging insights from that data to drive new business value. Embracing the cloud may seem complex, but the cloud’s scale and agility allow organizations to mine those critical insights at greater ease and lower cost.

Next Steps

Be sure to check out Unravel Director of Solution Engineering Chris Santiago’s on-demand webinar recording—no form to fill out—on Reasons Why Big Data Cloud Migrations Fail and Ways to Succeed.