This is a blog by Charles Araujo, Principal Analyst for Intellyx. This blog was first published on the Intellyx site.
Over the last few years, there has been a mad rush within enterprise organizations to move big data workloads to the cloud. On the surface, it looks like a bit of "keeping up with the Joneses" syndrome: organizations moving big data workloads to the cloud simply because they can.
It turns out, however, that the business rationale for moving big data workloads to the cloud is essentially the same as the broader cloud migration story: rather than expending scarce resources on building, managing, and monitoring infrastructure, use those resources, instead, to create value.
In the case of big data workloads, that value comes in the form of uncovering insights in the data or building and operationalizing machine learning models, among others.
To realize this value, however, organizations must move big data workloads to the cloud at scale without adversely impacting performance or incurring unexpected cost overruns. As they do so, they begin to recognize that migrating production big data workloads from on-premises environments to the cloud introduces and exposes a whole new set of challenges that most enterprises are ill-prepared to address.
The most progressive of these organizations, however, are also finding an unexpected solution in the discipline of data operations.
The transition from an on-premises big data architecture to a cloud-based or hybrid approach can expose an enterprise to several operational, compliance, and financial risks born of the complexity of both the existing infrastructure and the transition process.
In many cases, however, these risks are not discovered until the transition is well underway — and often after an organization has already been exposed to some kind of meaningful risk or negative business impact.
As enterprises begin the process of migration, they often discover that their big data workloads are exceedingly complex, difficult to understand, and that their teams lack the skills necessary to manage the transition to public cloud infrastructure.
Once the migrations are underway, the complexity and variability of cloud platforms can make it difficult for teams to effectively manage the movement of these in-production workloads while they navigate the alphabet soup of options (IaaS, PaaS, SaaS, lift and shift, refactoring, cloud-native approaches, and so on) to find the proper balance of agility, elasticity, performance, and cost-effectiveness.
During this transitional period, organizations often suffer from runaway, unexpected costs and significant operational challenges that threaten the viability of these critical workloads.
Worse yet, the initial ease of transition (spinning up instances and so on) belies the underlying complexity, creating a false sense of security that is soon gloriously smashed.
Even after those initial challenges are overcome, or at least somewhat mitigated, organizations find that these critical, resource-intensive workloads present an ongoing management and optimization challenge once they are running in the cloud.
While there is no question that the value driving the move of these workloads to the cloud remains valid and worthwhile, enterprises often realize it only after much gnashing of teeth and long, painful weekends. The irony, however, is that these challenges are often avoidable when organizations first embrace the discipline of data operations and apply it to the migration of their big data workloads to the cloud.
The biggest reason organizations unnecessarily suffer this fate is that they buy into the sex appeal (or the executive mandate: every CIO needs to be a cloud innovator, right?) of the cloud sales pitch: just turn it on, move it over, and all is well.
While there is unquestionable value in moving all forms of operations to the cloud, organizations have repeatedly found that it is rarely as simple as merely spinning up an instance and turning on an environment.
Instead of jumping first and sorting out the challenges later, organizations must plan for the complexity of a big data workload migration. One of the simplest ways of doing so is through the application of data operations.
While data operations is a holistic, process-oriented discipline that helps organizations manage data pipelines, organizations can also apply it to significant effect during the migration process. The reason this works is that data operations uses process and performance-oriented data to manage the data pipeline and its associated workloads.
Because of this data orientation, it also provides deep visibility into those workloads — precisely what organizations require to mitigate the less desirable impacts of migrating to the cloud.
Using data operations, and tools built to support it, organizations can gather data that enables them to assess and plan their migrations to the cloud: establishing baselines, mapping workloads to instances, forecasting capacity and, ultimately, optimizing costs.
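To make the baselining and instance-mapping step concrete, here is a minimal, hypothetical sketch of the idea: use baselined peak demand (plus headroom) to pick the cheapest instance type that fits, then project a monthly cost. The instance catalog, prices, and headroom factor below are illustrative assumptions, not any cloud provider's actual offerings or pricing.

```python
# Hypothetical sketch: map baselined workload requirements to cloud
# instance types and estimate monthly cost. The catalog and prices
# below are illustrative placeholders, not real cloud pricing.

# (name, vCPUs, memory_gb, hourly_usd) -- illustrative catalog
CATALOG = [
    ("small",  4,  16, 0.20),
    ("medium", 8,  32, 0.40),
    ("large",  16, 64, 0.80),
]

def map_to_instance(peak_vcpus, peak_mem_gb, headroom=1.2):
    """Pick the cheapest catalog entry that fits the baselined peak
    demand plus a safety headroom factor."""
    need_cpu = peak_vcpus * headroom
    need_mem = peak_mem_gb * headroom
    candidates = [c for c in CATALOG if c[1] >= need_cpu and c[2] >= need_mem]
    if not candidates:
        return None  # workload would need to be split across nodes
    return min(candidates, key=lambda c: c[3])

def monthly_cost(instance, hours=730):
    """Project a monthly cost from the hourly rate (~730 hours/month)."""
    return instance[3] * hours

# Example: a workload baselined at a peak of 6 vCPUs / 24 GB
inst = map_to_instance(6, 24)
print(inst[0], round(monthly_cost(inst), 2))  # -> medium 292.0
```

Real data operations tooling does this against observed pipeline telemetry rather than hand-entered peaks, but the principle is the same: the mapping and the forecast are only as good as the baseline data behind them.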
It is this data — or more precisely, the lack of it — that is often the difference between successful big data cloud migrations and the pain and suffering that most organizations now endure.
When enterprises undertake a big data cloud migration, they must step through three core stages of the effort: planning, the migration process itself, and continual post-migration optimization.
Because most organizations lack sufficient data, their planning efforts are often limited. Likewise, lacking data and facing a pressing urgency to execute, they often rush their migration efforts and charge full-steam ahead. The result is that they spend most of their time and resources on post-migration optimization, where the work is both most expensive and most damaging to the organization.
Flipping this process around and minimizing the risk, costs, and negative impact of a migration requires the same key ingredient during each of these three critical stages: visibility.
Organizations need the ability to capture and harness data that enables migration teams to understand the precise nature of their workloads, map workload requirements to cloud-based instances, forecast capacity demands over time, and dynamically manage all of this as workload requirements change and the pipeline transitions and matures.
Most importantly, this visibility also enables organizations to plan and manage phased migrations in which they migrate only slices of applications at a time, based on specific and targeted demands and requirements. This approach not only enables faster migrations, but also reduces both cost and risk.
Of course, this type of data-centric visibility demands tools highly tuned to the specific needs of data operations. Therefore, those organizations that are taking this more progressive and managed approach to big data migrations are turning to purpose-built data operations tools, such as Unravel, to help them manage the process.
The business case for moving big data operations to the cloud is clear. The pathway to doing so without inflicting significant negative impact on the organization, however, is less so.
Leading organizations, therefore, have recognized that they can leverage the data operations discipline to smooth this process and give them the visibility they need to realize value from the cloud, without taking on unnecessary cost, risk or pain.
Copyright © Intellyx LLC. As of the time of writing, Unravel is an Intellyx customer. Intellyx retains final editorial control of this paper.