Migrating big data workloads to the cloud remains a key priority for our customers, and Azure HDInsight is committed to making that journey simple and cost effective. HDInsight partners with Unravel, whose mission is to reduce the complexity of delivering reliable application performance when migrating data from on-premises or another cloud platform onto HDInsight. Unravel’s Application Performance Management (APM) platform provides unified visibility and operational intelligence to help plan and optimize the migration process onto HDInsight.
In our previous blog we discussed why the cloud is a great fit for big data and provided a broad view of what the journey to the cloud looks like, phase by phase. In this installment and the following parts we will examine each stage in that life cycle, diving into the planning, migration, operation and optimization phases. This blog post focuses on the planning phase.
In the planning stage you must understand your current environment, determine high-priority applications to migrate, and set a performance baseline so that you can measure and compare your on-premises clusters against your Azure HDInsight clusters. This raises a number of questions that must be answered during the planning phase.
To effectively migrate big data pipelines from physical to virtual data centers, one needs to understand the dynamics of on-premises workloads, usage patterns, resource consumption, dependencies and a host of other factors.
Unravel provides detailed reports of on-premises clusters including total memory, disk, number of hosts, and number of cores used. This cluster discovery report also delivers insights on cluster topology, running services, operating system version and more. Resource usage heatmaps can be used to determine any unique needs for Azure.
Unravel can highlight application workload seasonality by user, department, application type and more to help calibrate and make the best use of Azure resources. This type of reporting can greatly aid in HDInsight cluster design choices (size, scale, storage, scalability options, etc.) to maximize your ROI on Azure expenses.
Unravel also provides data insights to enable decision making on the best strategy for storage in the cloud, by looking at specific metrics on usage patterns of tables and partitions in the on-premises cluster.
It can also identify unused or cold data. Once such data is identified, one can decide on the appropriate layout for it in the cloud and make the best use of the Azure budget. Based on this information, datasets can be distributed most effectively across HDInsight storage options. For example, the hottest data can be stored on disk or in the highly performant object storage of Azure Data Lake Storage Gen2 (hot), and the least used data in the relatively less performant Azure Blob storage (cold).
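To make the idea concrete, a tiering decision like this can be sketched as a simple policy that buckets each dataset by how recently it was accessed. This is an illustrative sketch only, not Unravel’s actual logic; the day thresholds and tier names are hypothetical:

```python
from datetime import datetime

# Hypothetical thresholds (days since last access) for illustration only.
HOT_DAYS = 30    # frequently read  -> ADLS Gen2 (hot)
COOL_DAYS = 180  # occasionally read -> Blob storage (cool)

def pick_tier(last_access: datetime, now: datetime) -> str:
    """Classify a dataset into a storage tier by access recency."""
    age_days = (now - last_access).days
    if age_days <= HOT_DAYS:
        return "adls-gen2-hot"
    if age_days <= COOL_DAYS:
        return "blob-cool"
    return "blob-archive"  # cold / unused data

# Invented sample datasets with last-access timestamps.
now = datetime(2019, 6, 1)
datasets = {
    "atm_transactions": datetime(2019, 5, 28),
    "charge_locations": datetime(2019, 1, 15),
    "legacy_audit":     datetime(2017, 3, 2),
}
for name, last in datasets.items():
    print(name, "->", pick_tier(last, now))
```

In practice the access-recency numbers would come from table and partition usage metrics like those Unravel reports, rather than being hand-entered.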
There are two main options for migrating data from on-premises to Azure. You can learn more about the processes and data migration best practices in the Azure documentation.
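For the network-transfer path, one common approach (an assumption here, not a prescription from this post) is Hadoop DistCp copying directly from HDFS into ADLS Gen2. The account, container, and paths below are placeholders, and the source cluster must already be configured with credentials for the target storage account:

```shell
# Sketch: copy an on-premises HDFS directory to ADLS Gen2 with DistCp.
# -m caps the number of parallel map tasks doing the copy.
hadoop distcp \
  -m 20 \
  hdfs://onprem-namenode:8020/warehouse/sales \
  abfs://<container>@<account>.dfs.core.windows.net/warehouse/sales
```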
Once you’ve identified which workloads to migrate, the planning gets a little more involved, requiring a proper APM tool to get the rest right. For everything to work properly in the cloud, you need to map out workload dependencies as they currently exist on-premises. This may be challenging when done manually, as these workloads rely on many different complex components. Incorrectly mapping these dependencies is one of the most common causes of big data application breakdowns in the cloud.
The Unravel platform provides a comprehensive and immediate readout of all the big data stack components involved in a workload. For example, it could tell you that a streaming app is using Kafka, HBase, Spark, and Storm, detail each component’s relationship with the others, and quantify how much the app relies on each of these technologies. Knowing that the workload relies far more on Spark than Storm allows you to avoid under-provisioning Spark resources and over-provisioning Storm resources in the cloud.
Organizations face a similar challenge in determining the resources, such as disk, compute, and memory, that they will need for a workload to run efficiently in the cloud. It is difficult to determine utilization metrics for these resources on on-premises clusters, and to know which services are consuming them. Unravel provides reports that bring forth precise quantitative metrics on the resources consumed by each big data workload. If resources have been overprovisioned and thereby wasted, as many organizations unknowingly do, the platform provides recommendations to reconfigure applications to maximize efficiency and optimize spend. These resource settings are then translated to Azure.
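On a Hadoop-based on-premises cluster, one raw source for such utilization numbers (outside any APM tool) is YARN’s ResourceManager REST API at `/ws/v1/cluster/metrics`. The sketch below, with an invented sample payload shaped like that API’s response, shows how memory and vcore utilization percentages can be derived:

```python
# Sketch: derive utilization percentages from a YARN ResourceManager
# /ws/v1/cluster/metrics response. The sample payload is invented; a real
# one would come from an HTTP GET against
#   http://<rm-host>:8088/ws/v1/cluster/metrics
sample = {
    "clusterMetrics": {
        "allocatedMB": 393216,
        "totalMB": 524288,
        "allocatedVirtualCores": 96,
        "totalVirtualCores": 160,
    }
}

def utilization(payload: dict) -> dict:
    """Return cluster memory and vcore utilization as percentages."""
    m = payload["clusterMetrics"]
    return {
        "memory_pct": round(100 * m["allocatedMB"] / m["totalMB"], 1),
        "vcore_pct": round(100 * m["allocatedVirtualCores"]
                           / m["totalVirtualCores"], 1),
    }

print(utilization(sample))  # -> {'memory_pct': 75.0, 'vcore_pct': 60.0}
```

Snapshots like this, collected over time and broken down per queue or application, are the kind of input a right-sizing exercise needs.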
Since cloud adoption is an ongoing and iterative process, customers might want to look ahead and think about how resource needs will evolve throughout the year as business needs change. Unravel leverages predictive analytics based on previous trends to determine resource requirements in the cloud for up to six months out.
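In its simplest form, trend-based capacity forecasting fits a line to historical usage and extrapolates. The sketch below is a minimal illustration of that idea, not Unravel’s model; the monthly peak-memory history is invented:

```python
# Minimal sketch of trend-based capacity forecasting: fit a least-squares
# line to monthly peak memory usage (GB) and extrapolate six months ahead.
def fit_line(ys):
    """Least-squares slope and intercept for y over x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

monthly_peak_gb = [400, 420, 445, 460, 480, 505]  # invented history
slope, intercept = fit_line(monthly_peak_gb)
for month_ahead in range(1, 7):
    x = len(monthly_peak_gb) - 1 + month_ahead
    print(f"month +{month_ahead}: ~{slope * x + intercept:.0f} GB")
```

A real forecast would also account for the seasonality discussed earlier (weekly and monthly cycles), not just a linear trend.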
For example, workloads such as fraud detection employ several datasets, including ATM transaction data, customer account data, charge location data, and government fraud data. Once in Azure, some apps require certain datasets to remain in Azure in order to work properly, while other datasets can remain on-premises without issue. As with app dependency mapping, it’s difficult to determine which datasets an app needs to run properly. Other considerations are security, data governance laws (some sensitive data must remain in private datacenters in certain jurisdictions), and the size of the data. Based on Unravel’s resource management and capacity planning reports, customers can efficiently manage data placement across HDInsight storage options and on-premises to best suit their business requirements.
Unravel brings additional visibility and predictive capabilities that can remove much of the mystery and guesswork around Azure migrations. Unravel analyzes your big data workloads both on-premises and on Azure HDInsight, and can provide chargeback reporting by user, department, application type, queue, or other customer-defined tags.
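At its core, chargeback reporting is an aggregation of resource consumption by tag, priced at some agreed rate. The sketch below illustrates that shape with invented job records and a hypothetical blended rate; it is not Unravel’s implementation:

```python
from collections import defaultdict

# Invented job records: (department tag, vcore-hours consumed).
jobs = [
    ("fraud-detection", 120.0),
    ("marketing", 45.5),
    ("fraud-detection", 80.0),
    ("etl", 300.25),
]
RATE_PER_VCORE_HOUR = 0.05  # hypothetical blended rate in USD

def chargeback(records):
    """Sum cost per tag: vcore-hours times the blended rate."""
    totals = defaultdict(float)
    for tag, vcore_hours in records:
        totals[tag] += vcore_hours * RATE_PER_VCORE_HOUR
    return dict(totals)

print(chargeback(jobs))
```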
As the final part of the planning phase, one needs to decide on the scale, VM sizes, and type of Azure HDInsight clusters to fit the workload. This depends on the business use case and the priority of the given workload. For example, a recommendation engine that must meet a stringent SLA at all times might require an autoscaling HDInsight cluster, so that it always has the compute resources it needs but can also scale down during lean periods to optimize costs. Conversely, for a workload with fixed resource requirements, such as a predictable batch processing app, one might deploy manually scaled HDInsight clusters and size them optimally with the right VM sizes to keep costs under control.
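For the autoscaling case, a cluster like the recommendation engine above could be provisioned with load-based autoscale via the Azure CLI. This is a sketch assuming the `az hdinsight create` autoscale flags; resource names, sizes, and bounds are placeholders to adapt:

```shell
# Sketch: create a load-based autoscaling Spark cluster.
# Names, VM size, and node-count bounds are placeholders.
az hdinsight create \
  --name recs-engine-cluster \
  --resource-group my-rg \
  --type spark \
  --http-password "<cluster-password>" \
  --storage-account mystorageaccount \
  --workernode-size Standard_D13_V2 \
  --autoscale-type Load \
  --autoscale-min-workernode-count 3 \
  --autoscale-max-workernode-count 10
```

For the fixed-requirement batch case, one would instead omit the autoscale flags and set a fixed `--workernode-count`.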
Since the choice of HDInsight VM instances is key to the success of the migration, Unravel can infer the seasonality of big data workloads and deliver recommendations for optimal server instance sizes in minutes instead of hours or days.
Given the default virtual machine sizes that Microsoft provides for HDInsight clusters, Unravel adds intelligence to help choose the correct virtual machine sizes for data workloads based on three different migration strategies.
The planning phase is the critical first step in any workload migration to HDInsight. Many organizations lack the kind of quantitative and qualitative guidance that Unravel APM provides during this critical process, and may face downstream challenges in workload execution and cost optimization. Unravel’s robust APM platform can help navigate this complexity by providing tools for mapping workload dependencies, forecasting resource usage, and guiding decisions on which datasets to move, which in turn makes the migration process more efficient, data driven, and ultimately successful.
In our upcoming blog, we’ll look closely at migration to HDInsight.