Press Releases Product 5 Min Read

Unravel Expands Data Ingestion and MPP Support with Apache Kafka and Cloudera Impala


Unravel Extends Its Industry-Leading Position in Application Performance Management for Big Data Operations

MENLO PARK, CA - June 6, 2017 - Unravel Data, the industry’s only Application Performance Management (APM) platform designed for Big Data, today announced that it has integrated support for Cloudera Impala and Apache Kafka into its platform, allowing users to derive the maximum value from those applications. Unravel continues to offer the only full-stack solution that doesn’t just monitor and unify system-level data, but rather tracks, correlates, and interprets performance data across the full stack in order to optimize, troubleshoot, and analyze from a single pane. Following closely on the heels of its recent announcement of Unravel 4.0, the platform continues to advance in its support of the industry’s leading mission-critical Big Data applications.

Impala and Kafka have become critical components of the Big Data stack. As companies continue to adopt multiple Big Data technologies for their needs, the complexity and time required to diagnose and resolve performance problems have grown exponentially. In fact, according to Gartner’s March 2017 Market Guide for Hadoop Operations Providers, “Scaling Hadoop from small, pilot projects to large-scale production clusters involves a steep learning curve in terms of operational know-how that many enterprises are unprepared for.” The challenge is to take an application-first approach to manage a single view of performance within and across the many components of the stack that underpin applications like ETL, Machine Learning, and analytics. Unravel was the first company—and remains the only company—to tackle this challenge across the Big Data stack.


Cloudera Impala is an SQL query engine that runs on Apache Hadoop. It brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. But Impala queries can suffer from poor performance and failures.

Unravel for Impala boosts query performance and improves reliability. Unravel provides visuals and deep analysis for Impala queries, helping users quickly understand why a query is slow. Unravel automatically tunes Impala queries for the best performance by considering many possible root causes of query slowdown and taking autonomous measures to improve query performance. These root causes include resource contention on some Impala workers, memory allocation problems, excessive data spilling, suboptimal join processing, data skew, and poor choices of data partitioning, storage, or caching, among many others.

Unravel also enables users to understand the true multi-tenant nature of their Impala workloads, helping them to avoid hotspots and failures.

With Unravel, users can quickly understand bottlenecks and causes of unpredictable performance in their Impala-based applications, allowing them to deliver on their SLAs (Service Level Agreements). Users can also identify which SQL queries from engines such as Hive or Spark are good candidates for running in Impala, and which queries are not.


Apache Kafka is an open-source stream processing platform that aims to provide a unified, high-throughput, low-latency methodology for handling real-time data feeds. Kafka has seen surging adoption—in fact, in a recent survey, 86 percent of respondents indicated that they were increasing their Kafka use. However, because Kafka is widely integrated into enterprise-level infrastructures, monitoring Kafka performance at scale has become an increasingly important issue.

In addition, while Big Data applications use Kafka as a data source for a variety of use cases (such as ingesting data into clusters, providing live data feeds to enable real-time processing, and replicating input data across clusters and data centers to provide 24×7 application uptime), managing these applications has become problematic. Today, problems with Kafka can arise at the level of applications and multi-tenant infrastructures, but also within Kafka itself.

Unravel for Kafka manages not only the performance, but also the predictability and reliability, of Kafka-based applications. For example, it correlates application KPIs with infrastructure and Kafka performance metrics, allowing rapid problem diagnosis in general as well as fast failure detection for Kafka-based applications.

In addition, Unravel troubleshoots and tunes applications that are unable to keep up with input data rates in Kafka. Unravel also remediates load imbalances in Kafka caused by factors such as poor data partitioning or multi-tenancy. And because Unravel understands resource allocation across the entire Big Data stack, it helps in planning Kafka capacity, which is critical to meeting application SLAs.
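As a rough illustration of the two conditions described above, the following sketch computes per-partition consumer lag (a sign that an application is not keeping up with its input rate) and a simple partition imbalance ratio (a sign of poor data partitioning). It is not Unravel's method or API; the function names, the sample offsets, and the choice of "largest partition divided by the mean" as an imbalance measure are all assumptions made for illustration.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: messages written to the log but not yet consumed."""
    return {p: max(end_offsets[p] - committed.get(p, 0), 0) for p in end_offsets}


def imbalance_ratio(end_offsets: Dict[int, int]) -> float:
    """Largest partition divided by the mean partition size.

    Assumes offsets start at 0 and nothing has been deleted by retention,
    so an end offset approximates the number of messages in the partition.
    A value of 1.0 means the partitions are perfectly balanced.
    """
    sizes = list(end_offsets.values())
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 1.0


# Hypothetical snapshot of a three-partition topic (illustrative numbers only).
end_offsets = {0: 1000, 1: 1200, 2: 4000}   # latest offset per partition
committed = {0: 990, 1: 1150, 2: 2500}      # consumer group's committed offset

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
skew = imbalance_ratio(end_offsets)

print(lag)
print(total_lag)
print(round(skew, 2))
```

In this made-up snapshot, partition 2 holds most of the data and most of the lag, which is the kind of pattern that points to a partitioning problem rather than a uniformly slow consumer.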

“There are arguably two forces at work in the evolution of Big Data,” said Kunal Agarwal, CEO at Unravel. “First is the proliferation of applications (such as Hadoop and Spark, and now Impala and Kafka), each with its own role in parsing, ingesting, or analyzing data to help owners of that data profit from it. Second is the growing number of interactions between apps on the stack, which are multiplying almost geometrically. In such an environment, something will break; in fact, it’s bound to break. Big Data practitioners come to the table with that understanding, and yet they have no tools to cope with that complexity. And the challenges continue to multiply with the exponential growth of data in most organizations. Unravel Data, from conception, to launch, to mainstream adoption by users, is helping users find their way in making Big Data live up to its promise.”


Unravel 4.0, with support for Impala and Kafka, is available now. Companies such as Autodesk are already using Unravel Data to manage the performance of their production Big Data systems. Unravel Data is available immediately for on-premises, cloud, or hybrid Big Data deployments. Unravel Data currently supports Hadoop, Spark, Impala, and Kafka, with plans to add support for other systems, such as data ingestion systems (Storm, Flume), NoSQL systems (Cassandra, HBase), and MPP systems (Redshift, Presto, Drill). Unravel Data fully supports secure deployments with Kerberos, Apache Sentry, encryption, etc.
