This blog post originally appeared on Medium.
One of the most highly anticipated events every year is the keynote from Dr. Werner Vogels at the annual AWS re:Invent conference. As CTO of Amazon, Dr. Vogels has considerable influence on product and engineering innovation that directly impacts hundreds of millions of users and developers. Here are three takeaways from Dr. Vogels’ keynote this year.
1. Cloud Power is on a Tear
Dr. Vogels gave numerous examples of the sheer power and acceleration that the cloud gives to product and engineering teams. A few that caught my attention were:
- Nicole Yip from the Lego Group came to stage to talk about how Lego went from a monolithic on-premises deployment that simply could not handle spiky workloads to a cloud-native architecture that could. I could relate to Nicole’s talk. My son is into Lego Robotics big time. Our whole family stayed up at night to purchase the new Lego 51515 Robot Inventor Kit when it was launched on Oct 15. We were thrilled when our order went through hassle-free as soon as the sale opened.
- Testing how resilient an application is to failure and load is hard, and I have often seen developers cut corners for lack of experience or tooling. Dr. Vogels announced the imminent launch of the AWS Fault Injection Simulator, which is meant to make it much easier to discover an application’s weaknesses at scale. In effect, it brings “chaos engineering as a service” to every developer, not just to the giant engineering teams at companies like Alibaba, Google, and Netflix.
- The AWS Graviton2 processor was cited as giving 40% better performance at 20% lower cost, and it can be readily used through newer EC2 instance types such as C6g and R6g.
- Formal verification was used to show that S3 now delivers strong read-after-write consistency and not just eventual consistency. This guarantee should open the door to a new wave of data-intensive applications that can rely on S3’s strong consistency along with its low cost, high scale, high concurrency, and high availability.
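The chaos-engineering idea in the list above can be illustrated with a small sketch. This is not the AWS Fault Injection Simulator API; it is a toy experiment in plain Python, and every name in it (`with_faults`, `call_with_retries`, `success_rate`, `fetch_price`) is hypothetical. It wraps a call so that a chosen fraction of invocations fail, then measures how often the caller still succeeds with and without retries.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_faults(func, failure_rate, rng):
    """Wrap func so that roughly `failure_rate` of calls fail before reaching it."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Call func up to `attempts` times and re-raise the last injected fault."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def success_rate(func, attempts, trials):
    """Fraction of trials that succeed when each trial may retry."""
    succeeded = 0
    for _ in range(trials):
        try:
            call_with_retries(func, attempts)
            succeeded += 1
        except InjectedFault:
            pass
    return succeeded / trials


def fetch_price():
    # Hypothetical stand-in for a call to a downstream service.
    return 42


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the experiment is repeatable
    flaky = with_faults(fetch_price, failure_rate=0.3, rng=rng)
    print("no retries:", success_rate(flaky, attempts=1, trials=1000))
    print("3 attempts:", success_rate(flaky, attempts=3, trials=1000))
```

Even at this scale the experiment surfaces the kind of question a managed service is meant to answer for real systems: how much failure a dependency can inject before the caller's own error rate becomes unacceptable.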
2. With Great (Cloud) Power comes Great (Operational) Responsibility
Amongst all these spectacular developments, Dr. Vogels cautioned how the time and effort spent developing any application are typically small compared to the time and effort needed to maintain the application in production (aka operations). A lot of challenges arise from the increasing complexity and strict SLAs as applications go into production: unpredictable data volumes, data skew and load imbalance, unforeseen bottlenecks, contention with other applications, slowness or failures of dependent components, and others.
Operating an application without observability into it, and into the platforms, services, and tenants whose performance can affect it, is like flying a plane through heavy clouds without flight instruments (no pun intended!). Dr. Vogels announced two observability initiatives from AWS:
- With hundreds of thousands of deployments, Prometheus is arguably now the de facto standard for collecting metrics in cloud-native architectures. And for creating dashboards, you can’t go wrong with Grafana. Thus, I was fully expecting AWS to launch managed services for Prometheus and Grafana — which were indeed among Dr. Vogels’ key announcements.
- OpenTelemetry is an open standard for capturing telemetry data that is gaining momentum. It provides a software framework to collect metrics, distributed traces, resource metadata, and logs from the entire software stack and to send this data to supported backends like Prometheus and Elasticsearch. The AWS distro for OpenTelemetry brings OpenTelemetry support to AWS services like EC2, Elastic Kubernetes Service (EKS), and AWS Lambda.
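To make the metrics side of this concrete, here is a minimal sketch of the plain-text exposition format that a Prometheus server scrapes from an application. It uses only the Python standard library rather than an official client, and the function names and the sample metric are hypothetical; a real service would normally rely on a Prometheus or OpenTelemetry client library instead.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs, where `labels` is a dict.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable from one scrape to the next.
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical counter with one labelled sample.
    print(
        render_metric(
            "http_requests_total",
            "counter",
            "Total HTTP requests.",
            [({"method": "get", "code": "200"}, 1027)],
        ),
        end="",
    )
```

Because the format is just labelled name-value lines, the same measurements can be dashboarded in Grafana or forwarded by a collector without the application knowing which backend is in use.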
3. Innovation using Data-driven Algorithms and AI/ML: The Next Frontier
My favorite Dr. Vogels quote from the keynote was: “We have covered a lot of ground today. We have talked about the importance of development, how to build dependable applications, and how to effectively run them. If you paid close attention, you will notice that there has been a trend with all these things. More and more, AWS is taking tasks that can be slow, difficult, or time-consuming, and making them easier to use by using advanced technologies to simplify them. These technologies can include automated reasoning or even machine learning.”
Case in point: the large and diverse telemetry data collected from modern applications and systems overwhelms the capabilities of even the most skilled developers and operators today. The rise of frameworks like OpenTelemetry and AWS Fault Injection Simulator adds to the volume and diversity of the telemetry data collected. As Dr. Vogels points out, innovation in automated methods (powered by the likes of AI/ML and formal verification) becomes critical to ensuring reliable, efficient, and streamlined operations.
I couldn’t agree more, since this is exactly why our customers use the Unravel product to simplify their big data operations. Join us to build the future of automation from telemetry data, or take our product for a spin on your operational challenges with a free trial.