
AWS re:Invent 2025 - Keynote with Peter DeSantis and Dave Brown

In this keynote, Peter DeSantis and Dave Brown, AWS Vice Presidents, deliver a deep dive into the technology powering AWS services, emphasizing their unique approach and culture of innovation from silicon to services.

Key takeaways from the video include:

Core Attributes of AWS Cloud (1:53-7:01)

Peter DeSantis highlighted the foundational principles that guide AWS's development:

  • Security: Emphasized as the first priority, especially with the rise of AI tools being used by malicious actors (2:32)
  • Availability: Crucial for the sheer size and scale of modern applications (2:56)
  • Elasticity: The ability to scale capacity on demand, removing the need for customers to plan infrastructure (3:18). This is being extended to AI workloads to provide the same level of elasticity as services like S3 (3:59)
  • Cost Optimization: AWS is heavily investing in lowering the cost of building and running AI models and workloads (4:27)
  • Agility: The cloud enables companies to move, innovate, and pivot faster (5:26). AWS fosters this by providing a broad and deep set of building blocks (6:13)

Deep Investments in Hardware and Architecture (7:12-12:00)

DeSantis elaborated on AWS's long-term commitment to deep investments, particularly in custom silicon:

Nitro System (9:27)

Introduced as a breakthrough in 2010 to solve "jitter" and achieve bare-metal performance by offloading virtualization to dedicated hardware. Nitro also enhances security and performance and supports a wide range of instance types (9:49). The system is now covered in the seventh edition of a classic computer science textbook (10:51).

Custom Chips (11:49)

Nitro led AWS to build its own chips, including Graviton (server processors) and Trainium (AI chips).

Graviton Processors (12:00-21:29)

Dave Brown discussed Graviton processors, designed specifically for cloud workloads:

Purpose-built for Cloud (12:49)

Graviton was designed from the ground up to deliver the best price-performance for everyday cloud workloads. Organizations like Adobe, Epic Games, Formula 1, Pinterest, and SAP have seen significant performance improvements and cost reductions with Graviton (13:16).

Innovation in Cooling (15:58)

AWS developed a direct-to-silicon cooling solution for Graviton, removing traditional thermal layers to improve efficiency and reduce fan power by 33%.

Graviton 4 (18:28)

Graviton 4 doubled the L2 cache per core (from 1 MB to 2 MB), increased the core count by 50%, and grew the L3 cache by 12%, delivering up to 30% better performance than Graviton 3 (18:37). It also added a coherent link between two CPUs, scaling a single instance to up to 192 vCPUs (19:31).

Graviton 5 (20:16)

Announcement: Graviton 5 delivers 192 cores in one package with over five times the L3 cache of previous generations, resulting in 2.6 times more L3 cache per core (20:24).

Announcement: The AWS Graviton 5-based Amazon EC2 M9g instances were announced, offering up to 25% better performance than M8g and the best price-performance in EC2 today (20:44). Early customers like Airbnb, Atlassian, Honeycomb.io, and SAP are already seeing significant improvements (20:59).

Apple's Use of Swift on AWS Graviton (21:30-27:57)

Pam Murashidi from Apple discussed their experience building large internet services on AWS:

  • Apple uses AWS for services like App Store, Apple Music, Apple TV, and Podcasts (22:05)
  • They faced scalability and support challenges with older languages such as Java and C++ (23:32)
  • Apple's Swift language, known for performance, modern design, and safety features, is being increasingly adopted for server-side development, including on AWS Graviton (23:53)
  • A key example is their spam detection feature in iOS 26, which uses Swift on AWS Graviton for compute-heavy operations, maintaining privacy for hundreds of millions of users with homomorphic encryption (25:50)

Announcement: A native Swift toolchain is now available for Amazon Linux, making it the first distribution with an official package (27:33).

Evolution of Serverless Compute (28:16-35:41)

DeSantis discussed the evolution of compute, specifically the serverless paradigm:

Lambda's Origin (29:56)

Born from the need to process S3 images efficiently without managing server fleets, Lambda enabled developers to focus on code rather than infrastructure (31:26).
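The pattern that motivated Lambda can be sketched as a minimal handler reacting to S3 object events. This is an illustrative sketch, not code from the keynote: the bucket and key names are made up, and a real function would fetch the object (e.g. via boto3) and write a processed result back.

```python
# Minimal sketch of the pattern Lambda popularized: a handler invoked once
# per S3 object event, with no server fleet for the developer to manage.
# (Illustrative only; real image processing would happen where noted.)

def handler(event: dict, context=None) -> dict:
    processed = []
    for record in event.get("Records", []):
        # Standard S3 event notification structure.
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Image-processing work (resize, thumbnail, etc.) would happen here.
        processed.append(f"s3://{bucket}/{key}")
    return {"status": "ok", "processed": processed}

# Hypothetical sample event, shaped like an S3 put notification:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos"}, "object": {"key": "cat.jpg"}}}
    ]
}
print(handler(sample_event))
```

The point of the model is visible in the signature: the developer writes only the per-object logic, and the platform handles invocation, scaling, and fleet management.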

Lambda Managed Instances (34:09)

Announcement: Lambda Managed Instances represent the next evolution of serverless compute, bridging the gap between serverless simplicity and infrastructure control. With managed instances, Lambda functions run on EC2 instances within a customer's account, with Lambda handling provisioning, patching, availability, and scaling. Customers choose the instance type and hardware, while retaining Lambda's simplicity and operational model (34:15).

Takeaway: This opens doors for workloads previously outside Lambda, such as video processing, ML pre-processing, and high-throughput analytics (34:53).

Innovations in AI Inference (35:45-43:32)

DeSantis detailed AWS's approach to optimizing AI inference workloads:

Inference Challenges (36:27)

AI inference is complex, involving a four-stage pipeline (tokenization, prefill, decode, detokenization) with varying resource demands and latency sensitivities.
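The four stages above can be sketched as a toy pipeline. This is purely illustrative (the "model" is a placeholder rule, not a neural network): real engines run prefill and decode on accelerators with batching and KV caches, which is exactly why the stages have such different resource profiles.

```python
# Toy sketch of the four inference stages described above.

def tokenize(prompt: str, vocab: dict[str, int]) -> list[int]:
    # CPU-bound and cheap: map text to token ids.
    return [vocab.setdefault(w, len(vocab)) for w in prompt.split()]

def prefill(tokens: list[int]) -> list[int]:
    # Compute-bound: the whole prompt is processed in parallel to build
    # context (stands in for building the KV cache).
    return list(tokens)

def decode(context: list[int], max_new: int) -> list[int]:
    # Memory-bandwidth-bound and sequential: each new token depends on
    # everything generated so far, one step at a time.
    out = list(context)
    for _ in range(max_new):
        out.append(sum(out) % 100)  # placeholder "next token" rule
    return out[len(context):]

def detokenize(tokens: list[int], rev: dict[int, str]) -> str:
    # CPU-bound: map token ids back to text.
    return " ".join(rev.get(t, f"<{t}>") for t in tokens)

vocab: dict[str, int] = {}
ctx = prefill(tokenize("hello world", vocab))
new_tokens = decode(ctx, max_new=3)
print(detokenize(new_tokens, {v: k for k, v in vocab.items()}))
```

The mismatch between the parallel prefill step and the strictly sequential decode loop is the scheduling problem the rest of this section addresses.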

Project Mantle (38:31)

Announcement: A new inference engine powering many Amazon Bedrock models.

Bedrock Service Tiers (38:48)

Announcement: Bedrock Service Tiers allow customers to assign inference requests to different priority lanes (priority, standard, flex) to optimize for latency or efficiency.

Takeaway: Bedrock ensures fair performance by giving each customer their own queue, isolating performance from other customers' bursts (39:21).
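The combination of priority lanes and per-customer queues can be sketched as a small scheduler. This is an assumption-laden illustration of the idea, not Bedrock's implementation: higher tiers drain first, and within a tier, customers are served round-robin so one customer's burst cannot starve another.

```python
from collections import defaultdict, deque

# Tier names taken from the keynote; everything else is illustrative.
TIER_RANK = {"priority": 0, "standard": 1, "flex": 2}

class Scheduler:
    def __init__(self):
        # tier -> customer -> FIFO queue of that customer's requests
        self.lanes = {t: defaultdict(deque) for t in TIER_RANK}
        # round-robin order of customers with pending work, per tier
        self.rr = {t: deque() for t in TIER_RANK}

    def submit(self, customer: str, tier: str, request: str) -> None:
        if not self.lanes[tier][customer]:
            self.rr[tier].append(customer)
        self.lanes[tier][customer].append(request)

    def next_request(self):
        # Drain higher-priority tiers first; within a tier, rotate customers.
        for tier in sorted(TIER_RANK, key=TIER_RANK.get):
            if self.rr[tier]:
                customer = self.rr[tier].popleft()
                q = self.lanes[tier][customer]
                req = q.popleft()
                if q:  # customer still has work: back of the rotation
                    self.rr[tier].append(customer)
                return customer, req
        return None

s = Scheduler()
s.submit("cust-a", "standard", "req-1")
s.submit("cust-a", "standard", "req-2")  # cust-a bursts two requests
s.submit("cust-b", "standard", "req-3")
s.submit("cust-c", "priority", "req-4")
```

Draining this scheduler serves `req-4` first (priority tier), then alternates `cust-a` and `cust-b` in the standard tier, so the burst from `cust-a` does not delay `cust-b`.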

Journal (40:27)

Announcement: A durable transaction log that continuously captures the state of each inference request, allowing requests to resume from failure points and enabling more advanced, fine-grained scheduling strategies (40:51).
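The core journaling idea can be sketched as an append-only log where the last entry per request wins. This is a minimal illustration of the concept, not Bedrock's implementation; the request ids and state fields are made up.

```python
import json
import os
import tempfile

# Sketch: append the state of each request to a durable log, so a
# replacement worker can replay the log and resume unfinished requests
# instead of restarting them from scratch.

class Journal:
    def __init__(self, path: str):
        self.path = path

    def append(self, request_id: str, state: dict) -> None:
        # Append-only writes: never rewrite history, only add to it.
        with open(self.path, "a") as f:
            f.write(json.dumps({"id": request_id, **state}) + "\n")

    def replay(self) -> dict:
        # Reconstruct current state: the last entry per request wins.
        latest: dict = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    entry = json.loads(line)
                    latest[entry["id"]] = entry
        return latest

path = os.path.join(tempfile.mkdtemp(), "journal.log")
j = Journal(path)
j.append("req-1", {"stage": "prefill", "tokens_done": 0})
j.append("req-1", {"stage": "decode", "tokens_done": 42})
j.append("req-2", {"stage": "prefill", "tokens_done": 0})

# A replacement worker opens the same log and picks up where work stopped:
state = Journal(path).replay()
print(state["req-1"]["stage"], state["req-1"]["tokens_done"])
```

Because the log is the source of truth, a worker failure mid-decode loses no progress: replay shows `req-1` already 42 tokens into decoding.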

Security and Privacy (41:46)

Bedrock integrates confidential computing to protect model weights and customer data during inference, ensuring data encryption and cryptographic assurance.

Deep Integration with AWS (42:24)

Bedrock offers tool calling support for Lambda functions, integration with OpenAI's responses API, IAM for permissions, and CloudWatch for observability.

Vector Search Capabilities (44:00-48:20)

DeSantis concluded by highlighting advancements in vector search:

Understanding Vectors (45:06)

Vectors represent data points in a mathematical space, allowing computers to understand relationships between concepts and ideas, similar to human cognition.

High-Dimensional Vectors (46:11)

Real-world vectors can have thousands of dimensions, generated by AI embedding models that learn patterns and cluster objects.

Vector Databases (47:07)

Unlike traditional databases, vector databases are built to find nearest matches to these vectors, enabling more nuanced and relevant searches.

Use Case (47:54)

Vector search is crucial for organizing and finding unstructured institutional knowledge buried in various formats like PDFs, video calls, and documents.
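The nearest-match idea at the heart of this section can be sketched with cosine similarity over a handful of toy embeddings. This is illustrative only: the vectors below are made up and three-dimensional, whereas real embeddings have thousands of dimensions, and production vector databases use approximate indexes (e.g. HNSW) rather than a brute-force scan.

```python
import math

# Brute-force nearest-neighbor search: what a vector database does at its
# core, minus the indexing that makes it fast at scale.

def cosine(a: list[float], b: list[float]) -> float:
    # Similarity of direction: 1.0 means "pointing the same way".
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings for three documents (made up for illustration):
docs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}

query = [0.85, 0.15, 0.05]  # pretend embedding of "kitten"
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)
```

Even in this toy space the query lands nearest "cat", ahead of the related "dog" and far from the unrelated "car", which is the behavior that makes vector search useful for unstructured knowledge.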
