AWS re:Invent 2025 - Building Production-Grade Workflow Patterns with AWS Step Functions (API313)
This presentation by Eric Johnson, a Principal Developer Advocate for serverless at AWS, focuses on building production-grade workflow patterns with AWS Step Functions. He introduces the concept using a hypothetical scenario of "Tony's Pizza" experiencing massive orchestration challenges due to a "Lambda spaghetti architecture" (3:51-5:13). The core of the presentation revolves around how Step Functions can address these issues and optimize costs.
Step Functions Superpowers (6:52-19:50)
1. Drag and Drop Visual Design
This feature lets you build workflows visually by dragging and dropping states. Because the designer exposes the AWS SDK integrations, thousands of service actions are available as workflow steps. It also provides a real-time execution view and can be used directly within an IDE through the AWS Toolkit (7:05-8:50).
2. Direct Service Integration
Step Functions can directly make SDK calls to other AWS services, eliminating the need for intermediary Lambda functions for simple tasks. This aligns with the principle of "Don't use Lambda to transport, but use Lambda to transform" (9:08-10:27).
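As a rough illustration of a direct integration, the sketch below builds an Amazon States Language (ASL) Task state as a Python dictionary that writes an order to DynamoDB with no Lambda function in between. The table name, state names, and order fields are hypothetical and not taken from the talk.

```python
import json

# Hypothetical Task state: Step Functions calls DynamoDB PutItem itself,
# so no Lambda function is needed just to transport the data.
save_order_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::dynamodb:putItem",
    "Arguments": {
        "TableName": "Orders",  # assumed table name
        "Item": {
            "orderId": {"S": "{% $states.input.orderId %}"},
            "status": {"S": "RECEIVED"},
        },
    },
    "Next": "NotifyKitchen",  # assumed next state
}

# The dictionary serializes directly to the JSON used in a state machine definition.
print(json.dumps(save_order_state, indent=2))
```

A Lambda function would still be the right choice if the order needed reshaping or validation beyond what the workflow itself can express.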
3. Built-in Error Handling
Step Functions offers robust error handling, including catch blocks for specific errors and the ability to route failed work to a dead-letter queue. This reduces the burden of writing custom error handling code (10:33-12:50).
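A minimal sketch of declarative error handling, assuming a hypothetical ChargeCard Lambda function and a RefundAndNotify fallback state. The Retry and Catch fields are standard ASL; the helper only illustrates how the retry intervals grow and ignores jitter and any maximum delay setting.

```python
# Hypothetical Task state with declarative retries and a catch-all fallback.
charge_card_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Arguments": {"FunctionName": "ChargeCard", "Payload": "{% $states.input %}"},
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    # Anything not recovered by the retrier is routed to a fallback state.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RefundAndNotify"}],
    "Next": "PrepareOrder",
}


def retry_delays(retrier):
    """Seconds waited before each retry: the interval is multiplied by BackoffRate each time."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


print(retry_delays(charge_card_state["Retry"][0]))
```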
4. Data Transformation with Variables and JSONata
Step Functions supports assigning and consuming variables to pass data between states, which significantly reduces code and transitions. JSONata replaces JSONPath as the query language and enables complex business logic and formulas within the workflow, potentially eliminating many Pass states (12:36-15:30).
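The sketch below shows one way variables and JSONata might fit together: the first state stores a computed total with Assign, and a later state reads it as $orderTotal without an intermediate Pass state. The state names, table, topic ARN, and item fields are hypothetical, and the Python helper is only a plain equivalent of the JSONata formula for comparison.

```python
# First state: write the order and keep a computed total in a workflow variable.
save_order_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::dynamodb:putItem",
    "Arguments": {
        "TableName": "Orders",  # assumed table name
        "Item": {"orderId": {"S": "{% $states.input.orderId %}"}},
    },
    "Assign": {
        "orderTotal": "{% $sum($states.input.items.(price * quantity)) %}"
    },
    "Next": "SendReceipt",
}

# Later state: consume the variable directly in the service call arguments.
send_receipt_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::sns:publish",
    "Arguments": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:receipts",  # assumed topic
        "Message": "{% 'Order total: ' & $string($orderTotal) %}",
    },
    "End": True,
}


def order_total(items):
    """Plain-Python equivalent of the JSONata formula assigned to orderTotal."""
    return sum(item["price"] * item["quantity"] for item in items)
```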
5. Advanced Workflow Conditional Logic
This enables dynamic routing and business logic with no-code branching and multiple conditions. It's crucial for building flexible and responsive workflows (15:37-16:04).
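As an illustration of branching on several conditions, here is a hypothetical Choice state in JSONata mode, followed by a plain-Python mirror of the same routing rules. The thresholds, field names, and target states are assumptions made for the example.

```python
# Hypothetical Choice state: rules are evaluated in order, first match wins.
route_order_state = {
    "Type": "Choice",
    "QueryLanguage": "JSONata",
    "Choices": [
        {
            "Condition": "{% $states.input.total > 100 and $states.input.tier = 'gold' %}",
            "Next": "PriorityPrep",
        },
        {
            "Condition": "{% $states.input.delivery = true %}",
            "Next": "AssignDriver",
        },
    ],
    "Default": "StandardPrep",
}


def route(order):
    """Plain-Python mirror of the Choice rules above, for reasoning about the branches."""
    if order["total"] > 100 and order["tier"] == "gold":
        return "PriorityPrep"
    if order["delivery"] is True:
        return "AssignDriver"
    return "StandardPrep"
```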
6. Parallel Processing
Step Functions allow for concurrent execution of tasks, which can drastically reduce overall processing time and save on transitions. An example given is video processing where segments are processed in parallel and then stitched back together (16:07-17:46).
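The video example could be sketched as a Parallel state with one branch per segment, followed by a stitching step. The Lambda function names and the per-segment durations are made up for illustration; the arithmetic at the end only shows why the wall-clock time is bounded by the slowest branch rather than the sum of all branches.

```python
def branch(function_name):
    """Build a single-state branch that invokes one (hypothetical) Lambda function."""
    return {
        "StartAt": function_name,
        "States": {
            function_name: {
                "Type": "Task",
                "QueryLanguage": "JSONata",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Arguments": {
                    "FunctionName": function_name,
                    "Payload": "{% $states.input %}",
                },
                "End": True,
            }
        },
    }


# All branches start together; the state completes when every branch has finished.
transcode_state = {
    "Type": "Parallel",
    "Branches": [
        branch("TranscodeSegmentA"),
        branch("TranscodeSegmentB"),
        branch("TranscodeSegmentC"),
    ],
    "Next": "StitchSegments",
}

segment_seconds = [30, 45, 20]               # assumed per-segment processing times
sequential_seconds = sum(segment_seconds)    # segments processed one after another
parallel_seconds = max(segment_seconds)      # bounded by the slowest branch
```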
7. Dynamic or Inline Maps and Distributed Mapping
These features enable processing of collections of data. Distributed mapping is particularly powerful: it can run up to 10,000 parallel Express workflow executions for large-scale processing tasks, leading to significant cost and time savings (17:48-20:47).
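A minimal sketch of a distributed Map state, assuming the items arrive in the state input under a hypothetical orders field and that each item is handled by a hypothetical ProcessOrder Lambda function. The concurrency value is an arbitrary example, not a recommendation from the talk.

```python
# Hypothetical Map state in distributed mode: each item is processed by its own
# child workflow execution, here run as an Express workflow.
process_orders_state = {
    "Type": "Map",
    "QueryLanguage": "JSONata",
    "Items": "{% $states.input.orders %}",
    "MaxConcurrency": 1000,  # assumed cap on simultaneous child executions
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
        "StartAt": "ProcessOrder",
        "States": {
            "ProcessOrder": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Arguments": {
                    "FunctionName": "ProcessOrder",
                    "Payload": "{% $states.input %}",
                },
                "End": True,
            }
        },
    },
    "End": True,
}
```

Switching Mode to INLINE would keep the iterations inside the parent execution, which suits small collections.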
Cost Optimization and Workflow Patterns
Understanding Step Functions Pricing
Express Step Functions
Pricing is based on the number of requests and on execution duration, so cost is sensitive to memory use and run time but friendly to high volume. Express is suited to high-volume, short-duration, cost-sensitive, real-time processing with simple logic (21:40-23:05).
Standard Step Functions
Pricing is based on state transitions, so cost does not depend on duration and an execution can last up to a year. Standard is suited to long-running workflows, complex logic, audit requirements, human interaction, and exactly-once semantics (23:06-24:32).
Choosing Between Express and Standard
The general rule of thumb is to start with Express and move to Standard when duration or feature set requires it (24:40-25:32).
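To make the two pricing models concrete, here is a back-of-the-envelope estimator. The default rates are assumptions based on commonly published us-east-1 list prices and should be checked against the current pricing page; the calculation also ignores free tiers, billing rounding, and tiered duration discounts.

```python
def standard_cost(executions, transitions_per_execution, price_per_transition=0.000025):
    """Standard workflows: pay per state transition, regardless of duration."""
    return executions * transitions_per_execution * price_per_transition


def express_cost(
    executions,
    avg_duration_seconds,
    memory_mb=64,
    price_per_request=0.000001,
    price_per_gb_second=0.00001667,
):
    """Express workflows: pay per request plus duration scaled by memory used."""
    gb_seconds = executions * avg_duration_seconds * (memory_mb / 1024)
    return executions * price_per_request + gb_seconds * price_per_gb_second


# Example workload (assumed): one million executions per month,
# ten transitions each, half a second per execution.
monthly_standard = standard_cost(1_000_000, 10)
monthly_express = express_cost(1_000_000, 0.5)
print(f"Standard: ${monthly_standard:,.2f}  Express: ${monthly_express:,.2f}")
```

Under these assumed rates the short, high-volume workload is far cheaper on Express, while a workflow that waits for hours would accrue duration charges on Express but not on Standard.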
Eliminating Polling with Callbacks (28:35-32:00)
Polling is identified as a significant cost driver and can be replaced by:
Request and Response Pattern
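Fire and forget: Step Functions calls the service and moves on as soon as the request is acknowledged, without waiting for the underlying work to finish (29:51-30:17).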
.sync Pattern
Step Functions pauses until the called job completes but, as presented, does not return data. It is available only for services that offer this integration (30:24-31:00).
Task Token Pattern
The most powerful of the three: the workflow sends a request along with a task token and pauses until a callback returns that token, which allows data to be returned. This is demonstrated with a Stripe integration and an IoT button for kitchen prep (31:05-32:56). Proper timeout settings are critical when using callbacks (33:09-34:00).
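The three patterns differ mainly in the suffix on the task's resource ARN. The sketch below lists one example of each and then shows a hypothetical task token state with an explicit timeout. The Glue and SQS integrations are used only as examples of services that support these suffixes; the queue URL, field names, and timeout value are assumptions.

```python
# One example resource ARN per integration pattern.
INTEGRATION_PATTERNS = {
    "request_response": "arn:aws:states:::glue:startJobRun",
    "run_a_job": "arn:aws:states:::glue:startJobRun.sync",
    "task_token": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
}

# Hypothetical callback state: the token travels with the message, and the
# workflow stays paused until something reports success or failure for that
# token through the SendTaskSuccess or SendTaskFailure API.
wait_for_payment_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": INTEGRATION_PATTERNS["task_token"],
    "Arguments": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/payments",
        "MessageBody": {
            "orderId": "{% $states.input.orderId %}",
            "taskToken": "{% $states.context.Task.Token %}",
        },
    },
    # Without a timeout, a lost callback would leave the execution waiting.
    "TimeoutSeconds": 900,
    "Next": "StartPrep",
}
```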
Activities Pattern (35:20-36:44)
This lesser-known but powerful pattern uses a Step Functions activity, which acts as a managed queue. Any number of workflows can send data to the activity, and a worker (e.g., a Lambda function or an ECS container) picks up each task and calls back when done. In one example, this reduced a company's cost for a single invocation from $450 to $1.
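A minimal sketch of an activity worker, written against the boto3 Step Functions client methods get_activity_task, send_task_success, and send_task_failure. The client is passed in so the logic can be exercised locally; the StubClient class, activity ARN, worker name, and handler are all hypothetical stand-ins rather than anything shown in the talk.

```python
import json


def process_one_task(sfn, activity_arn, handler, worker_name="kitchen-worker"):
    """Fetch one task from the activity, run the handler, and report the result."""
    task = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False  # the long poll returned without any work
    try:
        result = handler(json.loads(task["input"]))
        sfn.send_task_success(taskToken=token, output=json.dumps(result))
    except Exception as exc:
        sfn.send_task_failure(taskToken=token, error=type(exc).__name__, cause=str(exc))
    return True


class StubClient:
    """Stand-in for boto3.client("stepfunctions"), used only for a local dry run."""

    def __init__(self, task):
        self.task = task
        self.calls = []

    def get_activity_task(self, activityArn, workerName):
        return self.task

    def send_task_success(self, taskToken, output):
        self.calls.append(("success", taskToken, output))

    def send_task_failure(self, taskToken, error, cause):
        self.calls.append(("failure", taskToken, error))


stub = StubClient({"taskToken": "token-123", "input": json.dumps({"pizza": "margherita"})})
handled = process_one_task(
    stub,
    "arn:aws:states:us-east-1:123456789012:activity:prep",  # assumed activity ARN
    lambda order: {"ready": order["pizza"]},
)
```

In a real worker the same function would run in a loop with an actual boto3 client, so the waiting happens in the worker rather than as billable state transitions in the workflow.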
Direct API Gateway to Step Functions Integration (37:02-38:29)
Instead of using an intermediary Lambda function for routing, API Gateway can directly invoke a Step Functions workflow. This requires VTL (Velocity Template Language) for input and output transformations, which can be complex but is made easier with tools like Kiro.
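To show what the input transformation has to produce, the sketch below builds the body that the Step Functions StartExecution action expects, where the workflow input is a JSON document encoded as a string. The request template shown as a constant is a commonly used form and is included as an assumption, not as the template from the talk; the state machine ARN is a placeholder.

```python
import json

# Commonly used VTL request template (assumed, not from the talk). It escapes the
# incoming request body so it can be embedded as a string in the "input" field.
REQUEST_TEMPLATE = """
#set($body = $util.escapeJavaScript($input.json('$')))
{
  "stateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:Orders",
  "input": "$body"
}
"""


def start_execution_body(state_machine_arn, payload):
    """Build the StartExecution request body that the mapping template must emit."""
    return json.dumps(
        {
            "stateMachineArn": state_machine_arn,
            # The input is itself JSON, serialized into a string field.
            "input": json.dumps(payload),
        }
    )


body = start_execution_body(
    "arn:aws:states:us-east-1:123456789012:stateMachine:Orders",
    {"size": "large", "toppings": ["basil"]},
)
```

The double encoding is the part that tends to make hand-written templates error-prone, which is why tooling help is useful here.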
Cost Savings Results
By implementing these patterns and strategies, Tony's Pizza's baseline monthly cost was reduced significantly:
- Initial monthly total for Step Functions and Lambda: $1,770 (27:53-27:56)
- After removing two loops with callbacks: $732 (savings of $1,044) (34:28-34:32)
- Further optimization by replacing the remaining polling loop with an activity pattern led to a total monthly savings of $587, eliminating all polling loops (36:51-36:57)
Presenter: Eric Johnson - Principal Developer Advocate for Serverless at AWS