step-functions
Scannednpx machina-cli add skill itsmostafa/aws-agent-skills/step-functions --openclawAWS Step Functions
AWS Step Functions is a serverless orchestration service that lets you build and run workflows using state machines. Coordinate multiple AWS services into business-critical applications.
Table of Contents
Core Concepts
Workflow Types
| Type | Description | Pricing |
|---|---|---|
| Standard | Long-running, durable, exactly-once | Per state transition |
| Express | High-volume, short-duration | Per execution (time + memory) |
State Types
| State | Description |
|---|---|
| Task | Execute work (Lambda, API call) |
| Choice | Conditional branching |
| Parallel | Execute branches concurrently |
| Map | Iterate over array |
| Wait | Delay execution |
| Pass | Pass input to output |
| Succeed | End successfully |
| Fail | End with failure |
Amazon States Language (ASL)
JSON-based language for defining state machines.
Common Patterns
Simple Lambda Workflow
{
"Comment": "Process order workflow",
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
"Next": "ProcessPayment"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment",
"Next": "FulfillOrder"
},
"FulfillOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:FulfillOrder",
"End": true
}
}
}
Create State Machine
AWS CLI:
aws stepfunctions create-state-machine \
--name OrderWorkflow \
--definition file://workflow.json \
--role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
--type STANDARD
boto3:
import boto3
import json
sfn = boto3.client('stepfunctions')
definition = {
"Comment": "Order workflow",
"StartAt": "ProcessOrder",
"States": {
"ProcessOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"End": True
}
}
}
response = sfn.create_state_machine(
name='OrderWorkflow',
definition=json.dumps(definition),
roleArn='arn:aws:iam::123456789012:role/StepFunctionsRole',
type='STANDARD'
)
Start Execution
import boto3
import json
sfn = boto3.client('stepfunctions')
response = sfn.start_execution(
stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow',
name='order-12345',
input=json.dumps({
'order_id': '12345',
'customer_id': 'cust-789',
'items': [{'product_id': 'prod-1', 'quantity': 2}]
})
)
execution_arn = response['executionArn']
Choice State (Conditional Logic)
{
"StartAt": "CheckOrderValue",
"States": {
"CheckOrderValue": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.total",
"NumericGreaterThan": 1000,
"Next": "HighValueOrder"
},
{
"Variable": "$.priority",
"StringEquals": "rush",
"Next": "RushOrder"
}
],
"Default": "StandardOrder"
},
"HighValueOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessHighValue",
"End": true
},
"RushOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessRush",
"End": true
},
"StandardOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessStandard",
"End": true
}
}
}
Parallel Execution
{
"StartAt": "ProcessInParallel",
"States": {
"ProcessInParallel": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "UpdateInventory",
"States": {
"UpdateInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateInventory",
"End": true
}
}
},
{
"StartAt": "SendNotification",
"States": {
"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:SendNotification",
"End": true
}
}
},
{
"StartAt": "UpdateAnalytics",
"States": {
"UpdateAnalytics": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateAnalytics",
"End": true
}
}
}
],
"Next": "Complete"
},
"Complete": {
"Type": "Succeed"
}
}
}
Map State (Iteration)
{
"StartAt": "ProcessItems",
"States": {
"ProcessItems": {
"Type": "Map",
"ItemsPath": "$.items",
"MaxConcurrency": 10,
"Iterator": {
"StartAt": "ProcessItem",
"States": {
"ProcessItem": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessItem",
"End": true
}
}
},
"ResultPath": "$.processedItems",
"End": true
}
}
}
Error Handling
{
"StartAt": "ProcessWithRetry",
"States": {
"ProcessWithRetry": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:Process",
"Retry": [
{
"ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
},
{
"ErrorEquals": ["States.Timeout"],
"IntervalSeconds": 5,
"MaxAttempts": 3,
"BackoffRate": 1.5
}
],
"Catch": [
{
"ErrorEquals": ["CustomError"],
"ResultPath": "$.error",
"Next": "HandleCustomError"
},
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.error",
"Next": "HandleAllErrors"
}
],
"End": true
},
"HandleCustomError": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:HandleCustom",
"End": true
},
"HandleAllErrors": {
"Type": "Fail",
"Error": "ProcessingFailed",
"Cause": "An error occurred during processing"
}
}
}
CLI Reference
State Machine Management
| Command | Description |
|---|---|
aws stepfunctions create-state-machine | Create state machine |
aws stepfunctions update-state-machine | Update definition |
aws stepfunctions delete-state-machine | Delete state machine |
aws stepfunctions list-state-machines | List state machines |
aws stepfunctions describe-state-machine | Get details |
Executions
| Command | Description |
|---|---|
aws stepfunctions start-execution | Start execution |
aws stepfunctions stop-execution | Stop execution |
aws stepfunctions describe-execution | Get execution details |
aws stepfunctions list-executions | List executions |
aws stepfunctions get-execution-history | Get execution history |
Best Practices
Design
- Keep states focused — one purpose per state
- Use meaningful state names
- Implement comprehensive error handling
- Use Parallel for independent tasks
- Use Map for batch processing
Performance
- Use Express workflows for high-volume, short tasks
- Set appropriate timeouts
- Limit Map concurrency to avoid throttling
- Use SDK integrations when possible (avoid Lambda wrapper)
Reliability
- Retry transient errors
- Catch and handle specific errors
- Use idempotent operations
- Enable X-Ray tracing
Cost Optimization
- Use Express for short workflows (< 5 minutes)
- Combine related operations to reduce transitions
- Use Wait states instead of Lambda delays
Troubleshooting
Execution Failed
# Get execution history
aws stepfunctions get-execution-history \
--execution-arn arn:aws:states:us-east-1:123456789012:execution:MyWorkflow:exec-123 \
--query 'events[?type==`TaskFailed` || type==`ExecutionFailed`]'
Lambda Timeout
Causes:
- Lambda running too long
- Task timeout too short
Fix:
{
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"TimeoutSeconds": 300,
"HeartbeatSeconds": 60
}
State Stuck
Check:
- Task state waiting for callback
- Wait state not yet elapsed
- Activity worker not responding
Invalid State Machine
# Validate definition
aws stepfunctions validate-state-machine-definition \
--definition file://workflow.json
References
Source
git clone https://github.com/itsmostafa/aws-agent-skills/blob/main/skills/step-functions/SKILL.mdView on GitHub Overview
AWS Step Functions provides serverless workflow orchestration by coordinating AWS services through state machines. It supports Standard long-running flows and Express high-volume tasks, with built-in error handling, retries, and parallel execution. This skill helps you design, implement, and debug end-to-end workflows.
How This Skill Works
Define a workflow in Amazon States Language (ASL) as a JSON state machine. Step Functions executes states such as Task, Choice, Parallel, Map, and Wait, while managing retries, backoffs, and data flow between steps. It integrates with Lambda, API Gateway, and other AWS services, and you can monitor executions in the console.
When to Use It
- Coordinating multiple AWS services into a business process (e.g., order processing, data pipelines).
- Implementing error handling, retries with backoff, and failure paths.
- Running long-lived workflows (Standard) or high-volume, short-duration tasks (Express).
- Orchestrating parallel branches to improve throughput and efficiency.
- Debugging and auditing executions with step-by-step state history.
Quick Start
- Step 1: Define your ASL state machine JSON that models the workflow.
- Step 2: Deploy the state machine using AWS CLI or boto3 to create it.
- Step 3: Start an execution with input data and monitor via the console or CloudWatch.
Best Practices
- Choose Standard vs Express based on durability, latency, and scale needs.
- Design idempotent tasks and implement retries with exponential backoff.
- Keep state machine definitions focused; use Map/Parallel to handle batches.
- Filter sensitive data from inputs/outputs and minimize payloads.
- Enable CloudWatch logs, metrics, and alarms to monitor executions.
Example Use Cases
- Simple Lambda Workflow: ValidateOrder → ProcessPayment → FulfillOrder.
- Create and deploy a state machine via AWS CLI (create-state-machine) or boto3 (create_state_machine).
- Start an execution with an input payload like an order and track its progress.
- Routing with Choice state based on total value or priority (high-value or rush orders).
- Processing items in parallel using Parallel or Map states to speed up workloads.