What is AWS Step Functions?

A serverless orchestration service that coordinates AWS services with state machines.

What state types exist?

Task, Choice, Parallel, Map, Wait, Pass, Succeed, and Fail.

What is the difference between Standard and Express?

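Standard workflows are durable and long-lived (up to one year), use exactly-once execution, and are billed per state transition. Express workflows handle high-volume, short-duration work (up to five minutes), use at-least-once (asynchronous) or at-most-once (synchronous) execution, and are billed per request plus duration and memory.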

step-functions

npx machina-cli add skill itsmostafa/aws-agent-skills/step-functions --openclaw

AWS Step Functions

AWS Step Functions is a serverless orchestration service that lets you build and run workflows using state machines. Use it to coordinate multiple AWS services into business-critical applications.

Core Concepts
Common Patterns
CLI Reference
Best Practices
Troubleshooting
References

Core Concepts

Workflow Types

Type	Description	Pricing
Standard	Long-running (up to 1 year), durable, exactly-once execution	Per state transition
Express	High-volume, short-duration (up to 5 minutes), at-least-once or at-most-once execution	Per request, plus duration and memory

State Types

State	Description
Task	Execute work (Lambda, API call)
Choice	Conditional branching
Parallel	Execute branches concurrently
Map	Iterate over an array of items
Wait	Delay execution
Pass	Pass input to output
Succeed	End successfully
Fail	End with failure

Amazon States Language (ASL)

JSON-based language for defining state machines. A definition names its first state in StartAt and lists every state, keyed by name, under States.
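Before deploying, you can catch the most common ASL structural mistakes locally. The sketch below uses a hypothetical helper, `check_definition`. It checks that StartAt names an existing state and that every Next, Default, Choice, and Catch target exists. It also checks that each non-Choice, non-Succeed, non-Fail state has exactly one of Next or End. It only inspects top-level states, not Parallel branches or Map processors, so it does not replace the service-side validator.

```python
# State types that end or branch without needing "Next" or "End".
TERMINAL_OR_BRANCHING = {"Choice", "Succeed", "Fail"}
KNOWN_TYPES = {"Task", "Choice", "Parallel", "Map", "Wait", "Pass", "Succeed", "Fail"}


def check_definition(definition: dict) -> list:
    """Return a list of structural problems in an ASL definition (empty if none)."""
    problems = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt {start!r} is not a defined state")
    for name, state in states.items():
        state_type = state.get("Type")
        if state_type not in KNOWN_TYPES:
            problems.append(f"{name}: unknown Type {state_type!r}")
            continue
        if state_type not in TERMINAL_OR_BRANCHING:
            has_next = "Next" in state
            has_end = state.get("End") is True
            if has_next == has_end:
                problems.append(f"{name}: needs exactly one of Next or End")
        # Collect every state this one can transition to.
        targets = [state.get("Next"), state.get("Default")]
        targets += [rule.get("Next") for rule in state.get("Choices", [])]
        targets += [catcher.get("Next") for catcher in state.get("Catch", [])]
        for target in targets:
            if target is not None and target not in states:
                problems.append(f"{name}: transition to undefined state {target!r}")
    return problems
```

An empty list means the top-level graph is consistent. Any other result lists one message per problem found.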

Common Patterns

Simple Lambda Workflow

{
  "Comment": "Process order workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment",
      "Next": "FulfillOrder"
    },
    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:FulfillOrder",
      "End": true
    }
  }
}
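A linear chain like the one above can also be generated instead of written by hand. The sketch below assumes a hypothetical helper, `linear_workflow`, that takes (state name, resource ARN) pairs in execution order. Each Task points to the next one, and the last Task ends the execution. The account ID and region reuse the placeholder values from the example.

```python
import json


def linear_workflow(comment: str, tasks: list) -> dict:
    """Build an ASL definition that runs Task states one after another.

    `tasks` is a list of (state_name, resource_arn) pairs in execution order.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    states = {}
    for index, (name, resource) in enumerate(tasks):
        state = {"Type": "Task", "Resource": resource}
        if index + 1 < len(tasks):
            state["Next"] = tasks[index + 1][0]  # hand off to the following task
        else:
            state["End"] = True  # the last task ends the execution
        states[name] = state
    return {"Comment": comment, "StartAt": tasks[0][0], "States": states}


prefix = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = linear_workflow(
    "Process order workflow",
    [(name, prefix + name) for name in ("ValidateOrder", "ProcessPayment", "FulfillOrder")],
)
print(json.dumps(definition, indent=2))
```

The printed JSON has the same shape as the hand-written Simple Lambda Workflow and can be saved as workflow.json.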

Create State Machine

AWS CLI:

aws stepfunctions create-state-machine \
  --name OrderWorkflow \
  --definition file://workflow.json \
  --role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
  --type STANDARD

boto3:

import boto3
import json

sfn = boto3.client('stepfunctions')

definition = {
    "Comment": "Order workflow",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...",
            "End": True
        }
    }
}

response = sfn.create_state_machine(
    name='OrderWorkflow',
    definition=json.dumps(definition),
    roleArn='arn:aws:iam::123456789012:role/StepFunctionsRole',
    type='STANDARD'
)
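Redeploying a changed definition under the same name makes create_state_machine raise StateMachineAlreadyExists. The sketch below catches that error and falls back to update_state_machine. The helper name `deploy_state_machine` is hypothetical. It assumes you pass in the state machine's ARN for the update path rather than looking it up. `client` is expected to be a boto3 Step Functions client such as the `sfn` object above.

```python
import json


def deploy_state_machine(client, name, definition, role_arn, state_machine_arn):
    """Create the state machine, or update it if one with this name already exists.

    `client` is a boto3 Step Functions client. `state_machine_arn` is only used
    when the machine already exists and must be updated in place.
    """
    body = json.dumps(definition)
    try:
        response = client.create_state_machine(
            name=name, definition=body, roleArn=role_arn, type="STANDARD"
        )
        return response["stateMachineArn"]
    except client.exceptions.StateMachineAlreadyExists:
        # Same name but a different definition: push the new definition instead.
        client.update_state_machine(
            stateMachineArn=state_machine_arn, definition=body, roleArn=role_arn
        )
        return state_machine_arn
```

With the objects above, a call would look like `deploy_state_machine(sfn, 'OrderWorkflow', definition, 'arn:aws:iam::123456789012:role/StepFunctionsRole', 'arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow')`.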

Start Execution

import boto3
import json

sfn = boto3.client('stepfunctions')

response = sfn.start_execution(
    stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow',
    name='order-12345',
    input=json.dumps({
        'order_id': '12345',
        'customer_id': 'cust-789',
        'items': [{'product_id': 'prod-1', 'quantity': 2}]
    })
)

execution_arn = response['executionArn']
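start_execution returns immediately, so scripts that need the result typically poll describe_execution until the status leaves RUNNING. The sketch below is a minimal poller for a Standard workflow. The helper name `wait_for_execution`, the 5-second interval, and the 120-poll cap are assumptions. The sleep function can be injected so the poller is easy to test.

```python
import time

# Statuses after which the execution will not change again on its own.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(client, execution_arn, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll DescribeExecution until the execution finishes; return the final response."""
    for _ in range(max_polls):
        response = client.describe_execution(executionArn=execution_arn)
        if response["status"] in TERMINAL_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} still running after {max_polls} polls")
```

On success, the returned response carries the execution's final output as a JSON string under "output", which you can decode with json.loads.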

Choice State (Conditional Logic)

{
  "StartAt": "CheckOrderValue",
  "States": {
    "CheckOrderValue": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.total",
          "NumericGreaterThan": 1000,
          "Next": "HighValueOrder"
        },
        {
          "Variable": "$.priority",
          "StringEquals": "rush",
          "Next": "RushOrder"
        }
      ],
      "Default": "StandardOrder"
    },
    "HighValueOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:function:ProcessHighValue",
      "End": true
    },
    "RushOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:function:ProcessRush",
      "End": true
    },
    "StandardOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:function:ProcessStandard",
      "End": true
    }
  }
}
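A Choice state evaluates its rules in order and follows the first one that matches, using Default only when none do. The plain-Python sketch below mirrors CheckOrderValue to make that ordering concrete; it is an illustration, not how Step Functions evaluates rules internally. The function name `choose_next_state` is hypothetical, and it assumes both `total` and `priority` are present in the input.

```python
def choose_next_state(order: dict) -> str:
    """Mirror CheckOrderValue: try each rule in order; the first match wins."""
    rules = [
        (lambda o: o["total"] > 1000, "HighValueOrder"),   # NumericGreaterThan 1000
        (lambda o: o["priority"] == "rush", "RushOrder"),  # StringEquals "rush"
    ]
    for matches, next_state in rules:
        if matches(order):
            return next_state
    return "StandardOrder"  # Default
```

Because the rules are ordered, an order that is both over 1000 and marked "rush" goes to HighValueOrder, never to RushOrder.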

Parallel Execution

{
  "StartAt": "ProcessInParallel",
  "States": {
    "ProcessInParallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "UpdateInventory",
          "States": {
            "UpdateInventory": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:...:function:UpdateInventory",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendNotification",
          "States": {
            "SendNotification": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:...:function:SendNotification",
              "End": true
            }
          }
        },
        {
          "StartAt": "UpdateAnalytics",
          "States": {
            "UpdateAnalytics": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:...:function:UpdateAnalytics",
              "End": true
            }
          }
        }
      ],
      "Next": "Complete"
    },
    "Complete": {
      "Type": "Succeed"
    }
  }
}
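A Parallel state passes the same input to every branch and outputs an array holding each branch's result in branch order. The local analogue below shows that contract with a thread pool; `run_parallel` and the branch callables are hypothetical stand-ins for the Lambda functions. One difference from Step Functions: the other threads are not stopped when a branch fails.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(state_input, branches):
    """Run every branch on the same input concurrently; return outputs in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, state_input) for branch in branches]
        # result() re-raises a branch's exception, so any failing branch makes
        # run_parallel raise, much as a failed branch fails the Parallel state.
        return [future.result() for future in futures]
```

For the three branches above, the output would be a three-element list in the order UpdateInventory, SendNotification, UpdateAnalytics, regardless of which branch finishes first.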

Map State (Iteration)

{
  "StartAt": "ProcessItems",
  "States": {
    "ProcessItems": {
      "Type": "Map",
      "ItemsPath": "$.items",
      "MaxConcurrency": 10,
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "INLINE"
        },
        "StartAt": "ProcessItem",
        "States": {
          "ProcessItem": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:function:ProcessItem",
            "End": true
          }
        }
      },
      "ResultPath": "$.processedItems",
      "End": true
    }
  }
}
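The Map state above combines three settings. ItemsPath selects the array, MaxConcurrency caps how many items run at once, and ResultPath merges the results back into the original input instead of replacing it. The local analogue below, with a hypothetical `run_map` helper and a caller-supplied `process_item` function, shows the resulting data shape. Unlike ASL, where a MaxConcurrency of 0 means no limit, this sketch requires max_concurrency to be at least 1.

```python
from concurrent.futures import ThreadPoolExecutor


def run_map(state_input, process_item, max_concurrency=10):
    """Apply process_item to each element of $.items and store results at $.processedItems."""
    items = state_input["items"]  # ItemsPath: "$.items"
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in input order, as the Map state does.
        results = list(pool.map(process_item, items))
    # ResultPath "$.processedItems": keep the original input and add the results to it.
    return {**state_input, "processedItems": results}
```

With an input of {"order_id": "1", "items": [1, 2, 3]} and a function that doubles each item, the output keeps order_id and items and adds "processedItems": [2, 4, 6].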

Error Handling

{
  "StartAt": "ProcessWithRetry",
  "States": {
    "ProcessWithRetry": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:function:Process",
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 6,
          "BackoffRate": 2
        },
        {
          "ErrorEquals": ["States.Timeout"],
          "IntervalSeconds": 5,
          "MaxAttempts": 3,
          "BackoffRate": 1.5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["CustomError"],
          "ResultPath": "$.error",
          "Next": "HandleCustomError"
        },
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "HandleAllErrors"
        }
      ],
      "End": true
    },
    "HandleCustomError": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:function:HandleCustom",
      "End": true
    },
    "HandleAllErrors": {
      "Type": "Fail",
      "Error": "ProcessingFailed",
      "Cause": "An error occurred during processing"
    }
  }
}
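Each Retry entry waits IntervalSeconds before the first retry and multiplies the wait by BackoffRate for each later one, up to MaxAttempts retries. If retries are exhausted, the Catch entries are checked in order and the first matching one is used. The sketch below computes those waits and picks a catcher. The helper names are hypothetical, and it ignores MaxDelaySeconds, jitter, and the few errors that States.ALL does not catch.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate=2.0):
    """Seconds waited before each retry: interval * backoff_rate ** (retry number - 1)."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


def select_catcher(catchers, error_name):
    """Return the first Catch entry whose ErrorEquals matches error_name, else None."""
    for catcher in catchers:
        if error_name in catcher["ErrorEquals"] or "States.ALL" in catcher["ErrorEquals"]:
            return catcher
    return None
```

For the Lambda retrier above (IntervalSeconds 2, MaxAttempts 6, BackoffRate 2), the waits are 2, 4, 8, 16, 32, and 64 seconds, or 126 seconds in total before the Catch entries are consulted.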

CLI Reference

State Machine Management

Command	Description
`aws stepfunctions create-state-machine`	Create state machine
`aws stepfunctions update-state-machine`	Update definition
`aws stepfunctions delete-state-machine`	Delete state machine
`aws stepfunctions list-state-machines`	List state machines
`aws stepfunctions describe-state-machine`	Get details

Executions

Command	Description
`aws stepfunctions start-execution`	Start execution
`aws stepfunctions stop-execution`	Stop execution
`aws stepfunctions describe-execution`	Get execution details
`aws stepfunctions list-executions`	List executions
`aws stepfunctions get-execution-history`	Get execution history

Best Practices

Design

Keep states focused — one purpose per state
Use meaningful state names
Implement comprehensive error handling
Use Parallel for independent tasks
Use Map for batch processing

Performance

Use Express workflows for high-volume, short tasks
Set appropriate timeouts
Set MaxConcurrency on Map states to avoid throttling downstream services
Prefer direct AWS SDK service integrations over Lambda functions that only wrap an API call

Reliability

Retry transient errors
Catch and handle specific errors
Use idempotent operations
Enable X-Ray tracing
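One way to make starting an execution idempotent is to derive the execution name from its input. For Standard workflows, a StartExecution call that reuses a name does not start a second run. It returns the original execution if that execution is still running with the same input, and otherwise fails with ExecutionAlreadyExists. The sketch below assumes a hypothetical `execution_name` helper. It hashes a canonical JSON form of the payload and assumes the prefix contains only characters allowed in execution names.

```python
import hashlib
import json


def execution_name(prefix: str, payload: dict) -> str:
    """Derive a deterministic execution name (at most 80 characters) from the input.

    Key order does not matter: the payload is serialized with sorted keys first.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest}"[:80]
```

Passing name=execution_name('order', order) to start_execution means a retried request for the same order reuses the same name instead of starting a duplicate run.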

Cost Optimization

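Use Express for short, high-volume workflows (Express executions can run for at most 5 minutes)
Combine related operations to reduce transitions (Standard workflows are billed per state transition)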
Use Wait states instead of Lambda delays

Troubleshooting

Execution Failed

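# List the failure events in an execution's history
# (Tasks whose Resource is a plain Lambda ARN report LambdaFunctionFailed)
aws stepfunctions get-execution-history \
  --execution-arn arn:aws:states:us-east-1:123456789012:execution:MyWorkflow:exec-123 \
  --query 'events[?type==`TaskFailed` || type==`LambdaFunctionFailed` || type==`ExecutionFailed`]'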
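The same lookup can be scripted with boto3, which is convenient when an execution's history spans several pages. The sketch below uses the get_execution_history paginator and reads the error and cause from each failure event's detail object. The helper name `failure_events` is hypothetical, and it only handles the three failure event types listed.

```python
# Failure event types mapped to the key that holds their error/cause details.
FAILURE_DETAIL_KEYS = {
    "TaskFailed": "taskFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
    "ExecutionFailed": "executionFailedEventDetails",
}


def failure_events(client, execution_arn):
    """Yield (event type, error, cause) for each failure event in an execution's history."""
    paginator = client.get_paginator("get_execution_history")
    for page in paginator.paginate(executionArn=execution_arn):
        for event in page["events"]:
            detail_key = FAILURE_DETAIL_KEYS.get(event["type"])
            if detail_key:
                details = event.get(detail_key, {})
                yield event["type"], details.get("error"), details.get("cause")
```

For example, `for kind, error, cause in failure_events(boto3.client('stepfunctions'), execution_arn): print(kind, error, cause)` prints one line per failure.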

Lambda Timeout

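Causes:

The Lambda function exceeds its own configured timeout (maximum 15 minutes)
TimeoutSeconds on the Task state is shorter than the function's run time

Fix:

Raise the function's timeout if needed, and set TimeoutSeconds on the Task state high enough to cover it. HeartbeatSeconds is meant for activity and callback (.waitForTaskToken) tasks that send heartbeats. A plain Lambda invocation sends none, so HeartbeatSeconds is omitted here.

{
  "Type": "Task",
  "Resource": "arn:aws:lambda:...",
  "TimeoutSeconds": 300
}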

State Stuck

Check:

A Task state using .waitForTaskToken still waiting for SendTaskSuccess or SendTaskFailure
A Wait state whose time has not yet elapsed
An activity with no worker polling GetActivityTask
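A callback task stays paused until something reports back with its task token. The sketch below shows the worker side using boto3's send_task_success and send_task_failure. The helper name `complete_callback` is hypothetical. It assumes the worker received the token in its input and reports a Python exception's class name as the error.

```python
import json


def complete_callback(client, task_token, result=None, error=None):
    """Resume a Task state paused on .waitForTaskToken by reporting success or failure."""
    if error is None:
        client.send_task_success(taskToken=task_token, output=json.dumps(result))
    else:
        # The error name is what Retry/Catch ErrorEquals entries match against.
        client.send_task_failure(taskToken=task_token, error=type(error).__name__, cause=str(error))
```

If no worker ever makes one of these calls, the Task state stays paused until its TimeoutSeconds elapses.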

Invalid State Machine

# Validate definition
aws stepfunctions validate-state-machine-definition \
  --definition file://workflow.json
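The same check is available from boto3 through validate_state_machine_definition. It returns a result of OK or FAIL together with a list of diagnostics. The sketch below raises when validation fails and otherwise returns any remaining diagnostics, such as warnings. The helper name `validate_definition` is hypothetical, and it assumes each diagnostic carries code, message, and (optionally) location fields.

```python
import json


def validate_definition(client, definition, workflow_type="STANDARD"):
    """Call ValidateStateMachineDefinition and raise ValueError if the result is not OK."""
    response = client.validate_state_machine_definition(
        definition=json.dumps(definition), type=workflow_type
    )
    if response["result"] != "OK":
        messages = [
            f"{d['code']} at {d.get('location', '?')}: {d['message']}"
            for d in response["diagnostics"]
        ]
        raise ValueError("; ".join(messages))
    return response["diagnostics"]  # anything left here did not fail validation
```

Running this in a deployment script before create_state_machine stops a bad definition from reaching the service.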

References

Source

View on GitHub: https://github.com/itsmostafa/aws-agent-skills/blob/main/skills/step-functions/SKILL.md

git clone https://github.com/itsmostafa/aws-agent-skills

Overview

AWS Step Functions provides serverless workflow orchestration by coordinating AWS services through state machines. It supports Standard long-running flows and Express high-volume tasks, with built-in error handling, retries, and parallel execution. This skill helps you design, implement, and debug end-to-end workflows.

How This Skill Works

Define a workflow in Amazon States Language (ASL) as a JSON state machine. Step Functions executes states such as Task, Choice, Parallel, Map, and Wait, while managing retries, backoffs, and data flow between steps. It integrates with Lambda, API Gateway, and other AWS services, and you can monitor executions in the console.

When to Use It

Coordinating multiple AWS services into a business process (e.g., order processing, data pipelines).
Implementing error handling, retries with backoff, and failure paths.
Running long-lived workflows (Standard) or high-volume, short-duration tasks (Express).
Orchestrating parallel branches to improve throughput and efficiency.
Debugging and auditing executions with step-by-step state history.

Quick Start

Step 1: Define your ASL state machine JSON that models the workflow.
Step 2: Deploy the state machine using AWS CLI or boto3 to create it.
Step 3: Start an execution with input data and monitor via the console or CloudWatch.

Best Practices

Choose Standard vs Express based on durability, latency, and scale needs.
Design idempotent tasks and implement retries with exponential backoff.
Keep state machine definitions focused; use Map/Parallel to handle batches.
Filter sensitive data from inputs and outputs, and keep payloads small (state input and output are limited to 256 KiB).
Enable CloudWatch logs, metrics, and alarms to monitor executions.

Example Use Cases

Simple Lambda Workflow: ValidateOrder → ProcessPayment → FulfillOrder.
Create and deploy a state machine via AWS CLI (create-state-machine) or boto3 (create_state_machine).
Start an execution with an input payload like an order and track its progress.
Routing with Choice state based on total value or priority (high-value or rush orders).
Processing items in parallel using Parallel or Map states to speed up workloads.
