bedrock
AWS Bedrock
Amazon Bedrock provides access to foundation models (FMs) from AI companies through a unified API. Build generative AI applications with text generation, embeddings, and image generation capabilities.
Core Concepts
Foundation Models
Pre-trained models available through Bedrock:
- Claude (Anthropic): Text generation, analysis, coding
- Titan (Amazon): Text, embeddings, image generation
- Llama (Meta): Open-weight text generation
- Mistral: Efficient text generation
- Stable Diffusion (Stability AI): Image generation
Model Access
Models must be enabled in your account before use:
- Request access in Bedrock console
- Some models require acceptance of EULAs
- Access is region-specific
Inference Types
| Type | Use Case | Pricing |
|---|---|---|
| On-Demand | Variable workloads | Per token |
| Provisioned Throughput | Consistent high-volume | Hourly commitment |
| Batch Inference | Async large-scale | Discounted per token |
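Batch Inference reads its requests from a JSON Lines file rather than from individual API calls. The following is a minimal sketch of preparing such a file, assuming each line carries a `recordId` and a `modelInput` holding the same body used for on-demand invocation. The helper name `build_batch_records` and the ID scheme are hypothetical; check the batch inference documentation for the exact layout your model expects.

```python
import json


def build_batch_records(prompts, max_tokens=512):
    """Return one JSON Lines entry per prompt (assumed batch input layout)."""
    lines = []
    for index, prompt in enumerate(prompts):
        record = {
            # Hypothetical ID scheme: a unique 11-character string per record.
            "recordId": f"REC{index:08d}",
            # Same body shape as the on-demand Claude requests.
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return lines


batch_lines = build_batch_records(["Summarize Amazon S3.", "Summarize Amazon EC2."])
print("\n".join(batch_lines))
```

The resulting text would be uploaded to S3 and referenced when creating the batch job; that step is omitted here.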
Common Patterns
Invoke Model (Text Generation)
AWS CLI:
# Invoke Claude (AWS CLI v2 needs --cli-binary-format to accept a raw JSON body)
aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain AWS Lambda in 3 sentences."}
    ]
  }' \
  response.json
# Print the generated text
jq -r '.content[0].text' response.json
boto3:
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def invoke_claude(prompt, max_tokens=1024):
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': max_tokens,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )
    # The response body is a stream; read and decode it once
    result = json.loads(response['body'].read())
    return result['content'][0]['text']

# Usage
response = invoke_claude('What is Amazon S3?')
print(response)
Streaming Response
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def stream_claude(prompt):
    response = bedrock.invoke_model_with_response_stream(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )
    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        # Only content_block_delta events carry generated text
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Usage
for text in stream_claude('Write a haiku about cloud computing.'):
    print(text, end='', flush=True)
Generate Embeddings
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def get_embedding(text):
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'inputText': text,
            'dimensions': 1024,
            'normalize': True
        })
    )
    result = json.loads(response['body'].read())
    return result['embedding']

# Usage
embedding = get_embedding('AWS Lambda is a serverless compute service.')
print(f'Embedding dimension: {len(embedding)}')
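Embeddings become useful once two of them are compared. The sketch below computes cosine similarity in plain Python on toy vectors. It is an illustration, not part of the Bedrock API, and the function name `cosine_similarity` is chosen here. In practice the inputs would be vectors returned by an embedding call.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

When embeddings are requested with normalization enabled, the vectors should already have unit length, so the dot product alone gives the same ranking.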
Conversation with History
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

class Conversation:
    def __init__(self, system_prompt=None):
        self.messages = []
        self.system = system_prompt

    def chat(self, user_message):
        self.messages.append({
            'role': 'user',
            'content': user_message
        })
        body = {
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': self.messages
        }
        if self.system:
            body['system'] = self.system
        response = bedrock.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )
        result = json.loads(response['body'].read())
        assistant_message = result['content'][0]['text']
        # Store the reply so the next turn sees the full history
        self.messages.append({
            'role': 'assistant',
            'content': assistant_message
        })
        return assistant_message

# Usage
conv = Conversation(system_prompt='You are an AWS solutions architect.')
print(conv.chat('What database should I use for a chat application?'))
print(conv.chat('What about for time-series data?'))
List Available Models
# List all foundation models
aws bedrock list-foundation-models \
--query 'modelSummaries[*].[modelId,modelName,providerName]' \
--output table
# Filter by provider
aws bedrock list-foundation-models \
--by-provider anthropic \
--query 'modelSummaries[*].modelId'
# Get model details
aws bedrock get-foundation-model \
--model-identifier anthropic.claude-3-sonnet-20240229-v1:0
Request Model Access
# List the agreement offers (EULA terms) for a model
aws bedrock list-foundation-model-agreement-offers \
--model-id anthropic.claude-3-sonnet-20240229-v1:0
CLI Reference
Bedrock (Control Plane)
| Command | Description |
|---|---|
| aws bedrock list-foundation-models | List available models |
| aws bedrock get-foundation-model | Get model details |
| aws bedrock list-custom-models | List fine-tuned models |
| aws bedrock create-model-customization-job | Start fine-tuning |
| aws bedrock list-provisioned-model-throughputs | List provisioned capacity |
Bedrock Runtime (Data Plane)
| Command | Description |
|---|---|
| aws bedrock-runtime invoke-model | Invoke model synchronously |
| aws bedrock-runtime invoke-model-with-response-stream | Invoke with streaming |
| aws bedrock-runtime converse | Multi-turn conversation API |
| aws bedrock-runtime converse-stream | Streaming conversation |
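The `converse` command has a boto3 counterpart that uses one message format across models instead of a provider-specific JSON body. The following is a minimal sketch of that call. The helper names (`build_converse_request`, `extract_text`, `converse_once`) are illustrative, and `boto3` is imported inside the calling function so the pure helpers can be tried without AWS credentials.

```python
def build_converse_request(model_id, prompt, system_prompt=None, max_tokens=512):
    """Build keyword arguments for the bedrock-runtime converse call."""
    request = {
        "modelId": model_id,
        # Converse messages carry a list of content blocks, not a bare string.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }
    if system_prompt:
        request["system"] = [{"text": system_prompt}]
    return request


def extract_text(response):
    """Join the text blocks of a Converse response into one string."""
    blocks = response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in blocks)


def converse_once(prompt, model_id="anthropic.claude-3-sonnet-20240229-v1:0", **options):
    """Send one user turn; requires AWS credentials and model access."""
    import boto3  # imported lazily so the helpers above work offline

    client = boto3.client("bedrock-runtime")
    return extract_text(client.converse(**build_converse_request(model_id, prompt, **options)))


# Offline check with a hand-written response in the expected shape
fake_response = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Hello"}, {"text": " world"}]}}
}
print(extract_text(fake_response))
```

Calling `converse_once('What is Amazon S3?')` would issue the real request.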
Bedrock Agent Runtime
| Command | Description |
|---|---|
| aws bedrock-agent-runtime invoke-agent | Invoke a Bedrock agent |
| aws bedrock-agent-runtime retrieve | Query knowledge base |
| aws bedrock-agent-runtime retrieve-and-generate | RAG query |
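A RAG query against a knowledge base can also be issued from boto3. This sketch assumes a knowledge base already exists. The ID `KB12345678` is a placeholder, and the helper names are hypothetical. The request is built by a pure function so its shape can be inspected without calling AWS.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn):
    """Run the query; requires AWS credentials and an existing knowledge base."""
    import boto3  # imported lazily so build_rag_request works offline

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]


rag_request = build_rag_request(
    "How do I rotate access keys?",
    "KB12345678",  # placeholder knowledge base ID
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
)
print(rag_request)
```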
Best Practices
Cost Optimization
- Use appropriate models: Smaller models for simple tasks
- Set max_tokens: Limit output length when possible
- Cache responses: For repeated identical queries
- Batch when possible: Use batch inference for bulk processing
- Monitor usage: Set up CloudWatch alarms for cost
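Caching identical queries can be sketched with a small in-memory wrapper. This is an illustration under assumptions: the class `ResponseCache` is hypothetical, the wrapped `invoke_fn` stands in for a real Bedrock call, and cached answers only make sense when a repeated request is expected to produce an equivalent response.

```python
import hashlib
import json


class ResponseCache:
    """Memoize model responses keyed by model ID and request body."""

    def __init__(self, invoke_fn):
        self._invoke_fn = invoke_fn  # callable(model_id, body) -> response
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(model_id, body):
        # sort_keys makes the key independent of dictionary ordering
        canonical = json.dumps(body, sort_keys=True)
        return hashlib.sha256(f"{model_id}\n{canonical}".encode("utf-8")).hexdigest()

    def invoke(self, model_id, body):
        key = self._key(model_id, body)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._invoke_fn(model_id, body)
        self._store[key] = result
        return result


# Demo with a fake backend that counts how often it is really called
backend_calls = []


def fake_invoke(model_id, body):
    backend_calls.append(model_id)
    return {"text": f"answer {len(backend_calls)}"}


cache = ResponseCache(fake_invoke)
first = cache.invoke("model-a", {"prompt": "hi", "max_tokens": 64})
second = cache.invoke("model-a", {"max_tokens": 64, "prompt": "hi"})
print(first, second, len(backend_calls))
```

A production cache would likely add expiry and a shared store; both are left out here.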
Performance
- Use streaming: For better user experience with long outputs
- Connection pooling: Reuse boto3 clients
- Regional deployment: Use closest region to reduce latency
- Provisioned throughput: For consistent high-volume workloads
Security
- Least privilege IAM: Only grant needed model access
- VPC endpoints: Keep traffic private
- Guardrails: Implement content filtering
- Audit with CloudTrail: Track model invocations
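A guardrail can be attached to an invocation by passing its identifier and version alongside the usual arguments. The sketch below only builds the keyword arguments for `invoke_model`. The guardrail ID `gr-example123` and version `1` are placeholders for a guardrail created beforehand, and the helper name is hypothetical.

```python
import json


def build_guarded_invoke_kwargs(model_id, body, guardrail_id, guardrail_version):
    """Keyword arguments for invoke_model with a guardrail applied."""
    return {
        "modelId": model_id,
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
        # Both values must refer to an existing guardrail in the same region.
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
    }


guarded_kwargs = build_guarded_invoke_kwargs(
    "anthropic.claude-3-sonnet-20240229-v1:0",
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    guardrail_id="gr-example123",  # placeholder
    guardrail_version="1",  # placeholder
)
print(sorted(guarded_kwargs))
```

With a real client, the call would be `bedrock.invoke_model(**guarded_kwargs)`.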
IAM Permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
]
}
]
}
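When the list of permitted models changes often, a policy like the one above can be generated rather than edited by hand. This is a minimal sketch; the function name `build_invoke_policy` is hypothetical, and it reproduces only the two actions and the ARN pattern shown in the policy.

```python
import json


def build_invoke_policy(region, model_ids):
    """Return a least-privilege policy allowing invocation of the given models."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


policy = build_invoke_policy(
    "us-east-1",
    ["anthropic.claude-3-sonnet-20240229-v1:0", "amazon.titan-embed-text-v2:0"],
)
print(json.dumps(policy, indent=2))
```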
Troubleshooting
AccessDeniedException
Causes:
- Model access not enabled in console
- IAM policy missing bedrock:InvokeModel
- Wrong model ID or region
Debug:
# Confirm the model ID is listed in the current region
aws bedrock list-foundation-models \
--query 'modelSummaries[?modelId==`anthropic.claude-3-sonnet-20240229-v1:0`]'
# Test IAM permissions
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:role/my-role \
--action-names bedrock:InvokeModel \
--resource-arns "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
ModelNotReadyException
Cause: Model is still being provisioned or temporarily unavailable.
Solution: Implement retry with exponential backoff:
import json
import time
from botocore.exceptions import ClientError

def invoke_with_retry(bedrock, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            return bedrock.invoke_model(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                body=json.dumps(body)
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'ModelNotReadyException':
                # Back off 1s, 2s, 4s, ... before the next attempt
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception('Max retries exceeded')
ThrottlingException
Causes:
- Exceeded on-demand quota
- Too many concurrent requests
Solutions:
- Request quota increase
- Implement exponential backoff
- Consider provisioned throughput
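Exponential backoff for throttling can be sketched as a generic wrapper with random jitter, so that concurrent clients do not retry in lockstep. The names `call_with_backoff` and `is_throttling` are hypothetical. The error check relies on the `response['Error']['Code']` layout that botocore's `ClientError` exposes, and the demo uses a stand-in exception so it runs without AWS.

```python
import random
import time


def is_throttling(error):
    """True if the error carries a ThrottlingException code (ClientError layout)."""
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(call, is_retryable, max_attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `call()` and retry retryable errors with capped, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as error:
            if not is_retryable(error) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential delay
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


# Demo: a fake call that is throttled twice, then succeeds
class FakeClientError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


attempts = []
delays = []


def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeClientError("ThrottlingException")
    return "ok"


outcome = call_with_backoff(flaky_call, is_throttling, sleep=delays.append)
print(outcome, len(attempts), delays)
```

boto3 also ships its own retry handling, configurable through `botocore.config.Config(retries={...})`, which may be enough on its own for many workloads.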
ValidationException
Common issues:
- Invalid model ID
- Malformed request body
- max_tokens exceeds model limit
Debug:
# Check model-specific requirements
aws bedrock get-foundation-model \
--model-identifier anthropic.claude-3-sonnet-20240229-v1:0 \
--query 'modelDetails.inferenceTypesSupported'
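Some of these problems can be caught before the request is sent. The following is a rough pre-flight check for a Claude-style body. The function name `validate_claude_body` is hypothetical, the default limit of 4096 output tokens is an assumption to replace with the real limit of your model, and the checks are not a complete statement of what the service validates.

```python
def validate_claude_body(body, max_output_tokens=4096):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    if "anthropic_version" not in body:
        problems.append("missing anthropic_version")

    max_tokens = body.get("max_tokens")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        problems.append("max_tokens must be a positive integer")
    elif max_tokens > max_output_tokens:
        # max_output_tokens is an assumed limit, not read from the service
        problems.append(f"max_tokens {max_tokens} exceeds assumed limit {max_output_tokens}")

    messages = body.get("messages") or []
    if not messages:
        problems.append("messages must be a non-empty list")
    elif messages[0].get("role") != "user":
        problems.append("first message must have role 'user'")
    return problems


good_problems = validate_claude_body({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
})
bad_problems = validate_claude_body({"max_tokens": 999999, "messages": []})
print(good_problems, bad_problems)
```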
References
Source
https://github.com/itsmostafa/aws-agent-skills/blob/main/skills/bedrock/SKILL.md
Overview
Bedrock offers a unified API to access multiple foundation models (Claude, Titan, Llama, Mistral, Stable Diffusion) for generative AI tasks. It enables building AI apps with text generation, embeddings, and image generation, plus options to configure model access and support RAG workflows.
How This Skill Works
Bedrock exposes a single API to call different foundation models after you enable access in your account. Choose an inference type (On-Demand, Provisioned Throughput, or Batch Inference) and send a structured payload via the CLI or an SDK such as boto3. You can also stream responses and generate embeddings with a model ID such as amazon.titan-embed-text-v2:0.
When to Use It
- Invoking a foundation model for text generation or analysis in a live app
- Building AI-powered applications that rely on generation, embeddings, or image generation
- Creating embeddings for search, similarity, or RAG pipelines
- Configuring which models are enabled in your AWS Bedrock account and managing access/EULA requirements
- Implementing RAG patterns by combining retrieval with Bedrock inference
Quick Start
- Step 1: Enable desired Bedrock models in the Bedrock console and accept any required EULAs
- Step 2: Select an inference type (On-Demand, Provisioned Throughput, or Batch) based on your workload
- Step 3: Call bedrock-runtime invoke-model (CLI) or the boto3 client to run generation or embedding tasks and process the response
Best Practices
- Enable desired models in the Bedrock console and complete any required EULA agreements
- Choose the correct inference type based on workload and cost: On-Demand, Provisioned Throughput, or Batch
- Use embeddings (e.g., Titan embed) for indexing, retrieval, and RAG readiness
- Test prompts and output safety; monitor latency and quotas across regions
- Document modelIds, versions, and access controls for reproducible automation using CLI or boto3
Example Use Cases
- Invoke Claude via bedrock-runtime to power a chat assistant
- Generate text embeddings with amazon.titan-embed-text-v2:0 for document similarity search
- Stream Claude responses in real time using invoke_model_with_response_stream
- Build a simple RAG pipeline by indexing documents with embeddings and querying Bedrock for generation
- Enable and manage model access (e.g., Claude, Titan) in the Bedrock console for a region
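The simple RAG pipeline mentioned in the use cases can be sketched end to end with the model calls injected as functions. Everything here is illustrative: `embed_fn` and `generate_fn` stand in for Bedrock embedding and generation calls, and the toy keyword-count embedding exists only so the example runs offline.

```python
def top_k(query_vector, indexed_docs, k=2):
    """Return the k document texts whose vectors score highest by dot product."""
    ranked = sorted(
        indexed_docs,
        key=lambda item: sum(q * d for q, d in zip(query_vector, item[1])),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from the passages."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def answer(question, documents, embed_fn, generate_fn, k=2):
    """Index documents, retrieve the closest ones, and generate an answer."""
    indexed = [(doc, embed_fn(doc)) for doc in documents]
    passages = top_k(embed_fn(question), indexed, k)
    return generate_fn(build_rag_prompt(question, passages))


# Toy stand-ins so the sketch runs without AWS
KEYWORDS = ["lambda", "s3", "database"]


def toy_embed(text):
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


docs = [
    "AWS Lambda runs code without servers.",
    "Amazon S3 stores objects in buckets.",
    "DynamoDB is a managed NoSQL database.",
]
# The fake generator echoes the prompt so the retrieved context is visible
rag_output = answer("How does Lambda run my code?", docs, toy_embed, lambda p: p, k=1)
print(rag_output)
```

Swapping `toy_embed` for a real embedding call and the echo function for a text-generation call would turn this into a working, if unoptimized, pipeline; a real index would also store vectors rather than recompute them per query.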