AWS Bedrock AgentCore — Serverless Platform for Production AI Agents

Posted on: 4/25/2026 1:13:40 AM

Building an AI Agent that works in a is one thing. Deploying that agent to production with millions of requests, enterprise-grade security, cross-session memory, and integration with existing business systems — that's the real challenge. Amazon Bedrock AgentCore was built to solve exactly this: a serverless platform providing every building block needed to run AI Agents at production scale.

3 API calls to deploy an agent to production
10 max collaborator agents per supervisor
6 safeguard policies in Guardrails
0 infrastructure to manage (serverless)

1. Why Do We Need a Dedicated Platform for AI Agents?

When building AI Agents, the "smart" part — calling LLMs, parsing results, invoking tools — typically accounts for about 20% of the effort. The remaining 80% is production concerns: security, authentication, memory management, monitoring, retry logic, rate limiting, and deployment at scale. This is why managed platforms like Bedrock AgentCore are becoming increasingly critical.

graph TB
    subgraph "AI Agent Development Effort"
        A["20% — Agent Logic
LLM, Tools, Prompts"] B["80% — Production Infrastructure
Security, Memory, Observability,
Scaling, Identity, Deployment"] end style A fill:#4CAF50,stroke:#fff,color:#fff style B fill:#e94560,stroke:#fff,color:#fff

Effort ratio: Agent logic vs Infrastructure

Production Reality

According to AWS, enterprise teams spend an average of 3-6 months building infrastructure around AI Agents before the agent can handle real workloads. Bedrock AgentCore reduces this timeline to days.

2. Amazon Bedrock AgentCore Architecture Overview

AgentCore isn't a single service — it's a suite of modular services, each addressing a specific aspect of the production AI Agent problem. You can use the entire suite or just the parts you need.

graph TB
    DEV["Developer / Agent Framework"]

    subgraph "Amazon Bedrock AgentCore"
        RT["Runtime
Serverless Execution"] GW["Gateway
Tool Access & Auth"] MEM["Memory
Session & Long-term"] ID["Identity
Agent Authentication"] OBS["Observability
Tracing & Metrics"] EVAL["Evaluations
Quality Scoring"] POL["Policy
Action Control"] BR["Browser
Web Interaction"] CI["Code Interpreter
Code Execution"] end DEV --> RT RT --> GW RT --> MEM RT --> ID RT --> OBS RT --> EVAL RT --> POL RT --> BR RT --> CI style DEV fill:#e94560,stroke:#fff,color:#fff style RT fill:#2c3e50,stroke:#fff,color:#fff style GW fill:#16213e,stroke:#fff,color:#fff style MEM fill:#16213e,stroke:#fff,color:#fff style ID fill:#16213e,stroke:#fff,color:#fff style OBS fill:#16213e,stroke:#fff,color:#fff style EVAL fill:#16213e,stroke:#fff,color:#fff style POL fill:#16213e,stroke:#fff,color:#fff style BR fill:#16213e,stroke:#fff,color:#fff style CI fill:#16213e,stroke:#fff,color:#fff

Modular architecture of Amazon Bedrock AgentCore

Service Function Problem Solved
Runtime Serverless environment with session isolation Deploy agents without managing servers
Gateway Unified tool access via OpenAPI specs Connect agents to APIs with automatic auth
Memory Session state + long-term memory Agent remembers context across sessions
Identity Agent authentication with IdPs (Okta, Entra ID) Secure agent auth with third-party services
Observability Distributed tracing, metrics, logs Debug and monitor agent behavior
Evaluations Continuous quality scoring Continuously assess response quality
Policy Fine-grained action control Control what agents are allowed to do

3. Runtime — Deploy Agents Without Infrastructure

AgentCore Runtime is the heart of the entire platform. It provides a serverless environment with session isolation — each conversation runs in a separate container, sharing no state with other sessions.

Managed Agent Harness (New in 2026)

The latest feature lets you deploy a complete agent with just 3 API calls — no orchestration infrastructure needed. The AgentCore CLI supports the entire development lifecycle from init, test, to deploy.

# Example: Deploy agent with AgentCore CLI
# Step 1: Initialize project
agentcore init my-support-agent --framework strands

# Step 2: Define agent logic
# agent.py using any framework (Strands, LangGraph, CrewAI...)

# Step 3: Deploy to AgentCore Runtime
agentcore deploy --name my-support-agent \
    --memory enabled \
    --guardrails my-guardrail-id \
    --identity-provider okta

The biggest differentiator of AgentCore Runtime: it's framework-agnostic. You can use any agent framework — Strands Agents SDK, LangGraph, CrewAI, AutoGen — and Runtime handles deployment, scaling, and monitoring.

4. Knowledge Base — Fully Managed RAG

Amazon Bedrock Knowledge Base provides a complete RAG (Retrieval-Augmented Generation) pipeline: from data ingestion, chunking, embedding, to vector search — all fully managed.

graph LR
    S3["S3 Bucket
Documents"] --> CHUNK["Auto Chunking
Semantic / Fixed"] CHUNK --> EMB["Embedding Model
Titan / Cohere"] EMB --> VS["Vector Store
OpenSearch / Pinecone"] Q["User Query"] --> AGENT["Bedrock Agent"] AGENT --> VS VS --> CTX["Retrieved Context"] CTX --> AGENT AGENT --> R["Grounded Response"] style S3 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style AGENT fill:#e94560,stroke:#fff,color:#fff style VS fill:#2c3e50,stroke:#fff,color:#fff style R fill:#4CAF50,stroke:#fff,color:#fff style CHUNK fill:#16213e,stroke:#fff,color:#fff style EMB fill:#16213e,stroke:#fff,color:#fff style Q fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style CTX fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

RAG flow in Amazon Bedrock Knowledge Base

Chunking Strategies

Strategy Description When to Use
Fixed-size Split by fixed token count Uniform data, FAQ pages
Semantic Split by meaning, preserving context Long documents, technical docs
Hierarchical Multi-tier chunks (parent-child) Complex structured documents
No chunking Each file is one chunk Small files, one topic per file

5. Guardrails — Output Control and Security

Guardrails are the most critical defense layer in production. Amazon Bedrock Guardrails provides 6 safeguard policies to control both agent input and output.

graph LR
    INPUT["User Input"] --> G1["Content Filter"]
    G1 --> G2["Denied Topics"]
    G2 --> G3["Word Filter"]
    G3 --> G4["PII Redaction"]
    G4 --> G5["Prompt Attack Detection"]
    G5 --> LLM["LLM Processing"]
    LLM --> G6["Contextual Grounding"]
    G6 --> OUTPUT["Safe Response"]

    style INPUT fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style LLM fill:#2c3e50,stroke:#fff,color:#fff
    style OUTPUT fill:#4CAF50,stroke:#fff,color:#fff
    style G1 fill:#e94560,stroke:#fff,color:#fff
    style G2 fill:#e94560,stroke:#fff,color:#fff
    style G3 fill:#e94560,stroke:#fff,color:#fff
    style G4 fill:#e94560,stroke:#fff,color:#fff
    style G5 fill:#e94560,stroke:#fff,color:#fff
    style G6 fill:#e94560,stroke:#fff,color:#fff

6-layer Guardrails processing pipeline

Guardrails for Code

Starting in 2026, Guardrails expanded to protect code: detecting harmful content within code elements, blocking code injection attempts, and preventing PII leakage through code structures. This is crucial if you're building coding assistant agents.

Guardrails Configuration Example

{
  "name": "production-guardrail",
  "contentPolicyConfig": {
    "filtersConfig": [
      { "type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH" },
      { "type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH" },
      { "type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH" },
      { "type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "HIGH" }
    ]
  },
  "topicPolicyConfig": {
    "topicsConfig": [
      {
        "name": "competitor-comparison",
        "definition": "Questions comparing products with competitors",
        "type": "DENY"
      }
    ]
  },
  "sensitiveInformationPolicyConfig": {
    "piiEntitiesConfig": [
      { "type": "EMAIL", "action": "ANONYMIZE" },
      { "type": "PHONE", "action": "ANONYMIZE" },
      { "type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK" }
    ]
  },
  "contextualGroundingPolicyConfig": {
    "filtersConfig": [
      { "type": "GROUNDING", "threshold": 0.7 },
      { "type": "RELEVANCE", "threshold": 0.7 }
    ]
  }
}

6. Multi-Agent Collaboration — Specialized Agent Teams

When a single agent can't handle complex tasks, Bedrock lets you build a team of agents that coordinate with each other. The supervisor-collaborator architecture supports up to 10 collaborator agents per supervisor.

graph TB
    USER["User Request"] --> SUP["Supervisor Agent
Orchestrate & Synthesize"] SUP --> A1["Agent 1
Customer Lookup"] SUP --> A2["Agent 2
Order Processing"] SUP --> A3["Agent 3
Inventory Check"] SUP --> A4["Agent 4
Payment Processing"] A1 --> DB["CRM Database"] A2 --> OMS["Order Management"] A3 --> WMS["Warehouse System"] A4 --> PAY["Payment Gateway"] A1 --> SUP A2 --> SUP A3 --> SUP A4 --> SUP SUP --> RESP["Consolidated Response"] style USER fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style SUP fill:#e94560,stroke:#fff,color:#fff style A1 fill:#2c3e50,stroke:#fff,color:#fff style A2 fill:#2c3e50,stroke:#fff,color:#fff style A3 fill:#2c3e50,stroke:#fff,color:#fff style A4 fill:#2c3e50,stroke:#fff,color:#fff style RESP fill:#4CAF50,stroke:#fff,color:#fff style DB fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style OMS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style WMS fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style PAY fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

Multi-Agent architecture with Supervisor pattern

Two Orchestration Modes

Supervisor with Routing

The supervisor acts only as a router — analyzing the query and directing it to the right specialized agent. Best when tasks are independent and don't require synthesizing results from multiple agents.

Supervisor with Orchestration

The supervisor breaks down the problem into parts, sends them to specialized agents, then synthesizes the results. Best for complex tasks requiring coordination across multiple data sources.

import boto3

bedrock = boto3.client('bedrock-agent')

# Create Supervisor Agent
supervisor = bedrock.create_agent(
    agentName='ecommerce-supervisor',
    instruction="""You are a supervisor orchestrating an e-commerce agent team.
    Analyze customer requests and coordinate specialized agents for:
    order lookup, inventory checks, and payment processing.""",
    foundationModel='anthropic.claude-sonnet-4-20250514'
)

# Associate collaborator agents
bedrock.associate_agent_collaborator(
    agentId=supervisor['agent']['agentId'],
    collaborationInstruction='Call this agent for customer information lookup',
    collaboratorId=customer_agent_id,
    collaboratorName='CustomerLookup',
    relayConversationHistory='TO_COLLABORATOR'
)

7. Identity — Authenticating Agents with External Systems

One of the biggest challenges when deploying AI Agents to production: how do you let agents access enterprise systems (Slack, GitHub, Jira, Salesforce) securely, without hardcoding credentials?

AgentCore Identity solves this by assigning agents their own identity and integrating with corporate Identity Providers (IdP).

graph LR
    AGENT["AI Agent"] --> ACID["AgentCore Identity"]
    ACID --> IDP["Corporate IdP
Okta / Entra ID / Cognito"] IDP --> TOKEN["OAuth Token"] TOKEN --> AGENT AGENT --> SLACK["Slack API"] AGENT --> GH["GitHub API"] AGENT --> JIRA["Jira API"] style AGENT fill:#e94560,stroke:#fff,color:#fff style ACID fill:#2c3e50,stroke:#fff,color:#fff style IDP fill:#16213e,stroke:#fff,color:#fff style TOKEN fill:#4CAF50,stroke:#fff,color:#fff style SLACK fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style GH fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style JIRA fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

Agent authentication flow through AgentCore Identity

Best Practice: Principle of Least Privilege

Always grant agents the minimum required permissions. Use AgentCore Policy to restrict exactly which APIs the agent can call and on which resources. Never grant admin access to agents.

8. Memory — Agents with Cross-Session Memory

AgentCore Memory automatically manages two types of memory for agents:

Memory Type Scope Use Case
Session Memory Within a single conversation Maintain context of the current conversation
Long-term Memory Across multiple sessions Remember user preferences and interaction history
# Memory is automatically managed by AgentCore
# Agent can query memory via API

# Example: Agent remembers customer preferences
response = bedrock_runtime.invoke_agent(
    agentId='support-agent',
    sessionId='user-12345',
    enableTrace=True,
    memoryId='user-12345-memory',  # Long-term memory identifier
    inputText='I want to change my subscription plan'
)

# Agent automatically knows:
# - User has been on Premium plan since 2024
# - User has asked about downgrading twice before
# - User prefers Vietnamese communication

9. Gateway — Connecting Agents to Any API

AgentCore Gateway acts as a middleman between agents and external tools/APIs. Instead of each agent handling authentication, rate limiting, and error handling for every API — Gateway handles it all.

How Gateway Works

You define tools using OpenAPI specifications → Gateway understands the API contract → handles auth, validates requests/responses, auto-retries, and returns results to the agent as tool responses.

# Example OpenAPI spec for "get_order_status" tool
openapi: 3.0.0
info:
  title: Order Management API
  version: 1.0.0
paths:
  /orders/{orderId}:
    get:
      operationId: getOrderStatus
      summary: Get order status
      parameters:
        - name: orderId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Order information
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum: [pending, processing, shipped, delivered]
                  estimatedDelivery:
                    type: string
                    format: date

10. Action Groups — Extending Agent Capabilities

Action Groups allow agents to perform specific actions by connecting to AWS Lambda functions or API endpoints. Each Action Group is a collection of tools the agent can invoke.

graph TB
    AGENT["Bedrock Agent"] --> AG1["Action Group: CRM"]
    AGENT --> AG2["Action Group: Inventory"]
    AGENT --> AG3["Action Group: Notifications"]
    AGENT --> KB["Knowledge Base"]

    AG1 --> L1["Lambda: searchCustomer"]
    AG1 --> L2["Lambda: updateCustomer"]
    AG2 --> L3["Lambda: checkStock"]
    AG2 --> L4["Lambda: reserveItem"]
    AG3 --> L5["Lambda: sendEmail"]
    AG3 --> L6["Lambda: sendSMS"]

    style AGENT fill:#e94560,stroke:#fff,color:#fff
    style AG1 fill:#2c3e50,stroke:#fff,color:#fff
    style AG2 fill:#2c3e50,stroke:#fff,color:#fff
    style AG3 fill:#2c3e50,stroke:#fff,color:#fff
    style KB fill:#16213e,stroke:#fff,color:#fff
    style L1 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style L2 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style L3 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style L4 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style L5 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style L6 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

Agent using Action Groups to interact with systems

11. Observability — Monitoring Agents in Production

Agents aren't traditional APIs — output is non-deterministic, each request may go through multiple reasoning steps and tool calls. Observability in the agentic world requires tracing at the individual reasoning step level.

# Enable trace to monitor reasoning chain
response = bedrock_runtime.invoke_agent(
    agentId='my-agent',
    sessionId='session-001',
    enableTrace=True,
    inputText='Check order ORD-12345 and send an update to the customer'
)

# Trace output includes:
# 1. PreProcessingTrace - Input parsing & classification
# 2. OrchestrationTrace - Reasoning steps, tool selection
# 3. PostProcessingTrace - Response formatting
# 4. GuardrailTrace - Safety filter results
# 5. FailureTrace - Error details if any

CloudWatch Integration

AgentCore automatically sends metrics and traces to Amazon CloudWatch. You can create dashboards monitoring: average latency per step, tool invocation count, guardrail block rate, and LLM inference cost per session.

12. Comparison with Other AI Agent Platforms

Criteria Bedrock AgentCore Azure AI Foundry LangGraph Cloud
Deployment Fully serverless Container-based Managed hosting
Multi-model Claude, Llama, Mistral, Titan... GPT-4o, Phi, Llama Any LLM
Built-in RAG Knowledge Base (managed) Azure AI Search Self-integrated
Agent Identity AgentCore Identity + IdP Managed Identity Not built-in
Multi-Agent Supervisor + 10 collaborators Semantic Kernel orchestration Graph-based workflows
Guardrails 6 safeguard policies built-in Content Safety API Self-implemented
Protocols MCP, A2A, HTTP HTTP, gRPC HTTP
Lock-in Framework-agnostic Azure ecosystem LangChain ecosystem

13. Production Architecture: E-commerce Support Agent

To illustrate how these building blocks combine in practice, let's examine the architecture of a Customer Support Agent system for e-commerce.

graph TB
    CUST["Customer
Web / Mobile / Chat"] --> ALB["Application Load Balancer"] ALB --> API["API Gateway"] API --> RT["AgentCore Runtime"] RT --> SUP["Supervisor Agent"] SUP --> FAQ["FAQ Agent
+ Knowledge Base"] SUP --> ORD["Order Agent
+ Action Groups"] SUP --> ESC["Escalation Agent
+ SES / SNS"] RT --> MEM["AgentCore Memory"] RT --> GR["Guardrails"] RT --> OBS["CloudWatch
Observability"] RT --> AID["AgentCore Identity
→ Okta SSO"] FAQ --> S3["S3: Product Docs"] ORD --> DDB["DynamoDB: Orders"] ORD --> PAY["Stripe API"] ESC --> AGENT_DESK["Human Agent Queue"] style CUST fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style RT fill:#e94560,stroke:#fff,color:#fff style SUP fill:#2c3e50,stroke:#fff,color:#fff style FAQ fill:#16213e,stroke:#fff,color:#fff style ORD fill:#16213e,stroke:#fff,color:#fff style ESC fill:#16213e,stroke:#fff,color:#fff style GR fill:#e94560,stroke:#fff,color:#fff style MEM fill:#2c3e50,stroke:#fff,color:#fff style OBS fill:#2c3e50,stroke:#fff,color:#fff style AID fill:#2c3e50,stroke:#fff,color:#fff style ALB fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style API fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style S3 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style DDB fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style PAY fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style AGENT_DESK fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50

Production architecture: E-commerce Support Agent on Bedrock AgentCore

14. Production Best Practices

Agent Design
Each agent should own a single, clear responsibility. If an agent's instruction exceeds 500 words, split it into multi-agent. Agent instructions must be specific — avoid vague commands like "help the customer".
Guardrails First
Always set up Guardrails BEFORE exposing the agent externally. Start with restrictive policies, then relax gradually based on real data. Enable contextual grounding with threshold >= 0.7 to reduce hallucinations.
Observability from Day One
Enable trace for 100% of requests in the initial phase. Create CloudWatch alarms for: P99 latency > 10s, guardrail block rate > 15%, and error rate > 2%. Reduce sampling rate as the system stabilizes.
Memory and Privacy
Configure TTL for long-term memory aligned with your data retention policies. Use PII redaction in Guardrails to ensure memory doesn't store sensitive information.
Continuous Testing
Use AgentCore Evaluations to run continuous quality scoring. Build test suites covering edge cases: prompt injection, off-topic queries, and multi-step reasoning failures.

15. Cost and Pricing

Bedrock AgentCore uses a pay-per-use model, charging based on:

  • LLM inference: Billed per input/output tokens, varying by model (Claude Sonnet is cheaper than Claude Opus)
  • Knowledge Base: Embedding costs + vector store (OpenSearch Serverless)
  • Runtime: Compute costs based on active session time
  • Guardrails: Billed per text units processed
  • Memory: Storage costs for long-term memory

Cost Optimization Tips

Use Prompt Caching to reduce inference costs by up to 90% for repeated system prompts. Choose the right model — not every task needs Claude Opus; many routing/classification tasks only need Haiku. Combine with Intelligent Prompt Routing to automatically select the optimal model based on complexity.

Conclusion

Amazon Bedrock AgentCore represents the maturation of the AI Agent ecosystem — from demo stage to production-grade. With its modular architecture, you're not forced to use the entire platform — you can start with Runtime + Guardrails, then expand to Memory, Identity, and Multi-Agent as needs grow.

What makes AgentCore different isn't any single feature — Azure AI Foundry and LangGraph Cloud offer similar capabilities. The strength lies in everything being managed, serverless, and framework-agnostic — you focus on agent logic while AWS handles the production infrastructure.

References