Introduction: The Challenge of Building Agentic Systems

When I started building agentic applications, I encountered a fundamental challenge: the landscape of agentic frameworks and libraries is evolving at breakneck speed. Each new project brought new tools - LangChain one month, Google ADK the next, then Agents SDK, and so on. Each framework had its own patterns, its own organizational philosophy, and its own way of thinking about agents. This constant churn made it difficult to maintain consistency across projects or transfer knowledge between them.

What I needed was an architecture that could sit above these rapidly changing implementations - an agreement layer that would let me maintain consistent patterns regardless of which framework we were using underneath. This led me to develop Clean Agentic Architecture (CAA), which adapts the proven principles of Clean Architecture to the unique challenges of agentic systems and gives them a concrete shape that I can confidently apply across projects.

The core insight of CAA is treating the agentic system as a pluggable, self-contained module that can be developed independently and connected to any application through well-defined interfaces. This means we can build our agentic logic separately from our main applications, swap out underlying frameworks without rewriting everything, and maintain consistent patterns across projects even as the tooling landscape shifts beneath us.

This guide presents CAA as I’ve developed it through practical experience across projects and frameworks. It’s an opinionated approach that provides not just architectural principles but concrete recipes for structure and implementation. While I believe it addresses common problems in agentic development, it’s important to note that this is my solution to the challenges I’ve faced, and other approaches may work better for different contexts.

The Problems CAA Addresses

Let me be specific about the problems that drove me to create this architecture, as they’re likely challenges you’re facing too:

Problem 1: Framework Chaos and Pattern Inconsistency

The agentic framework landscape changes monthly. I found myself using LangChain for one project, then switching to Google ADK for the next because it had better multi-agent support, then trying OpenAI Agents SDK because it works beautifully with the OpenAI ecosystem. Each framework brought its own organizational patterns, its own abstractions, and its own best practices.

This constant switching meant I could never build expertise or reusable patterns. Worse, when I needed to maintain or extend older projects, I had to context-switch back to whatever framework and patterns that project used. I needed an architecture that could provide stability above this chaos - an agreement layer that would let me use any framework underneath while maintaining consistent patterns on top.

Problem 2: Unclear Logic Separation and Execution Patterns

In the frameworks I used, there was no clear separation between different levels of agent execution. Sometimes an agent would run solo, sometimes in a workflow, sometimes in complex hierarchical arrangements. The execution patterns - loops, sequential processing, parallel execution, conditional branching - would change based on the framework and the specific use case.

Without clear architectural boundaries, business logic would leak into agent implementations, orchestration logic would get tangled with individual agent behavior, and it became impossible to reason about the system’s behavior at different levels of abstraction. I needed clear layers that separated concerns and made explicit what belonged where.

Problem 3: Lack of Concrete Folder Structure

While some frameworks like Google’s ADK suggested cohesion-based organization (keeping related things together), none provided a concrete, complete folder structure that I could consistently apply. I’d start each project wondering: where do prompts go? How do I organize tools? What about shared components? Should I group by feature or by type?

This lack of structure meant every project organized things differently, making it hard to onboard team members or maintain consistency. I needed a concrete folder structure that embodied our architectural principles while remaining flexible enough to accommodate different use cases.

Problem 4: No Clear Definition of Agent Identity

What exactly constitutes an agent? Different frameworks had different answers. Is it just a prompt? A prompt plus tools? What about schemas, examples, and configuration? And to what degree does each of these shape the agent’s identity? Is the prompt as important as the tools? Without a clear definition of what makes up an agent’s identity and how to structure it, I couldn’t build consistent patterns for agent development, testing, or reuse.

We needed to formalize what an agent is, which components constitute its identity and to what degree, and how those components should be organized and structured. This formalization would let us build, test, and reason about agents consistently regardless of the underlying implementation.

Problem 5: Unclear Boundaries with Host Applications

When integrating agentic features into existing applications, the boundaries were always fuzzy. Agent code would become entangled with application code. Business logic would leak into prompts. Application-specific concerns would pollute agent implementations. This made it impossible to reuse agentic systems across applications or to develop them independently.

I needed clear architectural boundaries that would let me develop agentic systems as independent modules that could plug into any application without contamination in either direction.

Core Architectural Principles

Through iterating on multiple projects, I’ve identified five principles that make CAA work. These aren’t universal truths but rather the opinionated choices that bring consistency to my agentic development.

CAA Invariants (Non-Negotiables)

Before diving into the principles, let’s establish the architectural invariants - the non-negotiable rules that define CAA. These are the constraints that make the architecture work and can be mechanically checked:

Core Invariants:

  • Dependencies flow inward only: infrastructure -> adapters -> use_cases -> domain. Never the reverse.
  • Workflows are the sole entrypoint: Host apps never call agents directly; workflows orchestrate all agent interactions.
  • Agents are stateless: No persistent reads/writes or session management inside agents.
  • State is externalized: Workflows orchestrate state interactions via ports; session lifecycle is owned by app adapters or a session service in infrastructure.
  • Ports over concretes: use_cases depend only on ports they define; adapters implement them.
  • Identity is composite: Agent = code + prompts + schemas + tools + config; versioned together as a unit.

Non-Goals:

CAA deliberately avoids prescribing:

  • Model/framework/vector store/tracing vendor choices
  • Streaming envelope formats (adapters define these)
  • Deployment topology (library mode is first-class, microservices optional)

Breaking an invariant requires documenting the decision in an Architecture Decision Record (ADR) that explains the scope and containment strategy.

Principle 1: Dependencies Flow Inward

We’ve adopted Clean Architecture’s fundamental rule: dependencies always point inward, never outward. Our domain entities know nothing about workflows. Workflows know nothing about specific agent implementations. Agents know nothing about infrastructure details or the frameworks we’re using underneath.

This principle creates stability in a world of change. When we switch from LangChain to ADK, only the infrastructure layer changes. When we modify an agent’s implementation, workflows remain untouched. When we change how workflows orchestrate agents, the agents themselves don’t need to know.

Principle 2: The Agentic System as a Pluggable Module

We treat the entire agentic system as a self-contained module with its own internal architecture. It connects to host applications through explicit adapters, maintaining clear boundaries and contracts. This pluggability is what makes CAA practical - we can develop our agentic logic independently, test it in isolation, and deploy it however makes sense for each project.

The agentic system doesn’t know if it’s being called from a web service, a CLI tool, or a mobile app. It doesn’t know if it’s deployed as a microservice or embedded as a library. These concerns are handled by infrastructure adapters, keeping the core agentic logic pure and reusable.

Principle 3: Cohesion Over Layer Separation in the File System

Here’s a crucial point: while we maintain strict architectural layers for dependency management, these layers don’t directly correlate with our folder structure. We prioritize keeping related things together in the file system even if they belong to different architectural layers.

An agent’s prompts might technically be configuration (infrastructure layer), its schemas might be interface definitions (adapter layer), and its core logic might be implementation (adapter layer), but we keep them all together in the agent’s folder because they collectively define the agent’s identity and behavior. The architectural layering is about dependencies and abstractions, not about physical file organization.

Principle 4: Workflows as the Unit of Use

We never use agents directly; they’re always orchestrated through workflows. Even the simplest case of a single agent performing a single task gets wrapped in a workflow. This consistency provides several benefits: we always have a place for orchestration logic, error handling, and transaction boundaries; external systems always interact with the same abstraction level; and we can evolve from simple to complex use cases without architectural changes.

Crucially, workflows orchestrate state interactions via ports rather than owning state directly. State lives in what we call the State Plane - managed by infrastructure components and accessed through well-defined interfaces. This separation keeps workflows focused on orchestration logic while maintaining flexibility in how state is actually persisted and retrieved.
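To make this concrete, here’s a minimal sketch of a workflow acting as the sole entrypoint and touching state only through ports. All names here (SupportWorkflow, SessionPort, ResponderPort) are illustrative, not a prescribed API:

# use_cases/workflows/customer_support/workflow.py (illustrative sketch)
from typing import Protocol

class SessionPort(Protocol):
    def load_history(self, session_id: str) -> list[str]: ...
    def append(self, session_id: str, message: str) -> None: ...

class ResponderPort(Protocol):
    def respond(self, query: str, history: list[str]) -> str: ...

class SupportWorkflow:
    def __init__(self, responder: ResponderPort, sessions: SessionPort):
        self._responder = responder
        self._sessions = sessions

    def run(self, session_id: str, query: str) -> str:
        # The workflow owns the state interaction; the agent stays stateless.
        history = self._sessions.load_history(session_id)
        answer = self._responder.respond(query, history)
        self._sessions.append(session_id, query)
        self._sessions.append(session_id, answer)
        return answer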

Principle 5: Purposeful Abstraction with the Rule of Three

We only abstract when we have proven need, following the Rule of Three: first use is specific to one agent, second use remains duplicated, third use triggers extraction to shared. This prevents the premature abstraction that can make systems rigid and over-complicated while ensuring genuinely useful patterns get reused.

The Four Layers of Clean Agentic Architecture

CAA organizes code into four architectural layers, just like Clean Architecture itself. Remember, these layers define dependency relationships, not necessarily folder structure.

Layer 1: Domain Layer (The Core)

The domain layer contains pure business concepts that define what agents and workflows are in our system. This layer is completely framework-agnostic - it doesn’t know if we’re using LangChain, AutoGen, or something we built ourselves. It defines abstract concepts: What is an Agent? What is a Workflow? What is a Message?

These definitions are surprisingly thin - often just interfaces or abstract base classes. Their power comes not from what they contain but from what they establish: a common vocabulary that the rest of the system uses. When we say “Agent” anywhere in our system, we mean specifically what’s defined in this layer, regardless of how it’s implemented.
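As a rough illustration, the domain definitions might be as thin as this; the exact fields and method shapes are a sketch, not a prescription:

# domain/entities (illustrative sketch): framework-agnostic vocabulary
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    role: str      # e.g. "user", "agent", "system"
    content: str

class Agent(ABC):
    """What the whole system means when it says 'Agent'."""
    @abstractmethod
    async def execute(self, message: Message) -> Message: ...

class Workflow(ABC):
    """A complete use case; the only thing host apps ever invoke."""
    @abstractmethod
    async def run(self, message: Message) -> Message: ...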

Layer 2: Use Case Layer (The Orchestration)

The use case layer contains our workflows - the actual business value our agentic system provides. Each workflow represents a complete use case like “provide customer support” or “analyze research papers.” This layer defines the ports (interfaces) that agents must implement to participate in workflows.

Ports are use-case-specific interface requirements - they represent what a workflow needs, not universal domain concepts. A customer support workflow might define ClassifierPort with a specific classify() method signature, a RetrieverPort for finding information, and a ResponderPort for generating responses. Each port is tailored to that workflow’s specific needs.

This is where we encode our business logic - not how individual agents work, but how they work together to achieve business goals. The workflow orchestrates these ports without knowing the concrete implementations behind them.
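A hypothetical ports module for the customer support workflow might look like this - each Protocol captures only what this workflow needs, and the method signatures are illustrative:

# use_cases/workflows/customer_support/ports/ (illustrative sketch)
from typing import Protocol

class ClassifierPort(Protocol):
    async def classify(self, query: str) -> str:
        """Return an intent label such as 'billing' or 'technical'."""
        ...

class RetrieverPort(Protocol):
    async def retrieve(self, query: str, intent: str) -> list[str]: ...

class ResponderPort(Protocol):
    async def respond(self, query: str, documents: list[str]) -> str: ...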

Layer 3: Interface Adapters Layer (The Implementation)

This layer contains our actual agent implementations with all their complexity. Each agent includes its implementation code, prompts that define its behavior, schemas that structure its inputs and outputs, tools it can use, subagents it employs, and adapters that let it plug into workflow ports.

This is where framework-specific code lives, but always behind abstractions. An agent might use OpenAI’s prompt templates internally, but it exposes only the abstract interface that workflows expect.
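For example, a classifier adapter might satisfy the workflow’s ClassifierPort while hiding the framework call. FrameworkAgent here is a stand-in for whatever wrapped framework object the infrastructure layer provides, not a real API:

# adapters/agents/customer_classifier/adapter.py (illustrative sketch)
class ClassifierAdapter:
    def __init__(self, framework_agent):
        self._agent = framework_agent  # e.g. a wrapped ADK or SDK agent

    async def classify(self, query: str) -> str:
        raw = await self._agent.run(query)   # framework-specific call
        return self._parse_label(raw)        # normalize to the port contract

    def _parse_label(self, raw) -> str:
        # Map framework output into the plain string the port promises.
        return str(raw).strip().lower()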

Layer 4: Infrastructure Layer (The Foundation)

The infrastructure layer handles all external interactions and provides the foundation that makes everything else possible. There are no hard limits on what belongs in this layer, since it depends on the capabilities we want our agents to have. For me, though, it almost always includes the following components:

Agentic Framework Integration: This is where we wrap whatever framework we’re using (Google ADK, LangChain, CrewAI, OpenAI Agents SDK, etc.) behind clean interfaces. The rest of our system doesn’t know or care which framework is underneath. We can, of course, also use our own custom implementations if needed.

Session and State Management: Components that manage conversation state, session persistence, and context passing. The State Plane architecture follows strict separation of concerns:

  • Use Case Layer: Defines ports for state access (e.g., SessionPort, MemoryPort, RetrievalPort) and consumes them via dependency injection. This layer does not own session lifecycle or persistence - it only orchestrates state interactions through ports.
  • Infrastructure Layer: Provides concrete implementations (SQLite/Redis/OpenAI Conversations API/vector DB/etc.) and a Session Service responsible for session creation, lookup, and retention policies. This is where framework-specific state management (like OpenAI Agents SDK’s OpenAIConversationsSession) is wrapped.
  • App Adapters: Map external identity (user ID, session token) to internal session context. They hydrate SessionContext before workflow execution and persist state deltas afterward. The adapter layer owns the session lifecycle from the external perspective.
  • Domain/Adapter Agents: Remain completely stateless. Any memory access agents need is mediated via tools or provided as inputs to their execution context - never through direct client connections or state reads.

This separation allows swapping between in-memory, file-based, cloud-hosted, or custom state implementations without touching workflow or agent code. State management becomes a deployment concern handled by infrastructure configuration.
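Here’s a sketch of that split under these assumptions: the port protocol lives in use_cases, while infrastructure supplies interchangeable implementations. MemoryPort and InMemoryStore are illustrative names:

# use_cases side: the port the workflow depends on (illustrative sketch)
from typing import Protocol

class MemoryPort(Protocol):
    def get(self, session_id: str) -> dict: ...
    def put(self, session_id: str, state: dict) -> None: ...

# infrastructure/persistence side: one swappable implementation among many
class InMemoryStore:
    def __init__(self):
        self._data: dict[str, dict] = {}

    def get(self, session_id: str) -> dict:
        return self._data.get(session_id, {})

    def put(self, session_id: str, state: dict) -> None:
        self._data[session_id] = state

# Swapping to Redis or a hosted Conversations API means writing another
# class with the same two methods; workflows and agents never change.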

Tracing and Observability: Integration with tools like LangSmith, Weights & Biases, Phoenix Arize or custom tracing solutions. Agentic systems are notoriously hard to debug, so comprehensive tracing is essential.

Prompt Management: Systems for versioning, testing, and deploying prompts. Prompts have a dual nature in CAA:

  1. Development/Definition Level: Prompts physically reside in adapters/agents/{agent_name}/prompts/ as the source of truth and version control anchor during development
  2. Runtime/Management Level: Infrastructure provides optional prompt management that can:
    • Load from local files (development/default)
    • Load from Prompt Management SaaS (e.g., Promptlayer, LangSmith, custom systems)
    • Perform A/B testing with variations
    • Enable hot-swapping without code deployment

The agent folder contains either the actual prompt files (system.md, examples.yaml), configuration pointing to external prompt IDs (prompt_config.yaml), or both (local as fallback, cloud as override). Infrastructure handles the resolution logic, determining whether to use local files or fetch from a prompt management system at runtime.

Application Adapters: The crucial adapters that make our system pluggable. These expose our workflows to the outside world through REST APIs, gRPC, message queues, or direct function calls. All the other items were mere suggestions and what I generally end up doing, but this is the essential part that makes CAA practical. Without these adapters, our agentic system would be isolated and unusable. With them, it can plug into any application architecture we need.
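The simplest such adapter is a direct function-call wrapper for library mode. A sketch (REST, gRPC, and queue adapters perform the same translation, just over a transport):

# infrastructure/app_adapters/direct_adapter.py (illustrative sketch)
class DirectAdapter:
    def __init__(self, workflow):
        self._workflow = workflow

    async def handle(self, session_id: str, text: str) -> str:
        # Translate the host app's call into the workflow's contract and
        # translate the result back. No business logic lives here.
        return await self._workflow.run(session_id, text)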

Folder Structure and Organization

Here’s the concrete folder structure we’ve developed. Note how it reflects our cohesion principle - keeping related things together even if they belong to different architectural layers:

agentic_system/
├── domain/                       # Layer 1: Core Business Concepts
│   ├── entities/
│   │   ├── agent.py              # Abstract Agent definition
│   │   ├── workflow.py           # Abstract Workflow definition
│   │   └── message.py            # Message types
│   └── value_objects/
│       ├── result.py             # Result types
│       └── context.py            # Context passing
│
├── use_cases/                    # Layer 2: Business Logic
│   └── workflows/
│       ├── customer_support/
│       │   ├── workflow.py       # Workflow implementation
│       │   ├── ports/            # Required agent interfaces
│       │   │   ├── classifier_port.py
│       │   │   ├── retriever_port.py
│       │   │   └── responder_port.py
│       │   └── config.py         # Workflow configuration (type-safe)
│       └── document_analysis/
│           ├── workflow.py
│           ├── ports/
│           │   ├── extractor_port.py
│           │   └── summarizer_port.py
│           └── config.py
│
├── adapters/                     # Layer 3: Implementations
│   ├── agents/
│   │   ├── customer_classifier/
│   │   │   ├── agent.py          # Agent implementation
│   │   │   ├── adapter.py        # Implements workflow port
│   │   │   ├── prompts/          # Agent-specific prompts
│   │   │   │   ├── system.md
│   │   │   │   ├── examples.yaml # Few-shot examples
│   │   │   │   └── templates/    # Prompt templates
│   │   │   ├── schemas.py        # Input/Output schemas
│   │   │   ├── tools/            # Agent-specific tools
│   │   │   │   └── sentiment_analyzer.py
│   │   │   ├── subagents/        # Agent-specific subagents
│   │   │   │   └── entity_extractor/
│   │   │   └── config.yaml       # Agent configuration
│   │   └── response_generator/
│   │       ├── agent.py
│   │       ├── adapter.py
│   │       ├── prompts/
│   │       ├── schemas.py
│   │       └── config.yaml
│   │
│   └── shared/                   # Promoted shared resources
│       ├── tools/
│       │   ├── web_search.py
│       │   ├── database_query.py
│       │   └── file_parser.py
│       ├── subagents/
│       │   └── translator/
│       │       ├── agent.py
│       │       ├── prompts/
│       │       ├── schemas.py
│       │       └── config.yaml
│       ├── schemas/
│       │   └── common.py
│       └── prompts/
│           └── templates/        # Shared prompt templates
│
├── infrastructure/               # Layer 4: External Interfaces
│   ├── frameworks/               # Agentic framework wrappers
│   │   ├── google_adk_adapter.py
│   │   ├── openai_sdk_adapter.py
│   │   └── crewai_adapter.py
│   ├── persistence/
│   │   ├── conversation_store.py
│   │   ├── vector_db.py          # Vector database interface
│   │   └── document_store.py
│   ├── tracing/
│   │   ├── phoenix_arize_tracer.py
│   │   ├── openai_tracer.py
│   │   └── custom_tracer.py
│   ├── prompt_management/
│   │   ├── version_control.py
│   │   └── prompt_registry.py
│   └── app_adapters/             # External interfaces
│       ├── rest_adapter.py       # REST API with SSE support
│       ├── websocket_adapter.py  # WebSocket interface
│       ├── grpc_adapter.py
│       ├── message_queue_adapter.py
│       └── direct_adapter.py     # Direct function calls
│
├── config/                       # Configuration
│   ├── agents.py                 # Agent registry and configs
│   ├── workflows.py              # Workflow configs
│   └── infrastructure.yaml       # Deployment-level config (YAML acceptable here)
│
└── tests/                        # Test suite (mirrors structure)
    ├── domain/
    ├── use_cases/
    ├── adapters/
    └── infrastructure/

Note that we keep tests in a separate top-level folder that mirrors the main package structure. This is my preference for separation of concerns, though some teams prefer keeping tests next to the code they test. Either approach works with CAA - the important thing is consistency within your project.

The Pluggable Architecture and Communication Patterns

One of CAA’s key strengths is its flexibility in how it connects to applications and communicates with users. Let’s explore how this works in practice.

The Middle Layer Concept

The infrastructure layer’s app_adapters directory serves as the middle layer that makes our agentic system pluggable. These adapters translate between external communication patterns and our internal workflow interfaces. This abstraction layer is what lets us develop agentic logic independently of how it will be deployed or accessed.

Streaming and Real-time Communication

Most agentic applications need streaming responses - users expect to see the AI’s response as it’s being generated, not wait for the complete response. CAA’s infrastructure adapters can support various streaming patterns:

Server-Sent Events (SSE): A REST adapter can implement SSE for streaming responses. When a client calls a workflow endpoint, it requests a streaming response, and the adapter streams back the agent’s output as it’s generated. This works well for web applications and is simpler than WebSockets for unidirectional streaming.

WebSockets: For applications needing bidirectional communication - perhaps the user wants to interrupt or provide additional context mid-generation - a WebSocket adapter maintains a persistent connection and allows for more complex interaction patterns.

The key architectural benefit is that the same workflow can support multiple communication patterns without any changes to the workflow or agent code. The infrastructure adapters handle the translation between internal message passing and external communication protocols.

Here’s how this flexibility works: a workflow generates messages internally as it processes. A REST/SSE adapter might stream these as JSON chunks over SSE. A WebSocket adapter might send them as WebSocket frames. A message queue adapter might publish them to a topic. The workflow remains agnostic to these implementation details.
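A sketch of the SSE translation, assuming the workflow exposes its output as an async generator. The FastAPI wiring and endpoint path are one illustrative choice, not part of CAA:

# infrastructure/app_adapters/rest_adapter.py (illustrative sketch)
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_workflow_stream(query: str):
    # Stand-in for workflow.stream(query); yields partial output chunks.
    for chunk in ("Analyzing", " your", " request..."):
        yield chunk

@app.get("/workflows/customer_support")
async def run_workflow(q: str):
    async def to_sse():
        # The adapter's only job: wrap internal chunks in the SSE envelope.
        async for chunk in fake_workflow_stream(q):
            yield f"data: {json.dumps({'delta': chunk})}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(to_sse(), media_type="text/event-stream")

A WebSocket or queue adapter would consume the same internal chunks and simply wrap them in a different envelope.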

Integration Patterns CAA Supports

The architecture is designed to support various deployment patterns, though the specific choice depends on your application needs:

Microservice with Streaming API: The agentic system can run as a separate service, exposing REST endpoints with SSE for streaming. Web applications make HTTP requests and receive streamed responses. This provides isolation and independent scalability.

Embedded Library with WebSocket Server: The agentic system can be embedded in the main application as a library, while spawning a WebSocket server for real-time communication with the frontend. This reduces operational complexity while maintaining responsive user experience.

Async Message Processing: For batch processing or non-real-time use cases, the agentic system can subscribe to message queues, process requests asynchronously, and publish results. This pattern fits document processing pipelines or scheduled analysis tasks.

Hybrid Approach: Some applications might combine multiple patterns - REST/SSE for simple queries, WebSockets for interactive sessions, and message queues for background processing. CAA’s layered design allows supporting these patterns simultaneously from the same codebase, with different infrastructure adapters handling each communication style.

Agent Identity and Composition

Let’s dive deeper into what constitutes an agent in CAA and how agents compose together.

Defining Agent Identity

In CAA, an agent’s identity consists of several components that together define its behavior and capabilities:

Core Implementation: The agent class that orchestrates everything else. This is typically thin, mainly coordinating between components.

Prompts: The system prompt that defines the agent’s role and behavior, task-specific prompts for different operations, and few-shot examples that demonstrate expected behavior. These are not just configuration - they’re core to the agent’s identity.

Schemas: Input schemas that define what data the agent accepts, and output schemas that structure the agent’s responses. These schemas serve as contracts and enable validation. Most model APIs now enforce structured outputs at runtime, so we usually rely on that; when they do, we can skip duplicate validation logic in the agent itself.

Tools: External capabilities the agent can invoke. These might be API calls, database queries, or computational functions. This is what makes the agent more than just a prompt - it can interact with the world. MCP integrations would also sit here. Tools are a major identity component since they fundamentally alter the agent’s capabilities.

Subagents: Other agents that this agent uses as specialized tools. This recursive composition is key to building sophisticated systems from simple components.

Configuration: Runtime settings that adjust behavior without changing identity - things like temperature, max tokens, or retry policies.

Together, these components fully define an agent. Change any of them, and you’ve fundamentally altered the agent’s behavior. This is why we keep them together in the folder structure - they’re not separate concerns but facets of a single identity.
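One way to make this composite identity explicit in code is a single object assembled from the agent’s folder; the field names below are illustrative, not a prescribed schema:

# A sketch of identity-as-a-unit: everything versioned together
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentIdentity:
    name: str
    version: str            # versions the whole composite, not just code
    system_prompt: str      # loaded from prompts/system.md
    input_schema: type      # from schemas.py
    output_schema: type
    tools: tuple = ()       # agent-specific tools
    subagents: tuple = ()   # other agents used as tools
    config: dict = field(default_factory=dict)  # temperature, retries, ...

# Change any field and you have, by definition, a different agent.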

The Power of Subagents

Subagents deserve special attention because they’re what make CAA systems truly powerful. A subagent is simultaneously a complete agent (with all the components listed above) and a tool from the perspective of its parent agent.

This duality enables sophisticated composition patterns. We can build a research agent that uses a search subagent for information gathering, a summary subagent for condensing information, and a fact-check subagent for verification. Each subagent is independently developed, tested, and optimized for its specific task.

The parent agent doesn’t need to understand how subagents work internally - it just invokes them like tools. This encapsulation keeps complexity manageable even as we build deep hierarchies of specialized capabilities.
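A sketch of that duality: a thin wrapper presents a complete agent to its parent as just another tool. The Tool shape here is hypothetical, since each framework defines its own:

class SubagentTool:
    """Wraps a complete agent so a parent can invoke it like any tool."""
    def __init__(self, name: str, description: str, agent):
        self.name = name
        self.description = description   # what the parent's model sees
        self._agent = agent              # a full CAA agent underneath

    async def __call__(self, task: str) -> str:
        # The parent never sees the subagent's prompts, tools, or subagents.
        return await self._agent.execute(task)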

The Promotion Pattern and Code Evolution

The promotion pattern governs how our code evolves from specific implementations to shared resources. This pattern has saved us from both premature abstraction and excessive duplication.

The Rule of Three in Practice

When we first need a capability, we implement it directly where it’s needed. This keeps the initial implementation simple and focused. When a second agent needs similar functionality, we resist the urge to immediately abstract. Instead, we duplicate the implementation, allowing each to evolve independently.

Only when a third agent needs the capability do we stop and extract it to shared. By this point, we understand the capability well enough to create a useful abstraction. We’ve seen the variations and requirements across multiple use cases.

This might seem inefficient, but it prevents a worse problem: premature abstractions that don’t quite fit any use case well. We’ve all worked with “generic” components that have so many configuration options and special cases that they’re harder to use than writing specific code.

Managing Shared Resources

Once promoted to shared, a component enters a different lifecycle phase. Changes require more consideration because multiple agents depend on it. We version shared components more carefully and maintain backward compatibility when possible.

For prompts specifically, I’ve found that parameterization often allows sharing without sacrificing specialization. A shared translator subagent might accept parameters for tone, domain, and formality, allowing different agents to use it while maintaining their unique requirements.
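For instance, a shared translator prompt might be parameterized like this; the template and parameter names are illustrative:

# adapters/shared/subagents/translator (illustrative sketch)
TRANSLATOR_TEMPLATE = (
    "You are a translator for the {domain} domain. "
    "Translate the user's text into {target_language}, "
    "using a {tone} tone and a {formality} register."
)

def build_translator_prompt(domain: str, target_language: str,
                            tone: str = "neutral",
                            formality: str = "formal") -> str:
    return TRANSLATOR_TEMPLATE.format(domain=domain,
                                      target_language=target_language,
                                      tone=tone, formality=formality)

# Two agents share the subagent but keep their own requirements:
support_prompt = build_translator_prompt("customer support", "German", tone="warm")
legal_prompt = build_translator_prompt("legal", "German", formality="formal")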

Implementation Strategies and Patterns

Through building multiple systems with CAA, I’ve developed strategies that consistently lead to success.

Starting a New Project

I always start with workflows, not agents. This might seem counterintuitive, but workflows represent the business value. Starting here ensures we’re building something useful, not just technically interesting. I ask: What does this system need to accomplish? What are the user journeys? What are the success criteria?

Once I have workflows defined, I identify the capabilities they need. These become our ports - the contracts that agents will fulfill. Only then do I implement agents, starting with the simplest possible implementations that satisfy the contracts.

Framework Migration

One of CAA’s greatest strengths is enabling framework migration without wholesale rewrites. When I decide to move a project from Google ADK to OpenAI SDK, the process is remarkably smooth. I create new infrastructure adapters for OpenAI SDK, update agent implementations to use OpenAI SDK’s patterns, but keep all workflows and domain logic untouched.

The external interfaces don’t change either - the REST endpoints, WebSocket protocols, and message queue integrations all remain the same. From the application’s perspective, nothing changes except the system becomes more capable.

Testing Strategy

CAA’s layered architecture enables comprehensive testing at multiple levels:

Unit Tests: Test individual agents with mock LLM providers and tools. We can verify prompt construction, response parsing, and error handling without making expensive API calls.

Integration Tests: Test workflows with mock agents that implement the required ports. This verifies orchestration logic, error handling, and state management.

Contract Tests: Verify that agents correctly implement their port contracts. These tests ensure that agents and workflows can work together.
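A sketch of such a contract test, reusing the illustrative ClassifierPort from earlier; the allowed labels and pytest-style structure are assumptions:

import asyncio

ALLOWED_LABELS = {"billing", "technical", "general"}

def check_classifier_contract(classifier) -> None:
    """Run against every adapter that claims to implement ClassifierPort."""
    label = asyncio.run(classifier.classify("My invoice is wrong"))
    assert isinstance(label, str)
    assert label in ALLOWED_LABELS

class StubClassifier:
    async def classify(self, query: str) -> str:
        return "billing"

def test_stub_satisfies_contract():
    check_classifier_contract(StubClassifier())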

End-to-End Tests: Test complete user journeys with real implementations. This verifies the entire system from external interface to agent behavior.

Agent Evals: Since all our agentic logic is separated into the adapters layer, we can run agent evaluations independently. We can test prompts, tools, and subagents in isolation, ensuring they meet quality standards before integrating them into workflows. This is especially crucial since agent behavior is non-deterministic and can be unpredictable.

The separate test folder structure we use mirrors the main codebase, making it easy to find tests for any component. Some teams prefer keeping tests next to the code they test, and CAA supports this approach too - it’s a matter of team preference.

Common Patterns and Architectural Considerations

Here are some patterns and design considerations for common scenarios in agentic systems:

Conversation Memory

Most agentic applications need conversation memory. In CAA, this should be implemented at the workflow level, not in individual agents. The workflow maintains conversation state and passes relevant context to agents as needed. This keeps agents stateless and simpler to test and reason about.

By keeping session management in the workflow layer (via the SessionPort interface discussed earlier), agents remain pure functions that take input and produce output without side effects. This makes testing straightforward - you can test an agent with various inputs without worrying about session state contamination.

Prompt Versioning

Prompts should be versioned alongside code in git as the source of truth. Additionally, the infrastructure layer can provide optional prompt management systems for runtime flexibility:

# infrastructure/prompt_management/prompt_resolver.py
from pathlib import Path
from typing import Optional


class PromptResolver:
    """Resolves prompts from multiple sources with fallback logic"""

    def __init__(self, cloud_provider=None):
        self.cloud_provider = cloud_provider           # optional SaaS client
        self.cloud_enabled = cloud_provider is not None

    async def resolve_prompt(
        self,
        agent_id: str,
        prompt_type: str,
        version: Optional[str] = None
    ) -> str:
        # 1. Try cloud management system (if configured)
        if self.cloud_enabled:
            prompt = await self.cloud_provider.get_prompt(
                agent_id, prompt_type, version
            )
            if prompt:
                return prompt

        # 2. Fall back to local file
        return self._load_local_prompt(agent_id, prompt_type)

    def _load_local_prompt(self, agent_id: str, prompt_type: str) -> str:
        # Reads from the agent's local prompts folder (the source of truth)
        path = Path("adapters/agents") / agent_id / "prompts" / f"{prompt_type}.md"
        return path.read_text()

This approach enables A/B testing and hot-swapping in production while maintaining local files as the canonical source for development and version control.

Configuration Management

CAA strongly recommends using Python for configuration rather than YAML or JSON, especially for workflow and agent configurations:

# use_cases/workflows/customer_support/config.py
from dataclasses import dataclass

@dataclass
class WorkflowConfig:
    """Type-safe workflow configuration with IDE support"""
    max_retries: int = 3
    timeout_seconds: int = 30
    enable_fallback: bool = True
    priority_threshold: float = 0.8

    # Agent wiring - explicit and type-checked
    classifier_agent_id: str = "customer_classifier_v2"
    responder_agent_id: str = "gpt4_responder"

    @classmethod
    def for_production(cls) -> "WorkflowConfig":
        return cls(max_retries=5, timeout_seconds=60)

    @classmethod
    def for_development(cls) -> "WorkflowConfig":
        return cls(max_retries=1, timeout_seconds=10)

Benefits over YAML:

  • Type safety and validation at import time
  • IDE autocomplete and refactoring support
  • Programmatic composition and inheritance
  • Better version control diffs
  • No runtime parsing errors

YAML remains acceptable for:

  • Deployment-level infrastructure configuration
  • Environment variable mappings
  • CI/CD pipeline definitions

But application logic configuration should use Python dataclasses or Pydantic models for maximum safety and developer experience.

Conclusion: A Living Architecture

Clean Agentic Architecture represents my attempt to bring order to the chaos of building maintainable, scalable agentic systems in a rapidly evolving landscape. It provides stability above the chaos of changing frameworks while remaining flexible enough to adopt new tools and patterns as they emerge.

The key insights that make CAA valuable are:

  • Treating agentic systems as pluggable modules with clear boundaries
  • Separating architectural layers from folder organization
  • Defining clear agent identity and composition patterns
  • Providing concrete recipes, not just abstract principles

CAA is designed to support multiple projects with different frameworks, deployment patterns, and use cases. The architecture aims to maintain consistency, enable code reuse, and adapt to changing requirements without architectural rewrites.

That said, CAA is an opinionated approach based on my experiences and requirements. Your context will likely be different. What I hope this guide provides is not a prescription to follow blindly, but a detailed example of how to think architecturally about agentic systems. Take what works for your context, adapt what needs adapting, and share your learnings with the community.

The architecture will continue evolving as I build more sophisticated systems and learn from the community’s experiences. What matters is having a coherent structure that helps us build reliable, maintainable agentic systems that deliver real value. I believe CAA provides that structure, and I’m excited to see how others adapt and extend it for their own needs.

Building agentic systems is still a young discipline, and we’re all learning together. By sharing our approaches and learning from each other, we can collectively develop the patterns and practices that will make agentic development as mature and reliable as traditional software development. CAA is my contribution to that effort, and I look forward to seeing how the community builds upon it.