Co-authored with GPT-5-Pro and Claude-4.5-Sonnet

Building a Production-Ready Agent Stack: Part 1 - The Foundation

Welcome to the first post in a series where we build a real, production-ready AI agent application from scratch. No shortcuts, no toy examples - just the patterns you’d actually use in production.


Why Another AI App Tutorial?

Well, I know that you’ve probably seen a dozen “build an AI chatbot” tutorials by now. Most of them show you how to slap together a quick demo in an afternoon, and they’re great for that. But when you try to take that demo to production, things get… complicated. At least that was my experience.

Where do you put authentication? How do you stream responses so users don’t stare at a loading spinner for 30 seconds? What about session memory? Rate limiting? Credits? Deployment?

This series tackles all of that, but with a specific goal: we’re building a template you can actually use. I mainly built this codebase for myself to avoid reinventing the wheel every time I start a new agent project, then decided to share it with the community. I like to think of it as a “production-ready starter kit” for AI agent applications.

This isn’t just a tutorial - it’s an opinionated, minimal-yet-complete starting point for production agent applications. Think of it as a scaffold that:

  • Has all the production pieces in place (auth, streaming, persistence, deployment)
  • Remains small enough to understand fully
  • Takes a stance on how agentic applications should be structured
  • Can be cloned and customized for your specific use case

We’re building an agent stack where:

  • Users log in (securely, with Auth0)
  • They chat with AI agents that retain context
  • Responses stream in real time, token by token, and tool calls are rendered nicely as they happen, just like in research tools and chatbots such as ChatGPT.
  • Usage gets tracked and metered
  • Everything runs in containers and deploys with one command
  • You can debug what’s happening in production

Sound ambitious? It is. But we’ll build it piece by piece, and by the end you’ll understand not just how to build it, but why each piece exists and how it fits together.

Info

This template is opinionated by design. We’re not trying to support every possible architecture — we’re showing you one that works well for production agent applications. Once you understand the patterns, you can adapt them to your needs.

Today, we’re starting with the foundation: the project structure, development environment, and tooling that makes everything else possible. But most importantly we will discuss the decisions behind the structure and architecture.

The Big Picture: What Are We Building?

Before we dive into code, let’s talk about what this system looks like when it’s done. I want you to see the full picture first—not just the pieces, but how they fit together and why each one matters.

Here’s the stack we’re building:

%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'16px'}}}%%
graph TB
    User[User Browser]
    Frontend["Frontend<br/><small>React + TypeScript + Vite</small>"]
    Auth0["Auth0<br/><small>Authentication</small>"]
    Backend["Backend API<br/><small>FastAPI + Python</small>"]
    AgentSDK["Agents SDK<br/><small>Agent Orchestration</small>"]
    DB[("Postgres<br/><small>Sessions & Messages</small>")]
    OpenAI["LLM API<br/><small>OpenAI/LiteLLM</small>"]

    User --> Frontend
    Frontend --> Auth0
    Frontend --> Backend
    Backend --> AgentSDK
    Backend --> DB
    AgentSDK --> OpenAI

    style User fill:#f0f4ff,stroke:#a5b4fc,stroke-width:2.5px,rx:15,ry:15,color:#1e293b
    style Frontend fill:#dbeafe,stroke:#93c5fd,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style Auth0 fill:#fed7d7,stroke:#fca5a5,stroke-width:2.5px,rx:15,ry:15,color:#7f1d1d
    style Backend fill:#d1fae5,stroke:#6ee7b7,stroke-width:2.5px,rx:15,ry:15,color:#065f46
    style AgentSDK fill:#e9d5ff,stroke:#c084fc,stroke-width:2.5px,rx:15,ry:15,color:#581c87
    style DB fill:#bfdbfe,stroke:#60a5fa,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style OpenAI fill:#ccfbf1,stroke:#5eead4,stroke-width:2.5px,rx:15,ry:15,color:#134e4a

Looks straightforward, right? Just a handful of components talking to each other. What matters isn’t just the components — it’s how they’re connected and what’s built around them.

Here’s what we’re building from three different perspectives:

For Your Users (The Experience Layer)

This is what people actually interact with:

  • Secure authentication via Auth0 - no passwords to manage, no security headaches for you. Auth0 is widely used in production apps, so it’s a solid, proven choice, and it has a generous free tier for small projects (up to 25,000 monthly active users). And if you want to swap it out later, the codebase is structured so you can replace Auth0 with another provider without massive rewrites or code tangles.
  • Multiple chat sessions - users can organize conversations, switch between topics, keep context separate.
  • Real-time streaming - responses appear token by token, just like ChatGPT. Tool calls show up as they happen and are rendered nicely.
  • Credit-based usage - transparent costs, no surprise bills, users see their balance before they run out. We don’t have actual payment processing in this template, but the structure is there to add it.
  • Works everywhere - responsive design that works on desktop, tablet, and mobile

For You (The Developer Experience)

This is what makes the codebase pleasant to work with:

  • One command to start - make up spins up the entire stack (frontend, backend, database). No “install these 12 things first”
  • Hot-reload everything - change Python code, see it instantly. Change React code, see it instantly. No build steps in dev
  • Type-safe end-to-end - Python with mypy strict mode, TypeScript with strict mode. Catch bugs at compile time, not runtime
  • Migrations in version control - database schema changes are tracked with Alembic, reviewable in pull requests
  • Tests that actually pass - unit tests (fast, no I/O), integration tests (with real database), e2e tests (full stack)
  • Deploy with confidence - CI/CD pipeline that runs tests, builds containers, and deploys to production

The goal here is zero friction. You should spend time thinking about your agents, not fighting your tools.

For Production (The Operational Reality)

This is what keeps the system running reliably at scale:

  • Containers everywhere - same Docker images from dev to prod. No “works on my machine” surprises
  • Built-in observability - traces show you what agents are doing, logs tell you what went wrong, metrics tell you when to scale
  • Rate limiting - token-aware limits per user prevent abuse and runaway costs
  • Secret management - API keys and credentials stored properly (AWS Secrets Manager - or another secrets manager, not .env files in production)
  • Zero-downtime deploys - rolling updates, health checks, automatic rollback if something breaks
  • Cost tracking - every LLM call is metered, stored, and can be attributed to a user

This isn’t an afterthought. These pieces are wired in from the start, which is way easier than retrofitting them later.

The Decision: Why This Structure?

When I started building agent applications, I kept running into the same problems. It actually started with framework selection: I tried a bunch, including Google ADK, AutoGen, CrewAI, and even LangFlow, but none of them gave me the satisfaction of the OpenAI Agents SDK. I swear I have no ties to OpenAI xD I will talk about why I’m so fond of it later. Let me walk you through the key decisions we made and why they matter.

Problem 1: The “Kitchen Sink” Approach

Frameworks like LangChain try to do everything: agent orchestration, vector stores, UI components, deployment. They’re fantastic for prototypes, but when you need to customize how agents hand off to each other, or change authentication providers, or swap databases, you’re fighting the framework’s opinions. That makes it hard to adapt to real-world production needs. Most projects I’ve worked on were already built on a tech stack that needed customization from the get-go. It’s better to keep things modular so you can pick the best tool for each job and swap components later.

Our approach: Use best-in-class tools for each layer. FastAPI for the API (it’s async, typed, and has great docs). React for the frontend (huge ecosystem, mature patterns) - TBH I don’t like React, nor am I good at it :) I would rather use Svelte myself, but given React’s popularity, it is what it is. OpenAI Agents SDK for agent orchestration (built by the people who make the models) - this makes the most sense if you’re using OpenAI models, but even if you’re not, the framework is just better overall (more on this later). Docker for containers (industry standard). This means a bit more wiring, but you control each piece and can swap components when needed.

Problem 2: The “Config Hell” Approach

Some frameworks — I’m looking at you, CrewAI… — lean heavily on YAML or JSON configurations. Want to change how an agent behaves? Edit three config files, restart the system, and hope you got the indentation right. Debugging means reading stack traces that point to generated code, not your configuration. This is a nightmare for complex logic.

Our approach: Code over config. Agents are Python files you can read, edit, and debug. Workflows are Python files that import agents directly. You get IDE autocomplete, type checking, breakpoints, and version control that actually shows meaningful diffs. Configuration is for environment-specific stuff (like API keys and database URLs), not behavior.

Tip

The “code over config” philosophy doesn’t mean zero configuration. It means using code for logic and configuration for environment. Your agent’s behavior should be in a Python file you can test. Your database connection string should be in an environment variable. If this bugs you, remember that I warned you this template is opinionated! :)

Bottom line: Code > Config…

Problem 3: The “Works on My Machine” Problem

I can’t count how many times I’ve seen repos that say “just install X, Y, Z and it should work.” But X needs Python 3.9 (you have 3.11), Y needs an older version of numpy, and Z… well, nobody’s sure why Z is even there. By the time you’ve wrangled the environment, you’ve lost an afternoon.

Our approach: Docker from day one. The same containers you run locally are what you deploy to production. No “works on my machine” surprises. No conda environments, no global npm installs. One command (make up) and you have a working system. I got deep into the Docker habit last year while working on my home server projects, and now it just feels insane to me not to use it for any project.

Problem 4: The “Streaming Is Hard” Problem

Most LLM demos use simple request/response: send a message, wait, get the full answer. But in production, users don’t want to wait 30 seconds staring at nothing. They want to see the response being generated, like they do in ChatGPT.

Our approach: Server-Sent Events (SSE) for streaming. It’s simpler than WebSockets (I hate websockets) for one-way communication (server to client), works everywhere, and reconnects automatically. The OpenAI Agents SDK handles the complex part (streaming from the LLM), and we map those events to what the frontend needs (tokens, tool calls, completion).
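To make this concrete, here’s a minimal sketch of what an SSE endpoint looks like in FastAPI. It’s illustrative only: the real streaming endpoint arrives in a later part, and the event names and payloads below are placeholders, not the template’s actual protocol.

# Minimal SSE sketch: FastAPI streams "data:" lines over a long-lived response.
# The event shapes here are made up for illustration.
import asyncio
import json
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def fake_token_stream() -> AsyncIterator[str]:
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0.1)  # stand-in for waiting on the LLM
        yield f"data: {json.dumps({'type': 'token', 'value': token})}\n\n"
    yield f"data: {json.dumps({'type': 'done'})}\n\n"


@app.get("/chat/stream")
async def chat_stream() -> StreamingResponse:
    # text/event-stream is what lets the browser consume this with EventSource
    return StreamingResponse(fake_token_stream(), media_type="text/event-stream")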

Problem 5: The “Security Afterthought” Problem

So many tutorials add auth as a last step, if at all. But retrofitting security is painful — you end up changing every endpoint, every database query, every test. And you inevitably miss something (like forgetting to filter messages by user_id, leaking conversations between users).

Our approach: Authentication and authorization come early, right after we have a working API. It’s early enough that it’s not a massive refactor, but late enough that we understand what we’re protecting. Every database model has a user_id from the start. Every endpoint checks authentication. No retrofitting.
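As a hedged sketch of what “every endpoint checks authentication” looks like in FastAPI: the real Auth0 token verification comes in Part 3, and get_current_user plus the session data below are placeholders.

# Sketch: authentication as a FastAPI dependency, authorization as a user_id filter.
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()


async def get_current_user(authorization: str = Header(default="")) -> str:
    # Placeholder: in the real app this verifies the Auth0 JWT and returns the user id.
    if not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Not authenticated")
    return "user_123"  # pretend we decoded this from the token


@app.get("/sessions")
async def list_sessions(user_id: str = Depends(get_current_user)) -> list[dict[str, str]]:
    # Every query filters by user_id, so users never see each other's sessions.
    return [{"id": "sess_1", "owner": user_id}]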

The Structure: How Agentic Applications Should Be Organized

This is where the template really matters. We’re not just building an app — we’re defining a structure that makes sense for production agent systems. Let me walk you through the directory layout and explain why each piece exists and how they work together.

agents-sdk-prod-ready-template/
├── backend/
│   ├── app/
│   │   ├── api/              # HTTP routes (REST endpoints)
│   │   ├── agents/           # Agent definitions (one folder per agent)
│   │   ├── workflows/        # Multi-agent orchestration
│   │   ├── domain/           # Business logic (pure Python)
│   │   ├── persistence/      # Database models and repositories
│   │   └── core/             # Settings, security, database engine
│   ├── Dockerfile
│   └── pyproject.toml        # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── api/              # Backend client (REST + SSE)
│   │   ├── auth/             # Auth0 integration
│   │   ├── components/       # React components
│   │   ├── pages/            # Top-level page components
│   │   └── store/            # State management
│   ├── Dockerfile
│   └── package.json          # Node dependencies
├── infra/
│   ├── docker-compose.yml    # Local development (3 services)
│   └── terraform/            # Production infrastructure
└── tests/
    ├── unit/                 # Fast, no I/O
    ├── integration/          # Require services (DB, API)
    └── e2e/                  # Full stack

This structure embodies a specific opinion about how agent applications should be built. Let me explain the critical decisions:

The Backend: Clean Architecture for Agents

Separation of Concerns (The Foundation):

The backend is split into clear layers, each with a single responsibility:

  • api/ handles HTTP concerns: routing, request validation, response serialization, status codes. This layer knows about FastAPI but doesn’t know about Postgres or agent logic.

  • domain/ contains business logic: session lifecycle, message handling, credit calculations. This is pure Python — no FastAPI imports, no SQLAlchemy imports. You can test it without starting a server or database.

  • persistence/ manages data access: ORM models, database queries, migrations. This layer knows about Postgres but doesn’t know about HTTP or business rules.

Why does this matter? Because when you need to change databases (e.g., Postgres to MongoDB), you only touch persistence/. When you need to change from FastAPI to Flask, you only touch api/. When you need to change business rules (like credit calculations), you only touch domain/. Changes don’t cascade.

Note

This is an implementation of “Hexagonal Architecture” (also called “Ports and Adapters”). The core domain logic is at the center, and infrastructure concerns (HTTP, database, external APIs) are at the edges. It’s a little more setup than throwing everything in one file, but it scales beautifully.
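Here’s a tiny sketch of what that separation looks like in practice. The function and route names are illustrative, not the template’s exact code.

# domain/credits.py - pure business logic, no FastAPI or SQLAlchemy imports
def calculate_cost(prompt_tokens: int, completion_tokens: int, rate_per_1k: float) -> float:
    """Business rule: cost is a flat per-1k-token rate across prompt and completion."""
    return (prompt_tokens + completion_tokens) / 1000 * rate_per_1k


# api/routes/credits.py - HTTP concerns only; delegates to the domain layer
from fastapi import APIRouter

router = APIRouter()


@router.get("/credits/estimate")
async def estimate(prompt_tokens: int, completion_tokens: int) -> dict[str, float]:
    # The route knows about HTTP (query params, JSON); the math lives in domain/.
    return {"cost": calculate_cost(prompt_tokens, completion_tokens, rate_per_1k=0.01)}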

Agents and Workflows as First-Class Citizens:

Here’s where our structure gets opinionated about agent applications specifically:

Each agent lives in its own folder: agents/agent_<name>/

agents/
├── shared/
│   ├── tools/           # Tools multiple agents can use
│   └── types/           # Shared data structures
├── agent_checkout/
│   ├── agent.py         # Agent definition (build function)
│   ├── tools.py         # Agent-specific tools
│   ├── schemas.py       # Structured output schemas
│   └── prompts/
│       └── system.md    # System instructions
└── agent_refund/
    ├── agent.py
    ├── tools.py
    ├── subagents/
    │   └── agent_lookup/
    │       ├── agent.py
    │       ├── tools.py
    │       └── prompts/
    │           └── system.md
    └── prompts/
        └── system.md

Why one folder per agent? Because agents are complex entities with prompts, tools, and configuration. Keeping them together makes it easy to understand what an agent does and to test it in isolation. The shared/ folder prevents duplication when multiple agents need the same tools or data structures.

A typical agent has an agent.py, a tools.py, and, if it uses structured outputs, a schemas.py file. An agent can also have subagents, depending on its use case. A subagent is simply an agent that is only used by a parent agent. For example, a “Support Agent” that handles customer support queries might have subagents like an “Order Lookup Agent” and a “Refund Processing Agent” for specific tasks. These subagents live in their own folders inside the parent agent’s folder. This keeps the subagent logic encapsulated and makes it clear they are not meant to be used standalone. If at any point we need to promote a subagent to a full agent, we can easily move it out.

If we keep the prompts in the repo, the agent also gets a prompts/ folder. Prompts are stored as markdown files for readability, separation of concerns, and easier versioning.
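As a rough sketch, agent.py typically exposes a build function that loads the markdown prompt and wires up tools. This is illustrative; the exact Agents SDK usage and import paths may differ in the repo, and lookup_order is a made-up tool.

# agents/agent_checkout/agent.py - illustrative sketch of the "one folder per agent" layout
from pathlib import Path

from agents import Agent, function_tool  # OpenAI Agents SDK

PROMPT_PATH = Path(__file__).parent / "prompts" / "system.md"


@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by id (placeholder implementation)."""
    return f"Order {order_id}: shipped"


def build_agent() -> Agent:
    # The system prompt lives next to the code as markdown, so it's diffable in PRs.
    return Agent(
        name="checkout_agent",
        instructions=PROMPT_PATH.read_text(encoding="utf-8"),
        tools=[lookup_order],
    )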

Tip

Another way to handle prompts is a dedicated prompt management system. You can stay in the OpenAI ecosystem and use OpenAI’s prompts feature, or use a third-party system like PromptLayer or Arize Phoenix.

In this template we store prompts as markdown files in the repo, but nothing stops you from using a prompt management system instead; the repo also includes an example of that approach.

I must admit this structure borrows heavily from Google ADK’s recommended layout, and in my view they nailed it. But it still has one gap, which is what we fix and discuss next: workflows.

Workflows live in workflows/<name>/workflow.py and import agents directly:

# workflows/support_pipeline/workflow.py
from agents.agent_checkout import build_agent as build_checkout_agent
from agents.agent_refund import build_agent as build_refund_agent

def build_support_workflow():
    checkout_agent = build_checkout_agent()
    refund_agent = build_refund_agent()

    # Wire handoffs, define routing, set guardrails
    # ...

A workflow is any sort of orchestration between multiple agents. This could be as simple as routing user messages to different agents based on intent, or as complex as multi-step processes where one agent’s output feeds into another’s input. The OpenAI Agents SDK has no formal “workflow” construct like ADK does; it doesn’t give you building blocks for “run this agent after that one” or “run these agents in parallel.” But you don’t really need them, because everything is just Python code: you can implement whatever workflow logic you want with normal Python functions and classes. This gives you ultimate flexibility.

ADK treats workflow logic as just another agent, which couples orchestration with agent logic. I found that to be a bad idea because it mixes two different concerns, so in our structure workflows get their own folder. Agents focus on “what to do”; workflows focus only on “how to coordinate.”

A workflow also handles the “handoff” logic for the agents. We never import one agent into another. Instead, the workflow imports both agents and wires them together. This keeps agents decoupled and reusable.
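Filling in the skeleton from above, the wiring might look roughly like this. It’s a sketch: the exact handoff shapes come from the Agents SDK and may differ in the repo.

# workflows/support_pipeline/workflow.py - sketch of the wiring the skeleton hints at
from agents.agent_checkout import build_agent as build_checkout_agent
from agents.agent_refund import build_agent as build_refund_agent


def build_support_workflow():
    checkout_agent = build_checkout_agent()
    refund_agent = build_refund_agent()

    # The workflow, not the agents, decides who can hand off to whom.
    # (Agents SDK agents carry a handoffs list; exact details may differ in the repo.)
    checkout_agent.handoffs = [refund_agent]
    refund_agent.handoffs = [checkout_agent]

    # Return the entry-point agent; the API layer runs it with the SDK's Runner.
    return checkout_agent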

The Frontend: Simple and Focused

The frontend structure is intentionally minimal:

  • api/ - Client for talking to the backend (REST + SSE wrapper)
  • auth/ - Auth0 integration (login, logout, token management)
  • components/ - Reusable UI components (ChatWindow, SessionList, MessageBubble)
  • pages/ - Top-level page components (Login, Dashboard, Chat)
  • store/ - State management (sessions, messages, user)

We’re not using a complex state management library (like Redux) because we don’t need it. The state is simple: current user, list of sessions, list of messages in current session. React’s built-in state and context are enough.

The critical piece is the SSE client in api/. This is where we consume the streaming events from the backend and turn them into UI updates. It’s the most “agent-specific” part of the frontend.

Infrastructure and Testing

Centralized Testing:

All tests live in one /tests directory that mirrors the source structure:

tests/
├── unit/
│   ├── agents/
│   │   └── agent_checkout/
│   ├── domain/
│   └── persistence/
├── integration/
│   └── api/
└── e2e/
    └── full_flow/

Why centralized? Because it makes CI simpler (one command runs all tests), makes coverage reports meaningful, and makes it obvious where tests live. Some projects scatter tests next to source files (agent.py and agent_test.py in the same folder). I always thought that centralizing them reduces confusion and makes it easier to run subsets (“just run unit tests” vs “run everything”).
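For example, a unit test for domain logic needs no database and no server. The calculate_cost function here is the hypothetical credit calculation from earlier, used purely for illustration.

# tests/unit/domain/test_credits.py - pure-Python unit test, no I/O
# calculate_cost is a hypothetical domain function used for illustration.
import pytest

from app.domain.credits import calculate_cost


def test_cost_scales_with_tokens() -> None:
    assert calculate_cost(prompt_tokens=500, completion_tokens=500, rate_per_1k=0.01) == pytest.approx(0.01)


def test_zero_tokens_cost_nothing() -> None:
    assert calculate_cost(prompt_tokens=0, completion_tokens=0, rate_per_1k=0.01) == 0.0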

Infrastructure as Code:

The infra/ folder contains everything needed to run the system:

  • docker-compose.yml for local development (3 services: db, backend, frontend)
  • terraform/ for cloud resources (compute, database, secrets, DNS)
  • .github/workflows/ for CI/CD (lint, test, build, deploy)

Everything is versioned. Everything is reviewable. You can see the history of infrastructure changes just like code changes.

This is the final piece of our “minimal but complete” philosophy: we give you the deployment story, not just the app code.

Building the Foundation: The Setup

Alright, enough philosophy. Let’s build something!

We’re starting with the foundation — the pieces that make everything else possible:

  1. The project skeleton (directories, files)
  2. Python tooling (uv, ruff, mypy)
  3. Node/React with Vite - the initial setup
  4. Docker Compose for local development
  5. Environment configuration
  6. Development scripts (Makefile)

This might seem like a lot of setup before writing “real” code, but trust me—investing time here will save you a lot of frustration later.

Info

All the code we’re building today is available in the repository. You can follow along by cloning it, or use this as a reference while building your own version. Or just skip it altogether if you trust me that it works :)

I used different branches for each post in the series so you can see the incremental changes. Today’s code is in the part-1-foundation branch.

Here is the link to the branch.

Python Setup with uv

Info

Skip this section if: You’re already familiar with Python dependency management tools.

Skip to Code Quality →

We’re using uv for Python dependency management. Why not Poetry or pip?

  • uv is 10-100x faster. Seriously. Dependency installs that take a minute with pip take seconds with uv. It uses a Rust-based resolver and caches aggressively.
  • uv uses standard pyproject.toml. If you decide to switch to Poetry later, it’s easy. The file format is the same.
  • uv handles Python versions. Need Python 3.11? uv python install 3.11. Done.

I was an avid pip user before I moved to Poetry 5-6 years ago. Last year I discovered uv and switched immediately: it is just so much faster, and the switch is painless. Highly recommended! But if for some reason you don’t want to use uv, you can easily adapt the instructions to Poetry or pip.


Let’s start by creating the backend folder and initializing uv:

mkdir backend
cd backend
uv init

Now, let’s define our dependencies in pyproject.toml.

# backend/pyproject.toml
[project]
name = "agents-sdk-prod-ready-template"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.109.0",
    "uvicorn[standard]>=0.27.0",
    "sqlalchemy[asyncio]>=2.0.25",
    "asyncpg>=0.29.0",
    "alembic>=1.13.0",
    "pydantic>=2.5.0",
    "pydantic-settings>=2.1.0",
]

[project.optional-dependencies]
dev = [
    "ruff>=0.1.11",      # Linter and formatter (replaces flake8, black, isort)
    "mypy>=1.8.0",       # Type checker
    "pytest>=7.4.4",     # Testing framework
    "httpx>=0.26.0",     # For testing API clients
]

These are sensible version floors as of this writing, but feel free to adjust as needed. As long as you use uv, dependency resolution is fast, so conflicts are quick to sort out.

Code Quality: Linting and Type Checking

Info

Skip this section if: You’re familiar with Python linting and type checking tools.

Skip to FastAPI Boilerplate →

We’re setting up ruff for linting/formatting and mypy for type checking.

Why ruff? Python has a fragmented ecosystem for code quality. You’ve probably seen projects with black (formatting), flake8 (linting), isort (import sorting), and maybe pylint thrown in. That’s four tools, four configs, and four places where your CI can fail. Ruff combines all of this into one blazingly fast Rust-based tool. It runs 10-100x faster than the competition and gives you one config file instead of four. I used to use flake8 + black + isort combo for years, but once I switched to ruff, I never looked back (I sense a pattern here :)).

Why mypy? Python’s dynamic typing is great for prototyping but dangerous in production. When you’re handling user credits, streaming agent responses, and managing database transactions, you want the compiler to tell you “this function expects a SessionID but you’re passing a str” before your users find out. Mypy with strict mode is how you get that safety.

Note

Type checking isn’t just for catching bugs — it’s documentation that stays up to date. When a new developer looks at def process_run(session: SessionID, user: User) -> RunResult:, they know exactly what the function expects and returns. No guessing, no digging through implementation details.
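As a quick illustration of what that buys you, here’s a hypothetical type mirroring the signature above; mypy flags the bad call site before it ever runs.

# Hypothetical types mirroring the signature in the note above.
from typing import NewType

SessionID = NewType("SessionID", str)


def process_run(session: SessionID, user_id: str) -> str:
    return f"running session {session} for {user_id}"


process_run("raw-string", "user_1")          # mypy error: expected SessionID, got str
process_run(SessionID("sess_42"), "user_1")  # OK: intent is explicit at the call site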

Let’s configure both tools in pyproject.toml:

# backend/pyproject.toml
[tool.ruff]
line-length = 100  # Readability sweet spot; 88 is too cramped, 120 gets unwieldy
select = ["E", "F", "W", "C", "N", "B", "I"]  # Enable common error categories
ignore = ["E501"]  # Let the formatter handle line length
exclude = ["__pycache__", "build", "dist", ".venv"]

# Automatically fix issues where possible (import sorting, trailing whitespace, etc.)
fix = true
show-fixes = true

[tool.mypy]
python_version = "3.11"
strict = true  # Enable all strict checks at once
warn_unused_configs = true
disallow_untyped_defs = true  # Every function must have type hints
check_untyped_defs = true
no_implicit_optional = true  # Optional[T] must be explicit, not inferred from None
warn_redundant_casts = true
warn_unused_ignores = true
warn_return_any = true  # Catch functions that return Any (a type safety hole)

# Tell mypy to follow imports and check third-party libraries
follow_imports = "normal"
disallow_untyped_calls = true

[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false  # Relax for tests; we care more about coverage

Let me break down the key choices:

Ruff configuration:

  • line-length = 100: This is opinionated. Black uses 88, but I find 100 strikes a better balance between readability and fitting complex FastAPI endpoint signatures on one line.
  • select = ["E", "F", "W", "C", "N", "B", "I"]: These are error codes for pycodestyle errors (E), pyflakes (F), warnings (W), complexity (C), naming conventions (N), bugbear (B), and import sorting (I). You’re getting the equivalent of flake8 + isort in one tool.
  • fix = true: Ruff will automatically fix issues like import sorting and trailing whitespace on save. This eliminates bikeshedding in code reviews.

Mypy configuration:

  • strict = true: This is the nuclear option. It enables every type checking rule mypy has. You’ll get errors for missing type hints, returning Any, or unsafe casts. This feels painful at first but pays off when you’re refactoring agent logic at scale.
  • disallow_untyped_defs = true: Every function needs type hints. Period. When you’re streaming tokens, managing sessions, and tracking credits, you don’t want ambiguity about what types flow through your system.
  • no_implicit_optional = true: If a parameter can be None, you must write Optional[T]. This catches bugs where you assume a value exists but it’s actually None at runtime (classic NoneType error in production).
  • warn_return_any = true: Returning Any defeats the purpose of type checking. This warns you when a function’s return type is too loose, which often happens when integrating with third-party libraries.

Tip

If you’re adding type hints to an existing codebase, start with strict = false and enable rules incrementally. For a new project like this template, going strict from day one is the right move — you’ll never have to retrofit types later.

The [[tool.mypy.overrides]] section at the end relaxes rules for tests. In test files, we care more about coverage and readability than perfect type safety. It’s fine if a test helper function doesn’t have complete type hints—the production code is what matters.

When building agent systems with the OpenAI Agents SDK, you’re juggling complex types: StreamedEvent, RunResult, SessionID, custom tool schemas, and Pydantic models for your database. Mypy catches mismatches before they become production incidents. Ruff ensures your code is consistent and readable when onboarding new team members or revisiting agent logic six months later.

These tools run in CI (we’ll set that up shortly), so every pull request gets checked automatically. No “it worked on my machine” surprises.

Additional Resources

FastAPI Boilerplate: Your First Endpoint

Now for the fun part — let’s write some actual code. We’re starting with FastAPI as our backend framework. If you’ve used Flask before, FastAPI will feel familiar but with superpowers: automatic validation, async support out of the box, and OpenAPI docs that generate themselves.

Why FastAPI over Flask or Django? Four reasons:

  1. Native async support: When you’re streaming agent responses or making multiple LLM calls in parallel, you need async. Flask bolted on async support in 2.0, but FastAPI was built for it from day one.
  2. Pydantic integration: FastAPI uses Pydantic for request/response validation. This means your API contracts are enforced automatically — send malformed JSON and you get a clear error before your handler runs (see the sketch after this list).
  3. Auto-generated docs: Every endpoint you write shows up in interactive Swagger UI at /docs. No manual API documentation needed. This is a game-changer when working with frontend developers or building integrations.
  4. Simplicity and performance: FastAPI is lightweight and fast, making it ideal for high-throughput applications like agent systems.
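To make the Pydantic point concrete, here’s a hedged sketch of a validated endpoint. The message model is illustrative; the template’s real chat endpoints come in later parts.

# Sketch: FastAPI + Pydantic reject malformed requests before your handler runs.
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class SendMessageRequest(BaseModel):
    session_id: str
    content: str = Field(min_length=1, max_length=4000)


@app.post("/messages")
async def send_message(body: SendMessageRequest) -> dict[str, str]:
    # If content is missing, empty, or the wrong type, FastAPI returns a 422
    # with a structured error before this line ever executes.
    return {"session_id": body.session_id, "status": "accepted"}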

Let’s write our first endpoint:

# backend/app/main.py
from fastapi import FastAPI


app = FastAPI(
    title="Agent Stack Backend",
    version="0.1.0",
    description="Production-ready backend for OpenAI Agents SDK applications"
)


@app.get("/health")
async def health() -> dict[str, str]:
    """Health check endpoint for monitoring and load balancers."""
    return {"status": "ok"}

This looks simple, but there’s a lot happening here:

  • async def health(): This is an async endpoint. FastAPI will run it on the event loop, which means it won’t block other requests. When you’re handling 100+ concurrent agent sessions, this matters.
  • -> dict[str, str]: Type hint for the response. FastAPI uses this to generate OpenAPI schema and validate your response at runtime (if you enable response validation).
  • Docstring: Shows up in the auto-generated docs. Write these for every endpoint — your future self will thank you.

Note

The async def keyword is important even for simple endpoints. FastAPI can handle both sync and async functions, but if you define a sync function, it runs in a thread pool which has overhead. For database queries, LLM calls, or any I/O, always use async def.
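For instance, a contrived endpoint just to illustrate the point:

# Contrived example: awaiting I/O keeps the event loop free for other requests.
import asyncio

from fastapi import FastAPI

app = FastAPI()


@app.get("/slow")
async def slow() -> dict[str, str]:
    await asyncio.sleep(2)  # stand-in for a DB query or LLM call
    # While this coroutine waits, FastAPI keeps serving other requests on the same worker.
    return {"status": "done"}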

Now let’s start the server:

cd backend
uv sync  # Install dependencies if you haven't already
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

You’ll see output like this:

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [12345] using WatchFiles
INFO:     Started server process [12346]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

The --reload flag is critical during development — it auto-restarts the server when you change code. Uvicorn uses watchfiles under the hood (another Rust-based tool) for blazing fast reloads.

Open your browser and hit these URLs:

  • http://localhost:8000/health - You’ll see {"status":"ok"}
  • http://localhost:8000/docs - Interactive API documentation (Swagger UI)
  • http://localhost:8000/redoc - Alternative API docs (ReDoc, which I find prettier)

Tip

The auto-generated docs at /docs aren’t just for show. You can test endpoints directly from the browser, inspect request/response schemas, and even download the OpenAPI spec. When you’re debugging agent runs or testing credit deduction logic, this beats using curl or Postman. I keep this tab open constantly during development.

Key benefits we get from this setup:

  • Health checks: The /health endpoint is what load balancers and orchestrators (Kubernetes, ECS) use to determine if an instance is ready to serve traffic
  • Type safety: FastAPI validates return types at runtime - if you return the wrong type, you’ll catch it immediately
  • Async from the start: No refactoring needed when we add streaming endpoints later
  • OpenAPI schema: Auto-generated at /openapi.json for type-safe frontend clients

Additional Resources

Next steps:

This health check endpoint is just the skeleton. In the next sections, we’ll add:

  • Database integration (PostgreSQL + SQLAlchemy)
  • Authentication and user management
  • Agent streaming endpoints (the real meat of the application)
  • Credit tracking and rate limiting
  • Proper error handling and logging

But for now, you have a working FastAPI server with auto-generated docs, type safety, and async support. That’s a rock-solid foundation to build on.

Frontend with Vite: Modern React Development

Info

Skip this section if: You’re familiar with Vite and modern React tooling.

Skip to Creating Dockerfiles →

Time to set up the frontend. We’re using Vite as our build tool and development server.

Why Vite over Create React App?

Create React App was the standard for years, but it’s showing its age. The development server takes forever to start, hot module replacement is slow, and the build process uses webpack under the hood (which is powerful but complex). Vite takes a different approach:

  1. Native ESM in development: Vite serves your code as native ES modules. No bundling during development means the dev server starts instantly — even on large projects. CRA bundles everything upfront, which means 30-60 second startup times on big codebases. Vite? Typically under 2 seconds.

  2. Lightning-fast HMR: Change a React component and see it update in the browser in milliseconds. Vite’s HMR is so fast it feels like you’re editing the page directly. This matters when you’re iterating on UI and want tight feedback loops. Of course it doesn’t matter much for our simple template project, but we’re thinking big here.

  3. Optimized production builds: Vite uses Rollup under the hood for production builds, which generates smaller, more efficient bundles than webpack. Smaller bundles = faster page loads for your users.

  4. No ejecting required: With CRA, if you need custom configuration, you either eject (and maintain all the build tooling yourself) or use workarounds like CRACO. Vite’s config is simple and transparent from day one — it’s just a JavaScript file. TBH this is the selling point for me :) I hate React’s complex build tooling.

TypeScript strict mode from the start:

We’re using TypeScript with strict mode enabled. I know, I know — TypeScript can feel like overkill for simple UIs. But when you’re building agent applications, your frontend is managing complex state:

  • Streaming events from Server-Sent Events
  • Message history with nested objects (text, tool calls, errors)
  • Session metadata (created_at, updated_at, message count)
  • User credits and rate limiting

Without types, you’ll spend hours debugging “Cannot read property ‘X’ of undefined” errors. With types, your IDE tells you exactly what’s available and catches errors as you type. And as I mentioned before, we’re thinking big, regardless of how simple this template project is.


Let’s create the frontend:

cd frontend
npm create vite@latest . -- --template react-ts
npm install

Note

I’m skipping the folder-creation steps in the commands because it’s assumed you can create the folders as needed. Just focus on the commands relevant to each section. (Or create the whole structure up front, based on the layout we discussed.)

Here I’m selecting rolldown-vite for the build. You don’t have to choose it, of course, but why wouldn’t you? :)

This scaffolds a React + TypeScript project with Vite. You’ll get a basic structure:

frontend/
├── src/
│   ├── App.tsx           # Main app component
│   ├── App.css           # App styles
│   ├── main.tsx          # Entry point
│   ├── assets/           # Static assets (images, fonts)
│   ├── index.css         # Global styles
│   └── vite-env.d.ts     # Vite type definitions
├── index.html            # HTML template
├── package.json          # Dependencies
├── tsconfig.json         # TypeScript config
└── vite.config.ts        # Vite config
(The rest of the defaults don’t matter - just leave them as is for now.)

That’s all we need for Part 1! The default Vite structure is fine for now. We’ll build out the full frontend architecture (components, pages, API clients, state management) in Part 5 when we implement the agent UI. For now, we just need a working dev server that we can containerize.

Configuring path aliases (optional but recommended):

One quick improvement: set up path aliases so you can write @/components/Button instead of ../../../components/Button later.

Update vite.config.ts:

// frontend/vite.config.ts
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import path from 'path'

export default defineConfig({
  plugins: [react()],
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src')
    }
  },
  server: {
    host: '0.0.0.0',  // Allows Docker to access dev server
    port: 5173,
  }
})

And update tsconfig.json:

// frontend/tsconfig.json
{
  "compilerOptions": {
    // ... existing config ...
    "baseUrl": ".",
    "paths": {
      "@/*": ["./src/*"]
    }
  }
}

Note

The server.host configuration is important for Docker. It makes Vite accessible from outside the container. We’ll use this when we set up Docker Compose next.

Start the development server:

npm run dev

You’ll see:

  ROLLDOWN-VITE v7.1.14  ready in 332 ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
  ➜  press h + enter to show help

Open http://localhost:5173 and you’ll see the default Vite + React landing page with the Vite and React logos. Not exciting yet, but notice how fast that startup was. On a comparable CRA project, you’d still be waiting for webpack to bundle.

That’s it for the frontend in Part 1! We have a working dev server with hot module replacement, TypeScript support, and path aliases configured. In Part 5, we’ll come back and build out the full agent UI with components, state management, SSE streaming, and all the bells and whistles.

Additional Resources

Creating Dockerfiles

Info

Skip this section if: You’re comfortable writing Dockerfiles and understand layer caching.

Skip to Docker Compose →

Before we can use Docker Compose, we need Dockerfiles for our backend and frontend.

Backend Dockerfile

# backend/Dockerfile
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Copy dependency files first (for better caching)
COPY pyproject.toml ./

# Install uv
RUN pip install --no-cache-dir uv

# Install Python dependencies
RUN uv sync --no-dev

# Copy application code
COPY app/ ./app/

# Expose port
EXPOSE 8000

# Run the application
CMD ["uv", "run", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Let me break down what’s happening here:

Base image choice:

FROM python:3.11-slim

We use python:3.11-slim instead of the full python:3.11 image. The slim variant is much smaller (100MB vs 900MB) because it excludes unnecessary build tools and libraries. This means faster builds, faster deployments, and lower storage costs.

System dependencies:

RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

  • gcc: Required by some Python packages that compile C extensions (like asyncpg)
  • postgresql-client: Useful for debugging (you can run psql inside the container)
  • rm -rf /var/lib/apt/lists/*: Cleans up apt cache to keep the image small

Dependency caching:

COPY pyproject.toml ./
RUN uv sync --no-dev
COPY app/ ./app/

This order is critical for Docker layer caching. Docker caches each instruction as a layer. If nothing changes in a layer, Docker reuses the cached layer instead of rebuilding.

By copying pyproject.toml first and installing dependencies, we ensure that layer is cached. When you change application code (which happens constantly), Docker only rebuilds the COPY app/ layer and later layers—not the expensive dependency installation layer.

Production optimization:

RUN uv sync --no-dev

The --no-dev flag skips development dependencies (pytest, ruff, mypy). In production, you don’t need testing or linting tools—only the code needed to run the app. This keeps the image smaller and more secure.

Note

In development, we override this CMD in docker-compose.yml to add the --reload flag. This way, the same Dockerfile works for both dev and prod—we just change the command at runtime.

Frontend Dockerfile

# frontend/Dockerfile
FROM node:20-alpine

# Set working directory
WORKDIR /app

# Copy dependency files first (for better caching)
COPY package*.json ./

# Install dependencies
RUN npm ci

# Copy application code
COPY . .

# Build the application (for production)
RUN npm run build

# Expose port
EXPOSE 5173

# In development, we override this with the dev server
CMD ["npm", "run", "preview"]

Breaking this down:

Alpine base:

FROM node:20-alpine

Alpine Linux is a minimal distribution designed for containers. node:20-alpine is ~120MB compared to ~1GB for the full node:20 image. Alpine uses musl libc instead of glibc, which is lighter weight.

npm ci vs npm install:

RUN npm ci

npm ci (clean install) is faster and more reliable than npm install in CI/CD and containers:

  • Deletes node_modules before installing (ensures clean state)
  • Installs exact versions from package-lock.json (reproducible builds)
  • Fails if package.json and package-lock.json are out of sync
  • 2-3x faster than npm install

Build step:

RUN npm run build

This compiles TypeScript, bundles with Vite, and optimizes assets. The result goes in dist/. In production, you’d serve this dist/ folder with nginx or a CDN. In development, we override the CMD to run npm run dev instead.

Development vs production:

The Dockerfile is written for production (build artifacts, optimized bundles). In docker-compose.yml, we override the command for development:

# docker-compose.yml (we'll create this next)
frontend:
  command: npm run dev -- --host 0.0.0.0

This runs the Vite dev server instead of serving the build output.

Tip

Multi-stage builds for production: In a real production setup, you’d use a multi-stage Dockerfile for the frontend:

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

This creates a tiny final image (20MB) with just nginx and your built assets. The node installation and source code are discarded after the build. We’ll cover this pattern in Part 6 (Deployment).

Testing the Dockerfiles

Before using Docker Compose, verify the Dockerfiles work individually:

# Build backend image
cd backend
docker build -t agent-backend .

# Build frontend image
cd ../frontend
docker build -t agent-frontend .

# Verify images were created
docker images | grep agent

You should see both images listed with their sizes. If the build fails, check for:

  • Typos in Dockerfile commands
  • Missing files (make sure pyproject.toml, package.json exist)
  • Network issues (Docker needs to download base images and dependencies)

Warning

Common Dockerfile mistakes to avoid:

  1. Not using .dockerignore: Create a .dockerignore file to exclude unnecessary files from the build context:

    
    # backend/.dockerignore
    __pycache__
    *.pyc
    .venv
    .pytest_cache
    .mypy_cache
    .ruff_cache
    
    
    # frontend/.dockerignore
    node_modules
    dist
    .vite
    

    Without this, Docker copies everything to the build context, slowing builds and potentially including secrets.

  2. Running as root: For production, you should create a non-root user in the Dockerfile. We’ll cover this in Part 6.

  3. Installing dependencies every time: Always copy dependency files (pyproject.toml, package.json) before copying source code. This leverages Docker’s layer caching.

Now that we have Dockerfiles, we’re ready to orchestrate all three services with Docker Compose.

Additional Resources

Docker Compose: Orchestrating the Full Stack

Info

Skip this section if: You’re comfortable with Docker Compose service definitions, health checks, and volume mounts.

Skip to Configuration →

This is where everything comes together. We’ve set up the backend (FastAPI + Python), the frontend (Vite + React), and now we’re going to run them together with Docker Compose. This is the secret sauce that eliminates “works on my machine” problems and makes onboarding new developers trivial.

Why Docker Compose?

You could run each service manually: start Postgres in one terminal, start the backend in another, start the frontend in a third. But that’s annoying, error-prone, and hard to document. Docker Compose lets you define all services in one file and start them with a single command.

More importantly, it ensures consistency. The same Docker images you use locally are what you deploy to production (with different environment variables). No subtle differences between dev and prod. No “but it worked on my laptop” debugging sessions.

Here’s our complete docker-compose.yml:

# infra/docker-compose.yml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: agent_stack
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  backend:
    build:
      context: ../backend
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ../backend/app:/app/app:ro  # Hot-reload for development
    depends_on:
      db:
        condition: service_healthy   # Wait for DB to be ready
    environment:
      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/agent_stack
      - ENV=dev
    command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  frontend:
    build:
      context: ../frontend
      dockerfile: Dockerfile
    ports:
      - "5173:5173"
    volumes:
      - ../frontend/src:/app/src:ro  # Hot-reload for development
    depends_on:
      - backend
    environment:
      - VITE_API_URL=http://localhost:8000
    command: npm run dev -- --host 0.0.0.0

volumes:
  postgres_data:

Let me break down the critical pieces:

1. Database Service (Postgres)

db:
  image: postgres:16-alpine

We’re using the alpine variant of Postgres because it’s tiny (50MB vs 300MB for the full image). This matters when you’re pulling images in CI or deploying to cloud providers that charge for bandwidth.

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 5s
  timeout: 5s
  retries: 5

This is crucial. Without a health check, Docker Compose considers the database “ready” as soon as the container starts. But Postgres takes a few seconds to initialize. If the backend tries to connect during those seconds, it crashes with “connection refused.”

The health check runs pg_isready every 5 seconds. Only when it succeeds does Docker Compose start the backend service. This prevents race conditions.

volumes:
  - postgres_data:/var/lib/postgresql/data

This is a named volume. It persists database data between container restarts. Without this, every time you run docker compose down, you’d lose all your data. Named volumes live outside containers and survive restarts.

Note

Named volumes are stored in Docker’s internal directory (usually /var/lib/docker/volumes on Linux). You can list them with docker volume ls and inspect them with docker volume inspect postgres_data. To completely reset your database, run docker compose down -v (the -v flag removes volumes).

2. Backend Service (FastAPI)

backend:
  build:
    context: ../backend
    dockerfile: Dockerfile

This tells Docker to build an image from the backend/Dockerfile. During development, this build only happens once (or when you change dependencies). The actual source code is mounted as a volume (see below), so code changes don’t require rebuilding.

volumes:
  - ../backend/app:/app/app:ro

This is the magic of hot-reload. We mount the local backend/app directory into the container at /app/app. The :ro flag makes it read-only (security best practice).

When you change a Python file locally, uvicorn detects the change and reloads automatically. No rebuild, no restart. Just save and refresh.

depends_on:
  db:
    condition: service_healthy

This is smarter than a basic depends_on. It doesn’t just wait for the database container to start — it waits for the health check to pass. This eliminates the race condition where the backend starts before Postgres is ready to accept connections.

environment:
  - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/agent_stack
  - ENV=dev
command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Notice the hostname: db. In Docker Compose, services can reach each other by service name. From the backend’s perspective, the database is at db:5432, not localhost:5432 (because they’re in separate containers).

The command overrides the Dockerfile’s CMD for development. We use uv run to execute uvicorn within uv’s managed environment—this is crucial because uv installs packages in a virtual environment. The --reload flag enables hot-reloading during development.
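To see how that URL gets consumed, here’s roughly what the backend’s engine setup will look like once we add the database layer in a later part. It’s a sketch: in the template the URL comes from the Pydantic Settings object we build in the Configuration section, not raw os.environ, and the module name is illustrative.

# core/db.py (sketch) - the +asyncpg scheme in DATABASE_URL selects SQLAlchemy's asyncpg driver.
import os

from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

# e.g. postgresql+asyncpg://postgres:postgres@db:5432/agent_stack
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_async_engine(DATABASE_URL, pool_size=10, max_overflow=20)
SessionFactory = async_sessionmaker(engine, expire_on_commit=False)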

Tip

If you need to connect to the database from your host machine (e.g., to run database migrations or use a GUI tool like pgAdmin), use localhost:5432. From inside containers, use the service name db:5432. This trips up a lot of people initially.

3. Frontend Service (Vite)

frontend:
  build:
    context: ../frontend
    dockerfile: Dockerfile
  volumes:
    - ../frontend/src:/app/src:ro
  environment:
    - VITE_API_URL=http://localhost:8000
  command: npm run dev -- --host 0.0.0.0

Same pattern as the backend: build once, mount source code for hot-reload. When you change a React component, Vite’s HMR kicks in and updates the browser instantly.

The command overrides the Dockerfile’s CMD to run the Vite dev server instead of serving the production build. The --host 0.0.0.0 flag makes Vite accessible from outside the container (necessary for Docker).

The VITE_API_URL environment variable tells the frontend where the backend API lives. In production, you’d set this to your actual API domain (e.g., https://api.yourdomain.com). In development, it’s localhost:8000.

Note

Vite requires environment variables to be prefixed with VITE_ to expose them to the browser. Variables without this prefix never reach client code; they’re only visible to build-time configuration.

Starting everything:

cd infra
docker compose up

You’ll see logs from all three services interleaved:

db-1        | 2025-10-29 02:29:56.540 UTC [29] LOG:  database system was shut down at 2025-10-29 02:29:50 UTC
db-1        | 2025-10-29 02:29:56.543 UTC [1] LOG:  database system is ready to accept connections
frontend-1  |   ROLLDOWN-VITE v7.1.14  ready in 138 ms
frontend-1  |
frontend-1  |   ➜  Local:   http://localhost:5173/
frontend-1  |   ➜  Network: http://172.22.0.4:5173/
backend-1   | INFO:     Waiting for application startup.
backend-1   | INFO:     Application startup complete.

Three services, one command. That’s the developer experience we’re aiming for.

Warning

On first run, Docker will download base images for Postgres (alpine), Python, and Node. This can take 2-10 minutes depending on your connection. Subsequent runs are instant because images are cached locally. Don’t panic if the first run takes a while!

Dev/Prod Parity: Why This Matters

One of the Twelve-Factor App principles is “dev/prod parity”—keep development and production as similar as possible. Docker Compose achieves this:

  • Same database: You’re using real Postgres locally, not SQLite. No “works in dev, breaks in prod” surprises from database quirks.
  • Same networking: Services talk to each other over Docker’s internal network, just like they will in production (via service mesh or internal DNS).
  • Same environment variables: The backend reads DATABASE_URL from the environment, whether it’s Docker Compose locally or Kubernetes in production.

When you deploy, you’re not crossing your fingers hoping everything works. You’re deploying the same containers you’ve been running locally for weeks. The only difference is environment variables (prod database URL, prod API keys, etc.).

Additional Resources

Troubleshooting common issues:

“Port 5432 is already in use”: You have Postgres running locally. Either stop it (brew services stop postgresql on Mac) or change the port mapping in docker-compose.yml to 5433:5432.

Backend can’t connect to database: Check that the health check is passing with docker compose ps. If the database is “unhealthy,” something’s wrong with Postgres startup. Check logs with docker compose logs db.

Hot-reload not working: Make sure the volume mounts are correct. Run docker compose config to see the resolved configuration. The paths should match your local directory structure.

“Cannot connect to Docker daemon”: Docker Desktop isn’t running. Start it and try again.

Configuration That Makes Sense

Info

Skip this section if: You’re familiar with Pydantic Settings and environment-based configuration.

Skip to Developer Experience →

Configuration is one of those things that seems simple at first but becomes a nightmare if you don’t set it up properly. I’ve seen too many projects where config is scattered across environment variables, YAML files, hardcoded constants, and command-line flags. Debugging “why does this behave differently in staging?” becomes an archaeological expedition.

We’re using Pydantic Settings to centralize all configuration in one type-safe place. This isn’t just about convenience—it’s about catching errors before they reach production.

Why Pydantic Settings over environment variables or config files?

Most projects use one of these approaches:

  1. Raw os.environ: No validation, no type safety, missing variables cause runtime errors deep in the code
  2. python-decouple or similar: Better than raw environ but still string-based, no nested config support
  3. YAML/JSON files: Great for complex config but no type safety, easy to typo a key
  4. Dotenv only: Simple but no validation, everything is a string

Pydantic Settings combines the best parts of all these approaches:

  • Type-safe: Define config as a typed class, get IDE autocomplete and mypy validation
  • Validated on startup: App crashes immediately if config is invalid, with clear error messages
  • Environment variable support: Reads from .env files or actual environment variables
  • Nested config: Support complex structures like database pools, API rate limits, etc.
  • Multiple sources: Can read from files, env vars, and defaults with clear precedence

Here’s our complete settings module:

# backend/app/core/settings.py
from typing import Literal
from pydantic import field_validator, PostgresDsn
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings with validation and type safety."""

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,  # DATABASE_URL or database_url both work
        extra="ignore",  # Ignore unknown env vars instead of failing
    )

    # Environment
    env: Literal["dev", "staging", "prod"] = "dev"
    debug: bool = False

    # API
    api_title: str = "Agent Stack Backend"
    api_version: str = "0.1.0"
    api_description: str = "Production-ready backend for OpenAI Agents SDK"

    # Database (required in all environments)
    database_url: PostgresDsn
    database_pool_size: int = 10
    database_max_overflow: int = 20

    # CORS - who can call our API
    cors_origins: list[str] = ["http://localhost:5173"]
    cors_allow_credentials: bool = True

    # Auth0 (we'll add this in Part 3)
    auth0_domain: str = ""
    auth0_audience: str = ""

    # OpenAI (we'll add this in Part 4)
    openai_api_key: str = ""
    openai_model: str = "gpt-4o"

    # Rate limiting
    rate_limit_requests_per_minute: int = 60
    rate_limit_tokens_per_minute: int = 100000

    @field_validator("database_url")
    @classmethod
    def validate_database_url(cls, v: PostgresDsn) -> str:
        """Ensure we're using PostgreSQL, not SQLite or MySQL."""
        if not str(v).startswith("postgresql"):
            raise ValueError(
                "DATABASE_URL must use PostgreSQL. "
                "For async support, use postgresql+asyncpg://"
            )
        return str(v)

    @field_validator("env")
    @classmethod
    def validate_env(cls, v: str) -> str:
        """Ensure environment is one of the allowed values."""
        allowed = {"dev", "staging", "prod"}
        if v not in allowed:
            raise ValueError(f"env must be one of {allowed}, got '{v}'")
        return v

    @field_validator("openai_api_key")
    @classmethod
    def validate_openai_key(cls, v: str, info) -> str:
        """In production, OpenAI key is required."""
        if info.data.get("env") == "prod" and not v:
            raise ValueError("OPENAI_API_KEY is required in production")
        return v

    @property
    def is_production(self) -> bool:
        """Convenience property for production checks."""
        return self.env == "prod"

    @property
    def is_development(self) -> bool:
        """Convenience property for development checks."""
        return self.env == "dev"


# Singleton instance - import this throughout the app
settings = Settings()

Let me break down the key pieces:

1. Type annotations with validation

database_url: PostgresDsn
env: Literal["dev", "staging", "prod"] = "dev"

PostgresDsn is a Pydantic type that validates the URL format. If you mistype the URL (wrong scheme, missing host, stray characters), you get a clear validation error at startup instead of a confusing failure later when the first query runs.

Literal["dev", "staging", "prod"] means the env field can only take one of these three values. Set it to "production" (instead of "prod") and Pydantic rejects it at startup; assign it in code and your IDE and mypy flag the mistake before you even run anything.
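
To see the failure mode concretely, here's a tiny throwaway sketch (not part of the template) that feeds bad values to a cut-down copy of the settings class and prints the resulting validation errors:

# demo_settings_validation.py - hypothetical scratch file
from typing import Literal

from pydantic import PostgresDsn, ValidationError
from pydantic_settings import BaseSettings


class DemoSettings(BaseSettings):
    """Cut-down copy of Settings, just for demonstration."""

    env: Literal["dev", "staging", "prod"] = "dev"
    database_url: PostgresDsn


try:
    # "production" is not an allowed literal, and mysql:// is not a Postgres scheme
    DemoSettings(env="production", database_url="mysql://user:pass@localhost/db")
except ValidationError as exc:
    print(exc)  # reports both problems, field by field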

2. Field validators for custom logic

@field_validator("openai_api_key")
@classmethod
def validate_openai_key(cls, v: str, info) -> str:
    if info.data.get("env") == "prod" and not v:
        raise ValueError("OPENAI_API_KEY is required in production")
    return v

This is powerful: validation can depend on other fields. In development, missing an OpenAI key is fine (you might be working on the database layer). In production, it’s a fatal error that stops the app from starting.
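
If you want to see this guard in action, a quick throwaway sketch (assuming you run it from backend/ with a valid backend/.env so app.core.settings imports cleanly) is to construct Settings with env="prod" and an empty key:

# scratch_check_prod_settings.py - hypothetical scratch file
from pydantic import ValidationError

from app.core.settings import Settings  # module-level Settings() needs a valid .env

try:
    Settings(
        env="prod",
        database_url="postgresql+asyncpg://user:pass@localhost:5432/demo",
        openai_api_key="",  # deliberately empty
    )
except ValidationError as exc:
    print(exc)  # reports that OPENAI_API_KEY is required in production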

3. Smart defaults and required fields

database_url: PostgresDsn  # No default = required
database_pool_size: int = 10  # Has default = optional

If database_url isn’t set, Pydantic raises an error immediately:

ValidationError: 1 validation error for Settings
database_url
  Field required [type=missing, input_value={...}]

This is way better than getting a runtime error 10 minutes into testing when you try to connect to the database.

4. Environment variable mapping

Pydantic automatically maps environment variables to fields:

  • DATABASE_URL in .env → settings.database_url in Python
  • OPENAI_API_KEY → settings.openai_api_key
  • ENV → settings.env

The case_sensitive=False setting means database_url, DATABASE_URL, and Database_Url all work. This is convenient but can be disabled if you want strict naming.
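
The mapping also handles type conversion: values arrive from the environment as strings, and Pydantic parses them into the declared types. A small sketch (again assuming a valid backend/.env so the import succeeds):

# scratch_env_mapping.py - hypothetical scratch file
import os

os.environ["DATABASE_POOL_SIZE"] = "25"  # environment values are always strings
os.environ["DEBUG"] = "true"

from app.core.settings import Settings  # imported after setting the variables

settings = Settings()
print(settings.database_pool_size)  # 25 (an int, not the string "25")
print(settings.debug)               # True (a bool)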

Creating the .env file:

# backend/.env
ENV=dev
DEBUG=true

# Database
DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/agent_stack
DATABASE_POOL_SIZE=10
DATABASE_MAX_OVERFLOW=20

# CORS - allow frontend to call API
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]

# Auth0 (add these later when we implement auth)
AUTH0_DOMAIN=your-tenant.auth0.com
AUTH0_AUDIENCE=https://your-api.com

# OpenAI (add this when we implement agents)
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

Tip

The .env file should never be committed to git. Add it to .gitignore immediately:

echo ".env" >> .gitignore

Create a .env.example file with placeholder values so new developers know what variables are needed:

# .env.example
ENV=dev
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/dbname
OPENAI_API_KEY=sk-your-key-here

Using settings throughout the app:

# backend/app/main.py
from fastapi import FastAPI
from app.core.settings import settings

app = FastAPI(
    title=settings.api_title,
    version=settings.api_version,
    debug=settings.debug,
)

@app.get("/health")
async def health():
    return {
        "status": "ok",
        "env": settings.env,
        "debug": settings.is_development,
    }
# backend/app/persistence/database.py
from sqlalchemy.ext.asyncio import create_async_engine
from app.core.settings import settings

engine = create_async_engine(
    settings.database_url,
    pool_size=settings.database_pool_size,
    max_overflow=settings.database_max_overflow,
    echo=settings.is_development,  # Log SQL queries in dev
)
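
The same pattern extends to anything else that needs configuration. For example, wiring up CORS from these settings might look roughly like this (a sketch only; the real middleware setup arrives with auth and the frontend in later parts and may differ in detail):

# backend/app/main.py (continued) - illustrative sketch
from fastapi.middleware.cors import CORSMiddleware

from app.core.settings import settings

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=settings.cors_allow_credentials,
    allow_methods=["*"],
    allow_headers=["*"],
)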

Environment-specific configuration:

In production, you’d override settings via environment variables (not .env files):

# Production environment (Kubernetes, ECS, etc.)
export ENV=prod
export DATABASE_URL=postgresql+asyncpg://user:pass@prod-db.example.com:5432/agent_stack
export OPENAI_API_KEY=sk-prod-key-from-secrets-manager
export CORS_ORIGINS='["https://app.example.com"]'

The same Settings class works in all environments — you just change the source of the values.

Note

For production secrets (API keys, database passwords), never use .env files. Use a secrets manager like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. Your deployment script fetches secrets and sets them as environment variables. Pydantic Settings reads them the same way it reads .env files — the code doesn’t change.
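
As an illustration of that flow, here is a hypothetical sketch using AWS Secrets Manager via boto3 (the secret name, and boto3 itself, are assumptions and not part of the template); the key point is that the value lands in the environment before Settings() is constructed:

# scratch_load_secrets.py - hypothetical deployment-time helper
import os

import boto3  # assumed dependency, not in the template's lock file


def load_openai_key() -> None:
    """Fetch the key from Secrets Manager and expose it as an env var."""
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId="prod/agent-stack/openai-api-key")
    os.environ["OPENAI_API_KEY"] = secret["SecretString"]


if __name__ == "__main__":
    load_openai_key()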

Why this matters for agent applications:

Agent applications have a lot of knobs to tune: model selection, token limits, rate limiting, database connection pools, API keys for multiple providers. Centralizing config in a type-safe class means:

  1. Easier debugging: When something behaves differently in staging, check settings.env and settings.openai_model instead of grepping for environment variable accesses
  2. Safer deploys: If you forget to set OPENAI_API_KEY in production, the app crashes on startup (before serving any traffic) instead of failing the first time a user tries to chat
  3. Better testability: In tests, you can override settings easily:
    
    import pytest

    from app.core.settings import Settings

    @pytest.fixture
    def test_settings():
        return Settings(
            env="dev",
            database_url="postgresql+asyncpg://test:test@localhost:5432/test_db",
            openai_api_key="sk-test-key",
        )
    

Using asyncpg for database connections:

We specified postgresql+asyncpg:// in the database URL. Why asyncpg specifically?

  • Fast: the asyncpg authors' benchmarks show it running several times faster than psycopg2 for typical workloads
  • Native async support: built for asyncio from the ground up, rather than wrapping a synchronous driver
  • Native type handling: speaks Postgres's binary protocol and converts types directly instead of parsing strings
  • Connection pooling: built-in connection pool management

When you’re streaming agent responses and handling multiple concurrent sessions, database performance matters. asyncpg ensures database queries don’t become the bottleneck.
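
In the template asyncpg sits behind SQLAlchemy's async engine, but a tiny standalone sketch shows what the driver itself looks like (the connection string and query are assumptions for a local Postgres):

# scratch_asyncpg_demo.py - hypothetical scratch file
import asyncio

import asyncpg


async def main() -> None:
    # Plain asyncpg DSN (no +asyncpg suffix; that's SQLAlchemy syntax)
    conn = await asyncpg.connect("postgresql://postgres:postgres@localhost:5432/agent_stack")
    row = await conn.fetchrow("SELECT version()")
    print(row["version"])
    await conn.close()


asyncio.run(main())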

Developer Experience: The Makefile

Here’s a problem I’ve seen on every project: each developer has their own set of commands they memorized. One person runs tests with pytest, another uses uv run pytest, a third uses python -m pytest. Someone remembers that you need to be in the backend/ directory, someone else doesn’t. Six months later, nobody remembers the exact incantation to run database migrations.

The solution: standardize everything in a Makefile. Make is old (1976!), ubiquitous (comes with every Unix system), and perfect for this job. It’s not just a build tool—it’s a command runner and documentation system.

Why Make over npm scripts or custom shell scripts?

  • npm scripts: Great if your whole project is Node, awkward when you have Python backend + React frontend + Docker + infrastructure
  • Shell scripts: Work but require careful path handling and error checking, no built-in dependency between tasks
  • Task runners like Task or Just: Modern and nice, but not installed by default. Make is already there.

Make gives you:

  1. Self-documenting commands: Run make or make help to see all available commands
  2. Task dependencies: “Run tests only after linting passes”
  3. Consistent working directory: No more “which folder am I in?” confusion
  4. Cross-platform (mostly): Works on Linux, macOS, and WSL2

Here’s our Makefile for Part 1:

# Makefile
.PHONY: help dev up down logs clean

# Default target - show help
help:
	@echo "Available commands:"
	@echo "  make dev       - Start all services in development mode"
	@echo "  make up        - Start all services (with logs)"
	@echo "  make down      - Stop all services"
	@echo "  make logs      - View logs from all services"
	@echo "  make clean     - Remove caches and temporary files"

# Start services in detached mode (background)
dev:
	@echo "Starting all services..."
	cd infra && docker compose up -d
	@echo ""
	@echo "✓ Services started!"
	@echo "  Backend:  http://localhost:8000"
	@echo "  Frontend: http://localhost:5173"
	@echo "  API Docs: http://localhost:8000/docs"
	@echo ""
	@echo "Run 'make logs' to view logs"

# Start services with logs visible
up:
	@echo "Starting all services..."
	cd infra && docker compose up

# Stop all services
down:
	@echo "Stopping all services..."
	cd infra && docker compose down

# View logs from all services
logs:
	cd infra && docker compose logs -f

# Clean up Python cache files
clean:
	@echo "Cleaning up caches and temporary files..."
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".mypy_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".ruff_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
	@echo "✓ Cleanup complete!"

Let me break down what each command does:

make dev - Your daily driver:

dev:
	cd infra && docker compose up -d
	@echo "✓ Services started!"
	@echo "  Backend:  http://localhost:8000"
	# ...

Starts all three services in detached mode (background) and shows you the URLs. This is what you’ll run every morning. The -d flag means services run in the background, so you get your terminal back.

make up - When you want to see logs:

up:
	cd infra && docker compose up

Starts services in the foreground, showing logs from all three services. Useful when you’re debugging and want to see what’s happening. Press Ctrl+C to stop.

make down - Stop everything:

down:
	cd infra && docker compose down

Stops all containers and removes them. The database volume persists, so you don’t lose data.

make logs - View live logs:

logs:
	cd infra && docker compose logs -f

Attaches to logs from all running services. The -f flag means “follow” (like tail -f). Press Ctrl+C to exit.

make clean - Remove clutter:

clean:
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
	# ...

Deletes Python cache files that accumulate during development. Run this occasionally to free up space.

Your typical workflow:

# Morning: Start everything
make dev

# Check if backend started correctly
curl http://localhost:8000/health

# View logs to debug an issue
make logs

# Evening: Stop everything
make down

# Occasional: Clean up cache files
make clean

Why this matters:

  • One source of truth: Instead of remembering “was it docker compose or docker-compose?”, you just run make dev
  • New developer friendly: Someone clones the repo, runs make, sees all commands. No hunting through README files
  • Works everywhere: Same commands on Mac, Linux, and WSL2
  • Easy to extend: As we add features in later parts (tests, migrations, deployments), we’ll add more make targets

Tip

For Windows developers: If you don’t have Make installed, you have a few options:

  1. WSL2 (recommended): Full Linux environment, Make works perfectly; this is how I use it on my desktop
  2. Chocolatey: choco install make installs GNU Make on Windows
  3. Git Bash: Recent versions include Make
  4. Just run the commands: Look inside the Makefile and run the docker compose commands directly

But seriously, just go with WSL.

What we’ll add in later parts:

This is a minimal Makefile for Part 1. As we progress through the series, we’ll add:

  • Part 2: make migrate, make migrate-create (database migrations)
  • Part 3: make test, make lint, make format (testing and code quality)
  • Part 6: make deploy-staging, make deploy-prod (deployments)

For now, these five commands are all we need to work with our foundation.

What We’ve Built

Take a moment to appreciate what we’ve accomplished. This isn’t just “hello world” — this is a production-grade foundation that most teams spend weeks refining. Let’s inventory what we have:

Infrastructure & DevOps:

  • Three-service architecture running with one command (make dev)
  • Docker Compose with health checks, volume mounts, and service dependencies
  • Hot-reload everywhere: Python with uvicorn --reload, React with Vite HMR
  • Named volumes for persistent database storage
  • Dev/prod parity: same containers locally and in production

Backend (Python + FastAPI):

  • Async-first FastAPI application with automatic OpenAPI docs
  • Type-safe configuration using Pydantic Settings with validation
  • uv for dependency management (10-100x faster than pip)
  • Ruff for linting and formatting (replaces black, flake8, isort)
  • mypy in strict mode catching type errors before runtime
  • asyncpg for Postgres (among the fastest async Postgres drivers available)

Frontend (React + TypeScript):

  • Vite for blazing-fast dev server (sub-2-second startup)
  • TypeScript in strict mode with path aliases configured
  • Clean folder structure anticipating SSE, auth, and agent UI
  • Type-safe API client ready to match backend Pydantic models

Developer Experience:

  • Makefile with standard commands for all common tasks
  • Self-documenting (run make to see all commands)
  • Consistent workflow across all team members
  • CI-ready (same commands work in GitHub Actions, GitLab CI, etc.)

Testing & Quality:

  • Test structure ready for unit, integration, and e2e tests
  • Coverage reporting configured with pytest
  • Lint and typecheck commands for pre-commit hooks
  • Quality gates that fail fast with clear error messages

Try it now:

# Start everything
make dev

# In another terminal, check the health endpoint
curl http://localhost:8000/health
# {"status":"ok","env":"dev","debug":true}

# Visit the API docs
open http://localhost:8000/docs

# Visit the frontend
open http://localhost:5173

The frontend is still the default Vite landing page, and the backend has one endpoint. But you have:

  • Type safety enforced from database to browser
  • Configuration validated on startup
  • Services orchestrated with proper dependencies
  • Developer workflow standardized
  • Hot-reload for instant feedback

Tip

Checkpoint: Before moving on, make sure everything works:

  1. Run make dev and wait for all services to start
  2. Visit http://localhost:8000/docs and see the interactive API docs
  3. Visit http://localhost:5173 and see the React app
  4. Run make logs in another terminal and see live logs
  5. Change a file in backend/app/main.py, save, and watch uvicorn reload

If any of these fail, check the troubleshooting sections in each setup step above. The foundation must be solid before we build on it.

What’s Next?

Part 2: Backend Core & Database - FastAPI routing, async SQLAlchemy, Alembic migrations, Repository pattern

Part 3: Authentication & Security - Auth0 integration, JWT validation, session cookies for SSE, CORS

Part 4: Agent Integration & Streaming - OpenAI Agents SDK, SSE streaming, tool calling, session memory

Part 5: Frontend & User Interface - React SSE client, chat UI, session management, markdown rendering

Part 6: Credits, Limits & Usage Tracking - Token-based credits, rate limiting, usage analytics

Part 7: Observability & Tracing - Structured logging, OpenAI Traces, Arize Phoenix integration

Part 8: Production Deployment - Terraform, GitHub Actions CI/CD, zero-downtime deployments

Resources and Community

Repository: github.com/bedirt/agents-sdk-prod-ready-template

Issues and questions: Open a GitHub issue or discussion. I try to respond within a day or two. Common issues usually have solutions in existing threads.

Comments and feedback: Please leave a comment below if you found this helpful, have suggestions, or want to share how you used the template. You can also send a suggestion using the “Suggest an Edit” link at the bottom of the page - which takes you to the GitHub repo issues.

A Final Reiteration

I built this template because I was tired of reinventing the wheel every time I started a new agent project. The first few times, I’d spend a week setting up Docker, configuring type checking, wiring authentication, and building deployment pipelines before writing a single line of agent logic.

This template encapsulates those weeks of setup. It’s the project structure I wish I had when I started building production agent applications.

My hope is that it saves you time and helps you focus on what matters: building great agent experiences for your users.

See you in Part 2, where we’ll add the database layer and start persisting chat sessions.

Next: Part 2 - Backend Core & Database (Coming Soon)


This is part of a series on building production-ready AI agent applications. All code is open source on GitHub.

Info

Enjoying this series? Star the GitHub repo, share it with your team, or send feedback. This template is a living project—contributions, suggestions, and questions are welcome.