Building a Production-Ready Agent Stack: Part 1 - The Foundation
Welcome to the first post in a series where we build a real, production-ready AI agent application from scratch. No shortcuts, no toy examples - just the patterns you’d actually use in production.
Why Another AI App Tutorial?
Well, I know you’ve probably seen a dozen “build an AI chatbot” tutorials by now. Most of them show you how to slap together a quick demo in an afternoon, and they’re great for that. But when you try to take that demo to production, things get… complicated. At least that’s how it went for me.
Where do you put authentication? How do you stream responses so users don’t stare at a loading spinner for 30 seconds? What about session memory? Rate limiting? Credits? Deployment?
This series tackles all of that, but with a specific goal: we’re building a template you can actually use. I mainly built this codebase for myself, to avoid reinventing the wheel every time I start a new agent project, then decided to share it with the community. I like to think of it as a “production-ready starter kit” for AI agent applications.
This isn’t just a tutorial - it’s an opinionated, minimal-yet-complete starting point for production agent applications. Think of it as a scaffold that:
- Has all the production pieces in place (auth, streaming, persistence, deployment)
- Remains small enough to understand fully
- Takes a stance on how agentic applications should be structured
- Can be cloned and customized for your specific use case
We’re building an agent stack where:
- Users log in (securely, with Auth0)
- They chat with AI agents that retain context
- Responses stream in real time, token by token, and tool calls are rendered nicely as they happen, just like in ChatGPT and similar research-style chatbots
- Usage gets tracked and metered
- Everything runs in containers and deploys with one command
- You can debug what’s happening in production
Sound ambitious? It is. But we’ll build it piece by piece, and by the end you’ll understand not just how to build it, but why each piece exists and how it fits together.
Info
This template is opinionated by design. We’re not trying to support every possible architecture — we’re showing you one that works well for production agent applications. Once you understand the patterns, you can adapt them to your needs.
Today, we’re starting with the foundation: the project structure, development environment, and tooling that makes everything else possible. Most importantly, we’ll discuss the decisions behind the structure and architecture.
The Big Picture: What Are We Building?
Before we dive into code, let’s talk about what this system looks like when it’s done. I want you to see the full picture first—not just the pieces, but how they fit together and why each one matters.
Here’s the stack we’re building:
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'16px'}}}%%
graph TB
    User[User Browser]
    Frontend["Frontend<br/><small>React + TypeScript + Vite</small>"]
    Auth0["Auth0<br/><small>Authentication</small>"]
    Backend["Backend API<br/><small>FastAPI + Python</small>"]
    AgentSDK["Agents SDK<br/><small>Agent Orchestration</small>"]
    DB[("Postgres<br/><small>Sessions & Messages</small>")]
    OpenAI["LLM API<br/><small>OpenAI/LiteLLM</small>"]
    User --> Frontend
    Frontend --> Auth0
    Frontend --> Backend
    Backend --> AgentSDK
    Backend --> DB
    AgentSDK --> OpenAI
    style User fill:#f0f4ff,stroke:#a5b4fc,stroke-width:2.5px,rx:15,ry:15,color:#1e293b
    style Frontend fill:#dbeafe,stroke:#93c5fd,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style Auth0 fill:#fed7d7,stroke:#fca5a5,stroke-width:2.5px,rx:15,ry:15,color:#7f1d1d
    style Backend fill:#d1fae5,stroke:#6ee7b7,stroke-width:2.5px,rx:15,ry:15,color:#065f46
    style AgentSDK fill:#e9d5ff,stroke:#c084fc,stroke-width:2.5px,rx:15,ry:15,color:#581c87
    style DB fill:#bfdbfe,stroke:#60a5fa,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style OpenAI fill:#ccfbf1,stroke:#5eead4,stroke-width:2.5px,rx:15,ry:15,color:#134e4a
```
Looks straightforward, right? Just a handful of components talking to each other. What matters isn’t just the components — it’s how they’re connected and what’s built around them.
Here’s what we’re building from three different perspectives:
For Your Users (The Experience Layer)
This is what people actually interact with:
- Secure authentication via Auth0 - no passwords to manage, no security headaches for you. Auth0 is commonly used in production apps, so it’s a solid choice, and it has a generous free tier for small projects (up to 25,000 monthly active users). And if you want to swap it out later, the codebase is structured so you can replace Auth0 with another provider without massive rewrites or code tangles.
- Multiple chat sessions - users can organize conversations, switch between topics, keep context separate.
- Real-time streaming - responses appear token by token, just like ChatGPT. Tool calls show up as they happen and are rendered nicely.
- Credit-based usage - transparent costs, no surprise bills, users see their balance before they run out. We don’t have actual payment processing in this template, but the structure is there to add it.
- Works everywhere - responsive design that works on desktop, tablet, and mobile
For You (The Developer Experience)
This is what makes the codebase pleasant to work with:
- One command to start - `make up` spins up the entire stack (frontend, backend, database). No “install these 12 things first”
- Hot-reload everything - change Python code, see it instantly. Change React code, see it instantly. No build steps in dev
- Type-safe end-to-end - Python with mypy strict mode, TypeScript with strict mode. Catch bugs at compile time, not runtime
- Migrations in version control - database schema changes are tracked with Alembic, reviewable in pull requests
- Tests that actually pass - unit tests (fast, no I/O), integration tests (with real database), e2e tests (full stack)
- Deploy with confidence - CI/CD pipeline that runs tests, builds containers, and deploys to production
The goal here is zero friction. You should spend time thinking about your agents, not fighting your tools.
For Production (The Operational Reality)
This is what keeps the system running reliably at scale:
- Containers everywhere - same Docker images from dev to prod. No “works on my machine” surprises
- Built-in observability - traces show you what agents are doing, logs tell you what went wrong, metrics tell you when to scale
- Rate limiting - token-aware limits per user prevent abuse and runaway costs
- Secret management - API keys and credentials stored properly (AWS Secrets Manager or another secrets manager, not `.env` files in production)
- Zero-downtime deploys - rolling updates, health checks, automatic rollback if something breaks
- Cost tracking - every LLM call is metered, stored, and can be attributed to a user
This isn’t an afterthought. These pieces are wired in from the start, which is way easier than retrofitting them later.
The Decision: Why This Structure?
When I started building agent applications, I kept running into the same problems. It actually started all the way back at framework selection. I tried a bunch of them, including Google ADK, AutoGen, CrewAI, and even LangFlow, but none gave me the satisfaction of the OpenAI Agents SDK. I swear I have no ties to OpenAI xD I’ll explain why I’m so fond of it later. Let me walk you through the key decisions we made and why they matter.
Problem 1: The “Kitchen Sink” Approach
Frameworks like LangChain try to do everything: agent orchestration, vector stores, UI components, deployment. They’re fantastic for prototypes, but when you need to customize how agents hand off to each other, or change authentication providers, or swap databases, you’re fighting the framework’s opinions. That makes it hard to adapt to real-world production needs. Most projects I’ve worked on were already built on a tech stack that needed customization from the get-go. It’s better to keep things modular so you can pick the best tool for each job and swap components later.
Our approach: Use best-in-class tools for each layer. FastAPI for the API (it’s async, typed, and has great docs). React for the frontend (huge ecosystem, mature patterns) - TBH I don’t like React, nor am I good at it :) I’d rather use Svelte myself, but given React’s popularity, it is what it is. OpenAI Agents SDK for agent orchestration (built by the people who make the models) - this makes the most sense if you’re using OpenAI models, but even if you’re not, the framework is just better overall (more on this later). Docker for containers (industry standard). This means a bit more wiring, but you control each piece and can swap components when needed.
Problem 2: The “Config Hell” Approach
Some frameworks — I’m looking at you, CrewAI… — lean heavily on YAML or JSON configurations. Want to change how an agent behaves? Edit three config files, restart the system, and hope you got the indentation right. Debugging means reading stack traces that point to generated code, not your configuration. This is a nightmare for complex logic.
Our approach: Code over config. Agents are Python files you can read, edit, and debug. Workflows are Python files that import agents directly. You get IDE autocomplete, type checking, breakpoints, and version control that actually shows meaningful diffs. Configuration is for environment-specific stuff (like API keys and database URLs), not behavior.
Tip
The “code over config” philosophy doesn’t mean zero configuration. It means using code for logic and configuration for environment. Your agent’s behavior should be in a Python file you can test. Your database connection string should be in an environment variable. If this bugs you, remember that I warned you this template is opinionated! :)
Bottom line: Code > Config…
Problem 3: The “Works on My Machine” Problem
I can’t count how many times I’ve seen repos that say “just install X, Y, Z and it should work.” But X needs Python 3.9 (you have 3.11), Y needs an older version of numpy, and Z… well, nobody’s sure why Z is even there. By the time you’ve wrangled the environment, you’ve lost an afternoon.
Our approach: Docker from day one. The same containers you run locally are what you deploy to production. No “works on my machine” surprises. No conda environments, no global npm installs. One command (make up) and you have a working system. I got deep into the Docker habit last year while learning more about it for my home server projects. It now feels insane to me not to use it for any project.
Problem 4: The “Streaming Is Hard” Problem
Most LLM demos use simple request/response: send a message, wait, get the full answer. But in production, users don’t want to wait 30 seconds staring at nothing. They want to see the response being generated, like they do in ChatGPT.
Our approach: Server-Sent Events (SSE) for streaming. It’s simpler than WebSockets (which I hate) for one-way communication (server to client), works everywhere, and reconnects automatically. The OpenAI Agents SDK handles the complex part (streaming from the LLM), and we map those events to what the frontend needs (tokens, tool calls, completion).
Problem 5: The “Security Afterthought” Problem
So many tutorials add auth as a last step, if at all. But retrofitting security is painful — you end up changing every endpoint, every database query, every test. And you inevitably miss something (like forgetting to filter messages by user_id, leaking conversations between users).
Our approach: Authentication and authorization come early, right after we have a working API. It’s early enough that it’s not a massive refactor, but late enough that we understand what we’re protecting. Every database model has a user_id from the start. Every endpoint checks authentication. No retrofitting.
The Structure: How Agentic Applications Should Be Organized
This is where the template really matters. We’re not just building an app — we’re defining a structure that makes sense for production agent systems. Let me walk you through the directory layout and explain why each piece exists and how they work together.
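Here’s a sketch of the layout, assembled from the pieces we discuss below (the repo may differ in small details):

```
.
├── backend/
│   ├── app/
│   │   ├── api/            # HTTP: routing, validation, serialization
│   │   ├── domain/         # Business logic: sessions, messages, credits
│   │   ├── persistence/    # Data access: ORM models, queries, migrations
│   │   ├── agents/         # One folder per agent
│   │   ├── workflows/      # Orchestration between agents
│   │   └── main.py
│   └── pyproject.toml
├── frontend/
│   └── src/
│       ├── api/            # REST + SSE client
│       ├── auth/           # Auth0 integration
│       ├── components/
│       ├── pages/
│       └── store/
├── tests/                  # Centralized: unit, integration, e2e
├── infra/                  # docker-compose.yml, terraform/
├── .github/workflows/      # CI/CD
└── Makefile
```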
This structure embodies a specific opinion about how agent applications should be built. Let me explain the critical decisions:
The Backend: Clean Architecture for Agents
Separation of Concerns (The Foundation):
The backend is split into clear layers, each with a single responsibility:
- `api/` handles HTTP concerns: routing, request validation, response serialization, status codes. This layer knows about FastAPI but doesn’t know about Postgres or agent logic.
- `domain/` contains business logic: session lifecycle, message handling, credit calculations. This is pure Python — no FastAPI imports, no SQLAlchemy imports. You can test it without starting a server or database.
- `persistence/` manages data access: ORM models, database queries, migrations. This layer knows about Postgres but doesn’t know about HTTP or business rules.
Why does this matter? Because when you need to change databases (e.g., Postgres to MongoDB), you only touch persistence/. When you need to change from FastAPI to Flask, you only touch api/. When you need to change business rules (like credit calculations), you only touch domain/. Changes don’t cascade.
Note
This is an implementation of “Hexagonal Architecture” (also called “Ports and Adapters”). The core domain logic is at the center, and infrastructure concerns (HTTP, database, external APIs) are at the edges. It’s a little more setup than throwing everything in one file, but it scales beautifully.
Agents and Workflows as First-Class Citizens:
Here’s where our structure gets opinionated about agent applications specifically:
Each agent lives in its own folder: agents/agent_<name>/
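A sketch, using a hypothetical support agent:

```
agents/
├── agent_support/
│   ├── agent.py                # The agent definition
│   ├── tools.py                # Tools this agent can call
│   ├── schemas.py              # Structured-output models (if used)
│   ├── prompts/
│   │   └── system.md           # Prompts as markdown
│   └── agent_order_lookup/     # A subagent, used only by the parent
│       └── agent.py
└── shared/                     # Tools/schemas shared across agents
```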
Why one folder per agent? Because agents are complex entities with prompts, tools, and configuration. Keeping them together makes it easy to understand what an agent does and to test it in isolation. The shared/ folder prevents duplication when multiple agents need the same tools or data structures.
A typical agent has an agent.py, a tools.py, and, if we’re using structured outputs, a schemas.py file. Beyond these, agents can also have subagents, depending on their use case. A subagent is simply an agent that is only used by a parent agent. For example, if we have a “Support Agent” that handles customer support queries, it might have subagents like “Order Lookup Agent” and “Refund Processing Agent” to handle specific tasks. These subagents live in their own folders within the parent agent’s folder. This keeps the subagent logic encapsulated and makes it clear they are not meant to be used standalone. If at any point we need to promote a subagent to a full agent, we can easily move it out.
If we are housing the prompts in the repo as well, there will be a prompts/ folder too for that agent. And prompts are stored as markdown files for better readability, separation of concerns, and easier versioning.
Tip
Another way of handling prompts is to use a prompt management system. For this, we can stay in the OpenAI ecosystem and use OpenAI’s prompts feature, or use a third-party system like PromptLayer or Arize Phoenix.
In this template we store prompts as markdown files in the repo, but there is virtually no limitation to using a prompt management system instead; the repo shows an implementation of that approach too.
I must admit this structure actually comes from Google ADK’s recommended layout. In my view, they nailed it! But their structure still misses one thing, which is what we fix and discuss next: workflows.
Workflows live in workflows/<name>/workflow.py and import agents directly:
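A minimal sketch of the idea (agent names and import paths are illustrative, not the repo’s exact contents):

```python
# workflows/support/workflow.py
from agents import Runner  # OpenAI Agents SDK

from app.agents.agent_support.agent import support_agent
from app.agents.agent_triage.agent import triage_agent


async def run_support_workflow(user_message: str) -> str:
    """Coordinate agents; the agents themselves never import each other."""
    # The workflow wires the handoff, keeping both agents decoupled
    triage_agent.handoffs = [support_agent]
    result = await Runner.run(triage_agent, input=user_message)
    return str(result.final_output)
```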
A workflow is any sort of orchestration between multiple agents. This could be as simple as routing user messages to different agents based on intent, or as complex as multi-step processes where one agent’s output feeds into another’s input. The OpenAI Agents SDK has no formal “workflow” construct like ADK has, meaning it doesn’t give you blocks for “run this agent after that one” or “run these agents in parallel.” But technically there’s no need for them, because everything is just Python code. You can implement any workflow logic you want using normal Python functions and classes. This gives you ultimate flexibility.
ADK treats workflow logic as just another kind of agent, so orchestration ends up coupled with agentic logic. I found this to be a bad idea, as it mixes two different concerns. In our structure, workflows get their own folder. This way, agents focus on “what to do” and workflows focus only on “how to coordinate.”
A workflow also handles the “handoff” logic for the agents. We never import one agent into another. Instead, the workflow imports both agents and wires them together. This keeps agents decoupled and reusable.
The Frontend: Simple and Focused
The frontend structure is intentionally minimal:
- `api/` - Client for talking to the backend (REST + SSE wrapper)
- `auth/` - Auth0 integration (login, logout, token management)
- `components/` - Reusable UI components (ChatWindow, SessionList, MessageBubble)
- `pages/` - Top-level page components (Login, Dashboard, Chat)
- `store/` - State management (sessions, messages, user)
We’re not using a complex state management library (like Redux) because we don’t need it. The state is simple: current user, list of sessions, list of messages in current session. React’s built-in state and context are enough.
The critical piece is the SSE client in api/. This is where we consume the streaming events from the backend and turn them into UI updates. It’s the most “agent-specific” part of the frontend.
Infrastructure and Testing
Centralized Testing:
All tests live in one /tests directory that mirrors the source structure:
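A sketch of the layout:

```
tests/
├── unit/           # Fast, pure-Python tests (no I/O)
├── integration/    # Hit a real Postgres via Docker
└── e2e/            # Exercise the full stack
```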
Why centralized? Because it makes CI simpler (one command runs all tests), makes coverage reports meaningful, and makes it obvious where tests live. Some projects scatter tests next to source files (agent.py and agent_test.py in the same folder). I always thought that centralizing them reduces confusion and makes it easier to run subsets (“just run unit tests” vs “run everything”).
Infrastructure as Code:
The infra/ folder contains everything needed to run the system:
- `docker-compose.yml` for local development (3 services: db, backend, frontend)
- `terraform/` for cloud resources (compute, database, secrets, DNS)
- `.github/workflows/` for CI/CD (lint, test, build, deploy)
Everything is versioned. Everything is reviewable. You can see the history of infrastructure changes just like code changes.
This is the final piece of our “minimal but complete” philosophy: we give you the deployment story, not just the app code.
Building the Foundation: The Setup
Alright, enough philosophy. Let’s build something!
We’re starting with the foundation — the pieces that make everything else possible:
- The project skeleton (directories, files)
- Python tooling (uv, ruff, mypy)
- Node/React with Vite - the initial setup
- Docker Compose for local development
- Environment configuration
- Development scripts (Makefile)
This might seem like a lot of setup before writing “real” code, but trust me—investing time here will save you a lot of frustration later.
Info
All the code we’re building today is available in the repository. You can follow along by cloning it, or use this as a reference while building your own version. Or just skip it altogether if you trust me that it works :)
I used different branches for each post in the series so you can see the incremental changes. Today’s code is in the part-1-foundation branch.
Python Setup with uv
Info
Skip this section if: You’re already familiar with Python dependency management tools.
We’re using uv for Python dependency management. Why not Poetry or pip?
- uv is 10-100x faster. Seriously. Dependencies that take a minute to install with pip take seconds with uv. It uses a Rust-based resolver and caches aggressively.
- uv uses standard `pyproject.toml`. If you decide to switch to Poetry later, it’s easy; the file format is the same.
- uv handles Python versions. Need Python 3.11? `uv python install 3.11`. Done.
I was an avid pip user before I moved to Poetry 5-6 years back. And last year I discovered uv and switched to it immediately. It is just so much faster and the switch is very painless, highly recommended! But if for some reason you don’t want to use uv, you can easily adapt the instructions to Poetry or pip.
Let’s start by creating the backend folder and initializing uv:
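Something like this (a sketch; adjust names to taste):

```bash
mkdir backend && cd backend
uv init
```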
Now, let’s define our dependencies in pyproject.toml.
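A sketch of the dependency section (the package set matches what this series uses; version pins are illustrative):

```toml
[project]
name = "app"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.115",
    "uvicorn[standard]>=0.30",
    "pydantic-settings>=2.5",
    "sqlalchemy>=2.0",
    "asyncpg>=0.29",
    "alembic>=1.13",
    "openai-agents>=0.1",
]

[dependency-groups]
dev = ["pytest>=8.0", "ruff>=0.6", "mypy>=1.11"]
```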
These are sensible version defaults as of this writing, but feel free to adjust as needed. As long as you use uv, it will resolve dependencies quickly, so no worries about conflicts.
Code Quality: Linting and Type Checking
Info
Skip this section if: You’re familiar with Python linting and type checking tools.
We’re setting up ruff for linting/formatting and mypy for type checking.
Why ruff? Python has a fragmented ecosystem for code quality. You’ve probably seen projects with black (formatting), flake8 (linting), isort (import sorting), and maybe pylint thrown in. That’s four tools, four configs, and four places where your CI can fail. Ruff combines all of this into one blazingly fast Rust-based tool. It runs 10-100x faster than the competition and gives you one config file instead of four. I used to use flake8 + black + isort combo for years, but once I switched to ruff, I never looked back (I sense a pattern here :)).
Why mypy? Python’s dynamic typing is great for prototyping but dangerous in production. When you’re handling user credits, streaming agent responses, and managing database transactions, you want the compiler to tell you “this function expects a SessionID but you’re passing a str” before your users find out. Mypy with strict mode is how you get that safety.
Note
Type checking isn’t just for catching bugs — it’s documentation that stays up to date. When a new developer looks at def process_run(session: SessionID, user: User) -> RunResult:, they know exactly what the function expects and returns. No guessing, no digging through implementation details.
Let’s configure both tools in pyproject.toml:
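A sketch reflecting the choices explained below:

```toml
[tool.ruff]
line-length = 100
fix = true

[tool.ruff.lint]
select = ["E", "F", "W", "C", "N", "B", "I"]

[tool.mypy]
strict = true
disallow_untyped_defs = true
no_implicit_optional = true
warn_return_any = true

[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false
```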
Let me break down the key choices:
Ruff configuration:
- `line-length = 100`: This is opinionated. Black uses 88, but I find 100 strikes a better balance between readability and fitting complex FastAPI endpoint signatures on one line.
- `select = ["E", "F", "W", "C", "N", "B", "I"]`: These are rule groups for pycodestyle errors (E), pyflakes (F), warnings (W), complexity (C), naming conventions (N), bugbear (B), and import sorting (I). You’re getting the equivalent of flake8 + isort in one tool.
- `fix = true`: Ruff will automatically fix issues like import sorting and trailing whitespace on save. This eliminates bikeshedding in code reviews.
Mypy configuration:
- `strict = true`: This is the nuclear option. It enables every type checking rule mypy has. You’ll get errors for missing type hints, returning `Any`, or unsafe casts. This feels painful at first but pays off when you’re refactoring agent logic at scale.
- `disallow_untyped_defs = true`: Every function needs type hints. Period. When you’re streaming tokens, managing sessions, and tracking credits, you don’t want ambiguity about what types flow through your system.
- `no_implicit_optional = true`: If a parameter can be `None`, you must write `Optional[T]`. This catches bugs where you assume a value exists but it’s actually `None` at runtime (the classic NoneType error in production).
- `warn_return_any = true`: Returning `Any` defeats the purpose of type checking. This warns you when a function’s return type is too loose, which often happens when integrating with third-party libraries.
Tip
If you’re adding type hints to an existing codebase, start with strict = false and enable rules incrementally. For a new project like this template, going strict from day one is the right move — you’ll never have to retrofit types later.
The [[tool.mypy.overrides]] section at the end relaxes rules for tests. In test files, we care more about coverage and readability than perfect type safety. It’s fine if a test helper function doesn’t have complete type hints—the production code is what matters.
When building agent systems with the OpenAI Agents SDK, you’re juggling complex types: StreamedEvent, RunResult, SessionID, custom tool schemas, and Pydantic models for your database. Mypy catches mismatches before they become production incidents. Ruff ensures your code is consistent and readable when onboarding new team members or revisiting agent logic six months later.
These tools run in CI (we’ll set that up shortly), so every pull request gets checked automatically. No “it worked on my machine” surprises.
Additional Resources
- Ruff Documentation – Full list of rules and configuration options
- Mypy Documentation – Type checking deep dive and best practices
- Python Type Hints Guide – Official Python typing module reference
FastAPI Boilerplate: Your First Endpoint
Now for the fun part — let’s write some actual code. We’re starting with FastAPI as our backend framework. If you’ve used Flask before, FastAPI will feel familiar but with superpowers: automatic validation, async support out of the box, and OpenAPI docs that generate themselves.
Why FastAPI over Flask or Django? Four reasons:
- Native async support: When you’re streaming agent responses or making multiple LLM calls in parallel, you need async. Flask bolted on async support in 2.0, but FastAPI was built for it from day one.
- Pydantic integration: FastAPI uses Pydantic for request/response validation. This means your API contracts are enforced automatically — send malformed JSON and you get a clear error before your handler runs.
- Auto-generated docs: Every endpoint you write shows up in interactive Swagger UI at `/docs`. No manual API documentation needed. This is a game-changer when working with frontend developers or building integrations.
- Simplicity and performance: FastAPI is lightweight and fast, making it ideal for high-throughput applications like agent systems.
Let’s write our first endpoint:
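A sketch (the file path `app/main.py` is my assumption):

```python
# app/main.py
from fastapi import FastAPI

app = FastAPI(title="Agent Stack API")


@app.get("/health")
async def health() -> dict[str, str]:
    """Liveness probe for load balancers and orchestrators."""
    return {"status": "ok"}
```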
This looks simple, but there’s a lot happening here:
- `async def health()`: This is an async endpoint. FastAPI runs it on the event loop, which means it won’t block other requests. When you’re handling 100+ concurrent agent sessions, this matters.
- `-> dict[str, str]`: Type hint for the response. FastAPI uses this to generate the OpenAPI schema and validate your response at runtime (if you enable response validation).
- Docstring: Shows up in the auto-generated docs. Write these for every endpoint — your future self will thank you.
Note
The async def keyword is important even for simple endpoints. FastAPI can handle both sync and async functions, but if you define a sync function, it runs in a thread pool which has overhead. For database queries, LLM calls, or any I/O, always use async def.
Now let’s start the server:
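Assuming the `app/main.py` layout from the sketch above:

```bash
uv run uvicorn app.main:app --reload
```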
Uvicorn will log that it’s serving on http://127.0.0.1:8000 and watching for file changes.
The --reload flag is critical during development — it auto-restarts the server when you change code. Uvicorn uses watchfiles under the hood (another Rust-based tool) for blazing fast reloads.
Open your browser and hit these URLs:
- http://localhost:8000/health - You’ll see `{"status":"ok"}`
- http://localhost:8000/docs - Interactive API documentation (Swagger UI)
- http://localhost:8000/redoc - Alternative API docs (ReDoc, which I find prettier)
Tip
The auto-generated docs at /docs aren’t just for show. You can test endpoints directly from the browser, inspect request/response schemas, and even download the OpenAPI spec. When you’re debugging agent runs or testing credit deduction logic, this beats using curl or Postman. I keep this tab open constantly during development.
Key benefits we get from this setup:
- Health checks: The `/health` endpoint is what load balancers and orchestrators (Kubernetes, ECS) use to determine if an instance is ready to serve traffic
- Type safety: FastAPI validates return types at runtime - if you return the wrong type, you’ll catch it immediately
- Async from the start: No refactoring needed when we add streaming endpoints later
- OpenAPI schema: Auto-generated at `/openapi.json` for type-safe frontend clients
Additional Resources
- FastAPI Documentation – Comprehensive guide with excellent examples
- Uvicorn Settings – Deployment and performance tuning options
- Pydantic Models – Deep dive into request/response validation
Next steps:
This health check endpoint is just the skeleton. In the next sections, we’ll add:
- Database integration (PostgreSQL + SQLAlchemy)
- Authentication and user management
- Agent streaming endpoints (the real meat of the application)
- Credit tracking and rate limiting
- Proper error handling and logging
But for now, you have a working FastAPI server with auto-generated docs, type safety, and async support. That’s a rock-solid foundation to build on.
Frontend with Vite: Modern React Development
Info
Skip this section if: You’re familiar with Vite and modern React tooling.
Time to set up the frontend. We’re using Vite as our build tool and development server.
Why Vite over Create React App?
Create React App was the standard for years, but it’s showing its age. The development server takes forever to start, hot module replacement is slow, and the build process uses webpack under the hood (which is powerful but complex). Vite takes a different approach:
Native ESM in development: Vite serves your code as native ES modules. No bundling during development means the dev server starts instantly — even on large projects. CRA bundles everything upfront, which means 30-60 second startup times on big codebases. Vite? Under 2 seconds, always.
Lightning-fast HMR: Change a React component and see it update in the browser in milliseconds. Vite’s HMR is so fast it feels like you’re editing the page directly. This matters when you’re iterating on UI and want tight feedback loops. Of course it doesn’t matter much for our simple template project, but we’re thinking big here.
Optimized production builds: Vite uses Rollup under the hood for production builds, which generates smaller, more efficient bundles than webpack. Smaller bundles = faster page loads for your users.
No ejecting required: With CRA, if you need custom configuration, you either eject (and maintain all the build tooling yourself) or use workarounds like CRACO. Vite’s config is simple and transparent from day one — it’s just a JavaScript file. TBH this is the selling point for me :) I hate React’s complex build tooling.
TypeScript strict mode from the start:
We’re using TypeScript with strict mode enabled. I know, I know — TypeScript can feel like overkill for simple UIs. But when you’re building agent applications, your frontend is managing complex state:
- Streaming events from Server-Sent Events
- Message history with nested objects (text, tool calls, errors)
- Session metadata (created_at, updated_at, message count)
- User credits and rate limiting
Without types, you’ll spend hours debugging “Cannot read property ‘X’ of undefined” errors. With types, your IDE tells you exactly what’s available and catches errors as you type. And as I mentioned before, we’re thinking big, even if this template is simple.
Let’s create the frontend:
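One way to scaffold it (omit the template flag to get the interactive prompts, where you can pick the rolldown-vite variant mentioned below):

```bash
npm create vite@latest frontend -- --template react-ts
cd frontend
npm install
```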
Note
I’m skipping the folder-creation steps in the commands because it’s assumed you can create folders as needed. Just focus on the commands relevant to each section. (Or create the whole structure up front, based on the layout we discussed.)
Here I’m selecting rolldown-vite for the build; you don’t have to choose it, of course, but why wouldn’t you? :)
This scaffolds a React + TypeScript project with Vite. You’ll get a basic structure:
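Roughly this (exact files vary a bit between Vite versions):

```
frontend/
├── index.html
├── package.json
├── tsconfig.json
├── vite.config.ts
└── src/
    ├── main.tsx
    ├── App.tsx
    └── assets/
```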
That’s all we need for Part 1! The default Vite structure is fine for now. We’ll build out the full frontend architecture (components, pages, API clients, state management) in Part 5 when we implement the agent UI. For now, we just need a working dev server that we can containerize.
Configuring path aliases (optional but recommended):
One quick improvement: set up path aliases so that later you can write @/components/Button instead of ../../../components/Button.
Update vite.config.ts:
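A sketch with the alias and the Docker-friendly host setting:

```typescript
import { fileURLToPath, URL } from "node:url";
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  resolve: {
    alias: {
      // "@/..." resolves to "src/..."
      "@": fileURLToPath(new URL("./src", import.meta.url)),
    },
  },
  server: {
    host: true, // listen on all interfaces so Docker can expose the dev server
  },
});
```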
And update tsconfig.json:
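The matching compiler options (in newer Vite templates these may belong in `tsconfig.app.json`):

```json
{
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "@/*": ["./src/*"]
    }
  }
}
```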
Note
The server.host configuration is important for Docker. It makes Vite accessible from outside the container. We’ll use this when we set up Docker Compose next.
Start the development server:
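From the `frontend/` folder:

```bash
npm run dev
```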
The dev server starts almost instantly and prints its local URL.
Open http://localhost:5173 and you’ll see the default Vite + React landing page with the spinning Vite logo. Not exciting yet, but notice how fast that startup was. On a comparable CRA project, you’d still be waiting for webpack to bundle.
That’s it for the frontend in Part 1! We have a working dev server with hot module replacement, TypeScript support, and path aliases configured. In Part 5, we’ll come back and build out the full agent UI with components, state management, SSE streaming, and all the bells and whistles.
Additional Resources
- Vite Documentation – Official guide and configuration options
- Vite + React Plugin – React-specific Vite features
- TypeScript Handbook – Learning TypeScript
Creating Dockerfiles
Info
Skip this section if: You’re comfortable writing Dockerfiles and understand layer caching.
Before we can use Docker Compose, we need Dockerfiles for our backend and frontend.
Backend Dockerfile
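A sketch of the whole file (the uv installation line follows uv’s documented Docker pattern; pin versions as you see fit):

```dockerfile
FROM python:3.11-slim

# System packages: gcc for C extensions, psql client for debugging
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Bring in uv from its official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# Dependency files first, so this expensive layer stays cached
COPY pyproject.toml uv.lock ./
RUN uv sync --no-dev

# Application code changes often, so it comes last
COPY app/ ./app/

CMD ["uv", "run", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```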
Let me break down what’s happening here:
Base image choice:
We use python:3.11-slim instead of the full python:3.11 image. The slim variant is much smaller (100MB vs 900MB) because it excludes unnecessary build tools and libraries. This means faster builds, faster deployments, and lower storage costs.
System dependencies:
- `gcc`: Required by some Python packages that compile C extensions (like asyncpg)
- `postgresql-client`: Useful for debugging (you can run `psql` inside the container)
- `rm -rf /var/lib/apt/lists/*`: Cleans up the apt cache to keep the image small
Dependency caching:
This order is critical for Docker layer caching. Docker caches each instruction as a layer. If nothing changes in a layer, Docker reuses the cached layer instead of rebuilding.
By copying pyproject.toml first and installing dependencies, we ensure that layer is cached. When you change application code (which happens constantly), Docker only rebuilds the COPY app/ layer and later layers—not the expensive dependency installation layer.
Production optimization:
The --no-dev flag skips development dependencies (pytest, ruff, mypy). In production, you don’t need testing or linting tools—only the code needed to run the app. This keeps the image smaller and more secure.
Note
In development, we override this CMD in docker-compose.yml to add the --reload flag. This way, the same Dockerfile works for both dev and prod—we just change the command at runtime.
Frontend Dockerfile
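A sketch (the final CMD is a placeholder; dev overrides it, and the real production story uses the multi-stage build shown later):

```dockerfile
FROM node:20-alpine

WORKDIR /app

# Dependency files first for layer caching
COPY package.json package-lock.json ./
RUN npm ci

# Then the source
COPY . .

# Compile TypeScript and bundle with Vite into dist/
RUN npm run build

# Placeholder; docker-compose.yml overrides this with `npm run dev` in development
CMD ["npm", "run", "preview"]
```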
Breaking this down:
Alpine base:
Alpine Linux is a minimal distribution designed for containers. node:20-alpine is ~120MB compared to ~1GB for the full node:20 image. Alpine uses musl libc instead of glibc, which is lighter weight.
npm ci vs npm install:
npm ci (clean install) is faster and more reliable than npm install in CI/CD and containers:
- Deletes `node_modules` before installing (ensures a clean state)
- Installs exact versions from `package-lock.json` (reproducible builds)
- Fails if `package.json` and `package-lock.json` are out of sync
- Runs 2-3x faster than `npm install`
Build step:
This compiles TypeScript, bundles with Vite, and optimizes assets. The result goes in dist/. In production, you’d serve this dist/ folder with nginx or a CDN. In development, we override the CMD to run npm run dev instead.
Development vs production:
The Dockerfile is written for production (build artifacts, optimized bundles). In docker-compose.yml, we override the command for development so the container runs the Vite dev server instead of serving the build output.
Tip
Multi-stage builds for production: In a real production setup, you’d use a multi-stage Dockerfile for the frontend:
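Roughly:

```dockerfile
# Stage 1: build the assets
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: serve only the built artifacts
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
```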
This creates a tiny final image (20MB) with just nginx and your built assets. The node installation and source code are discarded after the build. We’ll cover this pattern in Part 6 (Deployment).
Testing the Dockerfiles
Before using Docker Compose, verify the Dockerfiles work individually:
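For example (image names are just local tags):

```bash
docker build -t agent-backend ./backend
docker build -t agent-frontend ./frontend
docker images | grep agent-
```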
You should see both images listed with their sizes. If the build fails, check for:
- Typos in Dockerfile commands
- Missing files (make sure `pyproject.toml` and `package.json` exist)
- Network issues (Docker needs to download base images and dependencies)
Warning
Common Dockerfile mistakes to avoid:
Not using `.dockerignore`: Create a `.dockerignore` file to exclude unnecessary files from the build context. Without this, Docker copies everything into the build context, slowing builds and potentially including secrets.
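A typical starting point:

```
.git
.env
__pycache__/
.venv/
node_modules/
dist/
```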
Running as root: For production, you should create a non-root user in the Dockerfile. We’ll cover this in Part 6.
Installing dependencies every time: Always copy dependency files (`pyproject.toml`, `package.json`) before copying source code. This leverages Docker’s layer caching.
Now that we have Dockerfiles, we’re ready to orchestrate all three services with Docker Compose.
Additional Resources
- Dockerfile Best Practices – Official Docker guide
- Multi-stage Builds – Optimizing production images
- Docker Layer Caching – Understanding how caching works
Docker Compose: Orchestrating the Full Stack
Info
Skip this section if: You’re comfortable with Docker Compose service definitions, health checks, and volume mounts.
This is where everything comes together. We’ve set up the backend (FastAPI + Python), the frontend (Vite + React), and now we’re going to run them together with Docker Compose. This is the secret sauce that eliminates “works on my machine” problems and makes onboarding new developers trivial.
Why Docker Compose?
You could run each service manually: start Postgres in one terminal, start the backend in another, start the frontend in a third. But that’s annoying, error-prone, and hard to document. Docker Compose lets you define all services in one file and start them with a single command.
More importantly, it ensures consistency. The same Docker images you use locally are what you deploy to production (with different environment variables). No subtle differences between dev and prod. No “but it worked on my laptop” debugging sessions.
Here’s our complete docker-compose.yml:
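A sketch matching the walkthrough below (credentials are dev-only placeholders):

```yaml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app       # dev-only placeholder
      POSTGRES_DB: app
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 5s
      retries: 10

  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql+asyncpg://app:app@db:5432/app
    volumes:
      - ./backend/app:/app/app:ro   # hot-reload: mount source read-only
    depends_on:
      db:
        condition: service_healthy
    command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      VITE_API_URL: http://localhost:8000
    volumes:
      - ./frontend/src:/app/src
    command: npm run dev -- --host 0.0.0.0

volumes:
  postgres_data:
```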
Let me break down the critical pieces:
1. Database Service (Postgres)
We’re using the alpine variant of Postgres because it’s tiny (50MB vs 300MB for the full image). This matters when you’re pulling images in CI or deploying to cloud providers that charge for bandwidth.
This is crucial. Without a health check, Docker Compose considers the database “ready” as soon as the container starts. But Postgres takes a few seconds to initialize. If the backend tries to connect during those seconds, it crashes with “connection refused.”
The health check runs pg_isready every 5 seconds. Only when it succeeds does Docker Compose start the backend service. This prevents race conditions.
This is a named volume. It persists database data between container restarts. Without this, every time you run docker compose down, you’d lose all your data. Named volumes live outside containers and survive restarts.
Note
Named volumes are stored in Docker’s internal directory (usually /var/lib/docker/volumes on Linux). You can list them with docker volume ls and inspect them with docker volume inspect postgres_data. To completely reset your database, run docker compose down -v (the -v flag removes volumes).
2. Backend Service (FastAPI)
This tells Docker to build an image from the backend/Dockerfile. During development, this build only happens once (or when you change dependencies). The actual source code is mounted as a volume (see below), so code changes don’t require rebuilding.
This is the magic of hot-reload. We mount the local backend/app directory into the container at /app/app. The :ro flag makes it read-only (security best practice).
When you change a Python file locally, uvicorn detects the change and reloads automatically. No rebuild, no restart. Just save and refresh.
This is smarter than a basic depends_on. It doesn’t just wait for the database container to start — it waits for the health check to pass. This eliminates the race condition where the backend starts before Postgres is ready to accept connections.
Notice the hostname: db. In Docker Compose, services can reach each other by service name. From the backend’s perspective, the database is at db:5432, not localhost:5432 (because they’re in separate containers).
The command overrides the Dockerfile’s CMD for development. We use uv run to execute uvicorn within uv’s managed environment—this is crucial because uv installs packages in a virtual environment. The --reload flag enables hot-reloading during development.
Tip
If you need to connect to the database from your host machine (e.g., to run database migrations or use a GUI tool like pgAdmin), use localhost:5432. From inside containers, use the service name db:5432. This trips up a lot of people initially.
3. Frontend Service (Vite)
Same pattern as the backend: build once, mount source code for hot-reload. When you change a React component, Vite’s HMR kicks in and updates the browser instantly.
The command overrides the Dockerfile’s CMD to run the Vite dev server instead of serving the production build. The --host 0.0.0.0 flag makes Vite accessible from outside the container (necessary for Docker).
The VITE_API_URL environment variable tells the frontend where the backend API lives. In production, you’d set this to your actual API domain (e.g., https://api.yourdomain.com). In development, it’s localhost:8000.
Note
Vite requires environment variables to be prefixed with VITE_ to expose them to the browser. Any variable without this prefix is only available during the build, not in runtime code.
Starting everything:
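One command:

```bash
docker compose up
```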
You’ll see logs from all three services interleaved.
Three services, one command. That’s the developer experience we’re aiming for.
Warning
On first run, Docker will download base images for Postgres (alpine), Python, and Node. This can take 2-10 minutes depending on your connection. Subsequent runs are instant because images are cached locally. Don’t panic if the first run takes a while!
Dev/Prod Parity: Why This Matters
One of the Twelve-Factor App principles is “dev/prod parity”—keep development and production as similar as possible. Docker Compose achieves this:
- Same database: You’re using real Postgres locally, not SQLite. No “works in dev, breaks in prod” surprises from database quirks.
- Same networking: Services talk to each other over Docker’s internal network, just like they will in production (via service mesh or internal DNS).
- Same environment variables: The backend reads `DATABASE_URL` from the environment, whether it’s Docker Compose locally or Kubernetes in production.
When you deploy, you’re not crossing your fingers hoping everything works. You’re deploying the same containers you’ve been running locally for weeks. The only difference is environment variables (prod database URL, prod API keys, etc.).
Additional Resources
- Docker Compose Documentation – Complete reference for all Compose features
- Docker Networking – How containers communicate
- Twelve-Factor App – Methodology for building modern web applications
- Docker Compose in Production – Best practices for deploying with Compose
Troubleshooting common issues:
“Port 5432 is already in use”: You have Postgres running locally. Either stop it (brew services stop postgresql on Mac) or change the port mapping in docker-compose.yml to 5433:5432.
Backend can’t connect to database: Check that the health check is passing with docker compose ps. If the database is “unhealthy,” something’s wrong with Postgres startup. Check logs with docker compose logs db.
Hot-reload not working: Make sure the volume mounts are correct. Run docker compose config to see the resolved configuration. The paths should match your local directory structure.
“Cannot connect to Docker daemon”: Docker Desktop isn’t running. Start it and try again.
Configuration That Makes Sense
Info
Skip this section if: You’re familiar with Pydantic Settings and environment-based configuration.
Configuration is one of those things that seems simple at first but becomes a nightmare if you don’t set it up properly. I’ve seen too many projects where config is scattered across environment variables, YAML files, hardcoded constants, and command-line flags. Debugging “why does this behave differently in staging?” becomes an archaeological expedition.
We’re using Pydantic Settings to centralize all configuration in one type-safe place. This isn’t just about convenience—it’s about catching errors before they reach production.
Why Pydantic Settings over environment variables or config files?
Most projects use one of these approaches:
- Raw `os.environ`: No validation, no type safety; missing variables cause runtime errors deep in the code
- python-decouple or similar: Better than raw environ, but still string-based with no nested config support
- YAML/JSON files: Great for complex config but no type safety, easy to typo a key
- Dotenv only: Simple but no validation, everything is a string
Pydantic Settings combines the best parts of all these approaches:
- Type-safe: Define config as a typed class, get IDE autocomplete and mypy validation
- Validated on startup: App crashes immediately if config is invalid, with clear error messages
- Environment variable support: Reads from `.env` files or actual environment variables
- Nested config: Supports complex structures like database pools, API rate limits, etc.
- Multiple sources: Can read from files, env vars, and defaults with clear precedence
Here’s our complete settings module:
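A sketch of the module (field names follow the walkthrough below; defaults are illustrative):

```python
# app/config.py
from typing import Literal

from pydantic import PostgresDsn, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    env: Literal["dev", "staging", "prod"] = "dev"
    database_url: PostgresDsn          # required: startup fails fast if missing
    openai_api_key: str | None = None  # optional in dev, required in prod
    openai_model: str = "gpt-4o-mini"

    @model_validator(mode="after")
    def require_openai_key_in_prod(self) -> "Settings":
        # Cross-field validation: a missing key is fine in dev, fatal in prod
        if self.env == "prod" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY is required when ENV=prod")
        return self


settings = Settings()
```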
Let me break down the key pieces:
1. Type annotations with validation
PostgresDsn is a Pydantic type that validates the URL format. If you typo the URL, you get an error like “Invalid Postgres DSN: expected ‘postgresql://’, got ‘postgres://’” at startup.
Literal["dev", "staging", "prod"] means the env field can only be one of these three values. Try to set it to “production” (not “prod”) and your IDE will show an error before you even run the code.
2. Field validators for custom logic
This is powerful: validation can depend on other fields. In development, missing an OpenAI key is fine (you might be working on the database layer). In production, it’s a fatal error that stops the app from starting.
3. Smart defaults and required fields
If `database_url` isn’t set, Pydantic raises a validation error immediately at startup, pointing at the missing field.
This is way better than getting a runtime error 10 minutes into testing when you try to connect to the database.
4. Environment variable mapping
Pydantic automatically maps environment variables to fields:
- `DATABASE_URL` in `.env` → `settings.database_url` in Python
- `OPENAI_API_KEY` → `settings.openai_api_key`
- `ENV` → `settings.env`
The case_sensitive=False setting means database_url, DATABASE_URL, and Database_Url all work. This is convenient but can be disabled if you want strict naming.
Creating the .env file:
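Something like this (values are placeholders):

```bash
# .env (local development only)
ENV=dev
DATABASE_URL=postgresql+asyncpg://app:app@localhost:5432/app
OPENAI_API_KEY=sk-your-key-here
```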
Tip
The .env file should never be committed to git. Add it to .gitignore immediately:
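At minimum:

```gitignore
.env
.env.*
!.env.example
```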
Create a `.env.example` file with placeholder values (committed to git) so new developers know what variables are needed.
Using settings throughout the app:
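A sketch of a typical consumer:

```python
# app/db.py (sketch)
from sqlalchemy.ext.asyncio import create_async_engine

from app.config import settings

# mypy knows database_url is a PostgresDsn and env is a Literal
engine = create_async_engine(str(settings.database_url))

if settings.env == "dev":
    print("Running in development mode")
```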
Environment-specific configuration:
In production, you’d override settings via environment variables (not .env files):
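For example (hostnames are hypothetical):

```bash
export ENV=prod
export DATABASE_URL="postgresql+asyncpg://app:<password>@prod-db.internal:5432/app"
export OPENAI_API_KEY="<from-your-secrets-manager>"
```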
The same Settings class works in all environments — you just change the source of the values.
Note
For production secrets (API keys, database passwords), never use .env files. Use a secrets manager like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. Your deployment script fetches secrets and sets them as environment variables. Pydantic Settings reads them the same way it reads .env files — the code doesn’t change.
Why this matters for agent applications:
Agent applications have a lot of knobs to tune: model selection, token limits, rate limiting, database connection pools, API keys for multiple providers. Centralizing config in a type-safe class means:
- Easier debugging: When something behaves differently in staging, check `settings.env` and `settings.openai_model` instead of grepping for environment variable accesses
- Safer deploys: If you forget to set `OPENAI_API_KEY` in production, the app crashes on startup (before serving any traffic) instead of failing the first time a user tries to chat
- Better testability: In tests, you can override settings easily, as in the sketch below
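For instance (a sketch; constructing `Settings` directly bypasses `.env`):

```python
# tests/conftest.py (sketch)
import pytest

from app.config import Settings


@pytest.fixture
def test_settings() -> Settings:
    return Settings(
        env="dev",
        database_url="postgresql+asyncpg://app:app@localhost:5432/app_test",
        openai_api_key="test-key",
    )
```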
Using asyncpg for database connections:
We specified postgresql+asyncpg:// in the database URL. Why asyncpg specifically?
- Fastest Postgres driver for Python: Benchmarks show it’s 3-5x faster than psycopg2
- Native async support: Built for asyncio from the ground up (unlike psycopg2 which added async later)
- Type-safe: Uses Python’s type system for query parameters
- Connection pooling: Built-in connection pool management
When you’re streaming agent responses and handling multiple concurrent sessions, database performance matters. asyncpg ensures database queries don’t become the bottleneck.
Additional Resources
- Pydantic Settings Documentation – Complete guide to all features
- Twelve-Factor App: Config – Why config should live in environment variables
- asyncpg Documentation – High-performance async Postgres driver
- Environment Variables Best Practices – Security and management tips
Developer Experience: The Makefile
Here’s a problem I’ve seen on every project: each developer has their own set of commands they memorized. One person runs tests with pytest, another uses uv run pytest, a third uses python -m pytest. Someone remembers that you need to be in the backend/ directory, someone else doesn’t. Six months later, nobody remembers the exact incantation to run database migrations.
The solution: standardize everything in a Makefile. Make is old (1976!), ubiquitous (comes with every Unix system), and perfect for this job. It’s not just a build tool—it’s a command runner and documentation system.
Why Make over npm scripts or custom shell scripts?
- npm scripts: Great if your whole project is Node, awkward when you have Python backend + React frontend + Docker + infrastructure
- Shell scripts: Work but require careful path handling and error checking, no built-in dependency between tasks
- Task runners like Task or Just: Modern and nice, but not installed by default. Make is already there.
Make gives you:
- Self-documenting commands: Run `make` or `make help` to see all available commands
- Task dependencies: “Run tests only after linting passes”
- Consistent working directory: No more “which folder am I in?” confusion
- Cross-platform (mostly): Works on Linux, macOS, and WSL2
Here’s our Makefile for Part 1:
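A sketch covering the five commands described below (recipes must be indented with tabs):

```makefile
.PHONY: help dev up down logs clean

help:  ## Show available commands
	@grep -E '^[a-z-]+:.*##' $(MAKEFILE_LIST) | awk -F':.*## ' '{printf "%-8s %s\n", $$1, $$2}'

dev:  ## Start all services in the background
	docker compose up -d
	@echo "Backend:  http://localhost:8000/docs"
	@echo "Frontend: http://localhost:5173"

up:  ## Start all services in the foreground
	docker compose up

down:  ## Stop and remove containers (data volume persists)
	docker compose down

logs:  ## Follow logs from all services
	docker compose logs -f

clean:  ## Remove Python cache files
	find . -type d -name __pycache__ -exec rm -rf {} +
```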
Let me break down what each command does:
make dev - Your daily driver:
Starts all three services in detached mode (background) and shows you the URLs. This is what you’ll run every morning. The -d flag means services run in the background, so you get your terminal back.
make up - When you want to see logs:
Starts services in the foreground, showing logs from all three services. Useful when you’re debugging and want to see what’s happening. Press Ctrl+C to stop.
make down - Stop everything:
Stops all containers and removes them. The database volume persists, so you don’t lose data.
make logs - View live logs:
Attaches to logs from all running services. The -f flag means “follow” (like tail -f). Press Ctrl+C to exit.
make clean - Remove clutter:
Deletes Python cache files that accumulate during development. Run this occasionally to free up space.
Your typical workflow:
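Something like:

```bash
make dev     # morning: start everything in the background
make logs    # tail the logs when something looks off
# ...edit code; hot-reload handles the rest...
make down    # end of day: stop the stack
```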
Why this matters:
- One source of truth: Instead of remembering “was it `docker compose` or `docker-compose`?”, you just run `make dev`
- New developer friendly: Someone clones the repo, runs `make`, sees all commands. No hunting through README files
- Easy to extend: As we add features in later parts (tests, migrations, deployments), we’ll add more make targets
Tip
For Windows developers: If you don’t have Make installed, you have a few options:
- WSL2 (recommended): Full Linux environment, Make works perfectly, this is how I use it on my Desktop
- Chocolatey: `choco install make` installs GNU Make on Windows
- Git Bash: Recent versions include Make
- Just run the commands: Look inside the Makefile and run the `docker compose` commands directly
But seriously, just go with WSL.
What we’ll add in later parts:
This is a minimal Makefile for Part 1. As we progress through the series, we’ll add:
- Part 2: `make migrate`, `make migrate-create` (database migrations)
- Part 3: `make test`, `make lint`, `make format` (testing and code quality)
- Part 6: `make deploy-staging`, `make deploy-prod` (deployments)
For now, these five commands are all we need to work with our foundation.
Additional Resources
- GNU Make Documentation – Complete reference
- Makefile Tutorial – Beginner-friendly guide with examples
What We’ve Built
Take a moment to appreciate what we’ve accomplished. This isn’t just “hello world” — this is a production-grade foundation that most teams spend weeks refining. Let’s inventory what we have:
Infrastructure & DevOps:
- Three-service architecture running with one command (`make dev`)
- Docker Compose with health checks, volume mounts, and service dependencies
- Hot-reload everywhere: Python with uvicorn watch, React with Vite HMR
- Named volumes for persistent database storage
- Dev/prod parity: same containers locally and in production
Backend (Python + FastAPI):
- Async-first FastAPI application with automatic OpenAPI docs
- Type-safe configuration using Pydantic Settings with validation
- uv for dependency management (10-100x faster than pip)
- Ruff for linting and formatting (replaces black, flake8, isort)
- mypy in strict mode catching type errors before runtime
- asyncpg for Postgres (fastest async driver available)
Frontend (React + TypeScript):
- Vite for blazing-fast dev server (sub-2-second startup)
- TypeScript in strict mode with path aliases configured
- Clean folder structure anticipating SSE, auth, and agent UI
- Type-safe API client ready to match backend Pydantic models
Developer Experience:
- Makefile with standard commands for all common tasks
- Self-documenting (run `make` to see all commands)
- Consistent workflow across all team members
- CI-ready (same commands work in GitHub Actions, GitLab CI, etc.)
Testing & Quality:
- Test structure ready for unit, integration, and e2e tests
- Coverage reporting configured with pytest
- Lint and typecheck commands for pre-commit hooks
- Quality gates that fail fast with clear error messages
Try it now:
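One command, then two URLs:

```bash
make dev
# Backend docs: http://localhost:8000/docs
# Frontend:     http://localhost:5173
```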
The frontend is still the default Vite landing page, and the backend has one endpoint. But you have:
- Type safety enforced from database to browser
- Configuration validated on startup
- Services orchestrated with proper dependencies
- Developer workflow standardized
- Hot-reload for instant feedback
Tip
Checkpoint: Before moving on, make sure everything works:
- Run `make dev` and wait for all services to start
- Visit http://localhost:8000/docs and see the interactive API docs
- Visit http://localhost:5173 and see the React app
- Run `make logs` in another terminal and see live logs
- Change a file in `backend/app/main.py`, save, and watch uvicorn reload
If any of these fail, check the troubleshooting sections in each setup step above. The foundation must be solid before we build on it.
What’s Next?
Part 2: Backend Core & Database - FastAPI routing, async SQLAlchemy, Alembic migrations, Repository pattern
Part 3: Authentication & Security - Auth0 integration, JWT validation, session cookies for SSE, CORS
Part 4: Agent Integration & Streaming - OpenAI Agents SDK, SSE streaming, tool calling, session memory
Part 5: Frontend & User Interface - React SSE client, chat UI, session management, markdown rendering
Part 6: Credits, Limits & Usage Tracking - Token-based credits, rate limiting, usage analytics
Part 7: Observability & Tracing - Structured logging, OpenAI Traces, Arize Phoenix integration
Part 8: Production Deployment - Terraform, GitHub Actions CI/CD, zero-downtime deployments
Additional Resources
Further Reading on Topics Covered Today:
- FastAPI Documentation – Official docs with excellent examples
- uv Documentation – Modern Python packaging and dependency management
- Vite Guide – Fast frontend tooling and build configuration
- Docker Compose Docs – Multi-container orchestration
- Pydantic Settings – Type-safe configuration management
- Twelve-Factor App – Methodology for building production apps
Resources and Community
Repository: github.com/bedirt/agents-sdk-prod-ready-template
Issues and questions: Open a GitHub issue or discussion. I try to respond within a day or two. Common issues usually have solutions in existing threads.
Comments and feedback: Please leave a comment below if you found this helpful, have suggestions, or want to share how you used the template. You can also send a suggestion using the “Suggest an Edit” link at the bottom of the page - which takes you to the GitHub repo issues.
A Final Reiteration
I built this template because I was tired of reinventing the wheel every time I started a new agent project. The first few times, I’d spend a week setting up Docker, configuring type checking, wiring authentication, and building deployment pipelines before writing a single line of agent logic.
This template encapsulates those weeks of setup. It’s the project structure I wish I had when I started building production agent applications.
My hope is that it saves you time and helps you focus on what matters: building great agent experiences for your users.
See you in Part 2, where we’ll add the database layer and start persisting chat sessions.
Next: Part 2 - Backend Core & Database (Coming Soon)
This is part of a series on building production-ready AI agent applications. All code is open source on GitHub.
Info
Enjoying this series? Star the GitHub repo, share it with your team, or send feedback. This template is a living project—contributions, suggestions, and questions are welcome.