Building a Production-Ready Agent Stack: Part 1 - The Foundation
Welcome to the first post in a series where we build a real, production-ready AI agent application from scratch. No shortcuts, no toy examples - just the patterns you’d actually use in production.
Why Another AI App Tutorial?
Well, I know you’ve probably seen a dozen “build an AI chatbot” tutorials by now. Most of them show you how to slap together a quick demo in an afternoon, and they’re great for that. But when you try to take that demo to production, things get… complicated. At least that’s how it went for me.
Where do you put authentication? How do you stream responses so users don’t stare at a loading spinner for 30 seconds? What about session memory? Rate limiting? Credits? Deployment?
This series tackles all of that, but with a specific goal: we’re building a template you can actually use. I mainly built this codebase for myself, to avoid reinventing the wheel every time I start a new agent project, then decided to share it with the community. I like to think of it as a “production-ready starter kit” for AI agent applications.
This isn’t just a tutorial - it’s an opinionated, minimal-yet-complete starting point for production agent applications. Think of it as a scaffold that:
- Has all the production pieces in place (auth, streaming, persistence, deployment)
- Remains small enough to understand fully
- Takes a stance on how agentic applications should be structured
- Can be cloned and customized for your specific use case
We’re building an agent stack where:
- Users log in (securely, with Auth0)
- They chat with AI agents that retain context
- Responses stream in real time, token by token, and tool calls are rendered nicely as they happen, just like in ChatGPT and similar research-style chatbots
- Usage gets tracked and metered
- Everything runs in containers and deploys with one command
- You can debug what’s happening in production
Sound ambitious? It is. But we’ll build it piece by piece, and by the end you’ll understand not just how to build it, but why each piece exists and how it fits together.
Info
This template is opinionated by design. We’re not trying to support every possible architecture — we’re showing you one that works well for production agent applications. Once you understand the patterns, you can adapt them to your needs.
Today, we’re starting with the foundation: the project structure, development environment, and tooling that makes everything else possible. Most importantly, we’ll discuss the decisions behind the structure and architecture.
The Big Picture: What Are We Building?
Before we dive into code, let’s talk about what this system looks like when it’s done. I want you to see the full picture first—not just the pieces, but how they fit together and why each one matters.
Here’s the stack we’re building:
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'16px'}}}%%
graph TB
    User[User Browser]
    Frontend["Frontend<br/><small>React + TypeScript + Vite</small>"]
    Auth0["Auth0<br/><small>Authentication</small>"]
    Backend["Backend API<br/><small>FastAPI + Python</small>"]
    AgentSDK["Agents SDK<br/><small>Agent Orchestration</small>"]
    DB[("Postgres<br/><small>Sessions & Messages</small>")]
    OpenAI["LLM API<br/><small>OpenAI/LiteLLM</small>"]
    User --> Frontend
    Frontend --> Auth0
    Frontend --> Backend
    Backend --> AgentSDK
    Backend --> DB
    AgentSDK --> OpenAI
    style User fill:#f0f4ff,stroke:#a5b4fc,stroke-width:2.5px,rx:15,ry:15,color:#1e293b
    style Frontend fill:#dbeafe,stroke:#93c5fd,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style Auth0 fill:#fed7d7,stroke:#fca5a5,stroke-width:2.5px,rx:15,ry:15,color:#7f1d1d
    style Backend fill:#d1fae5,stroke:#6ee7b7,stroke-width:2.5px,rx:15,ry:15,color:#065f46
    style AgentSDK fill:#e9d5ff,stroke:#c084fc,stroke-width:2.5px,rx:15,ry:15,color:#581c87
    style DB fill:#bfdbfe,stroke:#60a5fa,stroke-width:2.5px,rx:15,ry:15,color:#1e3a8a
    style OpenAI fill:#ccfbf1,stroke:#5eead4,stroke-width:2.5px,rx:15,ry:15,color:#134e4a
```
Looks straightforward, right? Just a handful of components talking to each other. What matters isn’t just the components — it’s how they’re connected and what’s built around them.
Here’s what we’re building from three different perspectives:
For Your Users (The Experience Layer)
This is what people actually interact with:
- Secure authentication via Auth0 - no passwords to manage, no security headaches for you. Auth0 is commonly used in production apps, so it’s a solid choice, and it has a generous free tier for small projects (up to 25,000 monthly active users). And if you want to swap it out later, the codebase is structured so you can replace Auth0 with another provider without massive rewrites or code tangles.
- Multiple chat sessions - users can organize conversations, switch between topics, keep context separate.
- Real-time streaming - responses appear token by token, just like ChatGPT. Tool calls show up as they happen and are rendered nicely.
- Credit-based usage - transparent costs, no surprise bills, users see their balance before they run out. We don’t have actual payment processing in this template, but the structure is there to add it.
- Works everywhere - responsive design that works on desktop, tablet, and mobile
For You (The Developer Experience)
This is what makes the codebase pleasant to work with:
- One command to start - `make up` spins up the entire stack (frontend, backend, database). No “install these 12 things first”
- Hot-reload everything - change Python code, see it instantly. Change React code, see it instantly. No build steps in dev
- Type-safe end-to-end - Python with mypy strict mode, TypeScript with strict mode. Catch bugs at compile time, not runtime
- Migrations in version control - database schema changes are tracked with Alembic, reviewable in pull requests
- Tests that actually pass - unit tests (fast, no I/O), integration tests (with real database), e2e tests (full stack)
- Deploy with confidence - CI/CD pipeline that runs tests, builds containers, and deploys to production
The goal here is zero friction. You should spend time thinking about your agents, not fighting your tools.
For Production (The Operational Reality)
This is what keeps the system running reliably at scale:
- Containers everywhere - same Docker images from dev to prod. No “works on my machine” surprises
- Built-in observability - traces show you what agents are doing, logs tell you what went wrong, metrics tell you when to scale
- Rate limiting - token-aware limits per user prevent abuse and runaway costs
- Secret management - API keys and credentials stored properly (AWS Secrets Manager or another secrets manager, not `.env` files in production)
- Zero-downtime deploys - rolling updates, health checks, automatic rollback if something breaks
- Cost tracking - every LLM call is metered, stored, and can be attributed to a user
This isn’t an afterthought. These pieces are wired in from the start, which is way easier than retrofitting them later.
The Decision: Why This Structure?
When I started building agent applications, I kept running into the same problems. It actually started all the way back at framework selection. I tried a bunch of them, including Google ADK, AutoGen, CrewAI, and even LangFlow, but none gave me the satisfaction of the OpenAI Agents SDK. I swear I have no ties to OpenAI xD I’ll explain why I’m so fond of it later. Let me walk you through the key decisions we made and why they matter.
Problem 1: The “Kitchen Sink” Approach
Frameworks like LangChain try to do everything: agent orchestration, vector stores, UI components, deployment. They’re fantastic for prototypes, but when you need to customize how agents hand off to each other, or change authentication providers, or swap databases, you’re fighting the framework’s opinions. That makes it hard to adapt to real-world production needs. Most projects I’ve worked on were already built on a tech stack that needed customization from the get-go. It’s better to keep things modular so you can pick the best tool for each job and swap components later.
Our approach: Use best-in-class tools for each layer. FastAPI for the API (it’s async, typed, and has great docs). React for the frontend (huge ecosystem, mature patterns) - TBH I don’t like React, nor am I good at it :) I’d rather use Svelte myself, but given React’s popularity, it is what it is. OpenAI Agents SDK for agent orchestration (built by the people who make the models) - this makes the most sense if you’re using OpenAI models, but even if you’re not, the framework is just better overall (more on this later). Docker for containers (industry standard). This means a bit more wiring, but you control each piece and can swap components when needed.
Problem 2: The “Config Hell” Approach
Some frameworks — I’m looking at you, CrewAI… — lean heavily on YAML or JSON configurations. Want to change how an agent behaves? Edit three config files, restart the system, and hope you got the indentation right. Debugging means reading stack traces that point to generated code, not your configuration. This is a nightmare for complex logic.
Our approach: Code over config. Agents are Python files you can read, edit, and debug. Workflows are Python files that import agents directly. You get IDE autocomplete, type checking, breakpoints, and version control that actually shows meaningful diffs. Configuration is for environment-specific stuff (like API keys and database URLs), not behavior.
Tip
The “code over config” philosophy doesn’t mean zero configuration. It means using code for logic and configuration for environment. Your agent’s behavior should be in a Python file you can test. Your database connection string should be in an environment variable. If this bugs you, remember that I warned you this template is opinionated! :)
Bottom line: Code > Config…
Problem 3: The “Works on My Machine” Problem
I can’t count how many times I’ve seen repos that say “just install X, Y, Z and it should work.” But X needs Python 3.9 (you have 3.11), Y needs an older version of numpy, and Z… well, nobody’s sure why Z is even there. By the time you’ve wrangled the environment, you’ve lost an afternoon.
Our approach: Docker from day one. The same containers you run locally are what you deploy to production. No “works on my machine” surprises. No conda environments, no global npm installs. One command (make up) and you have a working system. I got deep into the Docker habit last year while learning more about it for my home server projects. It now feels insane to me not to use it for any project.
Problem 4: The “Streaming Is Hard” Problem
Most LLM demos use simple request/response: send a message, wait, get the full answer. But in production, users don’t want to wait 30 seconds staring at nothing. They want to see the response being generated, like they do in ChatGPT.
Our approach: Server-Sent Events (SSE) for streaming. It’s simpler than WebSockets (which I hate) for one-way communication (server to client), works everywhere, and reconnects automatically. The OpenAI Agents SDK handles the complex part (streaming from the LLM), and we map those events to what the frontend needs (tokens, tool calls, completion).
Problem 5: The “Security Afterthought” Problem
So many tutorials add auth as a last step, if at all. But retrofitting security is painful — you end up changing every endpoint, every database query, every test. And you inevitably miss something (like forgetting to filter messages by user_id, leaking conversations between users).
Our approach: Authentication and authorization come early, right after we have a working API. It’s early enough that it’s not a massive refactor, but late enough that we understand what we’re protecting. Every database model has a user_id from the start. Every endpoint checks authentication. No retrofitting.
The Structure: How Agentic Applications Should Be Organized
This is where the template really matters. We’re not just building an app — we’re defining a structure that makes sense for production agent systems. Let me walk you through the directory layout and explain why each piece exists and how they work together.
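Here’s a sketch of the layout, assembled from the pieces we discuss below (the repo may differ in small details):

```
.
├── backend/
│   ├── app/
│   │   ├── api/            # HTTP: routing, validation, serialization
│   │   ├── domain/         # Business logic: sessions, messages, credits
│   │   ├── persistence/    # Data access: ORM models, queries, migrations
│   │   ├── agents/         # One folder per agent
│   │   ├── workflows/      # Orchestration between agents
│   │   └── main.py
│   └── pyproject.toml
├── frontend/
│   └── src/
│       ├── api/            # REST + SSE client
│       ├── auth/           # Auth0 integration
│       ├── components/
│       ├── pages/
│       └── store/
├── tests/                  # Centralized: unit, integration, e2e
├── infra/                  # docker-compose.yml, terraform/
├── .github/workflows/      # CI/CD
└── Makefile
```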
This structure embodies a specific opinion about how agent applications should be built. Let me explain the critical decisions:
The Backend: Clean Architecture for Agents
Separation of Concerns (The Foundation):
The backend is split into clear layers, each with a single responsibility:
- `api/` handles HTTP concerns: routing, request validation, response serialization, status codes. This layer knows about FastAPI but doesn’t know about Postgres or agent logic.
- `domain/` contains business logic: session lifecycle, message handling, credit calculations. This is pure Python — no FastAPI imports, no SQLAlchemy imports. You can test it without starting a server or database.
- `persistence/` manages data access: ORM models, database queries, migrations. This layer knows about Postgres but doesn’t know about HTTP or business rules.
Why does this matter? Because when you need to change databases (e.g., Postgres to MongoDB), you only touch persistence/. When you need to change from FastAPI to Flask, you only touch api/. When you need to change business rules (like credit calculations), you only touch domain/. Changes don’t cascade.
Note
This is an implementation of “Hexagonal Architecture” (also called “Ports and Adapters”). The core domain logic is at the center, and infrastructure concerns (HTTP, database, external APIs) are at the edges. It’s a little more setup than throwing everything in one file, but it scales beautifully.
Agents and Workflows as First-Class Citizens:
Here’s where our structure gets opinionated about agent applications specifically:
Each agent lives in its own folder: agents/agent_<name>/
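A sketch, using a hypothetical support agent:

```
agents/
├── agent_support/
│   ├── agent.py                # The agent definition
│   ├── tools.py                # Tools this agent can call
│   ├── schemas.py              # Structured-output models (if used)
│   ├── prompts/
│   │   └── system.md           # Prompts as markdown
│   └── agent_order_lookup/     # A subagent, used only by the parent
│       └── agent.py
└── shared/                     # Tools/schemas shared across agents
```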
Why one folder per agent? Because agents are complex entities with prompts, tools, and configuration. Keeping them together makes it easy to understand what an agent does and to test it in isolation. The shared/ folder prevents duplication when multiple agents need the same tools or data structures.
A typical agent has an agent.py, a tools.py, and, if we’re using structured outputs, a schemas.py file. Beyond these, agents can also have subagents, depending on their use case. A subagent is simply an agent that is only used by a parent agent. For example, if we have a “Support Agent” that handles customer support queries, it might have subagents like “Order Lookup Agent” and “Refund Processing Agent” to handle specific tasks. These subagents live in their own folders within the parent agent’s folder. This keeps the subagent logic encapsulated and makes it clear they are not meant to be used standalone. If at any point we need to promote a subagent to a full agent, we can easily move it out.
If we are housing the prompts in the repo as well, there will be a prompts/ folder too for that agent. And prompts are stored as markdown files for better readability, separation of concerns, and easier versioning.
Tip
Another way of handling prompts is to use a prompt management system. For this, we can stay in the OpenAI ecosystem and use OpenAI’s prompts feature, or use a third-party system like PromptLayer or Arize Phoenix.
In this template we store prompts as markdown files in the repo, but there is virtually no limitation to using a prompt management system instead; the repo shows an implementation of that approach too.
I must admit this structure actually comes from Google ADK’s recommended layout. In my view, they nailed it! But their structure still misses one thing, which is what we fix and discuss next: workflows.
Workflows live in workflows/<name>/workflow.py and import agents directly:
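A minimal sketch of the idea (agent names and import paths are illustrative, not the repo’s exact contents):

```python
# workflows/support/workflow.py
from agents import Runner  # OpenAI Agents SDK

from app.agents.agent_support.agent import support_agent
from app.agents.agent_triage.agent import triage_agent


async def run_support_workflow(user_message: str) -> str:
    """Coordinate agents; the agents themselves never import each other."""
    # The workflow wires the handoff, keeping both agents decoupled
    triage_agent.handoffs = [support_agent]
    result = await Runner.run(triage_agent, input=user_message)
    return str(result.final_output)
```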
A workflow is any sort of orchestration between multiple agents. This could be as simple as routing user messages to different agents based on intent, or as complex as multi-step processes where one agent’s output feeds into another’s input. The OpenAI Agents SDK has no formal “workflow” construct like ADK has, meaning it doesn’t give you blocks for “run this agent after that one” or “run these agents in parallel.” But technically there’s no need for them, because everything is just Python code. You can implement any workflow logic you want using normal Python functions and classes. This gives you ultimate flexibility.
ADK treats workflow logic as just another kind of agent, so orchestration ends up coupled with agentic logic. I found this to be a bad idea, as it mixes two different concerns. In our structure, workflows get their own folder. This way, agents focus on “what to do” and workflows focus only on “how to coordinate.”
A workflow also handles the “handoff” logic for the agents. We never import one agent into another. Instead, the workflow imports both agents and wires them together. This keeps agents decoupled and reusable.
The Frontend: Simple and Focused
The frontend structure is intentionally minimal:
- `api/` - Client for talking to the backend (REST + SSE wrapper)
- `auth/` - Auth0 integration (login, logout, token management)
- `components/` - Reusable UI components (ChatWindow, SessionList, MessageBubble)
- `pages/` - Top-level page components (Login, Dashboard, Chat)
- `store/` - State management (sessions, messages, user)
We’re not using a complex state management library (like Redux) because we don’t need it. The state is simple: current user, list of sessions, list of messages in current session. React’s built-in state and context are enough.
The critical piece is the SSE client in api/. This is where we consume the streaming events from the backend and turn them into UI updates. It’s the most “agent-specific” part of the frontend.
Infrastructure and Testing
Centralized Testing:
All tests live in one /tests directory that mirrors the source structure:
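A sketch of the layout:

```
tests/
├── unit/           # Fast, pure-Python tests (no I/O)
├── integration/    # Hit a real Postgres via Docker
└── e2e/            # Exercise the full stack
```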
Why centralized? Because it makes CI simpler (one command runs all tests), makes coverage reports meaningful, and makes it obvious where tests live. Some projects scatter tests next to source files (agent.py and agent_test.py in the same folder). I always thought that centralizing them reduces confusion and makes it easier to run subsets (“just run unit tests” vs “run everything”).
Infrastructure as Code:
The infra/ folder contains everything needed to run the system:
- `docker-compose.yml` for local development (3 services: db, backend, frontend)
- `terraform/` for cloud resources (compute, database, secrets, DNS)
- `.github/workflows/` for CI/CD (lint, test, build, deploy)
Everything is versioned. Everything is reviewable. You can see the history of infrastructure changes just like code changes.
This is the final piece of our “minimal but complete” philosophy: we give you the deployment story, not just the app code.
Building the Foundation: The Setup
Alright, enough philosophy. Let’s build something!
We’re starting with the foundation — the pieces that make everything else possible:
- The project skeleton (directories, files)
- Python tooling (uv, ruff, mypy)
- Node/React with Vite - the initial setup
- Docker Compose for local development
- Environment configuration
- Development scripts (Makefile)
This might seem like a lot of setup before writing “real” code, but trust me—investing time here will save you a lot of frustration later.
Info
All the code we’re building today is available in the repository. You can follow along by cloning it, or use this as a reference while building your own version. Or just skip it altogether if you trust me that it works :)
I used different branches for each post in the series so you can see the incremental changes. Today’s code is in the part-1-foundation branch.
Python Setup with uv
Info
Skip this section if: You’re already familiar with Python dependency management tools.
We’re using uv for Python dependency management. Why not Poetry or pip?
- uv is 10-100x faster. Seriously. Dependencies that take a minute to install with pip take seconds with uv. It uses a Rust-based resolver and caches aggressively.
- uv uses standard `pyproject.toml`. If you decide to switch to Poetry later, it’s easy; the file format is the same.
- uv handles Python versions. Need Python 3.11? `uv python install 3.11`. Done.
I was an avid pip user before I moved to Poetry 5-6 years back. And last year I discovered uv and switched to it immediately. It is just so much faster and the switch is very painless, highly recommended! But if for some reason you don’t want to use uv, you can easily adapt the instructions to Poetry or pip.
Let’s start by creating the backend folder and initializing uv:
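Something like this (a sketch; adjust names to taste):

```bash
mkdir backend && cd backend
uv init
```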
Now, let’s define our dependencies in pyproject.toml.
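A sketch of the dependency section (the package set matches what this series uses; version pins are illustrative):

```toml
[project]
name = "app"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.115",
    "uvicorn[standard]>=0.30",
    "pydantic-settings>=2.5",
    "sqlalchemy>=2.0",
    "asyncpg>=0.29",
    "alembic>=1.13",
    "openai-agents>=0.1",
]

[dependency-groups]
dev = ["pytest>=8.0", "ruff>=0.6", "mypy>=1.11"]
```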
These are sensible version defaults as of this writing, but feel free to adjust as needed. As long as you use uv, it will resolve dependencies quickly, so no worries about conflicts.
Code Quality: Linting and Type Checking
Info
Skip this section if: You’re familiar with Python linting and type checking tools.
We’re setting up ruff for linting/formatting and mypy for type checking.
Why ruff? Python has a fragmented ecosystem for code quality. You’ve probably seen projects with black (formatting), flake8 (linting), isort (import sorting), and maybe pylint thrown in. That’s four tools, four configs, and four places where your CI can fail. Ruff combines all of this into one blazingly fast Rust-based tool. It runs 10-100x faster than the competition and gives you one config file instead of four. I used to use flake8 + black + isort combo for years, but once I switched to ruff, I never looked back (I sense a pattern here :)).
Why mypy? Python’s dynamic typing is great for prototyping but dangerous in production. When you’re handling user credits, streaming agent responses, and managing database transactions, you want the compiler to tell you “this function expects a SessionID but you’re passing a str” before your users find out. Mypy with strict mode is how you get that safety.
Note
Type checking isn’t just for catching bugs — it’s documentation that stays up to date. When a new developer looks at def process_run(session: SessionID, user: User) -> RunResult:, they know exactly what the function expects and returns. No guessing, no digging through implementation details.
Let’s configure both tools in pyproject.toml:
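A sketch reflecting the choices explained below:

```toml
[tool.ruff]
line-length = 100
fix = true

[tool.ruff.lint]
select = ["E", "F", "W", "C", "N", "B", "I"]

[tool.mypy]
strict = true
disallow_untyped_defs = true
no_implicit_optional = true
warn_return_any = true

[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false
```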
Let me break down the key choices:
Ruff configuration:
- `line-length = 100`: This is opinionated. Black uses 88, but I find 100 strikes a better balance between readability and fitting complex FastAPI endpoint signatures on one line.
- `select = ["E", "F", "W", "C", "N", "B", "I"]`: These are rule groups for pycodestyle errors (E), pyflakes (F), warnings (W), complexity (C), naming conventions (N), bugbear (B), and import sorting (I). You’re getting the equivalent of flake8 + isort in one tool.
- `fix = true`: Ruff will automatically fix issues like import sorting and trailing whitespace on save. This eliminates bikeshedding in code reviews.
Mypy configuration:
- `strict = true`: This is the nuclear option. It enables every type checking rule mypy has. You’ll get errors for missing type hints, returning `Any`, or unsafe casts. This feels painful at first but pays off when you’re refactoring agent logic at scale.
- `disallow_untyped_defs = true`: Every function needs type hints. Period. When you’re streaming tokens, managing sessions, and tracking credits, you don’t want ambiguity about what types flow through your system.
- `no_implicit_optional = true`: If a parameter can be `None`, you must write `Optional[T]`. This catches bugs where you assume a value exists but it’s actually `None` at runtime (the classic NoneType error in production).
- `warn_return_any = true`: Returning `Any` defeats the purpose of type checking. This warns you when a function’s return type is too loose, which often happens when integrating with third-party libraries.
Tip
If you’re adding type hints to an existing codebase, start with strict = false and enable rules incrementally. For a new project like this template, going strict from day one is the right move — you’ll never have to retrofit types later.
The [[tool.mypy.overrides]] section at the end relaxes rules for tests. In test files, we care more about coverage and readability than perfect type safety. It’s fine if a test helper function doesn’t have complete type hints—the production code is what matters.
When building agent systems with the OpenAI Agents SDK, you’re juggling complex types: StreamedEvent, RunResult, SessionID, custom tool schemas, and Pydantic models for your database. Mypy catches mismatches before they become production incidents. Ruff ensures your code is consistent and readable when onboarding new team members or revisiting agent logic six months later.
These tools run in CI (we’ll set that up shortly), so every pull request gets checked automatically. No “it worked on my machine” surprises.
Additional Resources
- Ruff Documentation – Full list of rules and configuration options
- Mypy Documentation – Type checking deep dive and best practices
- Python Type Hints Guide – Official Python typing module reference
FastAPI Boilerplate: Your First Endpoint
Now for the fun part — let’s write some actual code. We’re starting with FastAPI as our backend framework. If you’ve used Flask before, FastAPI will feel familiar but with superpowers: automatic validation, async support out of the box, and OpenAPI docs that generate themselves.
Why FastAPI over Flask or Django? Four reasons:
- Native async support: When you’re streaming agent responses or making multiple LLM calls in parallel, you need async. Flask bolted on async support in 2.0, but FastAPI was built for it from day one.
- Pydantic integration: FastAPI uses Pydantic for request/response validation. This means your API contracts are enforced automatically — send malformed JSON and you get a clear error before your handler runs.
- Auto-generated docs: Every endpoint you write shows up in interactive Swagger UI at `/docs`. No manual API documentation needed. This is a game-changer when working with frontend developers or building integrations.
- Simplicity and performance: FastAPI is lightweight and fast, making it ideal for high-throughput applications like agent systems.
Let’s write our first endpoint:
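A sketch (the file path `app/main.py` is my assumption):

```python
# app/main.py
from fastapi import FastAPI

app = FastAPI(title="Agent Stack API")


@app.get("/health")
async def health() -> dict[str, str]:
    """Liveness probe for load balancers and orchestrators."""
    return {"status": "ok"}
```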
This looks simple, but there’s a lot happening here:
- `async def health()`: This is an async endpoint. FastAPI runs it on the event loop, which means it won’t block other requests. When you’re handling 100+ concurrent agent sessions, this matters.
- `-> dict[str, str]`: Type hint for the response. FastAPI uses this to generate the OpenAPI schema and validate your response at runtime (if you enable response validation).
- Docstring: Shows up in the auto-generated docs. Write these for every endpoint — your future self will thank you.
Note
The async def keyword is important even for simple endpoints. FastAPI can handle both sync and async functions, but if you define a sync function, it runs in a thread pool which has overhead. For database queries, LLM calls, or any I/O, always use async def.
Now let’s start the server:
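Assuming the `app/main.py` layout from the sketch above:

```bash
uv run uvicorn app.main:app --reload
```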
Uvicorn will log that it’s serving on http://127.0.0.1:8000 and watching for file changes.
The --reload flag is critical during development — it auto-restarts the server when you change code. Uvicorn uses watchfiles under the hood (another Rust-based tool) for blazing fast reloads.
Open your browser and hit these URLs:
- http://localhost:8000/health - You’ll see `{"status":"ok"}`
- http://localhost:8000/docs - Interactive API documentation (Swagger UI)
- http://localhost:8000/redoc - Alternative API docs (ReDoc, which I find prettier)
Tip
The auto-generated docs at /docs aren’t just for show. You can test endpoints directly from the browser, inspect request/response schemas, and even download the OpenAPI spec. When you’re debugging agent runs or testing credit deduction logic, this beats using curl or Postman. I keep this tab open constantly during development.
Key benefits we get from this setup:
- Health checks: The `/health` endpoint is what load balancers and orchestrators (Kubernetes, ECS) use to determine if an instance is ready to serve traffic
- Type safety: FastAPI validates return types at runtime - if you return the wrong type, you’ll catch it immediately
- Async from the start: No refactoring needed when we add streaming endpoints later
- OpenAPI schema: Auto-generated at `/openapi.json` for type-safe frontend clients
Additional Resources
- FastAPI Documentation – Comprehensive guide with excellent examples
- Uvicorn Settings – Deployment and performance tuning options
- Pydantic Models – Deep dive into request/response validation
Next steps:
This health check endpoint is just the skeleton. In the next sections, we’ll add:
- Database integration (PostgreSQL + SQLAlchemy)
- Authentication and user management
- Agent streaming endpoints (the real meat of the application)
- Credit tracking and rate limiting
- Proper error handling and logging
But for now, you have a working FastAPI server with auto-generated docs, type safety, and async support. That’s a rock-solid foundation to build on.
Frontend with Vite: Modern React Development
Info
Skip this section if: You’re familiar with Vite and modern React tooling.
Time to set up the frontend. We’re using Vite as our build tool and development server.
Why Vite over Create React App?
Create React App was the standard for years, but it’s showing its age. The development server takes forever to start, hot module replacement is slow, and the build process uses webpack under the hood (which is powerful but complex). Vite takes a different approach:
Native ESM in development: Vite serves your code as native ES modules. No bundling during development means the dev server starts instantly — even on large projects. CRA bundles everything upfront, which means 30-60 second startup times on big codebases. Vite? Under 2 seconds, always.
Lightning-fast HMR: Change a React component and see it update in the browser in milliseconds. Vite’s HMR is so fast it feels like you’re editing the page directly. This matters when you’re iterating on UI and want tight feedback loops. Of course it doesn’t matter much for our simple template project, but we’re thinking big here.
Optimized production builds: Vite uses Rollup under the hood for production builds, which generates smaller, more efficient bundles than webpack. Smaller bundles = faster page loads for your users.
No ejecting required: With CRA, if you need custom configuration, you either eject (and maintain all the build tooling yourself) or use workarounds like CRACO. Vite’s config is simple and transparent from day one — it’s just a JavaScript file. TBH this is the selling point for me :) I hate React’s complex build tooling.
TypeScript strict mode from the start:
We’re using TypeScript with strict mode enabled. I know, I know — TypeScript can feel like overkill for simple UIs. But when you’re building agent applications, your frontend is managing complex state:
- Streaming events from Server-Sent Events
- Message history with nested objects (text, tool calls, errors)
- Session metadata (created_at, updated_at, message count)
- User credits and rate limiting
Without types, you’ll spend hours debugging “Cannot read property ‘X’ of undefined” errors. With types, your IDE tells you exactly what’s available and catches errors as you type. And as I mentioned before, we’re thinking big, even if this template is simple.
Let’s create the frontend:
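One way to scaffold it (omit the template flag to get the interactive prompts, where you can pick the rolldown-vite variant mentioned below):

```bash
npm create vite@latest frontend -- --template react-ts
cd frontend
npm install
```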
Note
I’m skipping the folder-creation steps in the commands because it’s assumed you can create folders as needed. Just focus on the commands relevant to each section. (Or create the whole structure up front, based on the layout we discussed.)
Here I’m selecting rolldown-vite for the build; you don’t have to choose it, of course, but why wouldn’t you? :)
This scaffolds a React + TypeScript project with Vite. You’ll get a basic structure:
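Roughly this (exact files vary a bit between Vite versions):

```
frontend/
├── index.html
├── package.json
├── tsconfig.json
├── vite.config.ts
└── src/
    ├── main.tsx
    ├── App.tsx
    └── assets/
```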
That’s all we need for Part 1! The default Vite structure is fine for now. We’ll build out the full frontend architecture (components, pages, API clients, state management) in Part 5 when we implement the agent UI. For now, we just need a working dev server that we can containerize.
Configuring path aliases (optional but recommended):
One quick improvement: set up path aliases so that later you can write @/components/Button instead of ../../../components/Button.
Update vite.config.ts:
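A sketch with the alias and the Docker-friendly host setting:

```typescript
import { fileURLToPath, URL } from "node:url";
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  resolve: {
    alias: {
      // "@/..." resolves to "src/..."
      "@": fileURLToPath(new URL("./src", import.meta.url)),
    },
  },
  server: {
    host: true, // listen on all interfaces so Docker can expose the dev server
  },
});
```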
And update tsconfig.json:
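The matching compiler options (in newer Vite templates these may belong in `tsconfig.app.json`):

```json
{
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "@/*": ["./src/*"]
    }
  }
}
```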
Note
The server.host configuration is important for Docker. It makes Vite accessible from outside the container. We’ll use this when we set up Docker Compose next.
Start the development server:
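From the `frontend/` folder:

```bash
npm run dev
```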
The dev server starts almost instantly and prints its local URL.
Open http://localhost:5173 and you’ll see the default Vite + React landing page with the spinning Vite logo. Not exciting yet, but notice how fast that startup was. On a comparable CRA project, you’d still be waiting for webpack to bundle.
That’s it for the frontend in Part 1! We have a working dev server with hot module replacement, TypeScript support, and path aliases configured. In Part 5, we’ll come back and build out the full agent UI with components, state management, SSE streaming, and all the bells and whistles.
Additional Resources
- Vite Documentation – Official guide and configuration options
- Vite + React Plugin – React-specific Vite features
- TypeScript Handbook – Learning TypeScript
Creating Dockerfiles
Info
Skip this section if: You’re comfortable writing Dockerfiles and understand layer caching.
Before we can use Docker Compose, we need Dockerfiles for our backend and frontend.
Backend Dockerfile
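A sketch of the whole file (the uv installation line follows uv’s documented Docker pattern; pin versions as you see fit):

```dockerfile
FROM python:3.11-slim

# System packages: gcc for C extensions, psql client for debugging
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Bring in uv from its official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# Dependency files first, so this expensive layer stays cached
COPY pyproject.toml uv.lock ./
RUN uv sync --no-dev

# Application code changes often, so it comes last
COPY app/ ./app/

CMD ["uv", "run", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```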
Let me break down what’s happening here:
Base image choice:
We use python:3.11-slim instead of the full python:3.11 image. The slim variant is much smaller (100MB vs 900MB) because it excludes unnecessary build tools and libraries. This means faster builds, faster deployments, and lower storage costs.
System dependencies:
- `gcc`: Required by some Python packages that compile C extensions (like asyncpg)
- `postgresql-client`: Useful for debugging (you can run `psql` inside the container)
- `rm -rf /var/lib/apt/lists/*`: Cleans up the apt cache to keep the image small
Dependency caching:
This order is critical for Docker layer caching. Docker caches each instruction as a layer. If nothing changes in a layer, Docker reuses the cached layer instead of rebuilding.
By copying pyproject.toml first and installing dependencies, we ensure that layer is cached. When you change application code (which happens constantly), Docker only rebuilds the COPY app/ layer and later layers—not the expensive dependency installation layer.
Production optimization:
The --no-dev flag skips development dependencies (pytest, ruff, mypy). In production, you don’t need testing or linting tools—only the code needed to run the app. This keeps the image smaller and more secure.
Note
In development, we override this CMD in docker-compose.yml to add the --reload flag. This way, the same Dockerfile works for both dev and prod—we just change the command at runtime.
Frontend Dockerfile
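A sketch (the final CMD is a placeholder; dev overrides it, and the real production story uses the multi-stage build shown later):

```dockerfile
FROM node:20-alpine

WORKDIR /app

# Dependency files first for layer caching
COPY package.json package-lock.json ./
RUN npm ci

# Then the source
COPY . .

# Compile TypeScript and bundle with Vite into dist/
RUN npm run build

# Placeholder; docker-compose.yml overrides this with `npm run dev` in development
CMD ["npm", "run", "preview"]
```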
Breaking this down:
Alpine base:
Alpine Linux is a minimal distribution designed for containers. node:20-alpine is ~120MB compared to ~1GB for the full node:20 image. Alpine uses musl libc instead of glibc, which is lighter weight.
npm ci vs npm install:
npm ci (clean install) is faster and more reliable than npm install in CI/CD and containers:
- Deletes `node_modules` before installing (ensures a clean state)
- Installs exact versions from `package-lock.json` (reproducible builds)
- Fails if `package.json` and `package-lock.json` are out of sync
- Runs 2-3x faster than `npm install`
Build step:
This compiles TypeScript, bundles with Vite, and optimizes assets. The result goes in dist/. In production, you’d serve this dist/ folder with nginx or a CDN. In development, we override the CMD to run npm run dev instead.
Development vs production:
The Dockerfile is written for production (build artifacts, optimized bundles). In docker-compose.yml, we override the command for development so the container runs the Vite dev server instead of serving the build output.
Tip
Multi-stage builds for production: In a real production setup, you’d use a multi-stage Dockerfile for the frontend:
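Roughly:

```dockerfile
# Stage 1: build the assets
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: serve only the built artifacts
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
```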
This creates a tiny final image (20MB) with just nginx and your built assets. The node installation and source code are discarded after the build. We’ll cover this pattern in Part 6 (Deployment).
Testing the Dockerfiles
Before using Docker Compose, verify the Dockerfiles work individually:
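For example (image names are just local tags):

```bash
docker build -t agent-backend ./backend
docker build -t agent-frontend ./frontend
docker images | grep agent-
```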
You should see both images listed with their sizes. If the build fails, check for:
- Typos in Dockerfile commands
- Missing files (make sure `pyproject.toml` and `package.json` exist)
- Network issues (Docker needs to download base images and dependencies)
Warning
Common Dockerfile mistakes to avoid:
Not using `.dockerignore`: Create a `.dockerignore` file to exclude unnecessary files from the build context. Without this, Docker copies everything into the build context, slowing builds and potentially including secrets.
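A typical starting point:

```
.git
.env
__pycache__/
.venv/
node_modules/
dist/
```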
Running as root: For production, you should create a non-root user in the Dockerfile. We’ll cover this in Part 6.
Installing dependencies every time: Always copy dependency files (`pyproject.toml`, `package.json`) before copying source code. This leverages Docker’s layer caching.
Now that we have Dockerfiles, we’re ready to orchestrate all three services with Docker Compose.
Additional Resources
- Dockerfile Best Practices – Official Docker guide
- Multi-stage Builds – Optimizing production images
- Docker Layer Caching – Understanding how caching works
Docker Compose: Orchestrating the Full Stack
Info
Skip this section if: You’re comfortable with Docker Compose service definitions, health checks, and volume mounts.
This is where everything comes together. We’ve set up the backend (FastAPI + Python), the frontend (Vite + React), and now we’re going to run them together with Docker Compose. This is the secret sauce that eliminates “works on my machine” problems and makes onboarding new developers trivial.
Why Docker Compose?
You could run each service manually: start Postgres in one terminal, start the backend in another, start the frontend in a third. But that’s annoying, error-prone, and hard to document. Docker Compose lets you define all services in one file and start them with a single command.
More importantly, it ensures consistency. The same Docker images you use locally are what you deploy to production (with different environment variables). No subtle differences between dev and prod. No “but it worked on my laptop” debugging sessions.
Here’s our complete docker-compose.yml:
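A sketch matching the walkthrough below (credentials are dev-only placeholders):

```yaml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app       # dev-only placeholder
      POSTGRES_DB: app
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 5s
      retries: 10

  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql+asyncpg://app:app@db:5432/app
    volumes:
      - ./backend/app:/app/app:ro   # hot-reload: mount source read-only
    depends_on:
      db:
        condition: service_healthy
    command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      VITE_API_URL: http://localhost:8000
    volumes:
      - ./frontend/src:/app/src
    command: npm run dev -- --host 0.0.0.0

volumes:
  postgres_data:
```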
Let me break down the critical pieces:
1. Database Service (Postgres)
We’re using the alpine variant of Postgres because it’s tiny (50MB vs 300MB for the full image). This matters when you’re pulling images in CI or deploying to cloud providers that charge for bandwidth.
This is crucial. Without a health check, Docker Compose considers the database “ready” as soon as the container starts. But Postgres takes a few seconds to initialize. If the backend tries to connect during those seconds, it crashes with “connection refused.”
The health check runs pg_isready every 5 seconds. Only when it succeeds does Docker Compose start the backend service. This prevents race conditions.
This is a named volume. It persists database data between container restarts. Without this, every time you run docker compose down, you’d lose all your data. Named volumes live outside containers and survive restarts.
Note
Named volumes are stored in Docker’s internal directory (usually /var/lib/docker/volumes on Linux). You can list them with docker volume ls and inspect them with docker volume inspect postgres_data. To completely reset your database, run docker compose down -v (the -v flag removes volumes).
2. Backend Service (FastAPI)
This tells Docker to build an image from the backend/Dockerfile. During development, this build only happens once (or when you change dependencies). The actual source code is mounted as a volume (see below), so code changes don’t require rebuilding.
This is the magic of hot-reload. We mount the local backend/app directory into the container at /app/app. The :ro flag makes it read-only (security best practice).
When you change a Python file locally, uvicorn detects the change and reloads automatically. No rebuild, no restart. Just save and refresh.
This is smarter than a basic depends_on. It doesn’t just wait for the database container to start — it waits for the health check to pass. This eliminates the race condition where the backend starts before Postgres is ready to accept connections.
Notice the hostname: db. In Docker Compose, services can reach each other by service name. From the backend’s perspective, the database is at db:5432, not localhost:5432 (because they’re in separate containers).
The command overrides the Dockerfile’s CMD for development. We use uv run to execute uvicorn within uv’s managed environment—this is crucial because uv installs packages in a virtual environment. The --reload flag enables hot-reloading during development.
Tip
If you need to connect to the database from your host machine (e.g., to run database migrations or use a GUI tool like pgAdmin), use localhost:5432. From inside containers, use the service name db:5432. This trips up a lot of people initially.
3. Frontend Service (Vite)
Same pattern as the backend: build once, mount source code for hot-reload. When you change a React component, Vite’s HMR kicks in and updates the browser instantly.
The command overrides the Dockerfile’s CMD to run the Vite dev server instead of serving the production build. The --host 0.0.0.0 flag makes Vite accessible from outside the container (necessary for Docker).
The VITE_API_URL environment variable tells the frontend where the backend API lives. In production, you’d set this to your actual API domain (e.g., https://api.yourdomain.com). In development, it’s localhost:8000.
Note
Vite requires environment variables to be prefixed with VITE_ to expose them to the browser. Any variable without this prefix is only available during the build, not in runtime code.
Starting everything:
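One command:

```bash
docker compose up
```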
You’ll see logs from all three services interleaved.
Three services, one command. That’s the developer experience we’re aiming for.
Warning
On first run, Docker will download base images for Postgres (alpine), Python, and Node. This can take 2-10 minutes depending on your connection. Subsequent runs are instant because images are cached locally. Don’t panic if the first run takes a while!
Dev/Prod Parity: Why This Matters
One of the Twelve-Factor App principles is “dev/prod parity”—keep development and production as similar as possible. Docker Compose achieves this:
- Same database: You’re using real Postgres locally, not SQLite. No “works in dev, breaks in prod” surprises from database quirks.
- Same networking: Services talk to each other over Docker’s internal network, just like they will in production (via service mesh or internal DNS).
- Same environment variables: The backend reads `DATABASE_URL` from the environment, whether it’s Docker Compose locally or Kubernetes in production.
When you deploy, you’re not crossing your fingers hoping everything works. You’re deploying the same containers you’ve been running locally for weeks. The only difference is environment variables (prod database URL, prod API keys, etc.).
Additional Resources
- Docker Compose Documentation – Complete reference for all Compose features
- Docker Networking – How containers communicate
- Twelve-Factor App – Methodology for building modern web applications
- Docker Compose in Production – Best practices for deploying with Compose
Troubleshooting common issues:
“Port 5432 is already in use”: You have Postgres running locally. Either stop it (brew services stop postgresql on Mac) or change the port mapping in docker-compose.yml to 5433:5432.
Backend can’t connect to database: Check that the health check is passing with docker compose ps. If the database is “unhealthy,” something’s wrong with Postgres startup. Check logs with docker compose logs db.
Hot-reload not working: Make sure the volume mounts are correct. Run docker compose config to see the resolved configuration. The paths should match your local directory structure.
“Cannot connect to Docker daemon”: Docker Desktop isn’t running. Start it and try again.
Configuration That Makes Sense
Info
Skip this section if: You’re familiar with Pydantic Settings and environment-based configuration.
Configuration is one of those things that seems simple at first but becomes a nightmare if you don’t set it up properly. I’ve seen too many projects where config is scattered across environment variables, YAML files, hardcoded constants, and command-line flags. Debugging “why does this behave differently in staging?” becomes an archaeological expedition.
We’re using Pydantic Settings to centralize all configuration in one type-safe place. This isn’t just about convenience—it’s about catching errors before they reach production.
Why Pydantic Settings over environment variables or config files?
Most projects use one of these approaches:
- Raw `os.environ`: No validation, no type safety; missing variables cause runtime errors deep in the code
- python-decouple or similar: Better than raw environ, but still string-based with no nested config support
- YAML/JSON files: Great for complex config but no type safety, easy to typo a key
- Dotenv only: Simple but no validation, everything is a string
Pydantic Settings combines the best parts of all these approaches:
- Type-safe: Define config as a typed class, get IDE autocomplete and mypy validation
- Validated on startup: App crashes immediately if config is invalid, with clear error messages
- Environment variable support: Reads from `.env` files or actual environment variables
- Nested config: Supports complex structures like database pools, API rate limits, etc.
- Multiple sources: Can read from files, env vars, and defaults with clear precedence
Here’s our complete settings module:
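A sketch of the module (field names follow the walkthrough below; defaults are illustrative):

```python
# app/config.py
from typing import Literal

from pydantic import PostgresDsn, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    env: Literal["dev", "staging", "prod"] = "dev"
    database_url: PostgresDsn          # required: startup fails fast if missing
    openai_api_key: str | None = None  # optional in dev, required in prod
    openai_model: str = "gpt-4o-mini"

    @model_validator(mode="after")
    def require_openai_key_in_prod(self) -> "Settings":
        # Cross-field validation: a missing key is fine in dev, fatal in prod
        if self.env == "prod" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY is required when ENV=prod")
        return self


settings = Settings()
```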
Let me break down the key pieces:
1. Type annotations with validation
PostgresDsn is a Pydantic type that validates the URL format. If you typo the URL, you get an error like “Invalid Postgres DSN: expected ‘postgresql://’, got ‘postgres://’” at startup.
Literal["dev", "staging", "prod"] means the env field can only be one of these three values. Try to set it to “production” (not “prod”) and your IDE will show an error before you even run the code.
2. Field validators for custom logic
This is powerful: validation can depend on other fields. In development, missing an OpenAI key is fine (you might be working on the database layer). In production, it’s a fatal error that stops the app from starting.
3. Smart defaults and required fields
If `database_url` isn’t set, Pydantic raises a validation error immediately at startup, pointing at the missing field.
This is way better than getting a runtime error 10 minutes into testing when you try to connect to the database.
4. Environment variable mapping
Pydantic automatically maps environment variables to fields:
- `DATABASE_URL` in `.env` → `settings.database_url` in Python
- `OPENAI_API_KEY` → `settings.openai_api_key`
- `ENV` → `settings.env`
The case_sensitive=False setting means database_url, DATABASE_URL, and Database_Url all work. This is convenient but can be disabled if you want strict naming.
Creating the .env file:
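Something like this (values are placeholders):

```bash
# .env (local development only)
ENV=dev
DATABASE_URL=postgresql+asyncpg://app:app@localhost:5432/app
OPENAI_API_KEY=sk-your-key-here
```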
Tip
The .env file should never be committed to git. Add it to .gitignore immediately:
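At minimum:

```gitignore
.env
.env.*
!.env.example
```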
Create a `.env.example` file with placeholder values (committed to git) so new developers know what variables are needed.
Using settings throughout the app:
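A sketch of a typical consumer:

```python
# app/db.py (sketch)
from sqlalchemy.ext.asyncio import create_async_engine

from app.config import settings

# mypy knows database_url is a PostgresDsn and env is a Literal
engine = create_async_engine(str(settings.database_url))

if settings.env == "dev":
    print("Running in development mode")
```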
Environment-specific configuration:
In production, you’d override settings via environment variables (not .env files):
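For example (hostnames are hypothetical):

```bash
export ENV=prod
export DATABASE_URL="postgresql+asyncpg://app:<password>@prod-db.internal:5432/app"
export OPENAI_API_KEY="<from-your-secrets-manager>"
```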
The same Settings class works in all environments — you just change the source of the values.
Note
For production secrets (API keys, database passwords), never use .env files. Use a secrets manager like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. Your deployment script fetches secrets and sets them as environment variables. Pydantic Settings reads them the same way it reads .env files — the code doesn’t change.
Why this matters for agent applications:
Agent applications have a lot of knobs to tune: model selection, token limits, rate limiting, database connection pools, API keys for multiple providers. Centralizing config in a type-safe class means:
- Easier debugging: When something behaves differently in staging, check `settings.env` and `settings.openai_model` instead of grepping for environment variable accesses
- Safer deploys: If you forget to set `OPENAI_API_KEY` in production, the app crashes on startup (before serving any traffic) instead of failing the first time a user tries to chat
- Better testability: In tests, you can override settings easily, as in the sketch below
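For instance (a sketch; constructing `Settings` directly bypasses `.env`):

```python
# tests/conftest.py (sketch)
import pytest

from app.config import Settings


@pytest.fixture
def test_settings() -> Settings:
    return Settings(
        env="dev",
        database_url="postgresql+asyncpg://app:app@localhost:5432/app_test",
        openai_api_key="test-key",
    )
```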
Using asyncpg for database connections:
We specified postgresql+asyncpg:// in the database URL. Why asyncpg specifically?
- Fastest Postgres driver for Python: Benchmarks show it’s 3-5x faster than psycopg2
- Native async support: Built for asyncio from the ground up (unlike psycopg2 which added async later)
- Type-safe: Uses Python’s type system for query parameters
- Connection pooling: Built-in connection pool management
When you’re streaming agent responses and handling multiple concurrent sessions, database performance matters. asyncpg ensures database queries don’t become the bottleneck.
Additional Resources
- Pydantic Settings Documentation – Complete guide to all features
- Twelve-Factor App: Config – Why config should live in environment variables
- asyncpg Documentation – High-performance async Postgres driver
- Environment Variables Best Practices – Security and management tips
Developer Experience: The Makefile
Here’s a problem I’ve seen on every project: each developer has their own set of commands they memorized. One person runs tests with pytest, another uses uv run pytest, a third uses python -m pytest. Someone remembers that you need to be in the backend/ directory, someone else doesn’t. Six months later, nobody remembers the exact incantation to run database migrations.
The solution: standardize everything in a Makefile. Make is old (1976!), ubiquitous (comes with every Unix system), and perfect for this job. It’s not just a build tool—it’s a command runner and documentation system.
Why Make over npm scripts or custom shell scripts?
- npm scripts: Great if your whole project is Node, awkward when you have Python backend + React frontend + Docker + infrastructure
- Shell scripts: Work but require careful path handling and error checking, no built-in dependency between tasks
- Task runners like Task or Just: Modern and nice, but not installed by default. Make is already there.
Make gives you:
- Self-documenting commands: Run `make` or `make help` to see all available commands
- Task dependencies: “Run tests only after linting passes”
- Consistent working directory: No more “which folder am I in?” confusion
- Cross-platform (mostly): Works on Linux, macOS, and WSL2
Here’s our Makefile for Part 1:
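A sketch covering the five commands described below (recipes must be indented with tabs):

```makefile
.PHONY: help dev up down logs clean

help:  ## Show available commands
	@grep -E '^[a-z-]+:.*##' $(MAKEFILE_LIST) | awk -F':.*## ' '{printf "%-8s %s\n", $$1, $$2}'

dev:  ## Start all services in the background
	docker compose up -d
	@echo "Backend:  http://localhost:8000/docs"
	@echo "Frontend: http://localhost:5173"

up:  ## Start all services in the foreground
	docker compose up

down:  ## Stop and remove containers (data volume persists)
	docker compose down

logs:  ## Follow logs from all services
	docker compose logs -f

clean:  ## Remove Python cache files
	find . -type d -name __pycache__ -exec rm -rf {} +
```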
Let me break down what each command does:
make dev - Your daily driver:
Starts all three services in detached mode (background) and shows you the URLs. This is what you’ll run every morning. The -d flag means services run in the background, so you get your terminal back.
make up - When you want to see logs:
Starts services in the foreground, showing logs from all three services. Useful when you’re debugging and want to see what’s happening. Press Ctrl+C to stop.
make down - Stop everything:
Stops all containers and removes them. The database volume persists, so you don’t lose data.
make logs - View live logs:
Attaches to logs from all running services. The -f flag means “follow” (like tail -f). Press Ctrl+C to exit.
make clean - Remove clutter:
Deletes Python cache files that accumulate during development. Run this occasionally to free up space.
Your typical workflow:
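Something like:

```bash
make dev     # morning: start everything in the background
make logs    # tail the logs when something looks off
# ...edit code; hot-reload handles the rest...
make down    # end of day: stop the stack
```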
Why this matters:
- One source of truth: Instead of remembering “was it `docker compose` or `docker-compose`?”, you just run `make dev`
- New developer friendly: Someone clones the repo, runs `make`, sees all commands. No hunting through README files
- Easy to extend: As we add features in later parts (tests, migrations, deployments), we’ll add more make targets
Tip
For Windows developers: If you don’t have Make installed, you have a few options:
- WSL2 (recommended): Full Linux environment, Make works perfectly, this is how I use it on my Desktop
- Chocolatey: `choco install make` installs GNU Make on Windows
- Git Bash: Recent versions include Make
- Just run the commands: Look inside the Makefile and run the `docker compose` commands directly
But seriously, just go with WSL.
What we’ll add in later parts:
This is a minimal Makefile for Part 1. As we progress through the series, we’ll add:
- Part 2: `make migrate`, `make migrate-create` (database migrations)
- Part 3: `make test`, `make lint`, `make format` (testing and code quality)
- Part 6: `make deploy-staging`, `make deploy-prod` (deployments)
For now, these five commands are all we need to work with our foundation.
Additional Resources
- GNU Make Documentation – Complete reference
- Makefile Tutorial – Beginner-friendly guide with examples
What We’ve Built
Take a moment to appreciate what we’ve accomplished. This isn’t just “hello world” — this is a production-grade foundation that most teams spend weeks refining. Let’s inventory what we have:
Infrastructure & DevOps:
- Three-service architecture running with one command (`make dev`)
- Docker Compose with health checks, volume mounts, and service dependencies
- Hot-reload everywhere: Python with uvicorn watch, React with Vite HMR
- Named volumes for persistent database storage
- Dev/prod parity: same containers locally and in production
Backend (Python + FastAPI):
- Async-first FastAPI application with automatic OpenAPI docs
- Type-safe configuration using Pydantic Settings with validation
- uv for dependency management (10-100x faster than pip)
- Ruff for linting and formatting (replaces black, flake8, isort)
- mypy in strict mode catching type errors before runtime
- asyncpg for Postgres (fastest async driver available)
Frontend (React + TypeScript):
- Vite for blazing-fast dev server (sub-2-second startup)
- TypeScript in strict mode with path aliases configured
- Clean folder structure anticipating SSE, auth, and agent UI
- Type-safe API client ready to match backend Pydantic models
Developer Experience:
- Makefile with standard commands for all common tasks
- Self-documenting (run `make` to see all commands)
- Consistent workflow across all team members
- CI-ready (same commands work in GitHub Actions, GitLab CI, etc.)
Testing & Quality:
- Test structure ready for unit, integration, and e2e tests
- Coverage reporting configured with pytest
- Lint and typecheck commands for pre-commit hooks
- Quality gates that fail fast with clear error messages
Try it now:
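One command, then two URLs:

```bash
make dev
# Backend docs: http://localhost:8000/docs
# Frontend:     http://localhost:5173
```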
The frontend is still the default Vite landing page, and the backend has one endpoint. But you have:
- Type safety enforced from database to browser
- Configuration validated on startup
- Services orchestrated with proper dependencies
- Developer workflow standardized
- Hot-reload for instant feedback
Tip
Checkpoint: Before moving on, make sure everything works:
- Run `make dev` and wait for all services to start
- Visit http://localhost:8000/docs and see the interactive API docs
- Visit http://localhost:5173 and see the React app
- Run `make logs` in another terminal and see live logs
- Change a file in `backend/app/main.py`, save, and watch uvicorn reload
If any of these fail, check the troubleshooting sections in each setup step above. The foundation must be solid before we build on it.
What’s Next?
Part 2: Backend Core & Database - FastAPI routing, async SQLAlchemy, Alembic migrations, Repository pattern
Part 3: Authentication & Security - Auth0 integration, JWT validation, session cookies for SSE, CORS
Part 4: Agent Integration & Streaming - OpenAI Agents SDK, SSE streaming, tool calling, session memory
Part 5: Frontend & User Interface - React SSE client, chat UI, session management, markdown rendering
Part 6: Credits, Limits & Usage Tracking - Token-based credits, rate limiting, usage analytics
Part 7: Observability & Tracing - Structured logging, OpenAI Traces, Arize Phoenix integration
Part 8: Production Deployment - Terraform, GitHub Actions CI/CD, zero-downtime deployments
Additional Resources
Further Reading on Topics Covered Today:
- FastAPI Documentation – Official docs with excellent examples
- uv Documentation – Modern Python packaging and dependency management
- Vite Guide – Fast frontend tooling and build configuration
- Docker Compose Docs – Multi-container orchestration
- Pydantic Settings – Type-safe configuration management
- Twelve-Factor App – Methodology for building production apps
Resources and Community
Repository: github.com/bedirt/agents-sdk-prod-ready-template
Issues and questions: Open a GitHub issue or discussion. I try to respond within a day or two. Common issues usually have solutions in existing threads.
Comments and feedback: Please leave a comment below if you found this helpful, have suggestions, or want to share how you used the template. You can also send a suggestion using the “Suggest an Edit” link at the bottom of the page - which takes you to the GitHub repo issues.
A Final Reiteration
I built this template because I was tired of reinventing the wheel every time I started a new agent project. The first few times, I’d spend a week setting up Docker, configuring type checking, wiring authentication, and building deployment pipelines before writing a single line of agent logic.
This template encapsulates those weeks of setup. It’s the project structure I wish I had when I started building production agent applications.
My hope is that it saves you time and helps you focus on what matters: building great agent experiences for your users.
See you in Part 2, where we’ll add the database layer and start persisting chat sessions.
Next: Part 2 - Backend Core & Database (Coming Soon)
This is part of a series on building production-ready AI agent applications. All code is open source on GitHub.
Info
Enjoying this series? Star the GitHub repo, share it with your team, or send feedback. This template is a living project—contributions, suggestions, and questions are welcome.