Every request handler eventually faces the same temptation: there's a slow, non-essential chore to do — send a welcome email, transcode an upload, generate a PDF, call a sluggish third-party API — and the obvious place to do it is right there, inline, before you return the response. Resist. The user is sitting on a spinner waiting for work they don't care about the timing of, your worker is pinned for the duration, and a single slow downstream turns a 50ms endpoint into a 5-second one. The fix is to move that work off the request path and run it in the background. This is a practical, vendor-neutral guide to doing that in FastAPI: the two very different things “background” can mean, when the built-in tool is genuinely enough, when you need a real task queue, and how to keep deferred work from quietly losing data the first time a process restarts.
“Background” Is Two Different Problems
Before picking a tool, separate the two jobs that both get called “background work,” because they have opposite requirements:
- Fire-and-forget after the response. Quick, best-effort work you want off the critical path but don't need to guarantee — writing an audit-log row, busting a cache, sending a non-critical notification. If it occasionally gets dropped on a deploy, nobody is harmed.
- Durable deferred work. Work that must eventually complete even if the server restarts mid-task: charging a card, processing an upload, running a nightly report, fanning out a thousand emails. This needs to survive crashes, be retried on failure, and ideally be observable.
FastAPI ships a perfect tool for the first problem and nothing at all for the second. Knowing which one you're holding is the whole decision.
FastAPI BackgroundTasks: Fire-and-Forget, In-Process
FastAPI's BackgroundTasks lets you register a function that runs after the response is sent, in the same process. You add a parameter of that type to your path operation, schedule the work, and return immediately:
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

# email_client and create_user are placeholders for your own application code.

def send_welcome_email(to: str) -> None:
    # talks to your email provider: slow, and the user doesn't need to wait for it
    email_client.send(to, template="welcome")

@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks):
    user = create_user(email)
    background_tasks.add_task(send_welcome_email, email)
    return {"id": user.id}  # responds now; the email goes out afterward
The response goes back the instant the handler returns; send_welcome_email runs once the client already has its 200. That's exactly right for the first category — cheap, best-effort, non-critical. Use it freely for audit logs, cache invalidation, firing an analytics event, or a notification you can afford to lose.
BackgroundTasks Runs In Your Web Process
That convenience is also its ceiling. The task shares the lifecycle of the worker that served the request, so a deploy, crash, or restart between the response and the task running means the work is gone, with no record and no retry. There is also no visibility into what ran and no scheduling. And because it runs in the web process, heavy work competes with request handling: an async def task runs on the event loop and stalls it for as long as it computes, while a plain def task runs in the server's threadpool, where CPU-heavy work (image processing, a big PDF) still contends with your handlers for the same interpreter. BackgroundTasks is a way to defer work to just after the response, not a way to run a job that has to happen.
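To make the stall concrete, here is a small standard-library sketch with no FastAPI involved. It mimics an async task doing blocking work on the event loop while a heartbeat coroutine stands in for the other requests the same worker would like to serve. The names `heartbeat`, `blocking_task`, and `measure_worst_gap` are made up for illustration.

```python
import asyncio
import time


async def heartbeat(gaps: list[float], ticks: int, interval: float) -> None:
    """Stand-in for other requests: records the time between ticks."""
    last = time.perf_counter()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def blocking_task(seconds: float) -> None:
    """Stand-in for heavy work run on the event loop (it never awaits)."""
    time.sleep(seconds)  # blocks the whole loop, as a tight CPU loop would


async def measure_worst_gap(block_for: float) -> float:
    gaps: list[float] = []
    beat = asyncio.create_task(heartbeat(gaps, ticks=5, interval=0.01))
    await asyncio.sleep(0.02)       # let the heartbeat start ticking
    await blocking_task(block_for)  # everything else stalls here
    await beat
    return max(gaps)


worst = asyncio.run(measure_worst_gap(block_for=0.2))
print(f"worst heartbeat gap: {worst:.3f}s")
```

The heartbeat asks to run every 10 ms, yet its worst gap comes out at roughly the full blocking time. In a web process, that gap is latency added to every request that worker is handling.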
When BackgroundTasks Isn't Enough
You've outgrown the built-in the moment any of these become true, and most real apps hit several:
- The work must not be lost. Anything touching money, user data, or an external system you're on the hook to call needs a record that survives a restart.
- It needs to be retried. The email provider 500s, the payment API times out — you want automatic retry with backoff, not a silent drop.
- It's slow or CPU-bound. Minutes of number-crunching or media processing shouldn't run inside the process that's supposed to be answering HTTP.
- It needs scheduling. “Every night at 2 a.m.” or “15 minutes after signup” is a cron/delay feature BackgroundTasks simply doesn't have.
- You want to see it. Queue depth, failures, retries, and the ability to replay a dead job are operational table stakes once background work matters.
All five point at the same architecture: a real task queue with workers that live outside the web process.
The Task Queue Model: Producer, Broker, Worker
A task queue decouples asking for work from doing work by putting a durable message between them. Three roles:
- Producer: your FastAPI app. Instead of running the job, it serializes “run generate_report(42, '2026-06')” into a message and hands it to the broker. This takes milliseconds, so your endpoint returns immediately.
- Broker: a message store every party talks to, almost always Redis or RabbitMQ. It holds the queue durably, so a job waits safely even if every worker is down or restarting.
- Worker — one or more separate processes (their own containers, their own scaling) that pull messages off the broker and execute them. Crash mid-job and the broker hands the message to another worker.
This is the shape that gives you everything BackgroundTasks couldn't: durability (the broker persists the job), retries (a failed job goes back on the queue), isolation (heavy work can't stall your HTTP workers), independent scaling (add workers without adding web capacity), and observability (the queue is a thing you can measure).
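The three roles can be sketched with a toy in-memory broker. This is an illustration only, not how Redis, RabbitMQ, or any of the libraries below work internally, and `InMemoryBroker`, `enqueue`, and `work_once` are hypothetical names. It shows the key property: a job stays with the broker until a worker acknowledges it, so a crash mid-job leads to redelivery instead of loss.

```python
import json
from collections import deque
from typing import Callable, Optional


class InMemoryBroker:
    """Toy broker: holds serialized jobs until a worker acknowledges them."""

    def __init__(self) -> None:
        self._ready: deque[tuple[int, str]] = deque()
        self._in_flight: dict[int, str] = {}
        self._next_id = 0

    def publish(self, message: str) -> int:
        self._next_id += 1
        self._ready.append((self._next_id, message))
        return self._next_id

    def reserve(self) -> Optional[tuple[int, str]]:
        """Hand a job to a worker but keep a copy until it is acked."""
        if not self._ready:
            return None
        job_id, message = self._ready.popleft()
        self._in_flight[job_id] = message
        return job_id, message

    def ack(self, job_id: int) -> None:
        del self._in_flight[job_id]

    def requeue_unacked(self) -> None:
        """Simulates the broker noticing that a worker died mid-job."""
        for job_id, message in self._in_flight.items():
            self._ready.append((job_id, message))
        self._in_flight.clear()


def enqueue(broker: InMemoryBroker, task: str, *args) -> int:
    """Producer: serialize the call instead of running it."""
    return broker.publish(json.dumps({"task": task, "args": list(args)}))


def work_once(
    broker: InMemoryBroker,
    registry: dict[str, Callable],
    crash: bool = False,
) -> Optional[object]:
    """Worker: pull one job, run it, and ack only after success."""
    reserved = broker.reserve()
    if reserved is None:
        return None
    job_id, message = reserved
    if crash:
        return None  # simulated crash: the job is never acked
    payload = json.loads(message)
    result = registry[payload["task"]](*payload["args"])
    broker.ack(job_id)
    return result


def generate_report(user_id: int, month: str) -> str:
    return f"report-{user_id}-{month}.pdf"


broker = InMemoryBroker()
registry = {"generate_report": generate_report}

enqueue(broker, "generate_report", 42, "2026-06")
work_once(broker, registry, crash=True)  # first worker dies mid-job
broker.requeue_unacked()                 # broker redelivers the unacked job
result = work_once(broker, registry)     # second worker finishes it
print(result)
```

Acknowledging only after the task returns is what makes the job survive a crash. It is also why a task can run more than once, which matters later for idempotency.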
Picking a Queue: Celery, RQ, Dramatiq, arq
Four mature options dominate Python, and the honest differences are about weight and ergonomics, not capability:
- Celery — the batteries-included incumbent. Brokers (Redis/RabbitMQ), result backends, scheduled tasks via Celery beat, workflows (chains, groups, chords), rate limiting, retries — it does everything, and the config surface reflects that. Reach for it when you want the ecosystem and the maturity and don't mind the heft. It's synchronous-worker at heart, which suits the CPU- and IO-bound jobs most queues actually run.
- RQ (Redis Queue) — the simple one. Redis-only, a small readable codebase, almost no ceremony to get a worker running. Perfect when your needs are modest and you'd rather not learn Celery's vocabulary. Fewer knobs is the feature.
- Dramatiq — the middle path. Cleaner ergonomics than Celery, retries and middleware built in, Redis or RabbitMQ. A strong default when RQ feels too thin but Celery feels too heavy.
- arq: the async-native one. Built on asyncio and Redis, so tasks are async def and share the idiom of your FastAPI code. If your jobs are IO-bound and already await-heavy (calling APIs, hitting the DB), arq fits FastAPI like a glove.
Which one to pick
Default to arq if your background work is async and IO-bound — it matches FastAPI's mental model with the least friction. Choose RQ for the simplest possible durable queue on Redis. Choose Dramatiq when you want clean defaults with room to grow. Choose Celery when you need its breadth — complex workflows, multiple brokers, a mature scheduling story — and accept the configuration cost that buys. There is no wrong answer here, only over- and under-buying.
Enqueue From FastAPI, Execute in a Worker
The pattern is the same across all four: define the task on the worker side, and from your route push a message and return a handle. Here it is with arq, because the async fit is cleanest. The worker module defines the function and its retry policy:
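# worker.py: runs as its own process via `arq worker.WorkerSettings`
# crunch_numbers, render, upload_pdf and notify are placeholders for your own code.

async def generate_report(ctx, user_id: int, month: str) -> str:
    data = await crunch_numbers(user_id, month)  # minutes of work, off the web process
    url = await upload_pdf(render(data))
    await notify(user_id, url)
    return url

class WorkerSettings:
    functions = [generate_report]
    # Cap on attempts per job. In arq a job is re-run when it raises arq.Retry
    # (whose defer argument sets the delay) or when the worker shuts down mid-job;
    # an ordinary exception marks the job as failed.
    max_tries = 5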
Your FastAPI handler becomes a thin producer — enqueue, return a job id, done in milliseconds:
from arq import create_pool
from arq.connections import RedisSettings

@app.post("/reports", status_code=202)
async def request_report(user_id: int, month: str):
    # Opened per request to keep the example short; in a real app, create the
    # pool once at startup and reuse it.
    redis = await create_pool(RedisSettings())
    job = await redis.enqueue_job("generate_report", user_id, month)
    return {"job_id": job.job_id, "status": "queued"}  # 202 Accepted, not 200
Two details worth copying. Return 202 Accepted, not 200 — it's the honest status code for “I've taken your request and will process it,” and it tells the client to poll or wait for a webhook rather than expecting the result inline. And hand back a job id so the client (or your own status endpoint) can ask “is it done yet?” against the result backend.
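On the client side, the 202 plus job id usually turns into a polling loop. The sketch below assumes a status endpoint exists and returns strings such as "queued", "in_progress", "complete", or "failed". Both the endpoint and those values are assumptions for illustration, and `poll_until_done` is a hypothetical helper, not part of FastAPI or any queue library.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("complete", "failed"):
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)


# Fake status source standing in for a call like GET /reports/{job_id}
responses = iter(["queued", "in_progress", "complete"])
final = poll_until_done(lambda: next(responses), interval=0.01, timeout=1.0)
print(final)
```

A bounded timeout matters here: a job that never finishes should surface as an error on the client, not as an endless loop.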
Make Every Task Idempotent
Here's the catch that bites everyone once: queues deliver at least once, not exactly once. A worker can finish a job, crash before acknowledging it, and the broker — reasonably — redelivers it to another worker. Your task runs twice. If the task charges a card or sends an email, twice is a real problem. The fix is the same discipline that makes idempotent webhook handlers safe: guard the side effect on a key you control, so a second run is a no-op.
async def charge_customer(ctx, payment_id: str) -> None:
    # At-least-once delivery means this may run more than once.
    # Make the second run harmless by checking a key you own first.
    if await already_processed(payment_id):
        return
    await do_charge(payment_id)
    await mark_processed(payment_id)  # ideally same transaction as the charge
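One way to close the gap between checking and marking is to let a unique constraint decide who owns the key. The sketch below uses an in-memory SQLite table purely for illustration; the table name, `charge_once`, and the `charges` list standing in for a payment provider are all assumptions, not part of any library discussed here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_payments (payment_id TEXT PRIMARY KEY)")

charges: list[str] = []  # stands in for the real payment provider


def charge_once(conn: sqlite3.Connection, payment_id: str) -> bool:
    """Charge only if this call is the one that claims payment_id."""
    with conn:  # commits on success, rolls back if the charge raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_payments (payment_id) VALUES (?)",
            (payment_id,),
        )
        if cur.rowcount == 0:
            return False  # another run already claimed it: no-op
        charges.append(payment_id)  # the side effect, inside the transaction
        return True


first = charge_once(db, "pay_123")
second = charge_once(db, "pay_123")  # simulated redelivery
print(first, second, charges)
```

The insert either claims the key or does nothing, so two deliveries of the same job cannot both pass the guard. A real external charge is not covered by a local database transaction, so where the provider accepts an idempotency key, passing the same key to it is a sensible second layer.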
Pair idempotency with sane retries: exponential backoff so a struggling downstream gets room to recover, a cap on attempts, and a dead-letter destination for jobs that exhaust their retries so a poison message doesn't loop forever. Most of the queues above give you backoff and a max-tries setting out of the box — the idempotency is the part only you can supply, because only you know what “already done” means for your data.
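The retry policy described above can be sketched in a few lines. This is a toy model under stated assumptions, not any queue's implementation: `backoff_delay` and `run_with_retries` are hypothetical names, the dead-letter destination is just a list, and the `sleep` argument is injected so the example runs instantly.

```python
from typing import Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after failed attempt number `attempt` (1-based), doubled each time and capped."""
    return min(cap, base * 2 ** (attempt - 1))


def run_with_retries(
    job: Callable[[], None],
    max_tries: int,
    dead_letter: list[str],
    name: str,
    sleep: Callable[[float], None],
) -> bool:
    """Try the job up to max_tries times; park it in dead_letter if all fail."""
    for attempt in range(1, max_tries + 1):
        try:
            job()
            return True
        except Exception:
            if attempt == max_tries:
                dead_letter.append(name)  # give up, keep it for inspection or replay
                return False
            sleep(backoff_delay(attempt))
    return False


def always_fails() -> None:
    raise RuntimeError("downstream is down")


waits: list[float] = []
dlq: list[str] = []
ok = run_with_retries(
    always_fails, max_tries=5, dead_letter=dlq, name="send_email:42", sleep=waits.append
)
print(ok, waits, dlq)
```

With five tries there are four waits (1, 2, 4, and 8 seconds here) before the job lands in the dead-letter list. Production setups often add random jitter to each delay so that many failing jobs do not retry in lockstep.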
Scheduled and Periodic Jobs
The other thing a queue buys you is time-based work: the nightly digest, the hourly sync, the cleanup sweep. Rather than a system crontab firing curl at a secret endpoint, use the queue's own scheduler where it has one (not every queue bundles periodic scheduling, so check yours). Celery has beat, and arq takes cron definitions right in the worker settings:
from arq import cron

class WorkerSettings:
    functions = [generate_report, send_daily_digest]
    cron_jobs = [
        cron(send_daily_digest, hour=8, minute=0),  # every day at 08:00
    ]
The scheduled job lands on the same queue as everything else, runs in the same worker pool, and inherits the same retries and observability. One system for “do this soon” and “do this every morning” beats two.
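For intuition about what a daily rule like “08:00” resolves to, here is a standard-library sketch of the next-run calculation. It is not arq's or Celery's scheduler code, `next_daily_run` is a hypothetical helper, and it ignores time zones; which clock “08:00” refers to depends on how your worker is configured.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int) -> datetime:
    """Next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


before = next_daily_run(datetime(2026, 6, 1, 7, 30), hour=8, minute=0)
after = next_daily_run(datetime(2026, 6, 1, 9, 15), hour=8, minute=0)
print(before, after)
```

A worker started at 07:30 fires the job the same morning; one started at 09:15 waits until the next day. Whether a run missed during downtime is made up later is a per-queue behavior worth checking in its documentation.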
The Operational Footprint
A queue isn't free — you're adding a broker and a second kind of process to run and watch, and that's the real cost to weigh against the durability you gain. Three things to decide on purpose. You now run workers as their own deployment (their own container and scaling, separate from the web tier). You depend on Redis — the same dependency that shows up the moment you do distributed rate limiting across more than one process, so it often pays for itself twice. And you should watch the queue: alert on depth (work arriving faster than it drains) and on the dead-letter queue (jobs giving up), because a silent backlog is the failure mode you won't notice until a customer does.
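The two alerts named above can be expressed as a small check over periodic samples. This is a sketch under assumptions: `QueueSnapshot` and `queue_alerts` are hypothetical, the threshold is arbitrary, and how you actually read depth and dead-letter counts depends on your broker and queue library.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    depth: int        # jobs waiting in the main queue
    dead_letter: int  # jobs that exhausted their retries


def queue_alerts(history: list[QueueSnapshot], max_depth: int = 1000) -> list[str]:
    """Return alert messages for a series of snapshots (oldest first)."""
    alerts: list[str] = []
    if not history:
        return alerts
    latest = history[-1]
    if latest.depth > max_depth:
        alerts.append(f"queue depth {latest.depth} exceeds {max_depth}")
    depths = [s.depth for s in history]
    if len(depths) >= 3 and all(a < b for a, b in zip(depths, depths[1:])):
        alerts.append("queue depth rising in every sample: work arrives faster than it drains")
    if latest.dead_letter > history[0].dead_letter:
        alerts.append("new jobs in the dead-letter queue")
    return alerts


samples = [QueueSnapshot(10, 0), QueueSnapshot(40, 0), QueueSnapshot(90, 2)]
for alert in queue_alerts(samples):
    print(alert)
```

Alerting on the trend as well as the absolute depth catches a backlog while it is still small, which is when it is cheapest to fix by adding workers.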
The Bottom Line
Match the tool to the promise. For quick, best-effort work you're happy to occasionally lose, FastAPI's BackgroundTasks is the right amount of machinery and nothing more. The moment the work has to happen — survive a restart, retry on failure, run on a schedule, or stay off your HTTP workers — move it to a real task queue and let a separate worker pool own it. Pick the lightest queue that covers your needs (arq for async-IO, RQ for simple, Dramatiq for the middle, Celery for breadth), make every task idempotent because delivery is at-least-once, and put eyes on the queue. Do that, and slow work stops being a spinner your users stare at and becomes a job a worker quietly finishes after you've already said 202.
A Worker, Already Wired In
A background worker — queue, retries, and an idempotent task pattern wired to the same Redis your rate limiter uses — ships pre-built in ShipKit, our production-ready FastAPI boilerplate. See inside ShipKit's architecture for how the pieces fit together.