Relay Overview

The relay daemon is the core component that moves tasks from the database to the message broker.

How It Works

┌─────────────────────────────────────────────────────────┐
│                    PROCESSING LOOP                      │
│                                                         │
│  1. If breaker is open, sleep until cooldown expires    │
│  2. SELECT batch of messages (FOR UPDATE SKIP LOCKED)   │
│  3. Publish selected messages                           │
│     - Serial by default                                 │
│     - Optional bounded sliding window                   │
│     - Stop starting new sends after shutdown deadline   │
│     - Stop refilling after consecutive broker outages   │
│     - Worker threads publish prepared broker payloads   │
│     - Main thread classifies outcomes and mutates DB    │
│  4. Delete published messages                           │
│  5. Apply retry backoff / dead-letter / outage deferral │
│  6. Touch liveness file                                 │
│  7. Sleep if queue was empty or batch was short         │
│  8. Repeat                                              │
│                                                         │
└─────────────────────────────────────────────────────────┘
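As a rough illustration of steps 2 through 5 in serial mode, the sketch below runs one pass over an in-memory stand-in for the outbox table. Everything named here (`InMemoryOutbox`, `BrokerOutage`, `run_once`, `max_retries`) is hypothetical and chosen for the example; the real relay works against database rows, applies backoff delays, and deletes published rows as a separate step rather than inline.

```python
from dataclasses import dataclass, field


class BrokerOutage(Exception):
    """Hypothetical marker for a broker-level failure, as opposed to a per-task error."""


@dataclass
class InMemoryOutbox:
    """Stand-in for the outbox table (assumption; the relay selects rows with SKIP LOCKED)."""
    rows: dict = field(default_factory=dict)      # row id -> payload
    retries: dict = field(default_factory=dict)   # row id -> failed attempt count
    deferred: set = field(default_factory=set)    # row ids deferred by a broker outage

    def select_batch(self, size):
        ready = [row_id for row_id in sorted(self.rows) if row_id not in self.deferred]
        return ready[:size]


def run_once(outbox, publish, batch_size=100, max_retries=3):
    """One serial pass; returns (published, retried, dead, deferred) lists of row ids."""
    published, retried, dead, deferred = [], [], [], []
    for row_id in outbox.select_batch(batch_size):        # step 2: select a batch
        try:
            publish(outbox.rows[row_id])                   # step 3: publish serially
        except BrokerOutage:
            # Step 5, outage deferral: the retry counter is left untouched.
            outbox.deferred.add(row_id)
            deferred.append(row_id)
        except Exception:
            # Step 5, ordinary failure: consume retry budget, then dead-letter.
            outbox.retries[row_id] = outbox.retries.get(row_id, 0) + 1
            if outbox.retries[row_id] >= max_retries:
                del outbox.rows[row_id]
                dead.append(row_id)
            else:
                retried.append(row_id)
        else:
            del outbox.rows[row_id]                        # step 4: delete published row
            published.append(row_id)
    return published, retried, dead, deferred
```

The point of the sketch is the three-way classification: success removes the row, an ordinary failure spends retry budget, and a broker outage defers the row without spending it.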

Starting the Relay

python manage.py celery_outbox_relay

Optional advanced tuning:

python manage.py celery_outbox_relay --publish-concurrency 2

Keep --publish-concurrency=1 as the baseline unless you have verified a higher value against a real broker on your deployment path.
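To make the bounded sliding window more concrete, here is a minimal sketch built on the standard library's `concurrent.futures`. The function name `publish_window` and its parameters are assumptions for illustration, not the relay's internals. Worker threads only call `publish`; the calling thread collects and classifies the outcomes, mirroring the split described in the processing loop.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def publish_window(payloads, publish, concurrency=2):
    """Publish with at most `concurrency` sends in flight.

    Returns {index: None on success, or the exception raised by publish}.
    """
    outcomes = {}
    pending = {}                          # future -> index into payloads
    remaining = iter(enumerate(payloads))
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        while True:
            # Refill the window up to the concurrency bound.
            while len(pending) < concurrency:
                item = next(remaining, None)
                if item is None:
                    break
                index, payload = item
                pending[pool.submit(publish, payload)] = index
            if not pending:
                break
            # Block until at least one in-flight send finishes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                # Outcomes are recorded here, on the calling thread.
                outcomes[pending.pop(future)] = future.exception()
    return outcomes
```

With `concurrency=1` this degenerates to serial publishing, which is one way to see why that setting is the safe baseline.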

Graceful Shutdown

The relay handles SIGTERM and SIGINT gracefully:

- SIGTERM or SIGINT puts the relay into draining mode.
- Once --shutdown-timeout elapses, the relay stops starting new sends.
- A publish that is already running is bounded only by --send-timeout.
- Rows that were selected but not yet started are recovered later through stale-timeout selection.

This lets container orchestrators stop the process without dropping committed rows, while still leaving duplicate-tolerant recovery semantics in place if a row is reclaimed later.
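The draining behaviour can be sketched with the standard `signal` module and a monotonic deadline. The class and function names below (`DrainState`, `install_handlers`) are hypothetical; the sketch only shows how a signal might start a deadline after which no new sends begin, and it leaves the --send-timeout bound on in-flight publishes out of scope.

```python
import signal
import time


class DrainState:
    """Tracks draining mode. Illustrative only; not the relay's actual internals."""

    def __init__(self, shutdown_timeout, clock=time.monotonic):
        self.shutdown_timeout = shutdown_timeout
        self.clock = clock
        self.deadline = None              # None means the relay is not draining

    def request_shutdown(self, signum=None, frame=None):
        # The (signum, frame) signature matches what signal.signal() passes.
        # A repeated signal does not extend the original deadline.
        if self.deadline is None:
            self.deadline = self.clock() + self.shutdown_timeout

    @property
    def draining(self):
        return self.deadline is not None

    def may_start_send(self):
        # New sends are allowed until the shutdown deadline has passed.
        return self.deadline is None or self.clock() < self.deadline


def install_handlers(state):
    """Route SIGTERM and SIGINT to the drain state (call from the main thread)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, state.request_shutdown)
```

A publish loop would check `may_start_send()` before each new send and simply leave unsent rows in the database, where a later stale-timeout selection can pick them up.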

Broker Outage Handling

Broker outages are handled differently from ordinary task publish failures:

- Publish attempts are still bounded by --send-timeout.
- Broker-outage rows are deferred by --broker-outage-cooldown instead of consuming retry budget.
- After two consecutive broker outages without an intervening successful publish, the process-local breaker opens and the relay stops starting new batch attempts until the cooldown expires.
- In parallel publish mode, the relay also stops refilling the worker window until the already in-flight results are classified.
- The breaker is not shared across relay processes.
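A process-local breaker of this kind can be sketched in a few lines. The class name `OutageBreaker` and its methods are assumptions for illustration. In particular, resetting the outage counter once the cooldown expires is a guess about reasonable behaviour, not something the relay is documented to do.

```python
import time


class OutageBreaker:
    """Opens after `threshold` consecutive broker outages; lives in one process only."""

    def __init__(self, cooldown, threshold=2, clock=time.monotonic):
        self.cooldown = cooldown
        self.threshold = threshold
        self.clock = clock
        self.consecutive = 0
        self.open_until = None            # None means the breaker is closed

    def record_success(self):
        # Any successful publish breaks the run of consecutive outages.
        self.consecutive = 0

    def record_outage(self):
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.open_until = self.clock() + self.cooldown

    def seconds_until_closed(self):
        """Return how long the loop should sleep before its next batch attempt."""
        if self.open_until is None:
            return 0.0
        remaining = self.open_until - self.clock()
        if remaining <= 0:
            # Assumption: closing the breaker also resets the outage counter.
            self.open_until = None
            self.consecutive = 0
            return 0.0
        return remaining
```

Because the state is held in plain instance attributes, two relay processes each keep their own counter, which matches the note that the breaker is not shared.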