Health Checks
Allegro exposes two read-only HTTP endpoints designed to be polled by an
external monitor (an uptime checker, a metrics scraper, or an on-call alerting
system). One reports the health of a single tenant's queues; the other reports
the health of the platform-wide job queues. Both return an HTTP 503 when their
backing service can't be reached, so even a naive "is the status code 200?"
check trips on a degraded state.
These endpoints are part of the REST API — see the REST API reference for the authoritative request and response schemas. This guide explains what they report and how to wire up monitoring against them.
Tenant health
GET /api/v1/health
Authenticated with a Sanctum bearer token; the token's user must be able to view the tenant. The endpoint reports the depth and oldest-item age of that tenant's Redis-backed queues — the Event Queue and the Heartbeat Queue.
{
  "status": "ok",
  "tenant": "tenant_123",
  "redis": { "reachable": true },
  "queues": {
    "events": { "depth": 0, "oldest_age_seconds": null },
    "heartbeats": { "depth": 12, "oldest_age_seconds": 34 }
  }
}
| Field | Meaning |
|---|---|
| status | ok when Redis is reachable, degraded otherwise. |
| redis.reachable | Whether the queue's Redis backend responded. |
| queues.*.depth | Number of items currently waiting in the queue. |
| queues.*.oldest_age_seconds | Age in seconds of the oldest item still waiting, or null when the queue is empty. |
When Redis cannot be reached, status is degraded, every queue value is
null, and the response status code is 503.
A steadily rising depth or oldest_age_seconds means a consumer has fallen
behind and is no longer draining the queue fast enough — see the
Event Queue and Heartbeat Queue
guides for how consumers read and delete items.
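As a rough illustration of how a monitor might reduce one tenant health response to the values worth tracking, here is a minimal Python sketch. The function name `summarize_tenant_health` and the shape of its return value are hypothetical and not part of Allegro; only the field names come from the response shown above. It keeps null values as `None` so that an unreachable Redis is never mistaken for an empty queue.

```python
from typing import Any


def summarize_tenant_health(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Reduce one tenant health response to the values a monitor tracks.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    queues = body.get("queues") or {}
    healthy = status_code == 200 and body.get("status") == "ok"

    # In the degraded case the queue values are null. `(q or {})` copes with
    # either the whole queue entry or just its fields being null, and None is
    # kept as-is because 0 would read as "empty and healthy".
    depths = {name: (q or {}).get("depth") for name, q in queues.items()}
    ages = [(q or {}).get("oldest_age_seconds") for q in queues.values()]
    known_ages = [age for age in ages if age is not None]

    return {
        "healthy": healthy,
        "depths": depths,
        "max_oldest_age_seconds": max(known_ages) if known_ages else None,
    }
```

Feeding the per-queue depths and the maximum age into a time series on every poll gives the trend data needed to spot a consumer that is falling behind.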
Platform health
GET /api/v1/landlord/health
Authenticated with a Sanctum bearer token belonging to a super admin, and served only in the landlord context. It reports the backlog of every Horizon-managed background queue plus the recent failed-job count.
{
  "status": "ok",
  "redis": { "reachable": true },
  "queues": {
    "high": { "pending": 0, "wait_seconds": 0 },
    "default": { "pending": 2, "wait_seconds": 1 },
    "audience-sync": { "pending": 0, "wait_seconds": 0 }
  },
  "failed_jobs": 0
}
| Field | Meaning |
|---|---|
| status | ok when the queue backend is reachable, degraded otherwise. |
| redis.reachable | Whether the queue backend responded. |
| queues.*.pending | Number of jobs waiting on that queue. |
| queues.*.wait_seconds | Horizon's estimated wait, in seconds, before a new job on that queue starts. |
| failed_jobs | Count of jobs that have recently failed across all queues. |
Each queue is reported separately, keyed by its name. For the meaning of the
high, default, and audience-sync queues and how they're prioritized, see
Queue Management. When the backend is unreachable,
status is degraded, queues is empty, failed_jobs is null, and the
response status code is 503.
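One way a monitor could turn a platform health response into alert messages is sketched below in Python. The function `platform_alerts` and its `max_wait_seconds` threshold (defaulting to 60) are assumptions made for illustration; Allegro does not define either, and a sensible threshold depends on your workload. The sketch stops early in the degraded case, since queues is empty and failed_jobs is null there.

```python
from typing import Any


def platform_alerts(
    status_code: int,
    body: dict[str, Any],
    max_wait_seconds: int = 60,  # hypothetical threshold, tune per deployment
) -> list[str]:
    """Return human-readable alerts for one platform health response."""
    if status_code != 200 or body.get("status") != "ok":
        # Degraded: there is no queue data to inspect, so report and stop.
        return [f"platform degraded (HTTP {status_code})"]

    alerts = []
    for name, queue in (body.get("queues") or {}).items():
        wait = (queue or {}).get("wait_seconds")
        if wait is not None and wait > max_wait_seconds:
            alerts.append(
                f"queue {name}: estimated wait {wait}s exceeds {max_wait_seconds}s"
            )

    failed = body.get("failed_jobs")
    if failed:  # None and 0 both mean there is nothing to report
        alerts.append(f"{failed} recently failed job(s)")
    return alerts
```

Because each queue is keyed by name, the loop needs no hard-coded list of queues and picks up any queue the endpoint reports.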
To enumerate the tenants a monitor should poll for tenant-level health, a super
admin can list them with GET /api/v1/landlord/tenants.
Setting up monitoring
- Alert on the status code. Treat any non-200 (especially 503) as a page-worthy event; the endpoints are built so a plain HTTP check is enough.
- Trend the depths. A single high reading can be a momentary spike; a depth or backlog age that climbs across consecutive polls is a stuck consumer.
- Watch failed_jobs. A non-zero, growing count signals jobs erroring out rather than simply queuing.
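The "climbs across consecutive polls" rule from the list above can be sketched as a small Python check. The name `is_climbing` and the default window of three polls are assumptions for illustration; the right window depends on your polling interval. The same check could be applied to depth, oldest_age_seconds, or failed_jobs readings.

```python
from typing import Optional, Sequence


def is_climbing(readings: Sequence[Optional[float]], min_polls: int = 3) -> bool:
    """True when each of the last `min_polls` readings exceeds the one before it.

    Hypothetical helper. `readings` is ordered oldest to newest; None marks a
    poll where the value was null (for example, the backend was unreachable).
    """
    if min_polls < 2:
        raise ValueError("min_polls must be at least 2 to compare readings")

    recent = list(readings)[-min_polls:]
    # Too little history, or a gap in it, is not treated as a trend.
    if len(recent) < min_polls or any(r is None for r in recent):
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A lone spike such as `[3, 50, 4]` does not trip the check, while a steady rise such as `[2, 4, 9]` does.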
Both endpoints require authentication, so configure your monitor with a Sanctum token. See REST API Authentication for how to issue one.
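For monitors that are scripted rather than configured, a poll might look like the following Python sketch using only the standard library. The helper names `build_health_request` and `poll`, the example base URL, and the 5-second timeout are assumptions; the endpoint path and the bearer-token scheme come from this guide. Note that `urllib` raises `HTTPError` for a 503, so the sketch reads the degraded JSON body from the exception instead of discarding it.

```python
import json
import urllib.error
import urllib.request
from typing import Any


def build_health_request(
    base_url: str, token: str, path: str = "/api/v1/health"
) -> urllib.request.Request:
    """Build a GET request carrying the Sanctum token as a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def poll(request: urllib.request.Request, timeout: float = 5.0) -> tuple[int, dict[str, Any]]:
    """Send the request and return (status_code, parsed_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # Non-2xx responses (including the 503 degraded case) arrive here.
        try:
            return err.code, json.loads(err.read())
        except ValueError:
            # Body was empty or not JSON; report the status code alone.
            return err.code, {}


# Example usage (hypothetical host and token):
# request = build_health_request("https://allegro.example.com", "YOUR_TOKEN")
# status_code, body = poll(request)
```

Passing `path="/api/v1/landlord/health"` with a super admin's token targets the platform endpoint instead.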