Health Checks

Allegro exposes two read-only HTTP endpoints designed to be polled by an external monitor (an uptime checker, a metrics scraper, or an on-call alerting system). One reports the health of a single tenant's queues; the other reports the health of the platform-wide job queues. Both return an HTTP 503 when their backing service can't be reached, so even a naive "is the status code 200?" check trips on a degraded state.

These endpoints are part of the REST API — see the REST API reference for the authoritative request and response schemas. This guide explains what they report and how to wire up monitoring against them.

Tenant health

GET /api/v1/health

Authenticated with a Sanctum bearer token; the token's user must be able to view the tenant. The endpoint reports the depth and oldest-item age of that tenant's Redis-backed queues — the Event Queue and the Heartbeat Queue.

{
    "status": "ok",
    "tenant": "tenant_123",
    "redis": { "reachable": true },
    "queues": {
        "events": { "depth": 0, "oldest_age_seconds": null },
        "heartbeats": { "depth": 12, "oldest_age_seconds": 34 }
    }
}

Field	Meaning
`status`	`ok` when Redis is reachable, `degraded` otherwise.
`redis.reachable`	Whether the queue's Redis backend responded.
`queues.*.depth`	Number of items currently waiting in the queue.
`queues.*.oldest_age_seconds`	Age in seconds of the oldest item still waiting, or `null` when the queue is empty.

When Redis cannot be reached, `status` is `degraded`, every queue value is `null`, and the response status code is 503.

A steadily rising `depth` or `oldest_age_seconds` means a consumer has fallen behind and is no longer draining the queue fast enough. See the Event Queue and Heartbeat Queue guides for how consumers read and delete items.
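As a rough illustration of reading this payload from a monitor, the sketch below pulls per-queue readings out of an already-parsed response body. The function name `tenant_queue_readings` is hypothetical. This guide does not specify whether a degraded response nulls each queue object or each field inside it, so the code tolerates both.

```python
from typing import Dict, Optional, Tuple

# (depth, oldest_age_seconds); either value may be None.
Reading = Tuple[Optional[int], Optional[int]]


def tenant_queue_readings(body: dict) -> Optional[Dict[str, Reading]]:
    """Return {queue_name: (depth, oldest_age_seconds)}, or None when degraded."""
    # A degraded response carries no usable queue data.
    if body.get("status") != "ok":
        return None
    readings: Dict[str, Reading] = {}
    for name, queue in (body.get("queues") or {}).items():
        queue = queue or {}  # tolerate a queue reported as null
        readings[name] = (queue.get("depth"), queue.get("oldest_age_seconds"))
    return readings


# The sample response shown above, as a parsed Python dict.
sample = {
    "status": "ok",
    "tenant": "tenant_123",
    "redis": {"reachable": True},
    "queues": {
        "events": {"depth": 0, "oldest_age_seconds": None},
        "heartbeats": {"depth": 12, "oldest_age_seconds": 34},
    },
}
print(tenant_queue_readings(sample))
# {'events': (0, None), 'heartbeats': (12, 34)}
```

An `oldest_age_seconds` of `None` next to a depth of 0 simply means the queue is empty, so a monitor should not treat it as missing data.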

Platform health

GET /api/v1/landlord/health

Authenticated with a Sanctum bearer token belonging to a super admin, and served only in the landlord context. It reports the backlog of every Horizon-managed background queue plus the recent failed-job count.

{
    "status": "ok",
    "redis": { "reachable": true },
    "queues": {
        "high": { "pending": 0, "wait_seconds": 0 },
        "default": { "pending": 2, "wait_seconds": 1 },
        "audience-sync": { "pending": 0, "wait_seconds": 0 }
    },
    "failed_jobs": 0
}

Field	Meaning
`status`	`ok` when the queue backend is reachable, `degraded` otherwise.
`redis.reachable`	Whether the queue backend responded.
`queues.*.pending`	Number of jobs waiting on that queue.
`queues.*.wait_seconds`	Horizon's estimated wait, in seconds, before a new job on that queue starts.
`failed_jobs`	Count of jobs that have recently failed across all queues.

Each queue is reported separately, keyed by its name. For the meaning of the `high`, `default`, and `audience-sync` queues and how they're prioritized, see Queue Management. When the backend is unreachable, `status` is `degraded`, `queues` is empty, `failed_jobs` is `null`, and the response status code is 503.
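One way a monitor might turn this payload into alerts is sketched below. The function name `platform_alerts` and the 60-second wait threshold are assumptions made for illustration, not values defined by Allegro; tune the threshold to your own workload.

```python
from typing import List


def platform_alerts(status_code: int, body: dict, max_wait_seconds: int = 60) -> List[str]:
    """Return human-readable alerts for one platform health poll."""
    # Unreachable backend: queues are empty and failed_jobs is null,
    # so there is nothing further to inspect.
    if status_code != 200 or body.get("status") != "ok":
        return [f"platform health degraded (HTTP {status_code})"]

    alerts: List[str] = []
    # `or {}` also covers an empty queues value serialized as [] instead of {}.
    for name, queue in (body.get("queues") or {}).items():
        wait = queue.get("wait_seconds") or 0
        if wait > max_wait_seconds:
            alerts.append(f"queue {name}: estimated wait {wait}s")

    failed = body.get("failed_jobs")
    if failed:  # skips both 0 and null
        alerts.append(f"{failed} recently failed jobs")
    return alerts
```

Called with the sample response above and a 200 status code, this returns an empty list, since no wait exceeds the threshold and `failed_jobs` is 0.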

To enumerate the tenants a monitor should poll for tenant-level health, a super admin can list them with GET /api/v1/landlord/tenants.

Setting up monitoring

Alert on the status code. Treat any non-200 response (especially 503) as a page-worthy event; the endpoints are built so that a plain HTTP check is enough.
Trend the depths. A single high reading can be a momentary spike; a depth or backlog age that climbs across consecutive polls points to a stuck or lagging consumer.
Watch `failed_jobs`. A non-zero, growing count signals jobs erroring out rather than simply queuing.
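The "climbs across consecutive polls" rule can be sketched as a small sliding-window check. The class name `TrendTracker` and the default window of four polls are hypothetical choices for illustration; the same tracker could be fed a queue depth, a backlog age, or the `failed_jobs` count.

```python
from collections import deque
from typing import Optional


class TrendTracker:
    """Flags a metric that rises on every poll across a fixed window."""

    def __init__(self, window: int = 4) -> None:
        self.samples = deque(maxlen=window)

    def add(self, value: Optional[int]) -> None:
        # A null reading (empty queue age, or an unreachable backend) ends
        # any run of increases, so start the window over.
        if value is None:
            self.samples.clear()
        else:
            self.samples.append(value)

    def is_climbing(self) -> bool:
        # Require a full window so a single spike never triggers an alert.
        if len(self.samples) < self.samples.maxlen:
            return False
        values = list(self.samples)
        return all(later > earlier for earlier, later in zip(values, values[1:]))
```

A strictly increasing window is a deliberately conservative test. A monitor that polls frequently may prefer to compare the first and last samples instead, so that one flat reading does not reset the signal.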

Note: Both endpoints require authentication, so configure your monitor with a Sanctum token. See REST API Authentication for how to issue one.
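For a monitor written in Python, an authenticated poll might look like the sketch below, which uses only the standard library. The helper names and the base URL are placeholders, and the code assumes the token is sent as a standard `Authorization: Bearer` header. A 503 is returned to the caller, not raised, so the degraded body can still be inspected.

```python
import json
import urllib.error
import urllib.request


def build_health_request(base_url: str, token: str, path: str = "/api/v1/health") -> urllib.request.Request:
    """Build a GET request carrying the bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_health(request: urllib.request.Request, timeout: float = 10.0):
    """Return (status_code, parsed_body_or_None) for one poll."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as error:
        # urllib raises on non-2xx codes; the error object still has the body.
        try:
            return error.code, json.loads(error.read())
        except ValueError:
            return error.code, None


# Hypothetical host and token; substitute your own deployment's values.
request = build_health_request("https://allegro.example.com", "YOUR_TOKEN")
```

Passing `path="/api/v1/landlord/health"` builds the platform-level request, which needs a super admin's token.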
