Alerting

Alerting turns a query into a watch. You write an OQL query, set a condition on its result, and Hexcovery evaluates it on a schedule. When the condition holds, Hexcovery notifies you; when it clears, it tells you that too. Each firing is recorded as an incident you can review later.

How a rule works

An alert rule has three moving parts:

  1. A query. An OQL expression that returns a single value per evaluation — for example the average CPU of a host, the p99 latency of a service, or the error rate of an endpoint.
  2. A condition. A comparison against that value: a fixed threshold (> 90) or an anomaly condition that flags departures from normal behavior instead of a fixed number.
  3. A schedule. How often the rule is evaluated. On each tick, the worker runs the query and checks the condition.
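
The three parts can be pictured as a small data structure plus a single evaluation step. This is a minimal sketch, not Hexcovery's API: the names `AlertRule`, `Condition`, and `evaluate_once` are hypothetical, only fixed-threshold conditions are modeled, and a plain Python callable stands in for running the OQL query.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Hypothetical comparison operators for a fixed-threshold condition.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Condition:
    op: str            # one of the keys in COMPARATORS
    threshold: float

    def holds(self, value: float) -> bool:
        return COMPARATORS[self.op](value, self.threshold)


@dataclass(frozen=True)
class AlertRule:
    name: str
    query: Callable[[], float]   # stand-in for an OQL query returning one value
    condition: Condition
    interval_seconds: int        # the schedule: how often the rule is evaluated


def evaluate_once(rule: AlertRule) -> bool:
    """One tick: run the query, then check the condition on its result."""
    value = rule.query()
    return rule.condition.holds(value)


rule = AlertRule(
    name="high-cpu",
    query=lambda: 93.5,              # pretend the query returned 93.5
    condition=Condition(">", 90),
    interval_seconds=60,
)
print(evaluate_once(rule))  # True, because 93.5 > 90
```

The result of `evaluate_once` is only the raw condition check for one tick; whether that leads to a notification depends on the rule's state, described in the lifecycle below.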

A pool of alert workers evaluates due rules continuously and in parallel, so adding rules does not slow down evaluation.
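
As a rough illustration of parallel evaluation, the sketch below checks a batch of due rules on a thread pool. It assumes each rule can be reduced to a callable returning whether its condition holds; the function name `evaluate_due_rules` and the pool size are hypothetical, and how Hexcovery's workers are actually implemented is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def evaluate_due_rules(
    rules: Dict[str, Callable[[], bool]], workers: int = 4
) -> Dict[str, bool]:
    """Run every due rule's check on a thread pool and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit all checks first so they can run concurrently.
        futures = {name: pool.submit(check) for name, check in rules.items()}
        # Then wait for each result.
        return {name: future.result() for name, future in futures.items()}


due = {
    "high-cpu": lambda: 93.5 > 90,
    "slow-api": lambda: 120 > 500,
}
print(evaluate_due_rules(due))  # {'high-cpu': True, 'slow-api': False}
```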

The lifecycle: OK → PENDING → FIRING → OK

Every rule is always in exactly one state:

State          Meaning
OK             The condition is not met. Nothing to do.
PENDING        The condition just became true, but the rule has a `for` duration and that time has not yet elapsed. The rule is watching to see whether the condition sustains.
FIRING         The condition has held continuously for the whole `for` duration. This is a real alert: notifications are sent.
OK (resolved)  The condition cleared. The incident is closed and a resolution notification is sent.

        condition met            held for `for`           condition clears
  OK ───────────────────► PENDING ───────────────► FIRING ───────────────► OK
        (start watching)        (sustained)              (resolved)

The `for` duration is what keeps a single noisy spike from paging you. A rule that fires on `cpu > 90 for 5m` only reaches FIRING if CPU stays above 90 for five solid minutes; a one-off blip moves it to PENDING and then straight back to OK with no notification sent.
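
The lifecycle can be sketched as a small state machine driven by one call per evaluation tick. This is an illustration under assumptions, not Hexcovery's implementation: `RuleState` and `step` are hypothetical names, time is passed in as plain seconds, and treating "elapsed time equal to the `for` duration" as enough to fire is a guess about the boundary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    OK = "OK"
    PENDING = "PENDING"
    FIRING = "FIRING"


@dataclass
class RuleState:
    for_seconds: float                     # the rule's `for` duration
    state: State = State.OK
    pending_since: Optional[float] = None  # when the condition first became true

    def step(self, condition_met: bool, now: float) -> State:
        """Advance the state machine by one evaluation tick."""
        if not condition_met:
            # Any tick where the condition is false resets the rule to OK.
            self.state = State.OK
            self.pending_since = None
            return self.state
        if self.state is State.OK:
            # Condition just became true: start watching.
            self.pending_since = now
            self.state = State.PENDING
        if (
            self.state is State.PENDING
            and now - self.pending_since >= self.for_seconds
        ):
            # Condition has held for the whole `for` duration.
            self.state = State.FIRING
        return self.state


rule = RuleState(for_seconds=300)          # for 5m
print(rule.step(True, now=0))              # State.PENDING
print(rule.step(False, now=60))            # State.OK (a blip, never fired)
```

Because a false tick clears `pending_since`, the `for` clock restarts from zero the next time the condition becomes true, which is what "held continuously" requires.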

What happens when a rule fires

When a rule reaches FIRING, Hexcovery sends a notification through every notification channel on the rule's escalation chain — Slack, PagerDuty, Discord, email, or any HTTP webhook. The same chain delivers a resolved notification when the rule returns to OK, so an acknowledged incident always has a clear end.

Notifications are de-duplicated: you are notified when state changes, not on every evaluation while the condition continues to hold.
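
One way to picture de-duplication is to derive notifications from state transitions rather than from evaluations. The sketch below replays a sequence of per-tick states and fans each notification out to every channel on a chain. The function names and the `"firing"` / `"resolved"` labels are hypothetical, and channels are modeled as plain callables rather than real Slack or PagerDuty integrations.

```python
from typing import Callable, List, Optional


def notification_for(previous: str, current: str) -> Optional[str]:
    """Return the kind of notification a transition produces, or None."""
    if current == "FIRING" and previous != "FIRING":
        return "firing"
    if previous == "FIRING" and current == "OK":
        return "resolved"
    return None  # no state change worth notifying about


def replay(states: List[str], channels: List[Callable[[str], None]]) -> None:
    """Walk the per-tick states and notify every channel on each transition."""
    previous = "OK"
    for current in states:
        kind = notification_for(previous, current)
        if kind is not None:
            for send in channels:
                send(kind)
        previous = current


slack: List[str] = []
email: List[str] = []
replay(
    ["PENDING", "FIRING", "FIRING", "FIRING", "OK", "OK"],
    channels=[slack.append, email.append],
)
print(slack)  # ['firing', 'resolved']
print(email)  # ['firing', 'resolved']
```

Three consecutive FIRING ticks produce a single `firing` message per channel, and a PENDING blip that returns to OK produces none.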

Reviewing what happened

Each transition into FIRING opens an incident. The Alert History page lists past and current incidents; each one has its own page with the metric chart at the time of the event, any annotations that overlap it (deploys, releases), and an AI-written summary of what the data was doing.
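
Finding the annotations that overlap an incident is an interval-overlap check. The sketch below is an assumption about how that could work, not a description of Hexcovery's data model: `Incident`, `Annotation`, and `overlapping` are hypothetical names, times are plain seconds, and a still-open incident is treated as extending to the current time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    label: str
    start: float
    end: float          # equal to start for a point-in-time marker


@dataclass
class Incident:
    opened_at: float                      # transition into FIRING
    resolved_at: Optional[float] = None   # None while the rule is still firing


def overlapping(
    incident: Incident, annotations: List[Annotation], now: float
) -> List[Annotation]:
    """Return annotations whose time range intersects the incident window."""
    end = incident.resolved_at if incident.resolved_at is not None else now
    return [
        a for a in annotations
        if a.start <= end and a.end >= incident.opened_at
    ]


incident = Incident(opened_at=100, resolved_at=200)
annotations = [
    Annotation("deploy", 150, 150),       # inside the incident window
    Annotation("release", 50, 60),        # finished before the incident opened
    Annotation("maintenance", 190, 250),  # starts before the incident resolved
]
print([a.label for a in overlapping(incident, annotations, now=300)])
# ['deploy', 'maintenance']
```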

In this section

  • Alert rules — write the query, the threshold, the for duration, and the escalation chain; preview, bulk-edit, and the per-organization rule limit.
  • Anomaly conditions — alert on unusual behavior (z-score, EMA, seasonal) instead of a fixed number.
  • Notification channels — configure webhooks and email, and the JSON payload they receive.
  • Incidents — Alert History and the per-incident page.
  • Annotations — event markers (deploys, releases, incidents) overlaid on your charts.
