Alert rules

An alert rule pairs an OQL query with a condition and a schedule. You create and manage rules on the Alert Rules page (under Alerts). This page walks through every field.

The condition: query + threshold

The core of a rule is a query that returns a single value, compared against a threshold.

Query. Write an OQL expression that produces one number per evaluation — an aggregate over a metric, a latency percentile, an error count, a log match rate. See the OQL examples for patterns you can adapt.
Comparison. Choose the operator (>, >=, <, <=, ==, !=).
Threshold. The value to compare against.

A rule reading "average CPU above 90" evaluates its query, takes the result, and checks it against 90 on every tick. Keep the query cheap and focused — it runs on the rule's schedule, not on demand.
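As a rough illustration of the check performed on each tick, the sketch below compares a query result against a threshold using one of the six operators listed above. It is not Hexcovery's implementation: the function name `condition_met` and the `COMPARATORS` table are hypothetical, and the query result is assumed to arrive as a plain number.

```python
import operator

# Map each comparison operator a rule can use to a Python function.
# (Hypothetical table for illustration only.)
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def condition_met(value: float, comparison: str, threshold: float) -> bool:
    """Return True when the query result satisfies the rule's condition."""
    return COMPARATORS[comparison](value, threshold)


# "average CPU above 90": the query returned 93.5 on one tick, 88.0 on another.
print(condition_met(93.5, ">", 90))  # True
print(condition_met(88.0, ">", 90))  # False
```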

Not a fixed number?

If "normal" for your metric drifts with time of day or traffic, a fixed threshold either pages too often or misses real problems. Use an anomaly condition instead — it compares the value against the metric's own recent history.

The `for` duration

The `for` duration sets how long the condition must hold continuously before the rule fires. While the condition is met but the duration has not yet elapsed, the rule sits in PENDING; it reaches FIRING only after the condition has held for the full `for` window. (See the lifecycle.)

Short `for` (or none) → fast paging, more sensitive to transient spikes.
Longer `for` → fewer false alarms, slower to fire on genuine sustained problems.

A `for` of a few minutes is a good default for most metric rules: long enough to ignore a single bad scrape, short enough to catch a real outage quickly.
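The PENDING to FIRING behavior can be sketched as a small state tracker. This is a minimal model under stated assumptions, not the product's code: the class `RuleState`, the state name "OK" for a rule whose condition is not met, and the 60-second tick interval are all hypothetical. Only PENDING and FIRING come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    for_seconds: float                 # the rule's `for` duration
    met_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, condition_met: bool, now: float) -> str:
        if not condition_met:
            # The condition must hold continuously, so any miss resets the clock.
            self.met_since = None
            return "OK"  # assumed name for the idle state
        if self.met_since is None:
            self.met_since = now
        if now - self.met_since >= self.for_seconds:
            return "FIRING"
        return "PENDING"


# A 3-minute `for`, evaluated every 60 seconds (assumed schedule).
state = RuleState(for_seconds=180)
ticks = [(0, True), (60, True), (120, False), (180, True),
         (240, True), (300, True), (360, True)]
history = [state.evaluate(met, now) for now, met in ticks]
print(history)
# ['PENDING', 'PENDING', 'OK', 'PENDING', 'PENDING', 'PENDING', 'FIRING']
```

In this run the single miss at 120 seconds resets the window, so the rule fires at 360 seconds rather than 180. That is the trade-off described above: a longer `for` ignores the transient dip but delays the page.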

The escalation chain

Each rule carries an ordered list of notification channels — its escalation chain. When the rule fires, Hexcovery delivers the alert down the chain; when it resolves, the chain receives the resolution. Channels are reusable across rules, so you configure them once (see Notification channels) and attach them here.
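To make the relationship between rules and channels concrete, here is a small sketch of an ordered chain built from reusable channels. Everything in it is hypothetical: the `Channel` and `Rule` classes, the channel names, the message format, and the "RESOLVED" event label are illustrative only, and the sketch delivers to every channel immediately, which may not match how Hexcovery times each step of the chain.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Channel:
    name: str
    send: Callable[[str], None]  # how this channel delivers a message


@dataclass
class Rule:
    name: str
    escalation_chain: List[Channel] = field(default_factory=list)

    def notify(self, event: str) -> None:
        # Walk the chain in its configured order.
        for channel in self.escalation_chain:
            channel.send(f"[{event}] {self.name}")


sent = []  # records (channel name, message) pairs for inspection

# Channels are defined once...
slack = Channel("team-slack", lambda msg: sent.append(("team-slack", msg)))
pager = Channel("on-call-pager", lambda msg: sent.append(("on-call-pager", msg)))

# ...and attached to as many rules as needed.
cpu_rule = Rule("High CPU", [slack, pager])
disk_rule = Rule("Disk full", [pager])

cpu_rule.notify("FIRING")    # the alert goes down the chain
cpu_rule.notify("RESOLVED")  # the same chain receives the resolution
```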

Anomaly conditions

Instead of a fixed threshold, a rule can use an anomaly condition that flags when the metric departs from its own normal range. This is the right choice for metrics with no obvious "bad number". The available methods (z-score, EMA, seasonal) and how to tune them are covered in Anomaly conditions.
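For intuition, the z-score idea can be sketched in a few lines: measure how many standard deviations the latest value sits from the mean of recent history, and flag it when that distance exceeds a cutoff. This is the textbook calculation, not necessarily how Hexcovery computes it. The function name `is_anomalous`, the five-point history, and the cutoff `max_z=3.0` are assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, value, max_z=3.0):
    """Flag `value` when it lies more than `max_z` sample standard
    deviations from the mean of `history` (needs at least 2 points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a departure.
        return value != mu
    return abs(value - mu) / sigma > max_z


recent = [100, 102, 98, 101, 99]   # assumed recent query results
print(is_anomalous(recent, 103))   # False: within the normal range
print(is_anomalous(recent, 120))   # True: far outside recent history
```

Note that no fixed "bad number" appears anywhere: the same rule would accept 120 if recent history had been hovering around 118.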

Preview before you save

Each rule has a preview chart that plots the query's recent results, so you can see where your threshold falls relative to real data before committing. Use it to sanity-check the threshold and `for` duration: does the plotted line cross the threshold where you expect, and does a normal day stay clear of it?

Bulk operations

The Alert Rules list supports bulk operations so you can manage many rules at once — for example enabling or disabling a group of rules together rather than editing them one by one. Disabling a rule stops its evaluation without deleting it.

The per-organization rule limit

Each organization has a maximum number of alert rules (`max_alert_rules`, 100 by default). Once you reach the limit, any attempt to create another rule is rejected. If you need more, an administrator can raise the limit for your organization.
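The limit behaves like a simple guard on rule creation. The sketch below shows the idea only: `max_alert_rules` and its default of 100 come from this page, while the function `check_can_create` and the `RuleLimitReached` exception are hypothetical names, and the real rejection may surface differently in the UI or API.

```python
DEFAULT_MAX_ALERT_RULES = 100  # default stated on this page


class RuleLimitReached(Exception):
    """Hypothetical error raised when an organization is at its limit."""


def check_can_create(current_rule_count: int,
                     max_alert_rules: int = DEFAULT_MAX_ALERT_RULES) -> None:
    # Creating a rule is rejected once the organization is at the limit.
    if current_rule_count >= max_alert_rules:
        raise RuleLimitReached(
            f"organization already has {current_rule_count} of "
            f"{max_alert_rules} allowed alert rules"
        )


check_can_create(99)                        # allowed: this becomes rule number 100
check_can_create(100, max_alert_rules=250)  # allowed after an administrator raises the limit
```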