Health Sweeps

The /health command runs an infrastructure health check: it scans incidents, alarms, error logs, metrics, and recent deployments over a lookback window, then produces a structured report.


Usage

/health          # 1-hour lookback (default)
/health 4h       # 4-hour lookback
/health 24h      # Full day
/health 15m      # Quick 15-minute check
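
To make the argument format concrete, here is a minimal sketch of how a lookback value such as 15m or 4h could be turned into a time window. It is an illustration only, not NeuBird's implementation: the function name parse_lookback and the error behavior are assumptions, and only the m and h suffixes shown above are handled.

```python
import re
from datetime import timedelta
from typing import Optional

# Hypothetical mapping; only the suffixes that appear in the usage examples.
_UNITS = {"m": "minutes", "h": "hours"}


def parse_lookback(arg: Optional[str] = None) -> timedelta:
    """Convert a lookback argument like '15m' or '4h' into a timedelta.

    With no argument, fall back to the documented 1-hour default.
    """
    if not arg:
        return timedelta(hours=1)
    match = re.fullmatch(r"(\d+)([mh])", arg.strip())
    if match is None:
        raise ValueError(f"unsupported lookback: {arg!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Under these assumptions, parse_lookback("24h") and a one-day timedelta compare equal, and an unrecognized value such as "soon" raises ValueError instead of silently using the default.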

What it checks

| Category    | Data Sources                    | What NeuBird Looks For                           |
|-------------|---------------------------------|--------------------------------------------------|
| Incidents   | PagerDuty, OpsGenie, ServiceNow | Active incidents, severity, acknowledgment status |
| Alarms      | CloudWatch, Datadog monitors    | Currently firing alarms, alarm history           |
| Error logs  | Grafana Loki, Splunk            | Error rate by service, new error patterns        |
| Metrics     | Prometheus, CloudWatch, Datadog | Resource saturation, latency trends, anomalies   |
| Deployments | GitHub, GitLab, ArgoCD          | Recent merges, deployment status, risky changes  |

Report structure

The health report is organized into severity-ranked sections:

| Section             | Description                                                 |
|---------------------|-------------------------------------------------------------|
| Bad                 | Active problems with evidence, blast radius, and timeline   |
| Good                | Services operating normally, with evidence to prove it      |
| Ugly                | Not broken yet, but trending the wrong way                  |
| Recommended Actions | Prioritized next steps ranked by urgency                    |

Each finding includes the specific data sources and queries that produced the evidence.
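
One way to picture a finding and its evidence is as a small record that carries its section, a summary, and the source and query pairs behind it. The sketch below is a hypothetical model for readers who want to post-process a report; the class names, field names, and the group_by_section helper are assumptions, not the actual output schema of /health.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Section names taken from the report structure described above.
SECTIONS = ("Bad", "Good", "Ugly", "Recommended Actions")


@dataclass
class Evidence:
    source: str  # data source name, e.g. "Prometheus" (illustrative)
    query: str   # the query that produced the evidence (illustrative)


@dataclass
class Finding:
    section: str
    summary: str
    evidence: List[Evidence] = field(default_factory=list)


def group_by_section(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Bucket findings under the report sections, keeping section order."""
    report: Dict[str, List[Finding]] = {name: [] for name in SECTIONS}
    for finding in findings:
        if finding.section not in report:
            raise ValueError(f"unknown section: {finding.section!r}")
        report[finding.section].append(finding)
    return report
```

Keeping the evidence attached to each finding, rather than in a separate appendix, mirrors the description above: every claim in the report can be traced back to the query that supports it.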