Health Sweeps
The /health command runs an infrastructure health check. It scans incidents, alarms, error logs, metrics, and recent deployments over a configurable lookback window and produces a structured report.
Usage
/health # 1-hour lookback (default)
/health 4h # 4-hour lookback
/health 24h # Full day
/health 15m # Quick 15-minute check
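To make the lookback argument concrete, here is a minimal Python sketch of how such a value could be turned into a time window. It is illustrative only: `parse_lookback` and `DEFAULT_LOOKBACK` are hypothetical names, not part of NeuBird, and the grammar is an assumption inferred from the examples above (a positive integer followed by `m` or `h`, with a 1-hour default). The real command may accept other units.

```python
import re
from datetime import timedelta
from typing import Optional

# Assumption: only the "m" (minutes) and "h" (hours) suffixes are shown
# in the usage examples, so only those two are handled here.
_UNITS = {"m": "minutes", "h": "hours"}

# The usage examples state that a bare /health looks back 1 hour.
DEFAULT_LOOKBACK = timedelta(hours=1)


def parse_lookback(arg: Optional[str] = None) -> timedelta:
    """Convert a lookback argument such as '4h' or '15m' into a timedelta.

    Hypothetical helper: returns the 1-hour default when no argument is given.
    """
    if arg is None or not arg.strip():
        return DEFAULT_LOOKBACK

    match = re.fullmatch(r"(\d+)([mh])", arg.strip())
    if match is None:
        raise ValueError(f"unsupported lookback value: {arg!r}")

    amount = int(match.group(1))
    if amount == 0:
        raise ValueError("lookback must be greater than zero")

    return timedelta(**{_UNITS[match.group(2)]: amount})
```

Under these assumptions, `parse_lookback("24h")` and `timedelta(days=1)` compare equal, which matches the "Full day" example.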
What it checks
| Category | Data Sources | What NeuBird Looks For |
|---|---|---|
| Incidents | PagerDuty, OpsGenie, ServiceNow | Active incidents, severity, acknowledgment status |
| Alarms | CloudWatch, Datadog monitors | Currently firing alarms, alarm history |
| Error logs | Grafana Loki, Splunk | Error rate by service, new error patterns |
| Metrics | Prometheus, CloudWatch, Datadog | Resource saturation, latency trends, anomalies |
| Deployments | GitHub, GitLab, ArgoCD | Recent merges, deployment status, risky changes |
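The table above can also be read as a coverage map: a category only yields findings if at least one of its data sources is connected. The following Python sketch encodes the table as a dictionary and reports which categories would have no data. The names `CHECK_SOURCES` and `uncovered_categories` are hypothetical, and the rule that a single connected source is enough to cover a category is an assumption, not documented NeuBird behavior.

```python
# Category-to-source mapping copied from the table above.
CHECK_SOURCES = {
    "Incidents": ["PagerDuty", "OpsGenie", "ServiceNow"],
    "Alarms": ["CloudWatch", "Datadog monitors"],
    "Error logs": ["Grafana Loki", "Splunk"],
    "Metrics": ["Prometheus", "CloudWatch", "Datadog"],
    "Deployments": ["GitHub", "GitLab", "ArgoCD"],
}


def uncovered_categories(connected: set[str]) -> list[str]:
    """Return the categories for which none of the listed sources is connected.

    Assumption: one connected source is enough to cover a category.
    """
    return [
        category
        for category, sources in CHECK_SOURCES.items()
        if not connected.intersection(sources)
    ]
```

For example, with only PagerDuty, Prometheus, and GitHub connected, this sketch would flag Alarms and Error logs as uncovered.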
Report structure
The health report groups findings by severity into the following sections:
| Section | Description |
|---|---|
| Bad | Active problems with evidence, blast radius, and timeline |
| Good | Services operating normally, with evidence to prove it |
| Ugly | Not broken yet, but trending the wrong way |
| Recommended Actions | Prioritized next steps ranked by urgency |
Each finding includes the specific data sources and queries that produced the evidence.
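One way to picture this structure is as a small data model. The sketch below is an assumption-laden illustration in Python, not NeuBird's actual report schema: the class names (`Evidence`, `Finding`, `HealthReport`), the field names, and the plain-text layout produced by `render` are all hypothetical. It captures only what the section states: four sections, in the order listed in the table, with each finding carrying the data sources and queries behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A data source and the query that produced the evidence (hypothetical shape)."""
    source: str
    query: str


@dataclass
class Finding:
    """A single finding plus the evidence that supports it."""
    summary: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class HealthReport:
    """Sections in the order given in the table above."""
    bad: list[Finding] = field(default_factory=list)
    good: list[Finding] = field(default_factory=list)
    ugly: list[Finding] = field(default_factory=list)
    recommended_actions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the report as plain text, one section after another."""
        lines: list[str] = []
        sections = (("Bad", self.bad), ("Good", self.good), ("Ugly", self.ugly))
        for title, findings in sections:
            lines.append(title)
            for finding in findings:
                lines.append(f"- {finding.summary}")
                for ev in finding.evidence:
                    lines.append(f"    {ev.source}: {ev.query}")
        lines.append("Recommended Actions")
        # Actions are assumed to be stored most urgent first.
        for rank, action in enumerate(self.recommended_actions, start=1):
            lines.append(f"{rank}. {action}")
        return "\n".join(lines)
```

Keeping the evidence attached to each finding, rather than in a separate appendix, mirrors the statement that every finding includes the sources and queries behind it.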