Quick Start¶
After configuring your MCP client (see the Installation section), follow these steps to run your first investigation. This guide assumes you are using Claude Code.
Test Hawkeye Connection¶
Try asking questions like these in your MCP client: "What Hawkeye projects do I have?" or "Show me my uninvestigated incidents."
You should get a list of your Hawkeye projects and uninvestigated incidents.
Your First Investigation¶
Let's investigate an alert from start to finish.
1. Find Uninvestigated Alerts¶
Ask Claude something like: "Show me my uninvestigated alerts."
This uses hawkeye_list_sessions with only_uninvestigated=true.
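Under the hood, your client issues an MCP tools/call request shaped roughly like this (a sketch following the MCP JSON-RPC format; only the only_uninvestigated argument is confirmed by this guide):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "hawkeye_list_sessions",
    "arguments": { "only_uninvestigated": true }
  }
}
```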
Expected output:
Found 3 uninvestigated alerts:
1. High CPU on prod-web-server-01
Alert ID: /subscriptions/.../alerts/cpu-spike-123
Severity: P1
Time: 2024-01-15 14:23 UTC
2. Database connection timeout
Alert ID: /subscriptions/.../alerts/db-timeout-456
Severity: P2
Time: 2024-01-15 13:45 UTC
3. Memory leak in api-service
Alert ID: /subscriptions/.../alerts/memory-leak-789
Severity: P2
Time: 2024-01-15 12:10 UTC
2. Investigate an Alert¶
Ask Claude something like: "Investigate the high CPU alert on prod-web-server-01."
Claude will:
- Extract the alert_id from the list
- Call hawkeye_investigate_alert to start the investigation
- Monitor real-time progress updates showing current steps and data sources
- Wait for investigation to complete (usually 30-60 seconds)
- Retrieve the Root Cause Analysis
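The kickoff step corresponds to an MCP tools/call request along these lines (a sketch; the alert_id argument name is inferred from the step above, not confirmed):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "hawkeye_investigate_alert",
    "arguments": { "alert_id": "/subscriptions/.../alerts/cpu-spike-123" }
  }
}
```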
Real-Time Progress Tracking
When you start an investigation, Hawkeye automatically streams progress updates. You'll see live updates on what the investigation is doing, which data sources it's consulting, and how far along it is. Use hawkeye_get_investigation_status to check progress at any time.
Expected output:
Investigation complete! Here's what I found:
Root Cause Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Incident: High CPU on prod-web-server-01
Severity: P1
Status: Resolved
Root Cause:
A runaway cron job (backup-cleanup.sh) was consuming
excessive CPU due to an infinite loop caused by a race
condition when processing large file lists.
Timeline:
1. 14:15 UTC - Backup cleanup cron job started
2. 14:18 UTC - CPU spiked to 95% on prod-web-server-01
3. 14:23 UTC - CloudWatch alarm triggered
4. 14:25 UTC - Auto-remediation killed the process
5. 14:26 UTC - CPU returned to normal (12%)
Corrective Actions:
1. Kill the runaway process
Status: ✓ Auto-executed
2. Add timeout to backup script
bash:
sed -i '1a timeout 300' /opt/scripts/backup-cleanup.sh
3. Add file count limit
bash:
find /backups -mtime +30 | head -n 1000 | xargs rm
Business Impact:
- API response time increased 200ms (14:18-14:26)
- No customer-facing errors
- Estimated impact: 50 users affected
Time Saved: 45 minutes
3. Get Investigation Link¶
Ask Claude something like: "Give me a link to this investigation."
This provides a direct URL to view the investigation in the Hawkeye web interface, making it easy to share with team members or bookmark for later reference.
4. Ask Follow-up Questions¶
Ask Claude something like: "Has this happened before?"
This uses hawkeye_continue_investigation to dig deeper.
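Behind the scenes this is another tool call; a sketch, with hypothetical argument names (session_id, question):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "hawkeye_continue_investigation",
    "arguments": {
      "session_id": "cpu-spike-123",
      "question": "Has this happened before?"
    }
  }
}
```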
Expected output:
Looking deeper into the history...
This is the 3rd occurrence in the past month:
- Jan 15: High CPU (this incident)
- Dec 28: Similar CPU spike, manually resolved
- Dec 12: CPU spike, auto-resolved
Pattern Analysis:
All incidents occurred when backup directory
contained >10,000 files. The script lacks proper
file count handling.
Recommendation:
Implement the corrective actions to prevent
recurrence. The auto-remediation worked, but
fixing the root cause will eliminate these
incidents entirely.
5. Get Actionable Insights¶
The RCA includes ready-to-execute bash scripts:
# Add timeout to prevent infinite loops
sed -i '1a timeout 300' /opt/scripts/backup-cleanup.sh
# Add file count limit (-r skips rm when nothing matches)
find /backups -mtime +30 -type f | head -n 1000 | xargs -r rm
# Append a daily cleanup entry without clobbering existing crontab lines
(crontab -l 2>/dev/null; echo "0 2 * * * timeout 300 /opt/scripts/backup-cleanup.sh") | crontab -
Common Workflows¶
List Projects¶
Ask Claude something like: "What Hawkeye projects do I have?"
List Connections¶
Ask Claude something like: "What connections are configured in my project?"
Get Investigation History¶
Ask Claude something like: "Show me my recent investigations."
Check Performance Metrics¶
Ask Claude something like: "Show me this month's incident report."
This uses hawkeye_get_incident_report for organization-wide analytics.
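The metrics request can be sketched as a tool call like the following (the time_range argument is hypothetical; the real tool may take different parameters):

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "hawkeye_get_incident_report",
    "arguments": { "time_range": "last_30_days" }
  }
}
```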
What's Next?¶
Now that you've completed your first investigation, choose your next step:
Managing Connections → Connect AWS, Azure, Datadog, and more
Using Instructions → Guide Hawkeye's investigation behavior
Complete Onboarding → Full setup from scratch to production
Examples → See real-world examples and workflows
Tips for Success¶
Start Simple¶
Don't try to configure everything at once. Start with:
- One project
- One connection (your primary monitoring tool)
- A few basic instructions
Use the Guidance System¶
Ask Hawkeye for help directly, for example: "What should I configure next?"
This uses hawkeye_get_guidance for interactive help.
Test Instructions Before Deploying¶
Always test instructions on past sessions before adding to your project:
- Validate instruction
- Apply to test session
- Rerun investigation
- Compare results
- Add to project if improved
See Using Instructions for details.
Monitor Your Analytics¶
Check your incident statistics regularly:
Track:
- MTTR (Mean Time To Resolution)
- Time saved vs. manual investigation
- Investigation quality scores
- Noise reduction from filtering
Troubleshooting¶
No Tools Available¶
Problem: Claude says Hawkeye tools aren't available
Solution:
- Check Claude Desktop config is correct
- Restart Claude Desktop completely
- Check for errors in the Claude Desktop MCP logs (on macOS: ~/Library/Logs/Claude/mcp*.log)
Authentication Failed¶
Problem: 401 Unauthorized error
Solution:
- Verify credentials are correct
- Check that HAWKEYE_BASE_URL ends with /api
- Test login at the Hawkeye web UI
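You can sanity-check the URL shape from a shell (the value shown is a placeholder; substitute your actual endpoint):

```shell
# Hypothetical value; use your real Hawkeye endpoint here
HAWKEYE_BASE_URL="https://hawkeye.example.com/api"

# The base URL must end with /api
case "$HAWKEYE_BASE_URL" in
  */api) echo "URL format OK" ;;
  *)     echo "URL should end with /api" ;;
esac
```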
Investigation Taking Too Long¶
Problem: Investigation running for several minutes
Solution:
This is normal for the first investigation on a new project, while Hawkeye:
- Syncs your connections (may take 5-10 minutes)
- Indexes your data sources
- Builds correlation models
Subsequent investigations are much faster (30-60 seconds).
No Uninvestigated Alerts¶
Problem: List shows no uninvestigated alerts
Solution:
This is actually good news! It means:
- All alerts have been investigated, or
- Filters are working and removing noise, or
- No alerts in the time period
Try expanding the date range, for example: "Show me alerts from the last 30 days."
Getting Help¶
- Documentation: full guides in the guides section
- Examples: real-world examples in the examples section
- Support: contact NeuBird for help
Next Steps¶
Choose your path:
Follow the comprehensive onboarding guide in Complete Onboarding, or see practical examples and workflows in Examples.