Complete Setup Example¶
End-to-end example of setting up Hawkeye MCP from scratch.
Scenario¶
Company: Acme Corp
Environment: Production AWS + Datadog
Goal: Autonomous incident investigation with <10 minute MTTR
Step 1: Install Hawkeye MCP¶
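The install command isn't shown in this example; a minimal sketch, assuming the server ships as a Python package whose entry point matches the hawkeye-mcp command used in the config below (the package name is an assumption; check the vendor's install docs for the actual distribution channel):
pipx install hawkeye-mcp   # hypothetical package name -- adjust to the real distribution
which hawkeye-mcp          # confirm the binary is on PATH so Claude Desktop can launch it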
Step 2: Configure Claude Desktop¶
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "hawkeye": {
      "command": "hawkeye-mcp",
      "args": [],
      "env": {
        "HAWKEYE_EMAIL": "ops@acmecorp.com",
        "HAWKEYE_PASSWORD": "SecurePassword123!",
        "HAWKEYE_BASE_URL": "https://app.neubird.ai/api"
      }
    }
  }
}
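Before restarting, it's worth confirming the file parses as valid JSON; a quick check with Python's bundled json.tool (macOS path as above):
python3 -m json.tool ~/Library/'Application Support'/Claude/claude_desktop_config.json > /dev/null \
  && echo "config OK"   # prints "config OK" only if the JSON parses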
Restart Claude Desktop.
Step 3: Create Project¶
Claude conversation:
User: Create a new Hawkeye project called "Production" for our AWS production environment
Claude: ✓ Created project "Production"
UUID: 550e8400-e29b-41d4-a716-446655440000
Status: Active
Step 4: Add AWS Connection¶
Prepare AWS Role:
# Create the read-only IAM role from the CloudFormation template
aws cloudformation create-stack \
  --stack-name hawkeye-readonly-role \
  --template-body file://hawkeye-role.yaml
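The hawkeye-role.yaml template itself isn't shown in this example; a sketch of what it might contain, using the role name and external ID from this walkthrough (the trusted account ID is a placeholder you would get from Hawkeye/NeuBird, and ReadOnlyAccess is AWS's managed read-only policy):
# hawkeye-role.yaml -- illustrative sketch; replace the placeholder account ID
AWSTemplateFormatVersion: '2010-09-09'
Description: Read-only cross-account role for Hawkeye investigations
Resources:
  HawkeyeReadOnlyRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: HawkeyeReadOnly
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: arn:aws:iam::<HAWKEYE_AWS_ACCOUNT_ID>:root
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                sts:ExternalId: acme-prod-hawkeye-2024
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/ReadOnlyAccess
Outputs:
  RoleArn:
    Value: !GetAtt HawkeyeReadOnlyRole.Arn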
Create connection:
User: Create an AWS connection named "Production AWS" with:
- Role ARN: arn:aws:iam::123456789012:role/HawkeyeReadOnly
- External ID: acme-prod-hawkeye-2024
- Regions: us-east-1, us-west-2
Claude: ✓ Created AWS connection "Production AWS"
UUID: conn-aws-550e8400
Status: SYNCING
Syncing your AWS resources...
This may take 5-10 minutes for the first sync.
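If the connection errors out rather than syncing, confirm the role from the CloudFormation step exists and carries the expected trust policy (standard AWS CLI call; role name from this example):
aws iam get-role --role-name HawkeyeReadOnly \
  --query 'Role.{Arn: Arn, Trust: AssumeRolePolicyDocument}'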
Wait for sync:
User: Wait for my AWS connection to finish syncing
Claude: ⏳ Syncing... (2 minutes elapsed)
⏳ Syncing... (4 minutes elapsed)
⏳ Syncing... (6 minutes elapsed)
✓ Sync complete! (7 minutes total)
Discovered resources:
- 52 EC2 instances
- 8 RDS databases
- 35 Lambda functions
- 180 CloudWatch log groups
- 600+ CloudWatch metrics
Step 5: Add Datadog Connection¶
User: Create a Datadog connection named "Production Datadog" with:
- API Key: a1b2c3d4e5f6g7h8i9j0
- App Key: k1l2m3n4o5p6q7r8s9t0
- Endpoint: datadoghq.com
Claude: ✓ Created Datadog connection "Production Datadog"
UUID: conn-dd-660f9511
Status: SYNCING
Syncing... (1 minute)
✓ Sync complete!
Discovered resources:
- 500+ Datadog metrics
- 200+ APM services
- 150 log pipelines
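If the sync stalls on authentication, the API key can be checked directly against Datadog's key-validation endpoint (the key below is the placeholder from this example):
curl -s "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: a1b2c3d4e5f6g7h8i9j0"
# A valid key returns {"valid":true}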
Step 6: Link Connections to Project¶
User: Add my AWS and Datadog connections to the Production project
Claude: ✓ Added connections to Production project:
- Production AWS (AWS)
- Production Datadog (Datadog)
Your project can now access:
- AWS CloudWatch logs & metrics
- EC2, RDS, Lambda resources
- Datadog logs, metrics & traces
Step 7: Create System Instruction¶
User: Create a SYSTEM instruction for my Production project:
"Our production environment:
- Microservices architecture on AWS EKS
- 5 main services: api-gateway, user-service, order-service, payment-service, notification-service
- PostgreSQL RDS 13 for primary database with 2 read replicas
- Redis ElastiCache for session storage and caching
- All services instrumented with Datadog APM
- Logging: CloudWatch + Datadog (centralized)
- Deployment: Rolling updates via ArgoCD
- Peak traffic: 1200 req/sec weekdays 2-6pm EST
- SLA: 99.9% uptime, <200ms p95 API latency"
Claude: ✓ Created SYSTEM instruction
Name: "Production Architecture Context"
Status: Active
This context will be included in all investigations.
Step 8: Create Filter Instruction¶
User: Create a FILTER instruction for Production:
"Only investigate incidents with:
- Severity P1 (critical) or P2 (high)
- Not from automated testing systems
- Not during scheduled maintenance windows
- Affecting production workloads (not canary/staging)"
Claude: ✓ Created FILTER instruction
Name: "Production Priority Filter"
Status: Active
This will reduce noise by automatically filtering out P3 and lower-severity alerts.
Step 9: Create RCA Instructions¶
Database Issues¶
User: Create an RCA instruction:
"For database-related incidents:
1. Check PostgreSQL slow query log (queries >100ms)
2. Get EXPLAIN ANALYZE output for slow queries
3. Identify missing indexes using pg_stat_user_tables
4. Review connection pool metrics (max: 100 connections)
5. Check for table locks and lock wait times
6. Provide specific CREATE INDEX CONCURRENTLY commands
7. Estimate performance improvement"
Claude: ✓ Created RCA instruction
Name: "Database Performance Investigation"
Status: Active
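Step 3 of this instruction leans on pg_stat_user_tables; for reference, a query along these lines (an illustrative sketch with arbitrary thresholds, run against the production database) surfaces missing-index candidates:
-- Tables scanned sequentially far more often than via an index
-- are good candidates for a missing index.
SELECT relname,
       seq_scan,
       seq_tup_read,
       idx_scan,
       n_live_tup
  FROM pg_stat_user_tables
 WHERE seq_scan > COALESCE(idx_scan, 0)
   AND n_live_tup > 100000
 ORDER BY seq_tup_read DESC
 LIMIT 10;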
API Latency¶
User: Create an RCA instruction:
"For API latency issues:
1. Check Datadog APM traces for affected endpoints
2. Identify slowest span in the trace
3. Review database query performance within spans
4. Check external API call timeouts
5. Examine Redis cache hit rates
6. Review EKS pod resource utilization (CPU/memory)
7. Provide specific optimization recommendations"
Claude: ✓ Created RCA instruction
Name: "API Latency Investigation Steps"
Status: Active
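Steps 5 and 6 of this instruction correspond to checks you can also run by hand; a rough sketch (the ElastiCache endpoint, namespace, and label selector are hypothetical, and kubectl top requires metrics-server in the cluster):
# Redis cache hit rate: keyspace_hits / (keyspace_hits + keyspace_misses)
redis-cli -h <elasticache-endpoint> info stats | grep -E 'keyspace_(hits|misses)'

# CPU/memory utilization for the affected service's pods on EKS
kubectl top pods -n production -l app=payment-service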
Step 10: First Investigation¶
User: Show me uninvestigated alerts from the last 24 hours
Claude: Found 3 uninvestigated incidents:
1. High API Latency - payment-service
Alert ID: /aws/cloudwatch/alerts/latency-spike-20240115-1430
Severity: P1
Time: 2024-01-15 14:30 UTC
Source: Datadog
2. Database Connection Pool Warning
Alert ID: /aws/cloudwatch/alerts/db-pool-20240115-1315
Severity: P2
Time: 2024-01-15 13:15 UTC
Source: CloudWatch
3. Memory Usage High - user-service
Alert ID: /aws/cloudwatch/alerts/memory-20240115-1210
Severity: P2
Time: 2024-01-15 12:10 UTC
Source: CloudWatch
User: Investigate the payment-service latency issue
Claude: 🔍 Starting investigation...
Using instructions:
- Production Architecture Context
- Database Performance Investigation
- API Latency Investigation Steps
⏳ Analyzing Datadog APM traces... (20s)
⏳ Checking CloudWatch logs... (15s)
⏳ Reviewing RDS metrics... (12s)
⏳ Correlating events... (18s)
✓ Investigation complete! (65s total)
Step 11: Review RCA¶
User: Show me the RCA
Claude: Root Cause Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Incident: High API Latency - payment-service
Severity: P1
Duration: 15 minutes (14:30-14:45 UTC)
Status: Auto-resolved
ROOT CAUSE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
New payment validation query introduced in v2.1.0 deployment
was missing index on payments.merchant_id column. This caused
full table scans on 5M row table, resulting in:
- Query time: 850ms (vs normal 5ms)
- Database CPU spike: 45% → 85%
- Connection pool saturation: 95/100
- API latency increase: 50ms → 900ms
TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
14:25 - Deployment v2.1.0 completed via ArgoCD
14:30 - Payment API latency increased to 900ms
14:32 - Database CPU spiked to 85%
14:33 - Connection pool 95/100 connections used
14:35 - Datadog APM alarm triggered (p95 >500ms)
14:40 - Auto-scaling added 2 more RDS read replicas
14:42 - Database CPU distributed, dropped to 60%
14:45 - API latency returned to normal (55ms)
CORRECTIVE ACTIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Immediate (AUTO-EXECUTED):
✓ Added 2 RDS read replicas
✓ Distributed query load
Required:
1. Add missing database index
psql -h prod-db.us-east-1.rds.amazonaws.com \
  -U admin -d production
CREATE INDEX CONCURRENTLY idx_payments_merchant_id
  ON payments(merchant_id);
Estimated improvement: 850ms → 5ms (99.4% faster)
2. (Optional) Rollback deployment v2.1.0
argocd app rollback payment-service --revision 2.0.9
3. Review PR #456 query optimization before redeploying
PREVENTIVE MEASURES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Add database query performance tests to CI/CD pipeline
2. Enable pg_stat_statements for query monitoring
3. Add pre-deployment query plan review to checklist
4. Configure proactive connection pool alerts (>80%)
5. Implement query timeout guards (<100ms) in application
BUSINESS IMPACT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
- Payment API requests degraded: 50ms → 900ms
- Duration: 15 minutes
- Affected requests: ~3,000 (≈200 payment requests/min × 15 min)
- Failed requests: 0 (degraded but not failing)
- Estimated affected users: ~800
- No financial loss (all payments eventually processed)
TIME SAVED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Manual investigation estimate: 45 minutes
- Check logs: 10 min
- Query Datadog APM: 10 min
- Database analysis: 15 min
- Correlation & RCA: 10 min
Hawkeye investigation time: 65 seconds
Time saved: 43 minutes 55 seconds ⚡
Step 12: Execute Fix¶
# Connect to RDS from a shell
psql -h prod-db.us-east-1.rds.amazonaws.com -U admin -d production

-- Inside psql: add the index (CONCURRENTLY avoids locking the table)
CREATE INDEX CONCURRENTLY idx_payments_merchant_id
  ON payments(merchant_id);

-- Verify the index exists
\d payments
-- Output lists the new index:
--   "idx_payments_merchant_id" btree (merchant_id)
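Optionally, confirm the planner actually picks up the new index before closing the incident (this wraps the verification query from the next step in EXPLAIN ANALYZE; run inside psql):
EXPLAIN ANALYZE
SELECT merchant_id, COUNT(*)
  FROM payments
 WHERE merchant_id = 'merch_123'
 GROUP BY merchant_id;
-- The plan should show an Index Only Scan (or Index Scan) on idx_payments_merchant_id
-- instead of a Seq Scan on payments.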
Step 13: Verify Fix¶
User: Test payment API latency now
-- Direct query against the database in psql (\timing on)
SELECT merchant_id, COUNT(*)
  FROM payments
 WHERE merchant_id = 'merch_123'
 GROUP BY merchant_id;
-- Time: 4.8 ms (was 850 ms) ✓
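The RCA's preventive measures also call for pg_stat_statements; a minimal sketch for PostgreSQL 13 (on RDS this additionally requires adding pg_stat_statements to shared_preload_libraries via the parameter group and rebooting):
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The slowest statements by average execution time are then one query away
SELECT query, calls, mean_exec_time, total_exec_time
  FROM pg_stat_statements
 ORDER BY mean_exec_time DESC
 LIMIT 10;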
Results After 30 Days¶
User: Show me incident statistics for the last 30 days
Claude: Incident Analytics - Production Project
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INVESTIGATIONS:
- Total incidents: 67
- Auto-investigated: 67 (100%)
- Average quality score: 91/100
- Perfect RCAs (95-100): 52 (78%)
PERFORMANCE:
- MTTR: 7.2 minutes (vs 40min manual baseline)
- Time saved: 36.7 hours (82% reduction)
- Fastest resolution: 1.5 minutes
- Slowest resolution: 22 minutes
NOISE REDUCTION:
- Alerts received: 340
- Filtered by instructions: 273 (80%)
- Investigated: 67 (20%)
- Grouped incidents: 15 groups
TOP INCIDENT TYPES:
1. API Latency (18 incidents)
2. Database Issues (15 incidents)
3. Memory Leaks (12 incidents)
4. Deployment Failures (10 incidents)
5. External API Timeouts (7 incidents)
6. Other (5 incidents)
BUSINESS IMPACT:
- Prevented downtime: 4.2 hours
- Maintained SLA: 99.92% (target: 99.9%)
- Cost avoidance: $125,000 (downtime cost prevented)
Key Takeaways¶
✅ Setup time: 45 minutes
✅ MTTR: 7.2 minutes (82% improvement)
✅ Time saved: 36.7 hours in the first month
✅ Noise reduction: 80% of alerts auto-filtered
✅ SLA maintenance: exceeded the 99.9% target
Next Steps¶
- Add more RCA instructions for edge cases
- Integrate with Slack for notifications
- Set up weekly review process
- Train team on Hawkeye workflows