Now monitoring infrastructure in 37 regions

Your Fly.io Infrastructure.
On Autopilot.

Meet Caveman — the AI assistant that monitors, manages, and roasts your servers so you don't have to. One-man IT department. Zero brain cells required.

caveman — mission control
🦴

Ugh. CPU at 98% on us-east-1 for 3 hours straight. You deploy spaghetti code again?

Maybe bump machines from 3 → 5 in same region? Or... write better code. Just saying. 🔥

Scale it up, Caveman.
You
🦴

Done. 5 machines running in us-east-1. CPU already dropping. You owe me a steak. 🥩

Features

Everything You Need. Nothing You Don't.

Dashboard

Mission Control.
But Make It Caveman.

Your main dashboard is the nerve center of your infrastructure. Real-time server status, performance metrics, traffic flows, and scaling controls — all in one place. No more SSH-ing into 47 different machines to figure out what's on fire.

  • Real-time server monitoring across all regions
  • One-click node scaling — drag a slider, get more compute
  • Live traffic visualization — see where your users actually are
  • CPU, RAM, network — all updating live
Mission Control Live

CPU

42%

Memory

2.4 GB

Network

847 MB

Machines 5 running
🦴

Hey chief. Memory usage creeping up on app-worker-3. Looks like a leak. I'd restart that bad boy before it takes everyone down.

🦴

Traffic from EU doubled since last Tuesday. Maybe time to add a Frankfurt region? I can set it up in 30 seconds. You just say word.

🦴

CPU been at 100% for 6 hours. At this point... maybe think about optimizing code? Or throw more cores at it. Your call, boss. 🤷

AI Assistant

Meet Caveman.
Your One-Man IT Department.

Caveman watches your infrastructure 24/7 so you can sleep. When CPU hits the ceiling, traffic spikes, or something smells funny — Caveman slides into your dashboard with advice that actually makes sense.

Think of it as a senior DevOps engineer who communicates exclusively in plain English. It doesn't just alert you — it tells you exactly what to do about it.

Automation

Set Rules.
Caveman Does the Rest.

Define your own scaling rules and let Caveman handle the grunt work. Job queue above 1,000? Spin up 5 machines. Below 3? Scale back down. CPU over 90% for 10 minutes? Add more cores. You write the rules, Caveman follows orders — no questions asked. Well, maybe some sarcastic comments.

  • Custom scaling triggers — CPU, memory, queue depth, anything
  • Scale up AND down automatically — no wasted spend
  • Caveman notifies you every time a rule fires
  • Override anytime — you're still the boss
Automation Rules 3 active
Scale Up — High Load Active

If job queue > 1,000 → scale to 5 machines

Triggered 12 times this week

Scale Down — Low Traffic Active

If job queue < 3 → scale to 1 machine

Triggered 8 times this week

CPU Alert — Hot Potato Active

If CPU > 90% for 10min → add 2 cores + notify

Triggered 3 times this week
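For the curious, here is a minimal sketch of how the three rules above could be modeled as data plus a condition check. It is an illustration only, not Caveman's actual rule engine: the `Metrics` and `Rule` types, the field names, and `fired_actions` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Metrics:
    """Hypothetical snapshot of the values the rules look at."""
    queue_depth: int        # jobs currently waiting in the queue
    cpu_percent: float      # current CPU utilization
    cpu_high_minutes: int   # minutes CPU has stayed above 90%


@dataclass
class Rule:
    name: str
    condition: Callable[[Metrics], bool]
    action: str


# The three rules from the widget above, written as plain predicates.
RULES = [
    Rule("Scale Up - High Load",
         lambda m: m.queue_depth > 1000,
         "scale to 5 machines"),
    Rule("Scale Down - Low Traffic",
         lambda m: m.queue_depth < 3,
         "scale to 1 machine"),
    Rule("CPU Alert - Hot Potato",
         lambda m: m.cpu_percent > 90 and m.cpu_high_minutes >= 10,
         "add 2 cores + notify"),
]


def fired_actions(metrics: Metrics) -> List[str]:
    """Return a description of every rule whose condition holds."""
    return [f"{r.name}: {r.action}" for r in RULES if r.condition(metrics)]
```

Keeping each rule as a name, a predicate, and an action string makes it easy to report which rule fired, which is what the "Triggered N times this week" counters would need.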

Live Logs
us-east-1

12:04:32 [info] GET /api/users 200 in 12ms

12:04:33 [info] POST /api/orders 201 in 45ms

12:04:33 [warn] Slow query detected: 890ms

12:04:34 [error] Connection timeout to db-replica-2

12:04:34 [info] Retrying connection... success

12:04:35 [info] GET /api/health 200 in 2ms

12:04:36 [info] Worker processed 142 jobs

Logs

Every Log.
Instantly Searchable.

Stream logs from every machine, every region, in real time. Full-text search across your entire infrastructure. Find that one needle in your haystack of stdout before your boss does.

  • Real-time log streaming across all instances
  • Full-text search with regex support
  • Filter by region, app, instance, or severity
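To make the search and filter idea concrete, here is a small sketch that filters log lines by severity and regex. It assumes the `HH:MM:SS [level] message` layout of the sample lines in the Live Logs widget; the `search` function and the `LOG_LINE` pattern are hypothetical and do not represent Caveman's API.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed line layout, taken from the sample log lines: "12:04:33 [warn] message"
LOG_LINE = re.compile(r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$")

SAMPLE = [
    "12:04:32 [info] GET /api/users 200 in 12ms",
    "12:04:33 [info] POST /api/orders 201 in 45ms",
    "12:04:33 [warn] Slow query detected: 890ms",
    "12:04:34 [error] Connection timeout to db-replica-2",
    "12:04:34 [info] Retrying connection... success",
    "12:04:35 [info] GET /api/health 200 in 2ms",
    "12:04:36 [info] Worker processed 142 jobs",
]


def search(lines: Iterable[str], pattern: str,
           level: Optional[str] = None) -> Iterator[str]:
    """Yield lines whose message matches `pattern`, optionally by severity."""
    matcher = re.compile(pattern)
    for line in lines:
        parsed = LOG_LINE.match(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        if level is not None and parsed["level"] != level:
            continue
        if matcher.search(parsed["message"]):
            yield line
```

For example, `search(SAMPLE, r"\d{3,}ms")` picks out anything that took 100 ms or longer, and `search(SAMPLE, "timeout", level="error")` narrows the search to errors only.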
Metrics

Numbers That Actually
Mean Something.

CPU, memory, network I/O, request latency, error rates — visualized in dashboards that update in real time. No Grafana PhD required. Spot trends, catch anomalies, and impress people in standups with charts you didn't have to configure.

  • Pre-built dashboards for the metrics that matter
  • Anomaly detection — Caveman flags the weird stuff
  • Historical data — trends over hours, days, or weeks
Metrics — Last 24h All regions

Request Latency (ms)

Avg Latency

23ms

Requests

1.2M

Error Rate

0.02%

Your Dashboards

Mission Control

Default — 4 widgets

On

Logs & Search

Default — 2 widgets

On

Metrics

Default — 3 widgets

Off

Deploy Pipeline

Custom — 6 widgets

On
Customizable

Your Cave.
Your Rules.

Don't like our dashboards? Build your own. Drag, drop, and arrange widgets however your brain works. Want to disable everything we built and start from scratch? We respect that energy.

  • Drag-and-drop dashboard builder
  • Enable, disable, or rearrange any default dashboard
  • Create unlimited custom views
Cost Intelligence

Know Where Every
Dollar Goes.

Real-time cost tracking per app, per region. Know exactly what you're spending, what you're going to spend, and where you're wasting money. Caveman finds the idle machines burning cash and tells you about it — before your finance team does.

  • Per-app, per-region spend tracking in real time
  • Cost forecasting — "On track for $X this month"
  • Optimization suggestions — "Move from iad to ord, save 15%"
  • Idle resource detection — "This machine ran 720h, handled 12 requests"
Cost Overview — January Under budget
Monthly Spend Budget: $200

$127.43

Forecast: $183 by month end

api-prod

us-east-1 — 3 machines

$67.20

web-frontend

us-east-1 — 2 machines

$42.80

staging-worker

Idle — 4 requests in 7 days

$17.43
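One simple way to get a forecast like the one above is a straight run-rate projection: take spend so far, divide by days elapsed, multiply by days in the month. The sketch below assumes that approach. Caveman's real forecasting method is not described here, and the function names and the day of the month used in any call are hypothetical.

```python
def forecast_month_end(spend_to_date: float, day_of_month: int,
                       days_in_month: int) -> float:
    """Project month-end spend from the average daily spend so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    daily_rate = spend_to_date / day_of_month
    return daily_rate * days_in_month


def is_under_budget(forecast: float, budget: float) -> bool:
    """True when the projected spend stays at or below the budget."""
    return forecast <= budget


# Per-app figures from the Cost Overview widget above.
APP_SPEND = {"api-prod": 67.20, "web-frontend": 42.80, "staging-worker": 17.43}
total_spend = round(sum(APP_SPEND.values()), 2)  # matches the $127.43 headline
```

A run-rate projection is crude (it ignores weekday patterns and mid-month scaling changes), but it is enough to back an "On track for $X this month" style message.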
Caveman Insights Last 24h
🔥

Anomaly Detected

Memory usage 40% higher than normal for this time of day. Possible memory leak in api-prod since last deploy.

⚡

Root Cause Analysis

Latency spike at 3pm correlates with deploy #847 two hours ago. Response times up 3x since. Consider rollback?

📈

Proactive Warning

At current growth rate, you'll hit memory limits in 3 days. Recommend upgrading to performance-2x or adding another machine.

📊

Weekly Digest

Your apps used $47.20 this week, had 2 incidents, and served 1.2M requests with 99.97% uptime.

AI Monitoring

Caveman Sees What
Dashboards Can't.

Dashboards show you numbers. Caveman tells you what they mean. Anomaly detection that actually understands your traffic patterns. Root cause analysis that connects deploys to latency spikes. Proactive warnings before things break — not after.

  • Anomaly detection — flags unusual patterns automatically
  • Root cause analysis — correlates deploys with incidents
  • Proactive warnings — "You'll hit limits in 3 days"
  • Daily/weekly digests — spend, incidents, uptime in your inbox
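As a rough illustration of the first and third bullets, the sketch below flags a reading that sits far from its recent history (a plain z-score test) and estimates days until a limit from a constant daily growth rate. Both are textbook techniques chosen for the example. They are assumptions, not a description of how Caveman works, and every name here is hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def is_anomalous(history: Sequence[float], current: float,
                 threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def days_until_limit(current: float, daily_growth: float,
                     limit: float) -> Optional[float]:
    """Linear extrapolation: days until `current` reaches `limit`.
    Returns None when usage is flat or shrinking."""
    if daily_growth <= 0:
        return None
    return max(0.0, (limit - current) / daily_growth)
```

A production system would likely compare against the same hour on previous days rather than a flat history, which is what "higher than normal for this time of day" implies, but the shape of the check is the same.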

Not Another Dashboard

Your Fly.io Control Plane

Monitor, scale, optimize, automate — all in one place.

Feature              | Grafana           | Datadog       | Caveman
Fly.io Native        | Requires setup    | Generic       | Built-in
Autoscaling          | Read-only         | Monitor only  | Integrated actions
AI Anomaly Detection | Manual thresholds | Yes           | Fly.io-specific
Cost Tracking        | No                | $$$           | Included
Machine Management   | No                | No            | Scale, restart, deploy
Pricing              | Free / Cloud      | $100-1000+/mo | Free to start

Infrastructure Management
Shouldn't Require a PhD

Always Watching

Caveman monitors your infrastructure around the clock. CPU spiking? Memory leaking? Traffic surging? You'll know before your users do.

Actually Helpful

Not just alerts — actionable advice. Caveman tells you what's wrong AND what to do about it. Scale up, optimize, restart — with one click.

Dead Simple

No YAML files. No Terraform. No 47-page runbooks. Just a dashboard that makes sense and an assistant that speaks human.

Stop Wrestling Your Infrastructure.

Let Caveman Handle It.

Set up in minutes. No blood sacrifice required.

Get Started — It's Free →