Caveman

Now monitoring infrastructure in 37 regions

Your Fly.io Infrastructure.
On Autopilot.

Meet Caveman — the AI assistant that monitors, manages, and roasts your servers so you don't have to. One-man IT department. Zero brain cells required.

Start Grunting →
See Features

caveman — mission control

🦴

Ugh. CPU at 98% on us-east-1 for 3 hours straight. You deploy spaghetti code again?

Maybe bump machines from 3 → 5 in same region? Or... write better code. Just saying. 🔥

Scale it up, Caveman.

You

🦴

Done. 5 machines running in us-east-1. CPU already dropping. You owe me a steak. 🥩

Features

Everything You Need. Nothing You Don't.

Dashboard

Mission Control.
But Make It Caveman.

Your main dashboard is the nerve center of your infrastructure. Real-time server status, performance metrics, traffic flows, and scaling controls — all in one place. No more SSH-ing into 47 different machines to figure out what's on fire.

Real-time server monitoring across all regions
One-click node scaling — drag a slider, get more compute
Live traffic visualization — see where your users actually are
CPU, RAM, network — all updating live

Mission Control Live

CPU

42%

Memory

2.4 GB

Network

847 MB

Machines 5 running

🦴

Hey chief. Memory usage creeping up on app-worker-3. Looks like a leak. I'd restart that bad boy before it takes everyone down.

🦴

Traffic from EU doubled since last Tuesday. Maybe time to add a Frankfurt region? I can set it up in 30 seconds. You just say word.

🦴

CPU been at 100% for 6 hours. At this point... maybe think about optimizing code? Or throw more cores at it. Your call, boss. 🤷

AI Assistant

Meet Caveman.
Your One-Man IT Department.

Caveman watches your infrastructure 24/7 so you can sleep. When CPU hits the ceiling, traffic spikes, or something smells funny — Caveman slides into your dashboard with advice that actually makes sense.

Think of it as a senior DevOps engineer who communicates exclusively in plain English. It doesn't just alert you — it tells you exactly what to do about it.

Automation

Set Rules.
Caveman Does the Rest.

Define your own scaling rules and let Caveman handle the grunt work. Job queue above 1,000? Spin up 5 machines. Below 3? Scale back down. CPU over 90% for 10 minutes? Add more cores. You write the rules, Caveman follows orders — no questions asked. Well, maybe some sarcastic comments.

Custom scaling triggers — CPU, memory, queue depth, anything
Scale up AND down automatically — no wasted spend
Caveman notifies you every time a rule fires
Override anytime — you're still the boss

Automation Rules 3 active

Scale Up — High Load Active

If job queue > 1,000 → scale to 5 machines

Triggered 12 times this week

Scale Down — Low Traffic Active

If job queue < 3 → scale to 1 machine

Triggered 8 times this week

CPU Alert — Hot Potato Active

If CPU > 90% for 10min → add 2 cores + notify

Triggered 3 times this week
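
For readers curious how threshold rules like the three above could be evaluated, here is a minimal sketch in Python. It is not Caveman's actual implementation: the `Snapshot`, `Rule`, and `fired_actions` names are hypothetical, and the sketch assumes some other component already tracks how long CPU has stayed above 90%.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Snapshot:
    """One reading of the metrics the rules care about (hypothetical shape)."""
    job_queue: int
    cpu_percent: float
    cpu_hot_minutes: int  # assumed: minutes CPU has stayed above 90%


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Snapshot], bool]
    action: str


# The three rules shown in the Automation Rules panel.
RULES = [
    Rule("Scale Up (High Load)",
         lambda s: s.job_queue > 1000,
         "scale to 5 machines"),
    Rule("Scale Down (Low Traffic)",
         lambda s: s.job_queue < 3,
         "scale to 1 machine"),
    Rule("CPU Alert (Hot Potato)",
         lambda s: s.cpu_percent > 90 and s.cpu_hot_minutes >= 10,
         "add 2 cores + notify"),
]


def fired_actions(snapshot: Snapshot, rules: List[Rule] = RULES) -> List[str]:
    """Return a description of every rule whose condition holds."""
    return [f"{r.name}: {r.action}" for r in rules if r.condition(snapshot)]


if __name__ == "__main__":
    # A busy queue with a calm CPU should fire only the scale-up rule.
    for line in fired_actions(Snapshot(job_queue=1500, cpu_percent=50.0, cpu_hot_minutes=0)):
        print(line)
```

Keeping each rule as data (a name, a condition, an action) rather than hard-coded branches is one way to let users add or disable rules without touching the evaluation loop.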

Live Logs

us-east-1

12:04:32 [info] GET /api/users 200 in 12ms

12:04:33 [info] POST /api/orders 201 in 45ms

12:04:33 [warn] Slow query detected: 890ms

12:04:34 [error] Connection timeout to db-replica-2

12:04:34 [info] Retrying connection... success

12:04:35 [info] GET /api/health 200 in 2ms

12:04:36 [info] Worker processed 142 jobs

Logs

Every Log.
Instantly Searchable.

Stream logs from every machine, every region, in real time. Full-text search across your entire infrastructure. Find that one needle in your haystack of stdout before your boss finds it first.

Real-time log streaming across all instances
Full-text search with regex support
Filter by region, app, instance, or severity
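
To make severity filtering and regex search concrete, here is a small Python sketch that runs against the sample lines from the Live Logs panel. The line format (`HH:MM:SS [severity] message`) is inferred from that panel, and the `parse` and `search` helpers are hypothetical names for illustration, not part of any Caveman API.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed line shape: "12:04:33 [warn] Slow query detected: 890ms"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<severity>\w+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    time: str
    severity: str
    message: str


def parse(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the format."""
    match = LINE_RE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def search(lines: Iterable[str],
           severity: Optional[str] = None,
           pattern: Optional[str] = None) -> List[LogEntry]:
    """Keep entries matching the severity and/or a regex on the message."""
    regex = re.compile(pattern) if pattern else None
    results = []
    for line in lines:
        entry = parse(line)
        if entry is None:
            continue
        if severity and entry.severity != severity:
            continue
        if regex and not regex.search(entry.message):
            continue
        results.append(entry)
    return results


SAMPLE = [
    "12:04:32 [info] GET /api/users 200 in 12ms",
    "12:04:33 [info] POST /api/orders 201 in 45ms",
    "12:04:33 [warn] Slow query detected: 890ms",
    "12:04:34 [error] Connection timeout to db-replica-2",
    "12:04:34 [info] Retrying connection... success",
    "12:04:35 [info] GET /api/health 200 in 2ms",
    "12:04:36 [info] Worker processed 142 jobs",
]

if __name__ == "__main__":
    for entry in search(SAMPLE, severity="error"):
        print(entry.time, entry.message)
```

Calling `search(SAMPLE, pattern=r"\d{3}ms")` narrows the sample to messages that mention a three-digit millisecond duration, which is one way a regex filter can surface slow operations.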

Metrics

Numbers That Actually
Mean Something.

CPU, memory, network I/O, request latency, error rates — visualized in dashboards that update in real time. No Grafana PhD required. Spot trends, catch anomalies, and impress people in standups with charts you didn't have to configure.

Pre-built dashboards for the metrics that matter
Anomaly detection — Caveman flags the weird stuff
Historical data — trends over hours, days, or weeks

Metrics — Last 24h All regions

Request Latency (ms)

Avg Latency

23ms

Requests

1.2M

Error Rate

0.02%

Your Dashboards + New

Mission Control

Default — 4 widgets

Logs & Search

Default — 2 widgets

Metrics

Default — 3 widgets

Off

Deploy Pipeline

Custom — 6 widgets

Customizable

Your Cave.
Your Rules.

Don't like our dashboards? Build your own. Drag, drop, and arrange widgets however your brain works. Want to disable everything we built and start from scratch? We respect that energy.

Drag-and-drop dashboard builder
Enable, disable, or rearrange any default dashboard
Create unlimited custom views

Cost Intelligence

Know Where Every
Dollar Goes.

Real-time cost tracking per app, per region. Know exactly what you're spending, what you're going to spend, and where you're wasting money. Caveman finds the idle machines burning cash and tells you about it — before your finance team does.

Per-app, per-region spend tracking in real time
Cost forecasting — "On track for $X this month"
Optimization suggestions — "Move from iad to ord, save 15%"
Idle resource detection — "This machine ran 720h, handled 12 requests"

Cost Overview — January Under budget

Monthly Spend Budget: $200

$127.43

Forecast: $183 by month end

api-prod

us-east-1 — 3 machines

$67.20

web-frontend

us-east-1 — 2 machines

$42.80

staging-worker

Idle — 4 requests in 7 days

$17.43

Caveman Insights Last 24h

🔥

Anomaly Detected

Memory usage 40% higher than normal for this time of day. Possible memory leak in api-prod since last deploy.

⚡

Root Cause Analysis

Latency spike at 3pm correlates with deploy #847 two hours ago. Response times up 3x since. Consider rollback?

📈

Proactive Warning

At current growth rate, you'll hit memory limits in 3 days. Recommend upgrading to performance-2x or adding another machine.

📊

Weekly Digest

Your apps used $47.20 this week, had 2 incidents, and served 1.2M requests with 99.97% uptime.

AI Monitoring

Caveman Sees What
Dashboards Can't.

Dashboards show you numbers. Caveman tells you what they mean. Anomaly detection that actually understands your traffic patterns. Root cause analysis that connects deploys to latency spikes. Proactive warnings before things break — not after.

Anomaly detection — flags unusual patterns automatically
Root cause analysis — correlates deploys with incidents
Proactive warnings — "You'll hit limits in 3 days"
Daily/weekly digests — spend, incidents, uptime in your inbox

Not Another Dashboard

Your Fly.io Control Plane

Monitor, scale, optimize, automate — all in one place.

Feature	Grafana	Datadog	Caveman
Fly.io Native	Requires setup	Generic	Built-in
Autoscaling	Read-only	Monitor only	Integrated actions
AI Anomaly Detection	Manual thresholds	Yes	Fly.io-specific
Cost Tracking	No	$$$	Included
Machine Management	No	No	Scale, restart, deploy
Pricing	Free / Cloud	$100-1000+/mo	Free to start

Infrastructure Management
Shouldn't Require a PhD

Always Watching

Caveman monitors your infrastructure around the clock. CPU spiking? Memory leaking? Traffic surging? You'll know before your users do.

Actually Helpful

Not just alerts — actionable advice. Caveman tells you what's wrong AND what to do about it. Scale up, optimize, restart — with one click.

Dead Simple

No YAML files. No Terraform. No 47-page runbooks. Just a dashboard that makes sense and an assistant that speaks human.

Stop Wrestling Your Infrastructure.

Let Caveman Handle It.

Set up in minutes. No blood sacrifice required.

Get Started — It's Free →
