IsUp - Infrastructure Recommendations
UptimeRobot's Approach (What They Do)
Based on IP analysis and their recent architecture migration:
┌─────────────────────────────────────────────────────────────────────┐
│ UptimeRobot Infrastructure │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Multi-Cloud VM Strategy (Traditional) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ AWS │ │ DigitalOcean │ │ Hetzner │ │
│ │ (Premium) │ │ (Value) │ │ (Budget) │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────┤ │
│ │ US-East │ │ NYC, AMS │ │ Germany │ │
│ │ US-West │ │ SGP, SYD │ │ Finland │ │
│ │ EU-Frankfurt │ │ │ │ │ │
│ │ AP-Tokyo │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ~110 checker IPs across 4 regions │
│ Traditional VMs running custom monitoring software │
│ │
└─────────────────────────────────────────────────────────────────────┘
Their Stack:
- PHP + Node.js backend
- MySQL database
- Redis caching
- Multi-cloud VMs for monitoring nodes
- Moved FROM dedicated servers TO cloud (2024-2025)
What Should IsUp Use?
Option 1: Cloudflare Workers (Recommended for MVP)
Best for: Fast time-to-market, lowest ops burden, global edge by default
┌─────────────────────────────────────────────────────────────────────┐
│ Cloudflare Workers Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Cloudflare Edge (300+ locations) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Worker │ │ Worker │ │ Worker │ │ Worker │ ... │ │
│ │ │ US-East │ │ EU-West │ │ Asia │ │ Oceania │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │
│ │ └───────────┴─────┬─────┴───────────┘ │ │
│ │ │ │ │
│ └──────────────────────────┼──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Central Services │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Vercel │ │ Neon │ │ Upstash │ │ Tinybird │ │ │
│ │ │ (App) │ │ (Postgres)│ │ (Redis) │ │(Analytics)│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Pros:
- 300+ edge locations (vs UptimeRobot's ~4 regions)
- Zero server management
- Built-in cron triggers for scheduled checks
- Extremely low latency globally
- $5/month for 10M requests
- Durable Objects for state management
- KV storage for configuration
Cons:
- 50ms CPU time limit per request (plan-dependent; time spent waiting on the network does not count toward it, so HTTP checks fit)
- No ICMP, and outbound TCP only through the Workers socket API (ping/port checks need workarounds)
- Vendor lock-in to Cloudflare
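The CPU limit is rarely a problem for HTTP checks because the per-check logic is small: compare the status code, the latency, and optionally a keyword. A minimal sketch of that decision step follows. The names `CheckConfig`, `RawResponse`, and `evaluateCheck` are hypothetical and not part of any Cloudflare API; the network request itself is left out so the logic stays easy to test.

```typescript
// Hypothetical types for illustration; not part of any Cloudflare API.
interface CheckConfig {
  expectedStatus: number[]; // status codes treated as healthy
  timeoutMs: number;        // latency budget for one check
  keyword?: string;         // optional text that must appear in the body
}

interface RawResponse {
  status: number;
  latencyMs: number;
  body: string;
}

type Verdict = { up: true } | { up: false; reason: string };

// Pure decision step: turn one observed response into an up/down verdict.
function evaluateCheck(cfg: CheckConfig, res: RawResponse): Verdict {
  if (res.latencyMs > cfg.timeoutMs) {
    return { up: false, reason: `slow: ${res.latencyMs}ms > ${cfg.timeoutMs}ms` };
  }
  if (!cfg.expectedStatus.includes(res.status)) {
    return { up: false, reason: `unexpected status ${res.status}` };
  }
  if (cfg.keyword !== undefined && !res.body.includes(cfg.keyword)) {
    return { up: false, reason: `keyword "${cfg.keyword}" not found` };
  }
  return { up: true };
}
```

Keeping the verdict logic separate from the fetch call means the same function could later be reused by checkers on another platform.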
Cost Estimate:
| Component | Monthly Cost |
|---|---|
| Workers (10M requests) | $5 |
| KV Storage | $5 |
| Durable Objects | $5-10 |
| Total | $15-20 |
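The Workers line in the table can be turned into a small calculation to see when request volume outgrows the included quota. This is a rough sketch: the $5 base and 10M included requests come from the table above, while `overagePerMillionUsd` is an assumed parameter that should be filled in from current Cloudflare pricing.

```typescript
// Figures from the table above: $5/month covers the first 10M requests.
const BASE_USD = 5;
const INCLUDED_REQUESTS = 10e6;

// overagePerMillionUsd is an assumption supplied by the caller;
// check current Cloudflare pricing before relying on any value.
function workersRequestCostUsd(
  requestsPerMonth: number,
  overagePerMillionUsd: number
): number {
  const extra = Math.max(0, requestsPerMonth - INCLUDED_REQUESTS);
  return BASE_USD + (extra / 1e6) * overagePerMillionUsd;
}
```

Assuming one request per check, 300k checks/day is about 9M requests/month, which stays inside the included quota.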
Option 2: Fly.io (Recommended for Full Control)
Best for: Need TCP/UDP/ICMP, want containers, more control
┌─────────────────────────────────────────────────────────────────────┐
│ Fly.io Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Fly.io Global Network │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ iad │ │ ams │ │ nrt │ │ │
│ │ │ (Virginia) │ │ (Amsterdam) │ │ (Tokyo) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ 2x checker │ │ 2x checker │ │ 2x checker │ │ │
│ │ │ containers │ │ containers │ │ containers │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ syd │ │ gru │ │ lhr │ │ │
│ │ │ (Sydney) │ │ (Sao Paulo) │ │ (London) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Pros:
- Full Docker containers (any language, any protocol)
- Can do ICMP ping, TCP port checks natively
- 30+ regions available
- Easy horizontal scaling
- Persistent volumes for local state
- Built-in private networking
Cons:
- More ops overhead than serverless
- Containers run 24/7 (vs pay-per-invocation)
- Need to manage scaling yourself
Cost Estimate:
| Component | Monthly Cost |
|---|---|
| 6 regions x 2 shared-cpu-1x | $30-60 |
| Fly Postgres | $15 |
| Total | $45-75 |
Option 3: Hybrid (Best of Both Worlds)
Best for: Production-ready, handles all monitor types
┌─────────────────────────────────────────────────────────────────────┐
│ Hybrid Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ HTTP/HTTPS/SSL Checks TCP/ICMP/Port Checks │
│ ┌───────────────────┐ ┌───────────────────┐ │
│ │ Cloudflare Workers│ │ Fly.io │ │
│ │ │ │ │ │
│ │ • Fast & cheap │ │ • Full protocol │ │
│ │ • 300+ locations │ │ support │ │
│ │ • HTTP only │ │ • 6-10 regions │ │
│ └─────────┬─────────┘ └─────────┬─────────┘ │
│ │ │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Central API (Vercel) │ │
│ │ + Neon + Upstash │ │
│ └──────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Detailed Comparison
| Factor | Cloudflare Workers | Fly.io | AWS/DO/Hetzner (UptimeRobot style) |
|---|---|---|---|
| Setup Time | Hours | Days | Weeks |
| Ops Burden | Minimal | Low | High |
| Regions | 300+ | 30+ | DIY (3-10 typically) |
| HTTP Checks | ✅ Excellent | ✅ Good | ✅ Good |
| TCP/Port | ⚠️ Workarounds | ✅ Native | ✅ Native |
| ICMP Ping | ❌ No | ✅ Native | ✅ Native |
| Cost (MVP) | $15-30/mo | $50-100/mo | $100-300/mo |
| Cost (Scale) | $100-500/mo | $200-500/mo | $500-2000/mo |
| Scaling | Auto | Manual | Manual |
| Vendor Lock-in | High | Medium | Low |
My Recommendation
Phase 1 (MVP): Cloudflare Workers Only
Start with Cloudflare Workers for everything. Accept limitations:
- HTTP/HTTPS monitoring: ✅ Perfect
- Keyword monitoring: ✅ Perfect
- SSL monitoring: ✅ Can check certs via HTTPS
- DNS monitoring: ⚠️ Use DNS-over-HTTPS APIs
- Ping monitoring: ❌ Skip for MVP
- Port monitoring: ❌ Skip for MVP
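The DNS-over-HTTPS route mentioned in the list can be sketched as two small helpers: one builds the query URL, the other inspects the JSON answer. This assumes Cloudflare's public resolver and its JSON format (requested with the header `accept: application/dns-json`); the helper names `buildDohUrl` and `dnsCheckPasses` are hypothetical.

```typescript
// Subset of the DNS-over-HTTPS JSON response shape used below.
interface DohAnswer {
  name: string;
  type: number;
  TTL: number;
  data: string;
}
interface DohResponse {
  Status: number;       // 0 means NOERROR
  Answer?: DohAnswer[]; // absent when there are no records
}

// Cloudflare's public resolver; send the request with
// the header "accept: application/dns-json".
const DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query";

function buildDohUrl(hostname: string, recordType: string): string {
  const name = encodeURIComponent(hostname);
  const type = encodeURIComponent(recordType);
  return `${DOH_ENDPOINT}?name=${name}&type=${type}`;
}

// Passes only if the lookup succeeded and some record matches expectedData.
function dnsCheckPasses(res: DohResponse, expectedData: string): boolean {
  if (res.Status !== 0 || !res.Answer) return false;
  return res.Answer.some((a) => a.data === expectedData);
}
```

A Worker would fetch the URL from `buildDohUrl`, parse the body as JSON, and pass it to `dnsCheckPasses` along with the record value the monitor expects.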
Why?
- Ship faster
- Lowest cost
- An estimated 90% of users only need HTTP monitoring anyway
- Add Fly.io later for ping/port
Phase 2 (Growth): Add Fly.io for Advanced Checks
When users request ping/port monitoring:
- Deploy Fly.io containers in 6 key regions
- Route ping/port checks to Fly.io
- Keep HTTP checks on Cloudflare Workers
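The routing rule described here is a simple dispatch on monitor type. A minimal sketch, assuming the monitor types listed in Phase 1; the `MonitorType` and `Backend` names are hypothetical labels, not identifiers from either platform.

```typescript
// Monitor types taken from the Phase 1 list; the names are illustrative.
type MonitorType = "http" | "https" | "keyword" | "ssl" | "dns" | "ping" | "port";
type Backend = "cloudflare-workers" | "fly-io";

// Ping and port checks need ICMP/TCP, so they go to Fly.io;
// everything HTTP-based stays on Cloudflare Workers.
function backendFor(type: MonitorType): Backend {
  switch (type) {
    case "ping":
    case "port":
      return "fly-io";
    default:
      return "cloudflare-workers";
  }
}
```

Because the central scheduler is the only place that needs this mapping, Fly.io can be introduced in Phase 2 without touching the existing HTTP checkers.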
Phase 3 (Scale): Evaluate Multi-Cloud
At scale (100k+ monitors), consider:
- Adding Hetzner for cost optimization in EU
- Adding more regions based on customer demand
- Potentially moving to Kubernetes for flexibility
Infrastructure Stack Summary
Recommended Production Stack
| Component | Service | Why |
|---|---|---|
| App Hosting | Vercel | Easy deploys, great DX, auto-scaling |
| Database | Neon (Postgres) | Serverless, scales to zero, branching |
| Cache/Queue | Upstash Redis | Serverless, per-request pricing |
| HTTP Monitors | Cloudflare Workers | 300+ locations, dirt cheap |
| TCP/Ping Monitors | Fly.io | Full protocol support |
| Time-Series | Tinybird or ClickHouse Cloud | Fast analytics at scale |
| Email | Resend | Modern, great API |
| SMS | Twilio | Reliable, global |
| Secrets | Vercel/Infisical | Secure env management |
| Monitoring | Axiom + Sentry | Logs + errors |
Alternative: Self-Hosted Stack
If you prefer more control / lower cost at scale:
| Component | Service | Why |
|---|---|---|
| App Hosting | Fly.io or Railway | Full control, predictable pricing |
| Database | Fly Postgres or Supabase | Managed, good DX |
| Cache/Queue | Fly Redis or Dragonfly | Self-managed but cheap |
| Monitors | Fly.io (all regions) | Single platform |
| Time-Series | Self-hosted ClickHouse | Cheapest at scale |
Cost Projections
Serverless Stack (Recommended)
| Scale | Monitors | Checks/day | Monthly Cost |
|---|---|---|---|
| MVP | 1,000 | 300k | $50-100 |
| Growth | 10,000 | 3M | $150-300 |
| Scale | 100,000 | 30M | $500-1,500 |
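The Checks/day column is consistent with each monitor being checked roughly every 5 minutes from a single location. That interval is an inference from the numbers, not something the table states; the sketch below makes the arithmetic explicit so other intervals can be tried.

```typescript
// Checks per day for a fleet of monitors at a fixed interval,
// assuming each check runs from a single location.
function checksPerDay(monitors: number, intervalMinutes: number): number {
  const checksPerMonitor = (24 * 60) / intervalMinutes;
  return monitors * checksPerMonitor;
}

// 1,000 monitors every 5 minutes: 288,000 checks/day (the table rounds to 300k).
const mvpChecks = checksPerDay(1000, 5);
```

Checking each monitor from several regions multiplies these totals by the number of regions, which affects the request-based cost lines accordingly.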
Self-Hosted Stack
| Scale | Monitors | Monthly Cost |
|---|---|---|
| MVP | 1,000 | $100-150 |
| Growth | 10,000 | $200-400 |
| Scale | 100,000 | $800-1,500 |
Key Differences from UptimeRobot
| Aspect | UptimeRobot | IsUp (Recommended) |
|---|---|---|
| Edge Locations | ~4 regions | 300+ (Cloudflare) |
| Architecture | Traditional VMs | Serverless edge |
| Check Latency | Higher (fewer nodes) | Lower (edge) |
| Scaling | Manual | Automatic |
| Ops Burden | High | Minimal |
| Protocol Support | Full | HTTP-first (add TCP later) |
| Cost Efficiency | Medium | High |
Next Steps
- Set up Cloudflare Workers project for monitoring
- Deploy Vercel app with Neon database
- Implement HTTP monitoring first
- Add Fly.io when ping/port is needed
- Scale regions based on customer demand