
Managed Load Balancing Services — What to Use and When#

You understand how load balancing works. Now the real question: should you run your own, or let someone else operate it? And if you go managed, which service, for what purpose? This post maps the entire landscape — cloud-native, CDN-based, DNS-based, and self-hosted — so you can make that call with confidence.


The Case for Managed Services#

Running your own load balancer sounds straightforward until you're actually doing it. You need to provision servers, configure active-passive high availability with floating IPs, write failover logic, plan for capacity ahead of traffic spikes, apply security patches without downtime, and respond to 3am alerts when it dies.

That's a meaningful operational burden. Managed load balancing services replace all of that with a configuration interface and an SLA. You get load balancing as a feature, not as infrastructure you're responsible for keeping alive.

The counterargument — control, cost at scale, no vendor lock-in — is also real. We'll get to that. But for most teams, the default should be managed until you have a concrete reason to go self-hosted.


The Landscape#

There are three categories of managed load balancing, each operating at a different layer of the stack:

┌─────────────────────────────────────────────────────────┐
│                    MANAGED LB SERVICES                   │
├───────────────────┬─────────────────┬───────────────────┤
│   CLOUD-NATIVE    │    CDN-BASED    │    DNS-BASED      │
│ (AWS/GCP/Azure)   │ (Cloudflare,    │ (Route 53, Azure  │
│                   │  Fastly)        │  Traffic Manager) │
└───────────────────┴─────────────────┴───────────────────┘

Understanding which layer each service occupies is the key to knowing when to use it — and how to stack them together.


AWS Load Balancing Services#

AWS offers three distinct load balancer products. They are not interchangeable. Picking the wrong one for your use case is a common mistake.

AWS ALB — Application Load Balancer#

ALB is the workhorse for web applications on AWS. It operates at Layer 7 — it reads and routes based on HTTP content.

The mental model: you create Target Groups (logical groupings of EC2 instances, containers, or Lambda functions), configure listeners on port 80/443, and write rules that inspect the request and forward to the right Target Group.

Internet
   ↓
ALB (distributed across AZ-a, AZ-b, AZ-c)
   ├── Rule: /api/users*    → Target Group: User Service (3 EC2s)
   ├── Rule: /api/payments* → Target Group: Payment Service (2 EC2s)
   └── Rule: /*             → Target Group: Frontend Service (4 EC2s)

This is the foundation of microservices routing on AWS — one ALB, one domain, different path prefixes fanning out to independent services.
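As a rough sketch of that wiring in boto3 (the VPC ID, listener ARN, and instance IDs below are placeholders, and a real setup needs additional configuration around listeners and certificates):

import boto3

# Placeholders: substitute your own VPC, listener, and instance identifiers.
VPC_ID = "vpc-0123456789abcdef0"
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/my-alb/..."

elbv2 = boto3.client("elbv2")

# A Target Group is a named pool of backends with its own health check.
tg = elbv2.create_target_group(
    Name="user-service",
    Protocol="HTTP",
    Port=8080,
    VpcId=VPC_ID,
    TargetType="instance",
    HealthCheckPath="/healthz",
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

# Register the EC2 instances that serve the User Service.
elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": "i-0aaa111"}, {"Id": "i-0bbb222"}, {"Id": "i-0ccc333"}],
)

# A listener rule: requests matching /api/users* forward to this Target Group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/api/users*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)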

What ALB handles for you out of the box:

  • Content-based routing — by URL path, host header, HTTP method, query string parameters, or any combination
  • SSL/TLS termination — ALB decrypts HTTPS at the edge; your backend servers get plain HTTP and never touch certificates
  • WebSocket and gRPC — long-lived connections and binary RPC work without special configuration
  • Authentication — integrates with Cognito or any OIDC provider; you can gate your entire application behind a login wall at the LB layer before a single request reaches your app
  • WAF integration — attach AWS WAF rules to filter SQLi, XSS, and other attack patterns at the ALB
  • Sticky sessions — cookie-based session persistence built in
  • Access logs — every request logged to S3, useful for audit trails and debugging

On redundancy: ALB is not a single EC2 instance behind the scenes. AWS runs a fleet of LB nodes spread across every Availability Zone you enable. If an entire AZ fails, ALB automatically routes only through surviving AZs. You never manage any of this.

Pricing: roughly $0.0225/hour base (about $16/month) plus $0.008 per LCU-hour (an LCU, or Load Balancer Capacity Unit, scales with connections, bandwidth, and rules evaluated). Expect $16–$50/month for typical production workloads.
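A quick back-of-envelope for how the hourly and LCU components combine (the rates mirror the figures above, the 3-LCU workload is an illustrative assumption, and you should check current AWS pricing for your region):

HOURS_PER_MONTH = 730

# Assumed rates matching the figures above; verify against current AWS pricing.
ALB_HOURLY = 0.0225   # dollars per ALB-hour
LCU_HOURLY = 0.008    # dollars per LCU-hour

def alb_monthly_cost(avg_lcus: float) -> float:
    """Estimate monthly ALB cost for a given average LCU consumption."""
    return HOURS_PER_MONTH * (ALB_HOURLY + LCU_HOURLY * avg_lcus)

# A modest workload averaging ~3 LCUs lands around $34/month.
print(f"${alb_monthly_cost(3):.2f}/month")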

What ALB cannot do: TCP/UDP load balancing. If your protocol isn't HTTP, ALB isn't your tool.

Best for: Any web application, REST API, or microservice architecture running on AWS.


AWS NLB — Network Load Balancer#

NLB operates at Layer 4 — TCP, UDP, TLS. It doesn't read HTTP. It doesn't know what's in the packet. It routes based on IP and port alone, and it does this with extraordinary speed.

The headline numbers: NLB adds roughly 100 microseconds of overhead; ALB adds about 1 ms. For most applications that difference is invisible. For financial trading systems, gaming servers, or anything doing millions of connections per second, it matters enormously.

Two features make NLB uniquely valuable in specific scenarios:

Static IP per AZ. NLB gives you a fixed, predictable IP address for each Availability Zone you enable. ALB IPs are dynamic — they change as AWS scales the fleet. If you have clients that whitelist IPs on their firewall (common in B2B, financial services, enterprise), NLB is often the only viable option.

Source IP preservation. When a request passes through ALB, the backend server sees the ALB's IP, not the client's. You recover the real IP from the X-Forwarded-For header. NLB, by contrast, preserves the original client IP all the way to the backend — no header parsing needed.

Client (IP: 103.21.x.x)
   ↓
NLB (TCP:443)
   ↓
Backend Server sees: 103.21.x.x  ← original client IP, intact
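A minimal sketch of what this means in application code, using only the Python standard library: behind an ALB you reconstruct the caller from X-Forwarded-For, while behind an NLB the socket's peer address already is the caller.

from http.server import BaseHTTPRequestHandler, HTTPServer

class WhoAmIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Behind an NLB this is the real client IP (source IP is preserved).
        # Behind an ALB this is one of the ALB node's IPs instead.
        peer_ip = self.client_address[0]

        # Behind an ALB the original client IP arrives in X-Forwarded-For;
        # behind an NLB the header is simply absent.
        forwarded_for = self.headers.get("X-Forwarded-For", "")
        client_ip = forwarded_for.split(",")[0].strip() if forwarded_for else peer_ip

        body = f"peer={peer_ip} client={client_ip}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WhoAmIHandler).serve_forever()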

What NLB cannot do: Content-based routing, SSL termination at the application layer, WAF integration. It doesn't read the request, so it can't make decisions based on it.

Best for: Gaming servers, IoT infrastructure, databases, real-time systems, anything requiring static IPs or raw TCP throughput.


AWS CLB — Classic Load Balancer (Legacy)#

CLB is AWS's original load balancer — it predates ALB and NLB by years. It operates at both L4 and L7 but does both worse than its successors.

AWS no longer recommends CLB for new applications. You may encounter it in legacy systems. The migration path is: CLB → ALB (for HTTP) or CLB → NLB (for TCP). Nothing else to say here.


AWS Route 53 — DNS-Layer Load Balancing#

Route 53 is authoritative DNS with a traffic policy engine built in. It operates entirely at the DNS layer — it controls which IP addresses get returned for a domain name, and it makes that decision intelligently based on configurable routing policies.

The routing policies cover most global traffic management scenarios:

Policy              Behaviour
Simple              Returns one or more IPs with no intelligence
Weighted            Send X% of traffic to region A, Y% to region B
Latency-based       Route each user to the region with lowest measured latency
Geolocation         Route based on the user's country or continent
Geoproximity        Route based on geographic distance, with adjustable bias
Failover            Primary + secondary; automatic switch when primary fails health check
Multivalue Answer   Return up to 8 healthy IPs; basic load distribution

Route 53 also runs its own health check system — it probes your endpoints from multiple AWS regions globally. The moment an endpoint fails enough checks, Route 53 stops returning its IP in DNS responses.

A real multi-region scenario:

Route 53 Latency Policy:
  User in Bangalore → ap-south-1 (Mumbai) has lowest latency → returns Mumbai ALB IP
  User in London    → eu-west-1 (Ireland) has lowest latency → returns Ireland ALB IP
  User in New York  → us-east-1 has lowest latency           → returns US ALB IP

If Mumbai goes down:

Route 53 health check: Mumbai ALB → FAIL (3 consecutive failures)
Route 53 stops returning Mumbai IP in DNS responses
Bangalore users now route to whichever remaining region has the next-lowest latency
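As a hedged sketch, this is roughly what the Mumbai leg of that latency policy looks like in boto3 (the hosted zone ID, health check ID, and ALB DNS name are placeholders; you would create one such record per region, and in practice an Alias record pointing at the ALB is the more common choice):

import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789ABCDEF"   # placeholder hosted zone

# One latency-based record per region, each tied to a health check.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "api.example.com",
                "Type": "CNAME",
                "TTL": 60,                        # keep TTL low so failover is quick
                "SetIdentifier": "mumbai",
                "Region": "ap-south-1",           # latency-based routing
                "HealthCheckId": "hc-mumbai-placeholder",
                "ResourceRecords": [
                    {"Value": "my-alb-mumbai.ap-south-1.elb.amazonaws.com"}
                ],
            },
        }],
    },
)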

The fundamental limitation of DNS-based routing: failover is never instant. DNS has TTL. Old records cache in resolvers, browsers, and OS caches for the duration of the TTL. You can set TTL as low as 30–60 seconds in practice, but there's no such thing as zero propagation time at the DNS layer. Design your failover strategy around this reality.

Best for: Multi-region architectures, geographic routing, failover between data centres. Route 53 is the global traffic manager; ALB/NLB are the regional load balancers. They're meant to be used together.


GCP Cloud Load Balancing#

GCP's architecture is philosophically different from AWS. Their global HTTP(S) load balancer isn't a regional service that you deploy to us-east1. It's built on top of Google's global network backbone — the same infrastructure that serves Search, YouTube, and Gmail.

Global HTTP(S) Load Balancer:

A single Anycast IP address that routes users to the nearest backend globally. Google's edge handles SSL termination, and requests are forwarded to your backend via Google's private fibre network — not the public internet.

User in Mumbai → hits Google's Mumbai PoP → forwarded to Mumbai backend (via Google backbone)
User in London → hits Google's London PoP → forwarded to Europe backend (via Google backbone)
Both reach: 34.x.x.x  ← same single IP

This model means you get global load balancing with a single IP without configuring multiple regional load balancers and routing DNS between them. It's architecturally cleaner than the AWS approach, though the configuration model takes getting used to.

GCP also offers TCP/UDP load balancers for L4 use cases and internal load balancers for east-west traffic between services. The global LB integrates tightly with GKE (Kubernetes), making it the natural choice for teams running containerised workloads on Google Cloud.

The honest limitation: GCP has a smaller market share than AWS, which means less community documentation, fewer third-party integrations, and a smaller pool of engineers with hands-on experience. If you're not already committed to GCP, this matters in practice.


Azure Load Balancing Services#

Azure provides four distinct load balancing products, each serving a different scope and layer:

Azure Load Balancer is the L4 (TCP/UDP) option. Regional scope. Use the Standard tier — it's zone-redundant by default. The Basic tier is not, and shouldn't be used in production.

Azure Application Gateway is the L7 (HTTP/HTTPS) option — the closest equivalent to AWS ALB. Supports URL-based routing, SSL termination, cookie-based session affinity, and integrates with Azure WAF.

Azure Traffic Manager is DNS-based routing — equivalent to AWS Route 53 routing policies. Supports geographic, weighted, priority-based, and performance (latency-based) routing. Subject to the same DNS TTL propagation constraints as Route 53.

Azure Front Door is the most interesting product: a global L7 load balancer that combines CDN, WAF, and DDoS protection under a single Anycast IP. For organisations already committed to Azure, Front Door is the single service that handles global edge delivery, routing, and security in one place.

In practice: if your stack is Azure, this is your toolbox. If you're evaluating cloud providers, AWS has a larger ecosystem and more mature tooling for most use cases.


CDN-Based Load Balancing#

CDNs aren't just for caching static assets. Modern CDN platforms operate as global distributed load balancers with security and performance tooling built in. For public-facing applications, a CDN sits in front of everything else.

Cloudflare#

Cloudflare has Points of Presence in 300+ cities. Every PoP uses Anycast — the same IP addresses exist at all locations. Users automatically connect to the nearest PoP, without any routing configuration on your part.

User → Cloudflare Edge (nearest of 300+ PoPs) → Your Origin Server

The load balancing addon (paid) adds intelligent origin routing:

  • Geo-steering — route to the origin closest to the user
  • Latency-steering — route to the origin with the lowest measured round-trip time
  • Weighted steering — distribute traffic across origins by percentage
  • Health checks — Cloudflare probes your origins from its global network; unhealthy origins are automatically removed from rotation
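To make the steering idea concrete, here's a toy Python model of weighted steering across healthy origins. It illustrates the behaviour only; it is not Cloudflare's implementation or API.

import random
from dataclasses import dataclass

@dataclass
class Origin:
    name: str
    weight: float         # relative share of traffic
    healthy: bool = True  # flipped by health checks

ORIGINS = [
    Origin("us-east", weight=0.6),
    Origin("eu-west", weight=0.3),
    Origin("ap-south", weight=0.1),
]

def pick_origin(origins: list[Origin]) -> Origin:
    """Weighted choice among origins that currently pass health checks."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        raise RuntimeError("no healthy origins")
    return random.choices(healthy, weights=[o.weight for o in healthy], k=1)[0]

# If eu-west fails its health checks, it simply drops out of the pool
# and its share is redistributed across the remaining origins.
ORIGINS[1].healthy = False
print(pick_origin(ORIGINS).name)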

But the load balancing feature is arguably secondary to what you get from Cloudflare's platform itself:

DDoS mitigation is included on the free tier. Cloudflare's network absorbs attacks before they reach your infrastructure. For most companies, this alone justifies putting Cloudflare in front of everything.

WAF filters malicious requests at the edge — SQL injection, XSS, bot traffic — before they consume your origin capacity.

SSL/TLS at edge — Cloudflare handles HTTPS from the user to the PoP. Your origin can serve plain HTTP internally, simplifying certificate management.

Caching — static assets served from the nearest PoP never reach your origin. Done correctly, this is the most effective form of load reduction available.

Argo Smart Routing (paid addon) routes traffic through Cloudflare's private backbone, bypassing congested internet paths between PoPs and your origin. For latency-sensitive applications, this can meaningfully improve tail latency.

Pricing tiers:

  • Free — CDN, DDoS protection, basic WAF
  • Pro ($20/month) — enhanced WAF, image optimisation, mobile redirect
  • Business ($200/month) — advanced WAF rules, 100% uptime SLA
  • Load Balancing — starts at $5/month per origin health check

The one thing to be clear-eyed about: all your traffic passes through Cloudflare's infrastructure. You are trusting them with every request, every header, and if you use their SSL termination, every decrypted payload. For most companies this is an entirely reasonable tradeoff. For specific regulated industries or highly sensitive workloads, it warrants consideration.

Best for: Almost any public-facing web application. Cloudflare is the most pragmatic default for startups and mid-size companies.


AWS CloudFront#

CloudFront is AWS's CDN service with 450+ PoPs globally. Where Cloudflare's value proposition is security + performance + simplicity, CloudFront's is deep AWS ecosystem integration.

Static assets are cached at edge PoPs. Dynamic requests are forwarded to your origin — an ALB, EC2 instance, S3 bucket, or API Gateway. For teams already running everything on AWS, CloudFront slots naturally into that architecture without leaving the ecosystem.

The load balancing dimension comes through origin groups: you configure a primary origin and a failover origin. If the primary returns 5xx responses or becomes unreachable, CloudFront automatically shifts traffic to the failover origin. It's simpler than Route 53 failover policies and works at the CDN layer rather than the DNS layer.

Lambda@Edge is CloudFront's most powerful feature — it lets you run Node.js or Python code at the edge to implement custom routing logic, A/B testing, authentication, request/response manipulation, and more, without the request ever reaching your origin.
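A hedged sketch of what a Python viewer-request function might look like for a simple A/B split (the event shape follows my understanding of the CloudFront viewer-request event, and the URIs are placeholders):

import hashlib

def handler(event, context):
    """Illustrative CloudFront viewer-request handler for an A/B split."""
    request = event["Records"][0]["cf"]["request"]

    # Hash the client IP into a stable bucket so a given user always
    # sees the same variant.
    client_ip = request.get("clientIp", "")
    bucket = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16) % 100

    # Route roughly 10% of users to an experimental page (placeholder URIs).
    if bucket < 10 and request["uri"] == "/index.html":
        request["uri"] = "/experiments/index.html"

    # Returning the request tells CloudFront to continue processing it.
    return request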

vs. Cloudflare: CloudFront wins on AWS integration and Lambda@Edge programmability. Cloudflare wins on DDoS protection (particularly on free and lower tiers), PoP count, and ease of setup for teams not already on AWS.


Fastly#

Fastly is the CDN of choice for infrastructure-focused engineering teams. GitHub, Stripe, and Spotify run on it. It earns that reputation through two differentiators:

Instant cache purge. Fastly can invalidate cached content globally in approximately 150ms. Other CDNs take seconds to minutes. For news publishers, e-commerce sites, and any application where content changes frequently and stale cache is unacceptable, this matters.
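To make that concrete, an illustrative sketch of purging a single cached URL, assuming your API token is in the FASTLY_API_TOKEN environment variable (the PURGE method and Fastly-Key header reflect my reading of Fastly's purge API; verify against their docs):

import os
import requests

# Placeholder URL to invalidate. Fastly accepts an HTTP PURGE request
# against the cached URL itself; Fastly-Key carries the API token
# when the service requires authenticated purges.
url = "https://www.example.com/article/123"

resp = requests.request(
    "PURGE",
    url,
    headers={"Fastly-Key": os.environ["FASTLY_API_TOKEN"]},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # response typically includes a status and purge id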

Edge compute via WebAssembly. Fastly's Compute@Edge platform lets you run arbitrary application logic at the edge using compiled WebAssembly — not just JavaScript. You can implement custom routing, authentication, data transformation, and API aggregation with near-native performance, without the constraints of traditional serverless edge functions.

The honest assessment: Fastly is complex and expensive. The VCL (Varnish Configuration Language) required for advanced configuration has a steep learning curve. It's the right choice for large enterprises with specific requirements around cache control or edge programmability. For most teams, Cloudflare is simpler and sufficient.


Self-Hosted Options#

Sometimes managed isn't the right answer. Three tools dominate self-hosted load balancing.

HAProxy#

HAProxy is the gold standard for self-managed, high-performance load balancing. GitHub ran on it. Instagram ran on it. It handles hundreds of thousands of requests per second on modest hardware, supports L4 and L7, and is deeply configurable.

Everything covered in the Load Balancing Algorithms post is configurable in HAProxy — Round Robin, Least Connections, Source IP Hash, Weighted variants, Least Response Time. Health checks are flexible. SSL termination works. Rate limiting is built in. The statistics dashboard gives you real-time visibility into every backend's connection counts and error rates.

What you give up: HAProxy doesn't manage itself. High availability requires you to set up your own active-passive cluster with floating IPs (typically via Keepalived). Failover is your problem to configure and test. Capacity planning is your problem. Upgrades — done carefully to avoid dropping connections — are your problem.

Best for: High-traffic companies that want full control and are comfortable operating infrastructure. The cost model is appealing at scale — you pay for servers, not per-LCU — but only if you account honestly for the engineering time to operate it.


Nginx#

Nginx wears many hats — web server, reverse proxy, load balancer. As a load balancer it's capable and extremely well-documented. The free (open source) version supports Round Robin, Least Connections, and IP Hash. The configuration syntax is clean and widely understood.

The meaningful limitation: the open source version has passive health checks only. Nginx marks a server as down only after observing a failed request to it. Real users see those failures before the server is pulled from rotation. Nginx Plus (paid) adds active health checks that probe backends proactively, along with advanced algorithms and a live monitoring dashboard.
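The distinction is easier to see in a toy model, written as illustrative Python rather than anything Nginx actually does: a passive checker only marks a backend down after a real request has failed, while an active checker probes in the background.

import urllib.request

class Backend:
    def __init__(self, url: str):
        self.url = url
        self.healthy = True

def passive_check(backend: Backend, send_real_request):
    """Passive: health is inferred from real traffic, so a user absorbs the failure."""
    try:
        return send_real_request(backend.url)  # an actual user's request
    except OSError:
        backend.healthy = False                # marked down only after a user saw the error
        raise

def active_check(backend: Backend, path: str = "/healthz"):
    """Active: a background probe, so no user request is spent discovering the failure."""
    try:
        with urllib.request.urlopen(backend.url + path, timeout=2) as resp:
            backend.healthy = (resp.status == 200)
    except OSError:
        backend.healthy = False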

Best for: Teams that already run Nginx as a web server and want to add load balancing without introducing another component. For dedicated load balancing, HAProxy is generally the stronger choice.


Envoy Proxy#

Envoy is a modern, open source proxy designed from the ground up for cloud-native microservices. It's the data plane behind Istio service mesh and is used by companies like Lyft (who built it), Google, and Apple.

Unlike HAProxy or Nginx, Envoy is typically deployed as a sidecar alongside every service instance in a Kubernetes cluster, handling all inbound and outbound traffic for that service. This gives you per-service observability, circuit breaking, automatic retries, distributed tracing, and traffic shaping — across your entire service mesh — without modifying any application code.

Envoy supports L3/L4/L7 load balancing, integrates with service discovery (Consul, Kubernetes DNS), and exposes rich metrics for every connection and request.

Best for: Kubernetes-based microservices architectures where you're adopting a service mesh (Istio, Consul Connect). It's not a standalone load balancer you configure manually — it's infrastructure that runs underneath your services and is configured programmatically via the xDS API.


Comparison at a Glance#

Service             Layer   Scope / Managed   Approx. Cost        Best For
AWS ALB             L7      Regional          $16–$50/month       Web apps, microservices on AWS
AWS NLB             L4      Regional          $15–$50/month       High perf, non-HTTP, static IP
AWS Route 53        DNS     Global            $1–$5/month         Multi-region routing, failover
GCP Global LB       L7      Global            Usage-based         Google-native global apps
Azure App Gateway   L7      Regional          ~$100–$300/month    Azure web apps
Azure Front Door    L7      Global            Usage-based         Global Azure apps + CDN
Cloudflare          L7      Global            Free–$200/month     Any public app, DDoS protection
AWS CloudFront      L7      Global            Usage-based         AWS apps needing CDN + edge
Fastly              L7      Global            Enterprise          Large orgs needing edge compute
HAProxy             L4+L7   Self-hosted       Server cost only    High control + high performance
Nginx               L7      Self-hosted       Server cost only    General purpose, familiar ops
Envoy               L4+L7   Self-hosted       Server cost only    Kubernetes / service mesh

The Architecture Nobody Tells You About#

The biggest misconception in this space is that you pick one load balancer. In production at any meaningful scale, you stack multiple — each operating at a different layer, solving a different problem.

A common production architecture looks like this:

User
  ↓
Cloudflare (global edge — DDoS mitigation, WAF, CDN, TLS termination)
  ↓
Route 53 (global traffic manager — routes between regions, failover)
  ↓
AWS ALB (regional — path-based routing, sticky sessions, health checks)
  ↓
Service Instances (ECS containers, EC2s, Lambda)

Cloudflare handles the global edge, security, and caching. Route 53 handles multi-region routing at the DNS layer. ALB handles content-aware routing within a region. Each layer does what it's best at, and none of them overlap.

Understanding this layered model is what separates engineers who configure load balancers from engineers who design resilient, globally distributed systems.


Common Misconceptions — Busted#

Misconception: "Managed LBs are always more expensive"
Reality: At scale, managed LBs often cost less than self-hosted when you account for engineering time, on-call, and operational overhead. Engineer hours are expensive.

Misconception: "Cloudflare is just a CDN"
Reality: Cloudflare is a global security, performance, and load balancing platform. CDN is one feature of many.

Misconception: "AWS ALB and NLB do the same thing"
Reality: ALB is content-aware L7. NLB is raw TCP performance L4. They're designed for fundamentally different use cases.

Misconception: "I need to choose one load balancer"
Reality: Production systems stack multiple. Cloudflare at the edge, ALB regionally, Route 53 globally — each solving a distinct problem at a distinct layer.

Key Takeaways#

  • Match the tool to the layer. DNS-based routing (Route 53, Traffic Manager) for global failover. CDN-based (Cloudflare, CloudFront) for edge security and caching. Cloud LBs (ALB, NLB) for regional application routing. Don't try to use one for all three.
  • ALB vs NLB is a binary choice, not a preference. HTTP traffic → ALB. TCP/UDP, static IP, or source IP preservation required → NLB.
  • Cloudflare should be in front of most public applications. The DDoS protection alone justifies it. Everything else is a bonus.
  • Self-hosted makes sense when you've outgrown managed pricing or need full control. Not before. The operational overhead is real.
  • Production systems are layered, not singular. Design your load balancing architecture as a stack of specialised components, not as a single box to select.
  • DNS propagation delay is irreducible. No DNS-based failover is instant. Set TTLs low before planned changes, and design SLOs around realistic failover times.

Further reading: AWS ALB Documentation, Cloudflare Load Balancing, HAProxy Documentation, Envoy Architecture Overview
