How DNS Actually Works — End to End#
You type `mail.google.com` into your browser and hit Enter. In under 100 milliseconds, your browser knows exactly which server on the planet to talk to. How? That's DNS — and it's a lot more interesting than most engineers realise.
First, What Even Is DNS?#
DNS stands for Domain Name System. At its most basic, it translates human-readable names like google.com into machine-readable IP addresses like 142.250.195.46. Without it, you'd have to memorise IP addresses for every website you visit — like the internet phonebook from hell.
But here's what most people get wrong: DNS is not a single server. It's a globally distributed hierarchy of servers, each responsible for a slice of the naming system, operating at planetary scale and handling trillions of queries every single day.
The Hierarchy You Need to Understand#
Before tracing a request, picture the structure:
Root (.)
└── Top Level Domains (TLDs): .com, .in, .org, .net
└── Second Level Domains: google.com, amazon.com
└── Subdomains: mail.google.com, www.amazon.com
Each layer is managed by a different authority:
| Level | Example | Managed By |
|---|---|---|
| Root | . | ICANN — 13 root server clusters |
| TLD | .com, .in | Verisign (.com), NIXI (.in) |
| Domain | google.com | The company/owner |
| Subdomain | mail.google.com | The company/owner |
This separation of concerns is the entire design philosophy of DNS. No single entity controls it all. Each layer knows about the layer directly below it — and nothing else.
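The layer-by-layer delegation can be sketched in a few lines of Python: splitting a hostname and reading it right to left yields exactly the chain of zones consulted during resolution. This is a toy illustration of the hierarchy, not how real resolvers parse names.

```python
# Sketch: how a hostname maps onto the DNS delegation chain.
# Reading the labels right-to-left gives the zone consulted at
# each level of the hierarchy, from the root down.

def delegation_chain(hostname: str) -> list[str]:
    """Return the zones queried, from the root down to the full name."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # every uncached lookup conceptually starts at the root
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]))
    return chain

print(delegation_chain("mail.google.com"))
# ['.', 'com', 'google.com', 'mail.google.com']
```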
The Cast of Characters#
A DNS query involves several distinct players. Knowing who does what is half the battle.
| Player | What It Does |
|---|---|
| DNS Client (Stub Resolver) | Tiny code inside your OS that initiates DNS queries on behalf of apps |
| Recursive Resolver | The server (e.g. Google's 8.8.8.8, Cloudflare's 1.1.1.1) that does all the heavy lifting for you |
| Root Name Server | Knows where all TLD servers live. 13 logical clusters, replicated 1,000+ times globally via Anycast |
| TLD Name Server | Knows which authoritative server handles each domain under its TLD |
| Authoritative Name Server | The final authority. Holds the actual DNS records for a specific domain |
The recursive resolver is the unsung hero here. It fans out across the hierarchy on your behalf and shields you from all that complexity. Your device only ever talks to the resolver — it never directly touches root or TLD servers.
DNS Record Types — The Quick Reference#
Before going end-to-end, know what kind of answers DNS can give:
| Record | Purpose | Example |
|---|---|---|
| A | Domain → IPv4 address | google.com → 142.250.195.46 |
| AAAA | Domain → IPv6 address | google.com → 2404:6800:... |
| CNAME | Alias → points to another domain | www.google.com → google.com |
| MX | Mail server for the domain | google.com → smtp.google.com |
| NS | Nameserver for the domain | google.com → ns1.google.com |
| TXT | Arbitrary text (SPF, domain verification) | v=spf1 include:... |
| SOA | Start of Authority — zone metadata | Serial number, refresh intervals |
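From application code, the address-record types (A and AAAA) can be queried through the OS stub resolver with nothing but Python's standard library. A hedged sketch — live results depend on your network and every cache in the chain:

```python
# A and AAAA lookups through the OS stub resolver, standard library only.
# getaddrinfo returns whatever the resolver chain (caches included)
# produces, so results vary by network.
import socket

def lookup(hostname: str) -> dict[str, set[str]]:
    records: dict[str, set[str]] = {"A": set(), "AAAA": set()}
    for family, rtype in ((socket.AF_INET, "A"), (socket.AF_INET6, "AAAA")):
        try:
            for *_, sockaddr in socket.getaddrinfo(hostname, None, family):
                records[rtype].add(sockaddr[0])
        except socket.gaierror:
            pass  # no records of this type, or resolution failed
    return records

print(lookup("localhost"))  # typically includes 127.0.0.1 under "A"
```

Note that getaddrinfo only surfaces address records; querying MX, TXT, or NS records requires a dedicated DNS library (such as dnspython) or a tool like dig.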
TTL — The Clock on Every Answer#
Every DNS record carries a TTL (Time To Live) value, measured in seconds:
google.com A 142.250.195.46 TTL: 300
This tells every resolver and cache in the chain: hold this answer for 300 seconds, then ask again.
TTL is one of the most consequential knobs in DNS:
- Low TTL (e.g. 30s) → Faster failover and propagation, but more queries hitting your DNS infrastructure and slightly higher lookup latency for users
- High TTL (e.g. 86400s = 1 day) → Fewer queries, lighter load, but changes take a full day to propagate globally
When you're planning a migration or a failover drill, the first thing you do is drop your TTLs well in advance. Forgetting this is a classic operational mistake.
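The behaviour TTL governs can be sketched in a few lines — a minimal model of what every cache in the chain does. Real resolvers also handle negative caching, prefetching, and per-record-type entries:

```python
# Minimal sketch of a TTL-respecting DNS cache: an answer is served
# only while its TTL has not expired; after that, the entry is
# evicted and the next lookup goes back upstream.
import time

class TtlCache:
    def __init__(self):
        self._store = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl_seconds):
        self._store[name] = (answer, time.monotonic() + ttl_seconds)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None                   # cache miss: ask upstream
        answer, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]         # TTL expired: re-query upstream
            return None
        return answer

cache = TtlCache()
cache.put("google.com", ["142.250.195.46"], ttl_seconds=300)
print(cache.get("google.com"))  # ['142.250.195.46'] while the TTL holds
```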
End-to-End: What Actually Happens When You Type mail.google.com#
Let's walk through the full journey, one step at a time.
Step 1 — Browser Cache Check#
Your browser doesn't immediately ask anyone anything. It first checks its own internal DNS cache.
Browser → "Do I already know mail.google.com?"
→ Cache hit + TTL not expired → use it. Done.
→ Cache miss → ask the OS.
Browsers maintain their own DNS cache entirely separately from the OS. In Chrome, you can inspect it directly at chrome://net-internals/#dns.
Step 2 — OS Cache Check#
If the browser has nothing, it hands the query to the OS stub resolver.
OS → "Do I know mail.google.com?"
→ First checks /etc/hosts (the original 1970s DNS — a local static file)
→ Then checks the OS DNS cache (managed by systemd-resolved on Linux; viewable with ipconfig /displaydns on Windows)
→ Cache hit → return to browser. Done.
→ Cache miss → ask the configured recursive resolver.
The /etc/hosts check happens before any network call. This is why developers use it to override DNS locally during development — it's the first thing the OS reads.
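As a sketch, the hosts-file check amounts to parsing a static "IP name [aliases...]" table and consulting it before any query leaves the machine. The format shown is the standard one; real parsers handle a few more corner cases:

```python
# Sketch of the /etc/hosts lookup the stub resolver performs before
# any network call: a static file of "IP name [aliases...]" lines,
# with '#' starting a comment and the first match winning.

def parse_hosts(text: str) -> dict[str, str]:
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # first match wins
    return table

sample = """
127.0.0.1   localhost
# local override used during development:
127.0.0.1   mail.google.com
"""
print(parse_hosts(sample)["mail.google.com"])  # 127.0.0.1
```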
Step 3 — Query Hits the Recursive Resolver#
The OS sends the query to whichever recursive resolver is configured on the device — typically set via DHCP from your router, or manually to something like 8.8.8.8 or 1.1.1.1.
OS → Recursive Resolver (8.8.8.8): "What's the IP for mail.google.com?"
The resolver handles millions of queries per second and caches aggressively.
→ Resolver cache hit → return the cached answer immediately.
→ Cache miss → start the full resolution journey.
Step 4 — Resolver Asks a Root Name Server#
If the resolver has no cached answer, it goes to the top of the hierarchy — a root name server.
Resolver → Root Server: "Who handles .com domains?"
Root Server responds:
"I don't know mail.google.com, but here are the .com TLD servers:
a.gtld-servers.net → 192.5.6.30
b.gtld-servers.net → 192.33.14.30
(and 11 more)"
A few things to note here:
- There are 13 logical root server identifiers (labeled `a` through `m`), but physically over 1,000 machines globally, all reachable via Anycast routing. The "13 root servers" factoid is technically misleading.
- Root servers know only about TLD servers — nothing else.
- The resolver caches this response. It won't ask root servers about `.com` again for the full TTL (often 48 hours or more).
Step 5 — Resolver Asks the TLD Name Server#
Now the resolver goes to the .com TLD server:
Resolver → .com TLD Server: "Who handles google.com?"
TLD Server responds:
"I don't know mail.google.com, but google.com is managed by:
ns1.google.com → 216.239.32.10
ns2.google.com → 216.239.34.10
ns3.google.com → 216.239.36.10
ns4.google.com → 216.239.38.10"
This is called a referral — the TLD server doesn't answer your question, but it tells you who can. The TLD server knows which authoritative nameserver is responsible for each domain registered under .com — and that's it.
Step 6 — Resolver Asks the Authoritative Name Server#
Finally, the resolver reaches the domain's own authoritative name server:
Resolver → ns1.google.com: "What's the IP for mail.google.com?"
Authoritative Server responds:
"mail.google.com A 142.250.195.83 TTL: 300
mail.google.com A 74.125.24.83 TTL: 300"
This is the authoritative answer — the ground truth. This record was configured directly by Google. There is no higher authority to ask. The buck stops here.
Step 7 — Resolver Caches and Returns the Answer#
Resolver:
→ Caches: mail.google.com → [142.250.195.83, 74.125.24.83] for 300 seconds
→ Returns both IPs to the OS
Step 8 — OS Caches and Returns to the Browser#
OS:
→ Caches the result locally
→ Hands the IP list to the browser: [142.250.195.83, 74.125.24.83]
Step 9 — Browser Picks an IP and Connects#
The browser now has a list of IPs. It doesn't just pick the first one blindly.
Browser applies the Happy Eyeballs algorithm (RFC 6555/8305):
→ Initiates TCP connection to the first IP
→ After 250ms, if no response, opens a parallel connection to the second IP
→ Uses whichever IP responds first
→ Cancels the other
→ TLS handshake
→ HTTP request
→ Page loads
Happy Eyeballs was designed specifically to handle scenarios where one IP is slow or unreachable without making the user wait for a timeout. It's elegant in its simplicity.
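The idea can be sketched with asyncio and simulated connection attempts — the addresses and delays below are invented for illustration:

```python
# Sketch of the Happy Eyeballs idea (RFC 8305) with simulated connects:
# start with the first address, stagger each further attempt by ~250 ms,
# and keep whichever connection completes first.
import asyncio

async def attempt(ip: str, start_after: float, connect_time: float) -> str:
    await asyncio.sleep(start_after)    # staggered start (the ~250 ms delay)
    await asyncio.sleep(connect_time)   # stand-in for a real TCP handshake
    return ip

async def happy_eyeballs(candidates, stagger=0.25):
    tasks = [asyncio.create_task(attempt(ip, i * stagger, connect_time))
             for i, (ip, connect_time) in enumerate(candidates)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                   # abandon the slower attempts
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

# First IP stalls (1 s to connect); the second, started 250 ms later,
# connects in 50 ms and wins.
winner = asyncio.run(happy_eyeballs([("142.250.195.83", 1.00),
                                     ("74.125.24.83", 0.05)]))
print(winner)  # 74.125.24.83
```

Python implements this natively: asyncio.open_connection accepts a happy_eyeballs_delay parameter that enables exactly this staggered racing over the resolved address list.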
The Full Flow at a Glance#
Browser
│
├─ [Cache hit?] ──YES──► Use cached IP
NO
▼
OS Stub Resolver
├─ [/etc/hosts match?] ──YES──► Use it
├─ [OS Cache hit?] ──────YES──► Use cached IP
NO
▼
Recursive Resolver (8.8.8.8)
├─ [Resolver cache hit?] ──YES──► Return cached IP
NO
▼
Root Name Server
└─ "Go ask the .com TLD servers"
▼
.com TLD Name Server
└─ "Go ask google.com's authoritative servers"
▼
Authoritative Name Server (ns1.google.com)
└─ "Here are the IPs: [IP1, IP2] — TTL: 300"
▼
Recursive Resolver → caches → returns to OS
▼
OS → caches → returns to Browser
▼
Browser → Happy Eyeballs → picks fastest responding IP
▼
TCP + TLS → HTTP Request → Response → Page loads
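Steps 4 through 6 of this flow can be modeled as a tiny referral-following loop. The zone data below is a hard-coded stand-in for the real root, TLD, and authoritative servers:

```python
# Toy model of the resolver's iterative walk: each zone either answers
# or refers the resolver one level down the hierarchy. Zone contents
# here are invented stand-ins for the real servers.

ZONES = {
    ".":          {"com": ("referral", ".com")},                 # root server
    ".com":       {"google.com": ("referral", "google.com")},    # TLD server
    "google.com": {"mail.google.com":                            # authoritative
                   ("answer", ["142.250.195.83", "74.125.24.83"])},
}

def resolve(name: str, zone: str = ".") -> list[str]:
    for key, (kind, payload) in ZONES[zone].items():
        if name == key or name.endswith("." + key):
            if kind == "answer":
                return payload              # authoritative answer (Step 6)
            return resolve(name, payload)   # follow the referral (Steps 4-5)
    raise KeyError(name)  # NXDOMAIN in real DNS

print(resolve("mail.google.com"))  # ['142.250.195.83', '74.125.24.83']
```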
DNS-Based Load Balancing#
When companies like Google or Netflix use DNS for load balancing, the authoritative name server itself becomes intelligent:
Authoritative Server (with DNS LB):
→ Receives query from resolver
→ Checks health of all backend IPs
→ Checks geo-location of the resolver (not the user — an important distinction, though EDNS Client Subnet can pass along part of the client's address to narrow the gap)
→ Checks current traffic weights or routing policies
→ Returns a filtered, ordered list of healthy IPs
This is exactly how AWS Route 53, Cloudflare DNS, and NS1 work. You configure health checks, latency-based routing, weighted policies, or failover rules — and the authoritative server executes that logic on every query.
One limitation to understand: DNS has no visibility into actual HTTP request counts. It only controls which IPs get returned. Once the client picks an IP and connects, DNS is out of the picture. True traffic balancing happens at the load balancer layer.
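A sketch of that per-query selection logic, with invented backends and a weighted shuffle standing in for a provider's routing policies:

```python
# Sketch of the answer-selection logic an intelligent authoritative
# server applies per query: drop unhealthy backends, then order the
# survivors so higher-weight IPs tend to come first. Backends and
# weights are invented for illustration.
import random

BACKENDS = [
    {"ip": "142.250.195.83", "healthy": True,  "weight": 3},
    {"ip": "74.125.24.83",   "healthy": True,  "weight": 1},
    {"ip": "10.0.0.9",       "healthy": False, "weight": 5},  # failed health check
]

def answer(backends, rng=random):
    healthy = [b for b in backends if b["healthy"]]
    # weighted random ordering: sort by rand^(1/weight), descending,
    # so higher-weight IPs are more likely to lead the answer
    ordered = sorted(healthy,
                     key=lambda b: rng.random() ** (1.0 / b["weight"]),
                     reverse=True)
    return [b["ip"] for b in ordered]

print(answer(BACKENDS))  # the unhealthy 10.0.0.9 never appears
```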
Common Misconceptions — Busted#
| Misconception | Reality |
|---|---|
| "DNS is just a simple lookup" | DNS is a globally distributed system handling trillions of queries per day, with its own redundancy, caching layers, and failover mechanisms |
| "DNS changes are instant" | Changes propagate as caches expire. Any resolver holding the old record keeps serving it until its TTL runs out; only uncached resolvers see the new value immediately |
| "Low TTL is always better" | Low TTL means faster failover, but significantly higher load on your DNS infrastructure and more latency per uncached lookup |
| "DNS load balances traffic evenly" | DNS only controls which IPs are returned. It has no view into how many requests each server is handling |
| "There are 13 root servers" | There are 13 logical root server identifiers. Physically, there are 1,000+ machines globally, replicated via Anycast |
Key Takeaways#
- DNS is a distributed hierarchy, not a single server. Root → TLD → Authoritative, each layer delegating to the next.
- Caching is everywhere — browser, OS, and recursive resolver all cache independently. TTL governs all of them.
- The recursive resolver does all the work. Your device only ever talks to one server; the resolver fans out across the full hierarchy on your behalf.
- TTL is an operational lever. Drop it before any planned change. Raise it after things stabilise.
- DNS load balancing is IP selection, not traffic balancing. Real load distribution happens downstream at Layer 4/7.
- Happy Eyeballs ensures that getting back multiple IPs from DNS actually translates into better reliability for end users.
DNS is deceptively simple on the surface and genuinely sophisticated underneath. Every engineer who works on distributed systems, infrastructure, or anything internet-facing owes it to themselves to understand it deeply — because when things go wrong at the DNS layer, the blast radius is enormous and the debugging is painful if you don't know what you're looking at.
Further reading: RFC 1034 (DNS Concepts), RFC 1035 (DNS Implementation), RFC 8305 (Happy Eyeballs v2)