Networking Deep Dive — Complete Revision Notes#
Full stack: REST → Sockets → TCP → HTTP/2 → Frameworks
Table of Contents#
- REST APIs
- How One Port Serves Millions
- What is a Socket
- What is a File Descriptor
- What's Inside the Socket Struct
- Buffer Size and RAM
- How Streaming Works at Socket Level
- Streaming JSON — TypeScript and Python
- Backpressure — Slow Consumer Problem
- Circular Buffer — How OS Tracks What's Read
- TCP is Always a Stream
- REST Stateless vs TCP Connections
- HTTP Versions — 1.0, 1.1, 2, 3
- TLS vs TCP Handshake
- HTTP/2 Multiplexing
- HTTP/2 Frame Structure
- Body, Query and Path Params in HTTP/2
- socket.connect() — OS vs HTTP Client
- How a Framework Decodes Packets
- The Full Stack — End to End
1. REST APIs#
Simple intuition#
A standard way for two applications to talk over the internet using HTTP — the same protocol browsers use.
Analogy: a waiter — you don't go to the kitchen yourself. You tell the waiter (API), the kitchen (server) prepares it, waiter brings it back.
Why it exists#
Before REST, every team invented their own conventions. SOAP was complex. REST said: "reuse HTTP and add simple rules on top." Roy Fielding formalized this in his 2000 PhD dissertation.
Core components#
Resource + URL /users/42, /orders, /products
HTTP Verbs GET=read, POST=create, PUT/PATCH=update, DELETE=delete
Statelessness server remembers NOTHING between requests
every request must carry all info it needs (JWT, session token)
Trade-offs#
| Use REST when | Avoid REST when |
|---|---|
| Public APIs, CRUD ops | Real-time bidirectional (use WebSockets) |
| Wide tooling support needed | Complex nested data (use GraphQL) |
| Simplicity matters | High-performance internal services (use gRPC) |
30-second interview answer#
"REST is a set of conventions for building APIs over HTTP. Every resource has its own URL, and HTTP methods define the action. The most important constraint is statelessness — the server remembers nothing between requests, so every request is self-contained. REST became the standard because it's simple and works everywhere HTTP works. Main trade-off is over/under-fetching, which is why GraphQL was created."
Senior insights#
- Statelessness is a scaling superpower — any server behind a load balancer can handle any request because no session state is stored server-side
- Most "REST" APIs are actually just HTTP APIs — almost nobody implements HATEOAS (Fielding's full REST spec)
2. How One Port Serves Millions#
Simple intuition#
A port is just a door. The door number doesn't limit how many conversations can happen inside. What matters is the 4-tuple that uniquely identifies each connection.
The 4-tuple#
Every connection is uniquely identified by:
(Source IP, Source Port, Destination IP, Destination Port)
Example — 3 users all hitting port 443:
103.21.5.1:54231 ↔ 13.0.0.1:443 ← unique connection
203.55.2.8:61024 ↔ 13.0.0.1:443 ← unique connection
91.12.8.44:49871 ↔ 13.0.0.1:443 ← unique connection
OS keeps a hash table keyed by 4-tuple — packet arrives → nanosecond lookup → right socket found.
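You can see the 4-tuple from userspace — a minimal sketch, assuming outbound access to example.com:443:

```python
import socket

# Two connections to the same destination port get different source ports,
# so each 4-tuple is unique.
conns = [socket.create_connection(("example.com", 443)) for _ in range(2)]
for s in conns:
    print(s.getsockname(), "→", s.getpeername())
    # e.g. ('10.0.0.5', 54231) → ('93.184.216.34', 443)
    #      ('10.0.0.5', 54232) → ('93.184.216.34', 443)
for s in conns:
    s.close()
```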
What actually enables millions#
1. Non-blocking async I/O (event loop)
One thread handles thousands of idle connections
"wake me up when data arrives" instead of blocking
2. Load balancers
Distribute connections across multiple servers
3. HTTP Keep-Alive / HTTP/2 multiplexing
Reuse connections — fewer total connections needed
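A minimal sketch of item 1's event loop, using Python's stdlib selectors module (epoll on Linux, kqueue on macOS). Port 9000 is an arbitrary choice; one thread services every connection:

```python
import selectors
import socket

sel = selectors.DefaultSelector()            # epoll/kqueue under the hood
srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9000))
srv.listen()
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():              # blocks until SOME socket is ready
        if key.fileobj is srv:
            conn, _addr = srv.accept()       # new client
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(4096)
            if data:
                key.fileobj.sendall(data)    # echo back
            else:                            # client closed
                sel.unregister(key.fileobj)
                key.fileobj.close()
```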
Real limits (not the port)#
- File descriptors per process (ulimit -n, default 1024)
- Memory — each socket ~2KB struct + buffers
- CPU — processing requests
- Bandwidth — network I/O
Senior insights#
- The C10K problem (1999) — solving 10K concurrent connections required switching from thread-per-connection to event-driven I/O (epoll). This is why Nginx crushed Apache.
- NAT breaks the assumption — corporate offices share one public IP. 65,535 ports shared across entire office. Can cause connection exhaustion from a single IP.
3. What is a Socket#
Simple intuition#
A socket is a file that represents a network connection. The OS makes it look like a file so your program can just read and write — exactly like a text file — and OS handles all network complexity underneath.
Why it was created#
Berkeley BSD Unix (1983): "What if we hide all network complexity behind something programmers already know — a file?"
The "everything is a file" philosophy#
Actual file on disk → file
Network socket → file
Keyboard input (stdin) → file
Terminal output → file
Pipe between processes → file
All have one thing in common: you read from them and write to them.
Same interface. Different things underneath.
Types of sockets#
| Type | Protocol | Analogy | Use case |
|---|---|---|---|
| Stream socket | TCP | Phone call — reliable, ordered | HTTP, SSH, databases |
| Datagram socket | UDP | Postcards — fast, no guarantee | Video calls, gaming, DNS |
| Unix domain socket | None (local only) | Note in same building | Nginx ↔ app on same machine |
Socket lifecycle#
SERVER: CLIENT:
socket() → create fd socket() → create fd
bind() → attach to port connect() → connect to server
listen() → ready to accept
accept() → new fd per client
read/write → communicate read/write → communicate
close() → done close() → done
accept() always returns a brand new socket for each client. Original listening socket stays open.
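The same lifecycle as runnable Python — a sketch of a one-connection-at-a-time echo server (port 9000 is an arbitrary choice):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket() → create fd
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9000))                               # bind() → attach to port
srv.listen(128)                                           # listen() → ready to accept
while True:
    conn, addr = srv.accept()      # accept() → brand-new fd for this client
    data = conn.recv(4096)         # read
    conn.sendall(data)             # write
    conn.close()                   # close this client; srv keeps listening
```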
Senior insights#
- A socket is not a connection — it's an endpoint. It exists before the connection is made (the socket() call). The connection happens at connect() or accept().
- Slow consumers are dangerous — if your app reads slowly, the recv buffer fills and TCP flow control kicks in. One slow consumer can back-pressure the entire pipeline.
4. What is a File Descriptor#
Simple intuition#
A file descriptor is just a number that acts as a nickname for something your program has opened. OS says: "you want to work with this? I'll call it number 4. Whenever you say 4, I'll know what you mean."
The word "file" is misleading#
It doesn't mean a file on disk. It means "something you can read/write to." Unix uses one unified interface for everything.
The 3 tables#
YOUR PROCESS
File Descriptor Table (per process — just an array)
┌────┬──────────┐
│ 0 │ stdin │ ← keyboard
│ 1 │ stdout │ ← terminal
│ 2 │ stderr │ ← terminal
│ 3 │ *───────┼──→ Open File Table entry
│ 4 │ *───────┼──→ Open File Table entry (socket)
└────┴──────────┘
OS: Open File Table (shared across all processes)
tracks: current position, access mode, reference count
OS: Inode Table / Socket Table (actual resource)
for files: inode → data on disk
for sockets: struct in MEMORY only → ip, port, buffers, tcp state
Does OS create a file on disk for a socket?#
No. For sockets, OS creates a data structure in memory only. The file descriptor machinery is reused — because it already works — but nothing is written to disk.
Prove it#
ls -la /proc/$$/fd # see all open fds in your shell
# socket shows as: socket:[12345678] ← listed like a file, no disk
Why this design is brilliant#
read(fd, buffer, size) # works for file, socket, pipe, keyboard
write(fd, buffer, size) # works for everything
close(fd) # works for everything
# write code once, works everywhere
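You can see the shared fd namespace from Python — a sketch assuming a Linux-like system where /etc/hostname exists:

```python
import socket

# A disk file and a network socket get fds from the same namespace
# (exact numbers vary per run).
f = open("/etc/hostname", "rb")
s = socket.socket()
print(f.fileno(), s.fileno())   # e.g. 3 4 — plain integers, different resources
```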
Senior insights#
- File descriptor limit is a real production problem — default 1024 per process. Each open socket = 1 fd. A server handling 10K connections needs 10K+ fds. Hitting the limit causes EMFILE: too many open files. Fix: ulimit -n and /etc/security/limits.conf.
- fd inheritance is a security trap — forked child processes inherit all of the parent's open fds, including database connections. Fix: mark fds with FD_CLOEXEC.
5. What's Inside the Socket Struct#
Simple intuition#
The socket struct is a dashboard for one network connection — everything the OS needs to send, receive, and manage data for that connection.
The 5 buckets#
SOCKET STRUCT
┌─────────────────────────────────────────────┐
│ IDENTITY │
│ local_ip: 192.168.1.5 │
│ local_port: 54231 │
│ remote_ip: 142.250.80.46 │
│ remote_port: 443 │
│ protocol: TCP │
├─────────────────────────────────────────────┤
│ BUFFERS │
│ send_buffer [ data waiting to send ] │
│ recv_buffer [ data waiting to read ] │
│ ~4MB each (tunable) │
├─────────────────────────────────────────────┤
│ TCP STATE │
│ LISTEN → SYN_RECEIVED → ESTABLISHED │
│ → FIN_WAIT_1 → FIN_WAIT_2 │
│ → TIME_WAIT (2 min) → CLOSED │
├─────────────────────────────────────────────┤
│ SEQUENCE NUMBERS │
│ snd_seq, rcv_seq │
│ last_ack_sent, last_ack_received │
│ (TCP uses these for reliable delivery) │
├─────────────────────────────────────────────┤
│ TIMERS │
│ retransmit_timer (resend if no ACK) │
│ keepalive_timer (detect dead clients) │
│ timewait_timer (2 min after close) │
└─────────────────────────────────────────────┘
The buffers are the heart of the socket#
Your code calls write(fd, "Hello", 5)
→ OS copies "Hello" into SEND BUFFER
→ returns immediately ← your code doesn't wait!
→ OS sends over network in background
Packet arrives from network
→ OS puts data into RECV BUFFER
→ your app calls read(fd, buf, size)
→ OS copies from RECV BUFFER into your app's memory
Your app never touches the network directly. It just reads/writes to buffers. OS does everything else.
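You can ask the kernel for a socket's actual buffer sizes — a sketch; the printed values are OS defaults and vary by system:

```python
import socket

s = socket.socket()
print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```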
Memory cost#
In Linux kernel (struct tcp_sock): ~1.5KB–2KB per socket struct. 1 million connections = ~2GB RAM just for structs, before buffer data.
Senior insights#
- recv buffer IS the TCP flow control window — free space in recv buffer = TCP window size advertised to sender. Slow app fills buffer → window shrinks → sender automatically slows. No extra code needed.
- TIME_WAIT lasts minutes — after a connection closes, the socket stays in TIME_WAIT (60 seconds on Linux; 2×MSL per the RFC) to ensure no stale packets confuse new connections. Under heavy load, thousands of TIME_WAIT sockets consume memory.
6. Buffer Size and RAM#
The real cost of a large payload#
A 5MB payload doesn't just use 5MB of RAM — data gets copied multiple times:
Client sends 5MB
↓
Kernel recv buffer: 5MB (OS puts it here first)
↓ app calls read()
↓ OS COPIES it
App buffer: 5MB (now lives here too)
↓ framework parses
Parsed object: 5MB+ (JSON parsed, headers, metadata)
Total per request: ~15MB
With thread stack: ~20MB+
Real concurrent request count with 1GB RAM#
Not 200 requests (1GB / 5MB).
More like 50-70 requests (1GB / 15-20MB per request).
Real bottleneck order#
1. FIRST: CPU → parsing, business logic
2. SECOND: Threads → context switching
3. THIRD: RAM → buffers filling up
4. FOURTH: Bandwidth → 1Gbps NIC = 125MB/s ÷ 5MB = 25 req/s max
Production solutions#
1. Streaming → never buffer full body, process 64KB at a time
2. Reject early → check Content-Length, return 413 before reading (see sketch after this list)
3. Upload to S3 → client uploads direct to object storage
API receives just the URL (~1KB)
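A minimal sketch of option 2 (reject early) as FastAPI middleware — the 10MB limit is an arbitrary choice for illustration:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
MAX_BODY = 10 * 1024 * 1024   # hypothetical hard limit

@app.middleware("http")
async def reject_large_bodies(request: Request, call_next):
    length = request.headers.get("content-length")
    if length and int(length) > MAX_BODY:
        # reject BEFORE reading a single body byte
        return JSONResponse({"error": "payload too large"}, status_code=413)
    return await call_next(request)
```

Still set the same limit at the reverse proxy (Nginx client_max_body_size) — defense in depth.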
Senior insights#
- Slow HTTP attack — if you don't check payload size before reading, attacker sends infinite stream exhausting recv buffers and RAM. Always set hard limits at reverse proxy (Nginx
client_max_body_size). - SO_RCVBUF tuning at scale — increasing recv buffer globally means every socket gets that size, including health check pings. 50,000 connections × 5MB buffer = 244GB reserved. Always tune per-socket.
7. How Streaming Works at Socket Level#
Core insight#
TCP doesn't have "messages" or "files". It's just bytes flowing continuously — like water through a pipe. Streaming = embracing this reality instead of waiting to collect everything first.
Non-streaming vs streaming#
NON-STREAMING:
wait for ALL data → copy to app memory → process
recv buffer holds entire payload idle
STREAMING:
chunk arrives → read immediately → process → buffer drained → TCP window opens → more arrives
recv buffer never holds more than one chunk
Memory comparison (5MB payload, 100 users)#
Without streaming: 100 × 15MB ≈ 1.5GB
With streaming: 100 × 64KB ≈ 6.4MB ← ~200x less memory
The raw read loop#
CHUNK_SIZE = 65536 # 64KB
while True:
chunk = socket.recv(CHUNK_SIZE) # give me UP TO 64KB available NOW
if not chunk:
break # connection closed
process(chunk) # handle immediately
# recv buffer freed → TCP window grows → sender can send more
recv(n) doesn't wait for exactly n bytes. It returns whatever is available now — could be 10KB, could be 64KB.
How frameworks hide this#
// Node.js — req is a readable stream
req.on('data', (chunk) => { // called every time a chunk arrives
process(chunk)
})
req.on('end', () => {
res.send('done')
})
# FastAPI
async for chunk in request.stream():
process(chunk)
Both are just the same recv() loop underneath. Framework calls recv() and emits each chunk to your handler.
Senior insights#
- TCP is always a stream — "streaming mode" doesn't exist at TCP level. It's entirely your application's choice to process chunks vs accumulate everything.
- You can stream response simultaneously while reading request — read 64KB → process → write 64KB output → repeat. Memory stays flat regardless of file size. This is how FFmpeg processes 100GB video on 512MB RAM.
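A sketch of that flat-memory relay loop — transform() is a hypothetical placeholder for whatever per-chunk processing you do:

```python
def transform(chunk: bytes) -> bytes:
    return chunk          # placeholder — real code might re-encode, compress, etc.

def relay(src, dst, chunk_size=65536):
    """Pump bytes src → dst with flat memory: one chunk in flight at a time."""
    while True:
        chunk = src.recv(chunk_size)
        if not chunk:                   # recv() returned b'' → remote closed
            break
        dst.sendall(transform(chunk))   # output streams while input still arrives
```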
8. Streaming JSON — TypeScript and Python#
When streaming JSON helps#
| Scenario | Helps? | Why |
|---|---|---|
| Flat JSON object (any size) | NO | Need full body — closing } not arrived yet |
| Large array of objects | YES | Each item is independent |
| CSV upload | YES | Each row is independent |
| NDJSON | YES | Each line is complete valid JSON |
NDJSON — designed for streaming#
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
Each line = valid JSON. Used by log pipelines, Kafka, bulk APIs.
TypeScript#
npm install stream-json # large JSON arrays
npm install ndjson # newline-delimited JSON
// stream-json — large array
const { parser } = require('stream-json')
const { streamArray } = require('stream-json/streamers/StreamArray')
app.post('/users', (req, res) => {
req.pipe(parser()).pipe(streamArray())
.on('data', ({ value }) => saveToDatabase(value)) // one object at a time
.on('end', () => res.json({ done: true }))
})
Python#
pip install ijson # large JSON arrays
pip install jsonlines # NDJSON
# ijson — large array (stream = any binary file-like source, e.g. the request body)
import ijson
for obj in ijson.items(stream, 'item'):
save_to_database(obj) # full array NEVER in memory
The honest caveat#
Libraries maintain internal state for partial chunks. If an object is split across chunks:
chunk1: '{"id": 1, "name": "Ali' ← incomplete, held internally
chunk2: 'ce"}, {"id": 2...' ← completes object 1, emits it
Library does the bookkeeping. You just get complete objects out.
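For intuition, here's that bookkeeping done by hand for NDJSON — a sketch; real libraries handle edge cases this skips:

```python
import json

def iter_ndjson(chunks):
    """Buffer partial chunks; emit only complete lines as parsed objects."""
    buf = b""
    for chunk in chunks:                  # chunks may split a line anywhere
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)    # one complete object out
```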
9. Backpressure — Slow Consumer Problem#
The mechanism#
Your app slow to process
→ recv_buffer fills
→ free space shrinks
→ TCP window in ACK shrinks
→ sender slows down
→ sender's send_buffer fills
→ client's write() blocks
Slowness of YOUR app propagates all the way back to client.
No extra code. Pure buffer mechanics.
Three scenarios#
App reads fast: buffer mostly empty → window large → sender at full speed
App reads slow: buffer fills → window shrinks → sender slows automatically
App stops reading: buffer full → window = 0 → sender fully pauses
TCP connection stays ALIVE, just paused
no data lost, resumes when app reads
The timeout caveat#
TCP will wait forever. But clients have timeouts:
App slow for > timeout → client closes connection regardless
→ all that buffered data = wasted
Fix for slow processing:
request arrives → return 202 Accepted immediately
push to queue (Kafka/SQS) → process async at own pace
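A minimal sketch of the 202-and-queue pattern in FastAPI, using an in-process asyncio.Queue as a stand-in for Kafka/SQS:

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()
work_queue: asyncio.Queue = asyncio.Queue(maxsize=1000)   # stand-in for Kafka/SQS

@app.post("/jobs", status_code=202)
async def enqueue(job: dict):
    await work_queue.put(job)       # a worker task drains this at its own pace
    return {"status": "accepted"}   # client unblocked immediately
```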
Senior insights#
- TCP Zero Window ([TCP ZeroWindow] in Wireshark) = receiver buffer full, sender stuck. The fix is never network tuning — it's making the consumer process faster or scaling horizontally.
- This pattern repeats at every layer — Node.js streams have pause()/resume(), Kafka has max.poll.records, gRPC has flow control tokens. All model the same TCP mechanic at the application layer.
10. Circular Buffer — How OS Tracks What's Read#
The wrong mental model#
buffer = [chunk1, chunk2, chunk3]
app reads chunk1 → buffer.remove(chunk1) ← NOT how it works
The real model — two pointers, no deletion#
recv_buffer (4MB fixed block of RAM)
┌────────────────────────────────────────────────────┐
│ consumed │ data waiting to be read │ free │
└────────────────────────────────────────────────────┘
↑ ↑
READ ptr WRITE ptr
(moves on recv() call) (moves on data arrival)
free space = total(4MB) - (WRITE - READ)
free space → TCP window size in next ACK
The recv() call IS the signal#
app calls recv(fd, buf, 64KB)
1. OS copies buffer[READ...READ+64KB] → app's memory
2. READ pointer moves forward 64KB
3. free space recalculated → TCP window in next ACK increases
4. sender knows it can send more
No separate "I read it" notification. The act of calling recv() moves the pointer.
Why circular?#
When WRITE pointer hits end of 4MB block, it wraps to start — reusing already-consumed memory.
[consumed][consumed][new data ][unread ][consumed]
↑ ↑
WRITE (wrapped) READ
Same 4MB block reused forever. No allocation, no GC.
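A toy Python version of the two-pointer mechanic — illustrative only; the kernel does this in C with bit masks instead of modulo:

```python
class RingBuffer:
    """Toy circular buffer mirroring the kernel's READ/WRITE pointer bookkeeping."""
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.size = size
        self.read = 0     # advances when the app consumes
        self.write = 0    # advances when data arrives

    def free_space(self) -> int:
        return self.size - (self.write - self.read)   # → advertised TCP window

    def put(self, data: bytes) -> None:
        assert len(data) <= self.free_space(), "window is zero — sender must pause"
        for b in data:
            self.buf[self.write % self.size] = b      # wrap via modulo
            self.write += 1

    def get(self, n: int) -> bytes:
        n = min(n, self.write - self.read)            # can't read unwritten data
        out = bytes(self.buf[i % self.size] for i in range(self.read, self.read + n))
        self.read += n    # the act of reading IS the signal — window grows
        return out
```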
Senior insights#
- Circular buffer pattern is everywhere — Linux pipes, GPU command queues, LMAX Disruptor, Kafka producer buffers. Always: fixed memory, zero allocation, no GC, just pointer arithmetic.
- recv() returning 0 = EOF, not error — when the remote closes the connection, recv() returns 0. This is the clean shutdown signal. Break the read loop gracefully — don't treat it as an error.
11. TCP is Always a Stream#
The key insight#
TCP has no concept of "streaming mode" or "buffered mode". There is no flag, no setting, no configuration.
TCP always: receives bytes → puts in recv_buffer → done
TCP never: knows what HTTP is, what JSON is, what streaming means
The streaming vs buffering decision is 100% your application code:
# "buffered" — app waits for everything
data = b''
while len(data) < content_length:
chunk = socket.recv(65536)
data += chunk # accumulating in memory
process(data)
# "streaming" — app processes as it goes
while True:
chunk = socket.recv(65536)
if not chunk: break
process(chunk) # process immediately
Same socket. Same recv_buffer. Same TCP. Only difference: what your code does with each chunk.
12. REST Stateless vs TCP Connections#
The confusion#
"REST is stateless" does NOT mean a new TCP connection per request. These are different layers entirely:
REST (stateless) = server has no memory between requests
TCP connections = completely separate concern, managed by HTTP layer
HTTP version behavior#
HTTP/1.0: 1 connection per request. 10 calls = 10 TCP handshakes. (nobody uses this)
HTTP/1.1: Keep-alive by default. 1 connection reused for all requests.
But requests are ordered — head of line blocking.
Browser workaround: opens 6 parallel connections per origin.
HTTP/2: 1 connection, multiple parallel streams. No head of line blocking at HTTP level.
HTTP/3: 1 QUIC connection (UDP-based). Per-stream reliability.
The 6-connection browser hack#
HTTP/1.1's head of line blocking: slow request blocks all behind it. Browser fix: open 6 connections to same server simultaneously.
Problems with this hack:
6 × TCP handshakes
6 × TLS handshakes
6 × socket structs in OS
6 × TCP slow start
10,000 users × 6 = 60,000 connections server handles
HTTP/2 made this unnecessary — one connection, unlimited parallel streams.
Connection pooling (HTTP client)#
HTTP client keeps a pool of open connections keyed by host+port:
Pool entry: {
fd: 4, ← only link to OS
host: "api.example.com",
port: 443,
http_version: "h2",
in_use: false,
idle_since: timestamp
}
When you make a request → check pool → found open connection → reuse fd → skip handshakes.
Common Python bug#
# BAD — new TCP connection every call, slow
for item in items:
requests.get('https://api.example.com/data') # opens + closes connection each time
# GOOD — one Session, connections reused
session = requests.Session()
for item in items:
session.get('https://api.example.com/data') # same fd reused
Senior insights#
- HTTP/2 is a massive performance win for REST APIs — many small requests fly in parallel over one connection. Just upgrading often gives 30–50% latency improvement with zero app code changes.
- Stateless server + connection reuse is not a contradiction — TCP connection is maintained by OS network stack, below your application entirely. Your Express app being stateless means no session store. Says nothing about OS keeping TCP connection open.
13. HTTP Versions — 1.0, 1.1, 2, 3#
How client and server agree on HTTP version#
ALPN (Application Layer Protocol Negotiation) — happens inside TLS handshake:
Client TLS ClientHello:
+ ALPN extension: ["h2", "http/1.1"] ← what I support
Server TLS ServerHello:
+ ALPN extension: "h2" ← what I picked
No extra round trip. Piggybacks on TLS.
If server doesn't support HTTP/2 → picks "http/1.1" → graceful downgrade.
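You can watch ALPN happen with Python's stdlib ssl module — a sketch assuming outbound access to example.com:443:

```python
import socket
import ssl

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])            # what WE support
with socket.create_connection(("example.com", 443)) as tcp:
    with ctx.wrap_socket(tcp, server_hostname="example.com") as tls:
        print(tls.selected_alpn_protocol())           # what the server picked, e.g. 'h2'
```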
HTTP/3 discovery (different — uses QUIC, not TCP)#
First visit (HTTP/2):
Server response includes header:
Alt-Svc: h3=":443"; ma=86400 ← "I support HTTP/3"
Second visit:
Client opens QUIC connection on UDP port 443
Uses HTTP/3 directly
Installing/enabling HTTP versions#
You don't "install" HTTP versions. They're built into your server/client software. You just enable them:
# Nginx
listen 443 ssl; # HTTP/1.1
listen 443 ssl http2; # + HTTP/2
listen 443 quic reuseport; # + HTTP/3 (needs Nginx 1.25+)
add_header Alt-Svc 'h3=":443"'; # advertise HTTP/3
# FastAPI — uvicorn only speaks HTTP/1.1; use Hypercorn for HTTP/2
hypercorn main:app --certfile cert.pem --keyfile key.pem
# Caddy — HTTP/2 and HTTP/3 automatic, zero config
api.example.com {
reverse_proxy localhost:3000
}
Verify what's actually running#
curl -v --http2 https://yourdomain.com 2>&1 | grep -E "ALPN|HTTP/"
# → ALPN, offering h2
# → ALPN, server accepted h2
# → HTTP/2 200
Senior insights#
- ALPN failure silently downgrades — a proxy stripping ALPN extension means server never sees h2 preference, both fall back to HTTP/1.1 silently. Always verify with curl or DevTools.
- Domain sharding (HTTP/1.1 optimization) hurts HTTP/2 — spreading assets across subdomains forces multiple connections, losing HTTP/2's single-connection multiplexing benefit.
14. TLS vs TCP Handshake#
Simple intuition#
TCP handshake = "Can we talk?" — just establishing channel exists. TLS handshake = "Can we talk privately?" — verifying identity and agreeing on encryption.
TCP handshake — 3 steps#
Client Server
|──── SYN (seq=1000) ──────────►| "I want to connect"
|◄─── SYN-ACK (seq=5000) ───────| "OK, ready"
|──── ACK ──────────────────────►| "Let's go"
Total: 3 messages — client can send data after 1 round trip
No encryption. No identity. Just: are you there?
TLS handshake — after TCP connects#
Client Server
|──── ClientHello ────────────────────►|
| - TLS version: 1.3 |
| - Supported ciphers |
| - client_random |
| - ALPN: ["h2", "http/1.1"] |
| |
|◄─── ServerHello ────────────────────|
| - Chosen cipher: AES-256 |
| - server_random |
| - ALPN chosen: "h2" |
| - Certificate (public key) |
| - Digital signature |
| |
| Client verifies certificate: |
| → signed by trusted CA? |
| → domain name matches? |
| → not expired? |
| |
|──── Key Exchange (Diffie-Hellman) ──►|
| Both independently compute same |
| session key — key never sent over |
| wire |
| |
|◄─── Finished ────────────────────── |
|──── Finished ────────────────────── |
| TLS established, HTTP/2 starts |
Total: 1 round trip (TLS 1.3)
Comparison#
| | TCP | TLS |
|---|---|---|
| Purpose | Open a channel | Secure the channel |
| Steps | 3 | 4-6 |
| Checks identity? | No | Yes (certificate) |
| Encrypts? | No | Yes (session key) |
| Negotiates? | Sequence numbers | Cipher + HTTP version |
| Required? | Always | Only for HTTPS |
Full timeline for one HTTPS request#
0ms TCP SYN
10ms TCP complete ← 1 round trip
10ms TLS ClientHello
20ms TLS complete ← 1 round trip (TLS 1.3)
20ms First HTTP byte ← real request starts here
Senior insights#
- TLS 1.3 0-RTT resumption — if you've connected before, client reuses session ticket to send encrypted data in first message (zero extra round trips). Trade-off: vulnerable to replay attacks, so only safe for idempotent GET requests.
- Certificate Transparency — all issued certs must be publicly logged. DigiNotar was hacked in 2011, issued fake Google certs, Iranian users intercepted. CT makes fake certs detectable. The CA trust chain is the weakest point in TLS security.
15. HTTP/2 Multiplexing#
The problem HTTP/2 solved#
HTTP/1.1 on one connection:
req1 (500ms DB query) → res1 → req2 → res2 → req3 → res3
Total: 600ms sequential
HTTP/2 on one connection:
req1 → req2 → req3 (all at once)
← res2 (fast query, back first)
← res3
←──── res1 (slow query, back last, nobody waited)
Total: 500ms (just the slowest one)
How it works — frames and stream IDs#
HTTP/2 breaks everything into small frames. Every frame has a 9-byte header:
┌─────────────────────────────────────────────────┐
│ Length (3 bytes) — payload size │
│ Type (1 byte) — HEADERS/DATA/SETTINGS │
│ Flags (1 byte) — END_STREAM/END_HEADERS │
│ Stream ID (4 bytes) — which request this is │
└─────────────────────────────────────────────────┘
Stream ID is the key — multiple requests share one TCP connection by tagging their frames:
One TCP connection carries all streams simultaneously:
stream 1 frames: [H:1][D:1][D:1] ← GET /orders
stream 3 frames: [H:3][D:3] ← GET /profile
stream 5 frames: [H:5] ← GET /settings
On the wire: [H:1][H:3][D:1][H:5][D:3][D:1]
interleaved freely, sorted by receiver using stream ID
Stream ID rules#
Client-initiated: odd numbers (1, 3, 5, 7...)
Server-initiated: even numbers (2, 4, 6...)
Always increasing, never reused
Frame types#
| Frame | Purpose |
|---|---|
| HEADERS | HTTP headers (method, path, status) |
| DATA | Request/response body |
| SETTINGS | Connection config (max streams, frame size) |
| WINDOW_UPDATE | Flow control signals |
| RST_STREAM | Cancel one stream without closing connection |
| GOAWAY | Closing the connection |
| CONTINUATION | Overflow of HEADERS when compressed headers are large |
Two-level flow control#
Connection level: total bytes across ALL streams combined (default 65535)
Stream level: bytes for ONE specific stream (default 65535 per stream)
Why two levels?
File download stream consuming whole connection window
→ all 99 other streams starve
With stream-level control:
→ file download gets limited window
→ other streams get their own windows
→ all 100 streams make progress
HPACK header compression#
Request 1: full headers sent → both sides add to table (~800 bytes)
Request 2: send index numbers only → ~20 bytes
only changed headers sent in full
800 bytes → 20 bytes per request
What HTTP/2 does NOT fix#
HTTP/2 fixed HTTP-level head of line blocking. But TCP-level still exists:
One lost TCP packet → ALL streams wait for retransmit
TCP doesn't know about HTTP/2 streams
→ This is what HTTP/3 + QUIC solves
Senior insights#
- Stream prioritization is underused — HTTP/2 lets clients assign weights to streams. Browsers use it to load HTML before CSS before JS before images. Most backend clients never use it.
- HTTP/2 multiplexing can hurt with slow backends — clients pile up hundreds of concurrent streams. Without
SETTINGS_MAX_CONCURRENT_STREAMS, one HTTP/2 client can overwhelm backend. Always set it in production (Nginx default: 128).
16. HTTP/2 Frame Structure#
The 9-byte frame header is always exactly 9 bytes#
This never changes. The variable part is the payload:
HTTP/2 FRAME:
┌─────────────────────────────────────────────────┐
│ FRAME HEADER (always exactly 9 bytes) │
│ ┌──────────┬──────┬───────┬───────────────┐ │
│ │ Length │ Type │ Flags │ Stream ID │ │
│ │ (3 bytes)│(1b) │ (1b) │ (4 bytes) │ │
│ └──────────┴──────┴───────┴───────────────┘ │
├─────────────────────────────────────────────────┤
│ PAYLOAD (0 to 16,384 bytes, variable) │
│ ← your actual HTTP headers or body live here │
└─────────────────────────────────────────────────┘
The Length field tells receiver exactly what to read#
3 bytes = 24 bits → the field can express up to ~16MB,
but default max frame size = 16,384 bytes
(negotiable up to ~16MB via SETTINGS_MAX_FRAME_SIZE)
Receiver always:
1. Read exactly 9 bytes → parse frame header
2. Read exactly Length bytes → that's the payload
3. No ambiguity, no scanning
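Those two steps translate almost directly into code — a sketch of a frame reader over any connected socket:

```python
def read_exact(sock, n: int) -> bytes:
    """Loop recv() until exactly n bytes arrive (TCP may deliver fewer per call)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return buf

def read_frame(sock):
    header = read_exact(sock, 9)                      # step 1: exactly 9 bytes
    length = int.from_bytes(header[0:3], "big")       # Length (3 bytes)
    frame_type, flags = header[3], header[4]          # Type, Flags (1 byte each)
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # top bit reserved
    payload = read_exact(sock, length)                # step 2: exactly Length bytes
    return frame_type, flags, stream_id, payload
```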
When HTTP headers are too large — CONTINUATION frames#
Big JWT token or many cookies → HEADERS payload > 16KB
Frame 1: HEADERS
Flags = 0x0 ← END_HEADERS NOT set (more coming)
Payload: [first 16KB of HPACK compressed headers]
Frame 2: CONTINUATION
Flags = 0x4 ← END_HEADERS set (done now)
Stream ID: same as HEADERS frame
Payload: [rest of headers]
Receiver buffers CONTINUATION frames until END_HEADERS = 1
The END_HEADERS and END_STREAM flags#
Flags byte (8 bits):
Bit 0 (0x1) = END_STREAM → no more data on this stream
Bit 2 (0x4) = END_HEADERS → headers complete, no CONTINUATION coming
Examples:
GET request (no body): Flags = 0x5 → END_HEADERS + END_STREAM
POST request (body): HEADERS Flags = 0x4 → END_HEADERS only
last DATA Flags = 0x1 → END_STREAM
Why CONTINUATION frames are rare#
HPACK compresses repeated headers aggressively:
Request 1: ~800 bytes of headers → both sides store in table
Request 2: ~20 bytes (just index references)
→ fits in one frame easily
Senior insights#
- Frame machinery is a real attack surface — the HTTP/2 Rapid Reset attack (CVE-2023-44487) opened streams with HEADERS and instantly cancelled them with RST_STREAM, millions of times per second, forcing servers to do setup work for requests that never completed — the largest DDoS in internet history (late 2023). The related CONTINUATION Flood (2024) sent HEADERS without END_HEADERS, forcing servers to buffer unbounded CONTINUATION frames. Fix: server-side caps on stream resets and on CONTINUATION frames per stream.
17. Body, Query and Path Params in HTTP/2#
Where everything lives#
| Request part | Frame type | Notes |
|---|---|---|
| HTTP method | HEADERS | :method pseudo-header |
| Path params | HEADERS | Part of :path |
| Query params | HEADERS | Part of :path after ? |
| HTTP headers | HEADERS | HPACK compressed |
| Request body | DATA | Raw bytes, uncompressed |
| Large headers | HEADERS + CONTINUATION | Until END_HEADERS flag |
| Large body | Multiple DATA frames | END_STREAM on last one |
Path and query params — always in HEADERS#
GET /users/42/orders?status=pending&limit=10
HEADERS frame payload (HPACK):
:method = GET
:path = /users/42/orders?status=pending&limit=10
:scheme = https
:authority = api.example.com
authorization = Bearer abc
No DATA frame for GET — body is empty
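A sketch of sending that request with Python's hyper-h2 library (pip install h2); TCP and TLS setup are omitted, and wire_bytes would be written to the TLS-wrapped socket:

```python
import h2.connection

conn = h2.connection.H2Connection()
conn.initiate_connection()
stream_id = conn.get_next_available_stream_id()   # client streams: odd numbers
conn.send_headers(stream_id, [
    (":method", "GET"),
    (":path", "/users/42/orders?status=pending&limit=10"),
    (":scheme", "https"),
    (":authority", "api.example.com"),
    ("authorization", "Bearer abc"),
], end_stream=True)            # GET has no body → END_STREAM set on HEADERS
wire_bytes = conn.data_to_send()                  # raw frames, ready for the socket
```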
Request body — DATA frames, separate from HEADERS#
POST /orders
Content-Type: application/json
{"item": "coffee", "qty": 2}
Frame 1: HEADERS
Flags = END_HEADERS (no END_STREAM — body coming)
:method = POST, :path = /orders, content-type = application/json
Frame 2: DATA
Flags = END_STREAM (body done, stream done)
Payload: {"item":"coffee","qty":2} ← raw bytes, no compression
Large body — multiple DATA frames#
POST with 5MB body:
HEADERS [END_HEADERS]
DATA (16KB)
DATA (16KB)
...
DATA (last chunk) [END_STREAM] ← this flag says body is complete
Content-Length in HTTP/2#
Content-Length is advisory — END_STREAM flag on last DATA frame is the real signal. Still sent for validation — if actual bytes ≠ Content-Length → treat as error.
18. socket.connect() — OS vs HTTP Client#
Two completely different things#
RAW SOCKET (talking directly to OS):
s = socket.socket() # system call → OS creates socket struct
s.connect(("google.com", 80)) # system call → OS does TCP handshake
s.send(b"GET / HTTP/1.1\r\n") # system call → OS writes to send_buffer
data = s.recv(1024) # system call → OS reads recv_buffer
HTTP CLIENT (talking to library):
client = httpx.Client()
client.get("https://google.com") # library call → library calls OS internally
What axios.get() actually does internally#
axios.get("https://api.example.com/orders")
↓ check connection pool
↓ OS: socket() → fd = 4
↓ OS: connect(fd=4, ip, 443) → TCP handshake
↓ TLS library: wrap(fd=4) → TLS handshake + ALPN
↓ OS: write(fd=4, HTTP2_frames) → HEADERS frame to send_buffer
↓ OS: recv(fd=4, buf, n) → read response from recv_buffer
↓ parse frames → build response object
↓ return to your code
You called one line. Library made a dozen system calls.
Server side — what app.listen() does#
app.listen(3000)
↓ OS: socket() → fd = 4 (listening socket)
↓ OS: bind(fd=4, port=3000) → "port 3000 = this process"
↓ OS: listen(fd=4, backlog=511) → mark as passive
↓ OS: accept(fd=4) → block, wait for clients
TCP handshake done by OS before accept() returns
new fd = 5 for this client
fd=4 goes back to waiting
The two sockets that always exist#
fd = 4 LISTENING socket → "front door", bound to port, never reads data
fd = 5 CONNECTED socket → one per client, actual data flows here
fd = 6 CONNECTED socket → another client
fd = 7 CONNECTED socket → another client
When your server also makes HTTP calls#
app.get('/orders', async (req, res) => {
const data = await axios.get('http://inventory-service/items')
// ↑ your server acting as HTTP CLIENT here
res.json(data)
})
Your process is simultaneously HTTP server (receiving) and HTTP client (calling other services). Two separate socket pools. This is exactly what microservices are.
19. How a Framework Decodes Packets#
The framework is responsible for decoding — not you#
recv_buffer bytes (raw hex)
↓
HTTP parser (C library — llhttp in Node.js, httptools in Python)
↓ state machine eats bytes:
read until space → METHOD ("GET")
read until space → PATH ("/orders")
read until \r\n → VERSION
loop:
read until ":" → header name
read until \r\n → header value
if \r\n\r\n → headers done, body starts
read remaining → body
↓
Middleware stack runs on parsed data:
express.json() → raw body bytes → req.body JavaScript object
cookieParser() → Cookie string → req.cookies object
authMiddleware() → Auth header → req.user object
↓
YOUR handler runs:
req.method, req.path, req.headers, req.body ← all clean, all decoded
you never saw a single raw byte
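A toy Python version of the parser's job for an HTTP/1.1 request head — real parsers like llhttp do this incrementally, byte by byte, across chunks:

```python
def parse_request_head(raw: bytes):
    head, _, body = raw.partition(b"\r\n\r\n")        # \r\n\r\n → headers done
    request_line, *header_lines = head.split(b"\r\n")
    method, path, version = request_line.split(b" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")         # read until ":" → name
        headers[name.strip().lower()] = value.strip() # rest of line → value
    return method, path, version, headers, body

# parse_request_head(b"GET /orders HTTP/1.1\r\nHost: api.example.com\r\n\r\n")
```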
HTTP parsers in popular frameworks#
| Language | Framework | HTTP Parser |
|---|---|---|
| JavaScript | Express | llhttp (C, ships with Node.js) |
| Python | FastAPI | httptools (C, pip installed) |
| Java | Spring | Netty (Java NIO) |
| Go | net/http | Built into Go stdlib |
| Rust | Actix | httparse (Rust, zero-copy) |
The exact boundary#
─────────────────── recv_buffer bytes ────────────────────
FRAMEWORK'S WORLD
TCP byte stream → HTTP frame parsing → HPACK decompression
→ headers decoded → body assembled → middleware chain
──────────────────── your route handler ──────────────────
YOUR WORLD
req.body, req.headers, req.params, business logic
──────────────────────── res.send() ─────────────────────
FRAMEWORK'S WORLD AGAIN
serialize response → build HTTP frames → write to send_buffer
──────────────────── OS send_buffer bytes ────────────────
Framework vs HTTP server vs HTTP client#
HTTP CLIENT (axios, httpx, curl):
INITIATES connections, sends requests, receives responses
system calls: connect()
HTTP SERVER (Express, FastAPI, Rails):
WAITS for connections, receives requests, sends responses
system calls: bind(), listen(), accept()
FRAMEWORK = HTTP server + HTTP parser + router + middleware system
20. The Full Stack — End to End#
Complete mental model#
SILICON
Electrical signals / radio waves on network hardware
IP LAYER (OS)
Raw packets — may arrive out of order, may be lost
Handles addressing and routing (ordering bytes back into a stream is TCP's job)
TCP LAYER (OS)
Ordered, reliable byte stream
Handles: retransmits, flow control, congestion control
Writes bytes into recv_buffer
Manages circular buffer (READ/WRITE pointers)
HTTP/2 LAYER (framework's C library)
Reads raw bytes from recv_buffer
Parses 9-byte frame headers
Reads exactly Length bytes payload
Sorts frames to streams by stream ID
Runs HPACK decompressor on HEADERS frames
Assembles body from DATA frames
MIDDLEWARE STACK (framework)
JSON parsing → req.body
Cookie parsing → req.cookies
Auth checks → req.user
YOUR HANDLER
req.method, req.path, req.headers, req.body
business logic
res.json(), res.send()
REVERSE (sending response):
Your object → framework serializes → HTTP/2 frames → send_buffer → network
What each layer owns#
| Layer | Owns |
|---|---|
| Your code | Business logic only |
| Framework | HTTP parsing, routing, middleware |
| OS | Socket struct, buffers, TCP state |
| Network | Moving bits between machines |
Why swapping TCP for QUIC (HTTP/3) only affects one layer#
HTTP/3 = same HTTP/2 framing + same HPACK + same stream multiplexing
but QUIC replaces TCP at transport layer
Your app code: unchanged
Framework: unchanged
HTTP/2 frames: unchanged
Transport: TCP → QUIC (UDP-based, per-stream reliability)
Each layer is independent. Replace one layer without touching others.
Quick Reference — Interview Answers#
What is a socket?#
"A socket is an abstraction the OS provides representing one end of a network connection. Under the hood it's a file descriptor — an integer — pointing to an in-memory struct holding the connection's identity (4-tuple), send/receive buffers, TCP state, sequence numbers, and timers. Your code never directly touches the network — it reads and writes to buffers, and the OS handles everything else."
How does one port serve millions?#
"A port alone doesn't define a connection. The OS uses a 4-tuple: source IP, source port, destination IP, destination port. Millions of users all connecting to port 443 each have a unique source IP/port combination. The OS creates a dedicated socket per connection while the listening socket stays open. What actually enables millions is async I/O — one thread handles thousands of idle connections using an event loop — combined with load balancers across multiple machines."
How does streaming work?#
"TCP is always a stream of bytes — no concept of messages or files. Streaming means reading bytes as they arrive in chunks (typically 64KB) and processing each chunk immediately, instead of waiting for everything. At the socket level this is just calling recv() in a loop and handling each return. This integrates with TCP flow control: recv buffer fills when you're slow, TCP window shrinks, sender automatically slows down. Frameworks expose this as streams/generators. Memory stays constant regardless of payload size."
What is HTTP/2 multiplexing?#
"HTTP/2 breaks everything into binary frames with a 9-byte header containing a stream ID. Multiple requests share one TCP connection by tagging their frames with different stream IDs. The receiver sorts frames to streams by ID. A slow stream 1 response doesn't block stream 3 — their frames are independent. This eliminates HTTP-level head of line blocking. HTTP/2 also adds two-level flow control and HPACK header compression. The one remaining problem — TCP-level head of line blocking — is what HTTP/3 solves with QUIC."
How does a framework work?#
"A framework is an HTTP server plus an HTTP parser plus a router. When a request arrives, the OS does the TCP handshake and puts bytes in the recv buffer. The framework's HTTP parser — a C library like llhttp — reads raw bytes, runs a state machine to find frame boundaries, HPACK-decompresses headers, assembles body from DATA frames, runs middleware, then calls your handler with a clean request object. Your business logic never sees a single raw byte."
Topics to explore next: gRPC and Protocol Buffers · WebSockets · Service mesh (Envoy/Istio) · QUIC internals · Database connection pooling