A practical reference for engineers designing distributed systems — from quick prototypes to large-scale production architectures.
1. Why Queues Exist — The Mental Model#
Before picking a tool, understand the three fundamental problems queues solve:
Problem 1: Rate Mismatch — A producer generates events faster than a consumer can process them. A queue acts as a buffer, absorbing spikes and letting consumers work at their own pace.
Example: Your API receives 10,000 order events per second during a flash sale, but your email service can only send 500/second. A queue holds the backlog and drains it over time.
Problem 2: Decoupling — Service A shouldn't need to know Service B exists. With a queue, producers publish events into a channel — any number of consumers can subscribe without the producer caring.
Problem 3: Reliability & Fault Tolerance — If Service B goes down, a queue persists messages and re-delivers them when the consumer comes back.
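The rate-mismatch problem above can be seen in a toy simulation — this is not any real broker, just a `deque` standing in for the queue, with a producer bursting at 10 events/tick while the consumer drains 5/tick:

```python
from collections import deque

queue = deque()
backlog_history = []

# Producer bursts at 10 msgs/tick for 3 ticks, then stops;
# the consumer steadily drains up to 5 msgs/tick.
for tick in range(8):
    produced = 10 if tick < 3 else 0
    for i in range(produced):
        queue.append(f"event-{tick}-{i}")
    for _ in range(min(5, len(queue))):
        queue.popleft()  # consumer works at its own pace
    backlog_history.append(len(queue))

print(backlog_history)  # [5, 10, 15, 10, 5, 0, 0, 0] — grows, then drains
```

The backlog peaks during the spike and drains to zero afterward — exactly the buffering behavior the flash-sale example describes.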
The Two Fundamental Delivery Models#
Message Queue (Point-to-Point): One message is consumed by exactly one consumer. Once processed, it is deleted.
Producer → [Queue] → Consumer A (message gone after processing)
Use when distributing work — sending an email, processing a payment, resizing an image.
Pub/Sub (Publish-Subscribe): One message is delivered to ALL subscribers. Each subscriber gets its own copy.
Producer → [Topic] → Consumer A (gets a copy)
                   → Consumer B (gets a copy)
                   → Consumer C (gets a copy)
Use when broadcasting events — an "order placed" event that triggers billing, inventory, and notifications simultaneously.
Critical Insight: Most queues support both models, but are optimized for one. Kafka is optimized for Pub/Sub at scale. RabbitMQ is optimized for flexible routing.
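The two delivery models can be contrasted in a few lines of plain Python — in-memory lists stand in for queues and topics, and round-robin stands in for the broker's dispatch policy:

```python
import itertools

# Point-to-point: competing consumers; each message goes to exactly one.
messages = ["m1", "m2", "m3", "m4"]
workers = {"A": [], "B": []}
for msg, name in zip(messages, itertools.cycle(workers)):  # round-robin dispatch
    workers[name].append(msg)

# Pub/Sub: every subscriber receives its own copy of every message.
subscribers = {"billing": [], "inventory": [], "notify": []}
for msg in messages:
    for inbox in subscribers.values():
        inbox.append(msg)

print(workers)      # {'A': ['m1', 'm3'], 'B': ['m2', 'm4']}
print(subscribers)  # every subscriber holds all four messages
```

Work is split in the first model; it is duplicated to everyone in the second.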
2. The Four Players: Quick Summary#
| Queue | Best One-Line Description | Primary Model |
|---|---|---|
| Kafka | A distributed, persistent, ordered event log | Pub/Sub (with consumer groups) |
| RabbitMQ | A flexible, feature-rich message broker | Point-to-Point (with Pub/Sub) |
| Amazon SQS | A simple, fully managed cloud task queue | Point-to-Point |
| Redis Pub/Sub | An in-memory, fire-and-forget broadcast channel | Pub/Sub (ephemeral) |
3. Apache Kafka#
Kafka is not a traditional message queue — it's a distributed commit log. Consumers don't delete messages; they track their own offset in the log.
[Partition 0]: event1 | event2 | event3 | event4 | event5 ...
                                        ↑                   ↑
                        Consumer A is at offset 3    Consumer B is at offset 5
Because nothing is deleted, multiple consumer groups can independently read the same data at different positions.
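The offset mechanics can be sketched with a plain Python list as the log — a toy model, not the Kafka client API, but the read/commit/rewind behavior is the same idea:

```python
# The log is append-only; each consumer group just remembers its own offset.
log = ["event1", "event2", "event3", "event4", "event5"]
offsets = {"consumer_a": 3, "consumer_b": 5}

def poll(group, max_records=2):
    """Read from the group's current offset; nothing is ever deleted."""
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] += len(records)  # commit the new offset
    return records

print(poll("consumer_a"))  # ['event4', 'event5'] — picks up where A left off
print(poll("consumer_b"))  # [] — B is already caught up
offsets["consumer_a"] = 0  # replay: just rewind the offset
print(poll("consumer_a"))  # ['event1', 'event2']
```

Replay is nothing more than moving the cursor — the data was never gone.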
Core Concepts:
- Topic — A named stream of events
- Partition — A topic is split into N partitions, each an independent ordered log. More partitions = more parallelism
- Consumer Group — Consumers that collectively read a topic; each partition goes to exactly one consumer
- Offset — A cursor marking where a consumer is in a partition
- Retention — Kafka retains messages for a configured period (e.g., 7 days) regardless of consumption
Key Features:
- Ordering — Guaranteed within a partition. Use a partition key (e.g., user ID) to ensure all events for one user go to the same partition
- Replay / Rewind — Reset a consumer's offset to any point in time and replay events
- Exactly-Once Semantics — Supported but complex; at-least-once + idempotent consumers is simpler for most cases
- Log Compaction — Keeps only the latest message per key, useful for maintaining state snapshots
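Two of the features above — partition keys and log compaction — reduce to small ideas. Kafka's default partitioner uses a murmur2 hash of the key; the deterministic toy hash below just stands in for it, and the three-partition count is an assumption:

```python
NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Stand-in for Kafka's murmur2-based partitioner: any stable hash works
    # to show that one key always maps to one partition.
    return sum(key.encode()) % NUM_PARTITIONS

assert partition_for("user-42") == partition_for("user-42")  # deterministic

# Log compaction: keep only the latest record per key.
log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2"), ("user-1", "v3")]
compacted = dict(log)  # later entries overwrite earlier ones per key
print(compacted)  # {'user-1': 'v3', 'user-2': 'v1'}
```

Because every `user-42` event hashes to the same partition, per-user ordering holds; compaction turns the log into a latest-value snapshot per key.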
Use Kafka when: High throughput (millions of events/sec), multiple teams consuming the same stream, audit logs, event sourcing, stream processing, data pipelines.
Don't use Kafka when: Complex routing, small teams avoiding ops overhead, per-message TTL, simple task queues.
Gotchas: Partition count can be increased but never decreased, and increasing it changes key-to-partition mapping (breaking per-key ordering for new messages). Consumer lag monitoring is critical. Ordering is per-partition only.
4. RabbitMQ#
RabbitMQ is a traditional message broker implementing AMQP. Producers send to an exchange, which routes to queues via bindings. Messages are deleted after acknowledgment.
Exchange Types:
Direct Exchange — Routes to a queue whose binding key exactly matches the routing key.
Fanout Exchange — Broadcasts every message to ALL bound queues. Routing keys ignored.
Topic Exchange — Pattern matching with wildcards (* matches one word, # matches zero or more).
Headers Exchange — Routes based on message header attributes.
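Topic-exchange matching is easy to get wrong, so here is the `*`/`#` semantics in a small standalone matcher — a sketch of the AMQP rules, not RabbitMQ's actual implementation:

```python
def topic_matches(pattern: str, key: str) -> bool:
    """AMQP topic semantics: '*' matches exactly one word, '#' zero or more."""
    return _match(pattern.split("."), key.split("."))

def _match(pat, words):
    if not pat:
        return not words          # pattern exhausted: match iff key is too
    head, rest = pat[0], pat[1:]
    if head == "#":
        # '#' may swallow zero or more whole words
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == "*" or head == words[0]:
        return _match(rest, words[1:])
    return False

print(topic_matches("notification.email.*", "notification.email.welcome"))  # True
print(topic_matches("notification.#", "notification.sms.us.now"))           # True
print(topic_matches("notification.*", "notification.sms.us"))               # False
```

Note the last case: `*` refuses a two-word tail, which is exactly why `#` exists.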
Key Features:
- Dead Letter Queues — Failed messages route to a DLQ for inspection and retry
- Message TTL — Set expiration on messages or entire queues
- Message Priority — Higher-priority messages delivered first (1–255 supported; values ≤ 10 recommended in practice)
- Delayed Messages — Via plugin, schedule messages for future delivery
- Quorum Queues — Raft-based replicated queues for production durability
Use RabbitMQ when: Complex routing, per-message TTL/priority, delayed delivery, task queues with DLQ, request/reply patterns.
Don't use RabbitMQ when: Millions of messages/second, message replay, multiple independent consumer groups on the same stream.
5. Amazon SQS#
SQS is Amazon's fully managed queue — nothing to install or scale.
- Standard Queue — At-least-once delivery, best-effort ordering, unlimited throughput
- FIFO Queue — Exactly-once processing, strict ordering within a message group, 300 TPS per API action (3,000 with batching)
- Visibility Timeout — Message becomes invisible after pickup; redelivered if consumer crashes before deleting it
- Long Polling — Keep the connection open for up to 20s waiting for a message (always use in production)
- SQS + SNS Fan-Out — SNS publishes to multiple SQS queues simultaneously (one per service)
- Lambda Integration — Native trigger: Lambda auto-scales with queue depth
Use SQS when: Fully on AWS, simple task distribution, zero infrastructure management, native AWS integration.
Don't use SQS when: Message replay, message priority, streaming, complex routing, not on AWS.
Gotchas: Standard queue can deliver duplicates — consumers must be idempotent. Visibility timeout must be longer than processing time. Messages up to 256KB only (use S3 + claim-check pattern for larger payloads).
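The visibility-timeout and idempotency gotchas interact, and a toy in-memory queue makes the interaction concrete — this mimics SQS semantics but is not the `boto3` API:

```python
import time

class ToyQueue:
    """At-least-once semantics: a message reappears if not deleted in time."""
    def __init__(self, visibility_timeout=1.0):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self):
        now = time.monotonic()
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                # Hide it for the visibility window instead of deleting it.
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=0.01)
q.send("m1", "charge order 42")

processed = set()            # idempotency guard keyed on message id
first = q.receive()          # worker picks the message up...
time.sleep(0.02)             # ...but "crashes" before deleting it
second = q.receive()         # same message is redelivered
print(first == second)       # True — hence consumers must be idempotent
if second[0] not in processed:
    processed.add(second[0])
    q.delete(second[0])      # only now is the message really gone
```

If processing had side effects (a payment charge), the `processed` check is what prevents the redelivery from charging twice.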
6. Redis Pub/Sub#
Redis Pub/Sub is a fire-and-forget broadcast channel. If a consumer is offline when a message is published, it misses that message forever — no persistence, no ACK, no retry.
Key Features:
- Sub-millisecond delivery latency (in-memory, no disk I/O)
- Pattern subscribe (PSUBSCRIBE order.*) for wildcard channel subscriptions
- Extremely fast fan-out to thousands of subscribers
Redis Streams (better alternative): Adds persistence, consumer groups, ACK, replay, and message history — essentially Kafka Lite for moderate scale.
Use Redis Pub/Sub when: Real-time presence updates, live dashboard updates, cache invalidation signals, ephemeral chat, you already have Redis.
Don't use when: Losing a message is unacceptable, you need history or replay, processing workloads.
Gotchas: No durability — Redis restart loses all in-flight messages. Slow consumers block publishing. Memory pressure disconnects slow subscribers.
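Fire-and-forget delivery and glob-style pattern subscription can both be shown in a toy bus — `fnmatchcase` approximates Redis's glob pattern matching here; this is a model of the semantics, not the `redis-py` client:

```python
from fnmatch import fnmatchcase

class ToyPubSub:
    """Fire-and-forget: delivery only to subscribers present at publish time."""
    def __init__(self):
        self.subscribers = {}  # name -> (pattern, inbox)

    def psubscribe(self, name, pattern):
        self.subscribers[name] = (pattern, [])

    def unsubscribe(self, name):
        self.subscribers.pop(name, None)

    def publish(self, channel, message):
        for pattern, inbox in self.subscribers.values():
            if fnmatchcase(channel, pattern):  # glob-style, like PSUBSCRIBE
                inbox.append((channel, message))

bus = ToyPubSub()
bus.psubscribe("dashboard", "order.*")
bus.publish("order.created", "#1001")     # dashboard is listening → delivered
inbox = bus.subscribers["dashboard"][1]
bus.unsubscribe("dashboard")
bus.publish("order.created", "#1002")     # nobody listening → dropped forever
print(inbox)  # [('order.created', '#1001')] — #1002 was never delivered
```

Note there is no retry path anywhere in `publish` — that absence is the whole point of the model.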
7. Feature Comparison Matrix#
Delivery Semantics#
| Feature | Kafka | RabbitMQ | SQS Standard | SQS FIFO | Redis Pub/Sub |
|---|---|---|---|---|---|
| At-least-once | ✅ | ✅ | ✅ | ✅ | ❌ (at-most-once) |
| Exactly-once | ✅ (transactions, complex) | ❌ | ❌ | ✅ | ❌ |
| Message replay | ✅ | ❌ | ❌ | ❌ | ❌ |
| Ordering | Per partition | Per queue | Best-effort | Strict (per group) | Publish order |
8. The Decision Framework#
Run through these questions in order:
Step 1: Distributing work or broadcasting events?
- Work (one message, one consumer) → SQS or RabbitMQ
- Events (one message, many consumers) → Kafka or SNS+SQS or Redis Pub/Sub
Step 2: Need message replay/rewind?
- Yes → Kafka (or Redis Streams for moderate scale)
- No → continue
Step 3: Throughput requirements?
- < 10K/sec → any queue, choose by features
- 500K+/sec → Kafka or SQS Standard
- Sub-millisecond to many subscribers → Redis Pub/Sub
Step 4: Complex routing (patterns, headers, content)?
- Yes → RabbitMQ
- No → continue
Step 5: Priority, TTL, or scheduled delivery?
- Priority or arbitrary delay → RabbitMQ
- Short delay (< 15 min) → SQS Delay Queue
Step 6: Infrastructure philosophy?
- Fully on AWS → SQS + SNS
- Open-source, control → Kafka or RabbitMQ
- Already have Redis, moderate scale → Redis Streams
Step 7: Durability requirements?
- Messages cannot be lost → Kafka, SQS, or RabbitMQ Quorum Queues
- Some loss acceptable → Redis Pub/Sub is fine
9. Common System Design Patterns#
Order Processing#
User Places Order → [Kafka/SNS] → Billing, Inventory, Notification, Fraud
Why Kafka: One event, multiple independent services. Each is a separate consumer group. Use SNS+SQS on AWS if no replay needed.
Background Job Queue#
API Server → [SQS/RabbitMQ] → Worker Pool
Why SQS/RabbitMQ: Classic task queue — one job, one worker. RabbitMQ if you need priority queues (premium users first).
Real-Time Analytics Pipeline#
App Events → [Kafka] → Stream Processor → Data Warehouse + Dashboard + Anomaly Detection
Why Kafka: High throughput, multiple consumers, durable log for reprocessing.
Notification Routing#
Events → [RabbitMQ Topic Exchange] → Email Queue, SMS Queue, Push Queue
Why RabbitMQ: Topic exchange routes notification.email.* to email queue, notification.sms.* to SMS queue.
Chat / Presence#
User message → [Redis Pub/Sub] → All users in room receive it
Why Redis: Messages are ephemeral; offline users catch up from a database, not the queue.
Delayed Jobs (> 15 minutes)#
Store in database with deliver_at timestamp. Scheduler service pushes to SQS/RabbitMQ at delivery time.
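The scheduler half of this pattern is a small loop over jobs ordered by `deliver_at`. A heap stands in for the database table in this sketch; in production the push would go to SQS/RabbitMQ rather than returning a list:

```python
import heapq

# Durable delayed jobs: persist (deliver_at, job) pairs, then on each
# scheduler tick move every due job onto the real task queue.
pending = []  # stand-in for the database table, ordered by deliver_at
heapq.heappush(pending, (100, "send-renewal-reminder"))
heapq.heappush(pending, (50, "expire-cart"))
heapq.heappush(pending, (200, "send-survey"))

def drain_due(now):
    """One scheduler tick: pop and return every job whose time has come."""
    due = []
    while pending and pending[0][0] <= now:
        due.append(heapq.heappop(pending)[1])
    return due

print(drain_due(now=120))  # ['expire-cart', 'send-renewal-reminder']
print(drain_due(now=120))  # [] — nothing new is due
```

Because the jobs live in durable storage until their deadline, arbitrary delays (days, months) cost nothing, which is what SQS's 15-minute cap cannot give you.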
10. Hybrid Architectures#
Kafka + SQS#
Kafka as durable event log/source of truth. SQS for operational task queues. A Kafka consumer bridges them — reads events from Kafka, pushes actionable tasks to SQS.
SNS + SQS (AWS Fan-Out)#
SNS publishes to multiple SQS queues simultaneously. Each service has its own queue — independent scaling and isolated failure.
Kafka + Redis Pub/Sub#
Kafka ensures nothing is lost. A consumer processes events and publishes results to Redis for sub-millisecond delivery to live browser clients. If client misses a Redis message, it fetches from DB.
11. Quick-Reference Decision Tree#
START
│
├─ Need replay/rewind of historical events?
│ YES → Kafka
│ NO ↓
│
├─ Need to broadcast to multiple independent consumers?
│ YES ─┬─ High throughput or stream processing?
│ │ YES → Kafka
│ │ NO → SNS+SQS | RabbitMQ Fanout | Redis Pub/Sub (ephemeral)
│ NO ↓
│
├─ Need complex routing (patterns, headers)?
│ YES → RabbitMQ
│ NO ↓
│
├─ Need priority, TTL, or arbitrary delay?
│ Priority/arbitrary delay → RabbitMQ
│ Short delay only (< 15 min) → SQS Delay Queue
│ NO ↓
│
├─ Fully on AWS? Want managed service?
│ YES → SQS
│ NO ↓
│
├─ Sub-millisecond broadcast, can tolerate loss?
│ YES → Redis Pub/Sub
│ NO ↓
│
├─ Already using Redis, moderate scale?
│ YES → Redis Streams
│ NO ↓
│
└─ Default → RabbitMQ (versatile, well-understood)
12. Summary Table#
| Scenario | Best Fit |
|---|---|
| High-throughput event streaming | Kafka |
| Event sourcing / audit log | Kafka |
| Multiple teams consuming same events | Kafka |
| Replay historical data | Kafka |
| Stream processing (windowed aggregations) | Kafka + Kafka Streams |
| Complex routing by content or pattern | RabbitMQ |
| Message priority queues | RabbitMQ |
| Per-message TTL | RabbitMQ |
| Delayed/scheduled delivery (arbitrary) | RabbitMQ (plugin) |
| Task queues with dead-letter handling | RabbitMQ or SQS |
| Simple AWS task queue, no ops | SQS |
| Fan-out on AWS | SNS + SQS |
| Strict ordering + exactly-once (AWS) | SQS FIFO |
| Real-time ephemeral broadcast | Redis Pub/Sub |
| Live presence / typing indicators | Redis Pub/Sub |
| Cache invalidation signals | Redis Pub/Sub |
| Persistent consumer-group streams (Redis) | Redis Streams |
13. Queue vs Stream — The Most Confused Distinction#
Queue = A task handed off. Done. Forget it. Stream = A permanent record of what happened. Process it whenever you want.
A Queue is like a Post Office: You write a letter and drop it in the mailbox. Once delivered, it's gone. Its only job was to move the message from A to B.
A Stream is like a Bank Ledger: Every transaction is recorded permanently, in order. Multiple people — auditors, accountants, analysts — can independently read the same ledger.
The Five Fundamental Differences#
1. What happens after a message is consumed?
- Queue → deleted
- Stream → stays; multiple consumers read it at their own position
2. How many consumers can independently read?
- Queue → multiple workers share the queue; each message goes to exactly one worker
- Stream → unlimited independent consumer groups, each reads the full stream
3. Is ordering a first-class concern?
- Queue → best-effort in most implementations
- Stream → ordering is the entire point
4. Command vs Fact?
- Queue message → a command: { "action": "send_email", "to": "..." } (imperative)
- Stream event → a fact: { "event": "user.signed_up", "at": "..." } (past tense, immutable)
5. Replay?
- Queue → cannot replay; once processed, messages are gone forever
- Stream → reset offset to any point and re-read; stream is the source of truth
Does Persistence Make Something a Stream?#
No. SQS and RabbitMQ both persist to disk — neither is a stream. Persistence in a queue means "survive until one consumer gets it." Persistence in a stream means "the log is the system — read it any time from any position."
Is Kafka a Stream by Default?#
Yes. But your consumption pattern decides whether you're using it as a stream or a queue:
- Stream mode → multiple independent consumer groups, each reads every event
- Queue mode → competing consumers in one group, each event processed by exactly one worker
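The two consumption modes can be contrasted over one shared log — a toy model with an assumed two-partition layout, not the Kafka consumer-group protocol itself:

```python
log = ["e1", "e2", "e3", "e4"]

# Stream mode: each consumer GROUP keeps its own offset and sees every event.
group_offsets = {"analytics": 0, "billing": 0}
analytics = log[group_offsets["analytics"]:]  # full stream
billing = log[group_offsets["billing"]:]      # full stream, independently

# Queue mode: competing consumers WITHIN one group split the partitions,
# so each event is processed by exactly one worker (2 partitions assumed).
partitions = {0: ["e1", "e3"], 1: ["e2", "e4"]}
assignment = {"worker-1": partitions[0], "worker-2": partitions[1]}

print(analytics == billing == log)  # True — both groups read everything
print(sorted(assignment["worker-1"] + assignment["worker-2"]))  # each event once
```

Same log, two behaviors: groups duplicate the data across teams, while workers inside a group divide it.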
When to Use Each#
Use a Queue when: Message is a command that happens once, work distribution is the goal, no history needed.
Use a Stream when: Event is a fact multiple systems need, multiple teams consume same data, you need history/replay, real-time data pipelines, order of events carries meaning.
Queue: "Here's something to do — someone please handle it." Stream: "Here's something that happened — anyone who cares can read it, now or later."
Last updated: 2026. Kafka 3.x. RabbitMQ 3.12+. SQS: current AWS. Redis 7.x.