
The Complete Guide to Message Queue Selection


A practical reference for engineers designing distributed systems — from quick prototypes to large-scale production architectures.


1. Why Queues Exist — The Mental Model#

Before picking a tool, understand the three fundamental problems queues solve:

Problem 1: Rate Mismatch — A producer generates events faster than a consumer can process them. A queue acts as a buffer, absorbing spikes and letting consumers work at their own pace.

Example: Your API receives 10,000 order events per second during a flash sale, but your email service can only send 500/second. A queue holds the backlog and drains it over time.
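The buffering math is worth making concrete. A minimal sketch using the flash-sale numbers above (`drain_seconds` is an illustrative helper, not any real API):

```python
def drain_seconds(spike_rate: int, spike_duration_s: int, consume_rate: int) -> float:
    """Seconds needed to clear the backlog left by a traffic spike.

    During the spike the queue grows at (spike_rate - consume_rate) msgs/sec;
    once the spike ends, it drains at consume_rate msgs/sec.
    """
    backlog = (spike_rate - consume_rate) * spike_duration_s
    return backlog / consume_rate

# Flash-sale example: 10,000 orders/sec for 60s, email service at 500/sec.
# Backlog = (10,000 - 500) * 60 = 570,000 messages; draining takes ~19 minutes.
```

Nothing is dropped and nothing times out; the queue simply trades latency for survival during the spike.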

Problem 2: Decoupling — Service A shouldn't need to know Service B exists. With a queue, producers publish events into a channel — any number of consumers can subscribe without the producer caring.

Problem 3: Reliability & Fault Tolerance — If Service B goes down, a queue persists messages and re-delivers them when the consumer comes back.

The Two Fundamental Delivery Models#

Message Queue (Point-to-Point): One message is consumed by exactly one consumer. Once processed, it is deleted.

Producer → [Queue] → Consumer A (message gone after processing)

Use when distributing work — sending an email, processing a payment, resizing an image.

Pub/Sub (Publish-Subscribe): One message is delivered to ALL subscribers. Each subscriber gets its own copy.

Producer → [Topic] → Consumer A (gets a copy)
                   → Consumer B (gets a copy)
                   → Consumer C (gets a copy)

Use when broadcasting events — an "order placed" event that triggers billing, inventory, and notifications simultaneously.
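The two models can be contrasted with a toy in-memory broker — purely illustrative, no real broker works exactly like this:

```python
import itertools
from collections import deque

class WorkQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""
    def __init__(self, consumers):
        self.buffer = deque()
        self.consumers = itertools.cycle(consumers)  # round-robin dispatch

    def publish(self, msg):
        self.buffer.append(msg)

    def dispatch(self):
        while self.buffer:
            next(self.consumers)(self.buffer.popleft())  # message gone once handled

class Topic:
    """Pub/sub: every subscriber receives its own copy of each message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, msg):
        for handler in self.subscribers:
            handler(msg)  # each subscriber sees the same message
```

With a `WorkQueue`, an "order placed" message reaches one worker and disappears; with a `Topic`, billing, inventory, and notifications each receive their own copy.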

Critical Insight: Most queues support both models, but are optimized for one. Kafka is optimized for Pub/Sub at scale. RabbitMQ is optimized for flexible routing.


2. The Four Players: Quick Summary#

| Queue | Best One-Line Description | Primary Model |
|---|---|---|
| Kafka | A distributed, persistent, ordered event log | Pub/Sub (with consumer groups) |
| RabbitMQ | A flexible, feature-rich message broker | Point-to-Point (with Pub/Sub) |
| Amazon SQS | A simple, fully managed cloud task queue | Point-to-Point |
| Redis Pub/Sub | An in-memory, fire-and-forget broadcast channel | Pub/Sub (ephemeral) |

3. Apache Kafka#

Kafka is not a traditional message queue — it's a distributed commit log. Consumers don't delete messages; they track their own offset in the log.

[Partition 0]: event1 | event2 | event3 | event4 | event5 ...
                                              ↑
                                     Consumer A is at offset 3
                                     Consumer B is at offset 5

Because nothing is deleted, multiple consumer groups can independently read the same data at different positions.

Core Concepts:

  • Topic — A named stream of events
  • Partition — A topic is split into N partitions, each an independent ordered log. More partitions = more parallelism
  • Consumer Group — Consumers that collectively read a topic; each partition goes to exactly one consumer
  • Offset — A cursor marking where a consumer is in a partition
  • Retention — Kafka retains messages for a configured period (e.g., 7 days) regardless of consumption
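Partition-key routing is just deterministic hashing. A sketch (Kafka's default partitioner actually uses murmur2; `crc32` here is a stand-in with the property that matters — the same key always routes to the same partition):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition.

    Every message with the same key lands in the same partition,
    so per-key ordering is preserved.
    """
    return zlib.crc32(key.encode()) % num_partitions
```

This is also why changing the partition count later is disruptive: `% num_partitions` changes, so existing keys start routing to different partitions and per-key ordering breaks across the boundary.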

Key Features:

  • Ordering — Guaranteed within a partition. Use a partition key (e.g., user ID) to ensure all events for one user go to the same partition
  • Replay / Rewind — Reset a consumer's offset to any point in time and replay events
  • Exactly-Once Semantics — Supported but complex; at-least-once + idempotent consumers is simpler for most cases
  • Log Compaction — Keeps only the latest message per key, useful for maintaining state snapshots
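The at-least-once-plus-idempotent-consumer pattern mentioned above is small enough to sketch. The in-memory `processed_ids` set is illustrative; production would use a durable store (a database unique constraint, a Redis set):

```python
processed_ids = set()  # illustrative; use a durable store in production

def handle_once(event: dict, side_effect) -> bool:
    """Apply side_effect at most once per event ID, making redelivery harmless."""
    if event["id"] in processed_ids:
        return False          # duplicate delivery: already handled, skip
    side_effect(event)
    processed_ids.add(event["id"])
    return True
```

With this in place, the broker only needs to guarantee at-least-once delivery; duplicates become no-ops at the consumer.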

Use Kafka when: High throughput (millions of events/sec), multiple teams consuming the same stream, audit logs, event sourcing, stream processing, data pipelines.

Don't use Kafka when: Complex routing, small teams avoiding ops overhead, per-message TTL, simple task queues.

Gotchas: Partition count is hard to change after the fact. Consumer lag monitoring is critical. Ordering is per-partition only.


4. RabbitMQ#

RabbitMQ is a traditional message broker implementing AMQP. Producers send to an exchange, which routes to queues via bindings. Messages are deleted after acknowledgment.

Exchange Types:

Direct Exchange — Routes to a queue whose binding key exactly matches the routing key.

Fanout Exchange — Broadcasts every message to ALL bound queues. Routing keys ignored.

Topic Exchange — Pattern matching with wildcards (* matches one word, # matches zero or more).

Headers Exchange — Routes based on message header attributes.
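The topic-exchange wildcard rules can be expressed as a regex translation — a sketch of the matching semantics, not RabbitMQ's actual implementation:

```python
import re

def topic_matches(binding: str, routing_key: str) -> bool:
    """AMQP topic matching: '*' = exactly one word, '#' = zero or more words."""
    parts = []
    for word in binding.split("."):
        if word == "*":
            parts.append(r"[^.]+")      # exactly one dot-delimited word
        elif word == "#":
            parts.append("#")           # placeholder, expanded below
        else:
            parts.append(re.escape(word))
    pattern = r"\.".join(parts)
    # '#' may match zero words, so it must absorb an adjacent dot:
    pattern = (pattern.replace(r"#\.", r"(?:[^.]+\.)*")
                      .replace(r"\.#", r"(?:\.[^.]+)*")
                      .replace("#", r".*"))
    return re.fullmatch(pattern, routing_key) is not None
```

So `notification.email.*` matches `notification.email.welcome` but not `notification.email.a.b`, while `logs.#` matches `logs`, `logs.error`, and `logs.error.db`.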

Key Features:

  • Dead Letter Queues — Failed messages route to a DLQ for inspection and retry
  • Message TTL — Set expiration on messages or entire queues
  • Message Priority — Higher-priority messages delivered first (the queue declares a max priority; values up to 255 are supported, but RabbitMQ recommends keeping it in the 1–10 range)
  • Delayed Messages — Via plugin, schedule messages for future delivery
  • Quorum Queues — Raft-based replicated queues for production durability
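Dead-lettering is easiest to see as code. A broker-agnostic simulation — in RabbitMQ the broker itself performs this routing when a queue is declared with the `x-dead-letter-exchange` argument; here the consumer does it by hand:

```python
from collections import deque

MAX_RETRIES = 3

def consume(queue: deque, dead_letters: list, handler) -> None:
    """Process messages; after MAX_RETRIES failures, park the message in the DLQ.

    Without the cap, a poison message would be requeued forever and
    waste worker capacity indefinitely.
    """
    while queue:
        msg = queue.popleft()
        try:
            handler(msg["body"])
        except Exception:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= MAX_RETRIES:
                dead_letters.append(msg)   # park for inspection and manual retry
            else:
                queue.append(msg)          # requeue for another attempt
```

The dead-letter queue then becomes an operational tool: inspect the failed payloads, fix the bug, and replay them.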

Use RabbitMQ when: Complex routing, per-message TTL/priority, delayed delivery, task queues with DLQ, request/reply patterns.

Don't use RabbitMQ when: Millions of messages/second, message replay, multiple independent consumer groups on the same stream.


5. Amazon SQS#

SQS is Amazon's fully managed queue — nothing to install or scale.

  • Standard Queue — At-least-once delivery, best-effort ordering, unlimited throughput
  • FIFO Queue — Exactly-once, strict ordering, 300 TPS limit (3,000 with batching)
  • Visibility Timeout — Message becomes invisible after pickup; redelivered if consumer crashes before deleting it
  • Long Polling — Keeps the connection open for up to 20 seconds until a message arrives, eliminating empty responses and reducing cost (always use it in production)
  • SQS + SNS Fan-Out — SNS publishes to multiple SQS queues simultaneously (one per service)
  • Lambda Integration — Native trigger: Lambda auto-scales with queue depth
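Visibility timeout is the mechanism that makes SQS crash-safe, and it is worth simulating. A sketch with an explicit clock parameter (illustrative, not the SQS API):

```python
class VisibilityQueue:
    """Sketch of SQS visibility-timeout semantics: a received message is hidden,
    not deleted; if the consumer never deletes it, it reappears after the timeout."""

    def __init__(self, visibility_timeout: float):
        self.timeout = visibility_timeout
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)  # immediately visible

    def receive(self, now: float):
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                # Hide the message instead of deleting it:
                self.messages[msg_id] = (body, now + self.timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)  # consumer finished: remove for good
```

A consumer that crashes after `receive` simply never calls `delete`, so the message reappears once the timeout passes — which is also why the timeout must exceed the processing time.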

Use SQS when: Fully on AWS, simple task distribution, zero infrastructure management, native AWS integration.

Don't use SQS when: Message replay, message priority, streaming, complex routing, not on AWS.

Gotchas: Standard queue can deliver duplicates — consumers must be idempotent. Visibility timeout must be longer than processing time. Messages up to 256KB only (use S3 + claim-check pattern for larger payloads).
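The claim-check pattern from the gotcha above, sketched with a dict standing in for S3 (`blob_store`, `enqueue`, and `dequeue` are illustrative names, not AWS APIs):

```python
import json
import uuid

blob_store = {}           # stand-in for S3
SQS_LIMIT = 256 * 1024    # SQS message-size ceiling in bytes

def enqueue(payload: dict) -> str:
    """Return the message body to send: the payload itself if it fits,
    otherwise a reference (the 'claim check') to the blob store."""
    body = json.dumps(payload)
    if len(body.encode()) <= SQS_LIMIT:
        return body
    key = f"payloads/{uuid.uuid4()}"
    blob_store[key] = body                   # upload the oversized payload
    return json.dumps({"claim_check": key})  # send only the pointer

def dequeue(body: str) -> dict:
    msg = json.loads(body)
    if "claim_check" in msg:
        msg = json.loads(blob_store[msg["claim_check"]])  # redeem the claim check
    return msg
```

The queue only ever carries small pointers, while the blob store handles arbitrary payload sizes.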


6. Redis Pub/Sub#

Redis Pub/Sub is a fire-and-forget broadcast channel. If a consumer is offline when a message is published, it misses that message forever — no persistence, no ACK, no retry.

Key Features:

  • Sub-millisecond delivery latency (in-memory, no disk I/O)
  • Pattern subscribe (PSUBSCRIBE order.*) for wildcard channel subscriptions
  • Extremely fast fan-out to thousands of subscribers
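Redis channel patterns are glob-style, and Python's `fnmatch` is a close approximation for experimenting with them (note that a glob `*` crosses dots, unlike an AMQP topic `*`):

```python
from fnmatch import fnmatchcase

# PSUBSCRIBE-style glob pattern: '*' matches any run of characters.
pattern = "order.*"
channels = ["order.created", "order.cancelled", "payment.captured"]

matching = [ch for ch in channels if fnmatchcase(ch, pattern)]
```

Here `matching` contains the two `order.` channels; `payment.captured` is filtered out.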

Redis Streams (better alternative): Adds persistence, consumer groups, ACK, replay, and message history — essentially Kafka Lite for moderate scale.

Use Redis Pub/Sub when: Real-time presence updates, live dashboard updates, cache invalidation signals, ephemeral chat, you already have Redis.

Don't use when: Losing a message is unacceptable, you need history or replay, processing workloads.

Gotchas: No durability — Redis restart loses all in-flight messages. Slow consumers block publishing. Memory pressure disconnects slow subscribers.


7. Feature Comparison Matrix#

Delivery Semantics#

| Feature | Kafka | RabbitMQ | SQS Standard | SQS FIFO | Redis Pub/Sub |
|---|---|---|---|---|---|
| At-least-once | Yes | Yes | Yes | Yes | No (at-most-once) |
| Exactly-once | Supported (complex) | No | No | Yes | No |
| Replay / rewind | Yes | No | No | No | No |

8. The Decision Framework#

Run through these questions in order:

Step 1: Distributing work or broadcasting events?

  • Work (one message, one consumer) → SQS or RabbitMQ
  • Events (one message, many consumers) → Kafka or SNS+SQS or Redis Pub/Sub

Step 2: Need message replay/rewind?

  • Yes → Kafka (or Redis Streams for moderate scale)
  • No → continue

Step 3: Throughput requirements?

  • < 10K/sec → any queue, choose by features
  • 500K+/sec → Kafka or SQS Standard
  • Sub-millisecond to many subscribers → Redis Pub/Sub

Step 4: Complex routing (patterns, headers, content)?

  • Yes → RabbitMQ
  • No → continue

Step 5: Priority, TTL, or scheduled delivery?

  • Priority or arbitrary delay → RabbitMQ
  • Short delay (< 15 min) → SQS Delay Queue

Step 6: Infrastructure philosophy?

  • Fully on AWS → SQS + SNS
  • Open-source, control → Kafka or RabbitMQ
  • Already have Redis, moderate scale → Redis Streams

Step 7: Durability requirements?

  • Messages cannot be lost → Kafka, SQS, or RabbitMQ Quorum Queues
  • Some loss acceptable → Redis Pub/Sub is fine
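The steps above collapse into straight-line code. A deliberately simplified sketch (each question becomes a boolean, the first matching rule wins, and one answer is returned; real decisions weigh several answers at once):

```python
def pick_queue(*, replay: bool, broadcast: bool, high_throughput: bool,
               complex_routing: bool, priority_or_delay: bool,
               on_aws: bool, loss_ok: bool) -> str:
    """Encode the decision framework as first-match-wins rules."""
    if replay:                 # replay/rewind forces a durable log
        return "Kafka"
    if broadcast:              # one event, many independent consumers
        return "Kafka" if high_throughput else "SNS+SQS / RabbitMQ fanout / Redis Pub/Sub"
    if complex_routing:        # patterns, headers, content-based routing
        return "RabbitMQ"
    if priority_or_delay:      # priority queues or arbitrary delays
        return "RabbitMQ"
    if on_aws:                 # managed, zero-ops task queue
        return "SQS"
    if loss_ok:                # ephemeral, sub-millisecond broadcast
        return "Redis Pub/Sub"
    return "RabbitMQ"          # versatile, well-understood default
```

Treat the output as a starting point for discussion, not a verdict.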

9. Common System Design Patterns#

Order Processing#

User Places Order → [Kafka/SNS] → Billing, Inventory, Notification, Fraud

Why Kafka: One event, multiple independent services. Each is a separate consumer group. Use SNS+SQS on AWS if no replay needed.

Background Job Queue#

API Server → [SQS/RabbitMQ] → Worker Pool

Why SQS/RabbitMQ: Classic task queue — one job, one worker. RabbitMQ if you need priority queues (premium users first).

Real-Time Analytics Pipeline#

App Events → [Kafka] → Stream Processor → Data Warehouse + Dashboard + Anomaly Detection

Why Kafka: High throughput, multiple consumers, durable log for reprocessing.

Notification Routing#

Events → [RabbitMQ Topic Exchange] → Email Queue, SMS Queue, Push Queue

Why RabbitMQ: Topic exchange routes notification.email.* to email queue, notification.sms.* to SMS queue.

Chat / Presence#

User message → [Redis Pub/Sub] → All users in room receive it

Why Redis: Messages are ephemeral; offline users catch up from a database, not the queue.

Delayed Jobs (> 15 minutes)#

Store in database with deliver_at timestamp. Scheduler service pushes to SQS/RabbitMQ at delivery time.
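A sketch of that scheduler sweep, with a min-heap standing in for the database table (`schedule` and `sweep` are illustrative names; in production `enqueue` would be an SQS or RabbitMQ publish):

```python
import heapq

scheduled = []  # min-heap of (deliver_at, tiebreaker, job) — stand-in for a DB table

def schedule(deliver_at: float, job: dict) -> None:
    # id(job) breaks ties so equal timestamps never compare the dicts themselves
    heapq.heappush(scheduled, (deliver_at, id(job), job))

def sweep(now: float, enqueue) -> int:
    """One scheduler tick: push every job whose deliver_at has passed."""
    pushed = 0
    while scheduled and scheduled[0][0] <= now:
        _, _, job = heapq.heappop(scheduled)
        enqueue(job)   # hand off to the real queue for immediate delivery
        pushed += 1
    return pushed
```

The scheduler runs `sweep` on a timer (say, every few seconds); the queue itself never needs to understand delays.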


10. Hybrid Architectures#

Kafka + SQS#

Kafka as durable event log/source of truth. SQS for operational task queues. A Kafka consumer bridges them — reads events from Kafka, pushes actionable tasks to SQS.

SNS + SQS (AWS Fan-Out)#

SNS publishes to multiple SQS queues simultaneously. Each service has its own queue — independent scaling and isolated failure.

Kafka + Redis Pub/Sub#

Kafka ensures nothing is lost. A consumer processes events and publishes results to Redis for sub-millisecond delivery to live browser clients. If client misses a Redis message, it fetches from DB.


11. Quick-Reference Decision Tree#

START
  │
  ├─ Need replay/rewind of historical events?
  │   YES → Kafka
  │   NO  ↓
  │
  ├─ Need to broadcast to multiple independent consumers?
  │   YES ─┬─ High throughput or stream processing?
  │         │   YES → Kafka
  │         │   NO  → SNS+SQS | RabbitMQ Fanout | Redis Pub/Sub (ephemeral)
  │   NO  ↓
  │
  ├─ Need complex routing (patterns, headers)?
  │   YES → RabbitMQ
  │   NO  ↓
  │
  ├─ Need priority, TTL, or arbitrary delay?
  │   Priority/arbitrary delay → RabbitMQ
  │   Short delay only (< 15 min) → SQS Delay Queue
  │   NO  ↓
  │
  ├─ Fully on AWS? Want managed service?
  │   YES → SQS
  │   NO  ↓
  │
  ├─ Sub-millisecond broadcast, can tolerate loss?
  │   YES → Redis Pub/Sub
  │   NO  ↓
  │
  ├─ Already using Redis, moderate scale?
  │   YES → Redis Streams
  │   NO  ↓
  │
  └─ Default → RabbitMQ (versatile, well-understood)

12. Summary Table#

| Scenario | Best Fit |
|---|---|
| High-throughput event streaming | Kafka |
| Event sourcing / audit log | Kafka |
| Multiple teams consuming same events | Kafka |
| Replay historical data | Kafka |
| Stream processing (windowed aggregations) | Kafka + Kafka Streams |
| Complex routing by content or pattern | RabbitMQ |
| Message priority queues | RabbitMQ |
| Per-message TTL | RabbitMQ |
| Delayed/scheduled delivery (arbitrary) | RabbitMQ (plugin) |
| Task queues with dead-letter handling | RabbitMQ or SQS |
| Simple AWS task queue, no ops | SQS |
| Fan-out on AWS | SNS + SQS |
| Strict ordering + exactly-once (AWS) | SQS FIFO |
| Real-time ephemeral broadcast | Redis Pub/Sub |
| Live presence / typing indicators | Redis Pub/Sub |
| Cache invalidation signals | Redis Pub/Sub |
| Persistent consumer-group streams (Redis) | Redis Streams |

13. Queue vs Stream — The Most Confused Distinction#

Queue = A task handed off. Done. Forget it. Stream = A permanent record of what happened. Process it whenever you want.

A Queue is like a Post Office: You write a letter and drop it in the mailbox. Once delivered, it's gone. Its only job was to move the message from A to B.

A Stream is like a Bank Ledger: Every transaction is recorded permanently, in order. Multiple people — auditors, accountants, analysts — can independently read the same ledger.

The Five Fundamental Differences#

1. What happens after a message is consumed?

  • Queue → deleted
  • Stream → stays; multiple consumers read it at their own position

2. How many consumers can independently read?

  • Queue → multiple workers share the queue; each message goes to exactly one worker
  • Stream → unlimited independent consumer groups, each reads the full stream

3. Is ordering a first-class concern?

  • Queue → best-effort in most implementations
  • Stream → ordering is the entire point

4. Command vs Fact?

  • Queue message → a command: { "action": "send_email", "to": "..." } (imperative)
  • Stream event → a fact: { "event": "user.signed_up", "at": "..." } (past tense, immutable)

5. Replay?

  • Queue → cannot replay; once processed, messages are gone forever
  • Stream → reset offset to any point and re-read; stream is the source of truth

Does Persistence Make Something a Stream?#

No. SQS and RabbitMQ both persist to disk — neither is a stream. Persistence in a queue means "survive until one consumer gets it." Persistence in a stream means "the log is the system — read it any time from any position."

Is Kafka a Stream by Default?#

Yes. But your consumption pattern decides whether you're using it as a stream or a queue:

  • Stream mode → multiple independent consumer groups, each reads every event
  • Queue mode → competing consumers in one group, each event processed by exactly one worker
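The two consumption modes come down to who shares an offset. A toy single-partition log makes this visible (illustrative, not Kafka's protocol):

```python
class Log:
    """An append-only log where each consumer group keeps its own cursor."""

    def __init__(self):
        self.events = []
        self.offsets = {}   # group name -> next offset to read

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return the next unread event for this group, or None if caught up.

        Groups never affect each other; workers sharing a group share a cursor.
        """
        off = self.offsets.get(group, 0)
        if off >= len(self.events):
            return None
        self.offsets[group] = off + 1
        return self.events[off]
```

Independent groups each read every event (stream mode); several workers polling the same group split the events between them (queue mode). Same log, two behaviors.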

When to Use Each#

Use a Queue when: Message is a command that happens once, work distribution is the goal, no history needed.

Use a Stream when: Event is a fact multiple systems need, multiple teams consume same data, you need history/replay, real-time data pipelines, order of events carries meaning.


Queue: "Here's something to do — someone please handle it." Stream: "Here's something that happened — anyone who cares can read it, now or later."

Last updated: 2026. Kafka 3.x. RabbitMQ 3.12+. SQS: current AWS. Redis 7.x.
