Publish / Subscribe
Like a queue, but the same message goes to many consumers. Each does its own thing with it.
Queue vs Pub/Sub. One to one vs one to many
A queue is one to one. A producer drops a message in. Exactly one worker picks it up.
Pub/Sub (publish and subscribe) is one to many. A publisher drops a message into a topic. Every subscriber gets its own copy.
Each subscriber processes the message on its own, for its own purpose.
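The difference fits in a few lines. What follows is a minimal in-memory sketch, not any real broker's API. The class names WorkQueue and Topic, and their methods, are made up for illustration.

```python
from collections import deque


class WorkQueue:
    """One to one: each message is handed to exactly one worker."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever worker asks first takes the message. It is gone for the rest.
        return self._messages.popleft() if self._messages else None


class Topic:
    """One to many: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = deque()

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(dict(message))  # separate (shallow) copy per subscriber

    def receive(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None


queue = WorkQueue()
queue.send({"job": "resize-image"})
first_worker = queue.receive()
second_worker = queue.receive()  # None: the first worker already took it

topic = Topic()
topic.subscribe("email")
topic.subscribe("analytics")
topic.publish({"event": "user.signed-up"})
email_copy = topic.receive("email")
analytics_copy = topic.receive("analytics")  # both subscribers got the event
```

The second worker comes back empty-handed. Both subscribers get the event.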
Picture this. A user signs up. A "user.signed-up" event is published. The subscribers are:
Email service. Sends a welcome email.
Analytics service. Bumps the signup counter.
Recommendations service. Sets up the user's profile.
CRM service. Creates a lead.
The publisher (the signup code) does not know or care who is listening. New consumers can be added without touching the publisher.
Why this separation is huge
In the world before pub/sub, the signup handler had to call every downstream service directly.
sendWelcomeEmail();
incrementSignupCounter();
createCrmLead();
...
Every new downstream service means changing the signup code. The signup handler picks up more and more dependencies. If any one of those calls is slow or down, signups slow down or fail.
With pub/sub, the signup handler does one thing. Publish the event. Done. The subscribers live on their own. New ones can be added with zero code changes upstream.
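Here is roughly what that looks like, as a sketch under assumptions. EventBus is a hypothetical in-process stand-in for a real broker, and the handler names are invented. A real broker would deliver asynchronously and retry. This one just records the failure and moves on.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process bus standing in for a real broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.failures = []

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            try:
                handler(dict(event))
            except Exception as exc:
                # A broken subscriber must not fail the publisher or its peers.
                self.failures.append((handler.__name__, exc))


bus = EventBus()
log = []


def send_welcome_email(event):
    log.append(f"email to {event['email']}")


def increment_signup_counter(event):
    log.append("signup counter +1")


def create_crm_lead(event):
    raise RuntimeError("CRM is down")


def handle_signup(email):
    # The handler's only downstream dependency is the bus.
    bus.publish("user.signed-up", {"email": email})


# Subscribers register themselves. handle_signup never changes.
bus.subscribe("user.signed-up", send_welcome_email)
bus.subscribe("user.signed-up", increment_signup_counter)
bus.subscribe("user.signed-up", create_crm_lead)

handle_signup("ada@example.com")
```

The CRM subscriber blows up. The signup still succeeds, and the other two subscribers still run.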
This is the base of event-driven architecture. Systems talk by publishing what happened, not by calling each other directly.
Common implementations
Kafka is one of the most widely used pub/sub systems at large tech companies. Persistent. Replayable. Partitioned. Subscribers can rewind and reread old events for as long as the topic's retention allows. Built for high throughput.
AWS SNS is simple and managed. Topics. Subscribers. Done. Often paired with SQS to do fan-out into queues.
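Google Cloud Pub/Sub is managed, like SNS. Each subscription holds its messages until they are acknowledged, so it behaves more like SNS and SQS combined.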
Redis Pub/Sub is lightweight and in-memory. Fast, but nothing is stored. A subscriber that is not connected when a message is published never sees it.
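NATS is similarly lightweight, but it is a dedicated messaging system. Core NATS is fire-and-forget like Redis Pub/Sub. Its JetStream layer adds persistence and replay.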
Pick based on how much durability you need. For fire-and-forget notifications where a missed message is acceptable, Redis is fine. For a replayable event log, use Kafka. For simple managed fan-out, use SNS.
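The durability trade-off can be sketched with two toy topics. Neither is the Redis or Kafka API. EphemeralTopic and LogTopic are hypothetical names that only model the behavior described above.

```python
class EphemeralTopic:
    """Fire and forget: a message published with nobody attached is dropped."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, message):
        for handler in self._handlers:
            handler(message)
        return len(self._handlers)  # how many subscribers were reached


class LogTopic:
    """Append-only log: messages are kept and each reader picks its offset."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the new message

    def read_from(self, offset):
        return self._log[offset:]


ephemeral = EphemeralTopic()
reached = ephemeral.publish("cache.invalidate")  # nobody listening: lost
late = []
ephemeral.subscribe(late.append)
ephemeral.publish("cache.refresh")  # only this one is seen

log_topic = LogTopic()
log_topic.publish("user.signed-up")
log_topic.publish("order.placed")
replayed = log_topic.read_from(0)  # a late reader can still see everything
```

The late subscriber on the ephemeral topic never sees the first message. The late reader on the log sees both.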
Things to watch out for
Subscriber lag. If a subscriber is slow, it falls behind. Kafka lets you watch lag per consumer group. If the lag keeps growing, you have a problem.
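Lag is just arithmetic. For each partition, it is the newest offset in the log minus the offset the consumer group has committed. A minimal sketch follows. The function names and the sample numbers are invented, and a group with no committed offset is assumed to start at 0.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return (per-partition lag, total lag) for one consumer group."""
    per_partition = {
        # Assumption: a partition with no commit yet starts from offset 0.
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }
    return per_partition, sum(per_partition.values())


def lag_is_growing(samples):
    """True if total lag rose in every consecutive sample (needs two or more)."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


end_offsets = {0: 1500, 1: 1480, 2: 1510}  # newest offset per partition
committed = {0: 1500, 1: 1200, 2: 1505}    # where the group has got to
per_partition, total = consumer_lag(end_offsets, committed)
```

A single high reading is not the alarm. A run of rising readings is.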
Duplicate delivery. Most durable pub/sub systems deliver at least once by default. The same event can reach the same subscriber twice. Your subscribers must be idempotent. Processing an event twice must leave the same result as processing it once.
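One common way to get there is to give every event a unique ID and skip IDs already handled. This is a sketch, not the only approach. The event_id field and the SignupCounter class are assumptions. In production the set of seen IDs would have to live in durable storage and be updated together with the side effect.

```python
class SignupCounter:
    """Subscriber that counts signups and ignores redelivered events."""

    def __init__(self):
        self.count = 0
        self._seen = set()  # in-memory only; a real service needs durable storage

    def handle(self, event):
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate: already counted, do nothing
        self._seen.add(event_id)
        self.count += 1
        return True


counter = SignupCounter()
event = {"event_id": "evt-1", "type": "user.signed-up"}
first = counter.handle(event)
second = counter.handle(event)  # the broker redelivers the same event
third = counter.handle({"event_id": "evt-2", "type": "user.signed-up"})
```

Three deliveries. Two signups counted.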
Ordering across topics. Events from different topics are not ordered with respect to each other. If you publish "user created" and "order placed," a subscriber might see them in either order. Even inside a single Kafka topic, order is guaranteed only within a partition. Design for it.
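One way to design for it is to park events that arrive too early and release them when their prerequisite shows up. The sketch below assumes each event carries a user_id. OrderProjection and its method names are hypothetical.

```python
from collections import defaultdict


class OrderProjection:
    """Subscriber that tolerates 'order placed' arriving before 'user created'."""

    def __init__(self):
        self.users = set()
        self.orders = []
        self._pending = defaultdict(list)  # orders waiting for their user

    def on_user_created(self, event):
        user_id = event["user_id"]
        self.users.add(user_id)
        # Release any orders that arrived before this user was known.
        for order in self._pending.pop(user_id, []):
            self.orders.append(order)

    def on_order_placed(self, event):
        if event["user_id"] in self.users:
            self.orders.append(event)
        else:
            self._pending[event["user_id"]].append(event)


projection = OrderProjection()
projection.on_order_placed({"user_id": "u1", "order_id": "o1"})  # arrives early
parked = len(projection.orders)  # nothing applied yet
projection.on_user_created({"user_id": "u1"})  # releases the parked order
```

The order arrives first and waits. Once the user event lands, the order is applied.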
Hard to debug. In a normal request-response system, you can trace one call through the services. In an event-driven system, you have to follow the events through many subscribers. Good distributed tracing is a must.
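The core idea behind tracing events is small. Every event carries the trace ID of the event that caused it, so one search finds the whole chain. The sketch below uses invented field names (trace_id, caused_by). A real system would more likely rely on an established tracing library than on hand-rolled fields.

```python
import uuid


def new_event(event_type, payload, parent=None):
    """Build an event that inherits the trace ID of the event that caused it."""
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        # Root events start a new trace. Child events reuse the parent's.
        "trace_id": parent["trace_id"] if parent else str(uuid.uuid4()),
        "caused_by": parent["event_id"] if parent else None,
        "payload": payload,
    }


signup = new_event("user.signed-up", {"user_id": "u1"})
welcome = new_event("email.sent", {"template": "welcome"}, parent=signup)
lead = new_event("crm.lead-created", {"user_id": "u1"}, parent=signup)
```

All three events share one trace ID. Filter the logs on it and the whole signup flow shows up together.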
Pub/sub is powerful. It is also more operational work than direct calls. Use it where the separation buys you something real.