Kafka Basics: Tables vs Streams

When consuming topics with Kafka Streams there are two kinds of data you’ll want to work with. One is a stream and one is a table.

Let’s look at some data:

Users

| key | data                       |
| --- | -------------------------- |
| 17  | name: Gail,  color: Green  |
| 201 | name: Oscar, color: Red    |
| 11  | name: Sam,   color: Purple |
| 201 | name: Oscar, color: Orange |

Purchases

| key | data                           |
| --- | ------------------------------ |
| 384 | title: Soap,       price: $7   |
| 385 | title: TV,         price: $500 |
| 386 | title: Basketball, price: $15  |
| 387 | title: Sunglasses, price: $24  |

These look like tables, but don’t be fooled. They are streams. Every time new data is produced for one of these streams, a new record (a key with attached data) is added to the end of the stream.

The data is mostly self-explanatory, but note that the Users topic has two entries for Oscar: he starts with the color Red and later changes it to Orange. This will matter later.

All Data Are Streams

To clear one thing up, all Kafka topics are stored as a stream. The difference is: when we want to consume that topic, we can either consume it as a table or a stream. Let’s look at how they’re different.

Tables

Take the Users topic above. If we want to look at all of our users and their chosen color, we only want to see the latest version of each user and their color. We only want to see Oscar once, with his current color.

This is what the KTable type in Kafka Streams does. It takes the stream of records from a topic and reduces it to one entry per key, keeping only the latest value for each key.
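To make the "latest value per key" idea concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API; the class `TableViewSketch`, the `UserRecord` type, and the `latestByKey` method are hypothetical names made up for illustration. It simply replays the Users records from above in order and lets later records overwrite earlier ones with the same key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: simulates table semantics with a map, not with Kafka Streams.
public class TableViewSketch {
    // A record is a key with attached data, as in the Users topic above.
    record UserRecord(int key, String name, String color) {}

    // Replay the stream in order; a later record for a key replaces the earlier one.
    static Map<Integer, UserRecord> latestByKey(List<UserRecord> stream) {
        Map<Integer, UserRecord> table = new LinkedHashMap<>();
        for (UserRecord r : stream) {
            table.put(r.key(), r);
        }
        return table;
    }

    public static void main(String[] args) {
        List<UserRecord> users = List.of(
            new UserRecord(17, "Gail", "Green"),
            new UserRecord(201, "Oscar", "Red"),
            new UserRecord(11, "Sam", "Purple"),
            new UserRecord(201, "Oscar", "Orange"));

        // Print one line per unique key.
        for (UserRecord u : latestByKey(users).values()) {
            System.out.println(u.key() + " " + u.name() + " " + u.color());
        }
    }
}
```

Running this prints three users rather than four, and Oscar shows up once with the color Orange.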

Streams

When we want to work with a stream, we grab all records from it. A good example is the Purchases stream above. If we want to see how much money we made, we go through every record in our Purchases topic, add up all the prices, and get our number.

This is what the KStream type in Kafka Streams gives us: every record in the topic, not just the latest one per key.
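As a rough sketch of the same idea in plain Java (again, not the Kafka Streams API; `StreamTotalSketch`, `Purchase`, and `totalRevenue` are hypothetical names for illustration), consuming as a stream means visiting every record, so the total depends on all of them. Prices are assumed to be whole dollars, as in the Purchases data above.

```java
import java.util.List;

// Illustration only: simulates stream semantics with a list, not with Kafka Streams.
public class StreamTotalSketch {
    // One purchase event; the price is assumed to be in whole dollars.
    record Purchase(int key, String title, int priceDollars) {}

    // Visit every record in the stream and add up the prices.
    static int totalRevenue(List<Purchase> stream) {
        int total = 0;
        for (Purchase p : stream) {
            total += p.priceDollars();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Purchase> purchases = List.of(
            new Purchase(384, "Soap", 7),
            new Purchase(385, "TV", 500),
            new Purchase(386, "Basketball", 15),
            new Purchase(387, "Sunglasses", 24));

        System.out.println("Total: $" + totalRevenue(purchases)); // prints "Total: $546"
    }
}
```

Unlike the table case, no record is dropped or overwritten here: each purchase is its own event and contributes to the result.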

Tables For Nouns, Streams For Verbs

I’ve found it helpful to think of tables as representing nouns (users, documents, flights) and streams as verbs (purchases, edits, movements). This is because with a noun, we mostly want the current state of that noun: the current document or the current flight. But with verbs, we need to see the trail of how we got here: the history of edits to this document or the path this plane took to its destination.

Resources

While the terms are slightly different, a table is also sometimes called a changelog stream. In truth, everything is a stream, and a KTable is an abstraction over that stream. Similarly, a stream is sometimes called a record stream, and the same abstraction principle applies. You may see this terminology come up when reading about Kafka.

  • https://docs.confluent.io/current/streams/concepts.html