When consuming topics with Kafka Streams there are two kinds of data you’ll want to work with. One is a stream and one is a table.
Let’s look at some data:
Users
| key | data |
| --- | -------------------------- |
| 17 | name: Gail, color: Green |
| 201 | name: Oscar, color: Red |
| 11 | name: Sam, color: Purple |
| 201 | name: Oscar, color: Orange |
Purchases
| key | data |
| --- | ------------------------------ |
| 384 | title: Soap, price: $7 |
| 385 | title: TV, price: $500 |
| 386 | title: Basketball, price: $15 |
| 387 | title: Sunglasses, price: $24 |
These look like tables, but don’t be fooled. They are streams. Every time new data is produced for one of these streams, a new record (a key with attached data) is added to the end of the stream.
The data is mostly self-explanatory, but I’ll point out that the Users topic has two entries for Oscar: he starts with the color Red and later changes it to Orange. This will be used later.
All Data Are Streams
To clear one thing up, all Kafka topics are stored as a stream. The difference is: when we want to consume that topic, we can either consume it as a table or a stream. Let’s look at how they’re different.
Tables
Take the Users topic above. If we want to look at all of our users and their chosen color, we only want to see the latest version of each user and their color. We only want to see Oscar once, with his current color.
This is what the KTable type in Kafka Streams does. It takes the stream of records from a topic and reduces it to one entry per key, keeping only the latest value for each.
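To make the "latest value per key" idea concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API. It only models the reduction described above with a dictionary, and the names `users_topic` and `as_table` are made up for illustration.

```python
# Records from the Users topic, in the order they were produced.
users_topic = [
    (17, {"name": "Gail", "color": "Green"}),
    (201, {"name": "Oscar", "color": "Red"}),
    (11, {"name": "Sam", "color": "Purple"}),
    (201, {"name": "Oscar", "color": "Orange"}),
]


def as_table(records):
    """Reduce (key, value) records to the latest value seen for each key."""
    table = {}
    for key, value in records:
        # A later record with the same key overwrites the earlier one.
        table[key] = value
    return table


users_table = as_table(users_topic)

for key, value in users_table.items():
    print(key, value)
```

The topic holds four records, but the table view holds three entries, and Oscar (key 201) shows up once with the color Orange.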
Streams
When we want to work with a stream, we grab all records from it. A good example is the Purchases stream above. If we want to see how much money we made, we go through every record in the Purchases topic, add up the prices, and get our number.
This is what the KStream type in Kafka Streams is.
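The same kind of sketch works for the stream view. Again this is plain Python rather than the Kafka Streams API, and `purchases_topic` and `total_sales` are hypothetical names. The point is that every record counts, so nothing is collapsed by key.

```python
# Records from the Purchases topic, in the order they were produced.
# Prices are stored as whole dollars for simplicity.
purchases_topic = [
    (384, {"title": "Soap", "price": 7}),
    (385, {"title": "TV", "price": 500}),
    (386, {"title": "Basketball", "price": 15}),
    (387, {"title": "Sunglasses", "price": 24}),
]


def total_sales(records):
    """Visit every record in the stream and sum its price."""
    return sum(value["price"] for _key, value in records)


total = total_sales(purchases_topic)
print(total)  # 546
```

If two purchases happened to share a key, both would still be added to the total, which is the opposite of how the table view behaves.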
Tables For Nouns, Streams For Verbs
I’ve found it helpful to think of tables as representing nouns (users, songs, cars) and streams as verbs (buys, plays, drives). This is because with a noun, we mostly want the current state of that noun: the current document or the current flight. But with verbs, we need to see the trail of how we got here: the history of edits to this document or the path this plane took to its destination.
Resources
While they are slightly different, tables are also sometimes called a changelog stream. In truth, everything is a stream and KTables are an abstraction over that stream. Similarly, streams are sometimes called a record stream and the same abstraction principle applies. You may see this terminology come up when looking into Kafka.
- https://docs.confluent.io/current/streams/concepts.html