When consuming topics with Kafka Streams there are two kinds of data you’ll want to work with. One is a stream and one is a table.
Let’s look at some data:
Users topic:

| key | data |
| --- | -------------------------- |
| 17 | name: Gail, color: Green |
| 201 | name: Oscar, color: Red |
| 11 | name: Sam, color: Purple |
| 201 | name: Oscar, color: Orange |

Purchases topic:

| key | data |
| --- | ------------------------------ |
| 384 | title: Soap, price: $7 |
| 385 | title: TV, price: $500 |
| 386 | title: Basketball, price: $15 |
| 387 | title: Sunglasses, price: $24 |
These look like tables, but don’t be fooled. They are streams. Every time new data is produced for one of these streams, a new record (a key with attached data) is added to the end of the stream.
The data is mostly self-explanatory, but I’ll point out that the Users topic has two entries for key 201 (Oscar), where he starts with the color Red and changes it to Orange. This will be used later.
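To make the "added to the end" behaviour concrete, here is a rough plain-Java model of a topic as an append-only log. It is only a sketch: it does not use the Kafka client API, and the class `TopicLog` and its methods are names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a topic; not part of Kafka.
final class TopicLog {
    // Each record is a key with its attached data.
    private final List<Map.Entry<Integer, String>> records = new ArrayList<>();

    // Producing never updates an existing record; it only appends a new one.
    void produce(int key, String data) {
        records.add(Map.entry(key, data));
    }

    // Read-only snapshot of the log, in production order.
    List<Map.Entry<Integer, String>> records() {
        return List.copyOf(records);
    }

    // The Users topic from the table above.
    static TopicLog users() {
        TopicLog log = new TopicLog();
        log.produce(17, "name: Gail, color: Green");
        log.produce(201, "name: Oscar, color: Red");
        log.produce(11, "name: Sam, color: Purple");
        log.produce(201, "name: Oscar, color: Orange");
        return log;
    }

    public static void main(String[] args) {
        // Both Oscar records are still present, oldest first.
        users().records().forEach(r ->
            System.out.println(r.getKey() + " -> " + r.getValue()));
    }
}
```

Producing Oscar’s second record does not overwrite the first one, so the log holds four records even though there are only three users.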
To clear one thing up, all Kafka topics are stored as a stream. The difference is: when we want to consume that topic, we can either consume it as a table or a stream. Let’s look at how they’re different.
Take the Users topic above. If we want to look at all of our users and their chosen color, we only want to see the latest version of each user and their color. We only want to see Oscar with his current color, Orange.
This is what the KTable type in Kafka Streams does. It takes the stream of records from a topic and reduces it down to unique entries, keeping only the latest record for each key.
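The "reduce to unique entries" idea can be sketched in plain Java by folding the records into a map, where a later record for a key replaces the earlier one. This only models the semantics and is not how Kafka Streams implements it; in the real library you would, as far as I know, ask for this view with `StreamsBuilder.table(...)`. The class `UsersTableView` and the method `latestByKey` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that models table semantics; not part of Kafka Streams.
final class UsersTableView {
    // The Users topic from above, in the order the records were produced.
    static final List<Map.Entry<Integer, String>> USERS = List.of(
        Map.entry(17, "name: Gail, color: Green"),
        Map.entry(201, "name: Oscar, color: Red"),
        Map.entry(11, "name: Sam, color: Purple"),
        Map.entry(201, "name: Oscar, color: Orange"));

    // Walk the records in order; a later record for a key replaces the earlier one.
    static Map<Integer, String> latestByKey(List<Map.Entry<Integer, String>> records) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> r : records) {
            table.put(r.getKey(), r.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        latestByKey(USERS).forEach((key, data) ->
            System.out.println(key + " -> " + data));
    }
}
```

The four records collapse to three entries, and key 201 maps to Oscar’s Orange record because it was produced last.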
When we want to work with a stream, we grab all records from it. A good example is the Purchases stream above. If we want to see how much money we made, we go through every record in our Purchases topic, add up all the prices, and get our number.
This is what the KStream type in Kafka Streams is.
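Here is the same kind of plain-Java sketch for stream semantics: every record counts, so we visit all of them. Again, this is a model and not the Kafka Streams API (there the stream view comes from `StreamsBuilder.stream(...)`). The class `PurchasesStream` and the method `total` are hypothetical, and I have kept only the price of each purchase to keep the example short.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that models stream semantics; not part of Kafka Streams.
final class PurchasesStream {
    // key -> price in whole dollars, taken from the Purchases topic above.
    static final List<Map.Entry<Integer, Integer>> PURCHASES = List.of(
        Map.entry(384, 7),    // Soap
        Map.entry(385, 500),  // TV
        Map.entry(386, 15),   // Basketball
        Map.entry(387, 24));  // Sunglasses

    // Unlike the table view, no record is dropped: each one adds to the total.
    static int total(List<Map.Entry<Integer, Integer>> records) {
        int sum = 0;
        for (Map.Entry<Integer, Integer> r : records) {
            sum += r.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Total: $" + total(PURCHASES)); // prints "Total: $546"
    }
}
```

If two purchases shared a key, both would still be counted, which is exactly the difference from the table view.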
I’ve found it helpful to think of tables as representing nouns (users, songs, cars) and streams as verbs (buys, plays, drives). This is because with a noun, we mostly want the current state of that noun: the current document or the current flight. But with verbs, we need to see the trail of how we got here: the history of edits to this document or the path this plane took to its destination.
While they are slightly different, tables are also sometimes called a changelog stream. In truth, everything is a stream and KTables are an abstraction over that stream. Similarly, streams are sometimes called a record stream, and the same abstraction principle applies. You may see this terminology come up when looking into Kafka.