Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn and later became an Apache project. Kafka is a distributed, partitioned, replicated, persistent log service, used mainly to process active streaming data.
It offers features similar to JMS, but its design and implementation are completely different.
The JMS (Java Message Service) API is a messaging standard that allows application components based on the Java EE platform to create, send, receive, and read messages.
It provides asynchronous communication between systems and reduces the coupling between them.
It supports two message models:
- Point-to-point (queue) model
- Publish/subscribe model
Kafka overall architecture
Topic: Kafka organizes message feeds by category, and each category of messages is called a topic.
Producer: An object that publishes messages to a topic is called a producer.
Consumer: An object that subscribes to topics and processes the published message feed is called a consumer.
Broker: Published messages are stored on a set of servers called a Kafka cluster. Each server in the cluster is a broker. Consumers can subscribe to one or more topics and pull data from the brokers to consume the published messages.
Partition: A physical grouping within a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue. Every message in a partition is assigned a sequential id (the offset).
For each topic, the Kafka cluster maintains a partitioned log, and messages are stored in log files.
Each partition is an ordered, immutable sequence of messages that is continually appended to. Every message in a partition is assigned a sequence number called the offset, which is unique within that partition.
The Kafka cluster retains messages for a configurable period of time, whether or not they have been consumed, and deletes them once they expire. The only state a consumer holds is its offset in the log.
The offset is controlled by the consumer. A consumer can reset its offset to an older value and reread messages, so one consumer's actions do not affect how other consumers process the log.
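The relationship between an append-only partition and consumer-owned offsets can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka client code: the names PartitionLog and LogConsumer are hypothetical, and retention is not modeled.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]


class LogConsumer:
    """Consumer that owns its read position; the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        """Read the next batch and advance this consumer's offset."""
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Move to another offset, for example an older one to reread messages."""
        self.offset = offset


log = PartitionLog()
for text in ["login", "click", "logout"]:
    log.append(text)

a, b = LogConsumer(log), LogConsumer(log)
a.poll()    # consumer a reads all three messages
a.seek(1)   # consumer a rewinds; consumer b is unaffected
```

Because the position lives in the consumer rather than in the log, rewinding one consumer leaves every other consumer's position unchanged.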
A partition is a physical concept: on the broker it takes the form of a directory. Each partition contains multiple segments, each segment corresponds to a log file, and the file is named after the offset of the first message it contains.
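Naming segment files by offset makes it cheap to locate the file that holds a given message. The sketch below assumes the commonly seen 20-digit zero-padded file name and uses made-up base offsets; the helper names are hypothetical.

```python
import bisect


def segment_file_name(base_offset):
    """Name a segment file after its base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that holds target_offset.

    base_offsets must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[index]


# Hypothetical partition directory containing three segments.
base_offsets = [0, 368769, 737337]
base = find_segment(base_offsets, 500000)
print(segment_file_name(base))
```

A binary search over the sorted base offsets is enough to pick the right file; no segment has to be opened to find out which one contains the message.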
A topic's partitions are distributed across multiple servers in the cluster, and each server handles the partitions assigned to it.
Depending on configuration, each partition can have multiple replicas, which are copied to other servers for fault tolerance.
Each partition has one leader and zero or more followers. The leader handles all read and write requests for the partition, while the followers passively replicate its data. If the leader goes down, one of the followers is promoted to be the new leader.
The leader tracks the state of all followers. If a follower falls too far behind or fails, the leader removes it from the in-sync replica list. A message is considered "committed" once every follower in that list has saved it successfully, and only then can consumers consume it.
A server can be the leader for one partition and a follower for another at the same time. This balances the load and avoids having all requests handled by only one or a few servers.
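The in-sync list and the "committed" rule can be illustrated with a small model. This is a simplified sketch: PartitionReplicas is a hypothetical class, and the message-count threshold max_lag is an assumption chosen for brevity (a real cluster may measure lag differently, for example by time).

```python
class PartitionReplicas:
    """Toy model of one partition's leader, followers, and in-sync list."""

    def __init__(self, leader, followers, max_lag=2):
        self.leader = leader
        self.max_lag = max_lag  # hypothetical lag threshold, in messages
        self.log_end = {leader: 0, **{f: 0 for f in followers}}
        self.in_sync = {leader, *followers}

    def leader_append(self, count=1):
        """The leader accepts `count` new messages."""
        self.log_end[self.leader] += count

    def follower_fetch(self, follower, count):
        """A follower copies up to `count` messages from the leader."""
        limit = self.log_end[self.leader]
        self.log_end[follower] = min(limit, self.log_end[follower] + count)

    def refresh_in_sync(self):
        """Recompute the in-sync list, dropping followers that lag too far."""
        leader_end = self.log_end[self.leader]
        self.in_sync = {
            r for r in self.log_end
            if r == self.leader or leader_end - self.log_end[r] <= self.max_lag
        }

    def committed_offset(self):
        """Messages below this offset are stored on every in-sync replica."""
        return min(self.log_end[r] for r in self.in_sync)


p = PartitionReplicas("broker-1", ["broker-2", "broker-3"])
p.leader_append(5)
p.follower_fetch("broker-2", 5)
p.follower_fetch("broker-3", 1)   # broker-3 falls behind
p.refresh_in_sync()               # broker-3 leaves the in-sync list
```

Once the slow follower is dropped from the in-sync list, it no longer holds back the committed offset; after it catches up, the next refresh adds it back.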
A producer publishes messages to a topic and is also responsible for choosing which partition of the topic each message goes to (it sends to that partition's leader).
Messages can be sent in one of two modes: synchronous or asynchronous.
Which partition a message is routed to is decided by the producer client, for example with a random, hash-based, or round-robin strategy.
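The three routing strategies mentioned above can be sketched as plain functions. The function names are hypothetical, and CRC32 is only a stand-in hash chosen because it is in the Python standard library; it is not claimed to be the hash a Kafka client uses.

```python
import itertools
import random
import zlib


def random_partition(num_partitions):
    """Pick any partition with equal probability."""
    return random.randrange(num_partitions)


def hash_partition(key, num_partitions):
    """Map a key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partitions 0..num_partitions-1."""
    counter = itertools.count()
    return lambda: next(counter) % num_partitions


next_partition = round_robin_partitioner(3)
order = [next_partition() for _ in range(5)]  # [0, 1, 2, 0, 1]
```

Hash-based routing is the usual choice when all messages for one key must stay in order, since ordering is only guaranteed within a single partition; random and round-robin routing spread load without that guarantee.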
Generally speaking, there are two messaging models: queuing and publish-subscribe. Kafka covers both with a single consumer abstraction, the consumer group. Each consumer labels itself with a consumer group name.
A message published to a topic is delivered to one consumer within each subscribing consumer group. If all consumers belong to the same group, this behaves like the queue model; if every consumer belongs to a different group, it behaves like the publish-subscribe model.
Consumer groups can be thought of as logical subscribers. Each group can contain any number of consumers, and running several consumers in one group provides scalability and fault tolerance.
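The way one abstraction yields both models can be shown with a toy dispatcher. GroupedTopic is a hypothetical class, and per-message rotation inside a group is a simplification made for illustration; Kafka itself spreads work within a group by assigning partitions to consumers.

```python
import itertools


class GroupedTopic:
    """Toy topic that delivers each message to one consumer per group."""

    def __init__(self):
        self._groups = {}    # group name -> list of consumer inboxes
        self._cursors = {}   # group name -> iterator over inbox indexes

    def subscribe(self, group, inbox):
        """Register an inbox (a plain list) under a group name."""
        self._groups.setdefault(group, []).append(inbox)
        # Restart the rotation so that it covers the new member.
        self._cursors[group] = itertools.cycle(range(len(self._groups[group])))

    def publish(self, message):
        """Deliver the message to exactly one inbox in every group."""
        for group, inboxes in self._groups.items():
            inboxes[next(self._cursors[group])].append(message)


# Queue model: both consumers share one group.
queue = GroupedTopic()
worker_a, worker_b = [], []
queue.subscribe("workers", worker_a)
queue.subscribe("workers", worker_b)

# Publish-subscribe model: each consumer has its own group.
pubsub = GroupedTopic()
audit, billing = [], []
pubsub.subscribe("audit", audit)
pubsub.subscribe("billing", billing)

for msg in ["m1", "m2", "m3", "m4"]:
    queue.publish(msg)
    pubsub.publish(msg)
```

In the shared group the four messages are split between the two workers, while in the separate groups each consumer receives all four.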
Kafka's design
High throughput and high availability are the core design goals.
Disk persistence: messages are not cached in memory but written directly to disk, which takes full advantage of the disk's sequential read/write performance.
Zero-copy: fewer I/O steps when sending data (disk -> kernel -> network card).
Support for sending and pulling data in batches.
Support for data compression.
Topics are divided into multiple partitions, which improves parallel processing.
Horizontal scaling.
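Batching and compression reinforce each other, because similar messages compress far better together than one at a time. The sketch below uses JSON and gzip from the Python standard library purely as stand-ins; it does not reproduce Kafka's wire format, and the function names are hypothetical.

```python
import gzip
import json


def encode_batch(messages):
    """Serialize a list of messages as one gzip-compressed payload."""
    raw = json.dumps(messages).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload):
    """Reverse encode_batch and return the original list of messages."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Made-up sample events with a lot of repeated structure.
messages = [{"user": i % 5, "event": "page_view"} for i in range(200)]

payload = encode_batch(messages)

# Total size if every message were compressed and sent on its own.
individually = sum(
    len(gzip.compress(json.dumps(m).encode("utf-8"))) for m in messages
)
```

Comparing len(payload) with individually shows the saving: compressing each message separately repeats the compression overhead every time and cannot exploit redundancy across messages.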
Message queue comparison
Suitable scenarios:
- User behavior data analysis
- Operational monitoring
- Log collection
- Messaging systems
Unsuitable scenarios:
- Transaction support is required;
- Strictly ordered consumption is required.
Kafka in risk control