Whether it's a traditional system built on a relational database or a distributed one such as HDFS, high availability is usually achieved through redundant design: redundancy is what keeps the system usable when a node goes down.
First, a quick look at a few Kafka concepts, which fall into two groups:

- The physical model
- The logical model
Broker (node): a Kafka service node. Put simply, a Broker is a single Kafka server, i.e. one physical node.

Topic: in Kafka, messages are organized by topic. Every topic has a name; producers send messages to a specific topic by its name, and consumers likewise consume from a topic by its name.
Partition: a topic is the unit of message classification, but each topic can be subdivided into one or more partitions, and a partition belongs to exactly one topic. Topics and partitions are logical concepts. For example, if message 1 and message 2 are both sent to topic 1, they may land in the same partition or in different partitions (so different partitions under the same topic contain different messages), and each is then stored on the corresponding Broker node.

Offset: a partition can be viewed as an append-only queue (Kafka only guarantees ordering within a single partition). Messages are appended to the tail of the queue, and each message is assigned an offset when it enters the partition, identifying its position. Consumers use the offset to locate the messages they consume.
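The partition-as-an-append-only-log idea can be sketched as a toy model (this is an illustration of the concept, not Kafka's actual storage format):

```python
class Partition:
    """Toy model of a Kafka partition: an append-only log.

    Messages are appended at the tail; each one is assigned a
    monotonically increasing offset that identifies its position.
    """

    def __init__(self):
        self.log = []  # list of (offset, message)

    def append(self, message):
        offset = len(self.log)  # next offset = current log length
        self.log.append((offset, message))
        return offset

    def read(self, offset):
        # Consumers address messages by offset.
        return self.log[offset][1]


p = Partition()
assert p.append("msg-1") == 0
assert p.append("msg-2") == 1
assert p.read(1) == "msg-2"  # ordering is guaranteed within one partition
```

Note that offsets only ever grow and messages are never reordered, which is exactly why ordering is guaranteed per partition but not across partitions.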
Given these concepts, can you already guess how Kafka implements multi-replica redundancy? Don't worry, let's keep going.
Before Kafka 0.8 there was no multi-replica redundancy mechanism. Once a node failed, all the partitions on that node could no longer be consumed, which effectively meant part of the data sent to a topic was lost.

Replication, introduced in 0.8, solves this data-loss-on-failure problem. Replicas are kept per partition of each topic: every partition's data is synchronized to other physical nodes, forming multiple copies.

All the replicas of a partition consist of one leader replica and several follower replicas. The leader is elected from among all the replicas; the rest become followers. Producers write to, and consumers read from, the leader only; after data is written, the followers pull it from the leader to stay in sync.
Is it that simple? Yes — the multi-replica architecture described above is what gives Kafka its high availability. When a Broker goes down, don't worry: the partition still has replicas on other Broker nodes. What if the replica that died was the leader? Then a new leader is elected from among the followers, and producers and consumers carry on happily with the new leader. That's high availability.
You may still have questions: how many replicas are enough? What if the followers and the leader are not fully in sync? When a node goes down, what are the rules for electing a new leader?

The short answers:
How many copies are enough ？ The more copies there are, the more guaranteed Kafka High availability , But more copies means the Internet 、 More disk resources are consumed , There will be a drop in performance , Generally speaking, the number of copies is 3 High availability , In extreme cases
replication-factor Just increase the parameter .
What if the followers and the leader are not fully in sync? Followers are neither fully synchronous nor fully asynchronous with the leader; instead Kafka uses the ISR (In-Sync Replica) mechanism. Each leader dynamically maintains an ISR list of the followers that are essentially in sync with it. If a follower fails to issue fetch requests to the leader in time (because of network problems, GC pauses, and so on), it has fallen behind the leader and is kicked out of the ISR list. So the followers in the ISR list are the replicas that can keep up with the leader.
When a node goes down, what are the rules for electing a new leader? There are many distributed election protocols, such as ZooKeeper's ZAB, Viewstamped Replication, and Microsoft's PacificA. Kafka's leader election is simpler: based on the ISR list above, when the leader goes down Kafka walks through the partition's replicas in order, and the first replica found in the ISR list is elected leader. In addition, it must be guaranteed that the previous leader has truly stepped down, otherwise a split-brain situation arises (two leaders). How is that guaranteed? Kafka designates a single controller to ensure there is only one leader.
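The in-order, ISR-gated election described above can be sketched like this (a toy model of the rule, not Kafka's controller code):

```python
def elect_leader(replicas, isr, alive):
    """Walk the replica list in order; the first replica that is both
    alive and in the ISR becomes the new leader."""
    for r in replicas:
        if r in alive and r in isr:
            return r
    return None  # no in-sync replica survived

# broker-1 held the old leader replica and just went down
replicas = ["broker-1", "broker-2", "broker-3"]
new_leader = elect_leader(replicas,
                          isr={"broker-2", "broker-3"},
                          alive={"broker-2", "broker-3"})
assert new_leader == "broker-2"
```

Electing only from the ISR is the "clean" path: any ISR member has all committed messages, so no acknowledged data is lost by the hand-over.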
In addition, here is an essential Kafka high-availability knowledge point that interviews love to test: the `request.required.acks` parameter.

`acks` is an important producer-client configuration and can be set when sending messages. It has three possible values: 0, 1, and all.

Set to 0: after the producer sends a message, it simply doesn't care whether the message lives or dies — fire and forget. Taking no responsibility naturally means messages may be lost, and losing messages means losing availability.

Set to 1: the producer considers the send successful as long as the message has reached the leader, regardless of whether any follower has synchronized it. There is a failure mode here: the leader receives the message, and its Broker crashes before any follower has had time to sync, yet the producer already believes the send succeeded — so the message is lost. Note that acks=1 is Kafka's default configuration! So Kafka's defaults are not all that highly available; they are a trade-off between high availability and high throughput.

Set to all (or -1): the producer considers the send successful only after the leader has received the message and the followers in the ISR list have synchronized it as well.
Thinking one step further: does acks=all guarantee no message loss? No. When the ISR list contains only the leader, acks=all is equivalent to acks=1; if that node then goes down, there is no guarantee against data loss. So data is safe only when acks=all and the ISR contains at least two replicas.
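The acks trade-off above can be captured in a small truth-table-style simulation. This is a toy model under one pessimistic assumption: the leader crashes immediately after acknowledging, before replicating anything further:

```python
def produce(acks, leader_got_it, synced_followers, isr_size):
    """Return (producer_sees_success, message_survives_leader_crash).

    Toy model: the leader crashes right after acking, so only data
    already synced to a follower survives.
    """
    if acks == 0:
        return True, leader_got_it and synced_followers > 0
    if acks == 1:
        return leader_got_it, leader_got_it and synced_followers > 0
    # acks == "all": every ISR member must have the message
    ok = leader_got_it and synced_followers >= isr_size - 1
    return ok, ok and isr_size >= 2

# acks=1: leader acks, crashes before any follower syncs -> silent loss
assert produce(1, True, 0, 3) == (True, False)
# acks=all with a healthy ISR of 3: success implies the message survives
assert produce("all", True, 2, 3) == (True, True)
# acks=all but the ISR has shrunk to just the leader: behaves like acks=1
assert produce("all", True, 0, 1) == (True, False)
```

The last case is exactly the "acks=all is equivalent to acks=1" trap described above: the guarantee is only as strong as the ISR is wide.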
After this long detour through Kafka's high-availability mechanism, we can finally return to the question we started with: why did Kafka become unavailable after one of its nodes went down?
In my development/test environment the configuration was: 3 Broker nodes; the topic had a replication factor of 3 and 6 partitions; and acks was 1.
When one of the three nodes goes down, what does the cluster do first? Right, as described above: the cluster notices that some partitions' leaders are gone and starts re-electing leaders from the ISR lists. And if an ISR list is empty, is that partition unavailable? Not necessarily: Kafka can pick any surviving replica of the partition as leader, but then there is a potential risk of data loss.

So as long as a topic's replication factor is set to match the number of Brokers, Kafka's multi-replica redundancy keeps it highly available, and a single node failure won't take it down (note, though, that Kafka has a protection strategy: when more than half of the nodes are unavailable, Kafka stops). Now think about it: is there any topic in Kafka whose replication factor is 1?
The problem lies with `__consumer_offsets`. `__consumer_offsets` is a topic that Kafka creates automatically to store the offsets consumers have consumed up to; its default partition count is 50. And this very topic had a replication factor of 1. If all of its partitions sit on the same machine, that is an obvious single point of failure! Kill the Broker holding the partitions of `__consumer_offsets`, and you'll find that every consumer stops consuming.
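Why does killing one Broker stop *all* consumers? Each consumer group is mapped to exactly one partition of `__consumer_offsets` by taking the Java `hashCode` of the group id modulo the partition count. A Python sketch of that mapping (re-implementing Java's `String.hashCode` for ASCII group ids; the real logic lives in Kafka's group coordinator):

```python
def java_string_hashcode(s):
    """Java's String.hashCode (assuming ASCII/BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret as a signed 32-bit int, like Java
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition_for(group_id, num_partitions=50):
    """Partition of __consumer_offsets that stores this group's offsets."""
    return abs(java_string_hashcode(group_id)) % num_partitions

part = offsets_partition_for("my-group")
assert 0 <= part < 50
# the mapping is deterministic: a group always lands on the same partition
assert part == offsets_partition_for("my-group")
```

So every group's committed offsets live on whichever Broker leads that one partition; if the replication factor is 1 and those partitions all sit on one Broker, losing it strands every group at once.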
How to solve this problem?

First, delete `__consumer_offsets`. Note that this topic is a Kafka built-in topic and cannot be deleted with the command-line tools; I deleted it by removing its log files.

Second, set `offsets.topic.replication.factor` to 3, so that `__consumer_offsets` is recreated with a replication factor of 3.
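The relevant broker settings look roughly like this in `server.properties` (verify the names and defaults against your Kafka version; they generally only take effect when `__consumer_offsets` is first created, which is why the old topic had to be removed first):

```properties
# replication factor applied when __consumer_offsets is auto-created
offsets.topic.replication.factor=3
# partition count of __consumer_offsets (the default of 50 mentioned above)
offsets.topic.num.partitions=50
```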
With replica redundancy applied to `__consumer_offsets` as well, consumers can keep consuming after a node goes down.
Finally, I'm still puzzled about why the partitions of `__consumer_offsets` ended up stored on only one Broker instead of being spread across all Brokers. If any reader understands this, please enlighten me~
This article was written by [Programmer Xia ya]; please include a link to the original when republishing. Thanks.