
Kafka entry to mastery

Original link: https://blog.csdn.net/qq_43323585/article/details/105824989

Kafka overview

What is Kafka?

Kafka is an Apache open-source stream-processing platform that provides message publishing and subscription. It features high throughput, simplicity, and easy deployment.

What is Kafka used for?

  1. Message queue: system decoupling, asynchronous communication, traffic peak shaving and valley filling, etc.
  2. Kafka Streams: real-time online stream processing.

Working modes of a message queue

A message queue has two working modes: 1. point-to-point, where each message is consumed at most once; 2. publish/subscribe, where there is no limit on the number of subscribers.
So how does Kafka work? Read on.

Kafka Architecture and concepts

These concepts are worth knowing: they come up when coding and help with understanding. My Git address is attached at the end; with these notes and my code repository, most everyday Kafka work can be covered.

Producing and consuming messages in Kafka

Kafka manages messages (Records) in the cluster by Topic. Each Topic has multiple partitions that store Records; for each partition, one server (Broker) acts as its leader, and the Brokers holding the partition's replicas are followers. **Note that only the leader serves reads and writes.** This naturally brings ZooKeeper to mind: its Paxos-style consensus protocol also helps guarantee the reliability of Kafka's cluster metadata.

Partitions and logs

The log here means the data itself, a bit like the log in Redis; it is not the print log we usually talk about, ha-ha.
  1. Each Topic can be subscribed to by multiple consumers; Kafka manages partitions through Topics.
  2. Producers can write Records into partitions round-robin or by load (taking the Record's key modulo the partition count).
  3. A partition is an ordered, immutable log sequence; each Record has a unique offset, used to track consumption and to support the persistence strategy.
  4. With Kafka's default configuration log.retention.hours=168, a Record is kept on disk for 168 hours whether or not it has been consumed; the persistence policy can be customized in the configuration file.
  5. Each partition is ordered internally, but with multiple partitions do not expect "first produced, first consumed" globally. Keep this in mind when writing business logic and code.
Why did Kafka give up global ordering in favor of per-partition ordering?
For the same reason as Hadoop: a distributed cluster breaks single-machine physical limits, giving a qualitative jump in both capacity and concurrency; one machine cannot do the work of a hundred. Kafka is, after all, a big-data framework.
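The key-modulus partitioning described above can be sketched in a few lines. This is an illustration only, not Kafka's actual partitioner code; the function name and the CRC32 hash are my own choices:

```python
# Sketch of how a producer picks a partition for a Record:
# keyed Records are hashed modulo the partition count, keyless
# Records are spread round-robin. Illustrative, not Kafka's API.
import itertools
from zlib import crc32

NUM_PARTITIONS = 3
_round_robin = itertools.count()

def pick_partition(key=None):
    if key is None:
        # no key: round-robin across partitions
        return next(_round_robin) % NUM_PARTITIONS
    # keyed Record: the same key always lands in the same partition,
    # which is what gives Kafka its per-key ordering guarantee
    return crc32(key) % NUM_PARTITIONS

# The same key always maps to the same partition:
assert pick_partition(b"user-42") == pick_partition(b"user-42")
```

Because every Record with the same key lands in the same partition, per-key ordering survives even though there is no global order.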

Consumer groups

Concept: a consumer group is one logical consumer made up of multiple consumer instances. If 4 independent consumers each subscribe to topic1 (4 partitions), every partition is consumed 4 times; consumer groups avoid this duplicate consumption, because within a group each partition is assigned to only one instance.
Code: consumers can use subscribe and assign; when subscribing to a topic with subscribe, a consumer group must be specified!
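A minimal sketch of why a consumer group avoids duplicate consumption: partitions are divided among the group's instances so that each partition has exactly one owner. This is an illustrative assignor, not Kafka's actual rebalancing code:

```python
# Toy partition assignor for one consumer group: each partition goes
# to exactly one instance, so the group as a whole consumes each
# Record once. Illustrative only.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # spread partitions across instances round-robin
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions, 4 instances in one group: one partition each, no overlap
result = assign([0, 1, 2, 3], ["c1", "c2", "c3", "c4"])
assert result == {"c1": [0], "c2": [1], "c3": [2], "c4": [3]}
```

With fewer instances than partitions, each instance simply owns several partitions; with more instances than partitions, the extras sit idle.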

How Kafka achieves high performance

Why is Kafka's throughput so high?

Kafka easily supports millions of write requests while persisting the data to disk. Impressive. Thinking about it, a high-performance technology is usually a wrapper around kernel facilities; Redis, for example, calls epoll() under the hood. The real powerhouse is the OS.

Sequential writes and mmap

Sequential writes: a hard disk is a mechanical device, and seeking is an extremely time-consuming mechanical action, so Kafka performs sequential I/O, avoiding a great deal of overhead and seek time.
mmap (Memory-Mapped Files): using the operating system's PageCache, a file is mapped directly into physical memory; writes go straight into the page cache, and the OS flushes the in-memory changes to disk, greatly reducing I/O cost. In effect, mmap lets user space access a region shared with the kernel directly, saving the user-to-kernel switch and the extra copy.
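The mmap idea can be tried directly from Python's standard library: the file below is mapped into memory and modified in place, with the OS page cache responsible for flushing. This demonstrates the mechanism; it is not Kafka code:

```python
# Minimal memory-mapped file I/O: the file's bytes are mapped into the
# process address space and modified in place; the OS page cache
# handles flushing to disk.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "segment.log")
with open(path, "wb") as f:
    f.write(b"hello kafka")              # create an 11-byte "log segment"

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:  # map the whole file
        mm[0:5] = b"HELLO"                # write through the page cache
        mm.flush()                        # ask the OS to sync to disk

with open(path, "rb") as f:
    assert f.read() == b"HELLO kafka"
```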

ZeroCopy

When the Kafka server responds to a client read, the underlying layer uses zero-copy: the data does not have to be copied from disk into user space; instead it is transferred directly through kernel space and never reaches user space.
I won't go over I/O models here; I'll write about them later.
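Python exposes the same kernel primitive through os.sendfile(), which moves bytes between file descriptors without staging them in user space. A toy demonstration (file-to-file, which works on Linux and keeps the example self-contained; Kafka's broker uses the same primitive when serving reads over a socket):

```python
# Zero-copy in miniature: os.sendfile() asks the kernel to move bytes
# from one file descriptor to another without copying them into
# user-space buffers.
import os
import tempfile

src = os.path.join(tempfile.mkdtemp(), "src.log")
dst = src + ".copy"
with open(src, "wb") as f:
    f.write(b"record-bytes" * 1000)

size = os.path.getsize(src)
offset = 0
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while offset < size:
        # sendfile may transfer fewer bytes than asked; loop until done
        offset += os.sendfile(fout.fileno(), fin.fileno(), offset, size - offset)

assert open(dst, "rb").read() == open(src, "rb").read()
```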

Building a Kafka cluster

Please see my two articles on setting up ZooKeeper and Kafka. Reference: Kafka cluster setup.

Topic management

  • bootstrap-server: the consumer side, for pulling data. Old versions used the -zookeeper parameter instead, which pushed a lot of data into ZooKeeper; that was quite unreasonable.
  • broker-list: the producer side, for pushing data.
  • partitions: the number of partitions.
  • replication-factor: the partition replication factor.

Create a topic

bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --create --partitions 3 --replication-factor 2 --topic debug

List topics

bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --list

Describe a topic

bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --describe --topic debug

Delete topic

bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --delete --topic debug

Produce a message on a topic

bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic debug

Consume messages from a topic

bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic debug --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic debug --group group1

These commands are very useful; they help us test.

List consumer groups

bin/kafka-consumer-groups.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --list

Kafka API

The examples come from reference books and are very comprehensive. They have been uploaded to Git; I haven't had time to organize them yet, but I will later. If you find them useful, give them a star! They cover: production and consumption, custom partitioners, serialization, interceptors, transactions, Kafka stream processing, and more.
Git address: Kafka API

Kafka features

The acks and retries mechanisms

The acks response mechanism:

  1. acks=1: the leader returns as soon as its own write succeeds, without waiting for follower acknowledgement. No protection against a single point of failure.
  2. acks=0: the producer treats the message as sent once it reaches the socket buffer. The lowest message-safety level.
  3. acks=-1: the leader returns only after at least one follower has acknowledged. For high-availability clusters.
    The retries mechanism:
    request.timeout.ms=30000: the default timeout after which a request is retried.
    retries=2147483647: the retry count.
    Kafka message semantics: with these settings, a message is delivered at least once.
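The at-least-once semantics follow directly from retries: if the broker persists a message but the acknowledgement is lost, the resend creates a duplicate. A pure-Python simulation of that failure mode (not real client code):

```python
# Why retries give "at least once": the broker writes the message but
# the ack is lost, so the producer retries and the broker stores a
# duplicate. Toy simulation.
log = []

def broker_append(msg):
    log.append(msg)        # the write succeeds on the broker...
    return False           # ...but the ack is lost (simulated timeout)

def send_with_retries(msg, retries=1):
    for _ in range(retries + 1):
        if broker_append(msg):
            return         # acked: stop sending

send_with_retries("order-1001")
# The message was persisted twice: at least once, not exactly once
assert log == ["order-1001", "order-1001"]
```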

Idempotent writing

Idempotency: making the same request multiple times gives the same result as making it once.
Kafka's idempotent-write solution: each message carries a producer ID and a sequence number, and the broker discards any sequence it has already appended.
This approach is worth borrowing for idempotency problems in business systems, though truly solving those takes things like blacklists, signatures, and tokens.
It should be noted that:
enable.idempotence=false (idempotence is off by default).
Prerequisites for enabling it: retries > 0 and acks=all.
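A toy model of this deduplication (not broker internals): the broker tracks the highest sequence number appended per producer and silently drops retries it has already seen, so a retried send lands only once.

```python
# How idempotent writes fix the duplicate from retries: the producer
# attaches a (producer id, sequence number) pair and the broker drops
# any sequence it has already appended. Illustrative model only.
log = []
last_seq = {}   # producer id -> highest sequence appended

def broker_append(pid, seq, msg):
    if last_seq.get(pid, -1) >= seq:
        return              # duplicate retry: silently discard
    last_seq[pid] = seq
    log.append(msg)

# A retry resends the same (pid, seq); the broker appends it only once
broker_append(7, 0, "order-1001")
broker_append(7, 0, "order-1001")   # retry after a lost ack
broker_append(7, 1, "order-1002")
assert log == ["order-1001", "order-1002"]
```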

Transactions

Because Kafka's idempotent writes provide no guarantees across multiple partitions or across sessions, we need a stronger transactional guarantee: atomic write operations spanning multiple partitions, where either all of the data is written successfully or none of it is. This is what Kafka transactions provide.
The producer offers five transaction methods: initTransactions, beginTransaction, sendOffsetsToTransaction, commitTransaction, and abortTransaction.
The consumer offers the read_committed and read_uncommitted isolation levels.
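The all-or-nothing guarantee can be modeled in a few lines: writes are staged and only become visible in the partitions on commit, while abort discards them. This is a simplified model of the semantics, not of Kafka's transaction coordinator:

```python
# Toy model of transactional writes across partitions: staged sends
# become visible only on commit(); abort() discards them all.
partitions = {0: [], 1: []}

class Txn:
    def __init__(self):
        self.staged = []
    def send(self, partition, msg):
        self.staged.append((partition, msg))  # buffered, not yet visible
    def commit(self):
        for p, m in self.staged:
            partitions[p].append(m)           # all writes land together
        self.staged.clear()
    def abort(self):
        self.staged.clear()                   # nothing lands

t = Txn()
t.send(0, "debit"); t.send(1, "credit")
t.abort()                                     # aborted: both partitions empty
assert partitions == {0: [], 1: []}

t.send(0, "debit"); t.send(1, "credit")
t.commit()                                    # committed: both writes visible
assert partitions == {0: ["debit"], 1: ["credit"]}
```

A read_committed consumer corresponds to only ever seeing the post-commit state above; read_uncommitted would also see staged data.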

The Kafka Eagle monitoring software

Open source address :https://github.com/smartloli/kafka-eagle
The author has also written a book, which I read back when I started learning Kafka.

Setup steps

  1. Download kafka-eagle-bin-1.4.0.tar.gz and extract it.
  2. mv kafka-eagle-bin-1.4.0 /opt/
  3. There is another archive inside after extraction; extract it again.
  4. Set the environment variables:
vim /etc/profile
------------------------
# kafka-eagle
export KE_HOME=/opt/kafka-eagle
# PATH
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$KE_HOME/bin
------------------------
source /etc/profile 
echo $PATH
OK~
  5. Edit the kafka-eagle configuration:
[root@node01 conf]# vim system-config.properties
-----------------------------------------------------
kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=node01:2181,node02:2181,node03:2181
#cluster2.zk.list=xdn10:2181,xdn11:2181,xdn12:2181
cluster1.kafka.eagle.offset.storage=kafka
#cluster2.kafka.eagle.offset.storage=zk
kafka.eagle.metrics.charts=true
cluster1.kafka.eagle.sasl.enable=false
cluster1.kafka.eagle.sasl.protocol=SASL_PLAINTEXT
cluster1.kafka.eagle.sasl.mechanism=SCRAM-SHA-256
cluster1.kafka.eagle.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka" password="kafka-eagle";
cluster1.kafka.eagle.sasl.client.id=

#cluster2.kafka.eagle.sasl.enable=false
#cluster2.kafka.eagle.sasl.protocol=SASL_PLAINTEXT
#cluster2.kafka.eagle.sasl.mechanism=PLAIN
#cluster2.kafka.eagle.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="kafka" password="kafka-eagle";
#cluster2.kafka.eagle.sasl.client.id=
######################################
# kafka sqlite jdbc driver address
######################################
#kafka.eagle.driver=org.sqlite.JDBC
#kafka.eagle.url=jdbc:sqlite:/hadoop/kafka-eagle/db/ke.db
#kafka.eagle.username=root
#kafka.eagle.password=www.kafka-eagle.org

######################################
# kafka mysql jdbc driver address
######################################
kafka.eagle.driver=com.mysql.jdbc.Driver
kafka.eagle.url=jdbc:mysql://127.0.0.1:3306/ke?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
kafka.eagle.username=root
kafka.eagle.password=123456
  6. Modify Kafka's startup script to enable JMX:
vim kafka-server-start.sh
-------------------------------------
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
 export JMX_PORT="7379"
 export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi
  7. Finally, start it; it turns out there is no execute permission yet:
chmod u+x ke.sh
./ke.sh
Version 1.4.0
*******************************************************************
* Kafka Eagle Service has started success.
* Welcome, Now you can visit 'http://192.168.83.11:8048/ke'
* Account:admin ,Password:123456
*******************************************************************
* <Usage> ke.sh [start|status|stop|restart|stats] </Usage>
* <Usage> https://www.kafka-eagle.org/ </Usage>
*******************************************************************

summary

That's it: 3 ZooKeeper nodes + 3 Kafka brokers + kafka-eagle monitoring can already do a lot, for example log collection, messaging systems, activity tracking, operational metrics, stream processing, and so on. I hope this helps whoever reads it!
Supporting articles:
Kafka cluster: link
ZooKeeper cluster: link

Copyright notice
This article was written by [Fengshuwan River Bridge]. Please include a link to the original when reposting. Thanks.
