
Kafka learning notes: accumulated Kafka application problems

0x00 Synchronizing Kafka configuration files

To add GC logging to the Kafka process (so GC behavior can be examined after later restarts), modify bin/kafka-server-start.sh to include the GC log options:

export KAFKA_OPTS="-Xms4G -Xmx8G -Xmn3G -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -server -Dlog4j.configuration=file:$base_dir/config/log4j.properties -Xloggc:/data0/kafka/log/gc.log -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime"
  1. Write the script file syncProperty.sh:
. /etc/bashrc
. /etc/profile
echo qwe123 > password.pass
chmod 600 password.pass
sudo chown root:root password.pass
sudo rsync root@10.39.3.75::shellResult/huangqiang/kafka-server-start.sh /usr/local/kafka-0.8.0-beta1-src/bin/kafka-server-start.sh --password-file=password.pass
sudo rsync root@10.39.3.75::shellResult/huangqiang/kafka-server-start.sh /usr/local/kafka-0.8.0-release/bin/kafka-server-start.sh --password-file=password.pass
  2. Upload the script files to the machine used for distribution:
  • export RSYNC_PASSWORD=qwe123 && rsync kafka-server-start.sh root@10.39.3.75::shellResult/huangqiang/ && rsync syncProperty.sh root@10.39.3.75::shellResult/huangqiang/
  3. On each client machine, run:
  • export RSYNC_PASSWORD=qwe123 && rsync root@10.39.3.75::shellResult/huangqiang/syncProperty.sh ./ && sh syncProperty.sh
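After running syncProperty.sh and restarting the brokers, it is worth confirming that the GC options actually took effect. A minimal sketch, assuming the log path configured in KAFKA_OPTS above:

ps -ef | grep -i kafka | grep -o 'Xloggc:[^ ]*'
tail /data0/kafka/log/gc.log   # the GC log should now be receiving entries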

0x01 A machine's leader metadata is wrong: NotLeaderForPartitionException

Some machines log the following error:

[2016-10-09 15:00:00,504] WARN [ReplicaFetcherThread--1-17], error for partition [weibo_common_act2,14] to broker 17 (kafka.server.ReplicaFetcherThread)
kafka.common.NotLeaderForPartitionException
        at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:70)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:157)
        at kafka.utils.Logging$class.warn(Logging.scala:88)
        at kafka.utils.ShutdownableThread.warn(ShutdownableThread.scala:23)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:156)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:112)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:112)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

On the broker 17 machine, server.log contains the following warning:

[2016-10-09 15:00:02,111] WARN [KafkaApi-17] Fetch request with correlation id 82105147 from client ReplicaFetcherThread--1-17 on partition [weibo_common_act2,14] failed due to Leader not local for partition [weibo_common_act2,14] on broker 17 (kafka.server.KafkaApis)

Analysis: the two in-sync replicas of partition [weibo_common_act2,14] are brokers [8,17]. Broker 8 no longer considers itself the leader, so this error is thrown. Broker 8 needs to be restarted (that is, restart the broker machine hosting the partition's leader).

When does this problem arise? A partition originally had 2 replicas, but its in-sync replica list contains 3 replicas. If kafka-preferred-replica-election.sh is then executed for that partition, the exception above is thrown. The log is as follows:

[2016-10-09 16:38:21,752] INFO [Replica Manager on Broker 17]: Handling LeaderAndIsr request Name:LeaderAndIsrRequest;Version:0;Controller:14;ControllerEpoch:33;CorrelationId:81;ClientId:id_14-host_10.39.4.215-port_19092;PartitionState:(weibo_common_act2,4) -> (LeaderAndIsrInfo:(Leader:8,ISR:17,15,8,LeaderEpoch:21,ControllerEpoch:33),ReplicationFactor:2),AllReplicas:8,17);Leaders:id:8,host:10.39.4.210,port:19092 (kafka.server.ReplicaManager)
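To see which broker each partition's leader and ISR point at before deciding what to restart, the topic can be described against ZooKeeper. A minimal sketch; the ZooKeeper address is taken from the logs in this document, and on 0.8.0 the equivalent tool is kafka-list-topic.sh rather than kafka-topics.sh:

bin/kafka-topics.sh --describe --zookeeper 10.39.1.66:22181 --topic weibo_common_act2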

0x02 Consumer offsets reset after a rebalance

Related reading on monitoring Kafka consumer lag: Burrow

Before the offsets were reset there were several consumer rebalances. A rebalance usually happens when consumers leave or join the consumer group, or when new topics or partitions become available for consumption. During a rebalance, each consumer in turn:

  • stops consuming data;
  • commits its offsets;
  • reassigns partitions across the group;
  • fetches the offsets of its newly assigned partitions;
  • resumes consuming data.

In the log printed earlier, the initOffset entry tells you where the consumer will start consuming.
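To see each partition's committed offset, log-end offset, and lag for a group (the old-consumer equivalent of what Burrow monitors), Kafka 0.8 ships a ConsumerOffsetChecker tool. A minimal sketch; the group and topic names here are examples reused from this document:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect 10.39.1.66:22181 --group clientSearchBhvGp --topic weibo_common_act2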

0x03 kafka.common.NotLeaderForPartitionException

 WARN [ReplicaFetcherThread-3-9], error for partition [ols_test,0] to broker 9 (kafka.server.ReplicaFetcherThread)
kafka.common.NotLeaderForPartitionException
        at sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

Analysis: this machine is the leader for partition 0 of ols_test, yet correct partition information cannot be obtained, and in Kafka Manager the latest offset of the partition shows as 0, which is abnormal. The suspicion was that the metadata for the ols_test topic on this machine was wrong; however, the partition count in the topic's metadata is consistent with ZooKeeper, so there may be another cause.

Solution: after switching the leader with kafka-preferred-replica-election.sh, the new leader machine no longer shows similar errors, and the latest offset updates normally.
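For reference, a minimal sketch of triggering a preferred-replica election for a single partition; the ZooKeeper address and JSON file name are examples:

echo '{"partitions": [{"topic": "ols_test", "partition": 0}]}' > partitions.json
bin/kafka-preferred-replica-election.sh --zookeeper 10.39.1.66:22181 --path-to-json-file partitions.json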

0x04 Compiling a mixed Scala and Java project with Maven

mvn clean scala:compile compile package

(The scala:compile goal assumes a Scala Maven plugin is configured in pom.xml.)

0x05 gmond service unavailable

Symptom: the ganglia service on the central machine was unavailable, and telnet 10.39.4.204 8649 returned no data for a long time. After restarting it, another 28 kafka machines were found to be unable to send data to the central machine normally, until their gmond services were restarted as well. (The root cause remains to be determined.)

Restart command: service gmond restart
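When many machines need the same restart, a loop over a host list saves time. A minimal sketch; kafka_hosts.txt is a hypothetical file listing one hostname per line:

while read host; do
  ssh "$host" 'sudo service gmond restart'
done < kafka_hosts.txt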

0x06 Large delay when consuming Kafka with the official Storm kafka-spout

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>0.9.3</version>
            <scope>compile</scope>
        </dependency>
[INFO 2016-08-26 10:19:04 s.k.ZkCoordinator:89 kafkaStormSpout:3-MultipleThreadSpoutExecutors] Task [1/1] Deleted partition managers: []
[INFO 2016-08-26 10:19:04 s.k.ZkCoordinator:95 kafkaStormSpout:3-MultipleThreadSpoutExecutors] Task [1/1] New partition managers: []
[INFO 2016-08-26 10:19:04 s.k.ZkCoordinator:106 kafkaStormSpout:3-MultipleThreadSpoutExecutors] Task [1/1] Finished refreshing
[INFO 2016-08-26 10:19:22 c.s.i.k.DelayBolt:69 delayBolt:2-BoltExecutors] >30s|>1min|>2min|>3min|
[INFO 2016-08-26 10:19:22 c.s.i.k.DelayBolt:70 delayBolt:2-BoltExecutors] ---|---|---|---|
[INFO 2016-08-26 10:19:22 c.s.i.k.DelayBolt:71 delayBolt:2-BoltExecutors] 85676|60994|48271|725023|
[INFO 2016-08-26 10:19:22 c.s.i.k.DelayBolt:72 delayBolt:2-BoltExecutors] =======================
[INFO 2016-08-26 10:19:22 c.s.i.k.DelayBolt:73 delayBolt:2-BoltExecutors] average delay:532830 ms, messageCount:1000000.
[ERROR 2016-08-26 10:19:41 o.a.c.ConnectionState:201 CuratorFramework-0] Connection timed out for connection string (10.39.1.66:22181,10.39.1.67:22181,10.39.1.68:22181) and timeout (15000) / elapsed (19049)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.5.0.jar:na]
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.5.0.jar:na]
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:807) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:793) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:57) [curator-framework-2.5.0.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:275) [curator-framework-2.5.0.jar:na]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
[INFO 2016-08-26 10:20:10 s.k.ZkCoordinator:78 kafkaStormSpout:3-MultipleThreadSpoutExecutors] Task [1/1] Refreshing partition manager connections
[INFO 2016-08-26 10:20:10 s.k.DynamicBrokersReader:83 kafkaStormSpout:3-MultipleThreadSpoutExecutors] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=yz48155.hadoop.data.sina.com.cn:19092,..... 23=yz48160.hadoop.data.sina.com.cn:19092}}

Symptom: whenever this exception is thrown, nothing is consumed for close to 20 minutes, causing heavy data latency, while our own Kafka-consuming program shows low latency on the same data. Full GCs keep occurring, roughly once every 5 seconds.

Analysis: whether this exception is what causes the Kafka data not to be consumed is still to be determined. @fengchao
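To confirm the Full GC frequency independently of the topology logs, jstat can sample the worker JVM. A minimal sketch; the worker PID is a placeholder:

jstat -gcutil <worker_pid> 5000
# watch the FGC column: if it increments roughly every 5 seconds, Full GCs really are that frequent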

0x07 OOM when JStorm consumes a Kafka topic

[ERROR 2016-08-25 11:39:39 c.a.j.t.e.s.SpoutExecutors:178 KAFKA_SPOUT:3-MultipleThreadSpoutExecutors] spout execute error 
java.lang.OutOfMemoryError: PermGen space at
java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_67] at
java.lang.ClassLoader.defineClass(ClassLoader.java:800) ~[na:1.7.0_67]
...

Worker configuration: worker.memory.size: 419430400

Analysis

  1. View the process memory information: jmap -heap $PID
    Attaching to process ID 2543, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 2147483648 (2048.0MB)
   NewSize          = 209715200 (200.0MB)
   MaxNewSize       = 209715200 (200.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 4
   PermSize         = 67108864 (64.0MB)
   MaxPermSize      = 134217728 (128.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 174784512 (166.6875MB)
   used     = 174769048 (166.6727523803711MB)
   free     = 15464 (0.01474761962890625MB)
   99.99115253415589% used
Eden Space:
   capacity = 139853824 (133.375MB)
   used     = 139853824 (133.375MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 34930688 (33.3125MB)
   used     = 34915224 (33.297752380371094MB)
   free     = 15464 (0.01474761962890625MB)
   99.9557294720333% used
To Space:
   capacity = 34930688 (33.3125MB)
   used     = 0 (0.0MB)
   free     = 34930688 (33.3125MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 1937768448 (1848.0MB)
   used     = 1937768408 (1847.9999618530273MB)
   free     = 40 (3.814697265625E-5MB)
   99.99999793576988% used
Perm Generation:
   capacity = 67108864 (64.0MB)
   used     = 30199864 (28.80083465576172MB)
   free     = 36909000 (35.19916534423828MB)
   45.001304149627686% used

7935 interned Strings occupying 854144 bytes.

The output above was printed after modifying the worker.childopts parameter in storm.yaml as follows:

worker.childopts: "-Xms1g -Xmx1g -Xmn372m -XX:PermSize=64M -XX:MaxPermSize=64M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=5 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=85"

The cause is that the PermGen default was 24 MB; at startup, PermGen usage reached 99.9%, the application could not run normally, and no data was produced. The fix is to enlarge the PermGen, as in the -XX:PermSize/-XX:MaxPermSize settings above.

0x08 Snappy-java fails on Mac OS with JDK 1.7

Our self-wrapped Kafka consumer throws an exception when run locally on a Mac, which makes it unable to consume data.

Solution: downgrade the JDK to 1.6.
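A minimal sketch of switching to JDK 1.6 on macOS before launching the consumer, assuming a 1.6 JDK is installed:

export JAVA_HOME=$(/usr/libexec/java_home -v 1.6)
java -version   # should report 1.6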

0x09 Topic delay statistics

EA_EXPOSURE:1000001

>30s|>1min|>2min|>3min|
507758|25978|0|0|

0x0A Estimating the log size of a Kafka topic

  • Find the machines that host the topic's partitions.
  • Run ls /data0/kafka/data* to locate the topic's partition directories, measure one partition, and extrapolate to estimate the total size of the topic (see the sketch below).
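A minimal sketch of the estimate; the partition directory name follows Kafka's <topic>-<partition> convention, and the topic and partition here are examples:

du -sh /data0/kafka/data*/weibo_common_act2-14
# total ≈ size of one partition × partition count × replication factor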

0x0B Excessive topic consumption drives outbound traffic too high, and the Kafka proxy stops working properly

Analysis: work out how to trace from the consumer group back to the OLS program, find the owner, and ask them to fix it (see the sketch below).
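A minimal sketch of tracing a consumer group back to its machines through ZooKeeper; the address comes from the logs in this document, and the group/topic names are placeholders:

bin/zookeeper-shell.sh 10.39.1.66:22181
# at the zk prompt:
#   ls /consumers
#   get /consumers/<group>/owners/<topic>/<partition>
# the owner value embeds the consumer's hostname, which identifies the machine running the program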

0x0C Druid at one point could not consume topic wb_ad_druid_analysis (consumer group id: druid-2)

2016-07-21T12:48:02,533 WARN [druid-2_yz2138.hadoop.data.sina.com.cn-1465730148608-f3c110a0-leader-finder-thread] kafka.client.ClientUtils$ - Fetching topic metadata with correlation id 5439 for topics [Set(wb_ad_druid_analysis)] from broker [id:48152,host:yz48152.hadoop.data.sina.com.cn,port:19092] failed
java.lang.ArrayIndexOutOfBoundsException: 13
        at kafka.api.TopicMetadata$$anonfun$readFrom$1.apply$mcVI$sp(TopicMetadata.scala:38) ~[kafka_2.10-0.8.2.1.jar:?]
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) ~[scala-library-2.10.4.jar:?]
        at kafka.api.TopicMetadata$.readFrom(TopicMetadata.scala:36) ~[kafka_2.10-0.8.2.1.jar:?]
        at kafka.api.TopicMetadataResponse$$anonfun$3.apply(TopicMetadataResponse.scala:31) ~[kafka_2.10-0.8.2.1.jar:?]
        at kafka.api.TopicMetadataResponse$$anonfun$3.apply(TopicMetadataResponse.scala:31) ~[kafka_2.10-0.8.2.1.jar:?]
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.4.jar:?]
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.4.jar:?]
        at scala.collection.immutable.Range.foreach(Range.scala:141) ~[scala-library-2.10.4.jar:?]
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.4.jar:?]
        at scala.collection.AbstractTraversable.map(Traversable.scala:105) ~[scala-library-2.10.4.jar:?]
        at kafka.api.TopicMetadataResponse$.readFrom(TopicMetadataResponse.scala:31) ~[kafka_2.10-0.8.2.1.jar:?]
        at kafka.producer.SyncProducer.send(SyncProducer.scala:114) ~[kafka_2.10-0.8.2.1.jar:?]
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58) [kafka_2.10-0.8.2.1.jar:?]
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93) [kafka_2.10-0.8.2.1.jar:?]
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66) [kafka_2.10-0.8.2.1.jar:?]
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) [kafka_2.10-0.8.2.1.jar:?]

Analysis

Our Kafka cluster runs kafka-0.8.0-beta1, while Druid uses kafka_2.10-0.8.2.1.jar; the versions are inconsistent, and the 0.8.2 client fails while parsing the older broker's topic-metadata response (hence the ArrayIndexOutOfBoundsException in TopicMetadata.readFrom). The client version should be changed to match the cluster.

0x0D The OLS program consumes a topic inefficiently

Inside the process method there is a call to String.matches. Under the hood this compiles a regular expression on every invocation, and the compilation is expensive; the pattern should be compiled once, outside the process method.

When reading jstack output, focus on the RUNNABLE threads; see the sketch below.

  • Options: -l, long listing; prints additional information about locks, e.g. jstack -l $pid
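A minimal sketch of pulling only the RUNNABLE threads out of a dump; the PID is a placeholder:

jstack -l <pid> | grep -B 1 'java.lang.Thread.State: RUNNABLE'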

0x0E Consumer instance owner of a consumed topic is None, and the rebalance fails

1. Symptom: topic weibo_common_act2 is consumed by consumer group clientSearchBhvGp.

2016-06-23 15:52:31,473 ERROR kafka.consumer.ZookeeperConsumerConnector: [clientSearchBhvGp_yz4834.hadoop.data.sina.com.cn-1466668272656-90a8bbdc], error during syncedRebalance
kafka.common.ConsumerRebalanceFailedException: clientSearchBhvGp_yz4834.hadoop.data.sina.com.cn-1466668272656-90a8bbdc can't rebalance after 4 retries
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:397)
        at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326)

After the rebalance failed 4 times, this process still held on to 6 partitions, so those 6 partitions could not be consumed.

The jstack output is as follows: threads are waiting on locks, but there is no deadlock; they are waiting to be assigned partitions to consume.

"in1 Fetch thread" daemon prio=10 tid=0x00007f564c866800 nid=0xbe85 waiting on condition [0x00007f5641015000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000b1fb92f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
        at com.sina.ols.apu.connector.impl.kafka.KafkaInConnector.fetch(KafkaInConnector.java:107)
        at com.sina.ols.apu.connector.AbstractInConnector$Fetch.run(AbstractInConnector.java:121)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

"pool-3-thread-6" prio=10 tid=0x00007f564c865000 nid=0xbe84 waiting on condition [0x00007f5641116000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000b5d4f138> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:63)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:61)
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:53)
        at com.sina.ols.apu.connector.impl.kafka.KafkaInConnector$KafkaConsumerWorker.run(KafkaInConnector.java:136)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The output of netstat -nalp | grep 48326 shows connections established to 6 brokers:

tcp        0      0 ::ffff:10.39.48.34:36474    ::ffff:10.39.4.203:19092    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:43536    ::ffff:10.39.4.208:19092    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:50777    ::ffff:10.39.4.211:19092    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:50027    ::ffff:10.39.4.207:19092    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:48512    ::ffff:10.39.1.69:22181     ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:58868    ::ffff:10.39.48.34:34070    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:41300    ::ffff:10.39.4.202:19092    ESTABLISHED 48326/java
tcp        0      0 ::ffff:10.39.48.34:37169    ::ffff:10.39.4.206:19092    ESTABLISHED 48326/java

2. Analysis

The sleep time between rebalance retries: kafka/consumer/ZookeeperConsumerConnector.scala:393

  • rebalance.backoff.ms defaults to zookeeper.sync.time.ms, which defaults to 2000 ms.

Once the number of rebalance retries exceeds 4, syncedRebalance throws a RuntimeException. In the calling code this exception is caught and only logged as ERROR.

  • kafka/consumer/ZookeeperConsumerConnector.scala:328. The right approach is to catch this RuntimeException and exit the JVM with exit(-1). For an OLS program, a new container would then be started and processing would continue.

3. Solution

  • Increase the retry backoff: rebalance.backoff.ms=5000
  • Increase the retry count: rebalance.max.retries=10
  • Catch ConsumerRebalanceFailedException and exit the process.

4. How the OLS program was modified

Users modify their program in two steps:

Change the OLS_Yarn dependency in pom.xml to 0.2.2.2.

Add the corresponding configuration to the submitted workflow.xml.

0x0F When Storm consumes Kafka, the nodes under /consumers/onlineGroupId_rtups/owners/clickstream/ are frequently lost and recreated

Analysis

The Storm cluster itself is heavily loaded, which causes connection timeouts to ZooKeeper. Increasing zookeeper.session.time.out alleviates the problem but does not fundamentally solve it.

Strange observation: with zookeeper.session.time.out=30, the ZK nodes were lost and recreated after 9s, 24s, 43s, and so on. The cause remains to be determined. TODO 2016-08-12


Copyright notice
This article was written by [Jetpropelledsnake21]. Please keep the original link when reposting: https://cdmana.com/2020/12/20201225111921911o.html
