
Hadoop Big Data Series: ZooKeeper

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a small set of simple operations on which distributed applications can build synchronization, configuration maintenance, and group or naming services. ZooKeeper is easy to program against: it uses a data model shaped like a file-system directory tree, and clients can be written in Java or C.


As everyone knows, coordination services for distributed systems are hard to get right. They are especially prone to errors such as race conditions and deadlock. ZooKeeper's purpose is to relieve distributed applications of having to implement coordination services from scratch.


Contents of this chapter:

1) ZooKeeper data model

2) ZooKeeper access control

3) ZooKeeper application scenarios


1. ZooKeeper data model

ZooKeeper has a hierarchical namespace, much like a standard file system.

[Figure: ZooKeeper's hierarchical namespace]


As the figure shows, ZooKeeper's data model is structurally very similar to a standard file system: everything is organized as a tree. Each node in the ZooKeeper tree is called a znode. Like directories in a file system, each node in the tree can have children. But there are also differences:

1) How nodes are referenced

A znode is referenced by its path, like a file path in Unix. The path must be absolute, so it has to start with a slash character. Paths must also be unique: each path has exactly one representation, and a path cannot be changed. In ZooKeeper, paths are Unicode strings, subject to some restrictions. The string "/zookeeper" is reserved for management information, such as quota information.
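The path rules above can be sketched as a small validator. This is a minimal illustration in Python, not the ZooKeeper client's own check: the function names `validate_path` and `is_valid_path` are hypothetical, and only a few well-known restrictions are covered (absolute path, no trailing slash, no empty or relative components, no null character).

```python
def validate_path(path: str) -> None:
    """Raise ValueError if `path` breaks one of the basic znode path rules."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute (start with '/')")
    if path == "/":
        return  # the root path is always acceptable
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    if "\u0000" in path:
        raise ValueError("null character is not allowed")
    for name in path[1:].split("/"):
        if name == "":
            raise ValueError("empty node name (consecutive slashes)")
        if name in (".", ".."):
            raise ValueError("relative components '.' and '..' are not allowed")


def is_valid_path(path: str) -> bool:
    """Boolean wrapper around validate_path (hypothetical helper)."""
    try:
        validate_path(path)
    except ValueError:
        return False
    return True
```

Because relative components are rejected, every znode has exactly one path that names it, which is the uniqueness property described above.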

2) Znode structure

A znode in the ZooKeeper namespace has characteristics of both a file and a directory. Like a file, it maintains data, metadata, an ACL, timestamps, and similar structures; like a directory, it can serve as part of a path. Every znode consists of three parts:

 stat: status information describing the znode's version, permissions, and so on

 data: the data associated with the znode

 children: the child nodes under the znode

Although a znode can have data associated with it, ZooKeeper is not designed as a general-purpose database or large-object store. Instead, it manages coordination data, such as configuration, status information, and rendezvous locations in distributed applications. What these data have in common is that they are small, typically measured in kilobytes. Both the ZooKeeper server and the client check that each znode holds at most 1 MB of data, and in normal use the data should be much smaller than that.
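To make the three-part structure concrete, here is a minimal in-memory model in Python. It is only a sketch under stated assumptions: the `Stat` and `Znode` classes and their fields are simplified stand-ins invented for illustration (a real stat carries many more fields), and `MAX_DATA_BYTES` uses the 1 MB figure from the text rather than the server's exact configured limit.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_DATA_BYTES = 1024 * 1024  # the 1 MB ceiling mentioned in the text (assumed exact here)


@dataclass
class Stat:
    version: int = 0       # bumped on every data change
    num_children: int = 0  # number of direct children


@dataclass
class Znode:
    data: bytes = b""
    stat: Stat = field(default_factory=Stat)
    children: Dict[str, "Znode"] = field(default_factory=dict)

    def set_data(self, data: bytes) -> None:
        """Replace the node's data as a whole, rejecting oversized payloads."""
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")
        self.data = data
        self.stat.version += 1

    def add_child(self, name: str) -> "Znode":
        """Create a child node; a znode acts like a directory here."""
        if name in self.children:
            raise KeyError(f"child already exists: {name}")
        child = Znode()
        self.children[name] = child
        self.stat.num_children = len(self.children)
        return child
```

The same object holds data (file-like) and children (directory-like), which is the dual nature the section describes.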

3) Data access

The data stored at each node in ZooKeeper is read and written atomically: a read returns all the data associated with the node, and a write replaces all of it. In addition, each node has its own ACL (access control list), which specifies user permissions, that is, which operations a given user may perform on the node.

4) Node types

Persistent nodes: permanent nodes that exist until a client explicitly deletes them.

Ephemeral nodes: temporary nodes that exist only while the client that created them keeps its connection active. Once the connection is lost, ZooKeeper deletes the node automatically.

Sequence nodes: sequential nodes. When a client asks to create one, ZooKeeper automatically appends a monotonically increasing sequence number to the end of the node path. This type is used to implement distributed locks, distributed queues, and similar features.
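The three node types can be illustrated with a small in-memory model. This is a sketch, not the ZooKeeper API: the `TinyStore` class, its methods, and the integer session ids are hypothetical. The one detail taken from ZooKeeper itself is the suffix format, a 10-digit zero-padded counter kept per parent node.

```python
from typing import Dict, Optional


class TinyStore:
    """In-memory stand-in for a ZooKeeper server (hypothetical, flat path table)."""

    def __init__(self) -> None:
        # path -> owning session id for ephemeral nodes, None for persistent nodes
        self.owner: Dict[str, Optional[int]] = {}
        # parent path -> next sequence number to hand out
        self.counters: Dict[str, int] = {}

    def create(self, path: str, session: int,
               ephemeral: bool = False, sequential: bool = False) -> str:
        if sequential:
            parent = path.rsplit("/", 1)[0] or "/"
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"  # append the zero-padded counter
        if path in self.owner:
            raise KeyError(f"node exists: {path}")
        self.owner[path] = session if ephemeral else None
        return path

    def close_session(self, session: int) -> None:
        """Drop every ephemeral node created by the closed session."""
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
```

Creating two sequential nodes with the prefix `/locks/lock-` yields `/locks/lock-0000000000` and `/locks/lock-0000000001`; closing a session removes only that session's ephemeral nodes and leaves persistent ones in place.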

5) Watches

A client can set a watch on a node. When the node's state changes (a znode is added, deleted, or changed), the watch fires. When a watch fires, ZooKeeper sends the client exactly one notification, because a watch can fire only once; this reduces network traffic.


ZooKeeper lets you set a watch on every read operation: exists(), getChildren(), and getData(). Watch events are one-time triggers: when the state of the watched object changes, the event for that watch fires. Watch events are sent to the client asynchronously, and ZooKeeper provides an ordering guarantee for the watch mechanism: a client never sees a change to a watched object before it has received the watch event for that change.
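The one-time trigger behavior can be sketched with an in-memory store. This is an illustration only, assuming a hypothetical `WatchedStore` class; it delivers events synchronously for simplicity, whereas ZooKeeper delivers them asynchronously. The event name `NodeDataChanged` mirrors the name ZooKeeper uses for data-change events.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str, str], None]  # called with (event_type, path)


class WatchedStore:
    """In-memory stand-in showing one-shot watches (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watches: Dict[str, List[Watcher]] = {}

    def get_data(self, path: str, watch: Optional[Watcher] = None) -> bytes:
        """Read the data and optionally leave a watch on the node."""
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data[path]

    def set_data(self, path: str, value: bytes) -> None:
        self.data[path] = value
        # pop() removes the registered watchers, so each fires at most once
        for watcher in self.watches.pop(path, []):
            watcher("NodeDataChanged", path)
```

After one read with a watch followed by two writes, the watcher has been called once. A client that wants to keep following the node has to read again and set a new watch.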

2. ZooKeeper access control

In a traditional file system, an ACL has two dimensions: the owner or group, and the permissions. Subdirectories and files inherit the parent directory's ACL by default. In ZooKeeper, a node's ACL is not inherited; each node is controlled independently. A ZooKeeper ACL can be understood along three dimensions: the scheme, the user (id), and the permissions, usually written as scheme:id:permissions. Each is introduced below:

1) scheme: the scheme determines which mechanism is used for permission management. ZooKeeper implements a pluggable ACL framework, so you can add schemes to extend the ACL mechanism.

[Figure: ZooKeeper ACL schemes]


2) user: closely tied to the scheme. The form of the id depends on the scheme in use, which was covered in the scheme description above, so it is not repeated here.

3) permission: ZooKeeper currently supports the following permissions:

 CREATE: create a child node

 READ: read the node's data and list its children

 WRITE: set the node's data

 DELETE: delete a child node

 ADMIN: set the node's ACL
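The scheme:id:permissions form can be illustrated with a small parser. This is a sketch, not a ZooKeeper API: `Perm`, `LETTERS`, and `parse_acl` are hypothetical names. The bit values follow ZooKeeper's permission constants (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16), and the single-letter codes follow the convention of the ZooKeeper command-line client.

```python
from enum import IntFlag
from typing import Tuple


class Perm(IntFlag):
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


# One letter per permission, as in strings like "cdrwa"
LETTERS = {"r": Perm.READ, "w": Perm.WRITE, "c": Perm.CREATE,
           "d": Perm.DELETE, "a": Perm.ADMIN}


def parse_acl(acl: str) -> Tuple[str, str, Perm]:
    """Split 'scheme:id:perms' into (scheme, id, permission bitmask)."""
    scheme, rest = acl.split(":", 1)
    # The id may itself contain ':' (for example user:hash), so split from the right
    ident, letters = rest.rsplit(":", 1)
    perms = Perm(0)
    for ch in letters:
        perms |= LETTERS[ch]
    return scheme, ident, perms
```

For example, `parse_acl("world:anyone:cdrwa")` returns the scheme `world`, the id `anyone`, and a mask with all five bits set. Since ACLs are not inherited, such an entry applies only to the node it is set on.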


3. ZooKeeper application scenarios

1) Data publish/subscribe (configuration center)

In the publish/subscribe model, the so-called configuration center, publishers write data to ZooKeeper nodes and subscribers read it dynamically, which gives centralized management and dynamic updates of configuration information. It is well suited to global configuration and to the service address lists of a service-oriented framework, for example.
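A subscriber in this model has to re-arm its watch after every notification, because watches fire only once. The sketch below shows that loop against an in-memory stand-in; `ConfigStore` and `Subscriber` are hypothetical classes written for illustration, and notifications are delivered synchronously for simplicity.

```python
from typing import Callable, List


class ConfigStore:
    """In-memory stand-in for a znode that holds configuration (hypothetical)."""

    def __init__(self, value: str) -> None:
        self.value = value
        self._watchers: List[Callable[[], None]] = []

    def get(self, watch: Callable[[], None]) -> str:
        self._watchers.append(watch)  # the read arms a one-shot watch
        return self.value

    def publish(self, value: str) -> None:
        self.value = value
        fired, self._watchers = self._watchers, []  # clear before notifying
        for watcher in fired:
            watcher()


class Subscriber:
    """Keeps a local copy of every configuration value it has seen."""

    def __init__(self, store: ConfigStore) -> None:
        self.store = store
        self.seen: List[str] = []
        self._refresh()

    def _refresh(self) -> None:
        # Reading again re-arms the watch, so later updates are seen too
        self.seen.append(self.store.get(watch=self._refresh))
```

Each `publish` call triggers the subscriber's `_refresh`, which reads the new value and registers a fresh watch in the same step.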



2) Distributed lock service

Distributed locks rely mainly on the strong data consistency that ZooKeeper guarantees. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.
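The article does not spell out a locking algorithm. One widely used recipe for the ordering case has each contender create an ephemeral sequential node under a lock directory: the contender with the lowest sequence number holds the lock, and every other contender watches the node just ahead of it. The decision step can be sketched as a pure function; `sequence_of` and `lock_status` are hypothetical names, and node names are assumed to end in the 10-digit sequence suffix.

```python
from typing import List, Optional, Tuple


def sequence_of(name: str) -> int:
    """Extract the trailing 10-digit sequence number from a node name."""
    return int(name[-10:])


def lock_status(children: List[str], mine: str) -> Tuple[bool, Optional[str]]:
    """Return (holds_lock, node_to_watch) for the contender that owns `mine`."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    if index == 0:
        return True, None  # lowest sequence number: lock acquired
    return False, ordered[index - 1]  # otherwise wait on the predecessor
```

Watching only the immediate predecessor means that releasing the lock wakes a single waiter rather than all of them.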

3) Distributed queues

Queues come in two broad kinds. One is the ordinary first-in, first-out (FIFO) queue; the other waits until all queue members have gathered and then executes them in order. The FIFO queue works on the same principle as the ordering scenario in the distributed lock service, so it is not repeated here. The second kind is an enhancement built on the FIFO queue. Typically a /queue/num node is created in advance under the /queue znode and assigned the value n (or n is assigned directly to /queue) to indicate the queue size. Each time a member joins, a check is made whether the queue size has been reached, which decides whether execution can begin.

A typical use is a distributed environment in which a large task, Task A, can proceed only after many subtasks have completed (or their conditions are ready). Whenever one of the subtasks completes (becomes ready), it creates its own ephemeral sequential node under /taskList (CreateMode.EPHEMERAL_SEQUENTIAL). When the number of child nodes under /taskList reaches the specified number, processing can move on to the next step in sequence order.
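The readiness check for the second kind of queue can be sketched as a pure function over the list of child node names. This is an illustration under assumptions: `ready_members` is a hypothetical helper, the `task-` prefix is made up, and names are assumed to end in the 10-digit sequence suffix that sequential nodes receive.

```python
from typing import List


def ready_members(children: List[str], required: int) -> List[str]:
    """Return members in sequence order once `required` have arrived, else []."""
    if len(children) < required:
        return []  # not everyone has arrived yet; keep waiting
    return sorted(children, key=lambda name: int(name[-10:]))
```

With a required size of 3, two children give an empty result; once the third arrives, the function returns all three names in the order their nodes were created.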


Copyright notice
This article was written by [osc_ ed2py9ot]. Please include a link to the original when reposting. Thanks.
