
Applying Compute-Storage Separation to a Message Queue

Yun Mei's introduction:

With the growth of the Internet, high-concurrency big-data workloads are no longer a distant concern; most projects have to handle them. Message queues have become an almost essential skill in this area. Many mature message queue tools exist; this article introduces JCQ, a message queue developed in-house by Jingdong Zhilian Cloud.

JCQ, short for JD Cloud Message Queue, is a distributed message-oriented middleware with cloud-native characteristics, developed in-house by Jingdong Zhilian Cloud. It was designed from the start as message middleware suited to the cloud, with high availability, data reliability, physical isolation of replicas, service autonomy, health status reporting, little or no operations work, container deployment, elastic scaling, tenant isolation, pay-as-you-go billing, integration with the cloud account system, and more.

I. The Evolution of JCQ

JCQ 1.0 dates back to mid-2017 and became generally available (GA) for sale in November 2018. In version 1.0, however, a Topic was limited by the capacity of a single server, so it could not meet users' demand for very large Topics.

Version 2.0 therefore focused on scaling out and in. JCQ 2.0 officially launched in April 2019. Its main new features were Topic scale-out and scale-in, load balancing of hot Topics across Brokers, and traffic migration away from hot Brokers.

In July 2019, JCQ began another major architectural evolution: the separation of compute and storage. This became major version JCQ 3.0, which went live at the end of 2019. Separating compute from storage brought clear architectural benefits and resolved many day-to-day pain points.

The sections below describe the advantages of this evolution and the pain points it resolved:

1. Effectively limiting the scope of an upgrade

In JCQ 2.0, the compute module and the storage module ran in the same process, so upgrading the compute module also meant upgrading the storage module. Restarting the storage module is a heavy operation: it has to load a large amount of data, check message data against message index data, truncate dirty data, and so on. Fixing a small bug in the compute module therefore often required this very heavy storage restart. In practice, most upgrades are caused by feature updates or bug fixes in the compute module.

To solve this problem, JCQ 3.0 deploys the compute module and the storage module independently, and the two communicate through RPC calls. Upgrading one does not affect the other, as shown in the figure below:

The compute node, Broker, is responsible only for business logic such as producing messages, pushing messages, authentication, authorization, rate limiting, congestion control, and client load balancing. It is a stateless service, so it is lighter and faster to upgrade.

The storage node, Store, is responsible only for writing data, synchronizing replicas, and reading data. Because its business logic is simple, once its functionality is stable it needs no changes other than optimizations, and so rarely needs to be upgraded.

2. Independent deployment that breaks hardware constraints

JCQ is a shared message middleware. Users apply for Topics with different TPS specifications and are not aware of hardware metrics such as CPU, memory, and disk. The JCQ service provider therefore has to work out how to use these hardware resources sensibly.

JCQ is deployed in containers. It has many types of components with varied hardware requirements, and the compute module and the storage module consume the most resources. In JCQ 2.0, the compute and storage modules were deployed together, so choosing a machine model meant balancing CPU, memory, disk, and other metrics at once. The resulting machine requirements were so specific that it was hard to co-locate JCQ with other product lines. Even within a single resource pool, scheduling order could cause scheduling failures. For example, a machine's remaining resources might be just enough for a container A that needs a large disk, but if container B is scheduled onto that machine first, the remaining resources can no longer fit A, and the disk on that machine is wasted.
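The scheduling-order problem with containers A and B can be illustrated with a small sketch. The machine size, the container shapes, and all names here are made-up values for illustration, not JCQ's real specifications or scheduler.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Remaining capacity of one host (all figures are hypothetical)."""
    cpu: int
    memory_gb: int
    disk_gb: int

    def place(self, cpu: int, memory_gb: int, disk_gb: int) -> bool:
        """Reserve resources for a container; return False if it does not fit."""
        if cpu > self.cpu or memory_gb > self.memory_gb or disk_gb > self.disk_gb:
            return False
        self.cpu -= cpu
        self.memory_gb -= memory_gb
        self.disk_gb -= disk_gb
        return True


# Hypothetical shapes: A needs a large disk, B is a small general-purpose container.
CONTAINER_A = dict(cpu=8, memory_gb=32, disk_gb=2000)
CONTAINER_B = dict(cpu=4, memory_gb=16, disk_gb=50)


def schedule(order):
    """Place (name, shape) pairs on a fresh machine in the given order."""
    machine = Machine(cpu=8, memory_gb=32, disk_gb=2000)
    placed = [name for name, shape in order if machine.place(**shape)]
    return placed, machine
```

With these numbers, scheduling B first leaves too little CPU for A, so A is rejected and most of the disk stays unused; scheduling A first uses the whole machine.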

In JCQ 3.0, the compute node Broker and the storage node Store are deployed independently. Each component can choose the machine model that suits its workload and be deployed in the corresponding resource pool. As a result, JCQ can be co-located with other products and share the pool's utilization headroom instead of having to carry that headroom alone.

3. Cost reduction from architectural improvements

In JCQ 3.0, the compute node Broker is a stateless service, so master-slave switching is lighter and failover completes in seconds. Deployment also takes physical anti-affinity into account, for example by spreading across racks and across availability zones (AZs). This makes it possible to trade off availability against resource cost: high-availability cold standby can use an M:1 ratio instead of a 1:1 ratio, which saves hardware resources.
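The saving from M:1 cold standby is simple arithmetic, sketched below. The function names and the example figures are illustrative assumptions; the article does not state the ratio JCQ actually uses.

```python
import math


def standby_nodes(active: int, m: int) -> int:
    """Cold-standby Brokers needed when one standby covers up to m active Brokers."""
    if active < 0 or m < 1:
        raise ValueError("active must be >= 0 and m must be >= 1")
    return math.ceil(active / m)


def total_nodes(active: int, m: int) -> int:
    """Active Brokers plus their cold standbys."""
    return active + standby_nodes(active, m)
```

For example, 12 active Brokers need 24 nodes at 1:1 (`total_nodes(12, 1)`) but only 15 at 4:1 (`total_nodes(12, 4)`), at the cost of tolerating fewer simultaneous failures per standby.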

4. Solving Raft performance problems

JCQ 1.0 adopted the Raft algorithm from the start of its design to provide high service availability and data consistency. A message log and a Raft log share many characteristics: sequential writes, random reads, and hot data at the tail. Using the Raft log directly as the message log is therefore a natural fit.

As JCQ evolved, we also found performance problems inherent to Raft, such as sequential replication, sequential commit, and some steps that can only run on a single thread. The most direct and effective fix is to increase the number of Raft groups, which increases the number of single-threaded pipelines. Within a certain order of magnitude, concurrency grows linearly with the number of Raft groups. This is the MultiRaft approach, shown in the figure below:

In the figure above, each StoreNode is a separate process that contains four logical RaftGroups (the orange nodes are the RaftGroup Leaders). The RaftGroups run in parallel with one another, so replication and commit can proceed in parallel across groups.

Because NIO is used extensively, these RaftGroups can share communication thread pools, so increasing the number of RaftGroups does not cause a linear increase in thread resources.
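The idea of several ordered groups served by one shared thread pool can be sketched as follows. This is not a Raft implementation: `RaftGroupLog` is a stand-in that only keeps entries in order, and routing a topic to a group by CRC32 hash is an assumed rule, not something the article specifies.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class RaftGroupLog:
    """Stand-in for one Raft group: entries are appended strictly in order."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.entries = []

    def append_batch(self, batch):
        # Inside a single group, replication and commit remain sequential.
        for entry in batch:
            self.entries.append(entry)
        return len(self.entries)


def group_for(topic, group_count):
    """Map a topic to a group with a stable hash (assumed routing rule)."""
    return zlib.crc32(topic.encode("utf-8")) % group_count


def write_all(messages, group_count=4, pool_size=2):
    """Route (topic, payload) pairs to groups, then let groups commit in parallel."""
    groups = [RaftGroupLog(i) for i in range(group_count)]
    batches = [[] for _ in range(group_count)]
    for topic, payload in messages:
        batches[group_for(topic, group_count)].append(payload)
    # One shared pool serves every group, so more groups do not mean more threads.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(lambda pair: pair[0].append_batch(pair[1]), zip(groups, batches)))
    return groups
```

Each group still sees its own messages in their original order, while different groups make progress independently of one another.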

5. Fast fault recovery and lightweight load balancing

In JCQ 3.0, the Broker is a lightweight stateless service. Master-slave switching and fault recovery are lighter than in 2.0, so it can restore service to clients faster.

The Broker also abstracts Producer and Consumer connection requests into PubTask and SubTask, referred to below as Tasks. A Task is a very lightweight concept: it only describes the mapping between a Client and a Broker, and it is scheduled and managed centrally by the metadata manager, Manager. Migrating a Task only requires modifying the Task's content and having the client reconnect to the new Broker.

In general, the Broker's main bottleneck is network bandwidth. Each Broker periodically measures its inbound and outbound network traffic and reports it to the management node, Manager. Manager compares the inbound and outbound traffic against a bandwidth threshold. When the threshold is exceeded, it applies a policy to move the corresponding Tasks to a less loaded Broker and notifies the affected Producers and Consumers. After receiving the notification, the Producers and Consumers fetch the Task routing information again and automatically reconnect to the new Broker to continue producing and consuming.
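A minimal sketch of the threshold-based Task migration described above follows. The article only says "a certain strategy", so the policy used here (move the largest Task first, to the least-loaded Broker, and never push the target over the threshold) is an assumption, as are all class and field names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A PubTask or SubTask: records which client talks to which Broker."""
    task_id: str
    traffic_mbps: int


@dataclass
class Broker:
    name: str
    tasks: list = field(default_factory=list)

    @property
    def load_mbps(self) -> int:
        return sum(t.traffic_mbps for t in self.tasks)


def rebalance(brokers, threshold_mbps):
    """Move Tasks off overloaded Brokers; return (task_id, source, target) moves."""
    moves = []
    for src in brokers:
        # Try the largest Tasks first (assumed policy).
        for task in sorted(src.tasks, key=lambda t: t.traffic_mbps, reverse=True):
            if src.load_mbps <= threshold_mbps:
                break
            others = [b for b in brokers if b is not src]
            dst = min(others, key=lambda b: b.load_mbps, default=None)
            # Skip a move that would overload the target Broker.
            if dst is None or dst.load_mbps + task.traffic_mbps > threshold_mbps:
                continue
            src.tasks.remove(task)
            dst.tasks.append(task)
            moves.append((task.task_id, src.name, dst.name))
    return moves
```

In a real system, each returned move would correspond to updating the Task's routing information and notifying the affected client so that it reconnects to the new Broker.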

6. High fan-out requirements

Consider a scenario in which a large Topic has n consumer groups. Total consumption TPS is n times total production TPS, so adding consumer groups makes total consumption TPS grow linearly. Beyond a certain number of consumer groups, a single Broker is limited by its network card bandwidth and cannot serve such a high fan-out scenario. A single server cannot solve this problem.

In JCQ 3.0, the SubTasks corresponding to these consumer groups can be spread across several Brokers. Each Broker is responsible for part of the SubTasks, prefetches messages from Store, and pushes the data to Consumers. Multiple Brokers thus share the message traffic of all consumer groups and together provide high fan-out capability.
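The fan-out arithmetic and the effect of spreading SubTasks can be sketched as below. Round-robin assignment is an assumed placement rule chosen for simplicity, and the group and Broker names are hypothetical.

```python
def total_consume_tps(produce_tps: int, group_count: int) -> int:
    """Every consumer group reads the full Topic, so consumption scales with n."""
    return produce_tps * group_count


def assign_subtasks(consumer_groups, brokers):
    """Spread one SubTask per consumer group across Brokers (round-robin, assumed)."""
    assignment = {broker: [] for broker in brokers}
    for index, group in enumerate(consumer_groups):
        assignment[brokers[index % len(brokers)]].append(group)
    return assignment


def per_broker_push_tps(assignment, produce_tps: int):
    """Outbound TPS each Broker must push to its share of consumer groups."""
    return {broker: len(groups) * produce_tps for broker, groups in assignment.items()}
```

With a 10,000 TPS Topic and six consumer groups, total consumption is 60,000 TPS; across three Brokers each one pushes 20,000 TPS instead of a single Broker carrying all of it.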

7. Support for multiple storage engines

A characteristic of message middleware is that, in most scenarios, the hot data is at the tail, and replaying messages from several days ago is uncommon. The data therefore divides into hot and cold.

JCQ compute nodes include a storage abstraction layer, Store Bridge, that can connect to different storage engines: a remote Raft cluster, the distributed file system WOS, or S3. It can even periodically offload cold data from expensive local disks to cheaper storage engines.
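One way such an abstraction layer could look is sketched below. The interface, the in-memory engine, and the timestamp-based offload rule are all assumptions for illustration; the article does not describe Store Bridge's real API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageEngine(ABC):
    """Hypothetical interface that each storage backend would implement."""

    @abstractmethod
    def write(self, offset: int, payload: bytes) -> None: ...

    @abstractmethod
    def read(self, offset: int) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, offset: int) -> None: ...


class InMemoryEngine(StorageEngine):
    """Toy backend standing in for a Raft cluster, a file system, or object storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}

    def write(self, offset: int, payload: bytes) -> None:
        self.data[offset] = payload

    def read(self, offset: int) -> Optional[bytes]:
        return self.data.get(offset)

    def delete(self, offset: int) -> None:
        self.data.pop(offset, None)


class StoreBridge:
    """Routes reads and writes to a hot engine and offloads old data to a cold one."""

    def __init__(self, hot: StorageEngine, cold: StorageEngine) -> None:
        self.hot = hot
        self.cold = cold
        self._timestamps = {}  # offset -> write time, for entries still in hot storage

    def append(self, offset: int, payload: bytes, timestamp: int) -> None:
        self.hot.write(offset, payload)
        self._timestamps[offset] = timestamp

    def read(self, offset: int) -> Optional[bytes]:
        payload = self.hot.read(offset)
        return payload if payload is not None else self.cold.read(offset)

    def offload_older_than(self, cutoff: int) -> int:
        """Move entries written before cutoff to cold storage; return how many moved."""
        old = [o for o, ts in self._timestamps.items() if ts < cutoff]
        for offset in old:
            self.cold.write(offset, self.hot.read(offset))
            self.hot.delete(offset)
            del self._timestamps[offset]
        return len(old)
```

Callers read through `StoreBridge.read` and do not need to know whether a message currently lives on the hot or the cold engine.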

8. Side effects

Compared with JCQ 2.0, communication between compute nodes and storage nodes has changed from in-process interface calls to RPC calls, which adds some latency. In testing, most of the added latency was about 1 ms, and in most scenarios giving up about 1 ms of latency does not have a significant impact on the business.

II. Future Outlook for JCQ

Going forward, JCQ will focus on multi-protocol compatibility, automatic on-demand scaling, and cloud-native evolution:

1. Multi-protocol compatibility

JCQ's protocol is currently proprietary, which is a significant obstacle when helping users migrate. The plan is to extract a JCQ Kernel and expose different protocol access layers on top of it, making it easier for users to move to JCQ from other MQs.

2. Automatic scale-out and scale-in

JCQ is a shared message middleware, but it lacks the automatic scaling expected of a serverless service. Before every major promotion or important event, such as 618, 11.11, or the trade in services fair, business teams find it hard to estimate their peak volume, and an underestimate can cause problems such as Topic rate limiting. If Topics could scale elastically and automatically while JCQ's own service capacity is guaranteed, it would help users greatly and let JCQ truly smooth peaks and fill valleys.

3. Cloud native

In the future, JCQ will support deployment and delivery in Kubernetes environments and will provide a native Operator, so it can be deployed quickly in K8s environments and better serve private cloud and hybrid cloud projects.


Copyright notice
This article was written by [Jingdong Zhilian cloud developer]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201224214729005r.html
