I. Introduction
What is Phoenix?
Phoenix can be understood as a SQL engine layered on top of HBase that lets you access HBase with SQL.
It supports low-latency (milliseconds to seconds) OLTP and operational analytics queries.
What can Phoenix do?
1. It translates standard SQL syntax into HBase API calls.
2. It pushes operators and filter conditions down to the server side, where they execute in parallel.
3. It provides secondary indexes, paged queries, joins, lightweight transactions, and other capabilities.
What does Phoenix's architecture look like?
- Client: a JDBC program or the command line. It parses and optimizes the SQL to generate a QueryPlan, converts the plan into HBase scans, calls the HBase API to issue the query requests, and receives the returned results.
- Server side:
  - The Phoenix metadata table (SYSTEM.CATALOG, as shown in the figure) stores the mapping between Phoenix tables and their metadata, the information supplied when each table was created, and the table status.
  - Phoenix coprocessors (Endpoint type) handle secondary indexes, aggregation, JOIN computation, and so on.
What are Phoenix's advantages?
1. SQL has a low barrier to entry: it is easy to use, and the cost of onboarding users is low.
2. Users don't need to worry about rowkey design, hotspots, pagination, simple joins, and so on.
3. In some scenarios, read performance is better than reading HBase directly.
Why is Phoenix faster than reading HBase directly in some scenarios?
1. SQL is compiled into native HBase scans that execute in parallel.
2. Computation is pushed down: server-side coprocessors perform aggregation.
3. WHERE conditions are pushed down into server-side scan filters.
4. Statistics are used to optimize and select the query plan.
5. Secondary indexes improve the performance of queries on non-row-key columns.
6. Skip scan optimizes IN, LIKE, and OR queries.
7. Row keys can optionally be salted to balance load and avoid hotspots.
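Item 7 can be illustrated with a small sketch. Phoenix salting prefixes each row key with one byte derived from a hash of the key modulo the number of salt buckets (SALT_BUCKETS), so keys that would otherwise be adjacent, such as timestamps, spread across regions. The sketch assumes zlib.crc32 as a stand-in hash; Phoenix uses its own internal hash, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix row_key with a one-byte salt in [0, buckets).

    crc32 is only a stand-in; Phoenix computes the salt with its own hash.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("the salt must fit in a single byte")
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


def bucket_histogram(keys, buckets):
    """Count how many keys land in each salt bucket (the leading byte)."""
    return Counter(salted_key(k, buckets)[0] for k in keys)
```

Feeding 60 sequential timestamp keys through bucket_histogram with 4 buckets spreads them over several leading bytes, whereas unsalted they would all sort into the same region. The trade-off is that a range scan over the original key order now has to query every bucket.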
What are Phoenix's disadvantages?
1. Write performance is lower than raw HBase: writes must process the fields and update the status table, so performance drops somewhat, by up to 20%.
2. Complex joins and GROUP BY perform poorly, because they scan large amounts of data.
3. Consistency between the main table and its secondary indexes is imperfect; only three consistency strategies are available.
4. Although read performance improves in some scenarios, a requirement implemented directly with the native HBase API will outperform Phoenix, because Phoenix's internal parsing, execution-plan optimization, and similar steps take time.
How do you use Phoenix?
1. Through JDBC programs.
2. Through the command-line tools written in Python (sqlline, sqlline-thin, psql, etc.).
II. Read/Write Path
Read path
Client side:
1. Receive the SQL.
2. The parser parses the SQL and generates an execution plan (QueryPlan).
3. The optimizer optimizes the query plan (it checks whether an index table can be used; if so, it replaces the plan with one that queries the index table).
4. The plan is converted into HBase scans, which are sent as RPC requests to the RegionServers.
5. If the query involves ORDER BY, COUNT, or other aggregate computation, the requests go to the corresponding coprocessor methods.
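Step 3 can be sketched with a deliberately simplified rule: an index is considered usable if its leading indexed column appears in the WHERE clause and it covers every column the query touches, counting its indexed columns, its INCLUDE columns, and the main table's primary key (which every index row carries). This is an assumption for illustration only; the real Phoenix optimizer also weighs statistics, hints, and index types. IndexDef and choose_plan are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class IndexDef:
    name: str
    indexed: list                                 # indexed columns, in index key order
    include: list = field(default_factory=list)  # covered (INCLUDE) columns


def choose_plan(table, pk_cols, where_cols, select_cols, indexes):
    """Return the name of the table or index to scan (simplified rule)."""
    where_cols, select_cols = set(where_cols), set(select_cols)
    # A filter on the leading primary-key column is already served by the main table.
    if pk_cols and pk_cols[0] in where_cols:
        return table
    for idx in indexes:
        covered = set(idx.indexed) | set(idx.include) | set(pk_cols)
        if idx.indexed[0] in where_cols and (select_cols | where_cols) <= covered:
            return idx.name
    # No usable index: the query falls back to scanning the main table.
    return table
```

For example, with an index IDX_CITY on CITY that includes NAME, a query filtering on CITY and selecting NAME picks IDX_CITY, while the same filter selecting an uncovered column such as AGE stays on the main table under this rule.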
Server side:
1. Accept the RPC request and process the query.
2. For aggregate computation, each of the RegionServers involved first computes a partial result, these partial results are returned to the client, and the client combines them into the final result.
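Step 2 can be sketched as follows: each RegionServer aggregates only its own rows, and the client merges the partial results. The sketch computes COUNT, SUM, and AVG, and shows why AVG must be derived from the merged SUM and COUNT rather than by averaging per-region averages. The region data and function names are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def region_partial(values: Iterable[float]) -> Tuple[int, float]:
    """Server side: aggregate only the rows stored in one region."""
    count, total = 0, 0.0
    for v in values:
        count += 1
        total += v
    return count, total


def client_merge(partials: List[Tuple[int, float]]) -> dict:
    """Client side: combine the per-region partial results."""
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    avg = total / count if count else None  # AVG comes from merged SUM / COUNT
    return {"count": count, "sum": total, "avg": avg}
```

With three regions holding [1, 2, 3], [10], and no rows, the merged result is count 4, sum 16.0, avg 4.0; averaging the per-region averages (2 and 10) would wrongly give 6.0.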
Write path
Client side:
1. Receive the SQL.
2. Parse it into HBase Puts and send them as RPC requests to the RegionServers (main table).
Server side:
1. Accept the RPC request and process the Put on the main table.
2. A background thread of the coprocessor periodically reads the data newly written to the main table and Puts it into the HBase table backing the corresponding index table.
III. Secondary Indexes
Scenarios suited to index tables:
- Queries that the main table alone cannot serve
- Queries whose filter conditions are nearly fixed and are not on the main table's primary key columns
A Phoenix table's primary key consists of one or more columns, which suits queries that filter by the rowkey or by a rowkey prefix. In other query scenarios, however, the rowkey prefix cannot be specified and the query conditions are on other columns, so the query has to scan all the data in the table. HBase is not suited to full table scans; once the data volume grows even moderately, a full table scan becomes unusable.
With an index table, a query can quickly scan out the required data by the index table's primary key.
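To see why a rowkey-prefix query is cheap while a filter on another column is not, the sketch below models a table as rows sorted by rowkey, as HBase stores them. A prefix query uses binary search to jump straight to the matching key range, whereas a filter on a non-key column must examine every row. The table layout and helper names are hypothetical.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Rowkey-prefix query: return (matching keys, rows read)."""
    i = start = bisect.bisect_left(sorted_keys, prefix)  # seek, no scan
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    # Rows read: the matches plus the first non-matching row that ended the range.
    rows_read = (i - start) + (1 if i < len(sorted_keys) else 0)
    return matches, rows_read


def full_scan(rows, column, value):
    """Filter on a non-key column: every row has to be read."""
    matches = [key for key, cols in rows if cols.get(column) == value]
    return matches, len(rows)
```

On a 100-row table, a prefix query for one user reads 2 rows, while filtering on a city column reads all 100. An index table turns the second kind of query into the first by making the filtered column the leading part of its rowkey.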
How does a secondary index work?
1. When creating the index table, you specify the WHERE columns and SELECT columns of the query scenario. The index table's primary key is composed of the WHERE condition columns plus the main table's primary key columns, and underneath, the secondary index table is also an HBase table.
2. At query time, the client parses the SQL and checks whether a suitable secondary index exists; if so, the optimized query plan scans the HBase table backing the index table.
3. In that HBase table, the index table's primary key serves as the rowkey, so the query is simply a scan by rowkey.
4. A background thread of the server-side coprocessor periodically synchronizes data newly written to the main table into the secondary index table. (Historical data can be backfilled by running the asynchronous secondary index build.)
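The key layout in step 1 and the scan in step 3 can be sketched as follows: the index row key is the indexed (WHERE) column values followed by the main table's primary key, so the index is sorted by the filter column and each index row still identifies its main-table row. The zero-byte separator and the helper names are assumptions for illustration; Phoenix's real encoding is type-aware and differs in detail.

```python
import bisect

SEP = "\x00"  # assumed separator between key columns (illustrative only)


def index_row_key(row, indexed_cols, pk_cols):
    """Index key = indexed column values + main-table primary key values."""
    parts = [str(row[c]) for c in indexed_cols] + [str(row[c]) for c in pk_cols]
    return SEP.join(parts)


def build_index(rows, indexed_cols, pk_cols):
    """Return index keys sorted the way HBase would store them."""
    return sorted(index_row_key(r, indexed_cols, pk_cols) for r in rows)


def lookup(index_keys, value):
    """Range-scan the index for one value; return the rest of each key (main-table PK)."""
    prefix = str(value) + SEP
    start = bisect.bisect_left(index_keys, prefix)
    pks = []
    for key in index_keys[start:]:
        if not key.startswith(prefix):
            break  # left the key range for this value
        pks.append(key[len(prefix):])
    return pks
```

For rows u1 (BJ), u2 (SZ), and u3 (BJ) indexed on a hypothetical CITY column, lookup for "BJ" returns ["u1", "u3"] after touching only the BJ key range.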
Synchronous vs. asynchronous:
1. An asynchronous index table differs from a synchronous one only by the extra ASYNC keyword.
2. An asynchronous index table stays in the BUILDING state (it is not readable).
3. Relying on Phoenix's internal data-population thread is very slow and increases cluster load, so the asynchronous secondary index build is usually run directly; once it finishes, the index table is in the ACTIVE state.
4. A synchronous index table is usually created for a new main table; if the main table already has data, an asynchronous index table is built.
(If you build a synchronous index table on a main table that already has data, the build backfills the existing data and succeeds only once the backfill completes. This takes a long time, and if new writes arrive at the same time, the backfill may not catch up.)
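The rule of thumb in points 1 and 4 can be expressed as a tiny helper that emits a Phoenix CREATE INDEX statement and appends ASYNC only when the main table already holds data. The helper name, its parameters, and the CITY/NAME columns are hypothetical; CREATE INDEX ... INCLUDE (...) ASYNC itself is standard Phoenix DDL.

```python
def create_index_ddl(index, table, cols, include=(), table_has_data=False):
    """Build a Phoenix CREATE INDEX statement.

    Empty main table  -> synchronous index.
    Table with data   -> ASYNC index, to be populated afterwards by IndexTool.
    """
    ddl = f"CREATE INDEX {index} ON {table} ({', '.join(cols)})"
    if include:
        ddl += f" INCLUDE ({', '.join(include)})"
    if table_has_data:
        ddl += " ASYNC"
    return ddl
```

The generated statement is then executed through JDBC or sqlline as usual; for the ASYNC case, the index remains in the BUILDING state until the build job finishes.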
IV. Common Operations Practices and Common Problems
1. Backfilling a secondary index asynchronously
When should the asynchronous secondary index build be run?
After creating an asynchronous index table, run it if the historical data in the main table needs to be populated into the index.
How do you run it?
Use Phoenix's built-in tool, IndexTool.
If you need this, contact the @Data Engine service account v2.
IndexTool is a built-in Phoenix tool used to build the full index data for a main table:
It works from a snapshot and then runs an IndexTool MapReduce job whose output is bulk-loaded into the index table on the corresponding HBase cluster, so it puts no read/write pressure on the RegionServers.
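For reference, the Apache Phoenix documentation launches the asynchronous index build through the hbase launcher with the org.apache.phoenix.mapreduce.index.IndexTool class and the --schema, --data-table, --index-table, and --output-path options (the last is an HDFS directory for the generated HFiles). The sketch below only assembles that argument list and launches it when explicitly asked; the schema, table, and path values are placeholders, and available options (including the snapshot mode described above) vary by Phoenix version, so check IndexTool's help output on your cluster.

```python
import shlex
import subprocess


def indextool_argv(schema, data_table, index_table, output_path, hbase="hbase"):
    """Assemble the IndexTool command line for an ASYNC index build."""
    return [
        hbase, "org.apache.phoenix.mapreduce.index.IndexTool",
        "--schema", schema,
        "--data-table", data_table,
        "--index-table", index_table,
        "--output-path", output_path,  # HDFS directory for the generated HFiles
    ]


def run_indextool(argv, dry_run=True):
    """Print the command; launch the MapReduce job only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
```

Keeping dry_run as the default makes it easy to review the exact command before submitting a job against a production cluster.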
2. Main table writes blocked, or secondary index table disabled
Why does this happen?
It is governed by the consistency policy between the main table and its index tables. The default policy is: when a write to the index table fails, disable the index table and do not block writes to the main table.
The risk: once a failed write disables the index table, all queries fall on the main table and scan the whole table, causing slow queries, timeouts, and even RegionServer crashes, which can also affect other businesses.
The policy we have recently been recommending: when a write to the index table fails, do not disable the index table; block writes to the main table instead. This does not affect queries; only new writes are affected.
P.S. The third policy disables the index table without blocking main table writes, so the data becomes inconsistent; it is not recommended.
Impact: queries fall back to full-table scans of the main table.
What to do:
1. Determine why writes to the index table failed and rule out machine problems, then update the index table status to ACTIVE.
2. If writes still fail, a RegionServer restart may be required. (Where possible, place the main table and its index tables in different RegionServer groups, to avoid the chain of problems caused by index write failures during rolling upgrades.)
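The trade-off between the default policy and the recommended one described above can be modeled in a few lines. The sketch captures only the two behaviors the text names (whether a failed index write disables the index, and whether it blocks the main-table write); the class and field names are hypothetical and do not correspond to Phoenix configuration property names.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    disable_index_on_failure: bool
    block_main_write_on_failure: bool


DEFAULT = Policy(disable_index_on_failure=True, block_main_write_on_failure=False)
RECOMMENDED = Policy(disable_index_on_failure=False, block_main_write_on_failure=True)


@dataclass
class State:
    index_active: bool = True
    main_rows: int = 0
    index_rows: int = 0


def write(state, policy, index_write_ok):
    """Apply one row write; return True if the client's write succeeded."""
    if index_write_ok:
        state.main_rows += 1
        state.index_rows += 1
        return True
    if policy.block_main_write_on_failure:
        return False  # write rejected; main table and index stay consistent
    state.main_rows += 1  # main table accepts the row, the index misses it
    if policy.disable_index_on_failure:
        state.index_active = False  # queries now fall back to full scans of the main table
    return True
```

Under DEFAULT, one failed index write leaves the index disabled and one row behind the main table; under RECOMMENDED, the write is rejected but the index stays active and consistent, which matches the trade-off described above.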
3. Secondary index table unavailable, and recovery
Symptom: the index table's disable timestamp (INDEX_DISABLE_TIMESTAMP) is > 0.
Impact: SQL queries fall on the main table instead of the index table. Because the two have different primary keys, a query on the main table scans the whole table, which increases RegionServer load and risks bringing RegionServers down.
What to do:
1. Determine the index table status (check the SYSTEM.CATALOG table).
2. Restore all index tables under the main table to ACTIVE with ALTER INDEX, for example:
alter index IDX_P002 ON TEST_NS.P003 ACTIVE;
3. If the main table and its index tables may be inconsistent, backfill the index data.
While an index table is disabled, any data that continues to be written makes the index inconsistent with the main table.
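Steps 1 and 2 can be scripted. The sketch assumes the index rows have already been fetched from SYSTEM.CATALOG (for example the schema, index name, data table name, and INDEX_DISABLE_TIMESTAMP of rows describing indexes), flags every index whose INDEX_DISABLE_TIMESTAMP is > 0, and generates ALTER INDEX ... ACTIVE statements in the same form as the example above. The dictionary keys and the function name are hypothetical, and any index that was disabled while writes continued still needs its data backfilled.

```python
def recovery_statements(index_rows):
    """Return ALTER INDEX ... ACTIVE statements for disabled indexes.

    index_rows: iterable of dicts with keys SCHEMA, INDEX_NAME, DATA_TABLE,
    DISABLE_TS (hypothetical names mirroring SYSTEM.CATALOG columns).
    """
    stmts = []
    for row in index_rows:
        ts = row.get("DISABLE_TS") or 0  # treat NULL like 0: not disabled
        if ts > 0:
            schema = row.get("SCHEMA")
            table = f"{schema}.{row['DATA_TABLE']}" if schema else row["DATA_TABLE"]
            stmts.append(f"ALTER INDEX {row['INDEX_NAME']} ON {table} ACTIVE;")
    return stmts
```

Generating the statements rather than executing them directly leaves room to confirm the cause of the failure first, as step 1 of the previous section recommends.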