编程知识 cdmana.com

Big data OLAP System (1) -- Concept

1.1 What is? OLAP?

OLAP(OnLine Analytical Processing), Online analytical processing .OLAP Perform multidimensional analysis of business data , And provide complex computing , The ability to analyze trends and model complex data . It is mainly used to support enterprise decision management analysis , It's a lot of business intelligence (BI) The technology behind the application .OLAP Enables end users to perform ad hoc analysis of data from multiple dimensions , In order to acquire the knowledge they need , In order to make better decisions .OLAP Technology has been defined as implementing “ Fast access to shared multidimensional information ” The ability of .

1.2 Why multidimensional analysis ?

Business is actually a multidimensional activity . Enterprises track their business activities by considering many variables , When tracking these variables on a spreadsheet , Set them on the axis (x and y) On . for example , You can track sales monthly over a year , One of them can be in y The sales indicators show , And in the x The month can be displayed on the axis . Instead, analyze the health of the business and plan future activities , Many groups of variables or parameters must be tracked continuously . for example , A business should consider at least the following aspects : Customer , place , period , Sales people and products . These dimensions make up the corporate plan , The basis for analysis and reporting activities . Together they represent “ Whole ” Business status , Plan for all businesses 、 Lay the foundation for analysis and activity .

1.3 OLAP The origin of

OLAP This term was first used in 1993 year , By is called “ Father of relational database ” Of Edgar F. Codd In his white paper 《Providing OLAP to User-Analysts: An IT Mandate》 For the first time . In this white paper , He is OLAP The product builds 12 There are two evaluation rules :

  1. Multidimensional Conceptual View( Multidimensional concept view ): In the eyes of user analysts , Enterprises are naturally multidimensional . for example , By Region , product , Time period or program ( For example, actually , Budget or forecast ) Look at the profits . Multidimensional data model enables users to be more direct , More intuitive processing of data , Include “ Slicing and chunking ”.

  1. Transparency( Transparency guidelines ):OLAP It should be part of the open systems architecture , The architecture can be embedded anywhere users expect , Without affecting the functionality of the host tool . Should not OLAP The data source of the tool is exposed to the user , Data sources can be homogeneous or heterogeneous .

  1. Accessibility( Access capability conjectures ):OLAP Tools should be able to apply their own logical structure to access heterogeneous data sources , And perform any transformations needed to present a coherent view to the user . Tools ( Not users ) Attention should be paid to the sources of physical data .

  1. Consistent Reporting Performance( Stable reporting performance ): As the number of dimensions increases ,OLAP The performance of the tool will not be significantly affected .

  1. Client-Server Architecture( Customer / Server architecture ):OLAP The server component of the tool should be smart enough , Various clients can easily connect to it . The server should be able to map and merge data between different databases .

  1. Generic Dimensionalityc( The equivalence criterion of dimension ): Each data dimension should have the same structure and operational capability .

  1. Dynamic Sparse Matrix Handling( Dynamic sparse matrix processing criteria ):OLAP The physical structure of the server should have the best sparse matrix processing .

  1. Multi-User Support( Multi user support capability criteria ):OLAP Tools must provide concurrent retrieval and update access , Integrity and security .

  1. Unrestricted Cross-dimensional Operations( Unrestricted cross dimensional operations ): Computing facilities must allow computation and data processing across any number of data dimensions , And do not restrict any relationship between data units .

  1. Intuitive Data Manipulation( Intuitive data manipulation ): Data operations inherent in the merge path , For example, drilling down or shrinking , This should be done through direct manipulation of the analytical model unit , Instead of using menus or multiple trips across the user interface .

  1. Flexible Reporting( Flexible report generation ): Any information that users want to see in the tool should be viewed .

  1. Unlimited Dimensions and Aggregation Levels( Unlimited dimensions and aggregation levels ).

1.4 OLAP The history of development

although OLAP The concept of is in 1993 I didn't bring it up until I was young , But support OLAP The history of related products , The earliest can be traced back to 1975 year :

  1. The first paragraph OLAP product Express On 1975 It came out in , With Oracle After the acquisition, it prospered 30 years , Finally, the successor Oracle 9i replace .

  1. 1979 year , The first spreadsheet application VisiCalc Put on the market .VisiCalc Has the standard basic row and column structure in most spreadsheet applications today .

  1. 1982 year ,Comshare A new decision support system software is developed (System W), This is the first financial sector OLAP Tools , It is also the first application in its multidimensional modeling hypercube Tools for methods .

  1. 1983 year ,IBM Launched Lotus 1-2-3. Its structure is similar to Visicalc, And quickly replaced Visicalc.Lotus 1-2-3 Become Windows Previous mainstream spreadsheet applications .

  1. 1984 year , The first paragraph ROLAP product Metaphor Release . This multidimensional product creates a new concept , For example, customers / Server computing , Multidimensional data processing , The working group deals with , Object oriented development, etc .

  1. 1985 year ,Excel 1.0 Be born . Microsoft is in Excel The PivotTable function is integrated into the Excel One of the most important enhancements to the product , Because PivotTable has become the most popular and widely used tool in multidimensional analysis .

1989 year ,SQL Language standards were born , It can extract and process business data from relational databases . It could be a turning point . stay 1980‘s years , The spreadsheet is in OLAP Absolutely dominant in application ; and 1990’s After the age , More and more database based OLAP Applications are starting to appear :

  1. 1992 year :Hyperion Solution Release Essbase( Expand the spreadsheet database ), stay 1997 It became the main OLAP Server products .

  1. 1997 year :PARIS Technologies Introduction PowerOLAP: Integrating spreadsheets and transaction databases , So that in the spreadsheet application ( for example Excel) Update data in real time .

  1. 1999 year :Microsoft OLAP Service release , And in 2000 Become a year Microsoft Analysis Services

  1. 2012 year :PARIS Technologies Released OLATION, It will be relational and multidimensional database technology ( stay SQL Server,SAP HANA,Oracle Wait ) Merge together , Make sure that the actual data and the planned data are “ Real online ” Data update .

1.5 OLAP The core concepts and basic operations of

1.5.1 The core concept

dimension (Dimension): A dimension is a set of attributes that describe a business topic , A single attribute or set of attributes can form a dimension . Such as time 、 Location 、 Age and gender are dimensions .

The level of dimension (Level of Dimension): A dimension can often have multiple levels , For example, time is divided into two dimensions 、 quarter 、 Month and day, etc , A regional dimension can be a country 、 region 、 province 、 City level . The hierarchy here represents the degree of data refinement , Corresponding to the concept hierarchy . The roll up operation described later is to map the low-level concept to the high-level concept . In addition to the concept hierarchy can be determined according to the total order and partial order relationship of concepts , It can also be achieved by discretizing and grouping data .

Members of dimension (Member of Dimension): Jovi is multi-level , Then the values of different levels constitute a dimension member . Part of the dimension hierarchy can also form dimension members , for example “ A year, a quarter ”、“ A season, a month ” Etc. can be members of the time dimension .

Measure (Measure): Represents the value of a fact on a dimension member . For example, Han men in the development sector have 39 people , It means that in the Department 、 nation 、 On the three dimensions of gender , A factual measure of the number of people in a business .

1.5.2 Basic operation

OLAP The operation is to query —— That's the database SELECT Operation Oriented , But queries can be complex , For example, queries based on relational databases can be associated with multiple tables , have access to COUNT、SUM、AVG Wait for the aggregate function .OLAP It is based on the multidimensional model to define some common analysis oriented operation types, which makes these operations more intuitive .

OLAP The multidimensional analysis operations include : Drilling (Drill-down Scroll up (Roll-up section (Slice cutting (Dice as well as rotate (Pivot)**, Let's take the data cube as an example to explain one by one :

Drilling (Drill-down): Changes between different levels of dimension , From the top down to the next , Or split the summary data into more detailed data , For example, by dealing with 2010 Drill down the total sales data for the second quarter of 2010 Second quarter 4、5、6 Monthly consumption data , Pictured above ; Of course, you can also drill into Zhejiang Province to see Hangzhou city 、 Ningbo City 、 Wenzhou City …… Sales figures for these cities .

Scroll up (Roll-up): Reverse operation of drilling , From fine-grained data to high-level aggregation , For example, Jiangsu Province 、 The sales data of Shanghai and Zhejiang Province are summarized to view the sales data of Jiangsu, Zhejiang and Shanghai , Pictured above .

section (Slice): Select a specific value in the dimension to analyze , For example, select only the sales data of electronic products , perhaps 2010 Data for the second quarter of 2007 .

cutting (Dice): Select the data of a specific interval in the dimension or a batch of specific values for analysis , Such as choice 2010 First quarter to 2010 Sales figures for the second quarter of 2007 , Or the sales data of electronic products and daily necessities .

rotate (Pivot): That is, the interchange of the positions of dimensions , It's like a row column conversion in a two-dimensional table , As shown in the figure, the product dimension and regional dimension can be exchanged through rotation .

1.6 OLAP The classification of

Classified by the way data is stored , Can be divided into MOLAP、ROLAP、HOLAP.

1.6.1 Multidimensional OLAP (MOLAP)

MOLAP yes OLAP The classic form of .MOLAP Store data in optimized multidimensional arrays , Not in a relational database . Attribute values of dimensions are mapped to subscript values or ranges of subscripts of multidimensional arrays , The measurement data is stored in the cell of the array as the value of multidimensional array . because MOLAP A new storage structure is adopted , From the physical layer , So it's also called physics OLAP(PhysicalOLAP); and ROLAP It is mainly realized by some software tools or intermediate software , The physical layer still adopts the storage structure of relational database , So it's called virtual OLAP(VirtualOLAP).

some MOLAP The tool requires pre calculation and storage of data , In this way MOLAP Tools are often used called “ data cube ” A precomputed dataset of . The data cube contains all possible answers to a given range of questions . therefore , They respond very quickly to queries . On the other hand , According to the degree of precomputation , Updating can take a long time . Precomputing can also lead to so-called data explosion .

1.6.2 Relational OLAP(ROLAP)

ROLAP Store multidimensional data for analysis in relational database . This way depends on SQL Language realizes tradition OLAP The slicing and slicing function of , Essentially , Slicing and slicing are the same as in SQL Add... To the statement “ WHERE” Clause .ROLAP The tool does not use precomputed cubes , Instead, it queries the standard relational database and its tables , To get the data needed to answer the question .ROLAP The tool has the ability to ask any questions , Because the method (SQL) Not just the content of the cube .

Even though ROLAP The underlying relational database uses , But these databases are generally targeted at ROLAP Optimize accordingly , Like parallel storage 、 Parallel queries 、 Parallel data management 、 Cost based query optimization 、 Bitmap index 、SQL Of OLAP Expand (cube,rollup) wait . Specially designed for OLTP The designed database cannot be ROLAP The database works as well .

1.6.3 Hybrid OLAP(HOLAP)

because MOLAP and ROLAP It has its own advantages and disadvantages , And their structures are very different , It's for analysts to design OLAP Structure poses a problem . For this, a new OLAP structure —— mixed type OLAP(HOLAP) Proposed , This tool allows simultaneous use of multidimensional databases (MDDB) And relational databases (RDBMS) As data storage to bridge the technological gap between the two products . It allows model designers to decide which data to store in MDDB in , Which are stored in RDBMS in , for example , Store a lot of detail data in relational tables , The pre calculated aggregate data is stored in the cube . At present, the whole industry is interested in “ blend OLAP” We haven't reached a clear consensus yet .

1.6.4 MOLAP And ROLAP comparative analysis

1.7 OLAP The relationship with other concepts

1.7.1 OLAP vs OLTP

The goals of the two designs are totally different :

  1. OLTP(On-Line Transaction Processing), On line transaction processing , Generally used in business systems .OLTP The requirements for transactional processing are very high , It's generally a highly available online system , It is mainly based on the traditional relational database . The application on it , Generally small transactions and small queries are given priority to . When evaluating its system , It's generally seen that it's executed every second Transaction as well as SQL The number of . In such a system , A single database processes Transaction( increase 、 Delete 、 Change ) It's often hundreds or thousands of ,Select The execution amount of query statements is thousands or even tens of thousands per second . Typical OLTP The system has an e-commerce system 、 Bank trading system 、 Securities trading system, etc .

  2. OLAP, It is generally used to analyze systems . The application on it , Generally, it is mainly based on the query of large amount of data , There are few operations to modify and delete . In such a system ,SQL Statement execution is not an indicator , Because a statement can take a long time to execute , There's also a lot of data to read . therefore , When evaluating its system , It often depends on the throughput of the system 、 Complex query response time 、 Data loading performance, etc .

The detailed comparison between them is as follows :

1.7.2 OLAP vs Data warehouse / The data mart

There are many ways to model data warehouse :

  1. ER Model ( Entity - relational model )

  2. Data Vault Model

  3. Anchor Model

  4. Dimension model

The first three models focus on integrating the data from various business systems into a unified data warehouse , And consistency processing , Provide data models and atomic data that meet the third or higher paradigm . This kind of data warehouse is called CIF(Corporate Information Factory, Enterprise information factory ) Enterprise data warehouse under Architecture . This data warehouse architecture is the father of data warehouse Inmon Respected . But because of the use of the canonical model , This makes it difficult to query these atomic data , This kind of architecture can't be directly used to support analysis and decision . In order to better support the analysis , In this framework , It usually needs to be based on data warehouse , Create subsets of data by topic , Data mart . These data marts usually use a dimensional model ,OLAP Tools can work based on data marts . Data marts are usually based on OLAP System and build .

The fourth model ( Dimension model ) Another master of data warehousing Kimball Proposed , It is the most popular modeling method in the field of data warehouse . The dimension model can support the analysis of decision-making needs , At the same time, it has better response performance for large-scale complex queries . Dimension models can be used directly OLAP Tools connect with it .Kimball The data warehouse architecture is as follows , A data warehouse based on this architecture , Can provide directly OLAP Ability . In this way, the data warehouse itself becomes a OLAP System .

1.7.3 OLAP vs BI Tools

BI yes Business Intelligence English abbreviations , Business intelligence , It's a collection of technologies that use data to improve decision quality , It's the process of drilling information and knowledge from a large amount of data .OLAP and BI Often together ,OLAP yes BI One of the underlying technologies of tools .BI Tools usually dock with OLAP System , But not limited to , You can also work directly with other databases 、 Storage system docking .

1.7.4 OLAP vs Ad hoc inquiry

Ad hoc It's a common Latin phrase , intend “ Ad hoc 、 For a specific purpose ( The earth )、 temporary 、 Of the project ”. Ad hoc inquiry (Ad Hoc Queries) It refers to the query created dynamically by users according to their own needs , Contrary to predefined queries .

Ad hoc queries have no requirements on the data model , As long as it can provide dynamic query capability ; and OLAP System , Generally, data model should be multidimensional data model . about ROLAP System , It usually provides ad hoc query capabilities , There is little difference between the two , So we often mix it up .

author : GEWAN Xiyun
link :https://www.jianshu.com/p/a078fed07acc

版权声明
本文为[osc_ 97kpb2b5]所创,转载请带上原文链接,感谢
https://cdmana.com/2020/12/20201224152559338d.html

Scroll to Top