BI has been in China for some years now. IT giants at home and abroad are rushing into the field, and some small and medium-sized software companies are getting involved too. Retail, manufacturing, fast-moving consumer goods, aviation, finance, telecommunications and other industries have all become important territory for BI implementations.
However, to put it bluntly, most BI projects are failures, or at least riddled with problems, and fall far short of what the customer asked for. Data quality and system performance are the two biggest problems.
Among practitioners, more than 50% are seriously unqualified, so you can imagine the quality of what gets delivered.
1、 Let's start with architecture and design. I have seen plenty of architecture designs by other architects, and most of them amount to a single architecture diagram. It is nothing more than 4 layers: Operational Source Systems; Data Staging Area; Data Presentation Area; Data Access Tools, or some variant of these under slightly different names. The essence is almost always the same.
Then come the ETL architecture design and the report-layer architecture design.
It looks simple, doesn't it? But how many project teams have really done the detailed work? Data structures, primary and foreign key checks, referential integrity, value ranges, column length limits, null checks, lists of legal and illegal values, implied business rules, and so on.
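As a rough illustration of what this detailed work might look like in code, here is a minimal profiling sketch. The `ColumnRule` class, the `profile` and `orphan_keys` helpers, and the sample `orders` and `customers` rows are all hypothetical names invented for this example; a real project would run equivalent checks against the actual source tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnRule:
    """Hypothetical rule set for one source column."""
    nullable: bool = True
    max_length: Optional[int] = None
    value_range: Optional[tuple] = None      # (low, high), inclusive
    legal_values: Optional[frozenset] = None


def profile(rows, rules):
    """Return (row_index, column, problem) for every rule violation."""
    issues = []
    for i, row in enumerate(rows):
        for col, rule in rules.items():
            value = row.get(col)
            if value is None:
                if not rule.nullable:
                    issues.append((i, col, "null"))
                continue
            if rule.max_length is not None and len(str(value)) > rule.max_length:
                issues.append((i, col, "too long"))
            if rule.value_range is not None:
                low, high = rule.value_range
                if not low <= value <= high:
                    issues.append((i, col, "out of range"))
            if rule.legal_values is not None and value not in rule.legal_values:
                issues.append((i, col, "illegal value"))
    return issues


def orphan_keys(child_rows, fk, parent_rows, pk):
    """Referential integrity: foreign key values with no matching parent row."""
    parents = {r[pk] for r in parent_rows}
    return sorted({r[fk] for r in child_rows if r[fk] is not None} - parents)


# Made-up sample data standing in for two source tables.
customers = [{"cust_id": 1}, {"cust_id": 2}]
orders = [
    {"order_id": "A1", "cust_id": 1, "amount": 50.0, "status": "OPEN"},
    {"order_id": "A2", "cust_id": 9, "amount": -5.0, "status": "??"},
    {"order_id": None, "cust_id": 2, "amount": 10.0, "status": "CLOSED"},
]
rules = {
    "order_id": ColumnRule(nullable=False, max_length=8),
    "amount": ColumnRule(value_range=(0.0, 1_000_000.0)),
    "status": ColumnRule(legal_values=frozenset({"OPEN", "CLOSED"})),
}

issues = profile(orders, rules)
orphans = orphan_keys(orders, "cust_id", customers, "cust_id")
```

The point is less the code than the output: a concrete, countable list of violations per column that can be discussed with the customer before any ETL is designed.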
If there are multiple source systems, the data will generally be inconsistent. How many inconsistencies are there? Is there a detailed list? How will business master data be established? If none of this has been considered, or not in detail, the project can basically be said to be cheating the customer.
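A detailed inconsistency list can start from something as simple as the sketch below. It assumes two hypothetical source extracts, `crm` and `billing`, that share a `cust_id` key; the function name and the sample rows are made up for illustration.

```python
def inconsistencies(source_a, source_b, key, fields):
    """List every key missing from one side and every field that disagrees."""
    a_by_key = {row[key]: row for row in source_a}
    b_by_key = {row[key]: row for row in source_b}
    report = []
    for k in sorted(a_by_key.keys() | b_by_key.keys()):
        a, b = a_by_key.get(k), b_by_key.get(k)
        if a is None or b is None:
            side = "A" if a is None else "B"
            report.append((k, "missing in " + side, None, None))
            continue
        for f in fields:
            if a[f] != b[f]:
                report.append((k, f, a[f], b[f]))
    return report


# Made-up extracts from two source systems.
crm = [
    {"cust_id": 1, "name": "ACME Ltd", "city": "Shanghai"},
    {"cust_id": 2, "name": "Bolt", "city": "Beijing"},
]
billing = [
    {"cust_id": 1, "name": "ACME Limited", "city": "Shanghai"},
    {"cust_id": 3, "name": "Core", "city": "Wuhan"},
]

report = inconsistencies(crm, billing, "cust_id", ["name", "city"])
```

Each tuple in `report` is one line of the list the customer should see: which key, which field, and what each system says.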
Once the source systems are sorted out, the next step is data staging. Why stage? How to stage? What to stage? In what form? With what performance? What cleaning, transformation, consistency handling, supplementing and deduplication should be done in staging? At which step, and in what order?
Then the data is loaded into the data warehouse or data mart. Before loading there is surrogate key assignment, the handling of late-arriving dimension information, and the handling of early-arriving data, all of which test the designer's wisdom and experience.
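One common way to deal with a fact that arrives before its dimension row is to assign the surrogate key immediately and park a placeholder row until the real attributes show up. The sketch below illustrates that idea only; the `DimensionKeyMap` class and its method names are hypothetical, and a real warehouse would keep this state in database tables rather than in memory.

```python
class DimensionKeyMap:
    """Maps natural (business) keys to integer surrogate keys."""

    def __init__(self):
        self._keys = {}   # natural key -> surrogate key
        self.rows = {}    # surrogate key -> dimension attributes
        self._next = 1

    def _assign(self, natural_key):
        sk = self._next
        self._next += 1
        self._keys[natural_key] = sk
        return sk

    def upsert(self, natural_key, attrs):
        """Load a dimension row, overwriting a placeholder if one exists."""
        sk = self._keys.get(natural_key)
        if sk is None:
            sk = self._assign(natural_key)
        self.rows[sk] = {**attrs, "inferred": False}
        return sk

    def lookup_for_fact(self, natural_key):
        """Resolve a fact's key; create a placeholder if the dimension is late."""
        sk = self._keys.get(natural_key)
        if sk is None:
            sk = self._assign(natural_key)
            self.rows[sk] = {"inferred": True}
        return sk


dim = DimensionKeyMap()
dim.upsert("C001", {"name": "ACME"})
fact_sk = dim.lookup_for_fact("C002")          # fact arrives first
late_sk = dim.upsert("C002", {"name": "Bolt"})  # dimension row arrives later
```

Because the late dimension row reuses the key already handed to the fact, the fact table never has to be rewritten.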
However, in my experience, many project teams never think about most of this and do not even know such problems exist, so it is no surprise that the data quality of the final product is a mess.
Okay, the data is finally loaded into the data warehouse. What's next? We all know the data has to be presented.
But there are thousands of possible queries, so what do you aggregate? Build too many aggregates and the refresh itself takes too long. Aggregates were meant to improve query performance, yet the customer waits and waits, only to be told that the system is still refreshing the aggregates.
The customer was supposed to look at the reports every morning. Instead, you added an aggregation step to the ETL, plus other related calculations, and the run is still not finished. So the customer gets fobbed off with excuses: the server is not powerful enough, the database has too little memory, and so on. You might as well suggest that the customer look at the reports once a week.
2、 Data warehouse modeling
Most modelers know about dimensional modeling and denormalized design, and they get the big picture roughly right. But the real challenge in modeling is the details. I have seen many data models that look denormalized on the surface, with dimension tables and fact tables all in place, but on closer inspection the modeler is still thinking in 3NF: apart from the subject areas, normalized design is everywhere.
Everyone knows how to handle SCDs (slowly changing dimensions), but when it comes to rapidly changing dimensions, huge dimension tables, large numbers of dimension tables with only a few records each, complex hierarchies, and many-to-one relationships, they are often at a loss.
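For reference, the baseline case that most modelers do handle is the slowly changing dimension. The sketch below shows one common variant, usually called Type 2, where a changed attribute closes the current row and opens a new one. The `apply_scd2` function, the row layout and the `HIGH_DATE` sentinel are assumptions made for this example, not a prescribed design.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # sentinel meaning "still current"


def apply_scd2(history, natural_key, new_attrs, effective):
    """Close the current version if attributes changed, then append a new one."""
    current = next(
        (r for r in history
         if r["nk"] == natural_key and r["valid_to"] == HIGH_DATE),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return current["sk"]          # nothing changed, keep current row
        current["valid_to"] = effective   # expire the old version
    sk = len(history) + 1                 # next surrogate key
    history.append({
        "sk": sk,
        "nk": natural_key,
        "attrs": dict(new_attrs),
        "valid_from": effective,
        "valid_to": HIGH_DATE,
    })
    return sk


history = []
sk1 = apply_scd2(history, "C001", {"city": "Shanghai"}, date(2020, 1, 1))
sk2 = apply_scd2(history, "C001", {"city": "Shanghai"}, date(2020, 6, 1))
sk3 = apply_scd2(history, "C001", {"city": "Beijing"}, date(2021, 3, 1))
```

This works well when changes are rare. With a rapidly changing or very large dimension the history table grows with every change, which is exactly where the harder design questions begin.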
More important still is getting the grain right. Either it is too coarse, so that many queries cannot be answered in the end; or it is too fine, so that the data volume is too large and performance suffers; or the fact grain is right but the grain of the corresponding dimension tables does not match, and then all sorts of workarounds get invented.
3、 Performance tuning
Sooner or later the ugly daughter-in-law has to meet her in-laws: once the system is in front of the customer, its weaknesses are exposed. Performance cannot possibly be good, so performance tuning begins, tweaking this and that, with little improvement. Then the team starts talking the customer into upgrading the servers and adding memory.
In my opinion, performance is not tuned in, it is designed in. If the design had all kinds of problems from the beginning, no amount of tuning afterwards will help. First get the data warehouse model, the aggregation design and the ETL design right, and then optimize at three levels: OS, DBMS and SQL. Which database segments should go on different disks, how to partition, which clusters go on which partition; SQL cannot be written just for convenience without regard to performance; which indexes to build, and of what type. All of these affect performance.
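The effect of an index choice is easy to see in a query plan. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, since segments, partitions and clusters are features of larger DBMSs that SQLite does not have; the `sales` table, the `idx_sales_date` index and the `plan` helper are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (sale_date, amount) VALUES (?, ?)",
    [(f"2020-01-{d:02d}", float(d)) for d in range(1, 29)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE sale_date = '2020-01-15'"


def plan(connection, sql):
    """Return the planner's description of how it will run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_before = plan(conn, QUERY)   # full table scan
conn.execute("CREATE INDEX idx_sales_date ON sales (sale_date)")
plan_after = plan(conn, QUERY)    # lookup through the new index
total = conn.execute(QUERY).fetchone()[0]
```

Comparing `plan_before` with `plan_after` shows the planner switching from a scan of the whole table to a search through the index. The same habit of reading the plan before and after a change applies to whatever DBMS the warehouse actually runs on.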
So it is no surprise that most BI systems end up performing poorly. The designers are not professional or thorough enough, the people doing performance optimization are inexperienced, and ETL and report developers often know only their tools and have no deep command of SQL or scripting. Things built this way naturally do not perform well.
4、 The problem with the practitioners
Most people only know tools: ETL tools, reporting tools and so on. Even the tools they do not know in depth, let alone truly understand the ideas behind them. I once built an ETL process where the data to be extracted sat in unformatted log files, and those logs were the best data source available.
The same goes for reports. Everyone can do the simple ones, but faced with extremely complex, multi-subject reports with complicated statistics, they are lost. Sometimes the customer's requirements are odd: they insist on putting seemingly unrelated things into one report, and you still have to deliver it. From a business point of view, though, there is a reason behind the requirement. What looks unrelated to you, or the cross-row calculations you think are unnecessary, may let them see something in the data.
I once built an extremely complex report for a bank. It put all kinds of credit into one report, with horizontal statistics, vertical statistics and subtotals; overdue amounts, installments, the previous period and the current period all in one table; and it was split into two kinds of statistics, account-level and customer-level.
I used 4 layers of subqueries to compute the various grouped statistics, fed each result set to the layer above as its source data, and used many set operations as well. When it was done, I had little confidence in a thousand lines of SQL. But when it ran, performance was good, finishing in a few minutes, and the business department checked the figures and found them correct. Something like this is very hard to implement with reporting tools alone.
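The actual bank report is not reproduced here. As a much smaller sketch of the same pattern, the query below nests a grouping subquery inside a `UNION ALL` and then aggregates the combined result, producing account-level and customer-level statistics in one result set. The `loans` table, its columns and the sample rows are invented for illustration, and `sqlite3` is used only so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (account_id TEXT, customer_id TEXT, product TEXT, overdue REAL)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        ("A1", "C1", "mortgage", 100.0),
        ("A2", "C1", "card", 20.0),
        ("A3", "C2", "card", 50.0),
        ("A4", "C3", "card", 0.0),
    ],
)

SQL = """
SELECT level, COUNT(*) AS overdue_count, SUM(overdue) AS overdue_total
FROM (
    -- layer 2a: one row per account
    SELECT 'account' AS level, account_id AS id, overdue
    FROM loans
    UNION ALL
    -- layer 2b: one row per customer, built on a grouping subquery
    SELECT 'customer' AS level, customer_id AS id, overdue
    FROM (
        SELECT customer_id, SUM(overdue) AS overdue
        FROM loans
        GROUP BY customer_id
    )
)
WHERE overdue > 0
GROUP BY level
ORDER BY level
"""

report = conn.execute(SQL).fetchall()
```

Each inner result set serves as the source data for the layer above it, which is the structure that is awkward to express in a reporting tool's point-and-click designer.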
To save costs, many companies hire a few senior people and a bunch of newcomers, thinking this mix will raise the overall level. I do not deny that some newcomers learn fast, are smart and think logically, and such people grow very quickly. But most of them you can never expect to become qualified software engineers.
Management has big problems too. The software industry is the software industry, not manufacturing. If you manage software development the way you manage a production line, all you get is a team with low morale, dull and uncreative, whose delivery is poor as well.
As for customers, they tend to have blind faith in big companies, because big companies are strong, safe and have mature management. In fact, big company or small, in the end it is a handful of people who do the work, and their quality, technical level and sense of responsibility decide whether the project succeeds. If a big company botches the job, it simply swaps in a new group of people; it can afford to.
But the problem is that the customer cannot afford it. A good project often ends in tragedy: millions are spent, and the end result is poor performance and worrying quality. Never mind decision support; it is too shabby even to serve as window dressing.
This article was written by [The sail is soft]. If you repost it, please include a link to the original. Thank you.