Supported business applications
· Network management and optimization, including infrastructure construction optimization and network operation management and optimization;
· Marketing and precision marketing, including customer profiling, relationship-chain research, precision marketing, real-time marketing, and personalized recommendation;
· Customer relationship management, including customer service center optimization and customer life cycle management;
· Enterprise operation management, including business operation monitoring and operation analysis;
· Data commercialization, meaning the external monetization of data.
· Eliminate data access bottlenecks and discover user habits, enabling more targeted marketing.
· HBase manages billing data, guaranteeing random lookups and fast responses for a massive user base.
· Cloudera commercial edition.
· Components used: Cloudera Manager, HBase, Kudu, YARN, Hive, HDFS, Sentry, Kerberos, Spark, etc.
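The random billing lookups in HBase mentioned above depend heavily on row key design. The following is a minimal sketch, not the operator's actual schema: it assumes a hypothetical key layout of salt bucket, subscriber number (msisdn), and reversed timestamp, so that writes spread across regions while one subscriber's records stay contiguous and sort newest-first. NUM_BUCKETS, the separator, and all function names are illustrative assumptions.

```python
import zlib

# Hypothetical values chosen for illustration only.
NUM_BUCKETS = 16                  # number of salt buckets (e.g. pre-split regions)
MAX_TS_MS = 9_999_999_999_999     # largest 13-digit millisecond timestamp


def salt_for(msisdn: str) -> int:
    """Derive a stable salt bucket from the subscriber number."""
    return zlib.crc32(msisdn.encode("utf-8")) % NUM_BUCKETS


def user_scan_prefix(msisdn: str) -> bytes:
    """Prefix shared by every row of one subscriber; usable as a scan prefix."""
    return f"{salt_for(msisdn):02d}|{msisdn}|".encode("utf-8")


def billing_row_key(msisdn: str, event_ts_ms: int) -> bytes:
    """Build <salt>|<msisdn>|<reversed timestamp> so newer records sort first."""
    reversed_ts = MAX_TS_MS - event_ts_ms
    return user_scan_prefix(msisdn) + f"{reversed_ts:013d}".encode("utf-8")
```

With this layout, a lookup for one subscriber becomes a short prefix scan starting at user_scan_prefix(msisdn), while different subscribers hash to different buckets. Whether salting is appropriate depends on the actual access pattern, which the source does not describe.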
Big data scale
400+ production CDH nodes
Traditional data warehouses cannot effectively store the growing volume of business data
As operators' business data volume grows, and as data volume also rises with application complexity, massive data increases the storage and processing pressure on operators' business systems. The data warehouse cannot scale linearly, which leads to difficult information system management, high cost, heavy expansion pressure, and declining efficiency. Operators hold a huge volume of Internet access records. The previous approach was to collect user traffic at the gateway, analyze the traffic data, and then generate Internet usage detail records; these records are very large.
Traditional data warehouses cannot handle new kinds of business data effectively
Operators pay more and more attention to electronic channels. Much business can now be handled directly on the website, where users can customize telecom services or make queries. To some extent, all of this on-site activity is itself user behavior worth analyzing. Operators' earlier analysis focused mainly on whether customers paid on time and whether their credit rating was good, but it did not record users' interests or behavior characteristics on the website. If this data is integrated and analyzed together with the original database, it can truly describe users' purchasing behavior. This data is largely unstructured, such as text, images, or video, and differs from the data used in traditional communication service analysis. It requires effective analysis of unstructured, high-volume content, which traditional architectures struggle to handle. Such analysis can now be realized and gradually refined through newer technical means.
Scattered systems and poorly standardized data need unified management on a big data platform
At present, many of the operators' business systems are scattered, which makes it difficult to share resources and applications. Big data systems for business analysis, signaling monitoring, comprehensive network analysis, harmful-content monitoring, and online log retention are built separately by each specialty, and some are built province by province. This causes duplicated resources, redundant application development, and expert resources that cannot be shared. In addition, scattered data storage and lack of standardization are major problems for operators: the data models of the various big data systems are not unified, while demand for cross-system comprehensive analysis keeps increasing, so the needs of business development are hard to meet. A highly scalable, low-cost new big data architecture has therefore become an important direction.
The centralized business intelligence platform under a converged architecture needs big data to provide data and marketing support
The business intelligence (BI) platform is an important platform for operators' data management and analysis. As data volumes grow rapidly and customer marketing demands sharper targeting, building a centralized, highly available, and highly intelligent big data platform has become key for operators. For example, operators require the BI platform to support analysis and mining of massive structured and unstructured data. In addition, by combining users' online logs with Internet page content, the platform should provide user behavior preference analysis for precision marketing, as well as trend analysis and competitive product analysis for developing Internet services. Because it is built centrally, a centralized BI system faces challenges such as large data scale, complex data processing, and mixed, diverse workloads. A traditional single data warehouse technology can hardly meet these challenges, so big data technology needs to be introduced.
Using Cloudera's multi-tenant enterprise big data platform, internal business improvement and operational optimization are centered on the consumer: the operator analyzes and mines user behavior patterns to support all kinds of data applications, including infrastructure construction and network operation management optimization, new business exploration and precision marketing, customer service optimization, and enterprise operation decision support.
For external data services, the operator integrates its data and, through data mining and desensitization, generates result datasets that it provides to enterprise customers. This helps enterprises understand their users and improve competitiveness, and covers precision advertising, data reports, precision marketing, capability opening, and capability leasing.
Overall deployment architecture:
· Use MapReduce and Spark to transform and process data;
· Use HBase to manage massive amounts of data;
· Use Hive, Impala, and Kudu to analyze and process data;
· Use Flink for stream processing.
Basic operation and maintenance
This includes platform monitoring, cluster inspection, and data backup and migration (HBase data migration and Hive data migration).
· Hive canary metrics run high and Hive Metastore connections are slow; metadata synchronization fails and jobs fail as a result. At the business level, data analysis cannot be completed, which delays data releases such as month-end billing.
· Sentry authorizations cannot be synchronized to HDFS ACLs, so the business side has to re-grant permissions every time it processes table data, which slows processing.
· HBase regions remain in transition (RIT) for long periods. At the business level, HBase tables cannot be accessed, which delays data for billing tables and real-time displays.
· Under HA, the standby NameNode cannot checkpoint (merge metadata). At the business level, a NameNode outage could leave the cluster unavailable and prevent all kinds of computation from running properly.
· The cluster is managed through a CM management node. Operations staff inspect the cluster manually through CM page metrics or back-end metrics, which is repetitive work with low efficiency.
· The cluster is multi-tenant, but CM has no unified management interface for multi-tenancy, so tenant configuration is complex, repetitive, and time-consuming.
· Baseline performance metric extraction and statistics: HDFS JMX metrics, HBase metrics, and Hive thread and network traffic analysis, followed by deep optimization to improve cluster performance.
· Platform upgrade / security hardening: confirm and implement the plan to upgrade the platform from CDH 5.9 to CDH 5.14; design a security hardening scheme for the web interfaces of key platform components; NameNode metadata migration scheme; data skew remediation plan.
· Cluster expansion: a design that reduces the impact of horizontal cluster expansion on the business.
· Cluster performance tuning: remediate HBase table data, table distribution, and table count to reduce the incidence of long-lasting RIT.
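The manual inspection and JMX metric extraction described above could be partly automated. The following is a minimal sketch under stated assumptions, not the tooling actually used in this project: it takes the already-parsed JSON that a Hadoop daemon's /jmx endpoint returns (a dict with a "beans" list) and flags a stale NameNode checkpoint or long-lasting HBase RIT. The bean names, attribute names (LastCheckpointTime, ritCount, ritOldestAge), and thresholds are assumptions that should be verified against the CDH version in use.

```python
from typing import Dict, List

# Assumed bean names; confirm them in the /jmx output of your own cluster.
NAMENODE_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"
HBASE_AM_BEAN = "Hadoop:service=HBase,name=Master,sub=AssignmentManger"

# Hypothetical thresholds for illustration.
MAX_CHECKPOINT_AGE_S = 6 * 3600       # flag if no checkpoint for 6 hours
MAX_RIT_AGE_MS = 10 * 60 * 1000       # flag RIT older than 10 minutes


def find_bean(jmx: Dict, name: str) -> Dict:
    """Return the bean with the given name, or an empty dict if absent."""
    for bean in jmx.get("beans", []):
        if bean.get("name") == name:
            return bean
    return {}


def check_checkpoint(jmx: Dict, now_ms: int) -> List[str]:
    """Flag a NameNode whose last checkpoint is older than the threshold."""
    last = find_bean(jmx, NAMENODE_BEAN).get("LastCheckpointTime")
    if last is None:
        return ["NameNode: LastCheckpointTime not reported"]
    age_s = (now_ms - last) / 1000
    if age_s > MAX_CHECKPOINT_AGE_S:
        return [f"NameNode: last checkpoint {age_s / 3600:.1f} h ago"]
    return []


def check_rit(jmx: Dict) -> List[str]:
    """Flag HBase regions that have stayed in transition too long."""
    bean = find_bean(jmx, HBASE_AM_BEAN)
    count = bean.get("ritCount", 0)
    oldest_ms = bean.get("ritOldestAge", 0)
    if count > 0 and oldest_ms > MAX_RIT_AGE_MS:
        return [f"HBase: {count} region(s) in transition, "
                f"oldest {oldest_ms / 60000:.0f} min"]
    return []
```

Fetching the payload (for example with an HTTP client against the daemon's web port, plus Kerberos/SPNEGO authentication on a secured cluster) is deliberately left out, since those details depend on the deployment. The functions only evaluate a payload that has already been loaded, which also makes them easy to run on a schedule and to test offline.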
Optimization results
· The timeliness of key data collection improved by 15%;
· The cluster suffered no major incidents caused by malicious attacks;
· Cluster and component health improved significantly;
· Production incidents caused by operator error dropped to zero;
· Cluster stability improved significantly.
Bimao Technology is a professional big data solution service provider committed to the national digital transformation strategy. Under the leadership of the Shanghai Institute of Computing Technology, Chinese Academy of Sciences, it introduces advanced foreign technology and industry solutions, and cooperates closely with Cloudera, Huawei, Star Ring, and other vendors. It provides customers with full life cycle technical support, from system architecture and data governance to personnel training, to build more stable, more efficient, and more secure enterprise-grade big data platforms.