Front-end monitoring covers behavior monitoring, exception monitoring, performance monitoring, and so on. This article focuses on exception monitoring. The front end and the back end live in the same monitoring system: each has its own monitoring scheme, but the two are not separate. When a user hits an exception while using the application, the cause may lie in the front end or in the back end, so there must be a mechanism that links the two and unifies monitoring within one system. Therefore, even when only front-end exception monitoring is discussed, the front end and back end cannot be strictly separated; the design must follow the actual system, and the final reports should show how monitoring helps development and the business.
Generally speaking, a monitoring system can be roughly divided into four stages: log collection, log storage, statistics and analysis, and reporting and alerting.
Collection stage: collect exception logs, do some local processing first, and report them to the server according to a chosen scheme.
Storage stage: the back end receives the exception logs reported by the front end, processes them, and stores them according to a chosen storage scheme.
Analysis stage: divided into automatic analysis and manual analysis. Automatic analysis uses preset conditions and algorithms to aggregate and filter the stored log information, detect problems, and trigger alerts. Manual analysis provides a visual data panel so that system users can see the specific log data and, based on that information, find the root cause of an exception.
Alerting stage: divided into alarms and early warnings. An alarm is raised automatically according to its level, through established channels, following defined trigger rules. An early warning anticipates an exception before it occurs and gives notice in advance.
1 Front-end exceptions
A front-end exception is a situation in which a user of a web application cannot quickly get the expected result. Different exceptions have consequences of different severity: a mild one annoys the user, while a severe one makes the product unusable and costs the user's trust in the product.
1.1 Classification of front-end exceptions
Depending on the severity of the consequences caused by the faulty code, front-end exceptions manifest in the following categories:
The content presented by the interface does not match what the user expects: for example, a click leads to the wrong page, data is inaccurate, an error message is incomprehensible, the layout is broken, or the user is taken to an error page after submitting. When such exceptions occur, the product itself still functions, but users cannot achieve their goal.
There is no response after an operation on the interface: for example, clicking a button does not submit, or after a success message the user cannot continue. When such exceptions occur, the product is already partially unusable at the interface level.
The interface fails to achieve the purpose of the operation: for example, a click does not enter the target page, or a click does not show the details. When such exceptions occur, some functions of the application cannot be used normally.
d. Frozen
The interface appears stuck and no function can be used: for example, the user cannot log in and therefore cannot use any in-app function, or a mask layer blocks the page and cannot be closed, so no further operation is possible. When such exceptions occur, the user is likely to kill the application.
The application frequently exits on its own or cannot be operated: for example, intermittent crashes, or a page that fails to load or cannot be operated after loading. If this kind of exception keeps occurring, it leads directly to user churn and damages the product's vitality.
1.2 Classification of exception causes
The main causes of front-end exceptions fall into 5 classes:
|Type||Typical causes||Frequency|
|Logic error||1) Wrong condition in business logic; 2) Wrong event binding order; 3) Call stack timing error; 4) Operating on the wrong JS object|| |
|Data type error||1) Reading a property of null as if it were an object; 2) Iterating over undefined as if it were an array; 3) Using numbers in string form directly in addition; 4) Function parameter not passed|| |
|Syntax error|| ||Less common|
|Network error||1) Slow network; 2) The server returns no data but still responds with 200, and the front end iterates over the data as usual; 3) Network interruption while submitting data; 4) The front end does no error handling when the server returns a 500 error||Occasional|
|System error||1) Out of memory; 2) Disk full; 3) The shell does not support an API; 4) Incompatibility|| |
2 Exception collection
2.1 What to collect
When an exception occurs, we need its specific details in order to decide on a solution. When collecting exception information, we can follow the 4W principle:
WHO did WHAT and get WHICH exception in WHICH environment?
a. User information
Information about the user at the time of the exception, such as the user's status and permissions at that moment, and, when a user can be logged in on multiple terminals, which terminal the exception belongs to.
b. Behavior information
What the user was doing when the exception occurred: the interface path; the operation performed; the data used in the operation; the data the API returned to the client at that time; for a submit operation, the data submitted; the previous path; the ID of the previous behavior log; and so on.
c. Exception information
Information about the code that raised the exception: the DOM element node the user operated on; the exception level; the exception type; the exception description; the code stack information; and so on.
d. Environment information
The network environment; device model and identifier; operating system version; client version; API version; and so on.
|requestId||String||One requestId is generated per API request|
|traceId||String||One traceId is generated per stage; used to track all log records related to one exception|
|hash||String||Unique identifier of this log entry, equivalent to a logId, but generated from the content of the log record itself|
|time||Number||Time the log entry was generated (time it was saved)|
|userStatus||Number||User status at that time (enabled / disabled)|
|userRoles||Array||List of the user's roles at that time|
|userGroups||Array||Groups the user belonged to at that time; group permissions may affect the result|
|userLicenses||Array||Licenses at that time; they may have expired|
|path||String||Current path (URL)|
|action||String||The operation performed|
|referer||String||Previous path (source URL)|
|data||Object||state and data of the current interface|
|dataSources||Array<Object>||Data returned by the upstream API|
|dataSend||Object||Data submitted|
|targetElement||HTMLElement||DOM element the user operated on|
|targetDOMPath||Array<HTMLElement>||Node path of that DOM element|
|targetCSS||Object||Custom styles of that element|
|targetAttrs||Object||Current attributes and values of that element|
|errorStack||String||Error stack information|
|errorColNo||Number||Column position of the error|
|errorMessage||String||Error description (defined by the developer)|
|pageX||Number||x coordinate of the event relative to the page|
|pageY||Number||y coordinate of the event relative to the page|
|screenX||Number||x coordinate of the event relative to the screen|
|screenY||Number||y coordinate of the event relative to the screen|
|eventKey||String||Key that triggered the event|
|network||String||Description of the network environment|
|system||String||Description of the operating system|
This is a very large log field table. It covers almost everything that can describe an exception and its surrounding context in detail. These fields are not all collected in every case; because we store logs in a document database, missing fields do not affect storage.
2.2 Exception capture
Front-end exception capture is divided into global capture and single-point capture. Global capture keeps the capture code in one place and is easier to manage. Single-point capture is a supplement for special cases; it is scattered and harder to manage.
a. Global capture
Capture code is written in one place through global interfaces. The available interfaces are:
- window.addEventListener('error') / window.addEventListener('unhandledrejection') / document.addEventListener('click'), etc.
- Framework-level global listeners: for example, axios uses interceptors to intercept requests, and Vue and React each have their own error collection interface
- Wrapping global functions so that exceptions are caught automatically whenever the function is called
- Overriding instance methods (patching): wrap a layer around the original function, for example overriding console.error, so that exceptions are captured while the method is used in the same way as before
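The global interfaces listed above can be wired up roughly as follows. This is a minimal sketch, not the article's implementation: the function names, the `report` callback, and the record shape (a small subset of the field table in 2.1) are assumptions made for illustration. Only `addEventListener`, the `error` and `unhandledrejection` events, and `console.error` are standard.

```javascript
// Turn an ErrorEvent-like or PromiseRejectionEvent-like object into a plain log record.
// The record shape is a hypothetical subset of the field table in section 2.1.
function toExceptionLog(event, now = Date.now()) {
  const error = event.error || event.reason || null;
  return {
    time: now,
    errorMessage: event.message || (error && error.message) || String(error),
    errorStack: (error && error.stack) || "",
    errorColNo: typeof event.colno === "number" ? event.colno : null,
  };
}

// Register both global listeners in one place; `report` is supplied by the caller.
function installGlobalCapture(target, report) {
  target.addEventListener("error", (e) => report(toExceptionLog(e)));
  target.addEventListener("unhandledrejection", (e) => report(toExceptionLog(e)));
}

// Patch console.error: keep the original behaviour and additionally report the call.
function patchConsoleError(consoleLike, report) {
  const original = consoleLike.error;
  consoleLike.error = function (...args) {
    report({
      time: Date.now(),
      errorMessage: args.map(String).join(" "),
      errorStack: "",
      errorColNo: null,
    });
    return original.apply(this, args);
  };
  // Return a function that undoes the patch.
  return () => {
    consoleLike.error = original;
  };
}

// In a browser one would call, for example:
//   installGlobalCapture(window, sendToStagingArea); // sendToStagingArea is hypothetical
```

Passing the event target and the console object in as parameters keeps the sketch testable outside a browser; in a page they would simply be `window` and `console`.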
b. Single-point capture
Wrap individual code blocks in the business code, or instrument points in a logic flow, to capture exceptions in a targeted way:
- Write a function that collects exception information and call it when an exception occurs
- Write a function that wraps another function and returns a new one; the new function behaves exactly like the original, except that it captures exceptions when they occur
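The second approach, a wrapper that returns a new function, might look like the sketch below. The name `withCapture` and the `report` callback are assumptions for illustration; the wrapper re-throws so that callers see the same behaviour as with the original function.

```javascript
// Wrap `fn` so that any exception it throws is passed to `report` and then re-thrown.
function withCapture(fn, report) {
  return function wrapped(...args) {
    try {
      const result = fn.apply(this, args);
      // If the function returns a promise, also capture asynchronous rejections.
      if (result && typeof result.then === "function" && typeof result.catch === "function") {
        return result.catch((err) => {
          report(err);
          throw err;
        });
      }
      return result;
    } catch (err) {
      report(err);
      throw err; // callers still see the original exception
    }
  };
}

// Usage: a JSON parser that reports malformed input before failing.
const captured = [];
const parseWithCapture = withCapture(JSON.parse, (err) => captured.push(err));
```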
2.3 Cross-origin script exceptions
Because of browser security policy, when a cross-origin script throws an error the details are not directly available; only a generic "Script error." is reported. This happens, for example, when we include third-party dependencies or serve our own scripts from a CDN.
Ways to resolve Script error:
Option 1:
- Inline the JS into the HTML
- Serve the JS files from the same origin as the HTML
Option 2:
- Add the crossorigin attribute to the script tags on the page
- Add Access-Control-Allow-Origin to the response headers of the server that serves the script, to support cross-origin resource sharing
2.4 Exception recording
For an exception, the exception information alone is not enough to grasp the essence of the problem, because the place where the exception surfaces is not necessarily where it originates. We need to reconstruct the scene of the exception to recover the whole picture, and even to prevent similar problems in other interfaces. This introduces a concept: "exception recording". A recording captures the whole process from before the exception to after it along two dimensions, "time" and "space", which helps locate the source of the exception.
The figure above shows that when an exception occurs, its source may be far from where it appears; we need to go back to the scene to find it. As with solving a case in real life, a surveillance camera that recorded the incident makes the case much easier to solve. If you focus only on the exception itself, finding the source takes luck; with an exception recording, the root cause is easier to find.
So-called "exception recording" means using technical means to collect the user's operations and record every one of them. When an exception occurs, the records from a certain time window are replayed as a kind of video, so that the person debugging can see what the user did without having to ask the user.
The figure above is a schematic of an exception recording and replay scheme from Alibaba. The events and mutations produced by the user's operations on the interface are collected by the product, uploaded to the server, and stored in the database after queue processing. When the exception needs to be reproduced, these records are taken out of the database and replayed in order using a suitable technique, which reproduces the exception.
2.5 Exception level
Generally speaking, we divide the collected information into levels such as info, warn, and error, and extend from there.
When we detect an exception, we can use an "important / urgent" model to classify it into four levels: A, B, C, and D. Some exceptions, although they occur, do not affect normal use and are not actually perceived by users. In theory they should be fixed, but compared with other exceptions they can be handled later.
Alerting strategies are discussed later. Generally speaking, the closer an exception is to the upper right corner of the model, the faster it is notified, so that the relevant people receive the information and handle it as soon as possible. Level A exceptions require a quick response and may even need to be brought to the attention of the person in charge.
In the collection phase, the severity of an exception can be determined from the consequences classified in Section 1, and the corresponding reporting scheme is then selected when the exception occurs.
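One possible way to encode the level model is sketched below. The article only states that level A is the important and urgent quadrant and that it is reported immediately; the assignment of B, C, and D to the remaining quadrants and the mapping from level to reporting scheme are assumptions made here for illustration.

```javascript
// Assumed quadrant-to-level mapping; only level A is fixed by the article.
function exceptionLevel({ important, urgent }) {
  if (important && urgent) return "A";
  if (important) return "B"; // assumption: important but not urgent
  if (urgent) return "C";    // assumption: urgent but not important
  return "D";
}

// Assumed level-to-scheme mapping, using the reporting schemes of section 3.3.
const REPORT_SCHEME = { A: "immediate", B: "block", C: "batch", D: "batch" };

function reportSchemeFor(flags) {
  return REPORT_SCHEME[exceptionLevel(flags)];
}
```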
3 Organizing and reporting logs
As mentioned earlier, besides the exception message itself we also need to record user operation logs in order to reconstruct the scene. This raises the question of reporting volume and frequency. If every log is reported immediately, it amounts to a self-inflicted DDoS attack, so we need a reasonable reporting scheme. Four reporting schemes are described below. In practice we are not limited to one of them; several are often used together, with different schemes for different log levels.
3.1 Storing logs on the front end
As mentioned earlier, we collect not only the exception log itself but also the user behavior logs related to it; a single exception log does not let us quickly locate the root cause and find a solution. Collecting behavior logs takes some care, however: we cannot send a behavior log to the server immediately after every operation. For an application with many users online at the same time, uploading a log on every operation would be equivalent to a DDoS attack on the log server. So we first store these logs locally on the user's client and upload a group of them together when certain conditions are met.
How, then, do we store logs on the front end? We cannot simply keep them in a variable: that would exhaust memory, and the logs would be lost as soon as the user refreshes the page. This naturally leads to a front-end data persistence solution.
There are currently many options for persistence, mainly Cookie, localStorage, sessionStorage, IndexedDB, WebSQL, and FileSystem. How do we choose? Let's compare them with a table:
|Performance||Fast reads, slow writes||Slow reads, fast writes|
On balance, IndexedDB is the best choice. It offers large capacity and asynchronous access, and being asynchronous means it does not block interface rendering. IndexedDB is organized into databases, each divided into stores, and supports queries by index; it has a complete database management model and is better suited to structured data than localStorage. Its one drawback is that its API is quite complex, unlike the simple and direct localStorage. To address this, we can use the hello-indexeddb tool, which wraps the complex API with Promises and simplifies operations, making IndexedDB as convenient as localStorage. In addition, IndexedDB is a widely supported HTML5 standard compatible with most browsers, so there is no need to worry about its future.
Next, how should we use IndexedDB sensibly so that our front-end storage is well organized?
The figure above shows the front-end log storage flow and database layout. When an event, a mutation, or an exception is captured, it forms an initial log that is immediately placed in the staging area (one of the IndexedDB stores). The main program's part of the collection process then ends; everything that follows happens in a web worker. In the web worker, a looping task continually takes logs from the staging area, classifies them, stores the classification results in the index area, enriches the log records, and finally moves the log records that will be reported to the server into the archive area. Once a log has been in the archive area for more than a certain number of days it no longer has value, but to guard against special cases it is moved to the recycle area, and after a further period it is removed from the recycle area.
3.2 Organizing logs on the front end
As mentioned above, a web worker organizes the logs and stores them in the index area and the archive area. What does the organizing process look like?
Because reporting, discussed later, works from the indexes, organizing logs on the front end mainly means sorting them into different indexes according to their characteristics. When a log is collected, it is tagged with a type, which is used for classification and index creation. At the same time, the hash value of each log object is calculated with object-hashcode and used as the unique identifier of that log.
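To make the idea of a content-derived identifier concrete, here is a small stand-in. It does not use or reproduce object-hashcode; instead it serializes the log with sorted keys and applies an FNV-1a style 32-bit hash over the UTF-16 code units. The function names are hypothetical, and a production system would likely want a longer hash to keep collisions rare.

```javascript
// Serialize a value deterministically: object keys are sorted so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function stableStringify(value) {
  if (value === null || typeof value !== "object") return String(JSON.stringify(value));
  if (Array.isArray(value)) return "[" + value.map(stableStringify).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k])).join(",") + "}";
}

// FNV-1a style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Identifier derived from the content of a log record.
function logHash(log) {
  return fnv1a(stableStringify(log));
}
```

Because the identifier depends only on content, two records with the same fields get the same hash regardless of key order, which is what later makes server-side deduplication by hash possible.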
- Store all log records in the archive area in chronological order and add each new log to the indexes
- BatchIndexes: batch reporting index (including performance and other logs); up to 100 entries can be reported in one batch
- MomentIndexes: immediate reporting index; all entries are reported at once
- FeedbackIndexes: user feedback index; one entry is reported at a time
- BlockIndexes: block reporting index; logs are grouped into blocks by exception/error (traceId, requestId), and one block is reported at a time
- When reporting is complete, delete the indexes of the logs that were reported
- Logs older than 3 days are moved to the recycle area
- Logs older than 7 days are removed from the recycle area
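The two ageing rules at the end of the list can be sketched as a periodic sweep. In this illustration, plain `Map` objects keyed by log hash stand in for the IndexedDB archive and recycle stores, and both thresholds are measured from the log's `time` field; that reading of "3 days" and "7 days", like the function name, is an assumption.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Move old logs from `archive` to `recycle`, then drop very old logs from `recycle`.
// Both arguments are Maps keyed by log hash (stand-ins for IndexedDB stores).
function sweep(archive, recycle, now, archiveDays = 3, recycleDays = 7) {
  for (const [hash, log] of archive) {
    if (now - log.time > archiveDays * DAY_MS) {
      recycle.set(hash, log);
      archive.delete(hash);
    }
  }
  for (const [hash, log] of recycle) {
    if (now - log.time > recycleDays * DAY_MS) {
      recycle.delete(hash);
    }
  }
}
```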
requestId: tracks front-end and back-end logs together. Because the back end also records its own logs, the front end sends a requestId by default with each API request, so that the logs recorded at the back end can be matched with the logs recorded at the front end.
traceId: tracks the logs before and after an exception. A traceId is created when the application starts and is refreshed when an exception occurs. Collect the requestIds related to a traceId, then combine the logs related to those requestIds; the result is all the logs related to the exception, which are used to replay it.
The example above shows how to use traceId and requestId to find all logs associated with an exception. In the figure, hash4 is an exception log. Its traceId is traceId2, and two records in the log list carry that traceId. However, the hash3 record is not the start of an action: its requestId is reqId2, and reqId2 starts at hash2. So hash2 must also be added to the replay record of the exception. In summary, we need to find the log records for every requestId that appears under the same traceId. It is a little roundabout, but the logic is easy to follow once understood.
We call the set of all logs related to one exception a block. The hashes of those logs are combined to compute the hash of the block, and an index for it is created in the index area, waiting to be reported.
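The lookup described above (same traceId, then every log sharing one of those requestIds) can be written as a short function. The sample data mirrors the hash2/hash3/hash4 example from the text; the values for hash1 and the ids `traceId1`, `reqId1`, and `reqId3` are made up to complete the list, and `collectBlock` is a hypothetical name.

```javascript
// Return every log that belongs to the block of the given exception log.
function collectBlock(logs, exceptionHash) {
  const exception = logs.find((l) => l.hash === exceptionHash);
  if (!exception) return [];
  // Step 1: all requestIds that appear under the exception's traceId.
  const requestIds = new Set(
    logs.filter((l) => l.traceId === exception.traceId).map((l) => l.requestId)
  );
  // Step 2: all logs carrying any of those requestIds, even under an older traceId.
  return logs.filter((l) => requestIds.has(l.requestId));
}

const sampleLogs = [
  { hash: "hash1", traceId: "traceId1", requestId: "reqId1" },
  { hash: "hash2", traceId: "traceId1", requestId: "reqId2" },
  { hash: "hash3", traceId: "traceId2", requestId: "reqId2" },
  { hash: "hash4", traceId: "traceId2", requestId: "reqId3" }, // the exception log
];

const block = collectBlock(sampleLogs, "hash4").map((l) => l.hash);
```

With this data, `block` contains hash2, hash3, and hash4: hash2 is pulled in because it shares reqId2 with hash3, even though its traceId is older.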
3.3 Reporting logs
Log reporting also runs in a web worker; to keep it separate from organizing, the two can run in two workers. The reporting process is roughly: in each cycle, take the corresponding number of index entries from the index area, use the hash in each index entry to fetch the complete log record from the archive area, and upload it to the server.
By reporting frequency (importance and urgency), there are four types of reporting:
a. Immediate reporting
The reporting function is triggered immediately after the log is collected. It is used only for class A exceptions. Because the network is unreliable, class A log reporting needs a confirmation mechanism: a report counts as complete only after the server confirms that it received the information. Otherwise a retry loop is needed to make sure the report succeeds.
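A confirmation-plus-retry loop of this kind could look like the following sketch. The `send` function, the `{ received: true }` acknowledgement shape, and the option names are all assumptions; the article does not specify the protocol.

```javascript
// Try to report one log until the server confirms receipt or attempts run out.
// Resolves to the attempt number that succeeded, or 0 if none was confirmed.
async function reportUntilConfirmed(send, log, options = {}) {
  const {
    maxAttempts = 5,
    baseDelayMs = 1000,
    wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const ack = await send(log); // hypothetical: resolves to { received: true } on success
      if (ack && ack.received === true) return attempt;
    } catch (err) {
      // Network failure: fall through and retry.
    }
    if (attempt < maxAttempts) await wait(baseDelayMs * attempt); // linear backoff
  }
  return 0; // not confirmed; the caller keeps the log for a later cycle
}
```

Capping the number of attempts keeps a permanently failing report from blocking the queue; the log stays in local storage and can be retried in a later cycle.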
b. Batch reporting
Collected logs are stored locally and, once a certain number has accumulated, packaged and reported in one go, or packaged and uploaded at a certain frequency (time interval). This merges many reports into one and reduces pressure on the server.
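The count-based trigger can be sketched with a small queue. The class name, the `send` callback, and the in-memory array (standing in for the local store) are assumptions for illustration; the default of 100 follows the batch size mentioned in section 3.2.

```javascript
// Accumulate logs and hand them to `send` in batches of `maxSize`.
class BatchQueue {
  constructor(send, maxSize = 100) {
    this.send = send;
    this.maxSize = maxSize;
    this.pending = [];
  }

  push(log) {
    this.pending.push(log);
    if (this.pending.length >= this.maxSize) this.flush();
  }

  // Send whatever has accumulated; safe to call when the queue is empty.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// The time-based trigger is a timer that calls flush(), for example:
//   setInterval(() => queue.flush(), 2 * 60 * 1000);
```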
c. Block reporting
An exception scenario is packaged into a block and reported. This differs from batch reporting: batch reporting ensures that the logs are complete and comprehensive, but includes useless information, whereas block reporting targets the exception itself and ensures that all logs related to a single exception are reported.
d. User-initiated submission
Provide a button on the interface so that users can actively report a bug. This helps strengthen interaction with users.
Alternatively, when an exception occurs that does not affect the user but is detected by the application, show a prompt that lets users choose whether to upload logs. This approach is suitable when user privacy data is involved.
| ||Immediate reporting||Batch reporting||Block reporting||User feedback|
|Timeliness||Immediate||Scheduled||Slightly delayed||Delayed|
|Number of entries||All at once||100 at a time||All entries related to one exception||1 at a time|
|Urgency||Urgent||Not urgent||Not urgent but important||Not urgent|
Immediate reporting, despite its name, is in fact also carried out by a queue-like looping task. Its main purpose is to submit important exceptions to the monitoring system as quickly as possible so that operations staff can spot problems; it therefore corresponds to a higher degree of urgency.
The difference between batch reporting and block reporting: batch reporting sends a fixed number of entries at a time, for example 1000 entries every 2 minutes, until everything has been reported. Block reporting, after an exception, immediately collects all logs related to the exception, checks which of them have already been submitted by batch reporting, removes those, and uploads the rest. These exception-related logs are relatively more important: they help reconstruct the scene of the exception as early as possible and find its cause.
User feedback can be reported at a slower pace.
To make sure reporting succeeds, a confirmation mechanism is needed. After the server receives reported logs, it does not store them in the database immediately but puts them in a queue, so the front end and back end need some extra handling to confirm that the logs have actually been written to the database.
The figure above shows a general reporting flow. Before reporting, a hash query lets the client find out whether any logs in the set to be reported have already been saved by the server. If so, those logs are removed to avoid duplicate reporting and wasted traffic.
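The deduplication step of that flow can be sketched as a pure function. It assumes the client has already asked the server which of the candidate hashes are stored (the query endpoint itself is not described in the article); `planUpload` is a hypothetical name.

```javascript
// Split the candidate logs into those still to upload and those the server already has.
// `storedHashes` is the answer to the hash query: hashes the server reports as saved.
function planUpload(logs, storedHashes) {
  const stored = new Set(storedHashes);
  const toUpload = [];
  const skipped = [];
  for (const log of logs) {
    (stored.has(log.hash) ? skipped : toUpload).push(log);
  }
  return { toUpload, skipped };
}
```

The skipped logs can have their index entries deleted right away, since the server already holds them.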
3.4 Compressing reported data
Uploading a batch at once inevitably involves a lot of data, which wastes traffic or slows transmission, and on a poor network may cause reporting to fail. Compressing the data before reporting is therefore one solution.
With merged reporting, a single upload may be more than ten KB, and for sites with large daily PV the resulting traffic is considerable, so it is worth compressing the data before reporting. lz-string is a very good string compression library: good compatibility, little code, a high compression ratio, and short compression time, with a compression rate of around 60%. However, it is based on LZ78 compression; if the back end does not support decompressing it, gzip can be used instead. Back ends generally come with gzip preinstalled, so gzip-compressed data is also a fine choice. The pako toolkit includes gzip compression and is worth trying.
4 Log receiving and storage
4.1 Access layer and message queue
Client logs are generally received by a separate log server. During reception, the legitimacy and safety of the client's log content must be verified to guard against attacks. Because logs are usually submitted frequently and concurrent submissions from many clients are common, a common approach is to process log messages one by one through a message queue and then write them to the database.
The figure above shows the architecture of Tencent's BetterJS; its "access layer" and "push center" are the access layer and message queue discussed here. BetterJS splits front-end monitoring into modules. The push center is responsible for pushing logs to the storage center and to other systems (for example, the alarm system), but we can view the queue at the log-receiving stage independently, as a transition between the access layer and the storage layer.
4.2 Log storage system
Storing logs is dirty work, but it has to be done. For small applications, a single database with a single table, plus some optimization, is enough. For an application at scale, providing a more standard and efficient log monitoring service usually requires work on the log storage architecture. The industry already has fairly complete log storage solutions, mainly the HBase family, the Dremel family, and the Lucene family. Overall, the main problems a log storage system faces are large data volume, irregular data structure, high write concurrency, and heavy query demand. To address them, a log storage system generally needs write buffering, storage media chosen according to log age, and a well-designed index system for fast, convenient reads.
Because log storage systems are relatively mature, they are not discussed further here.
4.3 Search
Logs exist to be used. Because log volume is generally very large, finding the needed records in that mass of data depends on a good search engine. Splunk is a mature log storage system, but it is a paid product. ELK is an open-source counterpart modeled on Splunk's architecture. ELK is the combination of Elasticsearch, Logstash, and Kibana: Elasticsearch is a storage and index search engine based on Lucene; Logstash is a log normalization pipeline that provides input, output, and transformation plugins; Kibana provides a user interface for visualization and query statistics.
5 Log statistics and analysis
A complete log statistics and analysis tool needs to provide convenient panels covering every aspect, giving feedback to log administrators and developers in visual form.
5.1 User dimension
Different requests from the same user form different story lines, so it is necessary to assign a unique request id to each series of user operations. When the same user operates on different terminals, these can also be told apart. The user's status, permissions, and other information during an operation also need to be reflected in the log system.
5.2 Time dimension
How an exception came about has to be observed by stringing together the story lines of operations before and after it. It involves not just one user operation, and not even just one page, but the end result of a series of events.
5.3 Performance dimension
Performance while the application is running, for example interface loading time, API request duration statistics, cost per unit of computation, and user stall time.
5.4 Runtime environment dimension
The environment of the application and services, for example the application's network environment, operating system, and device hardware information, as well as the server's CPU and memory status and network and bandwidth usage.
5.5 Fine-grained code tracing
The code stack information of the exception, used to locate the code position where the exception occurred and its exception stack.
5.6 Scene replay
By linking together the user logs related to an exception, the process leading to the exception is output as an animated replay.
6 Monitoring and notification
Statistics and analysis of exceptions are only the foundation. Being able to push notifications and raise alarms when an exception is found, and even to handle it automatically, is a capability an exception monitoring system should have.
6.1 Alarms with custom trigger conditions
a. Monitoring implementation
The monitoring logic can be triggered as soon as log information enters the access layer. When the log information contains a higher-level exception, an alarm can be raised immediately. The alarm message queue and the log storage queue can be managed separately so that they run in parallel.
Compute statistics over the logs written to storage and raise alarms on abnormal information. Also respond to monitoring anomalies. A monitoring anomaly means this: regular exceptions are generally less worrying; what is more troublesome is a sudden one. For example, if level D exceptions are suddenly received frequently within a certain period, then although level D exceptions are not urgent, the anomaly in the monitoring data itself calls for vigilance.
b. Custom triggers
In addition to the default alarm conditions configured when the system is developed, configurable custom trigger conditions should be provided to log administrators.
- What content the log contains
- What degree or quantity the log statistics reach, and over what time window
- Which users, matching what conditions, are alerted
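A custom trigger built from the first two conditions (content, and quantity within a time window) might be evaluated as sketched below. The rule fields `level`, `contains`, `minCount`, and `windowMs`, and the `level` field on log records, are hypothetical names chosen for this illustration.

```javascript
// Decide whether a rule fires for the given logs at time `now`.
// rule: { level?, contains?, minCount, windowMs } (all field names are assumptions).
function shouldAlert(rule, logs, now) {
  const matching = logs.filter((log) => {
    if (now - log.time > rule.windowMs) return false;
    if (rule.level !== undefined && log.level !== rule.level) return false;
    if (rule.contains !== undefined && !String(log.errorMessage || "").includes(rule.contains)) return false;
    return true;
  });
  return matching.length >= rule.minCount;
}

// Example rule: a burst of level D exceptions, 5 or more within one minute.
const burstOfLevelD = { level: "D", minCount: 5, windowMs: 60 * 1000 };
```

A rule like `burstOfLevelD` covers the case described in 6.1a, where individually unimportant exceptions become significant because they suddenly arrive in large numbers.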
6.2 Push channels
There are many options, e.g. email, SMS, WeChat, and phone calls.
6.3 Push frequency
Push frequency can also be set per alarm level. Low-risk alarms can be pushed once a day as a report; high-risk alarms are pushed repeatedly every 10 minutes until a handler manually switches the alarm off.
6.4 Automatic reports
For pushing log statistics, daily, weekly, monthly, and annual reports can be generated automatically and emailed to the relevant groups.
6.5 Automatically creating bug tickets
When an exception occurs, the system can call the ticket system's API to create a bug ticket automatically. After the ticket is closed, the result is fed back to the monitoring system, which records the tracking information for handling the exception and shows it in reports.
7 Fixing exceptions
Most front-end code is minified before release, so reported stack information needs to be mapped back to source information in order to locate the source code quickly and fix it.
At release time, deploy only the JS scripts to the server and upload the sourcemap files to the monitoring system. When the monitoring system displays stack information, it uses the sourcemap files to decode it and obtain the corresponding positions in the source code.
There is one catch: the sourcemap must correspond to the version running in production, and also to a specific git commit. Only then can the stack information be used to find the code of the version that went wrong. This can be achieved by creating a CI task that adds this step to the integration and deployment pipeline.
7.2 From alarm to early warning
The essence of early warning is to define in advance the conditions under which an exception may occur. When such a condition is triggered, the exception has not actually happened yet, so you can examine user behavior before the exception occurs and fix the problem in time, preventing the exception or keeping it from spreading.
How is this done? It is essentially a process of statistical clustering. Collect statistics on historical exceptions across dimensions such as time, region, and user, find the patterns, and let an algorithm add these patterns to the early-warning conditions automatically, so that a warning is issued in time the next time they are triggered.
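As a rough illustration, the sketch below replaces real clustering with plain frequency counting over two dimensions (hour of day and region). The event shape, the `minOccurrences` threshold, and both function names are assumptions made for the example.

```typescript
// Hypothetical shape of a historical exception record.
interface ExceptionEvent {
  hour: number; // hour of day, 0-23
  region: string;
  userId: string;
}

interface WarningCondition {
  hour: number;
  region: string;
  occurrences: number;
}

// Counts historical exceptions per (hour, region) and keeps the combinations
// that occurred at least `minOccurrences` times as early-warning conditions.
function deriveConditions(history: ExceptionEvent[], minOccurrences: number): WarningCondition[] {
  const counts = new Map<string, WarningCondition>();
  for (const e of history) {
    const key = `${e.hour}|${e.region}`;
    const current = counts.get(key);
    if (current !== undefined) {
      current.occurrences += 1;
    } else {
      counts.set(key, { hour: e.hour, region: e.region, occurrences: 1 });
    }
  }
  const conditions: WarningCondition[] = [];
  counts.forEach((c) => {
    if (c.occurrences >= minOccurrences) conditions.push(c);
  });
  return conditions;
}

// True when the current context matches a learned condition, i.e. a warning
// should be raised before any exception has actually been reported.
function shouldWarn(conditions: WarningCondition[], ctx: { hour: number; region: string }): boolean {
  return conditions.some((c) => c.hour === ctx.hour && c.region === ctx.region);
}
```

A production system would likely use more dimensions and a proper clustering or anomaly detection method; the point here is only the flow from history to conditions to warning.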
7.3 Smart fix
Errors can be fixed automatically. For example, the front end expects an interface to return a number, but the interface returns a numeric string. A mechanism can then be set up in which the monitoring system sends the correct data type model to the back end, and the back end enforces the type of each field according to that model when it returns data.
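The type model idea can be sketched as follows. The `TypeModel` format and `coerceToModel` are hypothetical; the article does not specify how the model is represented or which conversions are allowed, so the sketch only converts values that can be converted without losing information and leaves everything else untouched.

```typescript
// Hypothetical type model: field name -> expected primitive type.
type FieldType = "number" | "string" | "boolean";
type TypeModel = Record<string, FieldType>;

// Returns a copy of `data` in which fields listed in `model` are converted
// to the expected type when a safe conversion exists.
function coerceToModel(data: Record<string, unknown>, model: TypeModel): Record<string, unknown> {
  const out: Record<string, unknown> = { ...data };
  for (const field of Object.keys(model)) {
    if (!(field in data)) continue;
    const value = data[field];
    const expected = model[field];
    if (typeof value === expected) continue; // already the right type

    if (expected === "number" && typeof value === "string" && value.trim() !== "" && isFinite(Number(value))) {
      out[field] = Number(value); // "12.5" -> 12.5
    } else if (expected === "string" && (typeof value === "number" || typeof value === "boolean")) {
      out[field] = String(value); // 7 -> "7"
    } else if (expected === "boolean" && (value === "true" || value === "false")) {
      out[field] = value === "true"; // "true" -> true
    }
    // Anything else is left as-is; a real system would report the mismatch instead.
  }
  return out;
}
```

For instance, with the model `{ price: "number" }`, a response field `price: "12.5"` would be returned as the number `12.5`.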
8 Exception testing
8.1 Active exception testing
Write exception test cases and add them to the automated test system. Whenever an exception is found during testing or in operation, add it to the existing list of exception test cases.
8.2 Random exception testing
Simulate the real environment: in a simulator, imitate the random operations of real users by using automation scripts to generate code for random actions and execute it.
Define what counts as an exception, for example a pop-up box whose content includes specific text. Record these test results and then run cluster and statistical analysis on them; this is also very helpful for guarding against exceptions.
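A minimal sketch of such a random test loop is shown below. The action types, the `perform` callback (standing in for the simulator), and the pattern list are assumptions made for the example. A seeded linear congruential generator is used instead of `Math.random` so that a failing run can be replayed from its seed; that choice is also an assumption, not something the article requires.

```typescript
type Action =
  | { kind: "click"; target: string }
  | { kind: "input"; target: string; text: string };

// Small seeded pseudo-random generator (linear congruential), returns values in [0, 1).
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Generates `count` random user actions on the given targets (e.g. CSS selectors).
function generateActions(seed: number, count: number, targets: string[]): Action[] {
  const rand = lcg(seed);
  const actions: Action[] = [];
  for (let i = 0; i < count; i++) {
    const target = targets[Math.floor(rand() * targets.length)] ?? "body";
    if (rand() < 0.5) {
      actions.push({ kind: "click", target });
    } else {
      actions.push({ kind: "input", target, text: `text-${Math.floor(rand() * 1000)}` });
    }
  }
  return actions;
}

// Exception definition from the text: a pop-up whose content includes specific text.
function isException(popupText: string | null, patterns: string[]): boolean {
  if (popupText === null) return false;
  const text: string = popupText;
  return patterns.some((p) => text.indexOf(p) !== -1);
}

interface Failure {
  step: number;
  action: Action;
  popupText: string;
}

// Runs the actions through `perform`, which is expected to return the text of
// any pop-up that appeared (or null), and records every step that counts as an exception.
function runRandomTest(
  actions: Action[],
  perform: (action: Action) => string | null,
  patterns: string[],
): Failure[] {
  const failures: Failure[] = [];
  actions.forEach((action, step) => {
    const popupText = perform(action);
    if (popupText !== null && isException(popupText, patterns)) {
      failures.push({ step, action, popupText });
    }
  });
  return failures;
}
```

The recorded `Failure` list, together with the seed, is the raw material for the later statistical analysis.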
9.1 Multiple clients
A user may log in on different terminals, or have different states before and after login. Generate a requestId with a specific algorithm; the requestId identifies the user's sequence of operations on a single client, and the log sequence can then be used to reconstruct the exact path that led the user to the exception.
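The article does not say which algorithm generates the requestId. The sketch below assumes one simple possibility, a per-client identifier followed by an increasing sequence number, purely to show how the operation path can be rebuilt from the logs; `createRequestIdFactory`, `operationPath`, and the `clientId.seq` format are all hypothetical.

```typescript
interface LogEntry {
  requestId: string; // assumed format: "<clientId>.<sequence number>"
  action: string;
}

// Returns a generator of request IDs for one client (one terminal or session).
function createRequestIdFactory(clientId: string): () => string {
  let seq = 0;
  return () => {
    seq += 1;
    return `${clientId}.${seq}`;
  };
}

// Rebuilds the ordered list of actions a user performed on one client,
// regardless of the order in which the logs arrived.
function operationPath(logs: LogEntry[], clientId: string): string[] {
  const prefix = `${clientId}.`;
  return logs
    .filter((l) => l.requestId.indexOf(prefix) === 0)
    .map((l) => ({ seqText: l.requestId.slice(prefix.length), action: l.action }))
    .filter((x) => /^\d+$/.test(x.seqText)) // ignore IDs of clients that merely share the prefix
    .map((x) => ({ seq: Number(x.seqText), action: x.action }))
    .sort((a, b) => a.seq - b.seq)
    .map((x) => x.action);
}
```

Using a different `clientId` per terminal and per login state keeps the sequences of the same user on different clients apart.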
9.2 Integration convenience
The front end is written as a package; a single global reference handles most of the logging, storage, and reporting. For special logic, specific methods can be called to log.
The back end is decoupled from the application's own business code. It can be an independent service that interacts with third-party applications through interfaces. With integrated deployment, the system can be scaled or migrated at any time.
9.3 Extensibility of management system
The whole system is extensible: it does not serve just a single application but supports multiple applications running at the same time. All applications under the same team can be managed on the same platform.
9.4 Log system permissions
Different people have different permissions when accessing the log system. A visitor can only view the applications related to them. If some statistics are sensitive, permissions can be set separately, and sensitive data can be desensitized.
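A minimal sketch of these two rules is given below. The `Viewer` and `LogRecord` shapes, the `phone` field, and the masking rule (keep only the last four characters) are assumptions chosen for illustration.

```typescript
// Hypothetical viewer and log shapes.
interface Viewer {
  apps: string[]; // applications this person is related to
  canViewSensitive: boolean; // separately granted permission
}

interface LogRecord {
  app: string;
  message: string;
  phone?: string; // example of a sensitive field
}

// Replaces everything except the last four characters with "*".
function maskPhone(phone: string): string {
  return phone.slice(0, -4).replace(/./g, "*") + phone.slice(-4);
}

// Returns only the logs of the viewer's own applications, with sensitive
// fields masked unless the viewer holds the extra permission.
function visibleLogs(viewer: Viewer, logs: LogRecord[]): LogRecord[] {
  return logs
    .filter((r) => viewer.apps.indexOf(r.app) !== -1)
    .map((r) => {
      if (viewer.canViewSensitive || r.phone === undefined) return r;
      return { ...r, phone: maskPhone(r.phone) };
    });
}
```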
10.1 Performance monitoring
Exception monitoring mainly targets code-level errors, but performance anomalies deserve attention too. Performance monitoring mainly includes:
- Runtime performance: file level, module level, function level, algorithm level
- Network request rate
- System performance
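Function-level runtime measurement from the list above can be sketched as a wrapper that times each call and hands the result to a reporter. `timed`, `TimingSample`, and the `report` callback are hypothetical names, the sketch only covers synchronous functions, and the clock is injectable so that the wrapper can be tested deterministically.

```typescript
interface TimingSample {
  name: string;
  durationMs: number;
}

// Wraps a synchronous function so that every call is timed and reported,
// even when the wrapped function throws.
function timed<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
  report: (sample: TimingSample) => void,
  now: () => number = Date.now,
): (...args: A) => R {
  return (...args: A): R => {
    const start = now();
    try {
      return fn(...args);
    } finally {
      report({ name, durationMs: now() - start });
    }
  };
}
```

In a real front end, `report` would push the sample into the same queue that is used for exception logs, and samples above a threshold could be treated as performance anomalies.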
10.2 API monitoring
The back-end API also has a big impact on the front end. Although the front-end code controls logic as well, the data returned by the back end is the foundation, so API monitoring can be divided into:
- Stability monitoring
- Data format and type
- Error reporting monitoring
- Data accuracy monitoring
10.3 Data desensitization
Sensitive data should not be collected by the log system. Storage in a log system is relatively open: although the data is important, most log systems do not store it at a confidential level. Therefore, if the application involves sensitive data, it is best to:
- Deploy independently; do not share the monitoring system with other applications
- Do not collect the specific data, only user operation data; when reproducing an issue, the log information can be used to fetch the data from the API and display the result
- Encrypt the logs, with protection at both the software and hardware levels
- When necessary, collect the IDs of specific data for debugging; when reproducing the scene, substitute mock data, which the back end can generate from a fake data source
- Obfuscate sensitive data
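The last two points can be sketched as a collection-time filter that keeps identifiers but replaces the values of sensitive fields before anything is written to the log. `sanitizeForLog`, the `[REDACTED]` placeholder, and the idea of passing an explicit list of sensitive keys are assumptions made for the example.

```typescript
// JSON-like payload that is about to be logged.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const REDACTED = "[REDACTED]";

// Returns a deep copy of `value` in which every property whose name is in
// `sensitiveKeys` is replaced by a placeholder. IDs and other fields are kept.
function sanitizeForLog(value: Json, sensitiveKeys: string[]): Json {
  if (Array.isArray(value)) {
    return value.map((v) => sanitizeForLog(v, sensitiveKeys));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child === undefined) continue;
      out[key] = sensitiveKeys.indexOf(key) !== -1 ? REDACTED : sanitizeForLog(child, sensitiveKeys);
    }
    return out;
  }
  return value; // primitives are returned unchanged
}
```

With `orderId` kept and `card` redacted, a later debugging session can still look up the order through the API or replace it with mock data, without the sensitive value ever reaching the log store.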
This article mainly studies the overall framework of front-end exception monitoring and does not go into specific technical implementations. It covers the front-end part, the back-end part, and some knowledge points related to the problem as a whole. The focus is on the front end, which overlaps with and branches off from back-end monitoring. The monitoring requirements and strategies of a project need to be worked out through practice in that project.
Thanks for reading. This article is from Tencent CDC; please indicate the source when reprinting. Thank you for your cooperation.