
Using the AWS Elasticsearch Service to analyze application logs

Recently the company's system was upgraded, and some API endpoints now receive on the order of 100,000 requests per day. At present the company's logs are all written to files. To analyze this log data more effectively, the company adopted the AWS Elasticsearch Service. This article records how to use Elasticsearch on AWS and the pitfalls you need to watch out for.

 

1. Prerequisites

1. Register an AWS account (registration requires a credit card!).

2. Enable the Elasticsearch Service (described in detail later in this article); see also the Chinese-language introduction to the ES service. ES is not completely free at the moment: for the first 12 months you get 750 hours per month of t2.small.elasticsearch or t3.small.elasticsearch instance usage plus 10 GB per month of optional EBS storage (magnetic or general purpose). For the latest information about the ES free tier, see AWS ElasticSearch Free Tier.

While enabling the ES service, if you only want to use the free tier, make sure you choose the right instance type, disk capacity, disk type, and so on. It is worth reading AWS Free Tier carefully before enabling the ES service.

3. Familiarize yourself with the AWS Elasticsearch developer documentation; it is currently only available in English.

 

2. Creating the Elasticsearch Service

This walkthrough of creating the ES service is based on the AWS ES free tier as of the time of writing (2021-01-26). Before you start, check the latest AWS free-tier pricing (AWS Free Tier) and the Chinese-language introduction to the ES service.

Note: this article uses Elasticsearch 7.9. The settings below, and the mapping used when uploading data, differ slightly between versions.

1. Select the deployment type; here choose Development and testing.

2. Select the data node. Currently both t3.small.elasticsearch and t2.small.elasticsearch are free-tier eligible, but t2 is not as convenient as t3 for the later data-verification steps, so t3.small.elasticsearch is used as the example here.

3. Network configuration: choose Public access.

4. Enable fine-grained access control and choose Create master user, then fill in a user name and password. This user name and password will be used later for data ingestion and for Kibana authentication.

5. Skip SAML authentication and Amazon Cognito authentication for now.

 

 

6. Access policy: to simplify the following steps, choose Allow open access to the domain (a sketch of the resulting policy document is shown below).
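For reference, an open-access policy generated by the console looks roughly like the JSON below. This is only a sketch: the region, account ID, and domain name are placeholders, and an open policy means anyone on the internet can reach the domain, so the fine-grained access control user from step 4 is what actually protects it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-test-domain/*"
    }
  ]
}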

 

3. Data ingestion

  Data ingestion (that is, getting data into the ES service) can be done in many ways. ES ingests data through its REST API, so anything that can send HTTP REST requests can complete the ingestion process.
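To make this concrete, here is a minimal sketch of sending a single document over HTTPS with Python's requests library. end_point, username, and password are the same placeholders used throughout this article (the domain endpoint and the master user created with fine-grained access control), and the mapping for the logs index is set up later in this section.

# A minimal sketch: any HTTP client can ingest a document through the Elasticsearch REST API.
# 'end_point', 'username', and 'password' are placeholders for your domain endpoint and
# the master user created when enabling fine-grained access control.
import requests

doc = {"ip_addr": "185.220.70.83", "method": "GET", "responsecode": "200", "path": "/index.php"}
resp = requests.post(
    "https://end_point/logs/_doc",   # POST <index>/_doc indexes a single document
    json=doc,
    auth=("username", "password"),   # HTTP basic auth against the master user
)
print(resp.status_code, resp.json())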

The official documentation also gives a detailed introduction to data ingestion:

  • Ingesting data into Elasticsearch with the command-line tool curl.
  • Sample code for Elasticsearch data ingestion (Java, Python, Go, Ruby, Node).
  • Importing data into Elasticsearch from other Amazon products (from Amazon S3, Amazon Kinesis Data Streams, Amazon DynamoDB, Amazon Kinesis Data Firehose, Amazon CloudWatch, and AWS IoT).
  • Ingesting data with the open-source framework Logstash. To learn more about Logstash, see Get Started With Logstash.

The open-source framework Logstash is currently the most widely used ingestion framework, and the combination of ES + Logstash + Filebeat + Kibana is very powerful. I will introduce Logstash in the next article. This article first uploads the log data directly with Python code and discusses the limitations of direct upload.

 

Log file: logs.txt

185.220.70.83    2016-07-15 15:29:50+0800    GET    200    /index.php    50.1049,8.6295
124.200.101.56    2016-07-16 15:29:50+0800    POST    200    /register.php    39.9285,116.385
104.233.154.203    2016-07-17 15:29:50+0800    POST    404    /login.php    37.751,-97.822
104.233.154.203    2016-07-17 15:29:50+0800    POST    404    /API.php    37.751,-97.822
104.233.154.203    2016-07-18 15:29:50+0800    POST    200    /API.php    37.751,-97.822
43.251.227.108    2016-07-19 15:29:50+0800    POST    200    /index.php    22.2578,114.1657

This log format is very simple. The fields in each row are separated by tabs: IP, time, HTTP method, status code, access path, and coordinates. A quick note on the IP and coordinates: the coordinates are the geographic location of the IP address. Some readers may wonder why the log records both the IP address and its coordinates; can't Elasticsearch convert the IP address into coordinates itself? The answer: ES's geoip plugin can convert an IP into coordinates, but AWS ES does not have that plugin installed. Readers can check which operations AWS ES supports before testing (see Plugins by Elasticsearch Version); hopefully AWS will add this plugin in the future.

Because AWS currently cannot generate coordinates directly from the IP, the author provides the coordinates as extra information in the log; they were obtained by querying MaxMind. MaxMind offers free IP-to-coordinate data files and rich demos, and it is quick and convenient to use.

Ordinary logs usually do not record coordinate information. Readers can use the IP database provided by MaxMind to look up the corresponding coordinates before the program uploads the data, as sketched below. Besides uploading from a program, readers can also use the Logstash framework.
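As a sketch of that lookup step (assuming you have downloaded the free GeoLite2-City.mmdb database from MaxMind and installed its official geoip2 Python package), the enrichment could look like this:

# A sketch of looking up coordinates for an IP before uploading, using MaxMind's geoip2
# package and the free GeoLite2-City.mmdb database (both assumed to be downloaded already).
import geoip2.database

reader = geoip2.database.Reader("GeoLite2-City.mmdb")

def ip_to_geoloc(ip):
    # Return "lat,lon" in the same format as the coordinates column of logs.txt
    rec = reader.city(ip)
    return f"{rec.location.latitude},{rec.location.longitude}"

print(ip_to_geoloc("185.220.70.83"))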

Before officially uploading the data, we need to tell ES in advance which data types to use. Common data types (such as string, integer, and decimal) do not need to be specified, but the date type, the coordinate (geo_point) type, and the IP type must be declared to the ES service first, so that the data can be parsed correctly when it is uploaded later. This command only needs to be executed once before uploading data; here curl is used to run it.

curl -XPUT -u 'username:password' 'https://end_point/logs' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" },
      "datetime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ssZ" },
      "ip_addr":  { "type": "ip" }
    }
  }
}'

In the command above, -u 'username:password' is the user name and password, i.e. the master user and password created when fine-grained access control was enabled. In 'https://end_point/logs', end_point is the endpoint address of the ES service, which you can find in the console; the index name is appended after the endpoint, here logs. The properties location, datetime, and ip_addr are declared as the geo_point, date, and ip types respectively; these properties are used when uploading the data below.
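If you want to confirm that the mapping was registered, one quick check (a sketch, reusing the same placeholder endpoint and credentials) is to query the index's _mapping endpoint:

# Optional check: fetch the index mapping and confirm the declared field types.
# 'end_point', 'username', and 'password' are the same placeholders as in the curl command.
import requests

resp = requests.get("https://end_point/logs/_mapping", auth=("username", "password"))
print(resp.json())  # location should be geo_point, datetime should be date, ip_addr should be ip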

Python script: update.py

# Install elasticsearch, requests-aws4auth, and requests with pip before running
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import json

# Service endpoint host, without the https:// prefix, e.g. my-test-domain.us-east-1.es.amazonaws.com
host = 'end_point'
# The user name and password of the master user created when fine-grained access control was enabled
awsauth = ('username', 'password')

es = Elasticsearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

bulk_file = ''
id = 1

# Open logs.txt and build the bulk request body line by line
file = open("logs.txt", "r")
for line in file:
    ip = line.split("\t")[0]
    datetime = line.split("\t")[1]
    method = line.split("\t")[2]
    responsecode = line.split("\t")[3]
    path = line.split("\t")[4]
    geoloc = line.split("\t")[5].rstrip()

    # ip_addr: ip type, datetime: date type, location: geo_point type
    index = { 'ip_addr': ip, 'datetime': datetime, 'method': method, 'responsecode': responsecode, 'path': path, 'location': geoloc }
    bulk_file += '{ "index" : { "_index" : "logs", "_type" : "_doc", "_id" : "' + str(id) + '" } }\n'
    bulk_file += json.dumps(index) + '\n'
    id += 1

# Upload all the data in one bulk request
res = es.bulk(bulk_file)
print(res)

Run the script; once you see the success response, the upload has completed.
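As an extra sanity check (a sketch, reusing the es client from update.py), you can count the indexed documents and pull one back with a simple search:

# Quick sanity check (sketch): reuse the 'es' client from update.py after the bulk upload.
res = es.search(index="logs", body={"query": {"match_all": {}}, "size": 1})
print(res["hits"]["total"])    # in ES 7 this is a dict such as {'value': 6, 'relation': 'eq'}
print(res["hits"]["hits"][0])  # one sample document as stored in the index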

Note: this article uses Elasticsearch 7.9. For other Elasticsearch versions, the mappings and the format of the uploaded data may differ.

4. Accessing Kibana

After the data is uploaded successfully, the next step is visual analysis. Click the Kibana management interface link in the console; after entering the user name and password, you land in the Kibana management interface.

 

Create the Discovery and Visualization components. I built four visualizations here: a time histogram based on datetime, a geographic map based on location, a pie chart based on responsecode, and another pie chart based on method. Then create a Dashboard and put these visualizations and the Discovery together. These operations are all done by clicking through the UI, so they are not shown step by step here; the resulting Dashboard graph is shown below.

From the Dashboard you can now see where the traffic comes from on the map, the number of visits in a given time period, which interfaces receive how much traffic, and so on. You can also build more complex views in Kibana to help you mine and analyze the logs.


