Principle of Elasticsearch (6): An in-depth understanding of Elasticsearch storage

This article takes a closer look at Elasticsearch storage: how the data we write to Elasticsearch is stored on a node.

Elasticsearch paths

Elasticsearch mainly uses the following paths:

  • path.home: the home directory used by the running Elasticsearch process; defaults to the Java system property user.dir
  • path.conf: the directory containing the Elasticsearch configuration files
  • path.plugins: the directory where third-party Elasticsearch plugins are installed
  • path.work: a directory for working and temporary files; now deprecated
  • path.logs: the directory where Elasticsearch logs are stored
  • path.data: the directory where Elasticsearch data is stored

This article examines the storage structure of the path.data directory in detail.

path.data storage details

Because Elasticsearch is built on top of Lucene, the index files under path.data are mainly produced by Lucene. Elasticsearch and Lucene each have their own responsibilities: Lucene writes and maintains the index files, while Elasticsearch maintains metadata on top of Lucene, such as mappings and the cluster state. Elasticsearch also fills in what Lucene cannot do on its own.

See also: Principle of Elasticsearch (2): Index storage

Elasticsearch Storage

Node Data

data(path.data)
└── elasticsearch
    └── nodes
        └── 0
            ├── _state
            │   └── global-0.st
            ├── indices
            └── node.lock

  • node.lock: ensures that only one Elasticsearch instance at a time can read from and write to a data directory.
  • global-0.st: a binary file that stores the cluster state. The number after global is the version of the cluster state. During a master election, the cluster state versions stored on the individual nodes are not necessarily the same; once a master has been elected, the cluster state with the highest version number is adopted.
  • indices: the directory that holds index data; its structure is covered in the Index Data section.

In theory these files can be modified with a suitable editor, but doing so is not recommended because it may cause data loss.
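
The naming scheme described above (a prefix, a version number, and the .st extension) can be sketched in Python. This only illustrates the file name convention and the "highest version wins" rule; it does not read the binary contents. The helper names parse_state_filename and latest_state_file are hypothetical and are not part of Elasticsearch.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches names such as "global-0.st" or "state-12.st":
# a lowercase prefix, a dash, a decimal version number, and ".st".
_STATE_FILE = re.compile(r"^(?P<prefix>[a-z]+)-(?P<version>\d+)\.st$")


def parse_state_filename(name: str) -> Optional[Tuple[str, int]]:
    """Return (prefix, version) for a state file name, or None if it does not match."""
    match = _STATE_FILE.match(name)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("version"))


def latest_state_file(names: Iterable[str]) -> Optional[str]:
    """Return the state file name with the highest version, or None if there is none."""
    candidates = []
    for name in names:
        parsed = parse_state_filename(name)
        if parsed is not None:
            # Keep (version, name) so the version drives the comparison.
            candidates.append((parsed[1], name))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

For example, parse_state_filename("global-0.st") yields the prefix "global" and version 0, while a name such as "node.lock" is rejected because it does not follow the scheme.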

Index Data

As shown above, the indices folder stores the index data. Its directory structure is as follows:

indices
└── index_id
    ├── shard_id        (for example 0)
    └── _state
        └── state-0.st

  • index_id: the unique identifier of the index. Internally, Elasticsearch uses this identifier to tell indexes apart.
  • shard_id: the shard number, starting from 0 and increasing.
  • state-0.st: a binary file that holds the index state, such as the index creation time and settings. The number after state is the version number, as with the cluster state file.
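
The layout above can be explored with a short Python sketch that lists the shard numbers found under each index directory. It assumes only what the bullets describe: shard directories are named by number and sit next to the index-level _state directory. The function name list_shards is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def list_shards(indices_dir: Path) -> Dict[str, List[int]]:
    """Map each index directory name to the sorted shard numbers found inside it."""
    layout: Dict[str, List[int]] = {}
    for index_dir in sorted(p for p in indices_dir.iterdir() if p.is_dir()):
        # Shard directories have numeric names; "_state" is skipped
        # because it holds the index state file, not shard data.
        shards = sorted(
            int(child.name)
            for child in index_dir.iterdir()
            if child.is_dir() and child.name.isdecimal()
        )
        layout[index_dir.name] = shards
    return layout
```

Pointing it at the indices directory from the node tree (for instance Path("data/elasticsearch/nodes/0/indices")) would return one entry per index. The sketch only reads directory names and never touches the files themselves.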

Shard Data

Shard data is stored in the shard_id directory mentioned above; each shard has its own directory.

0
├── index
├── _state
│   └── state-0.st
└── translog

  • index: the directory containing all of the shard's Lucene index files
  • state-0.st: stores the shard state; the number after state is the version number
  • translog: the directory that stores the Elasticsearch transaction log

Lucene Index Files

Lucene documents its index files very well, so let's look inside the Lucene index directory.

Name                    Extension     Brief Description
Segments File           segments_N    Stores information about a commit point
Lock File               write.lock    The write lock prevents multiple IndexWriters from writing to the same file.
Segment Info            .si           Stores metadata about a segment
Compound File           .cfs, .cfe    An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles.
Fields                  .fnm          Stores information about the fields
Field Index             .fdx          Contains pointers to field data
Field Data              .fdt          The stored fields for documents
Term Dictionary         .tim          The term dictionary, stores term info
Term Index              .tip          The index into the Term Dictionary
Frequencies             .doc          Contains the list of docs which contain each term along with frequency
Positions               .pos          Stores position information about where a term occurs in the index
Payloads                .pay          Stores additional per-position metadata information such as character offsets and user payloads
Norms                   .nvd, .nvm    Encodes length and boost factors for docs and fields
Per-Document Values     .dvd, .dvm    Encodes additional scoring factors or other per-document information.
Term Vector Index       .tvx          Stores offset into the document data file
Term Vector Documents   .tvd          Contains information about each document that has term vectors
Term Vector Fields      .tvf          The field level info about term vectors
Live Documents          .liv          Info about which documents are live
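
As a rough illustration of how the table can be used, the following Python sketch maps a file name from a shard's index directory to the corresponding name in the table. The mapping is copied from the table; the function name describe and the sample file names in the usage note are hypothetical.

```python
from pathlib import PurePath

# Extension -> name, copied from the table above.
LUCENE_EXTENSIONS = {
    ".si": "Segment Info",
    ".cfs": "Compound File",
    ".cfe": "Compound File",
    ".fnm": "Fields",
    ".fdx": "Field Index",
    ".fdt": "Field Data",
    ".tim": "Term Dictionary",
    ".tip": "Term Index",
    ".doc": "Frequencies",
    ".pos": "Positions",
    ".pay": "Payloads",
    ".nvd": "Norms",
    ".nvm": "Norms",
    ".dvd": "Per-Document Values",
    ".dvm": "Per-Document Values",
    ".tvx": "Term Vector Index",
    ".tvd": "Term Vector Documents",
    ".tvf": "Term Vector Fields",
    ".liv": "Live Documents",
}


def describe(filename: str) -> str:
    """Return the table's name for a Lucene index file, or "Unknown"."""
    # The first two rows of the table are identified by name, not by extension.
    if filename.startswith("segments_"):
        return "Segments File"
    if filename == "write.lock":
        return "Lock File"
    return LUCENE_EXTENSIONS.get(PurePath(filename).suffix, "Unknown")
```

With illustrative names, describe("segments_2") returns "Segments File" and describe("_0.cfs") returns "Compound File". Anything that is not in the table falls through to "Unknown".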

Summary

In this article we looked at the files Elasticsearch writes to the data directory at each level: node, index, and shard. We also saw where Lucene stores its index on disk. It is best not to modify these files by hand.

More: the Elasticsearch In-Depth Understanding column
——————————————————————————————————
Author: Peach blossom cherishes spring breeze
When reprinting, please cite the source. Original address:
https://blog.csdn.net/xiaoyu_BD/article/details/82423749
If you found this article helpful, your support is my biggest motivation for writing. Thank you!
