In this article, we take a closer look at Elasticsearch storage: how the data we write to Elasticsearch is stored on a node.
Elasticsearch mainly uses the following paths:
- path.home: the home directory of the user running the Elasticsearch process; defaults to the Java system property user.dir
- path.conf: the directory holding the Elasticsearch configuration files
- path.plugins: the directory where Elasticsearch installs third-party plugins
- path.work: the directory where Elasticsearch stored working and temporary files; it is now deprecated
- path.logs: the directory holding the Elasticsearch logs
- path.data: the directory holding the Elasticsearch data
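To make the relationship between these settings concrete, here is a minimal sketch of how the effective directories could be resolved. It assumes that each path falls back to a subdirectory of path.home (config, plugins, logs, data) when it is not set explicitly; that fallback, the DEFAULT_SUBDIRS table, and the resolve_paths helper are illustrative assumptions, so check them against the Elasticsearch version you run.

```python
from pathlib import Path

# Assumed fallbacks: each path defaults to a subdirectory of path.home.
# This mapping is an illustration, not taken from the Elasticsearch source.
DEFAULT_SUBDIRS = {
    "path.conf": "config",
    "path.plugins": "plugins",
    "path.logs": "logs",
    "path.data": "data",
}


def resolve_paths(settings: dict) -> dict:
    """Return the effective directory for each path.* setting."""
    home = Path(settings["path.home"])
    resolved = {"path.home": home}
    for key, subdir in DEFAULT_SUBDIRS.items():
        # An explicit setting wins; otherwise use the assumed default.
        resolved[key] = Path(settings[key]) if key in settings else home / subdir
    return resolved
```

For example, calling resolve_paths with only path.home and path.data set would leave the log directory under path.home while the data directory points wherever path.data says.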
This article examines the storage structure of the path.data directory in detail.
Because Elasticsearch is built on top of Lucene, the index files under path.data are mostly produced by Lucene. Elasticsearch and Lucene each have their own responsibilities: Lucene writes and maintains the index files, while Elasticsearch maintains metadata on top of Lucene, such as mappings and the cluster state. Whatever Lucene cannot do is handled by Elasticsearch.
Reference: Elasticsearch Principles (2): Index Storage
│ └── global-0.st
- The node.lock file ensures that only one Elasticsearch instance at a time can read from or write to a data directory.
- The global-0.st file is a binary file that stores the cluster state. The number after global is the version number of the cluster state. While a master is being elected, the cluster state versions stored on the individual nodes are not necessarily the same; once the election succeeds, the cluster state with the highest version number is adopted.
- The indices directory is described below.
In theory, these files can be modified with a suitable editor, but doing so is not recommended, because it may cause data loss.
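The versioned file naming described above (global-N.st, where N is the version number) can be illustrated with a small sketch that picks the highest-versioned state file from a list of file names. The latest_state_file helper is hypothetical and only parses names; it does not read the binary contents and says nothing about how Elasticsearch itself selects a state.

```python
import re
from typing import Iterable, Optional


def latest_state_file(names: Iterable[str], prefix: str = "global") -> Optional[str]:
    """Return the '<prefix>-N.st' name with the largest N, or None if there is none."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)\.st$")
    best_version, best_name = -1, None
    for name in names:
        match = pattern.match(name)
        # Compare versions numerically, so that 12 ranks above 3.
        if match and int(match.group(1)) > best_version:
            best_version, best_name = int(match.group(1)), name
    return best_name
```

Comparing the versions as integers matters here: a plain string sort would rank global-3.st above global-12.st.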
As we can see, the indices folder mainly stores index data. Here is the structure of the indices directory:
- index_id is the unique identifier of the index; internally, Elasticsearch uses this identifier to tell indexes apart.
- shard_id is the shard number, starting from 0 and increasing.
- The state-0.st file stores the index state, such as the index creation time and settings. It is also a binary file. The number after state is the version number, just as in the cluster state file.
Shard data is stored in the shard_id directory mentioned above; different shards live in different directories.
- The index directory contains all of the Lucene index files
- The state-0.st file stores the shard state; the number after state is the version number
- The translog directory stores the Elasticsearch transaction log
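The index-level and shard-level layout described above can be sketched as a small function that groups paths by index_id and shard_id and records which shard components (index, translog, state) it finds. The function name summarize_layout and its input format (paths relative to the indices directory, with forward slashes) are assumptions made for illustration; the exact location of the state file inside a shard directory may differ between versions, so the sketch accepts it at any depth.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_layout(relative_paths):
    """Map index_id -> shard_id -> sorted list of components seen in that shard.

    relative_paths: paths relative to the indices directory,
    for example "<index_id>/<shard_id>/index/segments_1".
    """
    layout = defaultdict(lambda: defaultdict(set))
    for raw in relative_paths:
        parts = PurePosixPath(raw).parts
        # Skip index-level files and anything that is not under a numeric shard directory.
        if len(parts) < 3 or not parts[1].isdecimal():
            continue
        index_id, shard_id, rest = parts[0], int(parts[1]), parts[2:]
        if rest[0] in ("index", "translog"):
            layout[index_id][shard_id].add(rest[0])
        elif rest[-1].startswith("state-") and rest[-1].endswith(".st"):
            layout[index_id][shard_id].add("state")
    return {
        index_id: {shard: sorted(parts) for shard, parts in shards.items()}
        for index_id, shards in layout.items()
    }
```

Against a real data directory, the input could be produced with something like p.relative_to(indices_dir).as_posix() for each p in indices_dir.rglob("*"), where indices_dir is a pathlib.Path pointing at the indices folder.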
Lucene documents its index files very well, so let's look inside the Lucene index file directory.
|Name|Extension|Brief Description|
|---|---|---|
|Segments File|segments_N|Stores information about a commit point|
|Lock File|write.lock|The write lock prevents multiple IndexWriters from writing to the same file.|
|Segment Info|.si|Stores metadata about a segment|
|Compound File|.cfs, .cfe|An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles.|
|Fields|.fnm|Stores information about the fields|
|Field Index|.fdx|Contains pointers to field data|
|Field Data|.fdt|The stored fields for documents|
|Term Dictionary|.tim|The term dictionary, stores term info|
|Term Index|.tip|The index into the Term Dictionary|
|Frequencies|.doc|Contains the list of docs which contain each term along with frequency|
|Positions|.pos|Stores position information about where a term occurs in the index|
|Payloads|.pay|Stores additional per-position metadata information such as character offsets and user payloads|
|Norms|.nvd, .nvm|Encodes length and boost factors for docs and fields|
|Per-Document Values|.dvd, .dvm|Encodes additional scoring factors or other per-document information.|
|Term Vector Index|.tvx|Stores offset into the document data file|
|Term Vector Documents|.tvd|Contains information about each document that has term vectors|
|Term Vector Fields|.tvf|The field level info about term vectors|
|Live Documents|.liv|Info about what documents are live|
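As a way of reading a shard's index directory against the table above, the following sketch labels file names by extension and groups them by segment. The extension mapping is copied from the table; the helper names classify and group_by_segment are hypothetical, and the assumption that per-segment files start with an underscore followed by a lowercase alphanumeric segment name (for example _0.cfs) should be checked against the Lucene version in use.

```python
import re
from collections import defaultdict

# Extension -> component name, copied from the table above.
LUCENE_EXTENSIONS = {
    "si": "Segment Info", "cfs": "Compound File", "cfe": "Compound File",
    "fnm": "Fields", "fdx": "Field Index", "fdt": "Field Data",
    "tim": "Term Dictionary", "tip": "Term Index", "doc": "Frequencies",
    "pos": "Positions", "pay": "Payloads", "nvd": "Norms", "nvm": "Norms",
    "dvd": "Per-Document Values", "dvm": "Per-Document Values",
    "tvx": "Term Vector Index", "tvd": "Term Vector Documents",
    "tvf": "Term Vector Fields", "liv": "Live Documents",
}

# Assumed naming: "_<segment>" followed by "." or "_" (for example "_0.cfs").
SEGMENT_PREFIX = re.compile(r"^(_[0-9a-z]+)[_.]")


def classify(filename: str) -> str:
    """Return the table's component name for a Lucene index file name."""
    if filename.startswith("segments_"):
        return "Segments File"
    if filename == "write.lock":
        return "Lock File"
    _, dot, ext = filename.rpartition(".")
    return LUCENE_EXTENSIONS.get(ext, "Unknown") if dot else "Unknown"


def group_by_segment(filenames):
    """Map each segment name to the sorted component names of its files."""
    groups = defaultdict(list)
    for name in filenames:
        match = SEGMENT_PREFIX.match(name)
        if match:  # commit-level files such as segments_N are skipped
            groups[match.group(1)].append(classify(name))
    return {segment: sorted(kinds) for segment, kinds in groups.items()}
```

Feeding it the output of os.listdir on an index directory gives a quick overview of which segments exist and whether they use compound files. It only inspects names and never opens the files, in keeping with the advice not to modify them.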
In this article, we looked at the files Elasticsearch writes to the data directory at every level: node, index, and shard. We have seen where Lucene stores its index on disk, and it is best not to try to modify these files.
Author: Peach blossom cherishes spring breeze