
[Hadoop 3.x series] Common Hadoop file storage formats and using the BigData File Viewer tool (III)

Preface

Most Hadoop articles on this blog are still at the Hadoop 2.x stage. This series is based on the full Hadoop 3.x big data tutorial from Dark Horse Programmer, and it supplements and updates the new features that 2.x does not have. Give it a like, a favorite, and a follow so you don't get lost next time!

Previous articles

[Hadoop 3.x series] Using the HDFS REST HTTP API (I): WebHDFS

[Hadoop 3.x series] Using the HDFS REST HTTP API (II): HttpFS

Common Hadoop file storage formats

Common file storage formats in traditional systems

Windows has many file formats. For example, JPEG files store pictures, MP3 files store music, and DOC files store Word documents. Each file format stores a particular kind of data: we don't store music or pictures as text. Windows supports many storage formats.


File system block size

- A server or computer has various block devices, for example hard disks, CD-ROMs, and floppy disks.

- Every file system splits a partition into multiple blocks that are used to store files. Different file systems have different block sizes.

[[email protected] ~]# stat -f .
  File: "."
    ID: fd0000000000 Namelen: 255     Type: xfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 15144730   Free: 11924333   Available: 11924333
Inodes: Total: 30304256   Free: 30139006

For example, the output shows that the file system's block size is 4096 bytes = 4 KB. Even if we only need to store 5 bytes of data on disk, that data will still occupy 4096 bytes of space.
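The rounding described above can be sketched in a few lines of Python. The function name `allocated_bytes` is made up for this illustration, and the sketch assumes a simple file system that always allocates whole blocks (real file systems may add optimizations such as inline data or sparse files).

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Space a file occupies if the file system allocates whole blocks.

    Hypothetical helper for illustration only; block_size defaults to the
    4096 bytes reported by `stat -f` in the example above.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Round the number of blocks up with integer arithmetic.
    blocks = (file_size + block_size - 1) // block_size
    return blocks * block_size


# 5 bytes of data still take up one full block.
print(allocated_bytes(5))
# One byte past a block boundary needs a second block.
print(allocated_bytes(4097))
```

The same arithmetic explains why many tiny files waste space: each one pays for at least one full block.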

File storage formats in Hadoop

Next, we will look at data storage formats in Hadoop. The file formats used on Hadoop are certainly not as varied as those on Windows, because at this point we use Hadoop to store and process data. We won't use Hadoop to listen to music, watch movies, or play games. :)


- A file format defines how data is stored in the file system. A file can hold various data structures, such as rows, maps, arrays, strings, and numbers.

- Hadoop has no default file format; the choice of format depends on what it will be used for. Choosing a good, suitable data storage format is very important.

- As we will learn later, the biggest performance bottlenecks for applications that use HDFS (for example MapReduce or Spark) are the time it takes to find data at a specific location and the time it takes to write data to another location. Managing the processing and storage of large amounts of data is also complex, for example when the data format keeps changing: a row that originally had 12 columns later has to store 20.

- Hadoop file formats have been evolving for a long time, and together they solve most problems. In big data development, choosing the right file format can bring some clear benefits:

  1. Fast writes

  2. Fast reads

  3. Splittable files

  4. Good compression support

  5. Support for schema changes

- Some file formats are designed for general use (for example with MapReduce or Spark), others target more specific scenarios, and some are designed with particular data characteristics in mind. So there really are a lot of options.
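To make the point in the list above about files that can be split a bit more concrete, here is a minimal sketch of how a large file could be cut into byte ranges that separate tasks process in parallel. The function `compute_splits` is hypothetical, and the 128 MB split size is only an assumption chosen to match the usual HDFS default block size. This is a simplification, not Hadoop's actual split logic.

```python
from typing import List, Tuple

MB = 1024 * 1024


def compute_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges that together cover the file.

    Hypothetical helper: each range could be handed to a separate task,
    provided the file format lets a reader start at that offset.
    """
    if file_size < 0 or split_size <= 0:
        raise ValueError("file_size must be >= 0 and split_size must be > 0")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split is shorter when the file size is not a multiple
        # of the split size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


# A 300 MB file with an assumed 128 MB split size.
for offset, length in compute_splits(300 * MB, 128 * MB):
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

A format that is not splittable (for example a single gzip-compressed text file) forces one task to read the whole file, no matter how many ranges we could compute.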

Each format has advantages and disadvantages, and different formats can be used at different stages of data processing to improve efficiency. The goal when choosing a format is to get the most out of its strengths and keep its weaknesses to a minimum.
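The schema change mentioned earlier (a row that had 12 columns and later needs 20) can be illustrated with a small sketch. It only shows the general idea of reading old records with a newer schema by matching columns by name and filling in a default for the missing ones. The names `OLD_COLUMNS`, `NEW_COLUMNS`, and `read_with_schema` are invented for this example and do not come from any specific file format's API.

```python
from typing import Any, Dict, List

# Assumed column names, for illustration only.
OLD_COLUMNS = [f"col{i}" for i in range(1, 13)]   # original 12 columns
NEW_COLUMNS = [f"col{i}" for i in range(1, 21)]   # later 20 columns


def read_with_schema(record: Dict[str, Any], columns: List[str],
                     default: Any = None) -> Dict[str, Any]:
    """Project a record onto the reader's columns.

    Columns missing from the stored record are filled with `default`,
    so old data stays readable after new columns are added.
    """
    return {name: record.get(name, default) for name in columns}


# A record written when the data still had 12 columns.
old_record = {name: index for index, name in enumerate(OLD_COLUMNS, start=1)}

# Reading it with the 20-column schema keeps the old values and
# fills the 8 new columns with the default.
row = read_with_schema(old_record, NEW_COLUMNS)
print(len(row), row["col12"], row["col20"])
```

A format that stores the schema together with the data can do this kind of resolution for us; with plain delimited text we would have to track column positions by hand.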

The BigData File Viewer tool

Introduction

- A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and AVRO. It supports the local file system, HDFS, AWS S3, and more.


GitHub: https://github.com/Eugene-Mark/bigdata-file-viewer

Feature list

- Open and view Parquet, ORC, and AVRO files in a local directory, on HDFS, on AWS S3, etc.

- Convert binary-format data to a text format such as CSV

- Support for complex data types such as arrays, maps, and structs

- Runs on Windows, Mac, Linux, and other platforms

- The code can be extended to cover other data formats
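To get a feel for what converting records with complex types to CSV involves, here is a small standard-library sketch. It is not how BigData File Viewer itself works. The function `to_csv` and the sample record are invented for illustration, and the choice to write arrays and maps as JSON text inside a CSV cell is just one possible convention.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, List


def to_csv(records: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Render records as CSV text.

    Nested values (lists and dicts, standing in for arrays, maps, and
    structs) are serialized as JSON strings so each one fits in a cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        row = []
        for name in columns:
            value = record.get(name)
            if isinstance(value, (list, dict)):
                value = json.dumps(value, sort_keys=True)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical record with an array column and a map column.
records = [{"id": 1, "tags": ["a", "b"], "attrs": {"k": "v"}}]
print(to_csv(records, ["id", "tags", "attrs"]))
```

The CSV writer takes care of quoting the cells that contain commas or quotes, so the nested values can be parsed back with a CSV reader followed by `json.loads`.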

Postscript

Blog home page: https://manor.blog.csdn.net
Likes, favorites, and comments are welcome. Please point out any mistakes!
This article is original work by manor and first appeared on the CSDN blog.

Copyright notice
This article was created by [Manor's big data struggle]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2021/10/20211002003321638a.html
