
Introduction to common tools for big data development

  The Java language and the Linux operating system are the foundation for learning big data.

  Java: you only need to know the basics; doing big data does not require deep Java expertise. Learning Java SE is enough to give you the foundation for big data.

  Linux: because big data software runs on Linux, you have to learn Linux solidly. Learning it well will help you master big-data technologies faster, give you a better understanding of how to configure the runtime and network environment for software such as Hadoop, Hive, HBase and Spark, and save you many detours. Learning shell will make it much easier to understand and configure big data clusters, and a solid Linux foundation also makes it easier to pick up new big data technologies in the future.

  Hadoop: this is the popular big data processing platform of the moment and has almost become a synonym for big data, so it is a must-learn. Hadoop contains three components: HDFS, MapReduce and YARN. HDFS is where data is stored, just as files live on your computer's hard disk; MapReduce processes and computes the data; YARN is the resource manager that lets the other software in the ecosystem run on the cluster. One characteristic of MapReduce is that no matter how large the data is, given enough time it will get through it, though it may not be fast; that is why it is called batch processing.
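
  Below is a minimal sketch of the classic word-count job in Java, just to show what MapReduce code looks like; the class names and the whitespace tokenizing are illustrative choices, not anything prescribed by the article.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: splits each input line into words and emits (word, 1) for every occurrence.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: sums the counts for each word; YARN schedules these tasks across the cluster.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```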

  ZooKeeper: this is something of a universal glue. You will need it when you set up Hadoop HA, and again later when you use HBase. It is generally used to store coordination information that cooperating software depends on, and this information is usually no more than about 1 MB. For us personally, it is enough to install it correctly and keep it running properly.
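
  A minimal sketch of using the ZooKeeper Java client to store and read a small piece of coordination data; the connection string localhost:2181 and the znode path /demo_config are placeholders, not values from the article.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkDemo {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) local ZooKeeper; 30s session timeout, no watcher.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, null);

        // Store a tiny value: ZooKeeper is meant for small coordination data, not bulk data.
        String path = "/demo_config";
        if (zk.exists(path, false) == null) {
            zk.create(path, "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        byte[] data = zk.getData(path, false, null);
        System.out.println("znode value: " + new String(data));
        zk.close();
    }
}
```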

  MySQL: we are learning big data processing, but next comes MySQL, the tool for handling small data, because it is still in wide use. To what level do you need to master MySQL? Being able to install it on Linux, run it, configure simple permissions, change the root password and create a database is enough. Here we mainly study SQL syntax, because Hive's syntax is very similar to it.
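
  A small sketch of the kind of SQL you would practice, driven from Java through JDBC; the URL, credentials and the users table are made-up placeholders, and the MySQL JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; requires mysql-connector-java on the classpath.
        String url = "jdbc:mysql://localhost:3306/testdb";
        try (Connection conn = DriverManager.getConnection(url, "root", "password");
             Statement st = conn.createStatement()) {

            // The same basic SQL carries over almost unchanged to Hive later on.
            st.execute("CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name VARCHAR(50))");
            st.execute("INSERT IGNORE INTO users VALUES (1, 'alice')");

            try (ResultSet rs = st.executeQuery("SELECT id, name FROM users")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```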

  Sqoop: this is used to import data from MySQL into Hadoop. Of course you can also do without it and simply export the MySQL table as a file and put that file onto HDFS yourself; either way, be careful with MySQL when you use it in a production environment.

  Hive: this is a great tool for anyone who knows SQL syntax. It lets you process large amounts of data easily without having to write MapReduce programs. Some people ask about Pig? Pig and Hive are much alike; mastering one of them is enough.
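
  A sketch of running plain SQL against Hive from Java over JDBC; the HiveServer2 address (port 10000), the default database and the words table are assumptions for illustration, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 typically listens on port 10000; address and table are placeholders.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement st = conn.createStatement()) {

            // Plain SQL: Hive compiles this into MapReduce (or Tez/Spark) jobs for you.
            String sql = "SELECT word, COUNT(*) AS cnt FROM words "
                       + "GROUP BY word ORDER BY cnt DESC LIMIT 10";
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
                }
            }
        }
    }
}
```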

   Oozie: now that you have learned Hive, I am sure you will need this tool. It can help you manage your Hive, MapReduce and Spark scripts, check whether your jobs ran correctly, alert you and retry a job when something goes wrong, and, most importantly, help you configure dependencies between tasks. You will definitely like it; otherwise you will be staring at a pile of scripts and a mess of crond entries.

  HBase: this is the NoSQL database in the Hadoop ecosystem. Its data is stored in the form of keys and values, and each key is unique, so it can be used to deduplicate data. Compared with MySQL it can hold far more data, so it is often used as the storage destination after big data processing is finished.
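
  A minimal sketch of key/value access with the HBase Java client; the table user_profile, the column family info and the row key u001 are invented for the example, and hbase-site.xml is assumed to be on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; the ZooKeeper quorum is configured there.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_profile"))) {

            // Write one cell: row key "u001", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("u001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Read it back by row key; lookups by unique key are what HBase is built for.
            Result result = table.get(new Get(Bytes.toBytes("u001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```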

  Kafka: this is a good queuing tool. Why use a queue? Because large amounts of data also need to be queued: for example, with hundreds of GB of files to deal with, you can put the records into a queue one by one and take them out one by one at a pace the consumers can handle. You can also use it to buffer online real-time data before storing it or loading it into HDFS. Here it often works together with a tool called Flume, which is designed to do simple processing of data and write it to various data sinks (such as Kafka).
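
  A minimal sketch of pushing records into a Kafka topic with the Java producer API; the broker address localhost:9092 and the topic log-events are placeholders. In a real pipeline, a collector such as Flume would feed records into something like this.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each line of a huge file can be queued as one record and consumed
            // later at the receiver's own pace: that is the "queue" idea above.
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("log-events", "key-" + i, "line " + i));
            }
        }
    }
}
```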

  Spark: this is used to make up for MapReduce's lack of speed when processing data. Its characteristic is that it loads data into memory for computation instead of reading from hard disks that are painfully slow and have evolved very slowly. It is especially well suited to iterative computation, so people working on algorithms are particularly fond of it. You can operate it with either Java or Scala.
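
  A minimal word-count sketch using Spark's Java API to show the in-memory style of computation; the HDFS input path and the local[*] master are placeholders for illustration.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Load a text file (path is a placeholder) and keep intermediate data in
            // memory, avoiding the repeated disk round trips of classic MapReduce.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.take(10).forEach(t -> System.out.println(t._1() + "\t" + t._2()));
        }
    }
}
```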
