
Building a Hadoop Cluster with Docker

Reposted from: https://blog.csdn.net/lizongti/article/details/102756472

Catalog

Environment preparation

Dependencies

Install Docker

Single-node setup (without Docker)

Installation

Install the JDK

Install Hadoop

Configuration

Environment variables

Set up passwordless login

Modify hadoop-env.sh

HDFS

Create directories

Modify core-site.xml

Modify hdfs-site.xml

Format HDFS

Start HDFS

HDFS Web UI

Test HDFS

YARN

Modify mapred-site.xml

Modify yarn-site.xml

Start YARN

YARN Web UI

Test YARN

Cluster setup (without Docker)

Preparation

Configure the master

Configuration example

Modify hdfs-site.xml

Modify masters

Delete slaves

Copy to the slave nodes

Modify slaves

HDFS

Clear the directories

Format HDFS

Start HDFS

Test HDFS

YARN

Start YARN

Test YARN


Environment preparation

Dependencies

CentOS 7.6

Install Docker

For installation, see the reference guide (linked in the original article).

Single-node setup (without Docker)

Installation

Install the JDK

Download the JDK 1.8 tar.gz package from the official website. If you install via yum or from the rpm package instead, some files needed by Scala 2.11 will be missing.

mkdir -p /usr/lib/jvm
tar xf jdk-8u221-linux-x64.tar.gz -C /usr/lib/jvm
rm -rf /usr/bin/java
ln -s /usr/lib/jvm/jdk1.8.0_221/bin/java /usr/bin/java

Edit the file

vim /etc/profile.d/java.sh

Add

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_221
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$PATH

Then make the environment variables take effect

source /etc/profile

Execute the following command to check the environment variables

[root@vm1 bin]# echo $JAVA_HOME
/usr/lib/jvm/jdk1.8.0_221
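
You can also confirm that the java binary on PATH is the newly linked JDK (a quick check, not part of the original article):

# Should report java version "1.8.0_221"
java -version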

Install Hadoop

To stay version-compatible with the Spark setup covered in another article, use Hadoop 2.7 from the official site.

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

Extract it

tar xf hadoop-2.7.7.tar.gz -C /opt/

Configuration

Environment variables

Edit the file

vim /etc/profile.d/hadoop.sh

Add

export HADOOP_HOME=/opt/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin

Then make the environment variables take effect

source /etc/profile
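
As a quick sanity check (not in the original article), confirm the hadoop command is now on PATH:

# Should print "Hadoop 2.7.7" plus build information
hadoop version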

Set up passwordless login

The local machine itself also needs passwordless SSH login; see the reference linked in the original article.
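
A minimal sketch for passwordless login to the local machine, assuming root and the default key location (the linked guide may differ):

# Generate a key pair without a passphrase (skip if one already exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the key for logins to this host
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should not ask for a password
ssh localhost exit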

Modify hadoop-env.sh

Hard-code JAVA_HOME in the startup script.

vi /opt/hadoop-2.7.7/etc/hadoop/hadoop-env.sh

Use the line

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_221

to replace

export JAVA_HOME=${JAVA_HOME}
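
If you prefer to script this change, a sed sketch that performs the same replacement (assuming the default placeholder line is present):

# Hard-code JAVA_HOME in hadoop-env.sh
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_221|' /opt/hadoop-2.7.7/etc/hadoop/hadoop-env.sh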

HDFS

Create directories

mkdir -p /opt/hadoop-2.7.7/hdfs/name
mkdir -p /opt/hadoop-2.7.7/hdfs/data

Modify core-site.xml

Configure the default filesystem address.

vi /opt/hadoop-2.7.7/etc/hadoop/core-site.xml

Replace

<configuration>
</configuration>

with the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop-2.7.7/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://vm1:9000</value>
    </property>
</configuration>
  • hadoop.tmp.dir specifies the temporary directory Hadoop uses for data storage. If it is not set, the default is /tmp/hadoop-root, which is wiped on every reboot.
  • fs.defaultFS specifies the default filesystem URI; without it, paths resolve against the local filesystem instead of HDFS.

Modify hdfs-site.xml

Configure the replication factor.

vi /opt/hadoop-2.7.7/etc/hadoop/hdfs-site.xml

Replace

<configuration>
</configuration>

with the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop-2.7.7/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/opt/hadoop-2.7.7/hdfs/data</value>
    </property>
</configuration>
  • dfs.replication sets the HDFS replication factor to 1.
  • dfs.name.dir sets the directory where the namenode stores the HDFS metadata. If several directories are listed, each holds a copy of the metadata.
  • dfs.data.dir sets the directory where the datanode stores HDFS block data. It can point to directories on multiple partitions, so HDFS can span them.

Format HDFS

cd /opt/hadoop-2.7.7/bin
hdfs namenode -format

Start HDFS

cd /opt/hadoop-2.7.7/sbin
./start-dfs.sh

Output:

Starting namenodes on [vm1]
vm1: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.out
localhost: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-vm1.out
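
To confirm the daemons are running (a quick check added here, mirroring the cluster section later), jps should list the three processes that were just started:

jps
# Expected (PIDs will differ): NameNode, DataNode, SecondaryNameNode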

HDFS Web UI

Open the HDFS web UI at http://vm1:50070

Test HDFS

Generate test data

mkdir -p /tmp/input
vi /tmp/input/1

Add:

a
b
a

Then upload the file to HDFS and list it:

hadoop fs -mkdir -p /tmp/input
hadoop fs -put /tmp/input/1 /tmp/input
hadoop fs -ls /tmp/input

Output:

Found 1 items
-rw-r--r--   1 root supergroup          6 2019-10-28 11:34 /tmp/input/1
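
To double-check the uploaded content, you can print the file back from HDFS (an extra step, not in the original):

# Should print the three lines a, b, a
hadoop fs -cat /tmp/input/1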

YARN

Modify mapred-site.xml

Set the MapReduce framework to YARN.

cp /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml
vi /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml

Replace

<configuration>
</configuration>

with the following configuration:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://vm1:9001</value>
    </property>
</configuration>
  • mapreduce.framework.name makes MapReduce jobs run on YARN; local means run locally, and classic means the classic MapReduce framework.
  • mapred.job.tracker sets the IP and port of the MapReduce JobTracker.

Modify yarn-site.xml

vi /opt/hadoop-2.7.7/etc/hadoop/yarn-site.xml

Replace

<configuration>
</configuration>

with the following configuration:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>vm1</value>
    </property>
</configuration>
  • Setting yarn.nodemanager.aux-services to mapreduce_shuffle avoids the "The auxService:mapreduce_shuffle does not exist" error.
  • yarn.resourcemanager.hostname sets the ResourceManager host.

Start YARN

cd /opt/hadoop-2.7.7/sbin
./start-yarn.sh

Output:

starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.out
localhost: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm1.out

YARN Web UI

Open the YARN web UI at http://vm1:8088

Test YARN

Run the command:

hadoop jar /opt/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /tmp/input /tmp/result

Execution log

19/10/28 11:34:52 INFO client.RMProxy: Connecting to ResourceManager at vm1/192.168.1.101:8032
19/10/28 11:34:53 INFO input.FileInputFormat: Total input paths to process : 1
19/10/28 11:34:53 INFO mapreduce.JobSubmitter: number of splits:1
19/10/28 11:34:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572232474055_0002
19/10/28 11:34:54 INFO impl.YarnClientImpl: Submitted application application_1572232474055_0002
19/10/28 11:34:54 INFO mapreduce.Job: The url to track the job: http://vm1:8088/proxy/application_1572232474055_0002/
19/10/28 11:34:54 INFO mapreduce.Job: Running job: job_1572232474055_0002
19/10/28 11:35:06 INFO mapreduce.Job: Job job_1572232474055_0002 running in uber mode : false
19/10/28 11:35:06 INFO mapreduce.Job:  map 0% reduce 0%
19/10/28 11:35:11 INFO mapreduce.Job:  map 100% reduce 0%
19/10/28 11:35:16 INFO mapreduce.Job:  map 100% reduce 100%
19/10/28 11:35:17 INFO mapreduce.Job: Job job_1572232474055_0002 completed successfully
19/10/28 11:35:18 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=22
		FILE: Number of bytes written=245617
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=98
		HDFS: Number of bytes written=8
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2576
		Total time spent by all reduces in occupied slots (ms)=3148
		Total time spent by all map tasks (ms)=2576
		Total time spent by all reduce tasks (ms)=3148
		Total vcore-milliseconds taken by all map tasks=2576
		Total vcore-milliseconds taken by all reduce tasks=3148
		Total megabyte-milliseconds taken by all map tasks=2637824
		Total megabyte-milliseconds taken by all reduce tasks=3223552
	Map-Reduce Framework
		Map input records=3
		Map output records=3
		Map output bytes=18
		Map output materialized bytes=22
		Input split bytes=92
		Combine input records=3
		Combine output records=2
		Reduce input groups=2
		Reduce shuffle bytes=22
		Reduce input records=2
		Reduce output records=2
		Spilled Records=4
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=425
		CPU time spent (ms)=1400
		Physical memory (bytes) snapshot=432537600
		Virtual memory (bytes) snapshot=4235526144
		Total committed heap usage (bytes)=304087040
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=6
	File Output Format Counters 
		Bytes Written=8

View the run results

hadoop fs -cat /tmp/result/part-r-00000

Output:

a       2
b       1

Cluster setup (without Docker)

Preparation

  • Deploy three machines, vm1, vm2, and vm3, on the same subnet (a hosts-file sketch follows below).
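
All three hosts should resolve one another by name. A minimal /etc/hosts sketch, assuming the addresses that appear in the dfsadmin report later in this article:

# Run on vm1, vm2 and vm3; adjust the addresses to your subnet
cat >> /etc/hosts <<EOF
192.168.1.101 vm1
192.168.1.102 vm2
192.168.1.103 vm3
EOF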

Configure the master

Configuration example

First, configure vm1 exactly as in the single-node setup above.

Modify hdfs-site.xml

vi /opt/hadoop-2.7.7/etc/hadoop/hdfs-site.xml

Replace

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

with the following configuration:

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

The replication factor dfs.replication is set to 2 here.

Modify masters

echo "vm1" > /opt/hadoop-2.7.7/etc/hadoop/masters

Delete slaves

rm /opt/hadoop-2.7.7/etc/hadoop/slaves

Copy to the slave nodes

scp -r /opt/hadoop-2.7.7 root@vm2:/opt/
scp -r /opt/hadoop-2.7.7 root@vm3:/opt/
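
The slave nodes also need the JDK and the environment-variable scripts configured earlier. A sketch assuming the same paths as on vm1 (this step is not spelled out in the original article):

for host in vm2 vm3; do
    # Copy the JDK and recreate the java symlink
    scp -r /usr/lib/jvm root@$host:/usr/lib/
    ssh root@$host "ln -sf /usr/lib/jvm/jdk1.8.0_221/bin/java /usr/bin/java"
    # Copy the environment-variable scripts for Java and Hadoop
    scp /etc/profile.d/java.sh /etc/profile.d/hadoop.sh root@$host:/etc/profile.d/
done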

Modify slaves

cat > /opt/hadoop-2.7.7/etc/hadoop/slaves <<EOF
vm1
vm2
vm3
EOF

This writes vm1, vm2, and vm3 into the slaves file, so all three nodes run DataNodes.

HDFS

Clear the directories

rm -rf /opt/hadoop-2.7.7/hdfs/data/*
rm -rf /opt/hadoop-2.7.7/hdfs/name/*
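
Because the whole /opt/hadoop-2.7.7 tree was copied to the slaves, any HDFS data carried over from the single-node run should be cleared there as well; a hedged sketch:

# Clear leftover HDFS data on the slave nodes too
for host in vm2 vm3; do
    ssh root@$host "rm -rf /opt/hadoop-2.7.7/hdfs/data/* /opt/hadoop-2.7.7/hdfs/name/*"
done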

Format HDFS

Reformat HDFS:

hadoop namenode -format

Start HDFS

cd /opt/hadoop-2.7.7/sbin
./start-dfs.sh

Output:

Starting namenodes on [vm1]
vm1: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.out
vm3: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm3.out
vm2: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm2.out
vm1: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-vm1.out

For example, the namenode log is at /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.log.

Check the processes on the master:

$ jps
75991 DataNode
76408 Jps
76270 SecondaryNameNode

Check the processes on a slave:

$ jps
29379 DataNode
29494 Jps

View the cluster status

hdfs dfsadmin -report

Output:

Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 101929107456 (94.93 GB)
DFS Remaining: 101929095168 (94.93 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.1.103:50010 (vm3)
Hostname: vm3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 10321448960 (9.61 GB)
DFS Remaining: 43339423744 (40.36 GB)
DFS Used%: 0.00%
DFS Remaining%: 80.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:20 CST 2019


Name: 192.168.1.102:50010 (vm2)
Hostname: vm2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 13661077504 (12.72 GB)
DFS Remaining: 39999795200 (37.25 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.54%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:21 CST 2019


Name: 192.168.1.101:50010 (vm1)
Hostname: vm1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 35070996480 (32.66 GB)
DFS Remaining: 18589876224 (17.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 34.64%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:21 CST 2019

Test HDFS

hadoop fs -mkdir -p /tmp/input
hadoop fs -put /tmp/input/1 /tmp/input
hadoop fs -ls /tmp/input

Output:

Found 1 items
-rw-r--r--   1 root supergroup          6 2019-10-28 11:34 /tmp/input/1

YARN

Start YARN

cd /opt/hadoop-2.7.7/sbin
./start-yarn.sh

Output:

starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.out
vm2: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm2.out
vm3: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm3.out
vm1: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm1.out

For example, the resourcemanager log is at /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.log.

Check the processes on the master:

$ jps
100464 ResourceManager
101746 Jps
53786 QuorumPeerMain

Check the processes on a slave:

$ jps
36893 NodeManager
37181 Jps
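
Besides jps, you can ask the ResourceManager which NodeManagers have registered (a verification step added here, assuming yarn is on PATH via $HADOOP_HOME/bin):

# Should list three RUNNING nodes: vm1, vm2 and vm3
yarn node -list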

Test YARN

hadoop jar /opt/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /tmp/input /tmp/result

The output is the same as in the single-node test above.

Further reading:

https://cloud.tencent.com/developer/article/1084166

https://www.jianshu.com/p/7ab2b6168cc9

https://www.cnblogs.com/onetwo/p/6419925.html

This is a reposted article that has not been fully verified; any problems found will be corrected.

Copyright notice
This article was created by [boonya]. Please include a link to the original when reposting. Thank you.
