2 - Single-Node Hadoop Platform Setup (Pseudo-Distributed)

Building a pseudo-distributed big-data platform!

I. Simplified Configuration

0. Change the hostname and configure the hosts mapping

# Change the hostname
hostnamectl set-hostname master
# Configure the hosts mapping
vi /etc/hosts
[this machine's actual IP] master
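The hosts entry format can be sanity-checked against a temp file before touching the real /etc/hosts (192.168.56.101 below is a placeholder; substitute your machine's actual IP):

```shell
# Hypothetical IP for illustration; on the real host, append this line to /etc/hosts.
HOSTS_FILE=$(mktemp)
echo "192.168.56.101 master" >> "$HOSTS_FILE"
# The mapping is in place when grep finds the entry:
grep "master" "$HOSTS_FILE"
```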

1. Set up passwordless SSH

ssh-keygen
ssh-copy-id master

2. Disable the firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
setenforce 0

3. Unpack the components

cd /opt
tar -zxvf jdk-8u77-linux-x64.tar.gz
tar -zxvf hadoop-2.6.0.tar.gz
mv jdk1.8.0_77/ jdk
mv hadoop-2.6.0/ hadoop

4. Configuration file: hdfs-site.xml

cd /opt/hadoop
vi etc/hadoop/hdfs-site.xml
<!-- Add the following -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

5. Configuration file: core-site.xml

vi etc/hadoop/core-site.xml
<!-- Append the following -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop-repo/tmp</value>
</property>

6. Configuration file: hadoop-env.sh

vi etc/hadoop/hadoop-env.sh
# Change the following line
export JAVA_HOME=/opt/jdk

7. Configure environment variables

vi /etc/profile
# Append the following
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Make the variables take effect immediately
source /etc/profile
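After sourcing the profile, a quick check confirms the new directories actually landed on PATH (paths as configured above):

```shell
# Mirror the /etc/profile additions and confirm PATH picked them up.
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Each expected entry should be found on PATH:
for d in /opt/jdk/bin /opt/hadoop/bin /opt/hadoop/sbin; do
  case ":$PATH:" in *":$d:"*) echo "$d on PATH" ;; esac
done
```

On the real host, `java -version` and `hadoop version` are the definitive checks.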

8. Format HDFS

hdfs namenode -format

9. Start HDFS

start-dfs.sh
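After start-dfs.sh, running `jps` should show three HDFS daemons. A sketch of the check against a sample of healthy output (the PIDs are illustrative; on the real host just run `jps`):

```shell
# Sample jps output from a healthy pseudo-distributed node (illustrative PIDs):
JPS_OUT="2312 NameNode
2443 DataNode
2630 SecondaryNameNode
2801 Jps"
# All three HDFS daemons must be present:
for daemon in NameNode DataNode SecondaryNameNode; do
  echo "$JPS_OUT" | grep -q "$daemon" && echo "$daemon is running"
done
```

The NameNode web UI at http://master:50070 (the Hadoop 2.x default port) is another quick check.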

II. Detailed Configuration (YARN)

10. Configuration file: mapred-site.xml

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vi etc/hadoop/mapred-site.xml
<!-- Append the following -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

11. Configuration file: yarn-site.xml

vi etc/hadoop/yarn-site.xml

<!--
This can actually be simplified: for the two properties with the same name,
join the values with "," in a single entry:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
-->
<!-- Append the following -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<!-- When Spark jobs need to run on YARN, also change the configuration as follows -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<!-- Set the YARN ResourceManager hostname -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>

<!-- Prevent memory-related errors when running Spark jars (failures to initialize
SparkContext, SparkSession, etc.):
1. Whether to start a thread that checks the physical memory each task is using
   and kills the task if it exceeds its allocation; defaults to true.
2. Whether to start a thread that checks the virtual memory each task is using
   and kills the task if it exceeds its allocation; defaults to true.
-->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

12. Start YARN

start-yarn.sh
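After start-yarn.sh, `jps` should additionally show the two YARN daemons. A sketch of the check against a sample of healthy output (PIDs illustrative; run plain `jps` on the real host):

```shell
# Sample jps output after start-yarn.sh on a healthy node (illustrative PIDs):
JPS_OUT="2312 NameNode
2443 DataNode
2630 SecondaryNameNode
3105 ResourceManager
3228 NodeManager
3401 Jps"
# Both YARN daemons must be present:
for daemon in ResourceManager NodeManager; do
  echo "$JPS_OUT" | grep -q "$daemon" && echo "$daemon is running"
done
```

The ResourceManager web UI defaults to http://master:8088.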

III. Spark on YARN

This section is also equivalent to configuring standalone (single-machine) Spark; the prerequisite is that Scala has been installed and its environment variables configured.

13. Unpack and install Scala beforehand, and unpack Spark

14. Copy the jar

# Copy the shuffle jar from Spark's yarn directory into Hadoop's yarn directory
# Source:      /usr/local/spark/yarn/spark-2.4.4-yarn-shuffle.jar
# Destination: /usr/local/hadoop/share/hadoop/yarn
cp /usr/local/spark/yarn/spark-2.4.4-yarn-shuffle.jar /usr/local/hadoop/share/hadoop/yarn/

15. Modify spark-env.sh

export SPARK_MASTER_IP=master
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export JAVA_HOME=/usr/local/jdk
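With the shuffle jar and the yarn-site.xml settings above in place, a submission along these lines exercises Spark on YARN. The jar path and class are the stock SparkPi example shipped with Spark 2.4.4; adjust the path and version to your install:

```shell
# Run the bundled SparkPi example on YARN as a smoke test.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark/examples/jars/spark-examples_2.11-2.4.4.jar 10
```

If this job completes, check its final status in the ResourceManager web UI; if it fails with memory errors, see the error summary below.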

16. Summary of errors

Reference: :balance_scale: Problems encountered with a Spark cluster

ERROR client.TransportClient: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException

ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend

ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException

Caused by: java.nio.channels.ClosedChannelException

Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend

ERROR util.Utils: Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult

Caused by: java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException

Caused by: java.nio.channels.ClosedChannelException
<!-- This problem is caused by insufficient memory. Add the following to yarn-site.xml -->

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>