The plan: set up a cluster with one master and two slaves.
Modify the hosts file on every host:
vim /etc/hosts

10.103.29.164 slave1
10.103.31.124 master
10.103.31.186 slave2
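To confirm the new entries are picked up by the resolver on each host, a minimal sketch (getent ships with glibc on Ubuntu; the three hostnames are the ones defined above):

for h in master slave1 slave2; do
    getent hosts "$h"    # prints "IP hostname" when /etc/hosts resolves the name
done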
Then ping the other nodes to check that the configuration took effect. On the master:

ping slave1
ping slave2
Ubuntu only needs openssh-server installed:

sudo apt-get install openssh-server
Generate a private/public key pair on every host:

ssh-keygen -t rsa   # press Enter through every prompt
Send each slave's id_rsa.pub to the master node (on slave2, name the target id_rsa.pub.slave2 accordingly):

scp ~/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.slave1
On the master, append all the public keys to authorized_keys, the file used for authentication:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys
Distribute the authorized_keys file back to every slave:

scp ~/.ssh/authorized_keys hadoop@slave1:~/.ssh/
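To cover both slaves in one go, a minimal sketch, assuming the hadoop user and the slave names above:

# run on the master after authorized_keys has been assembled
for h in slave1 slave2; do
    scp ~/.ssh/authorized_keys hadoop@"$h":~/.ssh/
done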
Verify passwordless SSH on every machine:

ssh master
ssh slave1
ssh slave2
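The same check as a loop, run from the master; every hop should print its hostname without asking for a password:

for h in master slave1 slave2; do
    ssh "$h" hostname
done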
Create the Java installation directory:

sudo mkdir /usr/lib/jvm/
Extract the Java archive into that directory:

sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jvm/
Edit the environment variables:

sudo vim /etc/profile
Add the following:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
Then reload the environment and verify that Java installed correctly:

source /etc/profile
java -version

Many problems boil down to the copied files not being owned by the current user.
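When that happens, handing the tree back to the current user fixes it; a sketch assuming the JDK path used above:

# adjust the path (or user) to whatever was copied with the wrong owner
sudo chown -R "$USER":"$USER" /usr/lib/jvm/jdk1.8.0_91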
Extract the Scala package in the installation directory:
tar -zxvf scala-2.11.8.tgz

Edit the environment variables again (sudo vim /etc/profile) and add the following:
export SCALA_HOME=/usr/local/data/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin

Reload the configuration and verify that Scala installed correctly:
source /etc/profile
scala -version

Likewise, extract the Hadoop package into the installation directory:
tar -zxvf hadoop-2.7.2.tar.gz

The configuration files below all live under hadoop-2.7.2/etc/hadoop.

1. Configure hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91

2. Configure yarn-env.sh
# some Java parameters
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91

3. Configure slaves
slave1
slave2

4. Configure core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/data/hadoop-2.7.2/tmp</value>
    </property>
</configuration>

5. Edit hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/data/hadoop-2.7.2/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/data/hadoop-2.7.2/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

With only two DataNodes, a replication factor above 2 would leave every block permanently under-replicated.
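HDFS creates these directories on demand, but creating them up front with the right owner avoids permission surprises; a sketch assuming the paths configured above:

mkdir -p /usr/local/data/hadoop-2.7.2/tmp
mkdir -p /usr/local/data/hadoop-2.7.2/dfs/name
mkdir -p /usr/local/data/hadoop-2.7.2/dfs/data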
6. Edit mapred-site.xml (first cp mapred-site.xml.template mapred-site.xml)

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

7. Edit yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

Distribute the configured hadoop-2.7.2 directory to the slaves with scp.
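A minimal sketch, assuming the same /usr/local/data layout and the hadoop user on every node:

# run on the master
cd /usr/local/data
for h in slave1 slave2; do
    scp -r hadoop-2.7.2 hadoop@"$h":/usr/local/data/
done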
From inside the hadoop-2.7.2 directory:
bin/hdfs namenode -format   # first run only
sbin/start-dfs.sh
sbin/start-yarn.sh

Use jps to check whether the expected processes have started on each node.
$ jps    # run on master
3407 SecondaryNameNode
3218 NameNode
3552 ResourceManager
3910 Jps

$ jps    # run on slaves
2072 NodeManager
2213 Jps
1962 DataNode

Alternatively, open http://master:8088 in a browser: the Hadoop management UI should come up and show the slave1 and slave2 nodes.
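A quick HDFS smoke test now can save debugging later; a sketch, run from the hadoop-2.7.2 directory and assuming the hadoop user (the test file is arbitrary):

bin/hdfs dfs -mkdir -p /user/hadoop                        # home directory on HDFS
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/   # upload any small file
bin/hdfs dfs -ls /user/hadoop                              # the file should be listed
bin/hdfs dfsadmin -report                                  # should show two live DataNodes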
Likewise, extract the Spark package into the installation directory:
tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz

Append the following at the end of conf/spark-env.sh (if the file does not exist yet, copy it from conf/spark-env.sh.template):
export SCALA_HOME=/usr/local/data/scala-2.11.8
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91
export HADOOP_HOME=/usr/local/data/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/usr/local/data/spark-1.6.1-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G

Edit the slaves file (vim conf/slaves):
slave1
slave2

Distribute the configured spark-1.6.1-bin-hadoop2.6 folder to all the slaves:
scp -r spark-1.6.1-bin-hadoop2.6 hadoop@slave1:/usr/local/data/   # run from /usr/local/data; repeat for slave2

Start the standalone cluster with sbin/start-all.sh (run from the Spark directory), then check each node with jps:
$ jps    # master
7949 Jps
7328 SecondaryNameNode
7805 Master
7137 NameNode
7475 ResourceManager

$ jps    # slaves
3132 DataNode
3759 Worker
3858 Jps
3231 NodeManager

Then open Spark's web management page: http://master:8080
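To confirm the standalone master really accepts applications, a shell can be attached to it; a sketch assuming Spark's default master port 7077 (run from the Spark directory):

./bin/spark-shell --master spark://master:7077
# while the shell is up it appears under "Running Applications" on http://master:8080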
Run an example. Execute on the master:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    lib/spark-examples*.jar \
    10

Reference link: spark on yarn
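In yarn-cluster mode the driver runs inside YARN, so SparkPi's result is written to the application logs rather than the local console. A sketch for retrieving it, run from the hadoop-2.7.2 directory (yarn logs requires log aggregation to be enabled; otherwise browse the logs through the NodeManager web UI):

bin/yarn application -list                       # note the application ID of the SparkPi run
bin/yarn logs -applicationId <application_id>    # the driver output contains "Pi is roughly ..."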