Download the installation package, then configure spark-env.sh and spark-defaults.conf.
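The notes do not record the contents of spark-env.sh. A minimal sketch for a standalone cluster might look like the following; the master hostname is taken from the spark:// URL used in these notes, while the worker core and memory values are assumptions chosen for illustration only.

```shell
# conf/spark-env.sh (sketch; values are assumptions, not the author's settings)
export SPARK_MASTER_HOST=cnsz046690   # master hostname, as in the spark:// URL used in these notes
export SPARK_MASTER_PORT=7077         # standalone master port (Spark's default)
export SPARK_WORKER_CORES=4           # assumed: cores each worker may use
export SPARK_WORKER_MEMORY=8g         # assumed: memory each worker may use
```

The same file would normally be copied to every node so that master and workers agree on the master address.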
sbin/start-master.sh
Start a worker on each slave node:
./start-slave.sh spark://cnsz046690:7077
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046691:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046745:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046746:~
Spark 2.1.1 parameter changes
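The three copies can also be driven from a loop. The sketch below uses the worker hostnames from these notes; the `distribute_spark` function and the `DRY_RUN` switch are hypothetical additions so the commands can be previewed before anything is copied.

```shell
# Worker hostnames and directory name as they appear in these notes.
WORKERS="cnsz046691 cnsz046745 cnsz046746"
SPARK_DIR="spark-2.1.1-bin-hadoop2.6"

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: copy the Spark directory to every worker's home directory.
distribute_spark() {
  for host in $WORKERS; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "scp -r $SPARK_DIR $host:~"
    else
      scp -r "$SPARK_DIR" "$host":~
    fi
  done
}

distribute_spark
```

Keeping the host list in one variable means a new worker only has to be added in one place.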
import scala.collection.JavaConversions._
show partitions base.UDS_B_I_TRADE_FUND_MOVT;
select count(1) from base.UDS_B_I_TRADE_FUND_MOVT;
spark-sql --files /etc/spark/log4j.properties
Spark environment deployment and dynamic resource allocation configuration
Spark 2.2 and later require Java 8 or newer.
spark-sql does not support cluster mode.
mv spark-2.2.0-bin-hadoop2.6 /usr/lib
ln -s spark-2.2.0-bin-hadoop2.6 spark
mv /opt/app/spark/conf/* .
ln -s /etc/spark/conf conf
ln -s /var/log/spark logs
Start the history server:
/usr/lib/spark/sbin/start-history-server.sh
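If you run in YARN mode, this does not seem to need changing.
Edit log4j.properties and set the log level to WARN by changing INFO to WARN in the following line:
log4j.rootCategory=INFO, console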
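The log-level change can be scripted with sed. This is a minimal sketch: `set_root_log_level` is a hypothetical helper, and the demo runs against a scratch file rather than the real /etc/spark/log4j.properties.

```shell
# Hypothetical helper: replace the level token on the log4j.rootCategory line,
# leaving the appender list (e.g. ", console") untouched. A .bak copy is kept.
set_root_log_level() {
  file="$1"
  level="$2"
  sed -i.bak -E "s/^(log4j\.rootCategory=)[A-Z]+/\1${level}/" "$file"
}

# Demo on a scratch file so no real configuration is modified.
demo_conf="$(mktemp)"
printf 'log4j.rootCategory=INFO, console\n' > "$demo_conf"
set_root_log_level "$demo_conf" WARN
cat "$demo_conf"
```

To apply it for real, pass the path of the log4j.properties file that Spark actually loads.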
Add Spark to the global PATH:
export PATH=$PATH:/usr/lib/spark/bin
An elegant solution
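The section title mentions dynamic resource allocation, but the notes do not record the settings used. A sketch with the standard Spark property names follows; the executor bounds are assumed values, and the demo writes to a scratch file instead of the real spark-defaults.conf. On YARN, dynamic allocation also needs the external shuffle service to be set up on the NodeManagers, which is not shown here.

```shell
# Scratch file for the demo; a real deployment would append to
# /etc/spark/conf/spark-defaults.conf (the conf path used in these notes).
conf_file="$(mktemp)"

# Property names are standard Spark settings; the min/max values are assumptions.
cat >> "$conf_file" <<'EOF'
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
EOF

grep dynamicAllocation "$conf_file"
```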
Jersey problem
If you try to run a spark-submit command on YARN you can expect the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
The jar file jersey-bundle-*.jar is not present in $SPARK_HOME/jars. Adding it fixes this problem:
sudo -u spark wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.1/jersey-bundle-1.19.1.jar -P $SPARK_HOME/jars
January 2017 update on this issue: if the jar above is added, Jersey 1 will be used when starting the Spark History Server, and the applications will not be shown in the Spark History Server. The following error message will appear in the Spark History Server output file:
WARN servlet.ServletHandler: /api/v1/applications java.lang.NullPointerException at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
This problem occurs only when running Spark on YARN, since YARN 2.7.3 uses Jersey 1 and Spark 2.0 uses Jersey 2.
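Before changing any jars, it can help to see which Jersey major versions are present in a jars directory. The sketch below assumes jar names of the form jersey-NAME-VERSION.jar; `jersey_versions` is a hypothetical helper, and the demo uses empty placeholder files in a scratch directory rather than a real $SPARK_HOME/jars.

```shell
# Hypothetical helper: print each jersey-*.jar in a directory with its major version.
jersey_versions() {
  for jar in "$1"/jersey-*.jar; do
    [ -e "$jar" ] || continue        # skip when the glob matches nothing
    name="$(basename "$jar" .jar)"
    version="${name##*-}"            # text after the last dash, e.g. 2.22.2
    echo "$name -> Jersey ${version%%.*}"
  done
}

# Demo with empty placeholder files named like the jars mentioned in these notes.
demo_jars="$(mktemp -d)"
touch "$demo_jars/jersey-client-2.22.2.jar" "$demo_jars/jersey-bundle-1.19.1.jar"
jersey_versions "$demo_jars"
```

Seeing both a Jersey 1 and a Jersey 2 line for the same directory is a hint that the conflict described above may apply.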
One workaround is not to add the Jersey 1 jar described above, but instead to disable the YARN Timeline Service in spark-defaults.conf:
spark.hadoop.yarn.timeline-service.enabled false
[Solution 2](https://my.oschina.net/xiaozhublog/blog/737902)
A jar conflict causes: Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
Fix:
cp /usr/lib/hadoop-yarn/lib/jersey-client-1.9.jar /usr/lib/spark/jars
cp /usr/lib/hadoop-yarn/lib/jersey-core-1.9.jar /usr/lib/spark/jars
mv /usr/lib/spark/jars/jersey-client-2.22.2.jar /usr/lib/spark/jars/jersey-client-2.22.2.jar.bak
Solution: disable the hive.metastore.schema.verification parameter. This parameter checks the metastore schema against the Hive version.