
Hadoop & Hive Environment Setup

Java Runtime Environment Setup

Using Java 8 as the example

# 1. Download `jdk-8u241-linux-x64.tar.gz`
# 2. Extract the archive
tar xzvf jdk-8u241-linux-x64.tar.gz
# 3. Enter the unpacked directory
cd jdk1.8.0_241
# 4. Set the JAVA_HOME environment variable
export JAVA_HOME=`pwd`
# 5. Add it to PATH
export PATH="$PATH:$JAVA_HOME/bin"
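The exports above only last for the current shell. To make them survive new sessions, they can be appended to a profile file and sanity-checked — a sketch, assuming the JDK was unpacked to /usr/local/jdk1.8.0_241 (adjust to your actual location):

```shell
# Persist JAVA_HOME and PATH (the install path is an example)
export JAVA_HOME=/usr/local/jdk1.8.0_241
export PATH="$PATH:$JAVA_HOME/bin"
echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc
echo 'export PATH="$PATH:$JAVA_HOME/bin"' >> ~/.bashrc
# Quick sanity check; java -version should also work afterwards
echo "$JAVA_HOME"
```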

Hadoop Configuration

Download from the official site

The Tsinghua mirror https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common is noticeably faster for downloads inside China

Extract and install

Follow the official documentation

Check that the environment works

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.

Configuring Pseudo-Distributed Operation

Use the following:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

bin/hdfs namenode -format

ssh localhost fails with ssh: connect to host localhost port 22: Connection refused, because the sshd service is not running.
A: Check whether openssh-server is installed by running ps -e | grep sshd; if sshd shows up, it is installed and running, otherwise it is not.

Running /usr/sbin/sshd reports Missing privilege separation directory: /run/sshd
A: Create the directory: mkdir /run/sshd
/usr/sbin/sshd then reports /run/sshd must be owned by root and not group or world-writable.
A: This is a permissions problem on /run/sshd; run chmod 755 /run/sshd so that only root can write to the directory.
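Pseudo-distributed operation also needs passwordless ssh to localhost. A sketch of the usual key setup, close to what the official single-node docs suggest:

```shell
# Create an RSA key pair (if one does not already exist) and
# authorize it for logins to this machine
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

After this, `ssh localhost` should connect without prompting for a password.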

$ sbin/start-dfs.sh

ERROR: JAVA_HOME is not set and could not be found.
A: Hadoop still cannot find JAVA_HOME; add or edit JAVA_HOME in etc/hadoop/hadoop-env.sh, making sure to use an absolute path.

pdsh@XXX: localhost: connect: Connection refused
A: Append export PDSH_RCMD_TYPE=ssh to /etc/profile

YARN Configuration

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

Starting the Services

 sbin/start-dfs.sh
 sbin/start-yarn.sh

 bin/hdfs dfs -mkdir /user
 bin/hdfs dfs -mkdir /user/<username>

 bin/hdfs dfs -mkdir input
 bin/hdfs dfs -put etc/hadoop/*.xml input

 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'

 bin/hdfs dfs -cat output/*

 sbin/stop-dfs.sh
 sbin/stop-yarn.sh

Configuration Files for One Master and One Slave

On master:
etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:9001</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
</configuration>
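Besides these XML files, Hadoop 3.x reads the list of worker hosts from etc/hadoop/workers, and the names master and slave1 must resolve on every node (for example via /etc/hosts entries such as 192.168.1.10 master; those IPs are only examples). A sketch, run from $HADOOP_HOME; whether master also hosts a DataNode is a deployment choice:

```shell
# List every worker (DataNode/NodeManager) host, one per line.
# The directory is created here only so the sketch is self-contained;
# on a real install etc/hadoop already exists under $HADOOP_HOME.
mkdir -p etc/hadoop
cat > etc/hadoop/workers <<'EOF'
master
slave1
EOF
```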

Starting All Services

# Start hadoop
> sbin/start-all.sh
# Check the Java processes with jps; if jps is not on PATH, run it from the JDK's bin directory
> jps
# Output like the following indicates a successful start
9873 NameNode
10882 NodeManager
12614 Jps
10393 SecondaryNameNode
10107 DataNode
10670 ResourceManager

相关运行错误及解决方案

在本地windows下子ubuntu系统配置时, NodeManager启动失败, 查看log日志是tmp文件夹权限问题, Permissions incorrectly set for dir /nm-local-dir/usercache, should be rwxr-xr-x, actual value = rwxrwxrwx, 因是将tmp目录设置在了挂载的windows系统盘上, 更改目录设置至/usr/local/hadoop/tmp后, 重新bin/hdfs namenode -format后正常.
问题剖析路径:

  1. 5项服务是否都已开启, 若哪项未开启, 即查看相应log

Running start-dfs.sh / start-yarn.sh reports the following errors:

Starting namenodes on [namenode]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [datanode1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

Solution:

Note: add these lines in the blank area at the top of each file.

In start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root 

In start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
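An alternative to editing each start/stop script is to define the same user variables once in etc/hadoop/hadoop-env.sh, which all of those scripts source. A sketch (the etc/hadoop directory is created here only to keep the snippet self-contained; on a real install it already exists under $HADOOP_HOME):

```shell
# Append the per-daemon user definitions to hadoop-env.sh so every
# start/stop script picks them up
mkdir -p etc/hadoop
cat >> etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
```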


Hive Configuration

Create a MySQL account for Hive and grant it sufficient privileges:

In MySQL: GRANT ALL ON hive.* TO 'hive'@'%';
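The full account setup might look like the following sketch; the 'password' value is a placeholder, and CREATE USER IF NOT EXISTS needs MySQL 5.7+. The statements are written to a file and then fed to mysql as an admin user:

```shell
# SQL to create the metastore database and the hive account
# ('password' is a placeholder -- change it)
cat > create_hive_user.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'password';
GRANT ALL ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL
# Then run it as a MySQL admin user:
# mysql -u root -p < create_hive_user.sql
```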

MySQL driver

Download the MySQL JDBC driver from https://dev.mysql.com/downloads/connector/j/ and move it into the hive/lib directory

Configure according to the official Hive documentation

  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse

Modify the hive-site.xml file

(1) Copy hive-default.xml.template to create hive-site.xml; the former holds the system defaults and the latter the custom settings, and Hive gives precedence to the custom file.
(2) Edit hive-site.xml (first delete everything originally in it, then add the following:)
conf/hive-site.xml

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://slave1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
</configuration>

The createDatabaseIfNotExist=true parameter makes MySQL create the hive database automatically if it does not exist.
The & character in an XML file must be written as the escape &amp;.

Starting Hive then fails with the following:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
        at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5099)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:97)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:81)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

This is caused by a guava version conflict. Check Hive's version with ll $HIVE_HOME/lib | grep guava and Hadoop's version with ll $HADOOP_HOME/share/hadoop/common/lib/ | grep guava.
Delete the guava jar under Hive and copy Hadoop's version into Hive's lib directory.
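The swap might look like this sketch. The jar version numbers are hypothetical and the directory layout is simulated so the snippet is self-contained; on a real install, operate on $HIVE_HOME/lib and $HADOOP_HOME/share/hadoop/common/lib directly:

```shell
# Simulated layout; replace ./hive and ./hadoop with $HIVE_HOME / $HADOOP_HOME
mkdir -p hive/lib hadoop/share/hadoop/common/lib
touch hive/lib/guava-19.0.jar                              # hypothetical older jar
touch hadoop/share/hadoop/common/lib/guava-27.0-jre.jar    # hypothetical newer jar
rm hive/lib/guava-*.jar                                    # drop Hive's conflicting guava
cp hadoop/share/hadoop/common/lib/guava-*.jar hive/lib/    # use Hadoop's copy instead
ls hive/lib
```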

Initialize the metastore schema:

bin/schematool -dbType mysql -initSchema

When it prints

Initialization script completed
schemaTool completed

the initialization has succeeded.

Sat Apr 04 15:37:04 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
A: Set the connection URL in hive-site.xml to: jdbc:mysql://localhost/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf-8&useSSL=false

This then reports Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'useUnicode'
A: A literal & cannot be used in an XML file; it must be replaced with its escape &amp;.

Starting Hive

Modify the Hive startup configuration to add the HADOOP_HOME and HIVE_CONF_DIR environment variables
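One common place for these variables is conf/hive-env.sh (copied from hive-env.sh.template in the Hive distribution). A sketch; the paths are examples, and the conf directory is created here only so the snippet is self-contained:

```shell
# Append the environment variables Hive needs at startup
# (adjust both paths to your actual install locations)
mkdir -p conf
cat >> conf/hive-env.sh <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
EOF
```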

Running Hive CLI

Run bin/hive in the hive directory; when the hive> prompt appears, you are in Hive CLI mode

Running HiveServer2 and Beeline

HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline. HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2. To run HiveServer2 and Beeline from shell:

$ $HIVE_HOME/bin/hiveserver2

$ $HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT

Beeline is started with the JDBC URL of the HiveServer2, which depends on the address and port where HiveServer2 was started. By default, it will be (localhost:10000), so the address will look like jdbc:hive2://localhost:10000. Or to start Beeline and HiveServer2 in the same process for testing purpose, for a similar user experience to HiveCLI:

$ $HIVE_HOME/bin/beeline -u jdbc:hive2://

http://www.bubuko.com/infodetail-3286965.html

Configuration (hosts):

master
slave1

Running in debug mode

In this mode it is much easier to see where the problem lies. For example, in the one-master, one-slave setup, starting the hiveserver2 service on the slave machine kept failing.

./bin/hive --hiveconf hive.root.logger=DEBUG,console

Running the command in this mode showed in detail that the failures were caused by the firewall blocking the port, by the datanode-side receiving service not being specified in the configuration, and by the history server not having been started:

sbin/mr-jobhistory-daemon.sh start historyserver
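A quick way to tell a firewall problem apart from a service that simply is not running is to probe the ports from the other node. A sketch using the shell's /dev/tcp pseudo-device (a bash feature; the hostname and ports come from the configs in this post):

```shell
# Probe a TCP port; prints "<host>:<port> open" or "... closed"
check_port() {
  # The connect attempt happens inside a subshell, so the fd is
  # closed automatically when the subshell exits
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
check_port slave1 9083     # Hive metastore (hive.metastore.uris)
check_port slave1 10000    # HiveServer2 (hive.server2.thrift.port)
```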

etc/hadoop/core-site.xml (both master and slave)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

Add the following to conf/hive-site.xml (on the master, the machine whose hive client connects to hiveserver2)

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://slave1:9083</value>
</property>

Add the following to conf/hive-site.xml (on the slave, the machine running the hiveserver2 service)

<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
</property>

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>slave1</value>
</property>


Published: 2020-04-25 08:05:10