Hadoop Component Deployment

These notes build on the Hadoop pseudo-distributed machine from Hadoop部署 - 严千屹 (qianyios.top); see that post for how the base machine was installed.

Zookeeper

Name   IP
zk01   192.168.48.11
zk02   192.168.48.12
zk03   192.168.48.13

Set up hosts

[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.48.11 zk01
192.168.48.12 zk02
192.168.48.13 zk03

Change the hostname

hostnamectl set-hostname zk01 && bash

Check the Java version

[root@zk01 ~]# java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

Check the Hadoop version

[root@zk01 ~]# hadoop version
Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.1.3.jar

Install ZooKeeper

tar -xf apache-zookeeper-3.8.0-bin.tar.gz -C /opt
echo "export ZOOKEEPER_HOME=/opt/apache-zookeeper-3.8.0-bin" >> /etc/profile
echo "export PATH=\$ZOOKEEPER_HOME/bin:\$PATH" >> /etc/profile
source /etc/profile
cd /opt/apache-zookeeper-3.8.0-bin/
cp conf/zoo_sample.cfg conf/zoo.cfg

vi conf/zoo.cfg
tickTime=2000                    # base time unit, in milliseconds
dataDir=/opt/apache-zookeeper-3.8.0-bin/data
clientPort=2181
initLimit=10                     # ticks a follower may take to connect and sync with the leader
syncLimit=5                      # ticks a follower may lag behind the leader before being dropped
maxClientCnxns=60
server.1=zk01:2888:3888          # server.<id>=<host>:<peer port>:<leader-election port>
server.2=zk02:2888:3888
server.3=zk03:2888:3888

mkdir /opt/apache-zookeeper-3.8.0-bin/data
# the number in myid must match this node's server.<id> entry in zoo.cfg
echo 1 > /opt/apache-zookeeper-3.8.0-bin/data/myid
cat /opt/apache-zookeeper-3.8.0-bin/data/myid

Shut down and clone this machine into two more: zk02 and zk03.

zk02 192.168.48.12

vi /opt/apache-zookeeper-3.8.0-bin/data/myid
2

zk03 192.168.48.13

vi /opt/apache-zookeeper-3.8.0-bin/data/myid
3

Ping each host to test connectivity

ping zk01
ping zk02
ping zk03

If every host can reach the others, the setup succeeded.

Start the ZooKeeper service

At least two nodes must be running before zkServer.sh status reports Mode: follower (or leader).

zkServer.sh start
zkServer.sh status
zkServer.sh stop
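
A quick way to check every node's role in one pass - a minimal sketch, assuming passwordless SSH has been set up between the three machines:

for h in zk01 zk02 zk03; do
  echo "== $h =="
  ssh $h "/opt/apache-zookeeper-3.8.0-bin/bin/zkServer.sh status"
done

One node should report Mode: leader and the rest Mode: follower.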

HBase Installation

Install HBase

[root@hadoop ~]# ll
-rw-r--r-- 1 root root 232190985 Mar 17 19:37 hbase-2.2.2-bin.tar.gz

tar -xf hbase-2.2.2-bin.tar.gz -C /usr/local/
mv /usr/local/hbase-2.2.2 /usr/local/hbase
echo "export HBASE_HOME=/usr/local/hbase" >> /etc/profile
echo "export PATH=\$PATH:\$HBASE_HOME/bin" >> /etc/profile
source /etc/profile

------------------------------------------------------------------------
# Append the HBase lib dir to the CLASSPATH line of the launcher script,
# either by editing it manually:
[root@hadoop ~]# vi /usr/local/hbase/bin/hbase
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar:/usr/local/hbase/lib/*

# ...or with this equivalent one-liner:
[root@hadoop ~]# sed -i "s/CLASSPATH=\${CLASSPATH}:\$JAVA_HOME\/lib\/tools.jar/CLASSPATH=\${CLASSPATH}:\$JAVA_HOME\/lib\/tools.jar:\/usr\/local\/hbase\/lib\/*/g" /usr/local/hbase/bin/hbase
------------------------------------------------------------------------

[root@hadoop ~]# hbase version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase 2.2.2
Source code repository git://6ad68c41b902/opt/hbase-rm/output/hbase revision=e6513a76c91cceda95dad7af246ac81d46fa2589
Compiled by hbase-rm on Sat Oct 19 10:10:12 UTC 2019
From source with checksum 4d23f97701e395c5d34db1882ac5021b

HBase Configuration

echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> $HBASE_HOME/conf/hbase-env.sh
echo "export HBASE_CLASSPATH=/usr/local/hbase/conf" >> $HBASE_HOME/conf/hbase-env.sh
echo "export HBASE_MANAGES_ZK=true" >> $HBASE_HOME/conf/hbase-env.sh

vi $HBASE_HOME/conf/hbase-site.xml
<configuration>
  <!-- where HBase keeps its data; must match the HDFS NameNode address -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://yjx48:9000/hbase</value>
  </property>
  <!-- run in (pseudo-)distributed mode instead of standalone -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- skip the stream-capability check, needed on a single-node HDFS -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

Start HBase

start-all.sh 
start-hbase.sh
[root@hadoop hbase]# jps
16532 ResourceManager
22502 HMaster          <-- HBase
15799 NameNode
16697 NodeManager
23097 Jps
15962 DataNode
22666 HRegionServer    <-- HBase
16223 SecondaryNameNode
22431 HQuorumPeer
[root@hadoop ~]# hbase shell
hbase(main):001:0> list
TABLE
0 row(s)
Took 0.3118 seconds
=> []
hbase(main):002:0> exit

Web UI

http://<server-ip>:16010

Managing HBase

Student ID (S_No)   Name (S_Name)   Sex (S_Sex)   Age (S_Age)
2015001             zhangsan        male          23
2015002             Mary            female        22
2015003             Lisi            male          24

Create the student table

hbase(main):004:0> create 'student','no','name','sex','age'
Created table student
Took 1.3125 seconds
=> Hbase::Table - student
hbase(main):005:0> list
TABLE
student
1 row(s)
Took 0.0074 seconds
=> ["student"]
# view the table schema
hbase(main):001:0> describe 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'age', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false'
.......

Insert data

s001 is the row key. Note that 'no', 'name', etc. were created as column families; since the puts below give no qualifier, values are stored under an empty qualifier, which is why the scan shows column=name: and column=no:.

hbase(main):001:0> scan 'student'
ROW COLUMN+CELL
0 row(s)
Took 0.2712 seconds
hbase(main):002:0> put 'student','s001','no','2015001'
Took 0.0236 seconds
hbase(main):003:0> put 'student','s001','name','zhangsan'
Took 0.0057 seconds
hbase(main):004:0> scan 'student'
ROW COLUMN+CELL
s001 column=name:, timestamp=1679058447572, value=zhangsan
s001 column=no:, timestamp=1679058447550, value=2015001
1 row(s)
Took 0.0179 seconds

Get a whole row

hbase(main):001:0> get 'student','s001'
COLUMN CELL
name: timestamp=1679058447572, value=zhangsan
no: timestamp=1679058447550, value=2015001
1 row(s)
Took 0.2910 seconds

Get a single cell

hbase(main):008:0> get 'student','s001','name'
COLUMN CELL
name: timestamp=1679058447572, value=zhangsan
1 row(s)
Took 0.0053 seconds

Order example


Create the order table

create 'order','userinfo','orderinfo'
list
put 'order','1','userinfo:name','sw'
put 'order','1','userinfo:age','24'
put 'order','1','orderinfo:id','23333'
put 'order','1','orderinfo:money','30'
scan 'order'
-----------------------------------------------------------
hbase(main):017:0* create 'order','userinfo','orderinfo'
Created table order
Took 2.3102 seconds
=> Hbase::Table - order
hbase(main):018:0> list
TABLE
order
student
2 row(s)
Took 0.0104 seconds
=> ["order", "student"]
hbase(main):019:0> put 'order','1','userinfo:name','sw'
Took 0.0326 seconds
hbase(main):020:0> put 'order','1','userinfo:age','24'
Took 0.0031 seconds
hbase(main):021:0> put 'order','1','orderinfo:id','23333'
Took 0.0036 seconds
hbase(main):022:0> put 'order','1','orderinfo:money','30'
Took 0.0031 seconds
hbase(main):023:0> scan 'order'
ROW COLUMN+CELL
1 column=orderinfo:id, timestamp=1679060732699, value=23333
1 column=orderinfo:money, timestamp=1679060732711, value=30
1 column=userinfo:age, timestamp=1679060732685, value=24
1 column=userinfo:name, timestamp=1679060732667, value=sw
1 row(s)
Took 0.0116 seconds
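
To read a single column family of the row rather than everything, get also accepts a family (or family:qualifier) name - a small sketch:

get 'order','1','userinfo'
get 'order','1','orderinfo:money'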

Update data

hbase(main):001:0> put 'student','s001','name','zhangxiaosan'
Took 0.2879 seconds
hbase(main):002:0> get 'student','s001','name'
COLUMN CELL
name: timestamp=1679061655288, value=zhangxiaosan
1 row(s)
Took 0.0280 seconds

Timestamps

Every value written to HBase is stamped with a timestamp, which serves as a version.

Updating a cell really just writes a new value alongside the old one; the record gains one more version.


To change this record's value to 40, HBase actually adds one more entry; reads simply return the entry with the newest timestamp.


put 'order','1','orderinfo:money','40'
get 'order','1','orderinfo:money'

hbase(main):008:0> put 'order','1','orderinfo:money','40'
Took 0.0190 seconds
hbase(main):009:0> get 'order','1','orderinfo:money'
COLUMN CELL
orderinfo:money timestamp=1679064515487, value=40
1 row(s)
Took 0.0096 seconds
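
To actually see the older values, you can request more than one version - a sketch; note that a column family only retains old cells if it was created with VERSIONS > 1, which is not the case for this table:

get 'order','1',{COLUMN => 'orderinfo:money', VERSIONS => 3}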

Delete data

scan 'student'
delete 'student','s001','name'
get 'student','s001','name'

Drop a table

disable 'student'    # a table must be disabled before it can be dropped
describe 'student'
drop 'student'
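
You can confirm the table is really gone with exists - a small sketch:

exists 'student'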


NoSQL Database Installation

(Redis, a key-value NoSQL database)

Install Redis

tar -xf redis-5.0.5.tar.gz
mv redis-5.0.5 /opt/redis
cd /opt/redis
yum install -y gcc automake autoconf libtool
# compile and install
make && make install
cd src
[root@yjx48 src]# ./redis-server
5861:C 30 Mar 2023 08:49:48.699 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
5861:C 30 Mar 2023 08:49:48.699 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=5861, just started
5861:C 30 Mar 2023 08:49:48.699 # Warning: no config file specified, using the default config. In order to specify a config file use ./redis-server /path/to/redis.conf
5861:M 30 Mar 2023 08:49:48.699 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.5 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 5861
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

5861:M 30 Mar 2023 08:49:48.700 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
# open another session
[root@yjx48 ~]# cd /opt/redis/src
[root@yjx48 src]# ./redis-cli
127.0.0.1:6379> set hello world
OK
127.0.0.1:6379> get hello
"world"
127.0.0.1:6379> exit
[root@yjx48 src]#
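
Running ./redis-server in the foreground ties up a terminal. As a sketch, you could instead start it as a background daemon using the config file shipped in the source tree:

./redis-server ../redis.conf --daemonize yes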

Database Management


Redis syntax

# insert data
set student:2015001:sname zhangsan
get student:2015001:sname
set student:2015001:sex male
get student:2015001:sex
# update data (SET on an existing key overwrites the value)
set student:2015001:sname zhangxiaosan
get student:2015001:sname
# delete data
get student:2015001:sname
del student:2015001:sname
get student:2015001:sname
# (nil) - the key is gone

Hash database

The student table:

2015001={
name=zhangsan
sex=male
age=23
}

Insert and query data

hset student:2015001 name zhangsan
hset student:2015001 sex male
hset student:2015001 age 23
hget student:2015001 name
hget student:2015001 sex
hgetall student:2015001

Update data

hset student:2015001 sex female
hget student:2015001 sex

Delete data

hdel student:2015001 sex
hget student:2015001 sex
# (nil) - no data left
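
A few other hash commands that come in handy here - a quick sketch:

hkeys student:2015001         # list all field names
hvals student:2015001         # list all values
hlen student:2015001          # count the fields
hexists student:2015001 sex   # 1 if the field exists, 0 otherwise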

MongoDB

MongoDB is a document database built on distributed file storage. It sits between relational and non-relational databases, and among NoSQL databases it is the most feature-rich and the closest to a relational system. Its biggest strength is a very powerful query language whose syntax resembles an object-oriented language; it can express almost everything a single-table relational query can, and it also supports indexes on the data. Its data model is a loosely structured, JSON-like binary format called BSON, so it can store fairly complex data types.

JSON Syntax

JSON syntax is a subset of JavaScript syntax.

JSON Numbers

JSON numbers can be integers or floating-point:
{ "age":30 }

JSON Objects

JSON objects are written inside curly braces {} and can contain multiple name/value pairs:
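
For example, an object with two name/value pairs:

{ "name":"runoob" , "url":"www.runoob.com" }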

JSON Arrays

JSON arrays are written inside square brackets [] and may contain multiple objects:

[
{ key1 : value1-1 , key2:value1-2 },
{ key1 : value2-1 , key2:value2-2 },
{ key1 : value3-1 , key2:value3-2 },
...
{ key1 : valueN-1 , key2:valueN-2 },
]

{
"sites": [
{ "name":"菜鸟教程" , "url":"www.runoob.com" },
{ "name":"google" , "url":"www.google.com" },
{ "name":"微博" , "url":"www.weibo.com" }
]
}

In the example above, sites is an array containing three objects, each one a record about a website (name, url).

JSON Booleans

A JSON boolean value is either true or false:

{ "flag":true }

JSON null

A JSON value can be set to null:

{ "runoob":null }

Install MongoDB

tar -xf mongodb-linux-x86_64-rhel70-5.0.5.tgz
mv mongodb-linux-x86_64-rhel70-5.0.5 /opt/mongodb
cd /opt/mongodb/bin
./mongo --version

# By default MongoDB expects the following two directories; create them beforehand:
# data directory: /var/lib/mongo
# log directory:  /var/log/mongodb
mkdir -p /var/lib/mongo
mkdir -p /var/log/mongodb
# start the mongod service (--fork runs it in the background)
cd /opt/mongodb/bin
./mongod --dbpath /var/lib/mongo --logpath /var/log/mongodb/mongod.log --fork
ps ax | grep mongod
./mongo
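
Since mongod was forked into the background, here is one way to stop it cleanly later - a sketch:

./mongod --dbpath /var/lib/mongo --shutdown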

Database Management

Common commands

# list all databases

>show dbs
admin 0.000GB
config 0.000GB
local 0.000GB

# switch database

>use admin
switched to db admin

# show all collections in the current database

>show collections
system.version

# show all documents in a collection

>db.system.version.find()
{ "_id" : "featureCompatibilityVersion", "version" : "5.0" }

Create a database and collection

# MongoDB has no explicit create-database command
> use school
switched to db school
# creating a collection implicitly creates the database selected above
> db.createCollection('student')
{ "ok" : 1 }
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
school 0.000GB
> show collections
student

Insert data

# Two ways to insert data: insert and save
# _id can be supplied manually; otherwise it is generated automatically
>db.student.insert({
sno: 2015001,
name: "zhangsan",
sex: "male",
age: 23
})
WriteResult({ "nInserted" : 1 })

> db.student.find()
{ "_id" : ObjectId("642e21279c9d145e592fda70"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 23 }

> db.student.save({sno:2015002,name:'marry',sex:'female',age:22})
WriteResult({ "nInserted" : 1 })

> db.student.find()
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 23 }
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "sno" : 2015002, "name" : "marry", "sex" : "female", "age" : 22 }
# The difference between insert and save: when a document with the same _id already exists, insert fails while save replaces the existing document.
> db.student.insert({"_id": ObjectId("642e259014c45ed3f90756c0"), "sno": 2015001, "name": "zhangsan", "sex": "male", "age": 23 })

WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: test.student index: _id_ dup key: { _id: ObjectId('642e21279c9d145e592fda70') }"
}
})
# change the age from 23 to 24
> db.student.save({"_id": ObjectId("642e259014c45ed3f90756c0"), "sno": 2015001, "name": "zhangsan", "sex": "male", "age": 24 })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

> db.student.find()
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 24 }
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "sno" : 2015002, "name" : "marry", "sex" : "female", "age" : 22 }

Query data

# Query
# Format: find([query],[fields]) - like a SQL SELECT: query plays the role of WHERE, fields picks the columns to display
# find documents whose name is zhangsan
> db.student.find({name:'zhangsan'})
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 24 }
# show only name and sex for zhangsan
> db.student.find({name:'zhangsan'},{name:1,sex:1})
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "name" : "zhangsan", "sex" : "male" }
# hide _id
> db.student.find({name:'zhangsan'},{_id:0,name:1,sex:1})
{ "name" : "zhangsan", "sex" : "male" }
# select specific fields
> db.student.find({},{name:1})
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "name" : "zhangsan" }
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "name" : "marry" }
# AND conditions (the first query matches nothing)
> db.student.find({name:'zhangsan',sex:'female'})
> db.student.find({name:'zhangsan',sex:'male'})
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 24 }
# OR query
> db.student.find({ $or:[{age:24},{age:22}] })
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 24 }
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "sno" : 2015002, "name" : "marry", "sex" : "female", "age" : 22 }

Update data

# Format: update(query, update, [upsert_bool, multi_bool])
# query:  the selection criteria, like the WHERE clause of a SQL UPDATE.
# update: the new values and update operators ($set, $inc, ...), like the SET clause of a SQL UPDATE.
# upsert: optional; if no document matches, insert the update document as a new one. Defaults to false (do not insert).
# multi:  optional; defaults to false, updating only the first matching document. If true, all matching documents are updated.

> db.student.update({name:'zhangsan'},{$set:{age:23}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

> db.student.find()
{ "_id" : ObjectId("642e259014c45ed3f90756c0"), "sno" : 2015001, "name" : "zhangsan", "sex" : "male", "age" : 23 }
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "sno" : 2015002, "name" : "marry", "sex" : "female", "age" : 22 }

Delete data

> db.student.remove({name:'zhangsan'})
WriteResult({ "nRemoved" : 1 })
> db.student.find()
{ "_id" : ObjectId("642e259614c45ed3f90756c1"), "sno" : 2015002, "name" : "marry", "sex" : "female", "age" : 22 }

Drop a collection

> db.createCollection('course')
{ "ok" : 1 }
> show collections
course
student
> db.course.drop()
true
> show collections
student

Hive Data Warehouse Installation

Hive is a data warehouse tool layered on top of Hadoop. It runs queries written in HiveQL, a SQL-like language, and keeps all of its data in Hadoop-compatible file systems (e.g. Amazon S3, HDFS). Hive does not modify data while loading it; it only moves files into the HDFS directory Hive designates. As a consequence, Hive does not support rewriting or appending to existing rows: all data is fixed at load time.

User interfaces: Client

There are three main user interfaces: the CLI, the Client, and the WUI. The CLI is the most common; starting it also starts a copy of Hive. The Client is Hive's client, which connects the user to a Hive Server: when starting in Client mode you must specify the node where Hive Server lives, and Hive Server must already be running on that node. The WUI accesses Hive through a browser.

Metadata store: Metastore

Hive stores its metadata in a database such as MySQL or Derby. The metadata includes table names, columns, partitions and their attributes, table properties (whether a table is external, etc.), and the directories where table data lives.

Driver: interpreter, compiler, optimizer, executor

The interpreter, compiler, and optimizer take an HQL query through lexical analysis, parsing, compilation, and optimization, and generate a query plan. The plan is stored in HDFS and later executed by MapReduce jobs.

Hadoop

Hive's data is stored in HDFS, and most queries are executed by MapReduce (with exceptions such as select * from tbl, which reads data directly and does not generate a MapReduce job).


Install Hive

tar -xf apache-hive-3.1.2-bin.tar.gz
mv apache-hive-3.1.2-bin /usr/local/hive
echo "export HIVE_HOME=/usr/local/hive" >> /etc/profile
echo "export PATH=\$HIVE_HOME/bin:\$PATH" >> /etc/profile
source /etc/profile
cd /usr/local/hive/conf/
cp hive-default.xml.template hive-default.xml
vi hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://yjx48:3306/hive?useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Yjx@666.</value>
<description>password to use against metastore database</description>
</property>
</configuration>

Here Yjx@666. is the MySQL password.

Install MySQL

cd
yum remove mariadb-libs.x86_64 -y
yum install -y net-tools
mkdir mysql
tar -xf mysql-5.7.37-1.el7.x86_64.rpm-bundle.tar -C mysql
cd mysql
rpm -ivh mysql-community-common-5.7.37-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.37-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-5.7.37-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.37-1.el7.x86_64.rpm
rpm -ivh mysql-community-devel-5.7.37-1.el7.x86_64.rpm
rpm -ivh mysql-community-server-5.7.37-1.el7.x86_64.rpm
systemctl enable --now mysqld
grep 'temporary password' /var/log/mysqld.log
mysqladmin -uroot -p'darm4hb.2Rsy' password 'Yjx@666.'
mysql -uroot -pYjx@666.
# grant privileges to the root user
grant all privileges on *.* to 'root'@'localhost' identified by 'Yjx@666.' with grant option;
grant all privileges on *.* to 'root'@'%' identified by 'Yjx@666.' with grant option;
flush privileges;
create database hive;
exit

Configure and start Hive

cd
tar -xf mysql-connector-java-5.1.40.tar.gz
cp mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib/
# Hive's bundled guava clashes with Hadoop 3's newer version; swap in Hadoop's copy
mv /usr/local/hive/lib/guava-19.0.jar{,.bak}
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib
start-all.sh
schematool -dbType mysql -initSchema
hive

Hive data types

Type                   Description                                      Example
TINYINT (tinyint)      1-byte (8-bit) signed integer, -128~127          1
SMALLINT (smallint)    2-byte (16-bit) signed integer, -32768~32767     1
INT (int)              4-byte (32-bit) signed integer                   1
BIGINT (bigint)        8-byte (64-bit) signed integer                   1
FLOAT (float)          4-byte (32-bit) single-precision float           1
DOUBLE (double)        8-byte (64-bit) double-precision float           1
DECIMAL (decimal)      arbitrary-precision signed decimal               1
BOOLEAN (boolean)      true/false                                       true/false
STRING (string)        variable-length string                           'a', 'b', '1'
VARCHAR (varchar)      variable-length string                           'a'
CHAR (char)            fixed-length string                              'a'
BINARY (binary)        byte array                                       (no literal form)
TIMESTAMP (timestamp)  timestamp with nanosecond precision              1.22327E+11
DATE (date)            date                                             '2016-03-29'

Hive collection data types

Type     Description                                                                   Example
ARRAY    ordered array; all elements must share one type                               Array(1,2)
MAP      unordered key-value pairs; keys must be a primitive type and share one type,
         values may be any type but must also share one type                           Map('a',1)
STRUCT   a set of named fields whose types may differ                                  Struct('a',1,2.0)
UNION    like a C union: at any given moment it holds a value of exactly one of the
         declared types
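
As a sketch of how these collection types look in a table definition (the employee table and its columns are made up for illustration):

create table employee(
  name string,
  salary float,
  subordinates array<string>,
  deductions map<string,float>,
  address struct<street:string,city:string>
);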

Basic commands

Create a database and table

create database hive;
use hive;
create table usr(id int,name string,age int);

Show and describe databases and tables

show databases;
show tables;
describe database hive;
describe hive.usr;

Load data into a table

insert into usr values(1,'sina',20);

# load data from the local Linux filesystem
[root@yjx48 ~]# echo "2,zhangsan,22" >> /opt/data
hive> use hive;
create table usr1(id int,name string,age int) row format delimited fields terminated by ",";
load data local inpath '/opt/data' overwrite into table usr1;

Load data from HDFS (note: this moves the file into Hive's warehouse directory rather than copying it)

echo "3,lisi,25" >> /opt/test.txt
hdfs dfs -put /opt/test.txt /
hive
load data inpath 'hdfs://yjx48:9000/test.txt' overwrite into table usr1;

Load data from another table

hive> select * from usr;
OK
1 sina 20

hive> select * from usr1;
OK
3 lisi 25
# overwrite usr with the id=3 rows of usr1
insert overwrite table usr select * from usr1 where id=3;

hive> select * from usr;
OK
3 lisi 25

Query the table

select * from usr1;

Hive Exercise: Word Count

Create an input directory on Linux: /opt/input;

mkdir /opt/input

Add several text files to the input directory, each containing the word name+student-ID, e.g. yjx48;

echo "hello1 yjx48" >> /opt/input/text1.txt
echo "hello2 yjx48" >> /opt/input/text2.txt
echo "hello3 yjx48" >> /opt/input/text3.txt

Create a table "docs" in Hive and load the file data from the input directory into it;

use hive;
create table docs(line string);
load data local inpath '/opt/input' overwrite into table docs;
select * from docs;

Write a HiveQL statement that counts word frequencies over the input text, reporting how many times the name+student-ID word appears.

create table word_count as
select word,count(1) as count from
-- split each line on spaces, then explode the array into one row per word
(select explode(split(line,' ')) as word from docs) w
group by word
order by word;

select * from word_count;
describe word_count;


