大数据hadoop实验

镜像下载:ubuntu-18.04.6-desktop-amd64.iso

安装ubuntu系统

image-20250306172718280

image-20250306172858685

image-20250306172955435

一路确定就行了

image-20250306173133874

设置dhcp模式否则无法联网安装

#dhcp服务

image-20250306181204335

然后开机,选择中文,按提示安装即可

image-20250306173400725

image-20250306173436631

接着开始安装即可。后面重启之后,可能会遇到这个界面

image-20250308211726292

解决办法

image-20250308211858778

然后你再开机就行了

切换阿里云镜像源

image-20250306182430499

image-20250306212359634

等待更新缓存
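如果图形界面切换镜像源不顺利,也可以直接改 /etc/apt/sources.list。下面只是一个示意做法(假设使用阿里云的 mirrors.aliyun.com 镜像地址,改之前先备份):

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo sed -i 's|http://.*archive.ubuntu.com|http://mirrors.aliyun.com|g' /etc/apt/sources.list
sudo sed -i 's|http://security.ubuntu.com|http://mirrors.aliyun.com|g' /etc/apt/sources.list
sudo apt update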

进入桌面后,在桌面空白处右键打开终端,输入下面的指令

一键安装vm-tools可以实现跨端复制粘贴

sudo wget https://resource.qianyios.top/init.sh
sudo chmod +x init.sh
bash init.sh

接下来重启,等待软件生效之后关机。这时候要打一个快照,以便后面做项目出错时可以恢复,然后再开机

image-20250306174127049

创建hadoop用户

创建hadoop用户并且设置密码

sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
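可以顺手确认一下用户是否创建成功(示意):

id hadoop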

image-20250306180546505

给hadoop用户添加sudo权限

sudo adduser hadoop sudo

image-20250306180700866

这时候桌面右上角注销账号切换成hadoop

image-20250306182143578

设置ssh免密

一键全部复制,然后粘贴回车就会自动进行免密

代码中有password="123456",记得改成你的hadoop用户的密码

sudo cat >ssh.sh<<"EOF"
#!/bin/bash
sudo apt-get install openssh-server -y
sudo systemctl disable ufw --now
# 确保 PasswordAuthentication 设置为 yes
echo "正在更新 SSH 配置..."
sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart ssh

# 安装 sshpass
echo "正在安装 sshpass..."
sudo apt update
sudo apt install -y sshpass || { echo "安装 sshpass 失败"; exit 1; }
echo "sshpass 安装完成。"

# 创建 .ssh 目录并设置权限
echo "正在检查 .ssh 目录..."
if [ ! -d ~/.ssh ]; then
sudo mkdir -p ~/.ssh
fi
sudo chmod 700 ~/.ssh
sudo chown -R hadoop:hadoop ~/.ssh

# 目标主机列表
hosts=("localhost")
# 密码
password="123456"

# 生成 SSH 密钥对
echo "正在生成 SSH 密钥对..."
if [ ! -f ~/.ssh/id_rsa ]; then
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa || { echo "生成 SSH 密钥对失败"; exit 1; }
fi
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
echo "SSH 密钥对已生成。"

# 循环遍历目标主机
for host in "${hosts[@]}"
do
echo "正在为 $host 配置免密登录..."

# 确保目标主机的 .ssh 目录存在
sshpass -p "$password" ssh -o StrictHostKeyChecking=no "$host" "mkdir -p ~/.ssh && chmod 700 ~/.ssh"

# 将公钥复制到目标主机
sshpass -p "$password" ssh-copy-id -i ~/.ssh/id_rsa.pub -o StrictHostKeyChecking=no "$host" || { echo "复制公钥到 $host 失败"; exit 1; }

# 验证免密登录是否成功
sshpass -p "$password" ssh -o StrictHostKeyChecking=no "$host" "echo '免密登录成功'" || { echo "验证免密登录失败"; exit 1; }
done

echo "所有配置已完成。"
EOF

运行脚本

bash ssh.sh

测试登入localhost是否可以实现无密码登入

ssh localhost

image-20250306185319755

成功
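如果这里仍然提示输入密码,可以按下面的思路排查一下(只是示意):

ls -ld ~/.ssh                  # 目录权限应为 700
ls -l ~/.ssh/authorized_keys   # 文件权限一般为 600
sudo systemctl status ssh      # 确认 sshd 正在运行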

安装java和hadoop

将两个文件复制到下载的目录去

image-20250306190055853

然后在这个文件夹下,空白处右键,打开终端

image-20250306190200797

确认一下当前文件夹是不是有这两个文件
ls

image-20250306190237227

以下的全部复制运行

sudo mkdir /usr/lib/jvm
#安装java8
sudo tar -xf jdk-8u162-linux-x64.tar.gz -C /usr/lib/jvm
echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> ~/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
java -version

#安装hadoop-3.1.3
sudo tar -zxf hadoop-3.1.3.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.1.3/ /usr/local/hadoop
echo "export HADOOP_HOME=/usr/local/hadoop" >> ~/.bashrc
echo "export PATH=\$HADOOP_HOME/bin/:\$HADOOP_HOME/sbin/:\$PATH" >> ~/.bashrc
source ~/.bashrc
sudo chown -R hadoop /usr/local/hadoop
hadoop version

这里是作业要截图的地方

image-20250306191054386

image-20250306191150354

这时候关机打个快照,命名为基础

伪分布安装

编写core-site.xml文件

以下的全部复制运行

cat > /usr/local/hadoop/etc/hadoop/core-site.xml<< "EOF"
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOF

编写hdfs-site.xml

以下的全部复制运行

cat >/usr/local/hadoop/etc/hadoop/hdfs-site.xml<<"EOF"
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
EOF
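写完之后可以粗略检查两个配置是否真的写进去了(示意):

grep -A1 "fs.defaultFS" /usr/local/hadoop/etc/hadoop/core-site.xml
grep -A1 "dfs.replication" /usr/local/hadoop/etc/hadoop/hdfs-site.xml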

启动hdfs服务

hdfs初始化

这条命令只需要运行一次,以后都不要再运行了!!!!!!

这条命令只需要运行一次,以后都不要再运行了!!!!!!

这条命令只需要运行一次,以后都不要再运行了!!!!!!

hdfs namenode -format

image-20250306191818068

出现这个说明初始化成功

添加hdfs yarn的环境变量

以下的全部复制运行

echo "export HDFS_NAMENODE_USER=hadoop" >> ~/.bashrc
echo "export HDFS_DATANODE_USER=hadoop" >> ~/.bashrc
echo "export HDFS_SECONDARYNAMENODE_USER=hadoop" >> ~/.bashrc
echo "export YARN_RESOURCEMANAGER_USER=hadoop" >> ~/.bashrc
echo "export YARN_NODEMANAGER_USER=hadoop" >> ~/.bashrc
source ~/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
#开启hadoop的命令
start-all.sh
#当你要关机的时候先运行下面的命令关掉hadoop先,再关机
stop-all.sh

这里是作业要截图的地方

image-20250306192423176

jps命令用来查看进程是否启动,以上是hadoop正常启动的进程,总共有6个
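给一个示意性的对照(进程号以你机器的实际输出为准,顺序也可能不同):

jps
# 3601 NameNode
# 3759 DataNode
# 3987 SecondaryNameNode
# 4263 ResourceManager
# 4421 NodeManager
# 4785 Jps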

访问hadoop网页

看看你的ip

ip a

image-20250306192632197

如果你这里没有ip说明你没有开启dhcp服务,自行回到最开始,找开启dhcp的方法,关机开启dhcp,然后开机就会有ip了

这里是作业要截图的地方

http://ip:9870

http://192.168.48.132:9870/

image-20250306192915923

http://ip:8088

image-20250306193032219

关机步骤

这时候关闭hadoop集群

stop-all.sh

然后关机打快照,命名伪分布

sudo poweroff

然后在这里打个快照,命名为伪分布安装成功,等你哪天机子坏了,你就可以恢复快照

image-20250318173513618

严肃告知,别说我没提醒你,不要直接关机,也不要挂起虚拟机,否则你的虚拟机和hadoop坏了,你就重装吧

image-20250318173348088

第一次实验

熟悉常用的Linux操作

1)cd命令:切换目录

(1) 切换到目录“/usr/local”

cd /usr/local

(2) 切换到当前目录的上一级目录

cd ..

(3) 切换到当前登录Linux系统的用户的自己的主文件夹

cd ~

image-20250313143132503

2)ls命令:查看文件与目录

查看目录“/usr”下的所有文件和目录

cd /usr
ls -al

image-20250313143140661

3)mkdir命令:新建目录

(1)进入“/tmp”目录,创建一个名为“a”的目录,并查看“/tmp”目录下已经存在哪些目录

cd /tmp
mkdir a
ls -al

image-20250313143206832

(2)进入“/tmp”目录,创建目录“a1/a2/a3/a4”

cd /tmp
mkdir -p a1/a2/a3/a4

image-20250313143217857

4)rmdir命令:删除空的目录

(1)将上面创建的目录a(在“/tmp”目录下面)删除

(2)删除上面创建的目录“a1/a2/a3/a4” (在“/tmp”目录下面),然后查看“/tmp”目录下面存在哪些目录

cd /tmp
rmdir a
cd /tmp
rmdir -p a1/a2/a3/a4
ls -al

image-20250313150519301

5)cp命令:复制文件或目录

(1)将当前用户的主文件夹下的文件.bashrc复制到目录“/usr”下,并重命名为bashrc1

sudo cp ~/.bashrc /usr/bashrc1

image-20250313150547907

(2)在目录“/tmp”下新建目录test,再把这个目录复制到“/usr”目录下

cd /tmp
mkdir test
sudo cp -r /tmp/test /usr

image-20250313150608722

6)mv命令:移动文件与目录,或更名

(1)将“/usr”目录下的文件bashrc1移动到“/usr/test”目录下

sudo mv /usr/bashrc1 /usr/test

(2)将“/usr”目录下的test目录重命名为test2

sudo mv /usr/test /usr/test2

image-20250313150650543

7)rm命令:移除文件或目录

(1)将“/usr/test2”目录下的bashrc1文件删除

sudo rm /usr/test2/bashrc1

(2)将“/usr”目录下的test2目录删除

sudo rm -r /usr/test2

image-20250313150701498

8)cat命令:查看文件内容

查看当前用户主文件夹下的.bashrc文件内容

cat ~/.bashrc

image-20250313150717805

9)tac命令:反向查看文件内容

反向查看当前用户主文件夹下的.bashrc文件的内容

tac ~/.bashrc

image-20250313150727016

10)more命令:一页一页翻动查看

翻页查看当前用户主文件夹下的.bashrc文件的内容

more ~/.bashrc

image-20250313150745391

11)head命令:取出前面几行

(1)查看当前用户主文件夹下.bashrc文件内容前20行

head -n 20 ~/.bashrc

(2)查看当前用户主文件夹下.bashrc文件内容,后面50行不显示,只显示前面几行

head -n -50 ~/.bashrc

image-20250313150813349

12)tail命令:取出后面几行

(1)查看当前用户主文件夹下.bashrc文件内容最后20行

tail -n 20 ~/.bashrc

(2)查看当前用户主文件夹下.bashrc文件内容,并且只列出50行以后的数据

tail -n +50 ~/.bashrc

image-20250313150837188

13)touch命令:修改文件时间或创建新文件

(1)在“/tmp”目录下创建一个空文件hello,并查看文件时间

cd /tmp
touch hello
ls -l hello

image-20250313150847348

(2)修改hello文件,将文件时间调整为5天前

touch -d "5 days ago" hello

image-20250313150952003

14)chown命令:修改文件所有者权限

将hello文件所有者改为root帐号,并查看属性

sudo chown root /tmp/hello
ls -l /tmp/hello

image-20250313151030899

15)find命令:文件查找

找出主文件夹下文件名为.bashrc的文件

find ~ -name .bashrc

image-20250313151052617

16)tar命令:压缩命令

(1)在根目录“/”下新建文件夹test,然后在根目录“/”下打包成test.tar.gz

cd /
sudo mkdir /test
sudo tar -zcv -f /test.tar.gz test

(2)把上面的test.tar.gz压缩包,解压缩到“/tmp”目录

sudo tar -zxv -f /test.tar.gz -C /tmp
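解压后可以确认目录已经出现在 /tmp 下(示意):

ls -ld /tmp/test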

image-20250313151121057

17)grep命令:查找字符串

从“~/.bashrc”文件中查找字符串’examples’

grep -n 'examples' ~/.bashrc

image-20250313151133044

18)配置环境变量

(1)请在“~/.bashrc”中设置,配置Java环境变量

echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> ~/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
java -version

(2)查看JAVA_HOME变量的值

echo $JAVA_HOME

image-20250313143109134

熟悉常用的Hadoop操作

(1)使用hadoop用户登录Linux系统,启动Hadoop(Hadoop的安装目录为“/usr/local/hadoop”),为hadoop用户在HDFS中创建用户目录“/user/hadoop”

start-dfs.sh
hdfs dfs -mkdir -p /user/hadoop

image-20250313151220243

(2)接着在HDFS的目录“/user/hadoop”下,创建test文件夹,并查看文件列表

hdfs dfs -mkdir test
hdfs dfs -ls .

image-20250313151235040

(3)将Linux系统本地的“~/.bashrc”文件上传到HDFS的test文件夹中,并查看test

hdfs dfs -put ~/.bashrc test
hdfs dfs -ls test

image-20250313151248871

(4)将HDFS文件夹test复制到Linux系统本地文件系统的“/usr/local/hadoop”目录下

hdfs dfs -get test ./
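假设当前终端就在 /usr/local/hadoop 目录下执行上面的命令,可以在本地确认一下是否复制成功(示意):

ls /usr/local/hadoop/test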

image-20250313151433060

(3.7.3)实验

安装eclipse

为了提高程序编写和调试效率,本教程采用Eclipse工具编写Java程序。
现在要执行的任务是:假设在目录hdfs://localhost:9000/user/hadoop下面有几个文件,分别是file1.txt、file2.txt、file3.txt、file4.abc和file5.abc,这里需要从该目录中过滤出所有后缀名不为.abc的文件,对过滤之后的文件进行读取,并将这些文件的内容合并到文件hdfs://localhost:9000/user/hadoop/merge.txt中。

要确保HDFS的/user/hadoop目录下已经存在file1.txt、file2.txt、file3.txt、file4.abc和file5.abc,每个文件里面有内容。这里,假设文件内容如下:
file1.txt的内容是: this is file1.txt
file2.txt的内容是: this is file2.txt
file3.txt的内容是: this is file3.txt
file4.abc的内容是: this is file4.abc
file5.abc的内容是: this is file5.abc

后面我会给命令,上面的内容就先看看

登入hadoop用户不多说了,启动hadoop集群

start-all.sh

下载eclipse安装包到ubuntu的下载目录,然后在空白处右键打开终端

image-20250318155038350

sudo ls
sudo tar -zxvf eclipse-4.7.0-linux.gtk.x86_64.tar.gz -C /usr/local 
sudo chown -R hadoop /usr/local/eclipse
echo "export ECLIPSE_HOME=/usr/local/eclipse" >> ~/.bashrc
echo "export PATH=\$ECLIPSE_HOME/:\$PATH" >> ~/.bashrc
source ~/.bashrc

启动eclipse

eclipse

在Eclipse中创建项目

启动Eclipse。当Eclipse启动以后,会弹出如下图所示界面,提示设置工作空间(workspace)。

image-20250318160051796

image-20250318160340954

选择File-->New-->Java Project菜单,开始创建一个Java工程,会弹出如下图所示界面。在Project name后面输入工程名称HDFSExample,选中Use default location,让这个Java工程的所有文件都保存到/home/hadoop/workspace/HDFSExample目录下。在JRE这个选项卡中,可以选择当前的Linux系统中已经安装好的JDK,比如jdk1.8.0_162。然后,点击界面底部的Next>按钮,进入下一步的设置。

image-20250318160434807

为项目添加需要用到的JAR包

为了能够运行程序,需要把以下四个目录下的JAR包添加到工程中

(1)/usr/local/hadoop/share/hadoop/common目录下的所有JAR包,包括

hadoop-common-3.1.3.jar、hadoop-kms-3.1.3.jar
hadoop-common-3.1.3-tests.jar、hadoop-nfs-3.1.3.jar

注意,不包括目录jdiff、lib、sources和webapps;


(2)/usr/local/hadoop/share/hadoop/common/lib目录下的所有JAR包;
(3)/usr/local/hadoop/share/hadoop/hdfs目录下的所有JAR包,注意,不包括目录jdiff、lib、sources和webapps;
(4)/usr/local/hadoop/share/hadoop/hdfs/lib目录下的所有JAR包。


以下我只演示第一种和第二种!!!!!!!!!

以下我只演示第一种和第二种!!!!!!!!!

以下我只演示第一种和第二种!!!!!!!!!

以下我只演示第一种和第二种!!!!!!!!!


第一种

/usr/local/hadoop/share/hadoop/common目录下的所有JAR包

点击Add External JARs…按钮,点击其他位置,自己看这个路径定位到这/usr/local/hadoop/share/hadoop/common,选择下面的四个包,然后点击ok

image-20250318161304973

第二种

/usr/local/hadoop/share/hadoop/common/lib目录下的所有JAR包;

image-20250318170347078

以下两个目录,我就不演示了,如果有文件夹被全选中,你就按住ctrl然后点击文件夹,就可以取消选中了,我们只添加所有后缀名为.jar的包

(3)/usr/local/hadoop/share/hadoop/hdfs目录下的所有JAR包,注意,不包括目录jdiff、lib、sources和webapps;
(4)/usr/local/hadoop/share/hadoop/hdfs/lib目录下的所有JAR包。

最后是这样的

image-20250318170435251

image-20250318161735778

编写Java应用程序

image-20250318162034984

在该界面中,只需要在Name后面输入新建的Java类文件的名称,这里采用名称MergeFile,其他都可以采用默认设置,然后点击界面右下角Finish按钮。

image-20250318162134278

image-20250318162458949

把下面的代码直接写到MergeFile.java,全选复制粘贴,这就不多说了,然后记得Ctrl+S保存

import java.io.IOException;
import java.io.PrintStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
/**
* 过滤掉文件名满足特定条件的文件
*/
class MyPathFilter implements PathFilter {
String reg = null;
MyPathFilter(String reg) {
this.reg = reg;
}
public boolean accept(Path path) {
if (!(path.toString().matches(reg)))
return true;
return false;
}
}
/***
* 利用FSDataOutputStream和FSDataInputStream合并HDFS中的文件
*/
public class MergeFile {
Path inputPath = null; //待合并的文件所在的目录的路径
Path outputPath = null; //输出文件的路径
public MergeFile(String input, String output) {
this.inputPath = new Path(input);
this.outputPath = new Path(output);
}
public void doMerge() throws IOException {
Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:9000");
conf.set("fs.hdfs.impl","org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fsSource = FileSystem.get(URI.create(inputPath.toString()), conf);
FileSystem fsDst = FileSystem.get(URI.create(outputPath.toString()), conf);
//下面过滤掉输入目录中后缀为.abc的文件
FileStatus[] sourceStatus = fsSource.listStatus(inputPath,
new MyPathFilter(".*\\.abc"));
FSDataOutputStream fsdos = fsDst.create(outputPath);
PrintStream ps = new PrintStream(System.out);
//下面分别读取过滤之后的每个文件的内容,并输出到同一个文件中
for (FileStatus sta : sourceStatus) {
//下面打印后缀不为.abc的文件的路径、文件大小
System.out.print("路径:" + sta.getPath() + " 文件大小:" + sta.getLen()
+ " 权限:" + sta.getPermission() + " 内容:");
FSDataInputStream fsdis = fsSource.open(sta.getPath());
byte[] data = new byte[1024];
int read = -1;
while ((read = fsdis.read(data)) > 0) {
ps.write(data, 0, read);
fsdos.write(data, 0, read);
}
fsdis.close();
}
ps.close();
fsdos.close();
}
public static void main(String[] args) throws IOException {
MergeFile merge = new MergeFile(
"hdfs://localhost:9000/user/hadoop/",
"hdfs://localhost:9000/user/hadoop/merge.txt");
merge.doMerge();
}
}

编译运行程序

在这里强调一下,如果你没启动hadoop自行启动,我早已在7.1告知启动了

编写测试文件

echo "this is file1.txt" > file1.txt
echo "this is file2.txt" > file2.txt
echo "this is file3.txt" > file3.txt
echo "this is file4.abc" > file4.abc
echo "this is file5.abc" > file5.abc
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put file1.txt /user/hadoop/
hdfs dfs -put file2.txt /user/hadoop/
hdfs dfs -put file3.txt /user/hadoop/
hdfs dfs -put file4.abc /user/hadoop/
hdfs dfs -put file5.abc /user/hadoop/
hdfs dfs -ls /user/hadoop

image-20250318163431895

image-20250318163620255

image-20250318163921782

image-20250318171657423

最后验证是否成功

hdfs dfs -cat /user/hadoop/merge.txt
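如果一切正常,merge.txt 的内容大致就是前面三个 .txt 文件的内容(顺序可能不同):

this is file1.txt
this is file2.txt
this is file3.txt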

image-20250318171722425

应用程序的部署

因为前面只是在eclipse运行java项目才会生成merge.txt,我们的目的是通过hadoop去执行这个java项目,所以我们要对工程打包

创建myapp目录

目的:用来存放hadoop应用程序目录

mkdir /usr/local/hadoop/myapp

开始打包程序

image-20250318172258928

Launch configuration下拉选择MergeFile-HDFSExample

Export destination填写 /usr/local/hadoop/myapp/HDFSExample.jar

image-20250318172440560

image-20250318172512532

image-20250318172626710

查看是否生成

ls /usr/local/hadoop/myapp

image-20250318172709040

重新验证项目的运行

由于我们在eclipse测试过了项目,之前就在hdfs目录生成了/user/hadoop/merge.txt,为了验证刚刚打包的项目,我们要删掉这个/user/hadoop/merge.txt,等等重新运行项目

hdfs dfs -rm /user/hadoop/merge.txt
hadoop jar /usr/local/hadoop/myapp/HDFSExample.jar
hdfs dfs -cat /user/hadoop/merge.txt

image-20250318173057716

image-20250318173128692

如果你没事了,要关机了就回到这里5.6 关机步骤,去执行关机

顺便把eclipse的窗口关掉

严肃告知,别说我没提醒你,不要直接关机,也不要挂起虚拟机,否则你的虚拟机和你的hadoop坏了,你就重装,如果你坏了你也可以恢复快照到伪分布安装成功,但是你只是要重新做这周的实验

image-20250318173348088

练习文件

写入文件

import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class write {
public static void main(String[] args) {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:9000");
conf.set("fs.hdfs.impl","org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
byte[] buff = "Hello world".getBytes(); // 要写入的内容
String filename = "gcc-test"; //要写入的文件名
FSDataOutputStream os = fs.create(new Path(filename));
os.write(buff,0,buff.length);
System.out.println("Create:"+ filename);
os.close();
fs.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327152823953

hdfs dfs -ls /user/hadoop
hdfs dfs -cat /user/hadoop/gcc-test

image-20250327153115107

判断文件是否存在

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class panduan {
public static void main(String[] args) {
try {
String filename = "gcc-test";

Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:9000");
conf.set("fs.hdfs.impl","org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
if(fs.exists(new Path(filename))){
System.out.println("文件存在");
}else{
System.out.println("文件不存在");
}
fs.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327153328234

读取文件

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;

public class read {
public static void main(String[] args) {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:9000");
conf.set("fs.hdfs.impl","org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
Path file = new Path("gcc-test");
FSDataInputStream getIt = fs.open(file);
BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
String content = d.readLine(); //读取文件一行
System.out.println(content);
d.close(); //关闭文件
fs.close(); //关闭hdfs
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327153426275

第二次实验

编程实现以下指定功能,并利用Hadoop提供的Shell命令完成相同的任务。

① 向HDFS中上传任意文本文件,如果指定的文件在HDFS中已经存在,由用户指定是追加到原有文件末尾还是覆盖原有的文件。

shell

检查文件是否存在,可以使用如下命令:

echo "gcc-text" > /home/hadoop/text.txt
hdfs dfs -put /home/hadoop/text.txt /user/hadoop/text.txt
hdfs dfs -test -e text.txt
echo $?

image-20250327163611055

返回 0 表示文件存在。

返回 1 表示文件不存在。

如果结果显示文件已经存在,则用户可以选择追加到原来文件末尾或者覆盖原来文件,具体命令如下:

echo "gcc-local" > /home/hadoop/local.txt

local.txt 是本地文件的路径。

/text.txt 是 HDFS 中的文件路径。

#追加到原文件末尾
hdfs dfs -appendToFile local.txt text.txt
hdfs dfs -cat text.txt
#覆盖原来文件,第一种命令形式
hdfs dfs -copyFromLocal -f local.txt text.txt
hdfs dfs -cat text.txt
#覆盖原来文件,第二种命令形式
hdfs dfs -cp -f file:///home/hadoop/local.txt text.txt
hdfs dfs -cat text.txt

image-20250327163724735

image-20250327163755158

image-20250327163824318

实际上,也可以不用上述方式,而是采用如下命令来实现:

hdfs dfs -rm text.txt
hdfs dfs -put text.txt
hdfs dfs -cat text.txt
if $(hdfs dfs -test -e text.txt);
then $(hdfs dfs -appendToFile local.txt text.txt);
else $(hdfs dfs -copyFromLocal -f local.txt text.txt);
fi
hdfs dfs -cat text.txt

image-20250327163953466

Java

我这里只说一次:自己创建好HDFSApi.java,后面的每个实验都会覆盖前面一个实验的代码

你就不要手欠,去创建别的,你要是自己会也行

后面就不会再说了

image-20250327173112868

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 判断路径是否存在
*/
public static boolean test(Configuration conf, String path) throws IOException {
FileSystem fs = FileSystem.get(conf);
return fs.exists(new Path(path));
}
/**
* 复制文件到指定路径
* 若路径已存在,则进行覆盖
*/
public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path localPath = new Path(localFilePath);
Path remotePath = new Path(remoteFilePath);
/* fs.copyFromLocalFile 第一个参数表示是否删除源文件,第二个参数表示是否覆盖 */
fs.copyFromLocalFile(false, true, localPath, remotePath);
fs.close();
}
/**
* 追加文件内容
*/
public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
/* 创建一个文件读入流 */
FileInputStream in = new FileInputStream(localFilePath);
/* 创建一个文件输出流,输出的内容将追加到文件末尾 */
FSDataOutputStream out = fs.append(remotePath);
/* 读写文件内容 */
byte[] data = new byte[1024];
int read = -1;
while ( (read = in.read(data)) > 0 ) {
out.write(data, 0, read);
}
out.close();
in.close();
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String localFilePath = "/home/hadoop/text.txt"; // 本地路径
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS路径
String choice = "append"; // 若文件存在则追加到文件末尾
// String choice = "overwrite"; // 若文件存在则覆盖
try {
/* 判断文件是否存在 */
Boolean fileExists = false;
if (HDFSApi.test(conf, remoteFilePath)) {
fileExists = true;
System.out.println(remoteFilePath + " 已存在.");
} else {
System.out.println(remoteFilePath + " 不存在.");
}
/* 进行处理 */
if ( !fileExists) { // 文件不存在,则上传
HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
} else if ( choice.equals("overwrite") ) { // 选择覆盖
HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
} else if ( choice.equals("append") ) { // 选择追加
HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327160802735

验证

hdfs dfs -cat text.txt

image-20250327173741220

② 从HDFS中下载指定文件,如果本地文件与要下载的文件名称相同,则自动对下载的文件重命名。

shell

ls | grep text
if $(hdfs dfs -test -e file:///home/hadoop/text.txt);
then $(hdfs dfs -copyToLocal text.txt ./text2.txt);
else $(hdfs dfs -copyToLocal text.txt ./text.txt);
fi
ls | grep text

image-20250327161545345

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 下载文件到本地
* 判断本地路径是否已存在,若已存在,则自动进行重命名
*/
public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
File f = new File(localFilePath);
/* 如果文件名存在,自动重命名(在文件名后面加上 _0, _1 ...) */
if (f.exists()) {
System.out.println(localFilePath + " 已存在.");
Integer i = 0;
while (true) {
f = new File(localFilePath + "_" + i.toString());
if (!f.exists()) {
localFilePath = localFilePath + "_" + i.toString();
break;
}
i = i + 1; // 已存在则继续尝试下一个编号,避免死循环
}
System.out.println("将重新命名为: " + localFilePath);
}
// 下载文件到本地
Path localPath = new Path(localFilePath);
fs.copyToLocalFile(remotePath, localPath);
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String localFilePath = "/home/hadoop/text.txt"; // 本地路径
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS路径
try {
HDFSApi.copyToLocal(conf, remoteFilePath, localFilePath);
System.out.println("下载完成");
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327173448075

验证:

ls | grep text

image-20250327162206125

③ 将HDFS中指定文件的内容输出到终端。

shell

hdfs dfs -cat text.txt

image-20250327173824043

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 读取文件内容
*/
public static void cat(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
FSDataInputStream in = fs.open(remotePath);
BufferedReader d = new BufferedReader(new InputStreamReader(in));
String line = null;
while ( (line = d.readLine()) != null ) {
System.out.println(line);
}
d.close();
in.close();
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS路径
try {
System.out.println("读取文件: " + remoteFilePath);
HDFSApi.cat(conf, remoteFilePath);
System.out.println("\n读取完成");
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327173919281

④ 显示HDFS中指定的文件读写权限、大小、创建时间、路径等信息。

shell

hdfs dfs -ls -h text.txt

image-20250327174107993

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;
public class HDFSApi {
/**
* 显示指定文件的信息
*/
public static void ls(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
FileStatus[] fileStatuses = fs.listStatus(remotePath);
for (FileStatus s : fileStatuses) {
System.out.println("路径: " + s.getPath().toString());
System.out.println("权限: " + s.getPermission().toString());
System.out.println("大小: " + s.getLen());
/* 返回的是时间戳,转化为时间日期格式 */
Long timeStamp = s.getModificationTime();
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String date = format.format(timeStamp);
System.out.println("时间: " + date);
}
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS路径
try {
System.out.println("读取文件信息: " + remoteFilePath);
HDFSApi.ls(conf, remoteFilePath);
System.out.println("\n读取完成");
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327174122828

⑤ 给定HDFS中某一个目录,输出该目录下的所有文件的读写权限、大小、创建时间、路径等信息,如果该文件是目录,则递归输出该目录下所有文件相关信息。

shell

hdfs dfs -ls -R -h /user/hadoop

image-20250327174252672

别管我这里有什么文件,你能显示出来就行

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;
public class HDFSApi {
/**
* 显示指定文件夹下所有文件的信息(递归)
*/
public static void lsDir(Configuration conf, String remoteDir) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path dirPath = new Path(remoteDir);
/* 递归获取目录下的所有文件 */
RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(dirPath, true);
/* 输出每个文件的信息 */
while (remoteIterator.hasNext()) {
FileStatus s = remoteIterator.next();
System.out.println("路径: " + s.getPath().toString());
System.out.println("权限: " + s.getPermission().toString());
System.out.println("大小: " + s.getLen());
/* 返回的是时间戳,转化为时间日期格式 */
Long timeStamp = s.getModificationTime();
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String date = format.format(timeStamp);
System.out.println("时间: " + date);
System.out.println();
}
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteDir = "/user/hadoop"; // HDFS路径
try {
System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);
HDFSApi.lsDir(conf, remoteDir);
System.out.println("读取完成");
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327174420260

⑥ 提供一个HDFS中的文件的路径,对该文件进行创建和删除操作。如果文件所在目录不存在,则自动创建目录。

shell

if $(hdfs dfs -test -d dir1/dir2);
then $(hdfs dfs -touchz dir1/dir2/filename);
else $(hdfs dfs -mkdir -p dir1/dir2 && hdfs dfs -touchz dir1/dir2/filename);
fi
hdfs dfs -rm dir1/dir2/filename

image-20250327175606597

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 判断路径是否存在
*/
public static boolean test(Configuration conf, String path) throws IOException {
FileSystem fs = FileSystem.get(conf);
return fs.exists(new Path(path));
}
/**
* 创建目录
*/
public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path dirPath = new Path(remoteDir);
boolean result = fs.mkdirs(dirPath);
fs.close();
return result;
}
/**
* 创建文件
*/
public static void touchz(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
FSDataOutputStream outputStream = fs.create(remotePath);
outputStream.close();
fs.close();
}
/**
* 删除文件
*/
public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
boolean result = fs.delete(remotePath, false);
fs.close();
return result;
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/input/text.txt"; // HDFS路径
String remoteDir = "/user/hadoop/input"; // HDFS路径对应的目录
try {
/* 判断路径是否存在,存在则删除,否则进行创建 */
if ( HDFSApi.test(conf, remoteFilePath) ) {
HDFSApi.rm(conf, remoteFilePath); // 删除
System.out.println("删除路径: " + remoteFilePath);
} else {
if ( !HDFSApi.test(conf, remoteDir) ) { // 若目录不存在,则进行创建
HDFSApi.mkdir(conf, remoteDir);
System.out.println("创建文件夹: " + remoteDir);
}
HDFSApi.touchz(conf, remoteFilePath);
System.out.println("创建路径: " + remoteFilePath);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327175819965

⑦ 提供一个HDFS的目录的路径,对该目录进行创建和删除操作。创建目录时,如果目录文件所在目录不存在则自动创建相应目录;删除目录时,由用户指定当该目录不为空时是否还删除该目录。

shell

hdfs dfs -mkdir -p dir1/dir2
hdfs dfs -rmdir dir1/dir2
#上述命令执行以后,如果目录非空,则会提示not empty,删除操作不会执行。如果要强制删除目录,可以使用如下命令:
hdfs dfs -rm -R dir1/dir2

image-20250327175958011

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 判断路径是否存在
*/
public static boolean test(Configuration conf, String path) throws IOException {
FileSystem fs = FileSystem.get(conf);
return fs.exists(new Path(path));
}
/**
* 判断目录是否为空
* true: 空,false: 非空
*/
public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path dirPath = new Path(remoteDir);
RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(dirPath, true);
return !remoteIterator.hasNext();
}
/**
* 创建目录
*/
public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path dirPath = new Path(remoteDir);
boolean result = fs.mkdirs(dirPath);
fs.close();
return result;
}
/**
* 删除目录
*/
public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path dirPath = new Path(remoteDir);
/* 第二个参数表示是否递归删除所有文件 */
boolean result = fs.delete(dirPath, true);
fs.close();
return result;
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteDir = "/user/hadoop/input"; // HDFS目录
Boolean forceDelete = false; // 是否强制删除
try {
/* 判断目录是否存在,不存在则创建,存在则删除 */
if ( !HDFSApi.test(conf, remoteDir) ) {
HDFSApi.mkdir(conf, remoteDir); // 创建目录
System.out.println("创建目录: " + remoteDir);
} else {
if ( HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete ) { // 目录为空或强制删除
HDFSApi.rmDir(conf, remoteDir);
System.out.println("删除目录: " + remoteDir);
} else { // 目录不为空
System.out.println("目录不为空,不删除: " + remoteDir);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327180216959

⑧ 向HDFS中指定的文件追加内容,由用户指定将内容追加到原有文件的开头或结尾。

shell

rm -rf text.txt
hdfs dfs -appendToFile local.txt text.txt
hdfs dfs -get text.txt
cat text.txt >> local.txt
hdfs dfs -copyFromLocal -f text.txt text.txt
hdfs dfs -cat text.txt

image-20250327180622881

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 判断路径是否存在
*/
public static boolean test(Configuration conf, String path) throws IOException {
FileSystem fs = FileSystem.get(conf);
return fs.exists(new Path(path));
}
/**
* 追加文本内容
*/
public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
/* 创建一个文件输出流,输出的内容将追加到文件末尾 */
FSDataOutputStream out = fs.append(remotePath);
out.write(content.getBytes());
out.close();
fs.close();
}
/**
* 追加文件内容
*/
public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
/* 创建一个文件读入流 */
FileInputStream in = new FileInputStream(localFilePath);
/* 创建一个文件输出流,输出的内容将追加到文件末尾 */
FSDataOutputStream out = fs.append(remotePath);
/* 读写文件内容 */
byte[] data = new byte[1024];
int read = -1;
while ( (read = in.read(data)) > 0 ) {
out.write(data, 0, read);
}
out.close();
in.close();
fs.close();
}
/**
* 移动文件到本地
* 移动后,删除源文件
*/
public static void moveToLocalFile(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
Path localPath = new Path(localFilePath);
fs.moveToLocalFile(remotePath, localPath);
}
/**
* 创建文件
*/
public static void touchz(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
FSDataOutputStream outputStream = fs.create(remotePath);
outputStream.close();
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS文件
String content = "新追加的内容\n";
String choice = "after"; //追加到文件末尾
// String choice = "before"; // 追加到文件开头
try {
/* 判断文件是否存在 */
if ( !HDFSApi.test(conf, remoteFilePath) ) {
System.out.println("文件不存在: " + remoteFilePath);
} else {
if ( choice.equals("after") ) { // 追加在文件末尾
HDFSApi.appendContentToFile(conf, content, remoteFilePath);
System.out.println("已追加内容到文件末尾" + remoteFilePath);
} else if ( choice.equals("before") ) { // 追加到文件开头
/* 没有相应的api可以直接操作,因此先把文件移动到本地*/
/*创建一个新的HDFS,再按顺序追加内容 */
String localTmpPath = "/user/hadoop/tmp.txt";
// 移动到本地
HDFSApi.moveToLocalFile(conf, remoteFilePath, localTmpPath);
// 创建一个新文件
HDFSApi.touchz(conf, remoteFilePath);
// 先写入新内容
HDFSApi.appendContentToFile(conf, content, remoteFilePath);
// 再写入原来内容
HDFSApi.appendToFile(conf, localTmpPath, remoteFilePath);
System.out.println("已追加内容到文件开头: " + remoteFilePath);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327180721307

hdfs dfs -cat text.txt

image-20250327180742432

⑨ 删除HDFS中指定的文件。

shell

rm text.txt
hdfs dfs -get text.txt
hdfs dfs -rm text.txt

image-20250327180918344

hdfs dfs -put text.txt

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 删除文件
*/
public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
boolean result = fs.delete(remotePath, false);
fs.close();
return result;
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS文件
try {
if ( HDFSApi.rm(conf, remoteFilePath) ) {
System.out.println("文件删除: " + remoteFilePath);
} else {
System.out.println("操作失败(文件不存在或删除失败)");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327181030978

⑩ 在HDFS中将文件从源路径移动到目的路径。

shell

hdfs dfs -put text.txt
hdfs dfs -mv text.txt text2.txt
hdfs dfs -ls

image-20250327181119936

hdfs dfs -put text.txt

Java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
public class HDFSApi {
/**
* 移动文件
*/
public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path srcPath = new Path(remoteFilePath);
Path dstPath = new Path(remoteToFilePath);
boolean result = fs.rename(srcPath, dstPath);
fs.close();
return result;
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "hdfs:///user/hadoop/text.txt"; // 源文件HDFS路径
String remoteToFilePath = "hdfs:///user/hadoop/new.txt"; // 目的HDFS路径
try {
if ( HDFSApi.mv(conf, remoteFilePath, remoteToFilePath) ) {
System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);
} else {
System.out.println("操作失败(源文件不存在或移动失败)");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327181230298

hdfs dfs -ls | grep new

image-20250327181334769

(2)编程实现一个类“MyFSDataInputStream”,该类继承“org.apache.hadoop.fs.FSDataInputStream”,要求如下: 实现按行读取HDFS中指定文件的方法“readLine()”,如果读到文件末尾,则返回空,否则返回文件一行的文本。

shell

hdfs dfs -put text.txt

Java

自己创建好MyFSDataInputStream.java

import  org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.*;
public class MyFSDataInputStream extends FSDataInputStream {
public MyFSDataInputStream(InputStream in) {
super(in);
}
/**
* 实现按行读取
* 每次读入一个字符,遇到"\n"结束,返回一行内容
*/
public static String readline(BufferedReader br) throws IOException {
char[] data = new char[1024];
int read = -1;
int off = 0;
// 循环执行时,br 每次会从上一次读取结束的位置继续读取
//因此该函数里,off 每次都从0开始
while ( (read = br.read(data, off, 1)) != -1 ) {
if (String.valueOf(data[off]).equals("\n") ) {
off += 1;
break;
}
off += 1;
}
if (off > 0) {
return String.valueOf(data, 0, off); // 只返回实际读到的字符,避免带上数组里多余的空字符
} else {
return null;
}
}
/**
* 读取文件内容
*/
public static void cat(Configuration conf, String remoteFilePath) throws IOException {
FileSystem fs = FileSystem.get(conf);
Path remotePath = new Path(remoteFilePath);
FSDataInputStream in = fs.open(remotePath);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line = null;
while ( (line = MyFSDataInputStream.readline(br)) != null ) {
System.out.println(line);
}
br.close();
in.close();
fs.close();
}
/**
* 主函数
*/
public static void main(String[] args) {
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://localhost:9000");
String remoteFilePath = "/user/hadoop/text.txt"; // HDFS路径
try {
MyFSDataInputStream.cat(conf, remoteFilePath);
} catch (Exception e) {
e.printStackTrace();
}
}
}

image-20250327181734199

(3)查看Java帮助手册或其他资料,用“java.net.URL”和“org.apache.hadoop.fs.FsUrlStreamHandlerFactory”编程来输出HDFS中指定文件的文本到终端中。

Java

用回HDFSApi

import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import java.io.*;
import java.net.URL;

public class HDFSApi {
static {
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}

/**
* 主函数
*/
public static void main(String[] args) throws Exception {
// HDFS文件路径,需包含主机名和端口号
String remoteFilePath = "hdfs://localhost:9000/user/hadoop/text.txt"; // 修改为正确的HDFS URI
InputStream in = null;

try {
/* 通过URL对象打开数据流,从中读取数据 */
in = new URL(remoteFilePath).openStream();
IOUtils.copyBytes(in, System.out, 4096, false); // 将数据输出到控制台
} finally {
IOUtils.closeStream(in); // 关闭输入流
}
}
}

image-20250327182240291

Hbase安装(4.6.1)

这里有两个题,就是要交两个截图,我会注明

进hadoop用户

image-20250408162058097

自行启动hadoop

安装hbase

sudo ls

image-20250408162139854

sudo tar -xf hbase-2.5.4-bin.tar.gz -C /usr/local/
sudo mv /usr/local/hbase-2.5.4 /usr/local/hbase
sudo chown -R hadoop:hadoop /usr/local/hbase
echo "export HBASE_HOME=/usr/local/hbase" >> ~/.bashrc
echo "export PATH=\$PATH:\$HBASE_HOME/bin" >> ~/.bashrc
source ~/.bashrc
sudo sed -i "s/CLASSPATH=\${CLASSPATH}:\$JAVA_HOME\/lib\/tools.jar/CLASSPATH=\${CLASSPATH}:\$JAVA_HOME\/lib\/tools.jar:\/usr\/local\/hbase\/lib\/*/g" /usr/local/hbase/bin/hbase

echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> $HBASE_HOME/conf/hbase-env.sh
echo "export HBASE_CLASSPATH=/usr/local/hbase/conf" >> $HBASE_HOME/conf/hbase-env.sh
echo "export HBASE_MANAGES_ZK=true" >> $HBASE_HOME/conf/hbase-env.sh
echo "export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP=true" >> $HBASE_HOME/conf/hbase-env.sh

cat >$HBASE_HOME/conf/hbase-site.xml<<"EOF"
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
EOF
hbase version

这个截图交到4.6.1开头第一个作业

image-20250408162305665
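如果 hbase version 报 JAVA_HOME 之类的错误,可以先确认前面的配置是否写入成功(示意):

grep JAVA_HOME /usr/local/hbase/conf/hbase-env.sh
tail -n 15 /usr/local/hbase/conf/hbase-site.xml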

启动hbase

开机顺序:一定是先启动hadoop(大)再启动hbase(小)

开机顺序:一定是先启动hadoop(大)再启动hbase(小)

开机顺序:一定是先启动hadoop(大)再启动hbase(小)

start-all.sh
start-hbase.sh

然后输入jps,能看到以下三个HBase进程就说明安装成功

这是4.6.1里最下面的第二个作业截图

image-20250408162520533
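如果截图不方便对照,正常情况下 jps 会在 Hadoop 原有进程之外多出大致下面三个进程(进程号仅为示意):

jps
# 5210 HQuorumPeer
# 5321 HMaster
# 5468 HRegionServer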

测试hbase

hbase shell
list

image-20250408162546768

能运行没报错就行

退出hbase数据库用exit

image-20250401165245612

访问hbase网页

http://ip:16010/

image-20250408162639789

关闭hbase

关机顺序:先关hbase(小)再关hadoop(大)

关机顺序:先关hbase(小)再关hadoop(大)

关机顺序:先关hbase(小)再关hadoop(大)

关机顺序:先关hbase(小)再关hadoop(大)

不按操作来,机器坏了,自己重装吧

stop-hbase.sh
stop-all.sh
sudo poweroff

关机后自己打个hbase的快照

(4.6.2)实验

启动eclipse

eclipse

image-20250408164622803

image-20250408164638364

新建项目

名为HBaseExample

image-20250408164919609

image-20250408164950280

现在有几个目录要添加注意了!!!

现在有几个目录要添加注意了!!!

现在有几个目录要添加注意了!!!

现在有几个目录要添加注意了!!!

/usr/local/hbase/lib下所有的jar包

/usr/local/hbase/lib/client-facing-thirdparty下所有的jar包

image-20250408165300706

image-20250408165316864

image-20250408165342189

最后直接点击finish完成创建

新建class

image-20250408165715828

然后在你创建的这个java文件输入,别运行

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
public class ExampleForHBase {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args)throws IOException{
init();
createTable("student",new String[]{"score"});
insertData("student","zhangsan","score","English","69");
insertData("student","zhangsan","score","Math","86");
insertData("student","zhangsan","score","Computer","77");
getData("student", "zhangsan", "score","English");
close();
}

public static void init(){
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir","hdfs://localhost:9000/hbase");
try{
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}catch (IOException e){
e.printStackTrace();
}
}

public static void close(){
try{
if(admin != null){
admin.close();
}
if(null != connection){
connection.close();
}
}catch (IOException e){
e.printStackTrace();
}
}

public static void createTable(String myTableName,String[] colFamily) throws IOException {
TableName tableName = TableName.valueOf(myTableName);
if(admin.tableExists(tableName)){
System.out.println("talbe is exists!");
}else {
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
for(String str:colFamily){
ColumnFamilyDescriptor family =
ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(str)).build();
tableDescriptor.setColumnFamily(family);
}
admin.createTable(tableDescriptor.build());
}
}

public static void insertData(String tableName,String rowKey,String colFamily,String col,String val) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(rowKey.getBytes());
put.addColumn(colFamily.getBytes(),col.getBytes(), val.getBytes());
table.put(put);
table.close();
}

public static void getData(String tableName,String rowKey,String colFamily, String col)throws IOException{
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(rowKey.getBytes());
get.addColumn(colFamily.getBytes(),col.getBytes());
Result result = table.get(get);
System.out.println(new String(result.getValue(colFamily.getBytes(),col==null?null:col.getBytes())));
table.close();
}
}

自行启动hadoop和hbase,不记得了回去翻记录,我有写启动顺序,别搞错了,搞错了就恢复快照吧,下面是关闭和启动的顺序

9.2 启动hbase

9.3 关闭hbase

没启动就不要做下面的内容!!!

没启动就不要做下面的内容!!!

没启动就不要做下面的内容!!!

没启动就不要做下面的内容!!!

运行代码

image-20250408170354469

就会出现这样的结果

4.6.2实验要交的截图1

image-20250408170422533

这时候进入hbase数据库查看有没有student表

hbase shell

这是进入hbase数据库的命令,我前面也有写后面不会再说了,记不住就自己找办法

list
scan 'student'

4.6.2实验要交的截图2

image-20250408170549890
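对照参考:按照前面代码插入的数据,scan 'student' 大致会显示下面三个单元格(时间戳以实际输出为准):

# zhangsan  column=score:Computer, timestamp=..., value=77
# zhangsan  column=score:English,  timestamp=..., value=69
# zhangsan  column=score:Math,     timestamp=..., value=86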

(4.8实验3)

如果这里你输入第一条和第二条命令就报错,自己找找原因,我不想说了


第一题

(1)编程实现以下指定功能,并用Hadoop提供的HBaseShell命令完成相同的任务。

①列出HBase所有表的相关信息,如表名、创建时间等。

shell

hbase shell
list

image-20250408172837610

java

自己创建一个test.java,要在HBaseExample的项目下,后面一直都会用这个java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.List;
import java.io.IOException;
public class test {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args)throws IOException{
init();
List<TableDescriptor> tableDescriptors = admin.listTableDescriptors();
for(TableDescriptor tableDescriptor : tableDescriptors){
TableName tableName = tableDescriptor.getTableName();
System.out.println("Table:" + tableName);
}
close();
}

public static void init() {
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir", "hbase://localhost:9000/hbase");
try {
connection = ConnectionFactory.createConnection(configuration);
if (connection == null) {
System.err.println("Failed to create HBase connection.");
} else {
System.out.println("HBase connection created successfully.");
}
admin = connection.getAdmin();
if (admin == null) {
System.err.println("Failed to get HBase Admin.");
} else {
System.out.println("HBase Admin initialized successfully.");
}
} catch (IOException e) {
e.printStackTrace();
}
}

public static void close(){
try{
if(admin != null){
admin.close();
}
if(null != connection){
connection.close();
}
}catch (IOException e){
e.printStackTrace();
}
}

public static void createTable(String myTableName,String[] colFamily) throws IOException {
TableName tableName = TableName.valueOf(myTableName);
if(admin.tableExists(tableName)){
System.out.println("talbe is exists!");
}else {
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
for(String str:colFamily){
ColumnFamilyDescriptor family =
ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(str)).build();
tableDescriptor.setColumnFamily(family);
}
admin.createTable(tableDescriptor.build());
}
}

public static void insertData(String tableName,String rowKey,String colFamily,String col,String val) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(rowKey.getBytes());
put.addColumn(colFamily.getBytes(),col.getBytes(), val.getBytes());
table.put(put);
table.close();
}

public static void getData(String tableName,String rowKey,String colFamily, String col)throws IOException{
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(rowKey.getBytes());
get.addColumn(colFamily.getBytes(),col.getBytes());
Result result = table.get(get);
System.out.println(new String(result.getValue(colFamily.getBytes(),col==null?null:col.getBytes())));
table.close();
}
}

image-20250408180910551

②在终端输出指定表的所有记录数据。

shell

scan 'student'

image-20250408175605631

java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.List;
import java.io.IOException;
public class test {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args) throws IOException {
// 指定表名 "student" 并获取所有记录
String tableName = "student";
getData(tableName);
}
// 在终端打印出指定表的所有记录数据
public static void getData(String tableName) throws IOException {
init(); // 初始化连接
Table table = connection.getTable(TableName.valueOf(tableName)); // 获取表对象
Scan scan = new Scan(); // 创建扫描器
ResultScanner scanner = table.getScanner(scan); // 获取扫描结果
System.out.println("表 " + tableName + " 的所有记录如下:");
for (Result result : scanner) { // 遍历每一行数据
printRecoder(result); // 打印每条记录的详情
}
close(); // 关闭连接
}
// 打印一条记录的详情
public static void printRecoder(Result result) throws IOException {
for (Cell cell : result.rawCells()) { // 遍历每个单元格
System.out.print("行键: " + new String(Bytes.toString(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength())));
System.out.print(" 列簇: " + new String(Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength())));
System.out.print(" 列: " + new String(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength())));
System.out.print(" 值: " + new String(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())));
System.out.println(" 时间戳: " + cell.getTimestamp());
}
}


public static void init() {
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir", "hbase://localhost:9000/hbase");
try {
connection = ConnectionFactory.createConnection(configuration);
if (connection == null) {
System.err.println("Failed to create HBase connection.");
} else {
System.out.println("HBase connection created successfully.");
}
admin = connection.getAdmin();
if (admin == null) {
System.err.println("Failed to get HBase Admin.");
} else {
System.out.println("HBase Admin initialized successfully.");
}
} catch (IOException e) {
e.printStackTrace();
}
}

public static void close(){
try{
if(admin != null){
admin.close();
}
if(null != connection){
connection.close();
}
}catch (IOException e){
e.printStackTrace();
}
}

public static void createTable(String myTableName,String[] colFamily) throws IOException {
TableName tableName = TableName.valueOf(myTableName);
if(admin.tableExists(tableName)){
System.out.println("talbe is exists!");
}else {
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
for(String str:colFamily){
ColumnFamilyDescriptor family =
ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(str)).build();
tableDescriptor.setColumnFamily(family);
}
admin.createTable(tableDescriptor.build());
}
}

public static void insertData(String tableName,String rowKey,String colFamily,String col,String val) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(rowKey.getBytes());
put.addColumn(colFamily.getBytes(),col.getBytes(), val.getBytes());
table.put(put);
table.close();
}

public static void getData(String tableName,String rowKey,String colFamily, String col)throws IOException{
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(rowKey.getBytes());
get.addColumn(colFamily.getBytes(),col.getBytes());
Result result = table.get(get);
System.out.println(new String(result.getValue(colFamily.getBytes(),col==null?null:col.getBytes())));
table.close();
}
}

运行结果如下

image-20250408180840552

③向已经创建好的表添加和删除指定的列族或列。

shell

create 's1','score'
put 's1','zhangsan','score:Math','69'
delete 's1','zhangsan','score:Math'

image-20250408175944024

JAVA

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class test {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;

public static void main(String[] args) throws IOException {
init(); // 初始化连接

String tableName = "s1"; // 表名
String[] columnFamilies = {"score"}; // 列簇
String rowKey = "zhangsan"; // 行键
String colFamily = "score"; // 列簇
String col = "Math"; // 列名
String val = "69"; // 值

// 创建表
System.out.println("开始创建表...");
createTable(tableName, columnFamilies);

// 插入数据
System.out.println("开始插入数据...");
insertRow(tableName, rowKey, colFamily, col, val);

// 查询数据
System.out.println("验证插入的数据...");
getData(tableName, rowKey, colFamily, col);

// 删除数据
System.out.println("开始删除数据...");
deleteRow(tableName, rowKey, colFamily, col);

// 验证删除
System.out.println("验证删除后的数据...");
getData(tableName, rowKey, colFamily, col);

close(); // 关闭连接
}

public static void init() {
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase"); // 注意这里用的是 hdfs
configuration.set("hbase.zookeeper.quorum", "localhost"); // 指定 zookeeper 地址
try {
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
System.out.println("HBase connection and Admin initialized successfully.");
} catch (IOException e) {
e.printStackTrace();
}
}

public static void close() {
try {
if (admin != null) {
admin.close();
}
if (connection != null) {
connection.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}

public static void createTable(String myTableName, String[] colFamilies) throws IOException {
TableName tableName = TableName.valueOf(myTableName);
if (admin.tableExists(tableName)) {
System.out.println("表已存在!");
} else {
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
for (String cf : colFamilies) {
ColumnFamilyDescriptor family =
ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(cf)).build();
tableDescriptor.setColumnFamily(family);
}
admin.createTable(tableDescriptor.build());
System.out.println("表 " + myTableName + " 创建成功!");
}
}

public static void insertRow(String tableName, String rowKey, String colFamily, String col, String val) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col), Bytes.toBytes(val));
table.put(put);
table.close();
System.out.println("数据插入成功!");
}

public static void getData(String tableName, String rowKey, String colFamily, String col) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col));
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes(colFamily), Bytes.toBytes(col));
if (value != null) {
System.out.println("获取到数据: " + new String(value));
} else {
System.out.println("未找到数据。");
}
table.close();
}

public static void deleteRow(String tableName, String rowKey, String colFamily, String col) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
Delete delete = new Delete(Bytes.toBytes(rowKey));
delete.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col));
table.delete(delete);
table.close();
System.out.println("数据删除成功!");
}
}

image-20250408181938555
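The Java program above adds and deletes a column (a single cell) with Put and Delete; the other half of the exercise, adding and deleting a column family on an existing table, goes through the Admin API instead. Below is a minimal sketch under the same localhost setup; the class name AlterFamilyDemo and the family name "extra" are hypothetical examples, not part of the original exercise.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterFamilyDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName tableName = TableName.valueOf("s1");
            // add a new column family "extra" to the existing table
            admin.addColumnFamily(tableName,
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("extra")).build());
            System.out.println("column family 'extra' added to s1");
            // delete that column family again, together with any data stored under it
            admin.deleteColumnFamily(tableName, Bytes.toBytes("extra"));
            System.out.println("column family 'extra' deleted from s1");
        }
    }
}
```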


④清空指定表的所有记录数据。

shell

create 's1','score'
put 's1','zhangsan','score:Math','69'
truncate 's1'
scan 's1'

image-20250408182046843

Re-insert one row first, so that the Java version of the clear operation below has data to work on:
put 's1','zhangsan','score:Math','69'

java

The textbook's clearRows() code misses one key point: when the table is recreated after being deleted, the original column families have to be added back, otherwise the rebuilt table is just an empty shell with no column families. The version below therefore saves the table descriptor before deleting the table and reuses it when recreating it (a shorter alternative with truncateTable is sketched after the screenshots below).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class test {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args) throws IOException {
String tableName = "s1"; // 你要清空的表名
clearRows(tableName);
}
// 初始化 HBase 连接
public static void init() throws IOException {
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
configuration.set("hbase.zookeeper.quorum", "localhost");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}
// 关闭 HBase 连接
public static void close() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}
// 清空指定表的所有数据,保留列簇结构
public static void clearRows(String tableNameStr) throws IOException {
init(); // 初始化连接
TableName tableName = TableName.valueOf(tableNameStr);
if (!admin.tableExists(tableName)) {
System.out.println("表不存在,无法清空!");
close();
return;
}
// 获取原始表结构
TableDescriptor descriptor = admin.getDescriptor(tableName);
// 禁用表
if (!admin.isTableDisabled(tableName)) {
admin.disableTable(tableName);
}
// 删除表
admin.deleteTable(tableName);
// 重新创建表(使用原结构)
admin.createTable(descriptor);
System.out.println("表 [" + tableNameStr + "] 已清空(保留列簇结构)");
close(); // 关闭连接
}
}

image-20250408182815289

这时候再去查表

image-20250408182848140
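For reference, the disable / delete / recreate sequence can also be done in a single call with Admin.truncateTable, which keeps the column-family structure automatically. This is only a hedged sketch, assuming the same localhost setup and an HBase 2.x client; the class name TruncateDemo is hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TruncateDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName tableName = TableName.valueOf("s1");
            // truncateTable requires the table to be disabled first
            if (!admin.isTableDisabled(tableName)) {
                admin.disableTable(tableName);
            }
            // false = do not preserve region split points; the schema (column families) is kept
            admin.truncateTable(tableName, false);
            System.out.println("table s1 truncated, column families preserved");
        }
    }
}
```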


⑤统计表的行数。

shell

put 's1','zhangsan','score:Math','69'
count 's1'

image-20250408182937568

java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

import java.io.IOException;

public class test {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;

public static void main(String[] args) throws IOException {
String tableName = "s1"; // 你要统计的表名
countRows(tableName);
}

// 初始化 HBase 连接
public static void init() throws IOException {
configuration = HBaseConfiguration.create();
configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
configuration.set("hbase.zookeeper.quorum", "localhost");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}

// 关闭 HBase 连接
public static void close() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}

// 统计表的行数
public static void countRows(String tableName) throws IOException {
init(); // 初始化连接

Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
scan.setCaching(500); // 可选优化,加快扫描速度

ResultScanner scanner = table.getScanner(scan);

int num = 0;
for (Result result = scanner.next(); result != null; result = scanner.next()) {
num++;
}

System.out.println("表 [" + tableName + "] 的总行数为: " + num);

scanner.close();
table.close(); // 关闭 Table 对象
close(); // 关闭连接
}
}

image-20250408183057896
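The loop above pulls every cell of every row back to the client just to count rows. A common optimization, similar to what the built-in RowCounter job does, is to have each row return only its first cell by setting a FirstKeyOnlyFilter on the scan. The sketch below assumes the same table 's1' and localhost setup; the class name is hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class CountRowsFast {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("s1"))) {
            Scan scan = new Scan();
            scan.setCaching(500);                     // fetch up to 500 rows per RPC
            scan.setFilter(new FirstKeyOnlyFilter()); // each row returns only its first cell
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result ignored : scanner) {      // one small result per row instead of the whole row
                    rows++;
                }
            }
            System.out.println("row count = " + rows);
        }
    }
}
```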

第二题

(2)现有以下关系数据库中的表(见表4-21、表4-22和表4-23),要求将其转换为适合HBase存储的表并插入数据。

表4-21 学生(Student)表

| 学号(S_No) | 姓名(S_Name) | 性别(S_Sex) | 年龄(S_Age) |
| --- | --- | --- | --- |
| 2015001 | Zhangsan | male | 23 |
| 2015002 | Mary | female | 22 |
| 2015003 | Lisi | male | 24 |

表4-22 课程(Course)表

| 课程号(C_No) | 课程名(C_Name) | 学分(C_Credit) |
| --- | --- | --- |
| 123001 | Math | 2.0 |
| 123002 | Computer Science | 5.0 |
| 123003 | English | 3.0 |

表4-23 选课(SC)表

| 学号(SC_Sno) | 课程号(SC_Cno) | 成绩(SC_Score) |
| --- | --- | --- |
| 2015001 | 123001 | 86 |
| 2015001 | 123003 | 69 |
| 2015002 | 123002 | 77 |
| 2015002 | 123003 | 99 |
| 2015003 | 123001 | 98 |
| 2015003 | 123002 | 95 |
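The shell and Java solutions below follow the textbook mapping, where every relational attribute becomes its own column family. That matches the exercise, but HBase generally recommends keeping the number of column families per table small; a more conventional layout would use one family (for example "info") and store each attribute as a qualifier under it. The following is only a hedged sketch of that alternative, using the hypothetical table name "Student2" so it does not clash with the exercise tables.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleFamilyDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName name = TableName.valueOf("Student2");
            if (!admin.tableExists(name)) {
                // a single column family "info" holds all attributes as qualifiers
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                        .build());
            }
            try (Table table = connection.getTable(name)) {
                Put put = new Put(Bytes.toBytes("2015001"));   // row key = student number
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("S_Name"), Bytes.toBytes("Zhangsan"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("S_Sex"), Bytes.toBytes("male"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("S_Age"), Bytes.toBytes("23"));
                table.put(put);
            }
            System.out.println("Student2 created with a single 'info' column family");
        }
    }
}
```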

shell

Drop any leftover table from an earlier exercise first, if it exists (note that HBase table names are case-sensitive, so 'student' here is a different table from the 'Student' created below):
disable 'student'
drop 'student'

创建学生 student表

create 'Student','S_No','S_Name','S_Sex','S_Age'
put 'Student','s001','S_No','2015001'
put 'Student','s001','S_Name','Zhangsan'
put 'Student','s001','S_Sex','male'
put 'Student','s001','S_Age','23'
put 'Student','s002','S_No','2015002'
put 'Student','s002','S_Name','Mary'
put 'Student','s002','S_Sex','female'
put 'Student','s002','S_Age','22'
put 'Student','s003','S_No','2015003'
put 'Student','s003','S_Name','Lisi'
put 'Student','s003','S_Sex','male'
put 'Student','s003','S_Age','24'

image-20250408183744904

创建课程 Course 表

create 'Course','C_No','C_Name','C_Credit'
put 'Course','c001','C_No','123001'
put 'Course','c001','C_Name','Math'
put 'Course','c001','C_Credit','2.0'
put 'Course','c002','C_No','123002'
put 'Course','c002','C_Name','Computer'
put 'Course','c002','C_Credit','5.0'
put 'Course','c003','C_No','123003'
put 'Course','c003','C_Name','English'
put 'Course','c003','C_Credit','3.0'

image-20250408183801194

创建选课 SC 表

create 'SC','SC_Sno','SC_Cno','SC_Score'
put 'SC','sc001','SC_Sno','2015001'
put 'SC','sc001','SC_Cno','123001'
put 'SC','sc001','SC_Score','86'
put 'SC','sc002','SC_Sno','2015001'
put 'SC','sc002','SC_Cno','123003'
put 'SC','sc002','SC_Score','69'
put 'SC','sc003','SC_Sno','2015002'
put 'SC','sc003','SC_Cno','123002'
put 'SC','sc003','SC_Score','77'
put 'SC','sc004','SC_Sno','2015002'
put 'SC','sc004','SC_Cno','123003'
put 'SC','sc004','SC_Score','99'
put 'SC','sc005','SC_Sno','2015003'
put 'SC','sc005','SC_Cno','123001'
put 'SC','sc005','SC_Score','98'
put 'SC','sc006','SC_Sno','2015003'
put 'SC','sc006','SC_Cno','123002'
put 'SC','sc006','SC_Score','95'

image-20250408183820490

验证

scan 'Student'
scan 'Course'
scan 'SC'

image-20250408184041557

After verifying with scan, drop the three tables so that the Java program in the next part can recreate them from a clean state:
disable 'Student'
drop 'Student'
disable 'Course'
drop 'Course'
disable 'SC'
drop 'SC'

java

① createTable(String tableName, String[] fields)。

② addRecord(String tableName, String row, String[] fields, String[]values)。

③ scanColumn(String tableName, String column)。

④ modifyData(String tableName, String row, String column)。(the implementation below adds a fourth parameter, val, carrying the new cell value)

⑤ deleteRow(String tableName, String row)。

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class test {
static Connection connection;
static Admin admin;

public static void init() throws IOException {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
connection = ConnectionFactory.createConnection(conf);
admin = connection.getAdmin();
}

public static void close() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}

public static void createTable(String tableName, String[] fields) throws IOException {
init();
TableName tablename = TableName.valueOf(tableName);
if (admin.tableExists(tablename)) {
System.out.println("表 " + tableName + " 已存在,正在删除...");
admin.disableTable(tablename);
admin.deleteTable(tablename);
}
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tablename);
for (String str : fields) {
tableDescriptor.setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(str)).build());
}
admin.createTable(tableDescriptor.build());
System.out.println("表 " + tableName + " 创建成功");
close();
}

public static void addRecord(String tableName, String row, String[] fields, String[] values) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Put put = new Put(Bytes.toBytes(row));
for (int i = 0; i < fields.length; i++) {
String[] parts = fields[i].split(":");
put.addColumn(Bytes.toBytes(parts[0]), Bytes.toBytes(parts[1]), Bytes.toBytes(values[i]));
}
table.put(put);
table.close();
close();
}

public static void scanColumn(String tableName, String column) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
if (column.contains(":")) {
String[] parts = column.split(":");
scan.addColumn(Bytes.toBytes(parts[0]), Bytes.toBytes(parts[1]));
} else {
scan.addFamily(Bytes.toBytes(column));
}
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()) {
showCell(result);
}
scanner.close();
table.close();
close();
}

public static void showCell(Result result) {
for (Cell cell : result.rawCells()) {
System.out.println("RowName: " + Bytes.toString(CellUtil.cloneRow(cell)));
System.out.println("Timestamp: " + cell.getTimestamp());
System.out.println("ColumnFamily: " + Bytes.toString(CellUtil.cloneFamily(cell)));
System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
System.out.println("----------------------------------------");
}
}

public static void modifyData(String tableName, String row, String column, String val) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
String[] parts = column.split(":");
Put put = new Put(Bytes.toBytes(row));
put.addColumn(Bytes.toBytes(parts[0]), Bytes.toBytes(parts[1]), Bytes.toBytes(val));
table.put(put);
System.out.println("修改表 " + tableName + " 中 " + row + " 行的列 " + column + " 为 " + val);
table.close();
close();
}

public static void deleteRow(String tableName, String row) throws IOException {
init();
Table table = connection.getTable(TableName.valueOf(tableName));
Delete delete = new Delete(Bytes.toBytes(row));
table.delete(delete);
System.out.println("删除表 " + tableName + " 中的行 " + row);
table.close();
close();
}

public static void main(String[] args) throws IOException {
// Student 表
createTable("Student", new String[]{"S_No", "S_Name", "S_Sex", "S_Age"});
addRecord("Student", "s001", new String[]{"S_No:S_No", "S_Name:S_Name", "S_Sex:S_Sex", "S_Age:S_Age"},
new String[]{"2015001", "Zhangsan", "male", "23"});
addRecord("Student", "s002", new String[]{"S_No:S_No", "S_Name:S_Name", "S_Sex:S_Sex", "S_Age:S_Age"},
new String[]{"2015002", "Mary", "female", "22"});
addRecord("Student", "s003", new String[]{"S_No:S_No", "S_Name:S_Name", "S_Sex:S_Sex", "S_Age:S_Age"},
new String[]{"2015003", "Lisi", "male", "24"});
System.out.println("Student 表插入数据成功");

// Course 表
createTable("Course", new String[]{"C_No", "C_Name", "C_Credit"});
addRecord("Course", "c001", new String[]{"C_No:C_No", "C_Name:C_Name", "C_Credit:C_Credit"},
new String[]{"123001", "Math", "2.0"});
addRecord("Course", "c002", new String[]{"C_No:C_No", "C_Name:C_Name", "C_Credit:C_Credit"},
new String[]{"123002", "Computer", "5.0"});
addRecord("Course", "c003", new String[]{"C_No:C_No", "C_Name:C_Name", "C_Credit:C_Credit"},
new String[]{"123003", "English", "3.0"});
System.out.println("Course 表插入数据成功");

// SC 表
createTable("SC", new String[]{"SC_Sno", "SC_Cno", "SC_Score"});
addRecord("SC", "sc001", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015001", "123001", "86"});
addRecord("SC", "sc002", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015001", "123003", "69"});
addRecord("SC", "sc003", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015002", "123002", "77"});
addRecord("SC", "sc004", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015002", "123003", "99"});
addRecord("SC", "sc005", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015003", "123001", "98"});
addRecord("SC", "sc006", new String[]{"SC_Sno:SC_Sno", "SC_Cno:SC_Cno", "SC_Score:SC_Score"},
new String[]{"2015003", "123002", "95"});
System.out.println("SC 表插入数据成功");

// 示例输出
System.out.println("===== 浏览 Student 表的全部 S_Name 列族 =====");
scanColumn("Student", "S_Name");

System.out.println("===== 修改 Student 表中 s002 的 S_Age 为 25 =====");
modifyData("Student", "s002", "S_Age:S_Age", "25");

System.out.println("===== 删除 Student 表中的 s003 =====");
deleteRow("Student", "s003");
}
}

image-20250408190024787
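As a quick follow-up check after running the program above, the sketch below uses Table.exists to confirm that row s003 was really removed from the Student table, without fetching any cell data. Same localhost assumptions as before; the class name is hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckDeleted {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("Student"))) {
            // exists() only checks for presence and does not transfer the row contents
            boolean stillThere = table.exists(new Get(Bytes.toBytes("s003")));
            System.out.println("row s003 exists after deleteRow? " + stillThere); // expected: false
        }
    }
}
```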