Big Data Hadoop Labs
Image to download: ubuntu-18.04.6-desktop-amd64.iso
Installing Ubuntu
Accept the defaults all the way through.
Set the VM's network to DHCP mode, otherwise the installer cannot reach the network.
Boot the VM, choose 中文 (Chinese), and follow the prompts to install.
The installation then runs on its own; after the reboot near the end you may run into the screen shown here.
The workaround:
After applying it, just boot the machine again.
Switching to the Aliyun mirror source
Wait for the package cache to finish updating.
Once you reach the desktop, right-click an empty spot on the desktop, open a terminal, and enter the commands below.
This one-click script installs vm-tools so that copy and paste works between host and guest.
sudo wget https://resource.qianyios.top/init.sh
sudo chmod +x init.sh
bash init.sh
Next, reboot and wait for the tools to take effect, then shut the VM down. Take a snapshot at this point so that you can roll back if a later step goes wrong, then power the VM back on.
Creating the hadoop user
Create the hadoop user and set its password:
sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
Give the hadoop user sudo privileges:
sudo adduser hadoop sudo
Now log out from the top-right corner of the desktop and switch to the hadoop account.
Setting up passwordless SSH
Copy the entire block below, paste it into the terminal, and press Enter; passwordless login is configured automatically.
The script contains password="123456"; remember to change it to your hadoop user's password.
sudo cat > ssh.sh <<"EOF"
sudo apt-get install openssh-server -y
sudo systemctl disable ufw --now

echo "正在更新 SSH 配置..."
sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart ssh

echo "正在安装 sshpass..."
sudo apt update
sudo apt install -y sshpass || { echo "安装 sshpass 失败"; exit 1; }
echo "sshpass 安装完成。"

echo "正在检查 .ssh 目录..."
if [ ! -d ~/.ssh ]; then
    sudo mkdir -p ~/.ssh
fi
sudo chmod 700 ~/.ssh
sudo chown -R hadoop:hadoop ~/.ssh

hosts=("localhost")
password="123456"

echo "正在生成 SSH 密钥对..."
if [ ! -f ~/.ssh/id_rsa ]; then
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa || { echo "生成 SSH 密钥对失败"; exit 1; }
fi
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
echo "SSH 密钥对已生成。"

for host in "${hosts[@]}"
do
    echo "正在为 $host 配置免密登录..."
    sshpass -p "$password" ssh -o StrictHostKeyChecking=no "$host" "mkdir -p ~/.ssh && chmod 700 ~/.ssh"
    sshpass -p "$password" ssh-copy-id -i ~/.ssh/id_rsa.pub -o StrictHostKeyChecking=no "$host" || { echo "复制公钥到 $host 失败"; exit 1; }
    sshpass -p "$password" ssh -o StrictHostKeyChecking=no "$host" "echo '免密登录成功'" || { echo "验证免密登录失败"; exit 1; }
done

echo "所有配置已完成。"
EOF
Run the script.
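Assuming the script was saved as ssh.sh in the current directory:
bash ssh.sh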
Test that you can now log in to localhost without a password.
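For example:
ssh localhost
exit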
Success.
Installing Java and Hadoop
Copy the two archive files into the Downloads directory.
Then right-click an empty spot in that folder and open a terminal.
Copy and run all of the following:
sudo mkdir /usr/lib/jvm
sudo tar -xf jdk-8u162-linux-x64.tar.gz -C /usr/lib/jvm
echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> ~/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
java -version

sudo tar -zxf hadoop-3.1.3.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.1.3/ /usr/local/hadoop
echo "export HADOOP_HOME=/usr/local/hadoop" >> ~/.bashrc
echo "export PATH=\$HADOOP_HOME/bin/:\$HADOOP_HOME/sbin/:\$PATH" >> ~/.bashrc
source ~/.bashrc
sudo chown -R hadoop /usr/local/hadoop
hadoop version
This is one of the screenshots required for the assignment.
Now shut the VM down and take a snapshot, named 基础 (base).
Pseudo-distributed installation
Editing the core-site.xml file
Copy and run all of the following:
cat > /usr/local/hadoop/etc/hadoop/core-site.xml <<"EOF"
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
Editing hdfs-site.xml
Copy and run all of the following:
cat > /usr/local/hadoop/etc/hadoop/hdfs-site.xml <<"EOF"
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
EOF
Starting the HDFS service
HDFS initialization
Run this command only once; never, ever run it again afterwards!!!
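The command meant here is the standard HDFS NameNode format command:
hdfs namenode -format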
If you see this, the initialization succeeded.
Adding the HDFS and YARN environment variables
Copy and run all of the following:
echo "export HDFS_NAMENODE_USER=hadoop" >> ~/.bashrc
echo "export HDFS_DATANODE_USER=hadoop" >> ~/.bashrc
echo "export HDFS_SECONDARYNAMENODE_USER=hadoop" >> ~/.bashrc
echo "export YARN_RESOURCEMANAGER_USER=hadoop" >> ~/.bashrc
echo "export YARN_NODEMANAGER_USER=hadoop" >> ~/.bashrc
source ~/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
start-all.sh    # start all the Hadoop daemons
stop-all.sh     # stop all the Hadoop daemons
This is one of the screenshots required for the assignment.
The jps command checks whether the processes have started; the list above is what a normally started Hadoop looks like, six processes in total.
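For reference, the check is simply (the exact PIDs will differ):
jps
# a healthy pseudo-distributed start typically shows: NameNode, DataNode,
# SecondaryNameNode, ResourceManager, NodeManager, and Jps itself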
Accessing the Hadoop web UI
Check your IP address.
If no IP shows up here, DHCP is not enabled. Go back to the beginning, find the step that enables DHCP, shut down, enable it, and boot again; you will then have an IP.
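One way to check the IP from a terminal (ifconfig also works if the net-tools package is installed):
ip addr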
This is one of the screenshots required for the assignment.
http://ip:9870
http://192.168.48.132:9870/
http://ip:8088
Shutdown procedure
First stop the Hadoop cluster.
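That is:
stop-all.sh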
Then shut down and take a snapshot here, named 伪分布安装成功 (pseudo-distributed install succeeded); if the machine ever breaks, you can restore this snapshot.
A serious warning, so don't say I didn't remind you: never power the VM off directly and never suspend it. If the VM or Hadoop gets corrupted as a result, you will be reinstalling.
Lab 1
Getting familiar with common Linux commands
1) cd: change directory
(1) Change to the directory /usr/local
(2) Change to the parent of the current directory
(3) Change to the home folder of the user currently logged in to Linux
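For example:
cd /usr/local
cd ..
cd ~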
2) ls: list files and directories
List all files and directories under /usr
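For example:
ls -al /usr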
3) mkdir: create directories
(1) Go into /tmp, create a directory named a, and check which directories already exist under /tmp
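For example:
cd /tmp
mkdir a
ls -l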
(2) Go into /tmp and create the directory tree a1/a2/a3/a4
cd /tmp
mkdir -p a1/a2/a3/a4
4) rmdir: delete empty directories
(1) Delete the directory a created above (under /tmp)
(2) Delete the directory a1/a2/a3/a4 created above (under /tmp), then check which directories remain under /tmp
cd /tmp
rmdir a
cd /tmp
rmdir -p a1/a2/a3/a4
ls -al
5) cp: copy files or directories
(1) Copy the .bashrc file from the current user's home folder to /usr and rename it bashrc1
sudo cp ~/.bashrc /usr/bashrc1
(2) Create a new directory test under /tmp, then copy that directory into /usr
cd /tmp
mkdir test
sudo cp -r /tmp/test /usr
6) mv: move files and directories, or rename them
(1) Move the file bashrc1 from /usr into /usr/test
sudo mv /usr/bashrc1 /usr/test
(2) Rename the test directory under /usr to test2
sudo mv /usr/test /usr/test2
7) rm: remove files or directories
(1) Delete the bashrc1 file under /usr/test2
sudo rm /usr/test2/bashrc1
(2) Delete the test2 directory under /usr
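For example:
sudo rm -r /usr/test2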
8) cat: view file contents
View the contents of the .bashrc file in the current user's home folder
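For example:
cat ~/.bashrc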
9) tac: view file contents in reverse
View the contents of the .bashrc file in the current user's home folder in reverse line order
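For example:
tac ~/.bashrc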
10) more: page through a file
Page through the contents of the .bashrc file in the current user's home folder
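For example:
more ~/.bashrc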
11) head: show the first lines of a file
(1) Show the first 20 lines of the .bashrc file in the current user's home folder
(2) Show the contents of .bashrc, hiding the last 50 lines and displaying only what comes before them
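For example (the second form relies on GNU head accepting a negative count, meaning "all but the last 50 lines"):
head -n 20 ~/.bashrc
head -n -50 ~/.bashrc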
12) tail: show the last lines of a file
(1) Show the last 20 lines of the .bashrc file in the current user's home folder
(2) Show the contents of .bashrc, listing only the data from line 51 onward
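For example (tail -n +51 starts output at line 51):
tail -n 20 ~/.bashrc
tail -n +51 ~/.bashrc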
13) touch: change file timestamps or create new files
(1) Create an empty file hello under /tmp and view the file's timestamp
cd /tmp
touch hello
ls -l hello
(2) Touch the hello file so that its timestamp is set to 5 days ago
touch -d "5 days ago" hello
14) chown: change file ownership
Change the owner of the hello file to the root account and view its attributes
sudo chown root /tmp/hello
ls -l /tmp/hello
15) find: search for files
Find the file named .bashrc under the home folder
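For example:
find ~ -name .bashrc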
16) tar: archiving and compression
(1) Create a folder test under the root directory /, then pack it into test.tar.gz under /
cd /
sudo mkdir /test
sudo tar -zcv -f /test.tar.gz test
(2) Extract the test.tar.gz archive above into /tmp
sudo tar -zxv -f /test.tar.gz -C /tmp
17) grep: search for strings
Search ~/.bashrc for the string 'examples'
grep -n 'examples' ~/.bashrc
18) Configuring environment variables
(1) Configure the Java environment variables in ~/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162" >> ~/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
java -version
(2) View the value of the JAVA_HOME variable
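For example:
echo $JAVA_HOME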
Getting familiar with common Hadoop operations
(1) Log in to Linux as the hadoop user, start Hadoop (installed under /usr/local/hadoop), and create a user directory /user/hadoop in HDFS for the hadoop user
start-dfs.sh
hdfs dfs -mkdir -p /user/hadoop
(2) Then, under the HDFS directory /user/hadoop, create a test folder and list its contents
hdfs dfs -mkdir test
hdfs dfs -ls .
(3) Upload the local ~/.bashrc file into the HDFS test folder and list test
hdfs dfs -put ~/.bashrc test
hdfs dfs -ls test
(4) Copy the HDFS folder test to the directory /usr/local/hadoop on the local Linux file system
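One way to do this, assuming the test folder created above and that the hadoop user owns /usr/local/hadoop:
hdfs dfs -get test /usr/local/hadoop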
Experiment (3.7.3)
Installing Eclipse
To make writing and debugging programs more efficient, this tutorial uses Eclipse to write the Java programs.
The task to perform: suppose the directory hdfs://localhost:9000/user/hadoop contains several files, namely file1.txt, file2.txt, file3.txt, file4.abc and file5.abc. From that directory we keep only the files whose extension is not .abc, read those files, and merge their contents into the file hdfs://localhost:9000/user/hadoop/merge.txt.
Make sure that file1.txt, file2.txt, file3.txt, file4.abc and file5.abc already exist under /user/hadoop in HDFS and that each contains something. Here, assume the contents are as follows:
file1.txt contains: this is file1.txt
file2.txt contains: this is file2.txt
file3.txt contains: this is file3.txt
file4.abc contains: this is file4.abc
file5.abc contains: this is file5.abc
The commands for this come later; for now just read the description above.
Log in as the hadoop user.
Start the Hadoop cluster (no further explanation needed).
Download the Eclipse archive to Ubuntu's Downloads directory, then right-click an empty spot there and open a terminal.
sudo tar -zxvf eclipse-4.7.0-linux.gtk.x86_64.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/eclipse
echo "export ECLIPSE_HOME=/usr/local/eclipse" >> ~/.bashrc
echo "export PATH=\$ECLIPSE_HOME/:\$PATH" >> ~/.bashrc
source ~/.bashrc
Start Eclipse.
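Since the Eclipse directory was added to PATH above, launching it from a terminal should simply be (assuming the new PATH has taken effect in this shell):
eclipse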
Creating a project in Eclipse
Start Eclipse. Once it starts, a dialog like the one shown below pops up, asking you to set the workspace.
Choose the File --> New --> Java Project menu to start creating a Java project; a dialog like the one shown below appears. Enter the project name HDFSExample after Project name, and check Use default location so that all files of this Java project are saved under /home/hadoop/workspace/HDFSExample. Under the JRE option you can select a JDK that is already installed on this Linux system, for example jdk1.8.0_162. Then click the Next> button at the bottom of the dialog to go to the next step.
Adding the required JAR packages to the project
To be able to run the program, the JAR packages from four directories have to be added to the project:
(1) all JARs under /usr/local/hadoop/share/hadoop/common, including hadoop-common-3.1.3.jar, hadoop-kms-3.1.3.jar, hadoop-common-3.1.3-tests.jar and hadoop-nfs-3.1.3.jar; note that this does not include the jdiff, lib, sources and webapps directories;
(2) all JARs under /usr/local/hadoop/share/hadoop/common/lib;
(3) all JARs under /usr/local/hadoop/share/hadoop/hdfs, again not including the jdiff, lib, sources and webapps directories;
(4) all JARs under /usr/local/hadoop/share/hadoop/hdfs/lib.
Below I will only demonstrate the first and second of these!
The first: all JARs under the /usr/local/hadoop/share/hadoop/common directory.
Click the Add External JARs… button, use Other Locations to navigate to /usr/local/hadoop/share/hadoop/common, select the four JARs listed above, and click OK.
The second: all JARs under the /usr/local/hadoop/share/hadoop/common/lib directory.
I won't demonstrate the remaining two directories. If an entire folder ends up selected, hold Ctrl and click that folder to deselect it; we only add the files whose extension is .jar:
(3) all JARs under /usr/local/hadoop/share/hadoop/hdfs, again excluding the jdiff, lib, sources and webapps directories;
(4) all JARs under /usr/local/hadoop/share/hadoop/hdfs/lib.
The end result looks like this.
Writing the Java application
In this dialog you only need to enter the name of the new Java class after Name; here we use MergeFile. Everything else can be left at its default. Then click the Finish button at the bottom right.
Write the code below into MergeFile.java (select all, copy, paste), and remember to press Ctrl+S to save.
import java.io.IOException;
import java.io.PrintStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

// Path filter that keeps only the paths which do NOT match the given regular expression
class MyPathFilter implements PathFilter {
    String reg = null;

    MyPathFilter(String reg) {
        this.reg = reg;
    }

    public boolean accept(Path path) {
        if (!(path.toString().matches(reg)))
            return true;
        return false;
    }
}

public class MergeFile {
    Path inputPath = null;
    Path outputPath = null;

    public MergeFile(String input, String output) {
        this.inputPath = new Path(input);
        this.outputPath = new Path(output);
    }

    public void doMerge() throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        FileSystem fsSource = FileSystem.get(URI.create(inputPath.toString()), conf);
        FileSystem fsDst = FileSystem.get(URI.create(outputPath.toString()), conf);
        // list the files under inputPath, skipping those ending in .abc
        FileStatus[] sourceStatus = fsSource.listStatus(inputPath, new MyPathFilter(".*\\.abc"));
        FSDataOutputStream fsdos = fsDst.create(outputPath);
        PrintStream ps = new PrintStream(System.out);
        // read each remaining file and append its bytes to the output file
        for (FileStatus sta : sourceStatus) {
            System.out.print("路径:" + sta.getPath() + " 文件大小:" + sta.getLen()
                    + " 权限:" + sta.getPermission() + " 内容:");
            FSDataInputStream fsdis = fsSource.open(sta.getPath());
            byte[] data = new byte[1024];
            int read = -1;
            while ((read = fsdis.read(data)) > 0) {
                ps.write(data, 0, read);
                fsdos.write(data, 0, read);
            }
            fsdis.close();
        }
        ps.close();
        fsdos.close();
    }

    public static void main(String[] args) throws IOException {
        MergeFile merge = new MergeFile(
                "hdfs://localhost:9000/user/hadoop/",
                "hdfs://localhost:9000/user/hadoop/merge.txt");
        merge.doMerge();
    }
}
Compiling and running the program
A reminder here: if you have not started Hadoop, start it yourself; I already told you to start it back in 7.1.
Creating the test files
echo "this is file1.txt" > file1.txt
echo "this is file2.txt" > file2.txt
echo "this is file3.txt" > file3.txt
echo "this is file4.abc" > file4.abc
echo "this is file5.abc" > file5.abc
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put file1.txt /user/hadoop/
hdfs dfs -put file2.txt /user/hadoop/
hdfs dfs -put file3.txt /user/hadoop/
hdfs dfs -put file4.abc /user/hadoop/
hdfs dfs -put file5.abc /user/hadoop/
hdfs dfs -ls /user/hadoop
Finally, verify that it worked:
hdfs dfs -cat /user/hadoop/merge.txt
Deploying the application
So far merge.txt is only generated when the Java project is run inside Eclipse; our goal is to have Hadoop execute this Java project, so the project needs to be packaged.
Create the myapp directory
Purpose: a directory for storing Hadoop applications.
mkdir /usr/local/hadoop/myapp
Packaging the program
Under Launch configuration, select MergeFile-HDFSExample from the drop-down.
For Export destination, enter /usr/local/hadoop/myapp/HDFSExample.jar.
Check that the JAR was generated:
ls /usr/local/hadoop/myapp
Re-running the packaged project
Because the project was already tested in Eclipse, /user/hadoop/merge.txt was already created in HDFS. To verify the freshly packaged project, delete /user/hadoop/merge.txt and then run the project again.
hdfs dfs -rm /user/hadoop/merge.txt
hadoop jar /usr/local/hadoop/myapp/HDFSExample.jar
hdfs dfs -cat /user/hadoop/merge.txt
When you are done and want to shut down, go back to 5.6 Shutdown procedure and follow it; also close the Eclipse window.
A serious warning again, so don't say I didn't remind you: never power the VM off directly and never suspend it. If your VM or Hadoop gets corrupted, you will have to reinstall; you can also restore the snapshot 伪分布安装成功 (pseudo-distributed install succeeded), but then you will have to redo this week's lab.
Practice: file operations
Writing a file
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class write {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            byte[] buff = "Hello world".getBytes(); // content to write
            String filename = "gcc-test";           // name of the file to write
            FSDataOutputStream os = fs.create(new Path(filename));
            os.write(buff, 0, buff.length);
            System.out.println("Create:" + filename);
            os.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
hdfs dfs -ls /user/hadoop
hdfs dfs -cat /user/hadoop/gcc-test
Checking whether a file exists
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class panduan {
    public static void main(String[] args) {
        try {
            String filename = "gcc-test";
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(filename))) {
                System.out.println("文件存在");
            } else {
                System.out.println("文件不存在");
            }
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Reading a file
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;

public class read {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("gcc-test");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String content = d.readLine(); // read one line of the file
            System.out.println(content);
            d.close();  // close the file
            fs.close(); // close HDFS
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Lab 2
Implement each of the following functions in code, and also complete the same task with the Shell commands that Hadoop provides.
① Upload an arbitrary text file to HDFS. If the specified file already exists in HDFS, let the user decide whether to append to the end of the original file or overwrite it.
shell
To check whether the file exists, use the following commands:
echo "gcc-text" > /home/hadoop/text.txt
hdfs dfs -put /home/hadoop/text.txt /user/hadoop/text.txt
hdfs dfs -test -e text.txt
echo $?
A return value of 0 means the file exists; a return value of 1 means it does not.
If the result shows that the file already exists, the user can choose to append to the end of the original file or to overwrite it, using the following commands:
echo "gcc-local" > /home/hadoop/local.txt
local.txt is the path of the local file; /text.txt is the path of the file in HDFS.
hdfs dfs -appendToFile local.txt text.txt
hdfs dfs -cat text.txt

hdfs dfs -copyFromLocal -f local.txt text.txt
hdfs dfs -cat text.txt

hdfs dfs -cp -f file:///home/hadoop/local.txt text.txt
hdfs dfs -cat text.txt
Alternatively, instead of the approach above, the following commands achieve the same thing:
hdfs dfs -rm text.txt
hdfs dfs -put text.txt
hdfs dfs -cat text.txt
if $(hdfs dfs -test -e text.txt);
then $(hdfs dfs -appendToFile local.txt text.txt);
else $(hdfs dfs -copyFromLocal -f local.txt text.txt);
fi
hdfs dfs -cat text.txt
Java
I'll only say this once: create HDFSApi.java yourself; each of the following exercises overwrites the code of the previous one in this same class. Don't go creating something else unless you know what you're doing; I won't repeat this again.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        fs.copyFromLocalFile(false, true, localPath, remotePath);
        fs.close();
    }

    public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FileInputStream in = new FileInputStream(localFilePath);
        FSDataOutputStream out = fs.append(remotePath);
        byte[] data = new byte[1024];
        int read = -1;
        while ((read = in.read(data)) > 0) {
            out.write(data, 0, read);
        }
        out.close();
        in.close();
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String localFilePath = "/home/hadoop/text.txt";
        String remoteFilePath = "/user/hadoop/text.txt";
        String choice = "append";

        try {
            Boolean fileExists = false;
            if (HDFSApi.test(conf, remoteFilePath)) {
                fileExists = true;
                System.out.println(remoteFilePath + " 已存在.");
            } else {
                System.out.println(remoteFilePath + " 不存在.");
            }
            if (!fileExists) {
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
            } else if (choice.equals("overwrite")) {
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
            } else if (choice.equals("append")) {
                HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Verify:
② Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file.
shell
ls | grep text
if $(hdfs dfs -test -e file:///home/hadoop/text.txt);
then $(hdfs dfs -copyToLocal text.txt ./text2.txt);
else $(hdfs dfs -copyToLocal text.txt ./text.txt);
fi
ls | grep text
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        File f = new File(localFilePath);
        // if the local file already exists, rename the download by appending _0, _1, ...
        if (f.exists()) {
            System.out.println(localFilePath + " 已存在.");
            Integer i = 0;
            while (true) {
                f = new File(localFilePath + "_" + i.toString());
                if (!f.exists()) {
                    localFilePath = localFilePath + "_" + i.toString();
                    break;
                }
                i++; // try the next suffix so the loop can terminate
            }
            System.out.println("将重新命名为: " + localFilePath);
        }
        Path localPath = new Path(localFilePath);
        fs.copyToLocalFile(remotePath, localPath);
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String localFilePath = "/home/hadoop/text.txt";
        String remoteFilePath = "/user/hadoop/text.txt";

        try {
            HDFSApi.copyToLocal(conf, remoteFilePath, localFilePath);
            System.out.println("下载完成");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Verify:
③ Print the contents of a specified HDFS file to the terminal.
shell
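For example:
hdfs dfs -cat text.txt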
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static void cat(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataInputStream in = fs.open(remotePath);
        BufferedReader d = new BufferedReader(new InputStreamReader(in));
        String line = null;
        while ((line = d.readLine()) != null) {
            System.out.println(line);
        }
        d.close();
        in.close();
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/text.txt";

        try {
            System.out.println("读取文件: " + remoteFilePath);
            HDFSApi.cat(conf, remoteFilePath);
            System.out.println("\n读取完成");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
④ Display the information of a specified HDFS file: read/write permissions, size, creation time, path, and so on.
shell
hdfs dfs -ls -h text.txt
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;

public class HDFSApi {
    public static void ls(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FileStatus[] fileStatuses = fs.listStatus(remotePath);
        for (FileStatus s : fileStatuses) {
            System.out.println("路径: " + s.getPath().toString());
            System.out.println("权限: " + s.getPermission().toString());
            System.out.println("大小: " + s.getLen());
            Long timeStamp = s.getModificationTime();
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            String date = format.format(timeStamp);
            System.out.println("时间: " + date);
        }
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/text.txt";

        try {
            System.out.println("读取文件信息: " + remoteFilePath);
            HDFSApi.ls(conf, remoteFilePath);
            System.out.println("\n读取完成");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑤ Given a directory in HDFS, output the read/write permissions, size, creation time, path and other information of every file under it; if a file is itself a directory, recursively output the information of all files under that directory.
shell
hdfs dfs -ls -R -h /user/hadoop
Never mind which files I happen to have here; as long as yours are listed, that's fine.
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.text.SimpleDateFormat;

public class HDFSApi {
    public static void lsDir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(dirPath, true);
        while (remoteIterator.hasNext()) {
            FileStatus s = remoteIterator.next();
            System.out.println("路径: " + s.getPath().toString());
            System.out.println("权限: " + s.getPermission().toString());
            System.out.println("大小: " + s.getLen());
            Long timeStamp = s.getModificationTime();
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            String date = format.format(timeStamp);
            System.out.println("时间: " + date);
            System.out.println();
        }
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteDir = "/user/hadoop";

        try {
            System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);
            HDFSApi.lsDir(conf, remoteDir);
            System.out.println("读取完成");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑥ Given the path of a file in HDFS, create and delete that file. If the directory the file lives in does not exist, create the directory automatically.
shell
if $(hdfs dfs -test -d dir1/dir2);
then $(hdfs dfs -touchz dir1/dir2/filename);
else $(hdfs dfs -mkdir -p dir1/dir2 && hdfs dfs -touchz dir1/dir2/filename);
fi
hdfs dfs -rm dir1/dir2/filename
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        boolean result = fs.mkdirs(dirPath);
        fs.close();
        return result;
    }

    public static void touchz(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataOutputStream outputStream = fs.create(remotePath);
        outputStream.close();
        fs.close();
    }

    public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        boolean result = fs.delete(remotePath, false);
        fs.close();
        return result;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/input/text.txt";
        String remoteDir = "/user/hadoop/input";

        try {
            if (HDFSApi.test(conf, remoteFilePath)) {
                HDFSApi.rm(conf, remoteFilePath);
                System.out.println("删除路径: " + remoteFilePath);
            } else {
                if (!HDFSApi.test(conf, remoteDir)) {
                    HDFSApi.mkdir(conf, remoteDir);
                    System.out.println("创建文件夹: " + remoteDir);
                }
                HDFSApi.touchz(conf, remoteFilePath);
                System.out.println("创建路径: " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑦ Given the path of an HDFS directory, create and delete that directory. When creating it, automatically create any missing parent directories; when deleting it, let the user decide whether to delete the directory even if it is not empty.
shell
hdfs dfs -mkdir -p dir1/dir2
hdfs dfs -rmdir dir1/dir2
hdfs dfs -rm -R dir1/dir2
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(dirPath, true);
        return !remoteIterator.hasNext();
    }

    public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        boolean result = fs.mkdirs(dirPath);
        fs.close();
        return result;
    }

    public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path dirPath = new Path(remoteDir);
        boolean result = fs.delete(dirPath, true);
        fs.close();
        return result;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteDir = "/user/hadoop/input";
        Boolean forceDelete = false;

        try {
            if (!HDFSApi.test(conf, remoteDir)) {
                HDFSApi.mkdir(conf, remoteDir);
                System.out.println("创建目录: " + remoteDir);
            } else {
                if (HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete) {
                    HDFSApi.rmDir(conf, remoteDir);
                    System.out.println("删除目录: " + remoteDir);
                } else {
                    System.out.println("目录不为空,不删除: " + remoteDir);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑧ Append content to a specified file in HDFS, letting the user choose whether the content goes at the beginning or at the end of the original file.
shell
rm -rf text.txt
hdfs dfs -appendToFile local.txt text.txt
hdfs dfs -get text.txt
cat text.txt >> local.txt
hdfs dfs -copyFromLocal -f local.txt text.txt
hdfs dfs -cat text.txt
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataOutputStream out = fs.append(remotePath);
        out.write(content.getBytes());
        out.close();
        fs.close();
    }

    public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FileInputStream in = new FileInputStream(localFilePath);
        FSDataOutputStream out = fs.append(remotePath);
        byte[] data = new byte[1024];
        int read = -1;
        while ((read = in.read(data)) > 0) {
            out.write(data, 0, read);
        }
        out.close();
        in.close();
        fs.close();
    }

    public static void moveToLocalFile(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        Path localPath = new Path(localFilePath);
        fs.moveToLocalFile(remotePath, localPath);
    }

    public static void touchz(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataOutputStream outputStream = fs.create(remotePath);
        outputStream.close();
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/text.txt";
        String content = "新追加的内容\n";
        String choice = "after";

        try {
            if (!HDFSApi.test(conf, remoteFilePath)) {
                System.out.println("文件不存在: " + remoteFilePath);
            } else {
                if (choice.equals("after")) {
                    HDFSApi.appendContentToFile(conf, content, remoteFilePath);
                    System.out.println("已追加内容到文件末尾" + remoteFilePath);
                } else if (choice.equals("before")) {
                    String localTmpPath = "/user/hadoop/tmp.txt";
                    HDFSApi.moveToLocalFile(conf, remoteFilePath, localTmpPath);
                    HDFSApi.touchz(conf, remoteFilePath);
                    HDFSApi.appendContentToFile(conf, content, remoteFilePath);
                    HDFSApi.appendToFile(conf, localTmpPath, remoteFilePath);
                    System.out.println("已追加内容到文件开头: " + remoteFilePath);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑨ Delete a specified file from HDFS.
shell
rm text.txt
hdfs dfs -get text.txt
hdfs dfs -rm text.txt
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        boolean result = fs.delete(remotePath, false);
        fs.close();
        return result;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/text.txt";

        try {
            if (HDFSApi.rm(conf, remoteFilePath)) {
                System.out.println("文件删除: " + remoteFilePath);
            } else {
                System.out.println("操作失败(文件不存在或删除失败)");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
⑩ Move a file from a source path to a destination path within HDFS.
shell
hdfs dfs -put text.txt
hdfs dfs -mv text.txt text2.txt
hdfs dfs -ls
Java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path srcPath = new Path(remoteFilePath);
        Path dstPath = new Path(remoteToFilePath);
        boolean result = fs.rename(srcPath, dstPath);
        fs.close();
        return result;
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "hdfs:///user/hadoop/text.txt";
        String remoteToFilePath = "hdfs:///user/hadoop/new.txt";

        try {
            if (HDFSApi.mv(conf, remoteFilePath, remoteToFilePath)) {
                System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);
            } else {
                System.out.println("操作失败(源文件不存在或移动失败)");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
(2) Write a class MyFSDataInputStream that extends org.apache.hadoop.fs.FSDataInputStream, with this requirement: implement a method readLine() that reads a specified HDFS file line by line; it returns null when the end of the file is reached, and otherwise returns one line of text from the file.
Java
Create MyFSDataInputStream.java yourself.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.*;

public class MyFSDataInputStream extends FSDataInputStream {
    public MyFSDataInputStream(InputStream in) {
        super(in);
    }

    public static String readline(BufferedReader br) throws IOException {
        char[] data = new char[1024];
        int read = -1;
        int off = 0;
        while ((read = br.read(data, off, 1)) != -1) {
            if (String.valueOf(data[off]).equals("\n")) {
                off += 1;
                break;
            }
            off += 1;
        }
        if (off > 0) {
            return String.valueOf(data);
        } else {
            return null;
        }
    }

    public static void cat(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        FSDataInputStream in = fs.open(remotePath);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String line = null;
        while ((line = MyFSDataInputStream.readline(br)) != null) {
            System.out.println(line);
        }
        br.close();
        in.close();
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        String remoteFilePath = "/user/hadoop/text.txt";

        try {
            MyFSDataInputStream.cat(conf, remoteFilePath);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
(3) Consult the Java API documentation or other materials, and use java.net.URL together with org.apache.hadoop.fs.FsUrlStreamHandlerFactory to print the text of a specified HDFS file to the terminal.
Java
Reuse HDFSApi.
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import java.io.*;
import java.net.URL;

public class HDFSApi {
    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        String remoteFilePath = "hdfs://localhost:9000/user/hadoop/text.txt";
        InputStream in = null;
        try {
            in = new URL(remoteFilePath).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}