Viewing the contents of LZO files in Hadoop

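  Lately I often need to look at the contents of LZO files. These files are usually large and live on HDFS. I had no good way to do this: on the rare occasions when I needed to look inside one, I would get it to the local disk, decompress it with lzop, and then page through it with more. For occasional use this is probably acceptable, even when the file is fairly large. But now that I need to grep these files all the time, it is not much fun anymore.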
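For reference, the manual workflow described above can be sketched roughly as follows. The function name fetch_and_decompress is hypothetical and not part of the original script; the sketch assumes hadoop and lzop are on the PATH and that the file name ends in .lzo.

```shell
# Hypothetical helper: copy one .lzo file out of HDFS into the current
# directory and decompress it there. Prints the decompressed file's name.
fetch_and_decompress() {
    hdfs_path=$1
    lzo_name=${hdfs_path##*/}      # e.g. filename.lzo
    plain_name=${lzo_name%.lzo}    # e.g. filename

    hadoop fs -get "$hdfs_path" . || return 1   # copy to the current directory
    lzop -d "$lzo_name" || return 1             # writes $plain_name, keeps the .lzo
    printf '%s\n' "$plain_name"
}

# Usage (the path is a placeholder):
#   more "$(fetch_and_decompress /user/hadoop/output/filename.lzo)"
```

Every call leaves two local files behind, which is part of why this gets tedious when done often.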

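  So I wrote a shell script, lzoc (lzo cat), dedicated to printing the contents of LZO files stored on HDFS. Normally it prints nothing except the file content, so it can be combined with tools such as more, head, and tail.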

  The full script is listed at the end of this post.

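  It has three options:

        -c  delete any file of the same name that already exists in the current directory; this is meant for clearing out stale copies.

        -d  delete the intermediate files in the current directory at the end, since the script fetches the file out of HDFS with get.

        -i  print some progress messages. Do not use this option if the output of lzoc is going to be consumed by something else.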
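As a rough illustration of how three such flags can be parsed in POSIX sh, here is a stand-alone sketch using getopts. The function parse_flags and its variable names are hypothetical; this is not the script's own code.

```shell
# Hypothetical stand-alone parser for the same three flags.
parse_flags() {
    clean=N; delete=N; verbose=N
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts cdi opt; do
        case $opt in
            c) clean=Y ;;
            d) delete=Y ;;
            i) verbose=Y ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))         # drop the parsed flags; $1 is now the path
    printf '%s %s %s %s\n' "$clean" "$delete" "$verbose" "$1"
}

parse_flags -c -i /user/hadoop/output/filename.lzo
# prints: Y N Y /user/hadoop/output/filename.lzo
```

Shifting by OPTIND - 1 after the loop leaves the file path as the first remaining argument, however many flags were given.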

Usage example:

        $ ./lzoc /user/hadoop/output/filename.lzo | more
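Since grepping is the main motivation, a thin wrapper along these lines may be handy. The function name lzogrep and the LZOC variable are assumptions made for this sketch; it only relies on lzoc printing the file content to standard output and on the -d option described above.

```shell
# LZOC points at the script; ./lzoc is an assumed default location.
LZOC=${LZOC:-./lzoc}

# Hypothetical wrapper: grep a pattern in one LZO file stored on HDFS.
# -d asks lzoc to remove its local copies after printing the file.
lzogrep() {
    pattern=$1
    hdfs_path=$2
    "$LZOC" -d "$hdfs_path" | grep -- "$pattern"
}

# Usage (pattern and path are placeholders):
#   lzogrep ERROR /user/hadoop/output/filename.lzo
```

The -i option is deliberately left out, because its messages would be mixed into the text that grep reads.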


#! /bin/sh

#description:
#   cat the lzo file on hadoop
filePath=""       #full path of the LZO file on HDFS
lzoFileName=""    #local file name (with .lzo) after hadoop fs -get
fileName=""       #local file name without the .lzo extension

deleteBeforeExecute=N  #set by -c: delete stale local copies before running
deleteAfterExecute=N   #set by -d: delete the local copies after printing
interactiveMsg=N       #set by -i: print progress messages besides the file content

#parse options first; whatever remains is the file path
while getopts cdi OPTION
do
  case $OPTION in
    c)
      deleteBeforeExecute=Y;;
    d)
      deleteAfterExecute=Y;;
    i)
      interactiveMsg=Y;;
    \?)
      #getopts has already reported the offending option
      echo "usage: $0 [-c] [-d] [-i] hdfs-path" >&2
      exit 2;;
  esac
done
shift $((OPTIND - 1))

if [ $# -lt 1 ]; then
  echo "must have at least one parameter, which is the file name." >&2
  exit 1
fi

filePath=$1                    #first argument after the options
lzoFileName=${filePath##*/}    #strip the directory part
fileName=${lzoFileName%.lzo*}  #strip the .lzo extension


#delete old files if needed
if [ "$deleteBeforeExecute" = "Y" ]; then
    if [ -e "$fileName" ]; then
        [ "$interactiveMsg" = "Y" ] && echo "delete old file"
        rm "$fileName"
    fi
    if [ -e "$lzoFileName" ]; then
        [ "$interactiveMsg" = "Y" ] && echo "delete old lzo file"
        rm "$lzoFileName"
    fi
fi

#make sure the hadoop command is available
if ! command -v hadoop > /dev/null 2>&1; then
  echo "Command not found: hadoop is not installed or not on PATH." >&2
  exit 3
fi

#make sure the path exists on HDFS
if ! hadoop fs -test -e "$filePath" > /dev/null 2>&1; then
  echo "No such file or directory: $filePath" >&2
  exit 4
fi

#can not cat a directory
if hadoop fs -test -d "$filePath" > /dev/null 2>&1; then
  echo "Can not cat a directory: $filePath" >&2
  exit 4
fi

#make sure lzop is installed
if ! command -v lzop > /dev/null 2>&1; then
  echo "Tool missing: lzop is not installed." >&2
  exit 5
fi

#fetch the lzo file unless a local copy already exists
if [ -e "$lzoFileName" ]; then
  if [ "$interactiveMsg" = "Y" ]; then
    echo "LZO file already exists."
  fi
else
  if [ "$interactiveMsg" = "Y" ]; then
    echo "LZO file does not exist."
  fi
  #get the file from hadoop
  hadoop fs -get "$filePath" . || exit 6
fi

#decompress unless the decompressed file already exists
if [ -e "$fileName" ]; then
  if [ "$interactiveMsg" = "Y" ]; then
    echo "File already exists."
  fi
else
  if [ "$interactiveMsg" = "Y" ]; then
    echo "File does not exist."
  fi
  #decompress the lzo file; lzop keeps the .lzo file by default
  if ! lzop -d "$lzoFileName" > /dev/null 2>&1; then
    echo "Failed to decompress: $lzoFileName" >&2
    exit 7
  fi
fi

#clear

#cat the file
cat "$fileName"

#delete the local files at the end if requested
if [ "$deleteAfterExecute" = "Y" ]; then
    if [ -e "$fileName" ]; then
      rm "$fileName"
    fi
    if [ -e "$lzoFileName" ]; then
      rm "$lzoFileName"
    fi
    if [ "$interactiveMsg" = "Y" ]; then
      echo "files have been deleted"
    fi
fi