How to use 'catdoc' to display UTF-8 encoded docx files

Problem description:

I have a lot of docx files and I want to read them in the terminal. I found catdoc: http://www.wagner.pp.ru/~vitus/software/catdoc/

When I use it, the output is just unreadable characters. My docx files are encoded in UTF-8. I tried "catdoc -u my_file.docx", but it does not work.

Please help. Thanks a lot.

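docx files are zipped XML files, not the older binary .doc format that catdoc is written for, so catdoc prints garbage regardless of the -u option.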

To extract the text and strip the XML, try something based on:

unzip -p "*.docx" word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'

(from commandlinefu)
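One limitation of stripping every tag is that paragraph boundaries disappear too, so the text tends to come out as one long line. Below is a minimal sketch of a variant that keeps them, assuming the usual docx layout where each paragraph ends with a `</w:p>` tag. The function names `docx_xml_to_text` and `docx_to_text` are made up for illustration; tabs, explicit line breaks, tables, headers and footnotes are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read WordprocessingML (word/document.xml) on stdin
# and write plain text on stdout, one line per paragraph.
docx_xml_to_text() {
    awk '{
        gsub(/<\/w:p>/, "\n")    # end of paragraph becomes a newline
        gsub(/<[^>]*>/, "")      # drop every remaining XML tag
        gsub(/&lt;/, "<")        # decode the basic XML entities,
        gsub(/&gt;/, ">")        # after tag stripping so they survive
        gsub(/&amp;/, "\\&")     # "\\&" is a literal ampersand in awk
        print
    }'
}

# Hypothetical wrapper: print the body text of one docx file.
docx_to_text() {
    unzip -p "$1" word/document.xml | docx_xml_to_text
}
```

After sourcing this, something like `docx_to_text my_file.docx | less` should page through a single document in the terminal.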