Taking web page snapshots and converting web pages to PDF from the Linux command line, with Flash support (xvfb + CutyCapt / wkhtmltopdf)
Install CutyCapt and a headless WebKit on Ubuntu to take screenshots of websites. Run ./install.sh and it should install all the necessary libraries for you. As a last step, it takes a screenshot of Google and writes it to example.png so you can check that everything worked.
install.sh:
#!/bin/bash
echo "now installing cutycapt"
sudo apt-get update -y
sudo apt-get install build-essential -y
sudo apt-get install xvfb -y
sudo apt-get install xfs xfonts-scalable xfonts-100dpi -y
sudo apt-get install libgl1-mesa-dri -y
sudo apt-get install subversion libqt4-webkit libqt4-dev g++ -y
mkdir ~/scripts
cd ~/scripts
svn co https://cutycapt.svn.sourceforge.net/svnroot/cutycapt
cd cutycapt/CutyCapt
qmake
make
xvfb-run --server-args="-screen 0, 1024x768x24" ./CutyCapt --url=http://www.google.com --out=example.png
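To check that the final step really produced an image, and not an empty or broken file, one option is to look at the first eight bytes of the output, which are a fixed signature in every PNG file. The is_png helper below is a hypothetical name used for illustration and is not part of CutyCapt; treat it as a minimal sketch.

```shell
#!/bin/sh
# Hypothetical helper (not part of CutyCapt): succeeds only if the file
# is non-empty and starts with the 8-byte PNG signature.
is_png() {
    # -s: the file exists and has a size greater than zero
    [ -s "$1" ] || return 1
    # Dump the first 8 bytes as hex and strip spaces and newlines
    magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "89504e470d0a1a0a" ]
}

# Usage, after install.sh has finished:
#   is_png ~/scripts/cutycapt/CutyCapt/example.png && echo "screenshot looks fine"
```

A plain existence check would also pass for a zero-byte or truncated file, so comparing the signature gives a slightly stronger signal that the capture worked.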
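Alternatively:
xvfb (emulates an X server from the command line and renders graphics into a memory buffer): provides image rendering in environments where no X server is installed
CutyCapt (emulates a browser: downloads the page, renders HTML and CSS, executes JavaScript, and takes a snapshot of the fully rendered page): the workhorse
Qt (the framework CutyCapt is built on)
In practice:
1. Install CutyCapt, Qt, and the related packages: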
sudo apt-get install subversion libqt4-webkit libqt4-dev g++
svn co https://cutycapt.svn.sourceforge.net/svnroot/cutycapt
cd cutycapt/CutyCapt
qmake
make
2. Install xvfb:
sudo apt-get install xvfb
3. Test a capture:
xvfb-run --server-args="-screen 0, 1024x768x24" ./CutyCapt --url=http://www.zol.com.cn --out=zol.png
Or
cutycapt --url="http://google.com" --out=./google.jpg
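When capturing more than one page, typing an output name for each URL gets tedious. The sketch below wraps the xvfb-run command shown above in a function and derives the file name from the URL. The names snapshot_name and capture, and the CUTYCAPT variable, are assumptions made for this example; only the xvfb-run and CutyCapt options come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: turn a URL into a safe PNG file name.
# Drops the scheme and trailing slashes, then replaces every character
# other than letters, digits, dots and dashes with an underscore.
snapshot_name() {
    printf '%s\n' "$1" | sed \
        -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|/*$||' \
        -e 's|[^A-Za-z0-9.-]|_|g' \
        -e 's|$|.png|'
}

# Hypothetical wrapper around the capture command from step 3.
# CUTYCAPT is an assumed variable; it defaults to the locally built binary.
capture() {
    xvfb-run --server-args="-screen 0, 1024x768x24" \
        "${CUTYCAPT:-./CutyCapt}" --url="$1" --out="$(snapshot_name "$1")"
}

# Usage:
#   capture http://www.zol.com.cn        # writes www.zol.com.cn.png
#   CUTYCAPT=cutycapt capture http://google.com
```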
Reference:
http://cutycapt.sourceforge.net/
You can also use wkhtmltopdf
Usage:
# To convert a remote HTML file to PDF:
wkhtmltopdf http://www.google.com google.pdf
# To convert a local HTML file to PDF:
wkhtmltopdf my.html my.pdf
# You can also convert to PS files if you like:
wkhtmltopdf my.html my.ps
# The eler2.pdf sample file
wkhtmltopdf http://geekz.co.uk/lovesraymond/archive/eler-highlights-2008 eler2.pdf -H --outline
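The local-file form above extends naturally to a whole directory. The following is a minimal sketch that uses only the plain "wkhtmltopdf input output" invocation shown above; the function names pdf_name and convert_all are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: my.html -> my.pdf (a name without .html just gets .pdf appended)
pdf_name() {
    printf '%s\n' "${1%.html}.pdf"
}

# Hypothetical helper: convert every .html file in a directory to PDF.
convert_all() {
    for f in "$1"/*.html; do
        # When nothing matches, the pattern stays unexpanded; skip it
        [ -e "$f" ] || continue
        wkhtmltopdf "$f" "$(pdf_name "$f")"
    done
}

# Usage:
#   convert_all ./pages
```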
Video: http://www.youtube.com/watch?v=Oy3XjawQjlQ
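3. Install Chinese fonts. Many systems ship without any Chinese fonts, so install them; otherwise Chinese text on the page is rendered as square boxes.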
sudo apt-get install ttf-arphic-ukai ttf-arphic-uming
sudo apt-get install ttf-wqy-zenhei
sudo fc-cache -v
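After rebuilding the font cache, it can help to confirm that fontconfig actually sees fonts covering Chinese before taking another snapshot. The sketch below counts the families that fc-list reports for the pattern :lang=zh. The function name chinese_font_count and the FC_LIST override variable are assumptions added for illustration (the override only exists so the function can be exercised without fontconfig installed).

```shell
#!/bin/sh
# Hypothetical helper: print how many font families fontconfig lists
# for Chinese. FC_LIST is an assumed override; by default fc-list is used.
chinese_font_count() {
    # fc-list prints one line per matching font; grep -c . counts non-empty lines
    "${FC_LIST:-fc-list}" :lang=zh family | grep -c .
}

# Usage:
#   if [ "$(chinese_font_count)" -gt 0 ]; then
#       echo "Chinese fonts are available"
#   else
#       echo "no Chinese fonts found; pages may show square boxes"
#   fi
```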
4. Install the Flash plugin. Many websites today contain Flash content; to avoid a blank box where it should appear, install the plugin as well.
sudo apt-get install flashplugin-nonfree