Text is missing when converting a PDF file to an image in Java using PDFBox
Problem description:
I want to convert a PDF page to an image file. Text is missing when I convert a PDF page to an image using Java.
The file I want to convert is 46_2.pdf; after conversion, the result looks like 46_2.png.
Code:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ConvertPDFPageToImageWithoutText {
    public static void main(String[] args) {
        try {
            String oldPath = "C:/PDFCopy/46_2.pdf";
            File oldFile = new File(oldPath);
            if (oldFile.exists()) {
                PDDocument document = PDDocument.load(oldPath);
                try {
                    // getAllPages() returns a raw List in PDFBox 1.x
                    @SuppressWarnings("unchecked")
                    List<PDPage> list = document.getDocumentCatalog().getAllPages();
                    int pageNumber = 1;
                    for (PDPage page : list) {
                        // This is the call that produces the image with missing text
                        BufferedImage image = page.convertToImage();
                        // One file per page, so later pages do not overwrite earlier ones
                        File outputfile = new File("C:/PDFCopy/image-" + pageNumber + ".png");
                        ImageIO.write(image, "png", outputfile);
                        pageNumber++;
                    }
                } finally {
                    // Close once, after all pages are rendered (not inside the loop)
                    document.close();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Answer:
Since you're using PDFBox, try using PDFImageWriter.writeToImage instead of PDPage.convertToImage. This post seems relevant to what you are trying to do.
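A minimal sketch of that suggestion, assuming PDFBox 1.8.x on the classpath. In the 1.8.x releases I am aware of, the method on org.apache.pdfbox.util.PDFImageWriter is named writeImage rather than writeToImage, so check the name and signature against your version. The class name PdfToPngSketch, the helper convert, the output prefix and the 300 dpi value are all hypothetical choices for illustration, and nothing here guarantees the missing text will appear; it only swaps the rendering call as the answer proposes.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFImageWriter;

class PdfToPngSketch {

    // Renders every page of the PDF at pdfPath to PNG files whose names
    // start with outputPrefix. Returns whatever writeImage reports.
    // Assumption: PDFBox 1.8.x API (PDFImageWriter was removed in 2.x).
    static boolean convert(String pdfPath, String outputPrefix, int dpi) throws IOException {
        PDDocument document = PDDocument.load(pdfPath);
        try {
            int pageCount = document.getNumberOfPages();
            PDFImageWriter writer = new PDFImageWriter();
            // Arguments: document, image format, password (empty here),
            // first page, last page (both 1-based), output prefix,
            // BufferedImage type, resolution in dpi.
            return writer.writeImage(document, "png", "", 1, pageCount,
                    outputPrefix, BufferedImage.TYPE_INT_RGB, dpi);
        } finally {
            // Always release the document, even if rendering fails
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question; the "46_2-" prefix is an assumption
        boolean ok = convert("C:/PDFCopy/46_2.pdf", "C:/PDFCopy/46_2-", 300);
        System.out.println(ok ? "Pages written" : "At least one page failed");
    }
}
```

As far as I recall, the 1.8.x writer appends the page number and the format extension to the prefix, so each page ends up in its own file instead of overwriting a single image.png.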