Get images from a document using Apache POI


Problem description:

I am using Apache POI to read images from a .docx file.

Here is my code:


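import java.awt.Image;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public Image ReadImg(int imageid) throws IOException {
    // try-with-resources closes the stream and the document when done
    try (FileInputStream in = new FileInputStream("import.docx");
         XWPFDocument doc = new XWPFDocument(in)) {
        List<XWPFPictureData> pic = doc.getAllPictures();
        XWPFPictureData pict = pic.get(imageid);
        byte[] data = pict.getData();
        // decode the raw bytes with javax.imageio (JDK 1.4+)
        return ImageIO.read(new ByteArrayInputStream(data));
    }
}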

It reads the images correctly, but not in the order in which they appear in the document.

For example, if the document contains


image1.jpeg image2.jpeg image3.jpeg image4.jpeg image5.jpeg

the images are read as


image4 image3 image1 image5 image2
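A likely cause: `XWPFDocument.getAllPictures()` lists the picture parts stored in the package, and that order is not guaranteed to match where the pictures sit in the document. A minimal sketch of a common workaround, assuming a recent Apache POI (4.x or 5.x) on the classpath, walks the body elements in order and asks each run for its embedded pictures. The class and method names (`DocxPicturesInOrder`, `picturesInDocumentOrder`) are hypothetical, and pictures in headers, footers, text boxes or content controls are not visited.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical helper class: collects picture data in body order.
class DocxPicturesInOrder {

    /** Picture data in the order the pictures appear in the main document body. */
    static List<XWPFPictureData> picturesInDocumentOrder(XWPFDocument doc) {
        List<XWPFPictureData> result = new ArrayList<>();
        collect(doc.getBodyElements(), result);
        return result;
    }

    // Visits paragraphs and tables (recursing into cells) in body order.
    private static void collect(List<IBodyElement> elements, List<XWPFPictureData> out) {
        for (IBodyElement element : elements) {
            if (element instanceof XWPFParagraph) {
                for (XWPFRun run : ((XWPFParagraph) element).getRuns()) {
                    for (XWPFPicture picture : run.getEmbeddedPictures()) {
                        XWPFPictureData data = picture.getPictureData();
                        if (data != null) {   // null if the picture is linked rather than embedded
                            out.add(data);
                        }
                    }
                }
            } else if (element instanceof XWPFTable) {
                for (XWPFTableRow row : ((XWPFTable) element).getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        collect(cell.getBodyElements(), out);
                    }
                }
            }
        }
    }
}
```

With such a helper, `ReadImg` could index `DocxPicturesInOrder.picturesInDocumentOrder(doc).get(imageid)` instead of `doc.getAllPictures().get(imageid)`. A picture placed twice in the document should show up twice in this list, whereas `getAllPictures()` lists each stored part once.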

Can you help me solve this?

I want to read the images in document order.

Thanks, 西提克

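import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public static void extractImages(XWPFDocument docx) throws IOException {
    Path outDir = Paths.get("D:/imagefromword");
    Files.createDirectories(outDir);

    List<XWPFPictureData> piclist = docx.getAllPictures();
    // Traverse the list and write each image to a file. The raw bytes are
    // copied unchanged, so PNG, GIF, EMF, etc. keep their original format
    // instead of being re-encoded as JPEG (ImageIO.read returns null for
    // formats it cannot decode). Errors propagate instead of System.exit.
    for (XWPFPictureData pic : piclist) {
        Files.write(outDir.resolve(pic.getFileName()), pic.getData());
    }
}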
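The extraction loop above still follows `getAllPictures()`, so the files come out in package order. To check the order Word itself uses without going through POI, one option (a different technique from the POI API) is to read the package directly: a .docx is a ZIP file whose `word/document.xml` refers to each DrawingML image with `<a:blip r:embed="rIdN"/>`, and `word/_rels/document.xml.rels` maps each `rIdN` to a part such as `media/image1.jpeg`. This sketch uses only the JDK and assumes the main part is named `word/document.xml` (as in files saved by Word); legacy VML images (`v:imagedata`), headers and footers are not handled. `DocxBlipOrder` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class: lists image parts in the order document.xml references them.
class DocxBlipOrder {
    // Namespace URIs defined by Office Open XML
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String OFFICE_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String PACKAGE_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships";

    /** Image targets (e.g. "media/image1.jpeg") in the order their a:blip elements occur. */
    static List<String> orderedTargets(byte[] documentXml, byte[] relsXml) throws Exception {
        // rId -> Target, taken from word/_rels/document.xml.rels
        Map<String, String> targetById = new HashMap<>();
        NodeList rels = parse(relsXml).getElementsByTagNameNS(PACKAGE_REL_NS, "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            targetById.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
        }
        // getElementsByTagNameNS returns matches in document (pre-order) order
        List<String> targets = new ArrayList<>();
        NodeList blips = parse(documentXml).getElementsByTagNameNS(DRAWINGML_NS, "blip");
        for (int i = 0; i < blips.getLength(); i++) {
            String id = ((Element) blips.item(i)).getAttributeNS(OFFICE_REL_NS, "embed");
            String target = targetById.get(id);
            if (target != null) {   // skips linked (r:link) images with no r:embed
                targets.add(target);
            }
        }
        return targets;
    }

    /** Same as above, reading both parts straight out of a .docx (ZIP) stream. */
    static List<String> orderedTargetsFromDocx(InputStream docx) throws Exception {
        byte[] documentXml = null;
        byte[] relsXml = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    documentXml = zip.readAllBytes();   // reads only the current entry
                } else if (entry.getName().equals("word/_rels/document.xml.rels")) {
                    relsXml = zip.readAllBytes();
                }
            }
        }
        if (documentXml == null || relsXml == null) {
            throw new IOException("word/document.xml or its .rels part not found");
        }
        return orderedTargets(documentXml, relsXml);
    }

    private static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // required for the *NS lookups above
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }
}
```

The last path segment of each returned target (for example `image1.jpeg`) can then be matched against `XWPFPictureData.getFileName()` if you still want POI to supply the image bytes.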