Getting images from a document using Apache POI


Problem description:

I am using Apache POI to read images from a docx file.

Here is my code:


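import java.awt.Image;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public Image readImg(int imageId) throws IOException {
    // try-with-resources closes the stream and the document when done
    try (FileInputStream in = new FileInputStream("import.docx");
         XWPFDocument doc = new XWPFDocument(in)) {
        List<XWPFPictureData> pictures = doc.getAllPictures();
        XWPFPictureData picture = pictures.get(imageId);
        // Decode the raw picture bytes with javax.imageio
        return ImageIO.read(new ByteArrayInputStream(picture.getData()));
    }
}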

It reads the images correctly, but not in the order they appear in the document.

For example, if the document contains

image1.jpeg image2.jpeg image3.jpeg image4.jpeg image5.jpeg


it returns them as

image4 image3 image1 image5 image2

Can you help me solve this?

I want to read the images in order.

Thanks, 西西克

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public static void extractImages(XWPFDocument docx) {
    try {
        List<XWPFPictureData> pictures = docx.getAllPictures();
        // Walk the list and write each image to a file under its own name
        for (XWPFPictureData pic : pictures) {
            // Write the stored bytes unchanged, so a PNG stays a PNG;
            // the earlier ImageIO.write(..., "jpg", ...) re-encoded every image as JPEG
            Files.write(Paths.get("D:/imagefromword", pic.getFileName()), pic.getData());
        }
    } catch (IOException e) {
        // Report the failure instead of exiting silently
        e.printStackTrace();
    }
}
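
The code above extracts every picture, but it still relies on getAllPictures(), which is not guaranteed to return pictures in the order they appear in the text. One possible approach is to walk the paragraphs and runs and collect the pictures each run embeds. The following is a minimal sketch, not a definitive implementation. The class name OrderedPictures and the method name inDocumentOrder are hypothetical, and the sketch assumes the pictures sit in top-level body paragraphs; pictures inside tables, headers, or footers are not visited.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical helper class, named here for illustration only
public class OrderedPictures {

    /**
     * Collects picture data in the order the pictures are referenced
     * by the runs of the top-level body paragraphs.
     */
    public static List<XWPFPictureData> inDocumentOrder(XWPFDocument doc) {
        List<XWPFPictureData> ordered = new ArrayList<>();
        // getParagraphs() covers body paragraphs only (assumption: no tables or headers)
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                for (XWPFPicture picture : run.getEmbeddedPictures()) {
                    XWPFPictureData data = picture.getPictureData();
                    // Skip pictures whose data part cannot be resolved
                    if (data != null) {
                        ordered.add(data);
                    }
                }
            }
        }
        return ordered;
    }
}
```

With this helper, OrderedPictures.inDocumentOrder(doc).get(imageId) could stand in for doc.getAllPictures().get(imageid) in the question's method. Note that an image placed twice in the document appears twice in the returned list.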