How do I read a table in a PDF using iText for Java?

Question:

I don't know much about PDF processing in Java. I want to read a table in a PDF file using the iText Java library. How should I proceed?

Answer:

You can extract text from a content stream, but for an ordinary PDF the result is plain text without any structure. If there is a table on the page, it won't be recognized as such: you'll get the content and some white space, but not a tabular structure. Only if you have a tagged PDF can you obtain an XML file. If the PDF contains tags that are recognized as table tags, this will be reflected in the XML.
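To illustrate what "content and some white space" means in practice, here is a minimal sketch of guessing rows and cells from already-extracted page text. It uses no iText calls; in iText 5 the text could come from something like `PdfTextExtractor.getTextFromPage(reader, pageNumber)`. The class `TableGuess`, the method `splitRows`, and the sample text are all hypothetical. It also assumes that columns are separated by two or more spaces, which depends on the extraction strategy and the PDF itself and is by no means guaranteed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: a heuristic guess at table
// structure from plain extracted text.
class TableGuess {

    // Splits each non-blank line into cells wherever two or more
    // consecutive whitespace characters appear (an assumption about
    // how the extracted text separates columns).
    static List<List<String>> splitRows(String pageText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : pageText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between rows
            }
            List<String> cells = new ArrayList<>();
            for (String cell : trimmed.split("\\s{2,}")) {
                cells.add(cell);
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text extracted from a page.
        String sample = "Item        Qty   Price\n"
                      + "Blue pen    3     1.20\n"
                      + "Notebook    1     4.50\n";
        for (List<String> row : splitRows(sample)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

A heuristic like this breaks as soon as a cell is empty, wraps onto a second line, or the gaps between columns collapse to single spaces, which is why a tagged PDF is the only reliable source of real table structure.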

This is what I found here.
