How do I read a table in a PDF using iText for Java?
I don't know much about PDF processing in Java. I want to read a table in a PDF file using the iText Java library. How should I proceed?
You can extract text from a content stream, but for an ordinary PDF the result is plain text without any structure. If there is a table on the page, it won't be recognized as such: you get the content and some white space, but not a tabular structure. Only a tagged PDF can be converted to an XML file. If the PDF contains tags that are recognized as table tags, this will be reflected in the XML.
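To illustrate what "content and some white space" leaves you with, here is a minimal sketch that tries to guess rows and cells from already-extracted plain text. It assumes the text was obtained beforehand (in iText 5, for example, via `PdfTextExtractor.getTextFromPage(reader, pageNumber)`) and that columns happen to be separated by runs of two or more spaces. That assumption may not hold for a given PDF or extraction strategy. The class `TableGuess`, its methods, and the sample text are hypothetical and not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

public class TableGuess {

    // Split one extracted line into cells wherever two or more
    // whitespace characters occur in a row (a heuristic, not a parser).
    static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.trim().split("\\s{2,}")) {
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }
        return cells;
    }

    // Treat every non-blank line of the page text as one table row.
    static List<List<String>> guessTable(String pageText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : pageText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(splitRow(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for text extracted from a page.
        String pageText =
                "Item         Qty   Price\n"
              + "Widget       2     9.50\n"
              + "Spare part   10    1.25\n";
        for (List<String> row : guessTable(pageText)) {
            System.out.println(row);
        }
    }
}
```

A heuristic like this breaks as soon as a cell is empty, a cell wraps onto a second line, or the extractor collapses the gaps to single spaces, which is why a tagged PDF is the only reliable source of real table structure.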
This is what I found here.