C# code to extract text from a scanned PDF document?
Question:
Can anyone direct me to some C# code examples for extracting text from a scanned PDF document? I've gone through many posts, but couldn't find one that explains clearly how to do this. The libraries used in those posts are not free. Some have restrictions, such as only extracting the first three pages of a PDF document; to extract the whole document, they ask me to download the full version of the library, which is not free.
Please direct me how to do this without spending money.
Answer:
Hi,
Please refer to the following URL:
http://www.codeproject.com/Questions/243295/Is-this-possible-to-Extract-Text-from-Scanned-PDF
You can use Tesseract OCR for .NET: https://code.google.com/p/tesseractdotnet/
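A scanned PDF contains page images rather than text, so the usual free route is to render each page to an image and run OCR on it. Here is a minimal sketch of that idea, not a definitive implementation. It makes several assumptions that are not from the answers above: the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed and on the PATH, the English language data is available, and the project targets .NET Core 2.1 or later (for `ProcessStartInfo.ArgumentList`). The class and method names (`ScannedPdfOcr`, `ExtractText`, `PageNumber`) are made up for illustration. It calls the Tesseract command line instead of the .NET wrapper linked above, so no wrapper API is assumed.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

public static class ScannedPdfOcr
{
    // Runs an external tool and returns what it wrote to standard output.
    static string Run(string fileName, params string[] args)
    {
        var psi = new ProcessStartInfo(fileName)
        {
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        foreach (string arg in args) psi.ArgumentList.Add(arg);

        using (Process process = Process.Start(psi))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    fileName + " exited with code " + process.ExitCode);
            return output;
        }
    }

    // Reads the page number from a file name such as "page-012.png".
    public static int PageNumber(string imagePath)
    {
        string name = Path.GetFileNameWithoutExtension(imagePath);
        string digits = name.Substring(name.LastIndexOf('-') + 1);
        return int.Parse(digits, CultureInfo.InvariantCulture);
    }

    // Renders every page of the PDF to a PNG, then OCRs the pages in order.
    public static string ExtractText(string pdfPath, string language = "eng")
    {
        string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(workDir);
        try
        {
            // Step 1: rasterize at 300 DPI; files are written as page-<n>.png.
            Run("pdftoppm", "-r", "300", "-png", pdfPath, Path.Combine(workDir, "page"));

            // Step 2: OCR each page image; "stdout" makes tesseract print the text.
            var text = new StringBuilder();
            foreach (string image in Directory.GetFiles(workDir, "page-*.png")
                                              .OrderBy(PageNumber))
            {
                text.AppendLine(Run("tesseract", image, "stdout", "-l", language));
            }
            return text.ToString();
        }
        finally
        {
            Directory.Delete(workDir, true); // remove the temporary page images
        }
    }
}
```

You would call it as `string text = ScannedPdfOcr.ExtractText("scan.pdf");`, where `scan.pdf` is a placeholder path. Both tools are open source and have no page limit, but OCR quality depends heavily on the scan resolution and the language data, so the result may need manual checking.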