How to programmatically search and replace text in PDF files

Question:

How can I programmatically search and replace text in a large number of PDF files? I would like to remove a URL that has been added to a set of files. I have been able to remove the link itself using JavaScript under Batch Processing in Adobe Pro, but the link text remains. I have seen recommendations to use the TouchUp Text tool, which works when done by hand, but I don't want to modify 1300 files manually.

Finding text in a PDF can be inherently hard because of the graphical nature of the document format -- the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-replace capabilities and heuristics. Give changepagestring.pl a try and see if it works on your PDFs.
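
To make the "not contiguous" point concrete, here is a minimal Python sketch. The content-stream fragment, the URL, and the helper name tj_visible_text are made up for illustration, and the joining step is only a toy heuristic, not a description of what CAM::PDF does internally. It shows a URL drawn with a TJ operator, where kerning numbers sit between the string pieces, so a plain substring search misses it.

```python
import re

# Hypothetical, uncompressed content-stream fragment. The TJ operator draws
# one visual string from several pieces separated by kerning adjustments.
content_stream = "BT /F1 12 Tf [(http://exa) -15 (mple.c) 20 (om/page)] TJ ET"
target = "http://example.com/page"


def tj_visible_text(stream):
    """Join the string pieces of each TJ array, dropping the kerning numbers.

    Toy heuristic: it ignores escaped parentheses, hex strings, and font
    encodings, all of which real PDFs can contain.
    """
    joined = []
    for array in re.findall(r"\[(.*?)\]\s*TJ", stream, flags=re.S):
        pieces = re.findall(r"\((.*?)\)", array, flags=re.S)
        joined.append("".join(pieces))
    return joined


# The naive search fails because the URL is split into three pieces.
print(target in content_stream)                   # False
# After joining the pieces, the URL is found.
print(target in tj_visible_text(content_stream))  # True
```

In real files the content streams are usually compressed as well, so even this much only works after decoding them, which is one reason to try an existing tool before writing your own.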