Reading PDF file attachment annotations with iTextSharp
I have the following issue: I have a PDF with an XML file attached inside it as an annotation, not as an embedded file. Now I am trying to read it with the code from the following link:
iTextSharp - How to open/read/extract a file attachment?
It works for embedded files, but not for file attachments as annotations.
I googled for extracting annotations from a PDF and found the following link: Reading PDF Annotations with iText
So the annotation type is "File Attachment Annotations"
Could someone show a working example?
Thanks in advance for your help.
As so often with questions concerning iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There you find the keyword "File attachment, extract attachments", which references two Java samples from iText in Action, 2nd Edition:
- part4.chapter16.KubrickDvds
- part4.chapter16.KubrickDocumentary
For both samples there are corresponding Webified iTextSharp (C#) examples, which are quoted alongside the Java code.
KubrickDvds contains the following method extractAttachments / ExtractAttachments to extract File Attachment Annotations:
Java:
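```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

// Wrapper class added for compilation; in the book the method lives in KubrickDvds
public class AttachmentExtractor {
    // Example output pattern (assumed value): %s is replaced by the attachment's file name
    public static final String PATH = "results/%s";

    /**
     * Extracts file attachment annotations from an existing PDF.
     * @param src the path to the existing PDF
     */
    public void extractAttachments(String src) throws IOException {
        PdfReader reader = new PdfReader(src);
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            PdfArray array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
            if (array == null) continue;
            for (int j = 0; j < array.size(); j++) {
                PdfDictionary annot = array.getAsDict(j);
                // Only annotations with /Subtype /FileAttachment carry an attached file
                if (annot == null
                        || !PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                    continue;
                }
                // /FS: file specification holding the file name(s) under /F and /UF
                PdfDictionary fs = annot.getAsDict(PdfName.FS);
                if (fs == null) continue;
                // /EF: maps the same keys to the embedded file streams
                PdfDictionary refs = fs.getAsDict(PdfName.EF);
                if (refs == null) continue;
                for (PdfName name : refs.getKeys()) {
                    try (OutputStream fos = new FileOutputStream(
                            String.format(PATH, fs.getAsString(name).toString()))) {
                        fos.write(PdfReader.getStreamBytes((PRStream) refs.getAsStream(name)));
                    }
                }
            }
        }
        reader.close();
    }
}
```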
C#:
```csharp
using iTextSharp.text.pdf;
using Ionic.Zip; // ZipFile assumed to be DotNetZip's Ionic.Zip.ZipFile

/**
 * Extracts file attachment annotations from an existing PDF.
 * @param src the bytes of the existing PDF
 * @param zip the ZipFile object to which the extracted attachments are added
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
    PdfReader reader = new PdfReader(src);
    for (int i = 1; i <= reader.NumberOfPages; i++) {
        PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.Size; j++) {
            PdfDictionary annot = array.GetAsDict(j);
            // Only annotations with /Subtype /FileAttachment carry an attached file
            if (annot == null
                || !PdfName.FILEATTACHMENT.Equals(annot.GetAsName(PdfName.SUBTYPE)))
            {
                continue;
            }
            PdfDictionary fs = annot.GetAsDict(PdfName.FS);   // file specification
            if (fs == null) continue;
            PdfDictionary refs = fs.GetAsDict(PdfName.EF);    // embedded file streams
            if (refs == null) continue;
            foreach (PdfName name in refs.Keys) {
                zip.AddEntry(
                    fs.GetAsString(name).ToString(),
                    PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
                );
            }
        }
    }
    reader.Close();
}
```
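Both extractAttachments variants use the strings stored in the file specification as output names, once per key of the /EF dictionary. Since /EF may contain both /F and /UF (the Unicode file name) for the same file, and since that name comes from the PDF and may contain directory parts, you may want to pick one name and reduce it to a bare file name before using it as a file or ZIP entry name. A minimal stdlib-only Java sketch, assuming that is what you want; AttachmentNames, preferredName and safeFileName are hypothetical helpers, not part of iText:

```java
/** Hypothetical helper class (not part of iText) for choosing safe output names. */
final class AttachmentNames {
    private AttachmentNames() {}

    /** Prefers the Unicode name (/UF) and falls back to /F; either argument may be null. */
    static String preferredName(String uf, String f) {
        if (uf != null && !uf.isEmpty()) return uf;
        return f;
    }

    /**
     * Keeps only the last path component, so a name such as "../x.xml" cannot
     * escape the output directory; returns fallback if nothing usable is left.
     */
    static String safeFileName(String nameFromPdf, String fallback) {
        if (nameFromPdf == null) return fallback;
        // File specification strings normally use '/', but accept '\\' as well
        int cut = Math.max(nameFromPdf.lastIndexOf('/'), nameFromPdf.lastIndexOf('\\'));
        String base = nameFromPdf.substring(cut + 1).trim();
        if (base.isEmpty() || base.equals(".") || base.equals("..")) return fallback;
        return base;
    }
}
```

In the Java loop you would then read the /UF and /F strings from the file specification, pass them through preferredName and safeFileName, and write each attachment once instead of once per /EF key.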
KubrickDocumentary contains the following method extractDocLevelAttachments / ExtractDocLevelAttachments to extract document level attachments:
Java:
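```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

// Wrapper class added for compilation; in the book the method lives in KubrickDocumentary
public class DocLevelAttachmentExtractor {
    // Example output pattern (assumed value): %s is replaced by the attachment's file name
    public static final String PATH = "results/%s";

    /**
     * Extracts document level attachments
     * @param filename a file from which document level attachments will be extracted
     * @throws IOException
     */
    public void extractDocLevelAttachments(String filename) throws IOException {
        PdfReader reader = new PdfReader(filename);
        PdfDictionary root = reader.getCatalog();
        PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
        PdfDictionary embeddedfiles =
            documentnames == null ? null : documentnames.getAsDict(PdfName.EMBEDDEDFILES);
        PdfArray filespecs =
            embeddedfiles == null ? null : embeddedfiles.getAsArray(PdfName.NAMES);
        if (filespecs != null) {
            // /Names alternates: name string, file specification dictionary
            for (int i = 0; i + 1 < filespecs.size(); i += 2) {
                PdfDictionary filespec = filespecs.getAsDict(i + 1);
                PdfDictionary refs = filespec == null ? null : filespec.getAsDict(PdfName.EF);
                if (refs == null) continue;
                for (PdfName key : refs.getKeys()) {
                    PRStream stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                    try (OutputStream fos = new FileOutputStream(
                            String.format(PATH, filespec.getAsString(key).toString()))) {
                        fos.write(PdfReader.getStreamBytes(stream));
                    }
                }
            }
        }
        reader.close();
    }
}
```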
C#:
```csharp
using iTextSharp.text.pdf;
using Ionic.Zip; // ZipFile assumed to be DotNetZip's Ionic.Zip.ZipFile

/**
 * Extracts document level attachments
 * @param pdf the PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to which the extracted attachments are added
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
    PdfReader reader = new PdfReader(pdf);
    PdfDictionary root = reader.Catalog;
    PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
    PdfDictionary embeddedfiles = documentnames == null
        ? null : documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
    PdfArray filespecs = embeddedfiles == null
        ? null : embeddedfiles.GetAsArray(PdfName.NAMES);
    if (filespecs != null) {
        // /Names alternates: name string, file specification dictionary
        for (int i = 0; i + 1 < filespecs.Size; i += 2) {
            PdfDictionary filespec = filespecs.GetAsDict(i + 1);
            PdfDictionary refs = filespec == null ? null : filespec.GetAsDict(PdfName.EF);
            if (refs == null) continue;
            foreach (PdfName key in refs.Keys) {
                PRStream stream = (PRStream) PdfReader.GetPdfObject(
                    refs.GetAsIndirectObject(key)
                );
                zip.AddEntry(
                    filespec.GetAsString(key).ToString(),
                    PdfReader.GetStreamBytes(stream)
                );
            }
        }
    }
    reader.Close();
}
```
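Both extractDocLevelAttachments variants read only the /Names array of the root EmbeddedFiles node. In a PDF name tree the root may instead hold a /Kids array of child nodes, with the name/value pairs stored in the leaves; for such a file the root has no /Names array for the flat loop to walk. The following stdlib-only Java sketch shows the recursive walk on a hypothetical in-memory Node class (standing in for iText's PdfDictionary), with plain strings in place of file specification dictionaries; it illustrates the traversal only and is not iText API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of flattening a PDF name tree; Node is a hypothetical stand-in for a node dictionary. */
final class NameTreeSketch {
    private NameTreeSketch() {}

    static final class Node {
        final List<String> names; // alternating key, value (the /Names array); empty if absent
        final List<Node> kids;    // child nodes (the /Kids array); empty if absent

        Node(List<String> names, List<Node> kids) {
            this.names = names;
            this.kids = kids;
        }
    }

    /** Collects all key/value pairs, depth first and left to right. */
    static Map<String, String> flatten(Node root) {
        Map<String, String> out = new LinkedHashMap<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, Map<String, String> out) {
        // Leaf entries: step by two over the alternating name/value array
        for (int i = 0; i + 1 < node.names.size(); i += 2) {
            out.put(node.names.get(i), node.names.get(i + 1));
        }
        // Intermediate entries: recurse into each child in order
        for (Node kid : node.kids) {
            collect(kid, out);
        }
    }
}
```

With iText the same recursion would read the /Kids and /Names entries (PdfName.KIDS, PdfName.NAMES) from each node dictionary and then process every file specification found, exactly as the flat loop does.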
(For some reason the C# examples put the extracted files into a ZIP file, while the Java versions write them to the file system... oh well...)