Extracting images from a PDF using the /CCITTFaxDecode filter
I have a pdf that was generated from scanning software. The pdf has 1 TIFF image per page. I want to extract the TIFF image from each page.
I am using iTextSharp and I have successfully found the images and can get back the raw bytes from the PdfReader.GetStreamBytesRaw
method. The problem is, as many before me have discovered, iTextSharp does not contain a PdfReader.CCITTFaxDecode
method.
What else do I know? Even without iTextSharp I can open the PDF in Notepad and find the streams with /Filter /CCITTFaxDecode, and I know from the /DecodeParms entry that they use CCITT Group 4 encoding.
Does anyone out there know how I can get the CCITTFaxDecode filter images out of my pdf?
Cheers, Kahu
Actually, vbcrlfuser's answer did help me, but its code was not quite correct for the version of BitMiracle.LibTiff.NET that I was able to download. In the current version, the equivalent code looks like this:
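using iTextSharp.text.pdf;
using BitMiracle.LibTiff.Classic;
...
// pd  : the PdfDictionary of the image stream (source of Width, Height, BitsPerComponent)
// raw : the still-compressed bytes returned by PdfReader.GetStreamBytesRaw
Tiff tiff = Tiff.Open("C:\\test.tif", "w");
// Copy the image geometry from the PDF image dictionary into the TIFF tags.
tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
// The stream is CCITT Group 4, so declare that compression instead of decoding it.
tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
// Write the raw bytes as a single strip, without re-encoding.
tiff.WriteRawStrip(0, raw, raw.Length);
tiff.Close();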
Using the above code, I finally got a valid Tiff file in C:\test.tif. Thank you, vbcrlfuser!
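For anyone wondering where pd and raw come from: the snippet above leaves that part out, so here is a rough sketch of how they might be obtained with iTextSharp 5. The input path C:\test.pdf and the class name are made up for illustration, and the sketch assumes each image stream has /Filter as a single name rather than an array of filters.

```csharp
using System;
using iTextSharp.text.pdf;

class ListCcittImages
{
    static void Main()
    {
        // Hypothetical input path; point this at the scanned PDF.
        PdfReader reader = new PdfReader("C:\\test.pdf");
        try
        {
            // Walk every object in the cross-reference table.
            for (int i = 0; i < reader.XrefSize; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream()) continue;

                // Streams loaded by PdfReader are PRStream, which is also a PdfDictionary.
                PRStream stream = (PRStream)obj;
                PdfDictionary pd = stream;

                // Keep only image XObjects whose filter is CCITTFaxDecode.
                // Assumption: /Filter is a single name, not an array.
                if (!PdfName.IMAGE.Equals(pd.Get(PdfName.SUBTYPE))) continue;
                if (!PdfName.CCITTFAXDECODE.Equals(pd.Get(PdfName.FILTER))) continue;

                // The still-compressed Group 4 data, ready for WriteRawStrip.
                byte[] raw = PdfReader.GetStreamBytesRaw(stream);
                Console.WriteLine("object {0}: {1} x {2}, {3} bytes",
                    i, pd.Get(PdfName.WIDTH), pd.Get(PdfName.HEIGHT), raw.Length);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Inside the loop, pd and raw can be handed to the LibTiff.NET code above, with a different output file name per image so that pages do not overwrite each other.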