Reading PDF files using iTextSharp
I'm using C# as my programming platform and iTextSharp
to read PDF content. I used the code below to read the content, but it seems to read it page by page.
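// Namespaces this method relies on
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(object Filename)
{
    string strText = string.Empty;
    try
    {
        PdfReader reader = new PdfReader((string)Filename);
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Extract the text of one page at a time
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
            // Round-trips the text through the system default code page into UTF-8;
            // characters outside the default code page are lost in this step
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            // Pages are appended with no separator between them
            strText = strText + s;
        }
        reader.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return strText;
}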
Can anyone help me write code that reads the PDF content line by line?
Try this: use the LocationTextExtractionStrategy
instead of the SimpleTextExtractionStrategy.
It will add newline characters to the text returned. Then you can use strText.Split('\n')
to split your text into a string[]
and consume it on a per-line basis.
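
To make that suggestion concrete, here is a minimal sketch of it, assuming iTextSharp 5.x. The class name PdfLineReader, the method name ReadPdfLines, and the sample path are hypothetical and only for illustration. It splits each page's text separately, so the last line of one page is not glued to the first line of the next.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Returns the text of the PDF as a list of lines, in page order.
    public static List<string> ReadPdfLines(string fileName)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text from earlier pages is not carried over.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                // Split this page's text on the newline characters in the extracted text.
                lines.AddRange(pageText.Split('\n'));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }

    public static void Main()
    {
        // The path is a placeholder; point it at a real PDF.
        foreach (string line in ReadPdfLines(@"C:\docs\sample.pdf"))
        {
            Console.WriteLine(line);
        }
    }
}
```

How well the lines match the visual layout depends on how the PDF positions its text, so the output may need some cleanup for multi-column or heavily formatted documents.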