How to get font properties (font size, font style, font colour) from extracted PDF text?
Question:
Hi All,
I am using the code below to extract text from a PDF file:
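using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile()
{
    StringBuilder strText = new StringBuilder();
    PdfReader reader = null;
    try
    {
        reader = new PdfReader(@"\\FilePath");
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Use a fresh strategy per page so text from earlier pages is not repeated.
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
            // The result is already a .NET string; converting it through
            // Encoding.Default can lose characters, so it is appended as-is.
            strText.Append(s);
        }
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it.
        Console.Error.WriteLine("PDF text extraction failed: " + ex.Message);
    }
    finally
    {
        if (reader != null) reader.Close();
    }
    return strText.ToString();
}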
I need to get the font properties (font size, font style, font colour) of the extracted text for comparison.
I need that logic to be applied below these lines of code:
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
Does anyone know how to get font properties such as font size, font style, and font colour from extracted PDF text?
Thanks in advance,
Kane
Answer
You can achieve this by using iTextSharp.
Check these links:
http://stackoverflow.com/questions/6882098/how-can-i-get-text-formatting-with-itextsharp
and
http://stackoverflow.com/questions/3750150/i-want-to-export-pdf-to-xml-with-font-information-as-attribute-values
Or try this open source project:
http://sourceforge.net/projects/pdfsharp/
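As a rough illustration of the iTextSharp approach, the sketch below swaps SimpleTextExtractionStrategy for a custom ITextExtractionStrategy that records font details for every text chunk it receives. It assumes iTextSharp 5.x, and GetFillColor() is only present in later 5.x releases. The names TextChunkInfo, FontInfoExtractionStrategy, and GuessStyle are hypothetical and introduced only for this example. The font size is approximated from the ascent and descent lines, which assumes unrotated text, and the style is a heuristic guess from the PostScript font name, since a PDF does not store "bold" or "italic" as separate flags on the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text.pdf.parser;

// Hypothetical container for one run of text and its font properties.
public class TextChunkInfo
{
    public string Text;
    public string FontName;                 // PostScript name, e.g. "ABCDEF+Arial-BoldMT"
    public float FontSize;                  // approximate height in user-space units
    public string Style;                    // heuristic guess from the font name
    public iTextSharp.text.BaseColor Color; // fill colour; may be null
}

// Hypothetical strategy: collects text plus font properties per rendered chunk.
public class FontInfoExtractionStrategy : ITextExtractionStrategy
{
    public readonly List<TextChunkInfo> Chunks = new List<TextChunkInfo>();
    private readonly StringBuilder result = new StringBuilder();

    public void RenderText(TextRenderInfo renderInfo)
    {
        string fontName = renderInfo.GetFont().PostscriptFontName;

        // Approximate the size as the vertical distance between the descent
        // and ascent lines (assumes the text is not rotated).
        Vector bottom = renderInfo.GetDescentLine().GetStartPoint();
        Vector top = renderInfo.GetAscentLine().GetEndPoint();
        float size = Math.Abs(top[Vector.I2] - bottom[Vector.I2]);

        Chunks.Add(new TextChunkInfo
        {
            Text = renderInfo.GetText(),
            FontName = fontName,
            FontSize = size,
            Style = GuessStyle(fontName),
            Color = renderInfo.GetFillColor() // assumes a 5.x release that has this method
        });
        result.Append(renderInfo.GetText());
    }

    // Heuristic only: many fonts encode the style in their name, but not all do.
    public static string GuessStyle(string fontName)
    {
        string n = (fontName ?? "").ToLowerInvariant();
        bool bold = n.Contains("bold");
        bool italic = n.Contains("italic") || n.Contains("oblique");
        if (bold && italic) return "BoldItalic";
        if (bold) return "Bold";
        if (italic) return "Italic";
        return "Regular";
    }

    public string GetResultantText() { return result.ToString(); }
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
}
```

To use it in the loop from the question, create a FontInfoExtractionStrategy in place of the SimpleTextExtractionStrategy, pass it to PdfTextExtractor.GetTextFromPage(reader, page, its) as before, and then read its.Chunks to compare FontName, FontSize, Style, and Color. Note that this simple version concatenates chunks without inserting spaces or line breaks, so the returned string is less readable than the output of SimpleTextExtractionStrategy.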