How to avoid tagging empty <tr>/<td> cells when converting HTML to PDF with iText 5

Problem description:

I am using iText 5 to generate a PDF from HTML input. As part of PDF accessibility, I am calling PdfWriter.setTagged().

However, both the empty and the non-empty tags end up tagged. Could you please help me avoid tagging the empty HTML tags?

You can do this directly with pdfHTML (essentially the iText 7 solution for HTML-to-PDF conversion).

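import java.io.FileInputStream;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.attach.ITagWorker;
import com.itextpdf.html2pdf.attach.ProcessorContext;
import com.itextpdf.html2pdf.attach.impl.DefaultTagWorkerFactory;
import com.itextpdf.html2pdf.attach.impl.tags.SpanTagWorker;
import com.itextpdf.html2pdf.attach.impl.tags.TdTagWorker;
import com.itextpdf.html2pdf.html.TagConstants;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
// Package as of recent pdfHTML releases; older releases may place IElementNode elsewhere.
import com.itextpdf.styledxmlparser.node.IElementNode;

// ORIG and DEST are the input HTML path and the output PDF path.
ConverterProperties props = new ConverterProperties();
props.setTagWorkerFactory(new DefaultTagWorkerFactory() {
    @Override
    public ITagWorker getCustomTagWorker(IElementNode tag, ProcessorContext context) {
        if (TagConstants.TD.equals(tag.name())) {
            // Non-empty cells keep the regular TD worker; empty ones become spans.
            return tag.childNodes().isEmpty()
                    ? new SpanTagWorker(tag, context)
                    : new TdTagWorker(tag, context);
        }
        // Returning null defers to the default tag worker mapping.
        return null;
    }
});

PdfDocument doc = new PdfDocument(new PdfWriter(DEST));
doc.setTagged();

HtmlConverter.convertToPdf(new FileInputStream(ORIG), doc, props);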

In the code above, setTagWorkerFactory lets you define custom behavior for your tags, as detailed in the documentation. In this specific case, I'm simply turning empty TD tags into a Span element, which achieves the desired behavior (the superfluous TD tag disappears).

(To be completely honest, this relies on the TR worker being unable to handle the SPAN tag, so the empty cell is simply dropped. I'll update the answer if I come up with a more elegant solution.)
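
One caveat about the emptiness test: childNodes().isEmpty() is true only when the cell has no child nodes at all. A cell such as <td>&nbsp;</td> has a text child, so it would still be treated as non-empty. Below is a minimal sketch of a stricter check, written against a hypothetical stand-in Node class rather than iText's own node types, so every name in it is an assumption made for illustration. It treats a cell as empty when its only children are blank text nodes (non-breaking spaces included); any nested element, even an empty one, counts as content.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EmptyCellCheck {

    /** Hypothetical stand-in for a parsed HTML node; this is NOT an iText type. */
    static final class Node {
        final String text;          // non-null only for text nodes
        final List<Node> children;  // empty for text nodes

        private Node(String text, List<Node> children) {
            this.text = text;
            this.children = children;
        }

        static Node text(String value) {
            return new Node(value, Collections.<Node>emptyList());
        }

        static Node element(Node... children) {
            return new Node(null, Arrays.asList(children));
        }
    }

    /** True when the node has no children other than blank text nodes. */
    static boolean isEffectivelyEmpty(Node node) {
        for (Node child : node.children) {
            if (child.text == null) {
                return false; // any nested element counts as content
            }
            // Treat non-breaking spaces like ordinary whitespace before trimming.
            if (!child.text.replace('\u00A0', ' ').trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node bare = Node.element();                         // like <td></td>
        Node nbspOnly = Node.element(Node.text("\u00A0 ")); // like <td>&nbsp; </td>
        Node withText = Node.element(Node.text("42"));      // like <td>42</td>
        System.out.println(isEffectivelyEmpty(bare));
        System.out.println(isEffectivelyEmpty(nbspOnly));
        System.out.println(isEffectivelyEmpty(withText));
    }
}
```

To use the same idea inside getCustomTagWorker, the loop would have to walk tag.childNodes() and tell text nodes from element nodes using pdfHTML's own node interfaces. That mapping is left out here because it depends on the pdfHTML version in use.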