Convert an HTML table to plain text in Power BI
I am a beginner in Power BI. I have to create a report from SharePoint data, and I have imported the data into a dataset. However, some columns contain text wrapped in HTML table tags or styles, like below:
<div class="ExternalClass5DA0D04953B047459697675F266FEABF">
<p></p>
<table width="395" border="0" cellspacing="0" cellpadding="0" style="width:296pt;">
<tbody>
<tr height="115" style="height:86.4pt;">
<td width="395" height="115" class="xl64" style="width:296pt;height:86.4pt;">
I am working on issue. I shall update the progress. <br>
</td>
</tr>
</tbody>
</table>
<p><br></p>
</div>
But I would like to show only the plain text, which is "I am working on issue. I shall update the progress."
From this community thread, you can find a handy function for stripping all the HTML tags.
Here's the core logic (ignoring the documentation metadata for readability):
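let func = (HTML) =>
    let
        // Use the input only if it is recognized as text; otherwise fall back to an empty string
        Check = if Value.Is(Value.FromText(HTML), type text) then HTML else "",
        Source = Text.From(Check),
        // Split on "<" and ">" so that tag contents and visible text alternate in the list
        SplitAny = Text.SplitAny(Source, "<>"),
        // Keep every other item starting with the first, which drops the tag contents
        ListAlternate = List.Alternate(SplitAny, 1, 1, 1),
        // Discard the empty strings left between adjacent tags
        ListSelect = List.Select(ListAlternate, each _ <> ""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine
in
    func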
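To see why the split-and-alternate approach works, it can help to look at the intermediate lists. The query below is only a sketch: the sample string and the record field names (AfterSplit, AfterAlternate, AfterSelect, Combined) are made up for illustration and are not part of the original function.

```powerquery
let
    // Hypothetical sample input, not taken from the question
    Source = "<p>Hello <b>world</b></p>",
    // Splitting on either "<" or ">" puts tag contents at every second position
    // {"", "p", "Hello ", "b", "world", "/b", "", "/p", ""}
    SplitAny = Text.SplitAny(Source, "<>"),
    // Keep items 1, 3, 5, ... (1-based), i.e. everything that was outside a tag
    // {"", "Hello ", "world", "", ""}
    ListAlternate = List.Alternate(SplitAny, 1, 1, 1),
    // Drop the empty strings that sat between adjacent tags
    // {"Hello ", "world"}
    ListSelect = List.Select(ListAlternate, each _ <> ""),
    // "Hello world"
    TextCombine = Text.Combine(ListSelect, ""),
    // Return every step in one record so each can be inspected in the editor
    Result = [
        AfterSplit = SplitAny,
        AfterAlternate = ListAlternate,
        AfterSelect = ListSelect,
        Combined = TextCombine
    ]
in
    Result
```

Note that this relies on every "<" being matched by a ">". If the visible text itself contains a literal "<" or ">", the alternation goes out of step and text may be dropped.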
With this handy bit of code, create a new blank query, paste the above code into the Advanced Editor, and give the query a name, say, TextFromHTML.
Once you have that function defined, you can use it in any of your queries. For example, here's what a step to transform the column ColWithHTML might look like:
Table.TransformColumns(#"Prior Step", {{"ColWithHTML", each TextFromHTML(_), type text}})
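If you want to try the whole thing end to end before wiring it into your real query, something like the following might do. It is a rough sketch under a few assumptions: the sample table is made up to stand in for the SharePoint data, the function is repeated inline so the query runs on its own, and the Text.Trim call is my addition to drop the leading and trailing whitespace (including line breaks) that remains once the tags are gone.

```powerquery
let
    // Same logic as the function above, repeated so this query is self-contained
    TextFromHTML = (HTML) =>
        let
            Check = if Value.Is(Value.FromText(HTML), type text) then HTML else "",
            Source = Text.From(Check),
            SplitAny = Text.SplitAny(Source, "<>"),
            ListAlternate = List.Alternate(SplitAny, 1, 1, 1),
            ListSelect = List.Select(ListAlternate, each _ <> ""),
            TextCombine = Text.Combine(ListSelect, "")
        in
            TextCombine,

    // Hypothetical sample data; #(lf) is a line feed, as in the HTML from the question
    Sample = #table(
        type table [ColWithHTML = text],
        {
            {"<td>#(lf)I am working on issue. I shall update the progress. <br>#(lf)</td>"},
            {"<div><p>Second <b>row</b></p></div>"}
        }
    ),

    // Strip the tags, then trim the whitespace left around the text (assumption: trimming is wanted)
    Stripped = Table.TransformColumns(
        Sample,
        {{"ColWithHTML", each Text.Trim(TextFromHTML(_)), type text}}
    )
in
    Stripped
```

In a real query you would replace Sample with your prior step and reference the saved TextFromHTML query instead of the inline copy.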