How to strip HTML tags from a string in JavaScript?
Question:
Possible duplicate:
Strip HTML from Text JavaScript
How can I strip the HTML from a string in JavaScript?
Answer:
Using the browser's parser is probably the best bet in current browsers. The code below works, subject to the following caveats:
- Your HTML is valid within a <div> element. HTML contained within <body>, <html> or <head> tags is not valid within a <div> and may therefore not be parsed correctly.
- The textContent (the DOM standard property) and innerText (non-standard) properties are not identical. For example, textContent will include text within a <script> element while innerText will not (in most browsers). This only affects IE <=8, which is the only major browser not to support textContent.
- The HTML does not contain <script> elements.
- The HTML is not null.
- The HTML comes from a trusted source. Using this with arbitrary HTML allows arbitrary untrusted JavaScript to be executed. This example is from a comment by Mike Samuel on the duplicate question:
<img onerror='alert("could run arbitrary JS here")' src=bogus>
Code:
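// Let the browser parse the HTML inside a detached <div>, then read back its text.
var html = "<p>Some HTML</p>";
var div = document.createElement("div");
div.innerHTML = html;
// Prefer the standard textContent; fall back to innerText for IE <=8.
var text = div.textContent || div.innerText || "";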
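
For reuse, the snippet above could be wrapped in a small function that also covers the "HTML is not null" caveat. This is only a sketch: the name stripHtml and the optional doc parameter are assumptions made for illustration and are not part of the original answer. It does not address the trusted-source caveat, so event handlers such as the img onerror example would still run.

```javascript
// Hypothetical helper (not from the original answer): same div/innerHTML
// technique, with a guard for null or undefined input.
function stripHtml(html, doc) {
  // `doc` is an assumed optional parameter so a document can be injected;
  // in a browser it falls back to the global `document`.
  doc = doc || (typeof document !== "undefined" ? document : null);
  if (html == null || !doc) {
    return "";
  }
  var div = doc.createElement("div");
  div.innerHTML = String(html);
  // textContent is the standard property; innerText covers IE <=8.
  return div.textContent || div.innerText || "";
}
```

In a browser, stripHtml("<p>Some HTML</p>") should give the same result as the original snippet, while stripHtml(null) returns an empty string instead of the text "null".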