How do I strip HTML tags from a string in JavaScript?

Problem description:


Possible duplicate:

Strip HTML from text JavaScript

How can I strip the HTML from a string in JavaScript?

Using the browser's parser is probably the best bet in current browsers. The approach below works, with these caveats:


  • The HTML must be valid within a <div> element. Markup wrapped in <html>, <head>, or <body> tags is not valid inside a <div> and may therefore not be parsed correctly.
  • textContent (the DOM standard property) and innerText (originally non-standard) are not identical. For example, textContent includes the text inside a <script> element, while innerText does not (in most browsers). This only matters in IE <= 8, the only major browser that does not support textContent and so falls back to innerText.
  • The HTML must not contain <script> elements.
  • The HTML must not be null.
  • The HTML must come from a trusted source. Running this on arbitrary HTML allows arbitrary untrusted JavaScript to execute. This example is from a comment by Mike Samuel on the duplicate question: <img onerror='alert("could run arbitrary JS here")' src=bogus>

Code:

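// Let the browser parse the HTML by assigning it to a <div> that is never added to the page
var html = "<p>Some HTML</p>";
var div = document.createElement("div");
div.innerHTML = html;
// Prefer textContent; fall back to innerText for IE <= 8
var text = div.textContent || div.innerText || "";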
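
If some of the caveats above are a problem (null input, <script> elements, untrusted markup), one possible variation is to parse into a separate document with DOMParser instead of a <div>. This is only a sketch, not part of the original answer. It assumes a browser that supports DOMParser with the "text/html" type, which rules out old IE. The function name stripHtml and its optional parser argument are hypothetical. Documents produced by DOMParser are generally treated as inert in current browsers, so scripts and inline handlers such as onerror should not run, but that is worth verifying on the browsers you target.

```javascript
// Sketch only: strip tags by parsing into a separate, inert document.
// `stripHtml` and the optional `parser` argument are hypothetical names.
function stripHtml(html, parser) {
  // Treat null/undefined as empty input rather than the text "null".
  if (html == null) {
    return "";
  }
  // Default to the browser's DOMParser; a caller may pass any object
  // that exposes parseFromString(string, mimeType).
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(String(html), "text/html");
  // Remove <script> and <style> elements so their source text
  // does not end up in the result.
  var unwanted = doc.querySelectorAll("script, style");
  for (var i = 0; i < unwanted.length; i++) {
    unwanted[i].parentNode.removeChild(unwanted[i]);
  }
  // The HTML parser always creates a <body>; guard anyway.
  return (doc.body && doc.body.textContent) || "";
}
```

In a browser, stripHtml("<p>Some HTML</p>") returns "Some HTML". The result is plain text, so it still needs escaping if it is later inserted back into a page as HTML.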