How do I use ActiveSupport's "starts_with?" with Nokogiri to remove links that don't start with "http://"?

Problem description:

When I try this:

item.css("a").each do |a|
  if !a.starts_with? 'http://'
     a.replace a.content
  end
end

I get:

NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60> 

There is surely a cleaner way, but this seems to work:

item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://' 
      a.replace a.content
    end
  end
end

The problem is that you're calling starts_with? on an object that doesn't implement it.

item.css("a").each do |a|

yields each matching XML node as a. Those are Nokogiri objects (Nokogiri::XML::Element), not strings. What you want is to test only the part of the node you care about. Here that is the href attribute, which can be accessed like this:

a['href']

So, you want to use something like this:

item.css("a").each do |a|
  # Check the href attribute, not the node; to_s guards against a missing href (nil).
  if !(a['href'].to_s.starts_with?('http://'))
    a.replace(a.content)
  end
end
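As a side note, the prefix check itself doesn't need ActiveSupport: starts_with? is an alias for core Ruby's String#start_with?. A minimal sketch, where http_link? is a hypothetical helper name and not part of Nokogiri or ActiveSupport:

```ruby
# Hypothetical helper: true when href begins with "http://".
# String#start_with? is core Ruby, so ActiveSupport is not required.
# to_s turns a missing href (nil) into "", which simply fails the check.
def http_link?(href)
  href.to_s.start_with?('http://')
end

['http://bar', 'doesnt_start_with', nil].each do |href|
  puts "#{href.inspect} => #{http_link?(href)}"
end
```

Inside the loop this would read `a.replace(a.content) unless http_link?(a['href'])`.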

The downside is that Ruby has to walk through every <a> tag in the document, which can be slow on a big page with lots of links.

An alternative is to use XPath's starts-with function:

require 'nokogiri'

item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html

Output:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>

Here's how to do it using XPath:

item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html

Which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>

The advantage of using XPath to find the nodes is that the matching all runs in compiled C rather than in Ruby.