Can't extract individual JSON values in Ruby
I'm trying to scrape reddit (API-free) and I've run into a brick wall. On reddit, every page has a JSON representation that can be seen simply by appending .json to the end of the URL, e.g. https://www.reddit.com/r/AskReddit.json
I installed NeatJSON (the neatjson gem) and wrote a small chunk of code to clean the JSON up and print it:
require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'
url = ("https://www.reddit.com/r/AskReddit.json")
result = JSON.parse(open(url).read)
neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)
puts neatJS
It works fine:
(There's a lot more; it goes on for several more pages. The full JSON is here: http://pastebin.com/HDzFXqyU )
However, when I changed it to extract only the values I want:
url = ("https://www.reddit.com/r/AskReddit.json")
result = JSON.parse(open(url).read)
neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)
neatJS.each do |data|
puts data["title"]
puts data["url"]
puts data["id"]
end
it gives me an error:
002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)
I've been trying different variations of the extractor for about two days and none of them have worked. I feel like I'm missing something incredibly obvious. If anyone could point out what I'm doing wrong, I'd appreciate it.
EDIT
It turns out I had the wrong variable name:
neatSJ =/= neatJS
However, correcting this only changes the error I get:
002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)
As I said, I have been attempting multiple ways of extracting the values, which may have caused my typo.
In this code:
result = JSON.parse(open(url).read)
neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)
...result is a Ruby Hash object, the result of parsing the JSON into a Ruby object with JSON.parse. Meanwhile, neatJS is a String, the result of calling JSON.neat_generate on the result Hash. A String has no each method, which is exactly what the NoMethodError is telling you. If you want to access the values inside the JSON structure, use the result object, not the neatJS string:
# The posts are nested under data -> children; each child keeps its fields in its own "data" hash.
children = result["data"]["children"]
children.each do |child|
  puts child["data"]["title"]
  puts child["data"]["url"]
  puts child["data"]["id"]
end
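To make the Hash-versus-String distinction concrete without hitting the network, here is a minimal sketch that runs against a small inline sample. The sample JSON, the SAMPLE_LISTING constant and the extract_threads helper are all made up for illustration; only the data -> children -> data nesting is taken from the code above. It uses the standard library's JSON.pretty_generate in place of neatjson so that it has no gem dependency.

```ruby
require "json"

# Hypothetical, heavily trimmed stand-in for the listing JSON.
# The real response has many more fields per post.
SAMPLE_LISTING = <<~JSON
  {
    "kind": "Listing",
    "data": {
      "children": [
        { "kind": "t3", "data": { "id": "abc123", "title": "First thread",  "url": "https://example.com/1" } },
        { "kind": "t3", "data": { "id": "def456", "title": "Second thread", "url": "https://example.com/2" } }
      ]
    }
  }
JSON

# Hypothetical helper: turns a parsed listing Hash into an array of
# { title:, url:, id: } hashes. Returns [] if the expected keys are missing.
def extract_threads(listing)
  children = listing.dig("data", "children") || []
  children.map do |child|
    post = child.fetch("data", {})
    { title: post["title"], url: post["url"], id: post["id"] }
  end
end

result = JSON.parse(SAMPLE_LISTING)    # a Hash: navigate this
pretty = JSON.pretty_generate(result)  # a String: only good for printing

threads = extract_threads(result)
threads.each { |t| puts "#{t[:id]}  #{t[:title]}  #{t[:url]}" }
```

To run this against the live page, parse the fetched body instead of the sample. Note that on Ruby 3.0 and later, open-uri no longer patches Kernel#open, so the fetch would be URI.open(url).read rather than open(url).read.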