Looping over JSON files in R
I am trying to aggregate a bunch of JSON files into a single one, covering three sources and three years. So far I have only been able to do it the tedious way, but I am sure it can be done in a smarter, more elegant manner.
json1 <- lapply(readLines("NYT_1989.json"), fromJSON)
json2 <- lapply(readLines("NYT_1990.json"), fromJSON)
json3 <- lapply(readLines("NYT_1991.json"), fromJSON)
json4 <- lapply(readLines("WP_1989.json"), fromJSON)
json5 <- lapply(readLines("WP_1990.json"), fromJSON)
json6 <- lapply(readLines("WP_1991.json"), fromJSON)
json7 <- lapply(readLines("USAT_1989.json"), fromJSON)
json8 <- lapply(readLines("USAT_1990.json"), fromJSON)
json9 <- lapply(readLines("USAT_1991.json"), fromJSON)
jsonl <- list(json1, json2, json3, json4, json5, json6, json7, json8, json9)
Note that the year range, 1989 to 1991, is the same for all three sources. Any ideas? Thanks!
PS: Example of the data inside each file:
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. ", "title": "Prospects;"}
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' ", "title": "Upheaval in the East: Espionage;"}
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. ", "title": "Coping With the Economic Prospects of 1990"}
Here you go:
require(jsonlite)

# One call per file via sapply(); the result is named by file name.
filelist <- c("NYT_1989.json", "NYT_1990.json", "NYT_1991.json",
              "WP_1989.json",  "WP_1990.json",  "WP_1991.json",
              "USAT_1989.json", "USAT_1990.json", "USAT_1991.json")
newJSON <- sapply(filelist, function(x) fromJSON(readLines(x)))
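Incidentally, since the year range is the same for every source, the file list itself doesn't need to be typed out. A small sketch of how the nine names could be generated (my own suggestion using outer() and sprintf(), not part of the original answer):

# Hypothetical: build the same nine file names from the source/year grid.
sources  <- c("NYT", "WP", "USAT")
years    <- 1989:1991
filelist <- as.vector(t(outer(sources, years,
                              function(s, y) sprintf("%s_%s.json", s, y))))
filelist
# [1] "NYT_1989.json" "NYT_1990.json" "NYT_1991.json" "WP_1989.json" ...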
Read in just the body entry from each line of the input file.

You asked how to read in just a subset of the JSON file. The file data referenced isn't actually JSON format; it is JSON-like, so we have to modify the input before handing it to fromJSON() in order to read the data correctly. We then dereference the result via fromJSON()$body to extract just the body variable.
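To see why the sprintf()/paste() wrapper below works: each line of the file is a standalone JSON object (line-delimited, NDJSON-style), and joining the lines with commas inside [ ] turns them into one valid JSON array that fromJSON() can parse. A toy illustration (my own example, not from the original files):

lines <- c('{"a": 1}', '{"a": 2}')             # two line-delimited records
sprintf("[%s]", paste(lines, collapse = ","))  # one parseable JSON array
# [1] "[{\"a\": 1},{\"a\": 2}]"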
filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
newJSON
Results
> filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
> newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
> newJSON
./data/NYT_1989.json
[1,] "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
./data/NYT_1990.json
[1,] "Blue temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "BLUE1: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "GREEN4 the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
You might find the following apply tutorial useful:
I also suggest reading:
- R Inferno - Chapter 4 - Over-Vectorizing
Trust me when I say this free online book has helped me a lot. It has also confirmed that I am an idiot on multiple occasions :-)