Why doesn't my scraping code copy the table from the web page?
Question:
I am trying to copy a table from a web page. I tried:
library(XML)
url <- "https://www.cmegroup.com//content/cmegroup/en/trading/fx/g10/euro-fx_quotes_settlements_futures.html"
table1 <- readHTMLTable(url,stringsAsFactors = FALSE)
table1
But this did not work.
Answer:
The table is not in the page source, which is why readHTMLTable finds nothing to parse. An alternative is to request the data directly from the XMLHttpRequest (XHR) endpoint that the page calls, which returns JSON:
library(jsonlite)
tbl <- fromJSON("https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/58/FUT?tradeDate=03/26/2020&strategy=DEFAULT&pageSize=500&_=1585333229793")
tbl <- tbl$settlements
PS: For other dates, change the date part of the URL (03/26/2020).
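Changing the date by hand is easy to get wrong, so the URL can be built from a date value instead. The following is a minimal sketch, not part of the original answer: `settlement_url` is a hypothetical helper name, the path and query parameters are copied from the URL above, and treating the trailing `_` parameter as a millisecond timestamp is an assumption based on its appearance.

```r
# Hypothetical helper: build the settlements URL for a given trade date.
# The path and query parameters are copied from the URL in the answer;
# treating "_" as a millisecond-timestamp cache-buster is an assumption.
settlement_url <- function(trade_date, product_id = 58, page_size = 500) {
  trade_date <- as.Date(trade_date)
  sprintf(
    "https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/%d/FUT?tradeDate=%s&strategy=DEFAULT&pageSize=%d&_=%.0f",
    as.integer(product_id),
    format(trade_date, "%m/%d/%Y"),   # the endpoint uses MM/DD/YYYY
    as.integer(page_size),
    as.numeric(Sys.time()) * 1000     # current time in milliseconds
  )
}

url <- settlement_url("2020-03-26")
# tbl <- jsonlite::fromJSON(url)$settlements  # requires network access
```

Whether the endpoint still accepts these parameters, or returns data for non-trading days, is not something this sketch checks.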
Output:
tbl
#     month    open     high      low     last  change  settle  volume openInterest
#  1 APR 20 1.08900 1.10640B  1.08870 1.10440A +.01750 1.10520     282        4,215
#  2 MAY 20 1.09165 1.10775B  1.09045 1.10620A +.01715 1.10675     651        2,627
#  3 JUN 20 1.09230  1.10960  1.09090  1.10685 +.01715 1.10800 205,562      548,213
#  4 JLY 20 1.10650 1.11015B 1.10605A 1.10830A +.01710 1.10905       2            2
#  5 SEP 20 1.09625  1.11265  1.09435 1.11020A +.01710 1.11120     939        3,646
#  6 DEC 20 1.10645 1.11480B 1.10315A 1.11310A +.01725 1.11390      48        2,047
#  7 MAR 21 1.11000 1.11620B 1.10850A 1.11620B +.01725 1.11695       3          240
#  8 JUN 21       - 1.11680B        - 1.11680B +.01740 1.11960       0          144
#  9 SEP 21       -        -        -        - +.01745 1.12210       0            1
# 10 DEC 21       -        -        -        - +.01755 1.12460       0            3
# 11 MAR 22       -        -        -        - +.01770 1.12715       0            0
# 12 JUN 22       -        -        -        - +.01785 1.12985       0            0
# 13 SEP 22       -        -        -        - +.01805 1.13280       0            0
# 14 DEC 22       -        -        -        - +.01830 1.13560       0            0
# 15 MAR 23       -        -        -        - +.01845 1.13815       0            0
# 16 JUN 23       -        -        -        - +.01865 1.14110       0            0
# 17 SEP 23       -        -        -        - +.01885 1.14385       0            0
# 18 DEC 23       -        -        -        - +.01905 1.14660       0            0
# 19 MAR 24       -        -        -        - +.01925 1.14935       0            0
# 20 JUN 24       -        -        -        - +.01945 1.15215       0            0
# 21 SEP 24       -        -        -        - +.01965 1.15490       0            0
# 22 DEC 24       -        -        -        - +.01985 1.15765       0            0
# 23 MAR 25       -        -        -        - +.02005 1.16040       0            0
# 24  Total                                                    207,487      561,138
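Values such as "1.10640B", "205,562" and "-" in the output above are character strings, so they need cleaning before any arithmetic. The sketch below is one possible approach, not part of the original answer: `to_number` is a hypothetical helper, and the sample rows are copied from the output so the example runs without network access.

```r
# Hypothetical helper: turn strings such as "1.10640B", "205,562", "+.01750"
# or "-" into numbers. Trailing letters and thousands separators are dropped;
# "-" and empty strings become NA.
to_number <- function(x) {
  x <- trimws(x)
  x[x %in% c("-", "")] <- NA_character_
  x <- gsub(",", "", x, fixed = TRUE)   # remove thousands separators
  x <- sub("[A-Za-z]+$", "", x)         # remove trailing letter flags
  as.numeric(x)
}

# A few rows copied from the output above, so the example runs offline.
sample_tbl <- data.frame(
  month  = c("APR 20", "JUN 20", "JUN 21"),
  open   = c("1.08900", "1.09230", "-"),
  high   = c("1.10640B", "1.10960", "1.11680B"),
  change = c("+.01750", "+.01715", "+.01740"),
  volume = c("282", "205,562", "0"),
  stringsAsFactors = FALSE
)

# Convert every column except the contract month.
num_cols <- setdiff(names(sample_tbl), "month")
sample_tbl[num_cols] <- lapply(sample_tbl[num_cols], to_number)
str(sample_tbl)
```

Note that stripping the trailing A/B letters discards whatever those flags indicate; keep the original column alongside the numeric one if that information matters. The "Total" row should also be dropped or handled separately before aggregating.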
Scraping the options page
url <- "https://www.cmegroup.com/CmeWS/mvc/Quotes/Option/8118/G/J0/ATM?_=1585348999038"
jsonlist <- fromJSON(url)
The put and call columns are in separate data frames:
df_put <- jsonlist$optionContractQuotes$put
df_call <- jsonlist$optionContractQuotes$call
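If a single table is more convenient, the two data frames can be stacked with a column marking which side each row came from. This is a sketch only: `combine_sides` is a hypothetical helper, and because the original answer does not show the columns of `df_put` and `df_call`, the stand-in data frames and their column names below are made up purely to make the example runnable.

```r
# Hypothetical helper: stack put and call quotes into one data frame,
# keeping only the columns the two share and tagging each row with its side.
combine_sides <- function(df_put, df_call) {
  df_put$side  <- "put"
  df_call$side <- "call"
  common <- intersect(names(df_put), names(df_call))
  rbind(df_put[common], df_call[common])
}

# Stand-in data: the column names and values are invented for illustration
# and are NOT the real fields returned by the endpoint.
demo_put  <- data.frame(strike = c("1100", "1105"), last = c("a", "b"),
                        stringsAsFactors = FALSE)
demo_call <- data.frame(strike = c("1100", "1105"), last = c("c", "d"),
                        stringsAsFactors = FALSE)

combined <- combine_sides(demo_put, demo_call)
print(combined)
```

With the real data this would be called as `combine_sides(df_put, df_call)`. If jsonlite returns nested columns, they may need flattening first, for example with `jsonlite::flatten()`.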
Follow this link to find the appropriate XHR URL.