How to extract data from a Wikipedia article?
I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON by replacing format=xml with format=json).
But what I can't figure out is how to access only certain sections from the table of contents. What I want is that, once the page is loaded, the user can press a button that makes a pop-up appear. The pop-up displays the headers from the table of contents and, for convenience, lets the user read that piece and only that piece. I'm a little shaky with JSON, but is it possible to do this? Or is there an API from Wikipedia that allows the developer to view only certain parts of a page?
Thanks!
Unfortunately, it seems the mediawiki.org documentation for parse doesn't tell you how to do this. But the documentation in the API itself does: you can use the section parameter. And you can use prop=sections to get the list of sections.
So, you can first use:
http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=sections
to get the list of sections, and then
http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=text&section=26
to get the HTML for a certain section.
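
The two requests above could be wired together roughly as follows. This is only a sketch in Python rather than Android code, using format=json instead of format=xml. The helper names (build_url, sections_url, section_text_url, parse_sections, fetch_json) and the User-Agent string are made up for illustration, and the response layout assumed in parse_sections (a "parse" object holding a "sections" list whose entries carry "index" and "line") should be checked against what the API actually returns for your page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_url(page, **extra):
    """Build an action=parse URL for the given page title."""
    params = {"action": "parse", "format": "json", "page": page}
    params.update(extra)
    return API_URL + "?" + urllib.parse.urlencode(params)


def sections_url(page):
    """URL that asks for the page's list of sections (prop=sections)."""
    return build_url(page, prop="sections")


def section_text_url(page, index):
    """URL that asks for the HTML of a single section (prop=text&section=N)."""
    return build_url(page, prop="text", section=index)


def parse_sections(response):
    """Return (index, heading) pairs from a decoded prop=sections response.

    Assumes the layout {"parse": {"sections": [{"index": ..., "line": ...}]}}.
    """
    return [(s["index"], s["line"]) for s in response["parse"]["sections"]]


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    # Placeholder User-Agent; replace it with one that identifies your app.
    request = urllib.request.Request(url, headers={"User-Agent": "ExampleApp/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    page = "Android_(operating_system)"
    # Step 1: list the sections so the user can pick one in the pop-up.
    for index, heading in parse_sections(fetch_json(sections_url(page))):
        print(index, heading)
    # Step 2: the URL to request once the user has picked a section.
    print(section_text_url(page, 26))
```

In the app, the (index, heading) pairs would populate the pop-up, and the index of the chosen entry would be passed as the section parameter of the second request.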