<a href='/Patent/01127821' target='_blank'>
<a href='/Patent/01127832' target='_blank'>";
(other tags omitted)
Pattern pattern2 = Pattern.compile("(<a href='/Patent/([\\s\\S]*?)target='_blank'>)");
Matcher matcher = pattern2.matcher(pageContent);
while (matcher.find()) {
strPage = matcher.group();
}
How can I get 01127847, 01127821, and 01127832?
What's wrong with the code above?
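To see where the question's code goes wrong, this sketch runs its pattern unchanged against the three sample tags. The class and method names are made up for illustration. Because the outer parentheses wrap the whole pattern, group(1) is the same as group(), i.e. the whole tag. The ID sits in group(2), but it still carries the closing quote and a space (`01127847' `), because the lazy `[\\s\\S]*?` runs right up to `target`. Separately, `strPage` is reassigned on every pass of the loop, so once the loop ends only the last tag is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: runs the question's pattern unchanged and shows what it captures.
final class OriginalPatternDemo {
    // The pattern from the question, copied as-is.
    static final Pattern ORIGINAL =
            Pattern.compile("(<a href='/Patent/([\\s\\S]*?)target='_blank'>)");

    // Collects group(2) for every match: the ID followed by the leftover "' ".
    static List<String> innerGroups(String html) {
        List<String> out = new ArrayList<>();
        Matcher matcher = ORIGINAL.matcher(html);
        while (matcher.find()) {
            out.add(matcher.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String pageContent = "<a href='/Patent/01127847' target='_blank'>"
                + "<a href='/Patent/01127821' target='_blank'>"
                + "<a href='/Patent/01127832' target='_blank'>";
        Matcher matcher = ORIGINAL.matcher(pageContent);
        while (matcher.find()) {
            // The outer parentheses make group(1) span the whole tag, same as group().
            System.out.println(matcher.group().equals(matcher.group(1)));
        }
        // Each element still ends with the closing quote and a space.
        System.out.println(innerGroups(pageContent));
    }
}
```

Trimming that suffix would work, but a pattern whose only capturing group holds just the ID avoids the cleanup step.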
------Solution--------------------
"<a\\s+href='/Patent/(\\d+)'[^>]*>"
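A runnable sketch of the digits-only approach, adapted to the single-quoted `href` values in the sample and reading the ID with group(1). It assumes every ID is purely numeric; the class name `PatentIdExtractor` and method `extractIds` are hypothetical. The IDs are kept as strings so the leading zero survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatentIdExtractor {
    // Assumption: IDs are all digits and href values use single quotes, as in the sample.
    // [^>]* skips any remaining attributes up to the end of the tag.
    private static final Pattern PATENT_LINK =
            Pattern.compile("<a\\s+href='/Patent/(\\d+)'[^>]*>");

    static List<String> extractIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = PATENT_LINK.matcher(html);
        while (matcher.find()) {
            ids.add(matcher.group(1)); // group(1) = digits only, no quote or whitespace
        }
        return ids;
    }

    public static void main(String[] args) {
        String pageContent = "<a href='/Patent/01127847' target='_blank'>"
                + "<a href='/Patent/01127821' target='_blank'>"
                + "<a href='/Patent/01127832' target='_blank'>";
        System.out.println(extractIds(pageContent)); // [01127847, 01127821, 01127832]
    }
}
```

Collecting into a list, instead of assigning to one variable inside the loop, keeps every match rather than only the last one.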
------Solution--------------------
Here's something for reference:
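import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String pageContent = "<a href='/Patent/01127847' target='_blank'><a href='/Patent/01127821' target='_blank'><a href='/Patent/01127832' target='_blank'>";
        // Match each <a ...> tag as a whole (the lazy .+? stops at the first '>')
        Pattern pattern2 = Pattern.compile("<a\\s+href=.+?>");
        Matcher matcher = pattern2.matcher(pageContent);
        while (matcher.find()) {
            String strPage = matcher.group();
            // Remove everything through the last '/' and everything from the closing quote on
            System.out.println(strPage.replaceAll("(<a.+/|'\\s+.+>)", ""));
        }
    }
}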
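------Solution--------------------
Use group(1) instead of group(): group() returns the entire match, while group(1) returns only the text matched by the first capturing group, so put a capturing group around just the ID.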
------Solution--------------------
Pattern pattern2 = Pattern.compile("(?:<a href='/Patent/(.*?)'\\s+target='_blank'>)");
Then call group(1).
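The last answer's pattern, wrapped into a runnable sketch. Because `(?:...)` is a non-capturing group, `(.*?)` is group 1. This version assumes Java 9+ and uses `Matcher.results()` to collect every group(1) instead of a `while (find())` loop; the class name `NonCapturingDemo` and method `ids` are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NonCapturingDemo {
    // Pattern from the answer: (?:...) is not counted, so (.*?) is group 1.
    private static final Pattern PATENT_LINK =
            Pattern.compile("(?:<a href='/Patent/(.*?)'\\s+target='_blank'>)");

    static List<String> ids(String html) {
        return PATENT_LINK.matcher(html)
                .results()              // Java 9+: a stream of every match
                .map(r -> r.group(1))   // keep only the lazily captured ID
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String pageContent = "<a href='/Patent/01127847' target='_blank'>"
                + "<a href='/Patent/01127821' target='_blank'>"
                + "<a href='/Patent/01127832' target='_blank'>";
        System.out.println(ids(pageContent)); // [01127847, 01127821, 01127832]
    }
}
```

Unlike `\\d+`, the lazy `(.*?)` also accepts IDs containing non-digit characters, but it depends on `target='_blank'` following the `href` exactly as in the sample.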