Using headless Chrome to fetch the boards of the puzzle team club games
Category: IT articles • 2022-03-04 08:51:13

0. Which tool to use for scraping the site
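For the previous two games, the boards were read by eye and typed in by hand, which made solving inconvenient. Boards such as the special daily battle are large, and entering them cell by cell takes a long time. Is there a quick way to get the board and write it straight to a file? The answer is a crawler: web scraping.
However, puzzle team club's anti-scraping measures are very good: the page source contains no information about the board, and I could not work the board out from the captured network packets (dare I say there were simply too many of them), so a headless browser is the only option.
I started with phantomJS. The Python code that fetches the page source is as follows: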
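from selenium import webdriver

def getChessByPhantomJS():
    # webdriver.PhantomJS exists only in older Selenium releases (it was removed in Selenium 4)
    driver = webdriver.PhantomJS()
    driver.get('https://www.puzzle-dominosa.com/?size=8')
    source = driver.page_source
    driver.quit()
    return source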
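But the result was disappointing: it returned only a basic template page with no board.
How does Chrome fare? (If you do not know how to set up headless Chrome, a Baidu search will help.)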
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two arguments are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    # Selenium 3 style call; Selenium 4 uses service= and options= instead
    driver = webdriver.Chrome(executable_path=path, chrome_options=chrome_options)
    try:
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        print(e)
    source = driver.page_source
    driver.quit()
    return source
Output (or rather, the output while it is running, because the thing never exits):
DevTools listening on ws://127.0.0.1:62344/devtools/browser/8c9f8f4a-407a-4045-b41c-b9f898d4d37b
[1203/174652.884:INFO:CONSOLE(1)] "Uncaught TypeError: window.googletag.pubads is not a function", source: https://www.puzzle-dominosa.com/build/js/public/new/dominosa-95ac3646ef.js (1)
A page-load timeout can be added so that the program exits:
def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two arguments are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path=path, chrome_options=chrome_options)
    try:
        # Give up waiting for the page load after 30 seconds
        driver.set_page_load_timeout(30)
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        # A timeout lands here; whatever has loaded so far is still read below
        print(e)
    source = driver.page_source
    driver.quit()
    return source
The page source can then be handed to the parsing function, which outputs the board.
1. How to get the dominosa board
We are not done yet, though: what we want is the board. For that, we need to analyze the page markup:

Figure 1. Markup of the dominosa board
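See? The board is reflected directly in the class names: cell3 corresponds to a 3 on the board, and once the number of sibling elements exceeds the width of one board row, the board wraps to a new row.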
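The mapping just described can be tried on plain strings before touching the real page. The sketch below is illustrative only: `classes_to_grid` is a hypothetical helper, and the input class strings (`'cell cell3'` and so on) are an assumed shape based on the description above, not copied from the site.

```python
def classes_to_grid(class_names, columns):
    """Turn a flat list of class strings into board text, wrapping every `columns` cells."""
    rows = []
    current = []
    for name in class_names:
        # Assumption: the value lives in the second class token, e.g. "cell3" -> "3"
        token = name.split(' ')[1]
        current.append(token[len('cell'):])
        if len(current) == columns:
            rows.append(' '.join(current))
            current = []
    if current:  # keep an incomplete last row rather than dropping it
        rows.append(' '.join(current))
    return '\n'.join(rows)


sample = ['cell cell0', 'cell cell1', 'cell cell1',
          'cell cell0', 'cell cell0', 'cell cell1']
print(classes_to_grid(sample, 3))
```

With six cells and a row width of 3, this prints two rows of three values each.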
The full extraction function can be written like this:
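from lxml import etree

# NOTE: the XPath that selects the board cells is truncated in the source ('//div[@...');
# CELL_XPATH is a placeholder name for it and has to be filled in from the page markup.

def solve():
    source = getChessByChrome()
    htree = etree.HTML(source)
    chessSize = len(htree.xpath(CELL_XPATH))
    # Puzzle ID from the <span>; otherwise fall back to the text of the <p>
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    # The board has x rows and x+1 columns, so x * (x+1) == chessSize
    x = (round((4 * chessSize + 1)**0.5) - 1) // 2
    print(x)
    print(x+1)
    chess = ''
    for i, className in enumerate(htree.xpath(CELL_XPATH)):
        # Second class token, e.g. "cell3" -> "3"
        value = className.xpath('./@class')[0].split(' ')[1][4:]
        if i % (x+1) == x:
            # Last cell of a row: start a new line
            chess += value + '\n'
        else:
            chess += value + ' '
    # chess[:-1] drops the trailing newline
    with open('dominosaChess' + puzzleId + '.txt', 'w') as f:
        f.write(chess[:-1])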
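The line that computes x in solve() inverts x(x+1) = N, where N is the number of cells: the board has x rows and x+1 columns, so x = (sqrt(4N+1) - 1) / 2. Here is a minimal sketch of that step on its own, using integer arithmetic. `board_dims` is a hypothetical helper name, not part of the original code.

```python
import math


def board_dims(cell_count):
    """Return (rows, columns) for a dominosa board with rows * (rows + 1) cells."""
    root = math.isqrt(4 * cell_count + 1)
    rows = (root - 1) // 2
    if rows * (rows + 1) != cell_count:
        raise ValueError(f'{cell_count} is not of the form x * (x + 1)')
    return rows, rows + 1


print(board_dims(72))  # a board with 8 rows of 9 cells has 72 cells
```

Rejecting cell counts that are not of the form x(x+1) gives an early signal that the cell selector matched the wrong elements.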
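This produces the board file required by the article "Solving the dominosa game with Dancing Links X".
Incidentally, to make boards easy to look up, the output file name includes the puzzle ID; for a special puzzle, it includes the special puzzle's title instead.
Here is some sample output to compare against the board (file name dominosaChess7,092,762.txt):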
4 5 2 2 7 3 3 0 6
2 7 5 6 2 6 4 1 5
4 4 5 6 0 2 6 0 2
7 3 3 5 0 0 3 4 4
0 1 3 3 4 1 3 2 1
5 7 0 5 3 2 1 1 6
1 6 6 7 5 2 6 7 1
7 4 0 0 4 5 1 7 7
Screenshot of the corresponding board:

Figure 2. The board with ID 7,092,762
2. How to get the star battle board
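Getting a board file that meets the requirements of the article "Solving the star battle game with depth-first search (DFS)" takes a bit more work.
Take a look at the figure below: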
Figure 3. Markup of the star battle board
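The class names in this markup all carry meaning: for example, bl means there is a dividing line on the left, and br means there is one on the right.
The markup gives us only the dividing lines, whereas what we need is a layout that marks which block each cell belongs to. To get that, we need BFS, breadth-first search.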
from copy import deepcopy
from lxml import etree

# NOTE: `url` (the puzzle URL) and getChessByFile() are not shown in the source and are
# assumed to be defined elsewhere. The XPath that selects the board cells is truncated in
# the source ('//div[@...'); CELL_XPATH is a placeholder name for it.

def solve():
    # limit is written as the first line of the output file; it depends on the size parameter
    if url.find('size=') == -1:
        limit = 1
    else:
        size = url.split('size=')[1]
        size = int(size)
        if size >= 1 and size <= 4:
            limit = 1
        elif size <= 6:
            limit = 2
        elif size <= 8:
            limit = 3
        else:
            limit = size - 5
    source = getChessByFile()
    htree = etree.HTML(source)
    chessSize = len(htree.xpath(CELL_XPATH))
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    # The board is square: side length is the square root of the cell count
    chessSize = round(chessSize**0.5)
    chess = [[-1 for _ in range(chessSize)] for __ in range(chessSize)]
    borderss = [['' for _ in range(chessSize)] for __ in range(chessSize)]
    chessStr = ''
    maxBlockNumber = 0
    # br: border on the right; bl: on the left; bb: at the bottom; bt: at the top
    for i, className in enumerate(htree.xpath(CELL_XPATH)):
        x = i // chessSize
        y = i % chessSize
        value = className.xpath('./@class')[0]
        if value[:4] != 'cell':
            continue
        # Strip the non-border classes so only the border markers remain
        value = value.replace('cell selectable', '')
        value = value.replace('cell-off', '')
        borderss[x][y] = value
    # Flood-fill every unlabelled cell with a new block number (BFS, level by level)
    for i in range(chessSize):
        for j in range(chessSize):
            if chess[i][j] != -1:
                continue
            queue = [(i, j)]
            chess[i][j] = str(maxBlockNumber)
            while len(queue) > 0:
                oldQueue = deepcopy(queue)
                queue = []
                for pos in oldQueue:
                    x, y = pos[0], pos[1]
                    # up
                    if x > 0 and borderss[x][y].find('bt') == -1 and chess[x-1][y] == -1:
                        queue.append((x-1, y))
                        chess[x-1][y] = chess[i][j]
                    # down
                    if x < chessSize - 1 and borderss[x][y].find('bb') == -1 and chess[x+1][y] == -1:
                        queue.append((x+1, y))
                        chess[x+1][y] = chess[i][j]
                    # left
                    if y > 0 and borderss[x][y].find('bl') == -1 and chess[x][y-1] == -1:
                        queue.append((x, y-1))
                        chess[x][y-1] = chess[i][j]
                    # right
                    if y < chessSize - 1 and borderss[x][y].find('br') == -1 and chess[x][y+1] == -1:
                        queue.append((x, y+1))
                        chess[x][y+1] = chess[i][j]
            maxBlockNumber += 1
    chessStr = '\n'.join(' '.join(chessRow) for chessRow in chess)
    with open('starBattleChess' + puzzleId + '.txt', 'w') as f:
        f.write(str(limit) + '\n' + chessStr)
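The flood fill inside solve() can be exercised without the page. The following is a sketch under assumptions: `label_regions` is a hypothetical helper, borders are given as one string of class tokens per cell (`bt`, `bb`, `bl`, `br`), and, as in solve(), only the current cell's own border classes are consulted when stepping to a neighbour. It uses `collections.deque` rather than copying the queue on every round.

```python
from collections import deque


def label_regions(borders):
    """Label each cell of a square grid with a block number via breadth-first search."""
    n = len(borders)
    labels = [[-1] * n for _ in range(n)]
    # (row delta, column delta, border class that blocks this move)
    moves = [(-1, 0, 'bt'), (1, 0, 'bb'), (0, -1, 'bl'), (0, 1, 'br')]
    next_label = 0
    for i in range(n):
        for j in range(n):
            if labels[i][j] != -1:
                continue  # already part of an earlier block
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                x, y = queue.popleft()
                cell = borders[x][y].split()
                for dx, dy, wall in moves:
                    nx, ny = x + dx, y + dy
                    inside = 0 <= nx < n and 0 <= ny < n
                    if inside and wall not in cell and labels[nx][ny] == -1:
                        labels[nx][ny] = next_label
                        queue.append((nx, ny))
            next_label += 1
    return labels


# 2 x 2 grid split into a left column and a right column
borders = [['br', 'bl'],
           ['br', 'bl']]
print(label_regions(borders))
```

Blocks are numbered in the order their first cell is met when scanning row by row, which matches the numbering in the sample output.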
Here is some sample output to compare against the board (file name starBattleChess3,876,706.txt):
1
0 0 1 1 2
0 0 3 1 2
0 0 3 4 4
0 3 3 4 4
0 3 3 4 4
Screenshot of the corresponding board:

Figure 4. The board with ID 3,876,706