Puppeteer: Chromium instances remain active in the background after browser.disconnect
My environment
- Puppeteer version: 3.1.0
- Platform / OS version: Windows 10
- Node.js version: 12.16.1
My problem is:
我有一个 for ... of
循环可以访问具有伪造者的3000多个网址。我使用 puppeteer.connect
到 wsEndpoint
来重用一个浏览器实例。每次访问后我都会断开连接并关闭标签。
I have a `for...of` loop that visits 3000+ URLs with puppeteer. I use `puppeteer.connect` to the `wsEndpoint` so I can reuse one browser instance. After each visit I close the tab and disconnect.
- first 100 URLs: `page.goto` opens the URLs immediately
- above 100: `page.goto` needs 2-3 retries per URL
- above 300: `page.goto` needs 5-8 retries per URL
- above 500: I get `TimeoutError: Navigation timeout of 30000 ms exceeded` all the time.
I checked the Windows Task Manager and noticed hundreds of Chromium instances running in the background, each using 80-90 MB of memory and 1-2% of CPU.
Question
How can I really kill the Chromium instances I've already disconnected with `browser.disconnect`?
Example script
```javascript
const puppeteer = require('puppeteer')
const urlArray = require('./urls.json') // contains 3000+ urls in an array

async function fn() {
  const browser = await puppeteer.launch({ headless: true })
  const browserWSEndpoint = await browser.wsEndpoint()
  for (const url of urlArray) {
    try {
      const browser2 = await puppeteer.connect({ browserWSEndpoint })
      const page = await browser2.newPage()
      await page.goto(url) // in my original code it's also wrapped in a retry function
      // doing cool things with the DOM
      await page.goto('about:blank') // because of you: https://github.com/puppeteer/puppeteer/issues/1490
      await page.close()
      await browser2.disconnect()
    } catch (e) {
      console.error(e)
    }
  }
  await browser.close()
}

fn()
```
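In the script above, `page.close()` and `browser2.disconnect()` sit inside the `try` block after `page.goto(url)`. Whenever navigation times out (or the DOM work throws), control jumps to `catch` and both lines are skipped, so that tab stays open for the rest of the run. Here is a minimal sketch, not the author's code, of a loop that closes each tab in a `finally` block instead. It also swaps in a simpler design: it works directly on the one launched browser rather than calling `puppeteer.connect` on every iteration. `visitAll` and `handlePage` are hypothetical names, and `browser` is assumed to expose a puppeteer-like async `newPage()`.

```javascript
// Minimal sketch (hypothetical helper, not the author's code): open one tab per
// URL on an already-launched browser and always close it, even on errors.
// `browser` is assumed to expose a puppeteer-like async newPage();
// `handlePage` stands in for the "doing cool things with the DOM" step.
async function visitAll(browser, urls, handlePage) {
  const failures = []
  for (const url of urls) {
    let page
    try {
      page = await browser.newPage()
      await page.goto(url)
      await handlePage(page, url)
    } catch (e) {
      // Record the failure and move on to the next URL.
      failures.push({ url, error: e })
    } finally {
      // Runs after success and after failure, so tabs cannot pile up.
      if (page) {
        await page.close().catch(() => {}) // ignore errors from a tab that is already gone
      }
    }
  }
  return failures
}
```

With puppeteer this would be called as `await visitAll(browser, urlArray, async (page) => { /* DOM work */ })`, followed by `await browser.close()` once the loop finishes. Because the browser is passed in as a parameter, the loop can also be exercised with a stub object.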
Error
The usual puppeteer timeout error.
```
TimeoutError: Navigation timeout of 30000 ms exceeded
    at C:\[...]\node_modules\puppeteer\lib\LifecycleWatcher.js:100:111
  -- ASYNC --
    at Frame.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:94:19)
    at Page.goto (C:\[...]\node_modules\puppeteer\lib\Page.js:476:53)
    at Page.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:95:27)
    at example (C:\[...]\example.js:13:18)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  name: 'TimeoutError'
}
```
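The trace shows that the error object carries `name: 'TimeoutError'`. The original code wraps `page.goto` in a retry function that isn't shown. The sketch below shows one way such a wrapper could look. `withRetry` is a hypothetical helper, not a puppeteer API. It retries only errors whose `name` is `'TimeoutError'` and rethrows anything else immediately.

```javascript
// Hypothetical retry helper (not a puppeteer API). Retries `task` only when it
// fails with an error named 'TimeoutError', up to `retries` extra attempts,
// waiting `delayMs` between attempts. Any other error is rethrown at once.
async function withRetry(task, { retries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt)
    } catch (e) {
      // Give up on non-timeout errors or once the retry budget is spent.
      if (e.name !== 'TimeoutError' || attempt >= retries) throw e
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

A possible call is `await withRetry(() => page.goto(url, { timeout: 60000 }), { retries: 3 })`. In this situation, retries most likely only hide the symptom. If leaked Chromium processes are eating the machine's memory, every retry competes with them too.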
Finally I was able to achieve the desired result by adding the `--single-process` and `--no-zygote` args at launch (`--no-sandbox` is also required with them).
The number of running Chromium processes no longer keeps growing. Only two instances remain active: one is the usual empty tab in the first position, and the second is the one correctly reused by `puppeteer.connect({ browserWSEndpoint })`.
```javascript
[...]
const browser = await puppeteer.launch({
  headless: true,
  args: ['--single-process', '--no-zygote', '--no-sandbox']
})
const browserWSEndpoint = await browser.wsEndpoint()
[...]
```
- `--single-process`: Runs the renderer and plugins in the same process as the browser. [source]
- `--no-zygote`: Disables the use of a zygote process for forking child processes. Instead, child processes will be forked and exec'd directly. Note that `--no-sandbox` should also be used together with this flag because the sandbox needs the zygote to work. [source]
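`--no-zygote` only works together with `--no-sandbox`, because the sandbox needs the zygote. It may help to build the launch arguments in one place so the two flags can't drift apart. This is a minimal sketch: `buildLaunchOptions` is a hypothetical helper, and the flags are exactly the ones used in the launch snippet above.

```javascript
// Hypothetical helper: builds puppeteer launch options from two switches.
// Whenever --no-zygote is requested, --no-sandbox is added with it,
// because the sandbox depends on the zygote process.
function buildLaunchOptions({ singleProcess = false, noZygote = false } = {}) {
  const args = []
  if (singleProcess) args.push('--single-process')
  if (noZygote) args.push('--no-zygote', '--no-sandbox')
  return { headless: true, args }
}
```

Calling `puppeteer.launch(buildLaunchOptions({ singleProcess: true, noZygote: true }))` passes the same three args as the launch snippet above.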