Puppeteer: Chromium instances remain active in the background after browser.disconnect

Problem description:

My environment

Puppeteer version: 3.1.0

Platform / OS version: Windows 10

Node.js version: 12.16.1

My problem is:

I have a for...of loop that visits 3000+ URLs with Puppeteer. I use puppeteer.connect to the wsEndpoint so I can reuse one browser instance. After each visit I close the tab and disconnect.

for the first 100 URLs, page.goto opens each URL immediately,
above 100, page.goto needs 2-3 retries per URL,
above 300, page.goto needs 5-8 retries per URL,
above 500, I get TimeoutError: Navigation timeout of 30000 ms exceeded every time.

I checked the Windows Task Manager and noticed hundreds of Chromium instances running in the background, each using 80-90 MB of memory and 1-2% of CPU.

Question

How can I really kill the Chromium instances I have already disconnected with browser.disconnect?
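For context: in Puppeteer, browser.disconnect() only detaches the Puppeteer client and leaves the Chromium process running, while browser.close() shuts it down. For a browser created with puppeteer.launch, browser.process() returns the Node.js ChildProcess; for one obtained via puppeteer.connect it returns null. A minimal sketch, assuming you still hold the launched browser object; the helper name forceKillBrowser is hypothetical. It only targets the main browser process, so whether every child renderer process exits with it may depend on the platform.

```javascript
// Hypothetical helper: shut down (or, failing that, force-kill) a launched browser.
// browser.process() is null for browsers obtained via puppeteer.connect().
async function forceKillBrowser(browser) {
  const proc = browser.process()
  if (!proc) {
    // Connected browser: this script does not own the Chromium process
    return false
  }
  try {
    await browser.close() // graceful shutdown first
  } catch (e) {
    // close() failed; fall through to the hard kill below
  }
  if (proc.exitCode === null && proc.signalCode === null) {
    // On Windows the signal name is ignored and the process is terminated forcefully
    proc.kill('SIGKILL')
  }
  return true
}
```
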

Example script

const puppeteer = require('puppeteer')
const urlArray = require('./urls.json') // contains 3000+ urls in an array


async function fn() {
  const browser = await puppeteer.launch({ headless: true })
  const browserWSEndpoint = await browser.wsEndpoint()

  for (const url of urlArray) {
    try {
      const browser2 = await puppeteer.connect({ browserWSEndpoint })
      const page = await browser2.newPage()
      await page.goto(url) // in my original code it's also wrapped in a retry function

      // doing cool things with the DOM

      await page.goto('about:blank') // workaround from https://github.com/puppeteer/puppeteer/issues/1490
      await page.close()
      await browser2.disconnect()
    } catch (e) {
      console.error(e)
    }
  }
  await browser.close()
}
fn()
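A variation worth trying (a sketch, not the asker's code): since the loop runs in the same script that launched the browser, the launched browser object can be reused directly, skipping connect/disconnect entirely. Note also that in the script above, when page.goto throws, control jumps straight to the catch block, so page.close() and browser2.disconnect() never run and that tab stays open, which may contribute to the growing process count. The sketch below uses finally so every tab is closed regardless. The names visitAll and handlePage are hypothetical; handlePage stands in for "doing cool things with the DOM".

```javascript
// Sketch: visit every URL with one launched browser, one tab at a time.
// Returns the URLs that failed so they can be retried later.
async function visitAll(browser, urls, handlePage) {
  const failed = []
  for (const url of urls) {
    const page = await browser.newPage()
    try {
      await page.goto(url)
      await handlePage(page)
    } catch (e) {
      failed.push(url)
      console.error(url, e.message)
    } finally {
      // Always close the tab, even if goto or handlePage threw;
      // ignore errors from a tab that has already crashed
      await page.close().catch(() => {})
    }
  }
  return failed
}

// Usage (assumes puppeteer is installed):
// const puppeteer = require('puppeteer')
// const browser = await puppeteer.launch({ headless: true })
// try { await visitAll(browser, urlArray, async (page) => { /* DOM work */ }) }
// finally { await browser.close() }
```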

Error

The usual Puppeteer timeout error:

TimeoutError: Navigation timeout of 30000 ms exceeded
    at C:\[...]\node_modules\puppeteer\lib\LifecycleWatcher.js:100:111
  -- ASYNC --
    at Frame.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:94:19)
    at Page.goto (C:\[...]\node_modules\puppeteer\lib\Page.js:476:53)
    at Page.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:95:27)
    at example (C:\[...]\example.js:13:18)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  name: 'TimeoutError'
}
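The original code wraps page.goto in a retry function that is not shown. A minimal sketch of such a wrapper, assuming you only want to retry on timeouts: the error in the trace above carries name: 'TimeoutError', so the helper checks err.name and rethrows anything else immediately. The name withRetry and the default of 3 attempts are hypothetical choices; page.goto also accepts a timeout option (in milliseconds) if the 30000 ms default is too short.

```javascript
// Hypothetical retry helper: re-runs fn() only when it fails with a TimeoutError.
async function withRetry(fn, attempts = 3) {
  let lastError
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn()
    } catch (e) {
      if (e.name !== 'TimeoutError') throw e // not a timeout: fail immediately
      lastError = e
      console.warn(`attempt ${i}/${attempts} timed out`)
    }
  }
  throw lastError // every attempt timed out
}

// Usage with Puppeteer:
// await withRetry(() => page.goto(url, { timeout: 60000 }), 3)
```

Retrying only on TimeoutError keeps real failures (bad URL, DNS errors) from being masked, although, as the question shows, retries alone do not help once the leaked processes pile up.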

Answer

Finally, I was able to achieve the desired result by adding the --single-process and --no-zygote args at launch (--no-sandbox is also required together with them).

The number of running Chromium processes no longer grows with each visit; only two instances remain active: one is the usual empty tab in the first position, and the other is the one correctly reused by puppeteer.connect({ browserWSEndpoint }).

[...]
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--single-process', '--no-zygote', '--no-sandbox']
  })
  const browserWSEndpoint = await browser.wsEndpoint()
[...]

--single-process: Runs the renderer and plugins in the same process as the browser [source]

--no-zygote: Disables the use of a zygote process for forking child processes. Instead, child processes will be forked and exec'd directly. Note that --no-sandbox should also be used together with this flag because the sandbox needs the zygote to work. [source]
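Because the quoted switch description says --no-zygote has to be paired with --no-sandbox, one option is to build the args list in one place so the two flags cannot drift apart. A small sketch; buildLaunchArgs and its option names are hypothetical, and keep in mind that --no-sandbox turns off Chromium's sandbox.

```javascript
// Hypothetical helper: build Chromium args for puppeteer.launch().
// --no-zygote is always paired with --no-sandbox, since the sandbox needs the zygote.
function buildLaunchArgs({ singleProcess = false, noZygote = false } = {}) {
  const args = []
  if (singleProcess) args.push('--single-process')
  if (noZygote) {
    args.push('--no-zygote')
    args.push('--no-sandbox')
  }
  return args
}

// Usage, equivalent to the launch call above:
// const browser = await puppeteer.launch({
//   headless: true,
//   args: buildLaunchArgs({ singleProcess: true, noZygote: true })
// })
```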
