puppeteer-cluster: queue instead of execute

Question:

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but they all fail, whereas only some of them fail when I have the command execute directly.

I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except that the code doesn't wait for the result. Is there a way to achieve this anyway?

So this works:

const screen = await cluster.execute(req.query.url);

But this breaks:

const screen = await cluster.queue(req.query.url);

Here's the full example with queue:

const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });
    await cluster.task(async ({ page, data: url }) => {
        // make a screenshot
        await page.goto('http://' + url);
        const screen = await page.screenshot();
        return screen;
    });

    // setup server
    app.get('/', async function (req, res) {
        if (!req.query.url) {
            return res.end('Please specify url like this: ?url=example.com');
        }
        try {
            const screen = await cluster.queue(req.query.url);

            // respond with image
            res.writeHead(200, {
                'Content-Type': 'image/png', // page.screenshot() returns a PNG by default
                'Content-Length': screen.length //variable is undefined here
            });
            res.end(screen);
        } catch (err) {
            // catch error
            res.end('Error: ' + err.message);
        }
    });

    app.listen(3000, function () {
        console.log('Screenshot server listening on port 3000.');
    });
})();

What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.

Author of puppeteer-cluster here.

Quoting the docs:

cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.

cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
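To make the quoted contract concrete, here is a toy model of the two calls. It is a hypothetical stand-in written only for illustration, not puppeteer-cluster's implementation: ToyCluster runs one job at a time, its queue returns an already-resolved Promise and reports failures through a "taskerror" event, while its execute returns a Promise that settles with the task's own return value or error.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical toy model of the documented queue/execute contract.
// It is NOT puppeteer-cluster's code; it only mirrors the semantics quoted above.
class ToyCluster extends EventEmitter {
  constructor() {
    super();
    this.taskFn = null;
    this.tail = Promise.resolve(); // jobs run one after another on this chain
  }

  // Registers the task; it receives { data } (the real library also passes a page).
  task(fn) {
    this.taskFn = fn;
  }

  // Runs taskFn after all previously scheduled jobs and returns the job's promise.
  schedule(data) {
    const job = this.tail.then(() => this.taskFn({ data }));
    this.tail = job.catch(() => {}); // a failed job must not stop later jobs
    return job;
  }

  // Like cluster.queue: enqueue and return immediately.
  // The returned Promise resolves to undefined, never to the task's result.
  queue(data) {
    this.schedule(data).catch((err) => this.emit('taskerror', err, data));
    return Promise.resolve();
  }

  // Like cluster.execute: enqueue, then resolve with the result or reject with the error.
  // No 'taskerror' event is emitted for these jobs.
  execute(data) {
    return this.schedule(data);
  }

  // Resolves once every job scheduled so far has settled.
  idle() {
    return this.tail;
  }
}
```

In the question's handler this is why screen is undefined: awaiting queue only waits for the job to be enqueued, so screen.length throws and the catch block answers every request with an error message.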

When to use which function:

  • Use cluster.queue if you want to queue a large number of jobs (e.g. a list of URLs). The task function needs to take care of storing the results, for example by printing them to the console or storing them in a database.
  • Use cluster.execute if your task function returns a result. This still queues the job, so it is like calling queue, except that you also wait for the job to finish. In this scenario there is usually an "idling cluster" that is used whenever a request hits the server (as in your example code).
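Applied to the example from the question, the fix is to await cluster.execute inside the route handler. The sketch below keeps the handler testable by taking the cluster as a parameter; makeScreenshotHandler is a hypothetical helper name, and the cluster argument is assumed to be the object returned by Cluster.launch with the same task registered as in the question. It sends image/png because page.screenshot() produces a PNG unless another type is requested.

```javascript
// Hypothetical helper: builds an Express-style (req, res) handler around a cluster
// whose task returns a screenshot buffer (as in the question's cluster.task).
function makeScreenshotHandler(cluster) {
  return async function (req, res) {
    if (!req.query.url) {
      return res.end('Please specify url like this: ?url=example.com');
    }
    try {
      // execute() still queues the job, but its promise resolves with the task's
      // return value (or rejects with the error thrown inside the task).
      const screen = await cluster.execute(req.query.url);
      res.writeHead(200, {
        'Content-Type': 'image/png', // page.screenshot() defaults to PNG
        'Content-Length': screen.length,
      });
      res.end(screen);
    } catch (err) {
      res.end('Error: ' + err.message);
    }
  };
}

// Usage with the real server (requires express and puppeteer-cluster):
// app.get('/', makeScreenshotHandler(cluster));
```

Because execute still goes through the cluster's queue, maxConcurrency keeps limiting how many pages are open at once; each request simply waits for its own job.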

So, you definitely want to use cluster.execute, as you want to wait for the results of the task function. The reason you do not see any errors is (as quoted above) that errors from jobs added with cluster.queue are emitted via a "taskerror" event, whereas cluster.execute errors are thrown directly (the Promise is rejected). Most likely your jobs fail in both cases, but the failures are only visible with cluster.execute.
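If you do keep cluster.queue for batch work, the failures only show up when you listen for them. A minimal sketch, assuming a cluster object from Cluster.launch whose task stores its own results: register a "taskerror" listener (the README documents it as receiving the error, the job data and a willRetry flag), queue the jobs, then wait with cluster.idle(). queueAll is a hypothetical helper name.

```javascript
// Hypothetical helper: queues every item and collects the ones whose task failed.
// Assumes a puppeteer-cluster-like object with on('taskerror'), queue() and idle().
async function queueAll(cluster, items) {
  const failures = [];
  const onError = (err, data, willRetry) => {
    // With retryLimit > 0 a job may fail several times; record only the final failure.
    if (!willRetry) failures.push({ data, message: err.message });
  };
  cluster.on('taskerror', onError);
  try {
    for (const item of items) {
      // Resolves right away; it does NOT wait for the task to run.
      await cluster.queue(item);
    }
    // Waits until the queue is empty and all running tasks have finished.
    await cluster.idle();
  } finally {
    cluster.removeListener('taskerror', onError);
  }
  return failures;
}
```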